aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Data

Introducing Open Source Insights Data In BigQuery To Help Secure Software Supply Chains

  • aster.cloud
  • May 26, 2022
  • 4 minute read

Today we’re announcing a new Google Cloud Dataset from Open Source Insights which will help developers better understand the structure and security of the software they use. This dataset provides access to critical software supply chain information for developers, maintainers and consumers of open-source software.

Your users rely not only on the code you write, but also on the code your code depends on, the code that code depends on, and so on. This web of dependencies forms a dependency graph, and while each node in the graph brings useful functionality to your project, they may also introduce security vulnerabilities, licensing issues, or other surprises, as recent events like the log4j issue demonstrated. To understand your code, you must have an accurate view of its dependency graph.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

The Open Source Insights project scans millions of open-source packages from the npm, Go, Maven, PyPI, and Cargo ecosystems, computes their dependency graphs, and annotates those graphs with security advisories, license information, popularity metrics, and other metadata. The dataset is regularly updated, keeping it current and relevant while also providing a snapshotted view of change over time. Generated by resolving each package’s dependency constraints, this data provides precise, accurate, and actionable dependency graphs.

The rate of change in open-source packages is significant. Our analysis shows that roughly 15% of the packages in npm see changes to their dependency sets each day, and for 40,000 of those packages (2% of packages in npm) this results in a change to their license or advisory set. Keeping up with these changes is critical yet intractable without good tooling.

Read More  Healthcare Industry Trends 2022: Data Empowers Patients, Researchers, Partners

This new dataset allows anyone to use Google Cloud BigQuery to explore and analyze the dependencies, advisories, ownership, license and other metadata of open-source packages across supported ecosystems, and how this metadata has changed over time.

We are eagerly looking forward to seeing how this data will be used. Whether you’re a developer, security engineer, or researcher, you can use this public dataset to analyze components of your software supply chain, and integrate this information with your existing tools and pipelines.

How the Open Source Insights dataset works

We’re bringing Google’s mission to “organize the world’s information and make it universally accessible and useful” to open-source software. Open Source Insights examines each package in the packaging systems we cover, including npm, Go, Maven (Java), PyPI (Python), and Cargo (Rust) and more to come. A full, detailed graph of its dependencies and their properties is constructed and annotated with security advisory, license, owner, release information and other metadata, making a rich dataset covering entire package management language ecosystems.

The dataset is updated regularly, making this a valuable resource for tracking ecosystem level changes over time, analyzing the scope and impact of issues, or integrating into custom dashboards and build systems.

Getting started with the Open Source Insights dataset

To begin exploring these public dataset tables, you can look at the schema and try some sample queries, like the following examples. As with all other Google Cloud Datasets, users can obtain access without charges of up to 1TB/month in queries and up to 10GB/month in storage through BigQuery’s free tier. SQL queries above these thresholds are subject to regular BigQuery pricing. Users can also leverage the BigQuery sandbox to access BigQuery without the need to create a Google Cloud account or provide credit card information, subject to the sandbox’s limits and BigQuery’s free tier thresholds.

Read More  SAP Backup, The Blended Way

What are the most common licenses across each ecosystem?

We can aggregate the license data across packages within each dependency management system to get a list of the top three licenses per system. To do so we first find the newest snapshot in the dataset. Then within that snapshot we count the number of unique packages with at least one version using each license (multiple versions of a package are not double counted).

 

-- Find the most recent snapshot.
DECLARE
  Time TIMESTAMP DEFAULT (
  SELECT
    MAX(Time)
  FROM
    `bigquery-public-data.deps_dev_v1.Snapshots`);

WITH
  -- Compute the count of unique packages per system and license.
  Counts AS (
  SELECT
    System,
    License,
    COUNT(DISTINCT Name) AS NPackages
  FROM
    `bigquery-public-data.deps_dev_v1.PackageVersions`
  CROSS JOIN
    UNNEST(Licenses) AS License
  WHERE
    SnapshotAt = Time
  GROUP BY
    System,
    License),
  -- Compute a rank for each license within its system
  Ranked AS (
  SELECT
    System,
    License,
    NPackages,
    ROW_NUMBER() OVER (PARTITION BY System ORDER BY NPackages DESC ) AS LicenseRank
  FROM
    Counts)

-- Finally output the top 3 per system.
SELECT
  System,
  License,
  NPackages
FROM
  Ranked
WHERE
  LicenseRank <= 3
ORDER BY
  System,
  LicenseRank;

 

What are the most depended upon package versions?

We can use the dependency graphs to identify the most depended upon package versions in the cargo ecosystem. To do so, we filter all packages and available versions for just the release with the highest semantic version per package. We then sum the number of these highest release versions that depend on each version.

 

-- The dependency management system whose packages we will query. 
DECLARE
  Sys STRING DEFAULT 'CARGO';

-- Find the most recent snapshot.
DECLARE
  Time TIMESTAMP DEFAULT (
  SELECT
    MAX(Time)
  FROM
    `bigquery-public-data.deps_dev_v1.Snapshots`);

WITH
  -- Select just the package-versions that are considered releases
  -- in the system of interest.
  Releases AS (
  SELECT
    Name,
    Version,
    VersionInfo
  FROM
    `bigquery-public-data.deps_dev_v1.PackageVersions`
  WHERE
    SnapshotAt = Time
    AND VersionInfo.IsRelease
    AND System = Sys),
  -- For each package, find its release with the highest version number.
  HighestReleases AS (
  SELECT
    Name,
    Version
  FROM (
    SELECT
      Name,
      Version,
      ROW_NUMBER() OVER (PARTITION BY Name ORDER BY VersionInfo.Ordinal DESC) AS RowNumber
    FROM
      Releases)
  WHERE
    RowNumber = 1)

-- Finally compute the number of dependents per package-version and
-- rank package-versions by this count in descending order.
SELECT
  D.Dependency.Name,
  D.Dependency.Version,
  COUNT(*) AS NDependents
FROM
  `bigquery-public-data.deps_dev_v1.Dependencies` AS D
JOIN
  HighestReleases AS H
ON
  H.Name = D.Name
  AND H.Version = D.Version
WHERE
  D.SnapshotAt = Time
  AND D.System = Sys
GROUP BY
  D.Dependency.Name,
  D.Dependency.Version
ORDER BY
  NDependents DESC
LIMIT
  10;

 

Read More  Establish Consistent Business Language And Improve Trust In Your Data

What’s next for software supply chain security?

We hope this dataset will make it easier for developers to learn more fundamental information about their dependencies. You can also explore the Open Source Insights website for the latest open-source software insights and visualizations, learn more about our open source security and software supply chain security solutions at the upcoming Google Cloud Security Summit on May 17.

 

 

By: Nicky Ringland (Product Manager) and James Wetter (Software Engineer)
Source: Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • BigQuery;
  • Google Cloud
  • Open Source
  • Open Source Insights
  • Security
You May Also Like
Getting things done makes her feel amazing
View Post
  • Computing
  • Data
  • Featured
  • Learning
  • Tech
  • Technology

Nurturing Minds in the Digital Revolution

  • April 25, 2025
View Post
  • Data
  • Engineering

Hiding in Plain Site: Attackers Sneaking Malware into Images on Websites

  • January 16, 2025
IBM and Ferrari Premium Partner
View Post
  • Data
  • Engineering

IBM Selected as Official Fan Engagement and Data Analytics Partner for Scuderia Ferrari HP

  • November 7, 2024
dotlah-smartnation-singapore-lawrence-wong
View Post
  • Data
  • Enterprise
  • Technology

Growth, community and trust the ‘building blocks’ as Singapore refreshes Smart Nation strategies: PM Wong

  • October 8, 2024
nobel-prize-popular-physics-prize-2024-figure1
View Post
  • Data
  • Featured
  • Technology

They Used Physics To Find Patterns In Information

  • October 8, 2024
goswifties_number-crunching_202405_wm
View Post
  • Data
  • Featured

Of Nuggets And Tenders. To Know Or Not To Know, Is Not The Question. How To Become, Is.

  • May 25, 2024
View Post
  • Data

Generative AI Could Offer A Faster Way To Test Theories Of How The Universe Works

  • March 17, 2024
Chess
View Post
  • Computing
  • Data
  • Platforms

Chess.com Boosts Performance, Cuts Response Times By 71% With Cloud SQL Enterprise Plus

  • March 12, 2024

Stay Connected!
LATEST
  • college-of-cardinals-2025 1
    The Definitive Who’s Who of the 2025 Papal Conclave
    • May 7, 2025
  • conclave-poster-black-smoke 2
    The World Is Revalidating Itself
    • May 6, 2025
  • oracle-ibm 3
    IBM and Oracle Expand Partnership to Advance Agentic AI and Hybrid Cloud
    • May 6, 2025
  • 4
    Conclave: How A New Pope Is Chosen
    • April 25, 2025
  • Getting things done makes her feel amazing 5
    Nurturing Minds in the Digital Revolution
    • April 25, 2025
  • 6
    AI is automating our jobs – but values need to change if we are to be liberated by it
    • April 17, 2025
  • 7
    Canonical Releases Ubuntu 25.04 Plucky Puffin
    • April 17, 2025
  • 8
    United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services
    • April 15, 2025
  • 9
    Tokyo Electron and IBM Renew Collaboration for Advanced Semiconductor Technology
    • April 2, 2025
  • 10
    IBM Accelerates Momentum in the as a Service Space with Growing Portfolio of Tools Simplifying Infrastructure Management
    • March 27, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    Tariffs, Trump, and Other Things That Start With T – They’re Not The Problem, It’s How We Use Them
    • March 25, 2025
  • 2
    IBM contributes key open-source projects to Linux Foundation to advance AI community participation
    • March 22, 2025
  • 3
    Co-op mode: New partners driving the future of gaming with AI
    • March 22, 2025
  • 4
    Mitsubishi Motors Canada Launches AI-Powered “Intelligent Companion” to Transform the 2025 Outlander Buying Experience
    • March 10, 2025
  • PiPiPi 5
    The Unexpected Pi-Fect Deals This March 14
    • March 13, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.