Vulnerability Correlation Overview

Overview

Vulnerability correlation is a critical feature in Trustify, designed to automatically link security advisories to the software packages listed in a Software Bill of Materials (SBOM). This process enables developers and security teams to quickly identify which components in their software supply chain are affected by known vulnerabilities. By ingesting data from various advisory sources and mapping it to a rich data model, Trustify provides a clear and actionable view of an application’s security posture.

This document provides a technical deep-dive into how the vulnerability correlation logic works, from data ingestion to the database schema and the query mechanisms that tie everything together.

Data Ingestion

Trustify’s correlation process begins with the ingestion of security advisories from multiple sources. Each source has its own format and conventions, and Trustify has dedicated loaders to parse and normalize this data.

Advisory Sources

Trustify primarily supports two industry-standard advisory formats:

  • Open Source Vulnerability (OSV): A format designed for describing vulnerabilities in open-source software. OSV advisories are typically package-centric, providing clear information about affected package ecosystems, names, and version ranges.
  • Common Security Advisory Framework (CSAF): A more comprehensive format often used by vendors to describe vulnerabilities in their products. CSAF advisories are product-centric and can describe complex relationships between products and the components they contain.

Ingestion Process

When an advisory is ingested, the following steps occur:

  1. Parsing: The appropriate loader (OsvLoader or CsafLoader) parses the advisory file.
  2. Advisory & Vulnerability Creation:
    • An Advisory entity is created in the database to represent the advisory document itself, storing metadata like its identifier, issuer, and publication date.
    • Vulnerability identifiers (e.g., CVEs) are extracted from the advisory. For each unique identifier, a Vulnerability entity is created.
    • A link is established between the Advisory and its associated Vulnerability entities.
  3. Status Record Creation: This is the core of the ingestion-side correlation logic. The loader extracts information about which products or packages are affected, fixed, or not affected by the vulnerability. This information is used to create purl_status and product_status records.
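
The advisory and vulnerability creation step above can be sketched in memory as follows. This is a minimal illustration, not Trustify's actual loader code: `ParsedAdvisory`, `Store`, and `ingest` are hypothetical names, and ordered sets and maps stand in for the database tables. It shows that a vulnerability identifier is stored once even when several advisories mention it, while each advisory keeps its own link.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified view of a parsed advisory (not Trustify's real type).
#[derive(Debug, Clone)]
pub struct ParsedAdvisory {
    pub identifier: String,
    pub issuer: String,
    pub published: String,
    pub vulnerability_ids: Vec<String>,
}

/// In-memory stand-in for the advisory, vulnerability, and link tables.
#[derive(Debug, Default)]
pub struct Store {
    pub advisories: BTreeMap<String, ParsedAdvisory>,
    pub vulnerabilities: BTreeSet<String>,
    pub advisory_vulnerability: BTreeSet<(String, String)>,
}

impl Store {
    /// Create the advisory, create each unique vulnerability once,
    /// and link the advisory to every vulnerability it mentions.
    pub fn ingest(&mut self, advisory: ParsedAdvisory) {
        for vuln_id in &advisory.vulnerability_ids {
            // Inserting into a set is a no-op if the identifier already exists.
            self.vulnerabilities.insert(vuln_id.clone());
            self.advisory_vulnerability
                .insert((advisory.identifier.clone(), vuln_id.clone()));
        }
        self.advisories.insert(advisory.identifier.clone(), advisory);
    }
}
```

Status record creation (step 3) is deliberately left out of this sketch, since it depends on the advisory format.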

PURL and Product Status

  • purl_status: These records create a direct link between a vulnerability and a versioned package, identified by a Package URL (PURL). For OSV advisories, this mapping is straightforward, because the advisory's package field and version range information map directly to a PURL and a version range. The loader parses these ranges and creates purl_status entries with a status (e.g., “affected”) and the corresponding version range. The same record type is also used for CSAF statuses that identify components by PURL.
  • product_status: These records are used for more abstract, product-level statuses, which are common in CSAF advisories. A CSAF advisory might state that “Red Hat Enterprise Linux 9” is affected. The CsafLoader uses a StatusCreator to resolve these product definitions, trace their relationships to underlying components (which may be identified by PURLs or CPEs), and create product_status records.
  • If the advisory provides a CPE for the product context, the StatusCreator captures this and stores it in the context_cpe_id field of the status record (both purl_status and product_status). This allows Trustify to handle both package-centric and product-centric advisory data with the correct context. OSV advisories do not typically define a product CPE context.
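
As a rough illustration of how OSV range information could become purl_status records, the sketch below turns a sequence of `introduced` and `fixed` events (the event names used by the OSV schema) into "affected" ranges. The `Event` enum, the `PurlStatus` struct, and `statuses_from_events` are hypothetical and simplified; Trustify's real loader and record layout may differ. The CPE context is left empty, matching the note that OSV advisories do not typically define one.

```rust
/// Hypothetical, simplified OSV range events.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Introduced(String),
    Fixed(String),
}

/// Hypothetical in-memory shape of a purl_status record.
#[derive(Debug, Clone, PartialEq)]
pub struct PurlStatus {
    pub base_purl: String,
    pub status: &'static str,
    pub low: Option<String>,         // inclusive lower bound
    pub high: Option<String>,        // exclusive upper bound
    pub context_cpe: Option<String>, // None for OSV in this sketch
}

/// Pair each `introduced` event with the next `fixed` event to form an
/// "affected" range. An `introduced` without a later `fixed` stays open-ended.
pub fn statuses_from_events(base_purl: &str, events: &[Event]) -> Vec<PurlStatus> {
    let mut out = Vec::new();
    let mut open: Option<String> = None;
    let make = |low: String, high: Option<String>| PurlStatus {
        base_purl: base_purl.to_string(),
        status: "affected",
        low: Some(low),
        high,
        context_cpe: None,
    };
    for event in events {
        match event {
            Event::Introduced(v) => open = Some(v.clone()),
            Event::Fixed(v) => {
                if let Some(low) = open.take() {
                    out.push(make(low, Some(v.clone())));
                }
            }
        }
    }
    if let Some(low) = open {
        out.push(make(low, None));
    }
    out
}
```

For example, the events `introduced: 1.2.0` and `fixed: 1.3.0` yield one record covering `>=1.2.0, <1.3.0`.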

Database Schema

The correlation logic relies on a well-defined set of tables that model the relationships between advisories, vulnerabilities, and software components.

Core Entities

  • advisory: Stores metadata about each ingested advisory.
  • vulnerability: Stores information about each unique vulnerability (e.g., CVE).
  • advisory_vulnerability: A join table linking advisories to the vulnerabilities they describe.
  • base_purl, versioned_purl, qualified_purl: These tables work together to store and normalize Package URLs, separating the version-independent parts from the versioned parts.
  • version_range: Stores version range information, including the versioning scheme (e.g., SemVer, Maven).
  • purl_status: The central table for package-centric correlation. It links an advisory and vulnerability to a base_purl and a version_range, with a specific status (e.g., “affected”, “fixed”). It also contains an optional context_cpe_id to link the status to a specific product context.
  • product_status: The central table for product-centric correlation. It links an advisory and vulnerability to a product and a version_range, with a specific status. It also contains an optional context_cpe_id to link the status to a specific product context.
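
To make the split between base_purl, versioned_purl, and qualified_purl more concrete, here is a small sketch that separates a PURL string into its version-independent part, its version, and its qualifiers. The `SplitPurl` type and `split_purl` function are hypothetical, and the parsing is intentionally simplified (no percent-decoding or validation); it is not how Trustify parses PURLs.

```rust
/// Hypothetical decomposition of a PURL, mirroring the three PURL tables.
#[derive(Debug, PartialEq)]
pub struct SplitPurl {
    pub base: String,               // type, namespace, and name
    pub version: Option<String>,    // the part after '@'
    pub qualifiers: Option<String>, // the part after '?'
}

/// Split `pkg:type/namespace/name@version?qualifiers#subpath` into its parts.
/// Returns None if the string does not start with the `pkg:` scheme.
pub fn split_purl(purl: &str) -> Option<SplitPurl> {
    if !purl.starts_with("pkg:") {
        return None;
    }
    // Drop the optional subpath, which is not modeled in this sketch.
    let without_subpath = purl.split('#').next()?;
    // Separate qualifiers first, since qualifier values may contain '@'.
    let (rest, qualifiers) = match without_subpath.split_once('?') {
        Some((rest, q)) => (rest, Some(q.to_string())),
        None => (without_subpath, None),
    };
    let (base, version) = match rest.rsplit_once('@') {
        Some((base, v)) => (base.to_string(), Some(v.to_string())),
        None => (rest.to_string(), None),
    };
    Some(SplitPurl { base, version, qualifiers })
}
```

Status records reference only the base part, which is why a single purl_status row can cover every version inside its version_range.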

Entity Relationships

The schema is designed to efficiently query for vulnerabilities. A purl_status record effectively creates a tuple of (Advisory, Vulnerability, Status, Base PURL, Version Range), which directly answers the question of whether a package is affected by a vulnerability according to a specific advisory.

Correlation Logic (The Query Side)

With the data ingested and stored, the final piece of the puzzle is querying it to determine the vulnerability status of an SBOM’s components.

The Goal

The primary goal of the query-side logic is to answer the question: “Given a list of PURLs from an SBOM, what are all the known vulnerabilities that affect them?”

This logic is primarily located in the modules/fundamental/src/purl/model/details/purl.rs file, within the PurlDetails::from_entity function.

The PurlDetails::from_entity Function

This function is the heart of the correlation engine. When asked for the details of a specific PURL (including its version), it performs the following steps:

  1. Check for SBOM CPE Context: The logic first determines if the SBOM containing the PURL describes a specific product by checking if it is associated with a CPE.
  2. Query purl_status: It initiates a query on the purl_status table, filtering for records that match the base_purl_id of the package in question.
  3. Apply CPE Context Filter (if applicable): If the SBOM has a CPE context, the query is further filtered to return only purl_status records whose context_cpe_id either matches the SBOM’s CPE or is NULL (a NULL context means the status is not tied to a specific product context). This ensures that only advisories relevant to the SBOM’s specific product context are considered. If the SBOM has no CPE context, this filtering step is skipped.
  4. Version Matching: The most critical step is the version check. The query uses a custom database function, VersionMatches, which takes the package’s version as input and compares it against the version_range stored for each purl_status record. This function understands different versioning schemes (like SemVer) and can correctly determine if a version falls within an affected range (e.g., >=1.2.0, <1.3.0). Only the statuses with matching version ranges are returned.
  5. Query product_status: To account for product-centric advisories, a second, more complex query is executed:
    • First, it finds all SBOMs that are known to contain the PURL being queried.
    • Then, it finds all product_status records that are associated with those SBOMs (via a link from product_version to sbom_id).
    • This query is also filtered by the SBOM’s CPE context, if one exists.
    • This effectively bridges the gap, allowing a PURL to inherit the vulnerability status of the product it belongs to, but only when the context matches.
  6. Data Aggregation: The results from both the purl_status and product_status queries are collected and aggregated. They are grouped by advisory to provide a clean, comprehensive list of all advisories that affect the given PURL.
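
The purl_status half of these steps (steps 2, 3, 4, and 6) can be sketched as an in-memory filter. Everything here is an assumption made for illustration: the `PurlStatus` struct, the `correlate` function, and in particular `version_matches`, which is a naive dotted-number comparison standing in for the VersionMatches database function. The CPE check uses plain equality as a simplification, and the product_status query of step 5 is omitted.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory shape of a purl_status row.
#[derive(Debug, Clone)]
pub struct PurlStatus {
    pub advisory: String,
    pub vulnerability: String,
    pub status: String,
    pub base_purl: String,
    pub low: Option<String>,         // inclusive lower bound
    pub high: Option<String>,        // exclusive upper bound
    pub context_cpe: Option<String>, // None models a NULL context_cpe_id
}

/// Naive version parsing: dotted numeric components, non-numeric parts become 0.
fn parse_version(v: &str) -> Vec<u64> {
    v.split('.').map(|p| p.parse::<u64>().unwrap_or(0)).collect()
}

/// Simplified stand-in for VersionMatches: low <= version < high.
fn version_matches(version: &str, low: Option<&str>, high: Option<&str>) -> bool {
    let v = parse_version(version);
    low.map_or(true, |l| v >= parse_version(l)) && high.map_or(true, |h| v < parse_version(h))
}

/// Filter statuses for one package and group the matches by advisory.
pub fn correlate(
    statuses: &[PurlStatus],
    base_purl: &str,
    version: &str,
    sbom_cpe: Option<&str>,
) -> BTreeMap<String, Vec<PurlStatus>> {
    let mut by_advisory: BTreeMap<String, Vec<PurlStatus>> = BTreeMap::new();
    for s in statuses {
        // Step 2: only statuses for this base PURL.
        if s.base_purl != base_purl {
            continue;
        }
        // Step 3: with an SBOM CPE, keep statuses whose context matches or is unset.
        if let (Some(cpe), Some(ctx)) = (sbom_cpe, s.context_cpe.as_deref()) {
            if ctx != cpe {
                continue;
            }
        }
        // Step 4: the package version must fall inside the stored range.
        if !version_matches(version, s.low.as_deref(), s.high.as_deref()) {
            continue;
        }
        // Step 6: group the surviving statuses by advisory.
        by_advisory.entry(s.advisory.clone()).or_default().push(s.clone());
    }
    by_advisory
}
```

A real implementation would push these filters into the database query rather than loading every status row, as the description of the query above implies.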

Putting It All Together (Example Workflow)

Here is a concrete example using a real-world CSAF advisory and a corresponding SBOM.

The Data

  • CSAF Advisory (cve-2023-0044.json): A Red Hat advisory for CVE-2023-0044. It states that the component io.quarkus/quarkus-vertx-http is known_affected within the product “Red Hat build of Quarkus”, which has a CPE of cpe:/a:redhat:quarkus:2.
  • SBOM (quarkus-bom-2.13.8.Final-redhat-00004.json): An SPDX SBOM for the “quarkus-bom”. This SBOM describes the product it belongs to with the CPE cpe:/a:redhat:quarkus:2.13::el8 and lists pkg:maven/io.quarkus/quarkus-vertx-http@2.13.8.Final-redhat-00004 as one of its packages.

The Process

  1. Ingestion: The Red Hat CSAF advisory for CVE-2023-0044 is ingested into Trustify.
    • An Advisory entity is created for the document CVE-2023-0044.
    • A Vulnerability entity is created for CVE-2023-0044.
    • The CsafLoader processes the product_tree and vulnerabilities sections. It creates a product_status record for the component io.quarkus/quarkus-vertx-http with a status of known_affected. This status record is linked to the product context cpe:/a:redhat:quarkus:2.
  2. SBOM Analysis: The quarkus-bom SBOM is uploaded and processed.
    • Trustify parses the SBOM and stores all the packages, including pkg:maven/io.quarkus/quarkus-vertx-http@2.13.8.Final-redhat-00004.
    • It also records that this SBOM describes the product identified by cpe:/a:redhat:quarkus:2.13::el8.
  3. Correlation Query: A user requests the vulnerability status for the uploaded SBOM. For each package in the SBOM, Trustify’s correlation logic runs. When it gets to pkg:maven/io.quarkus/quarkus-vertx-http@2.13.8.Final-redhat-00004, the following happens:
    • The PurlDetails::from_entity function is called.
    • The logic identifies that the SBOM has a CPE context: cpe:/a:redhat:quarkus:2.13::el8.
    • The query against the product_status table finds the known_affected status for io.quarkus/quarkus-vertx-http related to CVE-2023-0044.
    • The query then filters this result based on the CPE context. It checks if the status’s context (cpe:/a:redhat:quarkus:2) is compatible with the SBOM’s context (cpe:/a:redhat:quarkus:2.13::el8). Since cpe:/a:redhat:quarkus:2.13 is a version of the product cpe:/a:redhat:quarkus:2, the contexts are compatible, and the status is considered a valid match.
    • The VersionMatches function is then applied to confirm that the specific version 2.13.8.Final-redhat-00004 falls within the range specified by the advisory (in this specific CSAF, the range is implicit, but the logic still applies).
  4. Result: The user is shown that their SBOM is vulnerable to CVE-2023-0044. The details would indicate that the quarkus-vertx-http package is affected and that the vulnerability is relevant in the context of the “Red Hat build of Quarkus” product.
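
The context compatibility check in step 3 can be illustrated with a small sketch. The rule used here is only one plausible reading of "compatible" and is an assumption, not Trustify's confirmed implementation: the part, vendor, and product fields must be equal, and the status CPE's version must be a dotted-component prefix of the SBOM CPE's version. The function name `cpe_context_compatible` is hypothetical.

```rust
/// Decide whether a status context CPE applies to an SBOM's CPE.
/// Assumed rule: same part/vendor/product, and the status version is a
/// dotted-component prefix of the SBOM version (an empty version matches any).
pub fn cpe_context_compatible(status_cpe: &str, sbom_cpe: &str) -> bool {
    // Split a CPE 2.2 URI such as `cpe:/a:vendor:product:version` into fields.
    fn fields(cpe: &str) -> Option<Vec<&str>> {
        cpe.strip_prefix("cpe:/").map(|rest| rest.split(':').collect())
    }
    let (Some(a), Some(b)) = (fields(status_cpe), fields(sbom_cpe)) else {
        return false;
    };
    // Part, vendor, and product must be present and identical.
    if a.len() < 3 || b.len() < 3 || a[..3] != b[..3] {
        return false;
    }
    let status_version = a.get(3).copied().unwrap_or("");
    let sbom_version = b.get(3).copied().unwrap_or("");
    if status_version.is_empty() {
        return true;
    }
    // Compare whole components so that "2" matches "2.13" but not "21".
    let sv: Vec<&str> = status_version.split('.').collect();
    let bv: Vec<&str> = sbom_version.split('.').collect();
    bv.len() >= sv.len() && bv[..sv.len()] == sv[..]
}
```

Under this assumed rule, cpe:/a:redhat:quarkus:2 is compatible with cpe:/a:redhat:quarkus:2.13::el8, which agrees with the outcome described in the example.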