License scanning of CycloneDX files
DETAILS: Tier: Ultimate Offering: GitLab.com, Self-managed, GitLab Dedicated
- Introduced in GitLab 15.9 for GitLab SaaS with two flags named license_scanning_sbom_scanner and package_metadata_synchronization. Both flags disabled by default.
- Generally available in GitLab 16.4. Feature flags license_scanning_sbom_scanner and package_metadata_synchronization removed.
- The legacy License Compliance analyzer (License-Scanning.gitlab-ci.yml) was removed in GitLab 17.0.
To detect the licenses in use, License Compliance relies on running the Dependency Scanning CI jobs and analyzing the CycloneDX Software Bill of Materials (SBOM) generated by those jobs. This method of scanning can parse and identify over 500 different types of licenses, as defined in the SPDX list. Third-party scanners may be used to generate the list of dependencies, as long as they produce a CycloneDX report artifact for one of the supported languages and follow the GitLab CycloneDX property taxonomy. It is not yet possible to use a CI report artifact as a source of data for license information, and licenses that are not in the SPDX list are reported as "Unknown". The ability to provide other licenses is tracked in epic 10861.
NOTE: The License Scanning feature relies on publicly available package metadata collected in an external database and synced with the GitLab instance automatically. This database is a multi-region Google Cloud Storage bucket hosted in the United States. The scan is executed exclusively within the GitLab instance. No contextual information (for example, a list of project dependencies) is sent to the external service.
Configuration
To enable License scanning of CycloneDX files:
- Using the Dependency Scanning template:
  - Enable Dependency Scanning and ensure that its prerequisites are met.
  - On GitLab self-managed only, you can choose package registry metadata to synchronize in the Admin area for the GitLab instance. For this data synchronization to work, you must allow outbound network traffic from your GitLab instance to the domain storage.googleapis.com. If you have limited or no network connectivity, refer to the documentation section running in an offline environment for further guidance.
- Or use the CI/CD component for applicable package registries.
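The outbound-traffic requirement above can be checked from the GitLab host before enabling synchronization. The following is a minimal sketch, not a GitLab tool: the `reachable?` helper and `TCP_CONNECTOR` constant are hypothetical names, and the check only confirms that a TCP connection to port 443 can be opened. It does not prove that a proxy or firewall allows the full HTTPS exchange.

```ruby
require "socket"

# Default connector: opens a TCP connection and closes it straight away.
# Socket.tcp closes the socket when the block returns.
TCP_CONNECTOR = lambda do |host, port, timeout|
  Socket.tcp(host, port, connect_timeout: timeout) { true }
end

# Returns true when a TCP connection to host:port can be opened in time,
# false when the connection is refused, times out, or the name does not resolve.
# The connector is injectable so the logic can be exercised without a network.
def reachable?(host, port = 443, timeout: 5, connector: TCP_CONNECTOR)
  connector.call(host, port, timeout)
rescue SystemCallError, SocketError, IOError
  false
end

# Example call, to be run on the GitLab instance itself:
# puts reachable?("storage.googleapis.com") ? "reachable" : "blocked"
```

If the call returns false, the offline environment guidance is probably the relevant path.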
Supported languages and package managers
License scanning is supported for the following languages and package managers:
Language | Package Manager | Dependency Scanning Template | CI/CD Component |
---|---|---|---|
.NET | NuGet | Yes | No |
C# | NuGet | Yes | No |
C | Conan | Yes | No |
C++ | Conan | Yes | No |
Go | Go | Yes | No |
Java | Gradle | Yes | No |
Java | Maven | Yes | No |
Java | Android | Yes | Yes |
JavaScript and TypeScript | npm | Yes | No |
JavaScript and TypeScript | pnpm | Yes | No |
JavaScript and TypeScript | yarn | Yes | No |
PHP | Composer | Yes | No |
Python | setuptools | Yes | No |
Python | pip | Yes | No |
Python | Pipenv | Yes | No |
Python | Poetry | Yes | No |
Ruby | Bundler | Yes | No |
Scala | sbt | Yes | No |
Rust | cargo | No | Yes |
The supported files and versions are the ones supported by Dependency Scanning.
Data sources
License information for supported packages is obtained from the sources below. GitLab does additional processing on the original data, which includes mapping variations to the canonical license names.
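To illustrate what "mapping variations to the canonical license names" can look like, here is a small sketch. It is not GitLab's implementation: the alias table, the three-entry SPDX sample, and the `canonical_license` helper are all hypothetical and exist only to show the idea of normalizing free-form license strings and falling back to "Unknown".

```ruby
# Hypothetical alias table: NOT GitLab's actual mapping, only a few
# common spellings mapped to their SPDX identifiers.
LICENSE_ALIASES = {
  "mit license" => "MIT",
  "the mit license" => "MIT",
  "apache 2.0" => "Apache-2.0",
  "apache license 2.0" => "Apache-2.0",
  "apache license, version 2.0" => "Apache-2.0",
  "bsd 3-clause" => "BSD-3-Clause"
}.freeze

# Tiny sample of SPDX identifiers; the real SPDX list is far longer.
KNOWN_SPDX_IDS = %w[MIT Apache-2.0 BSD-3-Clause].freeze

# Returns the canonical SPDX identifier for a raw license string,
# or "Unknown" when neither an identifier nor an alias matches.
def canonical_license(raw)
  cleaned = raw.to_s.strip.gsub(/\s+/, " ")
  exact = KNOWN_SPDX_IDS.find { |id| id.casecmp?(cleaned) }
  return exact if exact

  LICENSE_ALIASES.fetch(cleaned.downcase, "Unknown")
end

["  The MIT  License ", "apache-2.0", "Custom EULA"].each do |raw|
  puts "#{raw.inspect} => #{canonical_license(raw)}"
end
```

A lookup table keeps the sketch easy to audit. A real pipeline would need a much larger alias set and the full SPDX list.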
Package manager | Source |
---|---|
Cargo | https://deps.dev/ |
Conan | https://github.com/conan-io/conan-center-index |
Go | https://index.golang.org/ |
Maven | https://storage.googleapis.com/maven-central |
npm | https://deps.dev/ |
NuGet | https://api.nuget.org/v3/catalog0/index.json |
Packagist | https://packagist.org/packages/list.json |
PyPI | https://warehouse.pypa.io/api-reference/bigquery-datasets.html |
Rubygems | https://rubygems.org/versions |
License expressions
The License Scanning of CycloneDX files does not support composite licenses. Adding this capability is tracked in issue 336878.
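Because composite licenses are not supported, it can help to know which components in an SBOM declare one. The sketch below assumes the CycloneDX JSON layout, in which a component's `licenses` array holds either a `license` object or an `expression` string, and treats any expression containing the SPDX operators `AND`, `OR`, or `WITH` as composite. The `composite_licenses` helper and the sample data are hypothetical.

```ruby
require "json"

# SPDX operators that turn a license expression into a composite one.
COMPOSITE_OPERATORS = /\s(AND|OR|WITH)\s/

# Returns [component name, expression] pairs for every license entry
# that is a composite SPDX expression.
def composite_licenses(sbom)
  sbom.fetch("components", []).flat_map do |component|
    component.fetch("licenses", []).filter_map do |entry|
      expression = entry["expression"]
      next unless expression.is_a?(String) && expression.match?(COMPOSITE_OPERATORS)

      [component["name"], expression]
    end
  end
end

# Illustrative SBOM fragment; the component names are made up.
sbom = JSON.parse(<<~SBOM)
  {
    "bomFormat": "CycloneDX",
    "specVersion": "1.4",
    "components": [
      { "name": "single-lib", "licenses": [{ "license": { "id": "MIT" } }] },
      { "name": "dual-lib", "licenses": [{ "expression": "MIT OR Apache-2.0" }] }
    ]
  }
SBOM

composite_licenses(sbom).each { |name, expr| puts "#{name}: #{expr}" }
```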
Blocking merge requests based on detected licenses
Users can require approval for merge requests based on the licenses that are detected by configuring a license approval policy.
Running in an offline environment
For self-managed GitLab instances in an environment with limited, restricted, or intermittent access to external resources through the internet, some adjustments are required to successfully scan CycloneDX reports for licenses. For more information, see the offline quick start guide.
Troubleshooting
A CycloneDX file is not being scanned and appears to provide no results
Ensure that the CycloneDX file adheres to the CycloneDX JSON specification. This specification does not permit duplicate entries. Projects that contain multiple SBOM files should either report each SBOM file up as individual CI report artifacts or they should ensure that duplicates are removed if the SBOMs are merged as part of the CI pipeline.
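If SBOMs are merged in the pipeline, duplicates can be dropped during the merge. The following sketch is one possible approach under simplifying assumptions: the hypothetical `merge_sboms` helper only combines the `components` arrays and keeps every other field from the first document, so dependency graphs and metadata from the other files are ignored.

```ruby
require "json"

# Merges the components of several parsed CycloneDX documents into a copy of
# the first one, dropping components that are deeply equal to one already kept.
# Key order inside a component does not matter for the comparison.
def merge_sboms(sboms)
  return {} if sboms.empty?

  merged = sboms.first.dup
  merged["components"] = sboms.flat_map { |sbom| sbom.fetch("components", []) }.uniq
  merged
end

# Two illustrative documents that share one component.
first = { "bomFormat" => "CycloneDX", "specVersion" => "1.4",
          "components" => [{ "name" => "rack", "version" => "2.2.8" }] }
second = { "bomFormat" => "CycloneDX", "specVersion" => "1.4",
           "components" => [{ "version" => "2.2.8", "name" => "rack" },
                            { "name" => "rake", "version" => "13.1.0" }] }

puts JSON.pretty_generate(merge_sboms([first, second]))
```

In a pipeline, the input would more likely come from parsing each `gl-sbom-*.cdx.json` file with `JSON.parse(File.read(path))`.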
You can validate CycloneDX SBOM files against the CycloneDX JSON specification as follows:
$ docker run -it --rm -v "$PWD:/my-cyclonedx-sboms" -w /my-cyclonedx-sboms cyclonedx/cyclonedx-cli:latest cyclonedx validate --input-version v1_4 --input-file gl-sbom-all.cdx.json
Validating JSON BOM...
BOM validated successfully.
If the JSON BOM fails validation, for example, because there are duplicate components:
Validation failed: Found duplicates at the following index pairs: "(A, B), (C, D)"
#/properties/components/uniqueItems
This issue can be fixed by updating the CI template to use jq to remove the duplicate components from the gl-sbom-*.cdx.json report by overriding the job definition that produces the duplicate components. For example, the following removes duplicate components from the gl-sbom-gem-bundler.cdx.json report file produced by the gemnasium-dependency_scanning job:
include:
  - template: Jobs/Dependency-Scanning.gitlab-ci.yml

gemnasium-dependency_scanning:
  # Runs after the scan, once the report file exists.
  after_script:
    - apk update && apk add jq
    - jq '.components |= unique' gl-sbom-gem-bundler.cdx.json > tmp.json && mv tmp.json gl-sbom-gem-bundler.cdx.json
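Before overriding a job, it can be useful to confirm which components are duplicated. The sketch below reports index pairs in the spirit of the validator message shown earlier. The `duplicate_component_pairs` helper is hypothetical and compares components by deep equality, which is an assumption about how the validator decides that two entries are duplicates.

```ruby
require "json"

# Returns [first_index, duplicate_index] pairs for every component that is
# deeply equal to a component seen earlier in the array.
def duplicate_component_pairs(sbom)
  first_seen = {}
  sbom.fetch("components", []).each_with_index.filter_map do |component, index|
    if first_seen.key?(component)
      [first_seen[component], index]
    else
      first_seen[component] = index
      nil
    end
  end
end

# Illustrative data; a real run would parse the report file instead, for example
# JSON.parse(File.read("gl-sbom-gem-bundler.cdx.json")).
sbom = {
  "components" => [
    { "name" => "rack", "version" => "2.2.8" },
    { "name" => "rake", "version" => "13.1.0" },
    { "name" => "rack", "version" => "2.2.8" }
  ]
}

duplicate_component_pairs(sbom).each do |first, dup|
  puts "component #{dup} duplicates component #{first}"
end
```

Note that jq's `unique` also sorts the array, so the order of components in the rewritten report can differ from the original.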
Remove unused license data
License scanning changes released in GitLab 15.9 required a significant amount of additional disk space on the instance. This issue was resolved in GitLab 16.3 by the Reduce package metadata table on-disk footprint epic. However, if your instance was running license scanning between GitLab 15.9 and 16.3, you may want to remove the unneeded data.
To remove the unneeded data:
- Check if the package_metadata_synchronization feature flag is currently, or was previously, enabled, and if so, disable it. Use the Rails console to execute the following command:
  Feature.enabled?(:package_metadata_synchronization) && Feature.disable(:package_metadata_synchronization)
- Check if there is deprecated data in the database:
  PackageMetadata::PackageVersionLicense.count
  PackageMetadata::PackageVersion.count
- If there is deprecated data in the database, remove it by running the following commands in order:
  ActiveRecord::Base.connection.execute('SET statement_timeout TO 0')
  PackageMetadata::PackageVersionLicense.delete_all
  PackageMetadata::PackageVersion.delete_all