Generating SBOMS FROM FOSS_(Detecting OSS licences)

Generating SBOMS FROM FOSS
(Detecting OSS licences)
Thierry GAYET

The licence detection system can done just after the SAST (Static Application Security Testing) stage !
There are several tools for detecting and analyzing open source licenses in source code.
It is important to note that each tool has its own characteristics, advantages and limitations. Some are more
suited to a quick and cursory scan, while others can provide deeper, more detailed scans. The choice of tool will
depend on your specific needs and the complexity of the project you want to analyze.

https://www.cybeats.com/blog/top-7-sbom-generation-tools-and-how-to-choose

The facts
We all heard about log4j or SolarWinds.
This clearly means that it is not buzzword anymore and it become standard element of the
delivery chain.
https://www.ntia.gov/files/ntia/publications/ntia_sbom_tooling_taxonomy-2021mar30.pdf
It is time to start learn about SBOMs (Software Bill of Material) !

Early stage
I built some picture of current situation during my discussions with experts. We are in the
beginning of the road at this moment.
Yes, the approach, the tools are with us for some time, but the problem lies in the place where we
need to put all puzzles together and have a complete picture.

Software Bill of Materials
So, what is the SBOM? It is an abbreviation (and we, DevOps, we love abbreviations!), it means
Software Bill of Material.
SBOMs are kind of ascendants of BOMs - Bill of Materials. According to Wikipedia
A bill of materials or product structure (sometimes bill of material, BOM or associated list) is a list of
the raw materials, sub-assemblies, intermediate assemblies, sub-components, parts, and the
quantities of each needed to manufacture an end product. A BOM may be used for communication
between manufacturing partners or confined to a single manufacturing plant. A bill of materials is
often tied to a production order whose issuance may generate reservations for components in the
bill of materials that are in stock and requisitions for components that are not in stock.
SBOM is nothing else than BOM for software systems.
NTIA describes it as
A "Software Bill of Materials" (SBOM) is a nested inventory for software, a list of ingredients that
make up software components.
So, SBOM contains list of all dependencies, libraries, their licenses, collected from every element of
the system.

SBOMs and DevOps
Well, these two looks quite separated from each other. Nothing more misleading than that :)
First, let's take a look on SBOM classification
We have three main categories
● Produce - how SBOMs are created
● Consume - view, understand changes, generally - understand what is in the report :)
● Transform - transformation of the data from one or multiple reports into another type of file with preserving
the quality of data.
These three elements should give us a connection to DevOps processes. How to implement SBOMs in the chain?
How to differentiate CI from CD parts? How to preserve data and make the data usable?
In this series we will explore the tools and approaches. How to generate SBOMs and what we can do with them. I
am not an expert in this field, not yet, anyway. But as Security is one of pillars of DevOps (I see Security not Sharing
as 'S' in CALMS), I strongly influence it in our DevOps mindset.
Please, remember that these tools must have one very peculiar functionality. You must know what is unknown.
Another words, the software must mark packages, dependencies, vulnerabilitites, etc as uknown, if these cannot be
resolved. It is important, as you have to know what you have to do manually.
The series will have some theoretical parts too, it is kind of unavoidable :)
So, let's get started then!

Standards
Standard here is about the format of the report. The structure and data which are included. As far as I am
concerned, there are three most popular standards
● Software package data exchange - SPDX - open source, prepared as machine-readable format, originated in
Linux and curated by Linux Foundation
● CycloneDX - CDX - also open source and machine-readable format. Originated in OWASP community
● Software identification tags - SWID - as ISO related project, it is a commercial standard
SPDX and SWID are not new. Yes, the SBOM started to be visible between engineers and technologists (maybe
even going to its hype now), but these standards are on market from around 2010. CDX history started around
2017. It is a long time in IT world.
CDX seems to be more tailored to modern needs. One of the significant differences is that CDX contains also
information about vulnerabilities. Quite important is the support on the market. CDX is supported and surrounded
by many very well known organizations and projects, like RedHat, Aqua, Gitlab, Google, Bridgecrew, OWASP, to
mention only a few.
The CycloneDX model on the top level is constructed with 7 layers
● BOM metadata information
● Components information
● Services information
● Dependency relationships
● Compositions
● Vulnerabilities

The Software Package Data Exchange (SPDX)
An open standard describing SBOMs (Software Bill of Materials), communicating a release: name,
version, components, licenses, copyrights, and useful security references. As a common format, SPDX
reduces redundant work related to sharing important release data, thereby streamlining distribution and
compliance.
The SPDX specification is a freely available international open standard (ISO/IEC 5692:2021).
https://spdx.dev/
https://www.iso.org/fr/standard/81870.html
https://webstore.iec.ch/preview/info_isoiec5962%7Bed1.0%7Den.pdf

CycloneDX is designed to provide advanced supply
chain capabilities for cyber risk reduction.
Compatible with over 200 tools across 20+ programming languages, CycloneDX is trusted by Lockheed Martin, ServiceNow, IBM,
Contrast Security, Sonatype, and many others.
https://cyclonedx.org/guides/
https://cyclonedx.org/guides/sbom/
https://cyclonedx.org/guides/sbom/object-model/#the-anatomy-of-a-cyclonedx-bom

what is the difference between Software Composition Analysis (SCA) and SBOM? Both are analyzing the
dependencies!
Well, not exactly.
SCA is an automated process to identify open source components in code base and evaluate these components
against licenses, vulnerabilitites, security issues, etc. SBOM is the report generated by the SCA software. SBOM is
highly defined and structured document and not all SCA tools generate their reports in the acceptable as SBOM
way. These SBOMs are then compared with multiple up-to-dated databases to ensure the report quality.
So, in very simple way we can say that SCA is a tool where SBOM is a data. SBOM contains list of used components
with some specific information. But SBOM itself doesn't care how its generation was done.
We talk here about different aspects of one approach to ensure security and quality which can be applied in C-SCRM
in the organization.
I read a lot about it currently and I saw many reports where authors predict some numbers related to percentage of
companies which will be use SCA and SBOMs in next few years. Honestly, I do not believe in these predictions. The
awareness of these solutions is quite low now and implementation even lower. Yes, we are more aware of SCA, but
SBOM, which looks like natural extension to the process, is not very well known. That is my personal opinion :)
How the SBOM should look is described, as I mentioned, in quite strict way in standard. If you like to go deeper,
here is the link.
Let's take a look on three main perspectives (as it is called), where SBOM is very useful. The perspectives are
● Produce
● Choose
● Operate

Produce software
The company who creates the software, by attaching the SBOM to their package will gain not only
external (sell) benefits, but internal (development) as well. For example, by monitoring of the
vulnerabilities in used packages, by knowing potential end date of lofe of specific component used
in their software, by knowing all dependencies included in the code.
What is the benefit for external use? Well, simple - it gives the better possibilities to the partners to
know "what is inside" and also creates, let's call it, "better picture" of the seller. Another words -
"They know what they sell".
Below is a representation of areas of interest in Produce software perspective.

Choose software
Now, the ball flies to the second side of the field. The company interested in buying the product is
able to (also) verify vulnerabilities, control and be aware of lifetime of the used components,
control and understand licenses of the components, can target security analysis towards already
defined targets.

Operate software
Final phase is when software is about to be bought and then operated. In this perspective the
organization can use SBOM analysis as one of the decisive elements.
During the operation a phase SBOM can help with independent mitigations (so the organization
does not rely solely on the vendor), can better administer its assets and evaluate risks and usage.

SBOM Tools For C/C++
Although they’re two of the most widely used languages, finding an open source SBOM generator for C
and C++ can be tricky.
Due to the lack of an official or even dominating package manager for C/C++, the work for scanning a
project and recognizing dependencies is not trivial and therefore generally beyond the abilities of free
software.
There are a few package managers for C/C++ out there, though, and developers who use Conan are in
luck. Conan includes extensions to help you create an SBOM and Syft and Trivy also support C/C++
SBOMs via Conan.
If you’re using a different package manager or none at all, sorry to say, but at this point in time there’s no
great automated solution outside of commercial SCA products.

https://about.gitlab.com/blog/2022/10/25/the-ultimate-guide-to-sboms/#types-of-sbom-data-exchange-standards
https://github.blog/2023-03-28-introducing-self-service-sboms/

https://cyclonedx.org/capabilities/sbom/
https://csrc.nist.gov/projects/Software-Identification-SWID
https://spdx.dev/
https://www.reversinglabs.com/blog/5-best-practices-sboms-cicd
https://checkmarx.com/appsec-knowledge-hub/supply-chain-security/understanding-software-bill-of-materials-sbom-a-k
eystone-in-modern-application-security-and-compliance/
https://kondukto.io/blog/how-to-generate-and-audit-sbom-in-a-ci-cd-pipeline
https://github.com/microsoft/sbom-tool
https://sbom.observer/
https://www.cybeats.com/blog/top-7-sbom-generation-tools-and-how-to-choose
https://cyclonedx.org/tool-center/
https://snyk.io/fr/blog/building-sbom-open-source-supply-chain-security/
https://www.youtube.com/watch?v=Fu38YFXpiaA
https://docs-vsm.leanix.net/docs/step-1-generating-cyclonedx-software-bill-of-materials
https://www.techtarget.com/searchitoperations/tip/SBOMs-and-security-What-IT-and-DevOps-need-to-know
https://medium.com/@theowni/list/a-practical-approach-to-sbom-incicd-cbb79ab0fa38
https://medium.com/@theowni/a-practical-approach-to-sbom-in-ci-cd-f3ce8071c0fa
https://medium.com/itnext/a-practical-approach-to-sbom-in-ci-cd-part-ii-deploying-dependency-track-18fbb54d83b9
https://medium.com/itnext/a-practical-approach-to-sbom-in-ci-cd-part-iii-tracking-sboms-with-dependency-track-fb5621d
135ba
https://github.com/CycloneDX/cyclonedx-cli#convert-command :

FOSSology : FOSSology est un projet open source de la Linux Foundation qui fournit des fonctionnalités avancées pour
analyser le contenu des projets open source et détecter les licences utilisées. Il peut identifier les licences à partir de
fichiers individuels et effectuer des analyses approfondies des paquets logiciels.
FOSSology is a open source license compliance software system and toolkit. As a toolkit you can run license, copyright
and export control scans from the command line.
As a system, a database and web ui are provided to give you a compliance workflow.
License, copyright and export scanners are tools available to help with your compliance activities.
The source code will be uploaded and scanned for licenses, copyrights and ECC.
Licence : All of the FOSSology source code is licensed under the terms of the GNU General Public License version 2
(GPL-2.0), or under the terms of the GNU Lesser General Public License version 2.1 (LGPL-2.1)
Full Text GNU General Public License version 2 (GPL-2.0)
Full Text GNU Lesser General Public License version 2.1 (LGPL-2.1)

Official website : https://www.fossology.org/
Dockerfile : https://hub.docker.com/r/fossology/fossology/dockerfile/
Docker compose : https://github.com/fossology/fossology/blob/master/docker-compose.yml
Docker hub : https://hub.docker.com/layers/fossology/fossology/3.11.0/images/sha256-
5d5fad97708518bfd0f23495c5d01ee4cfc5f0a8aea73ea7a78b85ea63a20598?context=explore
https://www.fossology.org/get-started/faq/
https://www.fossology.org/get-started/
https://hub.docker.com/r/fossology/fossology/
https://github.com/fossology
https://github.com/fossology/fossology/wiki/Install-from-Source
https://github.com/fossology/LicenseDb
https://github.com/fossology/FOSSologyUI
https://hub.docker.com/r/fossology/fossology/

FOSSology, the open source tool which is categorized as Audit, Consume and Transform tool. The
project is part of Linux Foundation Project.
We will see how it works in the simplest installation, a docker container.
We have multiple options, starting with installation directly on our system, use single container, or
prepared docker-compose setup to add external database.
As we have docker-compose ready, it is very easy to move the installation to other environments,
like, for example, Kubernetes.
Let's run the container, then!
$ docker run -p 8081:80 fossology/fossology
Now we can enter the website, using http://localhost:8081/repo/ url. The login is simple, user and
password are fossy.

VCS
At the beginning we will try to scan some GitHub repo. I selected one, where I expect a lot of
actions done through time. Select Upload and the From Version Control System.

With the analytics options I go crazy, I select all :

And then we need to click Upload button, which is quite hidden in bottom left side.
Please notice, the UX now is not perfect. In the first moment I didn't know if action was successful
or not.
But, yes, it was :)
Let's go to Jobs and then My recent jobs. This shows the status of jobs. AS we can see on the screen
below, my jobs are still running. Very clean and clear information panel!

We can pause or stop jobs if we want

Scan is in progress, but we can look on part already collected. For example, License Browser

We see here very complex report. We can get even more info, by clicking on each element.

In the Browse menu we can decide what action should be taken.
AS we can see, FOSSology is also an "operational" tool. We can assign people to be the owners of
the report, we can change status of the project.

Reports
I've created unified report, which I downloaded from Jobs section. This report is generated as MS
Word file and is very detailed.

File from URL
As a next excercise, I use tfsec project downloaded directly from GitHub releases. I love tfsec as a
tool, so I use this opportunity to learn more about it :) We will see how many information the tool is
able to extract from executable.
This time, reports shows more data about licenses, etc. Don't get it wrong, it is not the FOSSology
things, simply speaking, more work in this area is done for thsec than for TF AWS provider :)

Duration of scan
Well, it depends on the type of scan. softwareHeritage can be very long. I mean, very, very long.
Generally, we have to prepare ourselves to give around 15 minutes to complete scan (except
SoftwareHeritage).
Of course it depends on the package type, size, etc, but it is definitely not something what can be
seamlessly added to the pipelines (however, this should be obvious).

Summary
Very useful tool. Helps to analyze data from multiple sources. Only one missing point for me is to
scan Docker containers.
The GUI is quite clean and useful. Everything is easy to be found, navigation is really easy. The one
exception is mentioned confirmation (or failure) of adding project to the application.
FOSSology is using external services for some checks, what might be problematic in many
situations.
User management is quite advanced for such tool.
API is available and each user can generate the personal token to use.
I like FOSSology and I think I will remember about it when it comes to SBOMs in pipelines.
The downside of this tool might be the file size limitation.
We can upload files to 700M, so majority of iso images is too big for this scanner (for example

Open source software is licensed using open source licenses.
There are many of open source licenses around and adding to that, open source software packages involve
sometimes multiple licenses for different files.
Atarashi provides different methods for scanning for license statements in open source software.
Unlike existing rule-based approaches - such as the Nomos license scanner from the FOSSology project - atarashi
implements multiple text statistics and information retrieval algorithms.
Anticipated advantages is an improved precision while offering an as easy as possible approach to add new license texts
or new license references.
Atarashi is designed to work stand-alone and with FOSSology.

$ pip3 install atarashi
$ git clone https://github.com/fossology/atarashi.git && pip3 install . && cd
atarashi/ && pip3 install -r requirements.txt && python3 setup.py build

docker pull fossology/atarashi:latest
docker run --rm -v <path/to/scan>:/project fossology/atarashi:latest <options> /project/<path/to/file>

https://fossology.github.io/atarashi
https://github.com/fossology/atarashi

ScanCode
ScanCode est un outil open source développé par la société NexB pour effectuer une analyse rapide des
licences open source dans un projet donné. Il fournit des rapports détaillés sur les licences détectées et
les attributs de licence des fichiers.
Simply speaking, it is a scanner and code analyser tool which allows to scan codebase for origins and
licenses. In general, ScanCode collects information about components and their licenses during the
Software Composition Analysis (SCA) process.
ScanCode is used by several projects and organizations such as the Eclipse Foundation,
OpenEmbedded.org, the FSFE, the FSF, OSS Review Toolkit, ClearlyDefined.io, RedHat Fabric8 analytics,
and many more.
ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory
open source and third-party packages used in your code. Sponsored by NLnet project
https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and
others generous sponsors!
Usage : path/to/scancode [OPTIONS] <OUTPUT FORMAT OPTION(s)> <SCAN INPUT>

Scancode command lines examples :
Scan a single file for copyrights. Print scan results to stdout as pretty JSON:
scancode --copyright samples/zlib/zlib.h --json-pp -
Scan a single file for licenses, print verbose progress to stderr as each file is scanned. Save scan to a JSON file:
scancode --license --verbose samples/zlib/zlib.h --json licenses.json
Scan a directory explicitly for licenses and copyrights. Redirect JSON scan results to a file:
scancode --license --copyright samples/zlib/ --json - > scan.json
Scan a directory while ignoring a single file. Scan for license, copyright and package manifests. Use four parallel
processes. Print scan results to stdout as pretty formatted JSON.
scancode -lc --package --ignore README --processes 4 --json-pp - samples/
Scan a directory while ignoring all files with .txt extension. Print scan results to stdout as pretty formatted JSON. It is
recommended to use quotes around glob patterns to prevent pattern expansion by the shell:
scancode --json-pp - --ignore "*.txt" samples/

Special characters supported in GLOB pattern:
- * matches everything
- ? matches any single character
- [seq] matches any character in seq
- [!seq] matches any character not in seq
For a literal match, wrap the meta-characters in brackets.
For example, '[?]' matches the character '?'.
For details on GLOB patterns see https://en.wikipedia.org/wiki/Glob_(programming).
Note: Glob patterns cannot be applied to path as strings.
For example, this will not ignore "samples/JGroups/licenses".
scancode --json - --ignore "samples*licenses" samples/

Scan a directory while ignoring multiple files (or glob patterns). Print the scan results to stdout as JSON:
scancode --json - --ignore README --ignore "*.txt" samples/
Scan a directory for licenses and copyrights. Save scan results to an HTML file:
scancode --license --copyright --html scancode_result.html samples/zlib
exemple de scan:
scancode -clpieu --json-pp $PWD/output.json ~/Workspace/rtl8188eus/
Setup plugins...
Collect file inventory...
Scan files for: info, packages, licenses, copyrights, emails, urls with 1 process(es)...
[####################] 1524
Scanning done.
Summary: info, packages, licenses, copyrights, emails, urls with 1 process(es)
Errors count: 0

Scan Speed: 0.68 files/sec. 86.19 KB/sec.
Initial counts: 785 resource(s): 762 file(s) and 23 directorie(s)
Final counts: 785 resource(s): 762 file(s) and 23 directorie(s) for 94.98 MB
Timings:
scan_start: 2023-08-29T171906.420323
scan_end: 2023-08-29T173757.913982
setup_scan:licenses: 2.91s
setup: 2.91s
scan: 1128.39s
output:json-pp: 0.27s
output: 0.27s
total: 1131.78s
Removing temporary files...done.

cat output.json
(...)
{
"path": "rtl8188eus/platform/platform_zte_zx296716_sdio.h",
"type": "file",
"name": "platform_zte_zx296716_sdio.h",
"base_name": "platform_zte_zx296716_sdio",
"extension": ".h",
"size": 989,
"date": "2023-01-30",
"sha1": "c5bcfcb4619c50995b2d3db9016f9d67e6b84a57",
"md5": "58a5d941421197d90cd031cad188e2a9",
"sha256": "87e7d049e0e600bb439f503ac79e896fe1084f5432dfccc897e6b6ccbf700036",

"mime_type": "text/x-c",
"file_type": "C source, ASCII text",
"programming_language": "C",
"is_binary": false,
"is_text": true,
"is_archive": false,
"is_media": false,
"is_source": true,
"is_script": false,
"package_data": [],
"for_packages": [],
"detected_license_expression": "gpl-2.0",
"detected_license_expression_spdx": "GPL-2.0-only",

"license_detections": [
{
"license_expression": "gpl-2.0",
"matches": [
{
"score": 100.0,
"start_line": 5,
"end_line": 12,
"matched_length": 69,
"match_coverage": 100.0,
"matcher": "2-aho",
"license_expression": "gpl-2.0",
"rule_identifier": "gpl-2.0_288.RULE",

"rule_relevance": 100,
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-
2.0_288.RULE"
}
],
"identifier": "gpl_2_0-7e9b2517-e984-f771-6e3c-b63d87082cb5"
}
],
"license_clues": [],
"percentage_of_license_text": 54.76,
"copyrights": [
{
"copyright": "Copyright (c) 2016 - 2018 Realtek Corporation",

"end_line": 3
}
],
"holders": [
{
"holder": "Realtek Corporation",
"start_line": 3,
"end_line": 3
}
],
"authors": [],
"emails": [],
"urls": [],
"files_count": 0,
"dirs_count": 0,
"size_count": 0,
"scan_errors": []
}
]
}%
(...)

docker build --tag scancode-toolkit --tag scancode-toolkit:$(git describe --tags) .
docker run -v $PWD/:/project scancode-toolkit -clipeu --json-pp /project/scan-result.json /project/apache-2.0.LICENSE

https://github.com/nexB/scancode-toolkit
https://scancode-toolkit.readthedocs.io/en/latest/getting-started/install.html#installation-via-docker
https://scancode-toolkit.readthedocs.io/en/latest/misc/faq.html
https://scancode-toolkit.readthedocs.io/en/latest/tutorials/how_to_run_a_scan.html
https://scancode-toolkit.readthedocs.io/en/latest/getting-started/install.html#source-code-install

Installation
ScanCode has a few option of installation, we cover here container version. When we clone the
repository
$ git clone https://github.com/nexB/scancode.io.git
we should create the .env file, using prepared Makefile
$ make envfile
This will create the file and secret. In my case it looks like this
$ cat .env
SECRET_KEY="Rxg+cZJQDOdinwXAPc/D2d2QyEODpl5xz4NJp5f/aSSDmf106a"
We are ready to build and run the tool.
In fact, we have docker-compose template in our disposal. This template contains a few elements
● db (postgres)
● redis
● web (app)
● worker
● nginx
Obviously, Nginx is our entry point. Behind it the app is working with workers (where the actual

Build and run
This part for all of us who knows docker is simple and clear. First, we need to build the containers
$ docker-compose build
In fact, only app container is processed in this step. However, the build process takes a lot of time.
And the image is quite huge
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
scancodeio_web latest a6a6380ef6f8 34 seconds ago 2.31GB
scancodeio_worker latest a6a6380ef6f8 34 seconds ago 2.31GB
When build is finished, we are ready to run our stack
$ docker-compose up -d
And here we have some issue (well, maybe "issue" is a little bit too big word). Compose exposes
ports 80 and 443 for Nginx service, but the Nginx server is configured for port 80 only.

GUI
When the stack staarts, we can go to GUI console in the browser, by entering http://localhost:80.a
The console is simplistic but nice, clear and comfortable to work with.

Setup the project
It is time to setup our first project. On the beginning we will setup
simple scan. Many of us use python container image, correct? Let's
see, what we can learn about python:latest!
After we click "New project" button, on the right side we can select the
project type.
These are predefined. We can create our own too.
As we wish to scan docker image, we select docker and now it is time
to configure the project. Configuration is very simple.

I entered three values :
● Project's name
● url docker://python.latest (this will
connect to dockerhub and collect
the proper image)
● pipeline - docker in this case.
And that's it!
Let's click Create.

Processing
On main screen we can observe the progress of project's execution. In my case I had to refresh the
view manually, but it is not an issue.

As this task took a lot of time, I created second project, this time for Python based on Alpine
Linux. This execution was queued.
Can we run these tasks simultaneously?
Well, yes, it is explicit config setting.
Execution of the first run took more than 1 hour on my machine. It is a lot.
So here we see some downside of this process - it can be very time-consuming. Therefore if we
want to design this process as part of CI/CD pipeline, we need to be careful and aware of potential
time needed for execution.

Report
Now the juice. When I generated the report, I started to dig deeper and looking around and after
long time (and I mean it!) I realised "Hey! You write an article. Write the article, then!" Reports
generated by ScanCode are simply great.
Ok, let's navigate through them

Scan's summary
First, let's click on the green Success button in the row where our scan is. This report shows some,
let's say, meta information about scan process.

We see status of the run, info about the task, execution time, dates, resources. Quite useful
summary.
Here we see more details about execution itself. What steps were performed, how long these steps
took, etc.

Scan's details
And here is the core of our report (it is not a SBOM yet!).

On the top of the screen we can see UUID and work directory information.
Below we have some numbers about the execution and buttons to download the raport in
different formats.
And after that we have information about input artifact, in this case python:latest.
Project data shows a lot of information about docker image, with layers, descriptions, commands
etc.

Dependencies information. Here we can learn everything about dependencies discovered during
the scan.

Finally, codebase resources. As the picture above presents, multiple scopes are available for us to
analyze.
Let's go into some of details now.

Hoover the proper element and click

Here we have details about every finding.
Now things go even more interesting. Click on any Path element and...

We go into the file! Let's find out information about licenses. Click Licenses in Detected values list

Ok, let's look on something else now!
Select Other in PACKAGE LICENSE EXPRESSION

We can check every individual package and learn what license type it uses.

Another example, Go to PACKAGE TYPE and click pypi
As we can see, the information detected by ScanCode.io is
very detailed. We took one of the most popular images out
there and we are able to depict it to the smallest elements.

Download data
Finally, we can download reports in different formats.
Click one of the buttons from picture below
And it will be simply downloaded :
Report can be downloaded as JSON od Excel file. Two more options format the report with the
standards restrictions - one for SPDX (Software Package Data eXchange) and second for
CycloneDX standard (and these are our SBOMs).

We scanned docker images so far. I also did the test for code bundle. So, I have my very old python
Alexa Skill script for AWS Lambda.
It contains a few dependencies, let's take a look on the requirements.txt file :
ask-sdk-core==1.9.0
Pillow
That's it. In the function code I import a few libraries :
import logging
import json
import requests
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.dispatch_components import AbstractExceptionHandler
from ask_sdk_core.utils import is_request_type, is_intent_name
from ask_sdk_core.handler_input import HandlerInput
from ask_sdk_model.ui import SimpleCard
from ask_sdk_model import Response

I created a bundle and scanned it with these settings
:
So, I put the zip directly into the GUI.
Of course, I could send it to my artefact storage and
scan it from there (similarly like we did with Docker
images).

After the scan is finished, I have a very interesting report

Let's take a look on dependencies

So, this was the quick review of ScanCode.io. The tool is very easy to use, very easy to maintain
and, what is very important for the teams, very easy to start with.
As this operation - create SBOM - might be obligatory very soon, it is a good idea to start
preparing ourselves for it.
However, there is one thing, which I wasn't able to successfully run. Vulnerability scan
But we will handle it in the next episode.
It is worth to mention that ScanCode provides API. It means that it can serve in the security
pipelines and provide its functionality on demand without delays needed for provisioning.
API functionality gives the needed flexibility and scalability needed to act as a important tools in
the Organization's governance and compliance.

Open Source License Checker (OSLC) : OSLC est un autre outil open source permettant d'identifier les licences open source
dans un projet. Il peut générer des rapports détaillés sur les licences utilisées, les attributs de licence et les dépendances
des composants.
Open Source License Checker is tool for inspection and analysis of lisence information from open source packages.
Licence : GNU General Public License version 2.0 (GPLv2)

wget http://sourceforge.net/projects/oslc/files/OldFiles/OSLC-test1.zip/download
unzip OSLC-test1.zip
java -jar OSLC.jar

https://sourceforge.net/projects/oslc/

Ninka : Ninka est un outil open source développé par Black Duck Software (maintenant partie de Synopsys) qui permet
d'identifier les licences open source dans les fichiers et les archives.
Ninka is licensed under the GPLv2+:

docker build . -t ninka:latest

https://github.com/dmgerman/ninka

FossID : FossID est une solution open source qui permet de scanner et d'identifier les licences open source dans les projets
logiciels et les bibliothèques.
Scan your code for Free and Open Source Software (FOSS) licenses and vulnerabilities and get full transparency of your
software products and services.
Detect open source down to snippet level :
Uncover all Free and Open Source Software (FOSS) in your repositories: from complete components, packages and libraries
to small snippets of code.
Generate Software Bill of Materials (SBOMs) :
Automatically export and import Software Package Data Exchange (SPDX) reports containing license text, copyright
statements, vulnerabilities, and even snippet-level information.

https://fossid.com/

Licensee : Licensee est une gem Ruby open source qui peut aider à identifier les licences open source dans un projet et
générer des rapports sur les licences détectées.
A Ruby Gem to detect under what license a project is distributed.

https://github.com/licensee/licensee

CodePDL est un outil open source qui peut analyser le code source pour détecter les licences open source et générer des
rapports sur les attributs de licence.

Black Duck Code Sight (anciennement Ohloh Code) est un outil d'analyse de code open source qui peut également
détecter et analyser les licences open source dans le code.
Black Duck is a complete open source management solution, which fully discovers all open source in your code. It can map components to
known vulnerabilities, and identify license and component quality risks. You can use Black Duck to set and enforce open source policies, and
integrate open source management into your DevOps environment. Additionally, Black Duck monitors and alerts you when new threats are
reported.
Black Duck helps security and development teams identify and mitigate open source related risks across application portfolios. Black Duck:
● Scans and identifies open source software throughout your code base.
● Maps vulnerabilities to your open source software.
● Triages vulnerability results and tracks remediation.
● Monitors for newly disclosed vulnerabilities in your open source code.
● Finds and fixes open source vulnerabilities in applications and containers.
Consider this:
Thousands of open source vulnerabilities are reported each year. Most organizations have over 30% open source in their code. 98% of
companies are unaware of the open source code they use. Black Duck On-Demand audits reveal that 67% of applications contain open
source vulnerabilities, and 40% of those are considered high severity. However, most organizations track less than half of the open source
they use. If you don't know what's in your code, you leave your systems, data, and customers at risk.
Black Duck can:
● Scan code to identify all embedded open source components.
● Automatically map open source in use to known vulnerabilities.
● Continuously monitor for new vulnerabilities as they are reported.
● Prioritize and track your remediation efforts.

https://www.synopsys.com/software-integrity/security-testing/software-composition-analysis.html
https://synopsys.atlassian.net/wiki/spaces/INTDOCS/pages/622878/Black+Duck+Overview

$ git clone https://github.com/nexB/vulnerablecode.git
$ cd vulnerablecode
$ make envfile
$ docker-compose build
So far, the process is the same like for ScanCode.
No surprise, we run the stack here as well
$ docker-compose up -d
This time we have three containers
$ docker-compose ps
vulnerablecode_db_1 docker-entrypoint.sh postgres Up 5432/tcp
vulnerablecode_nginx_1 /docker-entrypoint.sh ngin ... Up 0.0.0.0:443->443/tcp, 0.0.0.0:80->80/tcp
vulnerablecode_vulnerablecode_1 /bin/sh -c ./manage.py mi ... Up 8000/tcp
According to documentation, we need to import data on this point.
$ docker-compose run vulnerablecode ./manage.py import --list
$ docker-compose exec vulnerablecode ./manage.py import --all
$ docker-compose exec vulnerablecode ./manage.py improve --all
This way we imported / updated the whole data. Documentation shows also how to collect only
part of information (to save time, space and transfer), however I believe that the only way is to
import everything to have full visibility of issues.

The process is long.
I found that some importers need additional configuration.
For example GitHubAPIImporter:
Importing data using vulnerabilities.importers.github.GitHubAPIImporter
Cannot call GitHub API without a token set in the GH_TOKEN environment variable.
Traceback (most recent call last):
File "/app/vulnerabilities/management/commands/import.py", line 61, in import_data
ImportRunner(importer).run()
File "/app/vulnerabilities/import_runner.py", line 44, in run
count = process_advisories(advisory_datas=advisory_datas, importer_name=importer_name)
File "/app/vulnerabilities/import_runner.py", line 54, in process_advisories
for data in advisory_datas:
File "/app/vulnerabilities/importers/github.py", line 171, in advisory_data
response = utils.fetch_github_graphql_query(graphql_query)
File "/app/vulnerabilities/utils.py", line 241, in fetch_github_graphql_query
raise GitHubTokenError(msg)
vulnerabilities.utils.GitHubTokenError: Cannot call GitHub API without a token set in the GH_TOKEN environment variable.
Failed to run importer vulnerabilities.importers.github.GitHubAPIImpor

Well, I had other issues, so I decided to take a look on importers I can use
$ docker-compose exec vulnerablecode ./manage.py import --list
Vulnerability data can be imported from the following importers:
vulnerabilities.importers.nginx.NginxImporter
vulnerabilities.importers.alpine_linux.AlpineImporter
vulnerabilities.importers.github.GitHubAPIImporter
vulnerabilities.importers.nvd.NVDImporter
vulnerabilities.importers.openssl.OpensslImporter
vulnerabilities.importers.redhat.RedhatImporter
vulnerabilities.importers.pysec.PyPIImporter
vulnerabilities.importers.debian.DebianImporter
vulnerabilities.importers.gitlab.GitLabAPIImporter
vulnerabilities.importers.pypa.PyPaImporter
vulnerabilities.importers.archlinux.ArchlinuxImporter
vulnerabilities.importers.ubuntu.UbuntuImporter
vulnerabilities.importers.debian_oval.DebianOvalImporter

I imported a few of them one by one.
After the import, we need to run last command
$ docker-compose exec vulnerablecode ./manage.py improve --all
This takes ages (I mentioned the cause eariler). We can, fortunately, improve them one by one,
selecting proper item from the list
$ docker-compose exec vulnerablecode ./manage.py improve --list

We can provide the package name to find any vulnerabilities in it

At this point we should be able to get more details by clicking package name, but I got only errors.
Not funny.
Another option is a search for vulnerability by its ID.
We have some data there.

A few more words about the collecting data. I said the process took good part of the day. I would
say, around 10 hours and wasn't finished. It is totally dissapointing. I checked the performance a
few times and this part is also dissapointing. Here is a performance snapshot

Summary
I don't like this tool. It is very uneffective, does not add any value to the chain.
And most importantly, I have no idea if the collecting process can work effectivelly or not.
What I observed is useless.
The general idea of this tool is quite ok. If it can be incorporated with ScanCode (by the way, both
of the tools are part of the suit on AboutCode) and serve as service providing details about
vulnerabilitites - I go with it.
Right now, as I can see, it is not the missing tool (which I mentioned in previous episode).
To be clear, this tool doesn't generate any SBOMs, it is not its function.
I didn't spent much time on it, my approach is to find reasonable, quickly implementable and easy
tool to add to CI/CD processes.

I will check two tools, one for SBOM generation and second for containers vulnerability scans.
Let's get started!
Syft is a SBOM generation tool, which works in CLI. And from DevOps perspective this is great!
Grype is a vulnerability scanner, also for CLI use.
Syft and Grype are dedicated to scan containers.

Installation
Another great news. Both tools can be installed on multiple systems using installers like brew,
choco, etc. This is amazing.
Let's install both these tools :
$ curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sudo sh -s -- -b /usr/local/bin
$ curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sudo sh -s -- -b /usr/local/bin
Yes, security, I know We shouldn't do it in this way, however, for now it is ok. These commands are
in documentation of both tools.

Usage
Syft
Ok, we can try to generate some SBOMs now!
Syft by default genaerate data from the final layer of image only. That is not enough for proper
scan, but to enable scan through all layers we can use one argument. So, I will use only that option
in next examples.
I will scan four images
● python
● postgresql
● phpbb
● nginx
Syft can produce SBOM in different formats. I'll use CDX in JSON format. The tool can pull images
from registry, use podman or docker, etc.
$ syft bitnami/phpbb --scope all-layers -o cyclonedx-json=sbom-phpbb.json
$ syft python --scope all-layers -o cyclonedx-json=sbom-python.json
$ syft nginx --scope all-layers -o cyclonedx-json=sbom-nginx.json
$ syft postgres --scope all-layers -o cyclonedx-json=sbom-postgres.json

With multiple use of -o argument we can create multiple output files in one run. Splendid. The
runtime is really nicely presented in the CLI. I love the pulling image progress visualisation :)
Syft can do more than just that. But it is enough at the moment.
Grype
Grype creates an analysis of vulnerabilities in the container. We can provide the container itself (we
will do it for two of them) or it can work on SBOMs generated by Syft - this will be the third run).
$ grype bitnami/phpbb --scope AllLayers -o cyclonedx --file vuln-sbom-phpbb.xml
$ grype python --scope AllLayers -o cyclonedx --file vuln-sbom-python.xml
$ grype sbom:sbom-nginx.json -o cyclonedx --file vuln-sbom-nginx.xml
As we can see, the vulnerability report is generated much faster from SBOM. It is obvious, all data is
already extracted.
Last option is the most fun. Let's connect Syft and Grype together!
$ syft postgres --scope all-layers -o cyclonedx-json | grype -o cyclonedx --file vuln-sbom-postgres.xml
Very nice!

Another very useful and ready to be used in pipelines feature is fail on selected severity of the
vulnerability.
$ grype sbom:sbom-nginx.json -o cyclonedx --file vuln-sbom-nginx1.xml -f high
By using -f high I want to fail if grype finds any issue with high or critical severity.
✔ Vulnerability DB [no update available]
✔ Scanned image [143 vulnerabilities]
Report written to "vuln-sbom-nginx1.xml"
1 error occurred:
* discovered vulnerabilities at or above the severity threshold
And exit code of the runtime is 1. Really, nothing more is needed for CI/CD pipelines!

Summary
I have only one thing, which I want to emphasize.
As these tools are created kind of together and can work together, it will be nice to unify the
agruments.
Example: --scope all-layers vs --scope AllLayers. Small thing, though.
I like these tools very much.
These two are my favourites so far.
The use scope is somehow limited, but that's ok.
I truly recommend these two to implement in your pipelines!

for those of us who know IaC and did quality and security scans of IaC templates.
Well, yes, Checkov is a quality scanner, but from some time already it is more than that!
Let's see on the frameworks which can be scanned by Checkov:
--framework
{bitbucket_pipelines,circleci_pipelines,argo_workflows,arm,azure_pipelines,bicep,cloudformation,dockerfile,github_configur
ation,github_actions,gitlab_configuration,gitlab_ci,bitbucket_configuration,helm,json,yaml,kubernetes,kustomize,openapi,s
ca_package,sca_image,secrets,serverless,terraform,terraform_plan,all}
[{bitbucket_pipelines,circleci_pipelines,argo_workflows,arm,azure_pipelines,bicep,cloudformation,dockerfile,github_configu
ration,github_actions,gitlab_configuration,gitlab_ci,bitbucket_configuration,helm,json,yaml,kubernetes,kustomize,openapi,s
ca_package,sca_image,secrets,serverless,terraform,terraform_plan,all} ...]
Quite a number, don't you think?
But... What about SBOMs? Can Checkov generate SBOM?
No. Well, not really.
But the report generated by Checkov can be exported in CDX format, what means, it can be
consumed in the process!

Let's take a look. I install Checkov and download random repos from GitHub :
● Terraform
● CloudFormation
● Dockerfile
● Serverless
● Kubernetes
● Helm
$ pip3 install checkov
So, installation is not that hard, isn't it?
$ git clone https://github.com/dwmkerr/terraform-consul-cluster.git
$ git clone https://github.com/splunk/splunk-aws-cloudformation.git
$ git clone https://github.com/webdevops/Dockerfile.git
$ git clone https://github.com/softprops/serverless-aws-rust-http.git
$ git clone https://github.com/kubernetes/examples.git
$ git clone https://github.com/prometheus-community/helm-charts.git

Ok. I'll generate a report for each repo with CycloneDX output. Also, I will not specify the
framework, so it is a huge possibility, that some of these repos contain not only the "main"
framework, but others as well. Will see.
checkov -d terraform-consul-cluster/ -o cyclonedx > tf.xml
For some reason, Checkov didn't save the report to the specified file, but created a folder. But it is
not an issue, I used simple redirection and didn't spent time on it :)
Report is not very readable for human, but it doesn't matter, it should be (and it is) readable for
machine. Checkov uses the newest version for CycloneDX - 1.4.
Let's take a look on details. In the "standard report" I found this issue:
Check: CKV2_AWS_12: "Ensure the default security group of every VPC restricts all traffic"
FAILED for resource: module.consul-cluster.aws_vpc.consul-cluster
File: /modules/consul/01-vpc.tf:2-10
Guide: https://docs.bridgecrew.io/docs/networking_4
2 | resource "aws_vpc" "consul-cluster" {
3 | cidr_block = "${var.vpc_cidr}" // i.e. 10.0.0.0 to 10.0.255.255
4 | enable_dns_hostnames = true
5 |
6 | tags {
7 | Name = "Consul Cluster VPC"
8 | Project = "consul-cluster"
9 | }
10 | }

What we have in generated SBOM?
<vulnerability bom-ref="070be6ca-0732-4cf3-b0c7-a423fc0f45be">
<id>CKV2_AWS_12</id>
<source>
<name>checkov</name>
</source>
<ratings>
<rating>
<severity>unknown</severity>
</rating>
</ratings>
<description>Resource: module.consul-cluster.aws_vpc.consul-cluster. Ensure the default security group of every VPC
restricts all traffic</description>
<advisories>
<advisory>
<url>https://docs.bridgecrew.io/docs/networking_4</url>
</advisory>
</advisories>
<affects>
<target>
<ref>pkg:terraform/cli_repo/terraform-consul-cluster/modules/consul/01-vpc.tf/module.consul-
cluster.aws_vpc.consul-cluster@sha1:26077595ad94ad61098ccc203af70aaf518a847b</ref>
</target>
</affects>
</vulnerability>

Looks quite nice.
I generated SBOM reports from all repos I cloned. And I am really satisfied with results. Well done
Bridgecrew!
Summary
I really like Checkov, and I say if for a few years now. It is more and more complex tool, even in
the version available for free. I am really happy to see the SBOM option, as it becomes very
important part of the process.
The great news is that SBOMs can cover also infrastructure as Code. Imagine, you buy a car. And
you receive report where you see that every single component in this car passed verification and
validation. Every single one, except the wheels. What can go wrong? These wheels here - it is IaC.
Why I said no on the beginning, when I asked myself if Checkov is a SBOM tool? Well, the point is
that SBOM should contain all dependencies. Checkov's focus is on templates. Don't get me
wrong, that is OK, there are other tools which should take care about code's dependencies. I said
that to emphasize, Checkov cannot be only tool used in SBOM generation process.
So, to be correct, Checkov is not SCA tool but can generate SBOM report for its part.

The SBOM tool is a highly scalable and enterprise ready tool to create SPDX 2.2 compatible SBOMs for any variety of artifacts.
The tool uses the Component Detection libraries to detect components and the ClearlyDefined API to populate license information
for these components.

$ curl -Lo sbom-tool
https://github.com/microsoft/sbom-tool/releases/latest/download/sbom-tool-linux-x64
$ chmod +x sbom-tool

https://github.com/microsoft/sbom-tool?tab=readme-ov-file

The Ortelius Command Line Interface supports integration with DevOps tools and CI/CD pipelines.
The CLI Python script interacts with the Ortelius REST APIs to perform:
● Approve the Application Version
● Move the _Application Version- using the supplied task
● Create/replace the Component Version for the Application Version
● Assign a Component Version to an Application version
● Assign the key/values pairs to the Component version
● Create a bash file from the Component .toml file
● Export a Domain including all objects to stdout
● Imports the export file into the new Domain
● Deploy the Application Version
● Upload Swagger and SBOM files to Component Version
https://github.com/ortelius/ortelius-cli/blob/main/doc/dh.md
https://cd.foundation/
https://ortelius.io/
https://www.deployhub.com/sbom-automation/

CLI Prerequisites
1. Install Python 3.8 or newer
2. Install Ortelius CLI
pip3 install --upgrade ortelius-cli

jbom generates Runtime and Static SBOMs for local and remote Java apps
Every project should create a Software Bill of Materials (SBOM) and make it available, so that people know what ingredients are
inside. You've got a few options for generating SBOMs:
● GOOD -- Static SBOM (source) - This works fine, but you'll miss runtime libraries from appservers and runtime platforms.
You'll also include libraries that don't matter like test frameworks. You'll also have no idea which libraries are actually active in
the running application.
● BETTER -- Static SBOM (binary) - You'll still miss parts, because code can be located in a variety of different places. And
you'll also probably include libraries that don't matter but happen to be on the filesystem.
● BEST -- Runtime SBOM - This is what 'jbom' is all about. Runtime SBOM is the most accurate approach as it captures the
exact libraries used by the application, even if they are in the platform, appserver, plugins, or anywhere else. This approach
can also include details of services invoked and which libraries are active.
jbom advantages:
● very fast, complete, and accurate
● produces standard CycloneDX SBOM in JSON format
● works on both running apps/APIs and binaries
● finds all libraries, including platform, appserver, plug-in, and dynamic sources.
● doesn't report test or other libraries not present at runtime
● handles nested jar, war, ear, and zip files (including Spring)
● handles jars using common shaded and relocation techniques
● no source code required

https://github.com/eclipse/jbom?tab=readme-ov-file
https://www.contrastsecurity.com/sboms

CycloneDX-CLI
CycloneDX-CLI is a CLI open source tool used for generating SBOMs in CycloneDX format. It is compatible with various ecosystems and
security databases, and it can be used to scan packages and containers for vulnerabilities. The tool can also convert between multiple SBOM
formats and has a simple interface. However, it does not have pipeline integration capabilities like Syft and Grype.
Open Source Code: https://github.com/CycloneDX/cyclonedx-cli

OSV-Scanner
OSV-Scanner is an open source project by Google and part of the efforts for a distributed database of vulnerabilities for open source. OSV-
Scanner is a frontend to the OSV database that connects a project’s list of dependencies with the vulnerabilities that affect them. Creates
SBOMs with vulnerability information in a tabular or JSON format.
Open Source Code: https://github.com/google/osv-scanner
https://osv.dev/

Generating SBOMS FROM FOSS_(Detecting OSS licences)

More Related Content

Similar to Generating SBOMS FROM FOSS_(Detecting OSS licences) (20)

Recently uploaded (20)

Generating SBOMS FROM FOSS_(Detecting OSS licences)