©2018 VMware, Inc.
A Fine Kettle of Files
Lessons Learned in Automating
Open Source Software Compliance
for Containers
Nisha Kumar
Open Source Engineer @VMware
All Things Open: October 22nd 2018
2©2018 VMware, Inc.
Disclaimers
o Compliance obligations like FISMA, HIPAA and PCI DSS are not covered (but the roadblocks may be similar)
o I AM NOT A LAWYER and this presentation doesn’t provide legal advice
o This is a story about my experience - I would love to hear about yours!
3©2018 VMware, Inc.
What is Open Source Software Compliance?
Raise the chicken
Grow all the ingredients for the gravy
Grow the cilantro
Smith the copper pot
4©2018 VMware, Inc.
What is Open Source Software Compliance?
Grocery store chicken
Gravy mix
Cilantro from my garden
Copper pot from a garage sale
5©2018 VMware, Inc.
What is Open Source Software Compliance?
Grocery store chicken
Made the gravy myself
Grocery store cilantro
Copper pot gifted to me
6©2018 VMware, Inc.
What is Open Source Software Compliance?
Attribution: http://tinyurl.com/yb8fcahz
License: CC BY-SA 2.0
7©2018 VMware, Inc.
Open Source Software (OSS) Compliance is about giving credit where credit is due and following the terms of use
Not doing so erodes others’ trust in you
8©2018 VMware, Inc.
How to comply with OSS licenses?
1) Find your software’s build/run dependencies
2) Find the license of those dependencies
3) Do what the license tells you to do (I am not a lawyer)
Tools to help with OSS Compliance: https://compliance.linuxfoundation.org/references/tools
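The three steps can be pictured as a small inventory check. This is only a sketch: the dependency names and licenses below are made up for illustration, and a real inventory would come from your package manager or a scanning tool.

```python
from collections import defaultdict

# Hypothetical inventory: dependency name -> declared license.
# None means the license has not been found yet.
DEPENDENCIES = {
    "libexample": "MIT",
    "fastparser": "Apache-2.0",
    "oldtool": "GPL-2.0-only",
    "mystery-blob": None,
}


def group_by_license(deps):
    """Step 2: bucket dependencies by license so step 3 can be handled per license."""
    groups = defaultdict(list)
    for name, license_id in sorted(deps.items()):
        groups[license_id or "UNKNOWN"].append(name)
    return dict(groups)


def needs_follow_up(deps):
    """Dependencies with no known license still need investigation."""
    return sorted(name for name, license_id in deps.items() if not license_id)


for license_id, names in group_by_license(DEPENDENCIES).items():
    print(f"{license_id}: {', '.join(names)}")
print("needs follow-up:", needs_follow_up(DEPENDENCIES))
```

What each license actually requires is step 3, and that part is not something a script can decide for you.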
9©2018 VMware, Inc.
How is OSS Compliance different for Containers?
10©2018 VMware, Inc.
Containers are not real, but Container Images are!
Not A Real Container
Real Files
11©2018 VMware, Inc.
Image layers are created using diff filesystem storage drivers
Reference: https://docs.docker.com/storage/storagedriver/
BaseOS: bin boot etc home lib opt root tmp usr var
Read Only Layer
R/W layer: Add new files or remove existing files
Image
apt-get install ca-certificates
Copy on Write
12©2018 VMware, Inc.
Image layers are created using diff filesystem storage drivers
Reference: https://docs.docker.com/storage/storagedriver/
BaseOS: bin boot etc home lib opt root tmp usr var
Read Only Layer
R/W layer: Add new files or remove existing files
Diff Files: etc/ca-certificates/* usr/share/ca-certificates/*
Read Only Layer
Image
Copy on Write
apt-get update && apt-get install mercurial
13©2018 VMware, Inc.
You create your layers on top of someone else’s layers
Reference: https://docs.docker.com/storage/storagedriver/
BaseOS: bin boot etc home lib opt root tmp usr var
Read Only Layer
R/W layer: Add new files or remove existing files
Diff Files: etc/ca-certificates/* usr/share/ca-certificates/*
Read Only Layer
Diff Files: etc/mercurial/* usr/share/mercurial/*
Read Only Layer
Somebody else’s Image
Your Copy on Write
pull some:image
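A toy model of the layer stack, assuming each layer is just a dictionary of path to content. Here `None` stands in for "file removed in this layer"; a real storage driver records removals differently, so treat this as an illustration of the idea rather than of overlay2 itself. The file names are hypothetical.

```python
def merge_layers(layers):
    """Apply diff layers bottom-to-top and return the merged file view."""
    merged = {}
    for layer in layers:
        for path, content in layer.items():
            if content is None:
                merged.pop(path, None)  # file removed in this layer
            else:
                merged[path] = content  # file added or replaced in this layer
    return merged


# Hypothetical layers, bottom-most first.
base_os = {"bin/sh": "shell", "etc/hostname": "base"}
ca_certs = {"etc/ca-certificates/ca.conf": "certs"}
mercurial = {"etc/mercurial/hgrc": "hg config", "etc/hostname": None}

view = merge_layers([base_os, ca_certs, mercurial])
print(sorted(view))
```

The merged view is what a running container sees, while the individual diff layers are what actually gets stored and shared.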
14©2018 VMware, Inc.
Shouldn’t the Dockerfile tell me everything about
what’s in the container?
15©2018 VMware, Inc.
Recipe for chicken curry:
Heat oil in a pan
Stir in cinnamon, cardamom and clove
When spluttering, add chopped onions
Stir onions until golden brown
Add turmeric, ginger-garlic paste and keep stirring
Add tomatoes and cook until oil separates
Add chicken pieces, cover and cook until chicken is done
Add cilantro garnish and serve
16©2018 VMware, Inc.
FROM golang:1.11 as builder
WORKDIR /go/src/github.com/domain/repo
COPY ./ .
RUN go build -v -o bin/executable ./repo/cmd/
FROM photon:2.0
RUN tdnf update -y
WORKDIR /bin/
COPY --from=builder /go/src/github.com/domain/repo/bin/executable .
ENTRYPOINT [ "/bin/executable" ]
CMD [ "8080" ]
EXPOSE 8080
A Typical Dockerfile for a Golang App
App Dependencies
App compiled with dependencies copied over to new container
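A rough sketch of what can be read off this Dockerfile mechanically: the base image of each stage and which stage the shipped binary is copied from. The parser is deliberately simplified (it ignores flags such as `--platform`, line continuations and build arguments) and is not how any particular tool does it.

```python
DOCKERFILE = """\
FROM golang:1.11 as builder
WORKDIR /go/src/github.com/domain/repo
COPY ./ .
RUN go build -v -o bin/executable ./repo/cmd/
FROM photon:2.0
RUN tdnf update -y
WORKDIR /bin/
COPY --from=builder /go/src/github.com/domain/repo/bin/executable .
ENTRYPOINT [ "/bin/executable" ]
CMD [ "8080" ]
EXPOSE 8080
"""


def build_stages(dockerfile_text):
    """Return (base_image, stage_name) for every FROM instruction."""
    stages = []
    for line in dockerfile_text.splitlines():
        words = line.split()
        if not words or words[0].upper() != "FROM":
            continue
        base = words[1]
        name = words[3] if len(words) >= 4 and words[2].lower() == "as" else None
        stages.append((base, name))
    return stages


def copied_from_stages(dockerfile_text):
    """Return the stage names referenced by COPY --from=<stage>."""
    names = []
    for line in dockerfile_text.splitlines():
        words = line.split()
        if words and words[0].upper() == "COPY":
            for word in words[1:]:
                if word.startswith("--from="):
                    names.append(word[len("--from="):])
    return names


print(build_stages(DOCKERFILE))
print(copied_from_stages(DOCKERFILE))
```

This yields image names and stage names only. Nothing here says which packages, versions or licenses are inside `golang:1.11` or `photon:2.0`, or what got linked into the executable.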
17©2018 VMware, Inc.
https://android.googlesource.com/platform/manifest/+/android-1.6_r1/default.xml
18©2018 VMware, Inc.
Build Scripts are not Build Manifests
19©2018 VMware, Inc.
How can I get a container image manifest?
20©2018 VMware, Inc.
Anatomy of a golang image
21©2018 VMware, Inc.
Anatomy of a golang image
Information about what images were used to make this image is lost
22©2018 VMware, Inc.
A waterfall of Dockerfiles!
golang:1.11
buildpack-deps:stretch-scm
buildpack-deps:stretch-curl
debian:stretch
https://github.com/docker-library/golang/blob/ed78459fac108dab72556146b759516cc65ee109/1.11/stretch/Dockerfile
https://github.com/docker-library/buildpack-deps/blob/1845b3f918f69b4c97912b0d4d68a5658458e84f/stretch/scm/Dockerfile
https://github.com/docker-library/buildpack-deps/blob/1845b3f918f69b4c97912b0d4d68a5658458e84f/stretch/curl/Dockerfile
https://github.com/debuerreotype/docker-debian-artifacts/blob/ed15c6a0b511d2985ca252f59f4318b1fe2a7a59/stretch/Dockerfile
23©2018 VMware, Inc.
Anatomy of a golang image
Some BaseOS and default bash execution
24©2018 VMware, Inc.
Anatomy of a golang image
Using known package manager for Debian or Ubuntu
25©2018 VMware, Inc.
Anatomy of a golang image
Only some layers have files to inspect
26©2018 VMware, Inc.
Anatomy of a golang image
Some files are installed using the package manager
27©2018 VMware, Inc.
Anatomy of a golang image
Some are installed using shell scripts
28©2018 VMware, Inc.
Anatomy of a golang image
docker history --no-trunc docker.io/golang
29©2018 VMware, Inc.
What we have
o Possible base OS and package manager
o Installation scripts
What we’re missing
o Software packages installed using the package manager
o Software packages installed using scripts
o Package versions and licenses
30©2018 VMware, Inc.
docker save the golang image
31©2018 VMware, Inc.
manifest.json
o The config file has all the container image metadata, including ‘history’
o The config file is named after its sha256sum: running sha256sum on that file gives the same value
o That sha256sum is also the image ID
o It’s in JSON format, so it can easily be read by a script in a language that supports it (Python/Ruby)
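A minimal sketch of reading the config out of a saved image and checking that its file name matches its digest. It assumes the classic `docker save` layout, where `manifest.json` is a list whose first entry has a `Config` key naming `<sha256>.json` at the top of the tarball; newer Docker versions may lay the archive out differently. The in-memory tarball built by `make_fake_image_tar` is a made-up stand-in so the example runs without Docker.

```python
import hashlib
import io
import json
import tarfile


def make_fake_image_tar():
    """Build a tiny in-memory stand-in for `docker save` output (illustrative only)."""
    config = json.dumps({
        "history": [{"created_by": "/bin/sh -c #(nop) CMD [\"bash\"]", "empty_layer": True}],
        "rootfs": {"type": "layers", "diff_ids": []},
    }).encode()
    config_name = hashlib.sha256(config).hexdigest() + ".json"
    manifest = json.dumps(
        [{"Config": config_name, "RepoTags": ["example:latest"], "Layers": []}]
    ).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in ((config_name, config), ("manifest.json", manifest)):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def read_config(image_tar):
    """Return (image_id, config) after checking the digest against the file name."""
    with tarfile.open(fileobj=image_tar) as tar:
        manifest = json.load(tar.extractfile("manifest.json"))
        config_name = manifest[0]["Config"]
        config_bytes = tar.extractfile(config_name).read()
    digest = hashlib.sha256(config_bytes).hexdigest()
    if config_name != digest + ".json":
        raise ValueError("config file name does not match its sha256 digest")
    return digest, json.loads(config_bytes)


image_id, config = read_config(make_fake_image_tar())
print(image_id)
print(config["history"])
```

With a real image you would pass an open file for the tarball written by `docker save` instead of the fake one.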
32©2018 VMware, Inc.
manifest.json
o The layers listed are the ‘non-empty’ ones
o They are ordered from bottom-most to top-most
o They are paths to the layer tarballs
o You can untar them and inspect the files
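A sketch of walking the `Layers` list and listing the files in each layer tarball without unpacking anything to disk. The layer paths (`aaa/layer.tar`, `bbb/layer.tar`), the `cfg.json` name and the file contents are all hypothetical, and the classic `docker save` layout is assumed.

```python
import io
import json
import tarfile


def _tar_bytes(files):
    """Pack {name: bytes} into an uncompressed tar archive and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def layer_file_lists(image_tar_bytes):
    """Return [(layer_path, [file names])] in manifest order, bottom-most first."""
    result = []
    with tarfile.open(fileobj=io.BytesIO(image_tar_bytes)) as image:
        manifest = json.load(image.extractfile("manifest.json"))
        for layer_path in manifest[0]["Layers"]:
            layer_bytes = image.extractfile(layer_path).read()
            with tarfile.open(fileobj=io.BytesIO(layer_bytes)) as layer:
                result.append((layer_path, layer.getnames()))
    return result


# Hypothetical two-layer image built in memory for the example.
base = _tar_bytes({"etc/os-release": b"ID=debian\n"})
certs = _tar_bytes({"etc/ca-certificates.conf": b""})
manifest = json.dumps(
    [{"Config": "cfg.json", "Layers": ["aaa/layer.tar", "bbb/layer.tar"]}]
).encode()
image = _tar_bytes(
    {"manifest.json": manifest, "aaa/layer.tar": base, "bbb/layer.tar": certs}
)

for layer_path, names in layer_file_lists(image):
    print(layer_path, names)
```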
33©2018 VMware, Inc.
Files!
34©2018 VMware, Inc.
What we have
o Possible base OS and package manager
o Installation scripts
o Directory trees
o Container build manifest?
What we’re missing
o Software packages installed using the package manager
o Software packages installed using scripts
o Package versions and licenses
35©2018 VMware, Inc.
36©2018 VMware, Inc.
Files + Context
Watching a cook make chicken curry
Attribution: Nisha Kumar 2018
37©2018 VMware, Inc.
What command created what tarball?
38©2018 VMware, Inc.
Why is there a 7th layer? I thought there were only 6 that occupied memory!
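One way to approach "what command created what tarball" is to pair the `history` entries from the image config with the layer paths from manifest.json, skipping entries flagged `empty_layer`. This is a sketch under the assumption that the number of non-empty history entries equals the number of layer tarballs; when the counts disagree, as in the surprise above, the pairing cannot be trusted and the function refuses to guess. The commands and paths are hypothetical.

```python
def pair_history_with_layers(history, layers):
    """Attach each layer tarball to the history entry assumed to have produced it."""
    non_empty = [h for h in history if not h.get("empty_layer", False)]
    if len(non_empty) != len(layers):
        raise ValueError(
            f"{len(non_empty)} non-empty history entries but {len(layers)} layers"
        )
    remaining = iter(layers)
    paired = []
    for entry in history:
        if entry.get("empty_layer", False):
            paired.append((entry["created_by"], None))  # metadata-only step
        else:
            paired.append((entry["created_by"], next(remaining)))
    return paired


# Hypothetical history (oldest first) and layer list (bottom-most first).
history = [
    {"created_by": "ADD rootfs.tar.xz in /"},
    {"created_by": "CMD [\"bash\"]", "empty_layer": True},
    {"created_by": "apt-get install -y ca-certificates"},
]
layers = ["aaa/layer.tar", "bbb/layer.tar"]

for command, layer in pair_history_with_layers(history, layers):
    print(layer, "<-", command)
```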
39©2018 VMware, Inc.
You also get empty directories
40©2018 VMware, Inc.
??
41©2018 VMware, Inc.
Can I use this info to figure out what is installed in that layer?
42©2018 VMware, Inc.
Mount base rootfs and chroot
o sudo mount -t proc /proc /path/to/rootfs/proc
o sudo mount -o bind /sys /path/to/rootfs/sys
o sudo mount -o bind /dev /path/to/rootfs/dev
o sudo cp /etc/resolv.conf /path/to/rootfs/etc/resolv.conf
o sudo chroot /path/to/rootfs /bin/bash -c "dpkg --get-selections"
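As a lighter-weight alternative to the chroot approach (a different technique, not the one on the slide), a Debian-style rootfs keeps its package database in the text file `var/lib/dpkg/status`, which can be parsed directly. The sketch below parses a made-up excerpt; with a real extracted layer you would read the file at `/path/to/rootfs/var/lib/dpkg/status`. It will not tell you about software installed by scripts.

```python
# Hypothetical excerpt of a dpkg status file.
SAMPLE_STATUS = """\
Package: ca-certificates
Status: install ok installed
Version: 20161130+nmu1
Description: Common CA certificates
 Contains the certificate authorities shipped with Mozilla's browser.

Package: oldpkg
Status: deinstall ok config-files
Version: 1.0
"""


def parse_dpkg_status(text):
    """Return (package, version) for every package marked as installed."""
    packages = []
    for paragraph in text.strip().split("\n\n"):
        fields = {}
        for line in paragraph.splitlines():
            if line[:1] in (" ", "\t") or ":" not in line:
                continue  # continuation of a multi-line field such as Description
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        # The last word of Status is the package state, e.g. "installed".
        if fields.get("Status", "").split()[-1:] == ["installed"]:
            packages.append((fields["Package"], fields.get("Version", "")))
    return packages


print(parse_dpkg_status(SAMPLE_STATUS))
```

The trade-off is that no root privileges or mounts are needed, but you only get what the package manager recorded.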
43©2018 VMware, Inc.
44©2018 VMware, Inc.
Mount overlay2 filesystem and chroot
o sudo mount -t overlay overlay -o lowerdir=layer2:layer1,upperdir=emptydir,workdir=workdir mergedir
o sudo mount -t proc /proc mergedir/proc
o sudo mount -o bind /sys mergedir/sys
o sudo mount -o bind /dev mergedir/dev
o sudo cp /etc/resolv.conf mergedir/etc/resolv.conf
o sudo chroot mergedir /bin/bash -c "dpkg --get-selections"
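For more than two layers, the `lowerdir` option has to list the extracted layer directories with the top-most layer first, which is the reverse of the bottom-to-top order in manifest.json. A small sketch that builds the mount command as an argument list; the directory names are placeholders and nothing is executed.

```python
import shlex


def overlay_mount_command(layers_bottom_to_top, upperdir, workdir, mergedir):
    """Build the overlay mount command for already-extracted layer directories."""
    # overlayfs expects the top-most lower directory on the left.
    lower = ":".join(reversed(layers_bottom_to_top))
    options = f"lowerdir={lower},upperdir={upperdir},workdir={workdir}"
    return ["sudo", "mount", "-t", "overlay", "overlay", "-o", options, mergedir]


cmd = overlay_mount_command(["layer1", "layer2"], "emptydir", "workdir", "mergedir")
print(shlex.join(cmd))
```

Returning a list rather than a string keeps the command safe to hand to `subprocess.run` if you choose to execute it.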
45©2018 VMware, Inc.
Is this automated?
46©2018 VMware, Inc.
Tern Automates Compliance for Containers
47©2018 VMware, Inc.
48©2018 VMware, Inc.
49©2018 VMware, Inc.
Features
o Support for Debian, Ubuntu, Photon and Alpine based package managers
o Lists packages installed and their dependencies
o Extensible architecture (add your own method of finding license and source information)
o Caching by container image layer
o Can be used as a standalone tool to help container developers or as part of a container build and release pipeline
o Structured data output (coming soon in Release 0.2.0)
o Active community
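To illustrate why caching by image layer helps, here is a generic sketch and not Tern's actual implementation: because layers are shared between images, results keyed by a layer identifier only need to be computed once. `LayerCache`, `fake_analyze` and the layer IDs are hypothetical names.

```python
class LayerCache:
    """Remember analysis results per layer so shared layers are analyzed once."""

    def __init__(self):
        self._packages = {}

    def packages_for(self, layer_id, analyze):
        """Return cached results for layer_id, calling analyze(layer_id) on a miss."""
        if layer_id not in self._packages:
            self._packages[layer_id] = analyze(layer_id)
        return self._packages[layer_id]


calls = []


def fake_analyze(layer_id):
    """Stand-in for an expensive inspection step; records that it was called."""
    calls.append(layer_id)
    return ["pkg-from-" + layer_id]


cache = LayerCache()
# Two hypothetical images that share the base layer "aaa".
for image_layers in (["aaa", "bbb"], ["aaa", "ccc"]):
    for layer_id in image_layers:
        cache.packages_for(layer_id, fake_analyze)

print(calls)
```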
50©2018 VMware, Inc.
51©2018 VMware, Inc.
52©2018 VMware, Inc.
53©2018 VMware, Inc.
What we have
o Software packages installed using package manager
o Package versions and licenses
What we’re missing
o Software packages installed using scripts
o Files of unknown origin
o Ad-hoc hacks
o Build manifest
54©2018 VMware, Inc.
Is the chicken local?
55©2018 VMware, Inc.
Thank you!
Rate this talk:
eventmobi.com/app/allthingsopen2018
https://github.com/vmware/tern
github: @nishakm
twitter: @nishakmr



Editor's Notes

  • #3: A few disclaimers before we begin. This presentation covers Open Source Software Compliance only. However, the information that you get here may also apply to other compliance obligations like HIPAA, PCI DSS, FISMA, etc., because they all have to do with deep knowledge of the *content* you are distributing. This presentation doesn’t provide any legal advice, as I am not a lawyer. This is a story about my experiences with containers. I would love to hear about your adventures in automating compliance for containers, especially with open source tools. We have a lot to cover, so if you could save your questions and comments for the end of the presentation, I would appreciate it.
  • #4: What if I gave you this picture of chicken curry? If you would like to try chicken curry, you may be wondering how you can make this for yourself, right? Would you assume that I did all this?
  • #5: No. If you knew anything about cooking you would assume that. And
  • #6: Or maybe it’s this combination of things
  • #7: Or maybe it’s just a picture I got off the internet. But if you are interested in making chicken curry, you do not know anything about how it came together, even if you know that this is indeed chicken curry. Even if you want to know some information that is important to you, such as how spicy it is or whether those leaves are indeed cilantro (because you’re one of those people for whom it tastes like soap), you won’t know anything about it.
  • #8: This is what open source software compliance is about. If chicken curry were your application, even if you made it yourself it isn’t entirely yours unless you did the farm-to-table thing. Imagine that the chicken was given to you by a friend who raises chickens and would like you to tell everyone that it was their chicken. That’s terms of use. In the open source world, not being transparent about where you got your bits from and not following terms of use erodes trust in you as a citizen of this world. The legal implications are that people may decide to take down their code or sue you. And this presentation isn’t about that.
  • #9: Complying with OSS licenses for your app is easy as 1, 2, 3. Actually, it isn’t easy even for a simple standalone app. You will have to use software package managers, the internet and your own sysadmin skills to find the dependencies and the licenses that govern them. If you happen to depend on a library whose source is governed by a copyleft license like one of the GPLs, then you will have to find the sources that created the binary that you are using. The Linux Foundation lists some tools that can help with this at the link here. There are also some enterprise services that will do compliance checks for you, but this presentation isn’t about that.
  • #10: The problem becomes hairier when you containerize your app. To understand how, though, you need to know a little bit about container images and how they are built.
  • #11: Some of you may have been to enough of these container talks to have heard this line: containers are not real. They’re Linux kernel processes that run with some special capabilities. This doesn’t mean that container images are not real. They occupy disk space. This is because, in essence, a container image is a collection of files.
  • #12: All container builders are dependent on the underlying Linux kernel’s storage drivers. The storage driver’s API is called to implement a union mount of various filesystem layers. An image starts with what is essentially a Linux filesystem consisting of bin, boot, etc, root and so on. Then, at runtime, a thin copy-on-write layer is created, on which a container builder will copy in files, untar filesystem bundles or invoke commands. Copy on write means that a file in the bottom layer gets copied to the top layer only when you write to it. What gets stored in the new layer is the newly modified file.
  • #13: Once files are installed, the copy-on-write layer becomes the new diff layer. With the storage driver I am familiar with, overlay2, the copy-on-write layer is just an empty directory. Once that directory gets populated, it is kept. That layer then becomes the next layer, and a new copy-on-write layer is created for the next addition of files.
  • #14: One of Docker’s strong suits is that you can make an image that is specific to a certain development environment and share that image with a wider community. You do that by pulling an image from a public repository like Dockerhub and then building on top of it. To automate builds, you would use a Dockerfile. This is where most people stop and ask
  • #16: I have a recipe for chicken curry. Heat oil in a pan: how much oil? What kind of oil? Where do I get it from? Stir in cinnamon, cardamom and clove: what are those? Where do I get them from? Are they nuts? Because I have a peanut allergy. A Dockerfile is like a recipe that you would hand to someone who doesn’t know the first thing about cooking.
  • #17: Let’s assume we have some Go application we would like to containerize. This Dockerfile uses the builder pattern, which means using one container to build the Go code and another, smaller container to ship the Go binary. The build container has the Go dependencies, which include the Go standard library and whatever build dependencies are required. Since Go is a statically compiled language, those dependencies end up inside the binary, so you need to be extra careful with your binaries’ dependencies. Even though you are not distributing the golang image, you are copying a binary with statically compiled code into another container and possibly getting rid of the container you used to build it. For compliance purposes, you need to know the whole dependency chain. The Dockerfile is not enough. And this is why OSS compliance is much harder in containers.
  • #18: Not very long ago, I was a build and release engineer for custom Android OSs. Android’s build and release pipeline starts from a manifest that looks like this. This is a list of project names and their corresponding git repositories. The tool used to manage all the git repositories (repo) can take a snapshot of all the projects that are used to build the Android OS. This means you can find the exact commit in each project’s source that was used to build the final OS. In fact, tools used to build embedded OSs, like Yocto and Buildroot, use this kind of manifest to build their software.
  • #19: In general, build scripts are not build manifests. Software build best practices always start from a manifest of sources. Most package managers allow you to trace the installed packages back to their sources. Container images do have a manifest, but it’s conceptualized differently.
  • #20: This is where my journey starts: finding a container manifest that I could use to find the container’s bill of materials, the ingredients to my chicken curry if you will.
  • #21: docker history gives you a first-pass idea of what your container image is made of. Let’s look at the golang container. You can see that for each commit in the history there is a shell command that created it. This view mirrors the shell history command’s output, i.e. what commands you ran in the container. The OCI specs do not mandate that you put this information in your image. It’s not needed for functionality, but it’s useful for debugging.
  • #22: The first thing we notice is this missing part. It means that this image is a composite of several other images, whose information is now lost. Docker uses a local image cache for faster execution. Once the image is uploaded, information about the other images is lost. That would have been useful information to have for creating a manifest.
  • #23: In fact, you can find this information, given enough time and effort searching through Dockerhub and git. This image, golang:1.11, is built on top of a buildpack-deps image, which is built on top of another buildpack-deps image, which is finally built on top of a debian:jessie image. I have included links to the Dockerfiles that created these images, but this is very lucky, as most Dockerhub images do not have links to the Dockerfiles that created them. But we’re not looking for Dockerfiles, we’re looking for a build manifest with software components.
  • #24: There are some clues here, though. First, notice that this image was built on top of a base Linux distribution, as a tarball was untarred into an empty directory. In a Dockerfile this would be FROM scratch and then ADD some external file into the root directory. Then we see that there is a default execution command for when the container is run. But it doesn’t tell you what is in that filesystem.
  • #25: But then we see apt-get being used, so we know that’s the package manager for Debian or Ubuntu distributions. Knowing that, we can use the package manager to list the packages that are included in that Debian image. This is actually pretty accurate in listing your software components. Debian is a trusted supplier of Linux distros, and they have documented ways of pulling the source for their binaries. You can verify this by running a filesystem scanner on this base OS if you happen to have access to one.
  • #26: There are 12 layers, out of which 6 occupy disk space. Well, not exactly. One of them is a non-empty layer that doesn’t occupy space, and we’ll get to that later.
  • #27: Almost half the content gets installed using the package manager. That’s good, because we can use the same package manager to list all the installed packages in the image. One of the goals of a package manager is software compliance, so it should allow us to find the licenses of all the software installed and get the sources for the binaries.
  • #28: But the other half of the content appears to have been installed using what look like shell scripts.
  • #29: These can get very long. If you spend time on them, though, you can understand what is going on. You can see that they downloaded a specific binary and compared it to a sha256 digest, so at least you can go to the URL and see if they have any documentation on licenses and sources.
  • #30: Installation scripts may say things like “we’re downloading a binary from a URL” but nothing about what version. Perhaps that is something that can be gleaned from the docker build log. The build log may not be available with the image, though it would be nice if it were.
  • #31: You can use docker save to make a raw image tarball for you to inspect. Untarring this tarball gives you a wealth of information that you can analyze programmatically. One of the files is manifest.json. Maybe we’ve hit the jackpot?
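As a rough sketch of inspecting that tarball programmatically, the Python below reads manifest.json straight out of a tarball produced by docker save. The function name `load_manifest` is made up for this example. It assumes the tarball layout described in these notes, with a top-level manifest.json holding a list of entries that name the config file and the layer tarballs.

```python
import json
import tarfile


def load_manifest(image_tar):
    """Read manifest.json from an image tarball.

    `image_tar` is a binary file object opened on the output of
    `docker save`. Raises KeyError if the tarball has no manifest.json.
    """
    with tarfile.open(fileobj=image_tar) as tar:
        member = tar.extractfile("manifest.json")
        return json.load(member)


def layer_tarballs(manifest):
    """Return the layer tarball paths of the first image, in the order listed."""
    return manifest[0]["Layers"]
```

With a file opened as `open("image.tar", "rb")` (the filename is a placeholder), `layer_tarballs(load_manifest(f))` gives the list of layer tarballs in the order they are applied.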
  • #33: Each filesystem layer is in its own tarball. Each sha256 digest is the digest of a single layer tarball. This is how Docker checks whether the files got corrupted during download.
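A minimal sketch of that kind of check in Python, using only the standard library. The function names are made up for this example, and the `sha256:` prefix is an assumption about how the expected digest is written down.

```python
import hashlib


def sha256_of_file(fileobj, chunk_size=1024 * 1024):
    """Compute the sha256 digest of a binary file object, reading it in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def layer_matches(fileobj, expected_digest):
    """Return True if the tarball's digest equals the expected one."""
    return sha256_of_file(fileobj) == expected_digest
```

Reading in chunks keeps memory use flat even for layer tarballs that are hundreds of megabytes.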
  • #34: Here is the first layer’s layer.tar that is untarred. It’s the container version of Debian jessie with all the fixings
  • #35: Let’s take stock. We have some method of figuring out what OS the container builder used. We can parse installation scripts, or just read them, to figure out where to go from there. We have directory trees with files we can inspect. We have a manifest.json? Yes, it’s a build manifest, but not one that works at the software package level. It does work at the “collection of files” level and records the order in which the collections are applied, but not what those files contain.
  • #36: So how do we get that information? Since we have directory trees, we may consider a few options. You can untar each of the files and apply filesystem scanning tools to them. FOSSA is a service that will do this. If you have a Blackduck license, you can use that to scan the files. Clair is an open source security scanner that will do static analysis on the files. However, in my particular case, the requirements are a bit more complicated.
  • #37: A static analysis tool may tell you that a file is governed by a certain license. A security scanner may even tell you if there is a CVE against the file. It may even tell you what software component that file belongs to and what version of the component. But it won’t tell you how it got there or what you can do to get it out. This is especially the case with containers because, as we’ve seen, a lot of this information is obscured. How do you know that the container you are pulling and installing openssl on already has another version of openssl in it? And as we have already seen, a container build manifest doesn’t give you this information… directly anyway.
  • #38: The direct question we’re trying to answer using the manifest is: which command created which tarball?
  • #39: Config.json can be used to find this out. We know that the layers are applied from the first to the last in the list. We know that the history is ordered from the first to the last in the list. BTW, how do we know that they are applied from first to last? For one, that’s what the OCI spec says, but I found this out mostly by testing. Some of the lines in the history are not marked empty_layer, so I want only those. And there you go, we can count… 7. Why is there a 7th when we counted 6 that occupy disk space? So is this not a foolproof way of checking context?
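The pairing described here can be sketched in a few lines of Python. This assumes the image config has a `history` list whose entries carry `created_by` and an optional `empty_layer` flag, and that the manifest entry has a `Layers` list in the same order. The function name and the sample values in the usage note are made up.

```python
def commands_for_layers(config, manifest_entry):
    """Pair each layer tarball with the history command that created it.

    History entries marked `empty_layer` did not produce a tarball, so they
    are skipped. Both lists are ordered from the first layer to the last.
    """
    non_empty = [h for h in config["history"] if not h.get("empty_layer", False)]
    layers = manifest_entry["Layers"]
    if len(non_empty) != len(layers):
        raise ValueError(
            f"{len(non_empty)} non-empty history entries but {len(layers)} layers"
        )
    return [(layer, h.get("created_by", "")) for layer, h in zip(layers, non_empty)]
```

For example, a config whose history is an ADD, a CMD marked `empty_layer`, and an apt-get install, paired with a manifest listing two layer tarballs, yields two pairs: the first tarball with the ADD and the second with the apt-get install. The length check is there because a mismatch means the history cannot be trusted to explain the layers.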
  • #40: You created a new inode, and hence you created a new layer occupying 4096 bytes.
  • #41: In the docker image history, it’s this line that created the empty directory. So maybe 4K is too small to be listed here. In any case, this is proof that parsing the output of docker history, although faster, may not give you accurate or complete information.
  • #42: So I have a directory tree for each layer and I know what command was used to make it. Can I figure out from here what is installed? Yes, yes I can
  • #43: This is a brute force method of running a script in a chroot rootfs environment. I already have the rootfs: it came with the container image. I mount proc, sys and dev in it because those are what dpkg needs to run successfully. I copy my host’s resolv.conf, which is equivalent to using the same network as your host. Then I chroot into the filesystem and run my script. How do I know this is what I need to run? Well, from the history clues I was able to deduce that the package manager was ‘apt’, so I can use the same package manager to get all the information.
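The steps above can be sketched as a list of commands. This Python only builds the command lines and does not run them, since mounting and chroot need root. The exact mount options are my assumption of one reasonable way to do it, and the function name is made up.

```python
def chroot_commands(rootfs, script):
    """Build the commands to run `script` inside the unpacked `rootfs` directory.

    Nothing is executed here. Each command could be passed to subprocess.run
    by a caller with root privileges.
    """
    return [
        # proc, sys and dev are what the package manager needs to run
        ["mount", "-t", "proc", "/proc", f"{rootfs}/proc"],
        ["mount", "-o", "bind", "/sys", f"{rootfs}/sys"],
        ["mount", "-o", "bind", "/dev", f"{rootfs}/dev"],
        # reuse the host's DNS configuration inside the chroot
        ["cp", "/etc/resolv.conf", f"{rootfs}/etc/resolv.conf"],
        # run the script with the rootfs as the root directory
        ["chroot", rootfs, "/bin/sh", "-c", script],
    ]
```

A real run would also need to unmount proc, sys and dev afterwards, in reverse order, before the rootfs directory is cleaned up.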
  • #44: And this does actually work. This is very typical of building OSs
  • #45: You can also do this using overlay. overlay2 is available in kernel 4.0 and above through the mount API. Incidentally, these are pretty much the same steps that are used when building a container. Only here, you are retracing your steps layer by layer.
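As a sketch of what that mount looks like, the Python below builds an overlay mount command for a stack of unpacked layer directories. The function name and directory names are placeholders. The one detail worth showing is the ordering: overlayfs expects the lower directories listed from the topmost layer to the bottommost, which is the reverse of the order in which image layers are applied.

```python
def overlay_mount_command(layer_dirs, upperdir, workdir, mergedir):
    """Build a `mount -t overlay` command line (the command is not executed).

    `layer_dirs` is ordered from the first (base) layer to the last, the same
    order in which the image lists its layers.
    """
    # overlayfs stacks lowerdir entries with the leftmost one on top
    lower = ":".join(reversed(layer_dirs))
    options = f"lowerdir={lower},upperdir={upperdir},workdir={workdir}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, mergedir]
```

To retrace the build layer by layer, you would mount the first N-1 layers as lower directories, use layer N's directory as the upper directory, and run the analysis in the merged directory.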
  • #46: Now it’s a pain to do this for every single layer, so it would be a good idea to automate all these steps
  • #47: For that purpose, I have created an open source project. Tern is a tool that uses all of these techniques to automate compliance for containers.
  • #48: Tern can take a Dockerfile and attempt to build it. If the build succeeds, it will analyze the first layer and then analyze the subsequent layers one after another. It does this by running docker save on the resulting image. After analysis, it will generate a report. If the build fails, Tern will attempt to parse the Dockerfile to figure out what the base image is. If it does manage to pull the image, it will analyze that and try to figure out the rest of the components using just the scripts in the Dockerfile. You can also give it images you pulled from a registry. After analysis, it will generate a report.
  • #49: Tern analyzes the container layer by layer by running commands in a chroot environment. It retrieves those commands from a ‘command library’, which is essentially a list of binary names and corresponding scripts to recover package versions and licenses for whatever was installed using that binary. As a result, the architecture is extensible for OS package managers and language package managers, even for stuff you download using git.
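To illustrate the idea of a command library, here is a toy version in Python. This is not Tern's actual data format or code: the dictionary layout, the key names and the function are made up for this example. Only the dpkg-query and rpm invocations are real commands.

```python
import re

# Hypothetical command library: binary name -> script that lists what it installed.
COMMAND_LIBRARY = {
    "apt-get": {"list_packages": r"dpkg-query -W -f='${Package} ${Version}\n'"},
    "yum": {"list_packages": r"rpm -qa --queryformat '%{NAME} %{VERSION} %{LICENSE}\n'"},
}


def analyze_command(shell_line):
    """Split a layer's shell command into recognized and unrecognized binaries.

    The line is split on `&&` and `;`, and the first word of each part is
    looked up in the command library.
    """
    recognized, unrecognized = [], []
    for part in re.split(r"&&|;", shell_line):
        words = part.split()
        if not words:
            continue
        binary = words[0]
        if binary in COMMAND_LIBRARY:
            recognized.append(binary)
        else:
            unrecognized.append(binary)
    return recognized, unrecognized
```

Extending this sketch to another package manager means adding one more dictionary entry. Anything that lands in the unrecognized list is what a report would flag as not understood.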
  • #50: That means I am working on this on the weekdays and sometimes the weekends. It’s a young project so not many
  • #51: You’ll get the layer, and you’ll get a notice on whether scripts are being run to retrieve information or whether it comes from the cache. This report is for a different image, as the other one is too big for a demo.
  • #52: And then you get a list of packages and versions. I want to point out that the reason it’s all neat is that the package manager is rpm based and the metadata looks like this. If you were to look at dpkg, there is no License metadata, so the script just dumps the license file if present. It does this layer by layer.
  • #53: It will also tell you if it cannot understand what commands you’ve run.
  • #54: Let’s take stock. If we have something that automates based on the package manager, we can get these things. We can even collect this information for containers, as long as we know the container ‘history’. What we have difficulty with is automating long shell scripts, simply because we have to rely on a knowledge base. We cannot substitute automation for software development best practices: things like having deep knowledge of the code you are working with, including the toolchain, or good documentation of your process, or knowledge sharing between teams. We also do not have a build manifest. There are solutions out there that will provide a base OS with the complete list of software on it, and even language or app specific environments. But due to the way containers are built and shared, there is no way of building a container from a manifest right now.
  • #55: In Portland, OR, where I am from, there is this trend of customers insisting that restaurants provide information about where their chicken comes from. In the same way, users of software no longer care only that the software works, they care about the ecosystem around the software, because being able to answer all of these questions shows how well a software supplier knows their software. Are the wishes of the code’s creators followed? Is the container image free from security vulnerabilities? Is the software development team doing their due diligence? A software development team cannot answer these questions unless they have deep knowledge of what they are distributing. This is a hard thing to implement in code and in processes, but the community is moving in the direction of tackling these kinds of due diligence problems, so there is hope on that front.