The hardest thing
in computer science
Hard things
Docker Caching
Dependency versions
Install dependencies
[ 20 minutes or so ]
Copy all sources only here (after the dependencies)
Intended behaviour
● No change:
the image is not rebuilt - LIGHTNING FAST!!!!
● Sources change, dependencies do not:
only the sources are added - QUITE FAST !!!
● Dependencies change:
dependencies are installed, then sources are added - LITTLE SLOWER !!
Actual behaviour
same machine - local checkout
● Local docker registry
● Repeated build: 1:06m
● Only sources: 1:30m
● Dependencies: 11m
● Whole build: ~ 30m
CI case
● Always fresh machine
○ no code
○ no registry
● Git clone/checkout
● Build
● Wipeout
Docker registry to the rescue!
Build cache:
● docker build
● docker push airflow/airflow:latest
Use cache:
● docker pull airflow/airflow:latest
● docker build --cache-from airflow/airflow:latest
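The pull / build / push cycle can be sketched as a small helper that a CI job might run. This is a minimal sketch, not the speaker's actual script: the function name `cached_build_commands` and the build context `.` are assumptions for illustration; only the image name comes from the slide.

```python
from typing import List


def cached_build_commands(image: str, context: str = ".") -> List[List[str]]:
    """Return the docker commands for one CI run that reuses a registry image as cache."""
    return [
        # Fetch the previously pushed image so its layers are available locally.
        ["docker", "pull", image],
        # Let the build reuse layers of the pulled image where the instructions match.
        ["docker", "build", "--cache-from", image, "--tag", image, context],
        # Publish the result so the next fresh machine can use it as cache.
        ["docker", "push", image],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for command in cached_build_commands("airflow/airflow:latest"):
        print(" ".join(command))
```

On the very first run the pull fails because nothing has been pushed yet, so a real script would need to tolerate that.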
Actual behaviour
Docker Hub automated build
● DockerHub docker registry as cache
● Repeated build: 11m
● Only sources: 11m <- Still OK
● Dependencies: ~1h
● Whole build: ~ 2h
Using the cache in Travis CI
● Docker Hub builds are slow
● Travis or Cloud Build can use an earlier image with --cache-from
● But most of the time only the sources change
Caching in Docker - the hardest thing in computer science
Actual BAD behaviour
Travis CI automated build
● Build on Travis with cache from DockerHub
● Repeated build: 11m
● Only sources: 1 h <-
● Dependencies: 1h
● Whole build: ~ 2h
Problem no. 1
Git & permissions
● git clone file creation:
○ local user
○ default user’s group
● file/dir permissions (rwxs)
○ git stores only the executable bit of a file, not the full rwx permissions
○ w is not stored; by default the umask decides it when cloning
○ core.sharedRepository git-config
■ one of: group(true), all, umask(false), 0xxx
● Umask WTF:
○ file: 644 (DockerHub) vs. 664 (Travis CI)
○ dir: 755 (DockerHub) vs. 775 (Travis CI)
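The 644 vs. 664 and 755 vs. 775 difference can be reproduced without git at all. The sketch below assumes a POSIX system and assumes that the two environments differ in umask (022 vs. 002); the helper name `modes_created_under` and the file names are made up for illustration.

```python
import os
import stat
import tempfile
from typing import Tuple


def modes_created_under(umask: int) -> Tuple[int, int]:
    """Create a file and a directory under `umask`; return their permission bits."""
    previous = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as root:
            file_path = os.path.join(root, "source.py")
            dir_path = os.path.join(root, "package")
            # Ask for the widest usual modes; the umask removes bits from them.
            os.close(os.open(file_path, os.O_CREAT | os.O_WRONLY, 0o666))
            os.mkdir(dir_path, 0o777)
            file_mode = os.stat(file_path).st_mode & 0o777
            dir_mode = os.stat(dir_path).st_mode & 0o777
            return file_mode, dir_mode
    finally:
        # Always restore the umask of the calling process.
        os.umask(previous)


if __name__ == "__main__":
    for umask in (0o022, 0o002):
        file_mode, dir_mode = modes_created_under(umask)
        print(f"umask {umask:03o}: file {file_mode:03o}, dir {dir_mode:03o}")
```

The file contents are identical in both cases; only the group-write bit differs, which is enough to make the two checkouts look different to the build.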
Solution to problem 1
Fix group permissions
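One possible way to fix group permissions before the build is to strip the write bit for group and other from the whole checkout, so every machine sees 644/755. This is a sketch under that assumption; the slide does not say exactly how the fix was implemented, and `strip_group_write` is a hypothetical helper name.

```python
import os
import stat


def strip_group_write(root: str) -> None:
    """Remove group- and other-write bits from every file and directory under `root`."""
    unwanted = stat.S_IWGRP | stat.S_IWOTH
    paths = [root]
    for current_dir, dir_names, file_names in os.walk(root):
        paths.extend(os.path.join(current_dir, name) for name in dir_names + file_names)
    for path in paths:
        if os.path.islink(path):
            continue  # leave symlinks alone; chmod would change their targets
        mode = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, mode & ~unwanted)
```

Running it on the checkout right after `git clone` and before `docker build` would make the permissions independent of the umask of the CI machine.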
Problem no. 2
Generated files
● not only .gitignore
● generated files
○ autoapi - documentation
○ build artifacts
○ npm cache
○ .pyc files
○ files created accidentally (wget in source folder anyone?)
● COPY .
● Context calculated based on ALL files
● .dockerignore != .gitignore
● slightly different syntax
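Why a stray file or a changed permission bit breaks the cache can be illustrated with a toy fingerprint of the build context. This is a deliberately simplified model, not Docker's actual checksum algorithm: it assumes the cache key depends on each file's relative path, permission bits and content, and `context_fingerprint` is a hypothetical name.

```python
import hashlib
import os
import stat


def context_fingerprint(root: str) -> str:
    """Hash relative path, permission bits and content of every file under `root`."""
    digest = hashlib.sha256()
    for current_dir, dir_names, file_names in os.walk(root):
        dir_names.sort()  # make the traversal order deterministic
        for name in sorted(file_names):
            path = os.path.join(current_dir, name)
            relative = os.path.relpath(path, root)
            mode = stat.S_IMODE(os.stat(path).st_mode)
            digest.update(relative.encode())
            digest.update(oct(mode).encode())
            with open(path, "rb") as handle:
                digest.update(handle.read())
    return digest.hexdigest()
```

In this model a generated .pyc file, or the same source file with 664 instead of 644, yields a different fingerprint, so everything built on top of `COPY .` would be rebuilt.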
Solution to problem 2
Set .dockerignore to ** (ignore everything) by default, then re-include only what the build needs
Problem no. 3
● Downloading & compiling ALL dependencies takes time!
Partial solution to problem 3
Find the weakest link
Solution to problem 3
a) build image with wheels
Solution to problem 3
b) copy the directory via multi-stage Docker builds
Solution to problem 3
c) install using wheels
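Steps a) and c) map onto two pip invocations, sketched below as command builders. The function names and the paths `requirements.txt` and `/wheels` are assumptions for illustration, not taken from the talk. Step b) would be a `COPY --from=...` instruction in the Dockerfile and is not shown here.

```python
from typing import List


def wheel_build_command(requirements: str, wheel_dir: str) -> List[str]:
    """Step a): compile every dependency once into wheel files."""
    return ["pip", "wheel", "--wheel-dir", wheel_dir, "--requirement", requirements]


def wheel_install_command(requirements: str, wheel_dir: str) -> List[str]:
    """Step c): install from the prebuilt wheels only, without downloading or compiling."""
    return ["pip", "install", "--no-index", "--find-links", wheel_dir,
            "--requirement", requirements]


if __name__ == "__main__":
    # Hypothetical paths; print the commands rather than executing them.
    print(" ".join(wheel_build_command("requirements.txt", "/wheels")))
    print(" ".join(wheel_install_command("requirements.txt", "/wheels")))
```

With this split, the slow compile step runs only when the wheel image is rebuilt, and the main image installs from local files.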
Thank You!
polidea.com/blog
