Automatically Generating Dockerfiles
via Deep-Learning: Challenges and Promises
Giovanni Rosa, Antonio Mastropaolo, Simone Scalabrino,
Gabriele Bavota and Rocco Oliveto
17th International Conference on
Software and System Processes
14-15 May 2023, MCEC (Melbourne)
@giovannipink
University of Molise, Italy
Containerization: how does it work?
Build, Ship, Run
Software Containers
85% of organizations will adopt containers by 2025
#1 Most-Wanted and #1 Most-Loved tool
Docker in a Nutshell
Dockerfile → (build) → Image → (run) → Container
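The Dockerfile → Image → Container flow maps onto two Docker CLI commands, `docker build` and `docker run`. The following is a minimal sketch, not part of the talk: the helper names and the tag `demo:latest` are hypothetical, and the commands are only assembled, not executed.

```python
from typing import List


def build_command(context_dir: str, tag: str) -> List[str]:
    """Command that builds an image from the Dockerfile in context_dir."""
    return ["docker", "build", "-t", tag, context_dir]


def run_command(tag: str) -> List[str]:
    """Command that starts a container from the image and removes it on exit."""
    return ["docker", "run", "--rm", tag]


if __name__ == "__main__":
    # Pass each list to subprocess.run(cmd, check=True) to actually execute it
    # (this assumes Docker is installed on the machine).
    print(" ".join(build_command(".", "demo:latest")))
    print(" ".join(run_command("demo:latest")))
```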
Writing Dockerfiles is challenging
Time-consuming activity
Technical knowledge required
Limited supporting tools
Reis et al. 2021; Henkel et al. 2020; Hanayama et al. 2020
AI-based Code Assistants
…limited to dependencies and code completion
Horton et al. 2019; Ye et al. 2021; Hanayama et al. 2020
The automated generation from scratch is still an open challenge
What about ChatGPT?
Prompt: Can you provide a Dockerfile using Debian 9 and installing Golang 1.9.4 via sources?
Not bad! but… build error
What about state-of-the-art DL models for code-related tasks?
Phase 1: Model Construction
Dockerfile Generation via T5
Natural language requirements → (input) → T5 Model → (prediction) → Generated Dockerfile
How to represent software requirements for a Dockerfile?
Natural Language:
Too Broad!
High-Level Specification (HLS)
Operating System: “alpine”
Package Manager: “apk”
Package Requirements: [“python3”]
Download from Sources: FALSE
ENV variables: FALSE
Build ARGs: FALSE
LABEL: TRUE
EXPOSE for ports: TRUE
CMD: TRUE
ENTRYPOINT: FALSE
OS: “alpine3.17”; Pkg. Manager: “apk”; Pkg. Requirements: “python3”
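The ten HLS fields can be held in a small typed record. This is a sketch for illustration only: the class and attribute names are hypothetical, and the slides do not say how the specification is actually serialized for the model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HighLevelSpec:
    """The ten HLS fields listed on the slide (attribute names are hypothetical)."""
    operating_system: str
    package_manager: str
    package_requirements: List[str] = field(default_factory=list)
    download_from_sources: bool = False
    env_variables: bool = False
    build_args: bool = False
    label: bool = False
    expose_ports: bool = False
    cmd: bool = False
    entrypoint: bool = False


# The example values shown on the slide.
example = HighLevelSpec(
    operating_system="alpine",
    package_manager="apk",
    package_requirements=["python3"],
    label=True,
    expose_ports=True,
    cmd=True,
)
```

Three fields carry values (operating system, package manager, packages) and the other seven are flags that say whether the corresponding Dockerfile instruction should be present.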
Asking Developers’ Opinion
12 software developers (+2 years)
> 50% agree with the requirements specification
Dockerfile Dataset
9.4M unique Dockerfiles from 2013 to 2020 (Eng et al. 2021)
Dataset Filtering (steps 1, 2, 3)
Resulting Dataset
100k Model Tuning
11k Test
T5 model construction
Pre-Training: 560k instances
3 pre-training settings: Dockerfile & English, Dockerfile Only, English Only
Fine-Tuning: 90k instances
Phase 2: Model evaluation
2 baselines (90k instances)
IR-Baseline 1: Elasticsearch
HLS → bool query → ES node
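To make the first baseline concrete, an HLS can be translated into an Elasticsearch `bool` query body. This is only a sketch: the slides do not show the actual query, so the index field names and the split between `must` and `should` clauses are assumptions.

```python
from typing import Any, Dict, List


def hls_to_bool_query(hls: Dict[str, Any]) -> Dict[str, Any]:
    """Turn an HLS dict into an Elasticsearch bool query body.

    Assumption: string fields must match, while list and boolean fields
    only raise the relevance score. The slides do not specify this split.
    """
    must: List[Dict[str, Any]] = []
    should: List[Dict[str, Any]] = []
    for name, value in hls.items():
        if isinstance(value, bool):
            should.append({"term": {name: value}})
        elif isinstance(value, list):
            should.extend({"match": {name: item}} for item in value)
        else:
            must.append({"match": {name: value}})
    return {"query": {"bool": {"must": must, "should": should}}}


if __name__ == "__main__":
    # Hypothetical field names for an index of Dockerfiles.
    print(hls_to_bool_query(
        {"os": "alpine", "pkg_manager": "apk", "packages": ["python3"], "label": True}
    ))
```

The returned dict is the request body one would send to the index's search endpoint; the top hit would then be the retrieved Dockerfile.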
IR-Baseline 2: SentTransformers
HLS → BERT model → cosine similarity
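The retrieval step of the embedding-based baseline reduces to a nearest-neighbour search under cosine similarity. A minimal sketch, assuming the HLS texts have already been embedded into vectors by some model; the toy vectors and function names below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: Sequence[float], corpus: List[Sequence[float]]) -> Tuple[int, float]:
    """Return (index, score) of the corpus vector most similar to the query."""
    scores = [cosine_similarity(query, vec) for vec in corpus]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real sentence embeddings are much larger.
    corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(retrieve([2.0, 2.0], corpus))
```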
Evaluation: 3 dimensions, 11k Test
RQ1: Adherence to the input High-Level Specification
Metric: Field-by-field match (+1 for each matching field)
Input HLS VS Generated/Retrieved HLS
ES
Results for RQ1
OS ~1.00 0.92 0.88
Pkg. Manager 0.98 1.00 1.00
Pkg. Requirements 0.87 0.88 0.76
Download from sources 0.82 0.84 0.52
ENV variables 0.89 0.81 0.17
Build ARGs 0.99 0.88 0.17
LABEL ~1.00 0.87 0.37
EXPOSE for ports 0.80 0.83 0.45
CMD 0.74 0.83 0.26
ENTRYPOINT 0.84 0.85 0.45
T5
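The field-by-field match behind RQ1 can be sketched as follows. The function names are hypothetical, and comparing each field by exact equality (including the package list) is an assumption; the slides do not say how partial matches are scored.

```python
from typing import Any, Dict, List, Tuple

Hls = Dict[str, Any]


def field_match_score(input_hls: Hls, output_hls: Hls) -> int:
    """+1 for every input field whose value is reproduced exactly."""
    return sum(
        1
        for name, expected in input_hls.items()
        if name in output_hls and output_hls[name] == expected
    )


def per_field_match_rate(pairs: List[Tuple[Hls, Hls]]) -> Dict[str, float]:
    """Fraction of (input, output) pairs that agree, reported per field.

    Assumes every input HLS carries the same set of fields.
    """
    counts: Dict[str, int] = {}
    for input_hls, output_hls in pairs:
        for name, expected in input_hls.items():
            hit = name in output_hls and output_hls[name] == expected
            counts[name] = counts.get(name, 0) + int(hit)
    return {name: count / len(pairs) for name, count in counts.items()}
```

Averaging the per-field hits over a test set gives one rate between 0 and 1 per HLS field, which is the shape of the results table above.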
RQ2: Structural similarity between Dockerfiles
Metric: AST edit distance
Input Dockerfile → Parsed AST (Henkel et al. 2020)
Results for RQ2: ES vs. T5 (the lower is better)
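As a rough illustration of an edit-distance metric between Dockerfiles, the sketch below compares only the ordered instruction keywords with a Levenshtein distance. This is a deliberately simplified stand-in, not the AST edit distance over the parsed tree of Henkel et al. that the talk uses; the function names are hypothetical.

```python
from typing import List, Sequence


def instructions(dockerfile: str) -> List[str]:
    """Return the instruction keywords (FROM, RUN, ...) in order of appearance."""
    keywords: List[str] = []
    continued = False  # True while inside a backslash-continued instruction
    for raw in dockerfile.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not continued:
            keywords.append(line.split()[0].upper())
        continued = line.endswith("\\")
    return keywords


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]
```

A distance of 0 means the two files use the same instructions in the same order; a tree-based distance would additionally account for the arguments of each instruction.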
RQ3: Similarity between Docker images
Metric: Percentage of matching layers
Input Dockerfile → (build) → Docker Image A
Gen./Retr. Dockerfile → (build) → Docker Image B
Matching SHAs between Image A and Image B
Results for RQ3: ES vs. T5
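Given the layer digests of the two built images, the matching-layer percentage can be sketched as below. Taking the layers of the image built from the input Dockerfile as the denominator is an assumption, as is the function name; the digests could be read, for example, from the `RootFS.Layers` field of `docker image inspect`.

```python
from typing import Sequence


def matching_layer_percentage(
    reference_layers: Sequence[str], candidate_layers: Sequence[str]
) -> float:
    """Percentage of reference layer SHAs that also occur in the candidate image.

    Assumption: the reference image (built from the input Dockerfile)
    defines the denominator.
    """
    if not reference_layers:
        return 0.0
    candidate = set(candidate_layers)
    matched = sum(1 for sha in reference_layers if sha in candidate)
    return 100.0 * matched / len(reference_layers)


if __name__ == "__main__":
    # Shortened, made-up digests for illustration.
    image_a = ["sha256:a", "sha256:b", "sha256:c", "sha256:d"]
    image_b = ["sha256:a", "sha256:b", "sha256:x"]
    print(matching_layer_percentage(image_a, image_b))
```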
Summary
T5 achieves slightly better results than IR ...
Generated Dockerfiles require manual adjustments
More resource-consuming compared to IR
What have we learned?
Challenge #1: Not enough training instances
Data augmentation
Challenge #2: A different training procedure must be used
Different stopping criterion
Dockerfile abstractions
Giovanni Rosa
https://giovannirosa.com Thank you!
