SlideShare a Scribd company logo
Characterizing Defective Configuration
Scripts Used for Continuous
Deployment
Akond Rahman and Laurie Williams
aarahman@ncsu.edu
https://akondrahman.github.io/
Outline
• Motivation
• Goal
• Research Questions
• Methodology
• Dataset
• Empirical Findings
• Summary
2
Motivation: Continuous Deployment
3
Continuous
Integration
Integrate software
changes regularly
Continuous
Deployment
Integrate software
changes regularly
Automatically
deploy software
changes rapidly
Motivation: Continuous Deployment
https://en.facebookbrand.com/assets/f-logo https://www.stitcher.com/podcast/etsy https://brand.netflix.com/en/assets/
Facebook Etsy Netflix
4
Motivation: Continuous Deployment Summit
Continuous Deployment
Summit
(2015, 2016, 2017)
Dr. Chris Parnin Dr. Laurie Williams
5
Motivation: Infrastructure as Code ( IaC)
Continuous Deployment Summit
(2015, 2016, 2017)
6
Ansible Chef Puppet
Motivation: Defects in IaC scripts
• Defects in IaC scripts can have serious consequences
7https://wikitech.wikimedia.org/wiki/Incident_documentation/20170118-Labs
Wikimedia
Commons
Goal
To help software practitioners to prioritize validation
and verification efforts for infrastructure as code (IaC)
scripts by identifying the operations that characterize
defective IaC scripts
8
Research Questions
• RQ-1: Which are the operations that characterize
defective infrastructure as code (IaC) scripts? How
frequently do the identified operations appear in IaC
scripts?
• RQ-2: How can we build prediction models for defective
infrastructure as code scripts using text features?
9
Background
case $::operatingsystem {
‘CentOS’: {
exec { "install iptables":
command => "/bin/yum install iptables -y",
creates => "/sbin/iptables";
}
}
default: {
fail("cannot install on $::operatingsystem")
}
}
https://puppet.com/company/press-room/media-kit 10
Methodology
• Dataset Construction
• Empirical Analysis
11
Methodology: Dataset Construction
Extract
Commit
IaC Repositories
12
Rater-1
Rater-2
Disagree?
Mark as
defect or
non-defect
Resolver
Mark as
defect or
non-defect
No
Yes
Methodology: Empirical Analysis
Dataset
Feature
Extraction
Correlating
Features
Answer to
RQ1
Answer to
RQ2
Bag of words,
TF-IDF
Principal
Component
Analysis
Qualitative
Analysis
Random Forest,
Parameter Tuning, AUC, F1
13
Dataset: Sources
Mozilla Openstack Wikimedia
Commons
https://blog.mozilla.org/opendesign/arrival/ https://www.openstack.org/brand/openstack-logo/
https://commons.wikimedia.org/wiki/Category:Logos
14
Dataset
Mozilla Openstack Wikimedia
Time Period Aug, 2011- Sep,
2016
Mar, 2011- Sep,
2016
Apr, 2005 – Sep,
2016
IaC Code Size
(LOC)
30,272 122,083 17,439
Defect-related
Commits
558 of 3074,
18.1%
1987 of 7808,
25.4%
298 of 972,
30.6%
Defective IaC
Scripts
259 of 580
(44.6%)
810 of 1383
(58.5%)
161 of 296
(54.4%)
15
Answer to RQ1: Which are the operations that
characterize of defective infrastructure as
code (IaC) scripts?
• Filesystem operations
• Infrastructure provisioning
– Build systems
– Data analytics systems
– Database systems
– Web server systems
• Managing user accounts
16
Answer to RQ1: Filesystem operations
17
Answer to RQ1: Infrastructure provisioning
• Build systems
18
Answer to RQ1: Infrastructure provisioning
• Data analytics systems
19
Answer to RQ1: Infrastructure provisioning
• Database systems
20
Answer to RQ1: Infrastructure provisioning
• Web server systems
21
Answer to RQ1: Managing user accounts
22
Answer to RQ1: Frequency
21.7
14.5
23.4
6.9
18.9 17.9
2.5 1.1 1.6
0
5
10
15
20
25
Mozilla Openstack Wikimedia
PercentageofIaCScripts
Filesystem Infrastructure provision User account
23
Answer to RQ2: How can we build prediction
models for defective infrastructure as code
scripts using text features?
0.76
0.59
0.68
0.75
0.55 0.56
0
0.2
0.4
0.6
0.8
Mozilla Openstack Wikimedia
MedianAUC
Bag-of-words TF-IDF
0.74
0.71
0.73
0.72
0.74
0.7
0.65
0.7
0.75
Mozilla Openstack Wikimedia
MedianF-Measure
Bag-of-words TF-IDF 24
Limitations
• Dataset construction
• Text mining
25
Implications
• Raise awareness amongst practitioners when doing
certain operations
• Prioritizing verification and validation efforts based on
identified operations
26
Summary
Answer to RQ1: Frequency
21.7
14.5
23.4
6.9
18.9 17.9
2.5 1.1 1.6
0
5
10
15
20
25
Mozilla Openstack Wikimedia
PercentageofIaCScripts
Filesystem Infrastructure provision User account
aarahman@ncsu.edu
akondrahman
https://akondrahman.github.io/
Contact
Answer to RQ2: How can we build prediction
models for defective infrastructure as code
scripts using text features?
0.76
0.59
0.68
0.75
0.55 0.56
0
0.2
0.4
0.6
0.8
Mozilla Openstack Wikimedia
MedianAUC
Bag-of-words TF-IDF
0.74
0.71
0.73
0.72
0.74
0.7
0.65
0.7
0.75
Mozilla Openstack Wikimedia
MedianF-Measure
Bag-of-words TF-IDF
27
Motivation: Defects in IaC scripts
• Defects in IaC scripts can have serious consequences
6https://wikitech.wikimedia.org/wiki/Incident_documentation/20170118-Labs
Wikimedia
Commons

More Related Content

PPTX
Shhh!: Secret Management Practices for Infrastructure as Code
PPTX
Characteristics of Defective Infrastructure as Code Scripts in Continuous Dep...
PPTX
Source Code Properties of Defective Infrastructure as Code Scripts
PPTX
What Questions Do Programmers Ask About Configuration as Code?
PPTX
Under-reported Security Defects in Kubernetes Manifests
PDF
使用 Elastic Stack 进行端对端安全分析
PDF
Monitoring MongoDB Atlas with Datadog
PDF
Combining Logs, Metrics, and Traces for Unified Observability
Shhh!: Secret Management Practices for Infrastructure as Code
Characteristics of Defective Infrastructure as Code Scripts in Continuous Dep...
Source Code Properties of Defective Infrastructure as Code Scripts
What Questions Do Programmers Ask About Configuration as Code?
Under-reported Security Defects in Kubernetes Manifests
使用 Elastic Stack 进行端对端安全分析
Monitoring MongoDB Atlas with Datadog
Combining Logs, Metrics, and Traces for Unified Observability

What's hot (20)

PPTX
Application Security Logging with Splunk using Java
PDF
The Intersection of Security and DevOps
PDF
Security Events Logging at Bell with the Elastic Stack
PDF
End-to-End Security Analytics with the Elastic Stack
PPTX
You Build It, You Secure It: Introduction to DevSecOps
PPTX
ADDO Open Source Observability Tools
PDF
Get full visibility and find hidden security issues
PDF
Automate threat detections and avoid false positives
PPTX
WhiteSource Webinar What's New With WhiteSource in December 2018
PDF
American Ancestors Use Case - Scalability & Support Using the Elasticsearch S...
PDF
Automate Your Container Deployments Securely
PPTX
Nextpoint
PDF
Automatize a detecção de ameaças e evite falsos positivos
PDF
Open source cloud native security with threat mapper
PPTX
Find Out What's New With WhiteSource May 2018- A WhiteSource Webinar
PDF
Reinventing enterprise defense with the Elastic Stack
PPTX
Log Search Service Introduction
PDF
Elastic Security : Protéger son entreprise avec la Suite Elastic
PDF
Siscale Lightning Talk: Automated Root Cause Analysis with Elastic Stack
PPTX
Managing Microservices at Scale
Application Security Logging with Splunk using Java
The Intersection of Security and DevOps
Security Events Logging at Bell with the Elastic Stack
End-to-End Security Analytics with the Elastic Stack
You Build It, You Secure It: Introduction to DevSecOps
ADDO Open Source Observability Tools
Get full visibility and find hidden security issues
Automate threat detections and avoid false positives
WhiteSource Webinar What's New With WhiteSource in December 2018
American Ancestors Use Case - Scalability & Support Using the Elasticsearch S...
Automate Your Container Deployments Securely
Nextpoint
Automatize a detecção de ameaças e evite falsos positivos
Open source cloud native security with threat mapper
Find Out What's New With WhiteSource May 2018- A WhiteSource Webinar
Reinventing enterprise defense with the Elastic Stack
Log Search Service Introduction
Elastic Security : Protéger son entreprise avec la Suite Elastic
Siscale Lightning Talk: Automated Root Cause Analysis with Elastic Stack
Managing Microservices at Scale
Ad

Similar to Characterizing Defective Configuration Scripts Used for Continuous Deployment (20)

PPTX
Infrastrucutre as Code
PDF
Cloud infrastructure as code
PPTX
Infrastructure as Code
PPTX
iac.pptx
PDF
Infrastructure as Code with Ansible
PDF
Infrastructure As Code - NextGen Testing 2015
PPTX
Infrastructure as Code with Terraform on AWS
PPTX
IaC (infrastructure as a code) basics Presentation
PDF
Infrastructure as Code 101: Steve Tegeler + Nathan Ness, VMware
PDF
Infrastructure as Code & its Impact on DevOps
PDF
Infrastructure as Code, Theory Crash Course
PDF
Ansible_ Automating Infrastructure With IaC - Datavalley (1).pdf
DOCX
Empowering DevOps Harnessing Infrastructure as Code Services at HEX64.docx
PPTX
Infrastructure as code
PPTX
Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...
PDF
Infrastructure as Code
PDF
Getting Started with Infrastructure as Code (IaC)
PDF
Getting Started with Infrastructure as Code
PPTX
Infrastructure as Code (IaC)
PDF
Infrastructure as code managing servers in the cloud Morris
Infrastrucutre as Code
Cloud infrastructure as code
Infrastructure as Code
iac.pptx
Infrastructure as Code with Ansible
Infrastructure As Code - NextGen Testing 2015
Infrastructure as Code with Terraform on AWS
IaC (infrastructure as a code) basics Presentation
Infrastructure as Code 101: Steve Tegeler + Nathan Ness, VMware
Infrastructure as Code & its Impact on DevOps
Infrastructure as Code, Theory Crash Course
Ansible_ Automating Infrastructure With IaC - Datavalley (1).pdf
Empowering DevOps Harnessing Infrastructure as Code Services at HEX64.docx
Infrastructure as code
Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...
Infrastructure as Code
Getting Started with Infrastructure as Code (IaC)
Getting Started with Infrastructure as Code
Infrastructure as Code (IaC)
Infrastructure as code managing servers in the cloud Morris
Ad

More from Akond Rahman (8)

PPTX
How Do Students Feel About Automated Security Static Analysis Exercises?
PPTX
Exercise Perceptions: Experience Report From A Secure Software Development Co...
PPTX
Synthesizing Program Execution Time Discrepancies in Julia Used for Scientifi...
PPTX
Comprehension Effort and Programming Activities: Related? Or Not Related?
PDF
Predicting Android Application Security and Privacy Risk With Static Code Met...
PDF
Which Factors Influence Practitioners' Usage of Build Automation Tools?
PPTX
Software Security in DevOps: Synthesizing Practitioners’ Perceptions and Prac...
PPTX
Synthesizing Continuous Deployment Practices in Software Development
How Do Students Feel About Automated Security Static Analysis Exercises?
Exercise Perceptions: Experience Report From A Secure Software Development Co...
Synthesizing Program Execution Time Discrepancies in Julia Used for Scientifi...
Comprehension Effort and Programming Activities: Related? Or Not Related?
Predicting Android Application Security and Privacy Risk With Static Code Met...
Which Factors Influence Practitioners' Usage of Build Automation Tools?
Software Security in DevOps: Synthesizing Practitioners’ Perceptions and Prac...
Synthesizing Continuous Deployment Practices in Software Development

Recently uploaded (20)

PDF
Introduction to Data Science and Data Analysis
PPTX
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
PPT
Reliability_Chapter_ presentation 1221.5784
PPT
Miokarditis (Inflamasi pada Otot Jantung)
PPTX
Introduction to machine learning and Linear Models
PPTX
Microsoft-Fabric-Unifying-Analytics-for-the-Modern-Enterprise Solution.pptx
PPTX
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
PDF
Clinical guidelines as a resource for EBP(1).pdf
PDF
annual-report-2024-2025 original latest.
PPTX
Market Analysis -202507- Wind-Solar+Hybrid+Street+Lights+for+the+North+Amer...
PPTX
STUDY DESIGN details- Lt Col Maksud (21).pptx
PPTX
SAP 2 completion done . PRESENTATION.pptx
PDF
Business Analytics and business intelligence.pdf
PPTX
Supervised vs unsupervised machine learning algorithms
PPTX
Computer network topology notes for revision
PPTX
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
PDF
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
PPT
Quality review (1)_presentation of this 21
PPTX
Database Infoormation System (DBIS).pptx
PPTX
STERILIZATION AND DISINFECTION-1.ppthhhbx
Introduction to Data Science and Data Analysis
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
Reliability_Chapter_ presentation 1221.5784
Miokarditis (Inflamasi pada Otot Jantung)
Introduction to machine learning and Linear Models
Microsoft-Fabric-Unifying-Analytics-for-the-Modern-Enterprise Solution.pptx
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
Clinical guidelines as a resource for EBP(1).pdf
annual-report-2024-2025 original latest.
Market Analysis -202507- Wind-Solar+Hybrid+Street+Lights+for+the+North+Amer...
STUDY DESIGN details- Lt Col Maksud (21).pptx
SAP 2 completion done . PRESENTATION.pptx
Business Analytics and business intelligence.pdf
Supervised vs unsupervised machine learning algorithms
Computer network topology notes for revision
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
Quality review (1)_presentation of this 21
Database Infoormation System (DBIS).pptx
STERILIZATION AND DISINFECTION-1.ppthhhbx

Characterizing Defective Configuration Scripts Used for Continuous Deployment

  • 1. Characterizing Defective Configuration Scripts Used for Continuous Deployment Akond Rahman and Laurie Williams aarahman@ncsu.edu https://akondrahman.github.io/
  • 2. Outline • Motivation • Goal • Research Questions • Methodology • Dataset • Empirical Findings • Summary 2
  • 3. Motivation: Continuous Deployment 3 Continuous Integration Integrate software changes regularly Continuous Deployment Integrate software changes regularly Automatically deploy software changes rapidly
  • 4. Motivation: Continuous Deployment https://en.facebookbrand.com/assets/f-logo https://www.stitcher.com/podcast/etsy https://brand.netflix.com/en/assets/ Facebook Etsy Netflix 4
  • 5. Motivation: Continuous Deployment Summit Continuous Deployment Summit (2015, 2016, 2017) Dr. Chris Parnin Dr. Laurie Williams 5
  • 6. Motivation: Infrastructure as Code ( IaC) Continuous Deployment Summit (2015, 2016, 2017) 6 Ansible Chef Puppet
  • 7. Motivation: Defects in IaC scripts • Defects in IaC scripts can have serious consequences 7https://wikitech.wikimedia.org/wiki/Incident_documentation/20170118-Labs Wikimedia Commons
  • 8. Goal To help software practitioners to prioritize validation and verification efforts for infrastructure as code (IaC) scripts by identifying the operations that characterize defective IaC scripts 8
  • 9. Research Questions • RQ-1: Which are the operations that characterize defective infrastructure as code (IaC) scripts? How frequently do the identified operations appear in IaC scripts? • RQ-2: How can we build prediction models for defective infrastructure as code scripts using text features? 9
  • 10. Background case $::operatingsystem { ‘CentOS’: { exec { "install iptables": command => "/bin/yum install iptables -y", creates => "/sbin/iptables"; } } default: { fail("cannot install on $::operatingsystem") } } https://puppet.com/company/press-room/media-kit 10
  • 12. Methodology: Dataset Construction Extract Commit IaC Repositories 12 Rater-1 Rater-2 Disagree? Mark as defect or non-defect Resolver Mark as defect or non-defect No Yes
  • 13. Methodology: Empirical Analysis Dataset Feature Extraction Correlating Features Answer to RQ1 Answer to RQ2 Bag of words, TF-IDF Principal Component Analysis Qualitative Analysis Random Forest, Parameter Tuning, AUC, F1 13
  • 14. Dataset: Sources Mozilla Openstack Wikimedia Commons https://blog.mozilla.org/opendesign/arrival/ https://www.openstack.org/brand/openstack-logo/ https://commons.wikimedia.org/wiki/Category:Logos 14
  • 15. Dataset Mozilla Openstack Wikimedia Time Period Aug, 2011- Sep, 2016 Mar, 2011- Sep, 2016 Apr, 2005 – Sep, 2016 IaC Code Size (LOC) 30,272 122,083 17,439 Defect-related Commits 558 of 3074, 18.1% 1987 of 7808, 25.4% 298 of 972, 30.6% Defective IaC Scripts 259 of 580 (44.6%) 810 of 1383 (58.5%) 161 of 296 (54.4%) 15
  • 16. Answer to RQ1: Which are the operations that characterize of defective infrastructure as code (IaC) scripts? • Filesystem operations • Infrastructure provisioning – Build systems – Data analytics systems – Database systems – Web server systems • Managing user accounts 16
  • 17. Answer to RQ1: Filesystem operations 17
  • 18. Answer to RQ1: Infrastructure provisioning • Build systems 18
  • 19. Answer to RQ1: Infrastructure provisioning • Data analytics systems 19
  • 20. Answer to RQ1: Infrastructure provisioning • Database systems 20
  • 21. Answer to RQ1: Infrastructure provisioning • Web server systems 21
  • 22. Answer to RQ1: Managing user accounts 22
  • 23. Answer to RQ1: Frequency 21.7 14.5 23.4 6.9 18.9 17.9 2.5 1.1 1.6 0 5 10 15 20 25 Mozilla Openstack Wikimedia PercentageofIaCScripts Filesystem Infrastructure provision User account 23
  • 24. Answer to RQ2: How can we build prediction models for defective infrastructure as code scripts using text features? 0.76 0.59 0.68 0.75 0.55 0.56 0 0.2 0.4 0.6 0.8 Mozilla Openstack Wikimedia MedianAUC Bag-of-words TF-IDF 0.74 0.71 0.73 0.72 0.74 0.7 0.65 0.7 0.75 Mozilla Openstack Wikimedia MedianF-Measure Bag-of-words TF-IDF 24
  • 26. Implications • Raise awareness amongst practitioners when doing certain operations • Prioritizing verification and validation efforts based on identified operations 26
  • 27. Summary Answer to RQ1: Frequency 21.7 14.5 23.4 6.9 18.9 17.9 2.5 1.1 1.6 0 5 10 15 20 25 Mozilla Openstack Wikimedia PercentageofIaCScripts Filesystem Infrastructure provision User account aarahman@ncsu.edu akondrahman https://akondrahman.github.io/ Contact Answer to RQ2: How can we build prediction models for defective infrastructure as code scripts using text features? 0.76 0.59 0.68 0.75 0.55 0.56 0 0.2 0.4 0.6 0.8 Mozilla Openstack Wikimedia MedianAUC Bag-of-words TF-IDF 0.74 0.71 0.73 0.72 0.74 0.7 0.65 0.7 0.75 Mozilla Openstack Wikimedia MedianF-Measure Bag-of-words TF-IDF 27 Motivation: Defects in IaC scripts • Defects in IaC scripts can have serious consequences 6https://wikitech.wikimedia.org/wiki/Incident_documentation/20170118-Labs Wikimedia Commons

Editor's Notes

  • #5: Keynote today
  • #6: 17 companies
  • #7: 17 companies
  • #8: Say post mortem analysis report
  • #11: Creates: creates => # A file to look for before running the command...
  • #13: Disagreeing on a defect or not