Noise and Heterogeneity in
Historical Build Data:
An Empirical Study of Travis CI
Keheliya Gallaba, Shane McIntosh, Christian Macho, Martin Pinzger
@keheliya
keheliya.github.io
@Mitschiiii
mitschi.github.io
@pinzger
pinzger.github.io
@shane_mcintosh
shanemcintosh.org
Automated builds check the impact of changes on the software product
Source Code -> Build System -> Deliverables
2
Build outcome data is used to solve software
engineering research problems
For
understanding
and predicting
build breakage
For
measuring
the build
breakage rate
For
communicating
the current
build status
3
Build outcome
data is
nuanced!
allow_failure enables experimentation with support for a new platform.
4
Can the off-the-shelf
historical CI build data be
trusted?
The zdavatz/spreadsheet
project has had the
allow_failure feature enabled
for the entire lifetime of the project!
5
Are build
outcomes
free of
noise?
Are build
outcomes
homogeneous?
6
We study 680,209 Travis CI builds spanning 1,276
open source projects
We follow Mockus' four-step procedure
7
Are build
outcomes
free of
noise?
8
We look for passing builds with actively ignored failures
9
680,209
Builds
496,204
Builds
59,904
Builds
Select
passing
builds
Select
builds
with failing jobs
Check if the allow_failure
property is enabled for the
failing jobs in .travis.yml
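The selection steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the Build and Job records and their fields are hypothetical, and the allow_failure flag is assumed to have already been resolved from each project's .travis.yml. The final ratio uses the build counts shown on this slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    state: str           # hypothetical field: "passed" or "failed"
    allow_failure: bool  # assumed to be resolved from .travis.yml beforehand


@dataclass
class Build:
    build_id: int
    state: str           # overall build outcome
    jobs: List[Job] = field(default_factory=list)


def has_actively_ignored_failure(build: Build) -> bool:
    """True for a passing build that contains a failing job allowed to fail."""
    if build.state != "passed":
        return False
    return any(j.state == "failed" and j.allow_failure for j in build.jobs)


def select_noisy_passes(builds: List[Build]) -> List[Build]:
    """Keep only the passing builds that hide an actively ignored failure."""
    return [b for b in builds if has_actively_ignored_failure(b)]


# Toy data, invented for illustration only.
builds = [
    Build(1, "passed", [Job("passed", False), Job("failed", True)]),
    Build(2, "passed", [Job("passed", False)]),
    Build(3, "failed", [Job("failed", False)]),
]
noisy_ids = [b.build_id for b in select_noisy_passes(builds)]

# Share of passing builds with an actively ignored failure,
# using the counts from the slide (59,904 of 496,204).
share = 59_904 / 496_204
```

With the slide's counts, share comes out at roughly 0.12, which matches the 12% reported on the next slide.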
Passing build outcomes do not always indicate that
the build was entirely clean
12% of passing
builds have an
actively ignored
failure.
Up to 87% of the
jobs are actively
ignored.
10
Passively ignored breakages may introduce noise
when all breakages are assumed to be distracting
11
680,209
Builds
610,550
Builds
Build
filtering
Graph construction using
version control data
Graph
analysis
Long breakage sequences may
mean developers passively ignored
failures by not immediately fixing
them.
In some cases, builds can remain broken for 423 days
Overall median length of the failure sequence is five commits.
12
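Measuring breakage sequences can be illustrated with a simplified sketch. The slides build a graph from version control data, which also covers branching histories; the code below assumes a single linear branch and a hypothetical list of build outcomes ordered by commit, so it only approximates that analysis.

```python
from itertools import groupby
from statistics import median
from typing import List


def breakage_sequence_lengths(outcomes: List[str]) -> List[int]:
    """Lengths of maximal runs of consecutive 'failed' outcomes."""
    return [
        sum(1 for _ in run)
        for state, run in groupby(outcomes)
        if state == "failed"
    ]


# Toy linear history, invented for illustration only.
history = [
    "passed", "failed", "failed", "failed",
    "passed", "failed", "passed", "failed", "failed",
]
lengths = breakage_sequence_lengths(history)
longest = max(lengths)
typical = median(lengths)
```

Long runs are the candidates for passively ignored breakages; the median gives the typical run length across a project's history.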
One of the reasons for ignoring a build breakage:
Staleness
13
Developers may become
desensitized to stale* breakages.
*If the project has encountered a given
breakage in the past it's a stale breakage.
14
We measure staleness in Maven build breakages
Maven Build Log -> Maven Log Analyzer
Build fails due to the same reason as a prior failure? (YES / NO)
Failure details are equal to a prior failure? (YES / NO)
YES to both: Stale Breakage
NO to either: Not Stale Breakage
Two of every three build breakages (67%) that we
analyze are stale
15
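The staleness check can be sketched as follows, under one reading of the flowchart: a breakage counts as stale when both its reason and its failure details match a breakage the project has already encountered. The (reason, details) tuples are a hypothetical stand-in for whatever Maven Log Analyzer extracts from a build log.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation of one breakage: (reason, failure details).
Breakage = Tuple[str, str]


def label_stale(breakages: Iterable[Breakage]) -> List[bool]:
    """Mark each breakage as stale if an identical one occurred earlier."""
    seen: Set[Breakage] = set()
    labels: List[bool] = []
    for breakage in breakages:
        labels.append(breakage in seen)  # same reason and same details
        seen.add(breakage)
    return labels


# Toy breakage history of one project, in chronological order.
project_breakages = [
    ("test", "FooTest.testBar"),
    ("compilation", "Foo.java"),
    ("test", "FooTest.testBar"),   # repeats the first breakage
    ("test", "BazTest.testQux"),   # same reason, different details
]
labels = label_stale(project_breakages)
stale_share = sum(labels) / len(labels)
```

The breakages must be processed per project and in chronological order, since staleness is defined relative to that project's own past.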
We propose
Signal-To-Noise Ratio to
quantify the proportion
of noise
16
                  Has Ignored Breakages    No Ignored Breakages
Broken Builds     False Build Breakages    True Build Breakages
Passing Builds    False Build Successes    True Build Successes
Signal: True Build Breakages, True Build Successes
Noise: False Build Breakages, False Build Successes
One in every 7 to 11 builds (9%-14%) is incorrectly labelled
17
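A small worked example may help relate the ratio to the "one in every 7 to 11 builds" figure. The sketch assumes the conventional reading of the matrix, with true outcomes as signal and false outcomes as noise; the function names and the counts in the example are hypothetical.

```python
def signal_to_noise(true_breakages: int, true_successes: int,
                    false_breakages: int, false_successes: int) -> float:
    """Ratio of correctly labelled builds to incorrectly labelled builds."""
    signal = true_breakages + true_successes
    noise = false_breakages + false_successes
    if noise == 0:
        raise ValueError("no noisy builds: the ratio is undefined")
    return signal / noise


def mislabelled_share(true_breakages: int, true_successes: int,
                      false_breakages: int, false_successes: int) -> float:
    """Fraction of all builds whose outcome label is incorrect."""
    signal = true_breakages + true_successes
    noise = false_breakages + false_successes
    return noise / (signal + noise)


# Invented counts: 90 correctly labelled builds, 10 incorrectly labelled.
snr = signal_to_noise(20, 70, 4, 6)
share = mislabelled_share(20, 70, 4, 6)

# "One in every 7 to 11 builds" expressed as percentages.
upper = 1 / 7 * 100
lower = 1 / 11 * 100
```

One in 11 is about 9.1% and one in 7 is about 14.3%, which is where the 9%-14% range comes from.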
Noise may influence analyses
based on build outcome data
18
Passing build outcomes do not
always indicate that the build was
entirely clean
Build breakages can persist for up
to 485 commits (423 days)
67% of build breakages we analyze
are stale
9%-14% of builds are incorrectly
labelled
Are build
outcomes
homogeneous?
19
Computing the Matrix Breakage Purity
MBP<1: Environment-specific breakages
MBP=1: Environment-agnostic breakages
20
Environment-specific breakage is commonplace
21
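The slides do not give the MBP formula, so the sketch below does not compute MBP itself. It uses a hypothetical stand-in, the fraction of jobs in a build's matrix that failed, to illustrate the distinction the slide draws: a breakage that hits every job is environment-agnostic, while one that hits only some jobs is environment-specific.

```python
from typing import List


def breakage_share(job_states: List[str]) -> float:
    """Fraction of failed jobs in one build (a stand-in, not the MBP formula)."""
    if not job_states:
        raise ValueError("a build needs at least one job")
    failed = sum(1 for state in job_states if state == "failed")
    return failed / len(job_states)


def classify_breakage(job_states: List[str]) -> str:
    """Label a build by how widely the breakage spreads across its jobs."""
    share = breakage_share(job_states)
    if share == 0:
        return "not broken"
    if share == 1:
        return "environment-agnostic"
    return "environment-specific"


# Toy build matrices, one entry per job (for example, one per JDK version).
all_jobs_broken = classify_breakage(["failed", "failed", "failed"])
some_jobs_broken = classify_breakage(["passed", "failed", "passed"])
```

Separating the two cases matters for analysis: an environment-specific breakage points at the platform or runtime, not necessarily at the change under test.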
Builds can break for various reasons
22
Compilation
Failure
Test
Failure
Dependency
Resolution
Failure
We extend Maven Log Analyzer to parse and classify broken
Maven build logs by type
Deployment
Failure
Maven Log Analyzer supports new
build breakage categories
23
Ant Inside
Maven
Goal Failed Broken Outside Maven
Run System/Java
Program
Run Jetty
Server
Manage Ruby
Gems
Polyglot for
Maven
No Log
Available
Failed Before
Maven
Travis
Aborted
Failed After
Maven
Travis
Cancelled
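Classifying broken build logs by type can be sketched with a few pattern rules. This is not Maven Log Analyzer's rule set: the patterns below are illustrative assumptions based on common Maven error messages, and the fallback logic (BUILD FAILURE present means Maven itself failed, otherwise the breakage happened outside Maven) is a simplification.

```python
import re

# Illustrative rules only; the real tool's patterns are not shown in the slides.
RULES = [
    ("Compilation Failure", re.compile(r"COMPILATION ERROR")),
    ("Test Failure", re.compile(r"There are test failures")),
    ("Dependency Resolution Failure",
     re.compile(r"Could not resolve dependencies")),
    ("Deployment Failure", re.compile(r"maven-deploy-plugin")),
]


def classify_log(log: str) -> str:
    """Assign one breakage category to the log of a broken build."""
    if not log.strip():
        return "No Log Available"
    for category, pattern in RULES:
        if pattern.search(log):
            return category
    if "BUILD FAILURE" in log:
        # Maven reported a failure that none of the rules above explain.
        return "Goal Failed"
    return "Broken Outside Maven"


# Toy log excerpts, invented for illustration only.
compile_log = "[ERROR] COMPILATION ERROR :\n[INFO] BUILD FAILURE"
goal_log = "[ERROR] Failed to execute goal exec-maven-plugin\n[INFO] BUILD FAILURE"
outside_log = "$ bundle install\nThe command exited with 1."

results = [classify_log(text) for text in (compile_log, goal_log, outside_log, "")]
```

Rule order matters in a first-match classifier like this one, so more specific patterns should come before more general ones.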
Tool-specific breakage is rare.
24
41% of the broken builds failed due to problems
outside of Maven.
25
Noise may influence analyses
based on build outcome data
Passing build outcomes do not
always indicate that the build was
entirely clean
Build breakages can persist for up
to 485 commits (423 days)
67% of build breakages we analyze
are stale
9%-14% of builds are incorrectly
labelled
Build outcomes are heterogeneous
Environment-specific breakage is
commonplace
Tool-specific breakage is rare
Future automatic breakage
recovery techniques should tackle
issues in the CI scripts
Our observations have broader implications for
researchers and tool builders
26
For Research
Community
For Tool Builders
Build outcome noise should be
filtered out before analyses
Heterogeneity should be
considered when training build
outcome prediction models
Automatic breakage recovery
should look beyond tool-specific
insight
Richer information should be
included in build outcome reports
and dashboards
github.com/software-rebels/bbchch
@keheliya
