PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems
16:00 Ahmed Zerouali
Public PhD Defense
“A Measurement Framework for Analyzing Technical Lag in Open-Source
Software Ecosystems”
A Measurement Framework for
Analyzing Technical Lag in
Open-Source Software Ecosystems
Presented by:
Ahmed Zerouali
Advisor:
Dr. Tom Mens
Public PhD Defense
Software Engineering Lab, Université de Mons - Belgium, 4 September 2019
Jury:
Dr. Olivier Delgrange
Dr. Alexandre Decan
Dr. Jesus Gonzalez-Barahona
Dr. Alexander Serebrenik
Belgian Excellence of Science Research Project 2018-2021
https://secoassist.github.io/
Background
Focus
Issues / Up-to-date
Focus
Don’t update / Update
How can we help software developers decide when and why they should update?
Published papers
Technical Lag
Technical lag
> Background
Technical lag¹: the increasing difference between
deployed software packages and the ideal available
upstream packages.
¹ Gonzalez-Barahona, et al. "Technical Lag in Software Compilations: Measuring How
Outdated a Software Deployment Is." IFIP International Conference on Open Source
Systems. Springer, Cham, 2017.
➢ Ideal: stability, security, functionality, recency, etc.
➢ Difference: version updates, bugs, vulnerabilities, lines
of code, commits, etc.
Technical lag
> Background
Credits: https://exploring-data.com/vis/npm-packages-dependencies/
+20M
dependencies
Technical lag
> Background
package.json
Technical lag
> Background
Semantic Versioning
Examples: 0.0.1, 1.0.0, 1.2.3, 1.2.3-beta
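The version format above can be illustrated with a minimal Python sketch that splits a version string into its MAJOR, MINOR and PATCH numbers plus an optional pre-release tag. It assumes a simplified pattern (no build metadata, no check for leading zeros); the names `Version` and `parse_version` are illustrative and do not come from any tool discussed in the talk.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern: MAJOR.MINOR.PATCH with an optional "-prerelease" tag.
# Build metadata ("+...") is deliberately not handled in this sketch.
SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    prerelease: Optional[str] = None


def parse_version(text: str) -> Version:
    """Split a version string such as '1.2.3-beta' into its components."""
    match = SEMVER_RE.match(text)
    if match is None:
        raise ValueError(f"not a semantic version: {text!r}")
    major, minor, patch, pre = match.groups()
    return Version(int(major), int(minor), int(patch), pre)


for example in ["0.0.1", "1.0.0", "1.2.3", "1.2.3-beta"]:
    print(example, "->", parse_version(example))
# last line: 1.2.3-beta -> Version(major=1, minor=2, patch=3, prerelease='beta')
```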
Technical lag
> Background
Other constraint types: *, ==1.2.3, >1.2.3, <1.2.3, 1.2.x, 1.x.x
Technical lag
> Example
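[Figure: dependent package P (releases 3.6.0 and 4.0.0) requires package D (releases 1.0.1 and 1.2.0) with the constraint ^1.0.0 = [1.0.0, 2.0.0[, i.e. >=1.0.0 <2.0.0. The allowed release 1.2.0 is the latest available release of D, so the dependency is up-to-date.]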
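The caret constraint from this example can be checked mechanically. The sketch below is a simplification written for illustration, not npm's resolver: it only handles `^X.Y.Z` with X >= 1, which npm interprets as >=X.Y.Z <(X+1).0.0, and the helper names `parse`, `caret_range` and `satisfies` are hypothetical.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(text: str) -> Version:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release tags)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def caret_range(base: str) -> Tuple[Version, Version]:
    """Half-open interval [low, high[ accepted by the constraint ^base.

    Simplification: only base versions with MAJOR >= 1 are supported;
    npm treats ^0.y.z more narrowly.
    """
    low = parse(base)
    if low[0] == 0:
        raise NotImplementedError("^0.y.z ranges are not handled in this sketch")
    return low, (low[0] + 1, 0, 0)


def satisfies(version: str, constraint: str) -> bool:
    """True if `version` falls inside a caret constraint such as '^1.0.0'."""
    if not constraint.startswith("^"):
        raise ValueError("only caret constraints are handled in this sketch")
    low, high = caret_range(constraint[1:])
    return low <= parse(version) < high  # tuples compare lexicographically


for v in ["1.0.1", "1.2.0", "2.0.0"]:
    print(v, satisfies(v, "^1.0.0"))
# prints: 1.0.1 True / 1.2.0 True / 2.0.0 False
```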
Technical lag
> Example
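[Figure: the releases of D now also include 2.0.0 and 2.0.1, and P has a new release 4.1.0 that keeps the constraint ^1.0.0 = [1.0.0, 2.0.0[. The highest allowed release of D is still 1.2.0, so P misses the updates 2.0.0 and 2.0.1 and the dependency is outdated.]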
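The situation on this slide, where the constraint keeps P on the highest allowed 1.x release while newer major releases of D exist, can be sketched as follows. The release list of D is read off the figure labels, and `resolve` and `missing_updates` are hypothetical helpers that, like the previous sketch, only model caret constraints with MAJOR >= 1.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]


def parse(text: str) -> Version:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release tags)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def resolve(constraint: str, releases: List[str]) -> str:
    """Highest release allowed by a caret constraint such as '^1.0.0' (MAJOR >= 1 only)."""
    low = parse(constraint.lstrip("^"))
    high = (low[0] + 1, 0, 0)
    allowed = [r for r in releases if low <= parse(r) < high]
    return max(allowed, key=parse)


def missing_updates(used: str, releases: List[str]) -> List[str]:
    """Releases with a higher version than the one in use, in ascending order."""
    return sorted((r for r in releases if parse(r) > parse(used)), key=parse)


releases_of_d = ["1.0.1", "1.2.0", "2.0.0", "2.0.1"]
used = resolve("^1.0.0", releases_of_d)
latest = max(releases_of_d, key=parse)
print(used, latest, missing_updates(used, releases_of_d))
# 1.2.0 2.0.1 ['2.0.0', '2.0.1']
print("up-to-date" if used == latest else "outdated")
# outdated
```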
Technical lag
> Example
Followed research approach
Mixed-methods approach:
➢ Qualitative analysis: surveys and interviews
➢ Technical lag formal framework
➢ Quantitative analysis
➢ Tooling
A Framework for Technical Lag
> Qualitative analysis
Semi-structured Interviews:
➢ 5 software practitioners
➢ Place: FOSDEM 2019
➢ Highly educated interviewees with an average of 10
years of experience
“Technical Lag is important, especially if we mix between the benefits of updating and the effort needed to do that.”
A Framework for Technical Lag
> Qualitative analysis
Online survey:
➢ 17 respondents (recruited from Facebook groups)
➢ Highly educated respondents with an average of 3 years of experience
Multiple-choice question (several answers allowed): What would be the most appropriate (ideal) version of a software library to use?
★ Most stable (14)
★ Latest available (9)
★ Most documented (7)
★ Most secure (5)
Technical lag
∆ version
∆ time
∆ bugs
∆ vulnerabilities
A Framework for Technical Lag
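A technical lag framework is a tuple $\mathcal{F} = (\mathcal{C}, \mathcal{L}, \mathit{ideal}, \mathit{delta}, \mathit{agg})$ where:
● $\mathcal{C}$ is a set of component releases
● $\mathcal{L}$ is a set of possible lag values
● $\mathit{ideal} : \mathcal{C} \rightarrow \mathcal{C}$ is a function returning the “ideal” component release
● $\mathit{delta} : \mathcal{C} \times \mathcal{C} \rightarrow \mathcal{L}$ is a function computing the difference between two component releases
● $\mathit{agg} : 2^{\mathcal{L}} \rightarrow \mathcal{L}$ is a function aggregating the results of a set of lags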
A Framework for Technical Lag
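Given a technical lag framework $\mathcal{F}$, we define:
Technical lag: for a component release $c \in \mathcal{C}$, $\mathit{techlag}_{\mathcal{F}}(c) = \mathit{delta}(c, \mathit{ideal}(c))$
Aggregated technical lag: let $D \subseteq \mathcal{C}$ be a set of components, then $\mathit{agglag}_{\mathcal{F}}(D) = \mathit{agg}(\{\mathit{techlag}_{\mathcal{F}}(d) \mid d \in D\})$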
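One way to encode the framework tuple in Python is a small generic class whose fields are the three functions; `techlag` and `agglag` then follow the two definitions directly. This is a sketch under assumptions: the class name `TechLagFramework` is hypothetical, and `agg` receives an iterable of lag values rather than a mathematical set, so duplicate lags are not collapsed.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, TypeVar

C = TypeVar("C")  # type of component releases
L = TypeVar("L")  # type of lag values


@dataclass(frozen=True)
class TechLagFramework(Generic[C, L]):
    """Hypothetical encoding of F = (C, L, ideal, delta, agg)."""

    ideal: Callable[[C], C]          # ideal : C -> C
    delta: Callable[[C, C], L]       # delta : C x C -> L
    agg: Callable[[Iterable[L]], L]  # agg : collection of lags -> L

    def techlag(self, c: C) -> L:
        """techlag_F(c) = delta(c, ideal(c))."""
        return self.delta(c, self.ideal(c))

    def agglag(self, components: Iterable[C]) -> L:
        """agglag_F(D) = agg({techlag_F(d) | d in D})."""
        return self.agg(self.techlag(d) for d in components)


# Toy instantiation: releases are integers, the ideal is always release 10,
# the lag is the numeric distance to it, and lags are aggregated with max.
toy = TechLagFramework(ideal=lambda c: 10, delta=lambda used, best: best - used, agg=max)
print(toy.techlag(7))          # 3
print(toy.agglag([7, 9, 2]))   # 8
```

Keeping `ideal`, `delta` and `agg` as plain callables mirrors the idea that an instantiation of the framework is simply a choice of these three functions.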
A Framework for Technical Lag
> Framework Instantiation
Technical Lag in npm Packages
Technical Lag in npm Packages
Time-based instantiation of $\mathcal{F}$:
➢ Ideal: Highest available version
➢ Delta: Time Lag = date(ideal) - date(used)
➢ Aggregation: Max
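The time-based instantiation can be sketched with the standard `datetime` module. The ideal is taken as the highest version under plain numeric MAJOR.MINOR.PATCH ordering (an assumption; pre-release tags are ignored), and the example uses the two `debug` release dates quoted in the worked example of this talk, with all other releases omitted. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Tuple


def parse(text: str) -> Tuple[int, int, int]:
    """Numeric MAJOR.MINOR.PATCH ordering (pre-release tags not supported)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def ideal_highest(release_dates: Dict[str, date]) -> str:
    """Ideal: the highest available version."""
    return max(release_dates, key=parse)


def time_lag(used: str, release_dates: Dict[str, date]) -> timedelta:
    """Delta: time lag = date(ideal) - date(used)."""
    return release_dates[ideal_highest(release_dates)] - release_dates[used]


# Two releases of debug taken from the example in this talk; others omitted.
debug_releases = {"2.6.9": date(2017, 9, 22), "3.1.0": date(2017, 9, 26)}
print(time_lag("2.6.9", debug_releases).days)  # 4
```

Because the ideal is chosen by version order rather than by date, this delta can become negative when the release in use is a backport published after the highest version; how to treat that case is left open in this sketch.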
Technical Lag in npm Packages
Version-based instantiation of $\mathcal{F}$:
➢ Ideal: Highest available version
➢ Delta: Version lag = (∆Major, ∆Minor, ∆Patch)
➢ Aggregation: Sum
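Reading the version lag as the number of missed major, minor and patch releases between the used version and the ideal one (which matches the worked example with releases 1.0.1 to 2.0.1), a hypothetical Python sketch can classify each step in the sorted release list by the first component that changes, and aggregate lags by component-wise sum. The function names are illustrative, and the thesis may count releases differently in edge cases.

```python
from typing import Iterable, List, Tuple

Version = Tuple[int, int, int]
VersionLag = Tuple[int, int, int]  # (missed majors, missed minors, missed patches)


def parse(text: str) -> Version:
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def release_type(prev: Version, curr: Version) -> int:
    """0 = major, 1 = minor, 2 = patch: index of the first component that changed."""
    for index in range(3):
        if curr[index] != prev[index]:
            return index
    raise ValueError("duplicate version")


def version_lag(used: str, releases: Iterable[str]) -> VersionLag:
    """Count missed releases between `used` and the highest available release."""
    ordered: List[Version] = sorted({parse(r) for r in releases} | {parse(used)})
    start = ordered.index(parse(used))
    counts = [0, 0, 0]
    for prev, curr in zip(ordered[start:], ordered[start + 1:]):
        counts[release_type(prev, curr)] += 1
    return (counts[0], counts[1], counts[2])


def sum_lags(lags: Iterable[VersionLag]) -> VersionLag:
    """Aggregation: component-wise sum of version lags."""
    total = [0, 0, 0]
    for lag in lags:
        for i in range(3):
            total[i] += lag[i]
    return (total[0], total[1], total[2])


releases = ["1.0.1", "1.1.0", "2.0.0", "1.2.0", "2.0.1"]
print(version_lag("1.1.0", releases))  # (1, 1, 1)
```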
Technical Lag in npm Packages
> Time-based instantiation
[Figure: releases of the required package: 1.0.1, 1.1.0, 2.0.0, 1.2.0, 2.0.1. The dependent package uses 1.1.0; the ideal (highest available) release is 2.0.1.]
time lag = date(2.0.1) - date(1.1.0)
Technical Lag in npm Packages
> Version-based instantiation
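[Figure: releases of the required package: 1.0.1, 1.1.0, 2.0.0, 1.2.0, 2.0.1. Compared with the ideal release 2.0.1, the dependent package (using 1.1.0) misses one minor release (1.2.0), one major release (2.0.0) and one patch release (2.0.1).]
time lag = date(2.0.1) - date(1.1.0)
version lag = (1 major, 1 minor, 1 patch)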
Technical Lag in npm Packages
> Example
Technical lag for the package debug:
- ideal(2.6.9) = 3.1.0
- time_lag(2.6.9) = 26-09-2017 - 22-09-2017 = 4 days
- version_lag(2.6.9) = (1,1,1)
Technical Lag in npm Packages
> Example
Time-based technical lag for the package ms:
- time_lag(2.0.0) = 198 days
Technical Lag in npm Packages
> Example
Aggregated time-based technical lag for the package release youtube-player@5.5.0:
- time_lag(debug@2.6.9) = 4 days
- time_lag(ms@2.0.0) = 198 days
➔ agglag({debug@2.6.9, ms@2.0.0}) = max(4 days, 198 days) = 198 days
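The max aggregation in this example is a one-liner, but a small helper that also reports which dependency dominates the aggregated lag can help when deciding what to update first. `agglag_max` is a hypothetical name; the lag values are the ones from the slide.

```python
from datetime import timedelta
from typing import Dict, Tuple


def agglag_max(lags: Dict[str, timedelta]) -> Tuple[str, timedelta]:
    """Aggregate per-dependency time lags with max; also return the dominating dependency."""
    worst = max(lags, key=lambda dep: lags[dep])
    return worst, lags[worst]


youtube_player_deps = {"debug@2.6.9": timedelta(days=4), "ms@2.0.0": timedelta(days=198)}
dep, lag = agglag_max(youtube_player_deps)
print(dep, lag.days)  # ms@2.0.0 198
```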
Technical Lag in npm Packages
Time lag induced by direct dependencies in npm package releases:
Newer package releases have an increased technical lag.
Technical lag is induced by version constraints.
Technical Lag in npm Packages
Time lag induced by transitive runtime dependencies in npm package releases:
Technical lag accumulates from one level of the dependency tree to the next.
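To illustrate how an aggregated lag could be computed level by level over transitive dependencies, here is a hypothetical breadth-first sketch that reports the max time lag over all dependencies reachable within depth 1, 2, 3 and so on. The dependency tree, the lag values and the name `lag_by_depth` are made up for illustration. With max aggregation the reported value can by construction only stay the same or grow with depth, so this shows the computation rather than reproducing the empirical finding.

```python
from typing import Dict, List


def lag_by_depth(root: str, deps: Dict[str, List[str]], lag_days: Dict[str, int]) -> List[int]:
    """Max time lag (days) over all dependencies reachable within depth 1, 2, 3, ...

    Breadth-first traversal; each package is counted once, at the shallowest
    depth where it appears.
    """
    seen = {root}
    frontier = [root]
    running_max = 0
    result: List[int] = []
    while True:
        next_frontier = []
        for pkg in frontier:
            for child in deps.get(pkg, []):
                if child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        if not next_frontier:
            return result
        running_max = max([running_max] + [lag_days[c] for c in next_frontier])
        result.append(running_max)
        frontier = next_frontier


# Hypothetical dependency tree and per-dependency time lags (days).
deps = {"app": ["a", "b"], "a": ["c"], "c": ["d"]}
lag_days = {"a": 10, "b": 3, "c": 40, "d": 5}
print(lag_by_depth("app", deps, lag_days))  # [10, 40, 40]
```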
Technical Lag in npm applications
Time lag induced by direct dependencies in GitHub applications:
Technical lag in GitHub applications is higher than in npm package releases.
Docker Containers
Technical Lag in Docker Containers
> Background
● Containers are isolated bundles
of software packages
● Docker is one of the main tools
for containerisation
● DockerHub is the largest repository
for container images
Technical Lag in Docker Containers
> Background
[Figure: “What are the biggest barriers to putting containers in a production environment?” (source: ClusterHQ)]
Technical Lag in Docker Containers
> Background
Technical Lag in Docker Containers
> Focus
Technical Lag in Docker Containers
> Background
[Figure: a deployed Docker container C includes one version of a Debian package; the available releases of that package are 1.0.1, 1.1.0, 2.0.0, 1.2.1 and 2.1.0, and the technical lag is measured between the included package version and the ideal version.]
technical lag =
∆ Versions (freshness)
∆ Vulnerabilities (security)
∆ Bugs (stability)
Technical Lag in Docker Containers
> Software Freshness
How outdated are images? (IDEAL = LATEST)
The majority of packages in Debian containers are up-to-date…
… but most of the images contain outdated packages.
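These two observations are compatible: a few outdated packages per image are enough for most images to be affected while most packages remain up-to-date. A minimal sketch with a made-up inventory computes both shares; the data layout and the name `freshness_summary` are assumptions for illustration, not ConPan's format. With IDEAL = LATEST, a package counts as outdated whenever its installed version differs from the latest one, so no version ordering is needed.

```python
from typing import Dict, List, Tuple

# image name -> list of (package, installed version, latest available version)
Inventory = Dict[str, List[Tuple[str, str, str]]]


def freshness_summary(images: Inventory) -> Tuple[float, float]:
    """Return (share of outdated packages, share of images with >= 1 outdated package)."""
    total_packages = outdated_packages = images_with_outdated = 0
    for packages in images.values():
        n_outdated = sum(1 for _, installed, latest in packages if installed != latest)
        total_packages += len(packages)
        outdated_packages += n_outdated
        if n_outdated > 0:
            images_with_outdated += 1
    return outdated_packages / total_packages, images_with_outdated / len(images)


up = ("pkg", "1.0-1", "1.0-1")   # hypothetical up-to-date package
old = ("pkg", "1.0-1", "1.0-3")  # hypothetical outdated package
images = {"img1": [up, up, up, old], "img2": [up, up, up, old], "img3": [up, up, up, up]}
pkg_share, img_share = freshness_summary(images)
print(f"{pkg_share:.0%} of packages outdated, {img_share:.0%} of images affected")
# 17% of packages outdated, 67% of images affected
```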
Technical Lag in Docker Containers
> Software Freshness
What version lag is induced by the Debian package releases in use? (IDEAL = LATEST)
Outdated Debian packages in Docker containers induce a median version lag of 1 version.
Technical Lag in Docker Containers
> Software Security
The number of vulnerabilities depends on the Debian release, and is
moderately correlated with the number of outdated packages in a container.
Technical Lag in Docker Containers
> Software Security
Can we reduce security lag in DockerHub container images? (IDEAL = Most Secure)
Few packages (2.5%) need to be downgraded to obtain the most secure version.
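Choosing the most secure release as the ideal can be sketched as picking the available release with the fewest known vulnerabilities, breaking ties in favour of the highest version, and flagging a downgrade when that release is older than the installed one. The tie-breaking rule, the vulnerability counts and the helper names are assumptions for illustration, and the plain dotted-number ordering stands in for Debian's real version comparison (as implemented by `dpkg --compare-versions`).

```python
from typing import Dict, Tuple


def parse(text: str) -> Tuple[int, ...]:
    """Simplified dotted-number ordering; real Debian versions need dpkg's rules."""
    return tuple(int(part) for part in text.split("."))


def most_secure(vulns: Dict[str, int]) -> str:
    """Release with the fewest known vulnerabilities; ties go to the highest version."""
    return max(vulns, key=lambda v: (-vulns[v], parse(v)))


def needs_downgrade(installed: str, vulns: Dict[str, int]) -> bool:
    """True if the most secure release is older than the installed one."""
    return parse(most_secure(vulns)) < parse(installed)


def security_lag(installed: str, vulns: Dict[str, int]) -> int:
    """Vulnerabilities that could be avoided by moving to the most secure release."""
    return vulns[installed] - vulns[most_secure(vulns)]


# Hypothetical vulnerability counts per available release of one package.
vulns = {"1.0.1": 2, "1.1.0": 0, "1.2.1": 1, "2.0.0": 3, "2.1.0": 1}
print(most_secure(vulns), needs_downgrade("1.2.1", vulns), security_lag("1.2.1", vulns))
# 1.1.0 True 1
```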
Technical Lag in Docker Containers
> Survey
Technical Lag in Docker Containers
Security vulnerabilities in npm JavaScript packages
1 out of 3 packages depending on a vulnerable npm package never updates its dependency and remains vulnerable.
A. Decan et al. “On the impact of security vulnerabilities in the npm package dependency network”, MSR 2018.
“37% of websites include a JavaScript library with a known open source vulnerability”
T. Lauinger et al. “Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Libraries on the Web”, NDSS 2017.
So what about Docker containers that include npm packages?
Technical Lag in Docker Containers
★ Old Node images might be missing many updates, including
one major update.
★ All official Node-based images have vulnerable npm packages,
with an average of 16 security vulnerabilities per image.
★ Older images are more likely to have more vulnerabilities.
ConPan: A tool to analyze packages in software containers
> Existing tools
ConPan: A tool to analyze packages in software containers
> Overview
ConPan (*): “CONtainer Packages ANalyzer”
(*): https://github.com/neglectos/ConPan
ConPan: A tool to analyze packages in software containers
> From the CLI
Example: $ conpan -p debian -c google/mysql -d ~/ConPan/data/debian/
ConPan: A tool to analyze packages in software containers
> From the API
ConPan: A tool to analyze packages in software containers
> From the API
ConPan: A tool to analyze packages in software containers
> From the API
Summary and Outlook
Summary
Future Work
● Other instantiations of technical lag
○ Effort needed to reduce technical lag
● Extend and enhance ConPan
● Cross-ecosystem comparison
● Promote the use of technical lag among software developers
Conclusion
Technical lag could help open source software developers and deployers keep their software in good shape.
https://github.com/neglectos/PhD_Dissertation
Evolution of dependency constraints in npm packages
- Caret (^) usage is increasing over time.
- The introduction of the caret coincides with an increase in major version lag.
Evolution of dependency constraints in GitHub applications
The usage of strict constraints is much higher in external applications.

More Related Content

PPTX
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
PDF
130705 zephyrin soh - how developers spend their effort during maintenance ...
PDF
Cloud Resilience with Open Stack
PPTX
Rafail Brouzos thesis
PDF
ICSE2013
PDF
ICSE2014
PDF
PDF
Runtime Behavior of JavaScript Programs
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
130705 zephyrin soh - how developers spend their effort during maintenance ...
Cloud Resilience with Open Stack
Rafail Brouzos thesis
ICSE2013
ICSE2014
Runtime Behavior of JavaScript Programs

Similar to PhD public defense: A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems (19)

PDF
Technical Lag in Software Ecosystems
PDF
An Empirical Analysis of Technical Lag in npm Package Dependencies
PDF
Technical lag in npm and docker ecosystems
PDF
On the fragility of open source software packaging ecosystems
PDF
A multi-dimensional analysis of technical lag in Debian-based Docker images
PPTX
On the health of the npm packaging ecosystem
PDF
"Software Ecosystem Health" lightning talk
PDF
Association Rule Mining Scheme for Software Failure Analysis
PDF
Tools to help and engage emerging open source communities towards development...
PDF
Technical Lag in Docker Containers
PDF
A LOG-BASED TRACE AND REPLAY TOOL INTEGRATING SOFTWARE AND INFRASTRUCTURE
PPTX
On the evolution of technical lag in the npm package dependency network
PPTX
How to increase the technical health of your software?
PDF
FinalReport
PDF
Dependency Issues in Open Source Software Package Registries
KEY
Open Bugs & Development Stages
PDF
Open gen webinar 122810
PDF
Ten Years of Failing Microservices
PDF
OSMC 2023 | OpenTelemetry for Logging by Philipp Krenn
Technical Lag in Software Ecosystems
An Empirical Analysis of Technical Lag in npm Package Dependencies
Technical lag in npm and docker ecosystems
On the fragility of open source software packaging ecosystems
A multi-dimensional analysis of technical lag in Debian-based Docker images
On the health of the npm packaging ecosystem
"Software Ecosystem Health" lightning talk
Association Rule Mining Scheme for Software Failure Analysis
Tools to help and engage emerging open source communities towards development...
Technical Lag in Docker Containers
A LOG-BASED TRACE AND REPLAY TOOL INTEGRATING SOFTWARE AND INFRASTRUCTURE
On the evolution of technical lag in the npm package dependency network
How to increase the technical health of your software?
FinalReport
Dependency Issues in Open Source Software Package Registries
Open Bugs & Development Stages
Open gen webinar 122810
Ten Years of Failing Microservices
OSMC 2023 | OpenTelemetry for Logging by Philipp Krenn
Ad

More from Ahmed Zerouali (11)

PDF
Prevalence and Evolution of License Violations in npm and RubyGems Dependency...
PPTX
Analysis And Observations Of The Evolution Of Testing Library Usage
PPTX
On Popularity and Quality Metrics of npm Packages
PDF
On the Impact of Security Vulnerabilities in the npm and RubyGems Dependency ...
PDF
Evolution of Technical Lag in DockerHub images - Benevol20
PPTX
Analyzing Packages in Docker images hosted On DockerHub
PPTX
On the Diversity of Software Package Popularity Metrics: An Empirical Study o...
PPTX
ConPan: A Tool to Analyze Packages in Software Containers
PPTX
Analyzing the Evolution of Testing Library Usage in Open Source Java Projects
PPTX
An Empirical Comparison of the Development History of CloudStack and Eucalyptus
PPTX
Analyzing the Evolution of Testing Library Usage in Open Source Java Projects
Prevalence and Evolution of License Violations in npm and RubyGems Dependency...
Analysis And Observations Of The Evolution Of Testing Library Usage
On Popularity and Quality Metrics of npm Packages
On the Impact of Security Vulnerabilities in the npm and RubyGems Dependency ...
Evolution of Technical Lag in DockerHub images - Benevol20
Analyzing Packages in Docker images hosted On DockerHub
On the Diversity of Software Package Popularity Metrics: An Empirical Study o...
ConPan: A Tool to Analyze Packages in Software Containers
Analyzing the Evolution of Testing Library Usage in Open Source Java Projects
An Empirical Comparison of the Development History of CloudStack and Eucalyptus
Analyzing the Evolution of Testing Library Usage in Open Source Java Projects
Ad

Recently uploaded (20)

PPTX
chapter 5 systemdesign2008.pptx for cimputer science students
PDF
Wondershare Recoverit Full Crack New Version (Latest 2025)
PDF
How Tridens DevSecOps Ensures Compliance, Security, and Agility
PDF
How AI/LLM recommend to you ? GDG meetup 16 Aug by Fariman Guliev
PPTX
Trending Python Topics for Data Visualization in 2025
PDF
Complete Guide to Website Development in Malaysia for SMEs
PDF
Topaz Photo AI Crack New Download (Latest 2025)
PPTX
Introduction to Windows Operating System
PDF
DuckDuckGo Private Browser Premium APK for Android Crack Latest 2025
PPTX
assetexplorer- product-overview - presentation
PPTX
Cybersecurity: Protecting the Digital World
PDF
How to Make Money in the Metaverse_ Top Strategies for Beginners.pdf
PDF
STL Containers in C++ : Sequence Container : Vector
PPTX
Embracing Complexity in Serverless! GOTO Serverless Bengaluru
PDF
Ableton Live Suite for MacOS Crack Full Download (Latest 2025)
PPTX
Custom Software Development Services.pptx.pptx
PDF
AI/ML Infra Meetup | Beyond S3's Basics: Architecting for AI-Native Data Access
PDF
Top 10 Software Development Trends to Watch in 2025 🚀.pdf
PDF
iTop VPN Crack Latest Version Full Key 2025
PDF
wealthsignaloriginal-com-DS-text-... (1).pdf
chapter 5 systemdesign2008.pptx for cimputer science students
Wondershare Recoverit Full Crack New Version (Latest 2025)
How Tridens DevSecOps Ensures Compliance, Security, and Agility
How AI/LLM recommend to you ? GDG meetup 16 Aug by Fariman Guliev
Trending Python Topics for Data Visualization in 2025
Complete Guide to Website Development in Malaysia for SMEs
Topaz Photo AI Crack New Download (Latest 2025)
Introduction to Windows Operating System
DuckDuckGo Private Browser Premium APK for Android Crack Latest 2025
assetexplorer- product-overview - presentation
Cybersecurity: Protecting the Digital World
How to Make Money in the Metaverse_ Top Strategies for Beginners.pdf
STL Containers in C++ : Sequence Container : Vector
Embracing Complexity in Serverless! GOTO Serverless Bengaluru
Ableton Live Suite for MacOS Crack Full Download (Latest 2025)
Custom Software Development Services.pptx.pptx
AI/ML Infra Meetup | Beyond S3's Basics: Architecting for AI-Native Data Access
Top 10 Software Development Trends to Watch in 2025 🚀.pdf
iTop VPN Crack Latest Version Full Key 2025
wealthsignaloriginal-com-DS-text-... (1).pdf

PhD public defense: A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems

  • 1. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 1 16:00 Ahmed Zerouali Public PhD Defense “A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems”
  • 2. A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Presented by: Ahmed Zerouali Advisor: Dr. Tom Mens Public PhD Defense Software Engineering Lab, Université de Mons - Belgium, 4 September 2019 Jury: Dr. Olivier Delgrange Dr. Alexandre Decan Dr. Jesus Gonzalez-Barahona Dr. Alexander Serebrenik
  • 3. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Belgian Excellence of Science Research Project 2018- 2021 https://secoassist.github.io/ 3
  • 4. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Background 4
  • 5. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Focus IssuesUp-to-date 5
  • 6. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Focus Don’t updateUpdate How can we help software developers to decide when and why they should update 6 ?
  • 7. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Published papers 7
  • 8. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag 8
  • 9. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Background Technical lag¹: the increasing difference between deployed software packages and the ideal available upstream packages. ¹ Gonzalez-Barahona, et al. "Technical Lag in Software Compilations: Measuring How Outdated a Software Deployment Is." IFIP International Conference on Open Source Systems. Springer, Cham, 2017. ➢ Ideal: stability, security, functionality, recency, etc. ➢ Difference: version updates, bugs, vulnerabilities, lines of code, commits, etc. 9
  • 10. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Background Credits: https://exploring-data.com/vis/npm-packages-dependencies/ +20M dependencies 10
  • 11. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Background package.json 11
  • 12. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Background 12 Semantic Versioning Examples: 0.0.1, 1.0.0, 1.2.3, 1.2.3-beta
  • 13. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Background Other: *, ==1.2.3, >1.2.3, <1.2.3, 1.2.x, 1.x.x 13
  • 14. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Example 14 1.0.1 3.6.0 * 1.2.0 allowed 4.0.0 ^1.0.0 ^1.0.0 = [ 1.0.0, 2.0.0 [ dependent package P required package D up-to-date dependency
  • 15. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Example 15 1.0.1 1.2.0 2.0.1 3.6.0 4.0.0 2.0.0 * ^1.0.0 = [ 1.0.0, 2.0.0 [ allowed 4.1.0 missing updates dependent package P required package D outdated ^1.0.0 dependency
  • 16. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag > Example 16
  • 17. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Followed research approach Qualitative analysis: surveys and interviews Tooling Quantitative analysis Technical lag formal framework Mixed-methods approach 17
  • 18. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Qualitative analysis: surveys and interviews Tooling Quantitative analysis Mixed-methods approach Technical lag formal framework 18
  • 19. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems A Framework for Technical Lag > Qualitative analysis Semi-structured Interviews: ➢ 5 software practitioners ➢ Place: FOSDEM 2019 ➢ Highly educated interviewees with an average of 10 years of experience 19 Technical Lag is important, especially if we mix between the benefits of updating and the effort needed to do that.
  • 20. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems A Framework for Technical Lag > Qualitative analysis Online surveys: ➢ 17 candidates (from facebook groups) ➢ Highly educated interviewees with an average of 3 years of experience MCQ: What would be the most appropriate (ideal) version of a software library to use? 20 ★ Most stable (14) ★ Latest available (9) ★ Most documented (7) ★ Most secure (5)
  • 21. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Qualitative analysis: surveys and interviews Tooling Quantitative analysis Mixed-methods approach Technical lag formal framework 21
  • 22. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical lag 22 ∆ version ∆ time ∆ bugs ∆ vulnerabilities
  • 23. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems A Framework for Technical Lag ● is a set of component releases ● is a set of possible lag values ● ideal : → is a function returning the “ideal” component release ● delta : x → is a function computing the difference between two component releases ● agg : is a function aggregating the results of a set of lags 23
  • 24. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems A Framework for Technical Lag 24 Given a technical lag framework , we define: Aggregated Technical lag Technical lag Let be a set of components, then:
  • 25. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Qualitative analysis: surveys and interviews Tooling Quantitative analysis Mixed-methods approach Technical lag formal framework 25
  • 26. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems A Framework for Technical Lag > Framework Instantiation 26
  • 27. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in npm Packages 27
  • 28. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 28 Technical Lag in npm Packages Time-based instantiation of : ➢ Ideal: Highest available version ➢ Delta: Time Lag = date(ideal) - date(used) ➢ Aggregation: Max
  • 29. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 29 Technical Lag in npm Packages Version-based instantiation of : ➢ Ideal: Highest available version ➢ Delta: Version lag = (∆Major, ∆Minor, ∆Patch) ➢ Aggregation: Sum
  • 30. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 30 Technical Lag in npm Packages > Time-based instantiation 1.0.1 1.1.0 2.0.01.2.0 2.0.1 Dependent package Technical lag time lag = date(2.0.1) - date(1.1.0) Required package
  • 31. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 31 Technical Lag in npm Packages > Version-based instantiation 1.0.1 1.1.0 2.0.01.2.0 2.0.1 Dependent package Technical lag 1 minor time lag = date(2.0.1) - date(1.1.0) Required package
  • 32. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 32 Technical Lag in npm Packages > Version-based instantiation 1.0.1 1.1.0 2.0.01.2.0 2.0.1 Dependent package Technical lag 1 minor time lag = date(2.0.1) - date(1.1.0) 1 major Required package
  • 33. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems 33 Technical Lag in npm Packages > Version-based instantiation 1.0.1 1.1.0 2.0.01.2.0 2.0.1 Dependent package Technical lag 1 minor time lag = date(2.0.1) - date(1.1.0) version lag = (1 major, 1 minor ,1 patch) 1 major 1 patch Required package
  • 34. PhD Thesis A Measurement Framework for Analyzing Technical Lag in Open-Source Software Ecosystems Technical Lag in npm Packages > Example Time-based technical lag for the package debug: - ideal (2.6.9) = 3.1.0 - time_lag(2.6.9) = 26-09-2017 - 22-09-2017 = 4 days - version_lag(2.6.9) = (1,1,1) 34
  • 35. Technical Lag in npm Packages > Example: time-based technical lag for the package ms - time_lag(2.0.0) = 198 days
  • 36. Technical Lag in npm Packages > Example: aggregated time-based technical lag for the package release youtube-player@5.5.0 - time_lag(debug@2.6.9) = 4 days - time_lag(ms@2.0.0) = 198 days ➔ agglag({debug@2.6.9, ms@2.0.0}) = max(4 days, 198 days) = 198 days
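The arithmetic of this example can be checked with Python's standard `datetime` module, using the debug release dates given above (written DD-MM-YYYY on the slide) and taking the 198-day lag of ms@2.0.0 as given rather than recomputing it.

```python
from datetime import date

# debug: used version 2.6.9 vs. ideal 3.1.0 (dates from the example above).
debug_used = date(2017, 9, 22)   # debug@2.6.9
debug_ideal = date(2017, 9, 26)  # debug@3.1.0
debug_lag = (debug_ideal - debug_used).days

# ms@2.0.0: lag taken as stated on the slide, not recomputed here.
ms_lag = 198

# Time-based aggregation uses the maximum over the dependencies.
agglag = max(debug_lag, ms_lag)
print(f"agglag(youtube-player@5.5.0) = {agglag} days")  # 198 days
```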
  • 37. Technical Lag in npm Packages: time lag induced by direct dependencies in npm package releases. Newer package releases have a higher technical lag. Technical lag is induced by version constraints.
  • 38. Technical Lag in npm Packages: time lag induced by transitive runtime dependencies in npm package releases. Technical lag accumulates from one level of the dependency tree to the next.
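One way to picture this accumulation is a sketch that assumes a hypothetical tree of `Dep` nodes, each carrying the time lag of its resolved version, together with the max aggregation of slide 28. Under that aggregation, the lag computed over dependencies up to depth d can only stay equal or grow as d increases. The package names and lags are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dep:
    """A hypothetical resolved dependency and the time lag of its used version."""
    name: str
    time_lag_days: int
    deps: list["Dep"] = field(default_factory=list)


def max_lag_up_to_depth(direct: list[Dep], max_depth: int) -> int:
    """Max time lag over all dependencies at depths 1..max_depth (1 = direct)."""
    best = 0
    level, depth = direct, 1
    while level and depth <= max_depth:
        best = max(best, max(d.time_lag_days for d in level))
        # Move one level down the dependency tree.
        level = [child for d in level for child in d.deps]
        depth += 1
    return best


# Hypothetical tree: a -> c -> e, plus a second direct dependency b.
tree = [
    Dep("a", 10, [Dep("c", 50, [Dep("e", 120)])]),
    Dep("b", 0),
]
for d in (1, 2, 3):
    print(d, max_lag_up_to_depth(tree, d))
```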
  • 39. Technical Lag in npm applications
  • 40. Technical Lag in npm applications: time lag induced by direct dependencies in GitHub applications. Technical lag in GitHub applications is higher than in npm package releases.
  • 41. Docker Containers
  • 42. Technical Lag in Docker Containers > Background ● Containers are isolated bundles of software packages. ● Docker is one of the main tools for containerisation. ● DockerHub is the largest repository of container images.
  • 43. Technical Lag in Docker Containers > Background [Figure: ClusterHQ survey, "What are the biggest barriers to putting containers in a production environment?"]
  • 44. Technical Lag in Docker Containers > Background
  • 45. Technical Lag in Docker Containers > Focus
  • 46. Technical Lag in Docker Containers > Background [Figure: available releases of a Debian package (1.0.1, 1.1.0, 1.2.1, 2.0.0, 2.1.0); the package version included in deployed Docker container C lags behind the ideal version] technical lag = ∆ Versions (freshness), ∆ Vulnerabilities (security), ∆ Bugs (stability)
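A small sketch of the three deltas for one Debian package in a container. It assumes a hypothetical list of releases ordered from oldest to newest, each annotated with counts of known vulnerabilities and bugs, and takes the newest release as the ideal. Computing ∆ Vulnerabilities and ∆ Bugs as plain count differences is an assumption made here for illustration; real Debian version strings would also need Debian's own ordering rules rather than list order.

```python
from typing import NamedTuple


class Release(NamedTuple):
    version: str
    vulnerabilities: int  # known vulnerabilities affecting this release (hypothetical)
    bugs: int             # known open bugs affecting this release (hypothetical)


def package_lag(releases: list[Release], installed: str) -> dict[str, int]:
    """Technical lag of one package; releases are ordered oldest to newest, ideal = newest."""
    index = [r.version for r in releases].index(installed)
    used, ideal = releases[index], releases[-1]
    return {
        "versions": len(releases) - 1 - index,                           # freshness
        "vulnerabilities": used.vulnerabilities - ideal.vulnerabilities,  # security
        "bugs": used.bugs - ideal.bugs,                                   # stability
    }


releases = [
    Release("1.0.1", 4, 6),
    Release("1.1.0", 3, 5),
    Release("1.2.1", 2, 4),
    Release("2.0.0", 2, 3),
    Release("2.1.0", 1, 2),
]
print(package_lag(releases, "1.1.0"))
```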
  • 47. Technical Lag in Docker Containers > Software Freshness: How outdated are images? (IDEAL = LATEST) The majority of packages in Debian containers are up to date... but most images contain outdated packages.
  • 48. Technical Lag in Docker Containers > Software Freshness: What version lag is induced by the Debian package releases in use? (IDEAL = LATEST) Outdated Debian packages in Docker containers induce a median version lag of 1 version.
  • 49. Technical Lag in Docker Containers > Software Security: The number of vulnerabilities depends on the Debian release and is moderately correlated with the number of outdated packages in a container.
  • 50. Technical Lag in Docker Containers > Software Security: Can we reduce security lag in DockerHub container images? (IDEAL = most secure) Few packages (2.5%) need to be downgraded to obtain the most secure version.
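With IDEAL = most secure, the ideal may be older than the installed version, and that is exactly when a downgrade is needed. A minimal sketch, assuming hypothetical (version, known vulnerability count) pairs ordered from oldest to newest and breaking ties in favour of the newest release; the function names are illustrative only.

```python
def most_secure_index(vuln_counts: list[int]) -> int:
    """Index of the release with the fewest known vulnerabilities; ties go to the newest."""
    best = 0
    for i, n in enumerate(vuln_counts):
        if n <= vuln_counts[best]:
            best = i
    return best


def security_advice(releases: list[tuple[str, int]], installed: str) -> str:
    """Compare the installed release with the most secure one (releases oldest to newest)."""
    versions = [v for v, _ in releases]
    counts = [n for _, n in releases]
    i = versions.index(installed)
    j = most_secure_index(counts)
    if counts[j] == counts[i]:
        return "keep"  # the installed release is already among the most secure
    return ("upgrade to " if j > i else "downgrade to ") + versions[j]


releases = [("1.0.1", 3), ("1.1.0", 0), ("1.2.1", 2), ("2.0.0", 4)]
print(security_advice(releases, "2.0.0"))  # downgrade to 1.1.0
print(security_advice(releases, "1.0.1"))  # upgrade to 1.1.0
```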
  • 51. Technical Lag in Docker Containers > Survey
  • 52. Technical Lag in Docker Containers: security vulnerabilities in npm JavaScript - 1 out of 3 packages depending on a vulnerable npm package never update their dependency and remain vulnerable (A. Decan et al., "On the impact of security vulnerabilities in the npm package dependency network", MSR 2018). - "37% of websites include a JavaScript library with a known open source vulnerability" (T. Lauinger et al., "Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Libraries on the Web", NDSS 2017). So what about Docker containers that include npm packages?
  • 53. Technical Lag in Docker Containers ★ Old Node images may be missing many updates, including one major update. ★ All official Node-based images have vulnerable npm packages, with an average of 16 security vulnerabilities per image. ★ Older images are more likely to have more vulnerabilities.
  • 54. Mixed-methods approach: technical lag formal framework; quantitative analysis; qualitative analysis (surveys and interviews); tooling
  • 55. ConPan: A tool to analyze packages in software containers > Existing tools
  • 56. ConPan: A tool to analyze packages in software containers > Overview: ConPan, "CONtainer Packages ANalyzer" (https://github.com/neglectos/ConPan)
  • 57. ConPan: A tool to analyze packages in software containers > From the CLI. Example: $ conpan -p debian -c google/mysql -d ~/ConPan/data/debian/
  • 58. ConPan: A tool to analyze packages in software containers > From the API
  • 59. ConPan: A tool to analyze packages in software containers > From the API
  • 60. ConPan: A tool to analyze packages in software containers > From the API
  • 61. Summary and Outlook
  • 62. Summary
  • 63. Future Work ● Other instantiations of technical lag ○ Effort needed to reduce technical lag ● Extend and enhance ConPan ● Cross-ecosystem comparison ● Promote the use of technical lag among software developers
  • 64. Conclusion: Technical lag could help open-source software developers and deployers keep their software in a healthy shape.
  • 65. https://github.com/neglectos/PhD_Dissertation
  • 66. Evolution of dependency constraints in npm packages - Caret (^) usage is increasing over time. - The introduction of the caret coincides with an increase in major version lag.
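To illustrate why a caret constraint can induce major version lag, here is a sketch of npm's caret semantics for plain MAJOR.MINOR.PATCH versions (pre-release tags are ignored): `^1.1.0` means `>=1.1.0 <2.0.0`, `^0.2.3` means `>=0.2.3 <0.3.0`, and `^0.0.3` means `>=0.0.3 <0.0.4`. The function names and release list are hypothetical; in practice resolution is done by npm itself.

```python
def parse_version(v: str) -> tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(part) for part in v.split("."))
    return (major, minor, patch)


def caret_upper_bound(base: tuple[int, int, int]) -> tuple[int, int, int]:
    """Exclusive upper bound of ^base: the left-most non-zero component may not change."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def resolve_caret(base: str, available: list[str]) -> str:
    """Highest available version satisfying ^base (raises ValueError if none does)."""
    low = parse_version(base)
    high = caret_upper_bound(low)
    candidates = [v for v in available if low <= parse_version(v) < high]
    return max(candidates, key=parse_version)


releases = ["1.0.1", "1.1.0", "1.2.0", "2.0.0", "2.0.1"]
print(resolve_caret("1.1.0", releases))   # 1.2.0
print(max(releases, key=parse_version))   # 2.0.1
```

With these releases, `^1.1.0` resolves to 1.2.0 while the highest available version is 2.0.1, so the dependent package stays one major release behind.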
  • 67. Evolution of dependency constraints in GitHub applications: strict constraints are used much more often in external applications.