Monitoring the Execution of 14K Tests:
Methods Tend to Have One Path That Is
Significantly More Executed
Andre Hora
DCC/UFMG
andrehora@dcc.ufmg.br
FSE 2024
Ideas, Visions and Reflections
Motivation & Problem
Having a good test suite is fundamental to ensuring software quality and
sustainable software evolution
Developers should focus on testing both the expected and unexpected behaviors
of the program to catch more bugs and protect against regressions
● Expected behavior: the normal execution, simpler to test
● Unexpected behavior: the abnormal execution, harder to test
In practice, it is well known that developers are more
likely to test expected behaviors than unexpected ones
However, existing research is mostly restricted to controlled experiments, like case
studies with students and developers
- Students are likely to (naively) test the “happy cases” [7]
- Expert developers may test the “sad cases” [25]
We still lack empirical evidence extracted from
real-world software systems and their test suites
Email Python Standard Library
Three possible behaviors at runtime:
1. Entering both the for and if blocks
2. Entering the for block but not the if block
3. Not entering the for block
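The three behaviors can be illustrated with a small sketch. The method below is hypothetical (`find_item` is not the code from the email library shown on the slide); it only shares the same shape, a for loop that contains an if.

```python
def find_item(items, wanted):
    """Hypothetical method: a for block that contains an if block."""
    found = None
    for item in items:        # for block
        if item == wanted:    # if block
            found = item
    return found


# Path 1: enters both the for and if blocks
path1 = find_item(["a", "b"], "a")
# Path 2: enters the for block but never the if block
path2 = find_item(["a", "b"], "z")
# Path 3: does not enter the for block
path3 = find_item([], "a")
```

Each call executes a different set of lines, which is what distinguishes one path from another.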
At this point, it is unclear which
behaviors are the most and least
frequently tested by developers
Can you guess?
Interesting: the large
discrepancy between the
execution frequency of
different paths
Path 1 concentrates most
of the calls (70.9%)
Path 3 receives only 4.4%
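As a quick check of the size of this gap, the arithmetic below uses only the two percentages quoted above. It assumes the method has exactly the three paths listed earlier, so the remaining share is attributed to Path 2; that attribution is an inference, not a number stated here.

```python
path1_share = 70.9  # percent of calls, from the slide
path3_share = 4.4   # percent of calls, from the slide

# Assumption: exactly three tested paths, so Path 2 gets the remainder.
path2_share = round(100 - path1_share - path3_share, 1)

# How many times more calls Path 1 receives than Path 3.
path1_vs_path3 = round(path1_share / path3_share, 1)

print(path2_share)     # 24.7
print(path1_vs_path3)  # 16.1
```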
Open Question
Are tested paths of real software likely to concentrate calls or do
calls tend to be more distributed among the tested paths?
Provide insights for developers to improve existing test suites
Support the creation of novel testing tools to better understand test suites
Reveal novel empirical data for researchers to quantify the difference between the
execution frequency of distinct paths in real-world software
Proposed Work
We propose an empirical study to assess the tested paths quantitatively
We monitor the execution of 14K tests from 25 real-world Python systems,
assessing 11K tested paths from 2,357 methods
Study Design
1. Detecting the tested paths
2. Selecting software systems
3. Research questions
Study Design: Detecting the Tested Paths
1. Collecting executed lines of code
We execute an instrumented version of the
test suite that monitors the tests and collects
data from the execution trace
2. Detecting the tested paths
A tested path represents a set of input
values that make the method execute the
same lines of code
3. Ranking the tested paths
For each method with one or more tested
paths, we sort its paths in descending
order of path frequency
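The three steps above can be sketched in a few lines. This is a minimal illustration, not the study's tooling: it uses `sys.settrace` from the Python standard library as a stand-in for the instrumentation, and `collect_paths` and `has_match` are hypothetical names introduced here.

```python
import sys
from collections import Counter


def collect_paths(func, inputs):
    """Call func once per argument tuple and rank its tested paths.

    A path is identified by the set of executed line offsets in func.
    Returns (path, call_count) pairs in descending order of frequency.
    """
    target = func.__code__
    counts = Counter()
    for args in inputs:
        executed = set()

        def local_trace(frame, event, arg):
            # Step 1: record each executed line, relative to the def line.
            if event == "line":
                executed.add(frame.f_lineno - target.co_firstlineno)
            return local_trace

        def global_trace(frame, event, arg):
            # Only trace frames that run the monitored method.
            return local_trace if frame.f_code is target else None

        previous = sys.gettrace()
        sys.settrace(global_trace)
        try:
            func(*args)
        finally:
            sys.settrace(previous)
        # Step 2: calls that execute the same lines share one path.
        counts[frozenset(executed)] += 1
    # Step 3: rank paths by how many calls they received.
    return counts.most_common()


def has_match(items, wanted):
    """Hypothetical method under test."""
    for item in items:
        if item == wanted:
            return True
    return False


ranked = collect_paths(
    has_match,
    [([1, 2], 1), ([3], 3), ([1, 2], 9), ([], 1)],
)
for path, calls in ranked:
    print(sorted(path), calls)
```

Here the first two calls execute the same lines, so they count as one path with two calls, while the other two calls each form their own path.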
Study Design: Selecting Software Systems
25 Python systems
2,357 methods
14,177 tests
11,425 tested paths
Study Design: Research Questions
RQ1: Frequency of the most tested paths (top 1 vs. top 2)
RQ2: Frequency of the least tested paths (top 1 vs. top 3+)
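One way to read the two comparisons for a single method is sketched below. The slides do not spell out how the ratios are defined or aggregated across methods, so this is an assumption: "top 3+" is taken as the combined calls of the third and later paths, and `top_ratios` and the example counts are hypothetical.

```python
def top_ratios(call_counts):
    """Compare the most tested path with the others for one method.

    call_counts: number of calls received by each tested path.
    Assumption: "top 3+" means the third and all later paths combined.
    """
    ranked = sorted(call_counts, reverse=True)
    total = sum(ranked)
    top1 = ranked[0]
    top2 = ranked[1] if len(ranked) > 1 else 0
    top3_plus = sum(ranked[2:])
    return {
        "top1_share": top1 / total,
        "top1_vs_top2": top1 / top2 if top2 else None,
        "top1_vs_top3plus": top1 / top3_plus if top3_plus else None,
    }


# Hypothetical call counts for a method with three tested paths.
ratios = top_ratios([25, 70, 5])
print(ratios)
```

Ratios are reported as `None` when a method has no second path or no third-and-later paths, since the comparison is undefined in those cases.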
Results
RQ1: Frequency of the Most Tested Paths
Top 1 vs. Top 2
Finding 1: Overall, one tested path tends
to receive most of the calls. Top 1 receives
4x more calls than Top 2.
Finding 2: In methods with two tested
paths, one path tends to receive close to 5x
more calls than the second one.
Finding 3: Even methods with four or more
tested paths have one path that receives
the majority of the calls.
RQ2: Frequency of the Least Tested Paths
Top 1 vs. Top 3+
Finding 4: The top 3+ tested paths receive a
minority of the calls, ranging from 4% to 24%.
Overall, the most tested path of a method has
6.5x more calls than the top 3+.
Summary
We presented an empirical study to assess the tested paths quantitatively
We monitored the execution of over 14K tests and 11K tested paths
Overall, we found that one tested path is prevalent and receives most of the calls,
while others are significantly less executed
Possible applications:
● Provide insights for developers to improve existing test suites
● Support the creation of novel testing tools
● Reveal novel empirical data for researchers