Does Refactoring of Test Smells Induce
Fixing Flaky Tests?
Fabio Palomba, Andy Zaidman
Delft University of Technology, The Netherlands
Regression Testing
Test cases form the first line of defense against the introduction of faults
Pinto et al., “Understanding Myths and Realities of Test-Suite Evolution”, FSE 2012
The entire team relies on them to decide whether to merge a pull request
Gousios et al., “Work practices and challenges in pull-based development: The integrator’s perspective”, ICSE 2015
Or even whether to proceed with the deployment of the system
Beller et al., “Oops, my tests broke the build: An explorative analysis of Travis CI with GitHub”, MSR 2017
Developers’ productivity depends on both the ability to find real problems and the cost of diagnosing faults in a timely fashion
Perez et al., “A Test-Suite Diagnosability Metric for Spectrum-based fault localization approaches”, ICSE 2017
Even angels eat beans!
Test code can be affected by bugs as well
Vahabzadeh et al., “An Empirical Study of Bugs in Test Code”, ICSME 2015
One of the typical bugs affecting test suites is called “flakiness”
Flaky Tests
Test cases that exhibit both a passing and a failing result with the same code
Luo et al., “An Empirical Analysis of Flaky Tests”, FSE 2014
They might hide real bugs,
increase maintenance costs, and
reduce developers’ confidence
Flaky tests are relevant for practitioners and highly discussed on the web
Empirical Studies: likely causes behind test code flakiness
Flaky Test Identification: proposing solutions to fix flaky tests
They focus only on specific causes behind test code flakiness
Thus, only ad-hoc solutions are available
A Deeper Analysis
The problems addressed by previous research represent only a part of the whole story
Luo et al., “An Empirical Analysis of Flaky Tests”, FSE 2014
A deeper analysis of possible fixing strategies for other root causes is still missing
Studying Test Smells
Symptoms of poor design or implementation choices in test code
Van Deursen et al., “Refactoring Test Code”, XP 2001
Resource Optimism: a test that makes optimistic assumptions about the state or the existence of an external resource
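As an illustration of Resource Optimism, here is a minimal Java sketch. It is not taken from the study: the class and method names are hypothetical, and plain static methods stand in for JUnit test bodies so that the example runs without a test framework. The second method follows the spirit of the "Setup External Resource" refactoring described by Van Deursen et al., where the test creates and releases the resource itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: static methods play the role of test bodies.
class ResourceOptimismSketch {

    // Smelly variant: optimistically assumes that someone else has already
    // created the file; it fails whenever the file is missing.
    static String readAssumingFileExists(Path assumedFile) {
        try {
            return Files.readString(assumedFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Refactored variant: the test creates the resource it needs and
    // removes it afterwards, so it no longer depends on external state.
    static String readSelfProvidedFile() {
        try {
            Path fixture = Files.createTempFile("fixture-", ".txt");
            try {
                Files.writeString(fixture, "expected content");
                return Files.readString(fixture);
            } finally {
                Files.deleteIfExists(fixture);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The smelly variant passes or fails depending on what happens to be on disk, which is one way a test can show different outcomes for the same code.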
Indirect Testing: a test that exercises classes other than the production class it corresponds to
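A small Java sketch may help to picture Indirect Testing. Everything in it is an assumption made for illustration: `TaxCalculator` and `Invoice` are hypothetical production classes, and the static methods stand in for test bodies returning whether their check holds.

```java
// Hypothetical sketch: production classes and checks are illustrative only.
class IndirectTestingSketch {

    // Hypothetical production classes.
    static class TaxCalculator {
        long taxCents(long netCents) {
            return netCents * 20 / 100; // flat 20% rate, chosen for the example
        }
    }

    static class Invoice {
        private final TaxCalculator calculator = new TaxCalculator();

        long totalCents(long netCents) {
            return netCents + calculator.taxCents(netCents);
        }
    }

    // Smelly variant: a check that lives in Invoice's test but really
    // verifies the tax rule of TaxCalculator, reached only through Invoice.
    static boolean invoiceTestChecksTaxIndirectly() {
        return new Invoice().totalCents(1000) - 1000 == 200;
    }

    // Refactored variant: the tax rule is checked directly on its own class,
    // and the Invoice test only checks Invoice's own responsibility.
    static boolean taxCalculatorTestChecksTaxDirectly() {
        return new TaxCalculator().taxCents(1000) == 200;
    }

    static boolean invoiceTestChecksTotal() {
        return new Invoice().totalCents(1000) == 1200;
    }
}
```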
Test Run War: a test that allocates resources that are also used by other methods, possibly causing interference
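The following Java sketch illustrates a Test Run War and one possible remedy, in the spirit of the "Make Resource Unique" refactoring described by Van Deursen et al. The class, the method names and the file name are hypothetical and not taken from the study.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a shared versus a unique test resource.
class TestRunWarSketch {

    // Smelly variant: every test run uses the same fixed path, so tests
    // (or parallel runs of the same test) can overwrite each other's data.
    static Path sharedResource() {
        return Path.of(System.getProperty("java.io.tmpdir"), "shared-test-output.txt");
    }

    // Refactored variant: each call allocates its own uniquely named file,
    // so concurrent runs no longer compete for the same resource.
    static Path uniqueResource(String testName) {
        try {
            Path resource = Files.createTempFile(testName + "-", ".txt");
            resource.toFile().deleteOnExit(); // best-effort cleanup at JVM exit
            return resource;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```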
Research Questions
What are the causes of test flakiness?
To what extent can flaky tests be explained by the presence of test smells?
To what extent does refactoring of test smells help in removing flaky tests?
Software Projects
18 open-source systems randomly selected from GitHub
Detecting Test Smells
Bavota et al., “Are Test Smells Really Harmful? An Empirical Study”, EMSE
Palomba et al., “On the Diffuseness of Test Smells in Automatically Generated Test Code”, SBST 2016
The detector exploits the definition of
the smells to detect them
The detector has a precision of 88%
and a recall of 100%
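For reference, the two figures above follow the standard definitions of precision and recall. The symbols TP, FP and FN (true positives, false positives and false negatives among the reported smell instances) are introduced here for illustration and do not appear in the slides.

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
```

Under these definitions, a recall of 100% means that no actual smell instance in the evaluated sample was missed (FN = 0), while a precision of 88% means that roughly 12 out of every 100 reported instances were false positives.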
Detecting Flaky Tests
We ran each JUnit test class ten times
A test method was considered flaky if its output was different at least once
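The detection rule on this slide can be sketched in a few lines of Java. This is only a minimal sketch under stated assumptions, not the tooling used in the study: a `Runnable` stands in for a JUnit test method, an outcome is reduced to pass or fail, and the names `FlakyDetectorSketch`, `runOnce` and `isFlaky` are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: rerun a test and flag it when outcomes differ.
class FlakyDetectorSketch {

    enum Outcome { PASS, FAIL }

    // Runs the test body once; any exception or error counts as a failure.
    static Outcome runOnce(Runnable test) {
        try {
            test.run();
            return Outcome.PASS;
        } catch (Throwable failure) {
            return Outcome.FAIL;
        }
    }

    // A test is flagged as flaky when the repeated runs of the same code
    // produce more than one distinct outcome.
    static boolean isFlaky(Runnable test, int runs) {
        Set<Outcome> observed = EnumSet.noneOf(Outcome.class);
        for (int i = 0; i < runs; i++) {
            observed.add(runOnce(test));
        }
        return observed.size() > 1;
    }
}
```

Calling `FlakyDetectorSketch.isFlaky(test, 10)` mirrors the ten repetitions mentioned above. A test that always fails is not flagged, because its outcome never changes.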
Causes of Test Flakiness
We manually mapped each flaky test onto one of the 10 root causes defined in the taxonomy by Luo et al.
Luo et al., “An Empirical Analysis of Flaky Tests”, FSE 2014
Java: source code of the test
Log: exceptions thrown
Async Wait (27%): a test method makes an asynchronous call and does not wait for the result of the call
IO issue (22%): the test method does not properly acquire or release one or more of its resources
Concurrency (17%): different threads interact in an undesirable manner
Test Ordering (11%): ordering of test execution
Network (10%): network performance
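To make the most frequent cause, Async Wait, more concrete, here is a minimal Java sketch. It is an illustration only: the class and method names are hypothetical, the 50 ms delay is an arbitrary stand-in for asynchronous work, and static methods replace JUnit test bodies.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the Async Wait root cause and a possible fix.
class AsyncWaitSketch {

    // Simulated asynchronous operation that publishes its result later.
    static CompletableFuture<Void> startAsyncWork(AtomicReference<String> sink) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(50); // arbitrary delay standing in for real work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            sink.set("done");
        });
    }

    // Flaky variant: reads the result without waiting for the call, so the
    // value observed depends on thread scheduling.
    static String readWithoutWaiting() {
        AtomicReference<String> sink = new AtomicReference<>("pending");
        startAsyncWork(sink);
        return sink.get();
    }

    // Stable variant: waits (with an upper bound) for the asynchronous call
    // to complete before reading the result.
    static String readAfterWaiting() {
        AtomicReference<String> sink = new AtomicReference<>("pending");
        startAsyncWork(sink).orTimeout(5, TimeUnit.SECONDS).join();
        return sink.get();
    }
}
```

An assertion on `readWithoutWaiting()` may pass on one run and fail on another with the same code, which is exactly the behaviour that defines a flaky test.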
Test Smells vs Flaky Tests
[Diagram: overlap between Test Smells and Flaky Tests]
61%
How many of them are coincidental co-occurrences?
We manually identified the flakiness-inducing test smells
A Resource Optimism smell was considered causally related to a test case if the flakiness was due to issues in the management of external resources
54%
Resource Optimism: IO issue, Network
Indirect Testing: Test Ordering
Test Run War: Concurrency
The Role of Refactoring
We manually refactored the test smells according to the guidelines defined by Van Deursen et al.
Van Deursen et al., “Refactoring Test Code”, XP 2001
We re-ran the flaky test identification
We re-ran the test smell detection
100% of the refactored test code no longer presented a test smell
100% of the refactored test code no longer presented a flaky test
54% of the total flaky tests were removed by means of refactoring
Flaky Tests & Test Smells
Future Research Agenda
Extending the Study
Understanding whether refactoring of test smells is actually adopted by developers as a flaky test fixing strategy
Tools for Automated Refactoring of Test Smells

