Preempting Flaky Tests via Non-Idempotent-Outcome Tests
Anjiang Wei, Pu Yi, Zhengxi Li, Tao Xie, Darko Marinov, Wing Lam
anjiang@stanford.edu
Funding acknowledgments: CCF-1763788, CCF-1956374, 62161146003
Developer Anecdote
[Figure: a developer's code change is built and the tests test0, test1, test2, …, testn are run on servers]
…
- static int add() {
+ static int add(r) {
- ts.addRow("");
+ ts.addRow(r);
return ts.size();
…
• 4:15 PM: Build code, run tests
• Pass: Merge Changes
• Fail: Debug Changes
• 5:00 PM, 5:30 PM, 6:15 PM: Build code and run tests again
• Developer wastes time debugging & running tests and goes home 1 hour and 15 min later
Flaky Test: a test that can non-deterministically pass and fail when run on the same version of the code
Public Outcry About Flaky Tests
What are Flaky Tests?
• A test is flaky if it can both pass and fail on the same code version
  • Misleads developers into debugging nonexistent faults in recent changes
  • Reduces trust in tests
• Order-dependent tests are a prominent category of flaky tests
  • An order-dependent test deterministically passes or fails in any given test order: it passes in at least one order and fails in at least one order
Background: Victim and Polluter
• Victim 𝑡1 fails when run after polluter 𝑡2
• Polluter has modified some shared state
• Victim's test assertion depends on some shared state
• Both refer to the same shared state (the variable 𝑥 in the code)
// shared variable x is initialized to 0
void t1() { assert x == 0; } // victim
void t2() { x = 1; } // polluter
TestOrder1: t1, t2 (t1 passes)
TestOrder2: t2, t1 (t1 fails)
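The victim/polluter pair can be reproduced in a few lines of Python. This is a minimal sketch, not code from the talk: the `state` dictionary stands in for the shared variable x, and the `run_order` helper (including its reset to a fresh state before each order) is a hypothetical harness introduced only for illustration.

```python
# Shared state, standing in for the variable x on the slide.
state = {"x": 0}


def t1():
    # Victim: its assertion depends on the shared state.
    assert state["x"] == 0


def t2():
    # Polluter: modifies the shared state and never restores it.
    state["x"] = 1


def run_order(tests):
    """Run tests in the given order from a fresh state.

    Returns a dict mapping each test name to "pass" or "fail".
    """
    state["x"] = 0  # fresh state, as if a new process had started
    results = {}
    for test in tests:
        try:
            test()
            results[test.__name__] = "pass"
        except AssertionError:
            results[test.__name__] = "fail"
    return results


order1 = run_order([t1, t2])  # victim runs before the polluter
order2 = run_order([t2, t1])  # victim runs after the polluter
print("TestOrder1:", order1)
print("TestOrder2:", order2)
```

Only the order changes between the two runs; the code under test is identical, which is what makes t1 an order-dependent flaky test.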
Background: Latent-Victim, Latent-Polluter
• Latent-Victim 𝑡3:
• Assertion depends on shared state; currently no tests modify 𝑦
• victims ⊂ latent-victims
• Latent-Polluter 𝑡4:
• Shared state modification; currently no tests put assertions on 𝑧
• polluters ⊂ latent-polluters
// shared variables x, y, z are initialized to 0
void t1() { assert x == 0; } // victim
void t2() { x = 1; } // polluter
void t3() { assert y == 0; } // latent-victim
void t4() { z = 1; } // latent-polluter
Non-Idempotent-Outcome (NIO) Test
• A test is non-idempotent-outcome (NIO):
• t5(); t5() → pass; fail
• Passes in the first run but fails in the second when run twice consecutively
• An NIO test self-pollutes the state that its own assertions depend on
• NIO ⊂ latent-polluter ∧ NIO ⊂ latent-victim
// shared variables x, y, z, w are initialized to 0
void t1() { assert x == 0; } // victim
void t2() { x = 1; } // polluter
void t3() { assert y == 0; } // latent-victim
void t4() { z = 1; } // latent-polluter
void t5() { assert w == 0; w = 1; } // NIO
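The definition suggests a direct check: start from a fresh state, run the test twice in a row, and flag it if the outcomes are pass then fail. The sketch below assumes a hypothetical `is_nio` helper and a `reset` callback that restores the initial state; `t5_fixed` is an illustrative variant that cleans up after itself and is not taken from the slides.

```python
# Shared state, standing in for the variable w on the slide.
state = {"w": 0}


def t5():
    # NIO test: asserts on w, then pollutes w without cleaning up.
    assert state["w"] == 0
    state["w"] = 1


def t5_fixed():
    # Hypothetical repaired variant: restores w even if the body fails.
    assert state["w"] == 0
    try:
        state["w"] = 1
    finally:
        state["w"] = 0


def outcome(test):
    """Return "pass" or "fail" for a single run of the test."""
    try:
        test()
        return "pass"
    except Exception:
        return "fail"


def is_nio(test, reset):
    """True if the test passes on the first run and fails on the second."""
    reset()
    first = outcome(test)
    second = outcome(test)
    return first == "pass" and second == "fail"


def reset():
    state["w"] = 0


print("t5 is NIO:", is_nio(t5, reset))
print("t5_fixed is NIO:", is_nio(t5_fixed, reset))
```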
Why should we detect NIOs?
• Typically, tests are not run twice
• To preempt/prevent flaky tests
• Why not fix latent-polluter?
• Why not fix latent-victim?
• Prior work
• Gyori et al. [1] detect 575 latent-polluters
• Manually filter 381 (66%) false positives (cannot reasonably become polluters)
• Huo and Clause [2] detect latent-victims with dynamic taint analysis
• Do not report how many can reasonably become victims
• They do NOT fix any tests
• NIOs are more worth fixing
• Both latent-victims and latent-polluters at the same time
• Easy to detect, no false positives
• Well-accepted fixes
[1] Gyori et al., "Reliable testing: Detecting state-polluting tests to prevent test dependency". ISSTA 2015
[2] Huo and Clause, "Improving oracle quality by detecting brittle assertions and unused inputs in tests". FSE 2014
Contributions
• Definition of NIO tests
• Deterministically change from pass to fail when run twice
• Effective detection & empirical evaluation
• Propose 3 modes for detection
• 127 Java test suites → 223 NIO tests
• 1006 Python projects → 138 NIO tests
• Well-accepted fixes
• Inspect every NIO test (no false positives)
• Open pull requests for 268 tests
• 192 accepted, 70 pending, only 6 rejected
Real Example of NIO
Buggy Cleaning Code
  def cmd_mock():
      def _cmd_mock(name: str):
          cmd.__overrides__[name] = ['/bin/true']
      yield _cmd_mock
-     cmd.__overrides__ = []
+     cmd.__overrides__ = {}

  def test_slurm_command(tmp_path, cmd_mock):
      cmd_mock('srun')

TypeError: list indices must be integers or slices, not str
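The failure mode in this example can be replayed without the original project. The sketch below is an approximation under stated assumptions: `cmd` is a hypothetical stand-in namespace with an `overrides` attribute, the generator is driven by hand to imitate a yield-style fixture (setup, test body, teardown), and the `buggy` flag selects between the two cleanup lines from the diff.

```python
import types

# Hypothetical stand-in for the project's cmd module.
cmd = types.SimpleNamespace(overrides={})


def cmd_mock(buggy):
    """Generator imitating a yield-style fixture: setup, yield, teardown."""
    def _cmd_mock(name):
        cmd.overrides[name] = ["/bin/true"]

    yield _cmd_mock
    # Teardown: the buggy version resets to a list, the fix resets to a dict.
    cmd.overrides = [] if buggy else {}


def run_test_once(buggy):
    """Run the test body once with setup and teardown around it."""
    fixture = cmd_mock(buggy)
    mock = next(fixture)  # setup: obtain the helper from the fixture
    try:
        mock("srun")  # test body
        return "pass"
    except TypeError:
        return "fail"
    finally:
        next(fixture, None)  # teardown: resume the generator past the yield


cmd.overrides = {}
buggy_runs = [run_test_once(buggy=True), run_test_once(buggy=True)]

cmd.overrides = {}
fixed_runs = [run_test_once(buggy=False), run_test_once(buggy=False)]

print("buggy cleanup:", buggy_runs)
print("fixed cleanup:", fixed_runs)
```

With the buggy cleanup, the first run leaves a list behind, so the second run's string-keyed assignment raises the `TypeError` shown on the slide.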
Real Example of NIO
  def to_zero(tvd, northing, easting,
              surface_northing, surface_easting):
      # perform some checking
-     northing -= surface_northing
-     easting -= surface_easting
+     northing = northing - surface_northing
+     easting = easting - surface_easting
      return tvd, northing, easting

  # initialization for global variables: g1,…,g5
  g1 = ...

  def test_zero():
      # global variables passed in as arguments
      v1, v2, v3 = to_zero(g1, g2, g3, g4, g5)
      np.testing.assert_equal(...)  # assertion

AssertionError:
Mismatched elements: 121 / 121 (100%)
Fix: Avoid Function Side Effect
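The difference between `a -= b` and `a = a - b` matters because, for mutable objects such as NumPy arrays, the first form modifies the caller's object in place while the second only rebinds a local name. The sketch below avoids a NumPy dependency by using a tiny hypothetical `Vec` class with the same in-place semantics; the function and variable names are simplified illustrations, not the project's code.

```python
class Vec:
    """Tiny stand-in for an array: `-=` mutates in place, `-` returns a new object."""

    def __init__(self, values):
        self.values = list(values)

    def __sub__(self, other):
        return Vec(a - b for a, b in zip(self.values, other.values))

    def __isub__(self, other):
        self.values = [a - b for a, b in zip(self.values, other.values)]
        return self


def to_zero_buggy(northing, surface_northing):
    northing -= surface_northing  # side effect: mutates the caller's object
    return northing


def to_zero_fixed(northing, surface_northing):
    northing = northing - surface_northing  # rebinds the local name only
    return northing


# Module-level objects shared across test runs, like g1,…,g5 on the slide.
g_northing = Vec([10, 20])
g_surface = Vec([10, 10])


def call_twice(to_zero):
    """Call to_zero twice on the shared globals and return both results."""
    g_northing.values = [10, 20]  # fresh state before the first call
    first = to_zero(g_northing, g_surface).values
    second = to_zero(g_northing, g_surface).values
    return first, second


buggy_results = call_twice(to_zero_buggy)
fixed_results = call_twice(to_zero_fixed)
print("buggy:", buggy_results)
print("fixed:", fixed_results)
```

A test that compares the result against a fixed expected value passes on the first call and fails on the second with the buggy version, because the global input has already been shifted once.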
Prevalence of NIO Tests
                         Java    Python
# Test Suites (total)    127     1006
# Test Suites w/ NIO     34      138
% Test Suites w/ NIO     26%     9%
# NIO Tests              223     138
Conclusion:
• NIO tests are prevalent enough that every project should run NIO detection at least once
Different Detection Modes
Example test suite: TestClass A contains t1 and t2; TestClass B contains t3
• Three Different Modes
  • Isolated-method
    • Run1: t1, t1
    • Run2: t2, t2
    • Run3: t3, t3
  • Isolated-class
    • Run1: t1, t1, t2, t2
    • Run2: t3, t3
  • Entire-suite
    • Run1: t1, t1, t2, t2, t3, t3
• Conclusion
  • All three modes detect similar tests
  • Isolated-method (223) > Isolated-class (212) > Entire-suite (210)
  • Entire-suite has the lowest overhead
  • Why do the counts differ? See the paper for details
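The three modes differ only in how the doubled tests are grouped into runs. The sketch below generates the run schedules for the example suite; the `schedules` function and the dictionary representation of a suite are assumptions made for illustration, and each inner list is meant to be executed as one run starting from a fresh state.

```python
def doubled(tests):
    """Repeat every test twice, back to back: [t1, t2] -> [t1, t1, t2, t2]."""
    return [t for t in tests for _ in range(2)]


def schedules(suite, mode):
    """Return the list of runs for a detection mode.

    suite maps a test-class name to its ordered list of test names.
    """
    if mode == "isolated-method":
        # One run per test method.
        return [doubled([t]) for tests in suite.values() for t in tests]
    if mode == "isolated-class":
        # One run per test class.
        return [doubled(tests) for tests in suite.values()]
    if mode == "entire-suite":
        # A single run covering the whole suite.
        all_tests = [t for tests in suite.values() for t in tests]
        return [doubled(all_tests)]
    raise ValueError(f"unknown mode: {mode}")


suite = {"TestClassA": ["t1", "t2"], "TestClassB": ["t3"]}
for mode in ("isolated-method", "isolated-class", "entire-suite"):
    print(mode, schedules(suite, mode))
```

Fewer runs means fewer fresh starts, which is consistent with entire-suite having the lowest overhead.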
Experience with Fixing NIO Tests
• We detect 361 (223 Java + 138 Python) NIO tests
• We fix 268 NIO tests by opening pull requests
  • 192 tests accepted
  • 70 tests pending
  • 6 tests rejected
• We do not fix 51 NIO tests
  • Cannot localize pollution
  • Difficult to clean the pollution
• 42 tests are N/A
  • Not NIO in the latest version (fixed/deleted/etc.)
• Conclusion
  • Developers are generally positive about fixes for NIO tests
  • Providing reproduction steps and explaining the motivation helps
NIO vs. Polluter vs. Victim
• NIO tests are related to but not subsumed by polluters and victims
• Detecting NIO tests can be an effective way to preempt polluters and victims
Conclusions
• We focus on Non-Idempotent-Outcome (NIO) tests
• Deterministically change from pass to fail when run twice
• Detect and fix NIO tests
• Preempt order-dependent flaky tests
• Importance: in the intersection of latent-polluters and latent-victims
• Detect 361 NIO tests (223 Java + 138 Python)
• Opened pull requests for 268 tests, with 192 accepted
• Dataset publicly available:
• https://sites.google.com/view/nio-tests
• IDoFT dataset (all flaky tests): https://github.com/TestingResearchIllinois/idoft
Questions? Email: Anjiang Wei <anjiang@stanford.edu>

NIO-ICSE2022.pptx
