squash the flakes!
stackconf 2024
Daniel Hiller
agenda
● about me
● about flakes
● impact of flakes
● flake process
● tools
● the future
● Q&A
about me
● Software Engineer @ Red Hat OpenShift Virtualization team
● KubeVirt CI, automation in general
about flakes
a flake?
about flakes
a flake
is a test that
without any code change
will either fail or pass in successive runs
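The definition above suggests a direct (if expensive) detection method: rerun the test with no code change in between and look for mixed outcomes. A minimal Python sketch, with a hypothetical `is_flaky` helper:

```python
import random

def is_flaky(test_fn, runs=20):
    """Run the same test repeatedly with no code changes in between.
    Seeing both pass and fail marks it as a flake."""
    outcomes = {test_fn() for _ in range(runs)}
    return outcomes == {True, False}

# A deterministic test always yields the same outcome...
assert not is_flaky(lambda: True)

# ...while a test with hidden nondeterminism eventually shows both.
random.seed(0)
print(is_flaky(lambda: random.random() < 0.5))  # True
```

In practice the number of reruns needed is unbounded, which is why the later slides lean on CI history instead of isolated reruns.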
about flakes
a test
can also fail for reasons beyond our control
that is not a flake to us
about flakes
source: https://prow.ci.kubevirt.io/pr-history/?org=kubevirt&repo=kubevirt&pr=9445
about flakes
is it important?
about flakes
does it occur regularly?
about flakes
how often do you have to deal with it?
about flakes
“… test flakiness was a frequently encountered problem, with
● 20% of respondents claiming to experience it monthly,
● 24% encountering it on a weekly basis and
● 15% dealing with it daily”
source: “A survey of flaky tests”
about flakes
“... In terms of severity, of the 91% of developers who claimed to deal with
flaky tests at least a few times a year,
● 56% described them as a moderate problem and
● 23% thought that they were a serious problem. …”
source: “A survey of flaky tests”
about flakes
flakes are caused
either by production code
or by test code
from “A survey of flaky tests”:
● 97% of flakes were false alarms*, and
● more than 50% of flakes could not be reproduced in isolation
conclusion: “ignoring flaky tests is ok”
* the code under test is not actually broken; it works as expected
impact of flakes
in CI, automated testing MUST give a reliable signal of stability
any failed test run signals that the product is unstable
test runs that fail due to flakes do not give this reliable signal
they only waste time
impact of flakes
Flaky tests waste everyone’s time - they cause
● longer feedback cycles for developers
● slowdown of merging pull requests - “retest trap”
● reversal of acceleration effects (e.g. batch testing)
impact of flakes
Flaky tests cause trust issues - they make people
● lose trust in automated testing
● ignore test results
minimizing the impact
def: quarantine 1
to exclude a flaky test
from test runs as early
as possible, but only as
long as necessary
1: Martin Fowler - Eradicating Non-Determinism in Tests
the flake process
regular meeting
● look at flakes
● decide: fix or quarantine?
● hand to dev
● bring back in
emergency quarantine
source: QUARANTINE.md
minimizing the impact
how to find flaky tests?
every merged PR eventually had all tests succeeding,
so any earlier test run with failures on that PR
might contain executions of flaky tests
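This observation turns into a simple heuristic. The sketch below uses hypothetical run data: a test that both failed and passed on the same PR, with no code change in between, is a flake suspect:

```python
from collections import defaultdict

# Hypothetical run history for one merged PR: test name -> passed?
runs = [
    {"TestMigration": False, "TestBoot": True},  # failed run, retested
    {"TestMigration": True,  "TestBoot": True},  # green run that merged
]

def flake_candidates(runs):
    """Tests that both failed and passed on the same PR without any
    code change in between are flake suspects."""
    outcomes = defaultdict(set)
    for run in runs:
        for test, passed in run.items():
            outcomes[test].add(passed)
    return {t for t, seen in outcomes.items() if seen == {True, False}}

print(flake_candidates(runs))  # {'TestMigration'}
```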
minimizing the impact
what do we need?
● easily move a test between the set of
stable tests and the set of quarantined
tests
● a report over possible flaky tests
● enough runtime data to triage flakes
○ devs decide whether we quarantine right
away or they can fix them in time
[diagram: flaky-test data drives the quarantine/dequarantine of tests between the stable set and the quarantined set]
tools
quarantining
tools
quarantine mechanics:
CI honors the QUARANTINE* label
● pre-merge tests skip
quarantined tests
● periodics execute
quarantined tests to check
their stability
* we use the Ginkgo label; the text label is required for backwards compatibility
sources:
● https://github.com/kubevirt/kubevirt/blob/38c01c34acecfafc89078b1bbaba8d9cf3cf0d4d/automation/test.sh#L452
● https://github.com/kubevirt/kubevirt/blob/38c01c34acecfafc89078b1bbaba8d9cf3cf0d4d/hack/functests.sh#L69
● https://github.com/kubevirt/kubevirt/blob/38c01c34acecfafc89078b1bbaba8d9cf3cf0d4d/tests/canary_upgrade_test.go#L177
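KubeVirt implements this with Ginkgo labels in Go (see the sources above); as an illustration only, the two-lane mechanic can be sketched in Python with a hypothetical label registry:

```python
QUARANTINE = "QUARANTINE"

# Hypothetical test registry: test name -> set of labels.
tests = {
    "TestBoot": set(),
    "TestCanaryUpgrade": {QUARANTINE},
}

def select(tests, lane):
    """Pre-merge lanes skip quarantined tests; a periodic lane runs
    only the quarantined ones to keep checking their stability."""
    if lane == "pre-merge":
        return [t for t, labels in tests.items() if QUARANTINE not in labels]
    if lane == "periodic-quarantine":
        return [t for t, labels in tests.items() if QUARANTINE in labels]
    raise ValueError(lane)

print(select(tests, "pre-merge"))            # ['TestBoot']
print(select(tests, "periodic-quarantine"))  # ['TestCanaryUpgrade']
```

Keeping the label as the single source of truth is what makes moving a test between the two sets a one-line change.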
tools
quarantine overview
(source)
since when?
where?
tools
metrics
tools
flake stats report
why: detect failure hot
spots in one view
(source)
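As an illustration of what such a report aggregates (records are hypothetical): counting failures per test across all lanes surfaces the hot spots in one view:

```python
from collections import Counter

# Hypothetical failed-test records from many CI runs: (lane, test).
failures = [
    ("e2e-arm64", "TestMigration"),
    ("e2e-amd64", "TestMigration"),
    ("e2e-amd64", "TestMigration"),
    ("e2e-amd64", "TestBoot"),
]

# The most frequently failing tests bubble to the top.
hot_spots = Counter(test for _, test in failures).most_common()
print(hot_spots)  # [('TestMigration', 3), ('TestBoot', 1)]
```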
tools
flakefinder report
why: see detailed view for
a certain day
tools
ci-health
why: show overall CI
stability metrics by
tracking
● merge-queue-length,
● time-to-merge,
● retests-to-merge and
● merges-per-day
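A sketch of how such metrics could be derived from merged-PR records; the data and field names are hypothetical, not ci-health's actual implementation:

```python
from datetime import datetime
from statistics import mean

# Hypothetical merged-PR records.
prs = [
    {"opened": datetime(2024, 6, 1), "merged": datetime(2024, 6, 3), "retests": 4},
    {"opened": datetime(2024, 6, 2), "merged": datetime(2024, 6, 2), "retests": 0},
]

def ci_health(prs):
    """Average time-to-merge (in days) and retests-to-merge over merged PRs."""
    return {
        "time_to_merge_days": mean((p["merged"] - p["opened"]).days for p in prs),
        "retests_to_merge": mean(p["retests"] for p in prs),
    }

print(ci_health(prs))
```

Rising retests-to-merge with an otherwise unchanged codebase is a strong hint that flakes, not regressions, are blocking the merge queue.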
tools
analysis
tools
ci-search
why: estimate impact as
basis for quarantine
decision
see openshift ci-search
tools
testgrid
why: second way to
determine instabilities,
drill down on all jobs for
kubevirt/kubevirt
tools
pre-merge detection
tools
check-tests-for-flakes test lane
why: catch flakes before entering
main
(source)
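The idea behind such a lane can be sketched as follows (names hypothetical): run each changed or new test several times pre-merge, and fail the lane if any repetition fails:

```python
def check_tests_for_flakes(changed_tests, repeats=5):
    """Run each changed test `repeats` times; a single failure fails
    the lane before the flake can reach main."""
    return [
        name
        for name, test in changed_tests.items()
        if not all(test() for _ in range(repeats))
    ]

stable = lambda: True
outcomes = iter([True, True, False, True, True])
flaky = lambda: next(outcomes)  # fails on its third run

print(check_tests_for_flakes({"TestStable": stable, "TestFlaky": flaky}))
# ['TestFlaky']
```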
tools
referee bot
why: stop excessive retesting on PRs without
changes
(source)
tools
retest metrics dashboard
why:
● show overall CI health
via number of retests
on PRs
● show PRs exceeding
retest count where
authors might need
support
in a nutshell
In regular intervals:
● follow up on previous action items
● look at data and derive action items
● hand action items over to dev teams
● revisit and dequarantine quarantined tests
main sources of flakiness
● test order dependencies
● concurrency
● data races
● differing execution platforms
key takeaways
● identify outside dependencies you have
● stabilize the testing environment
○ make it resilient against outside dependency failures
○ cache what you can
● use versioning for testing environments
the future - more data, more tooling
gaps we want to close:
● collect more data - run the majority of
tests frequently
● steadily improve in detecting new flakes
● use other methods to detect flaky tests,
e.g. static code analysis
● long term - automatic quarantine PRs
when new flakes have entered the
codebase
Q&A
Any questions?
Any suggestions for improvement?
Who else is trying to tackle this problem?
What have you done to solve this?
Thank you for attending!
Further questions?
Feel free to send questions and comments to:
mailto: dhiller@redhat.com
k8s slack: kubernetes.slack.com/
@dhiller
mastodon: @dhiller@fosstodon.org
web: www.dhiller.de
kubevirt.io
KubeVirt welcomes all kinds of contributions!
● Weekly community meeting every Wed 3PM CET
● Links:
● KubeVirt website
● KubeVirt user guide
● KubeVirt Contribution Guide
● GitHub
● Kubernetes Slack channels
○ #virtualization
○ #kubevirt-dev

stackconf 2024 | Squash the Flakes! – How to Minimize the Impact of Flaky Tests by Daniel Hiller
