squash the flakes!
stackconf 2024
Daniel Hiller
agenda
● about me
● about flakes
● impact of flakes
● flake process
● tools
● the future
● Q&A
about me
● Software Engineer @ Red Hat OpenShift Virtualization team
● KubeVirt CI, automation in general
about flakes
a flake?
about flakes
a flake
is a test that
without any code change
will either fail or pass in successive runs
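The definition above suggests a direct (if expensive) detection method: rerun the test with no code change in between and look for mixed outcomes. A minimal Python sketch, with a hypothetical `is_flaky` helper:

```python
import random

def is_flaky(test_fn, runs=20):
    """Run the same test repeatedly with no code changes in between.
    Seeing both pass and fail marks it as a flake."""
    outcomes = {test_fn() for _ in range(runs)}
    return outcomes == {True, False}

# A deterministic test always yields the same outcome...
assert not is_flaky(lambda: True)

# ...while a test with hidden nondeterminism eventually shows both.
random.seed(0)
print(is_flaky(lambda: random.random() < 0.5))  # True
```

In practice the number of reruns needed is unbounded, which is why the later slides lean on CI history instead of isolated reruns.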
about flakes
a test
can also fail for reasons beyond our control
that is not a flake to us
about flakes
source: https://prow.ci.kubevirt.io/pr-history/?org=kubevirt&repo=kubevirt&pr=9445
about flakes
is it important?
about flakes
does it occur regularly?
about flakes
how often do you have to deal with it?
about flakes
“… test flakiness was a frequently encountered problem, with
● 20% of respondents claiming to experience it monthly,
● 24% encountering it on a weekly basis and
● 15% dealing with it daily”
source: “A survey of flaky tests”
about flakes
“... In terms of severity, of the 91% of developers who claimed to deal with
flaky tests at least a few times a year,
● 56% described them as a moderate problem and
● 23% thought that they were a serious problem. …”
source: “A survey of flaky tests”
about flakes
flakes are caused
either by production code
or by test code
from “A survey of flaky tests”:
● 97% of flakes were false alarms*, and
● more than 50% of flakes could not be reproduced in isolation
conclusion: “ignoring flaky tests is ok”
* the code under test is not actually broken; it works as expected
impact of flakes
in CI, automated testing MUST give a reliable signal of stability
any failed test run signals that the product is unstable
test runs that fail due to flakes do not give this reliable signal
they only waste time
impact of flakes
Flaky tests waste everyone’s time - they cause
● longer feedback cycles for developers
● slowdown of merging pull requests - “retest trap”
● reversal of acceleration effects (e.g. batch testing)
impact of flakes
Flaky tests cause trust issues - they make people
● lose trust in automated testing
● ignore test results
minimizing the impact
def: quarantine 1
to exclude a flaky test
from test runs as early
as possible, but only as
long as necessary
1: Martin Fowler - Eradicating Non-Determinism in Tests
the flake process
regular meeting
● look at flakes
● decide: fix or quarantine?
● hand to dev
● bring back in
emergency quarantine
source: QUARANTINE.md
minimizing the impact
how to find flaky tests?
every merged PR eventually had all tests succeeding,
so any earlier test run with failures on that PR
might contain executions of flaky tests
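This observation turns into a simple heuristic. The sketch below uses hypothetical run data: a test that both failed and passed on the same PR, with no code change in between, is a flake suspect:

```python
from collections import defaultdict

# Hypothetical run history for one merged PR: test name -> passed?
runs = [
    {"TestMigration": False, "TestBoot": True},  # failed run, retested
    {"TestMigration": True,  "TestBoot": True},  # green run that merged
]

def flake_candidates(runs):
    """Tests that both failed and passed on the same PR without any
    code change in between are flake suspects."""
    outcomes = defaultdict(set)
    for run in runs:
        for test, passed in run.items():
            outcomes[test].add(passed)
    return {t for t, seen in outcomes.items() if seen == {True, False}}

print(flake_candidates(runs))  # {'TestMigration'}
```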
minimizing the impact
what do we need?
● easily move a test between the set of
stable tests and the set of quarantined
tests
● a report over possible flaky tests
● enough runtime data to triage flakes
○ devs decide whether we quarantine right
away or they can fix them in time
[diagram: flaky-test data drives the quarantine/dequarantine of tests between the stable set and the quarantined set]
tools
quarantining
tools
quarantine mechanics:
CI honors the QUARANTINE* label
● pre-merge tests skip
quarantined tests
● periodics execute
quarantined tests to check
their stability
* we use the Ginkgo label; the text label is required for backwards compatibility
sources:
● https://github.com/kubevirt/kubevirt/blob/38c01c34acecfafc89078b1bbaba8d9cf3cf0d4d/automation/test.sh#L452
● https://github.com/kubevirt/kubevirt/blob/38c01c34acecfafc89078b1bbaba8d9cf3cf0d4d/hack/functests.sh#L69
● https://github.com/kubevirt/kubevirt/blob/38c01c34acecfafc89078b1bbaba8d9cf3cf0d4d/tests/canary_upgrade_test.go#L177
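KubeVirt implements this with Ginkgo labels in Go (see the sources above); as an illustration only, the two-lane mechanic can be sketched in Python with a hypothetical label registry:

```python
QUARANTINE = "QUARANTINE"

# Hypothetical test registry: test name -> set of labels.
tests = {
    "TestBoot": set(),
    "TestCanaryUpgrade": {QUARANTINE},
}

def select(tests, lane):
    """Pre-merge lanes skip quarantined tests; a periodic lane runs
    only the quarantined ones to keep checking their stability."""
    if lane == "pre-merge":
        return [t for t, labels in tests.items() if QUARANTINE not in labels]
    if lane == "periodic-quarantine":
        return [t for t, labels in tests.items() if QUARANTINE in labels]
    raise ValueError(lane)

print(select(tests, "pre-merge"))            # ['TestBoot']
print(select(tests, "periodic-quarantine"))  # ['TestCanaryUpgrade']
```

Keeping the label as the single source of truth is what makes moving a test between the two sets a one-line change.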
tools
quarantine overview
(source)
since when?
where?
tools
metrics
tools
flake stats report
why: detect failure hot
spots in one view
(source)
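As an illustration of what such a report aggregates (records are hypothetical): counting failures per test across all lanes surfaces the hot spots in one view:

```python
from collections import Counter

# Hypothetical failed-test records from many CI runs: (lane, test).
failures = [
    ("e2e-arm64", "TestMigration"),
    ("e2e-amd64", "TestMigration"),
    ("e2e-amd64", "TestMigration"),
    ("e2e-amd64", "TestBoot"),
]

# The most frequently failing tests bubble to the top.
hot_spots = Counter(test for _, test in failures).most_common()
print(hot_spots)  # [('TestMigration', 3), ('TestBoot', 1)]
```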
tools
flakefinder report
why: see detailed view for
a certain day
tools
ci-health
why: show overall CI
stability metrics by
tracking
● merge-queue-length,
● time-to-merge,
● retests-to-merge and
● merges-per-day
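A sketch of how such metrics could be derived from merged-PR records; the data and field names are hypothetical, not ci-health's actual implementation:

```python
from datetime import datetime
from statistics import mean

# Hypothetical merged-PR records.
prs = [
    {"opened": datetime(2024, 6, 1), "merged": datetime(2024, 6, 3), "retests": 4},
    {"opened": datetime(2024, 6, 2), "merged": datetime(2024, 6, 2), "retests": 0},
]

def ci_health(prs):
    """Average time-to-merge (in days) and retests-to-merge over merged PRs."""
    return {
        "time_to_merge_days": mean((p["merged"] - p["opened"]).days for p in prs),
        "retests_to_merge": mean(p["retests"] for p in prs),
    }

print(ci_health(prs))
```

Rising retests-to-merge with an otherwise unchanged codebase is a strong hint that flakes, not regressions, are blocking the merge queue.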
tools
analysis
tools
ci-search
why: estimate impact as
basis for quarantine
decision
see openshift ci-search
tools
testgrid
why: second way to
determine instabilities,
drill down on all jobs for
kubevirt/kubevirt
tools
pre-merge detection
tools
check-tests-for-flakes test lane
why: catch flakes before entering
main
(source)
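The idea behind such a lane can be sketched as follows (names hypothetical): run each changed or new test several times pre-merge, and fail the lane if any repetition fails:

```python
def check_tests_for_flakes(changed_tests, repeats=5):
    """Run each changed test `repeats` times; a single failure fails
    the lane before the flake can reach main."""
    return [
        name
        for name, test in changed_tests.items()
        if not all(test() for _ in range(repeats))
    ]

stable = lambda: True
outcomes = iter([True, True, False, True, True])
flaky = lambda: next(outcomes)  # fails on its third run

print(check_tests_for_flakes({"TestStable": stable, "TestFlaky": flaky}))
# ['TestFlaky']
```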
tools
referee bot
why: stop excessive retesting on PRs without
changes
(source)
tools
retest metrics dashboard
why:
● show overall CI health
via number of retests
on PRs
● show PRs exceeding
retest count where
authors might need
support
in a nutshell
In regular intervals:
● follow up on previous action items
● look at data and derive action items
● hand action items over to dev teams
● revisit and dequarantine quarantined tests
main sources of flakiness
● test order dependencies
● concurrency
● data races
● differing execution platforms
key takeaways
● identify outside dependencies you have
● stabilize the testing environment
○ make it resilient against outside dependency failures
○ cache what you can
● use versioning for testing environments
the future - more data, more tooling
gaps we want to close:
● collect more data - run the majority of
tests frequently
● steadily improve in detecting new flakes
● use other methods to detect flaky tests,
e.g. static code analysis
● long term - automatic quarantine PRs
when new flakes have entered the
codebase
Q&A
Any questions?
Any suggestions for improvement?
Who else is trying to tackle this problem?
What have you done to solve this?
Thank you for attending!
Further questions?
Feel free to send questions and comments to:
mailto: dhiller@redhat.com
k8s slack: kubernetes.slack.com/
@dhiller
mastodon: @dhiller@fosstodon.org
web: www.dhiller.de
kubevirt.io
KubeVirt welcomes all kinds of contributions!
● Weekly community meeting every Wed 3PM CET
● Links:
● KubeVirt website
● KubeVirt user guide
● KubeVirt Contribution Guide
● GitHub
● Kubernetes Slack channels
○ #virtualization
○ #kubevirt-dev

stackconf 2024 | Squash the Flakes! – How to Minimize the Impact of Flaky Tests by Daniel Hiller
