Release & Iterate Faster: Stop Manual Testing
Drew Hannay (@drewhannay)
Release Schedule (Ideal)
[Weekly calendar: RC cut at the start of the cycle, manual testing through the week, then the release and an oncall handoff]
Repeat 12x yearly
Release Schedule (Reality)
[Weekly calendar: rush to check in half-finished code before the RC cut; manual testing interrupted by a hotfix for a last-minute critical feature; delayed release; more testing; a bug found in the last-minute feature forces another hotfix before the release and oncall handoff]
Repeat 12x yearly; cry
Nobody is happy
● Devs aren’t happy
○ Mad rush to check in code before an RC cut
○ Dread being oncall during release week
● Product managers aren’t happy
○ Only 12 chances per year to release new features
○ Hard to iterate on member feedback; need to “get it right” the first time
● Testing team isn’t happy
○ Code quality drops immediately before RC as code is rushed in
○ Manual testing is time-consuming and tedious
● Leadership isn’t happy
○ Unhappy employees (see above ^)
○ Improvements to product / business delayed by release schedule
Project Voyager
● Rewrite of the LinkedIn app for Android, iOS, and mobile web
○ Brand new codebase
○ Brand new frontend API server
○ Brand new product designs
● Product goals:
○ Weekly releases
○ Faster iteration
○ Easier experiments
● Huge company focus on mobile
○ Mobile was becoming more and more important to the business
○ We needed to get better at it
Easy Engineering Answer
[Weekly calendar: the same RC cut → manual testing → release → oncall handoff cycle, now crammed into every single week]
It was a week the whole time! #problemsolved
Better Engineering Answer
● The release schedule should be a product decision
○ Shouldn’t be restricted by engineering problems
● We should always be able to take the latest build and give it to members
○ But we still need frequent and fast builds!
● We needed a mindset shift
○ And a catchy slogan
3x3
Release three times per day, no more than three hours from code commit to member availability
Why three hours?
● Not enough time for manual testing steps
● Not enough time to test everything
○ The goal isn’t 100% automation, it’s faster iterations
○ More tests → more time fixing tests when product changes → slower iterations
● Easy to emphasize craftsmanship
○ Devs can take the extra time to write quality code when the next release is three hours away
● Easy to fit three releases in an eight-hour workday :)
○ 10am, 1pm, 4pm
Commit pipeline
Feature Development → Code Review → Static Analysis → Unit Tests → Layout Tests → Scenario Tests → Build Release Artifacts → Alpha Release → Beta Release → Production Release
Static Analysis
● Java Checkstyle
● Android Lint
○ ~300 checks provided by Google
○ ~50 custom checks for LinkedIn-specific patterns and libraries
● Compile-time contract with frontend API server
○ Uses LinkedIn’s open source Rest.li REST framework to define API models
○ Provides static analysis to guarantee no backwards incompatible changes to production models
○ Client model classes are code-generated to ensure correctness
● Experimental: Auto-format code using IntelliJ’s CLI formatter
○ No more time in code reviews on nitpicky style comments
Building the code
● Initially not that bad
○ CI build for debug + release → ~5 minutes
● More features → more code → slower builds
○ Today: over 1 million lines of code in the Android app
○ CI build for debug alone → ~8 minutes
● Today: most of the code is in only two Gradle modules (plus libraries)
○ Eagerly awaiting Android Gradle Plugin 3.0 to start modularizing
APK Splits
● Releasing frequently means frequent updates for members
○ Need to be considerate of their data and keep the app small
● Release builds take advantage of APK splits
○ Separate APK built for each combination of screen density and CPU architecture
○ Total of 30 APKs published for each commit
● If you thought ONE release build was slow…
○ CI build for 30 APK splits → ~35 minutes
○ Almost 20% of our 3 hour “budget”!
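The "almost 20%" is straightforward arithmetic against the three-hour budget:

```python
# 35-minute release build vs. the 3x3 budget of 3 hours (180 minutes)
split_build_fraction = 35 / 180  # roughly 0.194, i.e. almost 20%
```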
Distributed Builds
● Each build uses two CI machines
○ First node builds debug binary and runs tests
○ Second node builds release binaries (APK splits)
● Build time is gated by whichever job is longer
○ Currently release builds :(
● Faster, but requires twice as much CI hardware
○ At one point we were using six machines per build...more on that later
Build Speed: Looking Forward
● Android Gradle Plugin 3.0 brings lots of speed improvements
○ Seeing 40% faster clean builds on the latest beta, with no code changes
○ Modularizing the app code should help even more
● Google Play App Signing
○ Let Google generate the APK splits
○ We only need to build the universal release binary
Testing: How do we test?
● Unit tests
○ Exactly what you think
● Layout tests
○ Unit tests for views
○ Load a layout in a dummy activity, with dummy data
○ Use Espresso ViewAssertions to check for overlaps, RTL layout, etc.
● Scenario tests
○ Validate that key business metric flows are working properly
○ Usually flows that span multiple screens of the app
○ App gets mock data from an on-device fixture server
○ NOT an exhaustive suite
Testing: How do we measure coverage?
● Class/method/line code coverage is explicitly NOT measured
○ We don’t want every line covered by an automated test
○ We want high-value, low-maintenance tests
● For each feature, figure out flows that have the highest impact on the business
○ Then cover the happy paths of those flows with scenario tests
○ Teams agree on what flows must be tested and coverage is measured by # of those tests running
● For example:
○ Sharing a post to the LinkedIn feed should succeed → scenario test
■ Large impact on the business if this breaks
○ Sharing a post over 10k characters shows the correct error message → no scenario test
■ Not the end of the world if this is broken
■ Could be covered by unit tests (up to the team)
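Measuring coverage as "# of agreed flows with a running test" rather than line coverage can be sketched as follows (the function and the flow names are illustrative, not LinkedIn's tooling):

```python
def flow_coverage(agreed_flows, flows_with_running_tests):
    """Coverage = fraction of the business-agreed flows that currently
    have a scenario test running (not lines or methods covered)."""
    covered = set(agreed_flows) & set(flows_with_running_tests)
    return len(covered) / len(agreed_flows)

# Hypothetical flows a team agreed must be tested
coverage = flow_coverage(
    agreed_flows={"share_post", "like_post", "load_feed"},
    flows_with_running_tests={"share_post", "load_feed"},
)
```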
Testing: Need for Speed
● Currently running over 6.5k tests per build
○ Twice! Once before the commit is checked in, once after
○ Any change that doesn’t pass all tests gets auto-reverted
● Initial approach:
○ One emulator per CI machine
○ Six CI machines per build (!!!)
● Current approach:
○ Custom Gradle-based test harness to optimally shard tests
○ 16 emulators on one CI machine
○ Custom HTML + JUnit report
■ Logcat data for each test
■ Screenshots for failing tests
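The "optimally shard tests" step can be approximated with the classic greedy longest-processing-time heuristic: assign each test, longest first, to the currently least-loaded emulator. This is a sketch under that assumption, not LinkedIn's actual harness; the test names and durations are hypothetical:

```python
import heapq

def shard_tests(durations, num_shards):
    """Greedy LPT sharding: walk the tests from longest to shortest,
    always placing the next test on the least-loaded shard.
    durations: {test_name: seconds}."""
    # Min-heap of (total_seconds_assigned, shard_index)
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]
    for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    return shards

# Hypothetical timings: spread six tests across two emulators
shards = shard_tests(
    {"a": 30, "b": 25, "c": 20, "d": 15, "e": 10, "f": 5}, 2)
```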
Multi-emulator test run
Testing: Stability
● Stable Test Environment
○ Custom code for creating and starting emulators
○ Test Butler
● Stable Test Framework
○ Run one test per instrumentation command
■ Similar to the new Android Test Orchestrator
○ Clear all app data between each test
○ Auto-recover and retry test if it fails because of an emulator issue
● Stable Test Suite
○ If a test passes once, it should pass always (with no code changes)
Testing: Stability - Quiz
● If we have 1000 tests that are each 99.9% reliable,
what’s the overall reliability of our test suite?
a. 99%
b. 95%
c. 90%
d. 80%
e. 50%
36.7%
● Loss of confidence in tests
● Unhappy developers
● Realization: Flaky tests are worse than no tests
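The quiz answer follows directly from compounding per-test reliability: the suite is green only if every single test passes.

```python
# 1000 independent tests, each passing 99.9% of the time
suite_reliability = 0.999 ** 1000  # about 0.367: barely one green run in three
```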
Testing: Trunk Guardian
● Detect & disable flaky tests
● Continuously run all tests against the last known good build
○ Auto-disable a test if it fails
○ File a Jira ticket for the owner of the test with logs and screenshots
● Daily report to leadership on % of disabled tests for each team
○ Auto-block new commits for teams with more than 10% disabled tests
● Re-enabling a test requires SOME code change
○ Find and fix the root cause of the flakiness
○ If you can’t, add more logs so you have more data next time it fails
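A minimal sketch of the Trunk Guardian idea (function names and the 10% threshold are as described above, but the code is illustrative, not LinkedIn's implementation): rerun tests against the last known good build, so any failure must be flakiness, since the code has not changed.

```python
def trunk_guardian(run_test, tests, disabled):
    """Rerun every enabled test against the last known good build.
    That build already passed these tests once, so any failure now is
    flakiness: auto-disable the test (and file a ticket with logs)."""
    for name in tests:
        if name not in disabled and not run_test(name):
            disabled.add(name)
    return disabled

def team_blocked(num_disabled, num_total, threshold=0.10):
    # Teams with more than 10% of their tests disabled get new
    # commits auto-blocked until the flakiness is fixed
    return num_disabled / num_total > threshold
```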
{Drew catches his breath}
Distribution: Alpha
● AKA “latest successful build”
● Employees have a button in the app to get the latest build
○ Usually PMs and execs who want to see the latest code ASAP
● Initially we used Google Play’s Alpha channel
○ Auto-uploaded every three hours for “true 3x3”
○ But people who wanted the latest code wanted it FASTER
Distribution: Beta
● Google Play public beta program
○ Open membership
○ Maxes out at a number that won’t have material business impact if something goes terribly wrong
● Three beta releases per week
○ Wednesday / Monday / Friday
● Public beta users can (and do!) send feedback through Google Play
○ Most significant issues that get past our automated tests are reported within four hours of beta release
Distribution: Production
● Once per week, Wednesday
○ Promote the newest beta build without any blocking issues
○ If all three betas are bad, skip the release and hold a post-mortem
● Google Play staged rollout
○ Ramp the build slowly throughout the day
○ Monitor adoption rate and crash rate
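The staged-rollout logic amounts to a guard on crash rate while ramping. A sketch, with hypothetical rollout stages and crash-rate threshold (Google Play lets you choose the percentages; these numbers are not LinkedIn's):

```python
def next_rollout_step(current_pct, crash_rate, max_crash_rate=0.005,
                      steps=(1, 5, 10, 25, 50, 100)):
    """Advance the staged rollout one stage, but hold at the current
    percentage if the crash rate looks unhealthy."""
    if crash_rate > max_crash_rate:
        return current_pct          # hold the ramp and investigate
    for step in steps:
        if step > current_pct:
            return step             # ramp up to the next stage
    return 100                      # already fully rolled out
```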
Release Schedule (Current)
[Weekly calendar: beta releases on Monday and Friday, the production + beta release on Wednesday, with the oncall handoff following the production release]
~130 releases / year (no release on holidays or “InDay”)
Minimizing Risk & Enabling Experiments
● Take advantage of LinkedIn’s existing A/B testing infrastructure
● New features are developed behind feature flags
○ Code can be ramped dynamically to different groups of members
○ Performance of new features or changes can be monitored
● Dynamic configuration
● Server-controlled kill switch
○ Crashing or buggy code can often be disabled without a new build
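The kill-switch pattern boils down to gating code paths on a server-controlled flag. A generic sketch; the class, flag names, and fetch mechanism are hypothetical, not LinkedIn's A/B testing infrastructure:

```python
class ServerFlags:
    """Feature flags fetched periodically from the server. Flipping a
    flag off server-side disables the gated code path on the next
    fetch, with no new build or store release required."""
    def __init__(self):
        self._flags = {}

    def refresh(self, server_response):
        self._flags = dict(server_response)  # periodic fetch

    def enabled(self, name, default=False):
        return self._flags.get(name, default)

flags = ServerFlags()
flags.refresh({"new_share_flow": True})   # feature ramped on
if flags.enabled("new_share_flow"):
    pass  # new code path, ramped dynamically to member segments
# Crash reports spike; the server flips the kill switch:
flags.refresh({"new_share_flow": False})  # path disabled remotely
```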
But you must have SOME manual tests?
● Nope, not really
● Push notifications and deeplinks are manually tested periodically
○ Doesn’t block releases
● Teams do manual bug bashes and testing on new features
○ Also doesn’t block releases
○ No manual testing for regressions, only to ensure new features are ready to ramp
Commit pipeline
Feature Development → Code Review → Static Analysis → Unit Tests → Layout Tests → Scenario Tests → Build Release Artifacts → Alpha Release → Beta Release → Production Release
3x3: What’s next?
● Constantly looking for ways to reduce commit-to-publish time
○ C2P time is a metric tracked at the VP level at LinkedIn
○ Google Play App Signing
○ Gradle modularization + build cache
○ Emulator pools to scale up testing capacity
● Automated performance testing
○ We can sample some types of app performance in production
○ But no great way of catching issues before release
● Automated push notification and deeplink testing
● Automated monitoring and alerting on Google Play reviews
○ Find out more quickly when there are problems in production
“That’s great for LinkedIn, but…”
Takeaways
● Keep in mind we built this over 2.5 years
○ This presentation is intended to show some of the challenges, but also that 3x3 IS possible
● Leadership buy-in is crucial
○ 3x3 is a significant mindset shift for everyone in the organization
○ Driving it top-down makes things much easier
● Start with the simpler pieces
○ Anyone can run static analysis, or set up automated tests in CI builds
Questions?
3x3: Blogs, Videos, & Code
● 3x3: Speeding up mobile releases
● Consistent Android Testing Environments with Gradle (slides)
● Open Sourcing Test Butler (github)
● Test Butler: Reliable Android Testing, at Your Service (slides)
● Open Sourcing Dex Test Parser (github)
