Continuous Testing at Scale
Gergely Orosz, Engineering Manager
@GergelyOrosz
8 May 2018

Introduction
● Engineering manager @Uber, in Amsterdam
● 10+ years of software development (Skyscanner, Skype, JP Morgan alumni)
● Full-stack, iOS, Android, (Windows Phone)

War stories
Trading systems
Oil rig monitoring
Xbox One launch
Uber app rewrites
Payment systems

Iterating is part of the journey

Why do we test?
We test to ship no bugs.

Bug-free code of substance
But at what cost?

Why do we test?
To minimize the business impact of mistakes while maintaining good execution speed.

We will cover testing of mobile apps.
Still, a lot of the concepts apply across the stack.

We test, so we can avoid:
Crashes
Functional bugs
UI bugs

Continuous testing...
… at Scale … at Uber … Tools & a Framework

Some Uber facts
● 600+ cities, 65+ countries, 6 continents
● 10 engineering offices (4x US, Amsterdam, Denmark, 2x India, Sofia, Vilnius)
● 18,000+ people, of which 2,500+ engineers & 400+ mobile engineers

Hundreds of mobile engineers?
Request a ride
Fare split
Cash
Uber for Business
Credit card rewards points
Promotions
Safety
Over 10 ways to pay
Scheduled rides
Drive with Uber
Uber Eats, Freight, Bike, Rental...
Experimentation
65+ countries, 600+ cities
Performance
Instant payments
Maps & navigation
uberPOOL
Driver incentives
App health
Developer tools
Networking
Feed cards
Driver experience
Driver recognition
Airport pickup
Uber Family
Beacon
Campaigns
Fraud
EATS app
Driver app
Freight app
Restaurants app
Fleet app
Other apps

What can “at scale” mean?
● More functionality
● More users & regions, locales
● More code
● More engineers
● More engineering offices & locations
● More automated testing
● More apps

What does “at scale” mean?
Scale → Problems
● More functionality → More bugs
● More users & regions, locales → Smaller/local bugs have bigger impact
● More code → Longer build times
● More engineers → Communication overhead
● More engineering offices & locations → Developer systems need to work 24/7
● More automated testing → Longer time to run tests
● More apps → The same problems repeating


Continuous testing...
… at Scale … at Uber … Framework

Engineering culture
A few things I found different @Uber compared to my previous experience:
● No formal QA role, testing teams or dedicated DevOps team
● Dedicated team(s) owning testing infrastructure & developer tooling
● More formal planning process
● No staging systems: test tenancies instead
● Blameless postmortem culture

Continuous testing process @Uber
Write code & land to master → Pre-release testing → Ship to users

Create a pull request
Diagram: arc diff, local validations, Phabricator diff, Herald rules, code reviewers, Continuous Integration, build results
Local validations:
● Commit message validation (e.g. test plan, revert plan)
● Linting
Herald rules, like:
● If certain files are touched, add {certain people} as reviewers
● If the files added contain a certain phrase, add a comment to the diff
Continuous Integration: do a build with:
● Linting
● Unit tests
● Static code analysis
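
The commit message validation and the two Herald-style rules on this slide could be sketched roughly as follows. This is a minimal illustration only: the `Diff` class, the section names, the `payments/` path, the `payments-team` reviewer and the `TODO` phrase are all hypothetical, and none of this is Phabricator's actual API or Uber's configuration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Diff:
    """Hypothetical stand-in for a code review diff (not Phabricator's data model)."""
    message: str
    changed_files: dict[str, str]  # path -> text added in that file
    reviewers: set[str] = field(default_factory=set)
    comments: list[str] = field(default_factory=list)


# Sections the commit message must contain (assumed names, for illustration).
REQUIRED_SECTIONS = ("Test Plan:", "Revert Plan:")


def validate_commit_message(message: str) -> list[str]:
    """Return the required sections that are missing from the commit message."""
    return [section for section in REQUIRED_SECTIONS if section not in message]


def apply_rules(diff: Diff) -> None:
    """Apply two illustrative rules, mirroring the two examples on the slide."""
    # Rule 1: if certain files are touched, add certain people as reviewers.
    if any(path.startswith("payments/") for path in diff.changed_files):
        diff.reviewers.add("payments-team")
    # Rule 2: if the added text contains a certain phrase, comment on the diff.
    if any(re.search(r"\bTODO\b", text) for text in diff.changed_files.values()):
        diff.comments.append("Added code contains a TODO; please link a ticket.")
```

Running the message check locally, before the diff is created, keeps feedback fast; the reviewer and comment rules run once the diff exists.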

Linting: a first class citizen
● Our lint rules are extensive and have evolved since the early years
● NEAL: our language-agnostic linting platform (open sourced)

“Merge to master”
Diagram: arc diff, local validations, Phabricator diff, Continuous Integration (lint, build, test), update the diff, arc land, Submit Queue, validation, build result, pass, code repo
Submit Queue: do a “full” build with:
● Linting
● Unit tests
● Static code analysis
● UI tests
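
The idea of a submit queue (land changes one at a time, and only if a full build passes on top of the current master) could be sketched like this. It is a simplified model under stated assumptions, not Uber's SubmitQueue: `full_build` is a hypothetical callback standing in for the lint, unit test, static analysis and UI test run, and changes are plain strings.

```python
from collections import deque
from typing import Callable, Iterable


def run_submit_queue(
    master: list[str],
    pending: Iterable[str],
    full_build: Callable[[list[str]], bool],
) -> tuple[list[str], list[str]]:
    """Land pending changes in order, keeping master green.

    Each change is validated on top of the latest master, so a change that
    passed CI against an older master cannot break the current one.
    Returns the new master and the list of rejected changes.
    """
    queue = deque(pending)
    master = list(master)  # do not mutate the caller's list
    rejected: list[str] = []
    while queue:
        change = queue.popleft()
        candidate = master + [change]
        if full_build(candidate):
            master = candidate  # validation passed: land the change
        else:
            rejected.append(change)  # validation failed: master is untouched
    return master, rejected
```

Because every change waits for a full build, the time that build takes directly limits how many changes can land per hour in this simple serial model.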

Build speeds matter (even) more, as the team grows

Write code & land to master → Pre-release testing → Ship to users
Write code & land to master:
● Local checks (linting)
● Continuous Integration (linting, unit tests, static analysis)
● Code review
● Safe merging to master (UI tests, SubmitQueue)
Merge code to master
master
Build cut
Release candidate?
Automated tests
Manual tests
Ready for production release.

Test tenancy
Diagram, left panel: Staging & production systems (staging with test accounts; production with production accounts)
Diagram, right panel: Production system with test tenancy (test tenancy with test accounts; production tenancy with production accounts)
Other diagram labels: code (master), staged rollout

master
Build cut
Release candidate
Automated tests
Manual tests
Dogfooding bug reports
Ready for production release.

Dogfooding: sending bug reports
Take screenshot → Bug reporter tool → Phabricator ticket → Teams triage

master
Build cut
Release candidate
Automated tests
Manual tests
Dogfooding bug reports
Crash reports
Ready for production release.

master
Build cut
Release candidate
Automated tests
Manual tests
Dogfooding bug reports
Crash reports
Localization
...
Fix
Hotfix
Ready for production release.

Write code & land to master → Pre-release testing → Ship to users
Pre-release testing:
● Manual testing (sanity)
● Dogfooding
● Crash reports
● Build train

Rolling out to production on mobile
Facts
● Bugs will be introduced that none of the previous tests catch
● With native apps
○ New builds can take days to ship due to the app store approval process
○ Users might not update their apps for a while.
Conclusion
● Every change should be revertable, remotely.
● Let’s use backend-controlled feature flags
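
A backend-controlled feature flag, as concluded above, could look roughly like this on the client side. This is a minimal sketch under assumptions: the `FeatureFlags` class, the JSON payload shape and the `new_checkout` flag name are hypothetical, and the talk does not describe Uber's actual flag system in this detail.

```python
import json


class FeatureFlags:
    """Client-side flag store whose values can be changed remotely.

    Unknown flags default to off, so new code paths stay dark until the
    backend explicitly enables them.
    """

    def __init__(self, defaults: dict[str, bool]):
        self._defaults = dict(defaults)
        self._remote: dict[str, bool] = {}

    def update_from_backend(self, payload: str) -> None:
        """Apply a flag payload fetched from the backend (assumed JSON object)."""
        try:
            parsed = json.loads(payload)
        except json.JSONDecodeError:
            return  # malformed payload: keep the last known values
        if isinstance(parsed, dict):
            # Keep only well-formed boolean entries.
            self._remote = {k: v for k, v in parsed.items() if isinstance(v, bool)}

    def is_enabled(self, name: str) -> bool:
        """Remote value wins; otherwise fall back to the shipped default, then off."""
        return self._remote.get(name, self._defaults.get(name, False))
```

Usage: ship the new code behind `flags.is_enabled("new_checkout")`; if a bug appears after release, the backend sends `{"new_checkout": false}` and the feature is reverted without a new app store build.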

Remote Bugfixing: Feature Flags

Rolling out to production (not just) on mobile
Rollout can be risky if the population is large & there is no monitoring.
Staged rollout
● Control user exposure in early stages via a feature flag
● Monitor the impact on key business metrics at each stage
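
One common way to control user exposure per stage is to hash each user into a stable bucket and compare it with the current rollout percentage. The sketch below assumes this hashing approach; the talk does not say how Uber assigns users to stages, and the function names are hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str, flag: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given flag.

    Hashing the flag name together with the user id gives each flag its own
    independent assignment, so the same users are not always first.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_exposed(user_id: str, flag: str, percent: int) -> bool:
    """Return True if the user falls inside the current rollout percentage."""
    return rollout_bucket(user_id, flag) < percent
```

Because the bucket is stable, raising the percentage from one stage to the next only adds users; anyone exposed at an earlier stage stays exposed.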

Rolling out a new feature
Ready for production release. → Staged rollout → Monitor → Rolled out

Staged rollout monitoring for business impact: statistically significant differences
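
One standard way to check whether a difference in a business metric between a control group and the staged-rollout group is statistically significant is a two-proportion z-test. The sketch below is a generic textbook version, not the method described in the talk; the metric (for example, completed trips per session) and the 0.01 threshold in the usage note are assumptions.

```python
import math


def two_proportion_z_test(
    success_a: int, total_a: int, success_b: int, total_b: int
) -> tuple[float, float]:
    """Compare two conversion rates; return (z statistic, two-sided p-value).

    Group A is the control, group B the users exposed to the new feature.
    Uses the pooled standard error under the null hypothesis of equal rates.
    """
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0  # both groups are all-success or all-failure
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

Usage: with 900 of 1,000 control sessions and 850 of 1,000 exposed sessions succeeding, `z` is negative and the p-value is below 0.01, which would be a reason to pause the rollout and investigate before moving to the next stage.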

Write code & land to master → Pre-release testing → Ship to users
Ship to users:
● Staged rollout
● Monitoring & alerting
○ Crash reports
○ Business events
○ Performance

The mobile testing lifecycle
Write code & land to master → Pre-release testing → Ship to users → In production
Milestones: Build cut, Release, Rolled out
Write code & land to master: code & functional quality checks
Pre-release testing: functional & UX quality checks, hotfixes
Ship to users: staged rollout & monitoring
Are we done testing?

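Write code & land to master → Pre-release testing → Ship to users → In production
Milestones: Build cut, Release
Write code & land to master: code & functional quality checks
Pre-release testing: functional & UX quality checks, hotfixes
Ship to users: staged rollout & monitoring
In production: monitor & triage issues/alerts
Uh-oh...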

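Write code & land to master → Pre-release testing → Ship to users → In production
Milestones: Build cut, Release
Write code & land to master: code & functional quality checks
Pre-release testing: functional & UX quality checks
Ship to users: staged rollout & monitoring
In production: outages
Uh-oh...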

How can we make sure this does not happen again?

The goal of a postmortem
Understand the root cause in order to take action to prevent the same issue from impacting customers again.

Requirements & planning → Write code & land to master → Pre-release testing → Ship to users → In production
Requirements & planning: product & engineering spec, with testing plan
In production: outages & postmortems
Uh-oh...
“We did not do proper planning.”
“We did not test this edge case.”
“We did not have a test plan.”

The mobile testing lifecycle @Uber
Requirements & planning → Write code & land to master → Pre-release testing → Ship to users → In production
Milestones: Build cut, Release, Rolled out
Requirements & planning: spec & testing plan
Write code & land to master: code level quality checks
Pre-release testing: functional & UX quality checks
Ship to users: staged rollout & monitoring
In production: outages & postmortems

What worked for us will not (exactly) work for you.

Continuous testing: tools
We test, so we can avoid:
A few tools to detect / avoid:
Crashes
Functional bugs
UI bugs

Continuous testing toolset
A few tools to detect / avoid:
Crashes
● Crash reports
● Crash report alerting
Functional bugs
● Code reviews
● Unit testing
● UI testing
● Manual testing
● Dogfooding
● Staged rollout
UI bugs
● Manual testing
● Dogfooding
● Screenshot testing

A few tools to detect / avoid:
Crashes
● Crash reports
● Crash report alerting
Functional bugs
● Code reviews
● Unit testing
● UI testing
● Manual testing
● Dogfooding
● Staged rollout
UI bugs
● Manual testing
● Dogfooding
Other things impacting the business
● Business monitoring & alerting
● Performance testing / monitoring
● (Tools that might work for you)

A few off the shelf tools to detect / avoid:
Crashes, functional bugs, UI bugs
● Crash reporting: Crashlytics
● Code reviews
○ GitHub
○ In-house: Phab
● CI
○ Travis CI / Bitrise
○ In-house: Jenkins
● Manual testing: crowdsourced platforms
● UI testing
○ XCTest
○ Espresso
Other things impacting the business
● Analytics: GA, Mixpanel
● In-house analytics: Kafka, Elasticsearch & Grafana + ML
● Performance testing
○ Xcode & Android Studio profilers

A framework to think about testing

The Continuous Testing Pyramid
Diagram labels: Unit tests, UI tests, Manual tests, Dogfooding, Code reviews, Continuous integration, Monitor, Alert, Triage, Blameless postmortems, Things going wrong for customers, Improve processes & systems
Owner labels: All engineers (3x), All teams (2x), All employees
To make all of this scale: team owning testing infrastructure

Continuous testing at Scale
Why do we test?
To minimize the business impact of mistakes while maintaining good execution speed.
As you scale, iterate on the tools you use, your team structure & processes to keep doing this.

Thank you
Gergely Orosz
Engineering Manager, Uber Amsterdam
@GergelyOrosz
eng.uber.com
Open sourced tools for more efficient testing
● uber.github.io
● Language-agnostic linting platform: NEAL
● Android
○ Nanoscope (tracing tool)
○ NullAway (static checks to avoid NullPointerExceptions)
○ OkBuck: use the Buck build system on a Gradle project

