Scale your Testing and Quality with Automation Engineering and ML - Carlos Kidman

Testing & Quality of ML Systems
@CarlosKidman

● Head of Engineering
● Test Automation University
● Open Source (e.g., Pylenium, PyClinic)
● International Keynote Speaker
● Twitch and YouTube
● Founder of QAP

“Test early. Test often.”
Tulsee Doshi and Jacqueline Pan, Google I/O 2019

PREDICTION
@JayAlammar
Predicting (estimating, calculating) values based on
patterns in other, existing values.

Predict how much customers would spend
How much would 3 people spend?
Photo by Greta Hoffman from

Number of people    Purchase Amount
1                   10
2                   20
4                   40
10                  🎉

Number of people (Features)    Purchase Amount (Labels)
1                              10
2                              20
4                              40
10                             🎉
Figure callouts: Dataset, Weight
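The spending example can be sketched in a few lines of code. This is a minimal illustration, assuming the model is a single weight with no intercept (purchase amount = weight × number of people); the function names `fit_weight` and `predict` are hypothetical and not from the talk.

```python
# Dataset from the slide: features (number of people) and labels (purchase amount).
features = [1, 2, 4]
labels = [10, 20, 40]


def fit_weight(xs, ys):
    """Least-squares weight for the assumed model y = weight * x (no intercept)."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


def predict(weight, x):
    """Estimate the label for a feature value that is not in the dataset."""
    return weight * x


weight = fit_weight(features, labels)
print(weight)              # 10.0
print(predict(weight, 3))  # 30.0
```

With these three rows the learned weight is 10, so the estimate for 3 people is 30.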

Shift Left
Quality starts at the beginning

1. Define the Problem
╸ What are we trying to solve?
╸ Who is the ML System for?
╸ Does this even need ML?

2. Define Success and Assess Risk
╸ Define initial baselines
╸ Consider Privacy and Security risks
╸ Create a Proof of Concept
Do we have the proper resources to do this?

3. Design the Initial Architecture
╸ Which DB or tables will we pull the data from?
╸ What does our ELT Process look like?
╸ REST Service or Mobile Device?
╸ How will we monitor and measure our model?

4. Collect Data
╸ Where is the data coming from?
╸ Is this different from STAGE or DEV?
╸ Where are we storing the data?
╸ Is the data streamed and/or batched?
Understand the Data Journey

5. Prepare Data
╸ What shapes does the data need to be in?
╸ What are the data types?
╸ Are there missing values or errors?
💩 data = 💩 models
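The questions above about data types and missing values can be turned into automated checks. The sketch below is one possible shape for such a check, using only the standard library; the `find_data_issues` helper, the schema, and the sample rows are all hypothetical.

```python
def find_data_issues(rows, schema):
    """Return a list of (row_index, column, problem) tuples.

    `schema` maps each required column name to its expected Python type.
    """
    issues = []
    for index, row in enumerate(rows):
        for column, expected_type in schema.items():
            value = row.get(column)
            if value is None:
                issues.append((index, column, "missing"))
            elif not isinstance(value, expected_type):
                issues.append((index, column, "wrong type"))
    return issues


# Hypothetical schema and rows, for illustration only.
schema = {"people": int, "amount": float}
rows = [
    {"people": 1, "amount": 10.0},
    {"people": "2", "amount": 20.0},  # wrong type: string instead of int
    {"people": 4},                     # missing amount
]

issues = find_data_issues(rows, schema)
for issue in issues:
    print(issue)
```

A check like this can run before every training job so that bad rows are caught before they reach the model.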

6. Train and Validate Models
╸ Experiment and compare multiple models
╸ Capture training and validation metrics
╸ Visualize results for further analysis
╸ Tune weights and parameters
╸ Use tests (exploratory and automated)

VALIDATION
How well does the model perform (using metrics like accuracy) against a dataset it has NOT seen?
TRAINING    VALIDATION    TESTING
95%         76%           ***
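The gap between training and validation accuracy is something a test can watch for. The sketch below uses the 95% and 76% figures from the slide; the `MAX_GAP` threshold and the function names are assumptions chosen for illustration, not values from the talk.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def generalization_gap(train_accuracy, validation_accuracy):
    """How much worse the model does on data it has not seen."""
    return train_accuracy - validation_accuracy


# Hypothetical threshold: a team would pick its own acceptable gap.
MAX_GAP = 0.10

gap = generalization_gap(0.95, 0.76)
overfitting_suspected = gap > MAX_GAP
print(round(gap, 2), overfitting_suspected)
```

A large gap like this one suggests the model has memorized the training data rather than learned a pattern that generalizes.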

7. Test the Models
╸ What behaviors does the model show?
╸ Does it demonstrate harmful biases?
╸ Does it meet Privacy and Security Requirements?
╸ Is it performant and reliable?
╸ Can it withstand things like Adversarial Attacks?
╸ Does it solve the problem(s) we set out to solve?
╸ Do our customers enjoy using our ML System?

8. Deploy the Model
╸ REST or GraphQL Service?
╸ Mobile Device?
╸ Airplane or Self-Driving Car?
╸ Serverless Function?

9. Observe and Iterate
╸ Monitor
╸ Measure
╸ Alert
╸ Insights
╸ Learn
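Monitoring, measuring, and alerting can start small. The following is a minimal sketch, assuming that true outcomes eventually become available for comparison; the `AccuracyMonitor` class, its parameters, and the numbers used are hypothetical.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent predictions and flags drops."""

    def __init__(self, baseline, tolerance, window_size):
        self.baseline = baseline
        self.tolerance = tolerance
        # Only the newest `window_size` outcomes are kept.
        self.outcomes = deque(maxlen=window_size)

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self):
        # Wait until the window is full so a single early miss does not alert.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance


monitor = AccuracyMonitor(baseline=0.9, tolerance=0.05, window_size=4)
for _ in range(4):
    monitor.record("buy", "buy")   # model is right at first
print(monitor.should_alert())      # False
monitor.record("buy", "sell")      # then starts to miss
monitor.record("buy", "sell")
print(monitor.should_alert())      # True
```

The rolling window is a design choice: it lets the monitor notice a model that was good at first and got worse over time, which an all-time average would hide.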

Prevent.
Mitigate.
Detect.

Source: https://medium.com/geekculture/what-can-ds-ml-engineers-learn-from-zillows-flippinggate-b820a1a5c8ef
Buy Sell Profit

What happened?

Lessons Learned
╸ Poor data quality
╸ Easy to game the system
╸ Too dependent on the ML System
╸ Bias from Selective Focus
╸ Risk from External Factors
╸ Good at first, worse over time

Shift Right
Quality continues post-deployment

MLOps requires good Processes, Testing, and Automation

Adversarial Attacks
Fooling and exploiting ML Models

Attacks can cause harm 😨
Sources: https://www.youtube.com/watch?v=YXy6oX1iNoA
https://arstechnica.com/cars/2017/09/hacking-street-signs-with-stickers-could-confuse-self-driving-cars/

Invisible to the Security AI
Source: https://www.theverge.com/2019/4/23/18512472/fool-ai-surveillance-adversarial-example-yolov2-person-detection

Become the Attacker
╸ Not only for Computer Vision
╸ You don’t have to be a Data Scientist
╸ You don’t have to know how to code
╸ Be a part of Threat Modeling
╸ Be creative and design tests and attacks
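To show that attacks are not limited to Computer Vision, here is a small text-based sketch. The "model" is a toy keyword filter standing in for a real classifier, and the attack swaps letters for look-alike digits; the blocked words, substitutions, and function names are all hypothetical.

```python
# Hypothetical blocked words for a toy spam filter.
BLOCKED_WORDS = {"free", "winner"}

# Look-alike swaps that a human reader can still understand.
SUBSTITUTIONS = {"e": "3", "i": "1", "o": "0"}


def is_flagged(message):
    """Toy stand-in for a model: flags messages containing a blocked word."""
    words = message.lower().split()
    return any(word.strip(".,!?") in BLOCKED_WORDS for word in words)


def perturb(message):
    """Replace selected lowercase letters with look-alike digits."""
    return "".join(SUBSTITUTIONS.get(ch, ch) for ch in message)


def attack_succeeds(message):
    """The attack works if the original is flagged but the perturbed copy is not."""
    return is_flagged(message) and not is_flagged(perturb(message))


message = "You are a winner! Claim your free prize"
print(perturb(message))
print(attack_succeeds(message))  # True
```

The same pattern (perturb the input, then check whether the decision changes) applies to real models; only the perturbation and the model under test differ.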

Behavioral Testing
Beyond accuracy & loss metrics

Test behaviors and capabilities
╸ Define desired capabilities during design
╸ Testing behaviors gives a more accurate picture of model performance
╸ “...created twice as many tests and found almost three times as many bugs”
Beyond Accuracy: Behavioral Testing of NLP Models with CheckList (Ribeiro et al., 2020) https://aclanthology.org/2020.acl-main.442.pdf

What does it look like? 🤔
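One possible answer, as a hedged sketch: the code below runs two behavioral checks against a toy sentiment model. It does not use the CheckList library; the word lists, the `predict_sentiment` stand-in, and the helper names are hypothetical, and the two checks are loosely modeled on the invariance and capability tests described in the CheckList paper.

```python
# Toy stand-in for a sentiment model, for illustration only.
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"terrible", "hate", "bad"}


def predict_sentiment(text):
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def invariance_failures(template, names, predict):
    """Names whose prediction differs from the first name's prediction."""
    expected = predict(template.format(name=names[0]))
    return [n for n in names[1:] if predict(template.format(name=n)) != expected]


def expectation_failures(cases, predict):
    """(text, expected) pairs where the model's prediction is not the expected one."""
    return [(text, want) for text, want in cases if predict(text) != want]


# Behavior 1: changing only the person's name should not change the prediction.
print(invariance_failures("{name} said the food was great.",
                          ["Maria", "Chen", "Olu"], predict_sentiment))

# Behavior 2: the model should handle negation.
print(expectation_failures([("The food was not good.", "negative")],
                           predict_sentiment))
```

The toy model passes the name check but fails the negation check, which is the kind of bug an overall accuracy number can hide.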

Fair and Responsible AI
Test for Harmful Biases

Fairness
● By Data
● By Measurement & Modeling
● By Design
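For fairness "by measurement", one common approach is to break a metric down by group rather than report a single overall number. The sketch below assumes each record carries a group label; the groups, records, and function names are hypothetical, and accuracy is only one of several metrics a team might compare.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, prediction, label). Returns {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, prediction, label in records:
        total[group] += 1
        correct[group] += int(prediction == label)
    return {group: correct[group] / total[group] for group in total}


def largest_gap(per_group):
    """Difference between the best-served and worst-served group."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical evaluation records, for illustration only.
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 1), ("B", 0, 0),
]

per_group = accuracy_by_group(records)
print(per_group)               # {'A': 1.0, 'B': 0.5}
print(largest_gap(per_group))  # 0.5
```

Overall accuracy here is 75%, which looks acceptable until the breakdown shows that group B is served far worse than group A.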

Source: Google I/O 2019 Machine Learning Fairness: Lessons Learned https://youtu.be/6CwzDoE8J4M

Recap
╸ Shift Left
╸ Shift Right
╸ Adversarial Attacks
╸ Test Behaviors and the overall experience
╸ Test for Fairness and Harmful Biases

We need holistic testing and quality
╸ Test early and test often
╸ Apply current testing and quality techniques and strategies
╸ We need better testing!

THANKS!
Let’s continue the conversation!
@CarlosKidman
