Testing & Quality
of ML Systems
@CarlosKidman
● Head of Engineering
● Test Automation University
● Open Source (e.g., Pylenium, PyClinic)
● International Keynote Speaker
● Twitch and YouTube
● Founder of QAP
“Test early. Test often.”
Tulsee Doshi and Jacqueline Pan at Google I/O 2019 Conference.
PREDICTION
@JayAlammar
Predicting (estimating, calculating) values based on
patterns in other, existing values.
Predict how much
customers would spend
How much would 3 people spend?
Photo by Greta Hoffman from
Number of people    Purchase Amount
1                   10
2                   20
4                   40
10                  🎉
Patterns
Relationships
Dataset
Weight
Features Labels
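A minimal sketch of the prediction idea, assuming the relationship in the table is a single weight with no intercept (amount = weight × people). The names `fit_weight` and `predict` are illustrative and do not come from the talk.

```python
# Dataset from the table: features (number of people) and labels (purchase amount).
people = [1, 2, 4]
amounts = [10, 20, 40]


def fit_weight(features, labels):
    """Least-squares weight for the model label = weight * feature (no intercept)."""
    numerator = sum(x * y for x, y in zip(features, labels))
    denominator = sum(x * x for x in features)
    return numerator / denominator


def predict(weight, feature):
    """Estimate a label for a feature value the model has not seen."""
    return weight * feature


weight = fit_weight(people, amounts)
print(weight)              # 10.0
print(predict(weight, 3))  # 30.0
```

With this toy data the learned weight is 10, so three people are estimated to spend 30.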
Shift Left
Quality starts at the beginning
1. Define the Problem
╸ What are we trying to solve?
╸ Who is the ML System for?
╸ Does this even need ML?
2. Define Success and Assess Risk
╸ Define initial baselines
╸ Consider Privacy and Security risks
╸ Create a Proof of Concept
Do we have the proper resources to do this?
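One way to make "Define initial baselines" concrete is to compare a candidate model against a naive predictor. The sketch below assumes the baseline is "always predict the mean" and that error is measured as mean absolute error; both choices, and the function names, are assumptions for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def beats_baseline(actual, candidate_predictions, margin=0.0):
    """True when the candidate's error is lower than always predicting the mean."""
    mean_value = sum(actual) / len(actual)
    baseline_error = mean_absolute_error(actual, [mean_value] * len(actual))
    candidate_error = mean_absolute_error(actual, candidate_predictions)
    return candidate_error + margin < baseline_error


actual = [10, 20, 40]
print(beats_baseline(actual, [12, 19, 41]))  # True
print(beats_baseline(actual, [40, 10, 20]))  # False
```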
3. Design the Initial Architecture
╸ Which DB or tables will we pull the data from?
╸ What does our ELT Process look like?
╸ REST Service or Mobile Device?
╸ How will we monitor and measure our model?
4. Collect Data
╸ Where is the data coming from?
╸ Is this different than STAGE or DEV?
╸ Where are we storing the data?
╸ Is the data streamed and/or batched?
Understand the Data Journey
Scale your Testing and Quality with Automation Engineering and ML - Carlos Kidman
5. Prepare Data
╸ What shapes does the data need to be in?
╸ What are the data types?
╸ Are there missing values or errors?
💩 data = 💩 models
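The questions about types and missing values can be turned into an automated check. This is a rough sketch that assumes rows arrive as dictionaries; the schema `EXPECTED_TYPES` and the function `find_data_problems` are hypothetical names, not part of the talk.

```python
EXPECTED_TYPES = {"people": int, "amount": float}  # hypothetical schema


def find_data_problems(rows, expected_types=EXPECTED_TYPES):
    """Return a list of (row_index, column, problem) tuples."""
    problems = []
    for index, row in enumerate(rows):
        for column, expected in expected_types.items():
            value = row.get(column)
            if value is None:
                problems.append((index, column, "missing"))
            elif not isinstance(value, expected):
                problems.append((index, column, f"expected {expected.__name__}"))
    return problems


rows = [
    {"people": 1, "amount": 10.0},
    {"people": 2, "amount": None},    # missing value
    {"people": "4", "amount": 40.0},  # wrong type: string instead of int
]
for problem in find_data_problems(rows):
    print(problem)
```

Running a check like this before training keeps bad rows from silently shaping the model.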
6. Train and Validate Models
╸ Experiment and compare multiple models
╸ Capture training and validation metrics
╸ Visualize results for further analysis
╸ Tune weights and parameters
╸ Use tests (exploratory and automated)
VALIDATION
How well does the model perform
(using metrics like accuracy)
against a dataset it has NOT seen?
TRAINING    VALIDATION    TESTING
95%         76%           ***
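The gap between training and validation scores can be reproduced with a toy example. The sketch below assumes a deliberately bad "model" that memorizes its training rows, and holds out every fourth row for validation (real splits are usually randomized); all names and data are made up for illustration.

```python
def accuracy(predict, rows):
    """Fraction of rows where the prediction matches the label."""
    correct = sum(1 for feature, label in rows if predict(feature) == label)
    return correct / len(rows)


# Toy dataset: the label is 1 when the feature is 50 or more.
rows = [(x, int(x >= 50)) for x in range(100)]

# Hold out every fourth row so the model never sees it during "training".
training = [row for i, row in enumerate(rows) if i % 4 != 3]
validation = [row for i, row in enumerate(rows) if i % 4 == 3]

# A "model" that memorizes the training rows and guesses 0 for anything else.
memory = dict(training)


def memorizer(feature):
    return memory.get(feature, 0)


train_accuracy = accuracy(memorizer, training)
validation_accuracy = accuracy(memorizer, validation)
print(train_accuracy)       # 1.0
print(validation_accuracy)  # 0.48
```

A perfect training score with a much lower validation score is the signal that the model has memorized rather than learned the pattern.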
7. Test the Models
╸ What behaviors does the model show?
╸ Does it demonstrate harmful biases?
╸ Does it meet Privacy and Security Requirements?
╸ Is it performant and reliable?
╸ Can it withstand things like Adversarial Attacks?
╸ Does it solve the problem(s) we set out to solve?
╸ Do our customers enjoy using our ML System?
8. Deploy the Model
╸ REST or GraphQL Service?
╸ Mobile Device?
╸ Airplane or Self-Driving Car?
╸ Serverless Function?
9. Observe and Iterate
╸ Monitor
╸ Measure
╸ Alert
╸ Insights
╸ Learn
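A minimal sketch of the monitor, measure, and alert loop, assuming ground-truth labels eventually arrive for live predictions. The class `AccuracyMonitor`, the baseline of 0.76, and the tolerance are hypothetical values chosen for illustration.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent predictions and flags drops."""

    def __init__(self, baseline, tolerance=0.10, window=100):
        self.baseline = baseline
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # keeps only the newest results

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self):
        current = self.rolling_accuracy()
        return current is not None and current < self.baseline - self.tolerance


monitor = AccuracyMonitor(baseline=0.76, tolerance=0.10, window=5)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)
print(monitor.rolling_accuracy())  # 0.4
print(monitor.should_alert())      # True
```

A rolling window is used so that a model that was good at first but degrades over time triggers the alert instead of being masked by its early results.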
Prevent.
Mitigate.
Detect.
Source: https://medium.com/geekculture/what-can-ds-ml-engineers-learn-from-zillows-flippinggate-b820a1a5c8ef
Buy Sell Profit
What happened?
Lessons Learned
╸ Poor data quality
╸ Easy to game the system
╸ Too dependent on the ML System
╸ Bias from Selective Focus
╸ Risk from External Factors
╸ Good at first, worse over time
Shift Right
Quality continues post-deployment
MLOps requires good
Processes, Testing, and Automation
Adversarial Attacks
Fooling and exploiting ML Models
Attacks can cause harm 😨
Sources: https://www.youtube.com/watch?v=YXy6oX1iNoA
https://arstechnica.com/cars/2017/09/hacking-street-signs-with-stickers-could-confuse-self-driving-cars/
Invisible to the Security AI
Source: https://www.theverge.com/2019/4/23/18512472/fool-ai-surveillance-adversarial-example-yolov2-person-detection
Become the Attacker
╸ Not only for Computer Vision
╸ You don’t have to be a Data Scientist
╸ You don’t have to know how to code
╸ Be a part of Threat Modeling
╸ Be creative and design tests and attacks
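To show that attacks are not limited to computer vision, here is a small sketch of a text-based attack. It assumes a toy keyword spam filter standing in for a real model; `naive_spam_filter`, `BLOCKED_WORDS`, and the attack functions are all hypothetical.

```python
BLOCKED_WORDS = {"free", "winner", "prize"}  # hypothetical keyword list


def naive_spam_filter(text):
    """Toy stand-in for a model: flags text containing a blocked word."""
    words = (word.strip(".,!?") for word in text.lower().split())
    return any(word in BLOCKED_WORDS for word in words)


def leetspeak(text):
    """Attack: swap some letters for look-alike digits."""
    return text.translate(str.maketrans({"e": "3", "i": "1", "o": "0"}))


def shout(text):
    """Attack: convert the text to upper case."""
    return text.upper()


def find_evasions(model, text, attacks):
    """Names of attacks whose perturbed text changes the model's prediction."""
    expected = model(text)
    return [attack.__name__ for attack in attacks if model(attack(text)) != expected]


message = "You are a winner! Claim your free prize"
print(naive_spam_filter(message))                                     # True
print(find_evasions(naive_spam_filter, message, [leetspeak, shout]))  # ['leetspeak']
```

A human still reads the leetspeak message as spam, but the filter no longer does. Designing perturbations like these requires creativity more than data science expertise.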
Behavioral Testing
Beyond accuracy & loss metrics
Test behaviors and capabilities
╸ Define desired capabilities during design
╸ Testing behaviors gives a more accurate picture of model performance
╸ “...created twice as many tests and found almost three times as many bugs”
Beyond Accuracy: Behavioral Testing of NLP Models with CheckList (Ribeiro et al., 2020) https://aclanthology.org/2020.acl-main.442.pdf
What does it look like? 🤔
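As a rough illustration, the three test types named in the CheckList paper (Minimum Functionality Test, Invariance, Directional Expectation) can be sketched in plain Python. This does not use the CheckList library itself; `toy_sentiment` is a hypothetical word-counting model and the sentences are invented.

```python
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "terrible", "hate"}


def toy_sentiment(text):
    """Toy model: positive word count minus negative word count."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def run_behavioral_tests(model):
    """Run one small check per capability and report pass/fail."""
    results = {}
    # MFT: simple cases the model must get right.
    results["MFT: plain positive"] = model("I love this airline") > 0
    results["MFT: negation"] = model("I do not love this airline") < 0
    # INV: swapping a city name should not change the prediction.
    results["INV: swap city name"] = (
        model("Great flight to Chicago") == model("Great flight to Dallas")
    )
    # DIR: adding a negative phrase should not raise the score.
    base = "The food was good."
    results["DIR: add negative phrase"] = (
        model(base + " The service was terrible.") <= model(base)
    )
    return results


for name, passed in run_behavioral_tests(toy_sentiment).items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

The toy model fails the negation test even though it passes the others, which is the kind of bug an aggregate accuracy score can hide.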
Fair and Responsible AI
Test for Harmful Biases
Fairness
● By Data
● By Measurement & Modeling
● By Design
Source: Google I/O 2019 Machine Learning Fairness: Lessons Learned https://youtu.be/6CwzDoE8J4M
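One way to test fairness "by measurement" is to slice metrics by group and compare them. The sketch below assumes binary predictions and a single group attribute, and uses the gap in selection rate and accuracy between groups as the signal; the records, field names, and functions are invented for illustration, and other fairness measures exist.

```python
from collections import defaultdict


def rates_by_group(records):
    """Selection rate and accuracy for each group in the records."""
    counts = defaultdict(lambda: {"n": 0, "positive": 0, "correct": 0})
    for record in records:
        group = counts[record["group"]]
        group["n"] += 1
        group["positive"] += record["predicted"]
        group["correct"] += int(record["predicted"] == record["actual"])
    return {
        name: {
            "selection_rate": c["positive"] / c["n"],
            "accuracy": c["correct"] / c["n"],
        }
        for name, c in counts.items()
    }


def parity_gap(rates, metric):
    """Largest difference in a metric between any two groups."""
    values = [group[metric] for group in rates.values()]
    return max(values) - min(values)


records = (
    [{"group": "A", "predicted": p, "actual": a}
     for p, a in [(1, 1), (1, 1), (1, 0), (0, 0)]]
    + [{"group": "B", "predicted": p, "actual": a}
       for p, a in [(0, 1), (0, 0), (1, 1), (0, 0)]]
)
rates = rates_by_group(records)
print(parity_gap(rates, "selection_rate"))  # 0.5
print(parity_gap(rates, "accuracy"))        # 0.0
```

In this made-up data both groups have the same accuracy, yet group A is selected three times as often as group B, so an accuracy-only check would miss the disparity.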
Recap
╸ Shift Left
╸ Shift Right
╸ Adversarial Attacks
╸ Test Behaviors and the overall experience
╸ Test for Fairness and Harmful Biases
We need holistic testing and quality
╸ Test early and test often
╸ Apply current testing and quality techniques and strategies
╸ We need better testing!
THANKS!
Let’s continue the
conversation!
@CarlosKidman


Editor's Notes

  • #34: NER = Named Entity Recognition