Responsible AI in Industry:
Practical Challenges and Lessons Learned
CVPR 2021 Responsible Computer Vision Workshop
Invited talk
Krishnaram Kenthapadi & Nashlie Sephus, Ph.D.
Amazon AWS AI
Algorithmic Bias
• Ethical challenges posed by AI systems
• Inherent biases present in society
• Reflected in training data
• AI/ML models prone to amplifying such biases
Laws against Discrimination
• Citizenship: Immigration Reform and Control Act
• Disability status: Rehabilitation Act of 1973; Americans with Disabilities Act of 1990
• Race: Civil Rights Act of 1964
• Age: Age Discrimination in Employment Act of 1967
• Sex: Equal Pay Act of 1963; Civil Rights Act of 1964
• And more...
Fairness Privacy
Transparency Explainability
Motivation & Business Opportunities
Regulatory. We need to understand why the ML model made a given prediction
and also whether the prediction it made was free from bias, both in training and at
inference.
Business. Providing explanations to internal teams (loan officers, customer service
reps, compliance teams) and end users/customers
Data Science. Improving models through better feature engineering and training
data generation, understanding failure modes of the model, debugging model
predictions, etc.
Scaling Fairness, Explainability & Privacy across the AWS ML Stack
• AI services (vision, speech, text, search, chatbots, personalization, forecasting, fraud, contact centers): Amazon Rekognition, Amazon Polly, Amazon Transcribe (+Medical), Amazon Lex, Amazon Personalize, Amazon Forecast, Amazon Comprehend (+Medical), Amazon Textract, Amazon Kendra, Amazon CodeGuru, Amazon Fraud Detector, Amazon Translate; industrial AI: Amazon Monitron, AWS Panorama + Appliance, Amazon Lookout for Vision, Amazon Lookout for Equipment; code and DevOps: Amazon DevOps Guru; contact centers: Voice ID and Contact Lens for Amazon Connect; healthcare AI: Amazon HealthLake; anomaly detection: Amazon Lookout for Metrics
• ML services (Amazon SageMaker, SageMaker Studio IDE): label data, aggregate & prepare data, store & share features, AutoML, Spark/R, detect bias, visualize in notebooks, pick algorithm, train models, tune parameters, debug & profile, deploy in production, manage & monitor, CI/CD, human review, model management for edge devices, SageMaker JumpStart
• Frameworks & infrastructure: Deep Learning AMIs & containers, GPUs & CPUs, Elastic Inference, Trainium, Inferentia, FPGA, Deep Graph Library
LinkedIn operates the largest professional network on the Internet: tell your story
• 740M members
• 55M+ companies represented on LinkedIn
• 90K schools listed (high school & college)
• 36K skills listed
• 14M+ open jobs on LinkedIn Jobs
• 280B feed updates
What ML Is Not
• Error-free (no system is perfect)
• 100% confident
• Intended to replace human judgement
Fairness Techniques in Faces
Face Detection
Detect the presence of a face in an image or a video.
Face Analysis
A system to determine the gender, age, emotion, presence of facial hair, etc. from a detected face.
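As an illustration of the kind of output a face-analysis system exposes, here is a minimal sketch using the Amazon Rekognition DetectFaces API via boto3. The image file name is hypothetical, and the printed fields are only a subset of what the service returns; treat this as approximate usage, not part of the talk.

```python
import boto3

rekognition = boto3.client("rekognition")  # assumes AWS credentials are already configured

with open("example_face.jpg", "rb") as f:  # hypothetical local image
    response = rekognition.detect_faces(
        Image={"Bytes": f.read()},
        Attributes=["ALL"],  # request age range, gender, emotions, facial hair, etc.
    )

for face in response["FaceDetails"]:
    # Each predicted attribute comes with its own confidence score.
    print(face["AgeRange"], face["Gender"], face["Emotions"][0], face["Confidence"])
```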
Face Recognition
A system to determine a detected face's identity by matching it against a database of faces and their associated identities.
Confidence Score
An estimate of the confidence or certainty of any prediction, expressed in the form of a probability or confidence score.
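A minimal sketch of how a confidence score is typically used in practice: accept a prediction only when its score clears an application-chosen threshold, and route the rest to human review. The threshold value and function name are illustrative, not from the talk.

```python
import numpy as np

def accept_or_defer(class_probs, threshold=0.90):
    """Accept the top prediction only when its confidence score clears the threshold."""
    labels = class_probs.argmax(axis=1)
    confidence = class_probs.max(axis=1)
    accepted = confidence >= threshold
    return labels, confidence, accepted

probs = np.array([[0.97, 0.03],   # confident -> auto-accept
                  [0.55, 0.45]])  # low confidence -> defer to human review
print(accept_or_defer(probs))
```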
Face Recognition: Common Causes of Errors
• Illumination variance: lighting, camera controls like exposure, shadows, highlights
• Pose / viewpoint: face pose, camera angles
• Aging: natural aging, artificial makeup
• Expression / style: facial expression like laughing, facial hair such as a beard, hair style
• Occlusion: part of the face hidden, as in group pictures
Where Can Biases Exist?
Racial Comparisons of Datasets [FairFace]
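To make the idea of racial comparisons of datasets concrete, here is a small sketch that compares a dataset's demographic composition against a balanced reference, assuming FairFace-style race labels in a hypothetical annotations file; the file and column names are illustrative.

```python
import pandas as pd

# Hypothetical annotations file with one row per image and a FairFace-style 'race' label.
df = pd.read_csv("dataset_annotations.csv")
composition = df["race"].value_counts(normalize=True)

# Reference: a uniform distribution over the seven FairFace categories.
groups = ["White", "Black", "Indian", "East Asian", "Southeast Asian",
          "Middle Eastern", "Latino_Hispanic"]
reference = pd.Series(1.0 / len(groups), index=groups)

# Positive values = over-represented groups, negative = under-represented.
print((composition.reindex(groups, fill_value=0.0) - reference).sort_values())
```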
Launch with Confidence: Testing for Bias
• How will you know if users are being harmed?
• How will you know if harms are unfairly distributed?
• Detailed testing practices are often not covered in academic papers
• Discussing testing requirements is a useful focal point for cross-functional teams
Reproducibility - Notebook Experiments
PPB2 Data Analytics
Gender Classification – PPB2
Short Hair
Gender Classification w.r.t. Hair Lengths – PPB2
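The disaggregated analysis above can be reproduced with a few lines of pandas. The evaluation file and column names below are hypothetical, but the pattern, slicing accuracy by intersectional subgroups such as skin type, hair length, and gender, is the one a PPB2-style audit follows.

```python
import pandas as pd

# Hypothetical evaluation file: one row per test image with model prediction and labels.
eval_df = pd.read_csv("ppb2_eval_results.csv")
eval_df["correct"] = eval_df["pred_gender"] == eval_df["true_gender"]

by_group = (eval_df
            .groupby(["skin_type", "hair_length", "true_gender"])["correct"]
            .agg(accuracy="mean", n="size")
            .sort_values("accuracy"))
print(by_group.head(10))  # worst-performing intersectional subgroups first
```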
Efficient Testing for Bias
• Development teams are under multiple constraints
  • Time
  • Money
  • Human resources
  • Access to data
• How can we efficiently test for bias?
  • Prioritization
  • Strategic testing
Choose your evaluation metrics in light of acceptable tradeoffs between false positives and false negatives.
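A sketch of how that tradeoff can be inspected in practice: compute false-positive and false-negative rates separately for each group, so the metric choice reflects which error type is more harmful in context. Function and variable names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

def error_rates_by_group(y_true, y_pred, groups):
    """False-positive and false-negative rates, disaggregated by a sensitive attribute."""
    rows = []
    for g in np.unique(groups):
        m = groups == g
        tn, fp, fn, tp = confusion_matrix(y_true[m], y_pred[m], labels=[0, 1]).ravel()
        rows.append({"group": g,
                     "FPR": fp / (fp + tn) if (fp + tn) else np.nan,
                     "FNR": fn / (fn + tp) if (fn + tp) else np.nan})
    return pd.DataFrame(rows)

y_true = np.array([1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0])
groups = np.array(["a", "a", "a", "b", "b", "b"])
print(error_rates_by_group(y_true, y_pred, groups))
```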
Paper Review (FAccT ’21, March 3–10, 2021, Virtual Event, Canada)
Nashlie Sephus, Ph.D.
Tech Evangelist, AWS AI
Motivation - Datasets
• How are datasets collected?
• Where did the data come from?
• When it comes to humans, were they aware?
• Had the individuals given consent?
• How are dataset owners being held accountable for consequences that may arise?
• How to create greater transparency about data?
Takeaways
• Testing for blind spots across intersectional groups is key.
• Accounting for confidence scores/thresholds and error bars when measuring bias is necessary.
• Representation matters.
• Transparency, reproducibility, and education can promote change.
• Confidence in your product's fairness requires fairness testing.
• Fairness testing has a role throughout the product iteration lifecycle.
• Contextual concerns should be used to prioritize fairness testing.
Amazon SageMaker Clarify
Detect bias in ML models and understand model predictions
• Detect bias during data preparation: identify imbalances in data (illustrated in the sketch below)
• Check your trained model for bias: evaluate the degree to which various types of bias are present in your model
• Explain overall model behavior: understand the relative importance of each feature to your model’s behavior
• Explain individual predictions: understand the relative importance of each feature for individual inferences
• Detect drift in bias and model behavior over time: provide alerts and detect drift over time due to changing real-world conditions
• Generate automated reports: produce reports on bias and explanations to support internal presentations
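Two of the pre-training bias measures behind "identify imbalances in data", class imbalance (CI) and difference in positive proportions in labels (DPL), are simple to compute directly. The snippet below is a standalone illustration on a toy dataset, not the SageMaker Clarify implementation.

```python
import pandas as pd

def class_imbalance(facet):
    """CI = (n_a - n_d) / (n_a + n_d): how unevenly the two facet groups are represented."""
    n_a, n_d = (facet == "a").sum(), (facet == "d").sum()
    return (n_a - n_d) / (n_a + n_d)

def diff_in_positive_proportions(labels, facet):
    """DPL: difference in positive-label rates between the two facet groups."""
    return labels[facet == "a"].mean() - labels[facet == "d"].mean()

df = pd.DataFrame({"approved": [1, 1, 0, 1, 0, 0, 1, 0],
                   "group":    ["a", "a", "a", "a", "a", "d", "d", "d"]})
print(class_imbalance(df["group"]))                               # 0.25: group 'a' over-represented
print(diff_in_positive_proportions(df["approved"], df["group"]))  # ~0.27: higher approval rate for 'a'
```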
Lessons learned
• Fairness as a Process
• Notions of bias & fairness are highly application dependent
• Choice of the attribute(s) for which bias is to be measured & the choice of the bias metrics to be guided by social, legal, and other non-technical considerations
• Collaboration/consensus across key stakeholders
• Wide spectrum of customers with different levels of technical background
• Managed service vs. open-source packages
• Monitoring of the deployed model
• Fairness & explainability considerations across the ML lifecycle
Fairness and Explainability by Design in the ML Lifecycle
Additional Pointers
For more information on Amazon SageMaker Clarify, please refer to:
• https://aws.amazon.com/sagemaker/clarify
• Amazon Science / AWS articles:
  • https://aws.amazon.com/blogs/aws/new-amazon-sagemaker-clarify-detects-bias-and-increases-the-transparency-of-machine-learning-models
  • https://www.amazon.science/latest-news/how-clarify-helps-machine-learning-developers-detect-unintended-bias
• Technical paper: Fairness Measures for Machine Learning in Finance
• https://github.com/aws/amazon-sagemaker-clarify (usage sketched below)
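For readers who want to try the library, a rough sketch of running a pre-training bias analysis with the SageMaker Python SDK follows. The role ARN, bucket paths, column names, and facet are placeholders, and parameter names may differ slightly across SDK versions, so treat this as approximate usage rather than a definitive recipe.

```python
from sagemaker import Session, clarify

session = Session()
processor = clarify.SageMakerClarifyProcessor(
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/train.csv",         # placeholder dataset
    s3_output_path="s3://my-bucket/clarify-bias-report",
    label="approved",
    headers=["approved", "income", "age", "gender"],
    dataset_type="text/csv",
)

bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],      # which label value counts as the positive outcome
    facet_name="gender",                # sensitive attribute to analyze
    facet_values_or_threshold=[0],      # placeholder: facet value(s) defining the group of interest
)

# Writes a bias report (CI, DPL, and other metrics) to the S3 output path.
processor.run_pre_training_bias(data_config=data_config,
                                data_bias_config=bias_config,
                                methods=["CI", "DPL"])
```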
Acknowledgments: Amazon SageMaker Clarify core team, Amazon AWS AI
team, and partners across Amazon
Key Takeaways
Good ML Practices Go a Long Way
Lots of low hanging fruit in terms of
improving fairness simply by using
machine learning best practices
• Representative data
• Introspection tools
• Visualization tools
• Testing
01
Fairness improvements often lead
to overall improvements
• It’s a common misconception that it’s
always a tradeoff
02
Breadth and Depth Required
Looking End-to-End is critical
• Need to be aware of bias and potential problems at every stage of product and ML pipelines (from design, data gathering, ... to deployment and monitoring)
Details Matter
• Slight changes in features or labeler criteria can change the outcome
• Must have experts who understand the effects of decisions
• Many details are not technical, such as how labelers are hired
Process Best Practices
• Identify product goals
• Get the right people in the room
• Identify stakeholders
• Select a fairness approach
• Analyze and evaluate your system
• Mitigate issues
• Monitor continuously and have escalation plans
• Auditing and transparency
• Policy
• Technology
Beyond Accuracy
• Performance and Cost
• Fairness and Bias
• Transparency and Explainability
• Privacy
• Security
• Safety
• Robustness
Fairness, Explainability & Privacy: Opportunities
Fairness in ML
• Application-specific challenges
  • Conversational AI systems: unique bias/fairness/ethics considerations, e.g., hate speech, complex failure modes
  • Beyond protected categories, e.g., accent, dialect
  • Entire ecosystem (e.g., including apps such as Alexa skills)
  • Two-sided markets: e.g., fairness to buyers and to sellers, or to content consumers and producers
  • Fairness in advertising (externalities)
• Tools for ensuring fairness (measuring & mitigating bias) across the AI lifecycle
  • Pre-processing (representative datasets; modifying features/labels)
  • ML model training with fairness constraints
  • Post-processing (sketched below)
  • Experimentation & post-deployment
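As a concrete instance of the post-processing item above, here is a minimal sketch that chooses a separate decision threshold per group so that positive-prediction rates are roughly equalized. This is a generic illustration of the technique, not a method from the talk, and the target rate is arbitrary.

```python
import numpy as np

def group_thresholds(scores, groups, target_positive_rate=0.3):
    """Pick a per-group score threshold so each group gets roughly the same positive rate."""
    return {g: np.quantile(scores[groups == g], 1.0 - target_positive_rate)
            for g in np.unique(groups)}

def post_processed_predictions(scores, groups, thresholds):
    """Apply the group-specific thresholds to raw model scores."""
    return np.array([int(s >= thresholds[g]) for s, g in zip(scores, groups)])

scores = np.array([0.9, 0.6, 0.4, 0.8, 0.3, 0.2])
groups = np.array(["a", "a", "a", "b", "b", "b"])
thr = group_thresholds(scores, groups)
print(thr, post_processed_predictions(scores, groups, thr))
```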
Key Open Problems in Applied Fairness
• What if you don’t have the sensitive attributes?
• When should you use what approach? For example, equal treatment vs. equal outcome?
• How to identify harms?
• Process for framing AI problems: will the chosen metrics lead to desired results?
• How to tell if the data generation and collection method is appropriate for a task? (e.g., causal structure analysis?)
• Processes for mitigating harms and misbehaviors quickly
Explainability in ML
• Actionable explanations
• Balance between explanations & model secrecy
• Robustness of explanations to failure modes (interaction between ML components)
• Application-specific challenges
  • Conversational AI systems: contextual explanations
  • Gradation of explanations
• Tools for explanations across the AI lifecycle (an open-source example is sketched below)
  • Pre- & post-deployment for ML models
  • Model developer vs. end-user focused
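One widely used open-source route to the per-feature explanations discussed above is SHAP. The sketch below uses a synthetic dataset and a tree ensemble purely for illustration; it is not the implementation behind any particular AWS service.

```python
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)     # per-example, per-feature contributions

print(shap_values[0])                      # local explanation for a single prediction
print(np.abs(shap_values).mean(axis=0))    # global view: mean |contribution| per feature
```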
Privacy in ML
• Privacy for highly sensitive data: model training & analytics using secure enclaves, homomorphic encryption, federated learning / on-device learning, or a hybrid
• Privacy-preserving model training, robust against adversarial membership inference attacks (dynamic settings + complex data/model pipelines); a conceptual sketch follows
• Privacy-preserving mechanisms for data marketplaces
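To make the privacy-preserving training idea concrete, here is a conceptual numpy sketch of one differentially private SGD step: per-example gradient clipping plus calibrated Gaussian noise. The hyperparameter values are illustrative, and a real implementation would use a library such as Opacus or TensorFlow Privacy together with a privacy accountant.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0, noise_multiplier=1.1, lr=0.1):
    """One DP-SGD step: clip each example's gradient, average, add Gaussian noise."""
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean_grad = np.mean(clipped, axis=0)
    noise = np.random.normal(
        0.0, noise_multiplier * clip_norm / len(per_example_grads), size=mean_grad.shape)
    return params - lr * (mean_grad + noise)

params = np.zeros(3)
grads = [np.array([2.0, 0.0, 1.0]), np.array([0.1, -0.3, 0.2])]
print(dp_sgd_step(params, grads))
```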
Reflections
• “Fairness, Explainability, and Privacy by Design” when building AI products
• Collaboration/consensus across key stakeholders
• NYT / WSJ / ProPublica test :)
Related Tutorials / Resources
• ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT)
• AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES)
• Sara Hajian, Francesco Bonchi, and Carlos Castillo, Algorithmic bias: From discrimination discovery to fairness-aware data mining, KDD Tutorial, 2016.
• Solon Barocas and Moritz Hardt, Fairness in machine learning, NeurIPS Tutorial, 2017.
• Kate Crawford, The Trouble with Bias, NeurIPS Keynote, 2017.
• Arvind Narayanan, 21 fairness definitions and their politics, FAccT Tutorial, 2018.
• Sam Corbett-Davies and Sharad Goel, Defining and Designing Fair Algorithms, Tutorials at EC 2018 and ICML 2018.
• Ben Hutchinson and Margaret Mitchell, Translation Tutorial: A History of Quantitative Fairness in Testing, FAccT Tutorial, 2019.
• Henriette Cramer, Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miroslav Dudík, Hanna Wallach, Sravana Reddy, and Jean Garcia-Gathright, Translation Tutorial: Challenges of incorporating algorithmic fairness into industry practice, FAccT Tutorial, 2019.
Related Tutorials / Resources
• Sarah Bird, Ben Hutchinson, Krishnaram Kenthapadi, Emre Kiciman, Margaret Mitchell, Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned, Tutorials at WSDM 2019, WWW 2019, KDD 2019.
• Krishna Gade, Sahin Cem Geyik, Krishnaram Kenthapadi, Varun Mithal, Ankur Taly, Explainable AI in Industry, Tutorials at KDD 2019, FAccT 2020, WWW 2020.
• Himabindu Lakkaraju, Julius Adebayo, Sameer Singh, Explaining Machine Learning Predictions: State-of-the-art, Challenges, and Opportunities, NeurIPS 2020 Tutorial.
• Kamalika Chaudhuri, Anand D. Sarwate, Differentially Private Machine Learning: Theory, Algorithms, and Applications, NeurIPS 2017 Tutorial.
• Krishnaram Kenthapadi, Ilya Mironov, Abhradeep Guha Thakurta, Privacy-preserving Data Mining in Industry, Tutorials at KDD 2018, WSDM 2019, WWW 2019.
Thanks! Questions?
Editor's Notes
  • #2: In this tutorial, we will focus primarily on three dimensions: P/F/E. Prior to this tutorial, we had presented tutorials on privacy, on fairness, and on explainability. In particular, I would like to thank Timnit Gebru and Meg Mitchell for the realization of the fairness in industry tutorial. I had reached out to Timnit Gebru to find out potential collaborators from other companies – Timnit connected me with Meg. Today, we will present case studies from Amazon, LinkedIn, and Microsoft. Please refer the longer version of our tutorial for additional case studies, especially case studies from Google (which we would not be able to present today).
  • #3: First, we will motivate the need for bias detection and mitigation in ML systems. Challenges that have received a lot of attention in the media, and have really highlighted how important it is to get AI right – to make sure that AI does not discriminate or further disadvantage already disadvantaged groups. Many of these stories have focused on high-stakes decisions where machine learning systems are used allocate opportunities, resources, or information in ways that can have significant negative impacts on people’s lives.
  • #4: Recently, policymakers, regulators, and advocates have raised awareness about the ethical, policy, and legal challenges posed by machine learning and data-driven systems. In particular, they have expressed concerns about the potentially discriminatory impact of such systems, for example, due to inadvertent encoding of bias into automated decisions. “Do Google’s ‘unprofessional hair’ results show it’s racist?” by Leigh Alexander
  • #5: There have been several laws in countries such as the United States that prohibit discrimination based on “protected attributes” such as race, gender, age, disability status, and religion. Many of these laws have their origins in the Civil Rights Movement in the United States (e.g., US Civil Rights Act of 1964). When legal frameworks prohibit the use of such protected attributes in decision making, there are usually two competing approaches on how this is enforced in practice: Disparate Treatment vs. Disparate Impact. Avoiding disparate treatment requires that such attributes should not be actively used as a criterion for decision making and no group of people should be discriminated against because of their membership in some protected group. Avoiding disparate impact requires that the end result of any decision making should result in equal opportunities for members of all protected groups irrespective of how the decision is made. Please see NeurIPS’17 tutorial titled Fairness in machine learning by Solon Barocas and Moritz Hardt for a thorough discussion.
  • #6: Renewed focus in light of privacy breaches observed over the last several years. Example: EU General Data Protection Regulation (GDPR), which came into effect in May 2018. The focus is not only on privacy of users, but also on related dimensions such as algorithmic bias (or ensuring fairness), transparency, and explainability of decisions. Image credit: https://pixabay.com/en/gdpr-symbol-privacy-icon-security-3499380/
  • #8: Challenges with scaling fairness, explainability, (and privacy) mechanisms to cater to the needs of AWS customers from financial services, healthcare, HR, and other industries: Providing functionality that caters to different customer personas Providing functionality that caters to the needs in different stages of ML lifecycle Starting with SageMaker, and then scaling the functionality to AI services and to other ML products & services across Amazon
  • #9: From https://news.linkedin.com/about-us#Statistics as of 2021-02-01 Largest professional network in the world. It is a platform for every professional to tell their story. Who they are, where they work, skills, etc. Once you have done that, the platform works infusing intelligence by harnessing the power of this data to help connect talent with opportunity at scale. * 740M members in more than 200 countries and territories worldwide | 1.5K fields of study | 600+ degrees | 24K titles ... Our vision is to develop a profile for every member of the global workforce, all 3 billion of them, every employer in the world, and every open job at each of these companies, to provide every member of the global workforce with transparency into the skills required to obtain those jobs.  We want to build a profile for every educational institution or training facility that enables people to acquire those skills, and a publishing platform that enables every individual, every company, and every university to share their professionally-relevant knowledge if they’re interested in doing so.
  • #21: TODO split slide into 3
  • #32: TODO split slide into 3
  • #39: Amazon SageMaker helps data scientists and developers to prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML. Ignore notes below (I won’t go into details) Predictive Maintenance Predict if a component will fail before failure based on sensor data. Example applications include predicting failure and remaining useful life (RUL) of automotive fleets, manufacturing equipment, and IoT sensors. The key value is increased vehicle and equipment up-time and cost savings. This use case is widely used in automotive and manufacturing industries. Industries: Automotive, Manufacturing Georgia Pacific uses SageMaker to detect machine issues early. To learn more, read the case study.   Demand Forecasting Use historical data to forecast key demand metrics faster and make more accurate business decisions around production, pricing, inventory management, and purchasing/re-stocking. The key value is meeting customer demand, reducing inventory carrying costs by reducing surplus inventory, and reducing waste. This use case is used mainly in financial services, manufacturing, retail, and consumer packaged goods (CPG) industries. Industries: Financial Services (FSI), Manufacturing, Retail, Consumer Packaged Goods (CPG) Advanced Microgrid Solutions has built an ML model with SageMaker to forecast energy prices in near real time. Watch the re:Invent session. Fraud Detection Automate the detection of potentially fraudulent activity and flag it for review. The key value is reducing costs associated with fraud and maintaining customer trust. This use case is used mainly in financial services and online retail industries. Industries: FSI, Retail Euler Hermes uses SageMaker to catch suspicious domains. Learn more from the blog post. Credit Risk Prediction Explain individual predictions from a credit application to predict whether the credit will be paid back or not (often called a credit default). The key value is identifying bias and satisfying regulatory requirements. This use case is used mainly in financial services and online retail industries. Industries: FSI We have a Explaining Credit Decisions customized solution using SageMaker that can be used to explain individual predictions from machine learning models, including applications for credit decisions, churn prediction, medical diagnosis, and fraud detection. Extract & Analyze Data from Documents Understand text in written and digital documents and forms, extract information, and use it to classify items and make decisions. Industries: Healthcare, FSI, Legal, M&E, Education Computer Vision (image analysis) Main sub-use cases are: 1) Automatically medical diagnosis from X-ray and other imaging data; 2) Manufacturing quality control automation to detect defective parts; 3) Drug discovery; 4) Social distancing and tracking concentration of people for COVID-19 in public places. Industries: Healthcare/Pharma, Manufacturing, Public Sector Autonomous Driving Reinforcement learning and object detection algorithms. Industries: Automotive Personalized Recommendations Make personalized recommendations based on historical trends. Industries: M&E, Retail, Education (most likely classes to ensure graduation) Churn Prediction Predict customer likelihood to churn. Industries: Retail, Education, Software & Internet (SaaS)
  • #40: Biases are imbalances in the training data or the prediction behavior of the model across different groups, such as age or income bracket. Biases can result from the data or algorithm used to train your model. For instance, if an ML model is trained primarily on data from middle-aged individuals, it may be less accurate when making predictions involving younger and older people. ----- Regulatory Compliance Regulations may require companies to be able to explain financial decisions and take steps around model risk management. Amazon SageMaker Clarify can help flag any potential bias present in the initial data or in the model after training and can also help explain which model features contributed the most to an ML model’s prediction. Internal Reporting & Compliance Data science teams are often required to justify or explain ML models to internal stakeholders, such as internal auditors or executives. Amazon SageMaker Clarify can provide data science teams with a graph of feature importance when requested and can help quantify potential bias in an ML model or the data used to train it in order to provide additional information needed to support internal requirements. Customer Service Customer-facing employees, such as financial advisors or loan officers, may review a prediction made by an ML model as part of the course of their work. Working with the data science team, these employees can get a visual report via API directly from Amazon SageMaker Clarify with details on which features were most important to a given prediction in order to review it before making decisions that may impact customers. ----- Ignore notes below (I will use description at https://aws.amazon.com/sagemaker/clarify) ML models, especially those that make predictions which serve end customers, are at risk of being biased and producing incorrect or harmful outcomes if proper precautions are not taken, making the ability to detect bias across the ML lifecycle critical. Let’s look at some of the ways bias may become present in a model: The initial data set you use for your model might contain imbalances, such as not having enough examples of members of a certain class which then cause the model to become biased against that class. Your model might develop biased behavior during the training process. For example, the model might use bank location as a positive indicator to approve a loan as opposed to actual financial data if one particular bank location approves more loans than others. In this case, the model has bias towards applicants that apply at that location and against applicants that do not, regardless of their financial standing. Finally, bias may develop over time if data in the real world begins to diverge from the data used to train your deployed model. For example, if your model has been trained on an outdated set of mortgage rates it may start to become biased against certain home loan applicants. But to understand WHY bias is present, we need explainability. And explainability is useful for more than just bias. Let me explain. Many regulators need to understand why the ML model made a given prediction and whether the prediction was free from bias, both in training and at inference You may need to provide explanations to internal teams (loan officers, customer service reps, compliance officers) in addition to end users / customers. For example a loan officer may need help explaining to a customer what factors caused their application to be denied. 
Finally, data science teams can improve models given a deeper understanding on whether a model is making the right inferences for the right reasons, or if perhaps irrelevant data points are being used that are altering model behavior.
  • #41: Detect bias in your data and model Identify imbalances in data SageMaker Clarify is integrated with Amazon SageMaker Data Wrangler, making it easier to identify bias during data preparation. You specify attributes of interest, such as gender or age, and SageMaker Clarify runs a set of algorithms to detect any presence of bias in those attributes. After the algorithm runs, SageMaker Clarify provides a visual report with a description of the sources and measurements of possible bias so that you can identify steps to remediate the bias. For example, in a financial dataset that contains only a few examples of business loans to one age group as compared to others, SageMaker will flag the imbalance so that you can avoid a model that disfavors that age group. Check your trained model for bias You can also check your trained model for bias, such as predictions that produce a negative result more frequently for one group than they do for another. SageMaker Clarify is integrated with SageMaker Experiments so that after a model has been trained, you can identify attributes you would like to check for bias, such as age. SageMaker runs a set of algorithms to check the trained model and provides you with a visual report that identifies the different types of bias for each attribute, such as whether older groups receive more positive predictions compared to younger groups. Monitor your model for bias Although your initial data or model may not have been biased, changes in the world may introduce bias to a model that has already been trained. For example, a substantial change in home buyer demographics could cause a home loan application model to become biased if certain groups were not present or accurately represented in the original training data. SageMaker Clarify is integrated with SageMaker Model Monitor, enabling you to configure alerting systems like Amazon CloudWatch to notify you if your model exceeds certain bias metric thresholds.  Explain model behavior Understand your model Trained models may consider some model inputs more strongly than others when generating predictions. For example, a loan application model may weigh credit history more heavily than other factors. SageMaker Clarify is integrated with SageMaker Experiments to provide a graph detailing which features contributed most to your model’s overall prediction-making process after the model has been trained. These details may be useful for compliance requirements or can help determine if a particular model input has more influence than it should on overall model behavior. Explain individual model predictions Customers and internal stakeholders both want transparency into how models make their predictions. SageMaker Clarify integrates with SageMaker Experiments to show you the importance of each model input for a specific prediction. Results can be made available to customer-facing employees so that they have an understanding of the model’s behavior when making decisions based on model predictions. Monitor your model for changes in behavior Changes in real-world data can cause your model to give different weights to model inputs, changing its behavior over time. For example, a decline in home prices could cause a model to weigh income less heavily when making loan predictions. Amazon SageMaker Clarify is integrated with SageMaker Model Monitor to alert you if the importance of model inputs shift, causing model behavior to change.
  • #42: Amazon SageMaker Clarify works across the entire ML workflow to implement bias detection and explainability. - It can look for bias in your initial dataset as part of SageMaker Data Wrangler - It can check for bias in your trained model as part of SageMaker Experiments, and also explain the behavior of your model overall - It extends SageMaker Model Monitor to check for changes in bias or explainability over time in your deployed model - It can provide explanations for individual inferences made by your deployed model
  • #43: So to recap: - You can check your data for bias during data prep - You can check for bias in your trained model, and explain overall model behavior - You can provide explanations for individual predictions made by your deployed model - You can monitor and alert on any changes to model bias or behavior over time
  • #44: Let’s take a quick product tour SageMaker Clarify is integrated with Amazon SageMaker Data Wrangler, making it easier to identify bias during data preparation. You specify attributes of interest, such as gender or age, and SageMaker Clarify runs a set of algorithms to detect any presence of bias in those attributes. After the algorithm runs, SageMaker Clarify provides a visual report with a description of the sources and measurements of possible bias so that you can identify steps to remediate the bias. For example, in a financial dataset that contains only a few examples of business loans to one age group as compared to others, SageMaker will flag the imbalance so that you can avoid a model that disfavors that age group.
  • #45: You can also check your trained model for bias, such as predictions that produce a negative result more frequently for one group than they do for another. SageMaker Clarify is integrated with SageMaker Experiments so that after a model has been trained, you can identify attributes you would like to check for bias, such as age. SageMaker runs a set of algorithms to check the trained model and provides you with a visual report that identifies the different types of bias for each attribute, such as whether older groups receive more positive predictions compared to younger groups.
  • #46: Although your initial data or model may not have been biased, changes in the world may introduce bias to a model that has already been trained. For example, a substantial change in home buyer demographics could cause a home loan application model to become biased if certain groups were not present or accurately represented in the original training data. SageMaker Clarify is integrated with SageMaker Model Monitor, enabling you to configure alerting systems like Amazon CloudWatch to notify you if your model exceeds certain bias metric thresholds. 
  • #47: Trained models may consider some model inputs more strongly than others when generating predictions. For example, a loan application model may weigh credit history more heavily than other factors. SageMaker Clarify is integrated with SageMaker Experiments to provide a graph detailing which features contributed most to your model’s overall prediction-making process after the model has been trained. These details may be useful for compliance requirements or can help determine if a particular model input has more influence than it should on overall model behavior.
  • #48: Changes in real-world data can cause your model to give different weights to model inputs, changing its behavior over time. For example, a decline in home prices could cause a model to weigh income less heavily when making loan predictions. Amazon SageMaker Clarify is integrated with SageMaker Model Monitor to alert you if the importance of model inputs shift, causing model behavior to change.
  • #49: This concludes the overview of Amazon SageMaker Clarify, with a focus on the explainability functionality. Please refer to the SageMaker Clarify webpage and the AWS blog post for additional information, including best practices for evaluating fairness and explainability in the ML lifecycle. Next, we will present the demo of SageMaker Clarify. Demo: https://youtu.be/cQo2ew0DQw0
  • #50: Regulatory Compliance Regulations may require companies to be able to explain financial decisions and take steps around model risk management. Amazon SageMaker Clarify can help flag any potential bias present in the initial data or in the model after training and can also help explain which model features contributed the most to an ML model’s prediction. Internal Reporting & Compliance Data science teams are often required to justify or explain ML models to internal stakeholders, such as internal auditors or executives. Amazon SageMaker Clarify can provide data science teams with a graph of feature importance when requested and can help quantify potential bias in an ML model or the data used to train it in order to provide additional information needed to support internal requirements. Customer Service Customer-facing employees, such as financial advisors or loan officers, may review a prediction made by an ML model as part of the course of their work. Working with the data science team, these employees can get a visual report via API directly from Amazon SageMaker Clarify with details on which features were most important to a given prediction in order to review it before making decisions that may impact customers. ----- Ignore below: As we mentioned earlier, there are a few use cases where bias detection and explainability are key – this is by no means an exhaustive list: Compliance: Regulations often require companies to remain unbiased and to be able to explain financial decisions. Internal Reporting: Data science teams are often required to justify or explain ML models to internal stakeholders, such as internal auditors or executives who would like more transparency. Operational Excellence: ML is often applied in operational scenarios, such as predictive maintenance and application users may want insight into why a given machine needs to be repaired. Customer Service: Customer-facing employees such as healthcare workers, financial advisors, or loan officers often need to field questions around the result of a decision made by an ML model, such as a denied loan.
  • #51: Here are some best practices for evaluating fairness and explainability in the ML lifecycle. Fairness and explainability should be taken into account during each stage of the ML lifecycle, for example, Problem Formation, Dataset Construction, Algorithm Selection, Model Training Process, Testing Process, Deployment, and Monitoring/Feedback. It is important to have the right tools to do this analysis. To encourage engaging with these considerations, here are a few example questions worth asking during each of these stages. Fairness as a Process: We recognize that the notions of bias and fairness are highly application dependent and that the choice of the attribute(s) for which bias is to be measured, as well as the choice of the bias metrics, may need to be guided by social, legal, and other non-technical considerations. Building consensus and achieving collaboration across key stakeholders (such as product, policy, legal, engineering, and AI/ML teams, as well as end users and communities) is a prerequisite for the successful adoption of fairness-aware ML approaches in practice. Wide spectrum of customers, ranging from highly sophisticated (e.g., data scientists / applied ML researchers who would like granular options) to less technical (e.g., visualize the bias measures)
  • #52: Here are some best practices for evaluating fairness and explainability in the ML lifecycle. Fairness and explainability should be taken into account during each stage of the ML lifecycle, for example, Problem Formation, Dataset Construction, Algorithm Selection, Model Training Process, Testing Process, Deployment, and Monitoring/Feedback. It is important to have the right tools to do this analysis. To encourage engaging with these considerations, here are a few example questions worth asking during each of these stages. Fairness as a Process: We recognize that the notions of bias and fairness are highly application dependent and that the choice of the attribute(s) for which bias is to be measured, as well as the choice of the bias metrics, may need to be guided by social, legal, and other non-technical considerations. Building consensus and achieving collaboration across key stakeholders (such as product, policy, legal, engineering, and AI/ML teams, as well as end users and communities) is a prerequisite for the successful adoption of fairness-aware ML approaches in practice.
  • #53: Here are some best practices for evaluating fairness and explainability in the ML lifecycle. Fairness and explainability should be taken into account during each stage of the ML lifecycle, for example, Problem Formation, Dataset Construction, Algorithm Selection, Model Training Process, Testing Process, Deployment, and Monitoring/Feedback. It is important to have the right tools to do this analysis. To encourage engaging with these considerations, here are a few example questions worth asking during each of these stages. Fairness as a Process: We recognize that the notions of bias and fairness are highly application dependent and that the choice of the attribute(s) for which bias is to be measured, as well as the choice of the bias metrics, may need to be guided by social, legal, and other non-technical considerations. Building consensus and achieving collaboration across key stakeholders (such as product, policy, legal, engineering, and AI/ML teams, as well as end users and communities) is a prerequisite for the successful adoption of fairness-aware ML approaches in practice.
  • #54: This concludes the overview of Amazon SageMaker Clarify, with a focus on the explainability functionality. Please refer to the SageMaker Clarify webpage and the AWS blog post for additional information, including best practices for evaluating fairness and explainability in the ML lifecycle. Thank you for listening to this talk.
  • #55: Ben
  • #56: Cf. accessibility Cf. urban design
  • #59: Microsoft’s AETHER (AI & Ethics in Engineering & Research) Advisory Board Cross-industry initiatives such as Partnership on AI It would be desirable to create an internal advisory board within each tech company on AI and ethics (in case one doesn’t already exist), consisting of senior leaders from different business lines and different functional roles (AI & Engineering, Product, Legal, etc.) across the company (similar to Microsoft’s AI and Ethics in Engineering and Research (AETHER) advisory board). By forming working groups consisting of computer scientists and engineers, social scientists/ethicists, policy experts, lawyers, and product leaders, this board could be tasked with designing AI ethics guidelines, best practices, and tools for the company as a whole, so that all AI efforts across a company are aligned with the company’s responsible AI principles.
  • #61: Examples of complex failures: failure to deflect/terminate contentious topics; refusing to discuss when disapproval would be better; polite agreement with unrecognized bias. Differentiate between bot input and bot output in training data: remove offensive text from bot output training, but don't remove it from bot inputs, to allow learning of good responses to bad inputs. End-to-end system to support fairness.
  • #64: (1) Differentially private model training, meeting practical requirements [model updates/evolution over time; at par on accuracy with the vanilla model]; (2) membership inference attacks on ML models, which could be used towards quantifying leakage from a specific model; privacy-preserving mechanisms for different teams within a company to develop joint models on highly sensitive datasets.
  • #65: Lessons from privacy & fairness challenges: need a “Fairness, Explainability, and Privacy by Design” approach when building AI products. Collaboration/consensus across key stakeholders (product, engineering, AI, PR, legal, social scientists, policy experts, ..., end users / customers). NYT/WSJ/ProPublica test :)