Not Fair! Testing AI Bias and
Organizational Values
Peter Varhol and Gerie Owen
Peter Varhol
• International speaker and writer
• Graduate degrees in Math, CS, Psychology
• Technology communicator
• AWS certified
• Former university professor, tech journalist
• Cat owner and distance runner
• peter@petervarhol.com
Gerie Owen
• Quality Engineering Architect
• Testing Strategist & Evangelist
• Test Manager
• Subject expert on testing for
TechTarget’s
SearchSoftwareQuality.com
• International and Domestic
Conference Presenter
Gerie.owen@gerieowen.com
What You Will Learn
• Why bias is often an outcome of machine learning.
• How bias that reflects organizational values can be a desirable result.
• How to test bias against organizational values.
Agenda
• What is bias in AI?
• How does it happen?
• Is bias ever good?
• Building in bias intentionally
• Bias in data
• Summary
Bug vs. Bias
• A bug is an identifiable and measurable error in process or result
• Usually fixed with a code change
• A bias is a systematic skew in decisions that produces results
inconsistent with reality
• Bias can’t be fixed with a code change
How Does This Happen?
• The problem domain is ambiguous
• There is no single “right” answer
• “Close enough” can usually work
• As long as we can quantify “close enough”
• We don’t know quite why the software
responds as it does
• We can’t easily trace code paths
• We choose the data
• The software “learns” from past actions
How Can We Tell If It’s Biased?
• We look very carefully at the training data
• We set strict success criteria based on the system requirements
• We run many tests
• Most change parameters only slightly
• Some use radical inputs
• Compare results to success criteria
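A minimal sketch of that testing loop, assuming a scikit-learn-style model with a `predict` method; the perturbation sizes and tolerance are illustrative assumptions, not fixed rules:

```python
import numpy as np

# Sketch: probe a trained model with many slightly-varied inputs and a
# few radical ones, then compare outcomes to a success criterion.
def perturbation_test(model, baseline_input, tolerance):
    rng = np.random.default_rng(0)
    baseline = model.predict(baseline_input.reshape(1, -1))[0]
    failures = []
    # Most tests change parameters only slightly
    for _ in range(100):
        nudged = baseline_input + rng.normal(0.0, 0.01, size=baseline_input.shape)
        result = model.predict(nudged.reshape(1, -1))[0]
        if abs(result - baseline) > tolerance:        # small nudge, big swing
            failures.append(("slight", nudged, result))
    # Some tests use radical inputs to probe the edges of the domain
    for _ in range(10):
        extreme = baseline_input * rng.uniform(-10.0, 10.0)
        result = model.predict(extreme.reshape(1, -1))[0]
        if not np.isfinite(result):                   # output should still be sane
            failures.append(("radical", extreme, result))
    return failures  # an empty list means the success criteria held
```

An empty return is evidence, not proof: bias that lives in the training data can pass every perturbation test and still surface in production.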
Amazon Can’t Rid Its AI of Bias
• Amazon created an AI to crawl the web to find job candidates
• Training data was all resumes submitted for the last ten years
• In IT, the overwhelming majority were male
• The AI “learned” that males were superior for IT jobs
• Amazon couldn’t fix that training bias
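One way to surface this kind of training bias before trusting the model is to compare selection rates across groups. A minimal sketch, assuming pandas and illustrative column names; the 0.8 threshold follows the EEOC's four-fifths rule of thumb:

```python
import pandas as pd

# Sketch: measure disparate selection rates in screening results.
def selection_rate_ratio(results: pd.DataFrame) -> float:
    rates = results.groupby("gender")["selected"].mean()
    # Ratio of the lowest group's selection rate to the highest;
    # below 0.8 is a conventional red flag for adverse impact.
    return rates.min() / rates.max()

screened = pd.DataFrame({
    "gender":   ["M", "M", "M", "F", "F", "F"],
    "selected": [1,    1,   0,   0,   0,   1],
})
print(selection_rate_ratio(screened))  # 0.5, well below 0.8
```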
Many Systems Use Objective Data
• Electric wind sensor
• Determines wind speed and direction
• Based on the cooling of four heated filaments
• Designed a three-layer neural network (sketched below)
• Then used the known data to train it
• Inputs: cooling, in degrees, of all four filaments
• Outputs: wind speed and direction
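A minimal sketch of a network like that one, using scikit-learn; the training arrays here are random placeholders standing in for the sensor's recorded readings:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Sketch: four filament cooling readings in, wind speed and direction out.
rng = np.random.default_rng(42)
cooling = rng.uniform(0.0, 15.0, size=(500, 4))  # degrees of cooling, 4 filaments
wind = rng.uniform(0.0, 30.0, size=(500, 2))     # [speed, direction] placeholders

# One hidden layer between input and output gives the three layers
net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(cooling, wind)
print(net.predict(cooling[:1]))                  # predicted [speed, direction]
```

If the placeholder arrays were replaced by readings taken under a single set of ambient conditions, the network would quietly inherit exactly the bias the next slide describes.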
Can This Possibly Be Biased?
• Well, yes
• The training data could have been recorded under a single set of
temperature, sunlight, and humidity conditions
• Which could skew results under other conditions
• It’s a possible bias that doesn’t hurt anyone
• Or does it?
• Does anyone remember a certain O-ring?
Where Do Biases Come From?
• Data selection
• We choose training data that represents only one segment of the domain
• We limit our training data to certain times or seasons
• We overrepresent one population
• Or
• The problem domain has subtly changed
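A sketch of a representativeness check against these data-selection pitfalls; the categories and population shares are illustrative:

```python
from collections import Counter

# Sketch: compare the make-up of the training sample against the known
# make-up of the problem domain.
def representation_gaps(training_labels, population_shares):
    counts = Counter(training_labels)
    total = sum(counts.values())
    return {category: counts.get(category, 0) / total - expected
            for category, expected in population_shares.items()}

# Training data limited to one season overrepresents it badly
print(representation_gaps(
    ["summer"] * 900 + ["winter"] * 100,
    {"summer": 0.5, "winter": 0.5},
))  # summer ~ +0.4, winter ~ -0.4: winter is badly underrepresented
```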
Where Do Biases Come From?
• Latent bias
• Concepts become incorrectly correlated
• Correlation does not mean causation
• But the correlation is strong enough to be believed
• We could be promoting stereotypes
• This describes Amazon’s problem
Where Do Biases Come From?
• Interaction bias
• We may focus on keywords that users apply incorrectly
• Users incorporate slang or unusual words
• “That’s bad, man”
• The story of Microsoft Tay
• It wasn’t bad, it was trained that way
Why Does Bias Matter?
• Wrong answers
• Often with no recourse
• Subtle discrimination (legal or illegal)
• And no one knows it
• Suboptimal results
• We’re not getting it right often enough
It’s Not Just AI
• All software has biases
• It’s written by people
• People make decisions on how to design and implement
• Bias is inevitable
• But can we find it and correct it?
• Do we have to?
Like This One
• A London doctor can’t get into her fitness center locker room
• The fitness center uses a “smart card” to access and record services
• While acknowledging the problem
• The fitness center couldn’t fix it
• But the software development team could
• They had hard-coded “doctor” to be synonymous
with “male”
• It was meant as a convenient shortcut (reconstructed below)
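A hypothetical reconstruction of that shortcut, and the fix, to show how easily such a bias gets coded in:

```python
# Hypothetical reconstruction: job title used as a proxy for gender.
TITLE_TO_GENDER = {"doctor": "male", "nurse": "female"}  # the hard-coded bias

def locker_room_buggy(member):
    return TITLE_TO_GENDER.get(member["title"], member["gender"])

# The fix: never infer a protected attribute from an unrelated field.
def locker_room_fixed(member):
    return member["gender"]

doctor = {"title": "doctor", "gender": "female"}
print(locker_room_buggy(doctor))  # 'male', locked out of her own locker room
print(locker_room_fixed(doctor))  # 'female'
```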
About That Data
• We use data from the problem domain
• What’s that?
• In some cases, scientific measurements are accurate
• But we can choose the wrong measures
• Or not fully represent the problem domain
• But data can also be subjective
• We train with photos of one race over another
• We train with our own values of beauty
Is Bias Always Bad?
• Bias can result in suboptimal answers
• Answers that reflect the bias rather than rational thought
• But is that always a problem?
• It depends on how we measure our answers
• We may not want the most profitable answer
• Instead we want to reflect organizational values
• What are those values?
Examples of Organizational Values
• Committed to goals of equal hiring, pay, and promotion
• Will not deny credit based on location, race, or other irrelevant
factors
• Will keep the environment cleaner than we left it
• Net carbon neutral
• No pollutants released into the atmosphere
• We will delight our customers
Examples of Organizational Values
• These values don’t maximize profit at the expense of everything else
• They represent what we might stand for
• They are extremely difficult to train AI for
• Values tend to be nebulous
• Organizations don’t always practice them
• We don’t know how to measure them
• So we don’t know what data to use
• Are we achieving the desired results?
• How can we test this?
How Do We Design Systems With
These Goals in Mind?
• We need data
• But we don’t directly measure the goal
• Is there proxy data?
• Training the system
• Data must reflect goals
• That means we must know or suspect the data
is measuring the bias we want
Examples of Useful Data
• Customer satisfaction
• Survey data
• Complaints/resolution times
• Maintain a clean environment
• Emissions from operations/employee commute
• Recycling volume
• Equal opportunity
• Salary comparisons, hiring statistics
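A sketch of turning one of these values into a measurable proxy; the column names and figures are invented for illustration:

```python
import pandas as pd

# Sketch: a pay-equity ratio as a proxy metric for "equal opportunity".
salaries = pd.DataFrame({
    "group":  ["A", "A", "A", "B", "B", "B"],
    "salary": [95_000, 102_000, 98_000, 88_000, 91_000, 90_000],
})

def pay_equity_ratio(df: pd.DataFrame) -> float:
    medians = df.groupby("group")["salary"].median()
    return medians.min() / medians.max()   # 1.0 means parity

print(round(pay_equity_ratio(salaries), 3))  # 0.918: a gap worth investigating
```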
Sample Scenario
• “We delight our customers”
• AI apps make decisions on customer complaints
• Goal is to satisfy as many as possible
• Make it right if possible
• Train with
• Customer satisfaction survey results
• Objective assessment of customer interaction results
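A minimal sketch of training against that signal; the features and "delighted" labels are random placeholders for encoded complaint attributes and past survey outcomes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
features = rng.normal(size=(1000, 5))  # encoded complaint attributes (placeholder)
# Placeholder label standing in for satisfaction survey results
delighted = (features[:, 0] + rng.normal(size=1000) > 0).astype(int)

model = LogisticRegression().fit(features, delighted)
# Recommend "make it right" when the predicted chance of delight is low
needs_intervention = model.predict_proba(features[:5])[:, 1] < 0.5
print(needs_intervention)
```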
Testing the Bias
• Define hypotheses
• Map vague values to operational definitions
• Establish test scenarios
• Specify the exact results expected
• With means and standard deviations
• Test using training data
• Measure the results in terms of definitions
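A sketch of checking test results against an operational definition stated as a mean and standard deviation; the z-score limit is an assumed threshold:

```python
import numpy as np

# Sketch: do the observed results match the specified expectation?
def within_expectation(results, expected_mean, expected_std, z_limit=2.0):
    observed = np.mean(results)
    standard_error = expected_std / np.sqrt(len(results))
    z = (observed - expected_mean) / standard_error
    return abs(z) <= z_limit, z

# 200 test runs that average 0.82 against an expected mean of 0.80
runs = np.random.default_rng(2).normal(0.82, 0.05, 200)
ok, z = within_expectation(runs, expected_mean=0.80, expected_std=0.05)
print(ok, round(z, 2))  # False: the shift is small but systematic
```

The deliberate failure is the point: a 0.02 shift looks harmless, but a consistent shift across many runs is exactly what a bias looks like.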
Testing the Bias
• Compare test results to the data
• That data measures your organizational values
• Is there a consistent match?
• A consistent match means that the AI is accurately reflecting organizational
values
• Does it meet the goals set forth at the beginning of the project?
• Are ML recommendations reflecting values?
• If not, it’s time to go back to the drawing board
• Better operational definitions
• New data
Finally
• Test using real-life data
• Put the application into production
• Confirm results in practice
• At first, side by side with human decision-makers
• Validate the recommendations with people
• Compare recommendations with results
• Yes/no: does the software reflect our values?
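A sketch of that side-by-side phase, assuming a simple log of paired decisions:

```python
# Sketch: run the model in shadow mode beside human decision-makers
# and track how often they agree.
def agreement_rate(paired_decisions):
    matches = sum(1 for ai, human in paired_decisions if ai == human)
    return matches / len(paired_decisions)

log = [("refund", "refund"), ("deny", "refund"),
       ("refund", "refund"), ("replace", "replace")]
print(agreement_rate(log))  # 0.75: review the disagreements before going live
```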
Back to Bias
• Bias isn’t necessarily bad in ML/AI
• But we need to understand it
• And make sure it reflects our goals
• Testers need to understand organizational values
• And how they represent bias
• And how to incorporate that bias into ML/AI apps
Summary
• Machine learning/AI apps can be designed to reflect organizational
values
• That may not result in the best decision from a strict business standpoint
• Know your organizational values
• And be committed to maintaining them
• Test to the data that represents the values
• As well as the written values themselves
• Draw conclusions about the decisions being made
Thank You
• Peter Varhol
peter@petervarhol.com
• Gerie Owen
gerie@gerieowen.com
