Not Fair! Testing AI Bias and
Organizational Values
Peter Varhol and Gerie Owen
Peter Varhol
• International speaker and writer
• Graduate degrees in Math, CS, Psychology
• Technology communicator
• AWS certified
• Former university professor, tech journalist
• Cat owner and distance runner
• peter@petervarhol.com
Gerie Owen
• Quality Engineering Architect
• Testing Strategist & Evangelist
• Test Manager
• Subject expert on testing for
TechTarget’s
SearchSoftwareQuality.com
• International and Domestic
Conference Presenter
Gerie.owen@gerieowen.com
What You Will Learn
• Why bias is often an outcome of machine learning.
• How bias that reflects organizational values can be a desirable result.
• How to test bias against organizational values.
Agenda
• What is bias in AI?
• How does it happen?
• Is bias ever good?
• Building in bias intentionally
• Bias in data
• Summary
Bug vs. Bias
• A bug is an identifiable and measurable error in process or result
• Usually fixed with a code change
• A bias is a systematic skew in decisions that produces results
inconsistent with reality
• Bias can’t be fixed with a code change
How Does This Happen?
• The problem domain is ambiguous
• There is no single “right” answer
• “Close enough” can usually work
• As long as we can quantify “close enough”
• We don’t know quite why the software
responds as it does
• We can’t easily trace code paths
• We choose the data
• The software “learns” from past actions
How Can We Tell If It’s Biased?
• We look very carefully at the training data
• We set strict success criteria based on the system requirements
• We run many tests
• Most change parameters only slightly
• Some use radical inputs
• Compare results to success criteria
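A minimal sketch of that testing loop, assuming a scikit-learn-style model with a `predict` method; the perturbation sizes and tolerance are illustrative assumptions, not fixed rules:

```python
import numpy as np

# Sketch: probe a trained model with many slightly-varied inputs and a
# few radical ones, then compare outcomes to a success criterion.
def perturbation_test(model, baseline_input, tolerance):
    rng = np.random.default_rng(0)
    baseline = model.predict(baseline_input.reshape(1, -1))[0]
    failures = []
    # Most tests change parameters only slightly
    for _ in range(100):
        nudged = baseline_input + rng.normal(0.0, 0.01, size=baseline_input.shape)
        result = model.predict(nudged.reshape(1, -1))[0]
        if abs(result - baseline) > tolerance:        # small nudge, big swing
            failures.append(("slight", nudged, result))
    # Some tests use radical inputs to probe the edges of the domain
    for _ in range(10):
        extreme = baseline_input * rng.uniform(-10.0, 10.0)
        result = model.predict(extreme.reshape(1, -1))[0]
        if not np.isfinite(result):                   # output should still be sane
            failures.append(("radical", extreme, result))
    return failures  # an empty list means the success criteria held
```

An empty return is evidence, not proof: bias that lives in the training data can pass every perturbation test and still surface in production.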
Amazon Can’t Rid Its AI of Bias
• Amazon created an AI to crawl the web to find job candidates
• Training data was all resumes submitted for the last ten years
• In IT, the overwhelming majority were male
• The AI “learned” that males were superior for IT jobs
• Amazon couldn’t fix that training bias
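One way to surface this kind of training bias before trusting the model is to compare selection rates across groups. A minimal sketch, assuming pandas and illustrative column names; the 0.8 threshold follows the EEOC's four-fifths rule of thumb:

```python
import pandas as pd

# Sketch: measure disparate selection rates in screening results.
def selection_rate_ratio(results: pd.DataFrame) -> float:
    rates = results.groupby("gender")["selected"].mean()
    # Ratio of the lowest group's selection rate to the highest;
    # below 0.8 is a conventional red flag for adverse impact.
    return rates.min() / rates.max()

screened = pd.DataFrame({
    "gender":   ["M", "M", "M", "F", "F", "F"],
    "selected": [1,    1,   0,   0,   0,   1],
})
print(selection_rate_ratio(screened))  # 0.5, well below 0.8
```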
Many Systems Use Objective Data
• Electric wind sensor
• Determines wind speed and direction
• Based on the cooling of four heated filaments
• Designed a three-layer neural network (sketched below)
• Then used the known data to train it
• Inputs: cooling, in degrees, of all four filaments
• Outputs: wind speed and direction
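A minimal sketch of a network like that one, using scikit-learn; the training arrays here are random placeholders standing in for the sensor's recorded readings:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Sketch: four filament cooling readings in, wind speed and direction out.
rng = np.random.default_rng(42)
cooling = rng.uniform(0.0, 15.0, size=(500, 4))  # degrees of cooling, 4 filaments
wind = rng.uniform(0.0, 30.0, size=(500, 2))     # [speed, direction] placeholders

# One hidden layer between input and output gives the three layers
net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(cooling, wind)
print(net.predict(cooling[:1]))                  # predicted [speed, direction]
```

If the placeholder arrays were replaced by readings taken under a single set of ambient conditions, the network would quietly inherit exactly the bias the next slide describes.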
Can This Possibly Be Biased?
• Well, yes
• The training data could have been recorded under a single set of
temperature, sunlight, and humidity conditions
• Which could skew results under other conditions
• It’s a possible bias that doesn’t hurt anyone
• Or does it?
• Does anyone remember a certain O-ring?
Where Do Biases Come From?
• Data selection
• We choose training data that represents only one segment of the domain
• We limit our training data to certain times or seasons
• We overrepresent one population
• Or
• The problem domain has subtly changed
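A sketch of a representativeness check against these data-selection pitfalls; the categories and population shares are illustrative:

```python
from collections import Counter

# Sketch: compare the make-up of the training sample against the known
# make-up of the problem domain.
def representation_gaps(training_labels, population_shares):
    counts = Counter(training_labels)
    total = sum(counts.values())
    return {category: counts.get(category, 0) / total - expected
            for category, expected in population_shares.items()}

# Training data limited to one season overrepresents it badly
print(representation_gaps(
    ["summer"] * 900 + ["winter"] * 100,
    {"summer": 0.5, "winter": 0.5},
))  # summer ~ +0.4, winter ~ -0.4: winter is badly underrepresented
```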
Where Do Biases Come From?
• Latent bias
• Concepts become incorrectly correlated
• Correlation does not mean causation
• But the correlation is strong enough to be believed
• We could be promoting stereotypes
• This describes Amazon’s problem
Where Do Biases Come From?
• Interaction bias
• We may focus on keywords that users apply incorrectly
• Users incorporate slang or unusual words
• “That’s bad, man”
• The story of Microsoft Tay
• It wasn’t bad, it was trained that way
Why Does Bias Matter?
• Wrong answers
• Often with no recourse
• Subtle discrimination (legal or illegal)
• And no one knows it
• Suboptimal results
• We’re not getting it right often enough
It’s Not Just AI
• All software has biases
• It’s written by people
• People make decisions on how to design and implement
• Bias is inevitable
• But can we find it and correct it?
• Do we have to?
Like This One
• A London doctor can’t get into her fitness center locker room
• The fitness center uses a “smart card” to access and record services
• While acknowledging the problem
• The fitness center couldn’t fix it
• But the software development team could
• They had hard-coded “doctor” to be synonymous
with “male”
• It was meant as a convenient shortcut (reconstructed below)
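A hypothetical reconstruction of that shortcut, and the fix, to show how easily such a bias gets coded in:

```python
# Hypothetical reconstruction: job title used as a proxy for gender.
TITLE_TO_GENDER = {"doctor": "male", "nurse": "female"}  # the hard-coded bias

def locker_room_buggy(member):
    return TITLE_TO_GENDER.get(member["title"], member["gender"])

# The fix: never infer a protected attribute from an unrelated field.
def locker_room_fixed(member):
    return member["gender"]

doctor = {"title": "doctor", "gender": "female"}
print(locker_room_buggy(doctor))  # 'male', locked out of her own locker room
print(locker_room_fixed(doctor))  # 'female'
```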
About That Data
• We use data from the problem domain
• What’s that?
• In some cases, scientific measurements are accurate
• But we can choose the wrong measures
• Or not fully represent the problem domain
• But data can also be subjective
• We train with photos of one race over another
• We train with our own values of beauty
Is Bias Always Bad?
• Bias can result in suboptimal answers
• Answers that reflect the bias rather than rational thought
• But is that always a problem?
• It depends on how we measure our answers
• We may not want the most profitable answer
• Instead we want to reflect organizational values
• What are those values?
Examples of Organizational Values
• Committed to goals of equal hiring, pay, and promotion
• Will not deny credit based on location, race, or other irrelevant
factors
• Will keep the environment cleaner than we left it
• Net carbon neutral
• No pollutants released into the atmosphere
• We will delight our customers
Examples of Organizational Values
• These values don’t maximize profit at the expense of everything else
• They represent what we might stand for
• They are extremely difficult to train AI for
• Values tend to be nebulous
• Organizations don’t always practice them
• We don’t know how to measure them
• So we don’t know what data to use
• Are we achieving the desired results?
• How can we test this?
How Do We Design Systems With
These Goals in Mind?
• We need data
• But we don’t directly measure the goal
• Is there proxy data?
• Training the system
• Data must reflect goals
• That means we must know or suspect the data
is measuring the bias we want
Examples of Useful Data
• Customer satisfaction
• Survey data
• Complaints/resolution times
• Maintain a clean environment
• Emissions from operations/employee commute
• Recycling volume
• Equal opportunity
• Salary comparisons, hiring statistics
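A sketch of turning one of these values into a measurable proxy; the column names and figures are invented for illustration:

```python
import pandas as pd

# Sketch: a pay-equity ratio as a proxy metric for "equal opportunity".
salaries = pd.DataFrame({
    "group":  ["A", "A", "A", "B", "B", "B"],
    "salary": [95_000, 102_000, 98_000, 88_000, 91_000, 90_000],
})

def pay_equity_ratio(df: pd.DataFrame) -> float:
    medians = df.groupby("group")["salary"].median()
    return medians.min() / medians.max()   # 1.0 means parity

print(round(pay_equity_ratio(salaries), 3))  # 0.918: a gap worth investigating
```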
Sample Scenario
• “We delight our customers”
• AI apps make decisions on customer complaints
• Goal is to satisfy as many as possible
• Make it right if possible
• Train with
• Customer satisfaction survey results
• Objective assessment of customer interaction results
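A minimal sketch of training against that signal; the features and "delighted" labels are random placeholders for encoded complaint attributes and past survey outcomes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
features = rng.normal(size=(1000, 5))  # encoded complaint attributes (placeholder)
# Placeholder label standing in for satisfaction survey results
delighted = (features[:, 0] + rng.normal(size=1000) > 0).astype(int)

model = LogisticRegression().fit(features, delighted)
# Recommend "make it right" when the predicted chance of delight is low
needs_intervention = model.predict_proba(features[:5])[:, 1] < 0.5
print(needs_intervention)
```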
Testing the Bias
• Define hypotheses
• Map vague values to operational definitions
• Establish test scenarios
• Specify the exact results expected
• With means and standard deviations
• Test using training data
• Measure the results in terms of definitions
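A sketch of checking test results against an operational definition stated as a mean and standard deviation; the z-score limit is an assumed threshold:

```python
import numpy as np

# Sketch: do the observed results match the specified expectation?
def within_expectation(results, expected_mean, expected_std, z_limit=2.0):
    observed = np.mean(results)
    standard_error = expected_std / np.sqrt(len(results))
    z = (observed - expected_mean) / standard_error
    return abs(z) <= z_limit, z

# 200 test runs that average 0.82 against an expected mean of 0.80
runs = np.random.default_rng(2).normal(0.82, 0.05, 200)
ok, z = within_expectation(runs, expected_mean=0.80, expected_std=0.05)
print(ok, round(z, 2))  # False: the shift is small but systematic
```

The deliberate failure is the point: a 0.02 shift looks harmless, but a consistent shift across many runs is exactly what a bias looks like.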
Testing the Bias
• Compare test results to the data
• That data measures your organizational values
• Is there a consistent match?
• A consistent match means that the AI is accurately reflecting organizational
values
• Does it meet the goals set forth at the beginning of the project?
• Are ML recommendations reflecting values?
• If not, it’s time to go back to the drawing board
• Better operational definitions
• New data
Finally
• Test using real-life data
• Put the application into production
• Confirm results in practice
• At first, side by side with human decision-makers
• Validate the recommendations with people
• Compare recommendations with results
• Yes/no: does the software reflect our values?
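A sketch of that side-by-side phase, assuming a simple log of paired decisions:

```python
# Sketch: run the model in shadow mode beside human decision-makers
# and track how often they agree.
def agreement_rate(paired_decisions):
    matches = sum(1 for ai, human in paired_decisions if ai == human)
    return matches / len(paired_decisions)

log = [("refund", "refund"), ("deny", "refund"),
       ("refund", "refund"), ("replace", "replace")]
print(agreement_rate(log))  # 0.75: review the disagreements before going live
```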
Back to Bias
• Bias isn’t necessarily bad in ML/AI
• But we need to understand it
• And make sure it reflects our goals
• Testers need to understand organizational values
• And how they represent bias
• And how to incorporate that bias into ML/AI apps
Summary
• Machine learning/AI apps can be designed to reflect organizational
values
• That may not result in the best decision from a strict business standpoint
• Know your organizational values
• And be committed to maintaining them
• Test to the data that represents the values
• As well as the written values themselves
• Draw conclusions about the decisions being made
Thank You
• Peter Varhol
peter@petervarhol.com
• Gerie Owen
gerie@gerieowen.com
