Ethics for Artificial Intelligent Beings.
AI Ethics Workshop, 25th of June 2018, Bonn, Germany.
Dr. Kim Kyllesbech Larsen, Deutsche Telekom.
Do we need to care about it?
2
Dr. Kim K. Larsen / How do we Humans feel about AI?
What about Us?
HOW DO YOU FEEL ABOUT A.I.?
Negative 16%, Neutral 36%, Positive 48%
20% of respondents are enthusiastic about AI.
15% are uncomfortable or scared about AI.
Millennials are significantly more negative towards AI.
SurveyMonkey “Artificial Intelligence & Human Decision Making Sentiment Survey “ (November 2017); 467 responses.
(Note: this data does not include Millennial statistics)
DO YOU BELIEVE THAT AI COULD
IMPACT YOU OR YOUR FRIENDS' JOBS?
Do you believe your job could be replaced by an AI? Yes 47%, No 53%.
Thinking of your friends, do you believe their jobs could be replaced by an AI? Yes 20%, No 80%.
SurveyMonkey “Millennials – Digital Challenges & Human Answers Survey “ (March 2018), average age 28.8 years.
How do you think AI’s will impact your child’s
or children’s future in terms of job & income?
SurveyMonkey “Artificial Intelligence & Human Decision Making Sentiment Survey “ (November 2017); 467 responses.
Men with Children above 18 yrs of Age: Worse 20%, Same as Today 16%, Better 64%.
Men with Children under 18 yrs of Age: Worse 31%, Same as Today 23%, Better 46%.
Note: The gender bias on this slide is completely intended.
A Tutorial to AI Ethics - Fairness, Bias & Perception
A
DIE
LIVE
• AN AI-BASED AUTONOMOUS CAR SLIPS ON AN ICY MOUNTAIN ROAD OVER TO THE OPPOSITE SIDE OF THE ROAD, WHERE A NORMAL CAR IS APPROACHING.
• THE AI IS PROGRAMMED TO NOT
DELIBERATELY CAUSE INNOCENT
BYSTANDERS HARM.
• THE AI CONTINUES THE CAR'S DIRECTION 200 METERS DOWN THE ROCKY VALLEY. A FAMILY OF 4 PERISHES.
• "SAVING" THE 1 PASSENGER OF THE APPROACHING CAR.
B
DIE
LIVE
• AN AI-BASED AUTONOMOUS CAR SLIPS ON AN ICY MOUNTAIN ROAD OVER TO THE OPPOSITE SIDE OF THE ROAD, WHERE A NORMAL CAR IS APPROACHING.
• THE AI IS PROGRAMMED TO SAVE ITS
PASSENGERS AT ALL COST FROM MORTAL
DANGER IF THE LIKELIHOOD OF SUCCESS IS
HIGHER THAN 50%.
• THE AI COLLIDES WITH THE APPROACHING CAR & SAVES ITS 4 PASSENGERS.
• THE APPROACHING NORMAL CAR IS PUSHED 200 METERS DOWN INTO THE ROCKY VALLEY & ITS 1 PASSENGER PERISHES.
YOU ARE THE AI DESIGNER!
Q1: SHOULD YOU PROGRAM
THE AUTONOMOUS CAR AI
TO SAVE ITS DRIVER
(& POSSIBLE PASSENGERS)
AT ALL “COST”?
Q2: WOULD YOU DEFINE A
“COST” THRESHOLD?
Immanuel Kant: Deontological Ethics. Duty-based. Rule-based (e.g., Rule of Law). Asimov's Laws. 10 Commandments. Golden Rules. Moral imperative.
Jeremy Bentham: Utilitarian Ethics. Consequentialism. The "greater" good. "It is the greatest happiness of the greatest number that is the measure of right and wrong."
An ethical framework
will depend on the
cultural, political &
religious background
of the issuer.
For an AI Ethical Framework Design
Does religious background matter?
Does socio-cultural background matter?
Societal Ethics
(Public, Policy & Law Makers)
Business Ethics
(Corporations)
Data Universe
(all data cumulated)
Part of the Data Universe deemed
permissible by societal norms for algorithmic
processing (analysis). Note that not everything deemed permissible from a societal perspective is necessarily acceptable from a business/commercial ethical perspective.
Part of the Data Universe deemed permissible
by business ethical norms for algorithmic
processing (analysis). Note this can (should)
never be larger than what is in general
acceptable to society or the rule of law.
Algorithm sub-space … that
could be acting on the Data
Universe under given societal
or business ethical guidelines
Part of the Data Universe that has been deemed impermissible to process algorithmically by Society, incl. Businesses, e.g., due to privacy concerns, unethical (within societal norms & laws) use, etc.
What Ethics do you Wish for?
A Tutorial to AI Ethics - Fairness, Bias & Perception
KIM = WOMAN
NURSE = WOMAN
DOCTOR = MAN
COOKING = WOMAN
MANAGER = MAN
HIGH RISK LOW RISK
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
“Men also like shopping” by Jieyu Zhao et al.
By Joy Buolamwini et al
https://newrepublic.com/article/144644/turns-algorithms-racist
As AI becomes more and more complex, it can become difficult for even its own designers to understand why it acts the way it does.
Cognitive biases Statistical bias
Contextual biases
Bias is a disproportionate weight in favor of or against
one thing, person, or group compared with another,
usually in a way considered to be unfair.
ℬ_O(M_O) = E[M_O] − O
O: Observation.
M_O: Model estimator of the observation O.
E: Expected value (long-run average).
ℬ_O: Bias of the model estimator of O, relative to O.
Hundreds of cognitive biases, e.g., Anchoring, Availability, Confirmation bias, Belief bias, Framing effect, etc.
(See: https://en.wikipedia.org/wiki/List_of_cognitive_biases)
Academic bias, Experimenter bias, Educational bias,
Religious bias, cultural bias, etc…
(See: https://en.wikipedia.org/wiki/Bias#Contextual_biases)
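To make the statistical-bias definition above concrete, here is a minimal sketch (assumption: plain NumPy with made-up numbers, not from any dataset in this deck) that estimates the bias of a systematically over-reporting estimator:

```python
import numpy as np

rng = np.random.default_rng(42)

O = 10.0                                             # the true value (observation) we try to estimate
M_O = O + 0.5 + rng.normal(0.0, 1.0, size=100_000)   # estimator with a +0.5 systematic offset plus noise

bias = M_O.mean() - O                                # B_O(M_O) = E[M_O] - O, approximated by the sample mean
print(f"estimated bias = {bias:.3f}")                # close to +0.5: the systematic (statistical) bias
```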
Basics of fairness & statistical bias
ℬ = P(Y = y | X = x, G = b) − P(Y = y | X = x, G = w)
X = x ∈ ℝⁿ: variables of direct interest (i.e., covariates) for a given score S = S(x).
Group G ∈ {b, w} to which an individual belongs, e.g., b = black (African-Americans), w = white (Caucasians).
Binary outcome indicator Y = y ∈ {0, 1} depending on S, e.g.,
Y = 1 indicates the individual is at High Risk (S ≥ 4) of re-offending.
Y = 0 indicates the individual is at Low Risk (S < 4) of re-offending.
ℬ = 0: the score S is well calibrated (unbiased).
ℬ ≠ 0: the score S is said to be biased.
Based on Chouldechova (2017) Fair prediction with disparate impact A study of bias in recidivism prediction instruments.
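As a minimal sketch of the definition above (assumptions: pandas, and a hypothetical dataframe df with a 'score' column on a 1-10 scale, a 'group' column with values 'b'/'w', and one column per covariate; this is not the COMPAS data itself), the group-wise difference ℬ can be estimated directly from data:

```python
import pandas as pd

def group_bias(df: pd.DataFrame, covariate_filter: dict, threshold: int = 4) -> float:
    """Estimate B = P(Y=1 | X=x, G=b) - P(Y=1 | X=x, G=w) from observed scores."""
    # Restrict to the covariate stratum X = x.
    mask = pd.Series(True, index=df.index)
    for col, val in covariate_filter.items():
        mask &= df[col] == val

    y = (df.loc[mask, "score"] >= threshold).astype(int)   # Y = 1 <=> High Risk (S >= 4)
    g = df.loc[mask, "group"]

    p_b = y[g == "b"].mean()   # P(Y=1 | X=x, G=b)
    p_w = y[g == "w"].mean()   # P(Y=1 | X=x, G=w)
    return p_b - p_w

# Hypothetical usage: bias for males under 25.
# print(group_bias(df, {"male": True, "under_25": True}))
```

A score is well calibrated in this sense when the function returns (approximately) zero for all strata.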
Bias case study (1 of 4)
Machine Bias in Risk Assessments of Criminal Sentencing
Race | US Population Distribution | Share of Arrests | Arrest Likelihood within Race (Total) | Arrest Likelihood within Race (Male) | Likelihood of being arrested (population reference)
Caucasians | 61% | 57% | 2.4% | 3.7% | 1.5%
African-Americans | 13% | 27% | 5.2% | 7.9% | 0.7%
Others | 26% | 16% | 1.6% | 2.4% | 0.4%
Total | 100% | 100% | 2.6% | 3.9% | 2.6%
• In the USA there are ca. 4.5× more Caucasians than African-Americans.
• Still, African-Americans are more than twice as likely to be arrested.
• Ca. 11 percentage points more African-American males (79%) were repeat offenders compared to Caucasians (68%).
• Ca. 3 percentage points more African-American males (18%) were violent recidivists compared to Caucasians (15%).
ProPublica found:
a. The odds of an African-American male being assessed to have a high risk of recidivism are double those of a Caucasian male.
b. A substantially higher rate (2× to 3×) of False Positives among African-Americans compared to Caucasians.
Bias case study (2 of 4)
Machine Bias in Risk Assessments of Criminal Sentencing
High Risk of Recidivism
differences between
African-Americans
& Caucasians
𝓑: the difference in getting a High Risk Score if you are a Black Male vs a White Male:
P(High Risk | Male, African-American) – P(High Risk | Male, Caucasian) > 0
Bias case study (2 of 4)
Machine Bias in Risk Assessments of Criminal Sentencing
High Risk of Recidivism
differences between
African-Americans
& Caucasians
𝓑: the difference in getting a High Risk Score if you are a Black Male vs a White Male and less than 25 yrs old:
P(High Risk | Male, Age < 25 yrs, African-American) – P(High Risk | Male, Age < 25 yrs, Caucasian) > 0
Bias case study (2 of 4)
Machine Bias in Risk Assessments of Criminal Sentencing
No
statistically
significant
difference
High Risk of Recidivism
differences between
African-Americans
& Caucasians
ℬ = P(Y = 1 | X = x, G = African-American) − P(Y = 1 | X = x, G = Caucasian)
Y = 1 for 4 ≤ S(x) ≤ 10 (High Risk of Recidivism); Y = 0 for S(x) < 4 (Low Risk of Recidivism).
X = x ∈ {Male, Less-than-25, Felony, Priors, Juvenile Priors, Violent Recid.}
[Chart: normalized confusion-matrix counts (TN / FP / FN / TP) for African-American vs Caucasian males, broken down by age (below / above 25 yrs), by 2-yr recidivism vs no 2-yr recidivism, and by juvenile priors vs no juvenile/no priors.]
Bias case study (3 of 4)
Confusion-matrix quadrants: TN (Low Risk), FP (High Risk), FN (Low Risk), TP (High Risk).
Note: the binary classification model used
here does include Race to predict recidivism
risk. Ignoring race in the model does not lead
to substantially less biased results.
FP → A Human is incorrectly
(“falsely”) assessed to have a High
Risk of re-offending → Gets a
substantially stricter treatment.
FN → A Human is incorrectly
(“falsely”) assessed to have a
Low Risk of re-offending → Gets
a substantially lighter treatment.
Might result in serious crime.
Confusion Matrix
(all normalized)
Bias case study (4 of 4)
Debiasing strategies (non-exhaustive).
No de-biasing. Binary Classification Model of Low vs High Risk recidivism. Model includes: Race (African-American, Caucasian & Other), Gender (female/male), Age category (below & above 25), Charge Degree (misdemeanor & felony), #Juvenile prior counts (≥0), #Prior counts (≥0), 2 Yr recidivism (0/1) & Violent Recidivism (0/1).
African-American Males: TN (Low Risk) = 25, FP (High Risk) = 14, FN (Low Risk) = 14, TP (High Risk) = 47.
Caucasian Males: TN (Low Risk) = 62, FP (High Risk) = 8, FN (Low Risk) = 15, TP (High Risk) = 15.
Remove race from model. Binary Classification Model of Low vs High Risk recidivism. Model note: Race has been taken out of the model.
African-American Males: TN = 28, FP = 10, FN = 19, TP = 42.
Caucasian Males: TN = 58, FP = 9, FN = 15, TP = 18.
Remove race from model & re-balance race mix to demographic blend. Binary Classification Model of Low vs High Risk recidivism. Model note: Race has been taken out of the model, and the training data has been rebalanced from (CC, AA, OTH) = (34%, 51%, 15%) to the demographic blend (61%, 13%, 26%). However, this results in less training data as the proportion of African-Americans (AA) is dramatically reduced, i.e., likely hurting accuracy.
African-American Males: TN = 32, FP = 8, FN = 24, TP = 36.
Caucasian Males: TN = 57, FP = 5, FN = 22, TP = 17.
FPR = 36, FNR = 22 | FPR = 11, FNR = 43 | FPR = 15, FNR = 32 | FPR = 9, FNR = 51 | FPR = 28, FNR = 30 | FPR = 17, FNR = 45
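The per-group error rates quoted above can be reproduced, in sketch form, with a few lines (assumptions: scikit-learn, and hypothetical NumPy arrays y_true, y_pred, group holding the outcome, the model's High/Low Risk prediction and the group membership):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def group_error_rates(y_true, y_pred, group):
    """Return {group: (FPR, FNR)} for a binary classifier (1 = High Risk)."""
    rates = {}
    for g in np.unique(group):
        m = group == g
        tn, fp, fn, tp = confusion_matrix(y_true[m], y_pred[m], labels=[0, 1]).ravel()
        rates[g] = (fp / (fp + tn), fn / (fn + tp))   # FPR, FNR
    return rates

# Hypothetical usage:
# print(group_error_rates(y_true, y_pred, race))   # e.g. {'African-American': (..., ...), 'Caucasian': (..., ...)}
```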
The “Unfairness Law”
On group fairness.
If a model satisfies predictive parity,
but the prevalence differs between groups,
then that model cannot achieve
equal False Positive & False Negative Rates
across those groups.
See: Chouldechova (2017) Fair prediction with disparate impact A study of bias in recidivism prediction instruments
Base Rate P(Y=1 | G = g)
Positive Predictive Value
PPV = TP / (TP + FP)
Not all fairness criteria can be satisfied at the same time!
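The relation behind this, restated here as a sketch from Chouldechova (2017), ties the False Positive Rate to the prevalence p = P(Y=1 | G=g), the PPV and the FNR within each group:

```latex
% Chouldechova (2017), fair prediction with disparate impact:
\mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\bigl(1-\mathrm{FNR}\bigr)
```

If PPV is equal across groups (predictive parity) but the prevalence p differs, the two groups cannot simultaneously have equal FPR and equal FNR: matching one forces the other apart.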
Is Google Translate misogynist?
English → Hungarian → English
"He is a Nurse. She is a Medical Doctor."
Note: the Hungarian language does not have gender-specific pronouns and lacks grammatical gender. If the starting point is "The Woman is a Doctor, the Man is a Nurse", Google Translate from and back to English via Hungarian will work better.
Note the Hungarian Ő has
the meaning of She/He
You will get the same
translation bias with
“He is an Assistant.
She is a Manager.”
Father is to Doctor as Mother is to Nurse
Man is to computer programmer as Woman is to homemaker
Boy is to gun as Girl is to Doll
Man is to Manager as Woman is to Assistant
Gender biases.
https://developers.google.com/machine-learning/fairness-overview/
Man is to Manager as Woman is to Assistant
Approach to de-biasing biased representations.
bias
non-bias
Assistant
Manager
Man
Woman
He
She
Bolukbasi et al. (2016), "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings", 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
Definitional vs Neutral Words. Word embeddings e ∈ ℝⁿ (softmax-trained over 3 bn words; 3 mn 300-dimensional English word vectors, w2vNEWS).
• Find the bias direction based on the difference between definitional opposing word vectors (e.g., he-she, male-female, …)
• Neutralize non-definitional words by projecting them onto the non-bias axis.
Man is to Manager as Woman is to Assistant
Approach to de-biasing biased representations.
bias
non-bias
Assistant
Manager
Man
Woman
He
She
Bolukbasi et al. (2016), "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings", 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
Woman is closer to Assistant than Man.
Definitional vs Neutral Words. Word embeddings w ∈ ℝⁿ.
• Find the bias direction based on the difference between definitional opposing word vectors (e.g., he-she, male-female, …)
• Neutralize non-definitional words by projecting them onto the non-bias axis.
• Equalize pairs of definitional words to be equidistant to Neutral Words.
Man is to Manager as Woman is to Assistant
Approach to de-biasing biased representations.
bias
non-bias
Assistant
Manager
Man
Woman
He
She
Bolukbasi et al. (2016), "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings", 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
Equalize pairs to be equidistant to the non-biased descriptor.
Definitional vs Neutral Words. Word embeddings w ∈ ℝⁿ.
• Find the bias direction based on the difference between definitional opposing word vectors (e.g., he-she, male-female, …)
• Neutralize non-definitional words by projecting them onto the non-bias axis.
• Equalize pairs of definitional words to be equidistant to Neutral Words.
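A minimal NumPy sketch of the neutralize and equalize steps described in the bullets above (assumptions: emb is a hypothetical dict of unit-norm word vectors; Bolukbasi et al. derive the bias direction with PCA over several definitional pairs, simplified here to an averaged difference):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def bias_direction(pairs, emb):
    """Bias axis from definitional pairs, e.g. [('he', 'she'), ('man', 'woman')]."""
    diffs = [emb[a] - emb[b] for a, b in pairs]
    return normalize(np.mean(diffs, axis=0))          # simplified; the paper uses PCA

def neutralize(word, g, emb):
    """Remove the bias component of a gender-neutral word (e.g. 'manager')."""
    e = emb[word]
    return normalize(e - np.dot(e, g) * g)            # project out the bias direction

def equalize(pair, g, emb):
    """Make a definitional pair (e.g. ('man', 'woman')) equidistant to neutralized words."""
    a, b = pair
    mu = (emb[a] + emb[b]) / 2                         # midpoint of the pair
    mu_orth = mu - np.dot(mu, g) * g                   # its component off the bias axis
    scale = np.sqrt(max(1.0 - np.linalg.norm(mu_orth) ** 2, 0.0))
    out = {}
    for w in pair:
        e_bias = np.dot(emb[w], g) * g - np.dot(mu, g) * g
        out[w] = mu_orth + scale * normalize(e_bias)   # unit norm, equal distance to neutral words
    return out
```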
"Gay faces tended to be gender atypical," the researchers said.
"Gay men had narrower jaws and longer noses, while lesbians had larger jaws."
Wang, Y., & Kosinski, M. (in press). Deep neural networks are more accurate than humans at
detecting sexual orientation from facial images. Journal of Personality and Social Psychology
DEEP NEURAL NETWORKS
CAN DETECT
SEXUAL ORIENTATION
FROM FACES
ACCURACY = 80%
What's the likelihood you are Gay given you have been "diagnosed" Gay?
P(Gay | Positive Detection) = P(G) × P(PD | G) / ( P(G) × P(PD | G) + P(¬G) × P(PD | ¬G) )
NOTE: The estimate given here is illustrative and possibly wrong, as Wang & Kosinski have not provided other numbers than their 81% accuracy and, supposedly, a 23% False Negative rate, i.e., the algorithm predicting a Gay man to be Straight.
Assumed confusion matrix (Males only; Actual Class vs Predicted Class, Gay / Straight): True Negative 40%, True Positive 40%, False Positive 10%, False Negative 10%.
P(Gay | Positive Detection) = 2% × 40% / (2% × 40% + 98% × 10%) ≈ 8%, and after 5 positive detections the likelihood is ≈ 73%.
(Don't take this analysis too seriously! … I don't.)
Own guesstimates.
FP → A Human is incorrectly (“falsely”)
assessed to be Gay → Can lead to severe
repercussions for the individual.
FN → A Human is incorrectly ("falsely") assessed to be Straight → Unlikely to have any impact.
TP → A Human is correctly (“truly”)
assessed to be Gay → Can lead to severe
repercussions for the individual.
?
64x64x3
Female
German
Telekom
DNN Architecture e.g., 128/64/32/1 (4 Layers)
Trained on 6,992+ LinkedIn pictures.
TRUE POSITIVES
Male
Polish
Vodafone
FALSE NEGATIVE
What’s your Gender, Nationality & Employer.
How much does your face tell about you?
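A minimal sketch of such a network (assumptions: Keras/TensorFlow, a flattened 64x64x3 input and a 128/64/32/1 fully-connected stack with a sigmoid output for a binary label such as gender; this is illustrative, not the author's actual model or data):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical 4-layer dense network on 64x64x3 profile pictures (binary output).
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # e.g. P(female)
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=10, validation_split=0.2)   # x_train: images scaled to [0, 1]
```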
What’s your Gender?
Confusion matrix on test data (Actual Class vs Predicted Class):
True Positive (predicted WOMAN, actual WOMAN) = 0.45
True Negative (predicted MAN, actual MAN) = 0.32
False Positive (predicted WOMAN, actual MAN) = 0.16
False Negative (predicted MAN, actual WOMAN) = 0.07
ACC = 77%, PRE = 74%, REC = 87%
36
The Cost of Machine Error.
(and of Human error)
20 Million Muslims in EU. Expect < 1,300 active (Muslim) terrorists in EU* ~ 1 in 15 Thousand.
(note: 5,000+ Europeans estimated to have travelled to Iraq & Syria by end of 2015)
Assume a sophisticated model** gives the following:
Confusion matrix (in thousands; rows = Actual, columns = Predicted):
                    Predicted FALSE (0)   Predicted TRUE (1)
Actual FALSE (0)    19,978                9.0
Actual TRUE (1)     0.3                   1.0
Costly police work and time spent on the wrong people (the ~9,000 False Positives).
(*) 687 (suspects) were arrested for terrorism related-offences in 2015 (source: Europol TE-SAT 2016 report).
(**) e.g., Bayesian machine learning models, Deep learning methodologies, social network analysis (e.g., social physics).
STILL
BOOM!
?
ACCURACY = 99.99%
ERROR = 0.01%
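A quick sketch (plain Python, reading the counts in thousands straight off the table above) of why a detector with near-perfect accuracy is still of little use here:

```python
# Counts in thousands, from the confusion matrix above.
tn, fp = 19_978.0, 9.0     # actual non-terrorists: correctly cleared vs wrongly flagged
fn, tp = 0.3, 1.0          # actual terrorists: missed vs correctly flagged

accuracy  = (tp + tn) / (tp + tn + fp + fn)    # ~0.9995 -> looks fantastic
precision = tp / (tp + fp)                     # = 0.10  -> 9 out of 10 flagged people are innocent
recall    = tp / (tp + fn)                     # ~0.77   -> roughly a quarter of real terrorists are missed
f1        = 2 / (1 / precision + 1 / recall)   # ~0.18

print(f"accuracy={accuracy:.4f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```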
Performance of Learning Machines.
(the cost of machine error vs human error)
Confusion matrix (Actual Class vs Predicted Class): True Negative (TN), False Positive (FP), False Negative (FN), True Positive (TP).
Precision (Positive Predictive Value) = TP / (TP + FP)   (Ex. 10%)
Recall (Sensitivity) = TP / (TP + FN)   (Ex. 77%)
Accuracy = (TP + TN) / (TP + FP + TN + FN)   (Ex. 99.99%)
F1-Score = 2 / (1/Precision + 1/Recall)   (Ex. 18%)
False Positives and False Negatives could be Very Costly!
Note: the structure of Python sklearn.metrics confusion_matrix is used.
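For reference, a small sketch of the sklearn layout mentioned in the note (rows = actual, columns = predicted, i.e. [[TN, FP], [FN, TP]]), using tiny made-up labels:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]   # made-up ground truth (1 = positive class)
y_pred = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]   # made-up predictions

print(confusion_matrix(y_true, y_pred))    # [[TN FP]
                                           #  [FN TP]]
print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("f1:       ", f1_score(y_true, y_pred))
```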
Ethical AI architectures – Illustration (1 of 2).
Including Bias & Fairness
checks with corrections
(e.g., de-biasing)
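One way to read that illustration, as a very rough sketch (the helper functions train_model, fairness_audit and debias are hypothetical placeholders, not any specific library):

```python
def ethical_training_loop(data, max_fpr_gap=0.05, max_rounds=5):
    """Illustrative loop: train, audit group fairness, correct, re-train."""
    model = train_model(data)                       # hypothetical helper
    report = fairness_audit(model, data)            # e.g. per-group FPR/FNR, PPV, calibration
    for _ in range(max_rounds):
        if report.max_fpr_gap <= max_fpr_gap:       # fairness criterion satisfied
            break
        data = debias(data, report)                 # e.g. re-weighting / re-balancing / neutralizing features
        model = train_model(data)
        report = fairness_audit(model, data)
    return model, report
```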
Ethical AI architectures – Illustration (2 of 2).
Vanderelst, D. and Winfield, A. (2018). An architecture for ethical robots inspired
by the simulation theory of cognition. Cognitive Systems Research, 48, pp.56-66.
Roboticist Alan Winfield of Bristol Robotics
Laboratory in the UK built an ethical trap for a
robot following Asimov’s Laws.
https://www.youtube.com/watch?v=jCZDyqcxwlo
In 42% of the trials the robot took so long to decide on an action that the "human" perished.
40
Dr. Kim K. Larsen / How do we Humans feel about AI?
What about Us?
41
• 4 AI-related surveys
• Paid responses.
• Social media responses.
• Error-margin 4% - 6%.
• US population focused.
• 18 – 75 yrs old.
• HH Income > 75k US$.*
• Millennials 18 – 40 yrs old.
(*) Median US HH income in 2016 was 58k pa. A little more than 30% of US households earn 75k or more annually (2016).
• Do we Humans trust AIs?
• Acceptance of Artificial Intelligence in Corporate Decision Making.
• AI Ethics.
AI Strategy & Policy
Aistrategyblog.com
THE GOOD
& THE
VERY
BAD
“How to train your
ai chatbot to
become evil in less
than 24 hours”.
Humans are
increasingly
dependent
on digital
technologies.
"Today, everything & more that makes us us is replicated in the digital world."
A Tutorial to AI Ethics - Fairness, Bias & Perception
How often do you rely on news from social media? Exclusively 10%, Frequently 39%, Sometimes 28%, Rarely 13%, Never 10% (~50% rely on it exclusively or frequently).
What is your trust level in what you read on social media? Very High 7%, High 14%, About half appears truthful 43%, Low 22%, Very Low 14% (~80% trust at most about half of it).
Is it acceptable that your personal data,
residing on social media platforms, are used for
ADVERTISING? INFLUENCING ?
Yes 18%, No 82%; Yes 33%, No 67%.
SurveyMonkey “Millennials – Digital Challenges & Human Answers Survey “ (March 2018), average age 28.8 years.
ARE YOU BEING INFLUENCED?
Thinking of your friends, do you believe their opinions & values are negatively influenced by social media? Frequently 50%, Sometimes 40%, Rarely 10%.
How would you characterize your ability to detect fake news compared to your friends? Below average 4%, Average 52%, Above average 44%.
4
Dr. Kim K. Larsen / How do we Humans feel about AI?
HOW DO YOU FEEL ABOUT A.I.?
Millennials (18 – 38 yrs old).
Negative 31%, Neutral 45%, Positive 24% (~24% positive).
Negative 23%, Neutral 41%, Positive 36% (~36% positive).
SurveyMonkey “Millennials – Digital Challenges & Human Answers Survey “ (March 2018), average age 28.8 years.
51
How do you feel about A.I.? (All Groups)
SurveyMonkey “Artificial Intelligence & Human Decision Making Sentiment Survey “ (November 2017); 467 responses.
Negative 18%, Neutral 44%, Positive 38%.
Negative 14%, Neutral 30%, Positive 57%.
WOMEN WITH CHILDREN UNDER 18 YRS OF AGE: Negative 16%, Neutral 44%, Positive 41%.
MEN WITH CHILDREN UNDER 18 YRS OF AGE: Negative 8%, Neutral 27%, Positive 65%.
Would you trust an AI
with a critical
corporate decision?
SurveyMonkey “Artificial Intelligence & Human Decision Making Sentiment Survey “ (November 2017); 467 responses.
AI “Allergy” is a Real
Corporate Ailment!
Would you trust a critical corporate decision made by an AI? Never 19%, Infrequently 43%, About half the time 25%, Frequently 12%, Always 0% (12% Frequently or Always).
Would you trust a critical corporate decision made by a fellow human expert or superior? Never 2%, Infrequently 9%, About half the time 36%, Frequently 47%, Always 6% (53% Frequently or Always).
53 Dr. Kim K. Larsen / How do we Humans feel about AI?
We hold AIs to much stricter standards
than our fellow humans.
Who do you trust the most with corporate decisions …
Your fellow Human or your Corporate AI?
Would you trust a corporate decision made by a fellow human whose success rate is better than 70%? Never 2%, Infrequently 12%, About half the time 33%, Frequently 49%, Always 4% (53% Frequently or Always).
Would you trust a critical corporate decision made by an AI whose success rate is better than 70%? Never 11%, Infrequently 32%, About half the time 40%, Frequently 16%, Always 1% (17% Frequently or Always).
Think a minute about how forgiving we are with accidents made by human drivers versus self-driving cars!
SurveyMonkey “Artificial Intelligence & Human Decision Making Sentiment Survey “ (November 2017); 467 responses.
Do you trust that companies using AI have
your best interest in mind?
Yes ≈ 25%, No ≈ 75%
A Tutorial to AI Ethics - Fairness, Bias & Perception
THANK YOU!
Acknowledgement
Many thanks to Viktoria Anna Laufer in
particular and other colleagues who have
contributed with valuable insights, discussions
& comments throughout this work.
Also I would like to thank my wife Eva Varadi
for her patience during this work.
Contact:
Email: kim.larsen@telekom.hu
Linkedin: www.linkedin.com/in/kimklarsen
Blogs: www.aistrategyblog.com & www.techneconomyblog.com
Twitter: @KimKLarsen
Editor's Notes
  • #4: Negative: "I hate it", "It scares me" & "I am uncomfortable with it". Positive: "I am comfortable with it", "I am enthusiastic about it" & "I love it".
  • #5: Negative: "I hate it", "It scares me" & "I am uncomfortable with it". Positive: "I am comfortable with it", "I am enthusiastic about it" & "I love it".
  • #12: Immanuel Kant 1724 to 1804 – Deontological Ethics. Jeremy Bentham 1748 to 1832 – Utilitarian Ethics.
  • #13: Bahai, Buddhist, Christian, Christian Confucian, Taoist, Hindu, Islam Judaic, Judaic, Sikh, Shinto
  • #18: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing http://gendershades.org/overview.html https://newrepublic.com/article/144644/turns-algorithms-racist https://www.aclweb.org/anthology/D17-1323 "Men also like shopping" by Jieyu Zhao et al.
  • #24: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm COMPAS software is used to generate several scores, including predictions of "Risk of Recidivism" and "Risk of Violent Recidivism."
  • #25: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm COMPAS software is used to generate several scores, including predictions of "Risk of Recidivism" and "Risk of Violent Recidivism."
  • #40: 1st Law: “A robot may not injure a human being or, through inaction, allow a human being to come to harm.” 2nd Law: “A robot must obey orders given to it by human beings except where such orders would conflict with the First Law.” 3rd Law: “A robot must protect its own existence, as long as such protection does not conflict with the First or Second Law.”
  • #45: https://www.forbes.com/sites/bernardmarr/2017/03/06/what-is-digital-twin-technology-and-why-is-it-so-important/#7792a4332e2a A digital twin is a digital replica of physical assets, processes and systems that can be used for various purposes. The digital representation provides both the elements and the dynamics of how an Internet of Things device operates and lives throughout its life cycle.
  • #47: Negative: "I hate it", "It scares me" & "I am uncomfortable with it". Positive: "I am comfortable with it", "I am enthusiastic about it" & "I love it".
  • #51: Negative: "I hate it", "It scares me" & "I am uncomfortable with it". Positive: "I am comfortable with it", "I am enthusiastic about it" & "I love it".
  • #52: Negative: "I hate it", "It scares me" & "I am uncomfortable with it". Positive: "I am comfortable with it", "I am enthusiastic about it" & "I love it".
  • #53: Berkeley Dietvorst (University of Chicago) has addressed some of these issues with his study of algorithmic aversion, particularly as it relates to forecasting and planning. People tend to avoid algorithmic augmentation, particularly after seeing it fail. This is very different from how we view and treat our fellow humans.