Fairness of machine learning models
an overview and practical considerations
AI & BigData Online Day 2020
Agenda
1. Motivating examples
2. Response: high-level frameworks
3. Operationalizing fairness
Disclaimer: The views and opinions expressed in this document are those of the author and do not necessarily reflect the official policy or position of de Volksbank or Tilburg University.
1. Motivating examples
• Correctional Offender Management Profiling for Alternative Sanctions (COMPAS)
• Recidivism Prediction Instruments (RPIs) are used by US courts for pretrial decision-making, parole decisions, and sometimes sentencing.
• On the basis of roughly 7,000 cases from 2013-2014 in Broward County, Florida, ProPublica concluded that the algorithm is biased against blacks.
Google (2018). Reducing gender bias in Google Translate.
Zou, J. and L. Schiebinger (2018). AI can be sexist and racist — it’s time to make it fair, Nature 559, pp. 324-326.
1. Motivating examples
Ensign, D., S.A. Friedler, S. Neville, C. Scheidegger, and S. Venkatasubramanian (2018). Runaway Feedback Loops in
Predictive Policing, Proceedings of Machine Learning Research 81, pp. 160-171.
1. Motivating examples
[Screenshots of news items dated 2006, 2016, 2018, and 2019]
2. Response: regulation is coming…
2. Meta-analysis: global landscape of ethical frameworks for AI and data science
Jobin, Ienca, and Vayena (2019):
• mapped and analyzed the current corpus of principles and guidelines on ethical AI
• identified 84 documents containing ethical principles or guidelines for AI, 88% of which appeared after 2016
• Convergence? "In particular, the prevalence of calls for transparency, justice and fairness points to an emerging moral priority."
Most frequently occurring principles: transparency & explainability, justice and fairness, non-maleficence, responsibility, privacy.
From the point of view of the data scientist, fairness and explainability appear to be the most difficult to implement.
Jobin, A., M. Ienca, and E. Vayena (2019). The global landscape of AI ethics guidelines, Nature Machine Intelligence 1, pp. 389-399.
3. Non-discrimination and data science
Models should not yield unintended discrimination; for the data scientist, however, the typical goal is to maximize discriminatory power!
3. Avoiding unintended discrimination: forbidden variables
[Diagram: supervised learning setup. The data consist of protected features A, other features X, and labels Y; the model maps features to predicted labels.]
Ignoring protected features does not work (in general)!
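Why ignoring A fails: other features can act as proxies for A. A minimal sketch with synthetic data ('zip_code' and 'income' are made-up stand-ins, not from the slides), showing that a model trained without A still produces strongly group-dependent predictions:

```python
# Minimal proxy illustration (hypothetical data): the model never sees A,
# but a correlated feature leaks group membership into the predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
A = rng.integers(0, 2, n)               # protected feature (not given to model)
zip_code = A + rng.normal(0, 0.3, n)    # proxy: highly correlated with A
income = rng.normal(0, 1, n)            # "legitimate" feature
y = (income + A + rng.normal(0, 1, n) > 0.5).astype(int)

X = np.column_stack([zip_code, income])  # note: A itself is excluded
yhat = LogisticRegression().fit(X, y).predict(X)
print(yhat[A == 1].mean(), yhat[A == 0].mean())  # a large gap remains
```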
3. Defining (un)fairness and discrimination
• How to define (un)fairness?
• Note that this is actually not a new topic: see, for example, Hutchinson and Mitchell (2019)
• Rapidly developing literature, mainly for binary classifiers!
• Barocas, S., M. Hardt, and A. Narayanan (2020). Fairness and machine learning – limitations and opportunities. Book in progress (https://fairmlbook.org/)
• Corbett-Davies, S. and S. Goel (2018). The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning, working paper
• Hutchinson, B. and M. Mitchell (2019). 50 Years of Test (Un)fairness: Lessons for Machine Learning. In: FAT* '19: Conference on Fairness, Accountability, and Transparency (FAT* '19), ACM, pp. 49-58.
• Verma, S. and J. Rubin (2018). Fairness Definitions Explained. In: 2018 IEEE/ACM International Workshop on Software Fairness (FairWare). IEEE, pp. 1-7.
A (non-exhaustive) zoo of fairness definitions: statistical parity, fairness through unawareness, equal opportunity, predictive parity, calibration, conditional statistical parity, equalized odds, counterfactual fairness, no proxy discrimination.
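To make two of these definitions concrete, a minimal sketch assuming numpy arrays y (true labels), yhat (binary predictions) and a (protected attribute); the function names are illustrative:

```python
import numpy as np

def statistical_parity_diff(yhat, a):
    """P(Yhat = 1 | A = 1) - P(Yhat = 1 | A = 0)."""
    return yhat[a == 1].mean() - yhat[a == 0].mean()

def equal_opportunity_diff(y, yhat, a):
    """True-positive-rate gap: P(Yhat = 1 | Y = 1, A = 1) - P(Yhat = 1 | Y = 1, A = 0)."""
    return (yhat[(a == 1) & (y == 1)].mean()
            - yhat[(a == 0) & (y == 1)].mean())
```

Dashboarding tools such as AI Fairness 360 and Aequitas (see below) essentially compute panels of such statistics per protected group.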
3. Bias/fairness dashboarding
A great variety of open-source fairness dashboarding tools is available, for example IBM's AI Fairness 360 and Aequitas:
Saleiro, P., B. Kuester, A. Stevens, A. Anisfeld, L. Hinkson, J. London, and R. Ghani (2018). Aequitas: A Bias and Fairness Audit Toolkit, arXiv preprint arXiv:1811.05577.
Implementation requires choices for protected groups and fairness metrics!
How to proceed if the protected group concerns special (sensitive) personal data?
3. Why do we need to make a choice: can we not just consider all fairness metrics?
"There's software used across the country to predict future criminals. And it's biased against blacks." -- ProPublica (2016)
"We think ProPublica's report was based on faulty statistics and data analysis, and that the report failed to show that the COMPAS itself is racially biased, let alone that other risk instruments are biased." -- Flores et al. (2016)
source: Chouldechova (2017)
• Chouldechova, A. (2017). Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments, Big Data 5(2), pp. 153-163.
• Flores, A.W., K. Bechtel, and C.T. Lowenkamp (2016). False Positives, False Negatives, and False Analyses: A Rejoinder to "Machine Bias: There's Software Used Across the Country to Predict Future Criminals. And It's Biased Against Blacks", Federal Probation 80(2), pp. 38-46.
• Kleinberg, J., S. Mullainathan, and M. Raghavan (2017). Inherent Trade-Offs in the Fair Determination of Risk Scores. Proc. 8th Conf. on Innovations in Theoretical Computer Science (ITCS).
source: Chouldechova (2017)
In general it is impossible to satisfy multiple fairness definitions at the same time!
A choice is therefore required.
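The incompatibility can be made concrete with Chouldechova's (2017) identity relating the false-positive rate (FPR) to the base rate p, the positive predictive value (PPV), and the false-negative rate (FNR). A small numeric check (illustrative numbers, not the COMPAS figures):

```python
# Chouldechova's (2017) identity:  FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR),
# where p is the base rate (prevalence). If base rates differ across groups,
# equal PPV ("predictive parity") and equal FNR force the FPRs to differ.
def fpr(p, ppv, fnr):
    return p / (1 - p) * (1 - ppv) / ppv * (1 - fnr)

print(fpr(p=0.50, ppv=0.6, fnr=0.3))  # group 1: FPR ~ 0.47
print(fpr(p=0.30, ppv=0.6, fnr=0.3))  # group 2: FPR = 0.20
```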
3. Fairness-by-design
If we detect problems, how do we fix them? And how can we create models that are 'fair-by-design'? Three families of approaches:
• "pre"-processing: no changes to the algorithm; alter the input data
• "in"-processing: tailor-made methods, i.e. new algorithms
• "post"-processing: no changes to the model itself; alter the use of the model or insert additional randomness
3. Fairness-by-design: examples of post-processing
Example 1: Google (2018). Reducing gender bias in Google Translate.
Example 2: randomization to achieve statistical parity:
• suppose P(Ŷ = 1 | A = b) = 0.3 and P(Ŷ = 1 | A = w) = 0.5
• add additional randomness: for A = b and Ŷ = 0, flip a coin with probability of 'heads' equal to 2/7 ≈ 0.29 and set Ŷ = 1 if 'heads'
• this yields statistical parity: 0.3 + 0.7 × 2/7 = 0.5
• but at a loss of performance!
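A quick simulation of the coin-flip repair above (hypothetical predictions), verifying that it indeed equalizes the positive rates across the two groups:

```python
# Simulate predictions with rates 0.3 (A=b) and 0.5 (A=w), then apply the
# coin-flip post-processing to the A=b group.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
a = rng.choice(["b", "w"], n)
yhat = np.where(a == "b", rng.random(n) < 0.3, rng.random(n) < 0.5)

heads = rng.random(n) < 2 / 7                      # the biased coin
yhat_fair = yhat | ((a == "b") & (~yhat) & heads)  # flip some 0s to 1s for A=b

print(yhat_fair[a == "b"].mean(), yhat_fair[a == "w"].mean())  # both ~ 0.5
```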
3. Fairness-by-design: example of pre-processing
[Diagram: Step 1 transforms the data (protected features A, features X, labels Y) into transformed features; Step 2 trains the model on the transformed features.]
• advantage: existing algorithms can still be used in Step 2
• in general there will be a reduction of performance!
Calmon, F.P., D. Wei, B. Vinzamuri, K.N. Ramamurthy, and K.R. Varshney (2017). Optimized pre-processing for discrimination prevention, NIPS.
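For illustration only, a deliberately simple Step-1 transformation; Calmon et al. (2017) instead solve an optimization problem, but the two-step structure is the same: transform the features, then train any standard algorithm on the result.

```python
# A crude Step-1 repair (NOT Calmon et al.'s method): center each feature
# per group, so group means of the transformed features no longer carry
# information about A. Train any standard algorithm on X_t in Step 2.
import numpy as np

def group_center(X, a):
    X_t = X.astype(float).copy()
    for g in np.unique(a):
        X_t[a == g] -= X[a == g].mean(axis=0)   # remove group-specific mean
    return X_t + X.mean(axis=0)                  # restore the overall mean
```

This removes only first-moment (mean) differences; higher-order dependence on A can remain, which is exactly why Calmon et al. formulate the transformation as an optimization problem.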
3. Fairness-by-design: example of in-processing
Tailor-made new algorithms/models. Common structure:
• maximize "performance" subject to "fairness constraints", or
• maximize ["performance" + α · "fairness"]
In general there will be a reduction of performance!
See, for example, IBM AI Fairness 360 for an (open-source) demo of different algorithms.
Zafar, M.B., I. Valera, M. Gomez-Rodriguez, and K.P. Gummadi (2019). Fairness Constraints: A Flexible Approach for Fair Classification, Journal of Machine Learning Research 20, pp. 1-42.
[Figure source: Zafar et al. (2019)]
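A minimal sketch of the penalized formulation, in the spirit of (but much simpler than) Zafar et al. (2019), who use a covariance-based constraint; all names here are illustrative:

```python
# Penalized objective: minimizing [log-loss + alpha * parity gap] corresponds,
# up to sign, to maximizing ["performance" + alpha * "fairness"].
import numpy as np
from scipy.optimize import minimize

def fit_fair_logit(X, y, a, alpha=1.0):
    def loss(w):
        p = 1.0 / (1.0 + np.exp(-X @ w))                 # P(Yhat = 1 | x)
        logloss = -np.mean(y * np.log(p + 1e-12)
                           + (1 - y) * np.log(1 - p + 1e-12))
        gap = abs(p[a == 1].mean() - p[a == 0].mean())   # soft parity gap
        return logloss + alpha * gap
    return minimize(loss, np.zeros(X.shape[1]), method="Nelder-Mead").x
```

Setting alpha = 0 recovers plain logistic regression; increasing alpha trades predictive performance for a smaller parity gap.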
3. Round-up and reflection
• Implementing fairness requires a choice of protected groups and a choice of fairness metric
• The law does not seem to offer guidance on the precise metric
• The GDPR can yield problems if the definition of the protected group depends on special personal data
• Even worse: in some cases it will concern data that most companies do not (want to) have available
• There is a trade-off between performance and fairness. If the balance is not clearly specified in law: what is appropriate?
This gives rise to new governance questions:
- Who should be responsible for decisions on fairness?
- How to ensure company-wide consistency in decision making?
About me
Ramon van den Akker
Risk Modelling and AI Center, de Volksbank
Associate Professor, department of Econometrics & OR, Tilburg University
ramon.vandenakker@devolksbank.nl, r.vdnakker@uvt.nl
www.linkedin.com/in/ramonvdakker/
Editor's Notes
• #16: There are 21+ definitions of fairness. There is no consensus on which one is the best; it depends on the case. Most are mutually mathematically incompatible.