A Fairness-aware Machine Learning Interface for End-to-end Discrimination Discovery and Mitigation

Niels Bantilan
New York, NY
https://arxiv.org/abs/1710.06921 (2017)
Seminar: Advanced Topics in Data Mining
Student: Waqar Alamgir / TU Braunschweig / 4850580 / wajrcs@gmail.com
09 March 2018
Source: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Problem
Machine learning models optimized only for prediction
accuracy reflect and amplify real-world social biases.

Bias is an amoral concept
“The preference for or against something”

Bias in Machine Learning
Biased Decisions → Biased Data → Biased Algorithm → Biased Predictions

Solution # 1
[Diagram: Biased Decisions → Biased Data → Preprocessing → Machine Learning Algorithm → Biased Predictions]

Solution # 2
[Diagram: Biased Decisions → Biased Data → Machine Learning Algorithm → Biased Predictions → Postprocessing]

Themis-ml
(thee-mus em-el)
An open source Python library built on top of pandas and scikit-learn that implements fairness-aware machine learning interfaces to measure and reduce social bias in machine learning.
Available on GitHub, pip and conda!
https://github.com/cosmicBboy/themis-ml

Fairness-aware Machine Learning
Given a set of records (X, y) ∈ D, a measurement of social bias b with respect to a protected class s, and a measure of performance p, train a machine learning model that makes fair predictions while preserving the accuracy of decisions.

Machine Learning Pipeline Recap
Stages: Preprocessing → Training → Evaluation → Prediction
Artifacts: Raw Data, Model Specifications, Instantiated Models, Deployed Model, Predictions on New Data

Themis-ml Functions
Preprocessing
Fairness-aware models
Postprocessing
Metrics

Notation and conventions
y: Class / target labels, e.g. give credit.
y+: Positive target labels, e.g. credit given.
y-: Negative target labels, e.g. credit not given.
yTrain: Target variable for the training set, e.g. give credit for the training set.
s: Protected class (a binary variable), e.g. female, age below 25.
Xd: Members of the disadvantaged group, e.g. immigrants.
Xa: Members of the advantaged group, e.g. citizens.
Xd,y-: Negatively labeled members of the disadvantaged group.
Xd,y+: Positively labeled members of the disadvantaged group.

1. Preprocessing / Transformer API / Relabeling
Description
Generates a new yTrain variable by relabelling the target variable.
Parameters
ranker: an instance of a binary classifier, e.g. DecisionTreeClassifier.
Return Values
yTrain: a DataFrame of modified target labels that replaces the original training labels.
Example
massager = Relabeller(ranker=DecisionTreeClassifier())
newYTrain = massager.fit(x, yTrain, s).transform(x)
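The relabelling idea can be illustrated without themis-ml. The following is a minimal standard-library sketch, not the library's implementation: the function names `positive_rate` and `massage_labels` are hypothetical, `scores` stands in for the ranker's estimated probability of the positive label, and s = 1 is assumed to mark the disadvantaged group. It promotes the highest-scored negative members of the disadvantaged group and demotes the lowest-scored positive members of the advantaged group until the two groups have equal positive rates.

```python
def positive_rate(y, s, group):
    """Share of positive labels among the members of one group."""
    members = [yi for yi, si in zip(y, s) if si == group]
    return sum(members) / len(members)


def massage_labels(y, s, scores):
    """Return a relabelled copy of y (hypothetical helper, for illustration).

    y: binary labels, s: 1 = disadvantaged, 0 = advantaged (assumed coding),
    scores: ranker's estimated probability of the positive label.
    """
    new_y = list(y)
    # Promotion candidates: disadvantaged negatives, most positive-looking first.
    promote = sorted(
        (i for i in range(len(y)) if s[i] == 1 and y[i] == 0),
        key=lambda i: scores[i],
        reverse=True,
    )
    # Demotion candidates: advantaged positives, least positive-looking first.
    demote = sorted(
        (i for i in range(len(y)) if s[i] == 0 and y[i] == 1),
        key=lambda i: scores[i],
    )
    for up, down in zip(promote, demote):
        # Stop once the advantaged group no longer has the higher positive rate.
        if positive_rate(new_y, s, 0) - positive_rate(new_y, s, 1) <= 0:
            break
        new_y[up] = 1
        new_y[down] = 0
    return new_y


y = [1, 1, 1, 0, 1, 0, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.55, 0.4, 0.2]
print(massage_labels(y, s, scores))  # [1, 1, 0, 0, 1, 1, 0, 0]
```

Because labels are swapped in pairs, the total number of positive labels stays the same; only their distribution across the two groups changes.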

1. Preprocessing / Transformer API / Relabeling
[Figure: Original Data versus Relabeled Data; axes: income, is homeowner; legend: Good Credit Risk, Bad Credit Risk, Woman, Man]

2. Fairness-aware model / Training / Estimator API
Prejudice Remover Regularizer
Description
Measures the degree to which the predictions y and the protected class s depend on each other.
Parameters
penalty: a string selecting the regularization type, e.g. L1 or L2 regularization [5].
discrimination_penalty: a string selecting the discrimination penalizer, e.g. "PI" for the prejudice index.
Return Values
Predicted class labels, when fit and predict are chained.
Example
y_pred = LogisticRegressionPRR(penalty="L2", discrimination_penalty="PI")
    .fit(x_train, y_train, s_train).predict(x_test, s_test)
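To make "the degree to which predictions and s depend on each other" concrete, here is a small sketch that assumes the prejudice index is taken to be the empirical mutual information between the predicted label and the protected class. The function name `prejudice_index` is hypothetical and this is not themis-ml's code; in a real prejudice remover the term is computed from model probabilities inside the training objective, not from hard predictions.

```python
import math
from collections import Counter


def prejudice_index(y_pred, s):
    """Empirical mutual information (in nats) between predictions and s.

    Assumption for illustration: PI = sum over (y, s) of
    P(y, s) * ln(P(y, s) / (P(y) * P(s))), estimated from counts.
    """
    n = len(y_pred)
    joint_counts = Counter(zip(y_pred, s))
    y_counts = Counter(y_pred)
    s_counts = Counter(s)
    pi = 0.0
    for (y_value, s_value), count in joint_counts.items():
        p_joint = count / n
        p_y = y_counts[y_value] / n
        p_s = s_counts[s_value] / n
        pi += p_joint * math.log(p_joint / (p_y * p_s))
    return pi


# Predictions independent of s: the index is zero.
print(prejudice_index([1, 0, 1, 0], [1, 1, 0, 0]))  # 0.0
# Predictions fully determined by s: the index is ln 2, about 0.693.
print(prejudice_index([1, 1, 0, 0], [1, 1, 0, 0]))
```

A value of zero means the predictions carry no information about the protected class; larger values mean stronger dependence, which is what the regularizer penalizes.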

Prejudice Remover Regularizer
[Figure: cost plotted against the value of weight Θ for the fairness-unaware objective and the fairness-aware model objective, illustrating the fairness-utility tradeoff]

3. Fairness-aware model / Training / Estimator API
Additive Counterfactually Fair
Description
Predicts the features from the protected class s, computes the residuals between the original and predicted feature values (X - X̂), and uses those residuals to train the model.
Parameters
target_estimator
continuous_estimator
binary_estimator
binary_residual_type
Return Values
Predicted class labels, when fit and predict are chained.
Example
y_pred = LinearACFClassifier()
    .fit(x_train, y_train, s_train)
    .predict(x_test, s_test)

Additive Counterfactually Fair
[Diagram: protected classes s → model → predicted features X̂; residual features X - X̂ → model → predicted labels ŷ, trained against labels y]
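The residual features X - X̂ can be sketched in a few lines. The example below is an illustration under stated assumptions, not the LinearACFClassifier implementation: the function name `residualize` is hypothetical, there is a single binary protected class s, and all features are continuous. In that setting the least-squares prediction of a feature from s is simply the mean of that feature within each group, so no regression library is needed.

```python
def residualize(X, s):
    """Return residual features X - X_hat for a single binary protected class.

    X: list of rows (lists of floats), s: list of group codes.
    X_hat is the per-group mean of each feature, which equals the
    least-squares prediction of that feature from a binary s.
    """
    n_features = len(X[0])
    group_means = {}
    for group in set(s):
        rows = [row for row, si in zip(X, s) if si == group]
        group_means[group] = [
            sum(row[j] for row in rows) / len(rows) for j in range(n_features)
        ]
    return [
        [row[j] - group_means[si][j] for j in range(n_features)]
        for row, si in zip(X, s)
    ]


X = [[10.0, 1.0], [20.0, 3.0], [30.0, 5.0], [50.0, 7.0]]
s = [0, 0, 1, 1]
print(residualize(X, s))
# [[-5.0, -1.0], [5.0, 1.0], [-10.0, -1.0], [10.0, 1.0]]
```

The residuals have zero mean within each group, so a model trained on them cannot use group-level differences in the features to separate the groups.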

4. Postprocessing / Predictor API
Reject Option Classification
Description
Generates predicted probabilities on the training set and computes the proximity of each prediction to the decision boundary learned by the classifier.
Parameters
estimator
theta
demote
Return Values
Predicted class labels, when fit and predict are chained.
Example
y_pred = SingleROClassifier(estimator=DecisionTreeClassifier())
    .fit(x_train, y_train).predict(x_test, s_test)
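A minimal sketch of the reject-option rule, assuming the formulation in reference 12: predictions whose confidence max(p, 1 - p) does not exceed theta fall into a critical region near the decision boundary, where members of the disadvantaged group receive the positive label and members of the advantaged group the negative label. The function name `reject_option_predict` and the coding s = 1 for the disadvantaged group are assumptions for illustration; SingleROClassifier and its demote parameter may behave differently.

```python
def reject_option_predict(probabilities, s, theta=0.6):
    """Apply a reject-option rule to predicted probabilities of y = 1.

    probabilities: estimated P(y = 1 | x) for each record,
    s: 1 = disadvantaged, 0 = advantaged (assumed coding),
    theta: confidence threshold, between 0.5 and 1.
    """
    predictions = []
    for p, si in zip(probabilities, s):
        confidence = max(p, 1.0 - p)
        if confidence <= theta:
            # Critical region: promote the disadvantaged, demote the advantaged.
            predictions.append(1 if si == 1 else 0)
        else:
            # Confident prediction: keep the classifier's own decision.
            predictions.append(1 if p >= 0.5 else 0)
    return predictions


probabilities = [0.9, 0.55, 0.45, 0.2, 0.58]
s = [0, 0, 1, 1, 1]
print(reject_option_predict(probabilities, s, theta=0.6))  # [1, 0, 1, 0, 1]
```

Raising theta widens the critical region, which trades more accuracy for a smaller gap between the groups.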

4. Postprocessing / Predictor API
Reject Option Classification
[Figure: Original Prediction versus Relabeled Data; axes: income, is homeowner; legend: Good Credit Risk, Bad Credit Risk, Woman, Man]

5. Metrics / Scorer
Mean Difference
Description
Calculates the difference between p(y+ | a) and p(y+ | d); the result lies between -1 and +1.
Parameters
y
s
d
Return Values
An array of float values: the mean difference between the advantaged group and the disadvantaged group, with its error margin.
Example
md_y_true = mean_difference(y_train, s_train)[0]
md_y_pred = mean_difference(y_pred, s_test)[0]
diff = md_y_pred - md_y_true
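The point estimate behind the mean difference is simple enough to compute by hand. The sketch below is a standard-library illustration with a hypothetical name, `mean_difference_sketch`, chosen so it is not confused with the library function; it assumes s = 1 marks the disadvantaged group and returns only the point estimate, without the error margin mentioned above.

```python
def mean_difference_sketch(y, s):
    """Return p(y+ | advantaged) - p(y+ | disadvantaged).

    y: binary labels or predictions,
    s: 1 = disadvantaged, 0 = advantaged (assumed coding).
    """
    advantaged = [yi for yi, si in zip(y, s) if si == 0]
    disadvantaged = [yi for yi, si in zip(y, s) if si == 1]
    return sum(advantaged) / len(advantaged) - sum(disadvantaged) / len(disadvantaged)


y = [1, 1, 1, 0, 1, 0, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]
print(mean_difference_sketch(y, s))  # 0.5
```

A value of 0 means both groups receive positive labels at the same rate, +1 means only the advantaged group does, and -1 means only the disadvantaged group does.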

Experiment with Themis-ml
Available at
https://github.com/waqar-alamgir/Fairness-aware-Machine-Learning

Case Study: German Credit Data
1000 loan application records
1 binary target variable y: 700 “good” credit_risk, 300 “bad” credit_risk
~20 input variables X, including housing, credit_history, purpose, foreign_worker, personal_status_and_sex, age_in_years
3 binary protected classes s: is_foreign, is_female, age_below_25

German Credit Data Results
Does the baseline make socially biased predictions?
Legend: Baseline (B), Remove Protected Attribute (RPA), Relabel Target Variable (RTV), Counterfactually Fair Model (CFM), Reject-option Classification (ROC)

fairml
Author: Julius Adebayo
Version: 0.1.1.5
Development: Active
1. Measures fairness at data level.
2. Great visualisation of features to validate
discrimination.
[Figure: attribute variable significance (from fairml)]

Fair-classification
Author: Muhammad Bilal Zafar
Version: Not available
Development: Active
1. Fair Classification.
2. Classification without disparate impact.
3. Classification without disparate mistreatment.
[Figure: loss in accuracy to achieve fairness (from Fair-classification)]

Live Demo
From the Jupyter notebook available at
http://nbviewer.jupyter.org/github/waqar-alamgir/Fairness-aware-Machine-Learning/blob/master/experiment-german-credit.ipynb

Conclusion
● Themis-ml compares favorably with the other libraries reviewed (fairml and Fair-classification).
● It has a well-defined interface and methods for both discrimination discovery and mitigation.
● Model flexibility: it can be applied to a number of existing machine learning models.
● Fairness as performance: it treats fairness as a performance measure and also includes tools to optimize for accuracy.
● Transparency of the fairness-utility tradeoff.
That said,
● It is poorly documented.
● Its specification is wrong in places and incompatible with the paper.

References
1. Themis-ml: A Fairness-aware Machine Learning Interface for End-to-end Discrimination Discovery and Mitigation (2017): Niels Bantilan, [online] https://arxiv.org/abs/1710.06921 [01.11.2017]
2. Themis-ml (2017): Niels Bantilan, [online] https://github.com/cosmicBboy/themis-ml [02.12.2017]
3. Scikit-learn (2010): David Cournapeau, [online] https://github.com/scikit-learn/scikit-learn [15.06.2017]
4. Themis-ml installation (2017): Niels Bantilan, [online] https://github.com/cosmicBboy/themis-ml#installation [02.12.2017]
5. Objective function: [online] https://en.wikipedia.org/wiki/Loss_function [18.02.2018]
6. Regularization: Simple Definition, L1 & L2 Penalties, [online] http://www.statisticshowto.com/regularization/ [18.02.2018]
7. German-Credit Data (1994): [online] https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data) [02.12.2017]
8. Census-Income Data (2000): [online] https://archive.ics.uci.edu/ml/datasets/Census-Income+%28KDD%29 [02.12.2017]
9. Fairness-aware Machine Learning (2018): Waqar Alamgir, [online] https://github.com/waqar-alamgir/Fairness-aware-Machine-Learning [02.02.2018]
10. FairML: Auditing Black-Box Predictive Models (2017): Julius Adebayo, [online] https://github.com/adebayoj/fairml [10.01.2018]
11. Fairness in Classification (2017): Muhammad Bilal Zafar, [online] https://github.com/mbilalzafar/fair-classification [13.01.2018]
12. Decision Theory for Discrimination-Aware Classification (2011): F. Kamiran, A. Karim & Xiangliang Zhang, [online] http://ieeexplore.ieee.org/document/6413831/ [02.03.2018]
13. Scikit-learn: Machine Learning in Python (2011): Pedregosa et al., JMLR 12, pp. 2825-2830
14. API design for machine learning software: experiences from the scikit-learn project (2013): Buitinck et al.
15. A survey on measuring indirect discrimination in machine learning (2015): [online] https://www.researchgate.net/publication/283471618_A_survey_on_measuring_indirect_discrimination_in_machine_learning
16. Themis-ml experiment / Jupyter notebook (2018): [online] http://nbviewer.jupyter.org/github/waqar-alamgir/Fairness-aware-Machine-Learning/blob/master/experiment-german-credit.ipynb
