Rsqrd AI: Incorporating Priors with Feature Attribution on Text Classification

Proprietary + Confidential
Frederick Liu
7/17/19 @ Robust AI
Incorporating priors with feature attribution on text classification

Machine learning
[Figure: a model is trained on labeled examples (Toxic, Neutral, Toxic, Toxic, Neutral). At inference, "Gay pride is in June." is classified as Toxic with 95% confidence.]

Machine learning + Explainability
[Figure: the same training and inference pipeline. "Gay pride is in June." is classified as Toxic (95%), and the explanation shows how much each token contributes to that prediction (the contributions sum to the 95%):]
Gay     90%
Pride    1%
is       1%
in       1%
June     2%

Machine learning + Regularization
[Figure: the same training data and pipeline with a regularizer added during training. The model's weights change, and at inference the example is classified as Toxic with 85% confidence.]

Machine learning + Regularization + Explainability
[Figure: the model is trained with a regularizer on its explanation: for the training example "He is an impolite gay person", the attribution to "gay" is pushed to 0%. At inference, "Gay pride is in June." is now classified as Toxic with only 15% confidence.]

Regularizing + Explainability → Controllability
[Figure: a trained model and its explanation.]

Regularizing + Explainability → Controllability
[Figure: the explanation is annotated with the requested changes "More Red!" and "Less Green!"; regularizing toward the desired explanation changes the model's weights accordingly.]

Explainability - Integrated Gradients
Link to paper - https://arxiv.org/pdf/1703.01365.pdf
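Integrated Gradients attributes a prediction F(x) to each input feature by integrating the gradient along the straight path from a baseline x' to the input x: IG_i(x) = (x_i - x'_i) * integral from 0 to 1 of dF(x' + a(x - x'))/dx_i da. The sketch below approximates that integral with a midpoint Riemann sum on a toy logistic-regression "classifier" whose gradient is analytic. The model, the made-up weights, the all-zero baseline and the step count are assumptions for illustration, not the setup used in the talk. The last line checks the completeness property: the attributions sum to F(x) - F(x').

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def model(x, w, b):
    """Toy differentiable classifier (assumption): P(toxic | x) = sigmoid(w . x + b)."""
    return sigmoid(np.dot(w, x) + b)


def model_grad(x, w, b):
    """Analytic gradient of model() with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum
    along the straight-line path from `baseline` to `x`."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([model_grad(baseline + a * (x - baseline), w, b) for a in alphas])
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * grads.mean(axis=0)


# Hypothetical input: one feature per token of "Gay pride is in June."
tokens = ["Gay", "pride", "is", "in", "June"]
x = np.ones(5)
baseline = np.zeros(5)                      # all-zero baseline (assumption)
w = np.array([3.0, 0.1, 0.0, 0.0, 0.2])     # made-up weights, not from the talk
b = -1.0

attr = integrated_gradients(x, baseline, w, b)
for tok, a in zip(tokens, attr):
    print(f"{tok:>5}: {a:.3f}")
# Completeness: the attributions sum to F(x) - F(baseline).
print(attr.sum(), model(x, w, b) - model(baseline, w, b))
```

Tokens with zero weight get exactly zero attribution, and the heavily weighted first token dominates, mirroring the "Gay 90%" explanation on the earlier slide. A real text classifier would apply the same path integral to token embeddings and use automatic differentiation instead of an analytic gradient.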

Explainability + Regularization
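One way to combine the two is to put the explanation into the training objective. A minimal sketch, assuming a cross-entropy loss plus a squared penalty that pulls the Integrated Gradients attributions of selected identity tokens toward a target of 0, weighted by a hypothetical coefficient `lam`. The toy logistic model, the token mask, the helper names and the weights are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum Integrated Gradients for a logistic model sigmoid(w . x + b)."""
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)   # (steps, d) points on the path
    p = sigmoid(path @ w + b)                             # (steps,) model outputs
    grads = (p * (1.0 - p))[:, None] * w                  # (steps, d) input gradients
    return (x - baseline) * grads.mean(axis=0)


def joint_loss(x, y, w, b, prior_mask, target, lam=1.0):
    """Cross-entropy plus lam * sum over masked tokens of (attribution - target)^2."""
    p = sigmoid(x @ w + b)
    eps = 1e-12  # guards against log(0)
    ce = -(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    attr = integrated_gradients(x, np.zeros_like(x), w, b)
    prior = np.sum(prior_mask * (attr - target) ** 2)
    return ce + lam * prior, ce, prior


# Hypothetical training example "He is an impolite gay person", labeled toxic.
tokens = ["He", "is", "an", "impolite", "gay", "person"]
x = np.ones(6)
y = 1.0
prior_mask = np.array([0, 0, 0, 0, 1, 0], dtype=float)  # only "gay" is constrained
target = np.zeros(6)                                     # desired attribution: 0

# Two made-up models with the same logit, 2.0, on this example.
w_biased = np.array([0.0, 0.0, 0.0, 1.0, 2.0, 0.0])  # relies on "gay"
w_fair = np.array([0.0, 0.0, 0.0, 3.0, 0.0, 0.0])    # relies on "impolite"
b = -1.0

for name, w in [("biased", w_biased), ("fair", w_fair)]:
    total, ce, prior = joint_loss(x, y, w, b, prior_mask, target)
    print(f"{name}: total={total:.4f} ce={ce:.4f} prior={prior:.4f}")
```

Both models make the same prediction, so the cross-entropy term cannot tell them apart; only the prior term penalizes the model that relies on "gay". In actual training, the gradient of this loss with respect to the weights (which differentiates through the attributions) would be computed with an autodiff framework; the sketch only evaluates the loss.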

Results - Classification Metric

Results - Fairness Metric

Results - Shift in embedding

Thank You
Link to paper - https://arxiv.org/pdf/1906.08286.pdf
Sign up if you want to know more: bit.ly/model-interpret-interest
