Understanding Black-box Predictions via Influence Functions (2017)

Terry Taewoong Um (terry.t.um@gmail.com)
University of Waterloo
Department of Electrical & Computer Engineering

TODAY’S PAPER
ICML 2017 best paper (Pang Wei Koh and Percy Liang)
https://youtu.be/0w9fLX_T6tY

QUESTIONS
• How can we explain the predictions of a black-box model?
• Why did the system make this prediction?
• How can we explain where the model came from?
• What would happen if the values of a training point were
slightly changed?

INTERPRETATION OF DL RESULTS
• Retrieving images that maximally activate a neuron [Girshick et al. 2014]
• Finding the most influential part of the image [Zhou et al. 2016]
• Learning a simpler model around a test point [Ribeiro et al. 2016]
But these approaches all assume a fixed model
→ The trained NN is itself a function of its training inputs

INFLUENCE OF A TRAINING POINT
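• What is the influence of a training example z on
the model (or on the loss at a test example)?
Optimal model param. : $\hat{\theta} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta)$
Model param. by training w/o z : $\hat{\theta}_{-z} = \arg\min_{\theta} \sum_{z_i \neq z} L(z_i, \theta)$
Model param. by upweighting z : $\hat{\theta}_{\epsilon, z} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta)$
Training without z == upweighting z with $\epsilon = -\frac{1}{n}$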
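• The influence of upweighting z on the parameters 𝜃
$\mathcal{I}_{up,params}(z) = \left. \frac{d\hat{\theta}_{\epsilon, z}}{d\epsilon} \right|_{\epsilon = 0} = -H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta})$, where $H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta})$

• Influence vs. Euclidean distance

• The influence of upweighting z on the loss at a test point
$\mathcal{I}_{up,loss}(z, z_{test}) = -\nabla_{\theta} L(z_{test}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta})$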
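The two upweighting formulas can be checked against actual leave-one-out retraining. The following is a minimal sketch, assuming an unregularized least-squares model on synthetic data (the data, the index `j`, and the helper names `fit`, `grad`, `loss` are illustrative assumptions, not from the talk). Removing z corresponds to 𝜖 = −1/n, so the predicted changes are −(1/n) times the influence.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.5 * rng.normal(size=n)
x_test, y_test = rng.normal(size=d), 0.3

def fit(X, y):
    # Minimizer of the mean of L(z, theta) = 0.5 * (x . theta - y)^2
    return np.linalg.lstsq(X, y, rcond=None)[0]

def grad(x, y, theta):
    # Gradient of the per-example loss with respect to theta
    return (x @ theta - y) * x

def loss(x, y, theta):
    return 0.5 * (x @ theta - y) ** 2

theta_hat = fit(X, y)
H = X.T @ X / n                      # Hessian of the mean training loss

j = 0                                # training point z = (X[j], y[j]) to remove
g_z = grad(X[j], y[j], theta_hat)
grad_test = grad(x_test, y_test, theta_hat)
infl_params = -np.linalg.solve(H, g_z)       # I_up,params(z)
infl_loss = grad_test @ infl_params          # I_up,loss(z, z_test)

# Removing z is upweighting with eps = -1/n
pred_dtheta = -infl_params / n
pred_dloss = -infl_loss / n

# Ground truth: retrain without the point
theta_loo = fit(np.delete(X, j, axis=0), np.delete(y, j))
true_dtheta = theta_loo - theta_hat
true_dloss = loss(x_test, y_test, theta_loo) - loss(x_test, y_test, theta_hat)

print("parameter change (predicted vs. retrained):", pred_dtheta, true_dtheta)
print("test-loss change (predicted vs. retrained):", pred_dloss, true_dloss)
```

For least squares the first-order prediction differs from the retrained parameters only by the leverage factor of the removed point, so the two agree closely when n is much larger than the number of parameters.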

PERTURBING A TRAINING POINT
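• Move 𝜖 mass from $z = (x, y)$ to $z_{\delta} = (x + \delta, y)$
$\hat{\theta}_{\epsilon, z_{\delta}, -z} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z_{\delta}, \theta) - \epsilon L(z, \theta)$
• If x is continuous and 𝛿 is small
$\hat{\theta}_{z_{\delta}, -z} - \hat{\theta} \approx -\frac{1}{n} H_{\hat{\theta}}^{-1} \left[ \nabla_{x} \nabla_{\theta} L(z, \hat{\theta}) \right] \delta$
• The effect of $z \to z_{\delta}$ on the loss at a test point
$\mathcal{I}_{pert,loss}(z, z_{test})^{\top} = -\nabla_{\theta} L(z_{test}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{x} \nabla_{\theta} L(z, \hat{\theta})$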

SUMMARY
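• The influence of $z \to z_{\delta}$ on the loss at a test point
$\mathcal{I}_{pert,loss}(z, z_{test})^{\top} = -\nabla_{\theta} L(z_{test}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{x} \nabla_{\theta} L(z, \hat{\theta})$
• The influence of upweighting z on the loss at a test point
$\mathcal{I}_{up,loss}(z, z_{test}) = -\nabla_{\theta} L(z_{test}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta})$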

EXAMPLE
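• The influence of upweighting z
$\mathcal{I}_{up,loss}(z, z_{test}) = -\nabla_{\theta} L(z_{test}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta})$
• In logistic regression, with $p(y \mid x) = \sigma(y \theta^{\top} x)$ and $y \in \{-1, 1\}$,
$\mathcal{I}_{up,loss}(z, z_{test}) = -y_{test} \, y \cdot \sigma(-y_{test} \theta^{\top} x_{test}) \cdot \sigma(-y \theta^{\top} x) \cdot x_{test}^{\top} H_{\hat{\theta}}^{-1} x$
• Test image : a 7, Training images : 7s (green), 1s (red)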
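The closed-form logistic-regression influence can be computed for every training point at once. This is a minimal sketch on synthetic two-class data standing in for the 7-vs-1 digits (the data, the Newton fitting loop, and the test point are illustrative assumptions). A negative influence means that upweighting the training point would lower the test loss.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Two overlapping Gaussian classes with labels in {-1, +1}
rng = np.random.default_rng(0)
n, d = 200, 2
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] + rng.normal(size=(n, d))

def grads_and_hessian(theta):
    m = y * (X @ theta)                              # margins y * theta^T x
    G = -(y * sigmoid(-m))[:, None] * X              # per-example gradients
    w = sigmoid(m) * sigmoid(-m)
    H = (X * w[:, None]).T @ X / n                   # Hessian of the mean loss
    return G, H

# Fit theta with Newton's method (unregularized logistic regression)
theta = np.zeros(d)
for _ in range(25):
    G, H = grads_and_hessian(theta)
    theta -= np.linalg.solve(H, G.mean(axis=0))
G, H = grads_and_hessian(theta)

x_test, y_test = np.array([0.2, 0.1]), 1.0

# -y_test * y * sigma(-y_test theta.x_test) * sigma(-y theta.x) * x_test^T H^-1 x
h_inv_x_test = np.linalg.solve(H, x_test)
influence = (-y_test * y
             * sigmoid(-y_test * (x_test @ theta))
             * sigmoid(-y * (X @ theta))
             * (X @ h_inv_x_test))

print("most helpful training points:", np.argsort(influence)[:5])
print("most harmful training points:", np.argsort(influence)[-5:])
```

The product of the two sigmoid terms is what separates influence from a plain inner product or Euclidean distance: points the model already fits well get a small weight.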

SEVERAL PROBLEMS
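• Calculation of $H_{\hat{\theta}}^{-1}$ is expensive: $O(np^2 + p^3)$ for n training points and p parameters
→ Use Hessian-vector products (HVPs)

precompute $s_{test} = H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z_{test}, \hat{\theta})$ by optimizing (conjugate gradients)
or sampling-based approximation (stochastic estimation)
then $\mathcal{I}_{up,loss}(z, z_{test}) = -s_{test} \cdot \nabla_{\theta} L(z, \hat{\theta})$ for each training point z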
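The HVP idea can be sketched with a hand-written conjugate-gradient solver that only ever calls a function returning H v. This is a minimal illustration, assuming a logistic-regression Hessian with a small damping term; `theta` is a stand-in for fitted parameters, and `lam`, `hvp`, and `conjugate_gradient` are hypothetical names. In a deep network the `hvp` function would come from automatic differentiation rather than a closed form.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)
theta = rng.normal(size=d)            # stand-in for fitted parameters (assumption)
lam = 0.01                            # assumed damping strength
w = sigmoid(X @ theta) * sigmoid(-(X @ theta))

def hvp(v):
    # H v for H = (1/n) X^T diag(w) X + lam * I, without ever forming H
    return X.T @ (w * (X @ v)) / n + lam * v

def conjugate_gradient(hvp, b, max_iters=100, tol=1e-10):
    # Solves H s = b for symmetric positive definite H using only HVPs
    s = np.zeros_like(b)
    r = b.copy()                      # residual b - H s for s = 0
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        s += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return s

x_test, y_test = rng.normal(size=d), 1.0
grad_test = -y_test * sigmoid(-y_test * (x_test @ theta)) * x_test
s_test = conjugate_gradient(hvp, grad_test)

# With s_test precomputed, each training point costs one gradient and one dot product
train_grads = -(y * sigmoid(-y * (X @ theta)))[:, None] * X
influences = -train_grads @ s_test
print(influences[:5])
```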

SEVERAL PROBLEMS
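• What if $L$ is non-convex, so that $H$ can have negative eigenvalues?
→ Assuming that $\tilde{\theta}$ is a local minimum point, define a quadratic loss
$\tilde{L}(z, \theta) = L(z, \tilde{\theta}) + \nabla L(z, \tilde{\theta})^{\top} (\theta - \tilde{\theta}) + \frac{1}{2} (\theta - \tilde{\theta})^{\top} (H_{\tilde{\theta}} + \lambda I) (\theta - \tilde{\theta})$
Then calculate $\mathcal{I}_{up,loss}$ using the above, with $\tilde{L}$ in place of $L$ ($\lambda$ is a damping term)
→ works empirically!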
• Influence function vs. retraining
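The damping term in the quadratic approximation for non-convex losses can be illustrated on a tiny indefinite Hessian. This is a minimal sketch: the matrix, the gradient, and the rule for picking 𝜆 (shift the spectrum just above zero) are assumptions for illustration; in practice 𝜆 is a hyperparameter.

```python
import numpy as np

# Hypothetical Hessian at a point that is not a strict minimum: one negative eigenvalue
H = np.array([[2.0, 0.3],
              [0.3, -0.1]])
grad_test = np.array([0.5, -1.0])     # stand-in for the test-loss gradient

eig_min = np.linalg.eigvalsh(H).min()
lam = max(0.0, -eig_min) + 0.01       # assumed rule: smallest eigenvalue becomes about 0.01
H_damped = H + lam * np.eye(2)

# H_damped is positive definite, so s_test = (H + lam I)^-1 grad_test is well defined
s_test = np.linalg.solve(H_damped, grad_test)
print("min eigenvalue before:", eig_min)
print("min eigenvalue after: ", np.linalg.eigvalsh(H_damped).min())
```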

SEVERAL PROBLEMS
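• What if $L$ is non-differentiable?
e.g.) hinge loss, $\mathrm{Hinge}(s) = \max(0, 1 - s)$
→ Use a differentiable variation of the hinge loss, $\mathrm{SmoothHinge}(s, t) = t \log(1 + \exp(\frac{1 - s}{t}))$, which approaches the hinge loss as $t \to 0$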
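A smoothed hinge of the form t·log(1 + exp((1 − s)/t)) can be compared with the plain hinge numerically. This is a small sketch; the grid of margins and the temperatures are arbitrary choices for illustration.

```python
import numpy as np

def hinge(s):
    return np.maximum(0.0, 1.0 - s)

def smooth_hinge(s, t):
    # t * log(1 + exp((1 - s) / t)), evaluated stably via logaddexp
    return t * np.logaddexp(0.0, (1.0 - s) / t)

s = np.linspace(-2.0, 4.0, 601)       # margins
for t in (1.0, 0.1, 0.001):
    gap = np.max(np.abs(smooth_hinge(s, t) - hinge(s)))
    print(f"t = {t}: max gap to hinge = {gap:.6f}")
```

The gap is largest at the kink s = 1, where it equals t·log 2, so a small t keeps the smooth loss close to the hinge while still providing the gradients and Hessians that influence functions need.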

APPLICATIONS
• Understanding model behavior

APPLICATIONS
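• Adversarial training examples
c.f.) The effect of $z \to z_{\delta}$ on the loss at a test point: $\mathcal{I}_{pert,loss}(z, z_{test})$ gives the direction in which to perturb a training point to raise the test loss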

APPLICATIONS
• Debugging domain mismatch

APPLICATIONS
• Fixing mislabeled examples
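One way to prioritize training points for label inspection is to rank them by self-influence, the upweighting influence of a point on its own loss, which approximates the error the point would incur if it were removed from training. This is a minimal sketch on synthetic data with 10% deliberately flipped labels; the data, the flip rate, and the inspection budget of 30 points are assumptions for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
n, d = 300, 2
y_clean = rng.choice([-1.0, 1.0], size=n)
X = 2.0 * y_clean[:, None] + rng.normal(size=(n, d))

# Corrupt 10% of the labels
flipped = rng.choice(n, size=30, replace=False)
y = y_clean.copy()
y[flipped] *= -1.0

def grads_and_hessian(theta):
    m = y * (X @ theta)
    G = -(y * sigmoid(-m))[:, None] * X
    H = (X * (sigmoid(m) * sigmoid(-m))[:, None]).T @ X / n
    return G, H

# Fit logistic regression on the noisy labels with Newton's method
theta = np.zeros(d)
for _ in range(50):
    G, H = grads_and_hessian(theta)
    theta -= np.linalg.solve(H, G.mean(axis=0))
G, H = grads_and_hessian(theta)

# Self-influence I_up,loss(z_i, z_i) = -g_i^T H^-1 g_i (always <= 0)
self_influence = -np.einsum("ij,ij->i", G, np.linalg.solve(H, G.T).T)

# Inspect the points with the most negative self-influence first
budget = 30
flagged = np.argsort(self_influence)[:budget]
hits = int(np.isin(flagged, flipped).sum())
print(f"{hits} of the {budget} flagged points have flipped labels")
```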
