Understanding black-box
predictions via influence
functions
XIE Ruiming
Outline
• Background
• Taylor's Formula
• Newton's Method
• Introduction
• Influence Function
• Definition
• Efficiently Calculating Influence
• Validation and Extensions
• Use cases of influence functions
Background : Taylor
• Taylor's theorem gives an approximation of a k-times
differentiable function around a given point by a k-th
order Taylor polynomial
• Linear approximation
• Quadratic approximation
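As a small illustration (not from the slides), the linear and quadratic Taylor approximations can be compared numerically, here for exp around 0:

```python
import math

def taylor_exp(x, order):
    # k-th order Taylor polynomial of exp around 0: sum of x^i / i!
    return sum(x**i / math.factorial(i) for i in range(order + 1))

x = 0.1
linear = taylor_exp(x, 1)      # 1 + x
quadratic = taylor_exp(x, 2)   # 1 + x + x^2/2
exact = math.exp(x)
# the quadratic approximation is closer to exp(x) than the linear one
```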
Background : Newton
• Find x such that F(x) = 0 by iteration.
• Recall Taylor's formula:
• F(a) ≈ F(x_n) + F'(x_n)(a – x_n)
• Setting F(a) = 0 gives a = x_n – F(x_n)/F'(x_n)
• Newton's method in optimization
• If x* = argmin F(x), then F'(x*) = 0
• So apply Newton's method to F'(x):
• x_{n+1} = x_n – F'(x_n)/F''(x_n)
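The two uses of the update above can be sketched in a few lines (the example functions are invented for illustration):

```python
def newton(f, fprime, x0, iters=30):
    # Newton iteration: x_{n+1} = x_n - f(x_n) / f'(x_n)
    x = x0
    for _ in range(iters):
        x = x - f(x) / fprime(x)
    return x

# root finding: F(x) = x^2 - 2 has root sqrt(2)
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0)

# optimization: minimize F(x) = (x - 3)^2 by finding the root of F'(x) = 2(x - 3)
minimum = newton(lambda x: 2 * (x - 3), lambda x: 2.0, 0.0)
```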
Background : Newton
• x = the model's parameter vector θ; in the multivariate case the update uses the gradient and the Hessian: θ_{n+1} = θ_n – H^{-1} ∇F(θ_n)
Introduction
• Why did the model make this prediction?
• Retrieving images that maximally activate a neuron [Girshick et al. 2014]
• Finding the most influential part of the image [Zhou et al. 2016]
• But these methods assume a fixed model
Introduction
• Existing methods
• Treat the model as fixed
• Explain the prediction w.r.t. the parameters or the test input
• Our method
• Treat the model as a function of its training data
• Explain the prediction w.r.t. the training data "most responsible" for it
• How would the prediction change if we up-weighted or modified a training point?
Influence Function
• Introduction
• Efficient calculation
• Validation and extensions
Influence Function
• The original loss function: R(θ) = (1/n) Σ_i L(z_i, θ)
• The optimized parameters: θ̂ = argmin_θ R(θ)
• If we up-weight a point z by ε, the new loss function is R(θ) + ε·L(z, θ)
• The new optimized parameters: θ̂_ε = argmin_θ { R(θ) + ε·L(z, θ) }
Influence Function
• We are interested in the parameter change and the test-loss change.
• Parameter change: θ̂_ε – θ̂
• Loss change: L(z_test, θ̂_ε) – L(z_test, θ̂)
Influence Function
• We define two influence functions:
• I_up,params(z) = dθ̂_ε/dε |_{ε=0} = –H^{-1} ∇_θ L(z, θ̂), where H is the Hessian of R at θ̂
• I_up,loss(z, z_test) = dL(z_test, θ̂_ε)/dε |_{ε=0} = –∇_θ L(z_test, θ̂)ᵀ H^{-1} ∇_θ L(z, θ̂)
• Both come from the first-order expansion F(ε) ≈ F(0) + ε·F'(0)
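These definitions can be sanity-checked numerically. Below is a minimal sketch on toy least-squares data (the data and loss are invented for illustration): up-weight one training point by a small ε, re-solve, and compare the parameter shift against ε · I_up,params(z).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
n = len(X)

# mean squared loss R(w) = (1/n) sum 0.5*(x_i @ w - y_i)^2
w_hat = np.linalg.solve(X.T @ X, X.T @ y)        # minimizer of R
H = X.T @ X / n                                   # Hessian of R
grad = lambda x, t, w: (x @ w - t) * x            # per-example loss gradient

z = 0                                             # training point to up-weight
I_params = -np.linalg.solve(H, grad(X[z], y[z], w_hat))

# re-solve the eps-up-weighted least-squares problem and compare
eps = 1e-4
A = X.T @ X / n + eps * np.outer(X[z], X[z])
b = X.T @ y / n + eps * y[z] * X[z]
w_eps = np.linalg.solve(A, b)
# (w_eps - w_hat) / eps should match I_params to first order
```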
Deriving
• Apply a Taylor expansion to the optimality condition.
• Let G(θ) = ∇_θ [ R(θ) + ε·L(z, θ) ]; since θ̂_ε minimizes the new loss, G(θ̂_ε) = 0
• Expanding around θ̂: 0 = G(θ̂_ε) ≈ G(θ̂) + ∇G(θ̂)(θ̂_ε – θ̂)
• With ∇R(θ̂) = 0 we get G(θ̂) = ε ∇_θ L(z, θ̂), and (dropping the O(ε) term in ∇G) θ̂_ε – θ̂ ≈ –ε H^{-1} ∇_θ L(z, θ̂)
Deriving
• Finally, by the chain rule: I_up,loss(z, z_test) = dL(z_test, θ̂_ε)/dε = –∇_θ L(z_test, θ̂)ᵀ H^{-1} ∇_θ L(z, θ̂)
Deriving other functions
Perturbing a training input
• If we change (x, y) to (x + δ, y), how does the test loss change?
• Changing (x, y) to (x + δ, y) is equivalent to removing (x, y) and adding (x + δ, y)
Efficient calculation
• Two challenges:
• computing the inverse Hessian
• computing the influence function for all training points
• n training points, p parameters
• Inverting the Hessian explicitly: O(np² + p³)
• Conjugate gradients (see paper): O(np) per Hessian-vector product
• Stochastic estimation (see paper)
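The conjugate-gradient trick can be sketched as follows: H is never materialized, only Hessian-vector products, which for an average squared loss cost O(np) each. The data and plain-numpy CG here are illustrative, not the paper's implementation:

```python
import numpy as np

def conjugate_gradient(hvp, b, iters=100, tol=1e-12):
    """Solve H x = b using only Hessian-vector products hvp(v) = H @ v."""
    x = np.zeros_like(b)
    r = b - hvp(x)               # residual
    p_dir = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hvp(p_dir)
        alpha = rs / (p_dir @ Hp)
        x += alpha * p_dir
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p_dir = r + (rs_new / rs) * p_dir
        rs = rs_new
    return x

# For an average squared loss, H v = X^T (X v) / n: each product is O(np)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
n = len(X)
hvp = lambda v: X.T @ (X @ v) / n
v = rng.normal(size=5)
s = conjugate_gradient(hvp, v)   # s approximates H^{-1} v
```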
Validation and Extensions
• The derivation makes some assumptions and approximations:
• the model parameters minimize the loss
• the loss is twice-differentiable
• We want to check how influence functions perform when these assumptions are violated.
Validation and Extensions
• Influence function vs leave-one-out retraining
• actually retrain a linear regression model after removing each training point
Validation and Extensions
• Non-convexity and non-convergence
• When θ̂ is not an exact minimizer, the estimated loss change is slightly different (see paper)
• Even with a non-convex loss, the approximation holds up well
• Pearson's correlation = 0.86
Validation and Extensions
• Non-differentiable losses
• Hinge loss: we can approximate it with a smooth surrogate
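One such smoothing (a common softplus-style choice; the exact form used in the paper may differ) replaces max(0, 1 − s) with t·log(1 + exp((1 − s)/t)), which is twice-differentiable and approaches the hinge as t → 0:

```python
import math

def hinge(s):
    return max(0.0, 1.0 - s)

def smooth_hinge(s, t=0.1):
    # softplus smoothing: t * log(1 + exp((1 - s) / t)),
    # written stably to avoid overflow for large (1 - s) / t
    u = (1.0 - s) / t
    return t * (max(u, 0.0) + math.log1p(math.exp(-abs(u))))
```

With a small temperature t, the smooth version tracks the hinge closely while keeping the gradients and Hessians that the influence derivation needs.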
Use cases of influence functions
• Understanding model behavior
• Fixing mislabeled examples
• Adversarial training examples
• Debugging domain mismatch (see paper)
Understanding model behaviors
• Model 1: Inception v3 with all but the top layer frozen
• Model 2: SVM with an RBF kernel
• Task: binary image classification, fish vs. dog
Fixing mislabeled examples
• We only have the training set.
• What do we usually do?
• Inspect the examples with the largest training loss
Fixing mislabeled examples
• Experiment:
• spam email data; randomly flip 10% of the labels
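A sketch of the influence-based alternative on synthetic data (the dataset, model, and hyperparameters here are all invented): rank training points by self-influence, the quadratic form gᵀH⁻¹g of each point's loss gradient, and inspect the top of the list first.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 2
X = rng.normal(size=(n, p))
y = (X @ np.array([3.0, -3.0]) > 0).astype(float)
flipped = rng.choice(n, size=20, replace=False)
y[flipped] = 1 - y[flipped]              # simulate 10% label noise

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# fit an L2-regularized logistic regression by gradient descent
lam, w = 0.1, np.zeros(p)
for _ in range(2000):
    w -= 0.5 * (X.T @ (sigmoid(X @ w) - y) / n + lam * w)

probs = sigmoid(X @ w)
H = (X.T * (probs * (1 - probs))) @ X / n + lam * np.eye(p)
grads = (probs - y)[:, None] * X         # per-example loss gradients

# self-influence g^T H^{-1} g: large values flag the points the model
# "fights" hardest to fit -- good candidates for label checking
self_infl = np.einsum('ij,ij->i', grads, np.linalg.solve(H, grads.T).T)
ranked = np.argsort(-self_infl)          # inspect highest self-influence first
```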
Adversarial training examples
• Prior work generates adversarial test images that are visually indistinguishable from real ones yet fool a classifier.
• We demonstrate that we can instead craft adversarial training images that flip a model's test predictions.
• The idea is to iteratively perturb training images in the direction given by the influence function.
Adversarial training examples
• Same data as fish vs. dog
• The original model correctly classified 591/600 test images.
• For each test image, attack a single training image with 100 iterations.
• 335 (57%) of those test predictions were flipped.
• An attack on one training image can also influence multiple test images.
Thank you
code: http://bit.ly/gt-influence
