Interpretation and Explanation Methods For DNN Models
Presented by:
• Subhashis Hazarika, The Ohio State University
CSE 6559: Seminar Presentation
Topics Covered
• Activation Maximization
• Sensitivity Analysis
• Simple Taylor Decomposition
• Layer-wise Relevance Propagation
• Guided Backpropagation
• LRP and Deep Taylor Decomposition
• Explanation Evaluation Techniques
• Applications
• Concept Activation Vectors (CAV)
• Testing with Concept Activation Vectors (TCAV)
• TCAV Applications
Methods for Interpreting and Understanding
Deep Neural Networks
Authors:
• Grégoire Montavon, TU Berlin
• Wojciech Samek, Fraunhofer Heinrich Hertz Institute, Berlin
• Klaus-Robert Müller, TU Berlin
Presented by:
• Subhashis Hazarika, The Ohio State University
CSE 6559: Seminar Presentation
Outline
• Motivation
• Interpretation vs Explanation
• Interpretation
• Activation Maximization
• Explanation
• Gradient vs Decomposition
• Relevance Propagation (deep Taylor decomposition)
• Evaluating Explanations
• Applications
Motivation
1. Verify that the classifier works as expected
2. Improve the classifier
3. Learn from the learning machine
4. Interpretability in the sciences
5. Compliance with legislation
Understanding Deep Nets: Two Views
• Model Analysis → Interpretation
• Decision Analysis → Explanation
Overview of Techniques
Interpreting a DNN Model
• Build a prototype in the input domain of the DNN (e.g., an image or a piece of text) that is
interpretable and representative of the abstract learned concept
• Activation Maximization (AM): an analysis framework that searches for an input
pattern that produces a maximum model response for a quantity of interest (see the sketch below)
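A minimal sketch of activation maximization by plain gradient ascent, assuming a pretrained PyTorch classifier `model` that maps images to class logits (the input shape, step count, and ℓ2 penalty weight are illustrative choices, not the paper's):

```python
import torch

def activation_maximization(model, class_idx, shape=(1, 3, 224, 224),
                            steps=200, lr=0.1, l2_weight=1e-3):
    """Gradient ascent on the logit of `class_idx`; the l2 penalty keeps
    the prototype from drifting toward extreme pixel values."""
    model.eval()
    x = torch.zeros(shape, requires_grad=True)      # start from a neutral input
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logit = model(x)[0, class_idx]
        loss = -logit + l2_weight * x.pow(2).sum()  # minimizing -logit maximizes the logit
        loss.backward()
        opt.step()
    return x.detach()
```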
Improving AM
• Simple AM: maxₓ log p(ω_c | x) − λ‖x‖²
• Improving AM with an expert: maxₓ log p(ω_c | x) + log p(x)
• Perform AM in code space: max_z log p(ω_c | g(z)) + λ‖z‖²
• The latter two techniques require an unsupervised model of the data, either a density model p(x)
or a generator model g(z).
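A hedged sketch of the code-space variant: the search runs over a latent code z, so the decoded prototype g(z) stays on the data manifold. `model` and `generator` are placeholder names for a pretrained classifier and a pretrained generator:

```python
import torch

def activation_maximization_code_space(model, generator, class_idx,
                                       z_dim=128, steps=200, lr=0.05, lam=1e-3):
    """Optimize a latent code z so that the decoded image g(z)
    maximizes the logit of `class_idx`."""
    model.eval(); generator.eval()
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x = generator(z)                       # decode the current prototype
        loss = -model(x)[0, class_idx] + lam * z.pow(2).sum()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return generator(z)
```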
Comparison of AM variants
Enhanced AM on Natural Images
Limitation of Global Interpretations
From Global Interpretations to Individual Explanations
Explainability
Explaining DNN decisions
• “why” does the model arrive at a certain prediction?
• “why” is the image below classified as a shark?
Explainability: Basic Techniques
• Sensitivity Analysis (Gradient Approach)
• Simple Taylor Decomposition (Function-decomposition Approach)
• Backward Propagation Techniques
• Layer-wise Relevance Propagation (decomposition)
• Guided Backpropagation (gradient)
Sensitivity Analysis
• Relevance is the squared local sensitivity: Rᵢ = (∂f/∂xᵢ)²
What does Sensitivity Analysis decompose?
• The relevance scores sum to the squared gradient norm, Σᵢ Rᵢ = ‖∇f(x)‖²
• i.e., sensitivity analysis explains a variation of the function, not the function value f(x) itself
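A minimal PyTorch sketch of a sensitivity heatmap, assuming `model` maps a batch of inputs to class logits:

```python
import torch

def sensitivity_map(model, x, class_idx):
    """R_i = (df/dx_i)^2: explains what makes f vary, not f itself."""
    x = x.clone().requires_grad_(True)
    model(x)[0, class_idx].backward()          # populate x.grad with df/dx
    return x.grad.pow(2)                       # entries sum to ||grad f(x)||^2
```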
Decomposing the Correct Quantity
• We want to decompose the prediction itself: 𝒇(𝒙) = Σᵢ Rᵢ
• Taylor series at a root point x̃ (a point where f(x̃) = 0):
  f(x) = f(x̃) + Σᵢ [∂f/∂xᵢ]|ₓ₌ₓ̃ · (xᵢ − x̃ᵢ) + higher-order terms
Simple Taylor Decomposition
• A closed-form root point is achievable for linear models and for deep ReLU networks
without biases, which are positively homogeneous: f(tx) = t·f(x), hence f(0) = 0
• We can therefore choose the root point x̃ = lim_{t→0} t·x, i.e., the origin approached
from the direction of x
• Final relevance: Rᵢ = [∂f/∂xᵢ]|ₓ̃₌₀ · xᵢ
• i.e., gradient × input
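This reduces to a one-liner in practice; a sketch assuming a bias-free ReLU classifier `model` in PyTorch:

```python
import torch

def gradient_times_input(model, x, class_idx):
    """Simple Taylor relevance for bias-free ReLU nets: R_i = (df/dx_i) * x_i."""
    x = x.clone().requires_grad_(True)
    model(x)[0, class_idx].backward()          # df/dx at the data point x
    return (x.grad * x).detach()               # for such nets, relevances sum to f(x)
```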
Why doesn't Simple Taylor work?
1. The root point is hard to find, or lies too far from x (losing the context of x)
2. Gradient shattering → the gradient of deep nets has low informative value [Balduzzi '17]
Backward Propagation Techniques
• Make explicit use of the graph structure of the network
• Layer-wise Relevance Propagation (conserving)
• Guided Backpropagation (non-conserving)
• Backward propagation techniques were shown empirically to scale better to complex DNN models
• They facilitate filtering and break the explanation into multiple simpler subtasks
Layer-wise Relevance Propagation
• Each neuron receives a share of the network output, and redistributes it to its
predecessors in equal amount, until the input variables are reached
Conservation property: Σᵢ Rᵢ = … = Σⱼ Rⱼ = Σₖ Rₖ = … = f(x)
LRP-α₁β₀ rule: Rⱼ = Σₖ ( aⱼwⱼₖ⁺ / Σⱼ aⱼwⱼₖ⁺ ) · Rₖ
LRP propagation rules
• LRP-αβ rule [Landecker '13, Bach '15, Zhang '16, Montavon '17]
• General rule, with contributions zⱼₖ = aⱼwⱼₖ and the constraint α − β = 1:
  Rⱼ = Σₖ ( α · zⱼₖ⁺ / Σⱼ zⱼₖ⁺ − β · zⱼₖ⁻ / Σⱼ zⱼₖ⁻ ) · Rₖ
[Figure: example heatmaps for different (α, β) choices; colorbars range from −1 to 1]
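A hedged numpy sketch of the αβ rule for a single linear(+ReLU) layer; `a` holds the lower-layer activations and `R_out` the relevances arriving from the layer above:

```python
import numpy as np

def lrp_alphabeta(a, W, R_out, alpha=1.0, beta=0.0, eps=1e-9):
    """LRP alpha-beta rule for one layer. a: (J,) activations, W: (J, K)
    weights, R_out: (K,) upper-layer relevances. alpha - beta must equal 1
    so that R_in.sum() == R_out.sum() (conservation)."""
    z = a[:, None] * W                          # contributions z_jk = a_j * w_jk
    zp = np.where(z > 0, z, 0.0)                # positive contributions
    zn = np.where(z < 0, z, 0.0)                # negative contributions
    frac_p = zp / (zp.sum(axis=0, keepdims=True) + eps)
    frac_n = zn / (zn.sum(axis=0, keepdims=True) - eps)
    return (alpha * frac_p - beta * frac_n) @ R_out

# alpha=1, beta=0 recovers the LRP-alpha1beta0 (z+) rule from the previous slide.
```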
LRP vs Guided Backpropagation
LRP and Deep Taylor Decomposition
• LRP-α₁β₀ rule → deep Taylor decomposition [Montavon '17]
• Assumption: a neuron's relevance can be modeled as the product Rₖ = aₖ·cₖ of its
activation aₖ and an approximately constant, positive term cₖ
1. Build the relevance neuron
2. Expand the relevance neuron (Taylor expansion at a root point)
3. Decompose the relevance over the input neurons
4. Pool relevance over all outgoing neurons
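Putting the four steps together: a self-contained sketch that propagates relevance through a small bias-free ReLU MLP with the z⁺ (α₁β₀ / deep Taylor) rule. Input-layer subtleties (e.g., the z^B rule for pixel domains) are deliberately ignored here:

```python
import numpy as np

def deep_taylor_explain(weights, x):
    """weights: list of (in, out) arrays of a bias-free ReLU MLP with a
    scalar output; x: (D,) input. Returns per-input relevance scores."""
    acts = [x]
    for W in weights:                           # forward pass, keep activations
        acts.append(np.maximum(0.0, acts[-1] @ W))
    R = acts[-1]                                # relevance starts at the output
    for W, a in zip(reversed(weights), reversed(acts[:-1])):
        z = a[:, None] * np.clip(W, 0, None)    # positive contributions a_j * w_jk+
        z = z / (z.sum(axis=0, keepdims=True) + 1e-9)
        R = z @ R                               # pool over all outgoing neurons k
    return R                                    # approximately conserves: R.sum() ~ f(x)
```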
Evaluating Explanation Quality
• Explanation Continuity
• Explanation Selectivity
Explanation Continuity
• "If two data points are nearly equivalent, then the explanations of their predictions
should also be nearly equivalent."
• Explanation continuity (or the lack of it) can be quantified by looking for the strongest
variation of the explanation R(x) in the input domain:
  max_{x ≠ x′} ‖R(x) − R(x′)‖₁ / ‖x − x′‖₂
• Illustration: for a simple 2-layer ReLU network, gradient-based explanations can jump
abruptly at the ReLU kinks even where the function itself varies smoothly
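A crude empirical probe of this quantity, assuming `explain_fn` maps a flat numpy input to a relevance vector of the same shape (the probe count and perturbation radius are arbitrary choices):

```python
import numpy as np

def continuity_score(explain_fn, x, n_probes=100, radius=0.01, seed=0):
    """Empirical proxy for max ||R(x) - R(x')||_1 / ||x - x'||_2 over nearby x'."""
    rng = np.random.default_rng(seed)
    R_x, worst = explain_fn(x), 0.0
    for _ in range(n_probes):
        x_p = x + radius * rng.standard_normal(x.shape)
        ratio = np.abs(explain_fn(x_p) - R_x).sum() / np.linalg.norm(x_p - x)
        worst = max(worst, ratio)
    return worst                                # large value = discontinuous explanation
```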
Explanation Selectivity
• Can be quantified by measuring how fast f(x) goes down when removing the features with
the highest relevance scores
• "Pixel-flipping" test for image data: repeatedly remove the most relevant pixel and
track the decline of f(x); the steeper the decline, the more selective the explanation
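A minimal pixel-flipping sketch over a flattened input; `f` is any callable scoring function, and `fill_value` stands in for whatever feature-removal scheme is chosen:

```python
import numpy as np

def pixel_flipping_curve(f, x, R, n_steps=100, fill_value=0.0):
    """Remove pixels in decreasing order of relevance R, recording f's decline."""
    order = np.argsort(-R)                      # most relevant pixels first
    x_cur, scores = x.copy(), [f(x)]
    for i in order[:n_steps]:
        x_cur[i] = fill_value                   # "flip" (remove) one pixel
        scores.append(f(x_cur))
    return np.array(scores)                     # steep drop = selective explanation
```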
Applications
• Model Validation Procedure
• Analysis of scientific data
Model Validation
• Traditional validation → use a validation set (a subset of the initial training data)
• Human-interpretable results can add to basic validation procedures
Analysis of Scientific Data
• The concepts of interpretability and explainability can be extended to domain-specific
knowledge, useful for scientific inference
• Example domains: atomistic simulations, human brain patterns, DNA sequences, and
human face analysis
Summary so far
• ML model transparency:
• Interpreting the concepts learned by a model by building prototypes
• Explaining the model’s decisions by identifying the relevant input variables
• Crucial distinction between sensitivity analysis and decomposition approaches
• Evaluating Explanations
• Continuity
• Selectivity
• Applications
Interpretability Beyond Feature Attribution: Quantitative
Testing with Concept Activation Vectors (TCAV)
Authors:
• Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viégas, and Rory Sayres
• Affiliation: Google Brain
CSE 6559: Seminar Presentation
Presented by:
• Subhashis Hazarika, The Ohio State University
Goal of TCAV
• Go beyond per-feature attribution: quantify how important a human-friendly, user-defined
concept (e.g., "striped") is to a trained model's prediction for a class (e.g., "zebra")
Overview
Testing with CAV
• A Concept Activation Vector (CAV) is the normal to a hyperplane separating the activations
of a concept's examples from those of random examples at a chosen layer
• TCAV score: the fraction of a class's inputs whose directional derivative along the CAV
is positive (see the sketch below)
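A hedged sketch of both steps, assuming the layer activations and the gradients of the class logit with respect to those activations are precomputed as numpy arrays; sklearn's LogisticRegression stands in for the paper's linear classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(acts_concept, acts_random):
    """Fit a linear classifier (concept vs. random) in activation space;
    the CAV is the unit normal of its decision hyperplane."""
    X = np.vstack([acts_concept, acts_random])
    y = np.r_[np.ones(len(acts_concept)), np.zeros(len(acts_random))]
    v = LogisticRegression(max_iter=1000).fit(X, y).coef_.ravel()
    return v / np.linalg.norm(v)

def tcav_score(grads_class, cav):
    """Fraction of class examples whose logit increases along the CAV,
    i.e. whose directional derivative (grad . v) is positive."""
    return float(np.mean(grads_class @ cav > 0))
```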
Other Analysis
• TCAV extension: Relative TCAV (compare two related concepts against each other rather
than against random examples)
• Validating the learned CAVs: e.g., sorting a concept's images by similarity to the CAV,
and testing statistical significance against CAVs trained on random data
TCAV for a Medical Application
• Diabetic Retinopathy (DR), graded on levels 0–4
• Validating the trained model against diagnostic concepts known to doctors
Thank you!