The document explores the fragility of neural network interpretations, focusing on adversarial attacks against feature-importance maps and their implications. It discusses interpretation methods such as the simple gradient and integrated gradients, and attack strategies such as targeted perturbations, evaluated on datasets like ImageNet and CIFAR-10. The paper highlights how difficult it is to interpret DNNs reliably and compares the effectiveness of the different attack strategies.
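As a point of reference for the methods named above, the sketch below shows the idea behind the simple gradient interpretation: feature importance is the gradient of the predicted class score with respect to the input, so a small input perturbation can change the map even when the prediction is unchanged. The toy model, input shape, and variable names are illustrative assumptions, not taken from the source.

```python
import torch
import torch.nn as nn

# Toy classifier standing in for the DNN under study; the source does not
# specify an architecture, so this linear model is purely illustrative.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()

# Placeholder CIFAR-10-sized input; requires_grad lets us differentiate
# the class score with respect to the pixels.
x = torch.randn(1, 3, 32, 32, requires_grad=True)

score = model(x)[0].max()  # logit of the predicted class
score.backward()           # gradient of that score w.r.t. the input

# Per-pixel importance: absolute gradient, reduced over color channels.
saliency = x.grad.abs().squeeze(0).max(dim=0).values
```

An interpretation attack in this setting perturbs `x` within a small norm ball to change `saliency` while keeping the predicted class fixed.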