Explaining and harnessing adversarial examples (2015)

Explaining and Harnessing Adversarial Examples (2015)
Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy
Paper summary by @mikibear_, 170118

The core concept,
ADVERSARIAL EXAMPLES
So what is an adversarial example...?

Intriguing properties of neural networks
(2013, Christian Szegedy et al.)

Picture any neural network model that performs classification.
e.g. AlexNet, VGG, ResNet, Inception...

This paper presents examples that make the following situation possible:
Classifier: "This is 'definitely' a dog"
(what looks like the same photo)
Classifier: "This is 'definitely' not a dog"


(Figure: a correctly classified original photo, plus the added noise, gives a photo that is misclassified with high confidence)

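Yes, the two photos are in fact different photos.
Adding 'a suitably small noise' to
'a photo that is classified correctly'
can produce 'a photo that is clearly misclassified'.
That is the gist of this paper.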

But do these two photos really look different?
"dog" (99%)
"ostrich" (99%)

No.

So how do we go about finding
noise like this?

It is an all-too-common optimization problem...
The ingredients: the classifier's black-box function, the original image, the noise, and the label

To summarize...
1) find a noise that causes a misclassification
when it is added to the image, and
2) among those, find the one with the smallest norm
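The two conditions above can be written as a constrained minimization. This is only a sketch in the notation of Szegedy et al.: the symbols are assumptions matched to the slide's labels, with f the classifier treated as a black-box function, x the original image, r the noise, l the (wrong) target label, and m the number of pixels, each scaled to [0, 1].

```latex
\[
\begin{aligned}
\min_{r}\quad & \lVert r \rVert_2 \\
\text{subject to}\quad & f(x + r) = l, \\
& x + r \in [0, 1]^m
\end{aligned}
\]
```

The last constraint simply keeps the perturbed image a valid image.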

After that, apply whatever optimization technique
you like. The paper uses L-BFGS.
The catch: this problem is non-convex.

Explaining and Harnessing Adversarial Examples
(2015, Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy)
Now, back to where we started...

'Is there an easier way
to find adversarial examples
like these?'

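In a linear model...
if we look for a perturbation term that stays as small as possible
yet pushes the output far across
the decision boundary…
(It follows that the higher-dimensional
the model's input,
the easier such examples
are to find.)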
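A small sketch of why dimensionality matters, under assumptions of my own: for a linear unit with weights w, moving every input by eps in the direction sign(w) shifts the activation w.x by eps times the sum of |w_i|, so the shift grows with the number of inputs even though no single input moves by more than eps. The random weights, the value of eps, and the helper name activation_shift are hypothetical and chosen only for illustration.

```python
import random


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def activation_shift(w, eps):
    """Change in w.x when every input moves by eps in the direction sign(w)."""
    eta = [eps * sign(wi) for wi in w]             # max-norm of eta is eps
    return sum(wi * ei for wi, ei in zip(w, eta))  # equals eps * sum(|w_i|)


random.seed(0)
eps = 0.01  # hypothetical per-input budget, for illustration only
for n in (10, 1000, 100000):
    w = [random.uniform(-1.0, 1.0) for _ in range(n)]
    print(n, round(activation_shift(w, eps), 3))
```

The printed shift grows roughly in proportion to n, which is the point of the parenthetical remark above.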

Making adversarial examples
by slamming linear noise into a non-linear model
"The linear view of adversarial
examples suggests a fast way of
generating them. We hypothesize that
neural networks are too linear to resist
linear adversarial perturbation."

"neural networks
are too linear"

In other words…
if a model that is regarded as non-linear
breaks when it is fed a linear perturbation,
then that model can be considered
linear enough.
That is the claim.

The gradient this needs is
all too easy to obtain
with backpropagation
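The paper's fast gradient sign method perturbs the input as x_adv = x + eps * sign(grad_x J(theta, x, y)). Below is a minimal sketch on a toy logistic-regression classifier, not the paper's experimental setup. Assumptions: labels are in {-1, +1}, the loss is J = -log(sigmoid(y * (w.x + b))), and the input gradient is written out by hand as a stand-in for backpropagation. The weights, input, and eps are hypothetical values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def loss_gradient_wrt_input(w, b, x, y):
    """Gradient of J = -log(sigmoid(y * (w.x + b))) with respect to x."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    scale = -y * sigmoid(-margin)
    return [scale * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """Fast gradient sign step: x + eps * sign(grad_x J)."""
    grad = loss_gradient_wrt_input(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def prob_positive(w, b, x):
    """Model's probability that x belongs to the +1 class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical toy model and input, chosen only for illustration.
w = [2.0, -3.0, 1.0, -0.5]
b = 0.0
x = [0.3, -0.2, 0.1, -0.4]
y = 1
x_adv = fgsm(w, b, x, y, eps=0.25)
print(prob_positive(w, b, x), prob_positive(w, b, x_adv))
```

In this toy case the predicted class flips even though no coordinate of x moves by more than eps. For a real network the hand-written gradient would be replaced by whatever the framework's automatic differentiation returns.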

GoogLeNet
vs.
linear perturbations

First question,
'What if we take the adversarial examples obtained this way
and train the model on them again?'

Conclusion,
'It not only helps to some degree
against the adversarial examples themselves,
it also helps model generalization.
It even adds a regularization benefit beyond what dropout alone provides.'
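For reference, the adversarial training objective in the paper mixes the ordinary loss with the loss on the fast-gradient-sign example. The block below restates it. J is the training loss, theta the model parameters, x the input, y the label, epsilon the perturbation size, and alpha the mixing weight (the paper reports using alpha = 0.5).

```latex
\[
\tilde{J}(\theta, x, y) = \alpha \, J(\theta, x, y)
  + (1 - \alpha) \, J\bigl(\theta,\; x + \epsilon \,\operatorname{sign}\bigl(\nabla_x J(\theta, x, y)\bigr),\; y\bigr)
\]
```

Because the adversarial example is recomputed from the current parameters at every step, the model is always trained against its own latest weaknesses.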

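(Personal opinion) However,
some other papers report that this sometimes does not help,
and that the model can still be exposed to
yet other adversarial examples, so this seems to be
a point to treat with caution.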

Second question,
'Then what about an RBF network,
which is more non-linear?'

As expected,
its adversarial examples differ more visibly from the original data.
That is, it is more robust to adversarial examples.
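One way to see why an RBF unit behaves differently from a linear one: its confidence is high only near its prototype and decays as the input moves away, instead of growing without bound along a direction. The sketch below assumes a single unit of the form exp(-beta * ||x - mu||^2) with a scalar beta. The prototype mu, beta, and the perturbation sizes are hypothetical values, not figures from the paper.

```python
import math


def rbf_confidence(x, mu, beta):
    """Confidence of one RBF unit: exp(-beta * ||x - mu||^2)."""
    dist_sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return math.exp(-beta * dist_sq)


# Hypothetical 100-dimensional prototype, for illustration only.
mu = [0.5] * 100
beta = 1.0
for eps in (0.0, 0.1, 0.25):
    x = [mi + eps for mi in mu]  # move every coordinate by eps
    print(eps, rbf_confidence(x, mu, beta))
```

A perturbation that would push a linear unit confidently across its boundary instead drives this unit's confidence toward zero, so a fooled RBF model tends to be unconfident rather than confidently wrong.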

Third question,
'Wouldn't using an ensemble help?' -> Apparently not.
Fourth question,
'Wouldn't it help to train with a uniform distortion applied to the inputs?'
-> Apparently not.

The paper's conclusions,
1) Every existing model to which the
universal approximation theorem applies
is too linear to
resist adversarial examples
2) But training the model on adversarial examples
helps somewhat

References,
1) Intriguing properties of neural networks
https://arxiv.org/abs/1312.6199
2) Explaining and Harnessing Adversarial Examples
https://arxiv.org/abs/1412.6572
3) Adversarial Examples
http://www.iro.umontreal.ca/~memisevr/dlss2015/goodfellow_adv.pdf

If anything here is wrong, or something important is missing, please let me know!
@mikibear
