January 31, 2021
Deep Learning Paper Reading Group
Image Processing Team: 김병현, 박동훈, 안종식, 홍은기, 허다운
Training Data-Efficient Image Transformers &
Distillation through Attention (DeiT)
Contents
01 Summary
02 Prerequisites
03 Architecture
04 Experiments
05 Discussion
Summary
01
Summary of DeiT
01. Summary
1. Published December 2020, Facebook AI
2. Builds on ViT and introduces a distillation concept
3. Contribution
- Image classification without CNNs
- Trained on ImageNet only
- Trained in only 2-3 days on a single 8-GPU node
- Performance comparable to SOTA CNN-based models
- Introduces a distillation concept
4. Conclusion
- CNN-based architectures have improved through many years of research
- Transformers for image tasks have only just begun to be studied
> Achieving comparable performance already shows the potential of Transformers
Prerequisites
02
Vision Transformer & Knowledge Distillation
02. Prerequisites
1. Vision Transformer
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Google
> Reference: Deformable DETR: Deformable Transformers for End-to-End Object Detection paper review - 홍은기
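As a refresher on the ViT building block that DeiT reuses, below is a minimal sketch of the patch embedding step: the image is cut into 16x16 patches and each patch is linearly projected into a token. This is an illustrative PyTorch sketch, not the authors' code; sizes follow the ViT-B/16 configuration.

```python
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Turn an image into a sequence of patch tokens (ViT front-end sketch)."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2   # 14 * 14 = 196
        # A strided convolution is equivalent to flattening each patch and
        # applying one shared linear projection.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, 3, 224, 224)
        x = self.proj(x)                      # (B, 768, 14, 14)
        return x.flatten(2).transpose(1, 2)   # (B, 196, 768)
```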
02. Prerequisites
1. Vision Transformer
- Training dataset: JFT-300M
- Pre-training at low resolution, fine-tuning at high resolution
> Position embedding: bicubic interpolation (see the sketch below)
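Fine-tuning at a higher resolution produces more patch tokens than pre-training, so the learned position embeddings must be resized; ViT (and DeiT) do this by bicubically interpolating the 2D grid of patch-position embeddings. A minimal sketch under that assumption; the function name and grid sizes are illustrative, not from the paper's code:

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed, old_grid=14, new_grid=24):
    """Bicubically interpolate learned position embeddings to a new grid.

    pos_embed: (1, 1 + old_grid**2, dim) -- class-token embedding first,
    then one embedding per patch position.
    """
    cls_tok, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
    dim = pos_embed.shape[-1]
    # Fold the flat patch positions back into their 2D spatial grid.
    patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                              mode="bicubic", align_corners=False)
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([cls_tok, patch_pos], dim=1)
```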
02. Prerequisites
2. Knowledge Distillation
- The idea of transferring knowledge from a large, well-trained teacher model to a smaller student model (the standard formulation is sketched below)
> Reference: Explaining Knowledge Distillation by Quantifying the Knowledge - 김동희
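For reference, the classic soft-label distillation objective (Hinton et al.): cross-entropy on the ground truth plus a temperature-scaled KL term toward the teacher's softened distribution. The temperature and weighting below are illustrative defaults, not values from the DeiT paper:

```python
import torch.nn.functional as F

def soft_distillation_loss(student_logits, teacher_logits, labels,
                           temperature=3.0, alpha=0.5):
    """Soft distillation: CE on true labels + KL divergence to the teacher."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1 - alpha) * ce + alpha * kl
```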
Q & A
Architecture
03
Implementation of DeiT
03. Architecture
1. Knowledge Distillation
- Adds a distillation token with the same structure as the class token
- Soft distillation
- Hard distillation (a sketch of the loss follows below)
- Can prevent incorrect supervision caused by random cropping
> e.g., GT: Cat / Prediction: Cat, versus a crop where the GT is still Cat but the cat is no longer visible (Prediction: ???); the teacher's prediction on the actual crop is the better target
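A minimal sketch of the hard-distillation objective with the distillation token: the class-token head is trained on the ground-truth label, the distillation-token head on the teacher's argmax prediction, and the two losses are averaged; at inference the two heads' outputs are combined. Argument names are illustrative:

```python
import torch.nn.functional as F

def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, labels):
    """DeiT-style hard distillation with a distillation token.

    cls_logits  : predictions from the class-token head
    dist_logits : predictions from the distillation-token head
    """
    # The teacher's hard decision is computed on the same (possibly cropped)
    # input the student sees, which is why it can correct noisy crop labels.
    teacher_labels = teacher_logits.detach().argmax(dim=-1)
    return 0.5 * F.cross_entropy(cls_logits, labels) \
         + 0.5 * F.cross_entropy(dist_logits, teacher_labels)
```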
03. Architecture
2. Bag of Tricks
- Basically uses the ViT architecture as-is (ViT-B = DeiT-B)
> Same basic training procedure
> Performance improved through hyperparameter tuning
Q & A
Experiments
04
Experiment Results of DeiT
04. Experiments
1. Distillation
- Teacher model: RegNetY-16GF
> A ConvNet teacher works better than a Transformer teacher
"Probably" inductive bias!
- Distillation comparison: hard distillation is better
* Inductive bias
- The distillation method lets the student learn the ConvNet's inductive bias better
04. Experiments
2. Efficiency vs. Accuracy
- Compares parameter count, throughput, and accuracy
> In terms of throughput vs. accuracy, DeiT performs comparably to ConvNets
- Base model: DeiT-B (= ViT-B)
3. Transfer Learning
- Tests the ImageNet-pretrained model on other datasets
Discussion
05
Conclusion & Discussion
05. Discussion
1. Contribution
1) Improved the performance of the Transformer-based ViT model (no ConvNet used)
2) Trained with less data and faster than ViT
3) Performance comparable to SOTA ConvNets
4) Proposed a simple knowledge distillation method
2. Opinion
1) Still requires many epochs (300-500 epochs)
2) Exposes the weaknesses of Transformers
> Sensitive to hyperparameters
> Needs more data and training time than ConvNets
> Plenty of room for research, but hard to apply in production
3) Research style resembles the early days of deep learning
> Quantitative research (experiment → theory)
> Experimental results are not yet fully explained
3. Conclusion
1) A field that still needs much more research
2) Achieving CNN-level performance at this early stage of research suggests that,
as in NLP, Transformers may eventually be able to replace CNNs
Q & A
THANK YOU
for Watching
