SURVEY OF ATTENTION MECHANISM &
USE IN COMPUTER VISION
CMPE 297 – Emerging Technologies
Swati Ganesh Narkhede
INTRODUCTION TO ATTENTION
• First introduced in 2014 for the machine translation task.
• Has since become an important part of neural network architectures for
problems in natural language processing, computer vision, speech
recognition, etc.
• The attention mechanism focuses on the parts of the input that are
significant for the task at hand while ignoring the rest.
• The notion behind attention mirrors how human perception works in tasks
such as speech translation and image captioning, where we attend only to
the relevant parts of an utterance or scene.
USE OF ATTENTION TO OVERCOME
DRAWBACKS OF TRADITIONAL
ENCODER-DECODER
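The drawback being addressed: a traditional encoder-decoder compresses the
entire input into one fixed-length vector, which becomes a bottleneck for
long sequences. As an illustration (a minimal NumPy sketch with made-up
shapes, not code from the slides), Bahdanau-style additive attention
instead recomputes a context vector at every decoding step:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical encoder outputs: T source positions, d-dim hidden states.
T, d = 5, 8
rng = np.random.default_rng(0)
H = rng.normal(size=(T, d))      # encoder hidden states h_1..h_T
s = rng.normal(size=d)           # current decoder state s_t (the query)

# Additive (Bahdanau-style) scoring: e_i = v^T tanh(W_h h_i + W_s s_t)
W_h = rng.normal(size=(d, d))
W_s = rng.normal(size=(d, d))
v = rng.normal(size=d)
scores = np.tanh(H @ W_h.T + s @ W_s.T) @ v   # shape (T,)

alpha = softmax(scores)          # attention weights over source positions
context = alpha @ H              # context vector: weighted sum of the h_i

Because the decoder receives a fresh, input-dependent summary at every
step, long inputs are no longer squeezed through a single bottleneck
vector.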
CATEGORIES OF ATTENTION
1. NUMBER OF SEQUENCES
• In Distinctive Attention models, the candidate states and query states
come from two distinct sequences: the encoder's input sequence and the
decoder's output sequence.
• Co-attention models operate on multiple input sequences at the same
time, and attention weights are learned jointly over all of them. Co-
attention models can be used for image inputs.
• In Self-Attention models, the candidate states and query states both
belong to the same input sequence (see the sketch below). They are useful
for recommendation and text-classification problems.
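A minimal sketch of scaled dot-product self-attention (illustrative NumPy
with hypothetical shapes): queries, keys, and values are all projections
of the same sequence, so every position attends to every other position.

import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a single sequence X (T, d)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (T, T) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # every position attends
                                                     # to every other position

T, d = 6, 8
rng = np.random.default_rng(1)
X = rng.normal(size=(T, d))                          # one input sequence
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)               # shape (T, d)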
2. NUMBER OF ABSTRACTIONS
• Single-level attention models compute attention weights only over the
original input sequence.
• Multi-level attention models apply attention at multiple levels of
abstraction of the input sequence.
• In this type of attention, the context vector of the lower abstraction
level becomes the query state for the higher abstraction level (see the
sketch below). Such models can be further classified as top-down or
bottom-up.
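A minimal bottom-up sketch of this idea (the two-level structure follows
the bullet above; the shapes and the dot-product scoring are illustrative
assumptions):

import numpy as np

def attend(states, query):
    """Soft attention: weight each state by its dot product with the query."""
    scores = states @ query
    w = np.exp(scores - scores.max()); w /= w.sum()
    return w @ states                         # context vector

rng = np.random.default_rng(2)
d = 8
words = rng.normal(size=(10, d))              # lower-level states (e.g., words)
sents = rng.normal(size=(3, d))               # higher-level states (e.g., sentences)

low_context = attend(words, rng.normal(size=d))   # context at the lower level
# Per the bullet above: the lower level's context vector serves as the
# query state at the higher abstraction level.
high_context = attend(sents, low_context)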
3. NUMBER OF POSITIONS
• In Soft Attention models, the context vector is computed as a weighted
average of all hidden states of the input sequence (a sketch of both
variants follows this list).
• Because the weighting is differentiable, such models let the network
learn efficiently through backpropagation; however, they incur a
quadratic computational cost.
• In Hard Attention models, the context vector is built from hidden states
stochastically sampled from the input sequence.
• The Global Attention model is similar to soft attention, whereas the
Local Attention model sits midway between the soft and hard attention
mechanisms.
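A minimal sketch of the soft/hard distinction (illustrative NumPy; the
scoring function is abstracted away as random alignment scores):

import numpy as np

rng = np.random.default_rng(3)
T, d = 5, 8
H = rng.normal(size=(T, d))              # hidden states of the input sequence
scores = rng.normal(size=T)              # alignment scores (any scoring function)
alpha = np.exp(scores - scores.max()); alpha /= alpha.sum()

# Soft attention: differentiable weighted average of ALL hidden states.
soft_context = alpha @ H

# Hard attention: stochastically sample ONE hidden state; not
# differentiable, so training typically needs REINFORCE-style estimators.
idx = rng.choice(T, p=alpha)
hard_context = H[idx]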
4. NUMBER OF REPRESENTATIONS
• Multi-Representational attention models capture different aspects of the
input sequence through multiple feature representations, with attention
weights determining the relevance of each representation.
• In Multi-Dimensional attention, weights are generated to determine the
relevance of each feature dimension of the input sequence (see the sketch
below).
• These models are mostly used in natural language processing
applications.
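A minimal sketch of multi-dimensional attention under illustrative
assumptions: instead of one scalar weight per position, a weight is
computed per position and per feature dimension, so each dimension is
reweighted independently:

import numpy as np

rng = np.random.default_rng(4)
T, d = 5, 8
H = rng.normal(size=(T, d))                # hidden states
W = rng.normal(size=(d, d))

scores = H @ W                             # (T, d): a score per position AND per dimension
A = np.exp(scores - scores.max(axis=0))    # softmax over positions, per dimension
A /= A.sum(axis=0)
context = (A * H).sum(axis=0)              # each feature dimension gets its own weighting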
NETWORK ARCHITECTURES WITH
ATTENTION
1. Encoder-Decoder
• Because attention decouples the input representation from the output,
the encoder and decoder no longer have to be the same type of network,
which enables hybrid encoder-decoders (e.g., a CNN encoder feeding an RNN
decoder).
• This architecture is useful for image and video captioning, speech
recognition, etc.
2. Memory Networks
• For some applications, such as chatbots, the input to the network is a
knowledge database plus a query, with some facts more relevant to the
query than others.
• For such problems, end-to-end memory networks store the database of
facts in an array of memory blocks and use attention to determine how
relevant each fact is to answering the query (see the sketch below).
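A minimal single-hop sketch in the spirit of end-to-end memory networks
(the embeddings and shapes are illustrative assumptions; real memory
networks learn separate input/output embeddings and often use multiple
hops):

import numpy as np

def memory_attention(facts, query):
    """One attention hop over a memory of fact embeddings (N, d)."""
    scores = facts @ query                            # relevance of each fact
    p = np.exp(scores - scores.max()); p /= p.sum()   # softmax over facts
    return p @ facts, p                               # response vector, fact weights

rng = np.random.default_rng(5)
N, d = 10, 16
facts = rng.normal(size=(N, d))       # embedded knowledge-base facts
query = rng.normal(size=d)            # embedded question
response, weights = memory_attention(facts, query)
# `weights` ranks the facts by relevance; `response` summarizes the
# supporting facts and would feed an answer layer (omitted here).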
APPLICATIONS
• Natural Language Generation
• Classification
• Recommender System
• Computer Vision
STAND-ALONE SELF-ATTENTION
MODEL FOR COMPUTER VISION
• Convolutional Neural Networks (CNNs) are considered the basic building
block of computer vision architectures.
• Until this work, attention models were used on top of convolutional
networks for computer vision tasks rather than on their own.
• The fully stand-alone self-attention vision model was built by taking an
existing convolutional architecture (ResNet), replacing every spatial
convolution with a form of local self-attention, and replacing the
convolutional stem with an attention-based stem (sketched below).
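A minimal sketch of the kind of local self-attention layer that replaces a
spatial convolution: each pixel attends over its k x k neighborhood. This
is a single-head, NumPy-only illustration; the actual layer in the paper
also uses relative positional embeddings and multiple heads, omitted here:

import numpy as np

def local_self_attention(X, W_q, W_k, W_v, k=3):
    """Each pixel attends over its k x k neighborhood (zero-padded)."""
    H, W, d = X.shape
    pad = k // 2
    Xp = np.pad(X, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((H, W, d))
    for i in range(H):
        for j in range(W):
            q = X[i, j] @ W_q                            # query at this pixel
            nbr = Xp[i:i + k, j:j + k].reshape(-1, d)    # k*k neighborhood
            keys, vals = nbr @ W_k, nbr @ W_v
            s = keys @ q / np.sqrt(d)
            a = np.exp(s - s.max()); a /= a.sum()        # softmax over neighbors
            out[i, j] = a @ vals
    return out

rng = np.random.default_rng(6)
X = rng.normal(size=(8, 8, 16))                          # feature map (H, W, d)
W_q, W_k, W_v = (rng.normal(size=(16, 16)) for _ in range(3))
Y = local_self_attention(X, W_q, W_k, W_v)               # same spatial size as a
                                                         # stride-1 convolution

Because the output has the same spatial extent as a stride-1 convolution,
the layer can be swapped into ResNet one-for-one.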
EXPERIMENTS PERFORMED USING STAND-
ALONE SELF-ATTENTION MODEL
a. ImageNet Classification
• The researchers ran experiments on the ImageNet classification task,
which contains 1.28 million training images and 50,000 test images.
• They replaced the spatial convolutional layers with self-attention
layers and used a position-aware attention stem.
• The attention models outperformed the convolutional baseline across all
network depths.
COMPARISON OF RESULTS FOR
IMAGENET EXPERIMENT
b. COCO Object Detection
• The stand-alone self-attention model was evaluated on the COCO object
detection task using the RetinaNet architecture.
• The researchers used an attention-based backbone within RetinaNet.
• The fully self-attentional model performed efficiently across the
evaluated vision tasks.
COMPARISON OF RESULTS FOR COCO
OBJECT DETECTION EXPERIMENT
REVIEW ABOUT THE SURVEY PAPER
a) Overall quality: The authors provide a sufficient introduction to
attention models and their applications in deep learning. Their summary
of the key papers on attention models is insightful.
b) Critique of the paper: The survey lacks an ablation study.
c) How to address (b): Adding an ablation study would improve the survey.
d) Future directions and suggestions: In future work, the authors could
cover applications of attention models in the computer vision domain.
REFERENCES
1. S. Chaudhari et al., "An Attentive Survey of Attention Models."
https://arxiv.org/pdf/1904.02874.pdf
2. P. Ramachandran et al., "Stand-Alone Self-Attention in Vision Models,"
NeurIPS 2019. http://papers.nips.cc/paper/8302-stand-alone-self-attention-in-vision-models.pdf
THANK YOU