Survey of Attention mechanism & Use in Computer Vision

CMPE 297 – Emerging Technologies
Swati Ganesh Narkhede

INTRODUCTION TO ATTENTION
• First introduced in 2014 for the machine translation task.
• Has become an important part of neural network architectures for
problems in natural language processing, computer vision, speech
recognition, etc.
• An attention mechanism focuses on the parts of the input that are most
useful for the task while ignoring the other parts.
• The notion behind attention resembles how the human biological system
works in tasks such as speech translation and image captioning.

USE OF ATTENTION TO OVERCOME
DRAWBACKS OF TRADITIONAL
ENCODER-DECODER

1. NUMBER OF SEQUENCES
• In Distinctive Attention models, the candidate states and query states
belong to two distinct sequences: the input and the output of the
encoder-decoder.
• Co-attention models take multiple input sequences at the same time.
Attention weights are learned jointly over all the input sequences.
Co-attention models can be used for image inputs.
• In Self-Attention models, the candidate states and query states both
belong to the same input sequence. Useful for recommendation and text
classification problems.
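
The difference between distinctive attention and self-attention can be sketched in a few lines of plain Python. This is a minimal illustration, not the formulation of any specific paper: dot-product scoring, the helper names (softmax, dot, attend) and the toy state vectors are all assumptions made for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(queries, candidates):
    """For each query state, return (weights, context) over the candidate states.

    Dot-product scoring is an assumption of this sketch.
    """
    dim = len(candidates[0])
    results = []
    for q in queries:
        weights = softmax([dot(q, c) for c in candidates])
        context = [sum(w * c[d] for w, c in zip(weights, candidates))
                   for d in range(dim)]
        results.append((weights, context))
    return results

# Toy states (hypothetical values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_states = [[0.5, 0.5]]

# Distinctive attention: queries come from the output sequence,
# candidates from the input sequence.
distinctive = attend(decoder_states, encoder_states)

# Self-attention: queries and candidates are the same sequence.
self_attended = attend(encoder_states, encoder_states)
```

A co-attention model would extend the same idea by computing weights jointly over several candidate sequences.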

2. NUMBER OF ABSTRACTIONS
• Attention models with a single level of abstraction compute attention
weights only for the original input sequence.
• Multi-level abstraction attention models apply attention at multiple
levels of abstraction of the input sequence.
• In this type of attention, the context vector of the lower abstraction level
becomes the query state for the higher abstraction level. Such models can
be classified further as top-down or bottom-up models.
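
A rough sketch of the multi-level idea, where the lower-level context vector is reused as the query one level up. The two levels (words, sentences), the dot-product scoring, the initial query and all numeric values are hypothetical choices for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(query, states):
    """Dot-product attention: weighted average of the states for one query."""
    weights = softmax([sum(q * s for q, s in zip(query, state))
                       for state in states])
    return [sum(w * state[d] for w, state in zip(weights, states))
            for d in range(len(states[0]))]

# Lower level (e.g. words): attend with an initial, hypothetical query.
word_states = [[1.0, 0.0], [0.0, 1.0]]
initial_query = [1.0, 0.0]
lower_context = context_vector(initial_query, word_states)

# Higher level (e.g. sentences): the lower-level context is the query state.
sentence_states = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
higher_context = context_vector(lower_context, sentence_states)
```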

3. NUMBER OF POSITIONS
• In Soft Attention models, the context vector is computed as the weighted
average of all hidden states of the input sequence.
• These models let the neural network learn efficiently through
backpropagation, but they incur a quadratic computational cost.
• In Hard Attention models, the context vector is built from hidden states
that are stochastically sampled from the input sequence.
• The Global Attention model is similar to the soft attention model
whereas the Local Attention model is midway between soft and hard
attention mechanisms.
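
The three variants can be contrasted with a small sketch. It assumes the attention weights are already given; the function names and the center and half_width parameters of the local variant are hypothetical, and the windowed renormalisation is only one simple way to realise local attention.

```python
import random

def soft_context(weights, hidden_states):
    """Soft attention: weighted average of all hidden states (differentiable)."""
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]

def hard_context(weights, hidden_states, rng):
    """Hard attention: sample one hidden state with probability given by weights."""
    index = rng.choices(range(len(hidden_states)), weights=weights, k=1)[0]
    return hidden_states[index]

def local_context(weights, hidden_states, center, half_width):
    """Local attention (sketch): soft attention restricted to a window."""
    lo = max(0, center - half_width)
    hi = min(len(hidden_states), center + half_width + 1)
    window_weights = weights[lo:hi]
    total = sum(window_weights)
    normalised = [w / total for w in window_weights]  # renormalise inside window
    return soft_context(normalised, hidden_states[lo:hi])

hidden = [[0.0, 2.0], [2.0, 0.0], [4.0, 4.0]]
weights = [0.25, 0.25, 0.5]

soft = soft_context(weights, hidden)                     # uses every state
hard = hard_context(weights, hidden, random.Random(0))   # uses one sampled state
local = local_context(weights, hidden, center=0, half_width=1)  # states 0 and 1
```

Because the hard variant samples, it is not differentiable in the usual way, which is why soft attention is the easier one to train with backpropagation.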

4. NUMBER OF REPRESENTATIONS
• The Multi-Representational attention models determine different aspects
of the input sequence through multiple feature representations.
• In Multi-Dimensional attention, the weights are generated to determine
the relevance of each dimension of the input sequence.
• These models are used for natural language processing applications.
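
As a hedged illustration of multi-dimensional attention, the sketch below computes a separate weight distribution for every feature dimension instead of one weight per position. The scoring rule (state[d] * query[d]) and the toy values are assumptions made for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_dimensional_attention(states, query):
    """Return one context value per feature dimension.

    states: list of feature vectors, one per position.
    query:  one value per dimension (hypothetical scoring: state[d] * query[d]).
    """
    context = []
    for d in range(len(query)):
        # A separate weight distribution over positions for this dimension.
        weights = softmax([state[d] * query[d] for state in states])
        context.append(sum(w * state[d] for w, state in zip(weights, states)))
    return context

states = [[1.0, 5.0], [3.0, 5.0]]
context = multi_dimensional_attention(states, [1.0, 1.0])
```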

NETWORK ARCHITECTURES WITH
ATTENTION
1. Encoder-Decoder
• The ability of Attention models to separate the input representations from
output enables one to introduce hybrid encoder-decoders.
• This architecture is useful for image and video captioning, speech
recognition, etc.
2. Memory Networks
• For some applications such as chatbots, the input to the network is a
knowledge database and a query, where some facts are more relevant to the
query than others.
• For such problems, end-to-end memory networks use an array of memory
blocks to store the database of facts and use attention to determine the
relevance of each fact to answering the query.
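
A single memory read can be sketched as attention over the stored facts. This is a simplified, one-hop illustration: the fact embeddings, the query embedding and the name memory_read are hypothetical, and a real memory network would learn these embeddings.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def memory_read(query, memory_keys, memory_values):
    """Score every stored fact against the query and return a weighted read."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in memory_keys]
    weights = softmax(scores)  # relevance of each fact to the query
    dim = len(memory_values[0])
    output = [sum(w * v[d] for w, v in zip(weights, memory_values))
              for d in range(dim)]
    return weights, output

# Toy memory: three facts, each with a key (for matching) and a value (for output).
fact_keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fact_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [4.0, 0.0]

weights, answer_vector = memory_read(query, fact_keys, fact_values)
```

Here the first fact receives the largest weight because its key is most aligned with the query.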

APPLICATIONS
• Natural Language Generation
• Classification
• Recommender System
• Computer Vision

STAND-ALONE SELF-ATTENTION
MODEL FOR COMPUTER VISION
• Convolutional Neural Networks (CNNs) are considered the building block
of computer vision architectures.
• Attention has typically been used on top of other networks for computer
vision tasks rather than as a stand-alone layer.
• The fully stand-alone self-attention vision model was built from an
existing convolutional architecture (ResNet) by replacing all instances of
spatial convolution with a form of self-attention and by replacing the
convolutional stem.
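
To give a feel for self-attention used in place of a spatial convolution, the sketch below lets one pixel attend over its k x k neighbourhood. It is a strong simplification and not the paper's layer: query, key and value projections are taken as identity, positional information is left out, and neighbours outside the image are simply skipped. The function name and parameters are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def local_self_attention(image, row, col, k=3):
    """Attend from pixel (row, col) over its k x k neighbourhood.

    image: 2D grid (list of rows) of feature vectors (lists of floats).
    Projections are identity and out-of-bounds neighbours are skipped;
    both are assumptions of this sketch.
    """
    half = k // 2
    query = image[row][col]
    neighbours = [
        image[r][c]
        for r in range(row - half, row + half + 1)
        for c in range(col - half, col + half + 1)
        if 0 <= r < len(image) and 0 <= c < len(image[0])
    ]
    weights = softmax([sum(q * n for q, n in zip(query, nb))
                       for nb in neighbours])
    return [sum(w * nb[d] for w, nb in zip(weights, neighbours))
            for d in range(len(query))]

# A 3 x 3 image where every pixel has the same 2-channel feature vector.
image = [[[1.0, 0.0] for _ in range(3)] for _ in range(3)]
centre_output = local_self_attention(image, 1, 1)   # 9 neighbours
corner_output = local_self_attention(image, 0, 0)   # 4 neighbours inside the image
```

Like a convolution, the output at each pixel depends only on a local window, but the mixing weights are computed from the content instead of being fixed kernel values.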

EXPERIMENTS PERFORMED USING STAND-
ALONE SELF-ATTENTION MODEL
a. ImageNet Classification
• The researchers ran an experiment on the ImageNet classification task,
which contains 1.28 million training images and 50,000 test images.
• They replaced the spatial convolutional layers with self-attention layers
and used a position-aware attention stem.
• The attention models outperformed the baseline across all depths.

COMPARISON OF RESULTS FOR
IMAGENET EXPERIMENT

b. COCO Object Detection:
• The stand-alone self-attention model was evaluated on the COCO object
detection task using the RetinaNet architecture.
• The researchers used an attention-based backbone in RetinaNet.
• The fully self-attention model was competitive with the convolutional
baseline on both vision tasks (ImageNet classification and COCO object
detection).

COMPARISON OF RESULTS FOR COCO
OBJECT DETECTION EXPERIMENT

REVIEW ABOUT THE SURVEY PAPER
a) Overall quality: The authors provide sufficient information on
attention models and their applications in deep learning. Their summary
of the key papers on attention models is insightful.
b) Critique of the paper: An ablation study is missing from the survey
paper.
c) What can be done to improve (b): An ablation study can be added to
the survey paper.
d) Future directions and suggestions: In future work, the authors can
provide more information about applications of attention models in the
computer vision domain.

REFERENCES
1. https://arxiv.org/pdf/1904.02874.pdf
2. http://papers.nips.cc/paper/8302-stand-alone-self-attention-in-vision-models.pdf
