Visual Attention
Zahra Sadeghi
1
Overview
• Attention
• Visual saliency
• Bottom-up attention
• Koch-Ullman framework
• Visual attention in the brain
• Coarse to Fine theory
• Top-Down Facilitation
• Comparing Attentional Neural Network with human behavior
2
Attention
• Attention implements an information-processing bottleneck that
allows only a small part of the incoming sensory information to reach
short-term memory and visual awareness.
• A key challenge is to select which inputs are relevant and which should be ignored.
• This process of selecting a subset of the input, and ignoring the rest, is referred to as attention.
• Two broad forms are distinguished: bottom-up (stimulus-driven) and top-down (goal-oriented) attention.
3
Visual saliency
• At a pre-attentive stage, some parts of the scene may pop out.
• Visual saliency refers to the idea that certain parts of a scene are pre-attentively distinctive and create some form of immediate, significant visual arousal.
• How can a machine vision system extract the salient regions from an unknown background?
4
1. Low-level feature extraction
2. Saliency map creation
3. Winner-Take-All (WTA)
4. Inhibition of Return (IoR)
5. Top-down attentional bias
Flow diagram of a typical model for the control of attention (a minimal sketch of the WTA/IoR selection loop is given below)
5
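As a rough illustration of how the WTA and IoR stages of this pipeline might interact, here is a minimal sketch. It assumes a precomputed saliency map; the fixation count, inhibition radius, and decay factor are arbitrary placeholders, not values from any published model.

```python
import numpy as np

def select_attended_locations(saliency, n_fixations=5, ior_radius=20, ior_decay=0.4):
    """Iteratively pick the most salient location (winner-take-all) and
    suppress a neighborhood around it (inhibition of return)."""
    sal = saliency.astype(float).copy()
    h, w = sal.shape
    yy, xx = np.mgrid[0:h, 0:w]
    fixations = []
    for _ in range(n_fixations):
        # Winner-take-all: the single most salient pixel wins.
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        fixations.append((y, x))
        # Inhibition of return: attenuate saliency around the winner so
        # attention can move on to the next most salient region.
        mask = (yy - y) ** 2 + (xx - x) ** 2 <= ior_radius ** 2
        sal[mask] *= ior_decay
    return fixations

# Example: a toy saliency map with two bright blobs.
toy = np.zeros((100, 100))
toy[20:30, 20:30] = 1.0
toy[70:80, 60:70] = 0.8
print(select_attended_locations(toy, n_fixations=3))
```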
Feature extraction
• Broadly tuned color channels (negative responses are set to zero):
  R = r − (g + b)/2
  G = g − (r + b)/2
  B = b − (r + g)/2
  Y = (r + g)/2 − |r − g|/2 − b
• Intensity channel:
  I = (r + g + b)/3
• Orientation feature map: Gabor filters with four orientations (0°, 45°, 90°, 135°):
  g(x, y; λ, θ, ψ, σ, γ) = exp(−(x′² + γ²y′²)/(2σ²)) · cos(2πx′/λ + ψ)
  where x′ = x cos θ + y sin θ and y′ = −x sin θ + y cos θ
6
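A minimal sketch of this feature extraction step, assuming a float RGB image scaled to [0, 1] with channel order (r, g, b); OpenCV is used only for the Gabor kernel and filtering, and the kernel parameters are illustrative rather than taken from any particular implementation.

```python
import numpy as np
import cv2  # used only for the Gabor kernel and 2D filtering

def feature_channels(img):
    """Broadly tuned color channels, intensity, and four Gabor orientation maps.
    img: float RGB array of shape (H, W, 3) with values in [0, 1]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    I = (r + g + b) / 3.0
    # Color opponency channels; negative responses are clipped to zero.
    R = np.clip(r - (g + b) / 2.0, 0, None)
    G = np.clip(g - (r + b) / 2.0, 0, None)
    B = np.clip(b - (r + g) / 2.0, 0, None)
    Y = np.clip((r + g) / 2.0 - np.abs(r - g) / 2.0 - b, 0, None)
    # Orientation maps from Gabor filters at 0, 45, 90, 135 degrees.
    O = []
    for theta in (0, 45, 90, 135):
        kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0,
                                    theta=np.deg2rad(theta),
                                    lambd=10.0, gamma=0.5, psi=0)
        O.append(cv2.filter2D(I.astype(np.float32), -1, kernel))
    return dict(R=R, G=G, B=B, Y=Y, I=I, O=O)
```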
Example feature maps for an input image: original image, R, G, B, Y, I, and orientation maps O(0), O(45), O(90), O(135).
7
Example R, Y, and O(0) feature maps.
8
9
Saliency map construction
1- Cross-scale sum over all created feature channels:
   S_C = Σ_{s=1..3} [ S_R(s) + S_G(s) + S_B(s) + S_Y(s) ]
   S_I = Σ_{s=1..3} S_I(s)
   S_O = Σ_{s=1..3} [ S_{O1}(s) + S_{O2}(s) + S_{O3}(s) + S_{O4}(s) ]
2- Integrated saliency map:
   S = W_c · S_C + W_i · S_I + W_o · S_O
3- The saliency map is then smoothed with a Gaussian filter.
10
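A sketch of these three combination steps, assuming the per-scale feature maps produced in the previous step; the channel weights, number of scales, target resolution, and smoothing sigma are placeholders.

```python
import numpy as np
import cv2
from scipy.ndimage import gaussian_filter

def combine_saliency(channels_by_scale, w_c=1/3, w_i=1/3, w_o=1/3, target_shape=(64, 64)):
    """channels_by_scale: one dict per scale with keys 'R','G','B','Y','I','O'
    ('O' is a list of four orientation maps). Returns a smoothed saliency map."""
    h, w = target_shape
    resize = lambda m: cv2.resize(m.astype(np.float32), (w, h))

    # 1- Cross-scale sums for the color, intensity, and orientation channels.
    S_C = sum(resize(ch['R'] + ch['G'] + ch['B'] + ch['Y']) for ch in channels_by_scale)
    S_I = sum(resize(ch['I']) for ch in channels_by_scale)
    S_O = sum(resize(sum(ch['O'])) for ch in channels_by_scale)

    # 2- Weighted integration of the three conspicuity maps.
    S = w_c * S_C + w_i * S_I + w_o * S_O

    # 3- Gaussian smoothing of the integrated map.
    return gaussian_filter(S, sigma=2.0)
```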
Segmentation
• Threshold segmentation (the saliency map is converted into a binary
image using a threshold)
  bm(x) = 1 if sa(x) ≥ threshold, 0 if sa(x) < threshold, with threshold = E(sa)
• The binary mask is then refined with morphological dilation and erosion:
  dilation: A ⊕ B = { z | (B̂)_z ∩ A ≠ ∅ }
  erosion:  A ⊖ B = { z | (B)_z ⊆ A }
11
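A sketch of this segmentation step, using the mean of the saliency map as the threshold; the 3×3 structuring element and the erosion-then-dilation (opening) order are arbitrary illustrative choices.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def segment_salient_region(sal):
    """Binarize the saliency map at its mean value, then clean the mask
    with morphological erosion followed by dilation (an opening)."""
    threshold = sal.mean()          # threshold = E(sa)
    bm = sal >= threshold           # bm(x) = 1 if sa(x) >= threshold else 0
    structure = np.ones((3, 3), dtype=bool)
    bm = binary_erosion(bm, structure=structure)   # remove isolated pixels
    bm = binary_dilation(bm, structure=structure)  # restore region extent
    return bm.astype(np.uint8)
```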
How does the brain process attention?
• The ventral (’what’) stream processes visual shape and appearance and is largely responsible for object recognition.
• The dorsal (’where’) stream encodes spatial locations and processes motion information.
• Bottom-up information that can guide attention thus propagates from the visual cortex to the PFC.
• PFC areas can in turn provide top-down signals to control attention to some degree.
12
• Coarse, low spatial frequency (LSF) information is processed first.
• LSF information projects quickly from primary visual cortex to higher-level visual areas (PFC, OFC).
• Psychophysical studies and single-unit recordings in monkeys indicate that low spatial frequencies are extracted from scenes earlier than high spatial frequencies.
13
14
Developmental learning in DNNs: Fine-to-coarse development
• We trained a 3-layer deep belief network and applied an unsupervised learning scheme to the obtained deep representations.
• There is a progression with depth across the hidden layers of the DBN: lower layers represent finer distinctions and higher layers represent coarser distinctions.
Sadeghi, Zahra. "Deep learning and developmental learning: emergence of fine-to-coarse conceptual categories at layers of deep belief
network." Perception 45.9 (2016): 1036-1045.
Top-down processing contribution
• Input to the visual system is often noisy and ambiguous.
• A growing body of theoretical work and empirical evidence supports the idea that visual recognition is facilitated by top-down expectations.
• Context facilitates the recognition of related objects even if these objects are ambiguous when seen in isolation.
• An ambiguous object becomes recognizable if another object that shares the same context is placed in an appropriate spatial relation to it.
15
Effect of context in occluded object recognition
Trial sequence for the consistent and inconsistent cases: fixation cross (+), stimuli presented for 500 ms and 300 ms, followed by the prompt 'Type the name of the object and then press enter'.
16
Sadeghi, Zahra. "The effect of top-down attention in occluded object recognition." arXiv preprint arXiv:2007.10232 (2020).
Example stimulus displays with fixation cross (+): consistent setting, inconsistent setting, and easy case.
17
18
p-values for consistent vs. inconsistent comparisons:
• Hit const vs hit inconst: 0.0027
• Miss const vs miss inconst: 0.0027
• Sup hit const vs sup hit inconst: 0.0027
• Sup miss const vs sup miss inconst: 0.0027
• Hypo_pos1 vs hypo_neg1: 0.0027
• Hypo_pos2 vs hypo_neg2: 0.0027
• Resp-time const vs inconst: 4.6921e-11
Sadeghi, Zahra. "The effect of top-down attention in occluded object recognition." arXiv preprint arXiv:2007.10232 (2020).
Clickme.ai experiment
• Collects human feature-importance maps for objects
19
Global-And-Local-Attention (GALA)
• The Global-and-Local-Attention (GALA) network extends the squeeze-and-excitation (SE) network by adding a local saliency module.
• The attention mechanism is embedded in the cost function as a regularization term.
20
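A highly simplified sketch of the global-and-local idea: an SE-style global channel gate combined with a local per-location saliency branch, plus an illustrative loss with an attention regularization term. This is not the published GALA architecture; the layer sizes, the regularizer, and the weighting are made up for illustration.

```python
import torch
import torch.nn as nn

class GlobalLocalAttention(nn.Module):
    """Toy global-and-local attention: an SE-style global channel gate
    multiplied by a 1x1-conv local spatial saliency map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        # Global branch: squeeze (global average pool) and excite (channel gate).
        self.global_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        # Local branch: a per-location saliency map shared across channels.
        self.local_gate = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x):
        attention = self.global_gate(x) * self.local_gate(x)  # broadcasts to (N, C, H, W)
        return x * attention, attention

# Illustrative loss: the attention map enters the cost function as a regularizer,
# e.g. pulled toward a human importance map (clickme-style); lam is a made-up weight.
def attention_regularized_loss(logits, labels, attention, human_map, lam=0.1):
    ce = nn.functional.cross_entropy(logits, labels)
    reg = nn.functional.mse_loss(attention.mean(dim=1, keepdim=True), human_map)
    return ce + lam * reg
```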
• Three cases are considered:
1- Networks trained on color images and tested on color images.
2- Networks trained on grayscale images and tested on grayscale images.
3- Networks trained on color images and tested on grayscale images.
• The best performance in both the color and grayscale cases is achieved by GALA-click, with GALA-no-click and no-GALA-no-click obtaining the second- and third-best results, respectively.
• For all models, the highest accuracy is obtained when networks are trained on color images and tested on color images.
21
Sadeghi, Zahra. "An Investigation on Performance of Attention Deep Neural Networks in Rapid Object
Recognition." Intelligent Computing Systems: Third International Symposium, ISICS 2020, Sharjah, United Arab Emirates,
March 18–19, 2020, Proceedings 3. Springer International Publishing, 2020.
• To test the effect of the importance maps collected in the clickme.ai experiment, a rapid object recognition experiment was designed.
• The dataset contains 100 images from animal and non-animal categories.
• Phase-scrambled masks are applied to the images.
• Eleven versions of each image are created, ordered by increasing revelation of the important pixels.
22
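One way such stimuli could be generated, purely as an illustration of the idea (this is not the actual stimulus-generation code; the function names and the linear revelation schedule are assumptions).

```python
import numpy as np

def phase_scramble(image, rng=None):
    """Build a mask with the same amplitude spectrum as the image but random phase."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty_like(image, dtype=float)
    for c in range(image.shape[-1]):
        f = np.fft.fft2(image[..., c])
        random_phase = np.exp(1j * rng.uniform(0, 2 * np.pi, f.shape))
        out[..., c] = np.real(np.fft.ifft2(np.abs(f) * random_phase))
    return out

def revelation_series(image, importance, mask, n_levels=11):
    """Create n_levels versions of an image in which an increasing fraction of the
    most important pixels (per the importance map) is revealed; everything else
    is replaced by the mask (e.g. a phase-scrambled version of the image)."""
    order = np.argsort(importance.ravel())[::-1]       # most important pixels first
    versions = []
    for k in range(n_levels):
        frac = k / (n_levels - 1)                      # 0%, 10%, ..., 100% revealed
        reveal = np.zeros(order.size, dtype=bool)
        reveal[order[:int(frac * order.size)]] = True
        reveal = reveal.reshape(importance.shape)
        versions.append(np.where(reveal[..., None], image, mask))
    return versions
```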
Model and human performance
• The average accuracy of the two GALA models (GALA-click and GALA-no-click) and the ResNet-50 model (no-GALA-no-click) is compared on the behavioral test images at different levels of pixel revelation.
• The GALA-click and GALA-no-click models achieved similar accuracy.
• The GALA-click model produces superior results compared to all other models at full pixel revelation.
• The second-best performance at the full revelation level is achieved by the GALA-no-click model.
23
Sadeghi, Zahra. "An Investigation on Performance of Attention Deep Neural Networks in Rapid Object
Recognition." Intelligent Computing Systems: Third International Symposium, ISICS 2020, Sharjah, United Arab Emirates,
March 18–19, 2020, Proceedings 3. Springer International Publishing, 2020.
• Human visual attention is well studied.
• While different computational models exist, they lack the computational efficacy of the human visual system.
• Attention mechanisms in neural networks are still only loosely based on the visual attention mechanisms found in humans.
24
Thanks for your attention 🙂
25