Mechanisms of bottom-up and top-down processing in visual perception

This document discusses mechanisms of bottom-up and top-down visual processing. It outlines that rapid recognition in humans can occur through feedforward processing alone, extracting the gist of scenes at 7 images per second without eye movements or expectations. Beyond this, top-down feedback and attention are needed to solve the "clutter problem" in complex scenes. It also describes the hierarchical architecture of object recognition in the ventral visual stream, from primary visual cortex to anterior inferior temporal cortex, with increasing complexity and invariance properties.

Editor's Notes

#2: Thank you very much, Charles, for inviting me. I am delighted to be here and to enjoy weather that we could never hope for in the spring in Boston...
#3: Here is the problem I am trying to solve: you give me an image and I tell you, for instance, whether or not it contains an animal. Object recognition is a very hard computational problem. The reason is that, although all of these are images of a giraffe, they look quite different at the pixel level. Objects in the real world, and these animal images in particular, can vary drastically in appearance, shape, and texture. In particular, changes in position and scale can create very large changes in the pattern of activity that they elicit on the retina... Think about that: even a small shift in position of 2 deg of visual angle corresponds to shifting the image on the retina by more than 120 photoreceptors! This is an extremely difficult task, and today no artificial computer vision system can do it as robustly and accurately as the primate visual system. As primates, however, we are extremely good at solving this task despite all these variations...
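To make the pixel-level point in note #3 concrete, here is a minimal sketch. It assumes a toy 16 x 16 binary image containing a single vertical bar and uses cosine similarity as the pixel-wise comparison; the image size, the bar stimulus, and the similarity measure are illustrative choices, not taken from the talk.

```python
import math

def make_bar_image(size, bar_column):
    """Return a size x size binary image containing one vertical bar."""
    return [[1.0 if col == bar_column else 0.0 for col in range(size)]
            for _ in range(size)]

def cosine_similarity(image_a, image_b):
    """Cosine similarity between two images compared pixel by pixel."""
    flat_a = [value for row in image_a for value in row]
    flat_b = [value for row in image_b for value in row]
    dot = sum(a * b for a, b in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(a * a for a in flat_a))
    norm_b = math.sqrt(sum(b * b for b in flat_b))
    return dot / (norm_a * norm_b)

original = make_bar_image(size=16, bar_column=5)
shifted = make_bar_image(size=16, bar_column=7)  # same bar, moved 2 pixels

same_position = cosine_similarity(original, original)
after_shift = cosine_similarity(original, shifted)
print(same_position, after_shift)
```

In this toy case the object is unchanged, yet a two-pixel shift leaves no overlapping active pixels, so a purely pixel-based comparison treats the two images as unrelated.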
#4-#6: A classical paradigm that has been used extensively to study object recognition and visual perception is what I would call the rapid recognition paradigm. Here I am flashing images in rapid succession. This paradigm is called RSVP (rapid serial visual presentation) and was introduced by Molly Potter in the 1970s. Images are presented at a rate of 7 per second. At this speed you probably don't catch every detail in the image, but at the very least you can build a coarse description of the scene. For instance, most of you should be able to recognize and perhaps memorize objects in these images... These tasks do not necessarily reflect natural everyday vision, in which the visual world moves continuously and you are free to move your eyes and shift your attention. However, they isolate the first 100-150 ms of visual processing, during which a base representation of the image is formed before more complex visual routines can come into play...
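The timing in the RSVP notes can be checked with a little arithmetic. The sketch below assumes that images follow one another with no blank interval, which the notes do not state; the function names are hypothetical.

```python
def frame_duration_ms(images_per_second):
    """Time available per image if images follow each other with no gap."""
    return 1000.0 / images_per_second

def onset_times_ms(num_images, images_per_second):
    """Onset time of each image in the stream, starting at 0 ms."""
    step = frame_duration_ms(images_per_second)
    return [index * step for index in range(num_images)]

duration = frame_duration_ms(7)
onsets = onset_times_ms(4, 7)
print(round(duration, 1))          # 142.9
print([round(t) for t in onsets])  # [0, 143, 286, 429]
```

Under that assumption, each image is available for roughly 143 ms before the next one replaces it, which falls inside the 100-150 ms window mentioned in the notes.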
#7-#8: In this talk I will argue that this base representation corresponds to the activation of a hierarchy of image fragments following a single feedforward sweep through the visual system. This bottom-up feedforward sweep rapidly activates specific subpopulations of neurons in the ventral stream of the visual cortex that are tuned to image fragments with different levels of selectivity and invariance. I will show you that, consistent with human psychophysics, a key limitation of this architecture is that it is susceptible to clutter. While it does relatively well on images that contain a single object and limited clutter (like the ones I just showed you), we found that performance decreases significantly as the amount of clutter increases.
#9: In the second part of my talk I will argue that the way the visual system solves this clutter problem is via cortical feedback and shifts of attention. I will outline an integrated model of object recognition and attention. I will show that the object recognition performance of the model increases significantly when used in conjunction with attentional mechanisms. Using eye movements as a proxy for attention, I will show that the resulting model can account for a significant fraction of human eye movements during search tasks in complex natural images.
#19-#21: We have implemented a computational model (shown on the right) that implements this set of principles. The figure on the left is from Van Essen. We do not try to account for the whole visual cortex, only the ventral stream... The model is hierarchical, with only feedforward connections.
#22: Computational considerations suggest that you need two types of operations, and therefore two functional classes of cells, to explain those data. By analogy with Hubel and Wiesel's hierarchical model of processing in the visual cortex, we have called these two classes of cells simple and complex. The scheme that I am going to describe essentially extends their proposal from striate to extrastriate visual areas. We have assumed that these two types of functional units implement two types of computations or mathematical operations: Gaussian-like (bell-shaped) tuning and a max-like operation. The Gaussian tuning was motivated by a learning algorithm called radial basis functions, while the max operation was motivated by the standard scanning approach in computer vision and by theoretical arguments from signal processing. The goal of the simple units is to increase the complexity of the representation. In this example, they do so by pooling together the activity of afferent units with different orientations via this Gaussian-like tuning. Gaussian tuning is ubiquitous in the visual cortex, from orientation tuning in V1 to tuning for complex objects around certain poses in IT. The complex units pool together afferent units with the same preferred stimulus (e.g., a vertical bar) but slightly different positions and scales. At the complex unit level we thus build some tolerance to the exact position and scale of the stimulus within the receptive field of the unit.
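The two operations described in note #22 can be sketched in a few lines. This is a toy illustration under stated assumptions, not the model's actual implementation: the 1-D input, the three-sample bar template, and the sigma value are all hypothetical.

```python
import math

def simple_unit(inputs, preferred, sigma=0.5):
    """Gaussian-like tuning: the response is 1.0 when inputs equal the
    preferred pattern and falls off with squared distance from it."""
    squared_distance = sum((x - w) ** 2 for x, w in zip(inputs, preferred))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def complex_unit(afferent_responses):
    """Max-like pooling over afferents that share a preferred stimulus."""
    return max(afferent_responses)

def simple_layer(signal, template):
    """One simple unit per position, all tuned to the same template."""
    width = len(template)
    return [simple_unit(signal[i:i + width], template)
            for i in range(len(signal) - width + 1)]

# Hypothetical 1-D "retina"; the bar is the 3-sample pattern [0, 1, 0].
bar_template = [0.0, 1.0, 0.0]
signal_left = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]   # bar near the left edge
signal_right = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # same bar, shifted right

responses_left = simple_layer(signal_left, bar_template)
responses_right = simple_layer(signal_right, bar_template)

pooled_left = complex_unit(responses_left)
pooled_right = complex_unit(responses_right)
print(responses_left, responses_right)
print(pooled_left, pooled_right)
```

The simple-unit responses differ between the two inputs because each unit is tied to one position, while the pooled complex-unit response is the same for both, which is the position tolerance the note describes.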
#23: We have implemented a computational model (shown on the right) that implements this set of principles. The figure on the left is from Van Essen. We do not try to account for the whole visual cortex, only the ventral stream... The model is hierarchical, with only feedforward connections.
#24: EMPHASIZE AFTER TRAINING: NO DATA FITTING. MENTION CHARLES. The model builds a hierarchy of simple and complex cells. It mimics as closely as possible the tuning properties of neurons in various areas of the ventral stream. It builds on earlier work in the lab by Max Riesenhuber.
#25-#31: -- I would argue that a key aspect of this model is the learning of a large dictionary of reusable features (I would call them shape components) from V1 to IT. These features form a basic vocabulary of shape components that can be used to represent any visual input. They correspond to image patches that appear with high probability in the natural world. We argue that this dictionary is learned UNSUPERVISED during a developmental period. -- In this model, the goal of the ventral stream of the visual cortex from V1 to IT is to build a good representation for images, i.e., a representation that is compact and invariant with respect to 2D transformations such as translation and scale. -- With a good image representation, learning a new image category is relatively easy. We speculate that this can be done from a handful of labeled examples by training task-specific circuits running from IT to the PFC. We showed that this worked well for multiple object categories on standard computer vision databases.
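The two learning stages in these notes (an unsupervised dictionary of features, then category learning from a handful of labeled examples) can be sketched as follows. Everything specific here is an assumption made for illustration: the 1-D signals, collecting every distinct non-empty patch as the dictionary, and the nearest-class-mean readout standing in for the task-specific circuits.

```python
import math

def gaussian_match(window, prototype, sigma=0.5):
    """Gaussian-like similarity between an input window and a prototype."""
    d2 = sum((x - p) ** 2 for x, p in zip(window, prototype))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def extract_patches(signal, width):
    return [tuple(signal[i:i + width]) for i in range(len(signal) - width + 1)]

def learn_dictionary(unlabeled_signals, width):
    """Collect distinct non-empty patches; no labels are used."""
    dictionary = []
    for signal in unlabeled_signals:
        for patch in extract_patches(signal, width):
            if any(patch) and patch not in dictionary:
                dictionary.append(patch)
    return dictionary

def encode(signal, dictionary):
    """Best match of each dictionary entry anywhere in the signal."""
    patches = extract_patches(signal, len(dictionary[0]))
    return [max(gaussian_match(p, proto) for p in patches) for proto in dictionary]

def class_means(labeled_examples, dictionary):
    """Average code per label, computed from a few labeled examples."""
    grouped = {}
    for signal, label in labeled_examples:
        grouped.setdefault(label, []).append(encode(signal, dictionary))
    return {label: [sum(column) / len(codes) for column in zip(*codes)]
            for label, codes in grouped.items()}

def classify(signal, means, dictionary):
    """Assign the label whose mean code is closest to the signal's code."""
    code = encode(signal, dictionary)
    return min(means, key=lambda label: sum(
        (c - m) ** 2 for c, m in zip(code, means[label])))

# Stage 1: unsupervised dictionary from unlabeled signals.
unlabeled = [[0, 1, 1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 1, 0, 0]]
dictionary = learn_dictionary(unlabeled, width=3)

# Stage 2: one labeled example per category, at a single position.
labeled = [([0, 0, 0, 0, 1, 1, 0, 0], "wide"), ([1, 0, 1, 0, 0, 0, 0, 0], "pair")]
means = class_means(labeled, dictionary)

# Test on the same shapes at positions not seen with labels.
prediction_wide = classify([0, 0, 1, 1, 0, 0, 0, 0], means, dictionary)
prediction_pair = classify([0, 0, 0, 0, 0, 1, 0, 1], means, dictionary)
print(prediction_wide, prediction_pair)
```

Because each code entry takes the best match over all positions, a single labeled example per category is enough in this toy setting to classify the same shapes at new positions.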
#32: For the sake of time, I am only going to show you that you can simulate a neurophysiology experiment with this model. You can record from a random population of neurons and perform exactly the same analysis as in a real experiment. For the bar plot shown here, we performed exactly the same readout experiment as in the study by Hung et al. What is shown here is the classification performance when training at a specific position and scale and evaluating how well the classifier generalizes to positions and scales not presented during training. This measures the built-in invariance inherited from the response properties of the population of neurons, and you can see that the fit is quite good.
#33: In parallel, we have used this model in real-world computer vision applications. For instance, we have developed a computer vision system for the automatic parsing of street scene images. Here are examples of automatic parsing by the system, overlaid on the original images. The colors and bounding boxes indicate predictions from the model (e.g., green for trees, etc.). The computer vision system shown here is based exclusively on the response properties
#34: More recently, we have extended the approach to the recognition of human actions such as running, walking, jogging, jumping, waving, etc... In all cases, we have shown that the resulting biologically motivated computer vision systems performed on par with or better than state-of-the-art computer vision systems.
#36-#37: The goal of the model was not to explain natural everyday vision, when you are free to move your eyes and shift your attention, but rather what is often called rapid recognition or immediate recognition. This corresponds to the first 100-150 ms of visual processing (when an image is briefly presented), i.e., when the visual system is forced to operate in a feedforward mode, before eye movements and shifts of attention. Here is an example on the left. I flash an image for a couple of ms; you probably don't have time to get every fine detail of the image, but most people are able to say whether it contains an animal or not. Here we had divided our dataset into 4 subcategories: head... Overall, both the model and humans score about 80% on this very difficult task, and you can see that they agree quite well in terms of how they perform on these 4 subcategories...
#44-#48: We have seen that, in the model and in the visual cortex, when two stimuli fall within the receptive field of a neuron, the two stimuli "compete"; that is, they reduce the selectivity of the neuron. I just showed you that, at the psychophysical level, the amount of clutter in an image largely determines the performance of the model and of human observers during rapid categorization tasks.
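The competition effect in these notes can be illustrated with a single Gaussian-tuned unit. This is a minimal sketch, assuming that a second stimulus simply adds activity to the unit's afferents; the four-afferent receptive field, the stimuli, and the selectivity measure (preferred minus non-preferred response) are hypothetical.

```python
import math

def tuned_response(inputs, preferred, sigma=0.5):
    """Gaussian-like tuning around the unit's preferred afferent pattern."""
    d2 = sum((x - w) ** 2 for x, w in zip(inputs, preferred))
    return math.exp(-d2 / (2.0 * sigma ** 2))

# Afferent activity evoked by each stimulus inside the receptive field.
preferred_stimulus = [0.0, 1.0, 0.0, 0.0]
non_preferred_stimulus = [0.0, 0.0, 0.0, 1.0]

def selectivity(clutter):
    """Response difference between preferred and non-preferred stimuli
    when the same clutter activity is added to both."""
    with_preferred = [s + c for s, c in zip(preferred_stimulus, clutter)]
    with_other = [s + c for s, c in zip(non_preferred_stimulus, clutter)]
    return (tuned_response(with_preferred, preferred_stimulus)
            - tuned_response(with_other, preferred_stimulus))

no_clutter = [0.0, 0.0, 0.0, 0.0]
one_distractor = [0.0, 0.0, 1.0, 0.0]  # second stimulus in the same field

isolated = selectivity(no_clutter)
cluttered = selectivity(one_distractor)
print(isolated, cluttered)
```

With the distractor present, the input pattern moves away from the unit's preferred pattern, so the gap between its responses to the preferred and non-preferred stimuli shrinks.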
#50: Using eye movements as a correlate of attention. The assumption is that attention gets to an item just before the eyes move, so if the eyes move to an item we can assume that attention was there just before.
#51: Here is the original model. We added back-projections to account for these attentional modulations. We assume that feature-based attention acts through a cascade of top-down connections through the ventral stream, originating in the PFC, where a template of the target object is held in memory, all the way down to V4 and possibly lower areas. We also assume a spatial attention modulation originating from the parietal cortex (here I am assuming LIP, based on limited experimental evidence). These attentional mechanisms can be cast in a probabilistic Bayesian framework whereby the parietal cortex represents location variables and the ventral stream represents feature variables. These are our image fragments. Variables for the target object are encoded in higher areas such as PFC... This framework is inspired by an earlier model by Rao to explain spatial attention and is a special case of the computational model of the visual cortex described by David Mumford, which most of you probably know...
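A toy version of this Bayesian picture can be written down by brute force. The sketch below is not the actual model: it assumes a single binary feature F, two objects O, four locations L, and a feature-map variable X that is either one of the locations or "absent". All probability values, and the names `p_feature_given_object`, `p_map_given_feature_location` and `posterior`, are hypothetical.

```python
from itertools import product

N_LOC = 4
ABSENT = N_LOC  # extra feature-map state: "feature not in the image"


def p_feature_given_object(f, o):
    """P(F=f | O=o): object 1 (the target) usually contains the feature."""
    p_present = 0.9 if o == 1 else 0.2
    return p_present if f == 1 else 1.0 - p_present


def p_map_given_feature_location(x, f, l):
    """P(X=x | F=f, L=l): a present feature sits at the object location."""
    if f == 1:
        return 1.0 if x == l else 0.0
    return 1.0 if x == ABSENT else 0.0


def posterior(p_object, p_location, evidence):
    """Return (P(O | I), P(L | I)) by enumerating the full joint distribution."""
    post_o = [0.0, 0.0]
    post_l = [0.0] * N_LOC
    for o, l, f, x in product(range(2), range(N_LOC), range(2), range(N_LOC + 1)):
        joint = (p_object[o] * p_location[l]
                 * p_feature_given_object(f, o)
                 * p_map_given_feature_location(x, f, l)
                 * evidence[x])  # evidence[x] plays the role of P(I | X=x)
        post_o[o] += joint
        post_l[l] += joint
    z = sum(post_o)
    return [v / z for v in post_o], [v / z for v in post_l]


# Hypothetical bottom-up evidence: strongest at location 1, last entry is "absent".
evidence = [0.1, 0.7, 0.3, 0.1, 0.2]
uniform_l = [1.0 / N_LOC] * N_LOC

neutral_o, neutral_l = posterior([0.5, 0.5], uniform_l, evidence)
# Feature-based attention: a strong prior on the target object (template in memory).
_, feature_l = posterior([0.05, 0.95], uniform_l, evidence)
# Spatial attention: a prior concentrated on location 1.
spatial_o, _ = posterior([0.5, 0.5], [0.1, 0.7, 0.1, 0.1], evidence)
print(neutral_l, feature_l, neutral_o, spatial_o)
```

With these made-up numbers, raising the prior on the target object sharpens the location posterior around the location that carries the feature, and concentrating the location prior there raises the posterior for the target object. Enumeration is only practical because the toy model is tiny.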
#55: Here the way we implemented that is via belief propagation in polytrees (the messages are shown for the simplified case of a single feature, for clarity). Within this framework, spatial attention can be described as a series of messages from L to Fil to Fi to O, while feature-based attention goes the opposite way. Thus the model makes specific predictions about the timing of activity in visual areas of the ventral stream and in the parietal cortex, depending on the task at hand. Obviously I am leaving a lot of details open, unfortunately...
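The two message orders can be sketched on a toy polytree O -> F -> X <- L, with image evidence attached to X. Here I read X as the feature-map variable (Fil on the slide) and F as Fi; that reading, the probability tables, and the names `spatial_pass` and `feature_pass` are assumptions for illustration, not the actual implementation.

```python
N_LOC = 4
ABSENT = N_LOC  # extra feature-map state meaning "feature not present"

# Hypothetical conditional probability table, rows indexed by O, columns by F.
P_F_GIVEN_O = [[0.8, 0.2],   # P(F=0|O=0), P(F=1|O=0)
               [0.1, 0.9]]   # P(F=0|O=1), P(F=1|O=1)


def p_x_given_f_l(x, f, l):
    """P(X=x | F=f, L=l): a present feature appears at the object location."""
    if f == 1:
        return 1.0 if x == l else 0.0
    return 1.0 if x == ABSENT else 0.0


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def spatial_pass(p_object, p_location, evidence):
    """Messages L -> X -> F -> O; returns the belief over objects."""
    # X -> F: sum out X and L, weighting each location by the message from L.
    msg_x_to_f = [
        sum(evidence[x] * p_x_given_f_l(x, f, l) * p_location[l]
            for x in range(N_LOC + 1) for l in range(N_LOC))
        for f in range(2)
    ]
    # F -> O: sum out F.
    msg_f_to_o = [sum(P_F_GIVEN_O[o][f] * msg_x_to_f[f] for f in range(2))
                  for o in range(2)]
    return normalize([p_object[o] * msg_f_to_o[o] for o in range(2)])


def feature_pass(p_object, p_location, evidence):
    """Messages O -> F -> X -> L; returns the belief over locations."""
    # O -> F: sum out O using the prior over objects.
    msg_o_to_f = [sum(P_F_GIVEN_O[o][f] * p_object[o] for o in range(2))
                  for f in range(2)]
    # X -> L: sum out X and F, weighting each feature state by the message from O.
    msg_x_to_l = [
        sum(evidence[x] * p_x_given_f_l(x, f, l) * msg_o_to_f[f]
            for x in range(N_LOC + 1) for f in range(2))
        for l in range(N_LOC)
    ]
    return normalize([p_location[l] * msg_x_to_l[l] for l in range(N_LOC)])


evidence = [0.1, 0.7, 0.3, 0.1, 0.2]  # hypothetical P(I | X=x); last entry is "absent"
uniform_l = [1.0 / N_LOC] * N_LOC

belief_o_neutral = spatial_pass([0.5, 0.5], uniform_l, evidence)
belief_o_attended = spatial_pass([0.5, 0.5], [0.1, 0.7, 0.1, 0.1], evidence)
belief_l_neutral = feature_pass([0.5, 0.5], uniform_l, evidence)
belief_l_target = feature_pass([0.05, 0.95], uniform_l, evidence)
print(belief_o_neutral, belief_o_attended, belief_l_neutral, belief_l_target)
```

Because the graph is a polytree, each belief needs only one sweep of local sums in the relevant direction, which is what makes the order of the messages (and hence the predicted timing) depend on the task.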
#57: We have implemented the approach in the context of our animal search: the model mostly improves in the medium and far conditions.
#78: Unlike artificial search arrays, where arbitrary objects are simply placed at random on a display, natural scenes are highly structured. This is a point that has been made by Antonio Torralba and Aude Oliva: global features can provide a good representation of the gist of the scene, which is sufficient to associate contextual information from the visual scene with actual object locations. Here, for instance, you would expect people to be mostly in these darker regions...
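One simple way to picture how such a contextual prior could guide search is to weight a local saliency map by a prior over image rows and renormalize. This is only a sketch under my own assumptions: the 3x3 grid, the hand-set `row_prior` (standing in for a prior that would be predicted from global gist features), and the function name `contextual_guidance` are all hypothetical.

```python
def normalize(grid):
    """Scale a 2D grid of non-negative values so that it sums to 1."""
    total = sum(sum(row) for row in grid)
    return [[v / total for v in row] for row in grid]


def contextual_guidance(saliency, row_prior):
    """Weight each location's saliency by a context prior on its image row."""
    weighted = [[s * row_prior[r] for s in row] for r, row in enumerate(saliency)]
    return normalize(weighted)


def argmax_2d(grid):
    """Return the (row, column) index of the largest entry."""
    cells = ((r, c) for r in range(len(grid)) for c in range(len(grid[0])))
    return max(cells, key=lambda rc: grid[rc[0]][rc[1]])


# Hypothetical bottom-up saliency for a street scene (top row = sky).
saliency = [
    [0.9, 0.2, 0.1],  # sky: one very salient patch
    [0.3, 0.6, 0.3],  # street level
    [0.1, 0.2, 0.1],  # ground
]
# Hypothetical prior over rows for "where people tend to be" in this kind of scene.
row_prior = [0.1, 0.7, 0.2]

attention = contextual_guidance(saliency, row_prior)
saliency_peak = argmax_2d(saliency)
guided_peak = argmax_2d(attention)
print(saliency_peak, guided_peak)
```

In this toy example saliency alone would send the first fixation to the bright patch in the sky, while the context-weighted map moves the peak to street level, where people are expected.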