What is Neuromorphic Event-based Computer Vision?
Sensors, Theory and Applications
Omnidirectional Vision
Origins of Eyes
Landscape of possible eye forms (M. Land)
The eye has evolved independently between 40 and 60 times around the animal kingdom. “It seems that life, at least as we know it on this planet, is almost indecently eager to evolve eyes. … There are only so many ways to make an eye, and life as we know it may well have found them all.”
• Invention of the camera obscura in 1544 (L. Da Vinci)
• The mother of all cameras
• A more realistic and faster depiction of reality
Origins of Imaging
• Increasing painters' profits: painting faster
• Evolution from portable models for travellers to current digital cameras
• Evolving from canvas, to paper, to glass, to celluloid, to pixels
Motion Picture: origins of video
• Early work in motion-picture projection
• Known for his pioneering work on animal locomotion in 1877 and 1878, which used multiple cameras to capture motion in stop-motion photographs
Eadweard Muybridge (1830-1904)
Over-sampling
Under-sampling
• Motion blur
• Displacement between frames
• When dealing with change or motion, the universally accepted paradigm of visual acquisition becomes FUNDAMENTALLY FLAWED!
• A FIXED frame rate for all pixels is always WRONG
• NO relation between frame rate and scene dynamics
• Over-sampling AND under-sampling of the scene!
Why are images bad?
Beyond the Shannon-Nyquist Acquisition principle
…if a higher number of simultaneous signals are considered, as is the case in images. In order to overcome these limitations and provide an accurate temporal sampling of f(t), it is more efficient to detect variations of f(t) just at the exact time at which they occur (Fig. 3(b)), namely sampling on the other axis.
Fig. 3. Two ways to sample a function's values: in (a) using a classic constant scale on the t axis, in (b) using a constant scale but on the values of f(t).
This process is data oriented and discards redundancies at the lowest level. Changes are detected precisely when they occur, overcoming all limitations of constant time sampling on the t axis. This codification provides a compact representation of light changes; the time-oriented process is also compatible with the observation that temporal changes in scenes never occur spatially all at the same time: it is very rare that the whole content of an image changes completely between two consecutive frames. If the changes of f(t) are quantized according to a predefined quantity ∆f, it becomes possible to define a function Ev providing temporal events corresponding to the exact times at which these changes occur (see the sketch below).
Fig. 4 (partially cropped). Codification of a pixel gray level fx,y(t) into contrast events Ev(t): starting from fx,y(t0), each amplitude change of ∆f produces a +1 or -1 event, and fx,y can be approximated from ∆f and fx,y(t0).
Claude Shannon (1916-2001)
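To make the amplitude-sampling idea above concrete, here is a minimal sketch (not the authors' implementation) of how a per-pixel signal f(t) can be turned into ±1 contrast events whenever it drifts by more than an assumed threshold ∆f from the value at which the last event was emitted:

```python
import numpy as np

def level_crossing_events(samples, times, delta_f):
    """Emit (t, polarity) events whenever the signal moves by more than
    delta_f from the value remembered at the last event."""
    events = []
    reference = samples[0]               # f(t0), the last "remembered" value
    for t, f in zip(times, samples):
        change = f - reference
        while abs(change) >= delta_f:    # may cross several levels at once
            polarity = 1 if change > 0 else -1
            events.append((t, polarity))
            reference += polarity * delta_f   # move the reference by one level
            change = f - reference
    return events

# Toy usage: a pixel brightness ramping up then down.
t = np.linspace(0.0, 1.0, 1000)
f = np.concatenate([np.linspace(0.0, 1.0, 500), np.linspace(1.0, 0.2, 500)])
print(level_crossing_events(f, t, delta_f=0.25)[:6])
```

Nothing is emitted while the signal stays within ±∆f of the reference, which is exactly why a static scene produces no data.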
Neural acquisition
• Amplitude sampling
• Information is sent when it happens
• When nothing happens, nothing is sent or processed
• Sparse information coding
• Time is the most valuable information
Castle Metaphor: digital acquisition
• Ensuring the safety of the castle
• Each sentinel has to wait for a drum beat to send information
A chip based on the neural architecture
of the eye proves a new, more powerful
way of doing computations
The Silicon Retina
Misha Mahowald (1963-1996)
Carver Mead
Dynamic Vision Sensor
• Each event represents a quantized change in log intensity (brightness, or relative intensity change, or “temporal contrast”)
• Artificial retinas do not provide images; data driven, energy efficient
From P. Lichtsteiner, C. Posch, and T. Delbruck, “A 128×128 120 dB 15 µs Latency Asynchronous Temporal Contrast Vision Sensor” (figure text cropped in the original): Fig. 4 shows the die photo of the 0.35 µm 4M2P process chip and the quad-mirror-symmetric pixel layout (photodiode plus analog and digital parts, most of the remaining area occupied by capacitance); Fig. 5 shows the TMPDIFF128 camera system with USB 2.0 interface, in which the vision sensor sends address-events to the USB interface, time-stamps are captured from a free-running 100 kHz counter sharing the same 16-bit bus, events are buffered in USB FIFOs, and the host PC “unwraps” the 16-bit time-stamps into 32-bit values for further processing; flash memory on the USB chip stores persistent bias values.
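The Fig. 5 caption mentions “unwrapping” the 16-bit hardware time-stamps into 32-bit values on the host. A minimal sketch of that idea (not the actual TMPDIFF128 host code), assuming time-stamps arrive in order and the counter wraps at most once between consecutive events:

```python
def unwrap_timestamps(ts16):
    """Convert a stream of 16-bit wrapping counter values into
    monotonically increasing integer time-stamps."""
    unwrapped = []
    wraps = 0
    prev = None
    for t in ts16:
        if prev is not None and t < prev:   # counter wrapped around 2**16
            wraps += 1
        prev = t
        unwrapped.append(wraps * (1 << 16) + t)
    return unwrapped

# At a 100 kHz tick the 16-bit counter wraps roughly every 0.65 s.
print(unwrap_timestamps([65530, 65534, 3, 10, 500]))
```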
DAVIS (Dynamic and Active-Pixel Vision Sensor)
• Hybrid solution: frames + events
• Low temporal resolution (for the frame output)
• Needs high bandwidth
• Grey levels (absolute measurements): low dynamic range (~50 dB)
C. Brandli, M. Yang, S.-C. Liu, V. Villeneuva, and T. Delbruck
ATIS (Asynchronous Time-based Image Sensor)
• Each event triggers an intensity measurement encoded in time
• 143 dB dynamic range, frame-free PWM image sensor with lossless pixel-level video compression and time-domain CDS
C. Posch, D. Matolin, and R. Wohlgenannt
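Since each ATIS change event triggers a time-encoded exposure measurement, absolute intensity can be recovered from the interval between the two exposure-measurement events of a pixel. A minimal sketch of that decoding, assuming intensity is proportional to the inverse of the integration time (the constant k is a made-up calibration value):

```python
def decode_atis_intensity(t_start, t_end, k=1.0):
    """Return a relative intensity estimate from the two time-stamps of a
    pixel's exposure measurement: brighter pixels reach the second
    integration threshold faster, so intensity ~ k / (t_end - t_start)."""
    dt = t_end - t_start
    if dt <= 0:
        raise ValueError("exposure measurement must end after it starts")
    return k / dt

# A pixel that integrates in 2 ms reads twice as bright as one taking 4 ms.
print(decode_atis_intensity(0.000, 0.002), decode_atis_intensity(0.000, 0.004))
```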
Why Event Based sensors?
Sample Data
• Data driven: only moving edges produce data
• Temporal edges, precisely timed
Data Space of Events
Asynchronous Time-based Image Sensor
CCAM sensors provide frame-free visual information
The CCAM generates fewer events than an equivalent 1000 fps frame-based camera;
the number of events depends on the
dynamics of the scene. For standard
cameras this amount is constant.
Low data bandwidth
• Comparing the amount of data generated by a conventional camera running at 30 Hz vs. an event-based camera (1 MHz time-stamp resolution)
• ~1M times faster for a lower data load (0.1-20% of the frame-based data); a rough calculation follows below
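To put rough numbers on this comparison, here is a back-of-the-envelope sketch; the resolution, bytes per event, and event rate are illustrative assumptions, not figures from the presentation.

```python
# Frame-based camera: every pixel is read out at a fixed rate.
width, height, bytes_per_pixel, fps = 304, 240, 1, 30
frame_bytes_per_s = width * height * bytes_per_pixel * fps       # ~2.2 MB/s

# Event-based camera: only changing pixels emit events, each carrying
# (x, y, timestamp, polarity); assume 8 bytes per event and a moderately
# active scene producing 50k events per second.
events_per_s, bytes_per_event = 50_000, 8
event_bytes_per_s = events_per_s * bytes_per_event                # 0.4 MB/s

print(f"frame camera : {frame_bytes_per_s / 1e6:.2f} MB/s")
print(f"event camera : {event_bytes_per_s / 1e6:.2f} MB/s "
      f"({100 * event_bytes_per_s / frame_bytes_per_s:.0f}% of the frame data)")
```

With these assumptions the event stream carries under 20% of the frame data while time-stamping changes with microsecond resolution; a static scene would produce almost no events at all.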
Neuromorphic engineering
Physiology - Models - Hardware - Prosthetics - Robotics - Computation
• Makes machine vision a science!
• Develops a new bidirectional methodology to understand the brain
• Merging computational and biological vision
• Matching binocular events using only their time of arrival
• Two events arriving at the same time and fulfilling the geometric constraints are matched (see the sketch below)
Applications: Stereovision
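A minimal sketch of this temporal matching rule, assuming rectified cameras (so the epipolar constraint reduces to events lying on nearly the same row) and an assumed coincidence window of a few tens of microseconds:

```python
def match_stereo_events(left_events, right_events, dt=50e-6, max_row_diff=1):
    """Greedy temporal matching of (x, y, t, polarity) events between two
    rectified event cameras: two events are matched if they have the same
    polarity, (nearly) the same row, and time-stamps within dt seconds."""
    matches, used = [], set()
    for le in left_events:
        best, best_dt = None, dt
        for j, re in enumerate(right_events):
            if j in used or le[3] != re[3]:
                continue
            if abs(le[1] - re[1]) <= max_row_diff and abs(le[2] - re[2]) <= best_dt:
                best, best_dt = j, abs(le[2] - re[2])
        if best is not None:
            used.add(best)
            matches.append((le, right_events[best]))
    return matches

# Toy usage: one coincident pair, one unmatched left event.
L = [(10, 5, 0.000010, 1), (40, 7, 0.000500, -1)]
R = [( 8, 5, 0.000015, 1)]
print(match_stereo_events(L, R))
```

Because event times are resolved to microseconds, temporal coincidence alone already removes most candidate matches before any geometry is applied.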
Asynchronous Time-based Image Sensor (ATIS)
Applications: Stereovision
Time-based Luminance Matching – EI
Applications: Stereovision
"What is Neuromorphic Event-based Computer Vision? Sensors, Theory and Applications," a Presentation from Ryad B. Benosman
"What is Neuromorphic Event-based Computer Vision? Sensors, Theory and Applications," a Presentation from Ryad B. Benosman
Reconstruction Examples
Motion estimation: optical flow
Optical flow
• The high temporal resolution allows events to form a smooth space-time surface
• The slope of the local surface gives the orientation and amplitude of the optical flow (see the sketch below)
Motion estimation: optical flow
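A minimal sketch of this idea (a simplified plane fit to the local space-time surface of events, not the exact algorithm from the talk): fit t ≈ a·x + b·y + c to the events in a small neighborhood; the gradient (a, b) gives the flow direction and its inverse norm the speed.

```python
import numpy as np

def local_flow(events):
    """Estimate optical flow from events (x, y, t) lying on a locally planar
    space-time surface. Fits t = a*x + b*y + c by least squares; the velocity
    points along the gradient (a, b) with speed 1/|(a, b)| pixels per second."""
    xs, ys, ts = (np.asarray(c, dtype=float) for c in zip(*events))
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    (a, b, _), *_ = np.linalg.lstsq(A, ts, rcond=None)
    g2 = a * a + b * b
    if g2 < 1e-12:
        return (0.0, 0.0)            # flat surface in time: no measurable motion
    return (a / g2, b / g2)          # (vx, vy) in pixels per second

# Toy usage: an edge moving at 100 px/s along x produces events with t = x/100.
events = [(x, y, x / 100.0) for x in range(5) for y in range(3)]
print(local_flow(events))            # ~ (100.0, 0.0)
```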
Tracking, Iterative Closest Point
Tracking real-time outdoor scenes
…possible wrong assignments and more robustness to noise. It is interesting to note that errors are more equally shared between the van and the car thanks to the reliable event attribution.
Figure 16: The events generated by the van and the car during 3 s are fitted into planes, denoted as P1 in blue and P2 in red. n1 and n2 are the surface normals.
The exclusive use of spatial data, ignoring time, logically implies less available information. The dynamic content of the visual information is then lost. On the other hand, …
Outdoor vehicle tracking
Event-based tracking with a moving camera
Multi-kernel High Speed Visual Features Tracking
Event-based Face Tracker
Model Based Pose Estimation
• Estimation of the pose of a camera given a known shape
Model Based Pose Estimation
• Estimation of the pose of a calibrated camera given a set of n 3D points in the
world and their corresponding 2D projections in the image
Perspective-n-Point
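For illustration, a minimal Perspective-n-Point sketch using OpenCV's solvePnP; the 3D model points, their 2D detections (e.g. obtained from an event-based feature tracker), and the camera intrinsics are all made-up example values.

```python
import numpy as np
import cv2

# Hypothetical known shape: four corners of a 10 cm square, in meters.
object_points = np.array([[0, 0, 0], [0.1, 0, 0],
                          [0.1, 0.1, 0], [0, 0.1, 0]], dtype=np.float64)
# Hypothetical 2D detections of those corners, in pixels.
image_points = np.array([[320, 240], [420, 242],
                         [418, 342], [318, 340]], dtype=np.float64)
# Assumed pinhole intrinsics (fx = fy = 500 px, principal point at image center).
K = np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1]], dtype=np.float64)
dist = np.zeros(5)                      # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
if ok:
    R, _ = cv2.Rodrigues(rvec)          # rotation matrix of the camera pose
    print("R =\n", R, "\nt =", tvec.ravel())
```

The event-based variants discussed in the talk differ in how the 2D observations are obtained and updated (per event, rather than per frame), but the geometric problem being solved is this one.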
Dense feature-less, Event Based Visual Odometry
Point Based Event Driven SLAM
Micro-particle tracking
Extra-low-bandwidth, low-power streaming
Do not generate Images from Events
• Not event-based
• A useless approach: a GPU is used to generate 100-200 Hz, poor-quality images
• “Fake” SLAM, using binary images…
What is NOT event based Machine Vision
Figure panels: (a) event-driven time-based vision sensor (ATIS or DVS), (b) events from the sensor, (c) spatio-temporal domain, (d) time context, (e) exponential kernels, (f) time surface; ON and OFF event context and surface amplitudes are plotted over the X and Y (spatial) axes.
Dynamic Machine Learning
Event-based Time-Surfaces
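A minimal sketch of how a time surface can be computed (a simplified version of the exponential-kernel time context used in HOTS; the neighborhood radius and decay constant tau are assumed values, and border handling is omitted for brevity):

```python
import numpy as np

def time_surface(last_ts, x, y, t, radius=2, tau=0.05):
    """Build the time surface around event (x, y, t): for each neighboring
    pixel, apply an exponentially decaying kernel to the time elapsed since
    its last event. `last_ts` is an HxW array holding the most recent event
    time per pixel (for one polarity), -inf where no event has been seen."""
    patch = last_ts[y - radius:y + radius + 1, x - radius:x + radius + 1]
    return np.exp(-(t - patch) / tau)     # values in (0, 1], 0 for silent pixels

# Toy usage on an 8x8 sensor with a single polarity.
H, W = 8, 8
last_ts = np.full((H, W), -np.inf)
for (ex, ey, et) in [(3, 3, 0.00), (4, 3, 0.01), (4, 4, 0.02)]:
    last_ts[ey, ex] = et
print(np.round(time_surface(last_ts, x=4, y=4, t=0.02), 3))
```

Recently active neighbors contribute values close to 1 and old or silent pixels contribute close to 0, so the surface captures both the local spatial pattern and its recent temporal history.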
Architecture overview: stimulus → ATIS sensor → Layer 1 → Layer 2 → Layer 3 → classifier, each layer mapping pixels to progressively richer features (panels (a)-(j) of the original figure).
HOTS: A Hierarchy Of event-based Time-Surfaces
(a) Layer 1 feature activation
(b) Layer 2 feature activation
(c) Layer 3 feature activation
(d) Examples of layer outputs
Layer 1, Layer 2, Layer 3: events → feature activations → layer output
Playing cards experiment
Four histograms showing how many times each last-layer feature (feature # 1-16) is activated over time (counts up to ~1000).
Times activated (log scale) per feature # over time
Dynamic Faces
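These per-feature activation counts are what the classifier operates on. A minimal sketch of histogram-based recognition (the normalization and distance measure here are illustrative choices, not necessarily those used in HOTS):

```python
import numpy as np

def normalize(hist):
    """L1-normalize a feature-activation histogram."""
    h = np.asarray(hist, dtype=float)
    return h / max(h.sum(), 1e-12)

def classify(test_hist, class_prototypes):
    """Return the label of the class prototype whose normalized activation
    histogram is closest to the test histogram in Euclidean distance."""
    t = normalize(test_hist)
    return min(class_prototypes,
               key=lambda label: np.linalg.norm(t - normalize(class_prototypes[label])))

# Toy usage with 4 last-layer features.
prototypes = {"face A": [120, 10, 5, 80], "face B": [5, 90, 110, 10]}
print(classify([60, 4, 3, 42], prototypes))   # -> "face A"
```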
(a) Layer 1
(b) Layer 2
Faces experiment: reconstructed features extracted by layers 1 and 2. This figure presents the obtained cluster centers for the first two layers used in the face recognition task (8 features for layer 1, 16 for layer 2, 32 for layer 3). The first layer is directly obtained from events generated by the event-driven time-based vision sensor. Each feature is represented as a time-surface; the first, positive, half corresponds to the ON events and the second, negative, half corresponds to the OFF events. Each layer then uses the features of its preceding layer to interpret its own features as a pattern of events produced by the camera. As the position of the layer increases, the size of the feature increases. So does its complexity. See Fig. 6 for the third layer.
(a) Layer 1
(b) Layer 2
(c) Layer 3
Fig. 2. Letters & Digits experiment: Reconstructed features extracted by the different layers.
This figure presents the obtained cluster centers for the three successive layers used in the letters & digits recognition task (4 features for layer 1, 8 for layer
2, 16 for layer 3). The first layer is directly obtained from events generated by the event-driven time-based vision sensor. Each feature is represented as a
time-surface like in Fig. 3 from the paper, the first, positive, half corresponds to the ON events and the second, negative, half corresponds to the OFF events.
Each layer then uses the features of its preceding layer to interpret its own features as a pattern of events produced by the camera. As the position of the
layer increases, the size of the feature increases. So does its complexity. The actual features used by the classifier are the features from layer 3.
Letters and digits
Faces
Dynamic Features
Retina implants
Epi-retinal
Sub-retinal
Asynchronous Display: retina prosthetics
• Development of retina stimulation goggles
• Asynchronous retina stimulation: prosthetics and optogenetics
• Development of optogenetic stimulation (GenSight, FMI)
• A new generation of displays for gaming and wearable devices
& much more...
Decision making: game theory
Stock market
Low-power online decoding and classification
Robotics
Autonomous driving
Always-on sensing
Conclusions