RTG Computational Cognition
Sensorimotor Learning in ANNs
Viviane Clay
The Effect of Sensorimotor Learning on the Learned Representations in Deep Neural Networks
Differences in Computational Abilities
Current ANNs are vulnerable to adversarial attacks and struggle to learn continually without catastrophic forgetting, to extrapolate knowledge to new tasks, to learn causal models, and to learn efficiently from few labeled examples.
(Hendrycks et al., 2019) (Goodfellow et al., 2015)
Learning more like Children
I investigate the effect of supervision and embodiment as well as gradual knowledge acquisition on learning.
[Figure: "How Humans Learn vs. How Machines Learn", showing Piaget's Stages of Cognitive Development: Sensorimotor (<2 years), Preoperational (2-7 years), Concrete Operational (7-11 years), Formal Operational (>11 years).]
“Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain.” – Alan Turing, 1950
Learning Through Interaction
The idea is to let an agent learn through interaction with the world and then take a look at what kind of representation is learned.
Sensorimotor learning takes place in a randomly generated 3D maze environment with interactable objects.
Experimental Conditions
1. Representation Learning
[Diagram: three training conditions: the embodied agent (perception → action) and two control conditions, an autoencoder (perception → reconstruction) and a classifier (perception → classification).]
Analysis of the Learned Representations
The representations of the embodied agent are correlated with actions and action-relevant objects such as doors.
The agent learns a sparse encoding of the high-dimensional visual input. This is a very efficient and robust way to code information (Ahmad & Scheinkman, 2019) and can be observed in nature (Perez-Orive et al., 2002; Laurent, 2002; Young & Yamane, 1992; Brecht & Sakmann, 2002).
| Avg # active each frame | # neurons used | Active in more than 40% of frames | Most active neuron active in |
| 12.6 (4.9%) | 182 (71%) | 3 (1.2%) | 68.3% of frames |
| 239 (93.4%) | 239 (93.4%) | 239 (93.4%) | 100% of frames |
| 175.2 (68.5%) | 256 (100%) | 233 (91%) | 100% of frames |
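Statistics like those in the table can be recomputed from any activation matrix; a minimal sketch (the function name and the zero activity threshold are illustrative assumptions, not the exact analysis code):

```python
import numpy as np

def activation_stats(acts, thresh=0.0):
    """Summary statistics over an (n_frames x n_neurons) activation matrix.
    A neuron counts as 'active' in a frame when its activation exceeds `thresh`."""
    active = acts > thresh                    # boolean (n_frames x n_neurons)
    per_neuron = active.mean(axis=0)          # fraction of frames each neuron is active
    return {
        "avg_active_per_frame": float(active.sum(axis=1).mean()),
        "neurons_used": int((per_neuron > 0).sum()),
        "active_in_over_40pct": int((per_neuron > 0.4).sum()),
        "most_active_neuron_frac": float(per_neuron.max()),
    }
```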
Measuring sparsity with the Gini index shows that the embodied agent's representations become increasingly sparse over training.
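As a sketch, one common formulation of the Gini index as a sparsity measure, computed on the sorted activation magnitudes (the function name is ours; 0 means perfectly uniform activity, values near 1 mean highly sparse activity):

```python
import numpy as np

def gini_sparsity(activations):
    """Gini index of an activation vector: 0 for uniform activity,
    approaching 1 for one-hot (maximally sparse) activity."""
    a = np.sort(np.abs(np.asarray(activations, dtype=float)))
    n = a.size
    total = a.sum()
    if total == 0:
        return 0.0
    k = np.arange(1, n + 1)                   # ranks of the sorted values
    return float(1.0 - 2.0 * np.sum((a / total) * (n - k + 0.5) / n))
```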
Part I - Summary
Summary:
• The representations learned through very weakly supervised or self-supervised exploration are structured and meaningful
• They encode action-oriented information in very sparse activation patterns
• The encodings learned through interaction differ significantly from the encodings learned without interaction

Next question: Can we build on these encodings of the world and use them for tasks such as few-shot object detection?
Fast Mapping
Children (and adults) can form an object-word association from as few as one example. This is called Fast Mapping.
Fast Concept Mapping
We want to transition from self-supervised learning through interaction to learning complex
concepts from few examples.
[Diagram: step 1, representation learning under the three conditions (perception → action, perception → reconstruction, perception → classification).]
Take N examples of the concept
Give them to the trained agent and see how they are represented
Look at which neurons fire together in each example
Sum up the correlations
Use the M strongest connections to define the concept
The strength of a connection defines its weight in the concept
Do this for any concept you would like to extract
Inference: Compare the activations of a new input to the concept definition
Inference: Add up the amount of evidence for the concept and check if it is above a threshold
Evidence: 0.66
Inference: Do this for all concepts of interest; a concept is detected when its evidence is at or above the threshold (evidence ≥ TH).
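The steps above can be sketched in a few lines (the binarization threshold, the pairwise co-activation statistic, and all names are illustrative assumptions, not the exact published implementation):

```python
import numpy as np

def extract_concept(encodings, m=100, rel_thresh=0.2):
    """Define a concept from the encodings of N positive examples.
    encodings: (N x d) activations; neurons above rel_thresh * max count as firing.
    Returns the m strongest co-firing neuron pairs and their normalized weights."""
    firing = (encodings > rel_thresh * encodings.max()).astype(float)
    cooc = firing.T @ firing                  # how often each pair fires together
    np.fill_diagonal(cooc, 0.0)               # ignore self-pairs
    iu = np.triu_indices_from(cooc, k=1)      # count each pair once
    top = np.argsort(cooc[iu])[::-1][:m]      # m strongest connections
    pairs = list(zip(iu[0][top], iu[1][top]))
    weights = cooc[iu][top]
    return pairs, weights / weights.sum()     # connection strength -> concept weight

def evidence(encoding, pairs, weights, rel_thresh=0.2):
    """Inference: weighted fraction of the concept's connections active in a new input."""
    active = encoding > rel_thresh * encoding.max()
    return float(sum(w for (i, j), w in zip(pairs, weights) if active[i] and active[j]))
```

A concept is then detected when the evidence exceeds a chosen threshold, and new concepts can be added by rerunning `extract_concept`, with no retraining of the network.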
It works with surprisingly few examples!
It works about as well as with the classifier representations, which were optimized for exactly these concepts.
Why? The idea is that, in a good encoder, a few examples of a class should already be representative of the overall pattern used to encode this class.
FCM works better on some concepts than on others. It can show which information is encoded well in the networks.
You can add as many concepts as you want! If the concept is encoded in the representation, it can be extracted without
retraining or huge amounts of labeled data.
Part II - Summary
Making the computational aspects of learning more natural can lead to more brain-like abilities.
Summary:
• The representations learned through very weakly supervised or self-supervised exploration encode information about concepts which were never explicitly taught
• These concepts can be learned from very few examples (above-chance accuracy already with one example)
• Concept mapping has no problem with catastrophic forgetting
• There is no need for a big labeled data set (few-shot approaches often require one)
• Learning more naturally (embodied & self/weakly-supervised) leads to abilities that we also find in humans, such as fast mapping
Takeaways
1) Our brains are sensorimotor systems, and if we want to model them, our models should be as well.
2) What a model learns is a reflection of its environment and task.
3) If we want to understand cognition and our brains, we need to understand the tasks that need to be solved. If we want our
artificial models to learn brain-like representations, we need to let them learn tasks that brains need to solve.
Embodied AI
Thanks for your attention! Any questions? Ideas?
Publications
https://doi.org/10.1016/j.neunet.2020.11.004 https://arxiv.org/abs/2102.02153
(currently under review @ journal)
Differences in Information Processing
“An algorithm is likely to be understood more readily by understanding the nature of the problem being solved than by examining the mechanism (and the hardware) in which it is embodied. In a similar vein, trying to understand perception by studying only neurons is like trying to understand bird flight by studying only feathers: It just cannot be done. In order to understand bird flight, we have to understand aerodynamics; only then do the structure of feathers and the different shapes of birds’ wings make sense.” (Marr, 2010, Chapter 1, p. 27)
Looking for Inspiration
In nature, certain solutions have evolved independently in different species, which means they are at least local optima. Using solutions for information processing that evolution came up with does not seem like a bad start.
© Arizona Board of Regents / ASU Ask A Biologist
The Importance of Embodiment
Self-generated movements with concurrent visual feedback seem necessary in order to display visually guided behavior and to
perform tasks that require depth perception.
Emerging Sparsity and its Mysterious Causes
[Diagram: the three training setups compared. The curiosity-driven agent encodes observations into an encoded state; a forward model predicts the next encoded state (forward loss) and an inverse model predicts the next action (inverse loss). The autoencoder maps encoding → decoding → reconstruction, trained on the reconstruction error. The classifier maps encoding → object classification, trained on the error against the true label. Each setup yields an encoded state.]
Test Concepts & Data Distribution
|         | No door | Level door | Green door | Key door | Other door | Total |
| Nothing | 150 | 200 | 150 | 150 | 200 | 850 |
| Key     | 150 | 0   | 50  | 50  | 0   | 250 |
| Orb     | 50  | 50  | 50  | 50  | 50  | 250 |
| Total   | 350 | 250 | 250 | 250 | 250 |     |
My PhD Project – A Bird's Eye View
[Diagram: the agent's interface with the environment. Observations are 168x168x3 images plus game state (keys collected: no key through 5 keys; time left; level). Actions are walk, turn body, jump, and turn camera.]
Analysis of the Learned Representations
Agents with all types of rewards have a very action-oriented encoding of the sensory input.
Fast Concept Mapping from Embodied Representations
Concept logic: combining concepts with AND/OR can improve performance. The classifier already implicitly makes use of this.
Bring the Concepts Back into the World
Idea: Add extracted concepts as additional information for decision making. This will also refine the activations to match the
concepts (hopefully).
Conventional RL (simplified)
Rewards from the environment are used to optimize the policy.
[Diagram: observation → φ → representation → policy π → action, plus a value estimate; rewards from the environment feed the actor (policy gradient) and critic (value function) losses.]
Exploration by Curiosity
The agent can learn without external rewards by using the prediction error as an internal reward.
[Diagram: the observation and next observation are encoded by φ and Φ'; an inverse model predicts the action (inverse loss) and a forward model predicts Φ(next obs) (forward loss); the forward loss serves as the internal reward alongside the actor & critic loss. (Pathak et al., 2017)]
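The core of the curiosity signal can be sketched in a few lines (a fixed random encoder and an untrained linear forward model as toy stand-ins; in the actual ICM both are learned, and the inverse model shapes the features):

```python
import numpy as np

rng = np.random.default_rng(0)
D_OBS, D_FEAT, D_ACT = 16, 8, 4          # illustrative sizes

W_phi = rng.normal(size=(D_OBS, D_FEAT))                  # stand-in encoder phi
W_fwd = rng.normal(size=(D_FEAT + D_ACT, D_FEAT)) * 0.1   # stand-in forward model

def phi(obs):
    # Toy feature encoder: a fixed random projection with a nonlinearity.
    return np.tanh(obs @ W_phi)

def curiosity_reward(obs, action_onehot, next_obs):
    # Internal reward = squared error of the forward model's prediction
    # of phi(next_obs) given (phi(obs), action).
    pred = np.concatenate([phi(obs), action_onehot]) @ W_fwd
    err = phi(next_obs) - pred
    return 0.5 * float(err @ err)
```

In training, the same error is minimized as the forward loss, so transitions the model already predicts well yield little reward and the agent is driven toward novel ones.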
Exploration by Disagreement
The inverse loss can be seen as an auxiliary task for feature learning. Φ' can also be random features, or learned with another loss such as a VAE.
[Diagram: the same architecture: forward and inverse models over φ and Φ', with the forward loss as the internal reward. (Pathak et al., 2019)]
[Diagram: the inverse model is replaced by a generic auxiliary task/loss for learning Φ'; the forward loss remains the internal reward. (Pathak et al., 2019)]
We can train an ensemble of forward models.
[Diagram: an ensemble of forward models, each predicting Φ(next obs) from the same representation and action; Φ' is learned with an auxiliary task/loss. (Pathak et al., 2019)]
As an intrinsic reward we can then take the disagreement between models.
[Diagram: the variance (disagreement) across the ensemble's predictions of Φ(next obs) is used as the internal reward. (Pathak et al., 2019)]
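The disagreement reward itself is simple; a toy sketch with untrained linear forward models standing in for the ensemble:

```python
import numpy as np

def disagreement_reward(feat, action_onehot, models):
    # Internal reward = variance across the ensemble's predictions of Phi(next obs).
    # models: list of (d_feat + d_act, d_feat) weight matrices.
    x = np.concatenate([feat, action_onehot])
    preds = np.stack([x @ W for W in models])   # (n_models, d_feat)
    return float(preds.var(axis=0).mean())      # mean per-dimension disagreement

# Five independently initialized (untrained) linear models as a toy ensemble.
rng = np.random.default_rng(1)
ensemble = [rng.normal(size=(12, 8)) * 0.1 for _ in range(5)]
```

Because all models are trained on the same data, their predictions converge (and the reward shrinks) exactly where the environment is well explored, which makes this signal more robust to noisy transitions than raw prediction error.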
My PhD Project – A Bird's Eye View
The agent shows more spike-like activations, while the autoencoder and classifier are more continuously active, with changes in activation magnitude.
Comparison Between Agent Representations – Consistent Threshold and Pattern Complexity
How well can Concepts be Extracted using Single Neurons or Triplets?
TH: 20%
How does this Compare to an SVM?
TH: 20%
Fitting a support vector machine on 5 positive and 5 negative examples (using only positive examples is not possible here) does not lead to increased accuracy.
Fast Concept Mapping from Embodied Representations
5 Examples
250 Examples

More Related Content

PDF
machinecanthink-160226155704.pdf
PPTX
Machine Can Think
PDF
Machine Learning Interview Questions
PPT
Lecture7 Ml Machines That Can Learn
PDF
AI - history and recent breakthroughs
PDF
Module 5: Decision Trees
PDF
Lect 8 learning types (M.L.).pdf
PPTX
Knowledge representation in AI
machinecanthink-160226155704.pdf
Machine Can Think
Machine Learning Interview Questions
Lecture7 Ml Machines That Can Learn
AI - history and recent breakthroughs
Module 5: Decision Trees
Lect 8 learning types (M.L.).pdf
Knowledge representation in AI

Similar to Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Representations in Deep Neural Networks - Viviane Clay (20)

PDF
Ai lecture1 final
PDF
Machine learning and_buzzwords
PPT
LearningAG.ppt
PPT
Machine Learning, Data Mining, Genetic Algorithms, Neural ...
PPT
Virtual Worlds And Real World
PDF
Human Emotion Recognition using Machine Learning
PDF
Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...
PPTX
pending-1664760315-2 knowledge based agent student.pptx
PPTX
pending-1664760315-2 knowledge based agent student.pptx
PDF
top journals
PPTX
Machine learning presentation (razi)
PDF
An Introduction to Machine Learning
PPTX
Knowledge representation
PDF
IRJET - Object Detection using Deep Learning with OpenCV and Python
PPTX
AI: Learning in AI
PPTX
AI: Learning in AI
PDF
In sequence Polemical Pertinence via Soft Enumerating Repertoire
PDF
IRJET-In sequence Polemical Pertinence via Soft Enumerating Repertoire
PDF
Deep Learning Class #0 - You Can Do It
PDF
DL Classe 0 - You can do it
Ai lecture1 final
Machine learning and_buzzwords
LearningAG.ppt
Machine Learning, Data Mining, Genetic Algorithms, Neural ...
Virtual Worlds And Real World
Human Emotion Recognition using Machine Learning
Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...
pending-1664760315-2 knowledge based agent student.pptx
pending-1664760315-2 knowledge based agent student.pptx
top journals
Machine learning presentation (razi)
An Introduction to Machine Learning
Knowledge representation
IRJET - Object Detection using Deep Learning with OpenCV and Python
AI: Learning in AI
AI: Learning in AI
In sequence Polemical Pertinence via Soft Enumerating Repertoire
IRJET-In sequence Polemical Pertinence via Soft Enumerating Repertoire
Deep Learning Class #0 - You Can Do It
DL Classe 0 - You can do it
Ad

More from Numenta (20)

PDF
Deep learning at the edge: 100x Inference improvement on edge devices
PDF
Brains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth Ramaswamy
PDF
Brains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas Miconi
PDF
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
PDF
Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev...
PDF
SBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence Spracklen
PDF
FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -...
PDF
BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...
PDF
Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...
PDF
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
PDF
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...
PDF
Sparsity In The Neocortex, And Its Implications For Machine Learning
PDF
The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...
PPTX
Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...
PPTX
Location, Location, Location - A Framework for Intelligence and Cortical Comp...
PPTX
Have We Missed Half of What the Neocortex Does? A New Predictive Framework ...
PPTX
Locations in the Neocortex: A Theory of Sensorimotor Prediction Using Cortica...
PPTX
The Predictive Neuron: How Active Dendrites Enable Spatiotemporal Computation...
PDF
The Biological Path Toward Strong AI by Matt Taylor (05/17/18)
PDF
Does the neocortex use grid cell-like mechanisms to learn the structure of ob...
Deep learning at the edge: 100x Inference improvement on edge devices
Brains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth Ramaswamy
Brains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas Miconi
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev...
SBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence Spracklen
FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -...
BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...
Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...
Sparsity In The Neocortex, And Its Implications For Machine Learning
The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...
Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...
Location, Location, Location - A Framework for Intelligence and Cortical Comp...
Have We Missed Half of What the Neocortex Does? A New Predictive Framework ...
Locations in the Neocortex: A Theory of Sensorimotor Prediction Using Cortica...
The Predictive Neuron: How Active Dendrites Enable Spatiotemporal Computation...
The Biological Path Toward Strong AI by Matt Taylor (05/17/18)
Does the neocortex use grid cell-like mechanisms to learn the structure of ob...
Ad

Recently uploaded (20)

PDF
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
PPTX
TOTAL hIP ARTHROPLASTY Presentation.pptx
PDF
Phytochemical Investigation of Miliusa longipes.pdf
PDF
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
PPTX
2. Earth - The Living Planet earth and life
PDF
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
PPTX
2. Earth - The Living Planet Module 2ELS
PDF
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PDF
Sciences of Europe No 170 (2025)
PPTX
neck nodes and dissection types and lymph nodes levels
PDF
The scientific heritage No 166 (166) (2025)
PDF
Placing the Near-Earth Object Impact Probability in Context
PPTX
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
PDF
An interstellar mission to test astrophysical black holes
DOCX
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
PPTX
Cell Membrane: Structure, Composition & Functions
PPTX
famous lake in india and its disturibution and importance
PPT
POSITIONING IN OPERATION THEATRE ROOM.ppt
PPT
protein biochemistry.ppt for university classes
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
TOTAL hIP ARTHROPLASTY Presentation.pptx
Phytochemical Investigation of Miliusa longipes.pdf
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
2. Earth - The Living Planet earth and life
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
2. Earth - The Living Planet Module 2ELS
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
Classification Systems_TAXONOMY_SCIENCE8.pptx
Sciences of Europe No 170 (2025)
neck nodes and dissection types and lymph nodes levels
The scientific heritage No 166 (166) (2025)
Placing the Near-Earth Object Impact Probability in Context
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
An interstellar mission to test astrophysical black holes
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
Cell Membrane: Structure, Composition & Functions
famous lake in india and its disturibution and importance
POSITIONING IN OPERATION THEATRE ROOM.ppt
protein biochemistry.ppt for university classes

Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Representations in Deep Neural Networks - Viviane Clay

  • 1. RTG Computational Cognition RTG Computational Cognition Sensorimotor Learning in ANNs Viviane Clay The Effect of Sensorimotor Learning on the Learned Representations in Deep Neural Networks Viviane Clay
  • 2. RTG Computational Cognition Differences in Computational Abilities Viviane Clay Current ANNs are vulnerable to adversarial attacks, have trouble to continually learn without catastrophic forgetting, extrapolate knowledge to new tasks, learn causal models, and learn efficiently from few labeled examples. (Hendrycks et al., 2019) (Goodfellow et. al., 2015)
  • 3. RTG Computational Cognition Learning more like Children Viviane Clay I investigate the effect of supervision and embodiment as well as gradual knowledge acquisition on learning. How Humans Learn How Machines Learn Sensorimotor Preoperational Concrete Operational Formal Operational <2 years 2-7 years 7-11 years >11 years Piaget’s Stages of Cognitive Development
  • 4. RTG Computational Cognition Learning more like Children Viviane Clay “Instead of trying to produce a program to simulate the adult mind, why not rater try to produce one which simulates the childs? If this were then subjected to an appropriate course of education one would obtain the adult brain.” – Alan Turing, 1950
  • 5. RTG Computational Cognition Learning Through Interaction Viviane Clay The idea is to let an agent learn through interaction with the world and then take a look at what kind of representation is learned.
  • 6. RTG Computational Cognition Learning Through Interaction Viviane Clay Sensorimotor learning takes place in a randomly generated 3D maze environment with interactable objects.
  • 7. RTG Computational Cognition Experimental Conditions Viviane Clay Perception Reconstruction Action Perception Perception Classification 1. Representation Learning Embodied Agent Autoencoder Classifier Control Conditions
  • 8. RTG Computational Cognition Analysis of the Learned Representations Viviane Clay
  • 9. RTG Computational Cognition Analysis of the Learned Representations Viviane Clay The representations of the embodied agent are correlated with actions and action-relevant objects such as doors.
  • 10. RTG Computational Cognition Analysis of the Learned Representations Viviane Clay
  • 11. RTG Computational Cognition Analysis of the Learned Representations Viviane Clay The agent learns a sparse encoding of the high dimensional visual input. This is a very efficient and robust way to code information (Ahmad & Scheinkman, 2019) and can be observed in nature (Perez-Orive et al., 2002; Laurent, 2002; Young & Yamane, 1992; Brecht & Sakmann, 2002) Avg # active each frame # neurons used Active in more that 40% of frames Most active neuron active in 12.6 (4.9%) 182 (71%) 3 (1.2%) 68.3% 239 (93.4%) 239 (93.4%) 239 (93.4%) 100% 175.2 (68.5%) 256 (100%) 233 (91%) 100%
  • 12. RTG Computational Cognition Analysis of the Learned Representations Viviane Clay When measuring the sparsity using the Gini index one can see that the embodied agent gets more and more sparse over time.
  • 13. RTG Computational Cognition Part I - Summary Viviane Clay Next question: Can we build on these encodings of the world and use them for tasks such as few- shot object detection? Summary: • The representations learned through very weakly supervised or self- supervised exploration are structured and meaningful • They encode action-oriented information in very sparse activation patterns • The encodings learned through interaction differ significantly from the encodings learned without interaction
  • 14. RTG Computational Cognition Fast Mapping Viviane Clay Children (and adults) can make an object-word association with as little as one example. This is called Fast Mapping. How Humans Learn How Machines Learn Sensorimotor Preoperational Concrete Operational Formal Operational <2 years 2-7 years 7-11 years >11 years Piaget’s Stages of Cognitive Development
  • 15. RTG Computational Cognition Fast Concept Mapping Viviane Clay We want to transition from self-supervised learning through interaction to learning complex concepts from few examples.
  • 16. RTG Computational Cognition Fast Concept Mapping Viviane Clay 1. Perception Reconstruction Action Perception Perception Classification
  • 17. RTG Computational Cognition Fast Concept Mapping Viviane Clay 2.
  • 18. RTG Computational Cognition Fast Concept Mapping Viviane Clay ?
  • 19. RTG Computational Cognition Fast Concept Mapping Viviane Clay Take N examples of the concept
  • 20. RTG Computational Cognition Fast Concept Mapping Viviane Clay Give them to the trained Agent and see how they are represented
  • 21. RTG Computational Cognition Fast Concept Mapping Viviane Clay Look at which Neurons fire together in each Example
  • 22. RTG Computational Cognition Fast Concept Mapping Viviane Clay Sum up the correlations
  • 23. RTG Computational Cognition Fast Concept Mapping Viviane Clay Use the M strongest connections to define the concept
  • 24. RTG Computational Cognition Fast Concept Mapping Viviane Clay The strength of a connection defines its weight in the concept
  • 25. RTG Computational Cognition Fast Concept Mapping Viviane Clay Do this for any concept you like to extract
  • 26. RTG Computational Cognition Fast Concept Mapping Viviane Clay Inference: Compare the activations of a new input to the concept definition
  • 27. RTG Computational Cognition Fast Concept Mapping Viviane Clay Inference: Add up the amount of evidence for the concept and check if it is above a threshold Evidence: 0.66
  • 28. RTG Computational Cognition Fast Concept Mapping Viviane Clay Inference: Do this for all concepts of interest Evidence: 0.66 Evidence >= TH Evidence < TH
  • 29. RTG Computational Cognition Fast Concept Mapping Viviane Clay It works with surprisingly few examples!
  • 30. RTG Computational Cognition Fast Concept Mapping Viviane Clay About equally well as with the classifier representations, which were optimized for exactly these concepts.
  • 31. RTG Computational Cognition Fast Concept Mapping Viviane Clay Why? The idea is that in a good encoder few examples of a class should already be representative of the overall pattern used to encode this class.
  • 32. RTG Computational Cognition Fast Concept Mapping Viviane Clay FCM works better on some concepts than on others. It can show which information is encoded well in the networks.
  • 33. RTG Computational Cognition Fast Concept Mapping Viviane Clay You can add as many concepts as you want! If the concept is encoded in the representation, it can be extracted without retraining or huge amounts of labeled data.
  • 34. RTG Computational Cognition Part II - Summary Viviane Clay Making to computational aspects of learning more natural can lead to more brain-like abilities. Summary: • The representations learned through very weakly supervised or self-supervised exploration encode information about concepts which were never explicitly taught • These concepts can be learned with very few examples (above chance accuracy already with one example) • Concept mapping has no problem with catastrophic forgetting • There is no need for a big labeled data set (there often is in few-shot approaches) • Learning more naturally (embodied & self/weakly-supervised) leads to abilities that we also find in humans such as fast-mapping
  • 35. RTG Computational Cognition Takeaways Viviane Clay 1) Our brains are sensorimotor systems, and if we want to model them, our models should be as well. 2) What a model learns is a reflection of its environment and task. 3) If we want to understand cognition and our brains, we need to understand the tasks that need to be solved. If we want our artificial models to learn brain-like representations, we need to let them learn tasks that brains need to solve.
  • 36. RTG Computational Cognition Embodied AI Viviane Clay Thanks for your attention! Any questions? Ideas?
  • 37. RTG Computational Cognition Publications Viviane Clay https://doi.org/10.1016/j.neunet.2020.11.004 https://arxiv.org/abs/2102.02153 (currently under review @ journal)
  • 38. RTG Computational Cognition Differences in Information Processing Viviane Clay “An algorithm is likely to be understood more readily by understanding the nature of the problem being solved than by examining the mechanism (and the hardware) in which it is embodied. In a similar vein, trying to understand perception by studying only neurons is like trying to understand bird flight by studying only feathers: It just cannot be done. In order to understand bird flight, we have to understand aerodynamics; only then do the structure of feathers and the different shapes of birds’ wings make sense.” (Marr,2010, Chapter1, p.27)
  • 39. RTG Computational Cognition Differences in Information Processing Viviane Clay
  • 40. RTG Computational Cognition Looking for Inspiration Viviane Clay In nature certain solution have developed independently in different species which means they are at least local optima. Using solutions for information processing that evolution came up with does not seem like a bad start. © Arizona Board of Regents / ASU Ask A Biologist
  • 41. RTG Computational Cognition The Importance of Embodiment Viviane Clay Self-generated movements with concurrent visual feedback seem necessary in order to display visually guided behavior and to perform tasks that require depth perception.
  • 42. RTG Computational Cognition Emerging Sparsity and its Mysterious Causes Viviane Clay Observations Action Encoding Encoded State Encoded Next State Predicted Next State Predicted Next Action Forward Loss Inverse Loss Encoding Encoded State Observations Encoding Encoded State Decoding Reconstruction Reconstruction Error Observations Encoding Encoded State Error Object Classification True Label Encoding Encoded State Encoding Encoded State Encoded State
  • 43. RTG Computational Cognition Test Concepts & Data Distribution No door Level door Green door Key door Other door Nothing 150 200 150 150 200 850 Key 150 0 50 50 0 250 Orb 50 50 50 50 50 250 350 250 250 250 250
  • 44. RTG Computational Cognition My PhD Project – A Birds Eye View Viviane Clay 0 1 0 0 0 0 21 10 8 No Key 1 Key 2 Keys 3 Keys 4 Keys 5 Keys Time Left Level Walk Turn Body Jump Turn Camera Observations Actions (168x168x3)
  • 45. RTG Computational Cognition Analysis of the Learned Representations Viviane Clay Agents with all types of rewards have a very action-oriented encoding of the sensory input.
  • 46. RTG Computational Cognition Fast Concept Mapping from Embodied Representations Viviane Clay Concept logic: concepts can be combined with AND and OR, e.g. (A AND B) OR (C AND D). This can improve performance; the classifier already implicitly makes use of this.
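The AND/OR combination of binarized neuron activations on the slide can be sketched as follows; a minimal numpy sketch, assuming a binary activation code per sample (the function names and example code matrix are illustrative, not from the talk):

```python
import numpy as np

def all_active(binary_codes, neuron_idx):
    """True where all neurons in `neuron_idx` fire together (logical AND)."""
    return binary_codes[:, neuron_idx].all(axis=1)

def concept_logic(binary_codes, and_groups):
    """OR over several AND-groups, e.g. (n0 AND n1) OR (n2 AND n3)."""
    hits = [all_active(binary_codes, group) for group in and_groups]
    return np.logical_or.reduce(hits)

codes = np.array([[1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [1, 0, 1, 0]])
# concept = (neuron0 AND neuron1) OR (neuron2 AND neuron3)
result = concept_logic(codes, [[0, 1], [2, 3]])  # [ True  True False]
```

The OR over AND-groups lets a concept be detected by several alternative neuron patterns, which is what the slide suggests the trained classifier exploits implicitly.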
  • 47. RTG Computational Cognition Bring the Concepts Back into the World Idea: Add extracted concepts as additional information for decision making. This will also refine the activations to match the concepts (hopefully).
  • 48. RTG Computational Cognition Conventional RL (simplified) Viviane Clay Rewards from the environment are used to optimize the policy. [Diagram: observation → φ → representation → policy π (action) and value estimate; environment rewards feed the actor (policy gradient) and critic (value function) losses]
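The actor-and-critic losses from the diagram can be written out in a few lines; a minimal numpy sketch under simplifying assumptions (scalar loss, illustrative 0.5 value-loss weight; not the exact implementation behind the slides):

```python
import numpy as np

def actor_critic_loss(log_probs, values, returns, value_coef=0.5):
    """Policy-gradient loss with a value baseline plus value regression."""
    advantages = returns - values            # treated as constant for the actor in practice
    actor_loss = -(log_probs * advantages).mean()
    critic_loss = ((returns - values) ** 2).mean()
    return actor_loss + value_coef * critic_loss

log_probs = np.array([-1.0, -2.0])   # log pi(a|s) for two sampled steps
values = np.array([0.5, 1.0])        # critic's value estimates
returns = np.array([1.0, 0.0])       # discounted environment rewards
loss = actor_critic_loss(log_probs, values, returns)  # -0.4375
```

Minimizing this loss increases the log-probability of actions with positive advantage while pulling the value estimates toward the observed returns.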
  • 49. RTG Computational Cognition Exploration by Curiosity Viviane Clay The agent can learn without external rewards by using the prediction error as an internal reward. [Diagram (ICM): φ encodes the observation and the next observation; an inverse model predicts the action (inverse loss), a forward model predicts Φ(next obs) (forward loss), and the forward-model error serves as internal reward for the actor and critic] (Pathak et al., 2017)
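The internal reward in this scheme is just the forward model's prediction error in feature space; a minimal sketch, assuming encoded feature vectors and an illustrative scaling factor eta (the function name is hypothetical):

```python
import numpy as np

def intrinsic_reward(phi_next, phi_next_pred, eta=0.5):
    """Curiosity reward: scaled squared error of the forward model's prediction."""
    return eta * np.sum((phi_next - phi_next_pred) ** 2, axis=-1)

phi_next = np.array([[1.0, 0.0], [0.0, 1.0]])   # encoded next observations
phi_pred = np.zeros_like(phi_next)              # forward-model predictions (all wrong here)
r_int = intrinsic_reward(phi_next, phi_pred)    # [0.5 0.5]
```

States the forward model predicts poorly yield high reward, so the policy is driven toward situations the agent cannot yet model.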
  • 50. RTG Computational Cognition Exploration by Disagreement Viviane Clay The inverse loss can be seen as an auxiliary task for feature learning. Φ' can also be random features or learned with another loss, e.g. a VAE. [Diagram: same setup as the curiosity model, with the inverse loss highlighted as the feature-learning component] (Pathak et al., 2019)
  • 51. RTG Computational Cognition Exploration by Disagreement Viviane Clay The inverse loss can be seen as an auxiliary task for feature learning. Φ' can also be random features or learned with another loss, e.g. a VAE. [Diagram: the inverse model is replaced by a generic auxiliary task/loss; the forward-model error remains the internal reward] (Pathak et al., 2019)
  • 52. RTG Computational Cognition Exploration by Disagreement Viviane Clay We can train an ensemble of forward models. [Diagram: several forward models each predict Φ(next obs) from the same representation and action] (Pathak et al., 2019)
  • 53. RTG Computational Cognition Exploration by Disagreement Viviane Clay As intrinsic reward we can then take the disagreement (variance) between the models' predictions. [Diagram: internal reward = variance/disagreement across the ensemble's predictions of Φ(next obs)] (Pathak et al., 2019)
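The disagreement reward reduces to a variance over the ensemble axis; a minimal numpy sketch, assuming predictions stacked as (n_models, batch, n_features) (the exact aggregation in the cited work may differ):

```python
import numpy as np

def disagreement_reward(ensemble_preds):
    """ensemble_preds: (n_models, batch, n_features) predictions of Phi(next obs).
    Reward per sample: variance across the ensemble, averaged over features."""
    return ensemble_preds.var(axis=0).mean(axis=-1)

preds = np.array([[[0.0, 0.0]],    # model 1
                  [[2.0, 2.0]]])   # model 2: large disagreement -> large reward
r_int = disagreement_reward(preds)  # [1.]
```

Where the models agree the reward shrinks toward zero, so already well-modeled states stop attracting the agent even if they remain individually hard to predict.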
  • 54. RTG Computational Cognition My PhD Project – A Bird's-Eye View Viviane Clay The agent shows more spike-like activations, while the autoencoder (AE) and classifier (C) are more continuously active, with changes in activation magnitude.
  • 55. RTG Computational Cognition Fast Concept Mapping from Embodied Representations Viviane Clay
  • 56. RTG Computational Cognition Comparison Between Agent Representations – Consistent Threshold and Pattern Complexity
  • 57. RTG Computational Cognition How well can Concepts be Extracted using Single Neurons or Triplets? TH (binarization threshold): 20%
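Concept extraction from single neurons can be sketched in two steps: binarize each encoding at the top-20% threshold, then pick the neuron whose on/off pattern best agrees with the concept labels. A minimal sketch under assumed details (per-sample top-k thresholding, agreement as the selection score; both function names are illustrative):

```python
import numpy as np

def binarize_top(acts, frac=0.2):
    """Binary code: per sample, set the top `frac` of activations to 1 (TH: 20%)."""
    k = max(1, int(frac * acts.shape[1]))
    th = np.sort(acts, axis=1)[:, -k][:, None]   # per-sample k-th largest activation
    return (acts >= th).astype(int)

def best_neuron(codes, labels):
    """Single neuron whose firing pattern agrees most with the concept labels."""
    agreement = (codes == labels[:, None]).mean(axis=0)
    return int(agreement.argmax())

acts = np.array([[0.9, 0.1, 0.2, 0.3],
                 [0.8, 0.2, 0.1, 0.4],
                 [0.1, 0.9, 0.7, 0.2]])
codes = binarize_top(acts, frac=0.25)    # top 1 of 4 neurons per sample
labels = np.array([1, 1, 0])             # concept present in the first two samples
neuron = best_neuron(codes, labels)      # neuron 0 fires exactly with the concept
```

Triplets would score small groups of neurons the same way, using their joint (AND-combined) firing pattern instead of a single column.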
  • 58. RTG Computational Cognition How does this Compare to an SVM? TH: 20% Fitting a support vector machine on 5 positive and 5 negative examples (training on only positive examples is not possible here) does not lead to increased accuracy.
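The few-shot SVM baseline looks roughly like this; a minimal sketch assuming scikit-learn is available and using random Gaussian clusters as stand-ins for the real encodings (the data here is synthetic, not from the experiments):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 0.5, size=(5, 8))    # encodings of 5 positive examples
neg = rng.normal(-1.0, 0.5, size=(5, 8))   # encodings of 5 negative examples
X = np.vstack([pos, neg])
y = np.array([1] * 5 + [0] * 5)

clf = LinearSVC().fit(X, y)                # linear SVM on the 5+5 examples
train_acc = clf.score(X, y)
```

Unlike the single-neuron mapping, the SVM needs negative examples to define its separating hyperplane, which is why a positives-only setup is ruled out on the slide.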
  • 59. RTG Computational Cognition Fast Concept Mapping from Embodied Representations Viviane Clay 5 Examples
  • 60. RTG Computational Cognition Fast Concept Mapping from Embodied Representations Viviane Clay 250 Examples