Numenta Workshop
October 17, 2014
Jeff Hawkins
jhawkins@numenta.com
Principles of Hierarchical Temporal Memory
Foundations of Machine Intelligence
Numenta's Mission
1) Discover operating principles of neocortex.
2) Create technology for machine intelligence based on neocortical principles.
Why Will Machine Intelligence be Based on Cortical Principles?
1) Cortex uses a common learning algorithm
vision
hearing
touch
behavior
2) Cortical algorithm is incredibly adaptable
languages
engineering
science
arts …
3) Network effects
hardware and software efforts will
focus on most universal solution
Talk Topics
- Cortical facts
- Cortical theory (HTM)
- Research roadmap
- Applications roadmap
- Thoughts on Machine Intelligence
(annotated on the slide: easy, deep, easy)
What the Cortex Does
The neocortex learns a model from fast-changing sensory data.
The model generates:
- predictions
- anomalies
- actions
Most sensory changes are due to your own movement.
The neocortex learns a sensory-motor model of the world.
[Diagram: light → retina, sound → cochlea, touch → somatic senses; each sense organ streams patterns to the neocortex.]
Cortical Facts
Hierarchy
Cellular layers
Mini-columns
Neurons w/1000’s of synapses
- 10% proximal
- 90% distal
Active distal dendrites
Synaptogenesis
Remarkably uniform
- anatomically
- functionally
Sheet of cells, 2.5 mm thick (layers 2/3, 4, 5, 6)
Cortical Theory
Hierarchy
Cellular layers
Mini-columns
Neurons w/1000’s of synapses
- 10% proximal
- 90% distal
Active distal dendrites
Synaptogenesis
Remarkably uniform
- anatomically
- functionally
Sheet of cells
HTM
Hierarchical Temporal Memory
1) Hierarchy of identical regions
2) Each region learns sequences
3) Stability increases going up hierarchy if
input is predictable
4) Sequences unfold going down
Questions
- What does a region do?
- What do the cellular layers do?
- How do neurons implement this?
- How does this work in hierarchy?
Cellular Layers
Each of layers 2/3, 4, 5, and 6 is a sequence memory:
- Layer 4: sequence memory for inference (feedforward)
- Layer 2/3: sequence memory for inference (feedforward)
- Layer 5: sequence memory for motor (feedback)
- Layer 6: sequence memory for attention (feedback)
Each layer implements a variation of a common sequence memory algorithm.
Two Types of Inference (L4, L2/3)
Layer 4 receives sensor/afferent data plus a copy of motor commands: it learns sensory-motor sequences.
Layer 2/3 learns high-order sequences and projects to the next higher region: predicted input yields a stable output; un-predicted changes pass through.
Example: A-B-C-D vs. X-B-C-Y
A-B-C-? → D
X-B-C-? → Y
These are universal inference steps.
They apply to all sensory modalities.
Produces receptive field properties seen in cortex.
The Neuron
Biological neuron vs. HTM neuron:
Feedforward (proximal dendrites):
- Biological: linear summation; generates spikes
- HTM: linear summation; binary activation
Local and feedback (distal dendrites):
- Biological: non-linear; dendritic APs depolarize the soma
- HTM: threshold coincidence detectors; "predicted" cell state
Biological Synapses vs. HTM Synapses:
- Biological: learning is mostly the formation of new synapses; synapses are low fidelity.
- HTM: binary weight (0 or 1); scalar "permanence" (0.0 to 1.0, e.g. 0.4).
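A minimal sketch of this synapse model in Python (the class name, threshold, and increments are illustrative assumptions, not NuPIC's actual API): the effective weight is binary, obtained by thresholding a scalar permanence that learning nudges up and down.

```python
# Illustrative HTM-style synapse: a scalar "permanence" in [0.0, 1.0]
# is thresholded to give a binary connection weight.
CONNECTED_THRESHOLD = 0.2   # example value, as in the diagram later on

class Synapse:
    def __init__(self, permanence=0.0):
        self.permanence = permanence

    @property
    def connected(self):
        # Binary weight: connected (1) only above the threshold.
        return self.permanence > CONNECTED_THRESHOLD

    def reinforce(self, amount=0.1):
        # Learning mostly moves permanence, not a graded weight.
        self.permanence = min(1.0, self.permanence + amount)

    def weaken(self, amount=0.1):
        self.permanence = max(0.0, self.permanence - amount)

s = Synapse(0.15)
print(s.connected)   # False
s.reinforce()        # permanence 0.25 crosses the threshold
print(s.connected)   # True
```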
Sparse Distributed Representations (SDRs)
• Many bits (thousands)
• Few 1’s mostly 0’s
• Example: 2,000 bits, 2% active
• Each bit has semantic meaning
• Learned
01000000000000000001000000000000000000000000000000000010000…………01000
Dense Representations
• Few bits (8 to 128)
• All combinations of 1’s and 0’s
• Example: 8 bit ASCII
• Bits have no inherent meaning
• Arbitrarily assigned by programmer
01101101 = m
Sparse Distributed Representations (SDRs)
The Language of Intelligence
SDR Properties
1) Similarity:
shared bits = semantic similarity
2) Store and Compare:
store indices of active bits (e.g. indices 1, 2, 3, 4, 5, … 40);
subsampling is OK (e.g. storing only indices 1, 2, … 10 still identifies the SDR)
3) Union membership:
OR together many SDRs (e.g. the union of ten 2%-sparse SDRs is ~20% dense),
then ask: is this SDR a member of the union?
E.g. a cell can recognize many unique patterns on a single dendritic branch:
ten synapses from Pattern 1, … ten synapses from Pattern N.
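These three properties can be demonstrated in a few lines of Python over sets of active-bit indices (a toy encoding for illustration; sizes follow the 2,000-bit, 2% example above):

```python
import random

N, W = 2000, 40   # 2,000-bit SDRs, 40 active bits = 2% sparsity

def random_sdr():
    # Represent an SDR as the set of indices of its active bits.
    return set(random.sample(range(N), W))

a, b = random_sdr(), random_sdr()

# 1) Similarity: overlap of active bits measures semantic similarity.
print(len(a & b))            # near 0 for unrelated random SDRs

# 2) Store and compare, with subsampling: matching just 10 stored
# indices identifies the pattern; chance matches are vanishingly rare.
stored = set(random.sample(sorted(a), 10))
print(stored <= a)           # True: the subsample matches its source
print(len(stored & b))       # ~0: and not an unrelated SDR

# 3) Union membership: OR ten SDRs together (~20% dense) and test
# whether a candidate's active bits are all contained in the union.
union = set().union(*(random_sdr() for _ in range(10)))
print(a <= union)            # almost surely False for a new SDR
```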
Neurons Recognize Hundreds of Patterns
A cell activates from dozens of feedforward patterns.
A cell predicts its activity in hundreds of contexts.
Learning Transitions
- Feedforward activation arrives.
- Inhibition enforces sparse cell activation (Time = 1, Time = 2, …).
- Cells form connections to previously active cells.
- Those connections predict future activity.
- Multiple predictions can occur at once: A-B, A-C, A-D.
- This is a first-order sequence memory (see the sketch below).
- It cannot learn A-B-C-D vs. X-B-C-Y.
- Mini-columns turn this into a high-order sequence memory.
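As a toy illustration of this transition learning (a deliberately simplified sketch; the real algorithm learns on per-cell dendritic segments), consider a memory that records which patterns have followed which:

```python
from collections import defaultdict

# Toy first-order transition memory: map each pattern to the set of
# patterns that have followed it.
transitions = defaultdict(set)

def learn(sequence):
    for prev, nxt in zip(sequence, sequence[1:]):
        transitions[prev].add(nxt)  # connect to previously active cells

def predict(pattern):
    return transitions[pattern]     # cells put into a predictive state

learn("ABCD")
learn("XBCY")
print(predict("A"))  # {'B'}
print(predict("C"))  # {'D', 'Y'}: a first-order memory cannot tell
                     # A-B-C-D from X-B-C-Y, so both futures are predicted
```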
Forming High Order Representations
Feedforward input causes sparse activation of columns.
No prediction: all cells in the column become active.
With prediction: only predicted cells in the column become active.
Representing High-order Sequences
A-B-C-D vs. X-B-C-Y
Before training: A, X, B, C, D, and Y each activate the same columns in either sequence.
After training: A-B'-C'-D' vs. X-B''-C''-Y'': same columns, but only one cell active per column, so B following A is represented differently from B following X.
IF 40 active columns, 10 cells per column
THEN 10^40 ways to represent the same input in different contexts
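A sketch of the activation rule behind this (a hypothetical simplification of the real algorithm): predicted inputs activate only their predicted cells, preserving context, while unpredicted inputs "burst" the whole column.

```python
# Toy mini-column activation rule from the slide above:
# - unpredicted input: every cell in the column becomes active (burst)
# - predicted input: only the predicted cells become active
CELLS_PER_COLUMN = 10

def activate(active_columns, predicted_cells):
    """active_columns: set of column indices receiving feedforward input.
    predicted_cells: set of (column, cell) pairs in the predictive state.
    Returns the set of (column, cell) pairs that become active."""
    active = set()
    for col in active_columns:
        predicted_here = {c for c in predicted_cells if c[0] == col}
        if predicted_here:
            active |= predicted_here   # context-specific representation
        else:
            active |= {(col, i) for i in range(CELLS_PER_COLUMN)}  # burst
    return active

cols = set(range(40))
print(len(activate(cols, set())))                    # 400 cells: full burst
print(len(activate(cols, {(c, 3) for c in cols})))   # 40 cells: one per column
print(10 ** 40)   # distinct context codes for the same 40 columns
```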
HTM Temporal Memory (aka Cellular Layer)
Converts input to sparse activation of columns
Recognizes and recalls high-order sequences
- Continuous learning
- High capacity
- Local learning rules
- Fault tolerant
- No sensitive parameters
- Semantic generalization
HTM Temporal Memory is a building block of neocortex/machine intelligence
[Diagram: layer functions: inference (L2/3, L4), motor (L5), attention (L6)]
Research Roadmap (layers 2/3, 4, 5, 6)
- High-order Inference: Theory 98%; extensively tested; commercial
- Sensory-motor Inference: Theory 80%; in development
- Motor Sequences: Theory 50%
- Attention/Feedback: Theory 10%
High-order Inference:
Data: Streaming
Capabilities: Prediction, anomaly detection, classification
Applications: Predictive maintenance, security, natural language processing
Applications Using HTM High-order Inference
- Server anomalies: Grok, available on AWS
- Unusual human behavior
- Geospatial anomalies
- Natural language search/prediction: Cortical.IO
- Stock volume anomalies
Pipeline: Data → Encoder → SDR → HTM High-Order Sequence Memory → Predictions, Anomalies
All use the same HTM code base (a sketch of the pipeline follows).
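The pipeline can be sketched as below; `ScalarEncoder` and `SequenceMemory` here are hypothetical stand-ins for illustration, not the actual NuPIC/Grok classes. The anomaly score is the fraction of active bits that were not predicted.

```python
# Illustrative skeleton of the pipeline above. The class names and
# parameters are assumptions, not NuPIC's real API.

class ScalarEncoder:
    """Encode a scalar as an SDR: w consecutive active bits whose
    position tracks the value (a common HTM encoding scheme)."""
    def __init__(self, n=400, w=21, lo=0.0, hi=100.0):
        self.n, self.w, self.lo, self.hi = n, w, lo, hi

    def encode(self, value):
        frac = (min(max(value, self.lo), self.hi) - self.lo) / (self.hi - self.lo)
        start = int(frac * (self.n - self.w))
        return frozenset(range(start, start + self.w))

class SequenceMemory:
    """Stand-in for HTM high-order sequence memory: remembers only the
    last input as its "prediction", just enough to show the data flow."""
    def __init__(self):
        self.predicted = frozenset()

    def compute(self, sdr):
        # Anomaly score: fraction of active bits that were NOT predicted.
        anomaly = 1.0 - len(sdr & self.predicted) / len(sdr)
        self.predicted = sdr   # placeholder prediction for the sketch
        return anomaly

encoder, memory = ScalarEncoder(), SequenceMemory()
for value in [10, 11, 10, 12, 95, 11]:   # 95 breaks the pattern
    print(value, round(memory.compute(encoder.encode(value)), 2))
```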
Research Roadmap (layers 2/3, 4, 5, 6)
- High-order Inference: Theory 98%; extensively tested; commercial
  Data: Streaming
  Capabilities: Prediction, anomaly detection, classification
  Applications: IT, security, natural language processing
- Sensory-motor Inference: Theory 80%; in development
  Data: Static (with simple behaviors)
  Capabilities: Classification, prediction
  Applications: Vision image classification (with saccades and hierarchy), network classification
- Motor Sequences: Theory 50%
  Data: Static and/or streaming
  Capabilities: Goal-oriented behavior
  Applications: Robotics, smart bots, proactive defense
- Attention/Feedback: Theory 10%
  Enables: Multi-sensory modalities, multi-behavioral modalities
Research Roadmap: Open and Transparent
- Algorithms are documented
- Multiple independent implementations
- Numenta's software is open source (GPLv3): NuPIC, www.Numenta.org
- Active discussion groups for theory and implementation
- Numenta's research code is posted daily
Collaborations:
- IBM Almaden Research, San Jose, CA
- DARPA, Washington, D.C.
- Cortical.IO, Austria
jhawkins@numenta.com
@Numenta
Machine Intelligence Landscape

                Cortical (e.g. HTM)           ANNs (e.g. Deep learning)   A.I. (e.g. Watson)
Premise         Biological                    Mathematical                Engineered
Data            Spatial-temporal, behavior    Spatial-temporal            Language, documents
Capabilities    Prediction, classification,   Classification              NL query
                goal-oriented behavior
Valuable?       Yes                           Yes                         Yes
Path to M.I.?   Yes                           Probably not                No
The Birth of Programmable Computing
1940s: many approaches
- Analog vs. digital
- Decimal vs. binary
- Wired vs. memory-based programming
- Serial vs. random access memory
1950s: one dominant paradigm
- Digital
- Binary
- Memory-based programming
- Two-tier memory
Why did one paradigm win?
- Network effects
Why did this paradigm win?
- Most flexible
- Most scalable
The Birth of Machine Intelligence
2010s: many approaches
- Specific vs. universal algorithms
- Mathematical vs. memory-based
- Spatial vs. time-based patterns
- Batch vs. on-line learning
2020s: one dominant paradigm
- Universal algorithms
- Memory-based
- Time-based patterns
- On-line learning
Why will one paradigm win?
- Network effects
Why will this paradigm win?
- Most flexible
- Most scalable
How do we know this is going to happen?
- Brain is proof case
- We have made great progress
What Can Be Done With Software
1 layer
30 msec per learning-inference-prediction step
10^-6 of human cortex
2048 columns, 65,000 neurons
300M synapses
Challenges and Opportunities for Neuromorphic HW
Challenges:
- Dendritic regions
- Active dendrites
- 1,000s of synapses
- 10,000s of potential synapses
- Continuous learning
Opportunities:
- Low precision memory (synapses)
- Fault tolerant: memory, connectivity, neurons, natural recovery
- Simple activation states (no spikes)
- Connectivity: very sparse, topological
Requirements for Online Learning
• Train on every new input
• If a pattern does not repeat, forget it
• If a pattern repeats, reinforce it
Learning is the formation of connections:
- Connection strength/weight is binary
- Connection permanence is a scalar
- Training changes permanence
- If permanence > threshold, then connected
[Diagram: permanence scale from 0 to 1 with threshold at 0.2; unconnected below, connected above. A sketch of this rule follows.]
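A toy sketch of this rule (the constants are illustrative): every input nudges permanences, so patterns that repeat get reinforced past the connection threshold while one-off patterns decay and are forgotten.

```python
# Toy online learning over a segment's potential synapses.
THRESHOLD = 0.2        # permanence above this => connected (as in the slide)
INC, DEC = 0.10, 0.02  # illustrative increment/decrement values

permanence = {}        # presynaptic cell -> scalar permanence

def train(active_cells):
    # Train on every new input: reinforce synapses from active cells,
    # decay the rest so patterns that never repeat are forgotten.
    for cell in active_cells:
        permanence[cell] = min(1.0, permanence.get(cell, 0.0) + INC)
    for cell in set(permanence) - set(active_cells):
        permanence[cell] = max(0.0, permanence[cell] - DEC)

def connected():
    return {c for c, p in permanence.items() if p > THRESHOLD}

train({"a", "b"}); train({"a", "b"}); train({"a", "b"})
train({"z"})                      # a one-off pattern
print(connected())                # {'a', 'b'}: repeated => connected
print(round(permanence["z"], 2))  # 0.1: below threshold, fading away
```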
2/3
4
5
6
Motor
1
2/3
4
5
6
Sensory
Motor
Kinesthetic
Thalamus
Thalamus
We believe all layers implement variations of the same learning algorithm:
- Learning transitions in afferent data.
Stable representations are formed for predicted transitions.
Unpredicted transitions are passed to next layer.
Layer 4: Learns sensory/motor transitions.
Layer 3: Learns high-order sequence transitions.
Layers 5 and 6 learn sequences for motor and attention
Sequence Memory
Cortical Layers
Natural Language + HTM
Document corpus (e.g. Wikipedia) → 100K "Word SDRs", each 128 x 128 bits.
Word SDRs support semantic arithmetic, e.g.:
Apple - Fruit = Computer, Macintosh, Microsoft, Mac, Linux, Operating system, …
A toy sketch of this arithmetic follows.
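A toy illustration of word-SDR arithmetic (hand-made bit sets standing in for real 128 x 128 Word SDRs learned from a corpus): subtraction removes the shared semantic bits, and the nearest words are ranked by overlap.

```python
# Toy word-SDR arithmetic. Bit positions stand in for semantic
# features; real Word SDRs are 128 x 128 bits learned from a corpus.
words = {
    "apple":     {1, 2, 3, 7, 8},   # fruit bits + computer-brand bits
    "fruit":     {1, 2, 3, 4},
    "macintosh": {7, 8, 9},
    "banana":    {1, 2, 4, 5},
}

# "Apple - Fruit": keep apple's bits that are not fruit-like.
result = words["apple"] - words["fruit"]          # {7, 8}

# Rank the other words by overlap with the result:
ranked = sorted((w for w in words if w != "apple"),
                key=lambda w: len(words[w] & result), reverse=True)
print(ranked)   # ['macintosh', 'fruit', 'banana']: the computer
                # sense survives the subtraction
```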
Training set (each sentence is a sequence of three Word SDRs: Word 1, Word 2, Word 3):
frog eats flies
cow eats grain
elephant eats leaves
goat eats grass
wolf eats rabbit
cat likes ball
elephant likes water
sheep eats grass
cat eats salmon
wolf eats mice
lion eats cow
dog likes sleep
elephant likes water
cat likes ball
coyote eats rodent
coyote eats rabbit
wolf eats squirrel
dog likes sleep
cat likes ball
These sequences of Word SDRs are fed into an HTM.
Test: present "fox" eats → the HTM predicts: rodent.
("fox" never appears in the training set; the prediction comes from the semantic overlap of its Word SDR with words like wolf and coyote.)
This demonstrates:
1) Unsupervised Learning
2) Semantic Generalization
3) Many Applications
A toy sketch of this generalization follows.
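A toy sketch of why the prediction generalizes (hand-made SDRs; the real ones are derived from a document corpus): "fox" never occurs in training, but its SDR shares bits with "wolf" and "coyote", so the transitions learned through those shared bits still fire.

```python
from collections import defaultdict

# Toy demonstration of semantic generalization with Word SDRs.
sdr = {
    "wolf":   {1, 2, 3, 4},
    "coyote": {1, 2, 3, 5},
    "fox":    {1, 2, 3, 6},   # absent from training, overlaps canids
    "cow":    {7, 8, 9, 10},
}

# Learn transitions per SDR bit (not per whole word), so words with
# overlapping SDRs share what was learned.
successors = defaultdict(set)

def learn(subject, obj):
    for bit in sdr[subject]:
        successors[bit].add(obj)

learn("wolf", "rabbit"); learn("wolf", "mice")
learn("coyote", "rodent"); learn("cow", "grain")

def predict(word):
    votes = defaultdict(int)
    for bit in sdr[word]:
        for obj in successors[bit]:
            votes[obj] += 1
    return sorted(votes, key=votes.get, reverse=True)

print(predict("fox"))   # ['rabbit', 'mice', 'rodent'], never 'grain':
                        # fox shares bits with wolf/coyote, not cow
```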
