November 21, 2014
Jeff Hawkins
jhawkins@Numenta.com
What the Brain Says About Machine Intelligence
The Birth of Programmable Computing
1940's: Many approaches
- Dedicated vs. universal
- Analog vs. digital
- Decimal vs. binary
- Wired vs. memory-based programming
- Serial vs. random access memory
1950's: One dominant paradigm
- Universal
- Digital
- Binary
- Memory-based programming
- Two-tier memory
Why Did One Paradigm Win?
- Network effects
Why Did This Paradigm Win?
- Most flexible
- Most scalable
The Birth of Machine Intelligence
2010's: Many approaches
- Specific vs. universal algorithms
- Mathematical vs. memory-based
- Batch vs. on-line learning
- Labeled vs. behavior-based learning
2020's: One dominant paradigm
- Universal algorithms
- Memory-based
- On-line learning
- Behavior-based learning
Why Will One Paradigm Win?
- Network effects
Why Will This Paradigm Win?
- Most flexible
- Most scalable
How Do We Know This is Going to Happen?
- The brain is the proof case
- We have made great progress
Numenta's Mission
1) Discover the operating principles of the neocortex.
2) Create machine intelligence technology based on neocortical principles.
Talk Topics
- Cortical facts
- Cortical theory
- Research roadmap
- Applications
- Thoughts on Machine Intelligence
What the Cortex Does
Learns a model of the world from changing patterns of sensory data.
The model generates
- predictions
- anomalies
- actions
Most sensory changes are due to your own movement.
The neocortex learns a sensory-motor model of the world.
(Figure: patterns of light, sound, and touch reach the cortex via the retina, cochlea, and somatic senses.)
Cortical Facts
Hierarchy
Cellular layers
Mini-columns
Neurons: 3-10K synapses
- 10% proximal
- 90% distal
Active dendrites
Learning = new synapses
Remarkably uniform
- anatomically
- functionally
Sheet of cells, ~2.5 mm thick, with cellular layers 2/3, 4, 5, 6.
Cortical Theory
(Same cortical facts as above: hierarchy, cellular layers, mini-columns, neurons with 3-10K synapses, active dendrites, learning = new synapses, remarkable uniformity.)
HTM: Hierarchical Temporal Memory
1) Hierarchy of identical regions
2) Each region learns sequences
3) Stability increases going up hierarchy if
input is predictable
4) Sequences unfold going down
Questions
- What does a region do?
- What do the cellular layers do?
- How do neurons implement this?
- How does this work in hierarchy?
Cellular Layers
- Layer 2/3: sequence memory: inference (high-order)
- Layer 4: sequence memory: inference (sensory-motor)
- Layer 5: sequence memory: motor
- Layer 6: sequence memory: attention
(Feedforward input enters from below; feedback descends from above.)
Each layer is a variation of a common sequence memory algorithm.
These are universal functions. They apply to:
- all cortical regions
- all sensory-motor modalities.
(Figure: sensor data and copies of motor commands enter from below; connections run to higher and lower regions and to sub-cortical motor centers.)
(Figure: layers 2/3, 4, 5, and 6, each labeled "Sequence memory:" with a question mark.)
How Does Sequence Memory Work?
HTM Temporal Memory
Learns sequences
Recognizes and recalls sequences
Predicts next inputs
- High capacity
- Distributed
- Local learning rules
- Fault tolerant
- No sensitive parameters
- Generalizes
HTM Temporal Memory
Not Just Another ANN
1) Cortical Anatomy
Mini-columns
Inhibitory cells
Cell connectivity patterns
2) Sparse Distributed Representations
3) Realistic Neurons
Active dendrites
Thousands of synapses
Learn via synapse formation
numenta.com/learn/
Research Roadmap
- High-order Inference: theory 98%, extensively tested, commercial
- Sensory-motor Inference: theory 80%, in development
- Motor Sequences: theory 50%
- Attention/Feedback: theory 30%
Streaming Data
Capabilities: Prediction, Anomaly detection, Classification
Applications: Predictive maintenance, Security, Natural Language Processing
Streaming Data Applications
Pipeline: Data stream → Encoder → SDR → HTM → Predictions, Anomalies, Classification
Encoders for: Numbers, Categories, Date, Time, GPS, Words
Applications: Servers, Biometrics, Medical, Vehicles, Industrial equipment, Social media, Comm. networks
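The encoder step in the pipeline above can be illustrated with a minimal scalar encoder. This is a hypothetical sketch for illustration only (the function name and parameters are assumptions, not NuPIC's actual API); the key property is that nearby values produce overlapping SDRs.

```python
def scalar_encoder(value, min_val=0.0, max_val=100.0, n=400, w=21):
    """Hypothetical minimal scalar encoder (illustrative only, not NuPIC's
    actual API): map a number to an SDR of n bits with w contiguous active
    bits. Nearby values share bits, so semantic similarity is preserved."""
    value = max(min(value, max_val), min_val)          # clip to range
    span = n - w                                       # possible start positions
    start = int(round((value - min_val) / (max_val - min_val) * span))
    return frozenset(range(start, start + w))

a = scalar_encoder(50.0)
b = scalar_encoder(52.0)   # close to a
c = scalar_encoder(90.0)   # far from a
assert len(a & b) > len(a & c)   # close values overlap more than distant ones
```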
Streaming Data Applications
Server metrics, human metrics, natural language, GPS data, EEG data, financial data
Anomaly Detection in Server Metrics (Grok for AWS)
Pipeline: Server metric → Encoder → SDR → HTM → Anomaly score
Mobile Dashboard
- Servers sorted by anomaly score
- Continuously updated
Web Dashboard
What Kind of Anomalies Can HTM Detect?
- Sudden changes
- Slow changes
- Subtle changes in regular data
- Changes in noisy data
What Kind of Anomalies Can HTM Detect?
- Changes that humans can't see (e.g., an engineer manually started a build on an automated build server)
Anomaly Detection in Human Metrics
Signals: keystrokes, file access, CPU usage, app access
(Detected anomaly in the figure: a large zip file was created.)
Anomaly Detection in Financial and Social Media Data
Signals: stock volume, social media
Berkeley Cognitive Technology Group
Classification of EEG Data
GPS Data: SmartHarbors
Natural Language
Document corpus (e.g. Wikipedia) → 100K "Word SDRs" (128 x 128 bits each)
Semantic arithmetic: Apple − Fruit = Computer
Nearest terms: Macintosh, Microsoft, Mac, Linux, Operating system, …
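The word-SDR arithmetic can be illustrated with toy bit assignments. The sets below are entirely hypothetical (real word SDRs have 16,384 bits); they only show how subtracting one SDR from another shifts its nearest neighbors.

```python
# Toy "word SDRs": each word is a set of active-bit indices. The specific
# bit assignments below are hypothetical, chosen only to illustrate the idea.
apple  = frozenset({1, 2, 3, 10, 11, 20, 21})  # shares bits with both groups
fruit  = frozenset({1, 2, 3, 4, 5})            # fruit-related bits
mac    = frozenset({10, 11, 20, 22})           # computer-related bits
banana = frozenset({1, 2, 4, 30})              # another fruit

# "Apple - Fruit": subtract the fruit-related bits from Apple's SDR.
residue = apple - fruit

def overlap(x, y):
    """Shared active bits = semantic similarity."""
    return len(x & y)

# What remains is closer to computer terms than to fruit terms.
assert overlap(residue, mac) > overlap(residue, banana)
```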
Training set
frog eats flies
cow eats grain
elephant eats leaves
goat eats grass
wolf eats rabbit
cat likes ball
elephant likes water
sheep eats grass
cat eats salmon
wolf eats mice
lion eats cow
dog likes sleep
elephant likes water
cat likes ball
coyote eats rodent
coyote eats rabbit
wolf eats squirrel
dog likes sleep
cat likes ball
(Each sentence is a sequence of three word SDRs: Word 1, Word 2, Word 3.)
Sequences of Word SDRs
HTM
With the same training set, the HTM is asked: "fox" eats → ?
The HTM answers: "fox" eats → rodent
- Learning is unsupervised
- Semantic generalization
- Works across languages
- Many applications
Intelligent search
Sentiment analysis
Semantic filtering
Server metrics, human metrics, natural language, GPS data, EEG data, financial data
All these applications run on
the exact same HTM code.
Research Roadmap
- High-order Inference: theory 98%, extensively tested, commercial
- Sensory-motor Inference: theory 80%, in development
- Motor Sequences: theory 50%
- Attention/Feedback: theory 30%
Streaming Data
Capabilities: Prediction, Anomaly detection, Classification
Applications: IT, Security, Natural Language Processing
Static Data (via active learning)
Capabilities: Classification, Prediction
Applications: Vision (image classification), Network classification, Classification of connected graphs
Static and/or Streaming Data
Capabilities: Goal-oriented behavior
Applications: Robotics, Smart bots, Proactive defense
Enables: Multi-sensory modalities, Multi-behavioral modalities
Research Transparency
- Algorithms are documented
- Multiple independent implementations
- Numenta's software is open source (GPLv3)
- Numenta's daily research code is online
NuPIC Community (www.numenta.org)
- Active discussion groups for theory and implementation
- Collaborators: IBM Almaden Research (San Jose, CA), DARPA (Washington, D.C.), Cortical.IO (Austria)
Machine Intelligence Landscape

              Cortical (e.g. HTM)       ANNs (e.g. Deep learning)   A.I. (e.g. Watson)
Premise       Biological                Mathematical                Engineered
Data          Spatial-temporal,         Spatial-temporal,           Documents
              Language, Behavior        Language
Capabilities  Classification,           Classification              NL Query
              Prediction,
              Goal-oriented Behavior
Path to M.I.? Yes                       Probably not                Probably not
Learning Normal Behavior
Geospatial Anomalies
- Deviation in path
- Change in direction
Learning Transitions (shown over successive time steps)
Form connections to previously active cells.
Predict future activity.
- This is a first order sequence memory.
- It cannot learn A-B-C-D vs. X-B-C-Y.
- Mini-columns turn this into a high-order sequence memory.
Learning Transitions
Multiple predictions can occur at once.
A-B A-C A-D
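The transition learning described above can be sketched in a few lines. This is a toy illustration, not Numenta's implementation; symbols stand in for cells, and the successor sets stand in for connections to previously active cells.

```python
from collections import defaultdict

class FirstOrderMemory:
    """Toy illustration of first-order transition learning (not Numenta's
    implementation): elements form connections to previously active elements
    and use them to predict future activity."""
    def __init__(self):
        self.successors = defaultdict(set)

    def learn(self, sequence):
        # Connect each element to its predecessor.
        for prev, nxt in zip(sequence, sequence[1:]):
            self.successors[prev].add(nxt)

    def predict(self, symbol):
        # All learned successors become predicted at once.
        return self.successors[symbol]

m = FirstOrderMemory()
for seq in ["AB", "AC", "AD", "ABCD", "XBCY"]:
    m.learn(seq)

# Multiple predictions can occur at once: A-B, A-C, A-D.
assert m.predict("A") == {"B", "C", "D"}
# First-order limitation: after C it predicts both D and Y, so it cannot
# distinguish A-B-C-D from X-B-C-Y without high-order context.
assert m.predict("C") == {"D", "Y"}
```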
Forming High-Order Representations
Feedforward input causes sparse activation of columns.
- Unpredicted input → a burst of activity in the active columns
- Predicted input → a highly sparse, unique pattern
Representing High-order Sequences
Before training: the sequences A-B-C-D and X-B-C-Y activate the same cells for B and C.
After training: A-B'-C'-D' and X-B''-C''-Y''
Same columns, but only one cell active per column.
If 40 active columns and 10 cells per column,
then there are 10^40 ways to represent the same input in different contexts.
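The arithmetic behind that count is direct: each active column independently selects one of its cells.

```python
# Each of the 40 active columns independently chooses 1 of its 10 cells,
# so the same column-level input can appear in 10^40 distinct cell-level
# contexts.
active_columns = 40
cells_per_column = 10
contexts = cells_per_column ** active_columns
assert contexts == 10 ** 40
```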
SDR Properties
1) Similarity: shared bits = semantic similarity
2) Store and Compare: store the indices of active bits; subsampling the indices is OK
3) Union membership: OR together many SDRs (e.g. 10 SDRs at 2% sparsity yield a union ~20% dense), then ask "is this SDR a member?"
What Can Be Done With Software
- 1 layer: 2,048 columns, 65,000 neurons, 300M synapses
- 30 msec per learning-inference-prediction step
- ~10^-6 of human cortex
Challenges and Opportunities for Neuromorphic HW
Challenges
- Dendritic regions, active dendrites
- 1,000s of synapses, 10,000s of potential synapses
- Continuous learning
Opportunities
- Low precision memory (synapses)
- Fault tolerant: memory, connectivity, neurons, natural recovery
- Simple activation states (no spikes)
- Connectivity: very sparse, topological
Why Will Machine Intelligence be Based on Cortical Principles?
1) Cortex uses a common learning algorithm (vision, hearing, touch, behavior)
2) Cortical algorithm is incredibly adaptable (languages, engineering, science, arts, …)
3) Network effects: hardware and software efforts will focus on the most universal solution
Cellular Layers
- Layer 2/3: sequence memory: inference (high-order)
- Layer 4: sequence memory: inference (sensory-motor)
- Layer 5: sequence memory: motor
- Layer 6: sequence memory: attention
Each layer is a variation of a common sequence memory algorithm.
Inputs/outputs define the role of each layer: feedforward from sensors and lower cortex, feedback from higher cortex, output to sub-cortical motor centers.
Learning Transitions
- Feedforward activation
- Inhibition
Sparse Distributed Representations (SDRs)
- Sensory perception
- Planning
- Motor control
- Prediction
- Attention
Sparse Distributed Representations are used everywhere in the cortex.
Sparse Distributed Representations
What are they
• Many bits (thousands)
• Few 1’s, mostly 0’s
• Example: 2,000 bits, 2% active
• Each bit has semantic meaning
• No bit is essential
01000000000000000001000000000000000000000000000000000010000…………01000
Desirable attributes
• High capacity
• Robust to noise and deletion
• Efficient and fast
• Enable new operations
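A minimal sketch of such an SDR, representing it as a set of active-bit indices (the set representation is an assumption of this illustration, not a required format):

```python
import random

def make_sdr(n=2000, sparsity=0.02, seed=None):
    """Create a random SDR with n bits and a small fraction active.
    Represented here as a frozenset of active-bit indices."""
    rng = random.Random(seed)
    w = int(n * sparsity)                 # number of 1 bits: 40
    return frozenset(rng.sample(range(n), w))

sdr = make_sdr(seed=1)
assert len(sdr) == 40                     # 2% of 2000 bits are active

# No bit is essential: delete a quarter of the active bits and the
# representation still shares most of its bits with the original.
damaged = frozenset(list(sdr)[:30])
overlap = len(sdr & damaged)
assert overlap == 30
```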
SDR Operations
1) Similarity: shared bits = semantic similarity
2) Store and Compare: store the indices of active bits; subsampling the indices is OK
3) Union membership: OR together many SDRs (e.g. 10 SDRs at 2% sparsity yield a union ~20% dense), then ask "is this SDR a member?"
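The three operations can be sketched directly on index-set SDRs. Sizes and thresholds below are illustrative assumptions, not prescribed values.

```python
import random

rng = random.Random(42)
N, W = 2048, 40

def rand_sdr():
    return frozenset(rng.sample(range(N), W))

def overlap(x, y):
    # 1) Similarity: shared active bits = semantic similarity.
    return len(x & y)

a, b = rand_sdr(), rand_sdr()

# 2) Store and compare: keep only a subsample of a's 40 indices.
subsample = frozenset(list(a)[:10])
assert overlap(a, subsample) == 10   # a matches its own subsample perfectly
assert overlap(b, subsample) < 5     # an unrelated SDR barely matches it

# 3) Union membership: OR together 10 SDRs (~20% dense), then test membership.
members = [rand_sdr() for _ in range(10)]
union = frozenset().union(*members)
assert all(m <= union for m in members)   # every stored SDR is a member
assert not (rand_sdr() <= union)          # a random SDR almost surely is not
```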
SmartHarbors
GPS to SDR Encoder
(HTM neuron dendritic zones: feedforward input activates the cell; local and feedback inputs provide context.)
Neurons
Biological neuron: dendrites are non-linear coincidence detectors; dendritic APs depolarize the soma.
HTM neuron: models these same properties.
Biological Synapses vs. HTM Synapses
Biological: learning is the formation of new synapses; synapses have low fidelity.
HTM: the connection weight is binary; learning forms new connections by growing a scalar "permanence" value (0.0 to 1.0) past a connection threshold (e.g. ~0.4).
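The binary-weight/scalar-permanence idea can be sketched as follows. The threshold and increment values are illustrative assumptions, not Numenta's exact parameters.

```python
CONNECT_THRESHOLD = 0.4   # illustrative threshold, not an exact model value

class Synapse:
    """Sketch of an HTM-style synapse: the effective weight is binary
    (connected or not), while learning adjusts a scalar 'permanence'
    in [0.0, 1.0]. Crossing the threshold forms a new connection."""
    def __init__(self, permanence):
        self.permanence = permanence

    @property
    def connected(self):
        return self.permanence >= CONNECT_THRESHOLD

    def reinforce(self, delta=0.1):
        self.permanence = min(1.0, self.permanence + delta)

s = Synapse(permanence=0.35)
assert not s.connected   # potential synapse: below threshold, weight 0
s.reinforce()            # permanence rises to ~0.45
assert s.connected       # a new connection has formed, weight 1
```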
Feedforward input activates the cell.
Activation: synapses recognize dozens of unique patterns.
Prediction: synapses recognize hundreds of unique patterns.
SDRs are used everywhere in the cortex.
Sparse Distributed Representations (SDRs)
From: Prof. Hasan, Max-Planck-Institute for Research
x = 0100000000000000000100000000000110000000
Attributes
• Extremely high capacity
• Robust to noise and deletions
• Have many desirable properties
• Solve the semantic representation problem
SDR Basics
• Large number of neurons
• Few active at once
• Every cell represents something
• Information is distributed
• SDRs are binary
10 to 15 synapses are sufficient to recognize patterns in thousands of cells.
A single dendrite can recognize multiple unique patterns without confusion.
Example: SDR Classification Capacity in Presence of Noise
• n = number of bits in the SDR
• w = number of 1 bits
• $|\Omega_x(n, w, b)|$ = number of vectors that overlap vector x in exactly b bits:
$$|\Omega_x(n, w, b)| = \binom{w_x}{b} \binom{n - w_x}{w - b}$$
• Probability of a false positive for one stored pattern, with match threshold $\theta$:
$$fp_w^n(\theta) = \frac{\sum_{b=\theta}^{w} |\Omega_x(n, w, b)|}{\binom{n}{w}}$$
• Probability of a false positive for M stored patterns (union bound):
$$fp_X(\theta) \le \sum_{i=0}^{M-1} fp_{w_{x_i}}^n(\theta)$$
n = 2048, w = 40: with 50% noise, you can classify 10^15 patterns with an error < 10^-11.
n = 64, w = 12: with 33% noise, you can classify only 10 patterns with an error of 0.04%.
Link.to.whitepaper.com
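The formulas above can be evaluated directly to check the quoted numbers, using Python's `math.comb`. The thresholds assume noise deletes that fraction of active bits, so a match requires the surviving bits (20 of 40, or 8 of 12) to overlap.

```python
from math import comb

def fp_single(n, w, theta):
    """Probability that a random SDR of w active bits (out of n) overlaps a
    stored pattern in at least theta bits: a false positive for one pattern."""
    hits = sum(comb(w, b) * comb(n - w, w - b) for b in range(theta, w + 1))
    return hits / comb(n, w)

# n = 2048, w = 40; 50% noise leaves 20 matching bits, so theta = 20.
fp_big = fp_single(2048, 40, 20)
assert fp_big * 1e15 < 1e-9        # union bound over 10^15 stored patterns

# n = 64, w = 12; 33% noise leaves 8 matching bits, so theta = 8.
fp_small = fp_single(64, 12, 8)
assert 1e-4 < fp_small * 10 < 1e-3  # ~0.04% over just 10 stored patterns
```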
More Related Content

PPTX
Principles of Hierarchical Temporal Memory - Foundations of Machine Intelligence
PPT
HTM Theory
PPTX
Brains, Data, and Machine Intelligence (2014 04 14 London Meetup)
PPTX
Why Neurons have thousands of synapses? A model of sequence memory in the brain
PDF
Hierarchical Temporal Memory: Computing Like the Brain - Matt Taylor, Numenta
PDF
The Biological Path Towards Strong AI Strange Loop 2017, St. Louis
PDF
ICMNS Presentation: Presence of high order cell assemblies in mouse visual co...
PPTX
Sparse Distributed Representations: Our Brain's Data Structure
Principles of Hierarchical Temporal Memory - Foundations of Machine Intelligence
HTM Theory
Brains, Data, and Machine Intelligence (2014 04 14 London Meetup)
Why Neurons have thousands of synapses? A model of sequence memory in the brain
Hierarchical Temporal Memory: Computing Like the Brain - Matt Taylor, Numenta
The Biological Path Towards Strong AI Strange Loop 2017, St. Louis
ICMNS Presentation: Presence of high order cell assemblies in mouse visual co...
Sparse Distributed Representations: Our Brain's Data Structure

What's hot (20)

PDF
Biological path toward strong AI
PDF
The Predictive Neuron: How Active Dendrites Enable Spatiotemporal Computation...
PDF
Numenta Brain Theory Discoveries of 2016/2017 by Jeff Hawkins
PDF
Introduction to Deep Learning
PDF
The Biological Path Toward Strong AI by Matt Taylor (05/17/18)
PDF
Deep Learning
PDF
Introduction of Deep Learning
PPTX
Ai ml dl_bct and mariners-1
PPTX
Deep learning tutorial 9/2019
PDF
SBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence Spracklen
PPTX
Neural networks...
PDF
Deep Learning: Application & Opportunity
PDF
Does the neocortex use grid cell-like mechanisms to learn the structure of ob...
PPTX
Deep Visual Understanding from Deep Learning by Prof. Jitendra Malik
PPTX
Artificial Intelligence, Machine Learning and Deep Learning with CNN
PDF
Deep learning
PDF
Artificial Neural Network Seminar - Google Brain
PDF
Neural networks and deep learning
PDF
Deep Learning Class #0 - You Can Do It
PDF
Deep Learning - The Past, Present and Future of Artificial Intelligence
Biological path toward strong AI
The Predictive Neuron: How Active Dendrites Enable Spatiotemporal Computation...
Numenta Brain Theory Discoveries of 2016/2017 by Jeff Hawkins
Introduction to Deep Learning
The Biological Path Toward Strong AI by Matt Taylor (05/17/18)
Deep Learning
Introduction of Deep Learning
Ai ml dl_bct and mariners-1
Deep learning tutorial 9/2019
SBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence Spracklen
Neural networks...
Deep Learning: Application & Opportunity
Does the neocortex use grid cell-like mechanisms to learn the structure of ob...
Deep Visual Understanding from Deep Learning by Prof. Jitendra Malik
Artificial Intelligence, Machine Learning and Deep Learning with CNN
Deep learning
Artificial Neural Network Seminar - Google Brain
Neural networks and deep learning
Deep Learning Class #0 - You Can Do It
Deep Learning - The Past, Present and Future of Artificial Intelligence
Ad

Viewers also liked (8)

PPTX
HTM Spatial Pooler
PPTX
Beginner's Guide to NuPIC
PPTX
a tour of several popular tensorflow models
PDF
Recognizing Locations on Objects by Marcus Lewis
PPTX
Applications of Hierarchical Temporal Memory (HTM)
PPTX
Getting Started with Numenta Technology
PDF
Predictive Analytics with Numenta Machine Intelligence
PDF
TouchNet preview at Numenta
HTM Spatial Pooler
Beginner's Guide to NuPIC
a tour of several popular tensorflow models
Recognizing Locations on Objects by Marcus Lewis
Applications of Hierarchical Temporal Memory (HTM)
Getting Started with Numenta Technology
Predictive Analytics with Numenta Machine Intelligence
TouchNet preview at Numenta
Ad

Similar to What the Brain says about Machine Intelligence (20)

PPT
Useful Techniques in Artificial Intelligence
PDF
SF Big Analytics20170706: What the brain tells us about the future of streami...
PDF
Ai ml dl_bct and mariners
PPTX
Ai ml dl_bct and mariners
PDF
Ch 1 Introduction to AI Applications.pdf
PDF
Deep learning - A Visual Introduction
PPTX
Artificial Intelligence Today (22 June 2017)
PDF
AI in 6 Hours this pdf contains a general idea of how AI will be asked in the...
PDF
AI_in_6_Hours_lyst1728638806090-invert.pdf
PPT
SMACS Research
PPTX
Big Sky Earth 2018 Introduction to machine learning
PDF
Pharo-AI
PPTX
AI/ML/DL/BCT A Revolution in Maritime Sector
PPT
AI and Expert Systems
PDF
AI for Cybersecurity Innovation
PPT
Nural network ER. Abhishek k. upadhyay
PPTX
[DSC Europe 23] Goran S. Milovanovic - Deciphering the AI Landscape: Business...
PPTX
AI for Everyone: Master the Basics
PPTX
Parsimony and Self-Consistency-with-Translation.pptx
PDF
Novi sad ai event 1-2018
Useful Techniques in Artificial Intelligence
SF Big Analytics20170706: What the brain tells us about the future of streami...
Ai ml dl_bct and mariners
Ai ml dl_bct and mariners
Ch 1 Introduction to AI Applications.pdf
Deep learning - A Visual Introduction
Artificial Intelligence Today (22 June 2017)
AI in 6 Hours this pdf contains a general idea of how AI will be asked in the...
AI_in_6_Hours_lyst1728638806090-invert.pdf
SMACS Research
Big Sky Earth 2018 Introduction to machine learning
Pharo-AI
AI/ML/DL/BCT A Revolution in Maritime Sector
AI and Expert Systems
AI for Cybersecurity Innovation
Nural network ER. Abhishek k. upadhyay
[DSC Europe 23] Goran S. Milovanovic - Deciphering the AI Landscape: Business...
AI for Everyone: Master the Basics
Parsimony and Self-Consistency-with-Translation.pptx
Novi sad ai event 1-2018

More from Numenta (20)

PDF
Deep learning at the edge: 100x Inference improvement on edge devices
PDF
Brains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth Ramaswamy
PDF
Brains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas Miconi
PDF
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
PDF
Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev...
PDF
Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Represe...
PDF
FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -...
PDF
BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...
PDF
Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...
PDF
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
PDF
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...
PDF
Sparsity In The Neocortex, And Its Implications For Machine Learning
PDF
The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...
PPTX
Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...
PPTX
Location, Location, Location - A Framework for Intelligence and Cortical Comp...
PPTX
Have We Missed Half of What the Neocortex Does? A New Predictive Framework ...
PPTX
Locations in the Neocortex: A Theory of Sensorimotor Prediction Using Cortica...
PPTX
The Predictive Neuron: How Active Dendrites Enable Spatiotemporal Computation...
PDF
Could A Model Of Predictive Voting Explain Many Long-Range Connections? by Su...
PPTX
Have We Missed Half of What the Neocortex Does? by Jeff Hawkins (12/15/2017)
Deep learning at the edge: 100x Inference improvement on edge devices
Brains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth Ramaswamy
Brains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas Miconi
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev...
Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Represe...
FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -...
BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...
Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...
Sparsity In The Neocortex, And Its Implications For Machine Learning
The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...
Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...
Location, Location, Location - A Framework for Intelligence and Cortical Comp...
Have We Missed Half of What the Neocortex Does? A New Predictive Framework ...
Locations in the Neocortex: A Theory of Sensorimotor Prediction Using Cortica...
The Predictive Neuron: How Active Dendrites Enable Spatiotemporal Computation...
Could A Model Of Predictive Voting Explain Many Long-Range Connections? by Su...
Have We Missed Half of What the Neocortex Does? by Jeff Hawkins (12/15/2017)

Recently uploaded (20)

PPTX
MYSQL Presentation for SQL database connectivity
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
cuic standard and advanced reporting.pdf
PDF
Encapsulation theory and applications.pdf
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
Network Security Unit 5.pdf for BCA BBA.
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PPTX
Spectroscopy.pptx food analysis technology
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
MYSQL Presentation for SQL database connectivity
Diabetes mellitus diagnosis method based random forest with bat algorithm
Chapter 3 Spatial Domain Image Processing.pdf
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
cuic standard and advanced reporting.pdf
Encapsulation theory and applications.pdf
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Understanding_Digital_Forensics_Presentation.pptx
Encapsulation_ Review paper, used for researhc scholars
MIND Revenue Release Quarter 2 2025 Press Release
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Mobile App Security Testing_ A Comprehensive Guide.pdf
Network Security Unit 5.pdf for BCA BBA.
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Spectroscopy.pptx food analysis technology
Per capita expenditure prediction using model stacking based on satellite ima...
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
“AI and Expert System Decision Support & Business Intelligence Systems”
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton

What the Brain says about Machine Intelligence

  • 1. November 21, 2014 Jeff Hawkins jhawkins@Numenta.com What the Brain Says About Machine Intelligence
  • 2. 1940’s 1950’s - Dedicated vs. universal - Analog vs. digital - Decimal vs. binary - Wired vs. memory-based programming - Serial vs. random access memory Many approaches - Universal - Digital - Binary - Memory-based programming - Two tier memory One dominant paradigm The Birth of Programmable Computing Why Did One Paradigm Win? - Network effects Why Did This Paradigm Win? - Most flexible - Most scalable
  • 3. 2010’s 2020’s The Birth of Machine Intelligence - Specific vs. universal algorithms - Mathematical vs. memory-based - Batch vs. on-line learning - Labeled vs. behavior-based learning Many approaches - Universal algorithms - Memory-based - On-line learning - Behavior-based learning One dominant paradigm Why Will One Paradigm Win? - Network effects Why Will This Paradigm Win? - Most flexible - Most scalable How Do We Know This is Going to Happen? - Brain is proof case - We have made great progress
  • 4. 1) Discover operating principles of neocortex. 2) Create machine intelligence technology based on neocortical principles. Numenta’s Mission Talk Topics - Cortical facts - Cortical theory - Research roadmap - Applications - Thoughts on Machine Intelligence
  • 5. What the Cortex Does patterns Learns a model of world from changing sensory data The model generates - predictions - anomalies - actions Most sensory changes are due to your own movement The neocortex learns a sensory-motor model of the world patterns patterns light sound touch retina cochlear somatic
  • 6. Cortical Facts Hierarchy Cellular layers Mini-columns Neurons: 3-10K synapses - 10% proximal - 90% distal Active dendrites Learning = new synapses Remarkably uniform - anatomically - functionally 2.5 mm Sheet of cells 2/3 4 6 5
  • 7. Cortical Theory Hierarchy Cellular layers Mini-columns Neurons: 3-10K synapses - 10% proximal - 90% distal Active dendrites Learning = new synapses Remarkably uniform - anatomically - functionally Sheet of cellsHTM Hierarchical Temporal Memory 1) Hierarchy of identical regions 2) Each region learns sequences 3) Stability increases going up hierarchy if input is predictable 4) Sequences unfold going down Questions - What does a region do? - What do the cellular layers do? - How do neurons implement this? - How does this work in hierarchy? 2/3 4 6 5
  • 8. 2/3 4 5 6 Cellular Layers Sequence memory: Sequence memory: Sequence memory: Sequence memory: Inference (high-order) Inference (sensory-motor) Motor Attention FeedforwardFeedback Each layer is a variation of common sequence memory algorithm. These are universal functions. They apply to: - all cortical regions - all sensory-motor modalities. Copy of motor commands Sensor data Higher region Sub-cortical Motor centers Lower region
  • 9. 2/3 4 5 6 Sequence memory: Sequence memory: Sequence memory: Sequence memory: ? ? ? ? How Does Sequence Memory Work?
  • 10. HTM Temporal Memory Learns sequences Recognizes and recalls sequences Predicts next inputs - High capacity - Distributed - Local learning rules - Fault tolerant - No sensitive parameters - Generalizes
  • 11. HTM Temporal Memory Not Just Another ANN 1) Cortical Anatomy Mini-columns Inhibitory cells Cell connectivity patterns 2) Sparse Distributed Representations 3) Realistic Neurons Active dendrites Thousands of synapses Learn via synapse formation numenta.com/learn/
  • 12. 2/3 4 5 6 Research Roadmap Sensory-motor Inference High-order Inference Motor Sequences Attention/Feedback Theory 98% Extensively tested Commercial Theory 80% In development Theory 50% Theory 30% Streaming Data Capabilities: Prediction Anomaly detection Classification Applications: Predictive maintenance Security Natural Language Processing
  • 13. HTM Encoder SDRData stream Predictions Anomalies Classification Streaming Data Applications Numbers Categories Date Time GPS Words Applications Servers Biometrics Medical Vehicles Industrial equipment Social media Comm. networks
  • 14. Streaming Data Applications Server metrics Human metrics Natural languageGPS dataEEG data Financial data
  • 15. . . . Anomaly Detection in Server Metrics (Grok for AWS) HTM Encoder SDRServer Metric Anomaly Score HTM Encoder SDRServer Metric Anomaly Score Mobile Dashboard  Servers sorted by anomaly score  Continuously updated Web Dashboard
  • 16. What Kind of Anomalies Can HTM Detect? Sudden changes Slow changes Changes in noisy dataSubtle changes in regular data
  • 17. Changes that humans can’t see Engineer manually started build on automated build server What Kind of Anomalies Can HTM Detect?
  • 18. Created large Zip file Anomaly Detection in Human Metrics Keystrokes File access CPU usage App access
  • 19. Anomaly Detection in Financial and Social Media Data Stock volume Social media Stock volume Social media
  • 20. Berkeley Cognitive Technology Group Classification of EEG Data
  • 22. Document corpus (e.g. Wikipedia) 128 x 128 100K “Word SDRs” - = Apple Fruit Computer Macintosh Microsoft Mac Linux Operating system …. Natural Language
  • 23. Training set frog eats flies cow eats grain elephant eats leaves goat eats grass wolf eats rabbit cat likes ball elephant likes water sheep eats grass cat eats salmon wolf eats mice lion eats cow dog likes sleep elephant likes water cat likes ball coyote eats rodent coyote eats rabbit wolf eats squirrel dog likes sleep cat likes ball ---- ---- ----- Word 3Word 2Word 1 Sequences of Word SDRs HTM
  • 24. Training set eats“fox” ? frog eats flies cow eats grain elephant eats leaves goat eats grass wolf eats rabbit cat likes ball elephant likes water sheep eats grass cat eats salmon wolf eats mice lion eats cow dog likes sleep elephant likes water cat likes ball coyote eats rodent coyote eats rabbit wolf eats squirrel dog likes sleep cat likes ball ---- ---- ----- Sequences of Word SDRs HTM
  • 25. Sequences of Word SDRs. The HTM predicts "fox" eats → rodent. Learning is unsupervised; semantic generalization; works across languages; many applications: intelligent search, sentiment analysis, semantic filtering
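The generalization to the never-trained word "fox" rests on semantic overlap between word SDRs. The sketch below is a toy invented for illustration — the bit sets and the nearest-neighbor lookup are not Numenta's or Cortical.IO's actual representations — but it shows the mechanism: similar animals share bits, so an unseen subject inherits the prediction of its closest trained neighbor.

```python
# Toy word SDRs as sets of active-bit indices (invented for this sketch):
# semantically similar animals are given overlapping bit sets.
word_sdr = {
    "wolf":   {1, 2, 3, 10},
    "coyote": {1, 2, 5, 11},
    "cow":    {20, 21, 22, 30},
    "fox":    {1, 2, 5, 12},   # never seen in training
}

# What each trained subject "eats".
training = {"wolf": "rabbit", "coyote": "rodent", "cow": "grain"}

def predict_eats(subject):
    # Generalize by picking the trained subject whose SDR shares
    # the most bits with the query word.
    best = max(training, key=lambda w: len(word_sdr[w] & word_sdr[subject]))
    return training[best]

print(predict_eats("fox"))  # rodent -- fox's SDR overlaps coyote's most
```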
  • 26. Server metrics, human metrics, natural language, GPS data, EEG data, financial data: all these applications run on the exact same HTM code.
  • 29. Research Roadmap (across cortical layers 2/3, 4, 5, 6):
    High-order inference — theory 98%, extensively tested, commercial. Streaming data. Capabilities: prediction, anomaly detection, classification. Applications: IT security, natural language processing.
    Sensory-motor inference — theory 80%, in development. Static data (via active learning). Capabilities: classification, prediction. Applications: vision image classification, network classification, classification of connected graphs.
    Motor sequences — theory 50%. Static and/or streaming data. Capabilities: goal-oriented behavior. Applications: robotics, smart bots, proactive defense.
    Attention/feedback — theory 30%. Enables multi-sensory modalities and multi-behavioral modalities.
  • 30. Research Transparency. Numenta's software (NuPIC, www.Numenta.org) is open source (GPLv3); Numenta's daily research code is online; algorithms are documented; multiple independent implementations exist; active discussion groups for theory and implementation. Collaborative: IBM Almaden Research, San Jose, CA; DARPA, Washington, D.C.; Cortical.IO, Austria
  • 36. Machine Intelligence Landscape, three approaches compared:
    Premise — Cortical (e.g. HTM): biological; ANNs (e.g. deep learning): mathematical; A.I. (e.g. Watson): engineered
    Data — Cortical: spatial-temporal, language, behavior; ANNs: spatial-temporal, language; A.I.: documents
    Capabilities — Cortical: classification, prediction, goal-oriented behavior; ANNs: classification; A.I.: NL query
    Path to M.I.? — Cortical: yes; ANNs: probably not; A.I.: probably not
  • 40. Geospatial Anomalies Deviation in path Change in direction
  • 42. Time = 1 Learning Transitions
  • 43. Time = 2 Learning Transitions
  • 44. Learning Transitions Form connections to previously active cells. Predict future activity.
  • 45. Learning Transitions. Multiple predictions can occur at once (A-B, A-C, A-D). This is a first-order sequence memory: it cannot learn A-B-C-D vs. X-B-C-Y. Mini-columns turn this into a high-order sequence memory.
  • 46. Forming High-Order Representations. Feedforward input causes sparse activation of columns. Unpredicted input → a burst of activity in the column. Predicted input → a highly sparse, unique pattern.
  • 47. Representing High-Order Sequences. Before training: A X B B C C Y D. After training: A X B'' B' C'' C' Y'' D'. Same columns, but only one cell active per column. If 40 active columns and 10 cells per column, there are 10^40 ways to represent the same input in different contexts.
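The 10^40 figure on this slide is just the capacity arithmetic: with one of 10 cells chosen per column to encode context, 40 active columns yield 10^40 distinct cell-level codes for the same column-level input.

```python
# Capacity arithmetic from the slide: choosing one of `cells_per_column`
# cells independently in each of the 40 active columns gives
# cells_per_column ** active_columns distinct contextual representations.
active_columns = 40
cells_per_column = 10

contexts = cells_per_column ** active_columns
print(contexts == 10 ** 40)  # True
print(len(str(contexts)))    # 41 -- a 41-digit number of contexts
```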
  • 48. SDR Properties. 1) Similarity: shared bits = semantic similarity. 2) Store and compare: store the indices of the active bits (indices 1 2 3 4 5 | 40); subsampling is OK (indices 1 2 | 10). 3) Union membership: OR together many SDRs (2% active each, ~20% active union), then test "is this SDR a member?"
  • 49. What Can Be Done With Software: 1 layer, 2048 columns, 65,000 neurons, 300M synapses (about 10^-6 of human cortex), 30 msec per learning-inference-prediction step
  • 50. Challenges and Opportunities for Neuromorphic HW.
    Challenges: dendritic regions, active dendrites; 1,000s of synapses, 10,000s of potential synapses; continuous learning.
    Opportunities: low-precision memory (synapses); fault tolerance (memory, connectivity, neurons, natural recovery); simple activation states (no spikes); very sparse, topological connectivity.
  • 51. Cellular Layers. Layers 2/3, 4, 5, and 6 each implement a variation of a common sequence memory algorithm: layers 2/3 and 4 for inference, layer 5 for motor, layer 6 for attention. Feedforward input arrives from the sensor/lower cortex, feedback from higher cortex; outputs go to lower cortex and the motor center.
  • 52. Why Will Machine Intelligence Be Based on Cortical Principles? 1) The cortex uses a common learning algorithm (vision, hearing, touch, behavior). 2) The cortical algorithm is incredibly adaptable (languages, engineering, science, arts, …). 3) Network effects: hardware and software efforts will focus on the most universal solution.
  • 53. Cellular Layers. Layers 2/3, 4, 5, and 6 each implement a variation of a common sequence memory algorithm: layers 2/3 and 4 for inference, layer 5 for motor, layer 6 for attention. Feedforward input arrives from the sensor/lower cortex, feedback from higher cortex; outputs go to lower cortex and the sub-cortical motor center. Inputs/outputs define the role of each layer.
  • 56. Sparse Distributed Representations (SDRs). SDRs are used everywhere in the cortex: sensory perception, planning, motor control, prediction, attention.
  • 57. Sparse Distributed Representations. What they are: many bits (thousands); few 1's, mostly 0's; example: 2,000 bits, 2% active (01000000000000000001000000000000000000000000000000000010000…………01000); each bit has semantic meaning; no bit is essential. Desirable attributes: high capacity; robust to noise and deletion; efficient and fast; enable new operations.
  • 58. SDR Operations. 1) Similarity: shared bits = semantic similarity. 2) Store and compare: store the indices of the active bits (indices 1 2 3 4 5 | 40); subsampling is OK (indices 1 2 | 10). 3) Union membership: OR together many SDRs (2% active each, ~20% active union), then test "is this SDR a member?"
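The three SDR operations above map naturally onto sets of active-bit indices. This is a minimal sketch of that representation (an illustration, not NuPIC's implementation), using the slide's parameters of 2048 bits with 40 active.

```python
import random

N, W = 2048, 40  # 2048 bits, ~2% active

def random_sdr():
    # An SDR represented as the set of its active-bit indices.
    return frozenset(random.sample(range(N), W))

# 1) Similarity: the number of shared bits measures semantic similarity.
def overlap(a, b):
    return len(a & b)

# 2) Store and compare: store only the active indices; matching still
#    works against a subsample of the stored indices.
def matches(stored_subsample, candidate, threshold):
    return len(stored_subsample & candidate) >= threshold

# 3) Union membership: OR many SDRs together, then test membership.
a, b = random_sdr(), random_sdr()
union = a | b

print(overlap(a, a) == W)                       # True: all bits shared
print(matches(frozenset(list(a)[:10]), a, 10))  # True with a 10-bit subsample
print(union >= a and union >= b)                # True: both are in the union
```

The union stays sparse enough (here at most ~4% active) that false membership hits remain unlikely, which is what makes operation 3 useful.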
  • 60. GPS to SDR Encoder
  • 64. Neurons and Synapses.
    Biological neuron: feedforward input activates the cell; dendrites are coincidence detectors; non-linear dendritic APs depolarize the soma; local and feedback inputs.
    HTM neuron: feedforward input activates the cell (activation recognizes dozens of unique patterns); distal synapses make predictions (prediction recognizes hundreds of unique patterns).
    Biological synapses: learning is the formation of new synapses; synapses have low fidelity.
    HTM synapses: connection weight is binary (0 or 1); learning forms new connections via a scalar "permanence" value (0.0 to 1.0).
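The binary-weight / scalar-permanence idea above can be sketched as follows. The threshold and learning rates here are illustrative values chosen for the example, not NuPIC's defaults.

```python
# Each potential synapse carries a scalar permanence in [0, 1]; the
# effective connection weight is binary: the synapse counts only once
# its permanence crosses a threshold.
CONNECTED_THRESHOLD = 0.5          # illustrative value
INCREMENT, DECREMENT = 0.1, 0.05   # illustrative learning rates

def is_connected(permanence):
    return permanence >= CONNECTED_THRESHOLD

def learn(permanence, presynaptic_was_active):
    # Reinforce synapses to cells that were active; weaken the rest.
    if presynaptic_was_active:
        return min(1.0, permanence + INCREMENT)
    return max(0.0, permanence - DECREMENT)

p = 0.45                # a potential synapse, not yet connected
p = learn(p, True)      # the presynaptic cell fired
print(is_connected(p))  # True: learning "formed" a new connection
```

Because the effective weight is binary, the representation tolerates low-fidelity storage, which is one of the neuromorphic-hardware opportunities listed earlier.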
  • 65. Sparse Distributed Representations (SDRs). SDRs are used everywhere in the cortex.
  • 66. From: Prof. Hasan, Max-Planck-Institute for Research
  • 67. SDR Basics. Large number of neurons; few active at once; every cell represents something; information is distributed; SDRs are binary, e.g. x = 0100000000000000000100000000000110000000. Attributes: extremely high capacity; robust to noise and deletions; many desirable properties; solve the semantic representation problem. 10 to 15 synapses are sufficient to recognize patterns in thousands of cells. A single dendrite can recognize multiple unique patterns without confusion.
  • 68. Example: SDR Classification Capacity in the Presence of Noise.
    n = number of bits in the SDR; w = number of 1 bits; θ = match threshold (minimum overlap).
    Number of vectors with w active bits that overlap vector x in exactly b bits:
      Ω_x(n, w, b) = C(w_x, b) · C(n − w_x, w − b)
    Probability of a false positive for one stored pattern:
      fp_w^n(θ) = [ Σ_{b=θ}^{w} Ω_x(n, w, b) ] / C(n, w)
    Probability of a false positive for M stored patterns (union bound):
      fp_X(θ) ≤ Σ_{i=0}^{M−1} fp_{w_xi}^n(θ)
    With n = 2048 and w = 40, at 50% noise you can classify 10^15 patterns with an error < 10^-11.
    With n = 64 and w = 12, at 33% noise you can classify only 10 patterns with an error of 0.04%.
    Link.to.whitepaper.com
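The capacity formulas above can be evaluated exactly with integer binomials; this sketch reproduces the two scenarios on the slide (the threshold values θ = 20 and θ = 8 correspond to tolerating 50% and 33% noise respectively).

```python
from math import comb  # exact binomial coefficients (Python 3.8+)

def overlap_set_size(n, w, b):
    # Number of w-bit SDRs that overlap a fixed w-bit SDR in exactly b bits.
    return comb(w, b) * comb(n - w, w - b)

def false_positive_prob(n, w, theta):
    # Probability that a random w-bit SDR matches a stored one,
    # i.e. overlaps it in at least theta of n bits.
    return sum(overlap_set_size(n, w, b) for b in range(theta, w + 1)) / comb(n, w)

# Large SDR: n = 2048, w = 40, theta = 20 (tolerates 50% noise).
# The per-pattern false-positive rate is on the order of 1e-26, so even
# ~1e15 stored patterns keep the union-bound error vanishingly small.
print(false_positive_prob(2048, 40, 20))

# Small SDR: n = 64, w = 12, theta = 8 (tolerates 33% noise).
# For 10 stored patterns the union-bound error is roughly 0.04%.
print(10 * false_positive_prob(64, 12, 8))
```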