Sparse Distributed Representations:
Our Brain’s Data Structure
Numenta Workshop
October 17, 2014
Subutai Ahmad, VP Research
sahmad@numenta.com
The Role of Sparse Distributed Representations in Cortex
1) Sensory perception
2) Planning
3) Motor control
4) Prediction
5) Attention
Sparse Distributed Representations (SDRs) are the foundation for all of these
functions, across all sensory modalities
Analysis of this common cortical data structure can provide a rigorous
foundation for cortical computing
Talk Outline
1) Introduction to Sparse Distributed Representations (SDRs)
2) Fundamental properties of SDRs
– Error bounds
– Scaling laws
[Image credit: Prof. Hasan, Max-Planck-Institut for Research]
Basic Attributes of SDRs
1) Only a small number of neurons are firing at any point in time
2) There are a very large number of neurons
3) Every cell represents something and has meaning
4) Information is distributed and no single neuron is critical
5) Every neuron connects to only a subset of other neurons
6) SDRs enable extremely fast computation
7) SDRs are binary
x = 0100000000000000000100000000000110000000
How Does a Single Neuron Operate on SDRs?
Multiple input SDRs feed into a single bit of an output SDR.
How Does a Single Neuron Operate on SDRs?
Proximal segments represent dozens of separate patterns in a single segment.
Hundreds of distal segments each detect a unique SDR using a threshold.
[Figure: a neuron receiving a feedback SDR, a context SDR, and a bottom-up input SDR]
In both cases, each synapse corresponds to one bit in the incoming
high-dimensional SDR.
Fundamental Properties of SDRs
• Extremely high capacity
• Recognize patterns in the presence of noise
• Robust to random deletions
• Represent a dynamic set of patterns in a single fixed structure
• Extremely efficient
Notation
• We represent an SDR as a vector of n binary values, where each bit
represents the activity of a single neuron: $x = [b_0, \ldots, b_{n-1}]$
• s = percentage of ON bits, w = number of ON bits: $w_x = s \times n = \|x\|_1$
Example
• n = 40, s = 0.1, w = 4
• Typical ranges in HTM implementations: n = 2048 to 65,536; s = 0.05% to 2%; w = 40
y =1000000000000000000100000000000110000000
x = 0100000000000000000100000000000110000000
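To make the notation concrete, here is a minimal Python sketch (illustrative only; the parameter names follow the slide, not NuPIC's API):

```python
# Minimal sketch of the notation above (illustrative, not NuPIC code).
import numpy as np

n, w = 40, 4                     # vector size and number of ON bits
rng = np.random.default_rng(42)

x = np.zeros(n, dtype=np.int8)   # x = [b_0, ..., b_{n-1}]
x[rng.choice(n, size=w, replace=False)] = 1

s = x.sum() / n                  # sparsity: w_x = s * n = ||x||_1
print(f"n={n}, w={x.sum()}, s={s:.2f}")   # n=40, w=4, s=0.10
```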
SDRs Have Extremely High Capacity
• The number of unique patterns that can be represented is:
$$\binom{n}{w} = \frac{n!}{w!\,(n-w)!}$$
• This is far smaller than $2^n$, but far larger than any reasonable need
• Example: with n = 2048 and w = 40, the number of unique patterns is
$> 10^{84}$, much greater than the number of atoms in the universe
• The chance that two random vectors are identical is essentially zero:
$1 / \binom{n}{w}$
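These counts are easy to check with exact integer arithmetic; a quick sketch:

```python
# Verify the capacity example with Python's exact integer arithmetic.
from math import comb

n, w = 2048, 40
capacity = comb(n, w)            # n! / (w! (n-w)!)
print(f"{capacity:.2e}")         # roughly 1e84, >> atoms in the universe
print(f"{1 / capacity:.2e}")     # chance two random SDRs are identical
```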
Fundamental Properties of SDRs
• Extremely high capacity
• Recognize patterns in the presence of noise
• Robust to random deletions
• Represent multiple patterns in a single fixed structure
• Extremely efficient
Similarity Metric for Recognition of SDR Patterns
• We don’t use typical vector similarities
– Neurons cannot compute Euclidean or Hamming distance between SDRs
– Any p-norm requires full connectivity
• Compute similarity using an overlap metric
– The overlap is simply the number of bits in common
– Requires only minimal connectivity
– Mathematically, take the AND of two vectors and compute its length
• Detecting a “Match”
– Two SDR vectors “match” if their overlap meets a minimum threshold
$$\text{overlap}(x,y) \equiv \|x \wedge y\|_1$$
$$\text{match}(x,y) \equiv \text{overlap}(x,y) \geq \theta$$
Overlap example
• n = 40, s = 0.1, w = 4
• The two vectors have an overlap of 3, so they “match” whenever the
threshold θ is 3 or less.
y =1000000000000000000100000000000110000000
x = 0100000000000000000100000000000110000000
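A sketch of the overlap and match operations applied to this example (the helper names are mine, not NuPIC's):

```python
# Overlap and match on binary SDRs (helper names are illustrative).
import numpy as np

def overlap(x, y):
    """Number of ON bits in common: ||x AND y||_1."""
    return int(np.count_nonzero(x & y))

def match(x, y, theta):
    """True if the overlap meets the threshold theta."""
    return overlap(x, y) >= theta

x = np.array([int(c) for c in "0100000000000000000100000000000110000000"], dtype=np.int8)
y = np.array([int(c) for c in "1000000000000000000100000000000110000000"], dtype=np.int8)
print(overlap(x, y))           # 3
print(match(x, y, theta=3))    # True
```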
How Accurate is Matching With Noise?
• As you decrease the match threshold θ, matching becomes more robust to
noise but less selective
• You also increase the chance of false positives
How Many Vectors Match When You Decrease the Threshold?
• Define the “overlap set of x” to be the set of
vectors with exactly b bits of overlap with x
• The number of such vectors is:
$$|\Omega_x(n,w,b)| = \binom{w_x}{b} \times \binom{n - w_x}{w - b}$$
The first factor counts the subsets of x with exactly b bits ON; the second
counts the patterns occupying the rest of the vector with exactly w - b bits ON.
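This counting formula transcribes directly into code (hypothetical helper name):

```python
# Size of the overlap set: vectors with exactly b bits of overlap with x.
from math import comb

def overlap_set_size(n, w, b, wx=None):
    wx = w if wx is None else wx          # assume w_x = w unless given
    # b of x's wx ON bits, times w-b ON bits placed in the other n-wx positions
    return comb(wx, b) * comb(n - wx, w - b)

# With n=40, w=4: 4 * 36 = 144 vectors overlap x in exactly 3 bits.
print(overlap_set_size(40, 4, 3))
```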
Error Bound for Classification with Noise
• Given a single stored pattern, the probability of a false positive is:
$$fp_w^n(\theta) = \frac{\sum_{b=\theta}^{w} |\Omega_x(n,w,b)|}{\binom{n}{w}}$$
• Given M stored patterns, the probability of a false positive is bounded by:
$$fp_X(\theta) \leq \sum_{i=0}^{M-1} fp_{w_{x_i}}^{n}(\theta)$$
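Both bounds can be evaluated exactly with arbitrary-precision integers; a sketch (function names are mine), reproducing the first example on the next slide:

```python
# Exact false-positive probabilities for classification with noise.
from fractions import Fraction
from math import comb

def fp_single(n, w, theta):
    """P(a random w-bit SDR overlaps a stored pattern in >= theta bits)."""
    numer = sum(comb(w, b) * comb(n - w, w - b) for b in range(theta, w + 1))
    return Fraction(numer, comb(n, w))

def fp_union_bound(n, w, theta, M):
    """Union bound over M stored patterns (all assumed to have w_x = w)."""
    return M * fp_single(n, w, theta)

# Tolerating 14 bits of noise means theta = 40 - 14 = 26:
print(float(fp_union_bound(2048, 40, theta=26, M=10**15)))   # below 1e-24
```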
What Does This Mean in Practice?
• With SDRs you can classify a huge number of patterns with substantial noise
(if n and w are large enough)
Examples
• n = 2048, w = 40
– With up to 14 bits of noise (33%), you can classify a quadrillion
patterns with an error rate of less than $10^{-24}$
– With up to 20 bits of noise (50%), you can classify a quadrillion
patterns with an error rate of less than $10^{-11}$
• n = 64, w = 12
– With up to 4 bits of noise (33%), you can classify 10 patterns
with an error rate of 0.04%
Neurons Are Highly Robust Pattern Recognizers
Hundreds of distal segments each detect a
unique SDR using a threshold
Tens of thousands of neurons can examine a single input SDR, each very
robustly matching complex patterns.
Fundamental Properties of SDRs
• Extremely high capacity
• Recognize patterns in the presence of noise
• Robust to random deletions
• Represent multiple patterns in a single fixed structure
• Extremely efficient
SDRs are Robust to Random Deletions
• In cortex, bits in an SDR can randomly disappear
– Synapses can be quite unreliable
– Individual neurons can die
– A patch of cortex can be damaged
• The analysis for random deletions is very similar to the noise analysis
• SDRs naturally handle fairly significant random failures
– Failures are tolerated in any SDR and in any part of the system
• This is a great property for those building HTM-based hardware
– The probability of failure can be exactly characterized
Fundamental Properties of SDRs
• Extremely high capacity
• Recognize patterns in the presence of noise
• Robust to random deletions
• Represent multiple patterns in a single fixed structure
• Extremely efficient
Representing Multiple Patterns in a Single SDR
• There are situations where we want to store multiple patterns within a
single SDR and match against them
• In temporal inference, the system might make multiple predictions about the future
Example
Unions of SDRs
• We can store a set of patterns in a single fixed representation by taking the OR of
all the individual patterns
• The vector representing the union is also going to match a large number of other
patterns that were not one of the original 10
• How many such patterns can we store reliably, without a high chance of false
positives?
[Figure: ten SDRs, each ~2% sparse, are ORed into a union SDR that is
< 20% sparse; a new SDR is then tested for membership: “Is this SDR a member?”]
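A sketch of forming a union and testing membership (illustrative, not NuPIC code):

```python
# Union of SDRs: OR ten patterns into one fixed-size vector, then test membership.
import numpy as np

n, w, M = 2048, 40, 10
rng = np.random.default_rng(0)

def random_sdr(n, w, rng):
    x = np.zeros(n, dtype=np.int8)
    x[rng.choice(n, size=w, replace=False)] = 1
    return x

patterns = [random_sdr(n, w, rng) for _ in range(M)]
union = np.bitwise_or.reduce(np.array(patterns))   # at most M*w = 400 ON bits (< 20%)
print(int(union.sum()))                            # expected n*(1-(1-w/n)^M), about 367

def is_member(x, union, theta):
    """x matches the union if >= theta of its ON bits are ON in the union."""
    return int(np.count_nonzero(x & union)) >= theta

print(all(is_member(p, union, theta=w) for p in patterns))   # True: no false negatives
```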
Error Bounds for Unions
• Expected number of ON bits: each bit stays OFF with probability $(1-s)^M$,
so $E[\tilde{w}] = n\,(1 - (1-s)^M)$
• Given a union of M patterns, the expected probability of a false positive
(with noise threshold θ) is, under a bit-independence approximation,
$\sum_{b=\theta}^{w} \binom{w}{b}\, p^{b} (1-p)^{w-b}$ with $p = 1-(1-s)^M$
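Under that independence assumption, both quantities take a few lines to compute (an approximation, not the talk's exact derivation):

```python
# Union error estimate under the bit-independence approximation.
from math import comb

def union_stats(n, w, M, theta):
    s = w / n
    p = 1 - (1 - s) ** M                 # P(a given bit is ON in the union)
    expected_on = n * p                  # expected number of ON bits
    fp = sum(comb(w, b) * p**b * (1 - p)**(w - b) for b in range(theta, w + 1))
    return expected_on, fp

on_bits, fp = union_stats(2048, 40, M=50, theta=40)
print(f"{on_bits:.0f} ON bits, fp ~ {fp:.0e}")   # ~1284 ON bits, fp ~ 8e-9
```

With exact matching (θ = w) this lands within an order of magnitude of the slide's $10^{-9}$ figure for 50 patterns.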
What Does This Mean in Practice?
• You can form reliable unions of a reasonable number of patterns (assuming
large enough n and w)
Examples
• n = 2048, w = 40
– The union of 50 patterns leads to an error rate of $10^{-9}$
• n = 512, w = 10
– The union of 50 patterns leads to an error rate of 0.9%
Fundamental Properties of SDRs
• Extremely high capacity
• Recognize patterns in the presence of noise
• Robust to random deletions
• Represent multiple patterns in a single fixed structure
• Extremely efficient
SDRs Enable Highly Efficient Operations
• In cortex, complex operations are carried out rapidly
– The visual system can perform object recognition in 100-150 ms
• SDR vectors are large, but all operations are O(w), independent of
vector size
– No loops or optimization process required
• Matching a pattern against a dynamic list (a union) is O(w),
independent of the number of items in the list
• This enables a tiny dendritic segment to perform robust pattern recognition
• We can simulate 200,000 neurons in software at about 25-50 Hz
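One way to see the O(w) claim: represent an SDR by the set of its ON-bit indices, so overlap is a set intersection that never touches the other n - w positions (a sketch, not necessarily NuPIC's internal layout):

```python
# O(w) overlap: represent an SDR by the set of its ON-bit indices.
x_idx = {1, 19, 31, 32}            # the example SDR from earlier, as ON indices
y_idx = {0, 19, 31, 32}

# Set intersection touches only the w ON bits, independent of n.
print(len(x_idx & y_idx))          # overlap = 3
```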
Summary
• SDRs are the common data structure in the cortex
• SDRs enable flexible recognition systems that have very high capacity and
are robust to a large amount of noise
• The union property allows a fixed representation to encode a dynamically
changing set of patterns
• The analysis of SDRs provides a principled foundation for characterizing the
behavior of the HTM learning algorithms and all cognitive functions
Related work
• Sparse memory (Kanerva), sparse coding (Olshausen), Bloom filters (Broder)
Questions? Math jokes?
Follow us on Twitter @numenta
Sign up for our newsletter at www.numenta.com
Subutai Ahmad
sahmad@numenta.com
nupic-theory mailing list
numenta.org/lists