Digging into the Dirichlet
Max Sklar
@maxsklar

New York Machine Learning Meetup
December 19th, 2013
Dedication

Meyer Marks
1925 - 2013
The Dirichlet Distribution
Let’s start with something simpler
A Pie Chart!
AKA
Discrete Distribution
Multinomial Distribution
K = The number of categories.
K=5
Examples of Multinomial Distributions
What does the raw data look like?
Counts!
More specifically:
- K columns of counts
- N rows of data

id   # likes   # dislikes
1    231       23
2    81        40
3    67        9
4    121       14
5    9         31
6    18        0
7    1         1
BUT...
Counts != Multinomial Distribution
We can estimate the multinomial distribution with the counts, using the maximum likelihood estimate
Counts: 366, 181, 203
Sum = 366 + 181 + 203 = 750
366 / 750 = 48.8%
181 / 750 = 24.1%
203 / 750 = 27.1%
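The arithmetic on this slide can be sketched in a few lines of Python. The function name `multinomial_mle` is made up for illustration; the counts are the ones shown above.

```python
def multinomial_mle(counts):
    """Maximum likelihood estimate: each count divided by the row total."""
    total = sum(counts)
    if total == 0:
        # A row with no observations has no well-defined estimate.
        raise ValueError("the MLE is undefined for a row with no observations")
    return [c / total for c in counts]

counts = [366, 181, 203]
estimate = multinomial_mle(counts)
print([f"{p:.1%}" for p in estimate])
```

Note that a row with very few counts still gets an estimate, just not a trustworthy one, and an all-zero row gets none at all.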
BUT...
Counts: 366, 181, 203
Counts: 1, 2, 1
Uh Oh
Counts: 0, 1, 0
This column will be all Yellow, right?
Counts: 0, 0, 0
Panic!!!!
Bayesian Statistics to the Rescue
Still assume each row was generated by a multinomial distribution
We just don’t know which one!
The Dirichlet Distribution
Is a probability distribution over all possible multinomial distributions, p.
Represents our uncertainty over the actual distribution that created the row.
p: represents a multinomial distribution
alpha: the parameters of the Dirichlet
K: the number of categories
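For reference, the standard Dirichlet density over p with parameters alpha is Gamma(sum of alpha) / product of Gamma(alpha_k), times the product of p_k^(alpha_k - 1). The sketch below evaluates its logarithm with the standard library only; the function name `dirichlet_log_pdf` is an assumption for illustration.

```python
import math

def dirichlet_log_pdf(p, alpha):
    """Log density of Dirichlet(alpha) at the probability vector p."""
    if len(p) != len(alpha):
        raise ValueError("p and alpha must both have K entries")
    if abs(sum(p) - 1.0) > 1e-9 or any(x <= 0 for x in p):
        raise ValueError("p must be strictly positive and sum to 1")
    # Log of the normalizing constant: Gamma(sum alpha) / prod Gamma(alpha_k).
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # Log of the kernel: prod p_k^(alpha_k - 1).
    log_kernel = sum((a - 1.0) * math.log(x) for a, x in zip(alpha, p))
    return log_norm + log_kernel

# With alpha = (1, 1, 1) every p is equally likely (a flat density).
print(dirichlet_log_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))
```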
Bayesian Updates
Also a Dirichlet!
(Conjugate Prior)
+1 to the parameter of the observed category
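A short derivation of why the posterior is again a Dirichlet. The notation x_k for the observed count in category k is an assumption introduced here; p, alpha, and K are as defined earlier.

```latex
\begin{align*}
P(p \mid x, \alpha) &\propto P(x \mid p)\, P(p \mid \alpha) \\
  &\propto \prod_{k=1}^{K} p_k^{x_k} \cdot \prod_{k=1}^{K} p_k^{\alpha_k - 1} \\
  &= \prod_{k=1}^{K} p_k^{(\alpha_k + x_k) - 1}
\end{align*}
```

The last line is the kernel of a Dirichlet with parameters alpha + x, so a single observation of category k simply adds 1 to alpha_k.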
Why Does this Work?
Let’s look at it again.
Entropy
Information Content
Energy
Log Likelihood
The Dirichlet Distribution
Normalizing Constant
Linear
The Dirichlet MACHINE
Prior:   1.2   3.0   0.3
Update:  2.2   3.0   0.3
Prior:   2.2   3.0   0.3
Update:  2.2   3.0   1.3
Prior:   2.2   3.0   1.3
Update:  2.2   3.0   2.3
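The machine can be sketched as a loop in which each posterior becomes the next prior. The helper name `dirichlet_update` and the 0-based category indices are assumptions for illustration; the starting vector and the sequence of observations are read off the slides.

```python
def dirichlet_update(alpha, category):
    """Return the posterior parameters after observing one draw of `category`."""
    updated = list(alpha)
    updated[category] += 1.0
    return updated

alpha = [1.2, 3.0, 0.3]          # the prior from the first slide
for observed in [0, 2, 2]:       # first category once, then third category twice
    alpha = dirichlet_update(alpha, observed)  # the posterior becomes the next prior
    print([round(a, 1) for a in alpha])
```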
Interpreting the Parameters
alpha = (1, 3, 2)
What does this alpha vector really mean?
sum = 6
normalized = (1/6, 3/6, 2/6)
normalized: Expected Value
sum: Weight
ANALOGY: Normal Distribution
Precision = 1 / variance
High precision: data is close to the mean
Low precision: far away from the mean
sum: Precision
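A small sketch of the split into expected value and weight. The function name `dirichlet_summary` is hypothetical. The per-category variance formula, mean * (1 - mean) / (weight + 1), is the standard Dirichlet result and is added here to show why a larger sum behaves like a higher precision.

```python
def dirichlet_summary(alpha):
    """Split alpha into its expected value (normalized) and weight (sum)."""
    weight = sum(alpha)
    mean = [a / weight for a in alpha]
    # Standard Dirichlet variance: shrinks as the weight grows.
    variance = [m * (1.0 - m) / (weight + 1.0) for m in mean]
    return mean, weight, variance

mean, weight, variance = dirichlet_summary([1, 3, 2])
print(mean, weight, variance)
```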
High Weight Dirichlet
0.4   0.4   0.2
Low Weight Dirichlet
0.4   0.4   0.2
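To see the difference between high and low weight, one can draw a few samples from each. This is a sketch under stated assumptions: the two alpha vectors below are hypothetical choices that share the mean (0.4, 0.4, 0.2) but differ in weight, and the sampler uses the standard construction of normalizing independent Gamma draws.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
high_weight = [40.0, 40.0, 20.0]   # hypothetical: mean (0.4, 0.4, 0.2), weight 100
low_weight = [0.4, 0.4, 0.2]       # hypothetical: same mean, weight 1

for name, alpha in [("high", high_weight), ("low", low_weight)]:
    samples = [sample_dirichlet(alpha, rng) for _ in range(3)]
    print(name, [[round(p, 2) for p in s] for s in samples])
```

High-weight samples should cluster near the mean, while low-weight samples tend to land near the corners, with most of the mass on one category.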
Urn Model
At each step, pick a ball from the urn. Replace it, and add another ball of that color into the urn
Rich get richer...
Finally yellow catches a break
But it’s too late...
As the urn gets more populated, the distribution gets “stuck” in place.
Once lots of data has been collected, or the Dirichlet has high precision, it’s hard to overturn that with new data
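The urn procedure described above can be simulated directly. This is a minimal sketch: the function name `polya_urn`, the starting urn of one ball per color, and the step counts are all illustrative assumptions.

```python
import random

def polya_urn(initial_counts, steps, rng):
    """Run the urn: draw a ball, put it back, and add one more of the same color."""
    urn = list(initial_counts)
    for _ in range(steps):
        # Each color is drawn with probability proportional to its current count.
        color = rng.choices(range(len(urn)), weights=urn)[0]
        urn[color] += 1
    return urn

rng = random.Random(0)
urn = polya_urn([1, 1, 1], 1000, rng)   # hypothetical start: one ball per color
total = sum(urn)
print([round(c / total, 3) for c in urn])
```

Rerunning with different seeds gives very different final proportions, yet within a single run the proportions barely move after the first few hundred draws.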
Chinese Restaurant Process
When you find the white ball, throw a new color into the mix.
The expected infinite distribution (mean) is exponential.
# of white balls controls the exponent
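A sketch of the urn with white balls, assuming the white balls themselves stay in the urn and only trigger the addition of a brand-new color. The function name `white_ball_urn` and the parameter values are hypothetical.

```python
import random

def white_ball_urn(white_balls, steps, rng):
    """Urn with white balls: drawing white adds a new color, otherwise reinforce."""
    colors = []  # colors[i] = number of balls of the i-th color, in order of appearance
    for _ in range(steps):
        weights = [white_balls] + colors
        pick = rng.choices(range(len(weights)), weights=weights)[0]
        if pick == 0:
            colors.append(1)         # white ball drawn: a new color enters the mix
        else:
            colors[pick - 1] += 1    # existing color drawn: rich get richer
    return colors

rng = random.Random(0)
colors = white_ball_urn(white_balls=2.0, steps=500, rng=rng)
print(len(colors), colors[:10])
```

More white balls means new colors appear more often, so the mass is spread over more colors.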
So coming back to the count data:
20    0    0
 2    1   17
14    6    0
15    5    0
 0   20    0
 0   14    6
What Dirichlet parameters explain the data?
Newton’s Method: Requires Gradient + Hessian
Reads all of the data...
https://github.com/maxsklar/BayesPy/tree/master/ConjugatePriorTools
Compress the data into a Matrix and a Vector:
Works for lots of sparsely populated rows
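One way to make "what Dirichlet parameters explain the data" concrete is to score a candidate alpha by the Dirichlet-multinomial log-likelihood of the rows. The sketch below is not taken from the linked repository: it uses the standard formula, drops the multinomial coefficient (which does not depend on alpha), and compares two hypothetical candidates instead of running Newton's method.

```python
import math

# The six rows of counts from the slide, read three values at a time.
DATA = [
    [20, 0, 0],
    [2, 1, 17],
    [14, 6, 0],
    [15, 5, 0],
    [0, 20, 0],
    [0, 14, 6],
]

def log_likelihood(alpha, rows):
    """Dirichlet-multinomial log-likelihood of the rows, up to a constant in alpha."""
    total_alpha = sum(alpha)
    result = 0.0
    for row in rows:  # note: this pass touches every row of the data
        result += math.lgamma(total_alpha) - math.lgamma(total_alpha + sum(row))
        for a, x in zip(alpha, row):
            result += math.lgamma(a + x) - math.lgamma(a)
    return result

low = [0.5, 0.5, 0.2]      # hypothetical low-weight candidate
high = [50.0, 50.0, 20.0]  # hypothetical high-weight candidate, same mean
print(log_likelihood(low, DATA), log_likelihood(high, DATA))
```

Because most rows are dominated by a single category, the low-weight candidate scores higher here. An optimizer such as Newton's method would evaluate this function and its derivatives many times, which is why reading all of the data on each pass becomes costly.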
The Compression MACHINE
K=3 (the 4th row is a special, total row)
M=6: The maximum # samples per input

Start:
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0

Input: 1 0 0
1 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
1 0 0 0 0 0

Input: 1 3 2
2 0 0 0 0 0
1 1 1 0 0 0
1 1 0 0 0 0
2 1 1 1 1 1

Input: 0 6 0
2 0 0 0 0 0
2 2 2 1 1 1
1 1 0 0 0 0
3 2 2 2 2 2

Input: 2 1 1
3 1 0 0 0 0
3 2 2 1 1 1
2 1 0 0 0 0
4 3 3 3 2 2
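The update rule implied by the walkthrough is: for an input row, add 1 to the first c cells of a category's row, where c is that category's count, and add 1 to the first n cells of the total row, where n is the row sum. The sketch below implements that reading. The names `compress` and `log_likelihood_compressed` are hypothetical, and the likelihood formula is the standard Dirichlet-multinomial one (up to a constant in alpha), assumed here to be the quantity being optimized.

```python
import math

def compress(rows, num_categories, max_samples):
    """Build the K x M matrix and the length-M total vector from rows of counts."""
    matrix = [[0] * max_samples for _ in range(num_categories)]
    totals = [0] * max_samples
    for row in rows:
        for k, count in enumerate(row):
            for j in range(count):      # first `count` cells of category k
                matrix[k][j] += 1
        for j in range(sum(row)):       # first `sum(row)` cells of the total row
            totals[j] += 1
    return matrix, totals

def log_likelihood_compressed(alpha, matrix, totals):
    """Same log-likelihood as a pass over the raw rows, using only the summary."""
    total_alpha = sum(alpha)
    result = 0.0
    for k, a in enumerate(alpha):
        for j, n in enumerate(matrix[k]):
            if n:
                result += n * math.log(a + j)
    for j, n in enumerate(totals):
        if n:
            result -= n * math.log(total_alpha + j)
    return result

rows = [[1, 0, 0], [1, 3, 2], [0, 6, 0], [2, 1, 1]]   # the four inputs from the slides
matrix, totals = compress(rows, num_categories=3, max_samples=6)
for line in matrix + [totals]:
    print(line)
```

After compression, each likelihood evaluation costs on the order of K x M operations regardless of how many rows were read, which fits the remark that this works well for lots of sparsely populated rows.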
DEMO
Our Popularity Prior
Dirichlet Mixture Models
Anything you can do with a Gaussian, you can also do with a Dirichlet
Example: Mixture of Gaussians using Expectation-Maximization
Assume each row is a mixture of multinomials.
And the parameters of that mixture are pulled from a Dirichlet.
Latent Dirichlet Allocation
Topic Model
Questions
