Classification
APAM E4990
Computational Social Science
Jake Hofman
Columbia University
April 26, 2013
Jake Hofman (Columbia University) Classification April 26, 2013 1 / 11
Prediction a la Bayes¹
• You’re testing for a rare condition:
• 1% of the student population is in this class
• You have a highly sensitive and specific test:
• 99% of students in the class visit compsocialscience.org
• 99% of students who aren’t in the class don’t visit this site
• Given that a student visits the course site, what is the probability
that the student is in our class?
¹ Follows Wiggins, SciAm 2006
Jake Hofman (Columbia University) Classification April 26, 2013 2 / 11
Prediction a la Bayes
[Tree diagram: 10,000 students]
• 1% in class: 100 ppl
  • 99% visit: 99 ppl
  • 1% don’t visit: 1 person
• 99% not in class: 9900 ppl
  • 1% visit: 99 ppl
  • 99% don’t visit: 9801 ppl
Jake Hofman (Columbia University) Classification April 26, 2013 3 / 11
Prediction a la Bayes
[Same tree diagram of 10,000 students as above.]
So given that a student visits the site (198 ppl), there is a 50%
chance the student is in our class (99 ppl)!
Jake Hofman (Columbia University) Classification April 26, 2013 3 / 11
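A minimal Python sketch (not from the slides) that reproduces this counting argument; the numbers come from the tree above and the variable names are just illustrative:

```python
# Counts from the tree: 10,000 students, 1% in the class,
# 99% of class members visit the site, 1% of non-members do.
n_students = 10_000
n_class = n_students // 100                  # 100 students in the class
n_not_class = n_students - n_class           # 9900 students not in the class

visit_and_class = 99 * n_class // 100        # 99 class members visit
visit_and_not_class = n_not_class // 100     # 99 non-members visit (false positives)

n_visit = visit_and_class + visit_and_not_class   # 198 visitors in total
print(visit_and_class / n_visit)                  # 0.5
```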
Prediction a la Bayes
[Same tree diagram of 10,000 students as above.]
The small error rate on the large population outside of our class
produces many false positives.
Jake Hofman (Columbia University) Classification April 26, 2013 3 / 11
Inverting conditional probabilities
Bayes’ Theorem
Equate the far right- and left-hand sides of the product rule
p(y|x) p(x) = p(x, y) = p(x|y) p(y)

and divide to get the probability of y given x from the probability
of x given y:

p(y|x) = p(x|y) p(y) / p(x)

where p(x) = Σ_{y ∈ Ω_Y} p(x|y) p(y) is the normalization constant.
Jake Hofman (Columbia University) Classification April 26, 2013 4 / 11
Predictions a la Bayes
Given that a patient tests positive, what is the probability that the
patient is sick?
p(class|visit) = p(visit|class) p(class) / p(visit)
               = (99/100)(1/100) / (198/100²)
               = 99/198
               = 1/2

where p(visit) = p(visit|class) p(class) + p(visit|not class) p(not class)
               = 99/100² + 99/100² = 198/100².
Jake Hofman (Columbia University) Classification April 26, 2013 5 / 11
(Super) Naive Bayes
We can use Bayes’ rule to build a one-site student classifier:
p(class|site) = p(site|class) p(class) / p(site)

where we estimate these probabilities with ratios of counts:

p̂(site|class)     = (# students in class who visit site) / (# students in class)
p̂(site|not class) = (# students not in class who visit site) / (# students not in class)
p̂(class)          = (# students in class) / (# students)
p̂(not class)      = (# students not in class) / (# students)
Jake Hofman (Columbia University) Classification April 26, 2013 6 / 11
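As a rough sketch of the count-based estimates above (the function and argument names are assumptions, not from the slides):

```python
def p_class_given_site(n_class_visit, n_class, n_other_visit, n_other):
    """Estimate p(class | site) from raw counts via Bayes' rule."""
    n = n_class + n_other
    p_site_given_class = n_class_visit / n_class     # p̂(site | class)
    p_site_given_other = n_other_visit / n_other     # p̂(site | not class)
    p_class, p_other = n_class / n, n_other / n      # p̂(class), p̂(not class)
    p_site = p_site_given_class * p_class + p_site_given_other * p_other
    return p_site_given_class * p_class / p_site

# Numbers from the running example: 99 of the 100 class members visit,
# as do 99 of the 9900 other students.
print(p_class_given_site(99, 100, 99, 9_900))   # ≈ 0.5
```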
Naive Bayes
Represent each student by a binary vector x where x_j = 1 if the
student has visited the j-th site (x_j = 0 otherwise).
Modeling each site as an independent Bernoulli random variable,
the probability of visiting a set of sites x given class membership
c = 0, 1:
p(x|c) = ∏_j θ_jc^{x_j} (1 − θ_jc)^{1−x_j}

where θ_jc denotes the probability that the j-th site is visited by a
student with class membership c.
Jake Hofman (Columbia University) Classification April 26, 2013 7 / 11
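A small sketch of this likelihood in Python (numpy assumed; the θ values below are made up for illustration):

```python
import numpy as np

def bernoulli_likelihood(x, theta_c):
    """p(x|c) = prod_j theta_jc**x_j * (1 - theta_jc)**(1 - x_j)"""
    return np.prod(theta_c**x * (1 - theta_c)**(1 - x))

x = np.array([1, 0, 1, 1])                  # this student visited sites 1, 3, and 4
theta_c = np.array([0.9, 0.2, 0.8, 0.5])    # hypothetical visit probabilities for class c
print(bernoulli_likelihood(x, theta_c))     # 0.9 * 0.8 * 0.8 * 0.5 = 0.288
```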
Naive Bayes
Using this likelihood in Bayes’ rule and taking a logarithm, we have:
log p(c|x) = log [ p(x|c) p(c) / p(x) ]

           = Σ_j x_j log( θ_jc / (1 − θ_jc) ) + Σ_j log(1 − θ_jc) + log( θ_c / p(x) )
Jake Hofman (Columbia University) Classification April 26, 2013 8 / 11
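As a quick numeric sanity check (reusing the made-up values from the sketch above), the sum over sites agrees with the log of the product likelihood; the prior and p(x) terms are omitted since they don't involve the θ's:

```python
import numpy as np

x = np.array([1, 0, 1, 1])
theta_c = np.array([0.9, 0.2, 0.8, 0.5])

direct = np.log(np.prod(theta_c**x * (1 - theta_c)**(1 - x)))
expanded = np.sum(x * np.log(theta_c / (1 - theta_c))) + np.sum(np.log(1 - theta_c))

print(np.allclose(direct, expanded))   # True: both forms give log p(x|c)
```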
Naive Bayes
We can eliminate p (x) by calculating the log-odds:
log [ p(1|x) / p(0|x) ] = Σ_j x_j log[ θ_j1 (1 − θ_j0) / (θ_j0 (1 − θ_j1)) ]
                          + Σ_j log[ (1 − θ_j1) / (1 − θ_j0) ] + log( θ_1 / θ_0 )

where the coefficient of x_j is the weight w_j and the remaining terms form
the bias w_0, which gives a linear classifier of the form w · x + w_0
Jake Hofman (Columbia University) Classification April 26, 2013 9 / 11
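A sketch of the resulting linear classifier, assuming the per-site probabilities θ_j1, θ_j0 and the class priors θ_1, θ_0 are already known (the values here are illustrative, not estimated from data):

```python
import numpy as np

theta_1 = np.array([0.9, 0.2, 0.8])   # p(visit site j | in class), made up
theta_0 = np.array([0.1, 0.3, 0.4])   # p(visit site j | not in class), made up
prior_1, prior_0 = 0.01, 0.99         # class priors

w = np.log(theta_1 * (1 - theta_0) / (theta_0 * (1 - theta_1)))
w0 = np.sum(np.log((1 - theta_1) / (1 - theta_0))) + np.log(prior_1 / prior_0)

x = np.array([1, 0, 1])               # sites this student visited
log_odds = w @ x + w0                 # w · x + w_0
print(log_odds > 0)                   # predict "in class" when the log-odds are positive
```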
Naive Bayes
We train by counting students and sites to estimate θ_jc and θ_c:

θ̂_jc = n_jc / n_c        θ̂_c = n_c / n

and use these to calculate the weights ŵ_j and bias ŵ_0:

ŵ_j = log[ θ̂_j1 (1 − θ̂_j0) / (θ̂_j0 (1 − θ̂_j1)) ]

ŵ_0 = Σ_j log[ (1 − θ̂_j1) / (1 − θ̂_j0) ] + log( θ̂_1 / θ̂_0 ).
We predict by simply adding the weights of the sites that a
student has visited to the bias term.
Jake Hofman (Columbia University) Classification April 26, 2013 10 / 11
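A minimal end-to-end sketch of this training rule (numpy assumed; the visit matrix below is a tiny made-up example, not data from the course):

```python
import numpy as np

# Rows are students, columns are sites; X[i, j] = 1 if student i visited site j.
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
y = np.array([1, 1, 1, 0, 0, 0])      # 1 = in class, 0 = not in class

def train(X, y):
    theta = {c: X[y == c].mean(axis=0) for c in (0, 1)}   # theta_jc = n_jc / n_c
    prior = {c: np.mean(y == c) for c in (0, 1)}          # theta_c = n_c / n
    w = np.log(theta[1] * (1 - theta[0]) / (theta[0] * (1 - theta[1])))
    w0 = np.sum(np.log((1 - theta[1]) / (1 - theta[0]))) + np.log(prior[1] / prior[0])
    return w, w0

w, w0 = train(X, y)
print(X @ w + w0 > 0)   # predicted membership for each student
```

Note that these unsmoothed estimates break down as soon as a site is visited by everyone or no one in a class (a log of zero), which is exactly what smoothing with pseudocounts (discussed below) addresses.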
Naive Bayes
In practice, this works better than one might expect given its
simplicity.²

² http://www.jstor.org/pss/1403452
Jake Hofman (Columbia University) Classification April 26, 2013 11 / 11
Naive Bayes
Training is computationally cheap and scalable, and the model is
easy to update given new observations.²

² http://www.springerlink.com/content/wu3g458834583125/
Jake Hofman (Columbia University) Classification April 26, 2013 11 / 11
Naive Bayes
Performance varies with document representations and
corresponding likelihood models.²

² http://ceas.cc/2006/15.pdf
Jake Hofman (Columbia University) Classification April 26, 2013 11 / 11
Naive Bayes
It’s often important to smooth parameter estimates (e.g., by
adding pseudocounts) to avoid overfitting
Jake Hofman (Columbia University) Classification April 26, 2013 11 / 11
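For instance, Laplace-style smoothing with a pseudocount α changes the estimate of θ_jc to something like the following (a sketch; α = 1 is a common but arbitrary choice):

```python
import numpy as np

def smoothed_theta(n_jc, n_c, alpha=1.0):
    """theta_jc with pseudocounts: pretend each site was visited alpha extra
    times and skipped alpha extra times by members of class c."""
    return (n_jc + alpha) / (n_c + 2 * alpha)

# A site that nobody in a 100-person class visited no longer gets theta = 0,
# so its log-odds weight stays finite.
print(smoothed_theta(np.array([0, 99, 100]), 100))   # [~0.0098, ~0.98, ~0.99]
```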