Data-driven modeling
APAM E4990

Jake Hofman
Columbia University

February 6, 2012
Diagnoses a la Bayes¹



• You’re testing for a rare disease:
  • 1% of the population is infected
• You have a highly sensitive and specific test:
  • 99% of sick patients test positive
  • 99% of healthy patients test negative
• Given that a patient tests positive, what is the probability that the patient is sick?




¹ Wiggins, SciAm 2006
Diagnoses a la Bayes


Population: 10,000 ppl
  1% Sick: 100 ppl
    99% Test +: 99 ppl
    1% Test -: 1 ppl
  99% Healthy: 9900 ppl
    1% Test +: 99 ppl
    99% Test -: 9801 ppl




So given that a patient tests positive (198 ppl), there is a 50% chance the patient is sick (99 ppl)!




The small error rate on the large healthy population produces many false positives.




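The same arithmetic as a short Python sketch (an illustration of my own, not from the original slides), counting each branch of the tree explicitly:

    # Count the four branches of the diagnosis tree for 10,000 people.
    population = 10_000
    sick = int(0.01 * population)        # 100 ppl
    healthy = population - sick          # 9900 ppl

    true_pos = int(0.99 * sick)          # sick and test +:     99 ppl
    false_pos = int(0.01 * healthy)      # healthy and test +:  99 ppl

    total_pos = true_pos + false_pos     # 198 ppl test positive
    print(true_pos / total_pos)          # 0.5
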
Inverting conditional probabilities


Bayes’ Theorem
Equate the far right- and left-hand sides of the product rule

    p(y|x)\, p(x) = p(x, y) = p(x|y)\, p(y)

and divide to get the probability of y given x from the probability
of x given y:

    p(y|x) = \frac{p(x|y)\, p(y)}{p(x)}

where p(x) = \sum_{y \in \Omega_Y} p(x|y)\, p(y) is the normalization constant.




Diagnoses a la Bayes



Given that a patient tests positive, what is the probability that the patient
is sick?

    p(sick|+) = \frac{p(+|sick)\, p(sick)}{p(+)} = \frac{(99/100)(1/100)}{198/100^2} = \frac{99}{198} = \frac{1}{2}

where p(+) = p(+|sick) p(sick) + p(+|healthy) p(healthy) = 99/100^2 + 99/100^2 = 198/100^2.




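As a quick check, the same posterior computed directly from Bayes’ rule (again a sketch; the variable names are my own):

    # p(sick|+) = p(+|sick) p(sick) / p(+), normalizing over both classes
    p_sick = 0.01
    p_pos_given_sick = 0.99
    p_pos_given_healthy = 0.01

    p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)
    print(p_pos_given_sick * p_sick / p_pos)   # 0.5
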
(Super) Naive Bayes
We can use Bayes’ rule to build a one-word spam classifier:

    p(spam|word) = \frac{p(word|spam)\, p(spam)}{p(word)}

where we estimate these probabilities with ratios of counts:

    \hat{p}(word|spam) = (# spam docs containing word) / (# spam docs)
    \hat{p}(word|ham)  = (# ham docs containing word) / (# ham docs)
    \hat{p}(spam)      = (# spam docs) / (# docs)
    \hat{p}(ham)       = (# ham docs) / (# docs)


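A minimal sketch of this estimator in Python (the function and argument names are hypothetical, mirroring the count ratios above):

    def one_word_spam_posterior(n_spam, n_ham, n_spam_with_word, n_ham_with_word):
        """Estimate p(spam|word) from document counts via Bayes' rule."""
        n_docs = n_spam + n_ham
        p_spam = n_spam / n_docs
        p_ham = n_ham / n_docs
        p_word_given_spam = n_spam_with_word / n_spam
        p_word_given_ham = n_ham_with_word / n_ham
        # p(word) = p(word|spam) p(spam) + p(word|ham) p(ham)
        p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
        return p_word_given_spam * p_spam / p_word

Plugging in the counts from the “money” example below, one_word_spam_posterior(1500, 3672, 194, 50) gives roughly 0.795, consistent with the slide’s .7957 up to rounding of the intermediate estimates.
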
(Super) Naive Bayes

                 $ ./enron_naive_bayes.sh meeting
                 1500 spam examples
                 3672 ham examples
                 16 spam examples containing meeting
                 153 ham examples containing meeting

                 estimated            P(spam) = .2900
                 estimated            P(ham) = .7100
                 estimated            P(meeting|spam) = .0106
                 estimated            P(meeting|ham) = .0416

                 P(spam|meeting) = .0923




(Super) Naive Bayes

                 $ ./enron_naive_bayes.sh money
                 1500 spam examples
                 3672 ham examples
                 194 spam examples containing money
                 50 ham examples containing money

                 estimated            P(spam) = .2900
                 estimated            P(ham) = .7100
                 estimated            P(money|spam) = .1293
                 estimated            P(money|ham) = .0136

                 P(spam|money) = .7957




(Super) Naive Bayes

                 $ ./enron_naive_bayes.sh enron
                 1500 spam examples
                 3672 ham examples
                 0 spam examples containing enron
                 1478 ham examples containing enron

                 estimated            P(spam) = .2900
                 estimated            P(ham) = .7100
                 estimated            P(enron|spam) = 0
                 estimated            P(enron|ham) = .4025

                 P(spam|enron) = 0




Naive Bayes


Represent each document by a binary vector x where x_j = 1 if the
j-th word appears in the document (x_j = 0 otherwise).

Modeling each word as an independent Bernoulli random variable,
the probability of observing a document x of class c is:

    p(x|c) = \prod_j \theta_{jc}^{x_j} (1 - \theta_{jc})^{1 - x_j}

where \theta_{jc} denotes the probability that the j-th word occurs in a
document of class c.




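A sketch of this likelihood in code (names are my own; theta[j] holds \theta_{jc} for a single class c):

    import math

    def log_likelihood(x, theta):
        """log p(x|c) for a binary word vector x under the Bernoulli model."""
        return sum(x_j * math.log(t_j) + (1 - x_j) * math.log(1 - t_j)
                   for x_j, t_j in zip(x, theta))

Working in log space sidesteps underflow from multiplying many small per-word probabilities, and anticipates the log posterior on the next slide.
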
Naive Bayes



Using this likelihood in Bayes’ rule and taking a logarithm, we have:

    \log p(c|x) = \log \frac{p(x|c)\, p(c)}{p(x)}
                = \sum_j x_j \log \frac{\theta_{jc}}{1 - \theta_{jc}} + \sum_j \log(1 - \theta_{jc}) + \log \frac{\theta_c}{p(x)}

where \theta_c is the probability of observing a document of class c.




Naive Bayes



We can eliminate p(x) by calculating the log-odds:

    \log \frac{p(1|x)}{p(0|x)} = \sum_j x_j \underbrace{\log \frac{\theta_{j1}(1 - \theta_{j0})}{\theta_{j0}(1 - \theta_{j1})}}_{w_j} + \underbrace{\sum_j \log \frac{1 - \theta_{j1}}{1 - \theta_{j0}} + \log \frac{\theta_1}{\theta_0}}_{w_0}

which gives a linear classifier of the form w · x + w_0.




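Translated directly into code (a sketch with names of my own choosing; theta1 and theta0 are per-word occurrence probabilities for the two classes, p1 and p0 the class priors):

    import math

    def log_odds_weights(theta1, theta0, p1, p0):
        """Per-word weights w_j and bias w_0 for the linear classifier."""
        w = [math.log(t1 * (1 - t0) / (t0 * (1 - t1)))
             for t1, t0 in zip(theta1, theta0)]
        w0 = sum(math.log((1 - t1) / (1 - t0))
                 for t1, t0 in zip(theta1, theta0)) + math.log(p1 / p0)
        return w, w0
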
Naive Bayes
We train by counting words and documents within classes to
estimate \hat{\theta}_{jc} and \hat{\theta}_c:

    \hat{\theta}_{jc} = \frac{n_{jc}}{n_c}
    \hat{\theta}_c = \frac{n_c}{n}

and use these to calculate the weights \hat{w}_j and bias \hat{w}_0:

    \hat{w}_j = \log \frac{\hat{\theta}_{j1}(1 - \hat{\theta}_{j0})}{\hat{\theta}_{j0}(1 - \hat{\theta}_{j1})}

    \hat{w}_0 = \sum_j \log \frac{1 - \hat{\theta}_{j1}}{1 - \hat{\theta}_{j0}} + \log \frac{\hat{\theta}_1}{\hat{\theta}_0}.

We predict by simply adding the weights of the words that
appear in the document to the bias term.
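Putting the pieces together, a minimal end-to-end sketch under assumptions of my own (each document is a list of words, labels are 1 for spam and 0 for ham); note the unsmoothed estimates fail on zero counts, as the “enron” example showed:

    import math
    from collections import Counter

    def train(docs, labels, vocab):
        """Estimate weights w_j and bias w_0 by counting docs and words per class."""
        n1 = sum(labels)                       # number of spam (class 1) docs
        n0 = len(docs) - n1                    # number of ham (class 0) docs
        c1 = Counter(w for d, y in zip(docs, labels) if y == 1 for w in set(d))
        c0 = Counter(w for d, y in zip(docs, labels) if y == 0 for w in set(d))
        w, w0 = {}, math.log(n1 / n0)          # log(theta_1 / theta_0)
        for j in vocab:
            t1, t0 = c1[j] / n1, c0[j] / n0    # unsmoothed; zero counts blow up
            w[j] = math.log(t1 * (1 - t0) / (t0 * (1 - t1)))
            w0 += math.log((1 - t1) / (1 - t0))
        return w, w0

    def predict(doc, w, w0):
        """Label a document spam when its log-odds w . x + w_0 are positive."""
        return w0 + sum(w[j] for j in set(doc) if j in w) > 0
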
Naive Bayes




In practice, this works better than one might expect given its simplicity.²




² http://www.jstor.org/pss/1403452
Naive Bayes




Training is computationally cheap and scalable, and the model is easy to update given new observations.²




² http://www.springerlink.com/content/wu3g458834583125/
Naive Bayes




Performance varies with document representations and corresponding likelihood models.²




² http://ceas.cc/2006/15.pdf
Naive Bayes




It’s often important to smooth parameter estimates (e.g., by adding pseudocounts) to avoid overfitting.




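For instance, a Laplace-smoothed estimate (a sketch; the pseudocount alpha = 1 is a common default, not a value from the slides):

    def smoothed_theta(n_jc, n_c, alpha=1):
        """Pseudocounts keep theta-hat away from 0 and 1, so no word
        (e.g. 'enron', seen in only one class) yields infinite log-odds."""
        return (n_jc + alpha) / (n_c + 2 * alpha)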
