Bayes’ theorem and logistic regression
 Bayes’ theorem gives the relationship between the probabilities of A and B, P(A) and P(B), and the conditional probabilities of A given B and B given A, P(A|B) and P(B|A). In its most common form (a numeric sketch follows this list):
P(A|B) = P(B|A) · P(A) / P(B)
 In the Bayesian interpretation, probability measures a degree of belief. Bayes’ theorem links the belief in a proposition before and after accounting for evidence.
 For proposition A and evidence B:
 P(A), the prior, is the initial degree of belief in A
 P(A|B), the posterior, is the degree of belief having accounted for B
 The quotient P(B|A)/P(B) represents the support B provides for A
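To make the update concrete, here is a minimal Python sketch of the theorem; the numbers (a condition with 2% prevalence, a test that is 90% sensitive with a 5% false-positive rate) are made up purely for illustration:

p_a = 0.02                # P(A), the prior (made-up prevalence)
p_b_given_a = 0.90        # P(B|A), probability of the evidence if A holds
p_b_given_not_a = 0.05    # P(B|not A), the false-positive rate (made-up)

# P(B): total probability of the evidence
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# P(A|B): the posterior, i.e. the prior rescaled by the support P(B|A)/P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))    # 0.269

Even after positive evidence, the posterior stays well below 1 because the prior P(A) is small; this is exactly the belief update the theorem formalizes.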
 The probability model for a classifier is a conditional model p(C|F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditioned on several feature variables F1 through Fn.
 Problem: a large number of features, or features that can take a large number of values, makes the probability tables infeasible.
 Using Bayes’ theorem:
p(C|F1, …, Fn) = p(C) · p(F1, …, Fn|C) / p(F1, …, Fn)
 In plain English:
posterior = (prior × likelihood) / evidence
 Since the denominator does not depend on C and the values of the features Fi are given, the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn).
 Using the chain rule, i.e., repeated application of the definition of conditional probability:
p(C, F1, …, Fn) = p(C) · p(F1|C) · p(F2|C, F1) · … · p(Fn|C, F1, …, Fn-1)
 Role of the naïve condition: assume that each feature Fi is conditionally independent of every other feature Fj, j ≠ i, given the category C.
 The joint model can then be represented as
p(C, F1, …, Fn) = p(C) · p(F1|C) · p(F2|C) · … · p(Fn|C)
 Under this model, the conditional distribution over the class variable is
p(C|F1, …, Fn) = (1/Z) · p(C) · p(F1|C) · … · p(Fn|C)
 where Z = p(F1, …, Fn) is a scaling factor that depends only on the features
 The naïve Bayes classifier combines this model with a decision rule.
 The most common rule is to pick the hypothesis that is most probable, known as the maximum a posteriori (MAP) rule (a runnable sketch follows this list):
classify(f1, …, fn) = argmax over c of p(C = c) · p(F1 = f1|C = c) · … · p(Fn = fn|C = c)
 The probability of a document d being in class c is computed as
P(c|d) ∝ P(c) · P(F1|c) · … · P(Fn|c)
 P(Fk|c) is the conditional probability of term Fk occurring in a document of class c; it measures how much evidence the term contributes that c is the correct class.
 P(c) is the prior probability of a document occurring in class c.
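A minimal sketch of a multinomial naïve Bayes text classifier built on this model: it picks the class maximizing P(c) · P(F1|c) · … · P(Fn|c), computed in log space. The tiny training corpus and the add-one (Laplace) smoothing of the term probabilities are assumptions made for illustration, not details given above:

import math
from collections import Counter, defaultdict

# toy training corpus, assumed for illustration
train = [("chinese beijing chinese", "cn"),
         ("chinese chinese shanghai", "cn"),
         ("chinese macao", "cn"),
         ("tokyo japan chinese", "jp")]

class_docs = Counter(c for _, c in train)      # document counts per class, for P(c)
term_counts = defaultdict(Counter)             # term counts per class, for P(Fk|c)
for text, c in train:
    term_counts[c].update(text.split())

vocab = {t for text, _ in train for t in text.split()}
n_docs = sum(class_docs.values())

def classify(text):
    # MAP decision rule: argmax over c of log P(c) + sum of log P(Fk|c)
    best_class, best_logp = None, -math.inf
    for c in class_docs:
        logp = math.log(class_docs[c] / n_docs)     # log prior
        total = sum(term_counts[c].values())
        for t in text.split():
            # add-one smoothing keeps unseen terms from zeroing the product
            logp += math.log((term_counts[c][t] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_class, best_logp = c, logp
    return best_class

print(classify("chinese chinese chinese tokyo japan"))   # -> cn

Working in log space avoids numerical underflow when many small per-term probabilities are multiplied together.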
 Logistic regression is a statistical classification model
 It predicts a binary response, i.e., the outcome of a categorical dependent variable with two levels, from one or more predictor variables
 Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables
 Applications:
 medicine and the social sciences, e.g., the Trauma and Injury Severity Score (TRISS), used to predict mortality in injured patients
 predicting whether a patient has diabetes based on observed characteristics such as age, gender, and BMI
 predicting whether a person will vote for Congress or the BJP based on age, income, gender, race, and state of residence
 Classification
 Binomial (binary) logistic regression deals with variables in which the observed outcome has two possible types, e.g., dead or alive
 The outcome is coded as 0 or 1
 Straightforward interpretation
 Multinomial logistic regression deals with situations where there are three or more outcome categories
 Logistic regression is used for predicting binary outcomes rather than continuous ones
 It takes the natural logarithm of the odds, the logit transformation: logit(p) = ln(p / (1 − p)) (a fitting sketch follows this list)
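A minimal sketch of binary logistic regression fit by gradient descent on the log loss; the toy data, learning rate, and iteration count are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                     # toy features, assumed
y = (X[:, 0] + 2 * X[:, 1] > 0).astype(float)     # binary outcome coded 0/1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# the model assumes the log-odds ln(p / (1 - p)) is linear in the features:
# logit(p) = w . x + b, equivalently p = sigmoid(w . x + b)
w, b = np.zeros(2), 0.0
for _ in range(1000):                             # gradient descent on log loss
    p = sigmoid(X @ w + b)                        # predicted P(y = 1 | x)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

p = sigmoid(X @ w + b)
print(f"training accuracy: {np.mean((p >= 0.5) == (y == 1)):.2f}")

For three or more outcome categories (the multinomial case), the sigmoid is replaced by a softmax over one weight vector per class.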
 Feature selection selects a subset of the terms occurring in the training set and uses only this subset as features in text classification
 It serves two main purposes:
 First, it makes training and applying the classifier more efficient by decreasing the size of the effective vocabulary
 Second, it often increases accuracy by eliminating noise features
 A noise feature is one which, when added to the document representation, increases the classification error on new data.
 Feature selection replaces the complex classifier (using all features) with a simpler one (using a subset of the features)
 Mutual information measures how much information the presence or absence of a term contributes to making the correct classification decision for class c (a sketch follows this list)
 χ² feature selection tests the independence of two events
 Frequency-based feature selection selects the terms that are most common in the class
 frequency can be defined either as document frequency – the number of documents in class c that contain the term t
 or as collection frequency – the number of tokens of t that occur in documents in c
 document frequency -> Bernoulli model
 collection frequency -> multinomial model
 Feature selection for multiple classifiers selects a single set of features instead of a different set for each classifier
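A minimal sketch of mutual-information feature selection from a 2×2 table of per-term document counts; the counts below are hypothetical, chosen for illustration, and a document-frequency alternative (the Bernoulli-model variant above) is shown at the end:

import math

def mutual_information(n11, n10, n01, n00):
    # n11: docs in class c containing term t    n10: docs outside c containing t
    # n01: docs in c not containing t           n00: docs outside c not containing t
    n = n11 + n10 + n01 + n00
    mi = 0.0
    for n_cell, n_term, n_class in [(n11, n11 + n10, n11 + n01),
                                    (n10, n11 + n10, n10 + n00),
                                    (n01, n01 + n00, n11 + n01),
                                    (n00, n01 + n00, n10 + n00)]:
        if n_cell > 0:   # empty cells contribute nothing
            mi += (n_cell / n) * math.log2(n * n_cell / (n_term * n_class))
    return mi

# hypothetical counts (n11, n10, n01, n00) for two terms and one class
counts = {"export": (49, 27652, 141, 774106),   # concentrated in the class
          "the":    (189, 799978, 1, 1780)}     # in nearly every doc, near-zero MI

ranked = sorted(counts, key=lambda t: mutual_information(*counts[t]), reverse=True)
print(ranked[:1])    # -> ['export']

# frequency-based alternative: rank by document frequency within the class (n11)
ranked_by_df = sorted(counts, key=lambda t: counts[t][0], reverse=True)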
Editor's Notes
  • The maximum a posteriori (MAP) probability estimate is the mode of the posterior distribution. The posterior probability of a random event or an uncertain proposition is the conditional probability assigned after the relevant evidence is taken into account; ‘a posteriori’ means taking into account the evidence relevant to the particular case being examined.