Text Classification
Positive or negative movie review?
• unbelievably disappointing
• Full of zany characters and richly applied satire, and some great plot twists
• this is the greatest screwball comedy ever filmed
• It was pathetic. The worst part about it was the boxing scenes.
What is the subject of this article?
• Management/MBA
• admission
• arts
• exam preparation
• nursing
• technology
• …
Subject Category?
Text Classification
• Assigning subject categories, topics, or genres
• Spam detection
• Authorship identification
• Age/gender identification
• Language identification
• Sentiment analysis
• …
Text Classification: definition
• Input:
• a document d
• a fixed set of classes C = {c1, c2,…, cJ}
• Output: a predicted class c ∈ C
Classification Methods: Hand-coded rules
• Rules based on combinations of words or other features
• spam: black-list-address OR (“dollars” AND “have been selected”)
• Accuracy can be high
• If rules are carefully refined by an expert
• But building and maintaining these rules is expensive
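As an illustration, such a rule might look like the following Python sketch; the blacklist and trigger phrases are hypothetical stand-ins, not part of the original slides.

    # Minimal sketch of a hand-coded spam rule (hypothetical blacklist and phrases).
    BLACKLISTED_ADDRESSES = {"winner@lottery.example", "promo@spam.example"}

    def is_spam(sender: str, body: str) -> bool:
        text = body.lower()
        # spam: black-list-address OR ("dollars" AND "have been selected")
        return (sender in BLACKLISTED_ADDRESSES
                or ("dollars" in text and "have been selected" in text))

    print(is_spam("promo@spam.example", "Hello"))                          # True
    print(is_spam("friend@mail.example",
                  "You have been selected to win 1000 dollars"))           # True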
Classification Methods: Supervised Machine Learning
• Input:
• a document d
• a fixed set of classes C = {c1, c2, …, cJ}
• a training set of m hand-labeled documents (d1, c1), …, (dm, cm)
• Output:
• a learned classifier γ: d → c
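In code, the training set is simply a list of (document, class) pairs, and the learned classifier γ is a function from a document to a class. A minimal sketch of that interface in Python (the names are illustrative, not from the slides):

    from typing import Callable, List, Tuple

    Document = str
    Class = str
    TrainingSet = List[Tuple[Document, Class]]   # (d1, c1), ..., (dm, cm)
    Classifier = Callable[[Document], Class]     # the learned classifier γ: d -> c

    def learn(training_set: TrainingSet) -> Classifier:
        """Stand-in for any supervised learner (Naive Bayes, logistic regression, ...)."""
        raise NotImplementedError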
Classification Methods: Supervised Machine Learning
• Any kind of classifier
• Naïve Bayes
• Logistic regression
• Support-vector machines
• Maximum Entropy model
• Generative vs. discriminative
• …
Naïve Bayes Intuition
• Simple (“naïve”) classification method based on Bayes rule
• Relies on a very simple representation of the document
• Bag of words
The bag of words representation

γ( "I love this movie! It's sweet, but with satirical humor. The dialogue is great and the adventure scenes are fun… It manages to be whimsical and romantic while laughing at the conventions of the fairy tale genre. I would recommend it to just about anyone. I've seen it several times, and I'm always happy to see it again whenever I have a friend who hasn't seen it yet." ) = c
The bag of words representation: using a subset of words

γ( "x love xxxxxxxxxxxxxxxx sweet xxxxxxx satirical xxxxxxxxxx xxxxxxxxxxx great xxxxxxx xxxxxxxxxxxxxxxxxxx fun xxxx xxxxxxxxxxxxx whimsical xxxx romantic xxxx laughing xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx recommend xxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx several xxxxxxxxxxxxxxxxx xxxxx happy xxxxxxxxx again xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxx" ) = c
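Concretely, the bag-of-words representation reduces a document to word counts and throws away word order. A minimal Python sketch (the tokenization here is deliberately simplistic):

    from collections import Counter
    import re

    def bag_of_words(document: str) -> Counter:
        """Reduce a document to lowercase word counts, ignoring word order."""
        tokens = re.findall(r"[a-z']+", document.lower())
        return Counter(tokens)

    review = "I love this movie! It's sweet, but with satirical humor. The dialogue is great..."
    print(bag_of_words(review))
    # Counter({'i': 1, 'love': 1, 'this': 1, 'movie': 1, "it's": 1, ...})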
Bag of words for document classification

[Figure: characteristic words for each category. Machine Learning: learning, training, algorithm, shrinkage, network, ... NLP: parser, tag, training, translation, language, ... Garbage Collection: garbage, collection, memory, optimization, region, ... Planning: planning, temporal, reasoning, plan, language, ... GUI. A test document containing parser, language, label, translation, ... is to be assigned to one of these categories (?).]
Bayes’ Rule Applied to Documents and Classes
• For a document d and a class c:

P(c|d) = P(d|c) P(c) / P(d)
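To make the rule concrete, here is a tiny numeric example with made-up probabilities (illustrative only, not from the slides):

    # Toy Bayes' rule computation for one document d and two classes, pos and neg.
    p_pos, p_neg = 0.5, 0.5                     # priors P(c)
    p_d_pos, p_d_neg = 1e-3, 1e-4               # likelihoods P(d|c)
    p_d = p_d_pos * p_pos + p_d_neg * p_neg     # evidence P(d)
    print(p_d_pos * p_pos / p_d)                # P(pos|d) ≈ 0.91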
Naïve Bayes Classifier (I)

cMAP = argmax_{c∈C} P(c|d)                (MAP is “maximum a posteriori” = most likely class)
     = argmax_{c∈C} P(d|c) P(c) / P(d)    (Bayes rule)
     = argmax_{c∈C} P(d|c) P(c)           (dropping the denominator)
Naïve Bayes Classifier (II)

Document d represented as features x1, …, xn:

cMAP = argmax_{c∈C} P(d|c) P(c)
     = argmax_{c∈C} P(x1, x2, …, xn | c) P(c)
Naïve Bayes Classifier (IV)

cMAP = argmax_{c∈C} P(x1, x2, …, xn | c) P(c)

• P(x1, x2, …, xn | c): O(|X|^n · |C|) parameters; could only be estimated if a very, very large number of training examples was available.
• P(c): how often does this class occur? We can just count the relative frequencies in a corpus.
Multinomial Naïve Bayes Independence Assumptions

P(x1, x2, …, xn | c)

• Bag of Words assumption: assume position doesn’t matter
• Conditional Independence: assume the feature probabilities P(xi|c) are independent given the class c:

P(x1, …, xn | c) = P(x1|c) · P(x2|c) · P(x3|c) · … · P(xn|c)
Multinomial Naïve Bayes Classifier

cMAP = argmax_{c∈C} P(x1, x2, …, xn | c) P(c)

cNB = argmax_{cj∈C} P(cj) ∏_{x∈X} P(x|cj)
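Under these assumptions, classification is just “pick the class with the largest prior times product of per-word likelihoods”. A minimal sketch, working in log space to avoid floating-point underflow (the prior and likelihood tables are assumed to come from training, as on the next slide):

    def classify(doc_words, log_prior, log_likelihood, classes):
        """Return argmax over classes of log P(c) + sum of log P(x_i | c).

        log_prior[c] and log_likelihood[c][w] come from training; under
        unsmoothed MLE an unseen word contributes -inf to the score."""
        def score(c):
            return log_prior[c] + sum(log_likelihood[c].get(w, float("-inf"))
                                      for w in doc_words)
        return max(classes, key=score)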
Learning the Multinomial Naïve Bayes Model
• First attempt: maximum likelihood estimates
• simply use the frequencies in the data

P̂(cj) = doccount(C = cj) / Ndoc

P̂(wi|cj) = count(wi, cj) / Σ_{w∈V} count(w, cj)
Parameter estimation

P̂(wi|cj) = count(wi, cj) / Σ_{w∈V} count(w, cj)
  = fraction of times word wi appears among all words in documents of topic cj

• Create mega-document for topic j by concatenating all docs in this topic
• Use frequency of w in mega-document
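A minimal Python sketch of these maximum-likelihood estimates, pairing with the classify() sketch above (unsmoothed, exactly as on the slide; real systems usually add Laplace smoothing):

    import math
    from collections import Counter, defaultdict

    def train_multinomial_nb(training_set):
        """training_set: list of (list_of_words, class) pairs.
        Returns (log_prior, log_likelihood, classes)."""
        doc_counts = Counter(c for _, c in training_set)
        n_docs = len(training_set)
        word_counts = defaultdict(Counter)              # per-class "mega-document" counts
        for words, c in training_set:
            word_counts[c].update(words)

        log_prior, log_likelihood = {}, {}
        for c in doc_counts:
            log_prior[c] = math.log(doc_counts[c] / n_docs)     # P^(cj) = doccount(cj) / Ndoc
            total = sum(word_counts[c].values())
            log_likelihood[c] = {w: math.log(n / total)         # P^(wi|cj) = count(wi,cj) / sum_w count(w,cj)
                                 for w, n in word_counts[c].items()}
        return log_prior, log_likelihood, list(doc_counts)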
Summary: Naive Bayes is Not So Naive
• Very fast, low storage requirements
• Robust to irrelevant features: irrelevant features cancel each other out without affecting results
• Very good in domains with many equally important features: decision trees suffer from fragmentation in such cases, especially with little data
• Optimal if the independence assumptions hold: if the assumed independence is correct, then it is the Bayes optimal classifier for the problem
• A good, dependable baseline for text classification
Real-world systems generally combine:
• Automatic classification
• Manual review of uncertain/difficult/“new” cases
The Real World
• Gee, I’m building a text classifier for real, now!
• What should I do?
The Real World
• Write your own classifier code.
• Tools:
  ● Apache Mahout (Java)
  ● NLTK (Python)
  ● LingPipe
  ● Stanford Classifier
  ● …
• APIs:
  ● OpenCalais
  ● AlchemyAPI
  ● UIUC CCG
  ● …
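As a concrete example of one of the tools above, NLTK ships a Naive Bayes classifier. A minimal sketch of training and using it on a few hand-labeled review snippets (the tiny training set below is made up for illustration and far too small for real use):

    from nltk.classify import NaiveBayesClassifier

    def features(document: str) -> dict:
        """Bag-of-words style feature set: which words are present."""
        return {word: True for word in document.lower().split()}

    # Hypothetical hand-labeled training documents (d1, c1), ..., (dm, cm)
    train_set = [
        (features("unbelievably disappointing"), "neg"),
        (features("the greatest screwball comedy ever filmed"), "pos"),
        (features("it was pathetic the worst part was the boxing scenes"), "neg"),
        (features("full of zany characters and richly applied satire"), "pos"),
    ]

    classifier = NaiveBayesClassifier.train(train_set)
    print(classifier.classify(features("a richly applied and zany comedy")))  # likely "pos"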