Sentiment Analysis
Steps
• Acquire
• Pre-Process
• Explore
• Model
• Test
Acquire
• Sources: database, API, scraping...
• Storage: memory, database, flat file...
• Format: language-specific classes that help with processing (corpus)
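As a minimal sketch of the Storage and Format points, documents stored in a flat CSV file can be loaded into a small corpus class. The `text`/`label` column names, the `Document` class and the `load_corpus` function are hypothetical illustrations. They are not taken from any particular library.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional, TextIO


@dataclass
class Document:
    """One document in the corpus (hypothetical class); label stays None until classified."""
    text: str
    label: Optional[str] = None


def load_corpus(handle: TextIO) -> List[Document]:
    """Read a CSV flat file with a 'text' column and an optional 'label' column (assumed layout)."""
    reader = csv.DictReader(handle)
    # An empty or missing label cell becomes None, i.e. an unlabeled document.
    return [Document(row["text"], row.get("label") or None) for row in reader]


if __name__ == "__main__":
    sample = io.StringIO(
        "text,label\n"
        "Who doesn't love the new iPhone?,positive\n"
        "My friend Chuck is an idiot.,\n"
    )
    for doc in load_corpus(sample):
        print(doc)
```

Keeping the label optional lets the same class hold both the hand-labeled samples used for training and testing and the unlabeled documents that still need to be classified.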
Pre-Process
• Letter Case: "Hello World" -> "hello world"
• Stop Words: "The iPhone is fantastic" -> "iPhone fantastic"
• Stemming: "run, runs, running, ran" -> ("run", 4)
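The three pre-processing steps can be sketched in plain Python. The stop-word list and the `stem` rules below are hypothetical toy stand-ins. A real pipeline would use a curated stop-word list and an established stemmer such as Porter's. Suffix-stripping stemmers, Porter's included, leave the irregular form "ran" unchanged. Collapsing all four forms into ("run", 4), as the slide shows, therefore needs a lemmatizer rather than a stemmer.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "am", "an", "and", "are", "i", "is", "it", "my", "of", "the", "to"}


def stem(word: str) -> str:
    """Toy suffix-stripping stemmer; a rough stand-in for e.g. the Porter stemmer."""
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]  # runs -> run
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            base = word[: -len(suffix)]
            if base[-1] == base[-2] and base[-1] not in "lsz":
                return base[:-1]  # running -> runn -> run
            if (len(base) == 3 and base[0] not in "aeiou"
                    and base[1] in "aeiou" and base[2] not in "aeiouwxy"):
                return base + "e"  # loving -> lov -> love
            return base
    return word


def preprocess(text: str) -> List[str]:
    tokens = re.findall(r"[a-z']+", text.lower())       # letter case (+ crude tokenizing)
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop words
    return [stem(t) for t in tokens]                      # stemming


if __name__ == "__main__":
    print(preprocess("I am loving my new iPhone!!"))
    print(Counter(stem(w) for w in ["run", "runs", "running", "ran"]))
```

On the deck's example sentence this yields `['love', 'new', 'iphone']`, which matches the "Pre-Processed" line on the later slides. The stemming counter gives three "run" and one "ran", which illustrates the stemmer-versus-lemmatizer point above.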
Explore
Model
• Pointwise Mutual Information (PMI)
• Discrete PMI
• Naive Bayes
• Maximum Entropy
Unsupervised
Pointwise Mutual Information
Original: I am loving my new iPhone!!
Pre-Processed: love new iphone
TERM     VALUE
love     +1
new      +1
iphone    0
Sentiment Score: +2
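The slide does not show where the per-term values come from. One common unsupervised recipe is Turney-style semantic orientation. It is named here as an illustrative assumption, not as the deck's confirmed method. It scores a term by its PMI with a positive seed word minus its PMI with a negative seed word, where PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The sketch below computes this from raw co-occurrence counts. The example counts and the 0.01 smoothing constant are hypothetical.

```python
import math


def pmi(joint: float, count_x: int, count_y: int, total: int) -> float:
    """PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), probabilities estimated as count / total."""
    return math.log2((joint * total) / (count_x * count_y))


def semantic_orientation(joint_pos: int, joint_neg: int, count_term: int,
                         count_pos: int, count_neg: int, total: int,
                         smoothing: float = 0.01) -> float:
    """PMI(term, positive seed) - PMI(term, negative seed).

    `smoothing` (an assumed constant) is added to both joint counts so that a
    term that never co-occurs with one of the seeds does not hit log2(0).
    """
    return (pmi(joint_pos + smoothing, count_term, count_pos, total)
            - pmi(joint_neg + smoothing, count_term, count_neg, total))


if __name__ == "__main__":
    # Hypothetical counts: "love" co-occurs 20 times with "excellent" and 5 times with "poor".
    print(semantic_orientation(20, 5, count_term=100, count_pos=100, count_neg=100, total=1000))
```

A positive orientation means the term keeps company with the positive seed more than with the negative one. A score near zero, like the slide's 0 for "iphone", means the term carries little sentiment on its own.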
Discrete PMI
Original: I am loving my new iPhone!!
Pre-Processed: love new iphone
TERM     VALUE
love     +4
new      +1
iphone    0
Sentiment Score: +5
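Both PMI slides score a document the same way: they look up each pre-processed term's value and add the values up. In the sketch below, the lexicon values are copied from the two slides. Two details are assumptions: scoring unknown terms as 0, and the lexicon names themselves.

```python
from typing import Dict, List

# Term values copied from the PMI and Discrete PMI slides (lexicon names are hypothetical).
PMI_VALUES: Dict[str, int] = {"love": 1, "new": 1, "iphone": 0}
DISCRETE_PMI_VALUES: Dict[str, int] = {"love": 4, "new": 1, "iphone": 0}


def sentiment_score(tokens: List[str], lexicon: Dict[str, int]) -> int:
    """Sum the lexicon value of every pre-processed token; unknown tokens count as 0."""
    return sum(lexicon.get(token, 0) for token in tokens)


if __name__ == "__main__":
    doc = ["love", "new", "iphone"]  # "I am loving my new iPhone!!" after pre-processing
    print(sentiment_score(doc, PMI_VALUES))           # 2
    print(sentiment_score(doc, DISCRETE_PMI_VALUES))  # 5
```

The only difference between the two slides is the lexicon. The graded weights let a strong word like "love" outweigh a mild one like "new".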
Supervised
Naive Bayes
Pre-Processed: love new iphone

POSITIVE
• Who doesn't love the new iPhone?
• I love tacos, especially from Taco Mayo.
• New shoes are the best!!!

NEGATIVE
• My friend Chuck is an idiot.
• The new iPhone sucks, DO NOT BUY!
• I got dumped today, feeling super sad.

TERM      POSITIVE   NEGATIVE
love      1.0        0.0
new       0.67       0.33
iphone    0.5        0.5
average   0.72       0.28

Each term's pair is the share of training documents containing that term that are positive vs. negative; averaging the three pairs gives 0.72 vs. 0.28.
Evidence points to a 72% likelihood that this document is positive.
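The slide's 0.72 comes from averaging per-term ratios. A textbook multinomial naive Bayes classifier works differently: it multiplies the class prior by smoothed per-class term likelihoods. Without smoothing, the 0.0 negative value for "love" would rule out the negative class entirely. The sketch below computes both figures on the slide's six documents. The tokenizer (lowercasing only, with no stop-word removal or stemming) and the add-one smoothing are assumptions, so the naive Bayes figure differs from the slide's.

```python
import math
import re
from collections import Counter
from typing import List

# Training documents copied from the slide.
POSITIVE = [
    "Who doesn't love the new iPhone?",
    "I love tacos, especially from Taco Mayo.",
    "New shoes are the best!!!",
]
NEGATIVE = [
    "My friend Chuck is an idiot.",
    "The new iPhone sucks, DO NOT BUY!",
    "I got dumped today, feeling super sad.",
]


def tokenize(text: str) -> List[str]:
    """Lowercased word tokens; stop-word removal and stemming are skipped for brevity."""
    return re.findall(r"[a-z']+", text.lower())


def slide_average(query: List[str], pos: List[str], neg: List[str]) -> float:
    """The slide's arithmetic: per term, the share of documents containing it that are positive, averaged."""
    pos_sets = [set(tokenize(d)) for d in pos]
    neg_sets = [set(tokenize(d)) for d in neg]
    ratios = []
    for term in query:
        p = sum(term in s for s in pos_sets)
        n = sum(term in s for s in neg_sets)
        if p + n:  # terms never seen in training carry no evidence
            ratios.append(p / (p + n))
    return sum(ratios) / len(ratios) if ratios else 0.5


def naive_bayes(query: List[str], pos: List[str], neg: List[str]) -> float:
    """Multinomial naive Bayes with add-one smoothing; returns P(positive | query)."""
    counts = {
        "pos": Counter(t for d in pos for t in tokenize(d)),
        "neg": Counter(t for d in neg for t in tokenize(d)),
    }
    priors = {"pos": len(pos) / (len(pos) + len(neg)), "neg": len(neg) / (len(pos) + len(neg))}
    vocab_size = len(set(counts["pos"]) | set(counts["neg"]))
    log_scores = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        # log P(label) + sum of log P(term | label): likelihoods are multiplied, not averaged.
        log_scores[label] = math.log(priors[label]) + sum(
            math.log((counter[t] + 1) / (total + vocab_size)) for t in query
        )
    # Normalize the two log scores into a probability without underflow.
    top = max(log_scores.values())
    pos_weight = math.exp(log_scores["pos"] - top)
    neg_weight = math.exp(log_scores["neg"] - top)
    return pos_weight / (pos_weight + neg_weight)


if __name__ == "__main__":
    query = ["love", "new", "iphone"]
    print(round(slide_average(query, POSITIVE, NEGATIVE), 2))  # 0.72, as on the slide
    print(round(naive_bayes(query, POSITIVE, NEGATIVE), 2))
```

Under these assumptions, the naive Bayes estimate is roughly 0.84. Both approaches agree that the document leans positive.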
Maximum Entropy
• Really good at large document sets.
• Black box of magic.
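The "black box" is less magical than it looks. For two classes, a maximum entropy classifier over indicator features is the same model as logistic regression. The sketch below trains one by batch gradient ascent on the six example documents from the Naive Bayes slides. Several choices are illustrative assumptions: the word-presence features, the learning rate, the epoch count and the L2 penalty. A real project would use an established library rather than this loop.

```python
import math
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Training documents from the Naive Bayes slides: (text, 1 = positive, 0 = negative).
TRAIN = [
    ("Who doesn't love the new iPhone?", 1),
    ("I love tacos, especially from Taco Mayo.", 1),
    ("New shoes are the best!!!", 1),
    ("My friend Chuck is an idiot.", 0),
    ("The new iPhone sucks, DO NOT BUY!", 0),
    ("I got dumped today, feeling super sad.", 0),
]


def features(text: str) -> Set[str]:
    """Binary word-presence features (lowercased tokens)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data: List[Tuple[str, int]], lr: float = 0.1, epochs: int = 500,
          l2: float = 0.01) -> Tuple[Dict[str, float], float]:
    """Maximize the L2-penalized log-likelihood with batch gradient ascent."""
    weights: Dict[str, float] = defaultdict(float)
    bias = 0.0
    examples = [(features(text), label) for text, label in data]
    for _ in range(epochs):
        grad: Dict[str, float] = defaultdict(float)
        grad_bias = 0.0
        for feats, label in examples:
            # Gradient of the log-likelihood: (label - predicted probability) per active feature.
            error = label - sigmoid(bias + sum(weights[f] for f in feats))
            grad_bias += error
            for f in feats:
                grad[f] += error
        for f in set(weights) | set(grad):
            weights[f] += lr * (grad[f] - l2 * weights[f])
        bias += lr * grad_bias  # the bias is left unpenalized
    return dict(weights), bias


def predict(tokens: List[str], weights: Dict[str, float], bias: float) -> float:
    """P(positive | tokens); words never seen in training get weight 0."""
    return sigmoid(bias + sum(weights.get(t, 0.0) for t in set(tokens)))


if __name__ == "__main__":
    w, b = train(TRAIN)
    print(predict(["love", "new", "iphone"], w, b))
```

Unlike naive Bayes, the weights are fitted jointly, so overlapping words such as "new" and "iphone" do not count their evidence twice. That is part of why the approach tends to pay off on larger document sets.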
Test
Human Validation:
Randomly select n documents, have them tagged manually, and compare the human tags with the model's labels to measure accuracy.
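Human validation has two mechanical parts. First, draw a reproducible random sample of n documents for people to tag. Second, report the share of those documents where the model's label matches the human one. The function names and the fixed seed below are assumptions.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_for_review(documents: Sequence[T], n: int, seed: int = 0) -> List[T]:
    """Pick n distinct documents uniformly at random for manual tagging (seeded for reproducibility)."""
    return random.Random(seed).sample(list(documents), n)


def accuracy(model_labels: Sequence[str], human_labels: Sequence[str]) -> float:
    """Fraction of reviewed documents where the model agrees with the human tag."""
    if len(model_labels) != len(human_labels) or not human_labels:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(m == h for m, h in zip(model_labels, human_labels)) / len(human_labels)


if __name__ == "__main__":
    print(sample_for_review(list(range(100)), 5))
    print(accuracy(["pos", "neg", "pos", "pos"], ["pos", "pos", "pos", "pos"]))  # 0.75
```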
Test
Cross Validation:
Hold out n% of the already-classified samples during training, then measure how well the model generalizes to that unseen data.
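Holding out a single n% slice is, strictly speaking, holdout validation. k-fold cross-validation repeats the idea so that every sample is held out exactly once. The split below is a minimal sketch. The function name, the 20% default and the seed are assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(samples: Sequence[T], test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle labeled samples and hold out `test_fraction` of them; returns (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train_set, test_set = holdout_split(list(range(100)), test_fraction=0.2)
    print(len(train_set), len(test_set))  # 80 20
```

The model is then fitted on the training part only, and its accuracy on the held-out part estimates how it will do on new data.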
Challenges
• Spelling: "the new iPohne is grate!"
• Ambiguous: "The iPhone is fantastic at sucking"
• Figurative: "this new iPhone is sick."
• Irony: "screen broke, thanks Apple"
• Context: "Angry Birds looks awesome on my iPhone"
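Spelling is the most mechanical of these challenges. A small sketch shows both what a dictionary-based corrector can and cannot fix. It uses the standard library's `difflib.get_close_matches` against a hypothetical six-word vocabulary. "iPohne" is mapped to "iphone". "grate", however, is itself a valid word, so it is left alone even though "great" was meant. Catching that requires context.

```python
import difflib
from typing import Sequence

# Hypothetical vocabulary; a real spell checker would use a large word list.
VOCABULARY = ("the", "new", "iphone", "is", "great", "grate")


def correct(word: str, vocabulary: Sequence[str] = VOCABULARY) -> str:
    """Replace an unknown word with its closest vocabulary entry, if one is close enough."""
    word = word.lower()
    if word in vocabulary:
        return word  # known words are kept, even when they are the wrong word in context
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else word


if __name__ == "__main__":
    print([correct(w) for w in "the new iPohne is grate".split()])
```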
Intro to Sentiment Analysis