Intro to Sentiment Analysis

Steps
• Acquire
• Pre-Process
• Explore
• Model
• Test

Acquire
Sources: database, API, scraping...
Storage: memory, database, flat file...
Format: language-specific classes that help with processing (corpus)

Pre-Process
Letter Case
"Hello World" -> "hello world"
Stop Words
"The iPhone is fantastic" -> "iPhone fantastic"
Stemming
"run, runs, running, ran" -> ("run", 4)

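The three pre-processing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the stop-word set and the STEMS lookup table are hypothetical stand-ins (a real pipeline would likely use a full stop-word list and a stemming or lemmatization library), chosen only so the slide examples come out as shown.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption, not from the slides).
STOP_WORDS = {"the", "is", "a", "an", "i", "am", "my"}

# Toy lookup table standing in for a real stemming algorithm (assumption).
STEMS = {"runs": "run", "running": "run", "ran": "run", "loving": "love"}


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then map each token to its stem."""
    tokens = re.findall(r"[a-z']+", text.lower())            # letter case + tokenizing
    tokens = [t for t in tokens if t not in STOP_WORDS]      # stop words
    return [STEMS.get(t, t) for t in tokens]                 # stemming


print(preprocess("Hello World"))                  # ['hello', 'world']
print(preprocess("The iPhone is fantastic"))      # ['iphone', 'fantastic']
print(Counter(preprocess("run, runs, running, ran")))  # Counter({'run': 4})
```

Because the steps are chained here, "iPhone" comes out lowercased, whereas the slide shows each step in isolation.
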
Model
• Pointwise Mutual Information (PMI)
• Discrete PMI
• Naive Bayes
• Maximum Entropy

Unsupervised
• Pointwise Mutual Information (PMI)
• Discrete PMI

Pointwise Mutual Information
Original: I am loving my new iPhone!!
Pre-Processed: love new iphone
TERM     VALUE
love     +1
new      +1
iphone    0
Sentiment Score: +2

Discrete PMI
Original: I am loving my new iPhone!!
TERM     VALUE
love     +4
new      +1
iphone    0
Sentiment Score: +5
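
Both scoring slides above reduce to the same operation: look up each pre-processed term in a table of values and add the results. The sketch below hard-codes the term values shown on the slides; how those values are derived is not covered in the deck, and treating unknown terms as 0 is an assumption made here for illustration.

```python
# Term values copied from the two slides above.
PMI_LEXICON = {"love": 1, "new": 1, "iphone": 0}
DISCRETE_PMI_LEXICON = {"love": 4, "new": 1, "iphone": 0}


def sentiment_score(terms, lexicon):
    """Sum the lexicon value of every term; unknown terms count as 0 (assumption)."""
    return sum(lexicon.get(term, 0) for term in terms)


terms = ["love", "new", "iphone"]  # pre-processed form of "I am loving my new iPhone!!"
print(sentiment_score(terms, PMI_LEXICON))           # 2
print(sentiment_score(terms, DISCRETE_PMI_LEXICON))  # 5
```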

Supervised
• Naive Bayes
• Maximum Entropy

Naive Bayes
POSITIVE
Who doesn't love the new iPhone?
I love tacos, especially from Taco Mayo.
New shoes are the best!!!
NEGATIVE
My friend Chuck is an idiot.
The new iPhone sucks, DO NOT BUY!
I got dumped today, feeling super sad.
Terms: love, new, iphone

Naive Bayes
TERM      POSITIVE  NEGATIVE
love      1.0       0.0
new       0.67      0.33
iphone    0.5       0.5
Average   0.72      0.28
Evidence points to a 72% likelihood that this document is positive.
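
The numbers on this slide can be reproduced from the six tagged documents: for each term, take the share of positive and negative documents among those that contain it, then average across terms. The sketch below follows that slide arithmetic; the function names and the tokenizer are illustrative assumptions. Note that it is a simplification: a textbook Naive Bayes classifier multiplies class priors by per-term likelihoods instead of averaging, and usually applies smoothing so that a zero such as love/negative does not wipe out a class.

```python
import re

# Training documents taken from the Naive Bayes slide.
TRAINING = {
    "positive": [
        "Who doesn't love the new iPhone?",
        "I love tacos, especially from Taco Mayo.",
        "New shoes are the best!!!",
    ],
    "negative": [
        "My friend Chuck is an idiot.",
        "The new iPhone sucks, DO NOT BUY!",
        "I got dumped today, feeling super sad.",
    ],
}


def tokens(text):
    """Lowercased set of word tokens in a document (illustrative tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def term_proportions(term):
    """Return (positive share, negative share) of documents containing the term."""
    pos = sum(term in tokens(doc) for doc in TRAINING["positive"])
    neg = sum(term in tokens(doc) for doc in TRAINING["negative"])
    total = pos + neg
    if total == 0:
        return None  # term never seen in training
    return pos / total, neg / total


def score(terms):
    """Average the per-term shares, skipping unseen terms (slide arithmetic)."""
    pairs = [p for p in map(term_proportions, terms) if p is not None]
    if not pairs:
        return 0.5, 0.5  # no evidence either way (assumption)
    positive = sum(p[0] for p in pairs) / len(pairs)
    return positive, 1 - positive


positive, negative = score(["love", "new", "iphone"])
print(round(positive, 2), round(negative, 2))  # 0.72 0.28
```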

Maximum Entropy
Really good at large document sets.
VS.
Black box of magic

Test
Human Validation:
Randomly select n documents to be manually tagged, then compare the model's output against those tags to measure accuracy.
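
As a rough sketch of human validation, the snippet below draws n documents at random for manual tagging and then measures how often the model's labels agree with the human ones. The function names, the fixed seed, and the example labels are hypothetical and only illustrate the procedure.

```python
import random


def sample_for_review(documents, n, seed=0):
    """Pick n documents at random (without replacement) for manual tagging."""
    return random.Random(seed).sample(list(documents), n)


def accuracy(predicted, manual):
    """Fraction of documents where the model label matches the manual tag."""
    if not manual or len(predicted) != len(manual):
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(p == m for p, m in zip(predicted, manual))
    return matches / len(manual)


# Hypothetical labels for four reviewed documents.
model_labels = ["positive", "negative", "positive", "positive"]
human_labels = ["positive", "negative", "negative", "positive"]
print(accuracy(model_labels, human_labels))  # 0.75
```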

Test
Cross Validation:
Hold n% of the already classified samples out of training to see how well our model generalizes to new data.
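
The hold-out step described above might look like the following sketch. It performs a single shuffled split; full k-fold cross validation would repeat this so that every sample is held out once. The names holdout_split and evaluate, the seed, and the classify callable are assumptions for illustration.

```python
import random


def holdout_split(samples, holdout_pct, seed=0):
    """Shuffle the samples and hold out roughly holdout_pct percent for testing."""
    if not 0 < holdout_pct < 100:
        raise ValueError("holdout_pct must be between 0 and 100 (exclusive)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * holdout_pct / 100))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def evaluate(classify, test_samples):
    """Accuracy of classify(text) against the known labels in the held-out set."""
    hits = sum(classify(text) == label for text, label in test_samples)
    return hits / len(test_samples)


train, test = holdout_split(range(10), holdout_pct=20)
print(len(train), len(test))  # 8 2
```

A model would be trained on train only and then scored with evaluate on test, so the reported accuracy reflects documents it has not seen.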

Challenges
Spelling:
the new iPohne is grate!
Ambiguous:
The iPhone is fantastic at sucking
Figurative:
this new iPhone is sick.
Irony:
screen broke, thanks Apple
Context:
Angry Birds looks awesome on my iPhone
