Sentiment Analysis
An Overview of Concepts and
Selected Techniques
Terms
 Sentiment
 A thought, view, or attitude, especially one
based mainly on emotion instead of reason
 Sentiment Analysis
 aka opinion mining
 use of natural language processing (NLP) and
computational techniques to automate the
extraction or classification of sentiment from
typically unstructured text
Motivation
 Consumer information
 Product reviews
 Marketing
 Consumer attitudes
 Trends
 Politics
 Politicians want to know voters’ views
 Voters want to know politicians’ stances and who else
supports them
 Social
 Find like-minded individuals or communities
Problem
 Which features to use?
 Words (unigrams)
 Phrases/n-grams
 Sentences
 How to interpret features for sentiment
detection?
 Bag of words (IR)
 Annotated lexicons (WordNet, SentiWordNet)
 Syntactic patterns
 Paragraph structure
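The feature choices above (unigrams, n-grams, bag of words) can be sketched as follows. This is a minimal illustration, assuming lowercase whitespace tokenization; the function names `ngrams` and `bag_of_ngrams` are hypothetical and not taken from the slides or from any library.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=2):
    """Count all 1..max_n-grams in a text (order of occurrence is discarded)."""
    tokens = text.lower().split()  # assumption: whitespace tokenization
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


features = bag_of_ngrams("not a good film , not good at all")
print(features[("good",)])        # unigram count
print(features[("not", "good")])  # bigram count
```

Note that the unigram "good" alone looks positive here, while the bigram ("not", "good") carries the negation, which is one reason feature granularity matters for sentiment.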
Challenges
 Harder than topical classification, for
which bag-of-words features perform well
 Must consider other features due to…
 Subtlety of sentiment expression
 irony
 expression of sentiment using neutral words
 Domain/context dependence
 words/phrases can mean different things in different
contexts and domains
 Effect of syntax on semantics
Approaches
 Machine learning
 Naïve Bayes
 Maximum Entropy Classifier
 SVM
 Markov Blanket Classifier
 Accounts for conditional feature dependencies
 Allowed reduction of discriminating features from
thousands of words to about 20 (movie review
domain)
 Unsupervised methods
 Use lexicons
(Slide callout) Assume pairwise independent features
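As an illustration of the machine learning approach, here is a minimal Naïve Bayes sketch in plain Python. It treats every word as independent of the others given the class, which is the independence assumption noted in the callout. The class name, the add-one smoothing, and the four training sentences are assumptions made for this example, not material from the cited papers.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        tokens = doc.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of log likelihoods (words treated independently)
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
model = NaiveBayes().fit(
    ["a great and moving film", "brilliant acting great plot",
     "a dull and boring film", "boring plot terrible acting"],
    ["pos", "pos", "neg", "neg"],
)
print(model.predict("great acting"))
print(model.predict("boring film"))
```

A real experiment would need far more training data and a held-out test set; this sketch only shows the mechanics of the classifier.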
LingPipe Polarity Classifier
 First eliminate objective sentences, then
use remaining sentences to classify
document polarity (reduce noise)
LingPipe Polarity Classifier
 Uses unigram features extracted from
movie review data
 Assumes that adjacent sentences are
likely to have similar subjective-objective
(SO) polarity
 Uses a min-cut algorithm to efficiently
extract subjective sentences
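The min-cut idea can be sketched as below. Each sentence is a node linked to a "subjective" source and an "objective" sink by its individual scores, and adjacent sentences are linked by an association weight. The minimum cut then splits the sentences into two groups. The function name, the integer scores, and the simple Edmonds-Karp max-flow routine are assumptions for this illustration; it is not LingPipe's implementation.

```python
from collections import deque


def min_cut_subjective(subj, obj, assoc):
    """Return indices of sentences on the subjective side of a minimum cut.

    subj[i], obj[i]: non-negative preference of sentence i for each class.
    assoc[i]: weight tying sentence i to sentence i + 1 (same-class preference).
    """
    n = len(subj)
    s, t = n, n + 1  # source = subjective class, sink = objective class
    cap = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        cap[s][i] = subj[i]
        cap[i][t] = obj[i]
    for i in range(n - 1):
        cap[i][i + 1] = cap[i + 1][i] = assoc[i]

    def bfs_parents():
        """Breadth-first search over edges with remaining capacity."""
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs_parents()
        if parent[t] == -1:  # no augmenting path left: flow is maximal
            break
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Sentences still reachable from the source lie on its side of the cut.
    return [i for i in range(n) if parent[i] != -1]


# Sentence 1 leans objective on its own (4 vs 6) but is strongly tied to sentence 0.
print(min_cut_subjective([9, 4, 1], [1, 6, 9], [5, 1]))  # [0, 1]
print(min_cut_subjective([9, 4, 1], [1, 6, 9], [0, 0]))  # [0]
```

With the association weights set to zero, each sentence is classified on its own scores. With a strong tie to its neighbour, the borderline sentence is pulled into the subjective group, which is the adjacency assumption stated on the slide.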
LingPipe Polarity Classifier
Graph for classifying three items.
LingPipe Polarity Classifier
 As accurate as the baseline, but uses only 22% of
the content in the test data (on average)
 Metrics suggest properties of movie
review structure
SentiWordNet
 Based on WordNet “synsets”
 http://wordnet.princeton.edu/
 Ternary classifier
 Positive, negative, and neutral scores for each
synset
 Provides means of gauging sentiment for
a text
SentiWordNet: Construction
 Created training sets of synsets, Lp and Ln
 Start with small number of synsets with fundamentally
positive or negative semantics, e.g., “nice” and “nasty”
 Use WordNet relations, e.g., direct antonymy, similarity,
derived-from, to expand Lp and Ln over K iterations
 Lo (objective) is set of synsets not in Lp or Ln
 Trained classifiers on training set
 Rocchio and SVM
 Use four values of K to create eight classifiers with
different precision/recall characteristics
 As K increases, P decreases and R increases
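The seed-expansion step can be sketched as follows. This toy version works on plain words and two hand-written relation tables rather than real WordNet synsets; the relation data and the function name `expand_seeds` are invented for illustration. Similarity links keep a term in the same set, and antonymy links move it to the opposite set.

```python
def expand_seeds(pos_seeds, neg_seeds, similar, antonyms, k):
    """Grow the positive (Lp) and negative (Ln) sets over k iterations."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(k):
        # Similar terms inherit the polarity; antonyms take the opposite one.
        new_pos = {w for term in pos for w in similar.get(term, ())}
        new_pos |= {w for term in neg for w in antonyms.get(term, ())}
        new_neg = {w for term in neg for w in similar.get(term, ())}
        new_neg |= {w for term in pos for w in antonyms.get(term, ())}
        pos |= new_pos
        neg |= new_neg
    return pos, neg


# Hypothetical relation tables standing in for WordNet relations.
SIMILAR = {
    "nice": ["pleasant"], "pleasant": ["agreeable"],
    "nasty": ["awful"], "awful": ["dreadful"],
}
ANTONYMS = {"nice": ["nasty"], "pleasant": ["unpleasant"]}

for k in (1, 2):
    lp, ln = expand_seeds(["nice"], ["nasty"], SIMILAR, ANTONYMS, k)
    print(k, sorted(lp), sorted(ln))
```

Each extra iteration reaches terms further from the seeds, so the sets grow (higher recall) while the link to the original seed meaning weakens (lower precision), matching the trend noted for K.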
SentiWordNet: Results
 24.6% synsets with Objective<1.0
 Many terms are classified with some degree of
subjectivity
 10.45% with Objective<=0.5
 0.56% with Objective<=0.125
 Only a few terms are classified as definitively
subjective
 Difficult (if not impossible) to accurately
assess performance
SentiWordNet: How to use it
 Use score to select features (+/-)
 e.g. Zhang and Zhang (2006) used words in
corpus with subjectivity score of 0.5 or greater
 Combine pos/neg/objective scores to
calculate document-level score
 e.g. Devitt and Ahmad (2007) combined
polarity scores with a WordNet-based graph
representation of documents to create
predictive metrics
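Both uses can be sketched in a few lines. The lexicon below is a made-up stand-in with (positive, negative) scores per word, where the objective score is taken as 1 minus their sum. The values are not real SentiWordNet entries, and the averaging rule is an assumption for illustration, not the method of either cited paper.

```python
# Hypothetical scores: word -> (positive, negative); objective = 1 - pos - neg.
LEXICON = {
    "brilliant": (0.75, 0.0),
    "good": (0.625, 0.0),
    "dull": (0.0, 0.625),
    "predictable": (0.125, 0.25),
    "plot": (0.0, 0.0),
}


def document_score(text, lexicon=LEXICON, min_subjectivity=0.5):
    """Average (pos - neg) over words whose subjectivity meets the threshold."""
    total, used = 0.0, 0
    for token in text.lower().split():
        pos, neg = lexicon.get(token, (0.0, 0.0))
        if pos + neg >= min_subjectivity:  # feature selection by score
            total += pos - neg
            used += 1
    return total / used if used else 0.0


print(document_score("a brilliant plot"))       # 0.75
print(document_score("dull predictable plot"))  # -0.625
print(document_score("good but dull"))          # 0.0
```

In the second example "predictable" is skipped because its subjectivity (0.375) falls below the 0.5 threshold, so only "dull" contributes to the score.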
References
1. http://www.answers.com/sentiment, 9/22/08
2. B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment
classification using machine learning techniques,” in Proc. Conf.
on Empirical Methods in Natural Language Processing (EMNLP),
pp. 79–86, 2002.
3. A. Esuli and F. Sebastiani, “SentiWordNet: A Publicly Available Lexical
Resource for Opinion Mining,” in Proc. of LREC 2006 - 5th Conf.
on Language Resources and Evaluation, 2006.
4. E. Zhang and Y. Zhang, “UCSC on TREC 2006 Blog Opinion Mining,”
TREC 2006 Blog Track, Opinion Retrieval Task.
5. A. Devitt and K. Ahmad, “Sentiment Polarity Identification in Financial
News: A Cohesion-based Approach,” ACL 2007.
6. B. Pang and L. Lee, “A sentimental education: sentiment
analysis using subjectivity summarization based on minimum
cuts,” in Proceedings of the 42nd Annual Meeting on Association for
Computational Linguistics, p.271-es, July 21-26, 2004.




Editor's Notes

  • #2:
      1. Subjective vs. objective information
      2. Essentially the same as other information retrieval tasks, but with some additional challenges, as we will see
  • #3: Review info from blogs, newsgroups, etc.
      Consumer attitudes towards:
      - company’s products
      - competitor’s products
      Politics:
      - can form basis of policy decisions
  • #4: Lead-in: these problems are similar to other IR tasks. We have a body of text and need to know how to classify it.
      Granularity:
      - Most research has used unigrams (single words)
      - Some research shows that k-length n-grams work best
      WordNet:
      - Contains a large lexicon with relationships (synonymy, antonymy, etc.)
      Syntactic patterns:
      - Indirect negation
      - Setup/contradiction
  • #5: Indirect negation: “[it] avoids all cliches and predictability found in Hollywood movies”
      - “avoids” reverses the polarity of “cliches” and “predictability”
      Thwarted expectation: “This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up”
      Domain dependence: “unpredictable” is good for a movie plot, bad for car steering
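The polarity reversal in the first quotation can be approximated with a crude rule: flip the prior polarity of sentiment words that appear shortly after a shifter such as "avoids". The word lists, the fixed five-token window, and the function name are assumptions for this sketch. It would not handle the thwarted-expectation example, where the reversal comes from discourse structure rather than a nearby word.

```python
# Hypothetical prior polarities and polarity shifters, chosen for this example.
PRIOR_POLARITY = {"cliches": -1, "predictability": -1, "brilliant": 1, "great": 1}
SHIFTERS = {"avoids", "not", "never", "lacks"}


def shifted_polarities(text, window=5):
    """Return (word, polarity) pairs, flipping words within `window` tokens after a shifter."""
    tokens = text.lower().split()
    result = []
    flip_until = -1  # index of the last token still affected by a shifter
    for i, tok in enumerate(tokens):
        if tok in SHIFTERS:
            flip_until = i + window
            continue
        if tok in PRIOR_POLARITY:
            polarity = PRIOR_POLARITY[tok]
            if i <= flip_until:
                polarity = -polarity
            result.append((tok, polarity))
    return result


print(shifted_polarities("it avoids all cliches and predictability found in hollywood movies"))
print(shifted_polarities("a film full of cliches"))
```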
  • #6: Machine learning
      Strengths:
      - performs fairly well within a given domain with sufficient training data
      Weaknesses:
      - in a given domain tends to overfit training data; hard to transfer learning to other domains
      - needs training data
    Unsupervised
      Strengths:
      - domain independent; prior polarity
      - may aid machine learning techniques
      Weaknesses:
      - when used alone, does not perform as well as machine learning within a given domain
  • #9: Document with three sentences, Y, M, and N, which are the nodes in the graph.
      - Assign weights for each node’s (sentence’s) preference for being in each of two classes (positive or negative).
      - Assign weights for each node’s (sentence’s) preference for being in the same class as adjacent nodes.
  • #10: Also shows performance of different classifiers
  • #11: WordNet: lexical resource developed at Princeton. A synset represents a distinct semantic concept and contains a set of synonymous words.