Rule-based approach to
sentiment analysis at ROMIP’11
               Dmitry Kan
         dmitry.kan@gmail.com
          Twitter: @DmitryKan
            AlphaSense Inc
            Dialogue, 2012
Outline
•   Problem definition
•   Base level for accuracy
•   Towards shallow parsing of input text
•   Rule-based algorithm
•   Object-oriented sentiment detection
•   Performance
•   Open problems
Problem definition
• What is sentiment for people:
  – Mood of the author? Mood of the reader? Personal
    attitude?
  – Opinion about the target object (product etc)?
  – Something else, defined by an annotator’s boss?
• What is sentiment for a computer:
  –   General polarity background
  –   General opinion mining
  –   Object (product) oriented opinion mining
  –   Polarity strength detection
Base level for accuracy

• Cross-annotator agreement reaches only about 80% [1]
• Real performance of the system is the one it
  shows when used on un-annotated data
• Real example: “CEO of the company turned
  50” (was marked as positive; why?)
• Some machine learning (ML) methods can
  give 90% and more on test data
• Hard (if not impossible) to do object-oriented
  sentiment detection with ML
Towards shallow parsing of input text
• Example 1: Majority likes this, but I do not like this
  – Subclause 1: “Majority likes this”
  – Opposite conjunction: “but”
  – Subclause 2: “I do not like this” (negation: “not”)
  – NOT(polarity) = opposite_polarity
  – totalSentimentScore = totalPositiveScore – totalNegativeScore –
    ½ * sentimentCount, if an opposite conjunction is found;
    the last term is 0 if no opposite conjunction is found
• Example 2: I liked new iPhone, but GalaxyS is not easy to use
  – Subclause 1: “I liked new iPhone” (object: iPhone)
  – Opposite conjunction: “but”
  – Subclause 2: “GalaxyS is not easy to use” (object: GalaxyS, negation: “not”)
  – Object: iPhone    Sentiment: positive
  – Object: GalaxyS   Sentiment: negative
  – Object: -         Sentiment: neutral (mixed)
Rule-based algorithm flow on an example sentence
    Majority likes this, but I do not like this.
      Phase1 (negations): posScore = 0 – negation weight = 0 – 2 = -2
      Phase2 (individual words):
      Word ”likes”: posScore = -2 + 1 = -1
      Word ”not”: negScore = 0 + 1 = 1
      Word ”like”: posScore = -1 + 1 = 0
      Phase3 (oppositeConjuctions): sentimentCount = 3

      totalScore = posScore – negScore – ½ * sentimentCount =
      0 – 1 – 3/2 = -5/2


      Sentiment: Negative
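The three phases above can be sketched in Python. This is a minimal illustration, not the author's implementation: the tiny English word sets stand in for the Russian polarity dictionaries, the `LEMMAS` map stands in for a real lemmatizer, the negation weight of 2 is inferred from the slide's arithmetic, and the rule that every negation lowers the positive score once is an assumption that happens to reproduce the numbers shown.

```python
# Toy English stand-ins for the Russian dictionaries (assumed entries).
POSITIVE = {"like"}
NEGATIVE = {"not"}            # the slide counts "not" as a negative word
NEGATIONS = {"not"}
OPPOSITE_CONJUNCTIONS = {"but"}
LEMMAS = {"likes": "like"}    # hypothetical stand-in for a lemmatizer
NEGATION_WEIGHT = 2           # inferred from "0 - negation weight = -2"


def score_sentence(sentence):
    """Return (total_score, label) for a single sentence."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    lemmas = [LEMMAS.get(t, t) for t in tokens if t]

    pos_score = 0.0
    neg_score = 0.0

    # Phase 1: each negation lowers the positive score (assumed rule).
    for lemma in lemmas:
        if lemma in NEGATIONS:
            pos_score -= NEGATION_WEIGHT

    # Phase 2: each dictionary word adds 1 to the score of its polarity.
    sentiment_count = 0
    for lemma in lemmas:
        if lemma in POSITIVE:
            pos_score += 1
            sentiment_count += 1
        elif lemma in NEGATIVE:
            neg_score += 1
            sentiment_count += 1

    # Phase 3: an opposite conjunction subtracts half the sentiment word count.
    has_opp_conj = any(lemma in OPPOSITE_CONJUNCTIONS for lemma in lemmas)
    penalty = sentiment_count / 2 if has_opp_conj else 0.0

    total = pos_score - neg_score - penalty
    # Mapping the sign of the score to a label is an assumption.
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return total, label


print(score_sentence("Majority likes this, but I do not like this."))
# (-2.5, 'negative')
```

With these toy dictionaries the sketch arrives at the same -5/2 as the slide; a real system would need the full dictionaries and a proper lemmatizer.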
Rule-based algorithm #1/3
• Suits micro-posts (Twitter) or individual sentences
• Polarity dictionaries for Russian (1739 positive
  and 2338 negative words)
• All words are lemmatized (A. Zaliznyak's grammatical dictionary [2])
• Set of Russian negations that tend to
  noticeably affect the polarity of the connected
  word(s): не плохо (not bad); gaps between the negation
  and the word it modifies are also handled, for example: Я не
  сильно люблю это (I do not strongly like this)
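One way to picture negation handling with a gap is a small look-ahead window after each negation. The sketch below is an assumption about how such a rule could work, not the system's actual code: the dictionary entries, the `LEMMAS` map and the `max_gap` parameter are all hypothetical, and the polarity flip follows the NOT(polarity) = opposite_polarity rule from the shallow parsing slide.

```python
# Hypothetical miniature dictionaries; the real ones hold thousands of lemmas.
POSITIVE = {"любить"}
NEGATIVE = {"плохо"}
NEGATIONS = {"не"}
LEMMAS = {"люблю": "любить"}   # stand-in for dictionary-based lemmatization


def negated_polarity_words(sentence, max_gap=1):
    """Find polarity words governed by a negation, allowing up to
    max_gap intervening words, and report their flipped polarity."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    lemmas = [LEMMAS.get(t, t) for t in tokens if t]
    hits = []
    for i, lemma in enumerate(lemmas):
        if lemma not in NEGATIONS:
            continue
        # Inspect the next max_gap + 1 words for the first polarity word.
        for j in range(i + 1, min(i + 2 + max_gap, len(lemmas))):
            word = lemmas[j]
            if word in POSITIVE:
                hits.append((word, "negative"))
                break
            if word in NEGATIVE:
                hits.append((word, "positive"))
                break
    return hits


print(negated_polarity_words("не плохо"))
print(negated_polarity_words("Я не сильно люблю это"))
```

The window size is a design choice: too small and "не сильно люблю" is missed, too large and a negation starts to capture unrelated words later in the sentence.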
Rule-based algorithm #2/3

• Set of Russian opposite conjunctions, which
  affect the polarity of a sentence’s subclauses in
  relation to each other: Большинству это всё
  нравится, а мне нет (Majority likes this, but I do
  not)
• totalScore = positiveScore – negativeScore –
  oppositeConjuctionSentimentScore, where
  oppositeConjuctionSentimentScore removes
    polarity mass from a sentence that contains an opposite
    conjunction and equals sentimentWordCount / 2
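Written as a piecewise expression, with the symbol names taken from the slide, the scoring rule reads as follows. The second line repeats the arithmetic of the example sentence from the algorithm flow slide (posScore = 0, negScore = 1, three sentiment words).

```latex
\[
\text{totalScore} = \text{positiveScore} - \text{negativeScore} -
\begin{cases}
\dfrac{\text{sentimentWordCount}}{2}, & \text{if an opposite conjunction is found},\\[6pt]
0, & \text{otherwise}.
\end{cases}
\]
\[
\text{totalScore} = 0 - 1 - \frac{3}{2} = -\frac{5}{2}
\]
```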
Rule-based algorithm #3/3
• Object-oriented sentiment detection

• First, each sentence of the input text is examined for the
  presence of the object's keywords
• If such a sentence is found, it is checked for
  conjunctions or other subclause boundaries (such as
  punctuation)
• If no boundary is found, the sentiment of the entire
  sentence is detected with the algorithm
  described above
• If there is a boundary, the subclause containing the
  keywords is identified and its sentiment is
  detected with the algorithm described above
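The steps above can be sketched as follows. This is an illustrative approximation: `clause_polarity` is a simplified stand-in for the scoring algorithm described earlier (it flips the next polarity word after a negation, with no gap limit), and the word sets, the `LEMMAS` map and the `BOUNDARY` pattern are assumptions chosen to fit the iPhone / GalaxyS example.

```python
import re

# Hypothetical toy dictionaries for the English example sentence.
POSITIVE = {"like", "easy"}
NEGATIVE = set()
NEGATIONS = {"not"}
LEMMAS = {"liked": "like"}
# Assumed subclause boundaries: punctuation or the opposite conjunction "but".
BOUNDARY = re.compile(r"[,;:]|\bbut\b", re.IGNORECASE)


def clause_polarity(clause):
    """Simplified stand-in scorer: a negation flips the next polarity word."""
    lemmas = [LEMMAS.get(t, t) for t in re.findall(r"\w+", clause.lower())]
    score = 0
    negate = False
    for lemma in lemmas:
        if lemma in NEGATIONS:
            negate = True
        elif lemma in POSITIVE or lemma in NEGATIVE:
            value = 1 if lemma in POSITIVE else -1
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def object_sentiment(text, keyword):
    """Sentiment of the first sentence or subclause mentioning the keyword."""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if keyword.lower() not in sentence.lower():
            continue
        # Without a boundary the list holds the whole sentence.
        clauses = [c for c in BOUNDARY.split(sentence) if c.strip()]
        for clause in clauses:
            if keyword.lower() in clause.lower():
                return clause_polarity(clause)
    return "neutral"  # the object is not mentioned


text = "I liked new iPhone, but GalaxyS is not easy to use"
print(object_sentiment(text, "iPhone"))   # positive
print(object_sentiment(text, "GalaxyS"))  # negative
```

Matching the keyword by plain substring is the crudest possible choice; the open problems slide lists automatic detection of product names as future work.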
Performance
• Test data: text reviews (many sentences each)
• Accuracy of 64%
• 92% precision and 69% recall for the positive class
  on items where two annotators agreed
• Much lower precision and recall for the negative class
  (not enough dictionary entries; text-level
  sentiment still to be defined)
• Worked slightly better in a 2-way classifier
  ensemble with Multinomial Naive Bayes [3]
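To make the "two annotators agreed" condition concrete, here is a small sketch of how precision and recall for the positive class could be computed on the agreed subset only. The label lists are invented toy data for illustration and have no relation to the ROMIP results; the function name and label strings are hypothetical.

```python
def positive_precision_recall(annotator_a, annotator_b, predicted):
    """Precision and recall of the 'pos' class, using only the items
    on which both annotators assigned the same label."""
    agreed = [
        (a, p)
        for a, b, p in zip(annotator_a, annotator_b, predicted)
        if a == b
    ]
    tp = sum(1 for gold, p in agreed if gold == "pos" and p == "pos")
    fp = sum(1 for gold, p in agreed if gold != "pos" and p == "pos")
    fn = sum(1 for gold, p in agreed if gold == "pos" and p != "pos")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels (hypothetical): the annotators disagree on the fourth item,
# so it is excluded before counting.
a = ["pos", "pos", "neg", "pos", "neg"]
b = ["pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "neg", "pos", "pos", "neg"]
print(positive_precision_recall(a, b, predicted))  # (0.5, 0.5)
```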
Open problems
•   Multi-sentence sentiment detection
•   Domain adaptation: mining polarity words [4]
•   Adding more rules for shallow parsing
•   Trying out formal syntactic parsing
•   Automatic detection of product names
    (Named Entity Recognition)
Questions?
Thank you!
Bibliography
• [1] Bermingham, A. and Smeaton, A.F. (2009).
  A study of interannotator agreement for
  opinion retrieval. In SIGIR, 784-785.
• [2] Andrey Zaliznyak. Grammaticheskij slovar'
  russkogo jazyka. Moskva, 1977, (further
  editions are 1980, 1987, 2003).
• [3] Poroshin V. (2012). Proof of concept
  statistical sentiment classification at ROMIP
  2011. In Dialogue.
• [4] Chetverkin I., Loukachevitch N. (2010).
  Automatic Extraction of Domain-specific
  Opinion Words. Dialogue.
• [5] Minqing Hu, Bing Liu. (2004). Mining and
  summarizing customer reviews. In Proc. of the
  tenth ACM SIGKDD international conference
  on Knowledge discovery and data mining.
