Detecting Negative Words
Adel Rahimi
Adel.Rahimi@mehr.sharif.edu
Reza Takhshid
Reza.takhshid95@student.sharfi.edu
Sharif University of Technology
Spring 2017
Sentiment datasets for other languages
• AFINN by Finn Årup Nielsen
AFINN is a list of English words rated for valence with an integer
between minus five (negative) and plus five (positive). The words were
manually labeled by Finn Årup Nielsen in 2009-2011. The file
is tab-separated.
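Because the file is tab-separated, reading it into a dictionary takes only a few lines. The sketch below parses an in-memory sample instead of the real file; the sample entries are made up for illustration, and `parse_afinn` is a hypothetical helper name, not part of any AFINN distribution.

```python
def parse_afinn(text):
    """Parse tab-separated 'word<TAB>score' lines into a dict of int scores."""
    lexicon = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last tab so multi-word entries stay intact.
        word, score = line.rsplit("\t", 1)
        lexicon[word] = int(score)
    return lexicon


# Illustrative sample only; these entries are assumptions, not copied from AFINN.
SAMPLE = "bad\t-3\ngood\t3\nnot good\t-2\n"

lexicon = parse_afinn(SAMPLE)
print(lexicon)
```

Splitting on the last tab is a small design choice that keeps entries containing spaces as a single key.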
Sentiment datasets for other languages
• Opinion Lexicon by Hu and Liu
A list of English positive and negative opinion words or sentiment words
(around 6,800 words).
Sentiment datasets for other languages
• NRC Word-Emotion Association Lexicon by Saif Mohammad
and Peter Turney
The NRC Emotion Lexicon is a list of English words and their associations with
eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and
disgust) and two sentiments (negative and positive). The annotations were
done manually through crowdsourcing.
Datasets
• Refined Persian Polarity corpus
• Dehdarbehbahani, I., Shakery, A., & Faili, H. (2014). Semi-supervised word
polarity identification in resource-lean languages. Neural Networks, 58, 50-59.
• Corpus of exceptions
• [working dataset]
Datasets
• The corpus of exceptions was extracted from the Flexicon
database
• ناشتا (fasting, on an empty stomach)
• نارنگی (tangerine)
• نارج
• بیدمشک (musk willow)
• الروبی
Datasets
• The exceptions list also needs to be refined
• غیر اجتماعی (unsocial)
• غیر اصولی (unprincipled)
• ناشنوایی (deafness)
• پادگان (garrison)
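One way the refinement could be approached, sketched here under assumptions: flag an exception entry for manual review when stripping a negative-looking prefix leaves a word that exists on its own. The prefix inventory, the tiny `KNOWN_WORDS` set, and the `needs_review` helper are all hypothetical stand-ins; the slides do not describe how the refinement is actually done.

```python
# Assumed prefix inventory and stand-in lexicon; neither comes from the slides' data.
NEGATIVE_PREFIXES = ["غیر", "نا", "پاد"]
KNOWN_WORDS = {"اجتماعی", "اصولی", "شنوایی"}


def needs_review(entry):
    """Return True if removing a negative prefix leaves a known word."""
    for prefix in NEGATIVE_PREFIXES:
        if entry.startswith(prefix):
            # Drop the prefix and any separating space before the lookup.
            stem = entry[len(prefix):].strip()
            if stem in KNOWN_WORDS:
                return True
    return False


exceptions = ["غیر اجتماعی", "غیر اصولی", "ناشنوایی", "پادگان"]
flagged = [word for word in exceptions if needs_review(word)]
print(flagged)
```

With this toy lexicon the first three entries are flagged, because what remains after the prefix is itself a word, while پادگان is left in the exceptions list.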
Datasets
• List of validly affixed words
How does the algorithm work?
[Flowchart: the INPUT word passes through three checks (negatives list, exception list, affix searching); each branch ends in "Negative" or "Not Negative".]
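The three checks in the flowchart can be sketched as a short function. The order of the checks (negatives list first, then exception list, then affix search) is an assumption read off the slide labels, and the word lists, the prefix inventory, and the `classify` name are illustrative stand-ins rather than the authors' data or implementation.

```python
# Tiny illustrative stand-ins, not the datasets described in the slides.
NEGATIVES = {"بد"}                            # known negative words ("bad")
EXCEPTIONS = {"نارنگی", "بیدمشک", "ناشتا"}     # look affixed but are not negative
NEGATIVE_PREFIXES = ("نا", "بی", "غیر", "ضد")  # assumed affix inventory


def classify(word):
    """Label a word 'Negative' or 'Not Negative' using three checks in order."""
    if word in NEGATIVES:
        return "Negative"          # found in the negatives list
    if word in EXCEPTIONS:
        return "Not Negative"      # affix-like start, but a listed exception
    if word.startswith(NEGATIVE_PREFIXES):
        return "Negative"          # affix search found a negative prefix
    return "Not Negative"


for word in ["بد", "نارنگی", "ناشنوایی", "کتاب"]:
    print(word, classify(word))
```

Checking the exception list before the affix search is what keeps words such as نارنگی from being labeled negative just because they begin with a prefix-like string.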
Further development
• Creating a database of affixed but positive words
• بی پروا (fearless)
• ضدآب (waterproof)
• Using Elasticsearch as the database to make the process faster
• Using a corpus instead of FLexicon
• Using statistical approaches to increase accuracy
Further Research
• The datasets are still not reliable enough. They need further
work so that the accuracy of the algorithm improves.
