Linguistic Component: Sentiment Analyzer for the Russian language
Technical description
SemanticAnalyzer Group, 2013-08-30
www.semanticanalyzer.info
This document describes the technical details of the sentiment analyzer for the Russian language. The
component has several modes of operation:
• Processing of generic texts: news, technical articles, etc.
• Processing of Twitter messages
• Processing of either of the above text types for generic background sentiment
• Processing of either of the above text types for a set of multi-word synonyms representing a target object
The sentiment analyzer builds on two other linguistic components: a tokenizer and a lemmatizer (see
their respective technical descriptions). Besides assigning an input message to one of three classes
{NEGATIVE, NEUTRAL, POSITIVE}, the analyzer can assess the objectivity / subjectivity of the message.
The demo package, sent upon request, contains the following:
• Java library of the sentiment analyzer in binary form
• Polarity dictionaries
• run_sentiment_engine.sh script for quickly checking the functionality of the module
• messages_to_detect_sentiment.txt file containing examples of generic texts and tweets for sentiment attribution using the run_sentiment_engine.sh script
The algorithm is based on a set of rules that compactly model the flow of sentiment within an input
message. Synonym matching can be either strict or fuzzy (accommodating misspellings of an object name
in a text).
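This document does not say how the fuzzy matching is implemented. One common way to tolerate misspellings is an edit-distance threshold, and the sketch below illustrates that idea only. The class FuzzySynonymMatcher, its methods, and the maxEdits parameter are hypothetical names chosen for illustration; they are not part of the library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative only: an assumed approach to fuzzy matching, not the library's algorithm.
public class FuzzySynonymMatcher {

    /** Levenshtein edit distance computed with two rolling rows. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True if any token is within maxEdits of any synonym, ignoring case. maxEdits = 0 means strict matching. */
    public static boolean matches(List<String> tokens, List<String> synonyms, int maxEdits) {
        for (String token : tokens) {
            String normalized = token.toLowerCase(Locale.ROOT);
            for (String synonym : synonyms) {
                if (editDistance(normalized, synonym.toLowerCase(Locale.ROOT)) <= maxEdits) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> synonyms = Arrays.asList("iphone");
        List<String> tokens = Arrays.asList("новый", "iPhon"); // misspelled object name
        System.out.println(matches(tokens, synonyms, 0)); // strict matching
        System.out.println(matches(tokens, synonyms, 1)); // fuzzy matching, one edit allowed
    }
}
```

With a threshold of 0 the misspelled token is rejected; with a threshold of 1 it is accepted. A production matcher would likely scale the threshold with word length to avoid false matches on short names.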
Speed of processing
Server: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz
Operating system: Ubuntu 10.04, Java 1.7.0_21 64-bit server
Throughput: 480 characters/ms
Throughput: 70 tokens/ms
Tests were conducted in a single thread on 63,511 tweet messages with 2,527,227 words and 17,350,258
characters. Total execution time: 36,170 ms.
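The throughput figures follow directly from the test totals. The short sketch below redoes the arithmetic; the class name ThroughputCheck and the helper perMillisecond are illustrative names, not part of the library, and the token rate is computed from the reported word count.

```java
import java.util.Locale;

public class ThroughputCheck {

    /** Items processed per millisecond. */
    public static double perMillisecond(long count, long elapsedMs) {
        return (double) count / elapsedMs;
    }

    public static void main(String[] args) {
        long characters = 17_350_258L; // total characters in the test set
        long words = 2_527_227L;       // total words (tokens) in the test set
        long tweets = 63_511L;         // number of tweet messages
        long elapsedMs = 36_170L;      // total execution time in milliseconds

        System.out.println(String.format(Locale.ROOT, "%.1f characters/ms",
                perMillisecond(characters, elapsedMs)));
        System.out.println(String.format(Locale.ROOT, "%.1f tokens/ms",
                perMillisecond(words, elapsedMs)));
        System.out.println(String.format(Locale.ROOT, "%.2f ms per tweet",
                (double) elapsedMs / tweets));
    }
}
```

Rounded to the nearest ten and nearest whole number respectively, the first two results agree with the reported 480 characters/ms and 70 tokens/ms; the last line shows that a single tweet takes well under a millisecond on this hardware.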
Format of the messages_to_detect_sentiment.txt file
This file provides the input data for the sentiment analyzer for demo purposes.
Format:
Text
OR
Text\tKeyword comma separated list
where:
Text is the textual data in Russian in which to detect sentiment
\t is the tab symbol
Keyword comma separated list is the list of object synonyms to detect sentiment against.
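For readers preparing their own input file, the following sketch shows one way a line in this format could be split into the text and the optional keyword list, using the tab character as the separator. The class DemoInputLine is a hypothetical helper written for this description; the demo script's own parsing code is not shown in this document and may differ (for example in how it trims whitespace).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the library.
public class DemoInputLine {
    public final String text;
    public final List<String> keywords;

    private DemoInputLine(String text, List<String> keywords) {
        this.text = text;
        this.keywords = keywords;
    }

    /** Splits "Text" or "Text<TAB>keyword1,keyword2,..." into its parts. */
    public static DemoInputLine parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            // No tab: the whole line is text, analyzed for background sentiment
            return new DemoInputLine(line.trim(), Collections.<String>emptyList());
        }
        String text = line.substring(0, tab).trim();
        List<String> keywords = new ArrayList<String>();
        for (String keyword : line.substring(tab + 1).split(",")) {
            String trimmed = keyword.trim();
            if (!trimmed.isEmpty()) {
                keywords.add(trimmed);
            }
        }
        return new DemoInputLine(text, keywords);
    }

    public static void main(String[] args) {
        DemoInputLine line = parse("Мне понравился новый iPhone, но вот GalaxyS неудобный.\tiPhone, айфон");
        System.out.println(line.text);
        System.out.println(line.keywords);
    }
}
```

Note that commas inside the text are safe here because only the part after the tab is split on commas.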
Examples of detecting sentiment
The run_sentiment_engine.sh script writes its results to the file messages_to_detect_sentiment.out.
For the following input file messages_to_detect_sentiment.txt:
Мне понравился новый iPhone, но вот GalaxyS неудобный. iPhone
(sentence: "I liked the new iPhone, but the GalaxyS is awkward to use", with the target object given by the
keyword "iPhone")
the following output is generated:
Мне понравился новый iPhone, но вот GalaxyS неудобный. iPhone [iphone] POSITIVE
For the following input file messages_to_detect_sentiment.txt:
Мне понравился новый iPhone, но вот GalaxyS неудобный. GalaxyS
(the same sentence, but with the target object given by the keyword "GalaxyS")
the following output is generated:
Мне понравился новый iPhone, но вот GalaxyS неудобный. GalaxyS [galaxys] NEGATIVE
Examples of using the library from the Java code
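    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    // JUnit 4 is assumed here; the original listing does not name the test framework.
    import org.junit.Test;
    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    // SentimentEngine, SynonymSentiment and Enumerations come from the sentiment analyzer
    // library; their package is not given in this document.
    // The class name below is illustrative.
    public class SentimentEngineTest {

        @Test
        public void testdetectPolarityOfText() throws Exception {
            SentimentEngine sentimentEngine =
                new SentimentEngine(new File("conf/sentiment-module.properties"));
            sentimentEngine.setVerbose(true);

            // Variants of the same brand, McCafe, as written in Russian tweets
            String[] synonyms = {"МсCafe", "maccafe", "маккафе", "\"мак кафе\"",
                                 "маккафэ", "\"мак кафэ\""};
            // Each synonym is wrapped in its own single-element list
            List<List<String>> synonymsList = new ArrayList<List<String>>();
            for (String synonym : synonyms) {
                List<String> curSynonym = new ArrayList<String>();
                curSynonym.add(synonym);
                synonymsList.add(curSynonym);
            }

            // Tweet: "We were in McCafe today! Unbelievably tasty cakes,
            // but damn, they are so big!!"
            SynonymSentiment synonymSentiment =
                sentimentEngine.detectPolarityOfTextForSynonyms(
                    "ох сегодня были в МакКафе! безумно вкусные пирожные, но блии н они ж гиганские!!",
                    synonymsList);

            assertTrue(synonymSentiment.isSynonymFound());
            assertEquals(Enumerations.Sentiment.POSITIVE, synonymSentiment.getSentimentTag());
        }
    }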
This test case should pass: the sentiment detected for the set of object synonyms is POSITIVE.
