Sentiment analysis for Serbian language

Sentiment analysis of sentences in the Serbian language

Nikola Milošević

Why analyze sentiment in Serbian?
● Great industrial need
  – Automated market research
  – Ad websites
  – Customer satisfaction
● NLP tools for Serbian are underdeveloped
  – Need for tools and resources
  – Almost no tools accessible through an API

Serbian language
● Belongs to the Indo-European language family
● Slavic language
● Highly inflectional
● 3 pronunciation types
● 3 dialect groups
● Write as you speak
● Latin and Cyrillic writing systems

Tokenization and preprocessing
● Tokenization: breaking a stream of text up into words
● Stop-word filtering
● Negation handling
  – NE_ prefix added to words that follow a negation
  – Applied to all words up to the next punctuation mark
● Irregular verbs
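
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the code behind the slides: the stop-word list, the negation cues (including negated irregular verb forms such as "nije"), the tokenizer pattern, and the function name preprocess are all assumptions chosen for the example. Dropping the negation cue itself is also a choice made for this sketch.

```python
import re

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"i", "je", "u", "na", "da", "se", "su", "za", "ali"}
# Assumed negation cues, including negated forms of irregular verbs.
NEGATIONS = {"ne", "nije", "nisam", "nema", "neću"}
PUNCTUATION = set(".,!?;:")

# Words (Unicode-aware) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[.,!?;:]")


def preprocess(text):
    """Tokenize, drop stop words, and mark negated words with NE_."""
    result = []
    negated = False
    for tok in TOKEN_RE.findall(text.lower()):
        if tok in PUNCTUATION:
            negated = False  # negation scope ends at punctuation
            continue
        if tok in NEGATIONS:
            negated = True  # start marking the words that follow
            continue
        if tok in STOP_WORDS:
            continue
        result.append("NE_" + tok if negated else tok)
    return result


print(preprocess("Film nije dobar, ali muzika je odlična."))
```

With this scheme "dobar" and "NE_dobar" become distinct features, so a bag-of-words classifier can learn opposite polarities for them.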

Stemming
● Process for reducing inflected words to their stem, base or root form
● Kešelj and Šipka (2008)
● Hand-crafted, rule-based stemmer
● ~300 rules
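
A rule-based stemmer of this kind can be pictured as longest-match suffix stripping. The sketch below only shows the general idea: the handful of suffixes, the minimum stem length, and the function name stem are assumptions made for illustration, and they are not the roughly 300 rules of the stemmer described on this slide.

```python
# Illustrative suffixes only; NOT the actual rule set of the stemmer above.
SUFFIXES = ["ovima", "ama", "ima", "om", "og", "em", "e", "a", "u", "i", "o"]
# Assumed lower bound so that very short words are left untouched.
MIN_STEM_LENGTH = 3

# Try longer suffixes first so "ama" wins over "a".
_ORDERED_SUFFIXES = sorted(SUFFIXES, key=len, reverse=True)


def stem(word):
    """Strip the longest matching suffix, keeping at least MIN_STEM_LENGTH chars."""
    word = word.lower()
    for suffix in _ORDERED_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


for form in ["knjiga", "knjige", "knjigom", "knjigama"]:
    print(form, "->", stem(form))
```

All four inflected forms of "knjiga" (book) collapse to the same stem here, which is what shrinks the feature space for a highly inflectional language.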

Sentiment analysis
● Aim: build a binary sentiment analyzer
● General Serbian language
● No annotated corpus for Serbian
● Annotation work (~1,000 short texts)
● Supervised machine learning

Naive Bayes
● Algorithm that learns fast
● Bag of words approach
● Assumption of conditional independence
● Laplace smoothing
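
To make the points above concrete, here is a minimal sketch of a multinomial Naive Bayes classifier over bags of words with Laplace (add-one) smoothing. The class name, method names, and toy training data are assumptions for illustration; skipping words never seen in training is one common choice, not necessarily what the original system does.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant; 1.0 is Laplace smoothing
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, docs):
        """docs is an iterable of (tokens, label) pairs."""
        for tokens, label in docs:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def log_scores(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # assumption: ignore words unseen in training
                count = self.word_counts[label][tok]
                # Conditional independence: per-word log likelihoods are summed.
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Toy data, invented for this example.
model = NaiveBayes()
model.train([
    (["dobar", "film"], "pos"),
    (["odličan", "film"], "pos"),
    (["loš", "film"], "neg"),
    (["NE_dobar", "film"], "neg"),
])
print(model.predict(["dobar"]))
print(model.predict(["NE_dobar", "film"]))
```

Training is a single counting pass over the data, which is why the algorithm learns fast, and the smoothing keeps a word that never occurred with a class from driving that class's probability to zero.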

Implementation
● Web API with presentation layer
● JSON communication
● Secured page for annotating
● Built using PHP and MySQL
● Web & Android

Results
● Stemmer
  – 90% correct on news articles
  – Smallest and most precise stemmer
  – Problems: small words, irregular inflections, sound changes
● Sentiment analyzer
  – 80% correct
  – Problems: irony, ambiguity, small training data

Future work
● Stemmer
  – Use the Snowball framework
  – Build multi-step stemmer
● Sentiment analyzer
  – POS tagging
  – Complex negation handling
  – SVM algorithm

Thank you
● Available from http://inspiratron.org
● Contact: nikola.milosevic@postgrad.manchester.ac.uk
