Sentiment analysis of sentences in the Serbian language

Nikola Milošević

Why analyze sentiment in Serbian?
● Great industrial need
  – Automated market research
  – Ad websites
  – Customer satisfaction
● NLP tools for Serbian are underdeveloped
  – Need for tools and resources
  – Almost no tools accessible through an API

Serbian language
● Belongs to the Indo-European language family
● Slavic language
● Highly inflectional
● 3 pronunciation types
● 3 dialect groups
● Write as you speak
● Latin and Cyrillic writing systems
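
Because the same Serbian text can appear in either script, a preprocessing pipeline would probably normalize to one of them before counting words. The slides do not say how this was handled, so the following is only a minimal sketch under that assumption; the function name `to_latin` is hypothetical. It maps Serbian Cyrillic to Latin, which is the unambiguous direction because the Latin digraphs lj, nj and dž each correspond to a single Cyrillic letter.

```python
# Serbian Cyrillic -> Latin mapping for the 30 lowercase letters.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def to_latin(text):
    """Transliterate Serbian Cyrillic characters to Latin; leave others as-is.

    Hypothetical helper, not taken from the slides.
    """
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            lat = CYR_TO_LAT[lower]
            # Uppercase Cyrillic letter: capitalize the Latin result ("Љ" -> "Lj").
            out.append(lat.capitalize() if ch != lower else lat)
        else:
            out.append(ch)
    return "".join(out)
```

After this step, "добар" and "dobar" count as the same word in any later bag-of-words model.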

Sentiment analysis work-flow
[Figure: work-flow diagram]

Tokenization and preprocessing
● Process of breaking a stream of text up into words
● Stop-word filtering
● Negation handling
  – NE_ prefix added to words after a negation
  – Applied to all words up to the next punctuation mark
● Irregular verbs
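
The preprocessing steps above can be sketched roughly as follows. This is an illustration, not the author's implementation: the stop-word and negation lists are tiny hypothetical samples, the negated verb forms ("nije", "nisam", "nema") are one guess at what the "Irregular verbs" bullet covers, and dropping the negation word itself after it triggers the prefix is a design assumption.

```python
import re

# Hypothetical sample lists; the real lists are not given in the slides.
STOP_WORDS = {"i", "je", "u", "na", "da", "se"}
NEGATIONS = {"ne", "nije", "nisam", "nema"}
PUNCTUATION = {".", ",", "!", "?", ";", ":"}

# A token is either a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[.,!?;:]")


def preprocess(text):
    """Tokenize, drop stop words, and mark negated words with NE_."""
    tokens = TOKEN_RE.findall(text.lower())
    result = []
    negated = False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False      # negation scope ends at punctuation
            continue
        if tok in NEGATIONS:
            negated = True       # start prefixing the following words
            continue
        if tok in STOP_WORDS:
            continue
        result.append("NE_" + tok if negated else tok)
    return result


print(preprocess("Film nije dobar, ali je muzika odlična."))
# ['film', 'NE_dobar', 'ali', 'muzika', 'odlična']
```

Marking negated words this way lets a bag-of-words classifier treat "dobar" and "NE_dobar" as different features.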

Stemming
● Process of reducing inflected words to their stem, base or root form
● Kešelj and Šipka (2008)
● Hand-crafted rule-based stemmer
● ~300 rules
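
To make the idea of a rule-based stemmer concrete, here is a minimal suffix-stripping sketch. The handful of rules below is hypothetical and chosen only for illustration; it is not the rule set from Kešelj and Šipka (2008) or from this work, and the minimum stem length is likewise an assumption.

```python
# Hypothetical suffix rules: (suffix, replacement). Illustrative only.
RULES = [
    ("ovima", ""), ("ama", ""), ("ima", ""), ("om", ""),
    ("a", ""), ("e", ""), ("i", ""), ("u", ""),
]
# Try longer suffixes first so "ama" wins over "a".
RULES_BY_LENGTH = sorted(RULES, key=lambda rule: len(rule[0]), reverse=True)

MIN_STEM_LENGTH = 3  # assumed guard so very short words are left untouched


def stem(word):
    """Apply the first (longest) matching suffix rule, if any."""
    for suffix, replacement in RULES_BY_LENGTH:
        stem_length = len(word) - len(suffix)
        if word.endswith(suffix) and stem_length >= MIN_STEM_LENGTH:
            return word[:stem_length] + replacement
    return word


for w in ["knjiga", "knjige", "knjigama", "gradovima", "da"]:
    print(w, "->", stem(w))
```

With these rules the three forms of "knjiga" all reduce to "knjig", while the short word "da" is left alone. Short words and irregular inflections are exactly where such simple rules tend to break down.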

Sentiment analysis
● Aim: build a binary sentiment analyzer
● General Serbian language
● No annotated corpus for Serbian
● Annotation work (~1000 small texts)
● Supervised machine learning

Naive Bayes
● Algorithm that learns fast
● Bag-of-words approach
● Assumption of conditional independence
● Laplace smoothing
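
The four points above fit together as in the following sketch of a multinomial Naive Bayes classifier. It is a generic textbook version written for illustration (the original system was built in PHP). The class name, the toy training data and the choice to skip words never seen in training are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bags of words with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # alpha=1 is Laplace smoothing
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # log P(label) + sum of log P(word | label); the sum reflects
            # the conditional independence assumption.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # assumed policy: ignore unseen words
                count = self.word_counts[label][tok]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data (already tokenized).
train_docs = [["dobar", "film"], ["odlična", "muzika"],
              ["loš", "film"], ["NE_dobar", "kraj"]]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes()
model.fit(train_docs, train_labels)
print(model.predict(["loš", "film"]))  # negative
```

Training is a single counting pass over the data, which is why the algorithm learns fast. Smoothing keeps a word that never occurred with a class from driving that class's probability to zero.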

Implementation
● Web API with presentation layer
● JSON communication
● Secured page for annotating
● Built using PHP and MySQL
● Web & Android
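
The slides do not document the API's endpoints or message format, so the following only illustrates what JSON communication between a client (web or Android) and the analyzer could look like. The field names `text` and `sentiment` and both function names are hypothetical, and the sketch uses Python rather than the PHP of the original system.

```python
import json


def build_request(text):
    """Client side: wrap the text to classify in a JSON body (assumed schema)."""
    return json.dumps({"text": text}, ensure_ascii=False)


def handle_request(body, classify):
    """Server side: parse the JSON body, classify, and return a JSON reply.

    `classify` is any function mapping a string to a label.
    """
    payload = json.loads(body)
    label = classify(payload["text"])
    return json.dumps({"text": payload["text"], "sentiment": label},
                      ensure_ascii=False)


# Stand-in classifier for the example; a real one would be the trained model.
reply = handle_request(build_request("Odličan film"), lambda text: "positive")
print(reply)
```

Passing `ensure_ascii=False` keeps Serbian letters such as č and š readable in the payload instead of escaping them.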

Results
● Stemmer
  – 90% correct on news articles
  – Smallest and most precise stemmer
  – Problems: small words, irregular inflections, voice changes
● Sentiment analyzer
  – 80% correct
  – Problems: irony, ambiguity, small training data

Future work
● Stemmer
  – Use the Snowball framework
  – Build a multi-step stemmer
● Sentiment analyzer
  – POS tagging
  – Complex negation handling
  – SVM algorithm

Thank you
● Available from http://inspiratron.org
● Contact: nikola.milosevic@postgrad.manchester.ac.uk

