Emotion Analysis for
Big Data
NTHU CS, Yi-Shin Chen
Hello!
I am Yi-Shin Chen
Currently in NTHU CS
Intelligent Data Engineering and
Application Lab (IDEA Lab)
You can find me at:
yishin@gmail.com
We Promote Diversity at
More than 50% of our students come from other countries
Belize
France
St Lucia
Honduras
India
China
Japan
Taiwan
Indonesia
São Tomé
1.
Why Emotion Analysis
There are a few personal reasons
“I don’t understand women!! Their
words are very vague and ambiguous”
From Carlos Argueta, my first foreign Ph.D. graduate
He is the one who chose the topic of sentiment analysis.
And the first in our lab to suffer from depression
Children are Bewildering
They don't say and they cannot say.
2.
Emotion Analysis
Let's see what others did/do
Natural Language Processing
▷Tag parts of speech (POS)
▷Understand word meaning
▷Analyze the relationships between words
• Needs dictionaries and semantic relationships
• Word positions affect the meaning of a statement
• Needs different data for different languages
Example: "This is the best thing happened in my life."
POS tags: This/Det. is/Verb the/Det. best/Adj thing/NN happened/Verb in/Pre. my/PN life/NN
Difficult
Data Mining/Machine Learning
▷Collect massive data
▷Manually annotate training data
▷Analyze data with classifiers
• Training data must be re-collected for each language
• Low recall rates (<<25%)
Easier?
3.
Learning from Experience
Difference between Reality and Practice
Emotion Embedded in Trivia
▷Most trivia are ignored in previous work
• Stop words are the first batch to be removed
→E.g., often, above, again
• Determiners and pronouns are usually ignored
• Most nouns are considered unimportant
My mom always said school is more important
😒 Angry 😂 Sad 👶 Joy
Emotional Mistakes
▷Mistakes are everywhere
• Some are careless
→E.g., Luve you
• Some are intentional
→E.g., I’m soooooooo happppppy
▷Mistakes are not recorded in dictionaries
• How to annotate mistakes?
→ Annotation costs A LOT!
Children are our mentors
Mumbling from a mom
▷My one-year-old kid can detect my emotion
• Without seeing my face
• Even when I do not change my tone
• How come she is always right?
▷She is guessing
• She does not know grammar
• She has not memorized any dictionary
• My statements might contain a lot of mistakes
Goal: Multilingual
4.
Overcome Challenges
Insufficient Research Funding
Free Resources
▷Free Data
• As long as they can be legally accessed
▷Open source software
Philosophy: Slow Life
▷Our students are often delayed for various reasons
▷We do not follow the trends
• Usually against common sense in academia
No POS tagging, no dictionary, multilingual 😱
POS tagging, multiple dictionaries, one language
Failure / Success
Teamwork
▷Implementation team
• Coding
• More coding
▷Dreaming team
• Reading papers
• Design
▷Boasting team
• Writing papers
• Preparing presentations
▷Anonymous
Crowdsourcing
Merriam-Webster: Obtaining needed services, ideas, or
content by soliciting contributions from a large group of
people, especially an online community
Cost
$$$
Subconscious Crowdsourcing
▷Crowdsourcing from the subconscious
• Free
• Extract the subconscious from daily-life records
→ Ex1: “computers/companies/product-support/apple” in Delicious tags
→ Ex2: “Trump” “Nickname generator” in search logs
→ Ex3: “School day again #sad” on Twitter
Chun-Hao Chang, Elvis Saravia, and Yi-Shin Chen. Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental Disorder Detection on Social Media. The 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016), San Francisco, CA, USA, 18-21 August 2016.
5.
Case 1: Analyze Emotions from Text
Utilize subconscious emotion patterns
Subconscious Emotion Big Data
▷Twitter, a good public source
Throwing my phone always calms me down #anger
My sister always makes things look much more worse than they seem >:[ #anger
Why my brother always crabby !?!? #rude #youranadult #anger #issues
WHY DOES MY COMPUTER ALWAYS FREEZE??? NEVER FAILS. #anger
Im wanna crazy,if my life always sucks like this. #anger
Hashtags and emoticons represent emotion well,
so they can be treated as annotated answers
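To make the idea of hashtags as free annotations concrete, here is a minimal sketch. The tag set `EMOTION_TAGS` and the helper `label_from_hashtags` are hypothetical names chosen for illustration; the slides only show #anger and #sad, and the rule of skipping tweets with zero or several emotion tags is an assumption, not the lab's documented procedure.

```python
import re

# Hypothetical set of emotion hashtags (the slides show #anger and #sad).
EMOTION_TAGS = {"anger", "sad", "joy"}

HASHTAG_RE = re.compile(r"#(\w+)")


def label_from_hashtags(tweet):
    """Return (emotion, text without the label tag), or None if not exactly one emotion tag."""
    tags = [t.lower() for t in HASHTAG_RE.findall(tweet)]
    emotions = {t for t in tags if t in EMOTION_TAGS}
    if len(emotions) != 1:
        return None  # unlabeled or ambiguous tweet (assumed rule)
    emotion = emotions.pop()
    # Strip the label hashtag so a model cannot peek at the answer.
    text = re.sub(r"#" + emotion + r"\b", "", tweet, flags=re.IGNORECASE)
    return emotion, " ".join(text.split())


print(label_from_hashtags("WHY DOES MY COMPUTER ALWAYS FREEZE??? NEVER FAILS. #anger"))
```

Tweets without a usable tag are simply skipped, which is affordable when the source is large.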
Collect Emotion Data
Wait! A control group is needed.
Not-Emotion Data
Preprocessing Steps
▷Hints: Remove troublesome tweets
o Too short
→ Too short to yield important features
o Contain too many hashtags
→ Too much information to process
o Are retweets
→ Increase the complexity
o Have URLs
→ Too much trouble to collect the page data
o Convert user mentions to <usermention> and hashtags to <hashtag>
→ Remove the identification. We should not peek at the answers!
Big Data anyway
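The preprocessing hints above can be sketched as a single filter function. The thresholds `MIN_TOKENS` and `MAX_HASHTAGS`, the "RT @" retweet test, and the function name `preprocess` are all assumptions made for this example; the slides do not give concrete values.

```python
import re

MIN_TOKENS = 4     # assumed threshold for "too short"
MAX_HASHTAGS = 3   # assumed threshold for "too many hashtags"

URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def preprocess(tweet):
    """Return the cleaned tweet, or None if the tweet should be dropped."""
    text = tweet.strip()
    if text.startswith("RT @"):                       # retweet (assumed convention)
        return None
    if URL_RE.search(text):                           # has a URL
        return None
    if len(HASHTAG_RE.findall(text)) > MAX_HASHTAGS:  # too many hashtags
        return None
    if len(text.split()) < MIN_TOKENS:                # too short
        return None
    # Hide identities and labels behind placeholder tokens.
    text = MENTION_RE.sub("<usermention>", text)
    text = HASHTAG_RE.sub("<hashtag>", text)
    return text


print(preprocess("@amy school day again #sad"))
```

Dropping rather than repairing troublesome tweets keeps the pipeline simple, in the spirit of the "Big Data anyway" remark.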
Basic Guidelines
▷ Identify the commonalities and differences between
the experimental and control groups
• Analyze the frequency of words
→ TF-IDF (term frequency-inverse document frequency)
• Analyze the co-occurrence between words/patterns
→ Co-occurrence
• Analyze the importance of words
→ Centrality
Graph
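As a rough illustration of the TF-IDF step, the sketch below scores terms with the textbook formula tf × log(N / df). Treating each group as one large document is an assumption made here for brevity, and `tf_idf` is a hypothetical helper name, not code from the lab.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy usage: one document per group (an assumption for this example).
emotion_group = "i love the world of warcraft new game".split()
control_group = "3,000 killed in the world by ebola".split()
emotion_scores, control_scores = tf_idf([emotion_group, control_group])
print(sorted(emotion_scores.items(), key=lambda kv: -kv[1])[:3])
```

Terms shared by both groups, such as "the" and "world", score zero, while terms that appear in only one group stand out.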
Graph Construction
▷Construct two graphs
• E.g.
→ Emotion one: I love the World of Warcraft new game :)
→ Not-emotion one: 3,000 killed in the world by ebola
[Figure: two weighted word graphs. The emotion graph has the nodes I, love, the, World, of, Warcraft, new, game, and :); the not-emotion graph has the nodes 3,000, killed, in, the, world, by, and ebola. Edge weights range from 0.12 to 0.93.]
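One possible way to build such word graphs is sketched below. The slides do not say how the edge weights are computed, so the normalization used here (how often word b follows word a, divided by how often a is followed by anything) is an assumption, as is the helper name `build_graph`.

```python
from collections import Counter, defaultdict


def build_graph(sentences):
    """Build a directed word graph: edge (a, b) means b directly follows a.

    Weight = count(a followed by b) / count(a followed by anything).
    This normalization is assumed for illustration only.
    """
    pair_counts = Counter()
    out_counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for a, b in zip(tokens, tokens[1:]):
            pair_counts[(a, b)] += 1
            out_counts[a] += 1
    graph = defaultdict(dict)
    for (a, b), count in pair_counts.items():
        graph[a][b] = count / out_counts[a]
    return dict(graph)


emotion_graph = build_graph(["I love the World of Warcraft new game :)"])
control_graph = build_graph(["3,000 killed in the world by ebola"])
print(emotion_graph["love"], control_graph["the"])
```

With real data, each graph would be built from many thousands of tweets, so the weights would spread between 0 and 1 as in the figure.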
Graph Processes
▷Remove the parts common to both graphs
• Keep only the significant ones that appear only in the
emotion graph
▷Analyze the centrality of words
• Betweenness, Closeness, Eigenvector, Degree, Katz
→ Free/open-source software can be used, e.g., Gephi, GraphDB
▷Analyze the degree of clustering
• Clustering coefficient
Graph → Key patterns
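The graph subtraction and a centrality measure can be sketched in plain Python. Only degree centrality is shown; the other measures listed above are better left to a graph tool. The helper names `subtract_graph` and `degree_centrality` and the toy graphs are assumptions for this example, and the rule of dropping an edge whenever the same edge exists in the control graph is a simplification.

```python
def subtract_graph(emotion_graph, control_graph):
    """Keep only the edges of emotion_graph that do not appear in control_graph."""
    result = {}
    for a, targets in emotion_graph.items():
        kept = {b: w for b, w in targets.items()
                if b not in control_graph.get(a, {})}
        if kept:
            result[a] = kept
    return result


def degree_centrality(graph):
    """Undirected degree of each node divided by (number of nodes - 1)."""
    neighbors = {}
    for a, targets in graph.items():
        for b in targets:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Toy graphs in the {source: {target: weight}} form (invented for illustration).
emotion = {"i": {"love": 1.0}, "love": {"the": 1.0},
           "the": {"world": 1.0}, "world": {"of": 1.0}}
control = {"in": {"the": 1.0}, "the": {"world": 1.0}}

key_graph = subtract_graph(emotion, control)
centrality = degree_centrality(key_graph)
print(key_graph)
print(centrality)
```

The shared edge "the → world" disappears, and "love" ends up as the most connected word in what remains.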
Essence Only
Only key phrases
→emotion patterns
32
Ranking Emotion Patterns
▷ Rank the emotion patterns for each emotion
• Frequency, exclusiveness, diversity
• One ranked list for each emotion
Joy, Sad, Anger
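The slides name three ranking criteria but not their formulas, so the following is only a sketch under stated assumptions: frequency is the number of matches within an emotion, exclusiveness is the share of a pattern's matches that fall in that emotion, diversity is the number of distinct words filling the wildcard, and the three are multiplied. The function `rank_patterns` and the toy counts are hypothetical.

```python
def rank_patterns(matches):
    """matches: {emotion: {pattern: [wildcard fillers seen]}}.

    Returns {emotion: [patterns sorted by descending score]}.
    The scoring formula is an assumption, not the published one.
    """
    totals = {}
    for per_pattern in matches.values():
        for pattern, fillers in per_pattern.items():
            totals[pattern] = totals.get(pattern, 0) + len(fillers)
    ranked = {}
    for emotion, per_pattern in matches.items():
        scored = []
        for pattern, fillers in per_pattern.items():
            frequency = len(fillers)
            if frequency == 0:
                continue
            exclusiveness = frequency / totals[pattern]
            diversity = len(set(fillers))
            scored.append((frequency * exclusiveness * diversity, pattern))
        # Highest score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        ranked[emotion] = [pattern for _, pattern in scored]
    return ranked


# Toy match data, invented for illustration.
toy_matches = {
    "joy": {"* yay !": ["party", "birthday", "party"], "my * always": ["mom"]},
    "anger": {"my * always": ["sister", "brother", "computer"],
              "shut the *": ["door"]},
}
ranking = rank_patterns(toy_matches)
print(ranking)
```

In this toy data "my * always" occurs under both emotions but ranks first only for anger, where it is frequent, mostly exclusive, and filled by varied words.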
Emotion Pattern Samples
Joy
finally * my
tomorrow !!! *
<hashtag> birthday .+
* yay !
:) * !
princess *
* hehe
prom dress *
Sad
memories *
* without my
sucks * <hashtag>
* tonight :(
* anymore ..
felt so *
. :( *
* :((
Anger
my * always
shut the *
teachers *
people say *
-.- *
understand why *
why are *
with these *
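To show how such patterns might be applied to new text, here is a minimal matcher. It assumes that `*` stands for exactly one arbitrary token and that all other pattern tokens must match literally; the slides do not define the wildcard precisely. The names `matches` and `score_emotions` are hypothetical.

```python
def matches(pattern, tokens):
    """True if the pattern occurs in tokens; '*' matches exactly one token (assumed)."""
    parts = pattern.split()
    for start in range(len(tokens) - len(parts) + 1):
        window = tokens[start:start + len(parts)]
        if all(p == "*" or p == t for p, t in zip(parts, window)):
            return True
    return False


def score_emotions(text, patterns_by_emotion):
    """Count how many patterns of each emotion occur in the text."""
    tokens = text.lower().split()
    return {emotion: sum(matches(p, tokens) for p in patterns)
            for emotion, patterns in patterns_by_emotion.items()}


# A few patterns taken from the samples above.
sample_patterns = {
    "joy": ["finally * my", "* yay !"],
    "sad": ["felt so *", "* without my"],
    "anger": ["my * always", "shut the *"],
}
scores = score_emotions("My sister always makes things look worse", sample_patterns)
print(scores)
```

No part-of-speech tagger or dictionary is involved, which is what lets the same procedure run on other languages.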
Precision
Language   Naïve Bayes   SVM      NRCWE    Our Approach
English    81.90%        76.60%   35.40%   81.20%
Spanish    70.00%        52.00%   0.00%    80.00%
French     72.00%        61.00%   0.00%    84.00%
[Chart: accuracy with LIWC vs. without LIWC]
Feedback for Products
Product Preference Analysis
6.
Case 2: Analyze Emotion Status for Individuals
Who has bipolar disorder?
Who has borderline personality disorder?
Collect Patient Data
• Support groups
• Followers
Wait! A control group is needed.
Collect Data from Ordinary People
Basic Guidelines
▷ Identify the commonalities and differences between
the experimental and control groups
• Word/pattern frequency
• Emotion-related data (e.g., flipping rates, occurrence rates)
• Social interaction (e.g., retweets, replies)
• Lifestyle (e.g., time spent online, staying up late or not)
• Age and gender
Features
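Two of the emotion-related features can be sketched as follows. The definitions are assumptions, since the slides only name the features: the occurrence rate is taken as the share of a user's posts carrying each emotion, and the flipping rate as the share of consecutive post pairs whose emotion differs. The function names and the toy timeline are invented for illustration.

```python
from collections import Counter


def occurrence_rates(labels):
    """Share of posts carrying each emotion label."""
    total = len(labels)
    if total == 0:
        return {}
    return {emotion: count / total for emotion, count in Counter(labels).items()}


def flipping_rate(labels):
    """Share of consecutive post pairs whose emotion label changes (assumed definition)."""
    if len(labels) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return flips / (len(labels) - 1)


# Emotion labels of one user's posts in time order (toy data).
timeline = ["joy", "sad", "joy", "joy", "anger"]
print(occurrence_rates(timeline), flipping_rate(timeline))
```

Both values are computed per user, so they can sit in the same feature vector as the social and lifestyle features.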
Apply Classifiers
▷ Use the extracted features
▷ Various classifiers
• Neural networks
• Naïve Bayes and Bayesian belief networks
• Support vector machines
• Random forests
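As one example of feeding extracted features to a classifier, below is a small Gaussian Naïve Bayes written with the standard library only. It is a generic textbook version, not the lab's implementation; the class name, the two features, and all numbers in the toy data are invented for illustration.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes for numeric feature vectors."""

    def fit(self, rows, labels):
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        total = len(rows)
        self.stats = {}
        for label, members in grouped.items():
            columns = list(zip(*members))
            means = [sum(col) / len(col) for col in columns]
            # A small constant keeps the variance positive for constant features.
            variances = [sum((x - m) ** 2 for x in col) / len(col) + 1e-9
                         for col, m in zip(columns, means)]
            self.stats[label] = (math.log(len(members) / total), means, variances)
        return self

    def predict(self, row):
        def log_posterior(label):
            prior, means, variances = self.stats[label]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                for x, m, v in zip(row, means, variances))
        return max(self.stats, key=log_posterior)


# Toy feature vectors: [flipping rate, share of late-night posts] (invented values).
rows = [[0.8, 0.6], [0.7, 0.5], [0.2, 0.1], [0.3, 0.2]]
labels = ["experimental", "experimental", "control", "control"]
model = GaussianNaiveBayes().fit(rows, labels)
print(model.predict([0.75, 0.55]), model.predict([0.2, 0.1]))
```

The same feature vectors could be passed to any of the other classifier families listed above.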
Precisions
Possible Applications
Election Analysis?
More in the future…
Thank you.
Contact me at:
yishin@gmail.com


2016 datascience emotion analysis - english version