Industrialize Sentiment Analysis
for Comment Moderation
Maggie Xiong
Huffington Post
Basic Comment Moderation Process

User comments on an article

Moderator publishes or rejects a comment based on a
set of guidelines

“10 commandments”

Comments for different articles come in every second.
We would need a small army to handle the moderation.
The comment should contribute to the discussion, conveying a respectful message, thought or idea, whether or not it agrees with another user or the author.
The comment should not intentionally misspell words, use non-alphabetic characters, or use extra or missing spaces to bypass moderation.
The comment should not attack, demean, belittle, or stereotype any person or group.
...
JuLiA to the Rescue

Sentiment analysis suite - JuLiA

Supports various preprocessing options

Stemming, stopword removal, etc.

Includes a number of popular ML algorithms

SVM, naïve Bayes, AdaBoost (decision tree), etc.

Uses Hadoop to parallelize the training of different models and the exploration of the parameter space

Train thousands of models with different parameter setups in parallel

Pick the winner for production

Ensemble the different winners for even higher accuracy
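
A minimal sketch of the sweep described above: enumerate every parameter combination, score each one in parallel, and keep the best. The grid values, `dummy_score`, and the thread pool are assumptions for illustration only. The thread pool merely stands in for the Hadoop cluster, and a real scoring function would train a model and return its holdout accuracy.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical search space; names and values are illustrative, not JuLiA's.
GRID = {
    "algorithm": ["svm", "naive_bayes", "adaboost"],
    "stemming": [False, True],
    "remove_stopwords": [False, True],
}


def expand(grid):
    """Yield one dict per combination of parameter values."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def sweep(grid, train_and_score, workers=4):
    """Score every configuration in parallel; return (best_config, best_score)."""
    configs = list(expand(grid))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(train_and_score, configs))
    best = max(range(len(configs)), key=scores.__getitem__)
    return configs[best], scores[best]


def dummy_score(config):
    """Placeholder score: counts enabled options. Not a real accuracy."""
    return int(config["stemming"]) + int(config["remove_stopwords"])


best_config, best_score = sweep(GRID, dummy_score)
```

Swapping `dummy_score` for a function that trains on the training split and evaluates on the holdout turns this into a "pick the winner" loop.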
Training Data

Goldset

About 20,000 comments (~13,000 train, ~7,000 holdout)

Publish-or-reject votes from 3 moderators
Christian and Gay? One Politician's Personal Interview (VIDEO)
I'm curious if you have ever watched the film "For The Bible Tells Me So" or if you have read the book "Torn" by Justin Lee. Bottom line: Biblical interpretation varies. If that's your interpretation of the scripture then make sure you abide by it.
Rick Santorum On Middle Class: 'That's Marxism Talk,' 'There's No Class In America'
what an angry petty little man he is. issues too. lots of issues he needs to work on. He certainly has nothing of value to offer or to say. he's a screwed up little prick
Paul Ryan Spending Cuts Face Backlash From Moderate Republicans
You seem to take a negative view of democrats and draw reference to a study "I co-authored with Robert Book".....sort of like a Muslim professor writing a book on Christianity your biases disqualify you from offering anything other than a self serving opinion....now of course I'm just using republican/fox news logic here"
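
One way the three moderator votes could be turned into a single label, and the goldset split into train and holdout, is sketched below. The slides do not say how votes were combined, so the majority rule, the encoding 1 = publish / 0 = reject, and the function names are all assumptions. The 0.35 holdout fraction simply mirrors the roughly 7,000-of-20,000 split.

```python
import random


def majority_label(votes):
    """Return 1 if most votes are 1, else 0 (assumed: 1 = publish, 0 = reject)."""
    if len(votes) % 2 == 0:
        raise ValueError("need an odd number of votes to avoid ties")
    return int(sum(votes) * 2 > len(votes))


def split(examples, holdout_fraction=0.35, seed=0):
    """Shuffle deterministically and return (train, holdout)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * holdout_fraction)
    return shuffled[:cut], shuffled[cut:]


train, holdout = split(range(20))
```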
Training Process
Train Request (a parameter set per line)
73 923 balanced_winnow 5 1 10 …
73 923 balanced_winnow 5 2 10 …
73 923 balanced_winnow 5 3 10 …
73 923 balanced_winnow 5 1 20 …
73 923 balanced_winnow 5 2 20 …
73 923 balanced_winnow 5 3 20 …
…

Training Data
Investments are taxed as capital gains..... 1
It was the overleveraged and underregulated banks … 1
I am afraid we may be headed for … 1
In the famous words of Homer Simpson, “it takes 2 to lie …” 0
…

Hadoop Cluster
Model 1, Model 2, Model 3, Model 4, Model 5, Model k
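
The train request shown above can be generated mechanically. The sketch below reproduces the visible part of the sample lines by varying the last two shown fields. What each field means is not stated in the slides, so the names `fast_values` and `slow_values` are purely descriptive, and the fields elided by "…" are left out. `parse_example` assumes each training line ends with a whitespace-separated integer label, as in the sample.

```python
from itertools import product


def train_request_lines(prefix, fast_values, slow_values):
    """Build one parameter line per combination; `fast_values` cycles first."""
    lines = []
    for slow, fast in product(slow_values, fast_values):
        lines.append(f"{prefix} {fast} {slow}")
    return lines


def parse_example(line):
    """Split a training line into (comment_text, label); label is the last token."""
    text, label = line.rsplit(None, 1)
    return text, int(label)


lines = train_request_lines("73 923 balanced_winnow 5", [1, 2, 3], [10, 20])
```

Each generated line would then be handed to one training task, which is the fan-out the diagram depicts.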
Results

Single best model: Naïve Bayes
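
For readers unfamiliar with the technique, here is a compact multinomial naïve Bayes text classifier with add-one smoothing. The slides do not say which variant or features the winning model used, so this is a generic textbook sketch, and the four training strings are invented toy data rather than goldset comments.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            logp = math.log(n / total)  # class prior
            for word in doc.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    logp += math.log((counts[word] + self.alpha) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Toy data, assumed encoding: 1 = publish, 0 = reject.
docs = ["thanks for the thoughtful article", "great point well argued",
        "you are an idiot", "what an idiot"]
labels = [1, 1, 0, 0]
model = NaiveBayes().fit(docs, labels)
```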
Results

Model decision on goldset approved comments

Model decision on goldset rejected comments
Pool for Better Results

Logistic regression using multiple model results
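
A minimal sketch of pooling: the decisions of several base models become the input features of a logistic regression, which learns how much to trust each one. The plain gradient-descent trainer, the learning rate, and the synthetic decision table below are assumptions for illustration; the slides do not describe how the pooling model was fitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on log loss; each row holds base-model outputs."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


def pooled_probability(weights, bias, scores):
    """Probability of the positive class given the base-model outputs."""
    return sigmoid(bias + sum(w * s for w, s in zip(weights, scores)))


# Synthetic example: 0/1 decisions from three hypothetical base models.
rows = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1],
        [0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
weights, bias = fit_logistic(rows, labels)
```

In practice the pooling model should be fitted on predictions for comments the base models did not train on, otherwise it learns to trust overfitted models.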
Pool for Better Results

Model decision on goldset approved comments

Model decision on goldset rejected comments
Further Steps

Improve the training data set

Data gathered within moderators' normal workflow

More votes per comment

More comments

Per-vertical models

Incorporate comment-to-article similarity
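
One simple way to measure comment-to-article similarity is cosine similarity over bag-of-words counts, sketched below. The slides do not specify a similarity measure, so the tokenizer, the raw-count weighting, and the function names are assumptions; TF-IDF weighting would be a natural refinement.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(comment, article):
    """Cosine of the angle between the two token-count vectors (0.0 if either is empty)."""
    va, vb = bag_of_words(comment), bag_of_words(article)
    dot = sum(count * vb[token] for token, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

The resulting score could be fed to the classifier as one more feature alongside the comment text itself.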
In addition to saving his own life, Zimmerman likely save a couple other lives as well.
Thanks!

Conversation and Machine Learning teams

We are hiring!
– maggie.xiong@huffingtonpost.com
