NLP
Sentiment Analysis
Sentiment analysis
What is sentiment analysis?
Sentiment analysis is contextual mining of text that identifies and
extracts subjective information, helping a business understand the
social sentiment of its brand, product or service.
In other words, it is the process of determining whether a piece of
writing is positive, negative or neutral.
Applications of sentiment analysis
Sources of data: Twitter, Facebook, surveys, product reviews, etc.
Applications:
1.) Fashion: accessories, apparel, outlets, designing, brands, etc.
2.) Automobile: types of pre-owned cars, features, requirements, etc.
3.) Books, malls and stores, online services, travel, healthcare, etc.
Rupak Roy
Sentiment analysis: 1. Naïve Bayes
Machine Learning Classification Methods
1) Naïve Bayes: this supervised classification method uses Bayes' rule,
so it depends on the "bag of words" of a document.
Example word counts: Office 1, Traffic 3, Time 2, Early 1, Late 2
* A bag of words (BoW) is a collection of words that discards
grammar and word order but keeps the multiplicity of each word.
It is a way of extracting features from text for use in machine learning
modeling.
Rupak Roy
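Below is a minimal bag-of-words sketch, assuming scikit-learn is available; the document text is made up for illustration and only loosely echoes the counts on the slide.

from sklearn.feature_extraction.text import CountVectorizer

# A made-up document for illustration; vocabulary loosely mirrors the slide.
docs = ["Left office late, traffic traffic traffic, no time to reach on time, "
        "start early, not late tomorrow"]

vectorizer = CountVectorizer()          # lowercases and tokenizes by default
bow = vectorizer.fit_transform(docs)    # sparse matrix of word counts

# Show each word with its count (grammar and order are discarded, counts kept).
print(dict(zip(vectorizer.get_feature_names_out(), bow.toarray()[0])))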
Recap: Naive Bayes Rule
The Naive Bayes algorithm was widely used in spam filtering. The
algorithm takes the count of a particular word as it appears in the spam
list and in normal mail, then combines the two probabilities using the
Bayes equation.
[Diagram: a good-word list and a spam list with example word counts
(Great 235, Opportunities 3, Speak 44, Meeting 246, Collaborative 3,
Sales 77, Scope 98, 100% 642, Fast 78, Hurry 40); an incoming mail
("hello") is scored with Bayes' rule and classified, here as Not Spam]
P(A|B) = P(B|A) P(A) / P(B)
Later, spammers figured out how to trick spam filters by adding lots of
"good" words at the end of the email; this method is
called Bayesian poisoning.
Rupak Roy
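The following is a hedged sketch of that word-count scoring idea; the counts, priors and the +1 (Laplace) smoothing are all illustrative assumptions, not values from the slide.

# Made-up word counts for the spam and ham (normal mail) lists.
spam_word_counts = {"hurry": 40, "fast": 78, "100%": 642, "meeting": 5}
ham_word_counts = {"meeting": 246, "great": 235, "speak": 44, "hurry": 2}

spam_total = sum(spam_word_counts.values())
ham_total = sum(ham_word_counts.values())
vocab = set(spam_word_counts) | set(ham_word_counts)
p_spam, p_ham = 0.5, 0.5                      # assumed priors

def spam_probability(words):
    """Multiply per-word probabilities (with +1 smoothing) and apply Bayes' rule."""
    ps, ph = p_spam, p_ham
    for w in words:
        ps *= (spam_word_counts.get(w, 0) + 1) / (spam_total + len(vocab))
        ph *= (ham_word_counts.get(w, 0) + 1) / (ham_total + len(vocab))
    return ps / (ps + ph)                     # P(spam | words), normalized

print(spam_probability(["hurry", "fast"]))     # high -> likely spam
print(spam_probability(["great", "meeting"]))  # low  -> likely not spam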
Recap: Naive Bayes Rule
It ignores a few things: word order and document length. It just looks at
word frequencies to do the classification.
Naïve Bayes strengths & weaknesses
Advantage:
Being a supervised classification algorithm, it is easy to implement.
Weakness:
It breaks in funny ways. Early on, a Google search for "Chicago Bulls"
returned animals rather than the basketball team, because phrases that
comprise multiple words with distinct meanings don't work with Naïve
Bayes. It also requires a categorical variable as the target.
Assumptions: bag of words, so word position doesn't matter;
conditional independence, e.g. 'great' occurring does not depend on the
word 'fabulous' occurring in the same document.
Rupak Roy
Recap: Naive Bayes Rule
Prior probability of Green = no. of green objects / total no. of objects
Prior probability of Red = no. of red objects / total no. of objects
Green: 40/60 = 4/6
Red: 20/60 = 2/6
The prior probability is computed without any knowledge about the point;
the likelihood is computed after seeing where the data point lies.
Likelihood of 'x' given Red = no. of red points in the neighborhood of 'x' /
total no. of red points
Likelihood of 'x' given Green = no. of green points in the neighborhood of
'x' / total no. of green points
Posterior probability of 'x' being Green = prior probability of Green ×
likelihood of 'x' given Green = 4/6 × 1/40 = 1/60 ≈ 0.017
Posterior probability of 'x' being Red = prior probability of Red × likelihood
of 'x' given Red = 2/6 × 3/20 = 1/20 = 0.05
Prior probability × test evidence (likelihood) ∝ posterior probability
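A minimal sketch of that computation in Python, using the counts from the example above (40 green and 20 red points overall, 1 green and 3 red in the neighborhood of x):

# Counts taken from the green/red example above.
total_green, total_red = 40, 20
green_near_x, red_near_x = 1, 3            # points in the neighborhood of x

prior_green = total_green / (total_green + total_red)   # 4/6
prior_red = total_red / (total_green + total_red)       # 2/6
likelihood_green = green_near_x / total_green           # 1/40
likelihood_red = red_near_x / total_red                 # 3/20

posterior_green = prior_green * likelihood_green        # ≈ 0.017
posterior_red = prior_red * likelihood_red              # 0.05

# The class with the larger (unnormalized) posterior wins.
print("x is classified as", "Red" if posterior_red > posterior_green else "Green")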
Recap: Naive Bayes Rule
Finally, we classify 'x' as Red since that class membership achieves the
largest posterior probability.
Formula to remember: posterior ∝ prior × likelihood.
In Naïve Bayes we simply take the class with the maximum posterior and
convert that into the Yes/No classification.
Rupak Roy
Recap: Naive Bayes Rule
Word probabilities:
Marty: Love = .1, Deal = .8, Life = .1
Alica: Love = .5, Deal = .2, Life = .3
Assume the prior probabilities are
P(Alica) = 0.5
P(Marty) = 0.5
"Love Life": so what is the probability of who wrote this mail?
Marty: .1 × .1 × .5 = 0.005
Alica: .5 × .3 × .5 = 0.075 (it's Alica; easy to see)
"Life Deal": Marty: .1 × .8 × .5 (prior prob.) = 0.04
Alica: .3 × .2 × .5 (prior prob.) = 0.03. So it's Marty.
We can also normalize these scores into posteriors:
P(Marty|"Life Deal") = 0.04/(0.04 + 0.03) = 4/7 ≈ 57%
P(Alica|"Life Deal") = 0.03/0.07 = 3/7 ≈ 43%
(dividing by 0.04 + 0.03, i.e. 0.07, scales/normalizes the two scores to 1)
Rupak Roy
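A minimal sketch that reproduces this calculation; the word probabilities and priors are the ones given on the slide.

# Per-author word probabilities and priors from the Marty/Alica example.
word_probs = {
    "Marty": {"love": 0.1, "deal": 0.8, "life": 0.1},
    "Alica": {"love": 0.5, "deal": 0.2, "life": 0.3},
}
priors = {"Marty": 0.5, "Alica": 0.5}

def posteriors(message):
    """Score each author as prior x product of word probabilities, then normalize."""
    scores = {}
    for author, probs in word_probs.items():
        score = priors[author]
        for word in message.lower().split():
            score *= probs[word]
        scores[author] = score
    total = sum(scores.values())
    return {author: score / total for author, score in scores.items()}

print(posteriors("Life Deal"))   # {'Marty': 0.571..., 'Alica': 0.428...}
print(posteriors("Love Life"))   # Alica is far more likely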
Sentiment analysis: 1. Naive Bayes
Applying Bayes' theorem to the sentiment classifier:
P(word|class) = P(class|word) × P(word) / P(class)
e.g. P("Early"|Positive) = P(Positive|"Early") × P("Early") / P(Positive)
Rupak Roy
[Diagram: a bag of words containing "Early" and "Late", each linked to the
Positive and Negative sentiment classes, with unconditional and
conditional probabilities (70%, 30%, 80%, 20%, 20%, 30%)]
Sentiment analysis: 1. Naive Bayes
Naïve Bayes assumptions:
1. Bag-of-words assumption: word position doesn't matter.
2. Conditional independence: assume the feature probabilities are
independent given the class,
e.g. 'great' occurring does not depend on the word 'fabulous' occurring
in the same document.
So phrases that comprise multiple words with distinct meanings
don't work with Naïve Bayes.
Rupak Roy
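Putting the pieces together, here is a hedged end-to-end sketch, assuming scikit-learn is available and using made-up reviews and labels: bag-of-words features fed to a multinomial Naïve Bayes classifier.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labelled reviews (illustrative only).
reviews = ["great product, loved it",
           "fabulous quality and fast delivery",
           "terrible, arrived late and broken",
           "waste of money, very poor quality"]
labels = ["positive", "positive", "negative", "negative"]

vectorizer = CountVectorizer()                 # bag-of-words counts
X = vectorizer.fit_transform(reviews)

model = MultinomialNB()                        # Naive Bayes over word counts
model.fit(X, labels)

new_review = vectorizer.transform(["fast delivery and great quality"])
print(model.predict(new_review))               # -> ['positive']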
Sentiment analysis: 2. Decision Trees
Decision trees can separate non-linear data using linear decision surfaces.
[Diagram: a "Give a loan?" decision tree splitting on Credit History
(Good / Bad), Debt < 1000 and Time > 18, ending in a leaf probability P = .3]
Random Forest is a collection of several models, in this case a collection
of decision trees, that are used in order to increase predictive power, and
the final score is obtained by aggregating them.
 This is known as an Ensemble Method in Machine Learning.
Rupak Roy
Sentiment analysis: 2. Random Forest
Steps to build and use a random forest model:
1. Select the number of trees to be built, i.e. Ntree = N (the default N is 500).
2. Draw a bagging (bootstrap) sample from the training dataset.
3. Define mtry, i.e. the number of randomly selected
predictors/features that will be used to make each split.
4. Grow each tree until it stops improving, in other words until the error
no longer decreases.
OOB Error (Out Of Bag)
 For each bootstrap sample drawn from the training dataset, there will
be observations left behind that were not included; as we learned in the
previous chapter, the robustness to outliers and missing values comes
at the cost of setting aside some data.
 These left-out observations are called the Out of Bag (OOB) samples.
Rupak Roy
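A minimal sketch of those steps, assuming scikit-learn; the synthetic data stands in for real review features, and Ntree/mtry map to the n_estimators/max_features parameters.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for bag-of-words features (illustrative only).
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Ntree -> n_estimators, mtry -> max_features; oob_score=True uses the
# out-of-bag samples described above to estimate accuracy.
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            oob_score=True, random_state=42)
rf.fit(X, y)

print("OOB accuracy:", rf.oob_score_)
print("Feature importances:", rf.feature_importances_[:5])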
Sentiment analysis: 2. Random Forest
Advantages:
 Can handle noisy or missing data very well.
 In RF we don't need to create a separate test dataset for cross-
validation, as each tree uses roughly 63% of the observations and the
remaining (approx. 37%) for assessing the performance of the model.
 The OOB (Out Of Bag) sample therefore also works as a cross-validation
of the accuracy of a random forest model.
 Helps to identify the important variables.
Disadvantages:
 Unlike a single decision tree, the model is not easily interpretable.
 Prone to overfitting. Two common ways to avoid overfitting are
pre-pruning and post-pruning; post-pruning is usually preferable because
it prunes based on an estimate of the error.
Overfitting refers to a model that fits the training data so well that it
cannot recognize the pattern in unseen new data, and hence it negatively
impacts the performance of the model on new data.
Rupak Roy
Sentiment analysis: 2. Random Forest
Random Forest classification technique
[Diagram: the data and its features are sampled into four decision trees
(Decision Tree Sample 1-4); the trees vote Positive, Positive, Negative,
Positive, hence the overall prediction is Positive]
Rupak Roy
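A tiny sketch of that majority vote, with the four tree predictions hard-coded as in the diagram:

from collections import Counter

# Votes from the four trees in the diagram above.
tree_votes = ["Positive", "Positive", "Negative", "Positive"]

# The most common label wins the majority vote.
print(Counter(tree_votes).most_common(1)[0][0])   # -> "Positive"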
Sentiment analysis: 3. SVM
The SVM is one of the most popular classical classification methods.
It tries to draw a separating line between the data points of the two
classes with the largest margin between them.
Which is the line that best separates the data?
And why is that line the best line that separates the data?
The best line maximizes the distance to the nearest points; this distance
is named the MARGIN.
The margin is the distance between the line and the nearest points of
the two classes.
Rupak Roy
Sentiment analysis: 3. SVM
Which line here is the best line?
The first (blue) line maximizes the distance between the data points but
sacrifices a point of one class, which is called a class error. So the
second (green) line is the best line: it maximizes the distance between
the 2 classes while classifying them correctly.
A Support Vector Machine first classifies the classes correctly and then
maximizes the margin.
What about an outlier, as in the figure? How can we solve this?
SVMs are good at finding decision boundaries that maximize the distance
between classes while at the same time tolerating individual outliers.
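A minimal maximum-margin sketch, assuming scikit-learn and two made-up, linearly separable classes; the C parameter controls how much individual outliers are tolerated.

import numpy as np
from sklearn.svm import SVC

# Two small, made-up classes that are linearly separable.
X = np.array([[1, 1], [2, 1], [1, 2],      # class 0
              [5, 5], [6, 5], [5, 6]])     # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# kernel="linear" fits a maximum-margin line; a smaller C tolerates
# individual outliers at the cost of a wider, softer margin.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("Support vectors:", clf.support_vectors_)   # points defining the margin
print(clf.predict([[2, 2], [6, 6]]))              # -> [0 1]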
Sentiment analysis: 3. SVM
Non-linear data: will SVM still work? Yes!
The SVM takes the features x and y and converts them to a label
(either Blue or Red).
Add a new feature z = x² + y², where z measures the (squared) distance
from the origin. Now we have a three-dimensional space in which we can
separate the classes linearly: the blue class has small values of z and
the red class has large values.
So is this linearly separable? Yes!
The straight line in the (x, z) plane actually represents a circle in the
original (x, y) plane.
[Diagram: the original x-y scatter plot and the transformed x-z plot in
which a straight line separates the two classes]
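A minimal sketch of that trick, assuming scikit-learn and NumPy and using synthetic circular data: adding z = x² + y² makes the classes linearly separable, and a non-linear kernel does a similar mapping implicitly.

import numpy as np
from sklearn.svm import SVC

# Synthetic data: an inner disc (class 0) and an outer ring (class 1).
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.concatenate([rng.uniform(0.0, 1.0, 100),   # inner class
                        rng.uniform(2.0, 3.0, 100)])  # outer class
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.array([0] * 100 + [1] * 100)

# Add the new feature z = x^2 + y^2 (squared distance from the origin).
z = (X ** 2).sum(axis=1)
X3 = np.column_stack([X, z])

linear_on_z = SVC(kernel="linear").fit(X3, y)   # now linearly separable
print("accuracy with z feature:", linear_on_z.score(X3, y))

# An RBF kernel performs a comparable non-linear mapping implicitly.
print("accuracy with RBF kernel:", SVC(kernel="rbf").fit(X, y).score(X, y))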
Sentiment analysis: 4. Maximum Entropy
4. Maximum Entropy: a technique for learning a probability distribution
from data.
Maximum entropy models offer a clean way to combine diverse pieces
of contextual evidence in order to estimate the probability of a certain
linguistic class occurring in a document.
E.g. classify our documents into 3 classes: Positive, Negative, Neutral.
• Each document must be classified into one of the classes, so
P(positive) + P(negative) + P(neutral) = 1, i.e. 100%
• Without additional information, choose the model that makes the
fewest assumptions.
Rupak Roy
Sentiment analysis: 4. Maximum Entropy
Least assumptions = most uniform
If the word "Good" appears in the document, then
P(positive|"Good") = 0.8
Whenever one class's probability is very high, the Max Entropy model
adjusts the other classes accordingly:
P(negative|"Good") = 0.1
P(neutral|"Good") = 0.1
Maximum Entropy modeling creates a distribution that satisfies all of
these constraints while being as uniform as possible. It tries to distribute
probability equally among all the classes but also takes the constraints
into account.
So when we have more observations/constraints:
• P(Positive|"Good") = 0.8
• P(Negative|"Not Okay") = 0.7
• P(Neutral|"SoSo") = 0.3
Rupak Roy
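A hedged sketch of such a classifier: a maximum-entropy model over bag-of-words features is equivalent to multinomial logistic regression, shown here with made-up documents and scikit-learn (assumed available).

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up documents and labels for the three classes.
docs = ["good, really good", "not okay at all", "soso, nothing special", "good value"]
labels = ["positive", "negative", "neutral", "positive"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Multinomial logistic regression = a maximum-entropy classifier over these features.
maxent = LogisticRegression(max_iter=1000)
maxent.fit(X, labels)

probs = maxent.predict_proba(vectorizer.transform(["good soso"]))
print(maxent.classes_)   # class order for the probability columns
print(probs)             # the three probabilities sum to 1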
Sentiment analysis: 4. Maximum Entropy
Why a uniform distribution?
• Most uniform = maximum entropy
• Least assumptions = simplest explanation
Maximum Entropy is one of the machine learning modeling techniques in
NLP that is highly effective for classification with high accuracy.
Therefore MaxEnt is a useful and easy-to-understand tool to help
computers make decisions based on the "features" of your data.
Rupak Roy
Next
Let's perform sentiment analysis with the help of an example where
we will have product reviews.
Rupak Roy