Hotel Rating Classification
Team Members:
• Ashwini Salwadgi
• Anuja Borse
• Shubham Pawar
• Akshay Kumar
• Rudra Shukla
• Devesh Gaonkar
• Pranjalee Bokde
CONTENTS
01 Project Architecture
02 Problem statement
03 Introduction to ML classification
04 Dataset Details
05 Data Preprocessing and EDA
07 Feature engineering
08 Model Selection
09 Deployment
Project Architecture
Datasets
-Import Libraries
-Load Datasets
Data Cleaning
-Missing Value Treatment
-Checking Duplicates
Data Preprocessing/EDA
-Normalization & Lemmatization
-Punctuation Removal & Stopwords Removal
Data Visualization
-Positive Reviews
-Negative Reviews
Model Selection
Model Deployment
Problem statement
The hotel dataset consists of 20,491 reviews, each with a feedback label,
covering different hotels. Our goal is to examine how travelers communicate
their positive and negative experiences of a hotel stay on online platforms.
The main objective is to identify the attributes travelers consider when
selecting a hotel. With this, managers can understand which elements of their
hotel most influence positive reviews and use them to improve the hotel's
brand image.
Introduction to NLP
• Natural Language Processing (NLP) is a field of computer science and
artificial intelligence focused on the interaction between computers and
human languages.
• It involves programming computers to process, analyze, and derive
meaning from large amounts of natural language data.
• NLP is applied in various areas, including automatic question answering,
text summarization, and language translation.
• Research in NLP spans disciplines such as cognitive science,
linguistics, and psychology.
• One significant application of NLP is text classification, where the goal is
to categorize text into predefined labels based on its content.
NLP classification
Text Classification:
• Text classification is a common NLP task used to solve business problems
in various fields.
• The goal is to predict the class of unseen text documents, usually with
supervised machine learning.
• It works like classification on a tabular dataset: a model is trained on
labeled examples and then predicts a class. The main difference is that the
input is text, which must first be converted into numeric features.
Dataset Details
• Hotel_Review.csv is the dataset used in this project.
• No. of rows: 20,491
• No. of columns: 2
Data Preprocessing
• Data types: both the “Review” and “Feedback” columns are of object type.
• Null value treatment: checked for missing values; none found.
• Duplicate value treatment: checked for duplicates; none found.
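The three checks above can be sketched with pandas roughly as follows. This is an illustration under assumptions, not the project's actual notebook: the column names `Review` and `Feedback` come from the slide, the tiny in-memory sample stands in for Hotel_Review.csv, and the helper name `basic_checks` is hypothetical.

```python
import io

import pandas as pd

# Tiny stand-in for Hotel_Review.csv (the real file has 20,491 rows, 2 columns).
SAMPLE_CSV = io.StringIO(
    "Review,Feedback\n"
    "nice hotel great location,positive\n"
    "dirty room rude staff,negative\n"
    "lovely stay friendly staff,positive\n"
)


def basic_checks(df: pd.DataFrame) -> dict:
    """Summarise shape, column dtypes, missing values and duplicate rows."""
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": int(df.isnull().sum().sum()),     # total missing cells
        "duplicates": int(df.duplicated().sum()),    # fully repeated rows
    }


df = pd.read_csv(SAMPLE_CSV)
report = basic_checks(df)
print(report)
```

On the real file, the same helper would be called on `pd.read_csv("Hotel_Review.csv")`, assuming the file sits in the working directory.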
Text Preprocessing
• Normalization: converted text to lower case
• Punctuation removal: removed unnecessary punctuation
• Lemmatization: reduced words to their root forms
• Stopword removal: excluded common stopwords
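The four steps can be chained as in the sketch below, which uses only the standard library. The short stopword set and the toy lemma table are assumptions made for illustration; a real pipeline would more likely plug in a full stopword list and a proper lemmatizer through the `lemmatize` parameter.

```python
import string

# Small illustrative stopword set (an assumption, not the project's list).
STOPWORDS = {"a", "an", "the", "and", "is", "was", "were", "in", "of", "to", "it"}


def preprocess(text, lemmatize=lambda word: word):
    """Lowercase, strip punctuation, lemmatize, then drop stopwords.

    `lemmatize` is a pluggable callable; the default leaves words unchanged.
    """
    text = text.lower()                                               # normalization
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation removal
    lemmas = [lemmatize(token) for token in text.split()]             # lemmatization
    return [token for token in lemmas if token not in STOPWORDS]      # stopword removal


# Toy lemma table standing in for a real lemmatizer.
TOY_LEMMAS = {"rooms": "room", "beds": "bed"}

cleaned = preprocess(
    "The rooms were CLEAN, and the beds were comfy!",
    lemmatize=lambda word: TOY_LEMMAS.get(word, word),
)
print(cleaned)
```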
Data Visualization (EDA)
Distribution of Feedback Labels
• Bar plot of feedback counts
• Visual representation of the counts for each class (positive/negative).
[Figure: word cloud of positive reviews]
[Figure: word cloud of negative reviews]
[Figure: top bigrams]
[Figure: top trigrams]
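Bigram and trigram rankings like these come from counting consecutive word sequences across the reviews. The sketch below shows one way to do that with the standard library; the function name `top_ngrams` and the three sample reviews are made up for illustration and are not taken from the dataset.

```python
from collections import Counter


def top_ngrams(documents, n=2, k=3):
    """Return the k most frequent n-grams across whitespace-tokenized documents."""
    counts = Counter()
    for doc in documents:
        tokens = doc.split()
        # Zipping n shifted views of the token list yields consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


reviews = [
    "great location friendly staff",
    "friendly staff clean room",
    "great location clean room",
]
print(top_ngrams(reviews, n=2))        # most frequent bigrams
print(top_ngrams(reviews, n=3, k=2))   # most frequent trigrams
```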
Feature engineering
• Feature engineering in Natural Language Processing (NLP) involves
transforming raw text data into meaningful features that can be used by
machine learning algorithms to make predictions or generate insights.
• Unlike traditional structured data, text data is unstructured, so feature
engineering in NLP often involves a series of pre-processing steps and the
creation of specialized features to capture the nuances of language.
• Feature engineering in NLP is highly dependent on the specific problem
and the type of data being used.
• The goal is to create features that best capture the underlying patterns in the
text, leading to better model performance.
Sentiment Analysis Features
Custom Features
Model Selection
Logistic regression:
• Logistic regression is a fundamental machine learning algorithm that is
widely used in NLP tasks, particularly for binary classification problems.
• Despite its simplicity, it performs well on many NLP tasks when combined
with suitable features and preprocessing.
• Logistic regression is trained by maximum likelihood estimation: the model
parameters are chosen to maximize the likelihood of the training labels.
• It remains a strong choice in NLP when you need a model that is simple
and interpretable.
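To make the maximum likelihood idea concrete, here is a small pure-Python sketch that fits a logistic regression by gradient ascent on the mean log-likelihood. It is not the project's training code, which the slides do not show; the toy features (counts of "good" and "bad"), the learning rate and the epoch count are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by gradient ascent on the mean log-likelihood."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-likelihood for one example: (y - p) * x
            error = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for j, xj in enumerate(xi):
                grad_w[j] += error * xj
            grad_b += error
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad_w)]
        b += lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return 1 (positive) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Toy features: [count of "good", count of "bad"]; label 1 = positive.
X = [[2, 0], [1, 0], [0, 1], [0, 2]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
print(w, b, predictions)
```

The learned weights are also what makes the model interpretable: a positive weight pushes a review toward the positive class, a negative weight away from it.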
Applications of logistic regression in NLP:
• Text classification: logistic regression can be used for sentiment
analysis, spam detection, or any other task where text needs to be classified
into two categories.
• Feature representation:
  - Bag of Words (BoW): converts text into a vector of word counts.
  - TF-IDF: weights each word by its frequency in a document, discounted by
    how many documents contain it, so words that are rare across the corpus
    carry more weight.
  - Word embeddings: convert words into dense vectors capturing
    semantic meaning (e.g., Word2Vec, GloVe).
  - n-grams: capture sequences of words (e.g., bigrams, trigrams) to
    reflect word order and context.
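The first two representations can be illustrated in a few lines of plain Python. This sketch uses the unsmoothed textbook weighting tf × log(N / df), which is one common variant and an assumption here; libraries such as scikit-learn apply smoothed versions by default, so their numbers would differ. The two sample reviews are invented.

```python
import math
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Count vector of `tokens` over a fixed, ordered vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tf_idf(corpus):
    """Return (vocabulary, matrix) using tf * log(N / df)."""
    vocabulary = sorted({token for doc in corpus for token in doc})
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = Counter(token for doc in corpus for token in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocabulary}
    matrix = []
    for doc in corpus:
        counts = bag_of_words(doc, vocabulary)
        length = len(doc) or 1  # avoid dividing by zero for empty documents
        matrix.append([c / length * idf[t] for c, t in zip(counts, vocabulary)])
    return vocabulary, matrix


corpus = [
    "clean room great staff".split(),
    "dirty room rude staff".split(),
]
vocab, weights = tf_idf(corpus)
print(vocab)
print(bag_of_words(corpus[0], vocab))
print(weights[0])
```

Words that occur in every review ("room", "staff") get zero weight, while words specific to one review ("clean", "dirty") keep a positive weight.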
Why we selected logistic regression?
Deployment
• Deployment is the process of moving an ML model from an offline
environment into an existing production environment, such as a live
application.
• It is a critical step: a model must be deployed before it can serve its
intended purpose and address the problem it was designed for.
• Here, we use Streamlit to deploy our application.
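A Streamlit front end for a classifier like this can be very small. The sketch below is an assumption about how such an app might look, not the team's actual application: `placeholder_predict` is a hypothetical stand-in for the trained model, and the import is guarded only so that the helper functions can be exercised without Streamlit installed.

```python
try:
    import streamlit as st  # third-party package
except ImportError:         # lets the helpers below be used without Streamlit
    st = None


def classify_review(text, predict_fn):
    """Return the predicted label for non-empty text, or None for blank input."""
    text = text.strip()
    return predict_fn(text) if text else None


def placeholder_predict(text):
    """Hypothetical stand-in for the trained model's predict call."""
    return "negative" if "bad" in text.lower() else "positive"


if st is not None:
    st.title("Hotel Review Classification")
    review = st.text_area("Enter a hotel review")
    if st.button("Predict"):
        label = classify_review(review, placeholder_predict)
        st.write("Please enter a review." if label is None else f"Prediction: {label}")
```

Saved as, say, app.py (a hypothetical file name), the script would be started with `streamlit run app.py`.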
Challenges in project
• Data Collection and Quality
  - Noise in data: hotel reviews often contain spelling errors, slang,
    abbreviations, and grammatical mistakes, which make text preprocessing
    difficult.
  - Length variation: reviews vary significantly in length, from a few words
    to several paragraphs, which might require different handling during
    preprocessing.
• Text Preprocessing Challenges
  - Handling stop words: deciding whether to remove stop words (common
    words like "and", "the") can be tricky, because some of them carry
    sentiment (e.g., "not" in "not good").
  - Stemming and lemmatization: reducing words to their base forms helps
    generalize features but might also lose some context (e.g., "better" being
    reduced to "good").
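The stop word problem is easy to reproduce. In the sketch below, the stopword and negation sets are small illustrative assumptions rather than the project's lists; the point is only that a list containing "not" flips the apparent sentiment of a review, and that keeping negations is one possible mitigation.

```python
# Illustrative sets (assumptions, not the project's actual lists).
STOPWORDS = {"the", "was", "is", "and", "not", "no"}
NEGATIONS = {"not", "no"}


def remove_stopwords(tokens, keep_negations=False):
    """Drop stopwords, optionally preserving negation words."""
    blocked = STOPWORDS - NEGATIONS if keep_negations else STOPWORDS
    return [token for token in tokens if token not in blocked]


tokens = "the food was not good".split()
print(remove_stopwords(tokens))                       # negation is lost
print(remove_stopwords(tokens, keep_negations=True))  # negation is kept
```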
Challenges in project
• Model Selection and Training
  - Choosing the right model: simple models like logistic regression might not
    capture complex relationships in the data, while more advanced models like
    neural networks might require extensive tuning and more computational
    resources.
  - Overfitting: with limited or noisy data, the model might overfit, especially
    when using complex models, leading to poor generalization to new reviews.
• Deployment Challenges
  - Real-time processing: if the model is to be deployed in a real-time system
    (e.g., for live review monitoring), efficiency and speed of processing become
    critical.
  - Scalability: the model needs to scale with an increasing volume of reviews,
    requiring optimization in terms of computational resources and processing
    time.
References
• Pandas documentation: https://pandas.pydata.org/docs/
• Matplotlib documentation: https://matplotlib.org/stable/index.html
• Streamlit documentation: https://docs.streamlit.io/
• Kaggle: https://www.kaggle.com/
Thank you!
