International Conference on Emerging Techniques in Machine Learning, Data Science and Internet of Things (ETMDIT-2024)
Presented by
{Presenter Name}
Designation
Affiliation
PAPER-ID:ETMDIT-{XXX}
{Paper Title}
AUTHORS
S.No Name Affiliation
1 {Author 1} {Author 1 Affiliation}
2 {Author 2} {Author 2 Affiliation}
3 {Author 3} {Author 3 Affiliation}
4 {Author 4} {Author 4 Affiliation}
Contents
• Introduction
• Literature Survey
• Proposed Methodology
• Results and Discussion
• Conclusion
• Future Scope
• References
Introduction
Twitter, a dynamic platform, serves as a real-time canvas for public
opinions and emotions.
The rapid growth of user-generated content highlights the necessity of
understanding sentiments on this platform.
Sentiment analysis on Twitter is crucial for businesses, policymakers, and
researchers to gauge public opinion and trends.
Research Focus
• This study explores Twitter sentiment analysis using a diverse range of
machine learning algorithms.
• Emphasis is placed on decoding the complex emotions within tweets.
• The goal is not only to identify sentiments but also to understand the
nuances and context behind them.
• Ethical considerations, such as user privacy and consent, are integral
to this study.
Literature Survey
Proposed Methodology
Data Preprocessing:
• Dataset Details: 160,000 tweets (80,000 positive, 80,000 negative).
• Steps: Data cleansing, tokenization, normalization.
Algorithmic Ensemble:
• Support Vector Regression (SVR): Handles non-linear relationships; excels in capturing
nuanced sentiment patterns.
• Decision Trees: Interpretable, handles non-linear relationships; captures contextual cues.
• Random Forest: Ensemble of decision trees; mitigates overfitting, enhances robustness.
• Logistic Regression: Efficient for binary classification; balances complexity.
Feature Selection and Extraction:
• Identifies relevant features (words, n-grams, emojis).
• Ensures each feature captures sentiment nuances.
Training and Validation:
• Cross-validation: Estimates how well each algorithm generalizes to unseen tweets.
• Figures: Word clouds for positive and negative tweets.
• Evaluation Metrics: Precision, recall, and F1 score are used to assess algorithm performance.
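The three evaluation metrics named above can be computed directly from predicted and true labels. The following is a minimal sketch, not the authors' evaluation code; the function name `precision_recall_f1` and the toy labels are assumptions made for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (illustrative only): 1 = positive tweet, 0 = negative tweet.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

On a balanced dataset such as the one described here, accuracy and F1 tend to move together; the separate precision and recall values show whether errors lean toward false positives or false negatives.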
Data Collection
• Data Source: Twitter API
• Collected a dataset of 160,000 tweets.
• Balanced dataset: 80,000 positive tweets,
80,000 negative tweets.
Criteria for Selection:
• Focused on tweets in English.
• Included a mix of topics and hashtags to
ensure diversity.
Data Preprocessing
Data Cleansing:
• Removed irrelevant data (e.g.,
advertisements, non-English tweets).
• Filtered out noisy and ambiguous content to
enhance data quality.
Tokenization:
• Split tweets into individual words or tokens.
Data Preprocessing
Normalization:
• Converted text to lowercase.
• Removed punctuation and special characters.
• Handled contractions and common social media slang.
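The normalization and tokenization steps can be sketched as a single function. This is an illustrative sketch only: the `CONTRACTIONS` and `SLANG` lookup tables are hypothetical stand-ins (the slides do not list the actual mappings), and the punctuation filter shown here also drops emojis, which a fuller pipeline would extract beforehand.

```python
import re

# Hypothetical lookup tables; a real pipeline would use much larger ones.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "i'm": "i am", "it's": "it is"}
SLANG = {"u": "you", "gr8": "great", "thx": "thanks"}


def normalize(tweet):
    """Lowercase, expand contractions, strip punctuation, tokenize, map slang."""
    text = tweet.lower()
    # Expand contractions first, because the apostrophe is removed in the next step.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Replace anything that is not a word character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Whitespace tokenization, then token-level slang replacement.
    return [SLANG.get(token, token) for token in text.split()]


tokens = normalize("I'm SO happy, u r gr8!!!")
```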
Feature Extraction:
• Transformed text data into numerical format using
techniques like TF-IDF.
Handling Emoticons and Emojis:
• Incorporated emoticons and emojis as features due to their
sentiment-bearing potential.
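To make the TF-IDF step concrete, here is a small pure-Python sketch of the textbook formulation (term frequency times the log of inverse document frequency). It is an assumption that this variant matches what the study used; libraries such as scikit-learn apply a smoothed IDF by default, so their numbers would differ. Emojis are kept as ordinary tokens, in line with treating them as features.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy tokenized tweets (illustrative only); emojis are ordinary tokens here.
docs = [["great", "day", "😀"], ["bad", "day", "😞"]]
vectors = tfidf(docs)
```

A term that occurs in every document, such as "day" above, receives weight zero, while terms specific to one tweet receive positive weight.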
Machine Learning Algorithms
Support Vector Regression (SVR)
• Strength: Effective in handling high-dimensional data and capturing
complex relationships through kernel functions. It fits a function that keeps
most training points within a margin of tolerance, so its output is a
continuous score rather than a class label.
Decision Trees
• Strength: Intuitive and easy to interpret, decision trees are adept at
handling both numerical and categorical data. They're excellent for feature
selection and can handle non-linear relationships well.
Random Forest
• Strength: Combines multiple decision trees to improve accuracy and
reduce overfitting. It's robust to outliers and noisy data, and it doesn't
require much data preprocessing.
Logistic Regression
• Strength: A simple yet powerful algorithm for binary classification tasks.
It's interpretable and efficient, making it suitable for scenarios with limited
computational resources.
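As an illustration of why Logistic Regression is considered simple and efficient for binary classification, the sketch below trains one with plain batch gradient descent. It is not the study's implementation: the two toy features (counts of positive and negative words), the learning rate, and the epoch count are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss. Returns (weights, bias)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: p(y=1|x) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Label 1 (positive) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Toy features (hypothetical): [positive-word count, negative-word count].
X = [[2, 0], [1, 0], [3, 1], [0, 2], [0, 1], [1, 3]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic_regression(X, y)
predictions = [predict(w, b, x) for x in X]
```

The learned weights are directly readable: a positive weight on the first feature and a negative weight on the second reflect how each feature pushes the prediction, which is the interpretability the slide refers to.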
Training and Validation Process
• Cross-validation: Utilized to assess model performance by splitting
the dataset into multiple subsets, training on a portion, and
validating on the remainder. This helps in estimating the model's
generalization capability.
• Training on real-world data: Models were trained on authentic
datasets reflecting real-world sentiments, ensuring relevance and
accuracy in classification tasks.
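The cross-validation procedure described above can be sketched as a generator of train/validation index splits. The slides do not state the number of folds, so `k=5` and the fixed shuffle seed are assumptions; `k_fold_indices` is a hypothetical helper name.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the splits are reproducible.
    random.Random(seed).shuffle(indices)
    # Every k-th index goes to the same fold, giving near-equal fold sizes.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(10, k=5))
```

Each sample is used for validation exactly once, and the average of the per-fold scores serves as the estimate of generalization performance.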
Visuals:
• Word clouds for positive and negative sentiments: Word clouds
visually represent the frequency of words in a corpus, with word
size indicating frequency. For positive sentiment, words like
"happy," "great," and "excellent" would dominate, while for
negative sentiment, words like "bad," "poor," and "disappointing"
would be prominent. These word clouds offer a quick snapshot of
Evaluation and Performance
Results and Discussion
• In sentiment analysis on a dataset of 1.6 million tweets, the machine learning algorithms
showed clearly different behavior.
• Logistic Regression was a robust performer, with a training accuracy of approximately 85%
and a test accuracy of around 84%, indicating good generalization.
• It balances simplicity with effectiveness, making it a promising choice for sentiment
analysis on this dataset.
• Support Vector Regression (SVR), although not designed for classification tasks, showed
potential for evaluating sentiment.
• Regression metrics, such as mean absolute error, were used to assess SVR's predictive
accuracy.
• Because SVR produces continuous predictions, it must be evaluated differently from
conventional classification algorithms.
• The Decision Tree model reached a near-perfect training accuracy, close to 100%.
• However, a drop in test accuracy indicated overfitting. Decision Trees tend to memorize
training data, which underscores the importance of regularization techniques or ensemble
methods, such as Random Forest, to improve generalization.
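The different evaluation perspective needed for SVR can be illustrated as follows: mean absolute error is computed on the continuous scores, and thresholding those scores yields labels that can be compared with a classifier's accuracy. This is a sketch under assumptions; the 0.5 threshold and the scores are made up for the example and are not results from the study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and continuous predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def to_labels(scores, threshold=0.5):
    """Convert continuous scores to 0/1 labels (threshold is an assumption)."""
    return [1 if s >= threshold else 0 for s in scores]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values for illustration: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 0]
svr_scores = [0.9, 0.2, 0.4, 0.1]

mae = mean_absolute_error(y_true, svr_scores)
acc = accuracy(y_true, to_labels(svr_scores))
```

The same `accuracy` helper applied to training and test predictions gives the train/test gap used above to diagnose overfitting in the Decision Tree.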
Future Scope
References
Thank You
