SARVAJANIK COLLEGE OF ENGINEERING AND TECHNOLOGY
INFORMATION AND TECHNOLOGY DEPARTMENT
B. E - VII, IT, SEM - 8
(Term : EVEN - 2022-23)
Presentation
on
“DATA ANALYTICS INTERN”
Subject Name : Internship (3181601)
Prepared and Presented by
Anuj Vaghani (Enrollment No : 190420116070)
Guided by
Prof. Apurva Bharat Mandalaywala
PRESENTATION OUTLINE
● Introduction
● Learning Data Science With Python - Libraries
● Methodology
● Machine Learning
● Outline of work
● Future Work
● References
● Conclusion
Introduction
● Background of the Internship and Company
My name is Anuj Vaghani and I am currently interning at Devotee Infotech Private Limited. During
my internship, I have been working with the data analytics and machine learning teams, and have
gained valuable insights into how these technologies can drive business success.
● Definition of Data Analytics and Machine Learning
Data analytics is the process of analysing and interpreting large sets of data to extract insights and
make informed decisions. Machine learning, on the other hand, is a subset of artificial intelligence that
enables computer systems to learn from data and improve their performance over time.
Introduction
● Objective of Presentation
The objective of this presentation is to provide an overview of data analytics and machine learning, and
their importance in today's business landscape. I will discuss the key concepts, processes, and
techniques involved in data analytics and machine learning, as well as their applications in various
industries. Additionally, I will share my experiences working with the data analytics and machine
learning teams at Devotee Infotech Private Limited, and provide recommendations for how these
technologies can be leveraged to drive business success.
Learning Data Science With Python - Libraries
NumPy is a powerful library for numerical computing in
Python. It provides support for multi-dimensional arrays,
mathematical functions, and operations on arrays. Some of
the key topics that will be covered in this section include:
• Basics of NumPy arrays
• Array operations and calculations
• Mathematical functions and operations
• Random number generation
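A minimal sketch of the four NumPy topics listed above. The array values and the seed are made up for illustration and are not taken from the internship work.

```python
import numpy as np

# Basics: a 2-D array (values are illustrative only)
a = np.array([[1, 2, 3], [4, 5, 6]])

# Array operations: element-wise arithmetic and a column-wise sum
doubled = a * 2
col_sums = a.sum(axis=0)

# Mathematical functions are applied element by element
roots = np.sqrt(a)

# Random number generation with a seeded generator for reproducibility
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=5)

print(a.shape, col_sums, samples.shape)
```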
Pandas is a library for data manipulation and analysis. It
provides tools for working with structured data, such as
data frames and series, and supports a wide range of data
formats. Some of the key topics that will be covered in
this section include:
• Basics of Pandas data frames and series
• Data manipulation and cleaning
• Data aggregation and summarization
• Merging and joining data frames
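The Pandas topics above can be sketched on two tiny data frames. The column names (`hotel_id`, `nights`, `hotel`) and all values are hypothetical and chosen only to keep the example short.

```python
import pandas as pd

# Hypothetical data, invented for illustration
bookings = pd.DataFrame({
    "hotel_id": [1, 1, 2, 2],
    "nights": [2, None, 3, 1],
})
hotels = pd.DataFrame({
    "hotel_id": [1, 2],
    "hotel": ["City Hotel", "Resort Hotel"],
})

# Cleaning: fill the missing value with the column median
bookings["nights"] = bookings["nights"].fillna(bookings["nights"].median())

# Merging: attach the hotel name to every booking
merged = bookings.merge(hotels, on="hotel_id", how="left")

# Aggregation: total nights per hotel
nights_per_hotel = merged.groupby("hotel")["nights"].sum()
print(nights_per_hotel)
```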
Learning Data Science With Python - Libraries
Matplotlib is a powerful library for data visualization in
Python. It provides support for creating a wide range of
charts and plots, including line charts, scatter plots,
histograms, and heatmaps. Some of the key topics that will
be covered in this section include:
• Basics of Matplotlib charts and plots
• Customizing charts and plots
• Adding labels and annotations
• Creating subplots and multiple charts
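A short sketch of the Matplotlib topics above: two subplots, custom colours, labels, and one annotation. The plotted data (a sine curve) and the output file name `example_plots.png` are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)

# Creating subplots: two charts side by side in one figure
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Line chart with customised colour, labels, and an annotation
ax_line.plot(x, np.sin(x), color="tab:blue", label="sin(x)")
ax_line.set_title("Line chart")
ax_line.set_xlabel("x")
ax_line.set_ylabel("sin(x)")
ax_line.annotate("first peak", xy=(np.pi / 2, 1.0), xytext=(3, 0.5),
                 arrowprops={"arrowstyle": "->"})
ax_line.legend()

# Histogram of the same values
ax_hist.hist(np.sin(x), bins=10, color="tab:orange")
ax_hist.set_title("Histogram")

fig.tight_layout()
fig.savefig("example_plots.png")
```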
Seaborn is a library for data visualization that is built
on top of Matplotlib. It provides a higher-level
interface for creating sophisticated and aesthetically
pleasing visualizations. Some of the key topics that
will be covered in this section include:
• Basics of Seaborn charts and plots
• Customizing Seaborn visualizations
• Creating complex visualizations, such as heatmaps and violin plots
• Visualizing relationships between variables
Methodology
(1) Introduction to Methodology
In order to effectively utilize data analytics and machine learning during my internship, I followed a
specific methodology to guide my work.
The steps of the methodology are shown below.
(2) Define Problem Statement
The first step was to clearly define the problem statement or objective that I wanted to achieve. This
involved identifying the business problem or opportunity and specifying the data sources and variables
that were relevant to the problem.
(3) Data Collection and Preparation
The next step was to collect and prepare the data for analysis. This involved identifying the relevant data
sources and extracting the data, cleaning and transforming the data, and ensuring that the data was ready
for analysis.
(4) Exploratory Data Analysis (EDA)
The third step was to conduct exploratory data analysis (EDA) to gain a better understanding of the data
and identify any patterns or anomalies. This involved using various statistical and visualization techniques
to explore the data and gain insights.
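As one possible illustration of the EDA step, the sketch below computes summary statistics, a correlation matrix, and flags anomalies with the 1.5 × IQR rule. The data is synthetic, the column names `lead_time` and `price` are hypothetical, and the IQR rule is just one common choice, not necessarily the technique used in the projects.

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for a real dataset
rng = np.random.default_rng(42)
df = pd.DataFrame({"lead_time": rng.integers(0, 300, size=200)})
df["price"] = 50 + 0.2 * df["lead_time"] + rng.normal(0, 5, size=200)
df.loc[0, "price"] = 1000.0  # inject one obvious anomaly

# Summary statistics and missing-value counts
print(df.describe())
print(df.isna().sum())

# Patterns: pairwise correlation between the numeric columns
corr = df.corr()

# Anomalies: values outside 1.5 * IQR of the middle 50% of prices
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]
print(outliers)
```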
Methodology
(5) Feature Selection and Engineering
The next step was to select and engineer the features that would be used for machine learning. This
involved identifying the relevant features and engineering them to improve their predictive power.
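A small sketch of what feature selection and engineering can look like in Pandas. The records and the column names (`datetime`, `season`, `temp`, `feels_like`) are hypothetical, and dropping `feels_like` assumes it is nearly redundant with `temp`.

```python
import pandas as pd

# Hypothetical raw records, invented for illustration
raw = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2023-01-02 08:00", "2023-01-07 17:00", "2023-04-09 23:00"]
    ),
    "season": ["winter", "winter", "spring"],
    "temp": [5.0, 9.0, 12.0],
    "feels_like": [4.8, 9.1, 12.2],
})

features = raw.copy()

# Engineering: derive calendar features from the timestamp
features["hour"] = features["datetime"].dt.hour
features["is_weekend"] = (features["datetime"].dt.dayofweek >= 5).astype(int)

# Engineering: one-hot encode the categorical column
features = pd.get_dummies(features, columns=["season"], dtype=int)

# Selection: drop the raw timestamp and a column assumed to be redundant
features = features.drop(columns=["datetime", "feels_like"])
print(features)
```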
(6) Model Selection and Training
The next step was to select the appropriate machine learning model and train it on the prepared data.
This involved selecting the right algorithm, tuning the hyperparameters, and training the model using a
variety of techniques.
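One way to compare candidate algorithms and hyperparameter settings is k-fold cross-validation. The sketch below uses scikit-learn on synthetic data from `make_regression`. The candidate models and the tree depths are assumptions for illustration, not the models chosen during the internship.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem standing in for the prepared data
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Candidate algorithms, including two settings of one hyperparameter
candidates = {
    "linear": LinearRegression(),
    "tree_depth_3": DecisionTreeRegressor(max_depth=3, random_state=0),
    "tree_depth_6": DecisionTreeRegressor(max_depth=6, random_state=0),
}

# Mean R² over 5 folds for each candidate
cv_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}

best_name = max(cv_scores, key=cv_scores.get)
print(cv_scores, best_name)
```

Because the synthetic target is linear in the features, the linear model is expected to score best here; on real data the ranking can differ.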
(7) Model Evaluation and Deployment
The final step was to evaluate the performance of the machine learning model and deploy it for use in the
real world. This involved evaluating the model's accuracy and performance, testing it on new data, and
deploying it in a way that could be easily integrated into the business process.
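The evaluation and deployment step can be sketched as: score the model on held-out data, save it to disk, and reload it where predictions are needed. This is only an assumed workflow using `joblib` for persistence; the source does not say how the models were deployed. The data is synthetic and the file name `model.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# Evaluate on data the model has not seen during training
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
test_r2 = r2_score(y_test, pred)
test_mse = mean_squared_error(y_test, pred)
print(f"R2={test_r2:.3f}  MSE={test_mse:.1f}")

# Persist the fitted model, then reload it as a separate service would
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, model_path)
reloaded = joblib.load(model_path)

# The reloaded model should reproduce the original predictions
same_predictions = bool(np.allclose(reloaded.predict(X_test), pred))
```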
Machine Learning
Types of Machine Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• Deep Learning
Machine Learning
SUPERVISED LEARNING
• Common Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forests,
Support Vector Machines, Naive Bayes, k-Nearest Neighbours.
• Applications in Industry, such as Fraud Detection, Demand Forecasting, and Image Recognition
• Demo: Building a Regression Model to Predict Housing Prices Based on Features such as
Location, Square Footage, and Number of Bedrooms/Bathrooms.
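A minimal sketch of the housing-price demo described above, using scikit-learn. The houses are synthetic and the pricing rule used to generate them is invented, so the numbers only show the mechanics of fitting a regression model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300

# Synthetic houses (all values are made up)
houses = pd.DataFrame({
    "sqft": rng.uniform(500, 3000, n),
    "bedrooms": rng.integers(1, 6, n),
    "bathrooms": rng.integers(1, 4, n),
    "location": rng.choice(["suburb", "city"], n),
})

# Invented pricing rule plus random noise
houses["price"] = (
    100 * houses["sqft"]
    + 10_000 * houses["bedrooms"]
    + 5_000 * houses["bathrooms"]
    + 50_000 * (houses["location"] == "city")
    + rng.normal(0, 10_000, n)
)

# One-hot encode the location and split into train and test sets
X = pd.get_dummies(houses.drop(columns="price"), columns=["location"],
                   drop_first=True, dtype=int)
y = houses["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

reg = LinearRegression().fit(X_train, y_train)
r2 = reg.score(X_test, y_test)  # R² on the held-out houses
print(dict(zip(X.columns, reg.coef_.round(1))), round(r2, 3))
```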
UNSUPERVISED LEARNING
• Common Algorithms: k-Means Clustering, Hierarchical Clustering, Principal Component Analysis,
t-SNE.
• Applications in Industry, such as Customer Segmentation and Anomaly Detection
• Demo: Using k-Means Clustering to Segment Customer Data and Identify Groups with Similar
Behaviours and Characteristics.
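The k-Means demo above can be sketched with scikit-learn as follows. The two customer groups are synthetic, and the two features (annual spend and visits per month) are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic customer groups: columns are (annual spend, visits per month)
low = rng.normal(loc=[200, 2], scale=[30, 0.5], size=(50, 2))
high = rng.normal(loc=[1000, 10], scale=[100, 1.0], size=(50, 2))
customers = np.vstack([low, high])

# Scale the features so neither dominates the distance computation
scaled = StandardScaler().fit_transform(customers)

# Fit k-Means with two clusters and assign each customer to a segment
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

print(np.bincount(labels))
```

In practice the number of clusters is not known in advance, so it is usually chosen by comparing several values of `n_clusters`.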
Outline of work
During my internship, I completed two major projects that together covered
the concepts described above.
(1) HOTEL BOOKING ANALYSIS
The purpose of this project was to analyse hotel booking data and gain insights into the factors that influence
hotel booking cancellations. The data was collected from a publicly available dataset on Kaggle, and the
analysis was performed using Python and its data analytics libraries. The project involved data cleaning and
pre-processing, exploratory data analysis, and data visualization to gain insights into the patterns and trends
in the data. The results of the analysis provide insights into the key factors that contribute to hotel booking
cancellations and offer recommendations to hotel operators to reduce cancellations and optimize revenue.
Outline of work
• Identify Relevant Data Sources: Hotel Booking Dataset from Kaggle
• Collect Data and Store it in a Structured Manner
• Perform Data Cleaning and Pre-processing
• Exploratory Data Analysis
B. Data Modelling and Analysis
• Determine Relevant Variables and Features: Guest Demographics, Booking Details, Hotel
Information, etc.
• Choose Appropriate Modelling Techniques: Linear Regression, Random Forest Regression, etc.
• Evaluation of Models: Mean Squared Error, R-Squared Value, etc.
C. Results and Conclusion
• Insights Gained from Analysis: Key Drivers of Booking Cancellations, Popular Booking Channels,
etc.
• Potential Business Applications: Improve Booking Experience, Optimize Hotel Inventory
Management, etc.
Outline of work
OBSERVATION
The 'Direct' and 'Online TA' segments contribute the most in both types of hotels. The Aviation
segment should focus on increasing its bookings for 'City Hotel'.
Outline of work
In conclusion, this project has successfully analysed the hotel booking dataset and
provided insights into the factors that contribute to booking cancellations. The analysis
revealed that the lead time between booking and arrival, the type of booking, and the
deposit type were the most significant factors contributing to cancellations. Moreover, it
was found that customers who book through online travel agencies are more likely to
cancel their bookings than those who book directly through the hotel website. The results
of this analysis can help hotel operators to optimize their booking processes, improve
customer experience, and reduce cancellations, ultimately leading to improved revenue
and profitability. Further research could be conducted to explore additional factors that
may influence hotel booking cancellations and to develop predictive models that can help
to forecast cancellations and adjust booking policies accordingly.
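The kind of comparison behind these findings can be sketched with a Pandas group-by. The six rows below are invented purely to show the mechanics and are not results from the project. The column names (`market_segment`, `deposit_type`, `lead_time`, `is_canceled`) are assumed to match the public Kaggle hotel booking dataset.

```python
import pandas as pd

# Invented rows for illustration only (not project data)
bookings = pd.DataFrame({
    "market_segment": ["Online TA", "Online TA", "Online TA",
                       "Direct", "Direct", "Direct"],
    "deposit_type": ["No Deposit", "Non Refund", "No Deposit",
                     "No Deposit", "No Deposit", "No Deposit"],
    "lead_time": [120, 200, 10, 5, 30, 90],
    "is_canceled": [1, 1, 0, 0, 0, 1],
})

# Cancellation rate per booking channel (mean of a 0/1 flag is a rate)
rate_by_segment = bookings.groupby("market_segment")["is_canceled"].mean()

# Average lead time for kept (0) versus cancelled (1) bookings
lead_time_by_outcome = bookings.groupby("is_canceled")["lead_time"].mean()

print(rate_by_segment)
print(lead_time_by_outcome)
```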
Outline of work
During my internship, I completed two major projects that together covered
the concepts described above.
(2) BIKE SHARING DEMAND PREDICTION PROJECT
The purpose of this project was to predict the demand for bike sharing services based on historical data
using regression techniques. The data was collected from a publicly available dataset on Kaggle and the
analysis was performed using Python and its machine learning libraries. The project involved data cleaning
and pre-processing, feature engineering, model selection, and evaluation to develop an accurate regression
model that can predict bike demand. The results of the analysis provide insights into the key factors that
influence bike demand and offer a predictive model that can help bike-sharing operators to optimize their
service and improve customer satisfaction.
Outline of work
• Different algorithms gave different R² scores.
Linear Regression
The R² score is 0.77, which means the model captures most of the variance in the data.
The score is saved in a data frame for later comparison.
Outline of work
LASSO REGRESSION ( L1 REGULARIZATION )
The R² score is 0.40, which means the model does not capture most of the variance in the data.
The score is saved in a data frame for later comparison.
Outline of work
RIDGE REGRESSION ( L2 REGULARIZATION )
The R² score is 0.77, which means the model captures most of the variance in the data.
The score is saved in a data frame for later comparison.
Outline of work
ELASTIC NET REGRESSION
The R² score is 0.62, which means the model captures more than half of the variance in the data.
The score is saved in a data frame for later comparison.
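The comparison across the four linear models can be sketched as below: each model is fitted, scored with R² on a test set, and the scores are collected in one data frame. The data is synthetic and the `alpha` and `l1_ratio` values are assumptions, so the scores printed here will not match the figures reported on the slides.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the bike sharing dataset
X, y = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Linear": LinearRegression(),
    "Lasso (L1)": Lasso(alpha=1.0),
    "Ridge (L2)": Ridge(alpha=1.0),
    "Elastic Net": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

# Fit each model and record its test-set R² for later comparison
rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    rows.append({"model": name, "r2_test": r2_score(y_test, model.predict(X_test))})

results = (
    pd.DataFrame(rows)
    .sort_values("r2_test", ascending=False)
    .reset_index(drop=True)
)
print(results)
```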
Outline of work
• No overfitting is seen.
• The Random Forest Regressor and the Gradient Boosting model tuned with
GridSearchCV give the highest R² scores: 99% and 95% respectively on the
train set, and 92% on the test set.
• The feature importance values for Random Forest and
Gradient Boosting are different.
• We can deploy this model.
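A sketch of how a Random Forest can be tuned with `GridSearchCV` and checked for overfitting by comparing train and test R². The data is synthetic and the parameter grid is an assumption; the grid actually used in the project is not given in the slides.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data: 6 features, of which 3 carry signal
X, y = make_regression(n_samples=300, n_features=6, n_informative=3,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Small, assumed hyperparameter grid searched with 3-fold cross-validation
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, None]},
    cv=3,
    scoring="r2",
)
grid.fit(X_train, y_train)
best = grid.best_estimator_

# A large gap between these two scores would point to overfitting
train_r2 = best.score(X_train, y_train)
test_r2 = best.score(X_test, y_test)

# Relative importance of each feature; the values sum to 1
importances = best.feature_importances_

print(grid.best_params_, round(train_r2, 3), round(test_r2, 3))
print(importances.round(3))
```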
Future Work
DEEP LEARNING : Deep learning is a subfield of machine learning that uses artificial neural
networks to model and solve complex problems. With the increasing availability of large-scale data sets
and powerful computing resources, deep learning has the potential to transform many industries and
solve some of the world's most pressing problems.
REINFORCEMENT LEARNING : Reinforcement learning is a type of machine learning that focuses
on learning through trial and error. This approach has been used to develop sophisticated game-playing
algorithms, but has the potential to be applied to a wide range of fields, including robotics, finance, and
healthcare.
DATA VISUALIZATION : Data visualization is the art of communicating complex data through
visual representations. Future work in this area could focus on developing new techniques for
visualizing large-scale and high-dimensional data sets, as well as exploring the use of augmented and
virtual reality for data visualization.
References
Kaggle: https://www.kaggle.com/competitions
Google AI Residency Program: https://ai.google/education/research/ai-residency/
Microsoft AI Residency Program: https://www.microsoft.com/en-us/research/academic-program/microsoft-ai
IBM Data Science Elite Team: https://www.ibm.com/analytics/data-science-elite-team
Amazon Machine Learning Internship: https://www.amazon.jobs/en/teams/internships-for-students-machine-learning
Kaggle: Machine Learning Tutorials: https://www.kaggle.com/learn/machine-learning
Fast.Ai: Practical Deep Learning for Coders: https://course.fast.ai/
Conclusion
● Machine learning and data analytics have become essential tools for businesses and organizations of all
sizes and industries. They allow us to extract insights and knowledge from vast amounts of data, automate
repetitive tasks, and make better-informed decisions based on data-driven evidence.
● I have had the opportunity to work with a variety of machine learning algorithms and data analytics tools,
such as Python, TensorFlow, and Tableau. I have learned how to pre-process data, train models, and
visualize results, and I have gained a deep appreciation for the power and complexity of these
technologies.
● Overall, I am grateful for the opportunity to have worked on real-world projects in machine learning and
data analytics, and I am excited to continue learning and growing in these fields. Thank you for your
attention, and I am happy to answer any questions you may have.
Thank You!
