Exploratory Analysis of data
Rushikesh Kulkarni (46)
Mohammed Salmanuddin (58)
Atharva Mohite (59)
Guide: Prof. Mahendra Patil
Group no.: 03
Atharva College of Engineering
Exploratory Data
ANALYSIS
Problem Definition
● A large amount of unanalysed data is available on the internet today. We
use a clustering method to organise this data properly and present it to
the client.
● The main problem in this analysis is to cluster the available data
properly and then plot the clustered data on a geolocational map, cluster
by cluster, for a better understanding.
● The objective is to use the K-Means and DBSCAN algorithms, both of which
are unsupervised machine learning methods.
● K-Means is relatively simple to implement and understand, and it is
guaranteed to converge (though possibly to a local optimum). DBSCAN
complements it by handling clusters of different shapes and sizes.
Introduction
● This project aims to help incoming students find the best
accommodation in a new city by using clustering algorithms
such as K-Means and DBSCAN.
● The analysis is based on students’ preferences for
amenities, budget, and proximity to their preferred location. By applying
exploratory data analysis techniques, the project provides
insights into the dataset, and the clustering
algorithms group the accommodation into different clusters.
● The findings of the study can assist students, universities,
and accommodation providers in improving the quality of
services and amenities provided to students.
Background
Existing systems list hostels and apartments for
rent and also offer buy and sell options. They do not
recommend accommodation within the user’s budget, and they rarely
surface rental houses that match the user’s preferences. They also do not
recommend restaurants, gyms, etc. based on users’
preferences. Previous research lacks accuracy in its
recommendations.
Review of Literature
● Automating Exploratory Data Analysis via Machine Learning (by Tova Milo, Amit
Somech):
The paper describes how data scientists interactively explore unfamiliar datasets by issuing a
sequence of analysis operations (e.g. filter, aggregation, and visualization).
● Exploratory Analysis of Geo-Locational Data - Accommodation Recommendation
(by M. Sumithra, A. Sai Pavithra, L. Sowmiya):
This paper shows how K-Means clustering can be used to find the best
accommodation for migrants by classifying the available accommodation. It helped
us understand the basic working of clustering through the K-Means algorithm and
how to apply it to finding accommodation.
● Clustering Evaluation by Davies-Bouldin Index in Cereal data using
K-Means (by Akhilesh Kumar Singh, Shantanu Mittal, Prashant Malhotra):
The paper reports the results of applying K-Means clustering to a cereal
dataset and compares the outcomes for different numbers of clusters to
determine whether the best number of clusters is 3 or 5.
● Exploratory Data Analysis using Artificial Neural Networks (by Sriram D,
Kalaivani K, Ulaga Priya K, Saritha A, Sajeevram A):
This paper introduces the basic concepts, the various types and levels of
data analysis, predictive modeling techniques, and appropriate performance
measures. The authors evaluate the effectiveness of their proposed system using a
case study of a real-world dataset.
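The Davies-Bouldin index mentioned in the review can be used to compare candidate numbers of clusters: a lower value indicates more compact, better separated clusters. The following is a minimal sketch, assuming scikit-learn is installed. It uses synthetic blobs, not the cereal dataset or the project's data, and the blob centres, sample count, and range of k are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Synthetic data with three well-separated groups (an assumption for illustration).
X, _ = make_blobs(
    n_samples=300,
    centers=[(0, 0), (10, 0), (0, 10)],
    cluster_std=1.0,
    random_state=0,
)

# Fit K-Means for several values of k and record the Davies-Bouldin index.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = davies_bouldin_score(X, labels)

# The k with the lowest index is the preferred choice under this criterion.
best_k = min(scores, key=scores.get)
print(f"Davies-Bouldin index per k: {scores}")
print(f"Preferred number of clusters: {best_k}")
```

On real data the minimum is usually less clear-cut, so the index is best read alongside a visual check of the clusters.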
Proposed Solution
● The proposed system recommends various types of accommodation (hotels, apartments,
and houses) based on the user’s budget and location.
● Clustering makes the accommodation locations easier and more efficient to visualize.
● The K-Means algorithm provides the clusters that are plotted on the geolocational map.
● To overcome the limitations of the K-Means algorithm, which performs poorly on
non-circular clusters, the system uses a hybrid approach that combines it with DBSCAN.
● The system is designed to provide an accurate analysis without major flaws.
● The system draws on a large database of houses and apartments and narrows it to those within the user’s budget,
making it easier to find suitable accommodation options.
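The limitation of K-Means on non-circular clusters can be illustrated on a small synthetic example. This is a sketch under stated assumptions, not the project's code: it uses scikit-learn's two-moons generator instead of accommodation data, and the `eps` and `min_samples` values are illustrative choices. The adjusted Rand index (ARI) measures agreement with the known grouping, where 1.0 is a perfect match.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Synthetic, non-circular data: two interleaved half-moons (not project data).
X, true_labels = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-Means separates the points with a straight boundary between two centroids.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN follows dense regions, so it can trace each curved moon.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

ari_kmeans = adjusted_rand_score(true_labels, kmeans_labels)
ari_dbscan = adjusted_rand_score(true_labels, dbscan_labels)
print(f"K-Means ARI: {ari_kmeans:.2f}, DBSCAN ARI: {ari_dbscan:.2f}")
```

DBSCAN's result depends strongly on `eps`, so on real data the value would need tuning.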
Feasibility
Technical Feasibility:
From a technical perspective, the proposed data analytics model appears
to be viable and achievable. The use of K-Means and DBSCAN clustering
algorithms, along with artificial neural networks, is a well-established approach in
data analytics and has been used successfully in a variety of applications.
The model can be
implemented using standard hardware and software tools. However, it's important to
consider the specific requirements of the project and adjust as needed to ensure
optimal performance and accuracy.
Economic Feasibility:
From an economic perspective, the proposed data analytics model is
cost-effective, as it uses open-source software tools and does not require any
specialized hardware. The model can be implemented on standard hardware with basic
computing capabilities.
The main purpose of the project is to identify the best accommodation for students
based on their preferences, which can have a significant impact on their academic
performance and overall satisfaction.
Block Diagram
Design of System
High-Level Approach:
• Fetch datasets from the relevant locations. (Data Collection)
• Clean the datasets to prepare them for analysis. (Data Cleaning via Pandas)
• Visualise the data using boxplots. (Using Matplotlib/Seaborn/Pandas)
• Fetch geolocational data from the Foursquare API. (REST APIs)
• Use K-Means and DBSCAN clustering to cluster the locations. (Using
Scikit-Learn)
• Present findings on a map. (Using Folium/Seaborn)
Control Flow Diagram
Gantt Chart
Result Analysis
● Exploratory analysis of data with K-Means clustering and DBSCAN has been
implemented successfully on data collected in the city of Bombay.
● The project uses clustering to find the best accommodation for users
around their preferred location by classifying accommodation for migrating
students based on their preferences for amenities, budget, and proximity to that
location.
● The code reads a dataset from a CSV file, cleans the data, and
extracts the relevant features. It drops missing values and saves the cleaned data
to a new CSV file.
● The first box plot shows the distribution of the features in the cleaned data. It
helps to identify outliers, skewness, and other distributional characteristics.
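The cleaning and box-plot steps described above can be sketched as follows. This is an illustration under assumptions, not the project's code: the column names and the inline CSV are hypothetical, and pandas is assumed to be installed. Instead of drawing the plot, the sketch computes the 1.5 × IQR whisker rule that a standard box plot uses to flag outliers.

```python
import io

import pandas as pd

# Hypothetical raw data; the column names are assumptions, not the project's schema.
RAW_CSV = """name,latitude,longitude,monthly_rent
Hostel A,19.0760,72.8777,9000
Hostel B,19.0800,72.8800,
Flat C,19.1136,72.8697,15000
Flat D,,72.9000,12000
Flat E,19.0596,72.8295,95000
Flat F,19.0650,72.8350,11000
Flat G,19.1200,72.8500,13000
"""

raw = pd.read_csv(io.StringIO(RAW_CSV))

# Keep the features used later and drop rows where any of them is missing.
features = ["latitude", "longitude", "monthly_rent"]
cleaned = raw.dropna(subset=features).reset_index(drop=True)

# With no path, to_csv returns the CSV text; passing a file name writes it to disk.
cleaned_csv = cleaned.to_csv(index=False)


def iqr_outliers(series: pd.Series) -> pd.Series:
    """Return the values a box plot would draw beyond its whiskers."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return series[(series < lower) | (series > upper)]


rent_outliers = iqr_outliers(cleaned["monthly_rent"])
print(f"Rows kept: {len(cleaned)} of {len(raw)}")
print(f"Rent outliers: {rent_outliers.tolist()}")
```

With Matplotlib installed, `cleaned.boxplot(column="monthly_rent")` would draw the corresponding box plot.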
● The K-Means clustering algorithm groups similar observations together based on
the similarity of their features. In this case, the algorithm produced
three distinct clusters from the features provided.
● The DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
algorithm clusters data based on the density of observations. It
groups together observations that lie close to each other
in the feature space and separates those that are more
distant. The DBSCAN algorithm is useful for identifying clusters of arbitrary
shape and can handle noise and outliers in the data.
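The two clustering steps can be sketched on geographic coordinates as follows. The data is synthetic: the three neighbourhood centres, the isolated point, and the values `n_clusters=3`, `eps_km=1.0`, and `min_samples=5` are assumptions chosen for illustration, not the project's settings. K-Means is applied to raw latitude/longitude as a simple approximation, while DBSCAN uses scikit-learn's haversine metric, which expects coordinates in radians.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

rng = np.random.default_rng(0)

# Hypothetical neighbourhood centres (latitude, longitude); not the project's data.
centres = np.array([[19.07, 72.88], [19.12, 72.85], [19.02, 72.84]])
blobs = [rng.normal(loc=c, scale=0.002, size=(30, 2)) for c in centres]
outlier = np.array([[19.20, 72.95]])  # an isolated listing far from every group
coords = np.vstack(blobs + [outlier])

# K-Means assigns every point, including the outlier, to one of k clusters.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(coords)

# DBSCAN with the haversine metric takes [lat, lon] in radians and eps as an
# angle, i.e. the neighbourhood radius in km divided by the Earth's radius.
EARTH_RADIUS_KM = 6371.0
eps_km = 1.0
dbscan_labels = DBSCAN(
    eps=eps_km / EARTH_RADIUS_KM,
    min_samples=5,
    metric="haversine",
    algorithm="ball_tree",
).fit_predict(np.radians(coords))

# DBSCAN labels noise points as -1 instead of forcing them into a cluster.
n_dbscan_clusters = len(set(dbscan_labels) - {-1})
n_noise = int(np.sum(dbscan_labels == -1))
print(f"K-Means clusters: {len(set(kmeans_labels))}")
print(f"DBSCAN clusters: {n_dbscan_clusters}, noise points: {n_noise}")
```

The contrast to note is that K-Means has no notion of noise, whereas DBSCAN leaves the isolated point unassigned.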
A picture is worth a thousand words
[Figure: Data fetched from APIs]
[Figure: Clustered data for K-Means; clustered data for DBSCAN]
[Figure: Visualization via boxplot]
[Figure: Finally, clusters on the map (K-Means and DBSCAN)]
Future Scope
● The project model can be further refined and expanded by incorporating
additional features, such as pricing data or customer reviews.
● The model can be integrated with existing booking platforms to provide
real-time recommendations for users based on their preferences and location.
● The project can be extended to include predictive analytics for seasonal
fluctuations in demand, which can help businesses optimize pricing and
inventory management.
● The platform can collaborate with local businesses and tourist attractions to offer exclusive deals
and discounts to customers who book accommodations through it, thus
increasing customer loyalty and generating additional revenue streams.
Conclusion
● The project aimed to develop a clustered map
model that assists migrating students and workers in finding
suitable accommodation in a new place.
● The project used techniques such as
data mining and clustering algorithms to implement
the solution, and Gantt charts to plan it.
● The results showed that the application clustered
similar accommodations based on location, price, and amenities
and provided accurate recommendations to the users.
References
[1] Exploratory Data Analysis Using Dimension Reduction [Tejas Nanaware,
Prashant Mahajan, Ravi Chandak, Pratik Deshpande, Prof. Mahendra Patil]
[2] Automating Exploratory Data Analysis via Machine Learning [Tova Milo, Amit
Somech]
[3] Visualization Methods for Exploratory Data Analysis [IEEE; A. Nasser, D. Hamad,
C. Sar]
[4] Exploratory Analysis of Geo-Locational Data - Accommodation Recommendation
[M. Sumithra, A. Sai Pavithra, L. Sowmiya]
[5] Clustering Evaluation by Davies-Bouldin Index (DBI) in Cereal data using
K-Means [Akhilesh Kumar Singh, Shantanu Mittal, Prashant Malhotra]
[6] Exploratory Data Analysis using Artificial Neural Networks [Sriram D, Kalaivani
K, Ulaga Priya K, Saritha A, Sajeevram A]
[7] Exploratory analysis of the fire statistics using automatic time series
decomposition [M.M. Tatur, A.G. Ivanitskiy]
THANK
YOU!