Exploratory_Analysis_of_Data_ppt.pdf

Exploratory Analysis of Data
Rushikesh Kulkarni (46)
Mohammed Salmanuddin (58)
Atharva Mohite (59)
Guide: Prof. Mahendra Patil
Group no.: 03
Atharva College of Engineering
Exploratory Data
ANALYSIS

Problem Definition
● A large amount of unanalysed data is available on the internet today, so we
use clustering methods to group this data properly and present it to the
client.
● The main problem in this analysis is to cluster the available data
correctly and then plot the clustered data on a geolocational map, cluster
by cluster, for better understanding.
● The objective is to use the K-Means and DBSCAN algorithms, both of which
are unsupervised machine learning methods.
● K-Means is relatively simple to implement and understand and is guaranteed
to converge, while DBSCAN generalizes to clusters of different shapes and
sizes.

Introduction
● This project aims to help incoming students find the best
accommodation in a new city by using clustering algorithms
such as K-Means and DBSCAN.
● The analysis is based on the preferences of students for
amenities, budget, and proximity to the location. By applying
exploratory data analysis techniques, the project provides
valuable insights into the dataset, and the clustering
algorithms classify the accommodation into different clusters.
● The findings of the study can assist students, universities,
and accommodation providers in improving the quality of
services and amenities provided to students.

Background
Existing systems list hostels and apartments for rent and also offer
options to buy and sell property. They do not recommend accommodation
within the user's budget, and they rarely surface rental houses that
match the user's preferences. They also do not recommend restaurants,
gyms, etc., based on users' preferences, and previous research lacks
accuracy in its recommendations.

Review of Literature
● Automating Exploratory Data Analysis via Machine Learning (by Tova Milo, Amit
Somech):
The paper describes how data scientists interactively explore unfamiliar datasets by
issuing a sequence of analysis operations (e.g. filter, aggregation, and visualization).
● Exploratory Analysis of Geo-Locational Data - Accommodation Recommendation
(by M. Sumithra, A. Sai Pavithra, L. Sowmiya):
This paper shows how K-Means clustering can be used to find the best accommodation
for migrants by classifying the available accommodation. It helped us understand the
basic working of clustering through the K-Means algorithm and how to search for
accommodation.

Review of Literature
● Clustering Evaluation by Davies-Bouldin Index in Cereal data using
K-Means (by Akhilesh Kumar Singh, Shantanu Mittal, Prashant Malhotra):
This paper reports the results of applying K-Means clustering to a cereal dataset
and compares the outcomes for different numbers of clusters to determine whether
the best number of clusters is 3 or 5.
● Exploratory Data Analysis using Artificial Neural Networks (by Sriram D,
Kalaivani K, Ulaga Priya K, Saritha A, Sajeevram A):
This research introduces the basic concepts, the various types and levels of
data analysis, predictive modeling techniques, and appropriate performance
measures. The authors evaluate the effectiveness of their proposed system using a
case study of a real-world dataset.
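The Davies-Bouldin comparison described for the cereal-data paper can be sketched as follows. This is only an illustration on synthetic data, not the paper's dataset or code: the blob centers, the `dbi_for_k` helper, and the candidate values of k are assumptions. A lower Davies-Bouldin index indicates more compact, better separated clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Synthetic stand-in data with three well-separated groups (assumption).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)


def dbi_for_k(X, k):
    """Fit K-Means with k clusters and return the Davies-Bouldin index."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    return davies_bouldin_score(X, labels)


# Compare the two candidate cluster counts; the lower index wins.
scores = {k: dbi_for_k(X, k) for k in (3, 5)}
best_k = min(scores, key=scores.get)
print(scores, "best k:", best_k)
```

On real data the index is usually computed over a wider range of k, and the result depends on feature scaling.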

Proposed Solution
● The proposed system recommends various types of accommodation (hotels, apartments,
and houses) based on the user's budget and location.
● Clustering makes the accommodation locations much easier and more efficient to visualize.
● The K-Means algorithm provides the clusters that are plotted on the geolocational map.
● To overcome the limitations of the K-Means algorithm, which performs poorly on
non-circular clusters, the system uses a hybrid approach that combines it with DBSCAN.
● The system is designed to provide accurate analysis without major flaws or inaccuracies.
● The system has a large database of houses and apartments, which makes it easier to find
suitable accommodation options that fit within the user's budget.
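The K-Means limitation on non-circular clusters can be illustrated with a small sketch. It assumes scikit-learn's two-moons toy dataset rather than the project's accommodation data, and the `eps` and `min_samples` values are illustrative choices. The adjusted Rand index measures agreement with the known moon membership.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved crescents: a standard example of non-circular clusters.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-Means splits the plane around two centroids, so it cuts across the crescents.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN follows dense regions, so it can trace each crescent.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)
print(f"K-Means ARI: {ari_kmeans:.2f}, DBSCAN ARI: {ari_dbscan:.2f}")
```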

Feasibility
Technical Feasibility:
From a technical perspective, the proposed data analytics model appears to be
viable and achievable. K-Means and DBSCAN clustering, along with artificial
neural networks, are well-established approaches in data analytics and have been
used successfully in a variety of applications. The model can be implemented
using standard hardware and software tools. However, it is important to
consider the specific requirements of the project and adjust as needed to ensure
optimal performance and accuracy.

Economic Feasibility:
From an economic perspective, the proposed data analytics model is cost-effective
because it uses open-source software tools and does not require any specialized
hardware. The model can be implemented on standard hardware with basic computing
capabilities.
The main purpose of the project is to identify the best accommodation for students
based on their preferences, which can have a significant impact on their academic
performance and overall satisfaction.

Design of System
High-Level Approach:
• Fetch datasets from the relevant locations. (Data Collection)
• Clean the datasets to prepare them for analysis. (Data Cleaning via Pandas)
• Visualise the data using boxplots. (Using Matplotlib/Seaborn/Pandas)
• Fetch geolocational data from the Foursquare API. (REST APIs)
• Use K-Means and DBSCAN clustering to cluster the locations. (Using
Scikit-Learn)
• Present findings on a map. (Using Folium/Seaborn)
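The clustering step for geolocational data can be sketched as below. This is a minimal illustration, not the project's code: the sample coordinates, the `cluster_locations` helper, and the 1 km neighbourhood radius are assumptions. It uses scikit-learn's DBSCAN with the haversine metric, which expects latitude and longitude in radians, so the radius is converted from kilometres by dividing by the Earth's radius.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088

# Hypothetical (latitude, longitude) pairs in degrees:
# two tight groups and one isolated point.
coords_deg = np.array([
    [19.0760, 72.8777],
    [19.0765, 72.8780],
    [19.0770, 72.8772],
    [19.2183, 72.9781],
    [19.2188, 72.9785],
    [19.2179, 72.9790],
    [18.5204, 73.8567],  # far from both groups
])


def cluster_locations(coords_deg, eps_km=1.0, min_samples=3):
    """Cluster points by great-circle distance; label -1 marks noise."""
    coords_rad = np.radians(coords_deg)
    model = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # radius expressed in radians
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",
    )
    return model.fit_predict(coords_rad)


labels = cluster_locations(coords_deg)
print(labels)
```

The resulting labels could then be used to colour markers on a map, with the noise label shown separately.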

Result Analysis
● Exploratory analysis of data with K-Means clustering and DBSCAN has been
implemented successfully on data collected in the city of Bombay.
● The project uses clustering to find the best accommodation for users
around their preferred location by classifying accommodation for migrating
students based on their preferences for amenities, budget, and proximity to the
location.
● The code reads a dataset from a CSV file, performs data cleaning, and
extracts relevant features. It drops missing values and saves the cleaned data
to a new CSV file.
● The box plot shows the distribution of the features in the cleaned data. It
helps to identify outliers, skewness, and other distributional characteristics.
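The cleaning step and the outlier check that a box plot makes visible can be sketched as follows. The column names, the sample rows, and the `iqr_outliers` helper are hypothetical, and the sketch reads from and writes to in-memory buffers instead of files on disk. It applies the usual 1.5 x IQR whisker rule that box plots use to mark outliers.

```python
import io

import pandas as pd

# Hypothetical raw data; the column names are assumptions.
raw_csv = io.StringIO(
    "name,rent,distance_km\n"
    "A,9000,1.2\n"
    "B,9500,\n"
    "C,10000,2.0\n"
    "D,10500,1.5\n"
    "E,11000,2.5\n"
    "F,60000,1.8\n"
    "G,,3.0\n"
)

df = pd.read_csv(raw_csv)

# Drop rows with any missing value, as described for the cleaning step.
clean = df.dropna().reset_index(drop=True)


def iqr_outliers(series):
    """Return values outside the 1.5 * IQR whiskers of a box plot."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return series[(series < lower) | (series > upper)]


rent_outliers = iqr_outliers(clean["rent"])

# Save the cleaned data as CSV (to a buffer here; a file path also works).
cleaned_csv = io.StringIO()
clean.to_csv(cleaned_csv, index=False)
print(clean)
print("Rent outliers:", rent_outliers.tolist())
```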

● The K-Means clustering algorithm groups similar observations together
based on the similarity of their features. In this case, the algorithm created
three distinct clusters from the features provided.
● The DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
algorithm clusters data based on the density of observations. It groups
together observations that are close to each other in the feature space and
separates those that are more distant. DBSCAN is useful for identifying
clusters of arbitrary shape and can handle noise and outliers in the data.
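The difference in how the two algorithms treat outliers can be shown with a short sketch. It assumes synthetic data with three groups plus two distant points, not the project's dataset, and the `eps` and `min_samples` values are illustrative. K-Means assigns every point to one of its three clusters, while DBSCAN labels isolated points as noise (`-1`).

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Three compact synthetic groups (assumption) plus two distant outliers.
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [8, 8], [-8, 8]],
    cluster_std=0.6,
    random_state=0,
)
outliers = np.array([[20.0, -20.0], [-20.0, -20.0]])
X_all = np.vstack([X, outliers])

# K-Means with three clusters: every point, including outliers, gets a cluster.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_all)

# DBSCAN: points without enough neighbours within eps are labelled -1 (noise).
dbscan_labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(X_all)

n_dbscan_clusters = len(set(dbscan_labels.tolist()) - {-1})
print("K-Means labels used:", sorted(set(kmeans_labels.tolist())))
print("DBSCAN clusters:", n_dbscan_clusters)
print("DBSCAN labels of the two outliers:", dbscan_labels[-2:].tolist())
```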

[Figure: Data fetched from APIs]

[Figure: Clustered data for K-Means; clustered data for DBSCAN]

[Figure: Visualization via boxplot]

[Figure: Final clusters on the map]

Future Scope
● The project model can be further refined and expanded by incorporating
additional features, such as pricing data or customer reviews.
● The model can be integrated with existing booking platforms to provide
real-time recommendations for users based on their preferences and location.
● The project can be extended to include predictive analytics for seasonal
fluctuations in demand, which can help businesses optimize pricing and
inventory management.
● The platform can collaborate with local businesses and tourist attractions to offer
exclusive deals and discounts to customers who book accommodation through it, thus
increasing customer loyalty and generating additional revenue streams.

Conclusion
● The project aimed to develop a clustered map model that assists
migrant students and workers in finding suitable accommodation
in a new place.
● The project utilized several techniques and methodologies such as
data mining, clustering algorithms, and Gantt charts to implement
the solution effectively.
● The results showed that the application was successful in clustering
similar accommodations based on location, price, and amenities,
and it provided accurate recommendations to the users.

References
[1] Exploratory Data Analysis Using Dimension Reduction [Tejas Nanaware,
Prashant Mahajan, Ravi Chandak, Pratik Deshpande, Prof. Mahendra Patil]
[2] Automating Exploratory Data Analysis via Machine Learning [Tova Milo, Amit
Somech]
[3] Visualization Methods for Exploratory Data Analysis [A. Nasser, D. Hamad,
C. Sar; IEEE]
[4] Exploratory Analysis of Geo-Locational Data - Accommodation Recommendation
[M. Sumithra, A. Sai Pavithra, L. Sowmiya]
[5] Clustering Evaluation by Davies-Bouldin Index (DBI) in Cereal data using
K-Means [Akhilesh Kumar Singh, Shantanu Mittal, Prashant Malhotra]
[6] Exploratory Data Analysis using Artificial Neural Networks [Sriram D, Kalaivani
K, Ulaga Priya K, Saritha A, Sajeevram A]
[7] Exploratory analysis of the fire statistics using automatic time series
decomposition [M.M. Tatur, A.G. Ivanitskiy]

THANK YOU!
