DATA ANALYSIS ON
WEATHER FORECASTING
Prepared by,
Trupti Shingala
Introduction: Dataset
We used the weather forecast dataset, with 366 observations, from the rattle package in R.
We used the following independent variables from the dataset:
Max_Temperature, Min_Temperature, WindSpeed3pm, WindSpeed9am, Pressure3pm, Humidity9am, Humidity3pm, RainToday, RainTomorrow.
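As a minimal sketch in R (assuming the weather dataset shipped with rattle, whose column names are spelled MinTemp, MaxTemp, and so on), the data can be loaded and restricted to these variables like this:

    library(rattle)    # provides the 'weather' dataset (366 observations)
    data(weather)
    vars <- c("MinTemp", "MaxTemp", "WindSpeed9am", "WindSpeed3pm",
              "Humidity9am", "Humidity3pm", "Pressure3pm",
              "RainToday", "RainTomorrow")
    dat <- weather[, vars]
    str(dat)           # inspect types and missing values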
Data Cleaning and Goals
 Replaced missing values with the field mean for numerical data.
 Implemented various algorithms on the data to help draw conclusions about classification and clustering of the data.
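A sketch of the imputation step, continuing with the dat frame from above and replacing each missing numeric value with its column (field) mean:

    num_cols <- sapply(dat, is.numeric)       # imputation applies to numeric fields only
    dat[num_cols] <- lapply(dat[num_cols], function(x) {
      x[is.na(x)] <- mean(x, na.rm = TRUE)    # replace NA with the field mean
      x
    })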
Algorithms used
Classification:
 K-nearest neighbors
 Naive Bayes
 Decision tree (rpart)
Clustering:
 K-means clustering
Classification and Regression Tree (CART)
 The decision trees produced by CART are
strictly binary, containing exactly two branches
for each decision node.
 CART recursively partitions the records in the
training data set into subsets of records with
similar values for the target attribute.
 The CART algorithm grows the tree by conducting, for each decision node, an exhaustive search of all available variables and all possible splitting values.
 Formula: Rain_Tomorrow ~ min_temp + max_temp + windspeed9am + windspeed3pm + humidity3pm + pressure3pm
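A sketch of fitting this CART model with rpart (variable names follow rattle's weather data rather than the slide's spelling):

    library(rpart)
    fit <- rpart(RainTomorrow ~ MinTemp + MaxTemp + WindSpeed9am +
                   WindSpeed3pm + Humidity3pm + Pressure3pm,
                 data = dat, method = "class")   # "class" grows a classification tree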
Decision Tree
 To determine whether the tree is appropriate, or whether some of the branches need to be pruned, we can use the cptable element of the rpart object.
 The xerror column contains estimates of the cross-validated prediction error for different numbers of splits (nsplit). The best tree has three splits.
 Now we can prune back the large initial tree using the minimum CP value.
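A sketch of that pruning step: read cptable, take the CP value with the smallest cross-validated error, and prune back to it.

    printcp(fit)                       # CP, nsplit and xerror per subtree
    best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
    pruned  <- prune(fit, cp = best_cp)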
The error rate of the decision tree after pruning is 16%.
K-MEANS CLUSTERING
 k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining.
 The goal of the k-means algorithm is to find the best division of n entities into k groups, so that the total distance between each group's members and its corresponding centroid, the representative of the group, is minimized.
 Formally, the goal is to partition the n entities into k sets S_i, i = 1, 2, ..., k, in order to minimize the within-cluster sum of squares (WCSS), defined as

    \mathrm{WCSS} = \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2,

where \mu_i is the mean (centroid) of the points in S_i.
K-means Algorithm Step #1
A typical version of the K-means algorithm runs in the following steps:
1. Initial cluster seeds are chosen (at random).
– These represent the “temporary” means of the clusters.
– Imagine our random numbers were 60 for group 1 and 70 for group 2 (these become SEED 1 and SEED 2).
K-means Algorithm Step #2
2. The squared Euclidean distance from each object to each cluster is computed, and each object is assigned to the closest cluster.
K-means Algorithm Step #3
3. For each cluster, the new centroid is computed, and each seed value is now replaced by the respective cluster centroid.
• The new mean for cluster 1 is 62.3
• The new mean for cluster 2 is 68.9
K-means Algorithm Steps #4 – #6
4. The squared Euclidean distance from each object to each cluster is computed, and the object is assigned to the cluster with the smallest squared Euclidean distance.
5. The cluster centroids are recalculated based on the new membership assignment.
6. Steps 4 and 5 are repeated until no object moves clusters.
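In practice this whole loop is run by base R's kmeans(); a sketch on the (scaled) numeric weather columns prepared earlier, with k = 2:

    set.seed(42)                                   # seeds are chosen at random
    num <- sapply(dat, is.numeric)
    km  <- kmeans(scale(dat[, num]), centers = 2)  # standardize, then cluster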
Applications
 market segmentation
 computer vision
 geostatistics
 astronomy
 agriculture
 It is often used as a preprocessing step for other algorithms, for example to find a starting configuration.
FREQUENCY TABLE
 For k = 2
 For k = 3
PLOTTING CLUSTERS
For k = 2 and for k = 3
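A sketch of how such a frequency table and cluster plot can be produced, reusing the km object from the k-means sketch (the plotted variable pair is an arbitrary choice for illustration):

    table(km$cluster, dat$RainTomorrow)   # frequency table for k = 2
    plot(dat$Humidity3pm, dat$Pressure3pm,
         col = km$cluster, pch = 19)      # points colored by cluster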
Naïve Bayes Classifier
 Computes the conditional a-posteriori probabilities of a categorical class variable, given independent predictor variables, using Bayes' rule.
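Spelled out, Bayes' rule for a class C and predictor values x_1, ..., x_n reads

    P(C \mid x_1, \ldots, x_n) = \frac{P(C)\, P(x_1, \ldots, x_n \mid C)}{P(x_1, \ldots, x_n)},

and the naive independence assumption described on the next slide factorises P(x_1, ..., x_n | C) into the product of the individual P(x_i | C).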
Naïve Bayes Classifier (Cont..)
 Naive Bayes classifiers assume that the effect of a variable's value on a given class is independent of the values of the other variables. This assumption is called class conditional independence.
 An advantage of the naive Bayes classifier is that it requires only a small amount of training data to estimate the variable values necessary for classification.
Naïve Bayes Classifier (Cont..)
 Here, we implemented Naïve Bayes on the RainToday and RainTomorrow attributes, using the attributes MinTemp, MaxTemp, Temp9am, Temp3pm, Pressure9am, and Pressure3pm as predictors.
Naïve Bayes Classifier (Cont..)
 Perform naïve Bayes on categorical data only. In the predict call, if type is "raw", the conditional a-posteriori probabilities for each class are returned.
 Otherwise, the class with the maximum probability is returned.
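A sketch with the naiveBayes() implementation in the e1071 package (the standard one in R, and presumably what was used here):

    library(e1071)
    nb <- naiveBayes(RainTomorrow ~ MinTemp + MaxTemp + Temp9am + Temp3pm +
                       Pressure9am + Pressure3pm, data = weather)
    predict(nb, weather, type = "raw")            # a-posteriori probability per class
    pred <- predict(nb, weather, type = "class")  # class with maximum probability
    table(pred, weather$RainTomorrow)             # confusion table as on the next slide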
Naïve Bayes Classifier (Cont..)
 Output:
Pred    No   Yes
No     300    66
Yes      0     0
 Perform naïve Bayes using Laplace smoothing, a technique used to smooth categorical data.
 The default value of laplace (0) disables Laplace smoothing.
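The same fit with smoothing enabled, as a sketch:

    nb_laplace <- naiveBayes(RainTomorrow ~ MinTemp + MaxTemp + Temp9am +
                               Temp3pm + Pressure9am + Pressure3pm,
                             data = weather, laplace = 1)  # laplace = 0 would disable it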
Naïve Bayes Classifier (Cont..)
 RainToday:
Pred    No   Yes
No     258    34
Yes     42    32

 RainTomorrow:
Pred    No   Yes
No     271    38
Yes     29    28
 It is a lazy learning algorithm.
 Whenever we have a new point to classify, we find its K nearest neighbors from the training data.
 It defers the decision to generalize from the past training examples until a new query is encountered.
 K-NN uses a distance function to calculate the distance between the query point and the training points.
 Our goal is to find the value of K for which the weather data is classified most accurately.
K - Nearest Neighbor
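A sketch with knn() from the class package, continuing with the imputed dat frame; the 70/30 train/test split and k = 5 are illustrative assumptions:

    library(class)
    set.seed(1)
    num  <- sapply(dat, is.numeric)
    idx  <- sample(nrow(dat), 0.7 * nrow(dat))    # 70/30 split
    pred <- knn(train = dat[idx, num], test = dat[-idx, num],
                cl = dat$RainTomorrow[idx], k = 5)
    mean(pred != dat$RainTomorrow[-idx])          # error rate for this K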
 Given a query instance xq to be classified,
 Let x1, x2, ..., xk denote the k instances from the training examples that are nearest to xq.
 Return the class that is most common among these k instances.
 For example, if we take K = 5, the query xq will be classified as negative, since 3 of its nearest neighbors are classified as negative.
K - Nearest Neighbor
K-Nearest Neighbor – Transitional Conclusions
 For K = 1 we have the following table result and error rate for rain tomorrow.
 For K = 2 we have the following table result and error rate for rain tomorrow.
 For K = 5 we have the following table result and error rate for rain tomorrow.
 For K = 10 we have the following table result and error rate for rain tomorrow.
K - Nearest Neighbor
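A sketch of the sweep behind these tables, reusing the split from the earlier knn sketch:

    for (k in c(1, 2, 5, 10)) {
      pred <- knn(dat[idx, num], dat[-idx, num],
                  cl = dat$RainTomorrow[idx], k = k)
      cat("K =", k, "error rate =",
          round(mean(pred != dat$RainTomorrow[-idx]), 3), "\n")
    }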
K-Nearest Neighbor – Conclusions and Error Rate
 The error rate changes from run to run, since the training and test sets are sampled randomly.
 The error rate is 21%.
Comparison of Algorithms
The accuracies of the algorithms are:
1. KNN – 79%
2. K-means – 80.5%
3. Decision tree – 84%