An Efficient Two-Step Method for
Classification of Spatial Data
Authors: Krzysztof Koperski, Jiawei Han, Nebojsa Stefanovic
Presented at: International Symposium on Spatial Data Handling (SDH’98)
Reviewed by: Abhishek Agrawal
Introduction
• Spatial databases hold very large amounts of spatial data, collected and used in applications ranging from remote sensing to geographic information systems (GIS), computer cartography, and environmental assessment and planning.
• These databases contain many interesting spatial relations and patterns that are implicit, i.e. not explicitly stored, and that must be extracted by spatial data mining.
• One spatial data mining technique is the classification of the spatial objects stored in a spatial database, where the objective is to label the objects by identifying a set of rules that describe the partition into classes.
Classification Approach : Spatial Decision Tree
❖ In this paper [1], the authors use a decision tree to classify spatial objects based on
➢ non-spatial properties of the classified objects (the traditional approach)
➢ spatial relations of the classified objects to other objects in the database
❖ The authors also analyze the classification of spatial objects with respect to thematic maps and to spatial relationships with other objects in the database.
❖ For this new decision-tree approach to spatial classification, the authors report experimental results on both real and synthetic data, comparing performance and result quality with existing methods for the same problem.
Problem Definition
Business Problem: Label local business units, such as shopping malls or stores, by their profit status, based on the influence of their trade area.
Problem Definition (continued)
Data Mining Problem: Classify spatial objects, such as shopping malls or stores, described by their attributes, into two classes, Y and N, defined by the attribute high_profit with the values Y for “yes” and N for “no”.
● In our example, objects OID1 and OID2
belong to class Y and objects OID3,
OID4 and OID5 belong to class N.
● We want to build a decision tree that classifies the objects Oi based on two types of information:
➢ descriptions of the objects in the proximity of the objects Oi
➢ non-spatial attributes of the thematic map
State of the Art
● Fayyad et al. [2] used decision tree methods to classify images of stellar objects in order to detect stars and galaxies. They used the low-level image processing system FOCAS to select and generate basic attributes. Their method deals with image databases and is tailored to astronomical attributes, so it is not suitable for vector data (GIS databases).
● Another approach, by Ester et al. [3], is based on the ID3 algorithm and uses the concept of neighbourhood graphs. This method does not analyze aggregate values of non-spatial attributes of neighbouring objects, nor does it perform any relevance analysis to narrow its search space.
● Ng and Yu [4] described a method for extracting strong, common and discriminating characteristics of clusters based on thematic maps. They did not extend these characteristics to the construction of decision trees.
Classification Algorithm
Build a decision tree that classifies spatial objects based on spatial predicates, functions and thematic maps.
Input :
1. Spatial Database containing:
a. classified objects Oc
b. other spatial objects with non-spatial attributes
2. Geo-mining query specifying:
a. objects to be used, predictive attributes, predicates and functions
b. attribute, predicate or function used as a class label
Output :
Binary Decision Tree
Method: Spatial Decision Tree
1. Collect a set S of classified objects and of other objects that are used for their description.
2. For a sample of spatial objects Oc from S:
a. Build sets of predicates describing all objects, using coarse predicates, functions and attributes.
b. Generalize the sets of predicates based on concept hierarchies.
c. Find the relevant coarse predicates, functions and attributes using the RELIEF algorithm.
3. Find the best buffer size for aggregates of thematic map polygons. This is done by finding, for every relevant non-spatial aggregate attribute, the buffer size Xmax at which the information gain of the aggregated attribute is maximal.
4. Build sets of predicates using the relevant fine predicates, and generalize them based on concept hierarchies.
5. Generate the decision tree.
Step 1.a: Define MBRs (minimum bounding rectangles), using the data distribution and a confidence level as the threshold.
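The MBR idea behind this step can be sketched as a cheap rectangle test that filters candidate pairs before any exact geometry is computed. This is a minimal sketch, assuming each geometry is given as a list of planar (x, y) vertices; the names `MBR`, `mbr_of`, `mbr_distance` and `coarse_close_to` are hypothetical, and the data-distribution and confidence-level thresholding mentioned on the slide is not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def mbr_of(points: Iterable[Point]) -> MBR:
    """Smallest axis-aligned rectangle enclosing all vertices of a geometry."""
    xs, ys = zip(*points)
    return MBR(min(xs), min(ys), max(xs), max(ys))


def mbr_distance(a: MBR, b: MBR) -> float:
    """Minimum Euclidean distance between two rectangles (0.0 if they overlap).

    It never exceeds the exact distance between the enclosed geometries,
    so it is a cheap lower bound usable in a filter step.
    """
    dx = max(b.xmin - a.xmax, a.xmin - b.xmax, 0.0)
    dy = max(b.ymin - a.ymax, a.ymin - b.ymax, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def coarse_close_to(a: MBR, b: MBR, threshold: float) -> bool:
    """Coarse close_to test: may give false positives, never false negatives."""
    return mbr_distance(a, b) <= threshold
```

Because the rectangle distance is a lower bound on the true distance, any pair rejected here is also far apart in reality; the pairs that pass still need an exact check.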
Step 2.a: Find a coarse description of the sample, listing its spatial predicates, functions and attributes.
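Step 2.a can be sketched as turning each sampled object into a set of coarse predicate strings. This sketch assumes the objects are already reduced to MBR tuples `(xmin, ymin, xmax, ymax)`, that the only predicate is a distance-based `close_to(type)` and that the distance threshold is user-supplied; `coarse_description` and `describe_sample` are illustrative names, not the authors' code.

```python
from typing import Dict, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def rect_distance(a: Rect, b: Rect) -> float:
    """Minimum distance between two MBRs (0.0 if they overlap)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5


def coarse_description(target: Rect, others: List[Tuple[str, Rect]], threshold: float) -> Set[str]:
    """Coarse predicate set of one classified object, e.g. {'close_to(park)'}."""
    preds = set()
    for obj_type, rect in others:
        # Approximate test on rectangles only; exact geometry is deferred.
        if rect_distance(target, rect) <= threshold:
            preds.add(f"close_to({obj_type})")
    return preds


def describe_sample(sample: Dict[str, Rect], others: List[Tuple[str, Rect]],
                    threshold: float) -> Dict[str, Set[str]]:
    """Map each sampled object ID to its coarse predicate set."""
    return {oid: coarse_description(rect, others, threshold) for oid, rect in sample.items()}
```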
Step 2.b: Generalize the predicates using concept hierarchies.
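Generalization along a concept hierarchy can be sketched as replacing a predicate's argument with its parent concept, which merges predicates that differ only in low-level detail. The hierarchy below, the predicate string format `name(arg)` and the function names are all hypothetical illustrations.

```python
import re
from typing import Dict, Set

# Hypothetical concept hierarchy: child concept -> parent concept.
HIERARCHY: Dict[str, str] = {
    "I-5": "highway",
    "I-90": "highway",
    "highway": "road",
    "grocery_store": "store",
    "department_store": "store",
}

PRED = re.compile(r"^(\w+)\((.+)\)$")  # matches e.g. "close_to(I-5)"


def generalize(pred: str, hierarchy: Dict[str, str], levels: int = 1) -> str:
    """Climb `levels` steps up the hierarchy for the predicate's argument."""
    m = PRED.match(pred)
    if not m:
        return pred  # not in name(arg) form: leave unchanged
    name, arg = m.groups()
    for _ in range(levels):
        if arg not in hierarchy:
            break  # already at the top of its hierarchy
        arg = hierarchy[arg]
    return f"{name}({arg})"


def generalize_set(preds: Set[str], hierarchy: Dict[str, str], levels: int = 1) -> Set[str]:
    """Generalize every predicate; duplicates collapse because the result is a set."""
    return {generalize(p, hierarchy, levels) for p in preds}
```

For example, `close_to(I-5)` and `close_to(I-90)` both become `close_to(highway)`, so the generalized set is smaller than the original one.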
RELIEF ALGORITHM
Find Relevant Attributes
Step 2.c: For every object s in the sample, two nearest neighbours are found: one that belongs to the same class (Y/N) as s (the nearest hit) and one that belongs to a different class (the nearest miss).
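The neighbour search can be sketched over predicate sets. This sketch assumes each object is represented by the set of (generalized) predicates that hold for it and uses the Hamming distance, i.e. the size of the symmetric difference, as the dissimilarity; the slides do not state which distance the authors use. Object IDs and predicate names in the test data are illustrative.

```python
from typing import Dict, Set, Tuple


def hamming(a: Set[str], b: Set[str]) -> int:
    """Number of predicates on which two objects disagree."""
    return len(a ^ b)


def nearest_hit_and_miss(s: str, preds: Dict[str, Set[str]],
                         labels: Dict[str, str]) -> Tuple[str, str]:
    """Return (nearest hit, nearest miss) of object s; ties go to the first object listed."""
    others = [o for o in preds if o != s]
    hits = [o for o in others if labels[o] == labels[s]]
    misses = [o for o in others if labels[o] != labels[s]]
    # min() raises ValueError if s has no other same-class or other-class object.
    hit = min(hits, key=lambda o: hamming(preds[s], preds[o]))
    miss = min(misses, key=lambda o: hamming(preds[s], preds[o]))
    return hit, miss
```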
Step 2.c (continued): Assign a weight to each predicate based on the predicates of the neighbours:
➔ If the nearest hit has the same value for the predicate, the weight of that predicate increases ↑
➔ If the nearest hit has a different value, the weight decreases ↓
➔ If the nearest miss has the same value, the weight decreases ↓
➔ If the nearest miss has a different value, the weight increases ↑
The predicates whose weight exceeds a threshold are selected as relevant.
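The four update rules above can be sketched for boolean predicates as follows. This is a minimal illustration, not the authors' implementation: it assumes predicate-set objects, Hamming distance for the neighbour search, and a ±1 change per rule, and it normalizes the total by 2n so weights fall in [-1, 1] (that scaling is an assumption, so the slide's thresholds need not carry over). Note that the classic RELIEF formulation by Kira and Rendell only changes a weight when values differ; the symmetric variant here follows the slide.

```python
from typing import Dict, List, Set


def relief_weights(preds: Dict[str, Set[str]], labels: Dict[str, str]) -> Dict[str, float]:
    """Score every predicate with the hit/miss rules; result lies in [-1, 1]."""
    all_preds = sorted(set().union(*preds.values()))
    scores = {p: 0 for p in all_preds}

    def dist(a: str, b: str) -> int:
        # Hamming distance: number of predicates on which a and b disagree.
        return len(preds[a] ^ preds[b])

    for s in preds:
        others = [o for o in preds if o != s]
        hit = min((o for o in others if labels[o] == labels[s]), key=lambda o: dist(s, o))
        miss = min((o for o in others if labels[o] != labels[s]), key=lambda o: dist(s, o))
        for p in all_preds:
            value = p in preds[s]
            # Nearest hit: same value -> weight up, different value -> weight down.
            scores[p] += 1 if (p in preds[hit]) == value else -1
            # Nearest miss: same value -> weight down, different value -> weight up.
            scores[p] += 1 if (p in preds[miss]) != value else -1
    # Each object changes a score by at most 2, so dividing by 2n gives [-1, 1].
    return {p: sc / (2 * len(preds)) for p, sc in scores.items()}


def relevant_predicates(weights: Dict[str, float], threshold: float) -> List[str]:
    """Keep the predicates whose weight exceeds the threshold."""
    return [p for p, w in weights.items() if w > threshold]
```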
Step 3: Find the best buffer size for aggregates of thematic map polygons.
• Different criteria may be used for the shape of the buffer; buffers may be based on rings or on customer penetration polygons.
• Rings have several advantages:
1. ease of use,
2. no need to determine the trade area from customer data,
3. easy comparison between sites.
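A ring buffer aggregate can be sketched as summing a non-spatial attribute of the thematic-map polygons around a site. For brevity, this sketch assumes each polygon is reduced to its centroid and counted entirely when the centroid lies within the ring radius; area-weighted apportioning of partially covered polygons is a common refinement that is not shown. `ring_aggregate` and the tract values are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
# Thematic-map polygon reduced to (centroid, value of a non-spatial attribute).
ThematicPolygon = Tuple[Point, float]


def ring_aggregate(site: Point, polygons: List[ThematicPolygon], radius: float) -> float:
    """Sum the attribute of every polygon whose centroid lies within `radius` of the site."""
    x0, y0 = site
    return sum(value for (cx, cy), value in polygons
               if math.hypot(cx - x0, cy - y0) <= radius)
```

Evaluating it for several radii gives one aggregate value per candidate buffer size for each classified object, which is what a buffer-size comparison needs as input.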
• Buffers represent the area that has an impact on the class label attribute of the classified objects.
• The buffer size is fixed by finding, for all relevant non-spatial aggregate attributes, the buffer size Xmax at which the information gain of the aggregated attribute is maximal.
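Choosing Xmax can be sketched as scoring each candidate radius by the best information gain a binary threshold split on its aggregated values achieves. This sketch assumes the aggregates have already been computed per radius (one value per classified object, in the same order as the labels); the function names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split_gain(values: Sequence[float], labels: Sequence[str]) -> float:
    """Highest information gain of a binary split `value <= t` over all thresholds t."""
    base, n = entropy(labels), len(labels)
    best = 0.0
    # Skip the largest value: splitting there would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        best = max(best, base - remainder)
    return best


def choose_buffer_size(aggregates: Dict[float, List[float]], labels: List[str]) -> float:
    """Return the candidate radius (Xmax) whose aggregated attribute gives the highest gain."""
    return max(aggregates, key=lambda r: best_split_gain(aggregates[r], labels))
```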
Step 4: Build sets of predicates using the relevant fine predicates, and generalize them based on concept hierarchies.
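The fine step can be sketched as an exact distance computation carried out only for candidates that survived a coarse MBR filter. This sketch assumes sites are points and thematic objects are simple polygons given as vertex rings; it measures distance to the polygon boundary, so a site inside a polygon is not treated as distance 0. All names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Exact Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment: a == b
        return math.hypot(px - ax, py - ay)
    # Parameter of p's projection onto the segment's line, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def boundary_distance(p: Point, ring: Sequence[Point]) -> float:
    """Distance from p to the closest edge of a polygon given as a vertex ring."""
    return min(point_segment_distance(p, ring[i], ring[(i + 1) % len(ring)])
               for i in range(len(ring)))


def fine_close_to(site: Point, candidates: List[Tuple[str, List[Point]]],
                  threshold: float) -> Set[str]:
    """Keep only the coarse candidates whose exact distance is within the threshold."""
    return {f"close_to({kind})" for kind, ring in candidates
            if boundary_distance(site, ring) <= threshold}
```

In the test data, the triangular "river" has an MBR that touches the site, so it would pass a coarse MBR test, but its exact distance is about 2.12 and the fine test rejects it at threshold 2.1.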
Step 5: Build the decision tree using binary splits based on information gain.
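A binary decision tree over boolean predicates can be sketched in the ID3 style: at each node, split on the predicate (true/false) with the highest information gain. This is a minimal illustration assuming each row is the set of relevant, generalized predicates of one classified object; it is not the authors' implementation, and the predicate names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set, Tuple, Union

# A tree is either a class label (leaf) or (predicate, subtree_if_true, subtree_if_false).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows: List[Set[str]], labels: List[str], predicates: List[str]) -> Tree:
    """Recursively split on the predicate with the highest information gain."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not predicates:
        return majority
    base = entropy(labels)
    best_pred, best_gain = None, 0.0
    for p in predicates:
        yes = [lab for r, lab in zip(rows, labels) if p in r]
        no = [lab for r, lab in zip(rows, labels) if p not in r]
        if not yes or not no:
            continue  # predicate does not split this node
        gain = base - (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
        if gain > best_gain:
            best_pred, best_gain = p, gain
    if best_pred is None:
        return majority  # no split improves purity
    rest = [p for p in predicates if p != best_pred]
    yes_idx = [i for i, r in enumerate(rows) if best_pred in r]
    no_idx = [i for i, r in enumerate(rows) if best_pred not in r]
    return (
        best_pred,
        build_tree([rows[i] for i in yes_idx], [labels[i] for i in yes_idx], rest),
        build_tree([rows[i] for i in no_idx], [labels[i] for i in no_idx], rest),
    )


def classify(tree: Tree, row: Set[str]) -> str:
    """Follow the true/false branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        pred, if_true, if_false = tree
        tree = if_true if pred in row else if_false
    return tree
```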
Complexity Analysis
Results & Performance Evaluation
• Experiments were performed on synthetic data merged with TIGER U.S. census data for Washington State.
• With real data, the best results were obtained with a threshold between 0 and 0.2, and accuracy increased drastically when relevance analysis was used.
Conclusion and Future Directions
• Classification of geographic objects enables researchers to explore interesting relations between spatial and non-spatial data.
• The algorithm performs less costly, approximate spatial computations and relevance analysis, producing smaller and more accurate decision trees.
• Pre-computed spatial indexes can be stored and reused by regular spatial queries to find neighbourhood attributes.
• The authors plan to perform experiments using aggregate values for thematic maps and varying the distance for close_to spatial predicates.
• The authors also plan to integrate the method with their spatial data mining prototype, GeoMiner.
References
[1] Koperski, Krzysztof, Jiawei Han, and Nebojsa Stefanovic. "An efficient two-step method for classification of spatial data." Proceedings of the International Symposium on Spatial Data Handling (SDH’98), 1998.
[2] Fayyad, Usama M., S. George Djorgovski, and Nicholas Weir. "Automating the analysis and cataloging of sky surveys." Advances in Knowledge Discovery and Data Mining. American Association for Artificial Intelligence, 1996.
[3] Ester, Martin, Hans-Peter Kriegel, and Jörg Sander. "Spatial data mining: A database approach." Advances in Spatial Databases. Springer Berlin Heidelberg, 1997.
[4] Ng, R. T., and Y. Yu. "Discovering strong, common and discriminating characteristics of clusters from thematic maps." Proceedings of the 11th Annual Symposium on Geographic Information Systems, 1997.
