Overview of Data Mining
Meeting of WP Data Mining, April 28, 2008
Bowo Prasetyo
http://www.scribd.com/prazjp
http://www.slideshare.net/bowoprasetyo

Contents
- What Is Data Mining?
- Does It Differ from Statistics?
- Why Use Data Mining?
- What Can Data Mining Do?
- Methods of Data Mining
- Case Study: Department Store
- Mining Environmental Data
- Conclusion

What Is Data Mining?
The exploration and analysis of large quantities of data in order to discover meaningful patterns and rules 1).
1) Berry and Linoff, Data Mining Techniques for Marketing, Sales and Customer Support (book), 1997

Does It Differ from Statistics?
Data mining is a blend of statistics, artificial intelligence, and database research 16).
[Diagram labels: Statistics, Artificial Intelligence, Database, Data Mining]
16) D. Pregibon, Data Mining: Statistical Computing and Graphics, p. 7-8, 1997

Statistics, AI, Database
- Statistics: distribution, mean, median, standard deviation
- Artificial Intelligence (AI): neural network, fuzzy theory, genetic algorithm, particle swarm optimization
- Database: relational, object-oriented, spatial, temporal

Why Use Data Mining?
- Data explosion
- Automated data collection
- Log data of large organizations 2):
  - 44%: 1 terabyte per month
  - 11%: 10 terabytes per month
- World's digital data on PCs, digital cameras, servers, sensors, etc. 3):
  - 2006: 161 billion gigabytes
  - 2010: 988 billion gigabytes (predicted)
- Large amounts of data, but small amounts of knowledge
- Data mining is used to discover the knowledge
2) ESG Research, New ESG Research Finds Large Organizations Experiencing Explosive Growth in Log Data Collection, Analysis, and Storage, 2007 ( http://www.enterprisestrategygroup.com/_documents/NewsEvent/NewsEvent439.pdf )
3) EMC - IDC Research, The Expanding Digital Universe: A Forecast of Worldwide Information Growth Through 2010, 2006 ( http://www.emc.com/about/destination/digital_universe/ )

What Can Data Mining Do? Examples
On Business and Network Security
- Builds customer profiles based on their transaction histories 4)
- Analyzes corporate credit ratings using public financial statements, such as financial ratios 5)
- Detects credit card fraud by analyzing customer transaction databases 6)
- Detects network intrusion based on the behavior of system programs such as sendmail and tcpdump 7)
4) G. Adomavicius and A. Tuzhilin, Using data mining methods to build customer profiles, in Computer magazine p. 74-82, 2001
5) Z. Huang, H. Chen, C. Hsu, W. Chen, S. Wu, Credit rating analysis with support vector machines and neural networks: a market comparative study, in Journal of Decision Support Systems p. 543-558, 2004
6) T. Fawcett and F. Provost, Adaptive Fraud Detection, in Journal of Data Mining and Knowledge Discovery p. 291-316, 2004
7) W. Lee and S. J. Stolfo, Data Mining Approaches for Intrusion Detection, in Proceedings of the 7th USENIX Security Symposium, 1998

On the Web
- Discovers useful patterns from log files, contents, and links of websites 8)
- Ranks the web pages on the internet using link structure analysis 9)
- Personalizes a website based on log files, contents, and profile data 10)
- Supports on-line recommendation to customers by analyzing e-commerce transaction records 11)
8) R. Cooley, B. Mobasher, J. Srivastava, Web Mining: Information and Pattern Discovery on the World Wide Web, in Proceedings of 9th International Conference on Tools with Artificial Intelligence (ICTAI) p. 0558, 1997
9) Larry Page, Sergey Brin, R. Motwani, T. Winograd, The PageRank Citation Ranking: Bringing Order to the Web, 1998 ( http://citeseer.ist.psu.edu/page98pagerank.html )
10) M. Eirinaki and M. Vazirgiannis, Web mining for web personalization, in ACM Transactions on Internet Technology (TOIT) p. 1-27, 2003
11) S. W. Changchien and T. Lu, Mining association rules procedure to support on-line recommendation by customers and products fragmentation, in Journal of Expert Systems with Applications v. 20-4 p. 325-335, 2001

On Environment
- Discovers rules in geo-spatial databases 12)
- Analyzes weather impacts on the airspace system 13)
- Discovers interesting patterns in Earth Science variables (soil moisture, temperature, precipitation) along with ecosystem data (Net Primary Production) 14)
- Finds Ocean Climate Indices based on pressure and temperature data 15)
12) J. Han, K. Koperski, N. Stefanovic, GeoMiner: a system prototype for spatial data mining, in Proceedings of ACM SIGMOD international conference on Management of data p. 553-556, 1997
13) Z. Nazeri and J. Zhang, Mining aviation data to understand impacts of severe weather on airspace system performance, in Proceedings of International Conference on Coding and Computing p. 518-523, 2002
14) V. Kumar, M. Steinbach, P. Tan, S. Klooster, C. Potter, A. Torregrosa, Mining Scientific Data: Discovery of Patterns in the Global Climate System, in Proceedings of the Joint Statistical Meetings p. 5-9, 2001
15) M. Steinbach, P. Tan, V. Kumar, S. Klooster, C. Potter, Data Mining for the Discovery of Ocean Climate Indices, in Proceedings of the 5th Workshop on Scientific Data Mining p. 7-16, 2002

Methods in Data Mining Basic Methods
Classification, Clustering, Association Rules
Data mining consists of several basic methods:
- Classification: places items into groups based on a training set of previously labeled items (supervised)
- Clustering: places items into groups based on some defined distance measure (unsupervised)
- Association Rules: discovers items that co-occur frequently within a data set and also their rules, such as implication or correlation

Classification
- Naive Bayesian classifier
- Spam/non-spam classification
- Spam if [formula image not preserved in this transcript] 17)
17) http://en.wikipedia.org/wiki/Naive_Bayes_classifier

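The slide's decision formula is not preserved here, so the following is only a minimal sketch of a standard naive Bayes spam filter, not necessarily the exact rule the slide showed. It assumes whitespace tokenization, Laplace smoothing, and the usual rule "spam if the spam score exceeds the non-spam score". The function names and the four training messages are hypothetical.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label) pairs, label is "spam" or "ham"."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def log_score(text, label, model):
    """log P(label) + sum of log P(word | label), with Laplace smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    total_words = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / total_docs)
    for word in text.lower().split():
        if word in vocab:  # words never seen in training are ignored
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
    return score

def is_spam(text, model):
    # Assumed decision rule: spam if the spam score is the larger one.
    return log_score(text, "spam", model) > log_score(text, "ham", model)

# Hypothetical training data, for illustration only.
training = [
    ("win cash prize now", "spam"),
    ("cheap prize win win", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch meeting now", "ham"),
]
model = train(training)
print(is_spam("win a prize", model))
print(is_spam("monday meeting agenda", model))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied.
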
Clustering
K-means algorithm 18):
1. Partition the items into k clusters
2. Calculate the mean of each cluster as its centroid
3. Associate each item with the closest centroid using the defined distance
4. Go back to step 2 until convergence
18) J. A. Hartigan and M. A. Wong, A k-means clustering algorithm, in Applied Statistics, 28 (1) p. 100-108, 1979

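The four k-means steps can be sketched as follows. This is an illustrative version under stated assumptions, not the Hartigan and Wong implementation: the initial partition is a simple round-robin assignment, the distance is Euclidean, and an empty cluster falls back to one of the input points. The sample points are hypothetical.

```python
import math

def kmeans(points, k, max_iter=100):
    # Step 1: initial partition (round-robin; an assumption of this sketch).
    assignment = [i % k for i in range(len(points))]
    centroids = []
    for _ in range(max_iter):
        # Step 2: the mean of each cluster becomes its centroid.
        centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if not members:           # fallback for an empty cluster
                members = [points[c]]
            centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
        # Step 3: associate each item with its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: repeat from step 2 until nothing changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, centroids

# Hypothetical 2-D points forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
print(centroids)
```

The result depends on the initial partition; practical implementations usually try several random starts.
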
Association Rules
Example: if a customer buys bread and butter, then she will likely buy milk too, with 90% confidence.
Algorithm 19):
1. Find frequent itemsets whose support >= minsup
2. Find interesting rules from those frequent itemsets whose confidence >= minconf
19) R. Agrawal, R. Srikant, Fast Algorithms for Mining Association Rules, in Proc. 20th Int. Conf. Very Large Data Bases, VLDB, 1994

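The slide uses support and confidence without defining them. The usual definitions, written here for a transaction database D and itemsets X and Y, are given below as a reference. The symbols t, X, and Y are notation introduced for this note; support is shown as a fraction of transactions, although it is often used as a plain count instead.

```latex
\mathrm{support}(X) = \frac{\lvert \{\, t \in D : X \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
\mathrm{confidence}(X \Rightarrow Y) = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}
```

Under these definitions, "90% confidence" in the bread-and-butter example means that 90% of the transactions containing bread and butter also contain milk.
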
Association Rules
Apriori algorithm to find frequent itemsets L in database D 19):
1. Find the frequent set L_{k-1}
2. Join step: C_k is generated by joining L_{k-1} with itself
3. Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, so every candidate in C_k that contains one is removed
(C_k: candidate itemsets of size k)
(L_k: frequent itemsets of size k whose support >= minsup)

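The join and prune steps can be sketched in a few lines. This is a simplified illustration, not the optimized procedure from the cited paper: support is an absolute transaction count, the join is a plain pairwise union, and the baskets are hypothetical.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return {itemset: support count} for every itemset with count >= minsup."""
    transactions = [frozenset(t) for t in transactions]

    def keep_frequent(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= minsup}

    items = {item for t in transactions for item in t}
    level = keep_frequent({frozenset([i]) for i in items})   # L_1
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)                                     # L_{k-1}
        # Join step: unions of two (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)                     # L_k
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical baskets, for illustration only.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "jam"},
]
frequent = apriori(baskets, minsup=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Because "jam" appears only once, it is dropped at the first level and never takes part in a larger candidate.
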
Association Rules
Apriori algorithm to find rules R from frequent itemsets L 19):
- For each l ∈ L, generate S = non-empty subsets of l
- For each s ∈ S, generate the rule s => (l - s) if confidence >= minconf

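Rule generation from frequent itemsets can be sketched as below. The sketch assumes the frequent itemsets come with their support counts and that every subset of a frequent itemset is present in the table (which holds for Apriori output). It iterates over proper subsets only, so the consequent is never empty. The support counts in the example are hypothetical.

```python
from itertools import combinations

def generate_rules(frequent, minconf):
    """frequent: {frozenset: support count}. Returns (antecedent, consequent, confidence) triples."""
    rules = []
    for l, support_l in frequent.items():
        for size in range(1, len(l)):              # non-empty proper subsets of l
            for s in map(frozenset, combinations(l, size)):
                confidence = support_l / frequent[s]
                if confidence >= minconf:
                    rules.append((s, l - s, confidence))
    return rules

# Hypothetical support counts, for illustration only.
frequent = {
    frozenset({"bread"}): 4,
    frozenset({"butter"}): 3,
    frozenset({"bread", "butter"}): 3,
}
rules = generate_rules(frequent, minconf=0.9)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "=>", sorted(consequent), confidence)
```

Here butter => bread passes (3/3 = 1.0) while bread => butter is rejected (3/4 = 0.75), which shows that confidence is not symmetric.
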
Visualization of Mining Results
- Problems with mining results:
  - Too many results to display
  - Difficult to find important rules
  - Difficult to understand the rules
- Good visualization tools are needed:
  - Charts for statistical results
  - Graphs (nodes and edges) for association rules
  - Globe maps for geo-spatial results
  - Animation for temporal results
  - Use of colors, styles, thickness, etc.

Case Study: Association Rules in a Department Store
Items and Transactions
Mr. Joko's purchases in January:
1. rice, cooking oil, beef
2. granulated sugar, cooking oil, chicken eggs
3. rice, granulated sugar, cooking oil, chicken eggs
4. granulated sugar, chicken eggs
(Each line is a transaction; each product is an item.)

Frequent Items
- "Frequent": purchased >= 2 times (2 is the minimum support)
- beef = 1 time (its support) => not frequent

n-Length Items (n-Items)
n > 1: 2-length items, 3-length items
Association Rules
Customers who buy rice will also buy cooking oil.
"If rice then cooking oil": rice => cooking oil (rice is the antecedent, cooking oil is the consequent)
confidence = support(cooking oil & rice) / support(rice) = 2/2 = 1

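The confidence calculation above can be checked against the four January transactions from the case study. In this sketch the item names are English translations of the slide's items (beras = rice, minyak goreng = cooking oil, daging sapi = beef, gula pasir = granulated sugar, telur ayam = chicken eggs), support is a plain count, and the helper names are hypothetical.

```python
# Mr. Joko's four January transactions, items translated to English.
transactions = [
    {"rice", "cooking oil", "beef"},
    {"granulated sugar", "cooking oil", "chicken eggs"},
    {"rice", "granulated sugar", "cooking oil", "chicken eggs"},
    {"granulated sugar", "chicken eggs"},
]

def support(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def confidence(antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

min_support = 2  # "frequent" means purchased at least twice
items = sorted(set().union(*transactions))
frequent_items = [i for i in items if support({i}) >= min_support]

print(frequent_items)                          # beef is missing: bought only once
print(confidence({"rice"}, {"cooking oil"}))   # 2/2
```

Rice appears in two transactions and both also contain cooking oil, which reproduces the 2/2 = 1 on the slide.
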
Complete Association Rules
Mining Environmental Data Examples
Explosion in Environmental Data
- Temperature, humidity, pressure, precipitation, sound, light, shock
- Weather and rainfall trends, river height and flows, air and water quality, pollution levels, salinity, emissions, FPAR, NPP
- Earth science, oceanography, meteorology, ecology
- Sensors, hand-held/wireless devices, remote sensing (satellites), other automated logging devices

Geo-spatial Database
GeoMiner discovers rules in geo-spatial databases 12):
- Given Western Canada, describe the weather patterns
- Given temperature, precipitation, etc., describe the regions
- Show the differences in weather patterns between British Columbia and Alberta
- If a Canadian town is large and is adjacent to a large body of water, then it is close to the U.S. border, with a probability of 78%

Earth Science
Interesting patterns in Earth Science 14):
- Regions covered by the highly correlated pattern FPAR-Hi => NPP-Hi: shrubland regions
- FPAR: Fractional Intercepted Photosynthetically Active Radiation
- NPP: Net Primary Production

Earth Science
Interesting patterns in Earth Science 14):
- Two clusters for NPP (land) and two clusters for SST (ocean). The clusters approximate the northern and southern hemispheres, for land and ocean.
- SST: sea surface temperature

Earth Science
Interesting patterns in Earth Science 14):
- A cluster of ocean near the Philippines (SST) and a cluster of land covering Eastern Brazil, Southern Africa, and a bit of Australia (NPP) are highly correlated (0.47).
- In particular, this sea region is highly correlated (0.66) with SOI, a climate index related to El Niño, and parts of Southern Africa and Australia are known to experience droughts related to El Niño.

Conclusion
- Today's data repositories are huge and are collected at enormous speed
- Traditional statistical methods are no longer sufficient to analyze the data
- Data mining is very important for discovering knowledge hidden in data
- It helps decision making in a broad range of fields: business, network security, web, environment, etc.
- Good visualization tools are needed to understand mining results easily
