Data Mining with JDM API
Regina Wang
Data Mining
Knowledge Discovery in Databases (KDD)
Searching large volumes of data for patterns.
The nontrivial extraction of implicit, previously unknown, and potentially useful information from data.
The science of extracting useful information from large data sets or databases.
Uses computational techniques from statistics, machine learning, and pattern recognition.
Descriptive Statistics
Collect data
Classify data
Summarize data
Present data
Make inferences to draw conclusions
--Point and interval estimation
--Hypothesis testing
--Prediction
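Point and interval estimation can be illustrated with a few lines of plain Java. This is a minimal sketch, not part of the JDM API: the class name `EstimationSketch` and the sample values are made up for illustration, and the interval uses the normal critical value 1.96, which is only an approximation for small samples (a t-distribution value would normally be used there).

```java
// Illustrative sketch only; the class and data are hypothetical and unrelated to JDM.
final class EstimationSketch {

    /** Point estimate of the population mean. */
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Sample standard deviation (n - 1 in the denominator). */
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0.0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    /** Approximate 95% interval for the mean, using the normal critical value 1.96. */
    static double[] confidenceInterval95(double[] xs) {
        double m = mean(xs);
        double halfWidth = 1.96 * sampleStdDev(xs) / Math.sqrt(xs.length);
        return new double[] { m - halfWidth, m + halfWidth };
    }

    public static void main(String[] args) {
        double[] sample = { 2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0 };
        double[] ci = confidenceInterval95(sample);
        System.out.printf("mean=%.3f, interval=[%.3f, %.3f]%n", mean(sample), ci[0], ci[1]);
    }
}
```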
Machine Learning
Concerned with the development of techniques that allow computers to "learn".
Concerned with the algorithmic complexity of computational implementations.
Many inference problems turn out to be NP-hard or harder.
Common Machine Learning Algorithm Types
Supervised learning: relies on prior knowledge
Unsupervised learning: relies on the statistical regularity of the patterns
Semi-supervised learning
Reinforcement learning
Transduction
Learning to learn
Pattern Recognition
The act of taking in raw data and taking an action based on the category of the data.
Aims to classify data patterns based on prior knowledge or on statistical information.
Depending on whether a training set is available, learning is supervised or unsupervised.
Two approaches: statistical (decision theory) and syntactic (structural).
Supervised Techniques
Classification:
--k-Nearest Neighbors
--Naïve Bayes
--Classification Trees
--Discriminant Analysis
--Logistic Regression
--Neural Nets
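As one concrete instance of the classification techniques listed, k-nearest neighbors can be sketched in plain Java. This is an illustrative sketch under stated assumptions, not JDM code: the class `KnnSketch`, the Euclidean distance choice, and the toy data are all hypothetical. Ties in the vote go to the label that first reaches the highest count among the nearest neighbors.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not part of the JDM API.
final class KnnSketch {

    /** Returns the majority label among the k training points closest to the query. */
    static String classify(double[][] points, String[] labels, double[] query, int k) {
        // Sort training indices by distance to the query point.
        Integer[] order = new Integer[points.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> squaredDistance(points[i], query)));

        // Count votes among the k nearest neighbors.
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (int i = 0; i < Math.min(k, order.length); i++) {
            String label = labels[order[i]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    /** Squared Euclidean distance; the square root is not needed for ranking. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] points = { {1, 1}, {1, 2}, {2, 1}, {8, 8}, {9, 8}, {8, 9} };
        String[] labels = { "A", "A", "A", "B", "B", "B" };
        System.out.println(classify(points, labels, new double[] {1.5, 1.5}, 3));
    }
}
```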
Supervised Techniques
Prediction (Estimation):
--Regression
--Regression Trees
--k-Nearest Neighbors
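Prediction of a numeric value can be illustrated with simple least-squares regression on one input attribute. The sketch below is plain Java and makes no use of JDM; the class name `LinearRegressionSketch` and the data points are assumptions chosen for illustration.

```java
// Illustrative sketch only; not part of the JDM API.
final class LinearRegressionSketch {

    /** Fits y = intercept + slope * x by ordinary least squares; returns { intercept, slope }. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of cross-deviations
        double sxx = 0.0; // sum of squared x deviations
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] { meanY - slope * meanX, slope };
    }

    /** Applies a fitted model to a new input value. */
    static double predict(double[] model, double x) {
        return model[0] + model[1] * x;
    }

    public static void main(String[] args) {
        double[] model = fit(new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
        System.out.println("prediction for x=5: " + predict(model, 5));
    }
}
```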
Unsupervised Techniques
Cluster Analysis
Principal Components
Association Rules
Collaborative Filtering
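For association rules, the two basic measures are support (how often an item set occurs) and confidence (how often the consequent occurs when the antecedent does). The following plain-Java sketch computes both; the class `AssociationRuleSketch` and the shopping-basket data are hypothetical and are not taken from JDM.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not part of the JDM API.
final class AssociationRuleSketch {

    /** Fraction of transactions that contain every item in the given set. */
    static double support(List<Set<String>> transactions, Set<String> items) {
        long hits = transactions.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / transactions.size();
    }

    /** Confidence of the rule antecedent -> consequent. */
    static double confidence(List<Set<String>> transactions,
                             Set<String> antecedent, Set<String> consequent) {
        Set<String> both = new HashSet<>(antecedent);
        both.addAll(consequent);
        return support(transactions, both) / support(transactions, antecedent);
    }

    /** Small helper to build a set from a list of item names. */
    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        List<Set<String>> baskets = Arrays.asList(
                setOf("bread", "milk"),
                setOf("bread", "butter"),
                setOf("bread", "milk", "butter"),
                setOf("milk"));
        System.out.println("support(bread, milk) = " + support(baskets, setOf("bread", "milk")));
        System.out.println("confidence(bread -> milk) = "
                + confidence(baskets, setOf("bread"), setOf("milk")));
    }
}
```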
Java Data Mining API (JDM)
Data-mining tools were traditionally provided in products with vendor-specific interfaces.
The Java Data Mining API (JDM) defines a common Java API for interacting with data-mining systems.
Developed by the data mining expert group of the Java Community Process.
JDM Current Versions
JDM 1.0 (JSR 73): final specification, August 2004. http://www.jcp.org/en/jsr/detail?id=73
JDM 2.0 (JSR 247): Early Review. http://www.jcp.org/en/jsr/detail?id=247
JDM targets the Java 2 Platform, Enterprise Edition (J2EE) and Standard Edition (J2SE).
Data Mining System
A typical data-mining system consists of
--a data-mining engine
--a repository that persists the data-mining artifacts, such as the models, created in the process.
The actual data is obtained via a database connection or via a file-system API.
JDM Architectural Components
Application programming interface (API)
Data mining engine (DME), also called a data mining server (DMS): provides the infrastructure that offers a set of data mining services to its API clients.
Mining object repository (MOR): used by the DME to persist data mining objects.
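The role of the mining object repository, persisting named objects on behalf of the engine, can be pictured with a toy in-memory store. This is a hypothetical sketch and not the JDM API: a real MOR is supplied by the vendor's DME. The method name `saveObject` and its boolean argument mirror the calls in the deck's later code example, and reading that argument as a "replace existing object" flag is an assumption here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy repository; a real MOR is provided by the data mining engine.
final class InMemoryRepositorySketch {

    private final Map<String, Object> objects = new HashMap<>();

    /**
     * Stores an object under a name. When replace is false and the name is
     * already taken, the call fails instead of overwriting (assumed semantics).
     */
    void saveObject(String name, Object object, boolean replace) {
        if (!replace && objects.containsKey(name)) {
            throw new IllegalStateException("Object already exists: " + name);
        }
        objects.put(name, object);
    }

    /** Returns the stored object, or null if nothing is saved under the name. */
    Object retrieveObject(String name) {
        return objects.get(name);
    }

    public static void main(String[] args) {
        InMemoryRepositorySketch repository = new InMemoryRepositorySketch();
        repository.saveObject("myClusteringModel", "model-v1", false);
        repository.saveObject("myClusteringModel", "model-v2", true); // explicit overwrite
        System.out.println(repository.retrieveObject("myClusteringModel"));
    }
}
```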
Key JDM API benefit: it abstracts the physical components, tasks, and algorithms into Java classes.
Figure 1. Components of a data-mining system
Building a data-mining model
Decide what you want to learn.
Select and prepare your data.
Choose mining tasks and configure the mining algorithms.
Build your data-mining model.
Test and refine the models.
Report findings or predict future outcomes.
Data Mining Process
Figure 2. Data mining steps.
Usage of JDM API
Use JDM to explore the mining object repository (MOR) and find out which models and model-building parameters work best.
Follow a few simple steps that map the process to JDM interactions.
Build a Java data mining GUI application.
Figure 3. Top-level packages.
Figure 4. Top-level interfaces.
Using the JDM API
Identify the data you wish to use to build your model (your build data) with a URL that points to that data.
Specify the type of model you want to build, such as clustering, classification, or association rules, along with the parameters to the build process. Such parameters are termed build settings in JDM, and these mining tasks are represented by API classes.
Create a logical representation of your data to select certain attributes of the physical data, and then map those attributes to logical values.
Using the JDM API (continued)
Specify the parameters to your data-mining algorithms.
Create a build task, and apply to that task the physical data references and the build settings.
Finally, execute the task. The outcome of that execution is your data model. That model has a signature, a kind of interface, that describes the possible input attributes for later applying the model to additional data.
Using the data model and results
Once you've created a model, you can test that model, and then apply it to additional data.
Building, testing, and applying the model to additional data is an iterative process that, ideally, yields increasingly accurate models.
Those models can then be saved in the MOR and used either to explain data or to predict the outcome of new data in relation to your data-mining objective.
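Testing a classification model usually means comparing its predictions against known outcomes on data that was held back from the build. A minimal plain-Java sketch of that comparison follows; it does not use JDM's own test facilities, and the class `ModelTestSketch` and the labels are made up for illustration.

```java
// Illustrative sketch only; not part of the JDM API.
final class ModelTestSketch {

    /** Fraction of cases where the predicted label matches the actual label. */
    static double accuracy(String[] actual, String[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("Arrays must have the same length");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i].equals(predicted[i])) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }

    public static void main(String[] args) {
        String[] actual = { "yes", "no", "yes", "no" };
        String[] predicted = { "yes", "no", "no", "no" };
        System.out.println("accuracy = " + accuracy(actual, predicted));
    }
}
```

Tracking this number across successive builds is one simple way to see whether the iteration is in fact producing more accurate models.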
JDM Data Connection
A JDM connection is represented by the engine variable, which is of type javax.datamining.resource.Connection.
JDM connections are very similar to JDBC connections, with one connection per thread.
A factory for physical data sets is obtained from the connection:
    PhysicalDataSetFactory pdsFactory =
        (PhysicalDataSetFactory) engine.getFactory("javax.datamining.data.PhysicalDataSet");
JDM Data Connection (continued)
Build data is referenced via a PhysicalDataSet object, which in turn loads the data from a file or a database table referenced by a URL.
    PhysicalDataSet dataSet = pdsFactory.create("file:///export/data/textFileData.data", true);
Code Example: Building a clustering model
    import javax.datamining.ExecutionHandle;
    import javax.datamining.ExecutionState;
    import javax.datamining.ExecutionStatus;
    import javax.datamining.clustering.ClusteringSettings;
    import javax.datamining.clustering.ClusteringSettingsFactory;
    import javax.datamining.data.LogicalData;
    import javax.datamining.data.LogicalDataFactory;
    import javax.datamining.data.PhysicalDataSet;
    import javax.datamining.data.PhysicalDataSetFactory;
    import javax.datamining.task.BuildTask;
    import javax.datamining.task.BuildTaskFactory;

    // Fragment: assumes dmeConn is an open javax.datamining.resource.Connection
    // and uri is a String that locates the build data.

    // Create the physical representation of the data
    PhysicalDataSetFactory pdsFactory =
        (PhysicalDataSetFactory) dmeConn.getFactory("javax.datamining.data.PhysicalDataSet");
    PhysicalDataSet buildData = pdsFactory.create(uri, true);
    dmeConn.saveObject("myBuildData", buildData, false);

    // Create the logical representation of the data from the physical data
    LogicalDataFactory ldFactory =
        (LogicalDataFactory) dmeConn.getFactory("javax.datamining.data.LogicalData");
    LogicalData ld = ldFactory.create(buildData);
    dmeConn.saveObject("myLogicalData", ld, false);

    // Create the settings to build a clustering model
    ClusteringSettingsFactory csFactory =
        (ClusteringSettingsFactory) dmeConn.getFactory("javax.datamining.clustering.ClusteringSettings");
    ClusteringSettings clusteringSettings = csFactory.create();
    clusteringSettings.setLogicalDataName("myLogicalData");
    clusteringSettings.setMaxNumberOfClusters(20);
    clusteringSettings.setMinClusterCaseCount(5);
    dmeConn.saveObject("myClusteringBS", clusteringSettings, false);

    // Create a task to build a clustering model with the data and settings
    BuildTaskFactory btFactory =
        (BuildTaskFactory) dmeConn.getFactory("javax.datamining.task.BuildTask");
    BuildTask task = btFactory.create("myBuildData", "myClusteringBS", "myClusteringModel");
    dmeConn.saveObject("myClusteringTask", task, false);

    // Execute the task and check the status
    ExecutionHandle handle = dmeConn.execute("myClusteringTask");
    handle.waitForCompletion(Integer.MAX_VALUE); // wait until done
    ExecutionStatus status = handle.getLatestStatus();
    if (ExecutionState.success.equals(status.getState())) {
        // task completed successfully...
    }
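JDM leaves the choice of clustering algorithm to the data mining engine, so the build task above does not say how the clusters are found. As a stand-in for what an engine might do internally, the sketch below runs a basic k-means (Lloyd's algorithm) in plain Java. Everything in it is an assumption for illustration: the class `KMeansSketch`, the naive seeding from the first k points, and the sample data. The parameter k is only loosely analogous to the maximum number of clusters in the build settings.

```java
import java.util.Arrays;

// Illustrative stand-in for an engine's clustering step; not part of the JDM API.
final class KMeansSketch {

    /** Returns the cluster index assigned to each point once assignments stop changing. */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dims = points[0].length;

        // Naive seeding: use the first k points as initial centroids.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }

        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int nearest = 0;
                double best = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dist = 0.0;
                    for (int j = 0; j < dims; j++) {
                        double diff = points[i][j] - centroids[c][j];
                        dist += diff * diff;
                    }
                    if (dist < best) {
                        best = dist;
                        nearest = c;
                    }
                }
                if (assignment[i] != nearest) {
                    assignment[i] = nearest;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged
            }

            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dims; j++) {
                    sums[assignment[i]][j] += points[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    for (int j = 0; j < dims; j++) {
                        centroids[c][j] = sums[c][j] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = { {1, 1}, {9, 9}, {1, 2}, {2, 1}, {8, 9}, {9, 8} };
        System.out.println(Arrays.toString(cluster(points, 2, 100)));
    }
}
```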
References
Java Data Mining Specification (JSR 73): http://www.jcp.org/en/jsr/detail?id=73
Frank Sommers, "Mine Your Own Data with the JDM API", July 7, 2005: http://www.artima.com/lejava/articles/data_mining.html
http://www.stanford.edu/class/cs345a/#handouts

Data Mining with JDM API by Regina Wang (4/11)
