Clementine Tutorial
This tutorial will introduce you to the Clementine toolkit for data mining and show you how to get started with your own data mining project.
The first part provides a tour of the workspace, including an update of what's new in this version of Clementine.  The second part is a step-by-step guide to data mining in Clementine. All of the files shown in the examples are installed with Clementine so that you can follow along.
Clementine uses a visual approach to data mining that provides a tangible way to work with data. Each process in Clementine is represented by an icon, or node, that you connect to form a stream representing the flow of data through a variety of processes.
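The node-and-stream idea can be sketched in a few lines of Python. This is only an analogy, not Clementine's own interface: the names source_node, select_node, and connect are hypothetical, and the two sample records are invented for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Node = Callable[[Iterable[Record]], Iterator[Record]]


def source_node(rows: List[Record]) -> Iterator[Record]:
    """Hypothetical source node: emits records into the stream."""
    for row in rows:
        yield dict(row)


def select_node(condition: Callable[[Record], bool]) -> Node:
    """Hypothetical process node: passes along only matching records."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        return (r for r in records if condition(r))
    return run


def connect(source: Iterable[Record], *nodes: Node) -> Iterator[Record]:
    """Connect nodes in order so that data flows from one to the next."""
    flow: Iterable[Record] = source
    for node in nodes:
        flow = node(flow)
    return iter(flow)


# Invented sample records, for illustration only.
rows = [{"Age": 23, "BP": "HIGH"}, {"Age": 47, "BP": "LOW"}]
stream = connect(source_node(rows), select_node(lambda r: r["BP"] == "HIGH"))
result = list(stream)
print(result)
```

Each function plays the role of one icon on the canvas, and connect plays the role of drawing the connections between them.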
 
Working in Clementine is essentially like using a visual metaphor to describe the world of data, statistics, and complex algorithms.
Although it may take a minute to adjust to this paradigm, you will soon find that Clementine's ease of use makes it a powerful tool. Let's take a closer look.
To start Clementine, from the Windows Start menu choose: Programs > Clementine
 
When you first start Clementine, the workspace opens in the default view. The tools here are used to help you create a visual representation of data mining operations.
 
First, the area in the middle is called the stream canvas. This is the main area you will use to work in Clementine.
 
Most of the data and modeling tools in Clementine reside in palettes, the area below the stream canvas.
 
Each tab contains groups of nodes that are graphical representations of data mining tasks, such as accessing and filtering data, creating graphs, and building models. To add nodes to the canvas, double-click icons in the node palettes or drag and drop them onto the canvas. You then connect them to create a stream, representing the flow of data.
You will learn more about building streams later in this tutorial. You can jump ahead now using the Contents button below.
On the top right side of the window are the output and object managers. These tabs are used to view and manage a variety of Clementine objects.
 
The Streams tab contains all streams open in the current session. You can save and close streams as well as add them to a project.
 
The Outputs tab contains a variety of files produced by stream operations in Clementine. You can display, rename, and close the tables, graphs, and reports listed here.
 
The Models tab is a powerful tool that contains all generated models (models that have been built in Clementine) for a session. Models can be examined closely, added to the stream, exported, or annotated.
 
Note: The Models tab replaces the Generated Models tab from earlier versions of Clementine.
On the bottom right side of the window is the projects tool, used to create and manage data mining projects. There are two ways to view projects you create in Clementine: Classes view and CRISP-DM view.
 
The CRISP-DM tab provides a way to organize projects according to the Cross-Industry Standard Process for Data Mining, an industry-proven, nonproprietary methodology. For both experienced and first-time data miners, using the CRISP-DM tool will help you to better organize and communicate your efforts.
 
The Classes tab provides a way to organize your work in Clementine categorically--by the types of objects you create. This view is useful when taking inventory of data, streams, models, etc.
 
As a data mining application, Clementine offers a strategic approach to finding useful relationships in large data sets. In contrast to more traditional statistical methods, you do not necessarily need to know what you are looking for when you start. You can explore your data, fitting different models and investigating different relationships, until you find useful information.
This section provides:
An overview of the types of data-mining problems Clementine can help solve.
A hands-on demonstration of building streams, deriving fields, using graphs, and modeling in Clementine.
A wide variety of organizations use Clementine to help them mine vast repositories of data. Following is a sample of the types of problems data mining can help solve.
Public sector: Governments around the world use data mining to explore massive data stores, improve citizen relationships, detect occurrences of fraud such as money laundering and tax evasion, detect crime and terrorist patterns, and enhance the expanding realm of e-government.
 
CRM: Customer relationship management can be improved thanks to smart classification of customer types and accurate predictions of churn. Clementine has successfully helped businesses attract and retain the most valuable customers in a variety of industries.
 
Web mining: With powerful sequencing and prediction algorithms, Clementine contains the necessary tools to discover what guests do at a Web site and deliver the products or information they want. From data preparation to modeling, the entire data-mining process can be managed inside Clementine.
 
Drug discovery and bioinformatics: Data mining aids both pharmaceutical and genomics research by analyzing the vast data stores resulting from increased lab automation. Clementine's clustering and classification models help generate leads from compound libraries, while sequence detection aids the discovery of patterns.
 
Clementine provides templates for many of these data-mining applications. Clementine Application Templates, also known as CATs, are available for the following types of activities: Web mining, fraud detection, analytical CRM, telecommunications analytical CRM, microarray analysis, and crime detection and prevention.
Let's get started learning how Clementine can help you conduct your own data mining project. This section of the guide will show you how to build and execute simple streams using sample drug demonstration files that are included with Clementine. You will learn how to work with data in the various phases of data mining, including:
Visualization, which helps you gain an overall picture of your data. You can create plots and charts to explore relationships among the fields in your data set and generate hypotheses to explore during modeling.
Manipulation, which lets you clean and prepare the data for modeling. You can sort or aggregate data, filter out fields, discard or replace missing values, and derive new fields.
Modeling, which gives you the broadest range of insight into the relationships among data fields. Models perform a variety of tasks, such as predicting outcomes, detecting sequences, and grouping similar records. These capabilities can help your organization grow, streamline processes, detect fraud, and retain the most valuable customers.
For this section, imagine that you are a medical researcher compiling data for a study.  You have collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of five medications.  Part of your job is to use data mining to find out which drug might be appropriate for a future patient with the same illness.
The data fields used in this demo are:
Age: Number
Sex: M or F
BP: Blood pressure (HIGH, NORMAL, or LOW)
Cholesterol: Blood cholesterol (NORMAL or HIGH)
Na: Blood sodium concentration
K: Blood potassium concentration
Drug: Prescription drug to which a patient responded
The first step is to load the data file using a Variable File node. You can add a Variable File node from the palettes: either click the Sources tab to find the node or use the Favorites tab, which includes this node by default. Next, double-click the newly placed node to open its dialog box.
 
Click the button just to the right of the File box, marked with an ellipsis (...). This opens a dialog box for browsing to the directory in which Clementine is installed on your computer (or server). Open the demos directory and select the file called DRUG1n.
 
Select Read field names from file and notice the fields and values that have just been loaded into the dialog box. Before clicking OK to close the dialog box, take a moment to look at the data using the other tabs on the Source node.
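What the Read field names from file option does can be illustrated outside Clementine. The sketch below is an assumption-laden illustration: it treats the data as comma-delimited text with a header row, the function read_variable_file is hypothetical, and the sample rows are invented rather than copied from DRUG1n.

```python
import csv
import io

# Invented sample text standing in for a delimited data file.
# These are NOT the real contents of DRUG1n.
sample_text = """Age,Sex,BP,Cholesterol,Na,K,Drug
23,F,HIGH,HIGH,0.79,0.03,drugY
47,M,LOW,HIGH,0.74,0.06,drugC
"""


def read_variable_file(text, read_field_names=True):
    """Hypothetical loader: returns (field names, list of record dicts)."""
    rows = list(csv.reader(io.StringIO(text)))
    if read_field_names:
        # Treat the first line as field names and the rest as data.
        names, data = rows[0], rows[1:]
    else:
        # No header: generate placeholder names and keep every line as data.
        names = [f"field{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return names, [dict(zip(names, row)) for row in data]


field_names, records = read_variable_file(sample_text)
print(field_names)
print(records[0])
```

With the option turned off, the first line would be treated as an ordinary record, which is why the header would then show up as data.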
 
Click the Data tab to override and change the storage for a field. Note that storage is different from type, which describes the usage of the data field.
 
The Filter tab can be used to remove any fields from the data that is brought into Clementine. Clicking on a field's arrow will mark it with a red X and filter it out. For this tutorial, though, we want to keep all fields.
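As a rough equivalent of filtering fields, the sketch below drops named fields from every record. The function filter_fields and the sample records are hypothetical and only illustrate the idea; they are not part of Clementine.

```python
def filter_fields(records, excluded):
    """Return copies of the records without the excluded field names."""
    excluded = set(excluded)
    return [
        {name: value for name, value in record.items() if name not in excluded}
        for record in records
    ]


# Invented sample records, for illustration only.
records = [
    {"Age": 23, "Sex": "F", "BP": "HIGH"},
    {"Age": 47, "Sex": "M", "BP": "LOW"},
]

filtered = filter_fields(records, ["Sex"])
print(filtered)
```

Passing an empty list of excluded names leaves every field in place, which matches the choice made in this tutorial.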
 
The Types tab helps you learn more about the types of fields in your data. You can also choose Read Values to view the actual values for each field based on the selections that you make from the Values column. This process is known as instantiation.
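Instantiation can be thought of as scanning the data to find out which values each field actually takes. The sketch below is a simplified illustration under stated assumptions: the function instantiate is hypothetical, the labels "range", "flag", and "set" are used loosely, the rule for choosing among them is a guess at the general idea rather than Clementine's actual logic, and the sample records are invented.

```python
def instantiate(records):
    """Summarize each field as (kind, values) by reading the data.

    Numeric fields become a range with (min, max); fields with exactly
    two distinct values become a flag; anything else becomes a set.
    """
    summary = {}
    for name in records[0]:
        values = [record[name] for record in records]
        if all(isinstance(v, (int, float)) for v in values):
            summary[name] = ("range", (min(values), max(values)))
        else:
            distinct = sorted(set(values))
            kind = "flag" if len(distinct) == 2 else "set"
            summary[name] = (kind, distinct)
    return summary


# Invented sample records, for illustration only.
records = [
    {"Age": 23, "Sex": "F", "BP": "HIGH"},
    {"Age": 47, "Sex": "M", "BP": "LOW"},
    {"Age": 61, "Sex": "F", "BP": "NORMAL"},
]

summary = instantiate(records)
for field, (kind, values) in summary.items():
    print(field, kind, values)
```

Until the data has been read, only the field names are known; after reading, each field carries both a kind and its observed values.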
 
Now that you have loaded the data file, you may want to glance at the values for some of the records. One way to do this is by building a stream that includes a Table node. To place a Table node in the stream, either double-click the icon in the palette or drag and drop it onto the canvas.
 
Note: Double-clicking a node from the palette will automatically connect it to the selected node in the stream canvas. However, a new node cannot be connected to a selected terminal node, such as a table or graph, because terminal nodes end a stream.
Next, if the nodes are not already connected, you can use your middle mouse button to connect the Source node to the Table node. To simulate a middle mouse button, hold down the Alt key while using the mouse.
 
Now that you have built a stream, you must execute it in order to view its output. Click the green arrow button on the toolbar to execute the stream and view an output table showing all of the records in the data file.
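The difference between building a stream and executing it can be sketched as follows. This is an illustration only: table_node is a hypothetical function that formats records as a plain-text table, and the records are invented. Nothing is produced until the function is called, much as a stream produces no output until it is executed.

```python
def table_node(records):
    """Hypothetical terminal node: render records as a plain-text table."""
    names = list(records[0])
    # Column width is the longest of the field name and its values.
    widths = {
        name: max(len(name), *(len(str(r[name])) for r in records))
        for name in names
    }
    lines = ["  ".join(name.ljust(widths[name]) for name in names)]
    for record in records:
        lines.append(
            "  ".join(str(record[name]).ljust(widths[name]) for name in names)
        )
    return "\n".join(lines)


# Invented sample records, for illustration only.
records = [
    {"Age": 23, "BP": "HIGH", "Drug": "drugY"},
    {"Age": 47, "BP": "LOW", "Drug": "drugC"},
]

# "Executing" the stream: only now is the table output generated.
output = table_node(records)
print(output)
```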
 
 


3320 lab1
