Perfect Data Mining & Predictive
Analytics Model Methodology
Machine learning is a subfield of computer science that developed from computational learning theory and pattern
recognition in artificial intelligence. It is the method of building analytical models that automatically search data
for previously unknown patterns indicating associations, anomalies (outliers), sequences, classifications, clusters
and segments. These patterns can reveal hidden insight into why an event happened.
Businesses and organizations can benefit from several uses of
machine learning:
• Segmentation: identifying sets of clients who have the same or similar
purchase patterns, for targeted marketing
• Classification: making a prediction based on a set of attributes
• Forecasting: for example, purchase projections based on time series
• Pattern detection: associating one product with another to
reveal cross-sell sequences and opportunities
• Anomaly detection: for example, fraud detection
Predictive analytics model methodology
The widely used Cross Industry Standard Process for Data Mining (CRISP-DM) methodology is used to develop
predictive analytical models. It includes 6 phases:
1. business understanding
2. data understanding
3. data preparation
4. model development using supervised or unsupervised learning
5. model evaluation
6. model deployment
Business understanding
The business understanding phase involves understanding and defining the use case or business problem, the business
objective and the business question that needs to be answered. It also includes defining success criteria. Then the
necessary project-related tasks need to be carried out. These tasks involve defining resource needs (technology,
people and money), identifying any constraints and requirements, creating a project plan, assessing risks and
creating a contingency plan.
Data understanding
The data understanding phase covers data needs such as internal and external data sources, data origin and data
characteristics (features and quality), including the three Vs (volume, variety and velocity), formats and so on. It
also covers whether the data is in a relational database, flat files or a Hadoop Distributed File System (HDFS), or
whether it is live, streaming data. This phase also includes data exploration and investigation using statistical
analysis to examine large volumes of data. In addition, a data quality assessment includes understanding the degree
to which data is missing, has errors, is duplicated or is inconsistent.
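A data quality assessment of the kind described above can be sketched in a few lines. This is only a minimal
illustration using the Python standard library; the function name quality_report, the field names and the sample
readings are all hypothetical and invented for the example.

```python
from collections import Counter

def quality_report(records, required_fields):
    """Count missing values per field and exact duplicate records."""
    missing = {field: 0 for field in required_fields}
    for record in records:
        for field in required_fields:
            # Treat None, an empty string, or an absent key as missing.
            if record.get(field) in (None, ""):
                missing[field] += 1
    # Two records count as duplicates when all required fields match exactly.
    keys = Counter(tuple(record.get(f) for f in required_fields) for record in records)
    duplicates = sum(count - 1 for count in keys.values() if count > 1)
    return {"rows": len(records), "missing": missing, "duplicates": duplicates}

# Hypothetical equipment readings, invented for illustration.
readings = [
    {"id": "A1", "temp": 71.5, "status": "ok"},
    {"id": "A2", "temp": None, "status": "ok"},
    {"id": "A1", "temp": 71.5, "status": "ok"},
    {"id": "A3", "temp": 88.0, "status": ""},
]
report = quality_report(readings, ["id", "temp", "status"])
print(report)
```

A real assessment would also check for inconsistent or erroneous values, which depends on rules specific to each
data source.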
Data preparation
The objective of the data preparation phase is to produce a data set that can be fed into machine-learning
algorithms. This process requires a number of tasks including filtering and cleaning; data conversion; data
transformation; data enrichment; and variable identification, which is also known as dimensionality reduction or
feature selection. The objective of variable identification is to create a data set of the most relevant variables to
be used as model input to get optimal results. The intention is also to remove variables that are not useful as model
input without compromising the model's accuracy, that is, the accuracy of the predictions it makes.
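As one simple illustration of variable identification, the sketch below drops variables whose values never change,
since a constant column cannot help a model distinguish one record from another. A variance filter is only one of
many possible techniques and is an assumption here, not something the methodology prescribes; the function name
select_features and the sample columns are hypothetical.

```python
from statistics import pvariance

def select_features(columns, min_variance=1e-9):
    """Keep only the columns whose values actually vary."""
    return {
        name: values
        for name, values in columns.items()
        if pvariance(values) > min_variance
    }

# Hypothetical candidate variables, invented for illustration.
columns = {
    "temperature": [70.0, 85.0, 91.0, 66.0],
    "vibration": [0.2, 0.9, 1.1, 0.1],
    "site_code": [3, 3, 3, 3],  # constant, so it carries no information
}
selected = select_features(columns)
print(sorted(selected))
```

After filtering, it is still worth checking that the model's accuracy has not dropped compared with using every
variable.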
Model development
The model development phase is about the development of a machine-learning model. Models can be built to
predict, forecast or analyze information to find patterns such as sets, groups and associations.
Two types of machine learning can be used in model development:
1. supervised learning
2. unsupervised learning
Typically, predictive models are built using supervised learning. For example, if we need to develop a model
for equipment failure prediction, we can use data that describes equipment that has actually failed. We can use that
data to train the new model to recognize the profile of a piece of equipment that is likely to fail. To achieve
this profile recognition, we divide the data set, which includes records of failed equipment, into a training data
set and a test data set. Then we train the model by feeding the training data set into an algorithm; several
algorithms can be used for prediction. Finally, we test the model using the test data set.
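The train-and-test procedure for the equipment failure example can be sketched as follows. This is a minimal
illustration, assuming a single hypothetical input variable (vibration) and a deliberately simple model that learns
one threshold; the data, the function names and the choice of algorithm are all assumptions made for the example.

```python
import random

def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into training and test sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def train_threshold(training):
    """Pick the vibration threshold that classifies the training set best."""
    best_threshold, best_correct = None, -1
    for candidate in sorted({vibration for vibration, _ in training}):
        correct = sum((vibration >= candidate) == failed for vibration, failed in training)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

def accuracy(threshold, records):
    """Fraction of records where 'vibration >= threshold' matches the failure label."""
    return sum((vibration >= threshold) == failed for vibration, failed in records) / len(records)

# Hypothetical (vibration, failed) records, invented for illustration.
records = [
    (0.1, False), (0.2, False), (0.3, False), (0.4, False), (0.5, False), (0.45, False),
    (0.8, True), (0.9, True), (1.0, True), (1.1, True), (1.2, True), (1.3, True),
]
training, test = train_test_split(records)
threshold = train_threshold(training)
print("threshold:", threshold)
print("training accuracy:", accuracy(threshold, training))
print("test accuracy:", accuracy(threshold, test))
```

Keeping the test set separate matters because accuracy measured on the training data alone tends to overstate how
well the model will handle equipment it has not seen.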
Unsupervised learning is a method of analyzing data to search for hidden patterns that indicate product
associations and groupings, for example, customer segmentation. Grouping is based on minimizing or
maximizing similarity. The k-means clustering algorithm is one of the most widely used algorithms for this approach.
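To make the clustering idea concrete, here is a minimal sketch of k-means for customer segmentation. It assumes
two hypothetical numeric attributes per customer and, to keep the example deterministic, seeds the centroids with
the first k points; practical implementations usually choose the seeds more carefully.

```python
import math

def k_means(points, k, iterations=100):
    """Assign each point to the nearest centroid, then move each centroid to its cluster mean."""
    centroids = [points[i] for i in range(k)]  # assumption: first k points act as seeds
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical customers as (visits per month, average basket size), invented for illustration.
customers = [(1.0, 2.0), (9.0, 8.0), (1.5, 1.8), (8.0, 9.0), (1.2, 2.2), (9.5, 8.5)]
labels, centroids = k_means(customers, k=2)
print(labels)     # cluster index for each customer
print(centroids)  # centre of each segment
```

Here similarity is measured by Euclidean distance, so attributes on very different scales would normally be
rescaled before clustering.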
Predictive and descriptive analytical models can be built using advanced data mining tools, analytics
clouds, interactive data science workbooks with procedural or declarative programming languages, and automated
model development tools.
Model evaluation
After the model is developed, the next phase is to evaluate how accurate its predictions are. For predictions,
this assessment means understanding how many predictions were correct and how many were incorrect. Various methods
can be used for this evaluation. Key measures in model evaluation are the number of true positives, true negatives,
false positives and false negatives. The bottom line is that we need to make sure that the model is accurate;
otherwise, it could generate a large number of false positives that may result in incorrect actions and decisions.
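The four key measures named above, and a few rates commonly derived from them, can be computed as in the sketch
below. The labels are hypothetical, and the choice to report accuracy, precision and recall is an assumption about
which summary figures are useful, not a requirement of the methodology.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for boolean labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["tp"] += 1
        elif not a and not p:
            counts["tn"] += 1
        elif p:
            counts["fp"] += 1  # predicted a failure that did not happen
        else:
            counts["fn"] += 1  # missed a failure that did happen
    return counts

def summarize(counts):
    """Derive accuracy, precision and recall from the four counts."""
    tp, tn, fp, fn = counts["tp"], counts["tn"], counts["fp"], counts["fn"]
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical outcomes (True means failure), invented for illustration.
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]
counts = confusion_counts(actual, predicted)
print(counts)             # {'tp': 2, 'tn': 4, 'fp': 1, 'fn': 1}
print(summarize(counts))
```

Precision is the figure most directly affected by false positives, so it is worth tracking alongside overall
accuracy when incorrect alerts are costly.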
Model deployment
Once we are happy with the model we've developed, the final phase involves deploying it to run in one or more of
many different environments. These environments include spreadsheets, analytics servers, database management systems
(DBMSs), applications, analytical relational database management systems, Apache Hadoop, Apache Spark and
streaming analytics platforms.

