Introduction to Data Mining
• What is Data Mining?
• Related technologies
• Data Mining techniques
• Data Mining Goals
• Stages of data mining process
• Knowledge representation methods
• Applications
What is Data Mining?
• Data mining is the process of extracting information from huge sets
of data to identify patterns, trends, and other useful data that allow
a business to make data-driven decisions.
• Data mining is the act of automatically searching large stores
of information to find trends and patterns that go beyond simple
analysis procedures.
• Organizations use data mining to extract specific data from
huge databases to solve business problems. It primarily turns
raw data into useful information.
• Data mining uses complex mathematical algorithms to segment
the data and evaluate the probability of future events. Data
mining is also called Knowledge Discovery in Databases (KDD).
Related Technologies
Data mining is related to many concepts. We briefly
introduce each concept and indicate how it is related to
data mining.
• Machine Learning
• DBMS
• OLAP
• Statistics
Machine Learning
• Machine learning is the area of AI that examines how to write programs that
can learn.
• In data mining, machine learning is often used for prediction or classification.
• Applications that typically use machine learning techniques include speech
recognition, training moving robots, classification of astronomical structures,
and game playing.
• When machine learning is applied to data mining tasks, a model is used to
represent the data (such as a graphical structure like a neural network or a
decision tree).
• During the learning process, a sample of the database is used to train the
system to properly perform the desired task.
• Then the system is applied to the general database to actually perform the
task.
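The train-then-apply workflow described above can be sketched in a few lines. This is only an illustrative sketch, not a method taken from the slides: the nearest-centroid classifier, the helper names `train_centroids` and `predict`, and the employee data are all assumptions made for the example.

```python
from collections import defaultdict

def train_centroids(sample):
    """Learn one centroid (mean feature vector) per class from labeled rows."""
    sums = {}
    counts = defaultdict(int)
    for features, label in sample:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def predict(centroids, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))

# Hypothetical labeled sample: (salary in thousands, years of service) -> grade
sample = [
    ((30, 1), "junior"),
    ((35, 2), "junior"),
    ((90, 10), "senior"),
    ((110, 15), "senior"),
]
model = train_centroids(sample)  # learning step, uses only the sample

# Hypothetical remainder of the database, which carries no labels
database = [(32, 1), (100, 12), (40, 3)]
predictions = [predict(model, row) for row in database]
print(predictions)
```

The model is built from the sample alone; the remaining rows are only ever passed to `predict`, which mirrors the two phases in the bullets above.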
• Machine learning algorithms are divided into two main types:
1. Unsupervised Learning
2. Supervised Learning
1. Unsupervised Machine Learning:
Unsupervised learning does not depend on labeled training data. Instead, it
applies techniques such as clustering and association directly to the data in
order to discover structure in it.
2. Supervised Machine Learning:
Supervised learning is a learning process in which we teach or train the
machine using data that is well labeled, meaning that some data is already
marked with the correct responses. After that, the machine is given new
sets of data, and the model learned from the training data predicts the
result for them.
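To illustrate the unsupervised case, here is a minimal sketch of clustering with no labels at all. The slides name clustering but no specific algorithm, so the use of k-means here is an assumption, as are the function name `kmeans_1d`, the one-dimensional data, and the choice of the first two values as starting centers.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group 1-D values around k centers by repeated assign-and-update steps."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled measurements; no "correct response" is supplied.
values = [1, 2, 3, 10, 11, 12]
centers, clusters = kmeans_1d(values, centers=values[:2])
print(centers, clusters)
```

Unlike the supervised case, nothing tells the algorithm which group a value belongs to; the two groups emerge from the distances between the values themselves.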
OLAP
• OLAP stands for Online Analytical Processing.
• OLAP systems are designed to answer more complex queries
than traditional OLTP (online transaction processing) or database systems.
• OLAP is performed on data warehouses or data marts. The
primary goal of OLAP is to support the ad hoc querying needed
by decision support systems (DSS).
• The multidimensional view of data is fundamental to OLAP
applications.
• OLAP tools can be classified as ROLAP or MOLAP.
• ROLAP: Relational OLAP
• MOLAP: Multidimensional OLAP
OLAP operations
There are several types of OLAP operations supported by OLAP tools:
• A simple query may look at a single cell within the cube [Figure (a)].
• Slice: Look at a subcube to get more specific information. This is performed
by selecting on one dimension. As seen in Figure (c), this is looking at a
portion of the cube.
• Dice: Look at a subcube by selecting on two or more dimensions. This can be
performed by a slice on one dimension and then rotating the cube to select
on a second dimension. This is shown in Figure (d).
• Roll up (dimension reduction, aggregation): Roll up allows the user to ask
questions that move up an aggregation hierarchy. Figure (b) represents a roll
up from (a).
• Drill down: Drill down allows a user to get more detailed fact information
by navigating lower in the aggregation hierarchy. Figure (a) represents a
drill down from (b).
• Visualization: Visualization allows OLAP users to actually "see" the results
of an operation.
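The operations above can be sketched on a tiny in-memory cube. This is an illustrative sketch only: the dictionary representation, the dimension names, the sales figures, and the helpers `dice`, `slice_`, and `roll_up` are all assumptions for the example, and roll up is shown in its simplest form (aggregating one dimension away) rather than moving up a multi-level hierarchy.

```python
from collections import defaultdict

DIMS = ("product", "region", "quarter")

# Hypothetical cube: one cell per (product, region, quarter), holding sales.
cube = {
    ("laptop", "east", "Q1"): 10,
    ("laptop", "west", "Q1"): 7,
    ("laptop", "east", "Q2"): 12,
    ("phone", "east", "Q1"): 20,
    ("phone", "west", "Q2"): 15,
}

def dice(cube, **selection):
    """Keep cells that match the allowed values on every selected dimension."""
    def keep(key):
        return all(key[DIMS.index(dim)] in allowed
                   for dim, allowed in selection.items())
    return {key: value for key, value in cube.items() if keep(key)}

def slice_(cube, dim, value):
    """Select on a single dimension (a dice with one dimension fixed)."""
    return dice(cube, **{dim: {value}})

def roll_up(cube, dim):
    """Aggregate one dimension away by summing over it."""
    drop = DIMS.index(dim)
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[:drop] + key[drop + 1:]] += value
    return dict(totals)

single_cell = cube[("phone", "east", "Q1")]        # simple query
q1_slice = slice_(cube, "quarter", "Q1")           # slice on one dimension
east_laptops = dice(cube, product={"laptop"}, region={"east"})  # dice on two
by_product_region = roll_up(cube, "quarter")       # roll up over quarters
```

Drill down is the reverse of `roll_up`: going from `by_product_region` back to the per-quarter cells in `cube`.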
DBMS
• A database is a collection of data usually associated with some
organization or enterprise.
• A schema describes the structure of the data.
– e.g. (ID,Name,Address,Salary,JobNo) may be the schema for a
personnel database.
• A database management system (DBMS) is the software used to access a
database.
• A data model is used to describe the data, attributes, and relationships
among them.
– e.g. the entity-relationship (ER) model.
• Transaction
• Query:
SELECT Name
FROM T
WHERE Salary > 100000
• A major difference between data mining queries and those of database
systems is the output.
• A basic database query always outputs either a subset of the database or
aggregates of the data. A data mining query outputs a KDD object.
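As a small runnable sketch of the query above, the following uses Python's built-in `sqlite3` module with an in-memory database. The table contents are made up for illustration, the column types are an assumption layered on the (ID,Name,Address,Salary,JobNo) schema, and `ORDER BY ID` is added only to make the result order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (ID INTEGER PRIMARY KEY, Name TEXT, Address TEXT, "
    "Salary INTEGER, JobNo INTEGER)"
)
# Hypothetical personnel rows
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ada", "1 Main St", 120000, 10),
        (2, "Ben", "2 Oak Ave", 90000, 20),
        (3, "Cy", "3 Elm Rd", 150000, 10),
    ],
)

# The query from the slide: its output is a subset of the database.
names = [row[0] for row in conn.execute(
    "SELECT Name FROM T WHERE Salary > 100000 ORDER BY ID")]

# An aggregate query: its output is a summary of the data.
average_salary = conn.execute("SELECT AVG(Salary) FROM T").fetchone()[0]

conn.close()
print(names, average_salary)
```

Both results are either rows that already exist in the table or an aggregate over them, which is the contrast the bullets draw with the output of a data mining query.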
Statistics
• Simple statistical concepts such as determining a data distribution and
calculating a mean and a variance can be viewed as data mining techniques.
• Statistical inference: Generalizing a model created from a sample of the
data to the entire dataset.
• Exploratory Data Analysis:
– Data can actually drive the creation of the model
– Opposite of the traditional statistical view.
• Statistics research has produced many of the proposed data mining
algorithms.
• One difference between data mining and statistics is that data mining is
targeted at business users rather than statisticians.
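The mean, variance, and sample-to-dataset ideas above can be sketched with Python's standard `statistics` module. The data values, the population size, the sample size, and the random seed are arbitrary choices for illustration.

```python
import random
import statistics

# Mean and variance of a small, made-up dataset.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)            # arithmetic mean
variance = statistics.pvariance(data)   # population variance

# Statistical inference in miniature: estimate the mean of a large
# (synthetic) dataset from a random sample instead of scanning all of it.
random.seed(0)  # fixed seed so the run is repeatable
dataset = [random.gauss(50, 10) for _ in range(10_000)]
sample = random.sample(dataset, 200)

sample_mean = statistics.mean(sample)
dataset_mean = statistics.mean(dataset)
print(mean, variance, round(sample_mean, 2), round(dataset_mean, 2))
```

The sample mean will generally be close to, but not exactly equal to, the mean of the full dataset; how close depends on the sample size and the spread of the data.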
Goals of Data Mining
• Data mining is one of the most useful techniques for helping
entrepreneurs, researchers, and individuals extract valuable
information from huge sets of data.
• Data mining stores and manages the data in a multidimensional
database system.
• Data mining provides data access to business analysts and
information technology professionals.
• Data mining analyzes the data with application software.
• Data mining presents the data in a useful format, such as a
graph or table.
