From Data to Wisdom
• Data: the raw material of information
• Information: data organized and presented by someone
• Knowledge: information read, heard, or seen, and then understood and integrated
• Wisdom: distilled knowledge and understanding that can lead to decisions
Figure: The Information Hierarchy (top to bottom: Wisdom, Knowledge, Information, Data)
Why Data Mining?
The Explosive Growth of Data: from terabytes to petabytes
• Data collection and data availability
  Automated data collection tools, database systems, the Web, a computerized society
• Major sources of abundant data
  Business: Web, e-commerce, transactions, stocks, …
  Science: remote sensing, bioinformatics, scientific simulation, …
  Society and everyone: news, images, video, documents, Internet, …
Source: Intel
How much data?
• Google: ~20-30 PB a day
• Wayback Machine: ~4 PB, plus 100-200 TB/month
• Facebook: ~3 PB of user data, plus 25 TB/day
• eBay: ~7 PB of user data, plus 50 TB/day
• CERN’s Large Hadron Collider generates 15 PB a year
• In 2010, enterprises stored 7 exabytes = 7,000,000,000 GB
“640K ought to be enough for anybody.”
Big Data Growing
The Untapped Data Gap: most of the useful data will not be tagged or analyzed, partly due to a skill shortage.
IDC predicts: from 2005 to 2020, the digital universe will double every 2 years and grow from 130 exabytes to 40,000 exabytes, or 5,200 GB per person in 2020.
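As a rough sanity check on the IDC figures above (a sketch, not IDC's own calculation), the arithmetic can be worked through directly. It assumes decimal units, 1 exabyte = 1,000,000,000 GB, which matches the "7 exabytes = 7,000,000,000 GB" figure on the previous slide. The variable names are illustrative only.

```python
import math

GB_PER_EB = 10**9  # assumption: decimal units, 1 EB = 1,000,000,000 GB

start_eb, end_eb = 130, 40_000  # IDC figures quoted on the slide
years = 2020 - 2005

# How many doublings take 130 EB to 40,000 EB, and how long each one takes.
doublings = math.log2(end_eb / start_eb)
doubling_time_years = years / doublings

# Population implied by the "5,200 GB per person" figure.
implied_population = end_eb * GB_PER_EB / 5_200

print(f"doublings needed: {doublings:.2f}")
print(f"implied doubling time: {doubling_time_years:.2f} years")
print(f"implied population: {implied_population / 1e9:.2f} billion")
```

The implied doubling time comes out a little under two years, so "double every 2 years" reads as a rounded figure, and the per-person number corresponds to a world population between 7 and 8 billion.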
What Is Data Mining?
We are drowning in data, but starving for knowledge!
“Necessity is the mother of invention”: data mining is the automated analysis of massive data sets
Data Mining: A Definition
The non-trivial extraction of implicit, previously unknown, and potentially useful knowledge from data in large data repositories
• Non-trivial: obvious knowledge is not useful
• Implicit: hidden knowledge that is difficult to observe
• Previously unknown
• Potentially useful: actionable; easy to understand
Data Mining: Confluence of Multiple Disciplines
Data mining draws on:
• Machine learning
• Statistics
• Applications
• Algorithms
• Pattern recognition
• High-performance computing
• Visualization
• Database technology
Data Mining’s Virtuous Cycle
1. Identifying the problem
2. Mining data to transform it into actionable information
3. Acting on the information
4. Measuring the results
The Knowledge Discovery Process
• Data Mining vs. Knowledge Discovery in Databases (KDD)
  • DM and KDD are often used interchangeably
  • Strictly speaking, DM is only one part of the KDD process
Figure: The KDD Process
Types of Knowledge Discovery
• Two kinds of knowledge discovery: directed and undirected
• Directed Knowledge Discovery
  • Purpose: explain the value of some field in terms of all the others (goal-oriented)
  • Method: select the target field based on some hypothesis about the data; ask the algorithm to tell us how to predict or classify new instances
  • Examples:
    which products show increased sales when cream cheese is discounted
    which banner ad to use on a web page for a given user coming to the site
• Undirected Knowledge Discovery
  • Purpose: find patterns in the data that may be interesting (no target field)
  • Method: clustering, affinity grouping
  • Examples:
    which products in the catalog often sell together
    market segmentation (find groups of customers/users with similar characteristics or behavioral patterns)
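The contrast between the two kinds of discovery above can be illustrated with a small sketch. The customer data, the `responded_to_ad` target field, and the choice of techniques (a 1-nearest-neighbor classifier for the directed case, a naive k-means for the undirected case) are assumptions made for illustration. The slides do not prescribe particular algorithms.

```python
# Hypothetical toy data: (monthly_visits, avg_basket_value) per customer.
customers = [(2, 10.0), (3, 12.0), (2, 11.0), (20, 80.0), (22, 85.0), (19, 78.0)]
# Target field: used only in the directed setting.
responded_to_ad = ["no", "no", "no", "yes", "yes", "yes"]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(train_x, train_y, new_x):
    """Directed: predict the target field of a new instance from its nearest labeled neighbor."""
    nearest = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i], new_x))
    return train_y[nearest]


def kmeans(points, k, iterations=10):
    """Undirected: group points into k clusters without using any target field."""
    centers = [tuple(p) for p in points[:k]]  # naive initialization: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


print(predict_1nn(customers, responded_to_ad, (18, 75.0)))  # directed: uses the target
print(kmeans(customers, 2))                                 # undirected: no target
```

The directed call needs the labeled target field to answer a specific question about a new customer. The undirected call sees only the raw fields and returns segment labels whose meaning still has to be interpreted.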
From Data Mining to Data Science
Data Mining: On What Kinds of Data?
• Database-oriented data sets and applications
  Relational databases, data warehouses, transactional databases
  Object-relational databases, heterogeneous databases, and legacy databases
• Advanced data sets and advanced applications
  Data streams and sensor data
  Time-series data, temporal data, sequence data (incl. bio-sequences)
  Structure data, graphs, social networks, and information networks
  Spatial data and spatiotemporal data
  Multimedia databases
  Text databases
  The World Wide Web
Data Mining: What Kind of Data?
Structured Databases
  relational, object-relational, etc.
  can use SQL to perform parts of the process
  e.g., SELECT category, COUNT(*) FROM Items WHERE type = 'video' GROUP BY category
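A runnable sketch of this kind of query, using Python's built-in `sqlite3` module with an in-memory database. The `Items` schema and the sample rows are hypothetical. The sketch quotes `'video'` as a string literal and selects `category` alongside the count so that each count is labeled.

```python
import sqlite3

# In-memory database with a hypothetical Items table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (name TEXT, type TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [
        ("A", "video", "comedy"),
        ("B", "video", "comedy"),
        ("C", "video", "drama"),
        ("D", "book", "drama"),  # filtered out by the WHERE clause
    ],
)

# Count video items per category; ORDER BY makes the output order deterministic.
rows = conn.execute(
    "SELECT category, COUNT(*) FROM Items "
    "WHERE type = 'video' GROUP BY category ORDER BY category"
).fetchall()
conn.close()

print(rows)  # [('comedy', 2), ('drama', 1)]
```

Aggregations like this cover the summarizing and counting parts of the process. The pattern-finding steps usually happen outside the database.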
Data Mining: What Kind of Data?
• Flat Files
  • most common data source
  • can be text (or HTML) or binary
  • may contain transactions, statistical data, measurements, etc.
• Transactional databases
  • set of records, each with a transaction id, a time stamp, and a set of items
  • may have an associated “description” file for the items
  • typical source of data used in market basket analysis
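To make the transactional record layout above concrete, here is a minimal sketch of how such records feed market basket analysis. The transactions, timestamps, and item names are made up. `support` and `confidence` are the standard association-rule measures, implemented here only for single items and pairs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: (transaction id, time stamp, set of items).
transactions = [
    (1, "2024-01-01T09:00", {"bread", "milk"}),
    (2, "2024-01-01T09:05", {"bread", "butter", "milk"}),
    (3, "2024-01-01T09:10", {"milk"}),
    (4, "2024-01-01T09:20", {"bread", "butter"}),
]

item_counts = Counter()
pair_counts = Counter()
for _tid, _timestamp, items in transactions:
    item_counts.update(items)
    # Sort so that each pair is always counted under the same key.
    pair_counts.update(combinations(sorted(items), 2))

n = len(transactions)


def support(*items):
    """Fraction of transactions containing all given items (one or two items only)."""
    if len(items) == 1:
        return item_counts[items[0]] / n
    return pair_counts[tuple(sorted(items))] / n


def confidence(antecedent, consequent):
    """Estimated share of baskets with `antecedent` that also contain `consequent`."""
    return support(antecedent, consequent) / support(antecedent)


print(support("bread", "butter"))
print(confidence("butter", "bread"))
print(confidence("bread", "butter"))
```

In this toy data every basket with butter also contains bread, but not the other way round, which is why confidence is computed per direction.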
Data Mining: What Kind of Data?
• Other Types of Databases
  • legacy databases
  • multimedia databases (usually very high-dimensional)
  • spatial databases (containing geographical information, such as maps or satellite imaging data)
  • time-series and temporal data (time-dependent information such as stock market data; usually very dynamic)
• World Wide Web
  • basically a large, heterogeneous, distributed database
  • need for new or additional tools and techniques
    information retrieval, filtering, and extraction
    agents to assist in browsing and filtering
    Web content, usage, and structure (linkage) mining tools
• The “social Web”
  User-generated metadata, social networks, shared resources, etc.
What Can Data Mining Do?
Many Data Mining Tasks
• often inter-related
• often need to try different techniques/algorithms for each task
• each task may require different types of knowledge discovery
What are some data mining tasks?
• Classification
• Prediction
• Clustering
• Affinity Grouping / Association discovery
• Sequence Analysis
• Characterization
• Discrimination
Some Applications of Data Mining
• Business data analysis and decision support
  Marketing focalization
    Recognizing specific market segments that respond to particular characteristics
    Return on mailing campaign (target marketing)
  Customer profiling
    Segmentation of customers for marketing strategies and/or product offerings
    Customer behavior understanding
    Customer retention and loyalty
    Mass customization / personalization
Some Applications of Data Mining
• Business data analysis and decision support (cont.)
  Market analysis and management
    Provide summary information for decision-making
    Market basket analysis, cross-selling, market segmentation
  Resource planning
  Risk analysis and management
    "What if" analysis
    Forecasting
    Pricing analysis, competitive analysis
    Time-series analysis (e.g., stock market)
Some Applications of Data Mining
• Fraud detection
  Detecting telephone fraud:
    Telephone call model: destination of the call, duration, time of day or week
    Analyze patterns that deviate from an expected norm
    British Telecom identified discrete groups of callers with frequent intra-group calls, especially on mobile phones, and broke a multimillion-dollar fraud scheme
  Detecting credit-card fraud
  Detecting suspicious money transactions (money laundering)
• Text mining
  • Message filtering (e-mail, newsgroups, etc.)
  • Newspaper article analysis
  • Text and document categorization
• Web mining
  • Mining patterns from the content, usage, and structure of Web resources
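The fraud-detection idea above, analyzing patterns that deviate from an expected norm, can be sketched with a simple z-score test on call duration. The call history, the new calls, and the threshold of 3 standard deviations are all assumptions chosen for illustration. A real call model would also use destination and time of day or week, as the slide notes.

```python
from statistics import mean, stdev

# Hypothetical past call durations (minutes) for one subscriber: the expected norm.
history = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 2.5, 3.0]
# Hypothetical new calls to screen.
new_calls = [3.2, 45.0, 4.8]

mu = mean(history)
sigma = stdev(history)  # sample standard deviation


def is_suspicious(duration, threshold=3.0):
    """Flag a call whose duration lies more than `threshold` standard deviations from the mean."""
    return abs(duration - mu) / sigma > threshold


flagged = [d for d in new_calls if is_suspicious(d)]
print(flagged)  # [45.0]
```

A single-feature threshold like this is only a starting point. Its value is that it makes "deviation from the norm" measurable, so that flagged cases can be reviewed.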
Types of Web Mining
Web mining divides into:
• Web Content Mining
• Web Structure Mining
• Web Usage Mining
Types of Web Mining: Web Content Mining
Applications:
• document clustering or categorization
• topic identification / tracking
• concept discovery
• focused crawling
• content-based personalization
• intelligent search tools
Types of Web Mining: Web Usage Mining
Applications:
• user and customer behavior modeling
• Web site optimization
• e-customer relationship management
• Web marketing
• targeted advertising
• recommender systems
Types of Web Mining: Web Structure Mining
Applications:
• document retrieval and ranking (e.g., Google)
• discovery of “hubs” and “authorities”
• discovery of Web communities
• social network analysis
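The discovery of “hubs” and “authorities” mentioned above is commonly done with the HITS iteration. Below is a minimal sketch of it on a made-up link graph. The page names and the fixed iteration count are assumptions, and the slides do not say which algorithm they have in mind.

```python
import math

# Hypothetical link graph: page -> pages it links to.
links = {
    "portal": ["paper1", "paper2", "paper3"],
    "blog": ["paper1", "paper2"],
    "paper1": [],
    "paper2": ["paper1"],
    "paper3": [],
}


def hits(graph, iterations=50):
    """Return (hub, authority) score dicts computed by repeated mutual reinforcement."""
    hub = {page: 1.0 for page in graph}
    auth = {page: 1.0 for page in graph}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to this page.
        auth = {page: 0.0 for page in graph}
        for src, targets in graph.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages this page links to.
        hub = {page: sum(auth[dst] for dst in graph[page]) for page in graph}
        # Normalize both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for page in scores:
                scores[page] /= norm
    return hub, auth


hub, auth = hits(links)
top_hub = max(hub, key=hub.get)
top_authority = max(auth, key=auth.get)
print(top_hub, top_authority)
```

Here the page that links to many well-cited pages comes out as the strongest hub, and the page cited by the most hubs comes out as the strongest authority.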
The Knowledge Discovery Process
Figure: The KDD Process
• Next: we first focus on understanding the data and on data preparation/transformation

Data mining and knowledge discovery
