Evolution of Data Warehouse and Data Mining
Data Warehousing/Mining
Data Warehouse Evolution
[Figure: timeline from 1960 to 2000, divided into four eras: "Prehistoric Times", "Middle Ages", "Data Revolution", and "Information-Based Management". Milestones shown: relational databases; PCs and spreadsheets; end-user interfaces; the first DW article; DW conferences; vendor DW frameworks; company DWs; "Building the Data Warehouse" (Inmon, 1992); data replication tools.]
What is a Data Warehouse?
A Practitioner's Viewpoint
“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
-- Barry Devlin, IBM Consultant
A Data Warehouse is...
• A stored collection of diverse data
  – A solution to the data integration problem
  – A single repository of information
• Subject-oriented
  – Organized by subject, not by application
  – Used for analysis, data mining, etc.
• Optimized differently from a transaction-oriented database
• User interface aimed at executives
A Data Warehouse is... (continued)
• Large volume of data (GB to TB)
• Non-volatile
  – Historical
  – Time attributes are important
• Updates are infrequent
• May be append-only
• Examples
  – All transactions ever made at WalMart
  – Complete client histories at an insurance firm
  – Stockbroker financial information and portfolios
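The non-volatile, append-only behavior described above can be illustrated with a minimal Python sketch. The class and method names (`AppendOnlyStore`, `append`, `as_of`, `history`) are hypothetical, and an in-memory list stands in for real warehouse storage. The idea it shows: a change in the source is recorded as a new dated row instead of overwriting the old one, so every earlier state stays queryable through its time attribute.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One immutable warehouse row; valid_from is its time attribute."""
    key: str
    value: float
    valid_from: date


class AppendOnlyStore:
    """Hypothetical append-only store: rows are added, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: list[Fact] = []

    def append(self, key: str, value: float, valid_from: date) -> None:
        # A new value is stored as an additional row, so history is preserved.
        self._rows.append(Fact(key, value, valid_from))

    def as_of(self, key: str, when: date) -> Optional[float]:
        """Return the latest value for key recorded on or before `when`."""
        candidates = [r for r in self._rows if r.key == key and r.valid_from <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.valid_from).value

    def history(self, key: str) -> list[Fact]:
        """Return every recorded version of key, oldest first."""
        return sorted((r for r in self._rows if r.key == key), key=lambda r: r.valid_from)
```

For example, appending a client's premium of 100.0 dated 2020-01-01 and 120.0 dated 2021-01-01 leaves both rows in place: `as_of` for mid-2020 returns 100.0, and for mid-2021 returns 120.0. A real warehouse would index by key and time rather than scan a list.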
Summary
[Figure: data warehouse overview showing Operational Systems, Enterprise Modeling, Data Warehouse Population, the Data Warehouse, the Data Warehouse Catalog, the Business Information Guide, and the Business Information Interface.]
Warehouse is a Specialized DB
Standard DB                               | Warehouse
Mostly updates                            | Mostly reads
Many small transactions                   | Queries are long and complex
MB to GB of data                          | GB to TB of data
Current snapshot                          | History
Index/hash on primary key                 | Lots of scans
Raw data                                  | Summarized, reconciled data
Thousands of users (e.g., clerical users) | Hundreds of users (e.g., decision-makers, analysts)
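A small sketch of the two access patterns in the table, using Python's built-in `sqlite3` module and a hypothetical `sales` table. SQLite is only a convenient stand-in here, not a warehouse engine, and at this scale the point is the shape of each query rather than its performance: the transactional side touches one row located through its primary key, while the warehouse side reads every row and returns a summary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, store TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales (store, amount) VALUES (?, ?)",
    [("north", 10.0), ("south", 25.0), ("north", 5.0), ("south", 15.0)],
)

# Transaction-oriented pattern: a small update that finds its row by primary key.
cur.execute("UPDATE sales SET amount = ? WHERE id = ?", (12.0, 1))
conn.commit()

# Warehouse-style pattern: a read-only query that scans all rows and summarizes.
totals = cur.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
).fetchall()
print(totals)  # [('north', 17.0), ('south', 40.0)]
```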
Warehousing and Industry
• Warehousing is big business
  – $2 billion in 1995
  – $3.5 billion in early 1997
  – Predicted: $8 billion in 1998 [Metagroup]
• WalMart has the largest warehouse
  – 900-CPU, 2,700-disk, 23 TB Teradata system
  – ~7 TB in the warehouse
  – 40–50 GB per day
Types of Data
• Business data: represents meaning
  – Real-time data (the ultimate source of all business data)
  – Reconciled data
  – Derived data
• Metadata: describes meaning
  – Build-time metadata
  – Control metadata
  – Usage metadata
• Data as a product*: intrinsic meaning
  – Produced and stored for its own intrinsic value
  – e.g., the contents of a textbook
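The slide only names the three kinds of metadata. One common reading, assumed here rather than taken from the slide, is: build-time metadata records how the warehouse was designed and built (for example, source-to-target mappings); control metadata is used to operate it (for example, when data was last loaded); usage metadata helps end users understand what the data means. The classes, fields, and the `is_stale` helper below are hypothetical illustrations of that reading, not a standard schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class BuildTimeMetadata:
    # Recorded while building the warehouse: where a target column comes from.
    target_column: str
    source_system: str
    source_column: str
    transformation: str


@dataclass
class ControlMetadata:
    # Used to run the warehouse: when a table was loaded and how much was loaded.
    table: str
    last_loaded: datetime
    rows_loaded: int


@dataclass
class UsageMetadata:
    # Aimed at end users: the business meaning of a table they query.
    table: str
    business_description: str
    tags: list[str] = field(default_factory=list)


def is_stale(ctrl: ControlMetadata, now: datetime, max_age_hours: float) -> bool:
    """Answer an operational question (data currency) from control metadata."""
    age_hours = (now - ctrl.last_loaded).total_seconds() / 3600
    return age_hours > max_age_hours
```

Keeping the three kinds in separate records mirrors the slide's split: each answers questions for a different audience (builders, operators, end users).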
Data Warehouse Architectures: Conceptual View
• Single-layer
  – Every data element is stored only once
  – Virtual warehouse
• Two-layer
  – Real-time data + derived data
  – The most commonly used approach in industry today
[Figure: in the single-layer view, operational and informational systems both work directly from the same “real-time data”; in the two-layer view, operational systems use real-time data while informational systems use derived data.]
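The difference between a single-layer (virtual) and a two-layer warehouse can be sketched with Python's `sqlite3` module: a view stands in for the virtual warehouse, since it stores nothing and is evaluated against the operational table on every query, while a table built with `CREATE TABLE ... AS SELECT` stands in for stored derived data. The `orders` table and the two object names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 100.0), ("west", 50.0), ("east", 25.0)],
)

# Single-layer: the view holds no data; each query reads the real-time table.
conn.execute(
    "CREATE VIEW v_region_totals AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)

# Two-layer: derived data is physically stored as a snapshot at load time.
conn.execute(
    "CREATE TABLE d_region_totals AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)

# A new operational transaction arrives after the derived table was built.
conn.execute("INSERT INTO orders (region, amount) VALUES ('west', 10.0)")

view_rows = conn.execute(
    "SELECT region, total FROM v_region_totals ORDER BY region"
).fetchall()
derived_rows = conn.execute(
    "SELECT region, total FROM d_region_totals ORDER BY region"
).fetchall()
print(view_rows)     # [('east', 125.0), ('west', 60.0)]: sees the new order
print(derived_rows)  # [('east', 125.0), ('west', 50.0)]: unchanged until reloaded
```

The trade-off shown is the usual one: the virtual layer is always current but pays the full query cost against operational data each time, while the stored derived layer is cheap to read but only as fresh as its last load.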
Three-layer Architecture: Conceptual View
• Transforming real-time data into derived data actually requires two steps
[Figure: operational systems work from real-time data, which is first transformed into reconciled data (the physical implementation of the data warehouse) and then into derived data (the view level, serving “particular informational needs”) used by informational systems.]
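The two transformation steps can be sketched in plain Python. The two source systems, their field names, and the revenue summary are hypothetical; real reconciliation would also cover data cleansing, deduplication, and history, all omitted here. Step one maps each source onto one consistent schema (the reconciled layer); step two summarizes that reconciled data for one particular informational need (the derived, view-level layer).

```python
from collections import defaultdict

# Real-time data as two hypothetical operational systems emit it,
# with inconsistent customer-id formats and amount units.
billing_rows = [
    {"cust": "C-001", "amount_cents": 1250},
    {"cust": "C-002", "amount_cents": 400},
]
web_shop_rows = [
    {"customer_id": "1", "amount": 3.50},
    {"customer_id": "3", "amount": 8.00},
]


def reconcile(billing, web_shop):
    """Step 1: map every source onto one consistent schema (reconciled data)."""
    reconciled = []
    for r in billing:
        # "C-001" -> 1, cents -> currency units
        reconciled.append({"customer": int(r["cust"].split("-")[1]),
                           "amount": r["amount_cents"] / 100})
    for r in web_shop:
        reconciled.append({"customer": int(r["customer_id"]),
                           "amount": float(r["amount"])})
    return reconciled


def derive_revenue_by_customer(reconciled):
    """Step 2: summarize reconciled data for one informational need (derived data)."""
    totals = defaultdict(float)
    for r in reconciled:
        totals[r["customer"]] += r["amount"]
    return dict(sorted(totals.items()))


reconciled = reconcile(billing_rows, web_shop_rows)
derived = derive_revenue_by_customer(reconciled)
print(derived)  # {1: 16.0, 2: 4.0, 3: 8.0}
```

Separating the steps means the reconciled layer is built once and can feed many derived views, each tailored to a different informational need.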
