Data Warehousing
2
Problem: Heterogeneous Information Sources
“Heterogeneities are everywhere”
• Different interfaces
• Different data representations
• Duplicate and inconsistent information
[Figure: example sources: personal databases, digital libraries, scientific databases, the World Wide Web]
3
Problem: Data Management in Large Enterprises
• Vertical fragmentation of informational systems (vertical stovepipes)
• Result of application (user)-driven development of operational systems
[Figure: departmental stovepipes (Sales, Administration, Finance, Manufacturing, ...) with systems such as Sales Planning, Stock Management, Suppliers, Debt Management, Num. Control, Inventory]
4
Goal: Unified Access to Data
Integration System
• Collects and combines information
• Provides integrated view, uniform user interface
• Supports sharing
[Figure: an integration system over the World Wide Web, digital libraries, scientific databases, and personal databases]
5
Why a Warehouse?
• Two approaches:
  • Query-driven (lazy)
  • Warehouse (eager)
[Figure: two sources and a “?”]
6
The Traditional Research Approach
• Query-driven (lazy, on-demand)
[Figure: clients query an integration system (with metadata), which reaches each source through a wrapper]
7
Disadvantages of Query-Driven Approach
• Delay in query processing
  • Slow or unavailable information sources
  • Complex filtering and integration
• Inefficient and potentially expensive for frequent queries
• Competes with local processing at sources
• Hasn’t caught on in industry
8
The Warehousing Approach
• Information integrated in advance
• Stored in warehouse for direct querying and analysis
[Figure: an extractor/monitor at each source feeds an integration system (with metadata), which loads the data warehouse that clients query]
9
Advantages of Warehousing Approach
• High query performance
  • But not necessarily most current information
• Doesn’t interfere with local processing at sources
  • Complex queries at warehouse
  • OLTP at information sources
• Information copied at warehouse
  • Can modify, annotate, summarize, restructure, etc.
  • Can store historical information
  • Security, auditing
• Has caught on in industry
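The trade-off between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not a real integration system: the two sources are plain in-memory lists, and every name here (`sales_rows`, `finance_rows`, `query_lazy`, `Warehouse`) is hypothetical.

```python
# Hypothetical sources: plain lists standing in for operational systems.
sales_rows = [
    {"customer": "acme", "amount": 120},
    {"customer": "zenith", "amount": 80},
]
finance_rows = [{"customer": "acme", "amount": 40}]
SOURCES = [sales_rows, finance_rows]


def fetch_all():
    """Read every row from every source (the expensive step)."""
    return [row for source in SOURCES for row in source]


def integrate(rows):
    """Combine rows into one total per customer."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return totals


def query_lazy(customer):
    """Query-driven: contact the sources and integrate at query time."""
    return integrate(fetch_all()).get(customer, 0)


class Warehouse:
    """Eager: integrate in advance, then answer from the stored copy."""

    def __init__(self):
        self.totals = {}

    def load(self):
        self.totals = integrate(fetch_all())

    def query(self, customer):
        return self.totals.get(customer, 0)


warehouse = Warehouse()
warehouse.load()

# A source changes after the warehouse was loaded.
sales_rows.append({"customer": "acme", "amount": 10})

lazy_total = query_lazy("acme")        # sees the new row
stale_total = warehouse.query("acme")  # answers from the earlier load
warehouse.load()                       # a refresh brings the copy up to date
fresh_total = warehouse.query("acme")
```

The warehouse answers without touching the sources, which is where the query performance comes from, but its answer is only as current as the last load.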
10
Not Either-Or Decision
• Query-driven approach still better for
  • Rapidly changing information
  • Rapidly changing information sources
  • Truly vast amounts of data from large numbers of sources
  • Clients with unpredictable needs
11
What is a Data Warehouse?
A Practitioner’s Viewpoint
“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
-- Barry Devlin, IBM Consultant
12
What is a Data Warehouse?
An Alternative Viewpoint
“A DW is a
• subject-oriented,
• integrated,
• time-varying,
• non-volatile
collection of data that is used primarily in organizational decision making.”
-- W.H. Inmon, Building the Data Warehouse, 1992
13
A Data Warehouse is...
• Stored collection of diverse data
  • A solution to data integration problem
  • Single repository of information
• Subject-oriented
  • Organized by subject, not by application
  • Used for analysis, data mining, etc.
  • Optimized differently from transaction-oriented DB
  • User interface aimed at executives
14
… Cont’d
• Large volume of data (GB, TB)
• Non-volatile
  • Historical
  • Time attributes are important
• Updates infrequent
• May be append-only
• Examples
  • All transactions ever at Sainsbury’s
  • Complete client histories at insurance firm
  • LSE financial information and portfolios
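A non-volatile, append-only store with a time attribute might look like the following sketch. It uses Python's standard `sqlite3` module purely for illustration; the `price_history` table, its columns, and the helper functions are assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE price_history (
           product    TEXT NOT NULL,
           price      REAL NOT NULL,
           valid_from TEXT NOT NULL  -- ISO date, the time attribute
       )"""
)


def record_price(product, price, valid_from):
    """Append a new fact; existing rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO price_history VALUES (?, ?, ?)",
        (product, price, valid_from),
    )


def price_as_of(product, date):
    """Return the price in effect on the given ISO date, or None."""
    row = conn.execute(
        "SELECT price FROM price_history "
        "WHERE product = ? AND valid_from <= ? "
        "ORDER BY valid_from DESC LIMIT 1",
        (product, date),
    ).fetchone()
    return row[0] if row else None


record_price("tea", 2.00, "2024-01-01")
record_price("tea", 2.50, "2024-06-01")

old_price = price_as_of("tea", "2024-03-15")
new_price = price_as_of("tea", "2024-07-01")
```

Because a price change is recorded as a new row rather than an update, the earlier price stays queryable, which is what makes historical analysis possible.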
15
Generic Warehouse Architecture
[Figure: extractor/monitors at the sources feed an integrator, which loads the warehouse queried by clients, with metadata alongside. Labeled concerns: design phase, loading, maintenance, optimization, query & analysis]
16
Data Warehouse Architectures: Conceptual View
• Single-layer
  • Every data element is stored once only
  • Virtual warehouse
• Two-layer
  • Real-time + derived data
  • Most commonly used approach in industry today
[Figure, single-layer: operational and informational systems both work on “real-time data”]
[Figure, two-layer: operational systems work on real-time data; informational systems work on derived data]
17
Three-layer Architecture: Conceptual View
• Transformation of real-time data to derived data really requires two steps
[Figure: operational systems work on real-time data; reconciled data (the physical implementation of the data warehouse) sits in the middle; informational systems work on derived data (the view level, “particular informational needs”)]
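The two steps can be sketched as two small functions: one reconciles records from differently shaped sources into a single schema, the other derives a summary for one informational need. The source layouts, field names, and region codes below are invented for illustration.

```python
from collections import defaultdict

# Real-time data: raw records as two hypothetical operational systems hold them.
sales_system = [
    {"cust": "ACME Ltd ", "amount_pence": 12000, "region": "north"},
    {"cust": "Zenith", "amount_pence": 8050, "region": "south"},
]
finance_system = [
    {"customer_name": "acme ltd", "amount_gbp": 40.0, "region": "N"},
]

# Assumed mapping from each system's region spelling to one shared code.
REGION_CODES = {"north": "N", "n": "N", "south": "S", "s": "S"}


def reconcile(sales, finance):
    """Step 1: map every source format onto one consistent schema."""
    reconciled = []
    for r in sales:
        reconciled.append({
            "customer": r["cust"].strip().lower(),
            "amount_gbp": r["amount_pence"] / 100,
            "region": REGION_CODES[r["region"].lower()],
        })
    for r in finance:
        reconciled.append({
            "customer": r["customer_name"].strip().lower(),
            "amount_gbp": r["amount_gbp"],
            "region": REGION_CODES[r["region"].lower()],
        })
    return reconciled


def derive_totals_by_region(reconciled):
    """Step 2: summarize reconciled data for one informational need."""
    totals = defaultdict(float)
    for r in reconciled:
        totals[r["region"]] += r["amount_gbp"]
    return dict(totals)


reconciled_data = reconcile(sales_system, finance_system)
derived_data = derive_totals_by_region(reconciled_data)
```

Keeping the reconciled layer separate means a second derived view (say, totals per customer) can be built from it without repeating the cleaning logic.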
18
Data Warehousing: Two Distinct Issues
(1) How to get information into warehouse: “Data warehousing”
(2) What to do with data once it’s in warehouse: “Warehouse DBMS”
• Both rich research areas
• Industry has focused on (2)
19
Issues in Data Warehousing
• Warehouse Design
• Extraction
  • Wrappers, monitors (change detectors)
• Integration
  • Cleansing & merging
• Warehousing specification & Maintenance
• Optimizations
• Miscellaneous (e.g., evolution)
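One simple way a monitor could detect changes is to compare successive snapshots of a source. The sketch below assumes each snapshot is a dictionary keyed by primary key; this is only one possible technique (others include triggers or log scanning), and the function name and sample data are hypothetical.

```python
def detect_changes(old_snapshot, new_snapshot):
    """Compare two snapshots keyed by primary key.

    Returns three dicts: rows inserted, rows updated (with their new
    values), and rows deleted (with their last known values).
    """
    inserts = {k: v for k, v in new_snapshot.items() if k not in old_snapshot}
    deletes = {k: v for k, v in old_snapshot.items() if k not in new_snapshot}
    updates = {
        k: v
        for k, v in new_snapshot.items()
        if k in old_snapshot and old_snapshot[k] != v
    }
    return inserts, updates, deletes


# Two snapshots of a hypothetical customer table: id -> (name, region).
yesterday = {1: ("acme", "north"), 2: ("zenith", "south")}
today = {1: ("acme", "east"), 3: ("orbit", "west")}

inserts, updates, deletes = detect_changes(yesterday, today)
```

Only the three change sets need to be shipped to the integrator, which avoids reloading the whole source on every refresh.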
20
OLTP vs. OLAP
• OLTP: On-Line Transaction Processing
  • Describes processing at operational sites
• OLAP: On-Line Analytical Processing
  • Describes processing at warehouse
21
Warehouse is a Specialized DB
Standard DB (OLTP)                        | Warehouse (OLAP)
Mostly updates                            | Mostly reads
Many small transactions                   | Queries are long and complex
MB - GB of data                           | GB - TB of data
Current snapshot                          | History
Index/hash on primary key                 | Lots of scans
Raw data                                  | Summarized, reconciled data
Thousands of users (e.g., clerical users) | Hundreds of users (e.g., decision-makers, analysts)
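The difference in workload shape can be seen in a small sketch using Python's standard `sqlite3` module. A single in-memory table stands in for both sides here only to keep the example short; in practice the two workloads run on separate systems. The `orders` table and its contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "north", 2023, 100.0),
        (2, "south", 2023, 50.0),
        (3, "north", 2024, 70.0),
        (4, "south", 2024, 30.0),
    ],
)

# OLTP-style: a small transaction that touches one row via the primary key.
conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (75.0, 3))
conn.commit()

# OLAP-style: a read-only query that scans the table and summarizes history.
summary = conn.execute(
    "SELECT region, year, SUM(amount) FROM orders "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()
```

The update benefits from an index on the key, while the summary reads every row, which is why the two kinds of system are tuned so differently.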
