Data Warehouse: 11
Data Transformation
Prof Neeraj Bhargava
Vaibhav Khanna
Department of Computer Science
School of Engineering and Systems Sciences
Maharshi Dayanand Saraswati University Ajmer
Warehouse is a Specialized DB
Standard (Operational) DB
• Mostly updates
• Many small transactions
• MB to GB of data
• Current snapshot
• Index/hash on primary key
• Raw data
• Thousands of users (e.g., clerical users)
Warehouse (Informational)
• Mostly reads
• Queries are long and complex
• GB to TB of data
• History
• Lots of scans
• Summarized, reconciled data
• Few users (e.g., decision-makers, analysts)
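The contrast between the two workloads can be sketched in a few lines. This is only an illustration under stated assumptions: the `account` and `sales_history` tables, their columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for both the operational DB and the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schemas: one operational table, one warehouse history table.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute(
    "CREATE TABLE sales_history (sale_year INTEGER, region TEXT, amount REAL)"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 250.0)])
conn.executemany(
    "INSERT INTO sales_history VALUES (?, ?, ?)",
    [
        (2022, "north", 120.0),
        (2022, "south", 80.0),
        (2023, "north", 150.0),
        (2023, "south", 95.0),
    ],
)

# Operational style: a small transaction that updates one row,
# located through the primary-key index.
with conn:
    conn.execute("UPDATE account SET balance = balance - 30.0 WHERE id = ?", (1,))
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

# Warehouse style: a read-only query that scans the history
# and returns summarized data.
yearly_totals = conn.execute(
    "SELECT sale_year, SUM(amount) FROM sales_history "
    "GROUP BY sale_year ORDER BY sale_year"
).fetchall()

print(balance)
print(yearly_totals)
```

The first statement touches a single current row; the second reads every historical row to produce a summary, which is the access pattern a warehouse is organized around.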
Typical OLTP Data Model
Typical Data Warehouse Data Model
Other Data Warehouse Changes
• New descriptive attributes
• New business activity attributes
• New classes of descriptive attributes
• Descriptive attributes become more refined
• Descriptive data are related to one another
• New source of data
The Reconciled Data Layer
• Typical operational data is:
  – Transient
  – Not historical
  – Restricted in scope: not comprehensive
  – Sometimes poor quality: inconsistencies and errors
• After ETL, data should be:
  – Detailed: not summarized yet
  – Historical: periodic
  – Comprehensive: enterprise-wide perspective
  – Timely: data should be current enough to assist decision-making
  – Quality controlled: accurate with full integrity
Types of Data in DW
• Business Data - represents meaning
  – Real-time data (ultimate source of all business data)
  – Reconciled data
  – Derived data
• Metadata - describes meaning
  – Build-time metadata
  – Control metadata
  – Usage metadata
• Data as a product* - intrinsic meaning
  – Produced and stored for its own intrinsic value
  – e.g., the contents of a textbook
The ETL Process
• Capture/Extract
• Scrub or data cleansing
• Transform
• Load and Index
• ETL = Extract, transform, and load
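The four steps listed above can be sketched as a small pipeline. This is a minimal sketch, not a definitive implementation: the source rows, the cleansing rules (drop exact duplicates and non-numeric amounts), the `customer_sales` table, and all function names are assumptions made for illustration.

```python
import sqlite3

# Hypothetical operational rows: (customer_id, name, country, amount_text)
SOURCE_ROWS = [
    (1, " Alice ", "in", "100.50"),
    (2, "Bob", "IN", "n/a"),          # non-numeric amount
    (3, "carol", "us", "20"),
    (1, " Alice ", "in", "100.50"),   # exact duplicate of the first row
]


def extract(rows):
    """Capture a snapshot of the chosen subset of the source data."""
    return list(rows)


def scrub(rows):
    """Cleanse: drop exact duplicates and rows whose amount is not numeric."""
    seen, clean = set(), []
    for row in rows:
        if row in seen:
            continue
        seen.add(row)
        try:
            float(row[3])
        except ValueError:
            continue
        clean.append(row)
    return clean


def transform(rows):
    """Convert to the warehouse format: tidy names, upper-case codes, numeric amounts."""
    return [
        (cid, name.strip().title(), country.upper(), float(amount))
        for cid, name, country, amount in rows
    ]


def load_and_index(conn, rows):
    """Load the rows into the warehouse table, then build an index for queries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_sales "
        "(customer_id INTEGER, name TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customer_sales VALUES (?, ?, ?, ?)", rows)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_sales_country ON customer_sales (country)"
    )
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load_and_index(warehouse, transform(scrub(extract(SOURCE_ROWS))))
loaded = warehouse.execute(
    "SELECT customer_id, name, country, amount FROM customer_sales "
    "ORDER BY customer_id"
).fetchall()
print(loaded)
```

Keeping each step as its own function mirrors the slide's ordering: scrubbing happens before transformation, so format conversion only ever sees rows that passed the quality checks.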
Capture/Extract
Obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse.
Static extract = capturing a snapshot of the source data at a point in time
Incremental extract = capturing changes that have occurred since the last static extract
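The difference between the two extract types can be illustrated as follows. This sketch assumes each source row carries an `updated_at` timestamp, which the slides do not specify; the row layout and function names are hypothetical. A timestamp comparison is only one way to detect changes, and it does not capture rows deleted from the source.

```python
from datetime import datetime

# Hypothetical source table; the updated_at column is an assumption.
source = [
    {"id": 1, "status": "open", "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "status": "open", "updated_at": datetime(2024, 1, 1, 9, 5)},
]


def static_extract(rows):
    """Copy every row as it stands at this point in time."""
    return [dict(row) for row in rows]


def incremental_extract(rows, since):
    """Copy only the rows changed after the previous extract."""
    return [dict(row) for row in rows if row["updated_at"] > since]


# Static extract: a full snapshot, taken at a recorded time.
snapshot = static_extract(source)
last_extract_time = datetime(2024, 1, 1, 12, 0)

# The operational system keeps changing after the snapshot.
source[1]["status"] = "closed"
source[1]["updated_at"] = datetime(2024, 1, 2, 8, 0)
source.append(
    {"id": 3, "status": "open", "updated_at": datetime(2024, 1, 2, 8, 30)}
)

# Incremental extract: only the changes since the static extract.
changes = incremental_extract(source, last_extract_time)
print([row["id"] for row in changes])
```

The snapshot is unaffected by the later changes because it holds copies of the rows, while the incremental extract returns only the updated row and the newly inserted one.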
Thank You
