INTRODUCTION TO DATA
WAREHOUSING
BY
QUONTRA SOLUTIONS
IT COURSES ONLINE TRAINING WITH PLACEMENT
SUPPORT
PHONE : +44 (0)20 3734 1498 / 99
EMAIL: INFO@QUONTRASOLUTIONS.CO.UK
WEB: WWW.QUONTRASOLUTIONS.CO.UK
DATA WAREHOUSE
• Maintain historic data
• Analysis to get a better understanding of the business
• Better decision making
• Definition: A data warehouse is a
  - subject-oriented
  - integrated
  - time-variant
  - non-volatile
collection of data that is used primarily in organizational
decision making.
-- Bill Inmon, Building the Data Warehouse, 1996
SUBJECT ORIENTED
• A data warehouse is organized around subjects such as sales,
product, and customer.
• It focuses on modeling and analysis of data for decision
makers.
• It excludes data that is not useful in the decision support process.
INTEGRATED
• A data warehouse is constructed by integrating multiple
heterogeneous sources.
• Data preprocessing is applied to ensure consistency.
[Diagram: RDBMS, legacy system, and flat file sources pass through data processing and data transformation into the data warehouse.]
NON-VOLATILE
• Mostly, data once recorded will not be updated.
• A data warehouse requires only two data-access operations:
- Incremental loading of data
- Access of data
[Diagram: load and access operations on the data warehouse]
TIME VARIANT
• Provides information from a historical perspective, e.g. the past 5-10 years
• Every key structure contains, either implicitly or explicitly, an
element of time
WHY DATA WAREHOUSE?
Problem Statement:
• ABC Pvt Ltd is a company with branches in the USA, UK,
Canada, and India.
• The Sales Manager wants a quarterly sales report across the
branches.
• Each branch has a separate operational system where sales
transactions are recorded.
WHY DATA WAREHOUSE?
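[Diagram: the Sales Manager collects figures separately from the USA, UK, Canada, and India branch systems.]
The Sales Manager must get the quarterly sales figure for each branch
and manually calculate the sales figure across branches.
What if he needs a daily sales report across the branches?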
WHY DATA WAREHOUSE?
Solution:
• Extract sales information from each database.
• Store the information in a common repository at a single site.
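The solution above can be sketched in a few lines of Python. This is only an illustration: the branch extracts, the amounts, and the helper names (branch_sales, quarter, sales_by_quarter) are made up, and amounts are assumed to already be in one common currency.

```python
from collections import defaultdict
from datetime import date

# Made-up extracts from each branch's operational system: (sale_date, amount).
# Assumption: amounts are already expressed in one common currency.
branch_sales = {
    "USA": [(date(2024, 1, 15), 100.0), (date(2024, 4, 2), 50.0)],
    "UK": [(date(2024, 2, 20), 80.0)],
    "CANADA": [(date(2024, 3, 5), 40.0)],
    "INDIA": [(date(2024, 5, 9), 60.0)],
}


def quarter(d: date) -> str:
    """Label a date with its calendar quarter, e.g. 2024-Q1."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


# The "common repository": every branch's rows in one place, tagged by branch.
warehouse = [
    (branch, sale_date, amount)
    for branch, rows in branch_sales.items()
    for sale_date, amount in rows
]


def sales_by_quarter(rows):
    """Total sales per quarter across all branches."""
    totals = defaultdict(float)
    for _branch, sale_date, amount in rows:
        totals[quarter(sale_date)] += amount
    return dict(totals)


report = sales_by_quarter(warehouse)
print(report)
```

Once the rows sit in one repository, a daily report is the same aggregation with a different grouping key, which is the point of the slide.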
WHY DATA WAREHOUSE?
[Diagram: data from the USA, UK, Canada, and India branches flows into the data warehouse, which the Sales Manager accesses through query and analysis tools.]
CHARACTERISTICS OF DATA WAREHOUSE
• Relational / multidimensional database
• Query and analysis rather than transaction processing
• Historical data from transactions
• Consolidates multiple data sources
• Separates query load from transactions
• Mostly non-volatile
• Large amounts of data, on the order of terabytes
WHEN WE SAY LARGE - WE MEAN IT!
• Terabytes -- 10^12 bytes: Yahoo! – 300 terabytes and growing
• Petabytes -- 10^15 bytes: Geographic Information Systems
• Exabytes -- 10^18 bytes: National Medical Records
• Zettabytes -- 10^21 bytes: Weather images
• Yottabytes -- 10^24 bytes: Intelligence Agency Videos
OLTP VS DATA WAREHOUSE (OLAP)
                              OLTP         Data Warehouse (OLAP)
Indexes                       Few          Many
Data                          Normalized   Generally de-normalized
Joins                         Many         Some
Derived data and aggregates   Rare         Common
DATA WAREHOUSE ARCHITECTURE
[Diagram: operational systems and flat files feed ETL (Extract, Transform and Load), which loads the data warehouse. The warehouse feeds the Inventory, Sales, and Generic data marts, which serve data mining, analysis, and reporting.]
ETL
ETL stands for Extract, Transform, and Load
• Data is distributed across different sources
– Flat files, streaming data, DB systems, XML, JSON
• Data can be in different formats
– CSV, key-value pairs
• Different units and representations
– Country: IN or India
– Date: 20 Nov 2010 or 20101120
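Reconciling different representations might look like the sketch below. The lookup table COUNTRY_CODES, the list DATE_FORMATS, and both function names are assumptions made for illustration; a real pipeline would load such mappings from reference data.

```python
from datetime import datetime

# Hypothetical lookup table mapping codes and spellings to one canonical name.
COUNTRY_CODES = {"IN": "India", "INDIA": "India", "UK": "United Kingdom"}

# Date layouts we assume may appear in the sources.
DATE_FORMATS = ("%d %b %Y", "%Y%m%d")


def normalize_country(raw: str) -> str:
    """Map a country code or name to one canonical name."""
    key = raw.strip().upper()
    return COUNTRY_CODES.get(key, raw.strip())


def normalize_date(raw: str) -> str:
    """Return the date as YYYY-MM-DD, trying each known layout in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


print(normalize_country("IN"), normalize_country("India"))
print(normalize_date("20 Nov 2010"), normalize_date("20101120"))
```

Both spellings of the country and both date layouts end up in a single representation, so rows from different sources can be compared and joined.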
ETL FUNCTIONS
• Extract
– Collect data from different sources
– Parse data
– Remove unwanted data
• Transform
– Project
– Generate surrogate keys
– Encode data
– Join data from different sources
– Aggregate
• Load
ETL STEPS
• The first step in the ETL process is mapping the data between
source systems and the target database.
• The second step is cleansing the source data in the staging area.
• The third step is transforming the cleansed source data.
• The fourth step is loading the transformed data into the target system.
[Figures: sample data before ETL processing and after ETL processing]
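A minimal end-to-end sketch of these steps, using only the Python standard library. The CSV content, the sales_summary table, and the function names are made up for illustration. An in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Made-up flat-file extract; note the inconsistent branch code and missing amount.
SOURCE_CSV = """branch,product,amount
USA,Toaster,30
usa,Toaster,20
UK,Kettle,
UK,Toaster,25
"""


def extract(text):
    """Extract: parse raw rows from a flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Cleanse: drop rows with no amount and standardize branch codes."""
    return [
        {
            "branch": r["branch"].strip().upper(),
            "product": r["product"].strip(),
            "amount": float(r["amount"]),
        }
        for r in rows
        if r["amount"].strip()
    ]


def transform(rows):
    """Transform: aggregate the amount per (branch, product)."""
    totals = {}
    for r in rows:
        key = (r["branch"], r["product"])
        totals[key] = totals.get(key, 0.0) + r["amount"]
    return totals


def load(conn, totals):
    """Load: write the aggregated rows into the target table."""
    conn.execute("CREATE TABLE sales_summary (branch TEXT, product TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO sales_summary VALUES (?, ?, ?)",
        [(branch, product, total) for (branch, product), total in totals.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(cleanse(extract(SOURCE_CSV))))
result = conn.execute(
    "SELECT branch, product, total FROM sales_summary ORDER BY branch"
).fetchall()
print(result)
```

The mapping step is implicit here: the source columns branch, product, and amount map one-to-one onto the target columns branch, product, and total.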
ETL GLOSSARY
Mapping:
Defining relationship between source and target objects.
Cleansing:
The process of resolving inconsistencies in source data.
Transformation:
The process of manipulating data. Any manipulation beyond
copying is a transformation. Examples include aggregating data and
integrating data from multiple sources.
Staging Area:
A place where data is processed before entering the warehouse.
DIMENSION
• Categorizes the data. For example: time, location, etc.
• A dimension can have one or more attributes. For example,
day, week, and month are attributes of the time dimension.
• Role of dimensions in data warehousing:
- Slice and dice
- Filter by dimensions
TYPES OF DIMENSIONS
• Conformed Dimension - A dimension that is shared across fact tables.
• Junk Dimension - A junk dimension is a convenient grouping of flags
and indicators. For example, payment method, shipping method.
• Degenerate Dimension - A dimension key that has no attributes and
hence does not have its own dimension table. For example,
transaction number, invoice number. Values of such a dimension are
mostly unique within a fact table.
• Role-Playing Dimension - A role-playing dimension is a
dimension that plays different roles in fact tables depending on the
context. For example, the Date dimension can be used for the order
date, shipment date, and invoice date.
• Slowly Changing Dimensions - Dimensions that have data that
changes slowly, rather than changing on a time-based, regular
schedule.
TYPES OF SLOWLY CHANGING DIMENSION
• Type 1 - The Type 1 method overwrites old data with new data, and
therefore does not track historical data at all.
• Type 2 - The Type 2 method tracks historical data by creating multiple records
for a given value in dimension table with separate surrogate keys.
• Type 3 - The Type 3 method tracks changes using separate columns. Whereas
Type 2 had unlimited history preservation, Type 3 has limited history
preservation, as it's limited to the number of columns we designate for storing
historical data.
• Type 4 - The Type 4 method is usually referred to as using "history tables",
where one table keeps the current data, and an additional table is used to keep
a record of all changes.
Types 1, 2, and 3 are commonly used.
Some books also discuss Type 0 and Type 6.
http://en.wikipedia.org/wiki/Slowly_changing_dimension
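The difference between Type 1 and Type 2 can be sketched as follows. The customer dimension, its column names (customer_key, customer_id, valid_from, valid_to, is_current), and the two helper functions are assumptions chosen for illustration, not a standard layout.

```python
from datetime import date


def new_dimension():
    """A made-up customer dimension with one current row."""
    return [{
        "customer_key": 1,        # surrogate key
        "customer_id": "C100",    # natural (business) key
        "city": "London",
        "valid_from": date(2010, 1, 1),
        "valid_to": None,
        "is_current": True,
    }]


def apply_type1(dim_rows, customer_id, new_city):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in dim_rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city


def apply_type2(dim_rows, customer_id, new_city, change_date):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    for row in dim_rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["is_current"] = False
            row["valid_to"] = change_date
    next_key = max(row["customer_key"] for row in dim_rows) + 1
    dim_rows.append({
        "customer_key": next_key,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })


type1_dim = new_dimension()
apply_type1(type1_dim, "C100", "Leeds")

type2_dim = new_dimension()
apply_type2(type2_dim, "C100", "Leeds", date(2012, 6, 1))

print(len(type1_dim), len(type2_dim))
```

After the change, the Type 1 table still has one row and no trace of London, while the Type 2 table has two rows: the expired London row and the current Leeds row. Facts recorded before the change keep pointing at the old surrogate key.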
FACTS
• Facts are values that can be examined and analyzed.
• For example: Page Views, Unique Users, Pieces Sold,
Profit.
• Fact and measure are synonymous.
• Types of facts:
– Additive - Measures that can be added across all
dimensions.
– Non-Additive - Measures that cannot be added across
any dimension (for example, ratios and percentages).
– Semi-Additive - Measures that can be added across
some dimensions but not others (for example, an account balance
can be summed across accounts but not across time).
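A small worked example of why additivity matters, using made-up numbers. Account balances stand in for a semi-additive measure and profit margin for a non-additive one. Both examples are illustrative choices and do not come from the slides.

```python
# Made-up daily balance snapshots: (day, account, balance).
balances = [
    ("2024-01-01", "A", 100),
    ("2024-01-01", "B", 50),
    ("2024-01-02", "A", 120),
    ("2024-01-02", "B", 50),
]


def total_on(day):
    """Summing a balance across accounts for one day is meaningful."""
    return sum(balance for d, _account, balance in balances if d == day)


total_jan_2 = total_on("2024-01-02")

# Summing the same balance across days double-counts money, so it is not meaningful.
naive_sum_over_time = sum(balance for _day, _account, balance in balances)

# Made-up sales rows: (revenue, profit). Margin is a ratio, so it is non-additive.
sales = [(100, 10), (300, 90)]
margins = [profit / revenue for revenue, profit in sales]

# Recompute the ratio from the additive parts instead of adding the ratios.
overall_margin = sum(p for _r, p in sales) / sum(r for r, _p in sales)

print(total_jan_2, naive_sum_over_time, overall_margin)
```

The usual remedy is to store the additive components (revenue, profit) as facts and derive ratios at query time.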
HOW TO STORE DATA?
Facts and Dimensions:
1. Select the business process to model
2. Declare the grain of the business process
3. Choose the dimensions that apply to each fact table row
4. Identify the numeric facts that will populate each fact table
row
DIMENSION TABLE
• Contains attributes of dimensions, e.g. Month is an attribute
of the Time dimension.
• Can also have foreign keys to another dimension table
• Usually identified by a unique integer primary key called a
surrogate key
FACT TABLE
• Contains facts
• Foreign keys to dimension tables
• Primary key: usually a composite key of all FKs
TYPES OF SCHEMA USED IN DATA WAREHOUSE
• Star Schema
• Snowflake Schema
• Fact Constellation Schema
STAR SCHEMA
• Multi-dimensional data
• Dimension and fact tables
• A fact table with pointers to dimension tables
STAR SCHEMA
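A star schema can be sketched with Python's built-in sqlite3 module. All table names, column names, and rows below are made up for illustration. The fact table's primary key is the composite of its foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables, each identified by an integer surrogate key.
CREATE TABLE dim_time    (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_region  (region_key INTEGER PRIMARY KEY, country TEXT);

-- Fact table: foreign keys to every dimension plus the numeric facts.
CREATE TABLE fact_sales (
    time_key    INTEGER REFERENCES dim_time(time_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    region_key  INTEGER REFERENCES dim_region(region_key),
    units_sold  INTEGER,
    revenue     REAL,
    PRIMARY KEY (time_key, product_key, region_key)
);

INSERT INTO dim_time VALUES (1, '2024-01-15', '2024-01', 2024), (2, '2024-02-10', '2024-02', 2024);
INSERT INTO dim_product VALUES (1, 'Toaster', 'Kitchen Appliances'), (2, 'Kettle', 'Kitchen Appliances');
INSERT INTO dim_region VALUES (1, 'UK'), (2, 'India');
INSERT INTO fact_sales VALUES (1, 1, 1, 3, 90.0), (1, 2, 2, 2, 40.0), (2, 1, 2, 5, 150.0);
""")

# A typical star query: join the fact table to one dimension and aggregate.
revenue_by_country = conn.execute("""
    SELECT r.country, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_region AS r ON r.region_key = f.region_key
    GROUP BY r.country
    ORDER BY r.country
""").fetchall()
print(revenue_by_country)
```

Every dimension is one join away from the fact table, which is what gives the schema its star shape.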
SNOWFLAKE SCHEMA
• An extension of the star schema in which the dimension tables
are partly or fully normalized.
• Dimension table hierarchies are broken down into simpler
tables.
SNOWFLAKE SCHEMA
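A snowflake schema can be sketched the same way. Here a hypothetical product dimension is normalized so that the category lives in its own table. The names and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category hierarchy level is split out of the product dimension.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    revenue     REAL
);

INSERT INTO dim_category VALUES (1, 'Kitchen Appliances'), (2, 'Home Entertainment');
INSERT INTO dim_product VALUES (1, 'Toaster', 1), (2, 'Kettle', 1), (3, 'Television', 2);
INSERT INTO fact_sales VALUES (1, 90.0), (2, 40.0), (3, 500.0), (1, 30.0);
""")

# Reaching the category now takes two joins instead of one.
revenue_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_key = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(revenue_by_category)
```

The trade-off is less redundancy in the dimension tables at the cost of extra joins at query time.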
FACT CONSTELLATION SCHEMA
• A fact constellation schema allows dimension tables to be
shared between fact tables.
• This schema is used mainly for aggregate fact tables,
or where we want to split a fact table for better
comprehension.
• For example, separate fact tables for daily, weekly, and
monthly reporting requirements.
FACT CONSTELLATION SCHEMA
In this example, the dimension tables for time, item, and location are
shared between the sales and shipping fact tables.
OPERATIONS ON DATA WAREHOUSE
• Drill Down
• Roll Up
• Slice & Dice
• Pivoting
DRILL DOWN
[Diagram: data by Time and Product, drilling down the product hierarchy]
Category, e.g. Home Appliances
Sub-category, e.g. Kitchen Appliances
Product, e.g. Toaster
ROLL UP
Calendar hierarchy: Year > Quarter > Month > Day
Fiscal hierarchy: Fiscal Year > Fiscal Quarter > Fiscal Month > Fiscal Week > Day
SLICE & DICE
[Diagram: a Time by Product view sliced to Product = Toaster, leaving values along the Time axis]
PIVOTING
• Also called rotation
• Rotate on an axis
• Interchange Rows and Columns
[Diagram: axes labeled Time, Product, and Region, shown before and after rotation]
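The four operations can be sketched on a tiny in-memory data set. The fact rows and the aggregate helper are made up for illustration. A real warehouse would express these as SQL or OLAP queries.

```python
from collections import defaultdict

# Made-up fact rows: (year, quarter, product, region, units).
facts = [
    (2024, "Q1", "Toaster", "UK", 10),
    (2024, "Q1", "Kettle", "UK", 4),
    (2024, "Q2", "Toaster", "UK", 7),
    (2024, "Q2", "Toaster", "India", 5),
]
FIELDS = ("year", "quarter", "product", "region")


def aggregate(rows, by):
    """Sum units grouped by the named dimension attributes."""
    positions = [FIELDS.index(name) for name in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in positions)] += row[4]
    return dict(totals)


# Roll up: aggregate to a coarser level of the time hierarchy.
roll_up = aggregate(facts, by=("year",))

# Drill down: go back to a finer level (year, then quarter).
drill_down = aggregate(facts, by=("year", "quarter"))

# Slice: fix one dimension to a single value.
toaster_slice = [row for row in facts if row[2] == "Toaster"]

# Dice: restrict two or more dimensions at once.
toaster_uk_dice = [row for row in facts if row[2] == "Toaster" and row[3] == "UK"]

# Pivot: swap which dimension is on the rows and which is on the columns.
by_product_region = aggregate(facts, by=("product", "region"))
pivoted = {(region, product): units for (product, region), units in by_product_region.items()}

print(roll_up, drill_down, pivoted)
```

Roll up and drill down differ only in the grouping level passed to the same aggregation, and pivoting changes the presentation without changing any value.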
ADVANTAGES OF DATA WAREHOUSE
• One consistent data store for reporting, forecasting, and
analysis
• Easier and timely access to data
• Scalability
• Trend analysis and detection
• Drill down analysis
DISADVANTAGES OF DATA WAREHOUSE
• Preparation may be time consuming.
• High associated cost
CASE STUDY: WHY DATA WAREHOUSE
• G2G Courier Pvt. Ltd. is an established brand in the courier
industry. It has its own network in the main cities and has
subcontracted rural areas across the country to
various partners.
• The President of the company wants to look deep into the
financial health of the company and its different performance
aspects.
CHALLENGES
• Apart from G2G’s own transaction system, each partner has
its own system, which makes the data very heterogeneous.
• The granularity of data also differs between systems, for
example minute accuracy versus day accuracy.
• To analyze metrics such as revenue and timely delivery
across geographical locations and partners, we need
a unified system.
DATA WAREHOUSE MODEL
[Diagram: a Sales Fact table linked to Region, Product, Product Category, and Time dimensions]
THANK YOU
