Data Warehouse Concepts
What is a Data Warehouse? According to Inmon, the author of several data warehouse
books, "A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection
of data in support of management's decision making process".
Example: Over the years, application designers in each branch have made their own
decisions about how an application and its database should be built. As a result, source
systems differ in naming conventions, variable measurements, encoding structures, and
physical attributes of data. Consider a bank that has several branches in several countries,
millions of customers, and two lines of business: savings and loans. The following example
shows how data is integrated from source systems into a target system.
Example of Source Data
System Name     | Attribute Name            | Column Name               | Datatype     | Values
Source System 1 | Customer Application Date | CUSTOMER_APPLICATION_DATE | NUMERIC(8,0) | 11012005
Source System 2 | Customer Application Date | CUST_APPLICATION_DATE     | DATE         | 11012005
Source System 3 | Application Date          | APPLICATION_DATE          | DATE         | 01NOV2005
In the example above, attribute names, column names, datatypes, and value formats differ
from one source system to another. This inconsistency can be avoided by integrating the
data into a data warehouse that follows consistent standards.
Example of Target Data(Data Warehouse)
Target System Attribute Name Column Name Datatype Values
Record #1 Customer Application Date CUSTOMER_APPLICATION_DATE DATE 01112005
Record #2 Customer Application Date CUSTOMER_APPLICATION_DATE DATE 01112005
Record #3 Customer Application Date CUSTOMER_APPLICATION_DATE DATE 01112005
In the above example of target data, attribute names, column names, and datatypes are
consistent throughout the target system. This is how data from various source systems is
integrated and accurately stored into the data warehouse.
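The integration step for the date column can be sketched in Python. This is a minimal illustration, not a description of any particular ETL tool. It assumes, based only on the sample values shown, that source systems 1 and 2 hold the date as MMDDYYYY, that source system 3 uses DDMONYYYY, and that the target renders dates as DDMMYYYY. The function names are hypothetical.

```python
from datetime import date, datetime

# Assumed formats, inferred from the sample values in the tables above.
SOURCE_1_FORMAT = "%m%d%Y"   # NUMERIC(8,0), e.g. 11012005
SOURCE_2_FORMAT = "%m%d%Y"   # DATE rendered as 11012005
SOURCE_3_FORMAT = "%d%b%Y"   # DATE rendered as 01NOV2005 (English month names)
TARGET_FORMAT = "%d%m%Y"     # warehouse rendering, e.g. 01112005


def from_source_1(value: int) -> date:
    """Convert the numeric column; zero-pad so a leading zero is not lost."""
    return datetime.strptime(f"{value:08d}", SOURCE_1_FORMAT).date()


def from_source_2(value: str) -> date:
    return datetime.strptime(value, SOURCE_2_FORMAT).date()


def from_source_3(value: str) -> date:
    return datetime.strptime(value, SOURCE_3_FORMAT).date()


def to_target(value: date) -> str:
    """Render a date the way the target table shows it."""
    return value.strftime(TARGET_FORMAT)


records = [
    from_source_1(11012005),
    from_source_2("11012005"),
    from_source_3("01NOV2005"),
]
target_values = [to_target(d) for d in records]
```

Under these assumptions all three source values resolve to the same calendar date and therefore to one consistent target value.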
See Figure 1.12 below for Data Warehouse Architecture Diagram.
Figure 1.12 : Data Warehouse Architecture
Data Warehouse & Data Mart
A data warehouse is a relational/multidimensional database that is designed for query and
analysis rather than transaction processing. A data warehouse usually contains historical data
that is derived from transaction data. It separates analysis workload from transaction workload
and enables a business to consolidate data from several sources.
In addition to a relational/multidimensional database, a data warehouse environment
often consists of an ETL solution, an OLAP engine, client analysis tools, and other applications
that manage the process of gathering data and delivering it to business users.
There are three types of data warehouses:
1. Enterprise Data Warehouse - An enterprise data warehouse provides a central database
for decision support throughout the enterprise.
2. ODS (Operational Data Store) - This has a broad enterprise-wide scope, but unlike the real
enterprise data warehouse, data is refreshed in near real time and used for routine business
activity.
3. Data Mart - A data mart is a subset of a data warehouse that supports a particular region,
business unit or business function.
Data warehouses and data marts are built on dimensional data modeling where fact tables are
connected with dimension tables. This is most useful for users to access data since a database
can be visualized as a cube of several dimensions. A data warehouse provides an opportunity for
slicing and dicing that cube along each of its dimensions.
Data Mart: A data mart is a subset of data warehouse that is designed for a particular line of
business, such as sales, marketing, or finance. In a dependent data mart, data can be derived
from an enterprise-wide data warehouse. In an independent data mart, data can be collected
directly from sources.
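A dependent data mart can be pictured as a subset derived from the warehouse. The sketch below uses Python's built-in sqlite3 module purely for illustration. The table names warehouse_sales and loans_mart, the columns, and the sample rows are hypothetical, and a real data mart would normally be built by an ETL process rather than a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical enterprise-wide table covering every line of business.
CREATE TABLE warehouse_sales (line_of_business TEXT, region TEXT, amount REAL);
INSERT INTO warehouse_sales VALUES
    ('savings', 'East', 100.0),
    ('loans',   'East', 250.0),
    ('loans',   'West',  75.0);

-- Dependent data mart: only the rows for one line of business.
CREATE TABLE loans_mart AS
    SELECT region, amount
    FROM warehouse_sales
    WHERE line_of_business = 'loans';
""")

loans_mart_rows = conn.execute(
    "SELECT region, amount FROM loans_mart ORDER BY region"
).fetchall()
```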
Figure 1.12 : Data Warehouse and Datamarts
General Information
In general, an organization is started to earn money by selling a product or by providing a
service for the product. An organization may operate from one place or may have several branches.
Consider an organization that sells products throughout the world: its four major
dimensions are product, location, time and organization. Dimension tables are explained
in detail under the section Dimensions. This example is used to explain the
STAR SCHEMA in detail.
What is Star Schema?
Star Schema is a relational database schema for representing multidimensional data. It is the
simplest form of data warehouse schema and contains one or more dimensions and fact tables.
It is called a star schema because the entity-relationship diagram between dimensions and fact
tables resembles a star where one fact table is connected to multiple dimensions. The center of
the star schema consists of a large fact table that points towards the dimension tables. The
advantages of a star schema are easy slicing of data, better query performance and easy
understanding of data.
Steps in designing Star Schema
• Identify a business process for analysis (like sales).
• Identify measures or facts (sales dollar).
• Identify dimensions for facts (product dimension, location dimension, time dimension,
organization dimension).
• List the columns that describe each dimension (region name, branch name).
• Determine the lowest level of summary in a fact table (sales dollar).
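The outcome of these steps for the sales example can be sketched as table definitions. The snippet below uses Python's built-in sqlite3 module as a convenient way to run the DDL. The table and column names are illustrative assumptions, not taken from the figures, and the composite primary key follows the glossary's description of a fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension, each with its own primary key.
CREATE TABLE product_dim (
    product_id INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);
CREATE TABLE location_dim (
    location_id INTEGER PRIMARY KEY, state_name TEXT, region_name TEXT);
CREATE TABLE time_dim (
    time_id INTEGER PRIMARY KEY, month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE organization_dim (
    organization_id INTEGER PRIMARY KEY, employee_name TEXT, branch_name TEXT);

-- Fact table: foreign keys to every dimension plus the measure.
CREATE TABLE sales_fact (
    product_id      INTEGER REFERENCES product_dim(product_id),
    location_id     INTEGER REFERENCES location_dim(location_id),
    time_id         INTEGER REFERENCES time_dim(time_id),
    organization_id INTEGER REFERENCES organization_dim(organization_id),
    sales_dollar    REAL,
    PRIMARY KEY (product_id, location_id, time_id, organization_id)
);
""")

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

Keeping the hierarchy columns (for example category_name) inside the dimension table itself is what makes this a star rather than a snowflake layout.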
Important aspects of Star Schema & Snowflake Schema
• In a star schema every dimension will have a primary key.
• In a star schema, a dimension table will not have any parent table.
• Whereas in a snowflake schema, a dimension table will have one or more parent tables.
• Hierarchies for the dimensions are stored in the dimension table itself in a star schema.
• Whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies help
to drill down the data from the topmost level to the lowermost level.
Glossary:
Hierarchy
A logical structure that uses ordered levels as a means of organizing data. A hierarchy can be
used to define data aggregation; for example, in a time dimension, a hierarchy might be used to
aggregate data from the Month level to the Quarter level, from the Quarter level to the Year
level. A hierarchy can also be used to define a navigational drill path, regardless of whether the
levels in the hierarchy represent aggregated totals or not.
Level
A position in a hierarchy. For example, a time dimension might have a hierarchy that represents
data at the Month, Quarter, and Year levels.
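As a small illustration of aggregating along a hierarchy, the sketch below rolls monthly totals up to the Quarter level and then to the Year level. The sample figures and the helper name quarter_of are made up for the example.

```python
from collections import defaultdict

# Hypothetical monthly totals keyed by (year, month).
monthly_sales = {
    (2005, 1): 100,
    (2005, 2): 120,
    (2005, 3): 90,
    (2005, 4): 110,
    (2005, 7): 80,
}


def quarter_of(month: int) -> int:
    """Map a month number (1-12) to its calendar quarter (1-4)."""
    return (month - 1) // 3 + 1


# Month level -> Quarter level
quarterly_sales = defaultdict(int)
for (year, month), amount in monthly_sales.items():
    quarterly_sales[(year, quarter_of(month))] += amount

# Quarter level -> Year level
yearly_sales = defaultdict(int)
for (year, _quarter), amount in quarterly_sales.items():
    yearly_sales[year] += amount
```

Drilling down is the same path walked in reverse: from a yearly total to its quarters, and from a quarter to its months.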
Fact Table
A table in a star schema that contains facts and is connected to dimensions. A fact table typically
has two types of columns: those that contain facts and those that are foreign keys to dimension
tables. The primary key of a fact table is usually a composite key that is made up of all of its
foreign keys.
A fact table might contain either detail level facts or facts that have been aggregated (fact tables
that contain aggregated facts are often instead called summary tables). A fact table usually
contains facts with the same level of aggregation.
Example of Star Schema: Figure 1.6
In the example figure 1.6, the sales fact table is connected to the location, product, time
and organization dimensions. It shows that data can be sliced across all dimensions and that
it can also be aggregated across multiple dimensions. "Sales Dollar" in the sales fact table can
be calculated across all dimensions independently or in combination, as listed
below.
• Sales Dollar value for a particular product
• Sales Dollar value for a product in a location
• Sales Dollar value for a product in a year within a location
• Sales Dollar value for a product in a year within a location sold or serviced by an employee
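The combinations listed above correspond to grouping the fact table by different sets of dimension attributes. The following sketch runs three such queries with Python's sqlite3 module. The schema is a cut-down, assumed version of the sales example (the organization dimension is left out for brevity) and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim  (product_id  INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, region_name  TEXT);
CREATE TABLE time_dim     (time_id     INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE sales_fact (
    product_id INTEGER, location_id INTEGER, time_id INTEGER, sales_dollar REAL);

INSERT INTO product_dim  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO location_dim VALUES (1, 'East'), (2, 'West');
INSERT INTO time_dim     VALUES (1, 2004), (2, 2005);
INSERT INTO sales_fact   VALUES
    (1, 1, 1, 100.0), (1, 1, 2, 150.0), (1, 2, 2, 50.0), (2, 1, 2, 200.0);
""")

# Sales Dollar per product
by_product = conn.execute("""
    SELECT p.product_name, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

# Sales Dollar per product in a location
by_product_location = conn.execute("""
    SELECT p.product_name, l.region_name, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN product_dim p  ON p.product_id  = f.product_id
    JOIN location_dim l ON l.location_id = f.location_id
    GROUP BY p.product_name, l.region_name
    ORDER BY p.product_name, l.region_name
""").fetchall()

# Sales Dollar per product in a year within a location
by_product_location_year = conn.execute("""
    SELECT p.product_name, l.region_name, t.year, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN product_dim p  ON p.product_id  = f.product_id
    JOIN location_dim l ON l.location_id = f.location_id
    JOIN time_dim t     ON t.time_id     = f.time_id
    GROUP BY p.product_name, l.region_name, t.year
    ORDER BY p.product_name, l.region_name, t.year
""").fetchall()
```

Each additional dimension adds exactly one join from the fact table, which is the property that keeps star schema queries simple.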
Snowflake Schema
A snowflake schema is a star schema structure normalized through the use of outrigger
tables, i.e. dimension table hierarchies are broken into simpler tables. The star schema
example had 4 dimensions (location, product, time and organization) and a fact
table (sales).
In the snowflake schema, the example diagram shown below has 4 dimension tables, 4
lookup tables and 1 fact table. The reason is that the hierarchies (category, branch, state, and
month) are broken out of the dimension tables (PRODUCT, ORGANIZATION, LOCATION,
and TIME) respectively and shown separately. In OLAP, the snowflake schema approach
increases the number of joins and results in poorer performance when retrieving data. A few
organizations normalize the dimension tables to save space. Since dimension tables take up
relatively little space, the snowflake schema approach may be avoided.
Example of Snowflake Schema: Figure 1.7
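The extra join introduced by snowflaking can be seen in a small sketch. Here only the category hierarchy is broken out of the product dimension. The table name category_lookup, the columns, and the data are assumptions made for illustration and are not read from Figure 1.7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Lookup (outrigger) table holding the hierarchy level broken out of PRODUCT.
CREATE TABLE category_lookup (
    category_id INTEGER PRIMARY KEY, category_name TEXT);

-- The dimension now points at its parent table instead of storing the name.
CREATE TABLE product_dim (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES category_lookup(category_id));

CREATE TABLE sales_fact (
    product_id   INTEGER REFERENCES product_dim(product_id),
    sales_dollar REAL);

INSERT INTO category_lookup VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO product_dim VALUES (1, 'Widget', 1), (2, 'Gadget', 1), (3, 'App', 2);
INSERT INTO sales_fact VALUES (1, 100.0), (2, 200.0), (3, 50.0);
""")

# Reaching the category name takes two joins here; in a star schema the
# category name would sit in product_dim and one join would be enough.
sales_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN product_dim p     ON p.product_id  = f.product_id
    JOIN category_lookup c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```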
Fact Table
The central table in a star schema is called the fact table. A fact table typically has two
types of columns: those that contain facts and those that are foreign keys to dimension tables.
The primary key of a fact table is usually a composite key that is made up of all of its foreign
keys.
In the example figure 1.6, "Sales Dollar" is a fact (measure) and it can be added across
several dimensions. Fact tables store different types of measures: additive, non-additive and
semi-additive measures.
Measure Types
• Additive - Measures that can be added across all dimensions.
• Non-Additive - Measures that cannot be added across any dimension.
• Semi-Additive - Measures that can be added across some dimensions but not across others.
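The three measure types can be illustrated with a few lines of Python. The specific measures are assumed examples chosen for this sketch and are not taken from the figures: deposits as an additive measure, an account balance as a semi-additive measure (it sums across accounts but not across days), and a profit margin ratio as a non-additive measure.

```python
# Hypothetical daily snapshot rows: (account, day, deposits, balance)
snapshots = [
    ("A", 1, 100.0, 1100.0),
    ("A", 2, 50.0, 1150.0),
    ("B", 1, 200.0, 700.0),
    ("B", 2, 0.0, 700.0),
]

# Additive: deposits can be summed over every account and every day.
total_deposits = sum(deposits for _acct, _day, deposits, _bal in snapshots)

# Semi-additive: balances add up across accounts for a single day...
balance_on_day_2 = sum(bal for _acct, day, _dep, bal in snapshots if day == 2)
# ...but adding them across days counts the same money more than once.
misleading_balance_sum = sum(bal for _acct, _day, _dep, bal in snapshots)

# Non-additive: a ratio such as margin cannot be summed at all.
# Hypothetical sales rows: (revenue, profit)
sales = [(100.0, 20.0), (300.0, 30.0)]
sum_of_margins = sum(profit / revenue for revenue, profit in sales)
overall_margin = sum(p for _r, p in sales) / sum(r for r, _p in sales)
```

The usual remedy for a non-additive ratio is to store its additive components (here revenue and profit) in the fact table and compute the ratio after aggregation.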
A fact table might contain either detail level facts or facts that have been aggregated (fact tables
that contain aggregated facts are often instead called summary tables).
In the real world, it is possible to have a fact table that contains no measures or facts. These
tables are called factless fact tables.
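A factless fact table records only that a combination of dimension keys occurred, so analysis consists of counting rows. The attendance scenario below is an assumed illustration with invented keys, not an example from the source figures.

```python
from collections import Counter

# Hypothetical factless fact rows: (student_id, course_id, date_id).
# There is no measure column; each row only records that the event happened.
attendance_fact = [
    (1, 10, 20050101),
    (2, 10, 20050101),
    (1, 11, 20050101),
    (1, 10, 20050102),
]

# Counting rows per course answers "how many attendances per course?"
attendance_by_course = Counter(
    course_id for _student, course_id, _date in attendance_fact
)
```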
Steps in designing Fact Table
• Identify a business process for analysis (like sales).
• Identify measures or facts (sales dollar).
• Identify dimensions for facts (product dimension, location dimension, time dimension,
organization dimension).
• List the columns that describe each dimension (region name, branch name).
• Determine the lowest level of summary in a fact table (sales dollar).
Example of a Fact Table with an Additive Measure in Star Schema: Figure 1.6
In the example figure 1.6, the sales fact table is connected to the location, product, time
and organization dimensions. The measure "Sales Dollar" in the sales fact table can be added
across all dimensions independently or in combination, as listed below.
• Sales Dollar value for a particular product
• Sales Dollar value for a product in a location
• Sales Dollar value for a product in a year within a location
• Sales Dollar value for a product in a year within a location sold or serviced by an employee
