Logical Data Modeling
Data Warehousing
Components of Dimensional Model
Star
Fact Table
Fact Table Keys
What's Wrong with This Fact Table of Basketball Player Game Stats?
Fact Table Grain
Fact Table Granularity
Fact Table
Dimension Table
–Should always contain a business key, or the legacy PK from the source system.
–Always have a surrogate primary key.
–Dimension attributes can sometimes be discrete numeric values that behave like text, such as a phone number.
–During meetings with the business, dimension attributes are often heard following the word “by” (by year, by product, by region).
–Dimension attributes should be verbose, descriptive, complete, discretely valued and quality assured.
Dimension Table
• Dimension rows are uniquely identified by a single key field.
• Always have a Surrogate Primary Key.
• Should always contain a business key, or legacy PK from source
system.
• A 4-byte integer makes a good surrogate key, as a signed 4-byte integer can represent more than 2 billion positive values (up to 2,147,483,647).
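A minimal sketch of a dimension that keeps both keys, using SQLite through Python's `sqlite3` module as a stand-in warehouse. The table `dim_customer`, its columns and the sample rows are hypothetical: the warehouse assigns the surrogate key, while the business key is copied unchanged from the source system.

```python
import sqlite3

# Largest value a signed 4-byte (32-bit) integer can hold.
MAX_INT32 = 2**31 - 1  # 2,147,483,647

conn = sqlite3.connect(":memory:")
# Hypothetical customer dimension: surrogate key generated by the warehouse,
# business key (legacy PK) carried over from the source system.
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,   -- surrogate key
        customer_id   TEXT NOT NULL,         -- business key / legacy PK
        customer_name TEXT,
        city          TEXT
    )
""")

def load_customer(conn, customer_id, name, city):
    """Insert a source row and return the surrogate key assigned to it."""
    cur = conn.execute(
        "INSERT INTO dim_customer (customer_id, customer_name, city) VALUES (?, ?, ?)",
        (customer_id, name, city),
    )
    return cur.lastrowid

# Two source systems with differently shaped business keys.
k1 = load_customer(conn, "CRM-0042", "Ayesha Khan", "Lahore")
k2 = load_customer(conn, "ERP-A17", "Bilal Ahmed", "Karachi")
```

In SQLite an `INTEGER PRIMARY KEY` is stored as a 64-bit rowid; in a database with a true 4-byte `INT` type, the same column would be declared as `INT`.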
Surrogate Key Advantage
Conformed dimensions
Conformed dimensions - example
Date and Time dimensions
›This is the most common conformed dimension.
›It is acceptable to use a primary key in the form YYYYMMDD.
›If you need time of day, use a separate dimension.
›A time-of-day dimension should only be used if there are meaningful textual descriptions of time
–Ex. Lunch, Dinner, 1st Shift, 2nd Shift, etc.
›Elapsed time intervals are facts, not attributes.
–Ex. Minutes between when an order was received and when it was shipped
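The sketch below builds date-dimension rows keyed by a YYYYMMDD integer and computes an elapsed-time interval as a number, which would go in the fact table rather than in the dimension. The function names and the chosen attributes are illustrative assumptions, not a fixed design.

```python
from datetime import date, datetime, timedelta

def date_key(d: date) -> int:
    """Primary key in the form YYYYMMDD, e.g. 2024-03-15 -> 20240315."""
    return d.year * 10000 + d.month * 100 + d.day

def date_rows(start: date, end: date):
    """Yield one date-dimension row per calendar day, start and end inclusive."""
    d = start
    while d <= end:
        yield {
            "date_key": date_key(d),
            "full_date": d.isoformat(),
            "day_name": d.strftime("%A"),     # locale-dependent text
            "month_name": d.strftime("%B"),
            "quarter": (d.month - 1) // 3 + 1,
            "year": d.year,
        }
        d += timedelta(days=1)

# Elapsed time is a fact (a measure), not a dimension attribute.
received = datetime(2024, 3, 15, 9, 30)
shipped = datetime(2024, 3, 15, 14, 45)
minutes_to_ship = (shipped - received).total_seconds() / 60
```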
Date dimension
Degenerate dimension
›These are dimension keys stored directly in the fact table, with no dimension table of their own
›They occur in transaction fact tables as the actual transaction number, which does not join to anything, e.g. Order Number, Invoice Number, Ticket Number
›They can also be used as a link back to operational systems
›Often used as part of the fact table's primary key
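A minimal sketch, assuming SQLite and a hypothetical order-line fact table: `order_number` is the degenerate dimension. It forms part of the primary key and can be grouped on directly, with no join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical order-line fact table. order_number is a degenerate dimension:
# it lives in the fact table and has no dimension table to join to.
conn.execute("""
    CREATE TABLE fact_order_line (
        order_number TEXT    NOT NULL,  -- degenerate dimension (from source system)
        line_number  INTEGER NOT NULL,
        product_key  INTEGER NOT NULL,  -- FK to a product dimension (not shown)
        date_key     INTEGER NOT NULL,  -- FK to the date dimension (not shown)
        quantity     INTEGER NOT NULL,
        amount       REAL    NOT NULL,
        PRIMARY KEY (order_number, line_number)
    )
""")
conn.executemany(
    "INSERT INTO fact_order_line VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("SO-1001", 1, 10, 20240315, 2, 7.50),
        ("SO-1001", 2, 11, 20240315, 1, 3.25),
        ("SO-1002", 1, 10, 20240316, 4, 15.00),
    ],
)
# Group by the degenerate dimension directly; no join is needed.
order_totals = conn.execute(
    "SELECT order_number, SUM(amount) FROM fact_order_line "
    "GROUP BY order_number ORDER BY order_number"
).fetchall()
```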
Slowly changing dimensions
Dimensional data changes infrequently, but when it does you need a strategy for addressing the change.
–Ex: What happens when a customer has a new address, or an employee has a name change?
Three popular strategies:
-Type 1: Overwrite the existing attribute
-Type 2: Add a new dimension row
-Type 3: Add a new dimension attribute
Type 1 - Overwrite the existing attribute
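A minimal Type 1 sketch, assuming SQLite and a hypothetical `dim_customer` table: the attribute is updated in place, so the surrogate key stays the same and the old city is no longer available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,  -- surrogate key
        customer_id  TEXT NOT NULL UNIQUE, -- business key
        city         TEXT
    )
""")
conn.execute("INSERT INTO dim_customer (customer_id, city) VALUES ('C042', 'Lahore')")

def scd_type1_update(conn, customer_id, new_city):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    conn.execute(
        "UPDATE dim_customer SET city = ? WHERE customer_id = ?",
        (new_city, customer_id),
    )

scd_type1_update(conn, "C042", "Islamabad")
rows = conn.execute("SELECT customer_key, customer_id, city FROM dim_customer").fetchall()
```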
Type 2: Add New Dimension Row (Insert-Update)
›Most popular strategy, as it preserves full history
›The natural (business) key is repeated across rows.
›Old and new values are stored along with effective dates and an indicator of which row is “current”
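A minimal Type 2 sketch, assuming SQLite and a hypothetical `dim_customer` table. The "insert-update" is an UPDATE that expires the current row followed by an INSERT of the new current row, run in one transaction. Using the change date as both the old row's expiry date and the new row's effective date, and NULL for an open expiry, are assumed conventions; some teams use the previous day or a sentinel such as 9999-12-31 instead.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key   INTEGER PRIMARY KEY,  -- surrogate key, new value per version
        customer_id    TEXT NOT NULL,        -- natural key, repeated across versions
        city           TEXT,
        effective_date TEXT NOT NULL,
        expiry_date    TEXT,                 -- NULL while the row is current
        is_current     INTEGER NOT NULL      -- 1 = current row, 0 = history
    )
""")
conn.execute(
    "INSERT INTO dim_customer (customer_id, city, effective_date, expiry_date, is_current) "
    "VALUES ('C042', 'Lahore', '2020-01-01', NULL, 1)"
)

def scd_type2_update(conn, customer_id, new_city, change_date: date):
    """Type 2: expire the current row (update), then add a new current row (insert)."""
    with conn:  # commits both statements together, or rolls back on error
        conn.execute(
            "UPDATE dim_customer SET expiry_date = ?, is_current = 0 "
            "WHERE customer_id = ? AND is_current = 1",
            (change_date.isoformat(), customer_id),
        )
        conn.execute(
            "INSERT INTO dim_customer "
            "(customer_id, city, effective_date, expiry_date, is_current) "
            "VALUES (?, ?, ?, NULL, 1)",
            (customer_id, new_city, change_date.isoformat()),
        )

scd_type2_update(conn, "C042", "Islamabad", date(2024, 3, 15))
history = conn.execute(
    "SELECT customer_key, city, effective_date, expiry_date, is_current "
    "FROM dim_customer WHERE customer_id = 'C042' ORDER BY customer_key"
).fetchall()
```

Facts loaded before the change keep pointing at surrogate key 1 (Lahore); new facts use key 2, which is how the history is preserved.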
Type 3: Add A New Dimension Attribute
• Infrequently used, preserves partial history
• Useful for “Soft” changes where users might want to choose between
the old and new attribute, or need to access both values for a time.
• The new value is written to the existing column, the old value is
stored in a new column.
• This way, existing queries do not have to be rewritten: they read the new value from the original column.
Type 3 Example
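A minimal Type 3 sketch, assuming SQLite and a hypothetical `dim_product` table. On the first change a `prior_category` column is added; the old value moves there and the new value is written to the existing `category` column. SQLite evaluates every right-hand side of the SET clause against the row's values before the update, so the two assignments do not interfere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_id  TEXT NOT NULL,
        category    TEXT              -- existing column: always holds the current value
    )
""")
conn.execute("INSERT INTO dim_product (product_id, category) VALUES ('P7', 'Snacks')")

def scd_type3_update(conn, product_id, new_category):
    """Type 3: keep the old value in a 'prior' column, write the new value in place."""
    cols = [row[1] for row in conn.execute("PRAGMA table_info(dim_product)")]
    if "prior_category" not in cols:
        conn.execute("ALTER TABLE dim_product ADD COLUMN prior_category TEXT")
    conn.execute(
        "UPDATE dim_product SET prior_category = category, category = ? "
        "WHERE product_id = ?",
        (new_category, product_id),
    )

scd_type3_update(conn, "P7", "Sides")
row = conn.execute(
    "SELECT product_key, category, prior_category FROM dim_product"
).fetchone()
```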
Role-playing dimensions
Junk dimensions
Fact-less Facts
• Business processes that do not generate quantifiable measurements
• Ex: Student attendance, College admissions
• Can easily be converted into a traditional fact table by adding a Count attribute that is always equal to 1.
• Consider also recording facts for when the event did not happen
• This helps when performing aggregations
• Ex: Attendance % (present or absent) versus class size.
Fact-less Fact Table Example
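A minimal sketch of a fact-less attendance table, assuming SQLite; the table, columns and sample rows are hypothetical. Each row records that a student was expected in a class session, `attendance_count` is always 1, and an `is_present` flag also records the sessions that were missed, so both class size and attendance % can be aggregated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per student per class session; there is no natural measure.
conn.execute("""
    CREATE TABLE fact_attendance (
        date_key         INTEGER NOT NULL,
        student_key      INTEGER NOT NULL,
        class_key        INTEGER NOT NULL,
        is_present       INTEGER NOT NULL,            -- 1 attended, 0 absent
        attendance_count INTEGER NOT NULL DEFAULT 1,  -- always 1
        PRIMARY KEY (date_key, student_key, class_key)
    )
""")
conn.executemany(
    "INSERT INTO fact_attendance (date_key, student_key, class_key, is_present) "
    "VALUES (?, ?, ?, ?)",
    [
        (20240315, 1, 101, 1),
        (20240315, 2, 101, 1),
        (20240315, 3, 101, 0),  # the "event did not happen" row
        (20240315, 4, 101, 1),
    ],
)
class_size, present = conn.execute(
    "SELECT SUM(attendance_count), SUM(attendance_count * is_present) "
    "FROM fact_attendance WHERE class_key = 101 AND date_key = 20240315"
).fetchone()
pct_present = 100.0 * present / class_size
```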
Consolidated fact tables
Fact tables populated from different sources may be consolidated into a single fact table
–The level of granularity must be the same
–Measurements are listed side by side
–Ex. By combining forecast and actual sales amounts, a forecast/actual sales variance amount can easily be calculated and stored
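A minimal sketch of consolidating two sources that share the same grain (product per month), assuming SQLite; all table names and figures are hypothetical. The forecast and actual measures sit side by side, and the variance is calculated once and stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical sources at the same grain: product per month.
conn.execute("CREATE TABLE src_forecast (product_key INTEGER, month_key INTEGER, forecast_amount REAL)")
conn.execute("CREATE TABLE src_actual (product_key INTEGER, month_key INTEGER, actual_amount REAL)")
conn.executemany("INSERT INTO src_forecast VALUES (?, ?, ?)",
                 [(1, 202403, 1000.0), (2, 202403, 500.0)])
conn.executemany("INSERT INTO src_actual VALUES (?, ?, ?)",
                 [(1, 202403, 1100.0), (2, 202403, 450.0)])

# Consolidated fact table: both measures side by side, plus the stored variance.
conn.execute("""
    CREATE TABLE fact_sales_forecast_actual AS
    SELECT f.product_key,
           f.month_key,
           f.forecast_amount,
           a.actual_amount,
           a.actual_amount - f.forecast_amount AS variance_amount
    FROM src_forecast AS f
    JOIN src_actual AS a
      ON a.product_key = f.product_key AND a.month_key = f.month_key
""")
rows = conn.execute(
    "SELECT product_key, forecast_amount, actual_amount, variance_amount "
    "FROM fact_sales_forecast_actual ORDER BY product_key"
).fetchall()
```

The inner join keeps only product-months present in both sources; a real load would likely need an outer join so that unforecast sales or unsold forecasts are not dropped.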
Designing the Dimension model
• Identify the required team roles and participants.
• Review the business requirements document
• Review the source data
• Select and use a data modelling tool
• Establish Naming Conventions
• Obtain facilities and supplies
Four Step Modelling Process
Design Dimension Model
High Level Domain Model
Design the Dimension Model
Step by Step Approach to Dimensional Modeling
›Choose the business process
–We want to track the number of burgers and fries sold by a specific McDonald's outlet per day
–Hence the business process is “Fast Food Sales”
›Declare the grain
–Granularity refers to the lowest (most detailed) level of information stored in a table
–We want to know how many burgers and fries are sold by a specific McDonald's outlet per day
–Hence the grain is “food sold per store per day”
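Declaring the grain fixes what one fact row means. As a sketch, assuming hypothetical point-of-sale lines as input, rolling them up to the declared grain produces exactly one total per (food, store, day):

```python
from collections import defaultdict

# Hypothetical point-of-sale lines: (store, sale_date, food, quantity).
pos_lines = [
    ("Store-12", "2024-03-15", "Burger", 2),
    ("Store-12", "2024-03-15", "Fries", 3),
    ("Store-12", "2024-03-15", "Burger", 1),
    ("Store-12", "2024-03-16", "Burger", 4),
    ("Store-30", "2024-03-15", "Fries", 5),
]

def roll_up_to_grain(lines):
    """Return one total per (food, store, day), the declared grain."""
    totals = defaultdict(int)
    for store, sale_date, food, qty in lines:
        totals[(food, store, sale_date)] += qty
    return dict(totals)

daily_sales = roll_up_to_grain(pos_lines)
```

The two Burger lines for Store-12 on 2024-03-15 collapse into a single row with a quantity of 3; detail below the grain (individual receipts) is no longer available in this table.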
Step by Step Approach to Dimensional Modeling
›Identify the dimensions
–Dimensions are the objects or context: the 'things' about which something is being said
–Here we are speaking about some “food”, a specific McDonald's “store” and a specific “day”
–Hence we have three dimensions: “food” (e.g. burgers and fries), “store” and “day”
–We need to decide which attributes of each dimension to store in our tables
–Take the dimension “food”: its attributes include the name of the food, type, price, total calories, contents, expiry date and so on
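A sketch of the three dimension tables, assuming SQLite. The food attributes follow the list above; the store and day attributes and the sample prices and calorie counts are illustrative assumptions only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Food dimension: attributes taken from the food example.
    CREATE TABLE dim_food (
        food_key    INTEGER PRIMARY KEY,  -- surrogate key
        food_name   TEXT NOT NULL,
        food_type   TEXT,
        price       REAL,
        calories    INTEGER,
        contents    TEXT,
        expiry_date TEXT
    );
    -- Store and day dimensions (attributes are illustrative).
    CREATE TABLE dim_store (
        store_key  INTEGER PRIMARY KEY,
        store_code TEXT NOT NULL,         -- business key
        city       TEXT
    );
    CREATE TABLE dim_day (
        day_key   INTEGER PRIMARY KEY,    -- YYYYMMDD
        full_date TEXT NOT NULL,
        day_name  TEXT
    );
""")
# Illustrative sample rows, not real menu data.
conn.executemany(
    "INSERT INTO dim_food (food_name, food_type, price, calories) VALUES (?, ?, ?, ?)",
    [("Burger", "Main", 3.50, 550), ("Fries", "Side", 1.75, 320)],
)
```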
Step by Step Approach to Dimensional Modeling
›Identify the facts
–Measures are the quantifiable subjects, and they are usually numeric
–The number of burgers/fries sold is a measure
–A separate fact table is created for storing measures
–Since our grain is food sold per store per day, we need to add the key columns from the Food, Store and Day dimensions to the “Sales Fact” table
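A sketch of the resulting star, assuming SQLite and trimmed-down hypothetical dimensions: the sales fact carries one key per dimension plus the measure, and the three keys together form its primary key, matching the grain. A typical query then joins to a dimension to answer a “by food” question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_food  (food_key  INTEGER PRIMARY KEY, food_name  TEXT);
    CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_code TEXT);
    CREATE TABLE dim_day   (day_key   INTEGER PRIMARY KEY, full_date  TEXT);

    -- Sales fact at the grain "food sold per store per day":
    -- one foreign key per dimension, plus the measure.
    CREATE TABLE fact_sales (
        food_key      INTEGER NOT NULL REFERENCES dim_food (food_key),
        store_key     INTEGER NOT NULL REFERENCES dim_store (store_key),
        day_key       INTEGER NOT NULL REFERENCES dim_day (day_key),
        quantity_sold INTEGER NOT NULL,
        PRIMARY KEY (food_key, store_key, day_key)
    );

    INSERT INTO dim_food  VALUES (1, 'Burger'), (2, 'Fries');
    INSERT INTO dim_store VALUES (1, 'Store-12');
    INSERT INTO dim_day   VALUES (20240315, '2024-03-15'), (20240316, '2024-03-16');
    INSERT INTO fact_sales VALUES
        (1, 1, 20240315, 3), (2, 1, 20240315, 3), (1, 1, 20240316, 4);
""")
# Star-join query: total quantity sold "by food".
by_food = conn.execute("""
    SELECT f.food_name, SUM(s.quantity_sold)
    FROM fact_sales AS s
    JOIN dim_food AS f ON f.food_key = s.food_key
    GROUP BY f.food_name
    ORDER BY f.food_name
""").fetchall()
```

SQLite only enforces the `REFERENCES` constraints when `PRAGMA foreign_keys = ON` is set on the connection; many warehouses likewise leave them unenforced and rely on the load process instead.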
Step by Step Approach to Dimensional Modeling
Lecture 08B - Logical-DWH-Model-Pending.pptx
Business Intelligence or Semantic Layer
• A business representation of corporate data that
helps end users access data using common business
terms.
• The aim is to insulate users from the technical details
of the data warehouse and allow them to create
queries in terms that are familiar and meaningful.
• One of the key components of the business
intelligence (BI) architecture.
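BI tools usually implement the semantic layer as their own metadata layer, so the following is only a rough analogy under that assumption: a database view in SQLite that hides surrogate keys and joins behind business-friendly names. The view name, column names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales (store_key INTEGER, day_key INTEGER, quantity_sold INTEGER);
    INSERT INTO dim_store VALUES (1, 'Lahore'), (2, 'Karachi');
    INSERT INTO fact_sales VALUES (1, 20240315, 5), (1, 20240316, 2), (2, 20240315, 4);

    -- Business-facing view: users see "City" and "Units Sold",
    -- not surrogate keys or join conditions.
    CREATE VIEW "Sales by City" AS
    SELECT s.city AS "City", SUM(f.quantity_sold) AS "Units Sold"
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_key = f.store_key
    GROUP BY s.city;
""")
report = conn.execute(
    'SELECT "City", "Units Sold" FROM "Sales by City" ORDER BY "City"'
).fetchall()
```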
Types of BI Applications
› Direct Access query and reporting tools
› Data Mining
› Standard Reports
› Analytic Applications
› Dashboards and Scorecards
› Operational BI
Dashboards