BIUG

Itay Braun
CTO
itay@twingo.co.il
Agenda

• Dimension Design
• SSAS Best Practices
• MDX

• Inspired by Vincent Rainardi (http://sqlbits.com/)
  and Mosha Pasumansky
(Overview slides: text not preserved in extraction. Recoverable terms: BI, DB, DWH, MicroStrategy, SQL Server, Sybase, OEM, Web, Silverlight, .NET.)
Biug 20112026   dimensional modeling and mdx best practices
White Papers
• Analysis Services 2008 R2 Performance Guide
• Analysis Services 2008 Operations Guide
• Performance Improvements for MDX in SQL Server 2008 Analysis Services
• OLAP Design Best Practices
1 or 2 dimensions
a) One Dimension: Fact Table → Dim Account (which holds the customer attributes)
• Simplicity, 1 dim
• Hierarchy from customer attribute & account attribute
• Use when we don't have fact tables requiring customer grain.

b) Two Dimensions: Fact Table → Dim Account, Fact Table → Dim Customer
• We can get the customer attributes without knowing the account key
• Disadvantage: can't go from account to customer without going through the fact table (performance)
1 or 2 dimensions
c) Snowflake: Fact Table → Dim Account → Dim Customer
• Dim Customer is needed by another fact table
• Modular: 2 separate dim tables, but we can easily combine them to create a bigger dimension
• Getting the breakdown of a measure by a customer attribute is a bit more complicated than in a):
 select c.attribute, sum(f.measure1) from fact1 f
 inner join dim_account a on f.account_key = a.account_key
 inner join dim_customer c on a.customer_key = c.customer_key
 group by c.attribute
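To make the snowflake join concrete, here is a minimal sketch using Python's built-in SQLite. The table and key names follow the slide. `segment` is a hypothetical stand-in for the customer attribute, and all rows are invented for illustration.

```python
import sqlite3


def breakdown_by_customer_attribute(conn: sqlite3.Connection) -> list:
    """Sum fact1.measure1 per customer segment via fact1 -> dim_account -> dim_customer."""
    sql = """
        SELECT c.segment, SUM(f.measure1)
        FROM fact1 AS f
        INNER JOIN dim_account AS a ON f.account_key = a.account_key
        INNER JOIN dim_customer AS c ON a.customer_key = c.customer_key
        GROUP BY c.segment
        ORDER BY c.segment
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, segment TEXT);
    -- Snowflake: the account dim points to the customer dim; the fact does not.
    CREATE TABLE dim_account (
        account_key  INTEGER PRIMARY KEY,
        customer_key INTEGER REFERENCES dim_customer (customer_key)
    );
    CREATE TABLE fact1 (
        account_key INTEGER REFERENCES dim_account (account_key),
        measure1    INTEGER
    );
    INSERT INTO dim_customer VALUES (1, 'Retail'), (2, 'Corporate');
    INSERT INTO dim_account VALUES (10, 1), (11, 1), (20, 2);  -- customer 1 owns 2 accounts
    INSERT INTO fact1 VALUES (10, 100), (11, 50), (20, 300), (20, 25);
""")
print(breakdown_by_customer_attribute(conn))  # [('Corporate', 325), ('Retail', 150)]
```

The extra hop through `dim_account` is the "bit more complicated" part. In option b) the fact table would carry `customer_key` directly, and this would become a single join.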
When to Snowflake
1. When the sub dim is used by several dims
City-Country-Region columns exist in DimBroker, DimPolicy, DimOffice and DimInsured.
They are replaced by a Location/Geo key pointing to DimLocation / DimGeography.
 Advantage: consistent hierarchy, i.e. a consistent relationship between City, Country & Region.
 Weakness: we lose flexibility. The City-to-Country mapping is more or less fixed, but the grouping of countries might differ between dimensions.
When to Snowflake
2. When the sub dim is used by both the main dim and the fact table(s)
• DimCustomer is used in DimAccount, and is also used in the fact table.
• DimManufacturer is used in DimProduct, and is also used in the fact table.
• DimProductGroup is used in DimProduct, and is also used in some fact tables.

The alternative is maintaining two full dimensions (classic star).
When to Snowflake
3. To make a “base dim” and “detail dims”
e.g. insurance classes, account types (banking), product lines, diagnosis and treatment (health care)
Policies for marine, aviation & property classes have different
attributes.
Pull common attributes into 1 dim: DimBasePolicy
Put class-specific attributes into DimMarine, DimProperty, DimAviation
Ref: Kimball DW Toolkit 2nd edition page 213
A dimension with only 1 attribute

Should we put the attribute in the fact table (as a DD = Degenerate Dimension)?
Probably, if its grain = the fact table grain, and it's short or it's a number.
Reasons for putting a single attribute in its own dim:
– Keep the fact table slim (a 4-byte int, not a 100-byte varchar)
– When the value changes, we don't have to update the BIG fact table – ETL performance
– The grain is much coarser than the fact table – small dim
– Yes, it's only 1 attribute today, but in the future there could be another attribute.
Fact Table Primary Key
Should we have a PK? (Some experts totally disagree.)

Yes, if we need to be able to identify each fact row:
1. Need to refer to a fact row from another fact row, e.g. a chain of events
2. Many identical fact rows and we need to update/delete only one
3. To link the fact table to another fact table

(Diagram: three cases, Related Trans (PK → FK, not enforced), Header - Detail (PK → FK, no RI) and Uniqueness (PK); label: previous/next transaction.)
Fact Table Primary Key
Single or Multi Column?
  Single Column: Generated Identity
  Multi Column: Dimension Keys
A single-column PK is better than a multi-column PK because:
1) The combination of dimension keys in a multi-column PK may not be unique. A single-column PK guarantees uniqueness, because it is an identity column.
2) A single-column PK is slimmer than a multi-column PK, giving better query performance. To do a self join on the fact table (e.g. to link the current fact row to the previous fact row), we join on a single integer column.
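A minimal sketch of that self join, using SQLite from Python. The table `fact_txn` and its `prev_txn_id` column are hypothetical. They model the "related transactions" case, where a fact row points at the previous one through a foreign key that is not enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical transaction fact table with a single-column identity PK.
    -- In SQLite an INTEGER PRIMARY KEY column auto-assigns increasing ids.
    CREATE TABLE fact_txn (
        txn_id      INTEGER PRIMARY KEY,
        account_key INTEGER NOT NULL,
        -- Points at the previous transaction in the chain. The FK is not
        -- enforced: SQLite checks foreign keys only when PRAGMA foreign_keys = ON.
        prev_txn_id INTEGER REFERENCES fact_txn (txn_id),
        amount      INTEGER NOT NULL
    );
    INSERT INTO fact_txn (account_key, prev_txn_id, amount) VALUES (7, NULL, 100);
    INSERT INTO fact_txn (account_key, prev_txn_id, amount) VALUES (7, 1, 40);
    INSERT INTO fact_txn (account_key, prev_txn_id, amount) VALUES (7, 2, 40);
""")


def with_previous_amount(conn):
    """Self join on the single integer PK to pair each row with its predecessor."""
    return conn.execute("""
        SELECT cur.txn_id, cur.amount, prev.amount
        FROM fact_txn AS cur
        LEFT JOIN fact_txn AS prev ON prev.txn_id = cur.prev_txn_id
        ORDER BY cur.txn_id
    """).fetchall()


print(with_previous_amount(conn))  # [(1, 100, None), (2, 40, 100), (3, 40, 40)]
```

With a multi-column PK, the same join condition would have to repeat every dimension key column.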
Fact Table Primary Key
• Advantage: prevents duplicate rows, query performance
• Disadvantage: loading performance
• Indexing the PK: clustered or not?
    – Cluster the PK if the PK is an identity column
    – Don't cluster the PK if the PK is composite, or when you need the clustered index for query performance (with partitioning)

Example of not having a PK
 When duplicate fact rows are allowed,
 e.g. a retail DW: Store Key, Date Key, Product Key, Customer Key.
 The same customer buys the same milk in the same shop twice on the same day.
Aggregate Fact Tables
What are they?
• High-level aggregations of base fact tables
• A “select group by” query on a 2-billion-row fact table can take 30 mins if it joins with two big fact tables, even with indexes in place
• So we run this query in advance as part of the DW load and store the result as an Aggregate Fact Table
• The report then takes only 1 second to run.
(Diagram: Base Fact Tables → 30 mins → Aggregate Fact Table → 1 sec → Report)
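The load-time/report-time split can be sketched with SQLite from Python. The table names and rows are hypothetical, and the slide's join with two other big fact tables is simplified away to a single base fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base fact table (a tiny stand-in for a 2-billion-row table).
    CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER,
                             store_key INTEGER, amount INTEGER);
    INSERT INTO fact_sales VALUES
        (20110101, 1, 1, 10), (20110101, 1, 2, 20),
        (20110101, 2, 1, 5),  (20110102, 1, 1, 7);
""")


def load_aggregate(conn):
    """Run the expensive GROUP BY once, as a step of the DW load."""
    conn.executescript("""
        DROP TABLE IF EXISTS agg_sales_date_product;
        CREATE TABLE agg_sales_date_product AS
        SELECT date_key, product_key, SUM(amount) AS amount
        FROM fact_sales
        GROUP BY date_key, product_key;
    """)


def report(conn):
    """The report reads the small pre-aggregated table instead of the base fact."""
    return conn.execute("""
        SELECT date_key, product_key, amount
        FROM agg_sales_date_product
        ORDER BY date_key, product_key
    """).fetchall()


load_aggregate(conn)
print(report(conn))  # [(20110101, 1, 30), (20110101, 2, 5), (20110102, 1, 7)]
```

The aggregate drops `store_key`, so it can only answer questions at date/product level or above. Each aggregate fact table is built for a known set of report grains.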
Rapidly Changing Dimension
• Why is it a problem?
   – Large SCD2 dim: attributes change every day
   – Slow queries when joining with large fact tables
• What to do
   – Put the rapidly changing attributes into a separate dim, linked directly to the fact table.
   – Store only the latest values as type 1 attributes (or dual)
   – Store them in the fact table (for small attributes, e.g. an indicator)

(Diagram: Type2 | Type2 | Type2 and Type2 | Type1)
Very Large Dimension
Why is it a problem?
  – SSAS: 4 GB string store limit for a dimension
  – SSAS: dimension processing runs a “select distinct” on each attribute
    – long processing time
  – Difficult to browse high-cardinality attributes
  – Join with fact tables – performance
Very Large Dimension
What to do
– Split into 2 dims with the same grain. Always cut vertically.
– Remove SCD2, or at least apply it only to certain columns.
– Most common: separate out the attributes with high cardinality/change frequency
Real Time Fact Table
•   Reporting the transaction system in real time
•   A view that unions the real-time table with the normal fact table, or use partitions
•   Freeze the dims for key lookup; use -3 as the unknown key
•   Key corrections next day

(Diagram: dims as of yesterday supply the dim keys; main partition (up to last night) and real-time partition (intraday today).)
Unknown keys:
-1 null in source
-2 not in dim table
-3 not in dim table because the dim was frozen; to be resolved in the next batch
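A minimal Python sketch of the key lookup behind the -1/-2/-3 convention. The helper names `lookup_dim_key` and `resolve_next_batch` are hypothetical, and a dimension is modelled as a plain dict from natural key to surrogate key.

```python
from typing import Mapping, Optional

# Unknown-key conventions from the slide.
NULL_IN_SOURCE = -1
NOT_IN_DIM = -2
NOT_IN_FROZEN_DIM = -3  # to be re-resolved in the next batch


def lookup_dim_key(natural_key: Optional[str], dim: Mapping[str, int], realtime: bool) -> int:
    """Map a source natural key to a surrogate dim key (hypothetical helper)."""
    if natural_key is None:
        return NULL_IN_SOURCE
    surrogate = dim.get(natural_key)
    if surrogate is not None:
        return surrogate
    # Intraday, the dim is frozen as of yesterday, so a miss may just be a
    # member created today: flag it for the next batch instead of -2.
    return NOT_IN_FROZEN_DIM if realtime else NOT_IN_DIM


def resolve_next_batch(fact_rows, dim: Mapping[str, int]):
    """Next-day key correction: retry rows tagged -3 against the refreshed dim."""
    fixed = []
    for natural_key, dim_key, amount in fact_rows:
        if dim_key == NOT_IN_FROZEN_DIM:
            dim_key = dim.get(natural_key, NOT_IN_DIM)
        fixed.append((natural_key, dim_key, amount))
    return fixed


frozen = {"C001": 1, "C002": 2}                 # dim as of yesterday
source = [("C001", 10), (None, 5), ("C003", 7)]  # intraday source rows
rows = [(k, lookup_dim_key(k, frozen, realtime=True), amt) for k, amt in source]
print(rows)  # [('C001', 1, 10), (None, -1, 5), ('C003', -3, 7)]

refreshed = {"C001": 1, "C002": 2, "C003": 3}   # dim after the nightly load
print(resolve_next_batch(rows, refreshed))  # [('C001', 1, 10), (None, -1, 5), ('C003', 3, 7)]
```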
Dealing with Currency Rates
What for/background/requirements
– Report in 3 reporting currencies, using today's rates or past rates
– Analyse over time without the impact of currency rate changes (using fixed currency rates, e.g. 2010 EOY rates)
– What the amounts would be had the transactions happened today
– Currency rate historical analysis

(Diagram: Transaction Currency (100 countries, 40 currencies) → Transaction Rates (many transaction dates) → DW Currency (1 currency, e.g. GBP) → Reporting Rates (1 reporting date) → Reporting Currency (3-4 currencies: GBP, USD, EUR, Original))
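The two-step conversion in the diagram can be sketched in Python with `Decimal` to avoid float rounding. GBP as the DW currency comes from the slide. Every rate below is invented for illustration and is not a real market rate. The function names are hypothetical.

```python
from datetime import date
from decimal import Decimal

# Hypothetical transaction rates: GBP per 1 unit of the transaction currency, by date.
TXN_RATES = {
    ("USD", date(2011, 3, 1)): Decimal("0.62"),
    ("EUR", date(2011, 3, 1)): Decimal("0.85"),
    ("GBP", date(2011, 3, 1)): Decimal("1"),
}
# Hypothetical reporting rates: reporting-currency units per 1 GBP for one reporting date.
REPORTING_RATES = {"GBP": Decimal("1"), "USD": Decimal("1.60"), "EUR": Decimal("1.15")}
# Hypothetical fixed rate set (e.g. 2010 EOY) for trend analysis without rate movements.
FIXED_2010_EOY_RATES = {"GBP": Decimal("1"), "USD": Decimal("1.56"), "EUR": Decimal("1.17")}


def to_dw_currency(amount: Decimal, currency: str, txn_date: date) -> Decimal:
    """Step 1: transaction currency -> DW currency (GBP) at the transaction-date rate."""
    return amount * TXN_RATES[(currency, txn_date)]


def to_reporting(amount_gbp: Decimal, reporting_currency: str, rates=REPORTING_RATES) -> Decimal:
    """Step 2: DW currency -> reporting currency using a single reporting-date rate set."""
    return amount_gbp * rates[reporting_currency]


gbp = to_dw_currency(Decimal("100"), "USD", date(2011, 3, 1))
print(gbp)                                          # 62.00
print(to_reporting(gbp, "EUR"))                     # 71.3000
print(to_reporting(gbp, "USD", FIXED_2010_EOY_RATES))  # 96.7200
```

Keeping the DW amount in one currency means 40 transaction currencies only need rates into GBP. The 3-4 reporting currencies then need one rate each per reporting date.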
Dealing with Currency Rates
• A good example can be found here.
Dealing with Status
What/background
  – Workflow (policies, contracts, documents)
  – Bottleneck analysis (number of days between stages)
  – How many items are at each stage

(Diagram: workflow through Status 1 to Status 6; Status 1, 2, 4 and 6 carry dates date1 to date4.)
Dealing with Status
Approaches
– Accumulating Snapshot Fact, 1 row per application
– SCD2 on DimApp
– App Status fact table

SCD2 on DimApp:
 AppKey  AppID  StsKey  StsDate  Current
 1       1      1       1/3/11   N
 2       1      2       3/3/11   N
 3       1      3       7/3/11   Y
 4       2      1       6/3/11   N
 5       2      2       7/3/11   Y

App Status fact table:
 AppKey  StsKey  StsDateKey
 1       1       61
 1       2       63
 1       3       67
 2       1       66
 2       2       67

Accumulating Snapshot Fact:
 AppKey  Sts1Date  Sts1Ind  Sts2Date  Sts2Ind  Sts3Date  Sts3Ind
 1       1/3/11    1        3/3/11    1        7/3/11    1
 2       6/3/11    1        7/3/11    1                  0
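The accumulating snapshot can be derived from the App Status rows. A minimal Python sketch follows, using the dates from the SCD2 table above. Two assumptions: the slide's dates are d/m/yy (1/3/11 = 1 March 2011), and for the stage counts, stages are reached in order. The function names are hypothetical.

```python
from datetime import date

# App Status fact rows from the slide: (AppKey, StsKey, status date).
STATUS_ROWS = [
    (1, 1, date(2011, 3, 1)),
    (1, 2, date(2011, 3, 3)),
    (1, 3, date(2011, 3, 7)),
    (2, 1, date(2011, 3, 6)),
    (2, 2, date(2011, 3, 7)),
]
STATUSES = (1, 2, 3)


def accumulating_snapshot(rows, statuses=STATUSES):
    """Pivot one row per status event into one row per application,
    with StsNDate and StsNInd (0/1) columns for each status N."""
    snapshot = {}
    for app_key, sts_key, sts_date in rows:
        row = snapshot.setdefault(app_key, {f"Sts{s}Date": None for s in statuses})
        row[f"Sts{sts_key}Date"] = sts_date
    for row in snapshot.values():
        for s in statuses:
            row[f"Sts{s}Ind"] = 0 if row[f"Sts{s}Date"] is None else 1
    return snapshot


def days_between(snapshot, app_key, from_sts, to_sts):
    """Bottleneck analysis: days between two stages, or None if either is not reached."""
    row = snapshot[app_key]
    start, end = row[f"Sts{from_sts}Date"], row[f"Sts{to_sts}Date"]
    if start is None or end is None:
        return None
    return (end - start).days


def count_by_current_stage(snapshot, statuses=STATUSES):
    """How many applications are at each stage (assumes stages are reached in order)."""
    counts = {}
    for row in snapshot.values():
        current = max(s for s in statuses if row[f"Sts{s}Ind"] == 1)
        counts[current] = counts.get(current, 0) + 1
    return counts


snap = accumulating_snapshot(STATUS_ROWS)
print(days_between(snap, 1, 2, 3))   # 4
print(count_by_current_stage(snap))  # {3: 1, 2: 1}
```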
Referenced Dimensions
• Enables using one “master” member
• Not a snowflake dimension
  – For example:
    • Dim customers: UK, London, Roman Avramovich.
    • Dim Stores: UK, London, Friendly Bikes Store
  – What is the total revenue from Internet
    customers and stores in London?
MDX optimization Methodology
•   Re-write the MDX code
•   Add Aggregations
•   Add pre-calculated Measure Groups (ETL)
•   Solve the problem using the Relational Engine
•   Use .NET Stored Procedures.
     – Only rarely can the problem be solved with better
       hardware.
• Column-based Databases
• Optimizing MDX
  – Baselining Query Speeds
    • Clearing the Analysis Services Caches
    • Clearing the Operating System Caches using
      fsutil.exe or SSAS Stored Proc (codeplex)
    • Identifying and Resolving MDX Query
      Performance Bottlenecks in SQL Server 2005
      Analysis Services
    • Configuring the Analysis Services Query Log
• Cell-by-Cell Mode vs. Subspace Mode
Almost always, the performance obtained by using subspace (or block computation) mode is superior to that obtained by using cell-by-cell (or naïve) mode.
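A toy Python analogy of why subspace mode wins on sparse data. This is not how the SSAS formula engine is implemented. It only counts evaluations: cell-by-cell visits every cell of the City × Year space, while the subspace version visits only non-empty cells. The shortcut is valid here only because the example calculation returns empty for empty input. The data and function names are hypothetical.

```python
from itertools import product

# Hypothetical sparse cube: only two (city, year) cells hold data.
FACTS = {("Seattle", 2006): 10, ("London", 2008): 5}
CITIES = ["Redmond", "Seattle", "New York", "London"]
YEARS = [2005, 2006, 2007, 2008]


def doubled(value):
    """Example calculation that is empty (None) whenever its input is empty."""
    return None if value is None else value * 2


def cell_by_cell(expr):
    """Naive mode: evaluate expr once for every cell of the City x Year space."""
    result, evaluations = {}, 0
    for cell in product(CITIES, YEARS):
        evaluations += 1
        value = expr(FACTS.get(cell))
        if value is not None:
            result[cell] = value
    return result, evaluations


def subspace(expr):
    """Block-mode analogy: visit only the non-empty cells."""
    result, evaluations = {}, 0
    for cell, fact in FACTS.items():
        evaluations += 1
        value = expr(fact)
        if value is not None:
            result[cell] = value
    return result, evaluations


print(cell_by_cell(doubled)[1], subspace(doubled)[1])  # 16 2
```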
Using Profiler
• So far so good
Doesn't use the cache
Subcube
• Granularity
• Slice
Granularity
• Single grain
  – List of GROUP BY attributes in SQL SELECT
• Mixed grain
  – Both Attribute.[All] and Attribute.MEMBERS
Granularity
(Diagram: granularity grid with columns (All Country, All City), (Countries, All City), (Countries, Cities) and rows All Products, Products.)
Slice
• Single member
  – SQL: Where City = 'Redmond'
  – MDX: [City].[Redmond]
• Multiple members
  – SQL: Where City IN ('Redmond', 'Seattle')
  – MDX: { [City].[Redmond], [City].[Seattle] }
Slice at granularity
SQL
SELECT Sum(Sales), City FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
GROUP BY City
MDX
SELECT Measures.Sales ON 0
, NON EMPTY {Redmond, Seattle} ON 1
FROM Sales_Cube
Slice below granularity

SQL
SELECT Sum(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
MDX
SELECT Measures.Sales ON 0
FROM Sales_Cube
WHERE {Redmond, Seattle}
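Running both SQL queries on a few invented rows shows the difference, using SQLite from Python. Slicing at granularity keeps City in the GROUP BY and returns one row per city. Slicing below granularity filters on City but returns one aggregated row. An ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales_Table (City TEXT, Sales INTEGER);
    INSERT INTO Sales_Table VALUES
        ('Redmond', 10), ('Redmond', 15), ('Seattle', 20), ('London', 99);
""")

# Slice at granularity: City is both filtered and grouped.
at_granularity = conn.execute("""
    SELECT City, SUM(Sales) FROM Sales_Table
    WHERE City IN ('Redmond', 'Seattle')
    GROUP BY City
    ORDER BY City
""").fetchall()

# Slice below granularity: City is filtered but not grouped.
below_granularity = conn.execute("""
    SELECT SUM(Sales) FROM Sales_Table
    WHERE City IN ('Redmond', 'Seattle')
""").fetchall()

print(at_granularity)     # [('Redmond', 25), ('Seattle', 20)]
print(below_granularity)  # [(45,)]
```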
Examples
(City × Year grid: All Cities, Redmond, Seattle, New York, London × All Years, 2005-2008.)
Examples
(City × Year grid: All Cities, Redmond, Seattle, New York, London × All Years, 2005-2008.)
Subcube: (Seattle, Year.Year.MEMBERS)
Examples
(City × Year grid: All Cities, Redmond, Seattle, New York, London × All Years, 2005-2008.)
Subcube: (Seattle, Year.MEMBERS)
Examples
(City × Year grid: All Cities, Redmond, Seattle, New York, London × All Years, 2005-2008.)
Subcube: ({Redmond, Seattle, London}, Year.MEMBERS)
Examples
(City × Year grid: All Cities, Redmond, Seattle, New York, London × All Years, 2005-2008.)
Subcube: ({Redmond, Seattle}, {2005, 2006, 2007})
Arbitrary shaped subcubes
•   What is it ?
•   How can it happen ?
•   Why is it so bad ?
•   How to avoid them ?
Arbitrary shaped subcubes
(City × Year grid: All Cities, Redmond, Seattle, New York, London × All Years, 2005-2008.)
Union((Redmond, Year.Year.MEMBERS), (City.City.MEMBERS, 2005))
Arbitrary shaped subcubes
(City × Year grid: All Cities, Redmond, Seattle, SF, Denver × All Years, 2005-2008.)
CrossJoin(City.City.MEMBERS, Year.Year.MEMBERS) – (Seattle, 2007)
Arbitrary shaped subcubes
(City × Year grid: All Cities, Redmond, Seattle, New York, London × All Years, 2005-2008.)
{(Redmond, 2005), (Seattle, 2006), (New York, 2007), (London, 2008)}
Arbitrary shaped subcubes
(City × Year grid: All Cities, Redmond, Seattle, New York, London × All Years, 2005-2008.)
Union(([All Cities], Year.MEMBERS), (City.MEMBERS, [All Years]))
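One way to state the difference between a plain subcube and an arbitrary shape is this: a set of (city, year) tuples is "rectangular" exactly when it equals the crossjoin of its own city and year projections. A minimal Python check follows (the function name is hypothetical). It ignores the All level and the mixed-grain aspect, and it applies the check to the shapes from the slides above.

```python
from itertools import product


def is_crossjoin(tuples):
    """True if the set of (city, year) tuples equals the crossjoin of its own
    city and year projections, i.e. a 'rectangular' subcube."""
    tuples = set(tuples)
    cities = {city for city, _ in tuples}
    years = {year for _, year in tuples}
    return tuples == set(product(cities, years))


rectangle = list(product(["Redmond", "Seattle"], [2005, 2006, 2007]))
diagonal = [("Redmond", 2005), ("Seattle", 2006), ("New York", 2007), ("London", 2008)]
with_hole = set(product(["Redmond", "Seattle", "SF", "Denver"], range(2005, 2009))) - {("Seattle", 2007)}
l_shape = (set(product(["Redmond"], range(2005, 2009)))
           | set(product(["Redmond", "Seattle", "New York", "London"], [2005])))

print([is_crossjoin(s) for s in (rectangle, diagonal, with_hole, l_shape)])
# [True, False, False, False]
```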
Arbitrary shapes
• WHERE/Subselect/Aggregate
• Unnatural hierarchies
• Parent-Child (visual totals)
• “Non Leaves” subcube
• Conditional logic (IIF, IF, CASE, CoalesceEmpty, etc.)
• NonEmpty, Exists
WHERE/Subselect
• Severity = '1' OR Priority = '1'
• Multi-select
  – {USA, London}
Mixed grain slicer
(Hierarchy: All → USA (Seattle, New York); All → UK (London, Bristol))
Mixed grain slicer
(Hierarchy: All → USA (Seattle, New York); All → UK (London, Bristol))
(Grid: rows All Countries, USA, UK; columns All Cities, Seattle, New York, London, Bristol.)
Parent-child
Leaves vs. Non Leaves
(Diagram: granularity grid with columns (All Country, All City), (Countries, All City), (Countries, Cities) and rows All Products, Products; one region is labelled "Leaves".)
Problems with arbitrary shapes
•    Caching
•    Partition slices
•    Indexes
•    SCOPEs
•    Matching calculations
•    Many more
(for every topic we discuss – just ask “What will happen with arbitrary shapes”, and I am in trouble)
SCOPE
SCOPE ( [Date].[Month of Year].[All Periods],
         [Date].[Month Name].[All],
         Except( [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
                 { ( [DateTool].[Aggregation].DefaultMember, [DateTool].[Comparison].DefaultMember ) } )
);
    ...;
END SCOPE;
Subcube decomposition
SCOPE ( [Date].[Month of Year].[All Periods],
         [Date].[Month Name].[All],
         Except( [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
                 { ( [DateTool].[Aggregation].DefaultMember, [DateTool].[Comparison].DefaultMember ) } )
);
    ...;
END SCOPE;
(Diagram: the SCOPE subcube decomposed into Scope 1, Scope 2 and Scope 3.)
Subcube decomposition
SCOPE ( [Date].[Month of Year].[All Periods],
         [Date].[Month Name].[All],
         Except( [DateTool].[Aggregation].Members, { [DateTool].[Aggregation].DefaultMember } ),
         Except( [DateTool].[Comparison].Members, { [DateTool].[Comparison].DefaultMember } )
);
    ...;
END SCOPE;
SCOPE ( [Date].[Month of Year].[All Periods],
         [Date].[Month Name].[All],
         [DateTool].[Aggregation].DefaultMember,
         Except( [DateTool].[Comparison].Members, { [DateTool].[Comparison].DefaultMember } )
);
    ...;
END SCOPE;
SCOPE ( [Date].[Month of Year].[All Periods],
         [Date].[Month Name].[All],
         Except( [DateTool].[Aggregation].Members, { [DateTool].[Aggregation].DefaultMember } ),
         [DateTool].[Comparison].DefaultMember
);
    ...;
END SCOPE;
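The decomposition relies on a set identity: removing the single (default, default) tuple from A × C leaves three disjoint crossjoins. A minimal Python sketch verifies this. The DateTool member names are hypothetical, and the three pieces are listed in the same order as the three SCOPE statements.

```python
from itertools import product


def decompose_except_default(a_members, a_default, c_members, c_default):
    """Split Except(A * C, {(a_default, c_default)}) into three crossjoins."""
    a_rest = [m for m in a_members if m != a_default]
    c_rest = [m for m in c_members if m != c_default]
    return [
        (a_rest, c_rest),       # non-default Aggregation x non-default Comparison
        ([a_default], c_rest),  # default Aggregation x non-default Comparison
        (a_rest, [c_default]),  # non-default Aggregation x default Comparison
    ]


# Hypothetical DateTool members; "Current" is assumed to be the default member.
aggregation = ["Current", "YTD", "QTD"]
comparison = ["Current", "Prior Year", "Diff"]

pieces = decompose_except_default(aggregation, "Current", comparison, "Current")
original = set(product(aggregation, comparison)) - {("Current", "Current")}
union = set().union(*(product(a, c) for a, c in pieces))
print(union == original, sum(len(a) * len(c) for a, c in pieces))  # True 8
```

The piece sizes add up to the size of the union (8 = 9 - 1), so the three crossjoins do not overlap. Each is a plain subcube rather than an arbitrary shape.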
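MDX Optimization - Tips
• Partial expressions are not cached. Here <expensive expression> may be evaluated twice, and its value is not cached:
This = iif(<expensive expression> >= 0, 1/<expensive expression>, null);

Instead, move the expression into a hidden calculated member, whose value can be cached and reused:
create member currentcube.measures.MyPartialExpression as <expensive expression>, visible = 0;
this = iif(measures.MyPartialExpression >= 0, 1/measures.MyPartialExpression, null);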
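A Python analogy of the tip, counting how often the expensive part runs. It does not model the SSAS formula-engine cache. A local variable simply plays the role of the hidden `MyPartialExpression` member. The stand-in expression `2 * cell + 1` is hypothetical and was chosen because it is never zero for integer cells.

```python
class CallCounter:
    """Counts how many times the stand-in expensive expression runs."""

    def __init__(self):
        self.calls = 0


counter = CallCounter()


def expensive_expression(cell: int) -> int:
    """Hypothetical stand-in for <expensive expression>."""
    counter.calls += 1
    return 2 * cell + 1


def inline_version(cell: int):
    # Mirrors iif(<expr> >= 0, 1/<expr>, null): the expression appears twice.
    return 1 / expensive_expression(cell) if expensive_expression(cell) >= 0 else None


def named_member_version(cell: int):
    # Mirrors the hidden MyPartialExpression member: evaluate once, reuse the value.
    my_partial_expression = expensive_expression(cell)
    return 1 / my_partial_expression if my_partial_expression >= 0 else None


counter.calls = 0
print(inline_version(2), counter.calls)        # 0.2 2
counter.calls = 0
print(named_member_version(2), counter.calls)  # 0.2 1
```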
Demo
SSAS Denali
• Coming in the first half of 2012
• SSAS Tabular Mode
  – Cheaper
  – Not best of breed
  – Uses DAX or MDX
• Have you started working with it?
Mobile BI
(Slide text not preserved in extraction. Recoverable terms: BI, Smart Phone, Mobile BI. Source: Gartner.)
Social BI


• Discover New Insights - Analyze the
  demographic and psychographic profiles
  of your Facebook application users.
• Analyze Facebook Data - Analyze the full
  spectrum of Facebook data: profiles,
  interests, check-ins, and more
• Instantly Available via Cloud
Social BI
• Deep Personalization



• Enterprise Data Integration
Survey
• SQL / SSAS Denali
• Mobile BI
• Social BI
Biug 20112026   dimensional modeling and mdx best practices

More Related Content

PDF
Dimensional modelingowb11gr2 presentation
PDF
Dimensional modelingowb11gr2 paper
PDF
Sql Performance Tuning For Developers
PPT
DB2 V10 Migration Guidance
PDF
Task Factory - Pragmatic Works
PDF
Partitioning CCGrid 2012
PDF
DB2 10 Migration Planning & Customer experiences - Chris Crone (IDUG India)
PDF
Mas 90-and-mas-200-crystal-reports-manual
Dimensional modelingowb11gr2 presentation
Dimensional modelingowb11gr2 paper
Sql Performance Tuning For Developers
DB2 V10 Migration Guidance
Task Factory - Pragmatic Works
Partitioning CCGrid 2012
DB2 10 Migration Planning & Customer experiences - Chris Crone (IDUG India)
Mas 90-and-mas-200-crystal-reports-manual

Viewers also liked (7)

PDF
Microstrategy Overview (Hebrew)
PPTX
Journées SQL Server 2012 - DAX pour les fans de MDX
PPTX
Le reporting BI dans tous ses états / quel outil pour quel usage
PPTX
Extreme SSAS - Part II
PPTX
Extreme SSAS- SQL 2011
PPTX
JSS2014 – Azure ML et Data Mining SSAS
PPTX
[JSS2015] Nouveautés SSIS SSRS 2016
Microstrategy Overview (Hebrew)
Journées SQL Server 2012 - DAX pour les fans de MDX
Le reporting BI dans tous ses états / quel outil pour quel usage
Extreme SSAS - Part II
Extreme SSAS- SQL 2011
JSS2014 – Azure ML et Data Mining SSAS
[JSS2015] Nouveautés SSIS SSRS 2016
Ad

Similar to Biug 20112026 dimensional modeling and mdx best practices (20)

PDF
Solutions for Sage Customers from Robert Lavery
PPTX
Advanced Dimensional Modelling
PPTX
Advanced dimensional modelling
PPTX
6910 week 3 - web metircs and tools
PPTX
Big Data presentation at GITPRO 2013
PDF
AWS User Group October
PPTX
NOSQL introduction for big data analytics
PPTX
Temporal Snapshot Fact Tables
PDF
Microsoft SQL Server - How to Collaboratively Manage Excel Data
PDF
The final frontier
PPTX
Building a highly scalable and available cloud application
PPTX
Database Virtualization: The Next Wave of Big Data
PPTX
Performance Management in ‘Big Data’ Applications
PPTX
Svccg nosql 2011_v4
PDF
Everything You Need to Know About Oracle 12c Indexes
PPTX
Power View: Analysis and Visualization for Your Application’s Data
PPTX
SQL for Data Science - for everyone.pptx
PDF
Oracle 12.2 - My Favorite Top 5 New or Improved Features
ODP
BigQuery at AppsFlyer - past, present and future
PPT
Microsoft Dynamics NAV data integration
Solutions for Sage Customers from Robert Lavery
Advanced Dimensional Modelling
Advanced dimensional modelling
6910 week 3 - web metircs and tools
Big Data presentation at GITPRO 2013
AWS User Group October
NOSQL introduction for big data analytics
Temporal Snapshot Fact Tables
Microsoft SQL Server - How to Collaboratively Manage Excel Data
The final frontier
Building a highly scalable and available cloud application
Database Virtualization: The Next Wave of Big Data
Performance Management in ‘Big Data’ Applications
Svccg nosql 2011_v4
Everything You Need to Know About Oracle 12c Indexes
Power View: Analysis and Visualization for Your Application’s Data
SQL for Data Science - for everyone.pptx
Oracle 12.2 - My Favorite Top 5 New or Improved Features
BigQuery at AppsFlyer - past, present and future
Microsoft Dynamics NAV data integration
Ad

Recently uploaded (20)

PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PPTX
sap open course for s4hana steps from ECC to s4
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Machine learning based COVID-19 study performance prediction
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PPT
Teaching material agriculture food technology
PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
Reach Out and Touch Someone: Haptics and Empathic Computing
sap open course for s4hana steps from ECC to s4
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Dropbox Q2 2025 Financial Results & Investor Presentation
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Mobile App Security Testing_ A Comprehensive Guide.pdf
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Encapsulation_ Review paper, used for researhc scholars
Machine learning based COVID-19 study performance prediction
Per capita expenditure prediction using model stacking based on satellite ima...
NewMind AI Weekly Chronicles - August'25 Week I
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
MIND Revenue Release Quarter 2 2025 Press Release
Teaching material agriculture food technology
Unlocking AI with Model Context Protocol (MCP)
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
The Rise and Fall of 3GPP – Time for a Sabbatical?

Biug 20112026 dimensional modeling and mdx best practices

  • 2. Agenda l Dimension Design l SSAS Best Practices l MDX l Inspired by Vincent Rainardi (http://sqlbits.com/ ) and Mosha Pumanski
  • 3. l l l BI- DB l l l DB- BI- l
  • 4. Microstrategy- BI • DWH • • BI SQL SERVER- SYBASE- • • DB P T • DB OEM • • WEB SILVERLIGHT C • • .NET
  • 6. White Papers • Analysis Services 2008 R2 Performance Guide • Analysis Services 2008 Operation Guide • Performance Improvements for MDX in SQL Server 2008 Analysis Services • OLAP Design Best Practices
  • 7. 1 or 2 dimensions a) One Dimension b) Two Dimensions Dim Dim Account Account Fact Fact Table Table customer Dim attributes Customer • We can get the customer • Simplicity, 1 dim attributes without knowing the • Hierarchy from customer account key attribute &account attribute • Disadvantage: can‟t go from • Use when we don‟t have fact account to customer without tables requiring customer grain. going through the fact table - performance
  • 8. 1 or 2 dimensions c) Snowflake Dim Dim Account Customer Fact Table • Dim customer is needed by another fact table • Modular: 2 separate dim tables but we can combine them easily to create a bigger dimension • To get the breakdown of a measure by a customer attribute is a bit more complicated than a) select c. attribute, sum(f.measure1) from fact1 f inner join dim_account a on f.account_key = a.account_key inner join dim_customer c on a.customer_key = c.customer_key group by c. attribute
  • 9. When to Snowflake 1. When the sub dim is used by several dims City-Country-Region columns exist in DimBroker, DimPolicy, DimOffice and DimInsured Replaced by Location/GeoKey pointing to DimLocation / DimGeography Advantage: consistent hierarchy, i.e. relationship between City, Country & Region. Weakness: we would lose flexibility. City to Country are more or less fixed, but the grouping of countries might be different between dimensions.
  • 10. When to Snowflake 2. When the sub dim is used by both the main dim and the fact table(s) • DimCustomer is used in DimAccount, and is also used in the fact table. • DimManufacturer is used in DimProduct, and is also used in the fact table. • DimProductGroup is used in DimProduct, and is also used in some fact table. The alternative is maintaining two full dimensions (star classic).
  • 11. When to Snowflake 3. To make “base dim” and “detail dim” Insurance classes, account types (banking), product lines, diagnosis, treatment (health care) Policies for marine, aviation & property classes have different attributes. Pull common attributes into 1 dim: DimBasePolicy Put class-specific attributes into DimMarine, DimProperty, DimAviation Ref: Kimball DW Toolkit 2nd edition page 213
  • 12. A dimension with only 1 attribute Should we put the attribute in the fact table? (like DD = Degenerate Dim) Probably, if the grain = fact table, and it‟s short or it‟s a number. Reasons for putting single attribute in its own dim: – Keep fact table slim (4 bytes int not 100 bytes varchar) – When the value changes, we don‟t have to update the BIG fact table – ETL performance – Grain is much lower than fact table – small dim – Yes it‟s only 1 attribute today, but in the future there could be another attribute.
  • 13. Fact Table Primary Key Should we have a PK? Some experts totally disagree Yes, if we need to be able to identify each fact row 1. Need to refer to a fact row from another fact row e.g. chain of events 2. Many identical fact rows and we need to update/delete only one 3. To link the fact table to another fact table Related Trans Header - Detail Uniqueness PK FK PK FK (no RI) PK (not enforced) previous/next transaction
  • 14. Fact Table Primary Key Single or Multi Column? Single Column: Generated Identity Multi Column: Dimension Keys Single-column PK is better than multi-column PK because : 1) A multi-column PK may not be unique. A single-column PK guarantees that the PK is unique, because it is an identity column. 2) A single-column PK is slimmer than a multi-column PK, better query performance. To do a self join in the fact table (e.g. to link the current fact row to the previous fact row), we join on a single integer column.
  • 15. Fact Table Primary Key • Advantage: Prevent duplicate rows, query performance • Disadvantage: loading performance • Indexing the PK: cluster or not? – Cluster the PK if: the PK is an identity column – Don‟t cluster the PK if: the PK is a composite, or when you need the cluster index for query performance (with partitioning) Example of not having a PK If duplicate fact rows are allowed. e.g. retail DW: Store Key, Date Key, Product Key, Customer Key Same customer buying the same milk in the same shop on the same day twice
  • 16. Aggregate Fact Tables What are they? Base Fact Tables • High level aggregation of base fact tables • A “select group by” query on a 2 billion rows fact table can take 30 mins if it joins with two big fact tables, even with indexes in place • So we do this query in advance as part of the DW load and store it as an Aggregate Fact Table 30 mins • The report only takes 1 second to run. Aggregate 1 sec Fact Table Report
  • 17. Rapidly Changing Dimension • Why is it a problem – Large SCD2 dim – Attributes change every day – Slow query when join with large fact tables • What to do – Put into a separate dim, link direct to fact table. – Just store the latest, type 1 attributes (or dual) – Store in the fact table (for small attribute, e.g. indicator) Type2 Type2 Type2 Type2 Type1
  • 18. Very Large Dimension Why is it a problem – SSAS: 4 GB string store limit for dimension – SSAS: dim is “select distinct” on each attribute – long processing time – Difficult to browse high cardinality attribute – Join with fact tables – performance
  • 19. Very Large Dimension What to do – Split into 2 dims, same grain. Always cut vertically. – Remove SCD2, or at least only certain columns. – Most common: separate the attributes with high cardinality/change frequency VLD
  • 20. Real Time Fact Table • Reporting the transaction system in real time • View to union with the normal fact table, or use partitions • Freezing the dims for key lookup, -3 unknown key • Key corrections next day Dims as of Main partition yesterday (up to last night) Unknown keys: -1 null in source -2 not in dim table Real time partition -3 not in dim table as dim was frozen dim (intraday today) to be resolved next batch key
  • 21. Dealing with Currency Rates What for/background/requirements – Report in 3 reporting currencies, using today rates or past – Analyse over time without the impact of currency rates (using fixed currency rates, e.g. 2010 EOY rates) – Had the transactions happened today – Currency rates historical analysis Transaction DW Reporting Currency Transaction Currency Reporting Currency Rates Rates 100 countries (many transaction 1 currency ( 1 reporting 3-4 currencies 40 currencies dates) e.g. GBP GBP, USD, EUR, date) Original
  • 22. Dealing with Currency Rates • A good example can be found here.
  • 23. Dealing with Status What/background – Workflow (policies, contracts, documents) – Bottleneck analysis (no of days between stages) – How many on each stage Status Status Status Status 1 2 4 6 date1 date2 date3 date4 Status Status 3 5
  • 24. Dealing with Status
    • Approaches
      – Accumulating snapshot fact, 1 row per application
      – SCD2 on DimApp
      – App status fact table

    SCD2 on DimApp:
      AppKey  AppID  StsKey  StsDate  Current
      1       1      1       1/3/11   N
      2       1      2       3/3/11   N
      3       1      3       7/3/11   Y
      4       2      1       6/3/11   N
      5       2      2       7/3/11   Y

    App status fact table:
      AppKey  StsKey  StsDateKey
      1       1       61
      1       2       63
      1       3       67
      2       1       66
      2       2       67

    Accumulating snapshot fact:
      AppKey  Sts1Date  Sts1Ind  Sts2Date  Sts2Ind  Sts3Date  Sts3Ind
      1       1/3/11    1        3/3/11    1        7/3/11    1
      2       6/3/11    1        7/3/11    1                  0
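The accumulating snapshot can be derived from the status fact table by pivoting: one row per application, with a date and a 0/1 indicator column per status. A minimal Python sketch using the rows from the slide (dates read as day/month/year, an assumption); the function name is hypothetical.

```python
from datetime import date
from typing import Dict, List, Sequence, Tuple

# App status fact rows from the slide: (AppKey, StsKey, StsDate).
status_facts: List[Tuple[int, int, date]] = [
    (1, 1, date(2011, 3, 1)),
    (1, 2, date(2011, 3, 3)),
    (1, 3, date(2011, 3, 7)),
    (2, 1, date(2011, 3, 6)),
    (2, 2, date(2011, 3, 7)),
]

def accumulating_snapshot(rows: List[Tuple[int, int, date]],
                          statuses: Sequence[int] = (1, 2, 3)) -> Dict[int, dict]:
    """Pivot status events into one row per application."""
    snapshot: Dict[int, dict] = {}
    for app, sts, when in rows:
        if app not in snapshot:
            # Start every status as "not reached": no date, indicator 0.
            snapshot[app] = {}
            for s in statuses:
                snapshot[app][f"Sts{s}Date"] = None
                snapshot[app][f"Sts{s}Ind"] = 0
        snapshot[app][f"Sts{sts}Date"] = when
        snapshot[app][f"Sts{sts}Ind"] = 1
    return snapshot

snap = accumulating_snapshot(status_facts)
print(snap[2]["Sts3Ind"])  # 0: application 2 has not reached status 3
```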
  • 25. Referenced Dimensions
    • Enable using one "master" member
    • Not a snowflake dimension
      – For example:
        • Dim Customers: UK, London, Roman Avramovich
        • Dim Stores: UK, London, Friendly Bikes Store
      – What is the total revenue from Internet customers and stores in London?
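A minimal Python sketch of what the referenced relationship buys, with invented data: a single hypothetical Geography dimension is reached through the customer dimension for Internet sales and through the store dimension for store sales, so one London member slices both. In SSAS this is configured as a referenced relationship in the cube's Dimension Usage, not written as code.

```python
from typing import Dict, List, Tuple

# Hypothetical shared Geography dimension holding the "master" members.
geography: Dict[int, Tuple[str, str]] = {1: ("UK", "London")}
# Intermediate dimensions reference Geography by key.
customers: Dict[int, dict] = {100: {"name": "Roman Avramovich", "geo_key": 1}}
stores: Dict[int, dict] = {200: {"name": "Friendly Bikes Store", "geo_key": 1}}
# Each fact table joins only to its own intermediate dimension.
internet_sales: List[Tuple[int, float]] = [(100, 500.0)]            # (customer_key, revenue)
store_sales: List[Tuple[int, float]] = [(200, 300.0), (200, 200.0)]  # (store_key, revenue)

def revenue_by_city(city: str) -> float:
    """Total revenue for one city, resolving Geography through each fact's
    intermediate dimension so both facts slice by the same city member."""
    total = 0.0
    for cust_key, revenue in internet_sales:
        if geography[customers[cust_key]["geo_key"]][1] == city:
            total += revenue
    for store_key, revenue in store_sales:
        if geography[stores[store_key]["geo_key"]][1] == city:
            total += revenue
    return total

print(revenue_by_city("London"))  # 1000.0
```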
  • 26. MDX Optimization Methodology
    • Rewrite the MDX code
    • Add aggregations
    • Add pre-calculated measure groups (ETL)
    • Solve the problem using the relational engine
    • Use .NET stored procedures
      – Only rarely can the problem be solved with better hardware.
    • Column-based databases
  • 27. Optimizing MDX
    • Baselining Query Speeds
      – Clearing the Analysis Services Caches
      – Clearing the Operating System Caches, using fsutil.exe or an SSAS stored procedure (CodePlex)
    • Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services
    • Configuring the Analysis Services Query Log
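A minimal sketch of a baselining harness, assuming hypothetical `run_query` and `clear_caches` callables supplied by the caller (for example, wrappers around an XMLA ClearCache command and a file-system cache flush; neither is implemented here). It records cold-cache and warm-cache timings separately and reports medians, so a single outlier run does not skew the baseline.

```python
import statistics
import time
from typing import Callable, Dict, List

def baseline(run_query: Callable[[], object],
             clear_caches: Callable[[], None],
             repeats: int = 3) -> Dict[str, float]:
    """Time a query cold (right after clearing caches) and warm (run again
    immediately). Both callables are hypothetical, caller-supplied hooks."""
    cold: List[float] = []
    warm: List[float] = []
    for _ in range(repeats):
        clear_caches()
        start = time.perf_counter()
        run_query()
        cold.append(time.perf_counter() - start)
        # Second run with caches populated by the first one.
        start = time.perf_counter()
        run_query()
        warm.append(time.perf_counter() - start)
    return {"cold_median": statistics.median(cold),
            "warm_median": statistics.median(warm)}
```

Comparing the two medians shows how much of a query's cost is storage-engine I/O (removed by the warm cache) versus formula-engine work.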
  • 28. Cell-by-Cell Mode vs. Subspace Mode
    Almost always, performance obtained by using subspace (or block computation) mode is superior to that obtained by using cell-by-cell (or naïve) mode.
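A Python analogy for the difference, not a model of how the SSAS formula engine is implemented: the sparse cube and the `with_tax` expression are hypothetical. Cell-by-cell mode evaluates the expression for every cell in the space; block mode, for an expression where empty input gives empty output, only needs to touch the non-empty cells. Both produce the same result, with very different amounts of work.

```python
from itertools import product
from typing import Dict, Optional, Tuple

Cell = Tuple[str, str]

# Hypothetical sparse measure: only a few (city, product) cells have data.
sales: Dict[Cell, float] = {("Seattle", "Bikes"): 10.0, ("London", "Helmets"): 4.0}
cities = ["Redmond", "Seattle", "New York", "London"]
products = ["Bikes", "Helmets", "Gloves"]

def with_tax(v: Optional[float]) -> Optional[float]:
    """Hypothetical calculation: empty in, empty out."""
    return None if v is None else v * 1.5

def cell_by_cell(expr):
    """Naive mode: evaluate the expression for every cell in the space."""
    evaluations = 0
    result = {}
    for cell in product(cities, products):
        evaluations += 1
        value = expr(sales.get(cell))
        if value is not None:
            result[cell] = value
    return result, evaluations

def block_mode(expr):
    """Block mode (analogy): only the non-empty cells are evaluated."""
    result = {cell: expr(v) for cell, v in sales.items()}
    return result, len(sales)

print(cell_by_cell(with_tax)[1], block_mode(with_tax)[1])  # 12 2
```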
  • 29. Using Profiler • So far so good
  • 32. Granularity
    • Single grain
      – The list of GROUP BY attributes in a SQL SELECT
    • Mixed grain
      – Both Attribute.[All] and Attribute.MEMBERS
  • 33. Granularity
    (Diagram: lattice of grains built from the levels All Countries / Countries, All Cities / Cities and All Products / Products.)
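Each single grain corresponds to choosing, for every attribute, either its [All] member or its members, which is the same as choosing a SQL GROUP BY list. A minimal Python sketch that enumerates those grains for hypothetical attributes, treating them as independent (it ignores attribute relationships such as City to Country).

```python
from itertools import product
from typing import List, Sequence, Tuple

def granularities(attributes: Sequence[str]) -> List[Tuple[str, ...]]:
    """Enumerate every single grain: each attribute is either at its [All]
    member (left out of GROUP BY) or at its members (kept in GROUP BY)."""
    grains = []
    for mask in product([False, True], repeat=len(attributes)):
        grains.append(tuple(a for a, used in zip(attributes, mask) if used))
    return grains

print(granularities(["City", "Product"]))
# [(), ('Product',), ('City',), ('City', 'Product')]
```

With n independent attributes there are 2^n single grains; a set whose tuples come from more than one of them is mixed grain.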
  • 34. Slice
    • Single member
      – SQL: WHERE City = 'Redmond'
      – MDX: [City].[Redmond]
    • Multiple members
      – SQL: WHERE City IN ('Redmond', 'Seattle')
      – MDX: { [City].[Redmond], [City].[Seattle] }
  • 35. Slice at granularity
    SQL:
      SELECT Sum(Sales), City
      FROM Sales_Table
      WHERE City IN ('Redmond', 'Seattle')
      GROUP BY City
    MDX:
      SELECT Measures.Sales ON 0,
             NON EMPTY {Redmond, Seattle} ON 1
      FROM Sales_Cube
  • 36. Slice below granularity
    SQL:
      SELECT Sum(Sales)
      FROM Sales_Table
      WHERE City IN ('Redmond', 'Seattle')
    MDX:
      SELECT Measures.Sales ON 0
      FROM Sales_Cube
      WHERE {Redmond, Seattle}
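A minimal Python sketch contrasting the two slices, with hypothetical fact rows: slicing at granularity returns one value per selected city (the city is in the GROUP BY or on an axis), while slicing below granularity returns a single aggregated value (the city only filters).

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical fact rows: (city, sales)
facts: List[Tuple[str, float]] = [
    ("Redmond", 10.0), ("Seattle", 5.0), ("Seattle", 7.0), ("London", 3.0),
]
selected = {"Redmond", "Seattle"}

def slice_at_granularity(rows: List[Tuple[str, float]], cities: Set[str]) -> Dict[str, float]:
    """Like WHERE City IN (...) GROUP BY City: one value per city."""
    out: Dict[str, float] = defaultdict(float)
    for city, amount in rows:
        if city in cities:
            out[city] += amount
    return dict(out)

def slice_below_granularity(rows: List[Tuple[str, float]], cities: Set[str]) -> float:
    """Like WHERE City IN (...) with no GROUP BY: a single aggregated value."""
    return sum(amount for city, amount in rows if city in cities)

print(slice_at_granularity(facts, selected))     # {'Redmond': 10.0, 'Seattle': 12.0}
print(slice_below_granularity(facts, selected))  # 22.0
```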
  • 37. Examples
    (Grid: rows All Cities, Redmond, Seattle, New York, London; columns All Years, 2005, 2006, 2007, 2008. Each of the following slides highlights one set on this grid.)
  • 38. Examples
    (Seattle, Year.Year.MEMBERS): Seattle for each individual year, 2005 to 2008
  • 39. Examples
    (Seattle, Year.MEMBERS): as above, plus the All Years member
  • 40. Examples
    ({Redmond, Seattle, London}, Year.MEMBERS)
  • 41. Examples
    ({Redmond, Seattle}, {2005, 2006, 2007})
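The example sets are all crossjoins, so each covers a rectangle of the grid. A minimal Python sketch of that, using plain member names as stand-ins for MDX members (an assumption for illustration); it also shows that Year.MEMBERS includes the All Years member while Year.Year.MEMBERS does not.

```python
from itertools import product
from typing import Iterable, Set, Tuple

YEARS = ["All Years", "2005", "2006", "2007", "2008"]  # stand-in for Year.MEMBERS
YEAR_LEVEL = YEARS[1:]                                 # stand-in for Year.Year.MEMBERS

def crossjoin(*sets: Iterable[str]) -> Set[Tuple[str, ...]]:
    """All tuples formed by taking one member from each set (as MDX CrossJoin does)."""
    return set(product(*sets))

# ({Redmond, Seattle}, {2005, 2006, 2007}) from slide 41: a 2 x 3 rectangle.
cells = crossjoin(["Redmond", "Seattle"], ["2005", "2006", "2007"])
print(len(cells))  # 6
print(len(crossjoin(["Seattle"], YEARS)), len(crossjoin(["Seattle"], YEAR_LEVEL)))  # 5 4
```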
  • 42. Arbitrary-shaped subcubes
    • What are they?
    • How can they happen?
    • Why are they so bad?
    • How to avoid them?
  • 43. Arbitrary-shaped subcubes
    (Grid: All Cities, Redmond, Seattle, New York, London × All Years, 2005 to 2008)
    Union((Redmond, Year.Year.MEMBERS), (City.City.MEMBERS, 2005))
  • 44. Arbitrary-shaped subcubes
    (Grid: All Cities, Redmond, Seattle, SF, Denver × All Years, 2005 to 2008)
    CrossJoin(City.City.MEMBERS, Year.Year.MEMBERS) - (Seattle, 2007)
  • 45. Arbitrary-shaped subcubes
    {(Redmond, 2005), (Seattle, 2006), (New York, 2007), (London, 2008)}
  • 46. Arbitrary-shaped subcubes
    Union(([All Cities], Year.MEMBERS), (City.MEMBERS, [All Years]))
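One way to tell the shapes apart: a set of (City, Year) tuples is a crossjoin exactly when it equals the crossjoin of its own projections; anything else is arbitrarily shaped. A minimal Python sketch of that test for two attributes, using member names as stand-ins; it illustrates the definition and is not how the SSAS engine checks shapes.

```python
from itertools import product
from typing import Set, Tuple

def is_crossjoin(cells: Set[Tuple[str, str]]) -> bool:
    """True if the tuple set equals the CrossJoin of its projections on each attribute."""
    cities = {c for c, _ in cells}
    years = {y for _, y in cells}
    return cells == set(product(cities, years))

cities = ["Redmond", "Seattle", "New York", "London"]
years = ["2005", "2006", "2007", "2008"]

# Slide 43: Union((Redmond, Year.Year.MEMBERS), (City.City.MEMBERS, 2005))
union_shape = {("Redmond", y) for y in years} | {(c, "2005") for c in cities}
# Slide 44: CrossJoin minus one cell
minus_one = set(product(["Redmond", "Seattle", "SF", "Denver"], years)) - {("Seattle", "2007")}
# Slide 45: the diagonal set
diagonal = {("Redmond", "2005"), ("Seattle", "2006"), ("New York", "2007"), ("London", "2008")}
# Slide 41: ({Redmond, Seattle}, {2005, 2006, 2007})
rectangle = set(product(["Redmond", "Seattle"], ["2005", "2006", "2007"]))

print(is_crossjoin(union_shape), is_crossjoin(minus_one),
      is_crossjoin(diagonal), is_crossjoin(rectangle))  # False False False True
```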
  • 47. Arbitrary shapes come from:
    – WHERE / subselect / Aggregate
    – Unnatural hierarchies
    – Parent-child (visual totals)
    – "Non-leaves" subcubes
    – Conditional logic (IIF, IF, CASE, CoalesceEmpty, etc.)
    – NonEmpty, Exists
  • 48. WHERE / Subselect
    – Severity = '1' OR Priority = '1'
    – Multiselect: {USA, London}
  • 49. Mixed grain slicer
    (Diagram: hierarchy All → USA (New York, Seattle), UK (London, Bristol).)
  • 50. Mixed grain slicer
    (Diagram: the same All → Country → City hierarchy, next to the attribute hierarchies All Cities (Seattle, New York, London, Bristol) and All Countries (USA, UK).)
  • 52. Leaves vs. Non-Leaves
    (Diagram: the granularity lattice of All Countries / Countries, All Cities / Cities and All Products / Products, with the lowest grain marked "Leaves".)
  • 53. Problems with arbitrary shapes
    – Caching
    – Partition slices
    – Indexes
    – SCOPEs
    – Matching calculations
    – Many more (for every topic we discuss, just ask "What will happen with arbitrary shapes?", and I am in trouble)
  • 54. SCOPE
      SCOPE (
        [Date].[Month of Year].[All Periods],
        [Date].[Month Name].[All],
        Except(
          [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
          { ( [DateTool].[Aggregation].DefaultMember,
              [DateTool].[Comparison].DefaultMember ) }
        )
      );
        ...;
      END SCOPE;
  • 55. Subcube decomposition
    The SCOPE from slide 54 is decomposed into three rectangular subcubes: Scope 1, Scope 2 and Scope 3.
  • 56. Subcube decomposition
      SCOPE (
        [Date].[Month of Year].[All Periods],
        [Date].[Month Name].[All],
        Except( [DateTool].[Aggregation].Members, [DateTool].[Aggregation].DefaultMember ),
        Except( [DateTool].[Comparison].Members, [DateTool].[Comparison].DefaultMember )
      );
        ...;
      END SCOPE;

      SCOPE (
        [Date].[Month of Year].[All Periods],
        [Date].[Month Name].[All],
        [DateTool].[Aggregation].DefaultMember,
        Except( [DateTool].[Comparison].Members, [DateTool].[Comparison].DefaultMember )
      );
        ...;
      END SCOPE;

      SCOPE (
        [Date].[Month of Year].[All Periods],
        [Date].[Month Name].[All],
        Except( [DateTool].[Aggregation].Members, [DateTool].[Aggregation].DefaultMember ),
        [DateTool].[Comparison].DefaultMember
      );
        ...;
      END SCOPE;
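The decomposition can be checked mechanically: removing the single (default, default) tuple from a crossjoin leaves exactly the union of the three rectangular subcubes in the SCOPEs above, and the three do not overlap. A minimal Python sketch with hypothetical member names, where the first member of each list plays the role of DefaultMember and the fixed [Date] members are left out.

```python
from itertools import product

# Hypothetical members; index 0 stands in for DefaultMember.
aggregation = ["Current", "YTD", "QTD"]
comparison = ["Current", "PrevYear", "Growth"]
d_agg, d_cmp = aggregation[0], comparison[0]

# Original SCOPE: Except(Aggregation.Members * Comparison.Members, {(default, default)})
original = set(product(aggregation, comparison)) - {(d_agg, d_cmp)}

# The three rectangular subcubes of the decomposition
non_default_agg = [m for m in aggregation if m != d_agg]
non_default_cmp = [m for m in comparison if m != d_cmp]
scope_a = set(product(non_default_agg, non_default_cmp))  # both non-default
scope_b = set(product([d_agg], non_default_cmp))          # default aggregation
scope_c = set(product(non_default_agg, [d_cmp]))          # default comparison

assert original == scope_a | scope_b | scope_c
assert not (scope_a & scope_b or scope_a & scope_c or scope_b & scope_c)  # disjoint
print(len(original), len(scope_a), len(scope_b), len(scope_c))  # 8 4 2 2
```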
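  • 57. MDX Optimization - Tips
    • Partial expressions are not cached.
      Instead of:
        This = iif(<expensive expression> > 0, 1/<expensive expression>, null);
      use a hidden calculated measure:
        CREATE MEMBER CurrentCube.Measures.MyPartialExpression AS <expensive expression>, VISIBLE = 0;
        This = iif(Measures.MyPartialExpression > 0, 1/Measures.MyPartialExpression, null);
      The calculated measure's value is cached, so the expensive expression is evaluated once rather than twice per cell.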
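A Python analogy for the tip, not SSAS code: `expensive_expression` is a hypothetical stand-in that counts its evaluations. The uncached version evaluates it twice per call, as the inline iif does; routing it through a cached helper, which plays the role of the hidden calculated measure, evaluates it once per cell.

```python
from functools import lru_cache

calls = {"count": 0}

def expensive_expression(cell: str) -> float:
    """Hypothetical stand-in for a costly MDX sub-expression; counts evaluations."""
    calls["count"] += 1
    return 4.0

def inverse_uncached(cell: str):
    # Like the inline iif: the sub-expression appears (and is evaluated) twice.
    return 1 / expensive_expression(cell) if expensive_expression(cell) > 0 else None

@lru_cache(maxsize=None)
def partial_expression(cell: str) -> float:
    # Like the hidden calculated measure: its value is cached per cell.
    return expensive_expression(cell)

def inverse_cached(cell: str):
    return 1 / partial_expression(cell) if partial_expression(cell) > 0 else None
```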
  • 58. Demo
  • 59. SSAS Denali
    • Coming in the first half of 2012
    • SSAS Tabular mode
      – Cheaper
      – Not best of breed
      – Uses DAX or MDX
    • Have you started working with it?
  • 60. Mobile BI
    – BI on smartphones
    – Mobile BI (Gartner)
  • 61. Social BI
    • Discover New Insights: analyze the demographic and psychographic profiles of your Facebook application users.
    • Analyze Facebook Data: analyze the full spectrum of Facebook data (profiles, interests, check-ins, and more).
    • Instantly Available via Cloud
  • 62. Social BI • Deep Personalization • Enterprise Data Integration
  • 63. Survey • SQL / SSAS Denali • Mobile BI • Social BI