SCD2 mal anders (SCD2, done differently)
Andrej Pashchenko
Senior Consultant, Düsseldorf
@Andrej_SQL doag2017
Our company.
Trivadis DOAG17: SCD2 mal anders2 29.11.2018
Trivadis is a leader in IT consulting, system integration, solution
engineering, and the provision of IT services, with a focus on … and …
technologies in Switzerland, Germany, Austria, and Denmark.
Trivadis delivers its services through the following strategic business areas:
Trivadis Services takes over the corresponding operation of your IT systems.
OPERATION
Locations: Copenhagen, Munich, Lausanne, Bern, Zurich, Brugg, Geneva, Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Basel, Vienna
With more than 600 IT and subject-matter experts on site with you.
14 Trivadis branch offices with more than 600 employees.
More than 200 service level agreements.
More than 4,000 training participants.
Research and development budget: CHF 5.0 million / EUR 4.0 million.
Financially independent and sustainably profitable.
Experience from more than 1,900 projects per year with over 800 customers.
About me
Senior Consultant at Trivadis GmbH, Düsseldorf
Focus on Oracle
– Data Warehousing
– Application Development
– Application Performance
Course instructor for „Oracle 12c New Features für Entwickler“ and „TechnoCircle Oracle 12c Release 2“
Blog: http://blog.sqlora.com
Agenda
1. Introduction and state of the art
2. The „new“ approach
3. Use cases and performance
4. Conclusion
Introduction and state of the art
Introduction
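Historization as part of the loading process in a data warehouse
We consider Slowly Changing Dimensions Type 2 (SCD2)
All changes are tracked completely: a change in at least one of the tracked columns triggers the creation of a new version record
The most challenging task is change detection
-----------------------------------------------------------------------------------------------------------------------------------------------
| DWH_KEY | VALID_FROM | VALID_TO   | CUR_VERSION | ETL_OP | BUS_KEY | FIRST_NAME | SECOND_NAMES | LAST_NAME | HIRE_DATE  | FIRE_DATE  | SALARY |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 1       | 01.12.2016 | 02.12.2016 | N           | UPD    | 123     | Roger      |              | Federer   | 01.01.2010 |            | 900000 |
| 11      | 03.12.2016 |            | Y           | INS    | 123     | Roger      |              | Federer   | 01.01.2010 |            | 920000 |
| 6       | 02.12.2016 | 02.12.2016 | N           | UPD    | 345     | Venus      |              | Williams  | 01.11.2016 |            | 500000 |
| 10      | 03.12.2016 |            | Y           | INS    | 345     | Venus      |              | Williams  | 01.11.2016 | 01.12.2016 | 500000 |
| 2       | 01.12.2016 | 02.12.2016 | N           | UPD    | 456     | Rafael     |              | Nadal     | 01.05.2009 |            | 720000 |
| 3       | 01.12.2016 | 01.12.2016 | N           | UPD    | 789     | Serena     |              | Williams  | 01.06.2008 |            | 650000 |
| 5       | 02.12.2016 |            | Y           | INS    | 789     | Serena     | Jameka       | Williams  | 01.06.2008 |            | 650000 |
-----------------------------------------------------------------------------------------------------------------------------------------------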
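The versioning shown in the table can be sketched as follows. This is only an illustration under stated assumptions, not the loading code from the talk: it uses Python's built-in sqlite3 instead of Oracle, a hypothetical table `dim_employee` with a reduced column list, a hypothetical helper `apply_change`, and ISO date strings. A detected change closes the current version and opens a new one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_employee (          -- hypothetical, reduced column list
    dwh_key     INTEGER PRIMARY KEY, -- surrogate key, assigned automatically
    valid_from  TEXT NOT NULL,
    valid_to    TEXT,                -- NULL marks the open-ended current version
    cur_version TEXT NOT NULL,       -- 'Y' or 'N'
    bus_key     INTEGER NOT NULL,
    last_name   TEXT,
    salary      INTEGER
);
INSERT INTO dim_employee
VALUES (1, '2016-12-01', NULL, 'Y', 123, 'Federer', 900000);
""")


def apply_change(conn, bus_key, last_name, salary, load_date, prev_date):
    """Close the current version of bus_key and open a new current version."""
    conn.execute(
        "UPDATE dim_employee SET valid_to = ?, cur_version = 'N' "
        "WHERE bus_key = ? AND cur_version = 'Y'",
        (prev_date, bus_key),
    )
    conn.execute(
        "INSERT INTO dim_employee "
        "(valid_from, valid_to, cur_version, bus_key, last_name, salary) "
        "VALUES (?, NULL, 'Y', ?, ?, ?)",
        (load_date, bus_key, last_name, salary),
    )


# The salary of business key 123 changes from 900000 to 920000.
apply_change(conn, 123, "Federer", 920000, "2016-12-03", "2016-12-02")

versions = conn.execute(
    "SELECT valid_from, valid_to, cur_version, salary "
    "FROM dim_employee WHERE bus_key = 123 ORDER BY valid_from"
).fetchall()
for row in versions:
    print(row)
```

In Oracle, both steps would typically be combined in a single MERGE statement, which is what the rest of the talk assumes.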
State of the Art
Typical OWB mapping
State of the Art
Source:
| BK | C1 | C2 |
| 11 | A  | B  |
| 22 | D  | E  |
| 44 | K  | L  |
| 77 | M  |    |
Target:
| BK | C1 | C2 |
| 11 | A  | BB |
| 22 | D  | E  |
| 33 | F  | G  |
| 77 | M  | N  |
In the join result, the source columns carry the suffix _S (BK_S, C1_S, C2_S) and the target columns the suffix _T (BK_T, C1_T, C2_T).
Flow: Source FULL OUTER JOIN Target → Change Detection? → Split into Old Versions and New Versions → UNION ALL → MERGE into Target
Change detection, for example:
NVL(C2_S,'(NULL)') != NVL(C2_T,'(NULL)')
LNNVL(C2_S = C2_T) AND NVL(C2_S, C2_T) IS NOT NULL
DECODE, STANDARD_HASH, SYS_OP_MAP_NONNULL …
More on delta detection: https://danischnider.wordpress.com/2016/10/08/delta-detection-in-oracle-sql/
The data to the left of the split (the join result) has to be accessed twice!
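Why the comparison has to be NULL-aware can be shown with the slide's sample data. The sketch below is an assumption-laden illustration: it runs on SQLite via Python's sqlite3, uses COALESCE in place of Oracle's NVL, and uses an inner join on the business key instead of the full outer join, so only keys present on both sides are compared. The table names `src` and `tgt` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (bk INTEGER, c1 TEXT, c2 TEXT);
CREATE TABLE tgt (bk INTEGER, c1 TEXT, c2 TEXT);
INSERT INTO src VALUES (11,'A','B'),  (22,'D','E'), (44,'K','L'), (77,'M',NULL);
INSERT INTO tgt VALUES (11,'A','BB'), (22,'D','E'), (33,'F','G'), (77,'M','N');
""")

# Plain inequality: NULL != 'N' evaluates to NULL, so key 77 is silently lost.
naive = conn.execute("""
    SELECT s.bk
    FROM src s JOIN tgt t ON t.bk = s.bk
    WHERE s.c2 != t.c2
    ORDER BY s.bk
""").fetchall()

# Sentinel comparison, the COALESCE counterpart of NVL(C2_S,'(NULL)') != NVL(C2_T,'(NULL)').
null_aware = conn.execute("""
    SELECT s.bk
    FROM src s JOIN tgt t ON t.bk = s.bk
    WHERE COALESCE(s.c2, '(NULL)') != COALESCE(t.c2, '(NULL)')
    ORDER BY s.bk
""").fetchall()

print("naive:     ", naive)
print("null-aware:", null_aware)
```

The sentinel trick only works as long as '(NULL)' can never occur as a real value, and it has to be repeated for every tracked column, which is the complexity the next slide refers to.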
State of the Art
Change detection has to handle NULL values correctly
Each and every column has to be compared in a complex way
Or hash-diffs have to be maintained and compared: common rules are needed, and re-hashing is sometimes needed after structural changes
A full outer join may be expensive when not working with „deltas“
Splitting the join result into two data sets causes the join to be executed twice
Another solution?
The „new“ approach
The „new“ approach is not really new
It is often used for ad hoc queries
Are these two records different?
Using GROUP BY:
| BK | C1 | C2 | C3 | C4 | … | … | C467 | C468 | C469 |
| 11 | A  | B  | C  | D  | … | … | AA   | BB   | CC   |
| 11 | A  | B  | C  | D  | … | … | AB   | BB   | CC   |
SELECT COUNT(*)
FROM t
GROUP BY BK, C1, C2, C3, C4, … C467, C468, C469;
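The GROUP BY idea can be tried on a cut-down version of this example. The following sketch assumes SQLite via Python's sqlite3 and a hypothetical table `t` with only three data columns instead of 469; a second, identical pair of rows is added for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (bk INTEGER, c1 TEXT, c2 TEXT, c3 TEXT);
INSERT INTO t VALUES (11,'A','B','AA'), (11,'A','B','AB');  -- differ in c3
INSERT INTO t VALUES (22,'D','E','F'),  (22,'D','E','F');   -- identical
""")

# Grouping by all columns collapses identical records into one group.
counts = conn.execute("""
    SELECT bk, c3, COUNT(*) AS cnt
    FROM t
    GROUP BY bk, c1, c2, c3
    ORDER BY bk, c3
""").fetchall()

for bk, c3, cnt in counts:
    print(bk, c3, "same" if cnt == 2 else "different")
```

No per-column comparison expression is needed; the column list in GROUP BY is all that has to be maintained.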
The „new“ approach
Or using an analytic function:
SELECT COUNT(*) OVER (PARTITION BY BK, C1, C2, C3, … C468, C469)
FROM t;
If the count equals 2, the two records are the same
If the count equals 1, they are different
But what about NULLs?
For GROUP BY and PARTITION BY, NULLs are treated as equal to each other and as different from any value: NULL = NULL, VALUE != NULL
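The NULL behaviour of PARTITION BY can be checked with a small experiment. This is a sketch under assumptions: it uses SQLite through Python's sqlite3 (window functions need SQLite 3.25 or newer) and a hypothetical three-column table `t`; Oracle treats NULLs in PARTITION BY the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (bk INTEGER, c1 TEXT, c2 TEXT);
INSERT INTO t VALUES (11,'A',NULL), (11,'A',NULL);  -- identical, both c2 NULL
INSERT INTO t VALUES (22,'D','E'),  (22,'D',NULL);  -- value vs. NULL
""")

# Each row keeps its identity; cnt tells how many rows share all column values.
rows = conn.execute("""
    SELECT bk, c2, COUNT(*) OVER (PARTITION BY bk, c1, c2) AS cnt
    FROM t
    ORDER BY bk, c2
""").fetchall()

for row in rows:
    print(row)
```

The two NULL rows of key 11 land in the same partition (count 2), while the value and the NULL of key 22 are kept apart (count 1 each), so no NVL-style wrapping is required.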
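The „new“ approach
Source:
| BK | C1 | C2 |
| 11 | A  | B  |
| 22 | D  | E  |
| 44 | K  | L  |
| 77 | M  |    |
Target:
| BK | C1 | C2 |
| 11 | A  | BB |
| 22 | D  | E  |
| 33 | F  | G  |
| 77 | M  | N  |
Flow: Source UNION ALL Target → GROUP BY → MERGE into Target
Result of the UNION ALL, with S_T marking the origin of each row (S = source, T = target):
| S_T | BK | C1 | C2 |
| S   | 11 | A  | B  |
| S   | 22 | D  | E  |
| S   | 44 | K  | L  |
| S   | 77 | M  |    |
| T   | 11 | A  | BB |
| T   | 22 | D  | E  |
| T   | 33 | F  | G  |
| T   | 77 | M  | N  |
Result of the GROUP BY over BK, C1, C2:
| MIN(S_T) | BK | C1 | C2 | CNT |
| S        | 11 | A  | B  | 1   |
| S        | 22 | D  | E  | 2   |
| S        | 44 | K  | L  | 1   |
| S        | 77 | M  |    | 1   |
| T        | 11 | A  | BB | 1   |
| T        | 33 | F  | G  | 1   |
| T        | 77 | M  | N  | 1   |
DEMO!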
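The UNION ALL and GROUP BY steps of this slide can be reproduced with the sample data. The sketch below is not the demo from the talk: it assumes SQLite via Python's sqlite3 and hypothetical table names `src` and `tgt`, and it stops before the MERGE, which SQLite does not offer. The final filter (rows coming from the source with a count of 1) is an assumption about which groups would feed the MERGE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (bk INTEGER, c1 TEXT, c2 TEXT);
CREATE TABLE tgt (bk INTEGER, c1 TEXT, c2 TEXT);
INSERT INTO src VALUES (11,'A','B'),  (22,'D','E'), (44,'K','L'), (77,'M',NULL);
INSERT INTO tgt VALUES (11,'A','BB'), (22,'D','E'), (33,'F','G'), (77,'M','N');
""")

# Tag every row with its origin, stack both sets, and group by all columns.
grouped = conn.execute("""
    SELECT MIN(s_t) AS s_t, bk, c1, c2, COUNT(*) AS cnt
    FROM (SELECT 'S' AS s_t, bk, c1, c2 FROM src
          UNION ALL
          SELECT 'T' AS s_t, bk, c1, c2 FROM tgt)
    GROUP BY bk, c1, c2
    ORDER BY 1, 2
""").fetchall()

# Assumption: a group seen only in the source is a new or changed record.
changed = [row[1:4] for row in grouped if row[0] == "S" and row[4] == 1]

for row in grouped:
    print(row)
print("new or changed:", changed)
```

Key 22 is unchanged (count 2) and drops out; keys 11 and 77 are changed, key 44 is new. The NULL in C2 of key 77 needs no special handling. Groups that exist only in the target, such as key 33, would matter only if deletions in the source had to be tracked.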
An unconventional approach for ETL of historized data, 19.03.2017
Use Cases and Performance
Use Cases and Performance
Full Data Load
Legacy: Source (full data) JOIN Target (current versions, filtered from the older versions). The JOIN may be slow; the filter may be slow (partitioning?).
New: Source (full data) UNION ALL Target (current versions), then GROUP BY, which may be slow.
Use Cases and Performance
Delta Load
Legacy: Source (delta) JOIN Target (current versions, filtered from the older versions). The filter may be slow (partitioning?).
New: Source (delta) UNION ALL Target (current versions), then GROUP BY, which may be slow.
Use Cases and Performance
Delta Load with pre-filter
Legacy: Source (delta) JOIN Target (current versions, filtered from the older versions).
Pre-filter: Business_key IN …
New: Source (delta) UNION ALL Target (current versions, filtered), then GROUP BY, which is fast.
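The effect of the pre-filter can be sketched as follows. Assumptions: SQLite via Python's sqlite3, hypothetical tables `delta` and `tgt` with a `cur_version` flag, and a subquery on the delta as one plausible way to fill in the `Business_key IN …` condition. Only current target versions whose business key occurs in the delta enter the UNION ALL, so the GROUP BY sees a small input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE delta (bk INTEGER, c1 TEXT, c2 TEXT);
CREATE TABLE tgt   (bk INTEGER, c1 TEXT, c2 TEXT, cur_version TEXT);
INSERT INTO delta VALUES (11,'A','B'), (22,'D','E'), (44,'K','L');
INSERT INTO tgt VALUES
    (11,'A','AA','N'),   -- older version, ignored
    (11,'A','BB','Y'),
    (22,'D','E','Y'),
    (33,'F','G','Y');    -- not in the delta, removed by the pre-filter
""")

changed = conn.execute("""
    SELECT bk, c1, c2
    FROM (SELECT 'S' AS s_t, bk, c1, c2 FROM delta
          UNION ALL
          SELECT 'T' AS s_t, bk, c1, c2
          FROM tgt
          WHERE cur_version = 'Y'
            AND bk IN (SELECT bk FROM delta))   -- the pre-filter
    GROUP BY bk, c1, c2
    HAVING COUNT(*) = 1 AND MIN(s_t) = 'S'
    ORDER BY bk
""").fetchall()

print(changed)
```

Without the IN condition, every current target row would flow through the GROUP BY even though only a handful of keys can have changed.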
Use Cases and Performance
Data warehouse with Siebel CRM as a source
Order table S_ORDER: „only“ 120 columns
Comparison of the legacy approach vs. GROUP BY vs. analytic functions
Full staging table as a source vs. delta (with or without pre-filtering)
Approx. 6 million rows in the target table
Approx. 3 million rows in the full load dataset
Approx. 3,000 rows in the delta load dataset
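Use Cases and Performance
--------------------------------------------------------------------
| Method                       | Delta Load, min | Full Load, min |
--------------------------------------------------------------------
| Outer Join (legacy approach) | 0:09            | 0:41           |
| GROUP BY                     | 1:10            | 1:04           |
| GROUP BY with pre-filter     | 0:04            | N/A            |
| Analytic Function            | 2:12            | 4:52           |
| Analytic with pre-filter     | 0:12            | N/A            |
--------------------------------------------------------------------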
Use Cases and Performance
Execution Plan
--------------------------------------------------------------------------------------------
| Id | Operation | Name | A-Rows | A-Time |
--------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 0 |00:00:04.33 |
| 1 | MERGE | CO_S_ORDER_TEST | 0 |00:00:04.33 |
| 2 | VIEW | | 3799 |00:00:04.29 |
| 3 | SEQUENCE | SEQ_CO_S_ORDER | 3799 |00:00:04.29 |
| 4 | PX COORDINATOR | | 3799 |00:00:04.28 |
| 5 | PX SEND QC (RANDOM) | :TQ10005 | 0 |00:00:00.01 |
|* 6 | HASH JOIN OUTER BUFFERED | | 3799 |00:00:11.51 |
| 7 | PX RECEIVE | | 3799 |00:00:00.01 |
...
| 15 | PX RECEIVE | | 4654 |00:00:00.04 |
| 16 | PX SEND HASH | :TQ10001 | 0 |00:00:00.01 |
| 17 | HASH GROUP BY | | 4654 |00:00:03.41 |
| 18 | VIEW | | 4801 |00:00:00.77 |
| 19 | UNION-ALL | | 4801 |00:00:00.77 |
| 20 | PX BLOCK ITERATOR | | 3120 |00:00:00.01 |
|* 21 | TABLE ACCESS FULL | STG_S_ORDER_DELTA | 3120 |00:00:00.01 |
|* 22 | HASH JOIN RIGHT SEMI | | 1681 |00:00:04.41 |
| 23 | PX RECEIVE | | 12480 |00:00:00.02 |
| 24 | PX SEND BROADCAST | :TQ10000 | 0 |00:00:00.01 |
| 25 | PX BLOCK ITERATOR | | 3120 |00:00:00.01 |
|* 26 | TABLE ACCESS FULL| STG_S_ORDER_DELTA | 3120 |00:00:00.01 |
| 27 | PX BLOCK ITERATOR | | 3710K|00:00:03.26 |
|* 28 | TABLE ACCESS FULL | CO_S_ORDER_TEST | 3710K|00:00:02.92 |
| 29 | PX RECEIVE | | 6107K|00:00:11.11 |
| 30 | PX SEND HASH | :TQ10004 | 0 |00:00:00.01 |
| 31 | PX BLOCK ITERATOR | | 6107K|00:00:05.37 |
|* 32 | TABLE ACCESS FULL | CO_S_ORDER_TEST | 6107K|00:00:04.69 |
--------------------------------------------------------------------------------------------
Use Cases and Performance
Loading Dimensions from Core
Legacy: Source (current versions in Core) JOIN Target (current versions in Dim). The JOIN may be slow; filtering out the older versions may be slow (partitioning?).
New: Current versions in Core UNION ALL current versions in Dim, then GROUP BY, which may be slow.
Use Cases and Performance
Loading Dimensions from Core: Source is a View
Legacy: Source (view, full data) JOIN Target (current versions, filtered from the older versions). The JOIN may be slow; the filter may be slow (partitioning?).
New: Source (view, full data) UNION ALL Target (current versions), then GROUP BY, which may be slow.
Use Cases and Performance
Loading a dimension via a view
The view joins some „big“ tables (50 GB, 40+ million rows)
and produces fewer than 500 dimension records per day
The loading time could be reduced by 45 percent (3 min 50 sec → 2 min)
Conclusion
The approach is simpler and faster in certain cases
The source is queried only once, which can be significant if the source is a view
The code can easily be generated
It is simple to build even without generation (only a plain list of columns to copy and paste)
It is worth doing ad hoc tests with your own data
Test it!
Andrej Pashchenko
Senior Consultant
Tel. +49 211 58 666 470
andrej.pashchenko@trivadis.com
blog.sqlora.com
Trivadis @ DOAG 2017
#opencompany
Booth: 3rd floor, right next to the escalator
We share our know-how!
Just drop by: live presentations and document archive
T-shirts, a prize draw, and more
We look forward to your visit
