Five Tuning Tips For Your Data Warehouse Jeff Moss
My First Presentation
  Yes, my very first presentation
    For BIRT SIG
    For UKOUG
  Useful advice from friends and colleagues
    Use graphics where appropriate
    Find a friendly or familiar face in the audience
    Imagine your audience is naked!
  … but like Oracle, be careful when combining advice!
Be Careful Combining Advice! Thanks for the opportunity Mark!
Agenda
  My background
  Five tips
    Partition for success
    Squeeze your data with data segment compression
    Make the most of your PGA memory
    Beware of temporal data affecting the optimizer
    Find out where your query is at
  Questions
My Background
  Independent Consultant
  13 years Oracle experience
  Blog: http://oramossoracle.blogspot.com/
  Focused on warehousing / VLDB since 1998
  First project: UK Music Sales Data Mart
    Produces BBC Radio 1 Top 40 chart and many more
    2 billion row sales fact table
    1 Tb total database size
  Currently working with Eon UK (Powergen)
    4Tb Production Warehouse, 8Tb total storage
    Oracle Product Stack
What Is Partitioning?
  “Partitioning addresses key issues in supporting very large tables and indexes by letting you decompose them into smaller and more manageable pieces called partitions.” (Oracle Database Concepts Manual, 10gR2)
  Introduced in Oracle 8.0
  Numerous improvements since
  Subpartitioning adds another level of decomposition
  Partitions and subpartitions are logical containers
Partition To Tablespace Mapping
  Partitions map to tablespaces
    A partition can only be in one tablespace
    A tablespace can hold many partitions
    Highest granularity is one tablespace per partition
    Lowest granularity is one tablespace for all the partitions
  Tablespace volatility
    Read / Write
    Read Only
  [Diagram: monthly partitions P_JAN_2005 to P_MAR_2006 mapped to quarterly tablespaces T_Q1_2005 to T_Q1_2006, each tablespace marked Read / Write or Read Only]
Why Partition? - Performance
  Improved query performance
    Pruning or elimination
    Partition wise joins
  Read only partitions
    Quicker checkpointing
    Quicker backup
    Quicker recovery
    … but it depends on mapping of partition:tablespace:datafile
  Example query against the Sales Fact Table:
    SELECT SUM(sales)
    FROM   part_tab
    WHERE  sales_date BETWEEN '01-JAN-2005' AND '30-JUN-2005'
  [Diagram: Sales Fact Table with monthly partitions JAN to DEC]
  * Oracle 10gR2 Data Warehousing Manual
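Pruning can be pictured as a range-overlap test between the query predicate and each partition's bounds. The sketch below is a toy model in Python, not Oracle's implementation; the helper names `month_partitions` and `prune` are hypothetical and the monthly layout is assumed from the slide's diagram.

```python
from datetime import date

MONTH_NAMES = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
               "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def month_partitions(year):
    """Return (name, first_day, first_day_of_next_month) per monthly partition."""
    parts = []
    for month, name in enumerate(MONTH_NAMES, start=1):
        lo = date(year, month, 1)
        hi = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        parts.append((name, lo, hi))
    return parts

def prune(parts, start, end):
    """Keep partitions whose [lo, hi) range overlaps the inclusive [start, end] predicate."""
    return [name for name, lo, hi in parts if lo <= end and start < hi]

# Same date range as the slide's BETWEEN predicate.
scanned = prune(month_partitions(2005), date(2005, 1, 1), date(2005, 6, 30))
print(scanned)
```

Under these assumptions only the first six monthly partitions are candidates for scanning; the other six are eliminated before any I/O happens.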
Why Partition? - Manageability
  Archiving
    Use a rolling window approach
    ALTER TABLE … ADD/SPLIT/DROP PARTITION…
  Easier ETL processing
    Build a new dataset in a staging table
    Add indexes and constraints
    Collect statistics
    Then swap the staging table for a partition on the target
    ALTER TABLE…EXCHANGE PARTITION…
  Easier maintenance
    Table partition move, e.g. to compress data
    Local index partition rebuild
Why Partition? - Scalability
  Partition is generally consistent and predictable
    Assuming an appropriate partitioning key is used
    … and data has an even distribution across the key
  Read only approach
    Scalable backups - read only tablespaces are ignored
    … so partitions in those tablespaces are ignored
  Pruning allows consistent query performance
Why Partition? - Availability
  Offline data impact minimised
    … depending on granularity
    Quicker recovery
    Pruned data not missed
  EXCHANGE PARTITION
    Allows offline build
    Quick swap over
  [Diagram: monthly partitions P_JAN_2005 to P_MAR_2006 mapped to quarterly tablespaces T_Q1_2005 to T_Q1_2006, each tablespace marked Read / Write or Read Only]
Fact Table Partitioning
  Partitioned by Load Date
    Easier ETL processing
      Each load deals with only 1 partition
    No use to end user queries!
      Can’t prune – full scans!
  Partitioned by Transaction Date
    Harder ETL processing
      But still uses EXCHANGE PARTITION
    Useful to end user queries
      Allows full pruning capability

  Partitioned by Load Date
    Partition            Tran Date     Customer     Load Date
    January Partition    07-JAN-2005   Customer 1   09-JAN-2005
    January Partition    15-JAN-2005   Customer 2   17-JAN-2005
    February Partition   22-JAN-2005   Customer 3   01-FEB-2005
    February Partition   02-FEB-2005   Customer 4   05-FEB-2005
    February Partition   26-FEB-2005   Customer 5   28-FEB-2005
    March Partition      06-MAR-2005   Customer 2   07-MAR-2005
    March Partition      12-MAR-2005   Customer 3   15-MAR-2005
    April Partition      21-JAN-2005   Customer 7   04-APR-2005
    April Partition      09-APR-2005   Customer 9   10-APR-2005

  Partitioned by Transaction Date
    Partition            Tran Date     Customer     Load Date
    January Partition    07-JAN-2005   Customer 1   09-JAN-2005
    January Partition    15-JAN-2005   Customer 2   17-JAN-2005
    January Partition    21-JAN-2005   Customer 7   04-APR-2005
    January Partition    22-JAN-2005   Customer 3   01-FEB-2005
    February Partition   02-FEB-2005   Customer 4   05-FEB-2005
    February Partition   26-FEB-2005   Customer 5   28-FEB-2005
    March Partition      06-MAR-2005   Customer 2   07-MAR-2005
    March Partition      12-MAR-2005   Customer 3   15-MAR-2005
    April Partition      09-APR-2005   Customer 9   10-APR-2005
Watch Out For…
  Partition exchange and table statistics (1)
    Partition stats updated
    … but global stats are NOT!
    Affects queries accessing multiple partitions
  Solution
    Gather stats on staging table prior to EXCHANGE
    Gather stats on partitioned table using GLOBAL
  1 - Jonathan Lewis: Cost-Based Oracle Fundamentals, Chapter 2
Partitioning Feature: Characteristic Reason Matrix
  [Matrix of features against characteristics; the cell markings did not survive extraction]
  Characteristics: Performance, Manageability, Scalability, Availability
  Features: Partition Truncation, Exchange Partition, Archiving, Pruning (Partition Elimination), Partition wise joins, Parallel DML, Local Indexes, Read Only Partitions
What Is Data Segment Compression?
  Compresses data by eliminating intra-block repeated column values
  Reduces the space required for a segment
    … but only if there are appropriate repeats!
  Self-contained
  Lossless algorithm
Where Can Data Segment Compression Be Used?
  Can be used with a number of segment types
    Heap & nested tables
    Range or list partitions
    Materialized views
  Can’t be used with
    Subpartitions
    Hash partitions
    Indexes – but they have row level compression
    IOT
    External tables
    Tables that are part of a cluster
    LOBs
How Does Segment Compression Work?
  Uncompressed rows
    ID   DESCRIPTION                  CONTACT TYPE  OUTCOME  FOLLOWUP
    100  Call to discuss bill amount  TEL           NO       YES
    101  Call to discuss new product  MAIL          NO       N/A
    102  Call to discuss new product  TEL           YES      N/A
  Database block: symbol table
    1   100
    2   Call to discuss bill amount
    3   TEL
    4   NO
    5   YES
    6   101
    7   Call to discuss new product
    8   MAIL
    9   N/A
    10  102
  Database block: row data area
    Row 100:  1  2  3  4  5
    Row 101:  6  7  8  4  9
    Row 102:  10 7  3  5  9
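The symbol-table idea on this slide can be sketched in a few lines of Python. This is a toy model that mirrors the diagram only; it stores every distinct value once per block and is not a description of Oracle's actual block format. The function names are hypothetical.

```python
def compress_block(rows):
    """Encode each row as symbol ids, storing each distinct value once per block."""
    symbols = {}   # value -> symbol id, numbered from 1 in order of first appearance
    encoded = []
    for row in rows:
        ids = []
        for value in row:
            if value not in symbols:
                symbols[value] = len(symbols) + 1
            ids.append(symbols[value])
        encoded.append(ids)
    return symbols, encoded

def decompress_block(symbols, encoded):
    """Rebuild the original rows from the symbol table (lossless)."""
    by_id = {symbol_id: value for value, symbol_id in symbols.items()}
    return [[by_id[symbol_id] for symbol_id in ids] for ids in encoded]

rows = [
    [100, "Call to discuss bill amount", "TEL", "NO", "YES"],
    [101, "Call to discuss new product", "MAIL", "NO", "N/A"],
    [102, "Call to discuss new product", "TEL", "YES", "N/A"],
]
symbols, encoded = compress_block(rows)
for ids in encoded:
    print(ids)
```

With the three rows from the slide, the encoded rows come out as the same id sequences shown in the row data area, and the repeated description, contact type and outcome values are stored only once.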
Pros & Cons
  Pros
    Saves space
      Reduces LIO / PIO
      Speeds up backup/recovery
      Improves query response time
    Transparent
      To readers
      … and writers
    Decreases time to perform some DML
      Deletes should be quicker
      Bulk inserts may be quicker
  Cons
    Increases CPU load
    Can only be used on direct path operations
      CTAS
      Serial inserts using INSERT /*+ APPEND */
      Parallel inserts (PDML)
      ALTER TABLE…MOVE…
      Direct path SQL*Loader
    Increases time to perform some DML
      Bulk inserts may be slower
      Updates are slower
Ordering Your Data For Maximum Benefits
  Colocate data to maximise compression benefits
  For maximum compression
    Minimise the total space required by the segment
    Identify the most “compressible” column(s)
  For optimal access
    We know how the data is to be queried
    Order the data by
      Access path columns
      Then the next most “compressible” column(s)
  [Diagram: Uniformly distributed: 1 2 3 4 5 repeated four times. Colocated: 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5]
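The effect of colocation can be estimated by counting how many distinct values each block has to hold. The Python sketch below uses the 20 values from the diagram and assumes, purely for illustration, that a block holds 4 rows; `symbol_entries` is a hypothetical helper, not part of any Oracle tooling.

```python
def symbol_entries(values, rows_per_block):
    """Sum of distinct values per fixed-size block (one symbol entry per distinct value)."""
    total = 0
    for start in range(0, len(values), rows_per_block):
        block = values[start:start + rows_per_block]
        total += len(set(block))
    return total

uniform = [1, 2, 3, 4, 5] * 4   # values cycle, so repeats rarely share a block
colocated = sorted(uniform)     # equal values sit next to each other

print(symbol_entries(uniform, 4), symbol_entries(colocated, 4))
```

In this toy setup the uniformly distributed ordering needs 20 symbol entries while the colocated ordering needs 5, which is why ordering the load by the most repetitive columns tends to help.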
Get Max Compression Order Package
  PROCEDURE mgmt_p_get_max_compress_order
   Argument Name                  Type                    In/Out Default?
   ------------------------------ ----------------------- ------ --------
   P_TABLE_OWNER                  VARCHAR2                IN     DEFAULT
   P_TABLE_NAME                   VARCHAR2                IN
   P_PARTITION_NAME               VARCHAR2                IN     DEFAULT
   P_SAMPLE_SIZE                  NUMBER                  IN     DEFAULT
   P_PREFIX_COLUMN1               VARCHAR2                IN     DEFAULT
   P_PREFIX_COLUMN2               VARCHAR2                IN     DEFAULT
   P_PREFIX_COLUMN3               VARCHAR2                IN     DEFAULT

  BEGIN
    mgmt_p_get_max_compress_order(p_table_owner => 'AE_MGMT'
                                 ,p_table_name  => 'BIG_TABLE'
                                 ,p_sample_size => 10000);
  END;
  /

  Running mgmt_p_get_max_compress_order...
  ----------------------------------------------------------------------------------------------------
  Table        : BIG_TABLE
  Sample Size  : 10000
  Unique Run ID: 25012006232119
  ORDER BY Prefix:
  ----------------------------------------------------------------------------------------------------
  Creating MASTER Table  : TEMP_MASTER_25012006232119
  Creating COLUMN Table 1: COL1
  Creating COLUMN Table 2: COL2
  Creating COLUMN Table 3: COL3
  ----------------------------------------------------------------------------------------------------
  The output below lists each column in the table and the number of blocks/rows and space used when the table data is ordered by only that column, or in the case where a prefix has been specified, where the table data is ordered by the prefix and then that column. From this one can determine if there is a specific ORDER BY which can be applied to to the data in order to maximise compression within the table whilst, in the case of a a prefix being present, ordering data as efficiently as possible for the most common access path(s).
  ----------------------------------------------------------------------------------------------------
  NAME                           COLUMN                               BLOCKS         ROWS SPACE_GB
  ============================== ============================== ============ ============ ========
  TEMP_COL_001_25012006232119    COL1                                    290        10000    .0022
  TEMP_COL_002_25012006232119    COL2                                    345        10000    .0026
  TEMP_COL_003_25012006232119    COL3                                    555        10000    .0042
Data Warehousing Specifics
  Star schema compresses better than normalized
    More redundant data
  Focus on…
    Fact tables and summaries in star schema
    Transaction tables in normalized schema
  Performance impact (1)
    Space savings
      Star schema: 67%
      Normalized: 24%
    Query elapsed times
      Star schema: 16.5%
      Normalized: 10%
  1 - Table Compression in Oracle 9iR2: A Performance Analysis
Things To Watch Out For
  DROP COLUMN is awkward
    ORA-39726: Unsupported add/drop column operation on compressed tables
    Uncompress the table and try again - still gives ORA-39726!
  After UPDATEs data is uncompressed
    Performance impact
    Row migration
  Use appropriate physical design settings
    PCTFREE 0 - pack each block
    Large blocksize - reduce overhead / increase repeats per block
PGA Memory: What For?
  Sorts
    Standard sorts [SORT]
    Buffer [BUFFER]
    Group By [GROUP BY (SORT)]
    Connect By [CONNECT-BY (SORT)]
    Rollup [ROLLUP (SORT)]
    Window [WINDOW (SORT)]
  Hash joins [HASH-JOIN]
  Indexes
    Maintenance [IDX MAINTENANCE SOR]
    Bitmap Merge [BITMAP MERGE]
    Bitmap Create [BITMAP CREATE]
  Write buffers [LOAD WRITE BUFFERS]
  [] = V$SQL_WORKAREA.OPERATION_TYPE
  [Diagram: serial process PGA on a dedicated server, containing Cursors, Variables and Sort Area]
PGA Memory Management: Manual
  The “old” way of doing things
    Still available though – even in 10g R2
  Configuring
    ALTER SESSION SET WORKAREA_SIZE_POLICY=MANUAL;
    Initialisation parameter: WORKAREA_SIZE_POLICY=MANUAL
  Set memory parameters yourself
    HASH_AREA_SIZE
    SORT_AREA_SIZE
    SORT_AREA_RETAINED_SIZE
    BITMAP_MERGE_AREA_SIZE
    CREATE_BITMAP_AREA_SIZE
  Optimal values depend on the type of work (1)
    One size does not fit all!
  1 - Richmond Shee: If Your Memory Serves You Right
PGA Memory Management: Automatic
  The “new” way from 9i R1
    Default OFF in 9i R1/R2
    Enabled by setting at session/instance level:
      WORKAREA_SIZE_POLICY=AUTO
      PGA_AGGREGATE_TARGET > 0
    Default ON since 10g R1
  Oracle dynamically manages the available memory to suit the workload
    But of course, it’s not perfect!
  Jože Senegačnik - Advanced Management Of Working Areas In Oracle 9i/10g, presented at UKOUG 2005
Auto PGA Parameters: Pre 10gR2
  WORKAREA_SIZE_POLICY
    Set to AUTO
  PGA_AGGREGATE_TARGET
    The target for summed PGA across all processes
    Can be exceeded if too small
      Over allocation
  _PGA_MAX_SIZE
    Target maximum PGA size for a single process
    Default is a fixed value of 200Mb
    Hidden / undocumented parameter
      Usual caveats apply
Auto PGA Parameters: Pre 10gR2
  _SMM_MAX_SIZE
    Limit for a single workarea operation for one process
    Derived default
      LEAST(5% of PGA_AGGREGATE_TARGET, 50% of _PGA_MAX_SIZE)
      Hits limit of 100Mb
        When PGA_AGGREGATE_TARGET is >= 2000Mb
        And _PGA_MAX_SIZE is left at default of 200Mb
    Hidden / undocumented parameter
      Usual caveats apply
Auto PGA Parameters: Pre 10gR2
  _SMM_PX_MAX_SIZE
    Limit for all the parallel slaves of a single workarea operation
    Derived default
      30% of PGA_AGGREGATE_TARGET
    Hidden / undocumented parameter
      Usual caveats apply
    Parallel slaves still limited by _SMM_MAX_SIZE
    Impacts only when…
  Example settings
    PGA_AGGREGATE_TARGET = 3000Mb
    _PGA_MAX_SIZE = 200Mb
    _SMM_MAX_SIZE = 100Mb
    _SMM_PX_MAX_SIZE = 900Mb
  [Diagram: with 8 parallel sessions, Sessions 1 to 8 get 100Mb each; with 12 parallel sessions, Sessions 1 to 12 get 75Mb each]
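The pre-10gR2 derivations on the last three slides reduce to simple arithmetic, sketched here in Python. The function names are hypothetical, and splitting `_SMM_PX_MAX_SIZE` evenly across the slaves is an assumption made to match the diagram's 100Mb and 75Mb figures; these are hidden parameters, so treat the numbers as a reading of the slides rather than documented behaviour.

```python
def smm_max_size_mb(pga_aggregate_target_mb, pga_max_size_mb=200):
    """Per-workarea limit: LEAST(5% of target, 50% of _PGA_MAX_SIZE)."""
    return min(pga_aggregate_target_mb * 5 / 100, pga_max_size_mb / 2)

def smm_px_max_size_mb(pga_aggregate_target_mb):
    """Limit across all parallel slaves of one workarea operation: 30% of target."""
    return pga_aggregate_target_mb * 30 / 100

def per_slave_limit_mb(pga_aggregate_target_mb, slaves, pga_max_size_mb=200):
    """Each slave is capped by _SMM_MAX_SIZE and by an even share of _SMM_PX_MAX_SIZE."""
    serial_cap = smm_max_size_mb(pga_aggregate_target_mb, pga_max_size_mb)
    px_share = smm_px_max_size_mb(pga_aggregate_target_mb) / slaves
    return min(serial_cap, px_share)

# Settings from the slide: PGA_AGGREGATE_TARGET = 3000Mb, _PGA_MAX_SIZE = 200Mb.
for slaves in (8, 12):
    print(slaves, per_slave_limit_mb(3000, slaves))
```

With a 3000Mb target, 8 slaves are each held at the 100Mb serial cap, while 12 slaves drop to 75Mb each because 900Mb has to be shared between them.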
10gR2 Improvements
  _SMM_MAX_SIZE now the driver
    More advanced algorithm
      PGA_AGGREGATE_TARGET   _SMM_MAX_SIZE
      <= 500Mb               20% * PGA_AGGREGATE_TARGET
      500Mb – 1000Mb         100Mb
      1000Mb +               10% * PGA_AGGREGATE_TARGET
  _PGA_MAX_SIZE = 2 * _SMM_MAX_SIZE
  Parallel operations
    _SMM_PX_MAX_SIZE = 50% * PGA_AGGREGATE_TARGET
    When DOP <= 5 then _smm_max_size is used
    When DOP > 5 _smm_px_max_size / DOP is used
  Jože Senegačnik - Advanced Management Of Working Areas In Oracle 9i/10g, presented at UKOUG 2005
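The 10gR2 rules above can be written out as a small Python sketch. It encodes only what the slide states; the function names are hypothetical and the 4000Mb target in the usage lines is an arbitrary illustration, not a figure from the presentation.

```python
def smm_max_size_10gr2_mb(pga_aggregate_target_mb):
    """_SMM_MAX_SIZE by PGA_AGGREGATE_TARGET band, per the slide's table."""
    if pga_aggregate_target_mb <= 500:
        return pga_aggregate_target_mb * 20 / 100
    if pga_aggregate_target_mb <= 1000:
        return 100.0
    return pga_aggregate_target_mb * 10 / 100

def pga_max_size_10gr2_mb(pga_aggregate_target_mb):
    """_PGA_MAX_SIZE = 2 * _SMM_MAX_SIZE."""
    return 2 * smm_max_size_10gr2_mb(pga_aggregate_target_mb)

def workarea_limit_10gr2_mb(pga_aggregate_target_mb, dop):
    """DOP <= 5 uses _SMM_MAX_SIZE; DOP > 5 uses _SMM_PX_MAX_SIZE / DOP."""
    if dop <= 5:
        return smm_max_size_10gr2_mb(pga_aggregate_target_mb)
    smm_px_max_size_mb = pga_aggregate_target_mb * 50 / 100
    return smm_px_max_size_mb / dop

# Hypothetical 4000Mb target, serial-sized (DOP 4) versus shared (DOP 8).
print(workarea_limit_10gr2_mb(4000, 4), workarea_limit_10gr2_mb(4000, 8))
```

The three bands join up at 100Mb (20% of 500Mb and 10% of 1000Mb), so the derived limit grows continuously with the target instead of stalling at 100Mb as it did before 10gR2.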
PGA Target Advisor
  select trunc(pga_target_for_estimate/1024/1024) pga_target_for_estimate
  ,      to_char(pga_target_factor * 100,'999.9') ||'%' pga_target_factor
  ,      trunc(bytes_processed/1024/1024) bytes_processed
  ,      trunc(estd_extra_bytes_rw/1024/1024) estd_extra_bytes_rw
  ,      to_char(estd_pga_cache_hit_percentage,'999') || '%' estd_pga_cache_hit_percentage
  ,      estd_overalloc_count
  from   v$pga_target_advice
  /

  PGA Target For PGA Tgt                     Estimated Extra   Estimated PGA            Estimated
     Estimate Mb  Factor  Bytes Processed Bytes Read/Written     Cache Hit % Overallocation Count
  -------------- ------- ---------------- ------------------ --------------- --------------------
           5,376   12.5%        5,884,017          7,279,799             45%                  113
          10,752   25.0%        5,884,017          3,593,510             62%                    8
          21,504   50.0%        5,884,017          3,140,993             65%                    0
          32,256   75.0%        5,884,017          3,104,894             65%                    0
          43,008  100.0%        5,884,017          2,300,826             72%                    0
          51,609  120.0%        5,884,017          2,189,160             73%                    0
          60,211  140.0%        5,884,017          2,189,160             73%                    0
          68,812  160.0%        5,884,017          2,189,160             73%                    0
          77,414  180.0%        5,884,017          2,189,160             73%                    0
          86,016  200.0%        5,884,017          2,189,160             73%                    0
         129,024  300.0%        5,884,017          2,189,160             73%                    0
         172,032  400.0%        5,884,017          2,189,160             73%                    0
         258,048  600.0%        5,884,017          2,189,160             73%                    0
Beware Of Temporal Data Affecting The Optimizer
  Slowly Changing Dimensions
    Cover ranges of time
    “From” and “To” DATE columns define applicability
    Need BETWEEN operator to retrieve rows for a reporting point in time
      SELECT * FROM d_customer
      WHERE '15/01/2005' BETWEEN valid_from AND valid_to

  Month 1 (1st Jan, 2004)
    CUSTOMER
      CUSTOMER_ID  NAME          CUSTOMER_TYPE
      487438       Jeff Moss     SME
    D_CUSTOMER
      CUSTOMER_ID  NAME          CUSTOMER_TYPE  VALID_FROM  VALID_TO
      487438       Jeff Moss     SME            01/01/2004

  Month 2 (1st Feb, 2004)
    CUSTOMER
      CUSTOMER_ID  NAME          CUSTOMER_TYPE
      487438       Jeff Moss     I & C
      839398       Mark Rittman  SME
    D_CUSTOMER
      CUSTOMER_ID  NAME          CUSTOMER_TYPE  VALID_FROM  VALID_TO
      487438       Jeff Moss     SME            01/01/2004  31/01/2004
      487438       Jeff Moss     I & C          01/02/2004
      839398       Mark Rittman  SME            01/02/2004
Dependent Predicates
  When multiple predicates exist, individual selectivities are combined using standard probability math (1):
    P1 AND P2:  S(P1 & P2) = S(P1) * S(P2)
    P1 OR P2:   S(P1 | P2) = S(P1) + S(P2) - [S(P1) * S(P2)]
  Only valid if the predicates are independent, otherwise…
    Incorrect selectivity estimate
    Incorrect cardinality estimate
    Potentially suboptimal execution plan
  BETWEEN is multiple predicates!
  Also known as Correlated Columns (2)
  1 – Wolfgang Breitling, Fallacies Of The Cost Based Optimizer
  2 – Jonathan Lewis, Cost-Based Oracle Fundamentals, Chapter 6
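To see why the independence assumption hurts a BETWEEN on two date columns, the Python sketch below applies S(P1) * S(P2) to a hypothetical 12-row SCD table in which each row is valid for one calendar month of 2005. The table contents are an assumption made for illustration, and the single-predicate selectivities are computed exactly from the data, which is not how Oracle derives its own figures, so the numbers will not match an Oracle plan.

```python
import calendar
from datetime import date

# Hypothetical SCD rows: (from_date, to_date), one row per month of 2005.
rows = []
for month in range(1, 13):
    last_day = calendar.monthrange(2005, month)[1]
    rows.append((date(2005, month, 1), date(2005, month, last_day)))

report_date = date(2005, 10, 9)

# "d BETWEEN from_date AND to_date" is really two predicates.
s_from = sum(1 for f, t in rows if f <= report_date) / len(rows)  # from_date <= d
s_to = sum(1 for f, t in rows if t >= report_date) / len(rows)    # to_date >= d

estimated_rows = s_from * s_to * len(rows)   # independence assumption
actual_rows = sum(1 for f, t in rows if f <= report_date <= t)

print(round(estimated_rows, 2), actual_rows)
```

Here 10 of 12 rows satisfy the first predicate and 3 of 12 the second, so multiplying gives an estimate of 2.5 rows, while only the October row actually qualifies. The two columns move together, so the product overstates the result.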
Some Test Tables…
  Consider these 3 test tables…
    12 records in an SCD type table
    TEST_12_DISTINCT_TD
    TEST_2_DISTINCT_TD
    TEST_1_DISTINCT_TD
Optimizer Gets Incorrect Cardinality
  select *
  from   test_1_distinct_td
  where  to_date('09-OCT-2005','DD-MON-YYYY') between from_date and to_date;

         KEY NON_KEY_AT FROM_DATE TO_DATE
  ---------- ---------- --------- ---------
           1 Jeff       01-JAN-05 31-DEC-05
           2 Mark       01-FEB-05 31-DEC-05
           3 Doug       01-MAR-05 31-DEC-05
           4 Niall      01-APR-05 31-DEC-05
           5 Tom        01-MAY-05 31-DEC-05
           6 Jonathan   01-JUN-05 31-DEC-05
           7 Lisa       01-JUL-05 31-DEC-05
           8 Cary       01-AUG-05 31-DEC-05
           9 Mogens     01-SEP-05 31-DEC-05
          10 Anjo       01-OCT-05 31-DEC-05

  10 rows selected.

  Execution Plan
  ----------------------------------------------------------------------------------------
  | Id  | Operation         | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
  ----------------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT  |                    |    11 |   264 |     3   (0)| 00:00:01 |
  |*  1 |  TABLE ACCESS FULL| TEST_1_DISTINCT_TD |    11 |   264 |     3   (0)| 00:00:01 |
  ----------------------------------------------------------------------------------------
… And Again
  select *
  from   test_2_distinct_td
  where  to_date('09-OCT-2005','DD-MON-YYYY') between from_date and to_date;

         KEY NON_KEY_AT FROM_DATE TO_DATE
  ---------- ---------- --------- ---------
           7 Lisa       01-JUL-05 31-DEC-05
           8 Cary       01-AUG-05 31-DEC-05
           9 Mogens     01-SEP-05 31-DEC-05
          10 Anjo       01-OCT-05 31-DEC-05

  4 rows selected.

  Execution Plan
  ----------------------------------------------------------------------------------------
  | Id  | Operation         | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
  ----------------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT  |                    |    11 |   264 |     3   (0)| 00:00:01 |
  |*  1 |  TABLE ACCESS FULL| TEST_2_DISTINCT_TD |    11 |   264 |     3   (0)| 00:00:01 |
  ----------------------------------------------------------------------------------------
… And Again
  select *
  from   test_12_distinct_td
  where  to_date('09-OCT-2005','DD-MON-YYYY') between from_date and to_date;

         KEY NON_KEY_AT FROM_DATE TO_DATE
  ---------- ---------- --------- ---------
          10 Anjo       01-OCT-05 31-OCT-05

  1 row selected.

  Execution Plan
  -----------------------------------------------------------------------------------------
  | Id  | Operation         | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
  -----------------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT  |                     |     4 |    96 |     3   (0)| 00:00:01 |
  |*  1 |  TABLE ACCESS FULL| TEST_12_DISTINCT_TD |     4 |    96 |     3   (0)| 00:00:01 |
  -----------------------------------------------------------------------------------------
Workarounds
  Ignore it
    If your query still gets the right plan of course!
  Hints
    Force the optimizer to do as you tell it
  Stored outlines
  Adjust statistics held against the table
    Affects any SQL that accesses that object
  Optimizer Profile (10g)
    Offline optimisation (1)
  Dynamic sampling level 4 or above
    Samples “single table predicates that reference 2 or more columns”
    Takes extra time during the parse – minimal but often worth it
  1 - Jonathan Lewis: Cost-Based Oracle Fundamentals, Chapter 2
Dynamic Sampling With A Hint
  select /*+ dynamic_sampling(test_1_distinct_td,4) */ *
  from   test_1_distinct_td
  where  to_date('09-OCT-2005','DD-MON-YYYY') between from_date and to_date;

         KEY NON_KEY_AT FROM_DATE TO_DATE
  ---------- ---------- --------- ---------
           1 Jeff       01-JAN-05 31-DEC-05
           2 Mark       01-FEB-05 31-DEC-05
           3 Doug       01-MAR-05 31-DEC-05
           4 Niall      01-APR-05 31-DEC-05
           5 Tom        01-MAY-05 31-DEC-05
           6 Jonathan   01-JUN-05 31-DEC-05
           7 Lisa       01-JUL-05 31-DEC-05
           8 Cary       01-AUG-05 31-DEC-05
           9 Mogens     01-SEP-05 31-DEC-05
          10 Anjo       01-OCT-05 31-DEC-05

  10 rows selected.

  Execution Plan
  ----------------------------------------------------------------------------------------
  | Id  | Operation         | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
  ----------------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT  |                    |    10 |   240 |     3   (0)| 00:00:01 |
  |*  1 |  TABLE ACCESS FULL| TEST_1_DISTINCT_TD |    10 |   240 |     3   (0)| 00:00:01 |
  ----------------------------------------------------------------------------------------
Find Out Where Your Query Is At
  Data warehouses are big, big, BIG!
    Big on rows
    Big on disk storage
    Big on hardware
  Big SQL statements issued
    Lots of data to scan, join and sort
    Many operations
    Long running
  So where is my long running query at?
  No solid answers here, just food for thought…
A “Big” Query Execution Plan
  | Id  | Operation                   | Name                      | Rows  | Bytes |TempSpc| Cost (%CPU)|
  ------------------------------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT            |                           |     1 |   124 |       | 49722  (10)|
  |   1 | PX COORDINATOR              |                           |       |       |       |            |
  |   2 | PX SEND QC (RANDOM)         | :TQ20006                  |     1 |   124 |       | 49722  (10)|
  |   3 | HASH JOIN                   |                           |     1 |   124 |       | 49722  (10)|
  |   4 | BUFFER SORT                 |                           |       |       |       |            |
  |   5 | PX RECEIVE                  |                           |   207K|  9510K|       | 25982   (9)|
  |   6 | PX SEND BROADCAST           | :TQ20000                  |   207K|  9510K|       | 25982   (9)|
  |   7 | VIEW                        |                           |   207K|  9510K|       | 25982   (9)|
  |   8 | WINDOW SORT                 |                           |   207K|    10M|    26M| 25982   (9)|
  |   9 | MERGE JOIN                  |                           |   207K|    10M|       | 25976   (9)|
  |  10 | TABLE ACCESS BY INDEX ROWID | AML_T_ANALYSIS_DATE       |     1 |    22 |       |     2   (0)|
  |  11 | INDEX UNIQUE SCAN           | AML_I_ANL_PK              |     1 |       |       |     0   (0)|
  |  12 | SORT AGGREGATE              |                           |     1 |     9 |       |            |
  |  13 | PX COORDINATOR              |                           |       |       |       |            |
  |  14 | PX SEND QC (RANDOM)         | :TQ10000                  |     1 |     9 |       |            |
  |  15 | SORT AGGREGATE              |                           |     1 |     9 |       |            |
  |  16 | PX BLOCK ITERATOR           |                           |     1 |     9 |       |     2   (0)|
  |  17 | TABLE ACCESS FULL           | AML_T_ANALYSIS_DATE       |     1 |     9 |       |     2   (0)|
  |  18 | FILTER                      |                           |       |       |       |            |
  |  19 | FILTER                      |                           |       |       |       |            |
  |  20 | TABLE ACCESS FULL           | AML_T_BILLING_ACCOUNT_DIM |    82M|  2371M|       |  5457   (5)|
  |  21 | HASH JOIN                   |                           |    18M|  1340M|       | 23704  (10)|
  |  22 | HASH JOIN                   |                           |    10M|   500M|       | 17005  (11)|
  |  23 | PX RECEIVE                  |                           |    10M|   265M|       | 11304  (14)|
  |  24 | PX SEND HASH                | :TQ20003                  |    10M|   265M|       | 11304  (14)|
  |  25 | BUFFER SORT                 |                           |     1 |   124 |       |            |
  |  26 | VIEW                        | AML_V_MD_CUH_SID          |    10M|   265M|       | 11304  (14)|
  |  27 | HASH JOIN                   |                           |    10M|   337M|       | 11304  (14)|
  |  28 | PX RECEIVE                  |                           |    17M|   310M|       |  5228  (18)|
  |  29 | PX SEND HASH                | :TQ20001                  |    17M|   310M|       |  5228  (18)|
  |  30 | PX BLOCK ITERATOR           |                           |    17M|   310M|       |  5228  (18)|
  |  31 | TABLE ACCESS FULL           | AML_T_MEASURE_DIM         |    17M|   310M|       |  5228  (18)|
  |  32 | PX RECEIVE                  |                           |    34M|   461M|       |  5958  (10)|
  |  33 | PX SEND HASH                | :TQ20002                  |    34M|   461M|       |  5958  (10)|
  |  34 | PX BLOCK ITERATOR           |                           |    34M|   461M|       |  5958  (10)|
  |  35 | TABLE ACCESS FULL           | AML_T_CUSTOMER_DIM        |    34M|   461M|       |  5958  (10)|
  |  36 | PX RECEIVE                  |                           |    55M|  1212M|       |  5562   (3)|
  |  37 | PX SEND HASH                | :TQ20004                  |    55M|  1212M|       |  5562   (3)|
  |  38 | PX BLOCK ITERATOR           |                           |    55M|  1212M|       |  5562   (3)|
  |  39 | TABLE ACCESS FULL           | AML_T_CUSTOMER_DIM        |    55M|  1212M|       |  5562   (3)|
  |  40 | PX RECEIVE                  |                           |    94M|  2516M|       |  6483   (5)|
  |  41 | PX SEND HASH                | :TQ20005                  |    94M|  2516M|       |  6483   (5)|
  |  42 | PX BLOCK ITERATOR           |                           |    94M|  2516M|       |  6483   (5)|
  |  43 | MAT_VIEW ACCESS FULL        | AML_M_CD_BAD              |    94M|  2516M|       |  6483   (5)|
  [Callouts on the plan: Sorts, Aggregations, Hash joins, Merge joins, Table scans, Materialized View scans, Analytics, Parallel Query, Pruning, Temp Space Use]
V$ Views To The Rescue?
  V$SESSION – Identify your session
  V$SQL_PLAN – Get the execution plan operations
  V$SQL_WORKAREA – Get all the work areas which will be required
  V$SESSION_LONGOPS – Get information on long plan operations
  V$SQL_WORKAREA_ACTIVE – Get the work area(s) being used right now
  [Diagram: columns shown for each view]
    V$SESSION: SID, SERIAL#, PROGRAM, USERNAME, SQL_ID, SQL_CHILD_NUMBER, SQL_ADDRESS, SQL_HASH_VALUE
    V$SQL_PLAN: SQL_ID, CHILD_NUMBER, ADDRESS, HASH_VALUE, OPERATION, ID, PARENT_ID
    V$SESSION_LONGOPS: SID, SERIAL#, OPNAME, TARGET, MESSAGE, SQL_ID, SQL_ADDRESS, SQL_HASH_VALUE, ELAPSED_SECONDS
    V$SQL_WORKAREA_ACTIVE: SQL_ID, SQL_HASH_VALUE, WORKAREA_ADDRESS, OPERATION_ID, OPERATION_TYPE, POLICY, SID, QCSID, ACTIVE_TIME
    V$SQL_WORKAREA: SQL_ID, CHILD_NUMBER, WORKAREA_ADDRESS, OPERATION_ID, OPERATION_TYPE
Demonstration
Problems
  V$SQL_PLAN bug
    Service Request: 4990863.992
    Broken in 10gR1, works in 10gR2
    PARENT_ID corruption
      Can’t link rows in this view to their parents as the values are corrupted due to this bug
    Shows up in TEMP TABLE TRANSFORMATION operations
  Multiple work areas can be active… or none
  Some operations are not shown in long ops
  V$SESSION sql_id may not be the executing cursor
    E.g. for refreshing a materialized view
  * Test case for bug: http://www.oramoss.demon.co.uk/Code/test_error_v_sql_plan.sql
Questions ?
References: Papers
  Table Compression in Oracle 9iR2: A Performance Analysis
  Table Compression in Oracle 9iR2: An Oracle White Paper
  “Fallacies Of The Cost Based Optimizer”, Wolfgang Breitling
  “Scaling To Infinity, Partitioning In Oracle Data Warehouses”, Tim Gorman
  Advanced Management Of Working Areas in Oracle 9i/10g, UKOUG 2005, Jože Senegačnik
  Oracle9i Memory Management: Easier Than Ever, Oracle Open World 2002, Sushil Kumar
  Working with Automatic PGA, Christo Kutrovsky
  Optimising Oracle9i Instance Memory, Ramaswamy, Ramesh
  Oracle Metalink Note 223730.1: Automatic PGA Memory Managment in 9i
  Oracle Metalink Note 147806.1: Oracle9i New Feature: Automated SQL Execution Memory Management
  Oracle Metalink Note 148346.1: Oracle9i Monitoring Automated SQL Execution Memory Management
  Memory Management and Latching Improvements in Oracle Database 9i and 10g, Oracle Open World 2005, Tanel Põder
  If Your Memory Serves You Right…, IOUG Live! 2004, April 2004, Toronto, Canada, Richmond Shee
  Decision Speed: Table Compression In Action
References: Online
  Presentation / Code
    http://www.oramoss.demon.co.uk/presentations/fivetuningtipsforyourdatawarehouse.ppt
    http://www.oramoss.demon.co.uk/Code/mgmt_p_get_max_compression_order.prc
    http://www.oramoss.demon.co.uk/Code/test_dml_performance_delete.sql
    http://www.oramoss.demon.co.uk/Code/test_dml_performance_insert.sql
    http://www.oramoss.demon.co.uk/Code/test_dml_performance_update.sql
    http://www.oramoss.demon.co.uk/Code/test_error_v_sql_plan.sql
    http://www.oramoss.demon.co.uk/Code/run_big_query.sql
    http://www.oramoss.demon.co.uk/Code/run_big_query_parallel.sql
    http://www.oramoss.demon.co.uk/Code/get_query_progress.sql

Five Tuning Tips For Your Data Warehouse

  • 1. Five Tuning Tips For Your Data Warehouse Jeff Moss
  • 2. My First Presentation Yes, my very first presentation For BIRT SIG For UKOUG Useful Advice from friends and colleagues Use graphics where appropriate Find a friendly or familiar face in the audience Imagine your audience is naked! … but like Oracle, be careful when combining advice!
  • 3. Be Careful Combining Advice! Thanks for the opportunity Mark!
  • 4. Agenda
      My background
      Five tips
        - Partition for success
        - Squeeze your data with data segment compression
        - Make the most of your PGA memory
        - Beware of temporal data affecting the optimizer
        - Find out where your query is at
      Questions
  • 5. My Background
      Independent Consultant, 13 years Oracle experience
      Blog: http://oramossoracle.blogspot.com/
      Focused on warehousing / VLDB since 1998
      First project: UK Music Sales Data Mart
        - Produces BBC Radio 1 Top 40 chart and many more
        - 2 billion row sales fact table
        - 1 Tb total database size
      Currently working with Eon UK (Powergen)
        - 4Tb Production Warehouse, 8Tb total storage
        - Oracle Product Stack
  • 6. What Is Partitioning?
      “Partitioning addresses key issues in supporting very large tables and indexes by letting you decompose them into smaller and more manageable pieces called partitions.” (Oracle Database Concepts Manual, 10gR2)
      Introduced in Oracle 8.0; numerous improvements since
      Subpartitioning adds another level of decomposition
      Partitions and Subpartitions are logical containers
  • 7. Partition To Tablespace Mapping
      Partitions map to tablespaces
        - A partition can only be in one tablespace
        - A tablespace can hold many partitions
        - Highest granularity is one tablespace per partition
        - Lowest granularity is one tablespace for all the partitions
      Tablespace volatility: Read / Write or Read Only
      [Figure: monthly partitions P_JAN_2005 to P_MAR_2006 mapped to quarterly tablespaces T_Q1_2005 to T_Q1_2006, each tablespace marked Read / Write or Read Only]
  • 8. Why Partition? - Performance
      Improved query performance
        - Pruning or elimination
        - Partition wise joins
      Read only partitions
        - Quicker checkpointing
        - Quicker backup
        - Quicker recovery
        - … but it depends on mapping of partition:tablespace:datafile
      Example against the Sales Fact Table, partitioned by month (JAN to DEC):
        SELECT SUM(sales)
        FROM part_tab
        WHERE sales_date BETWEEN '01-JAN-2005' AND '30-JUN-2005'
      * Oracle 10gR2 Data Warehousing Manual
  • 9. Why Partition? - Manageability
      Archiving
        - Use a rolling window approach
        - ALTER TABLE … ADD/SPLIT/DROP PARTITION…
      Easier ETL Processing
        - Build a new dataset in a staging table
        - Add indexes and constraints
        - Collect statistics
        - Then swap the staging table for a partition on the target: ALTER TABLE…EXCHANGE PARTITION…
      Easier Maintenance
        - Table partition move, e.g. to compress data
        - Local Index partition rebuild
  • 10. Why Partition? - Scalability
      Partitioning is generally consistent and predictable
        - Assuming an appropriate partitioning key is used
        - …and data has an even distribution across the key
      Read only approach
        - Scalable backups: read only tablespaces are ignored
        - …so partitions in those tablespaces are ignored
      Pruning allows consistent query performance
  • 11. Why Partition? - Availability
      Offline data impact minimised
        - … depending on granularity
        - Quicker recovery
        - Pruned data not missed
      EXCHANGE PARTITION
        - Allows offline build
        - Quick swap over
      [Figure: monthly partitions P_JAN_2005 to P_MAR_2006 mapped to quarterly tablespaces T_Q1_2005 to T_Q1_2006, each tablespace marked Read / Write or Read Only]
  • 12. Fact Table Partitioning
      Partition by Load Date
        - Easier ETL Processing
        - Each load deals with only 1 partition
        - No use to end user queries!
        - Can’t prune: full scans!
      Partition by Transaction Date
        - Harder ETL Processing
        - But still uses EXCHANGE PARTITION
        - Useful to end user queries
        - Allows full pruning capability
      Rows as placed when partitioning by Load Date:
        Partition   Tran Date    Customer    Load Date
        January     07-JAN-2005  Customer 1  09-JAN-2005
        January     15-JAN-2005  Customer 2  17-JAN-2005
        February    22-JAN-2005  Customer 3  01-FEB-2005
        February    02-FEB-2005  Customer 4  05-FEB-2005
        February    26-FEB-2005  Customer 5  28-FEB-2005
        March       06-MAR-2005  Customer 2  07-MAR-2005
        March       12-MAR-2005  Customer 3  15-MAR-2005
        April       21-JAN-2005  Customer 7  04-APR-2005
        April       09-APR-2005  Customer 9  10-APR-2005
      The same rows as placed when partitioning by Transaction Date:
        Partition   Tran Date    Customer    Load Date
        January     07-JAN-2005  Customer 1  09-JAN-2005
        January     15-JAN-2005  Customer 2  17-JAN-2005
        January     21-JAN-2005  Customer 7  04-APR-2005
        January     22-JAN-2005  Customer 3  01-FEB-2005
        February    02-FEB-2005  Customer 4  05-FEB-2005
        February    26-FEB-2005  Customer 5  28-FEB-2005
        March       06-MAR-2005  Customer 2  07-MAR-2005
        March       12-MAR-2005  Customer 3  15-MAR-2005
        April       09-APR-2005  Customer 9  10-APR-2005
  • 13. Watch out for…
      Partition exchange and table statistics (1)
        - Partition stats updated
        - … but Global stats are NOT!
        - Affects queries accessing multiple partitions
      Solution
        - Gather stats on staging table prior to EXCHANGE
        - Gather stats on partitioned table using GLOBAL
      1 - Jonathan Lewis: Cost-Based Oracle Fundamentals, Chapter 2
  • 14. Partitioning Feature: Characteristic Reason Matrix
      [Matrix: each feature is ticked against the characteristics it supports]
      Characteristics: Performance, Manageability, Scalability, Availability
      Features: Read Only Partitions, Local Indexes, Parallel DML, Partition wise joins, Pruning (Partition Elimination), Archiving, Exchange Partition, Partition Truncation
  • 15. What Is Data Segment Compression?
      Compresses data by eliminating intra block repeated column values
      Reduces the space required for a segment
        - …but only if there are appropriate repeats!
      Self contained
      Lossless algorithm
  • 16. Where Can Data Segment Compression Be Used?
      Can be used with a number of segment types
        - Heap & Nested Tables
        - Range or List Partitions
        - Materialized Views
      Can’t be used with
        - Subpartitions
        - Hash Partitions
        - Indexes (but they have row level compression)
        - IOT
        - External Tables
        - Tables that are part of a Cluster
        - LOBs
  • 17. How Does Segment Compression Work?
      Source rows:
        ID   DESCRIPTION                  CONTACT TYPE  OUTCOME  FOLLOWUP
        100  Call to discuss bill amount  TEL           NO       YES
        101  Call to discuss new product  MAIL          NO       N/A
        102  Call to discuss new product  TEL           YES      N/A
      Database Block, Symbol Table:
        1 = 100
        2 = Call to discuss bill amount
        3 = TEL
        4 = NO
        5 = YES
        6 = 101
        7 = Call to discuss new product
        8 = MAIL
        9 = N/A
        10 = 102
      Database Block, Row Data Area (each row stored as symbol references):
        Row 100:  1  2  3  4  5
        Row 101:  6  7  8  4  9
        Row 102:  10 7  3  5  9
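The symbol-table picture on this slide can be imitated in a few lines of Python. This is only a sketch of the slide's simplified diagram, not of Oracle's actual block format: the function names `compress_block` and `decompress_block` are hypothetical, and every value is placed in the symbol table (as the slide draws it), whether or not it repeats.

```python
# Sample rows from the slide: ID, DESCRIPTION, CONTACT TYPE, OUTCOME, FOLLOWUP
ROWS = [
    [100, "Call to discuss bill amount", "TEL", "NO", "YES"],
    [101, "Call to discuss new product", "MAIL", "NO", "N/A"],
    [102, "Call to discuss new product", "TEL", "YES", "N/A"],
]


def compress_block(rows):
    """Store each distinct value once and keep only references in the rows."""
    symbols = {}   # value -> symbol number, numbered in first-seen order
    encoded = []   # one list of symbol numbers per row
    for row in rows:
        refs = []
        for value in row:
            if value not in symbols:
                symbols[value] = len(symbols) + 1
            refs.append(symbols[value])
        encoded.append(refs)
    symbol_table = {ref: value for value, ref in symbols.items()}
    return symbol_table, encoded


def decompress_block(symbol_table, encoded):
    """Rebuild the original rows: the scheme is lossless."""
    return [[symbol_table[ref] for ref in refs] for refs in encoded]


symbol_table, encoded = compress_block(ROWS)
for refs in encoded:
    print(refs)
```

The 15 column values collapse to 10 symbol-table entries because "NO", "YES", "TEL", "N/A" and the second description repeat within the block, which is the "appropriate repeats" condition the earlier slide mentions.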
  • 18. Pros & Cons
      Pros
        - Saves space
        - Reduces LIO / PIO
        - Speeds up backup/recovery
        - Improves query response time
        - Transparent to readers … and writers
        - Decreases time to perform some DML: deletes should be quicker; bulk inserts may be quicker
      Cons
        - Increases CPU load
        - Can only be used on Direct Path operations: CTAS; serial inserts using INSERT /*+ APPEND */; parallel inserts (PDML); ALTER TABLE…MOVE…; Direct Path SQL*Loader
        - Increases time to perform some DML: bulk inserts may be slower; updates are slower
  • 19. Ordering Your Data For Maximum Benefits
      Colocate data to maximise compression benefits
      For maximum compression
        - Minimise the total space required by the segment
        - Identify the most “compressible” column(s)
      For optimal access
        - We know how the data is to be queried
        - Order the data by the access path columns, then by the next most “compressible” column(s)
      [Figure: Uniformly distributed values (1 2 3 4 5 repeated four times) versus Colocated values (1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5)]
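A rough way to see why colocation helps is to count how many distinct values each block would have to hold. The sketch below assumes a toy block size of four rows and a single column; `symbol_entries` is a hypothetical helper, and real block capacity depends on row length and block size.

```python
def symbol_entries(values, rows_per_block):
    """Total distinct values that must be stored, summed over all blocks."""
    blocks = [values[i:i + rows_per_block]
              for i in range(0, len(values), rows_per_block)]
    return sum(len(set(block)) for block in blocks)


uniform = [1, 2, 3, 4, 5] * 4   # values cycle, as in "Uniformly distributed"
colocated = sorted(uniform)     # equal values sit together, as in "Colocated"

print(symbol_entries(uniform, 4))    # 20
print(symbol_entries(colocated, 4))  # 5
```

With the cycling order every block holds four different values, so nothing repeats inside a block; after sorting, each block holds a single value repeated four times.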
  • 20. Get Max Compression Order Package
      PROCEDURE mgmt_p_get_max_compress_order
        Argument Name       Type       In/Out  Default?
        ------------------  ---------  ------  --------
        P_TABLE_OWNER       VARCHAR2   IN      DEFAULT
        P_TABLE_NAME        VARCHAR2   IN
        P_PARTITION_NAME    VARCHAR2   IN      DEFAULT
        P_SAMPLE_SIZE       NUMBER     IN      DEFAULT
        P_PREFIX_COLUMN1    VARCHAR2   IN      DEFAULT
        P_PREFIX_COLUMN2    VARCHAR2   IN      DEFAULT
        P_PREFIX_COLUMN3    VARCHAR2   IN      DEFAULT
      Example call:
        BEGIN
          mgmt_p_get_max_compress_order(p_table_owner => 'AE_MGMT'
                                       ,p_table_name  => 'BIG_TABLE'
                                       ,p_sample_size => 10000);
        END;
        /
      Output:
        Running mgmt_p_get_max_compress_order...
        ----------------------------------------------------------------------
        Table          : BIG_TABLE
        Sample Size    : 10000
        Unique Run ID  : 25012006232119
        ORDER BY Prefix:
        ----------------------------------------------------------------------
        Creating MASTER Table  : TEMP_MASTER_25012006232119
        Creating COLUMN Table 1: COL1
        Creating COLUMN Table 2: COL2
        Creating COLUMN Table 3: COL3
        ----------------------------------------------------------------------
        The output below lists each column in the table and the number of
        blocks/rows and space used when the table data is ordered by only that
        column, or in the case where a prefix has been specified, where the
        table data is ordered by the prefix and then that column.
        From this one can determine if there is a specific ORDER BY which can
        be applied to the data in order to maximise compression within the
        table whilst, in the case of a prefix being present, ordering data as
        efficiently as possible for the most common access path(s).
        ----------------------------------------------------------------------
        NAME                         COLUMN  BLOCKS  ROWS   SPACE_GB
        ===========================  ======  ======  =====  ========
        TEMP_COL_001_25012006232119  COL1    290     10000  .0022
        TEMP_COL_002_25012006232119  COL2    345     10000  .0026
        TEMP_COL_003_25012006232119  COL3    555     10000  .0042
  • 21. Data Warehousing Specifics
      Star Schema compresses better than Normalized: more redundant data
      Focus on…
        - Fact Tables and Summaries in Star Schema
        - Transaction tables in Normalized Schema
      Performance Impact (1)
        - Space Savings: Star schema 67%, Normalized 24%
        - Query Elapsed Times: Star schema 16.5%, Normalized 10%
      1 - Table Compression in Oracle 9iR2: A Performance Analysis
  • 22. Things To Watch Out For
      DROP COLUMN is awkward
        - ORA-39726: Unsupported add/drop column operation on compressed tables
        - Uncompress the table and try again - still gives ORA-39726!
      After UPDATEs data is uncompressed
        - Performance impact
        - Row migration
      Use appropriate physical design settings
        - PCTFREE 0 - pack each block
        - Large blocksize - reduce overhead / increase repeats per block
  • 23. PGA Memory: What For?
      Sorts
        - Standard sorts [SORT]
        - Buffer [BUFFER]
        - Group By [GROUP BY (SORT)]
        - Connect By [CONNECT-BY (SORT)]
        - Rollup [ROLLUP (SORT)]
        - Window [WINDOW (SORT)]
      Hash Joins [HASH-JOIN]
      Indexes
        - Maintenance [IDX MAINTENANCE SOR]
        - Bitmap Merge [BITMAP MERGE]
        - Bitmap Create [BITMAP CREATE]
      Write Buffers [LOAD WRITE BUFFERS]
      [] = V$SQL_WORKAREA.OPERATION_TYPE
      [Figure: Serial Process PGA on a Dedicated Server, holding Cursors, Variables and the Sort Area]
  • 24. PGA Memory Management: Manual
      The “old” way of doing things
        - Still available though, even in 10g R2
      Configuring
        - ALTER SESSION SET WORKAREA_SIZE_POLICY=MANUAL;
        - Initialisation parameter: WORKAREA_SIZE_POLICY=MANUAL
      Set memory parameters yourself
        - HASH_AREA_SIZE
        - SORT_AREA_SIZE
        - SORT_AREA_RETAINED_SIZE
        - BITMAP_MERGE_AREA_SIZE
        - CREATE_BITMAP_AREA_SIZE
      Optimal values depend on the type of work (1)
        - One size does not fit all!
      1 - Richmond Shee: If Your Memory Serves You Right
  • 25. PGA Memory Management: Automatic
      The “new” way from 9i R1
        - Default OFF in 9i R1/R2
        - Enabled by setting at session/instance level: WORKAREA_SIZE_POLICY=AUTO and PGA_AGGREGATE_TARGET > 0
        - Default ON since 10g R1
      Oracle dynamically manages the available memory to suit the workload
        - But of course, it’s not perfect!
      Jože Senegačnik - Advanced Management Of Working Areas In Oracle 9i/10g, presented at UKOUG 2005
  • 26. Auto PGA Parameters: Pre 10gR2
      WORKAREA_SIZE_POLICY
        - Set to AUTO
      PGA_AGGREGATE_TARGET
        - The target for summed PGA across all processes
        - Can be exceeded if too small (Over Allocation)
      _PGA_MAX_SIZE
        - Target maximum PGA size for a single process
        - Default is a fixed value of 200Mb
        - Hidden / Undocumented Parameter: usual caveats apply
  • 27. Auto PGA Parameters: Pre 10gR2
      _SMM_MAX_SIZE
        - Limit for a single workarea operation for one process
        - Derived default: LEAST(5% of PGA_AGGREGATE_TARGET, 50% of _PGA_MAX_SIZE)
        - Hits limit of 100Mb when PGA_AGGREGATE_TARGET is >= 2000Mb and _PGA_MAX_SIZE is left at default of 200Mb
        - Hidden / Undocumented Parameter: usual caveats apply
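The derived default on this slide is simple arithmetic, sketched below. The function name `smm_max_size_mb` is hypothetical, sizes are in Mb, and the formula is taken from the slide rather than from Oracle documentation (the parameter is hidden and undocumented).

```python
def smm_max_size_mb(pga_aggregate_target_mb, pga_max_size_mb=200):
    """Pre-10gR2 derived default, as stated on the slide:
    LEAST(5% of PGA_AGGREGATE_TARGET, 50% of _PGA_MAX_SIZE)."""
    five_percent_of_target = pga_aggregate_target_mb * 5 / 100
    half_of_pga_max = pga_max_size_mb * 50 / 100
    return min(five_percent_of_target, half_of_pga_max)


for target in (1000, 2000, 3000):
    print(target, smm_max_size_mb(target))
```

With _PGA_MAX_SIZE at its 200Mb default, a 1000Mb target allows 50Mb per workarea, and anything from 2000Mb upwards is capped at 100Mb, which is the limit the slide calls out.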
  • 28. Auto PGA Parameters: Pre 10gR2
      _SMM_PX_MAX_SIZE
        - Limit for all the parallel slaves of a single workarea operation
        - Derived default: 30% of PGA_AGGREGATE_TARGET
        - Hidden / Undocumented Parameter: usual caveats apply
      Parallel slaves are still limited by _SMM_MAX_SIZE
      Impacts only when…
        - PGA_AGGREGATE_TARGET = 3000Mb
        - _PGA_MAX_SIZE = 200Mb
        - _SMM_MAX_SIZE = 100Mb
        - _SMM_PX_MAX_SIZE = 900Mb
      [Figure: with 8 parallel sessions each gets 100Mb (800Mb in total); with 12 parallel sessions each gets 75Mb (900Mb in total)]
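The two session layouts in this slide's figure follow from combining both limits. The sketch below is one reading of the slide, not a documented Oracle formula: `px_workarea_per_slave_mb` is a hypothetical name, and it assumes the shared 30% limit is divided evenly across the slaves.

```python
def px_workarea_per_slave_mb(pga_aggregate_target_mb, dop, smm_max_size_mb=100):
    """Per-slave workarea limit pre-10gR2, as read from the slide:
    each slave is capped by _SMM_MAX_SIZE, and all slaves together
    are capped by _SMM_PX_MAX_SIZE (30% of PGA_AGGREGATE_TARGET)."""
    smm_px_max_size_mb = pga_aggregate_target_mb * 30 / 100
    return min(smm_max_size_mb, smm_px_max_size_mb / dop)


print(px_workarea_per_slave_mb(3000, 8))   # 8 slaves:  100 each, 800 in total
print(px_workarea_per_slave_mb(3000, 12))  # 12 slaves: 75.0 each, 900 in total
```

With these settings the parallel limit only starts to bite once the degree of parallelism exceeds 9, because 900Mb / 9 is exactly the 100Mb single-process limit.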
  • 29. 10gR2 Improvements
      _SMM_MAX_SIZE now the driver
        - More advanced algorithm:
            PGA_AGGREGATE_TARGET   _SMM_MAX_SIZE
            <= 500Mb               20% * PGA_AGGREGATE_TARGET
            500Mb - 1000Mb         100Mb
            1000Mb +               10% * PGA_AGGREGATE_TARGET
        - _PGA_MAX_SIZE = 2 * _SMM_MAX_SIZE
      Parallel operations
        - _SMM_PX_MAX_SIZE = 50% * PGA_AGGREGATE_TARGET
        - When DOP <= 5 then _smm_max_size is used
        - When DOP > 5 then _smm_px_max_size / DOP is used
      Jože Senegačnik - Advanced Management Of Working Areas In Oracle 9i/10g, presented at UKOUG 2005
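The 10gR2 rules on this slide can be written down as a small calculator. This is a sketch of the slide's table only; the function names are hypothetical, sizes are in Mb, and the handling of the exact 500Mb and 1000Mb boundaries is an assumption (the tiers give the same answer at both boundaries either way).

```python
def smm_max_size_10gr2_mb(pga_aggregate_target_mb):
    """_SMM_MAX_SIZE in 10gR2, following the three tiers on the slide."""
    if pga_aggregate_target_mb <= 500:
        return pga_aggregate_target_mb * 20 / 100
    if pga_aggregate_target_mb <= 1000:
        return 100.0
    return pga_aggregate_target_mb * 10 / 100


def pga_max_size_10gr2_mb(pga_aggregate_target_mb):
    """_PGA_MAX_SIZE = 2 * _SMM_MAX_SIZE."""
    return 2 * smm_max_size_10gr2_mb(pga_aggregate_target_mb)


def px_workarea_limit_10gr2_mb(pga_aggregate_target_mb, dop):
    """Per-slave limit: _smm_max_size up to DOP 5, else _smm_px_max_size / DOP."""
    if dop <= 5:
        return smm_max_size_10gr2_mb(pga_aggregate_target_mb)
    smm_px_max_size_mb = pga_aggregate_target_mb * 50 / 100
    return smm_px_max_size_mb / dop


print(smm_max_size_10gr2_mb(3000))           # 300.0
print(px_workarea_limit_10gr2_mb(3000, 10))  # 150.0
```

Compared with the pre-10gR2 rules, a 3000Mb target now allows a 300Mb serial workarea instead of 100Mb, without touching any hidden parameter.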
  • 30. PGA Target Advisor
      select trunc(pga_target_for_estimate/1024/1024) pga_target_for_estimate
      ,      to_char(pga_target_factor * 100,'999.9') ||'%' pga_target_factor
      ,      trunc(bytes_processed/1024/1024) bytes_processed
      ,      trunc(estd_extra_bytes_rw/1024/1024) estd_extra_bytes_rw
      ,      to_char(estd_pga_cache_hit_percentage,'999') || '%' estd_pga_cache_hit_percentage
      ,      estd_overalloc_count
      from   v$pga_target_advice
      /
      PGA Target For  PGA Tgt                   Estimated Extra     Estimated PGA  Estimated
      Estimate Mb     Factor   Bytes Processed  Bytes Read/Written  Cache Hit %    Overallocation Count
      --------------  -------  ---------------  ------------------  -------------  --------------------
               5,376    12.5%        5,884,017           7,279,799            45%                   113
              10,752    25.0%        5,884,017           3,593,510            62%                     8
              21,504    50.0%        5,884,017           3,140,993            65%                     0
              32,256    75.0%        5,884,017           3,104,894            65%                     0
              43,008   100.0%        5,884,017           2,300,826            72%                     0
              51,609   120.0%        5,884,017           2,189,160            73%                     0
              60,211   140.0%        5,884,017           2,189,160            73%                     0
              68,812   160.0%        5,884,017           2,189,160            73%                     0
              77,414   180.0%        5,884,017           2,189,160            73%                     0
              86,016   200.0%        5,884,017           2,189,160            73%                     0
             129,024   300.0%        5,884,017           2,189,160            73%                     0
             172,032   400.0%        5,884,017           2,189,160            73%                     0
             258,048   600.0%        5,884,017           2,189,160            73%                     0
  • 31. Beware Of Temporal Data Affecting The Optimizer
      Slowly Changing Dimensions
        - Cover ranges of time
        - “From” and “To” DATE columns define applicability
        - Need BETWEEN operator to retrieve rows for a reporting point in time:
            SELECT * FROM d_customer
            WHERE '15/01/2005' BETWEEN valid_from AND valid_to
      Month 1 (1st Jan, 2004)
        CUSTOMER:
          CUSTOMER_ID  NAME          CUSTOMER_TYPE
          487438       Jeff Moss     SME
        D_CUSTOMER:
          CUSTOMER_ID  NAME          CUSTOMER_TYPE  VALID_FROM  VALID_TO
          487438       Jeff Moss     SME            01/01/2004
      Month 2 (1st Feb, 2004)
        CUSTOMER:
          CUSTOMER_ID  NAME          CUSTOMER_TYPE
          487438       Jeff Moss     I & C
          839398       Mark Rittman  SME
        D_CUSTOMER:
          CUSTOMER_ID  NAME          CUSTOMER_TYPE  VALID_FROM  VALID_TO
          487438       Jeff Moss     SME            01/01/2004  31/01/2004
          487438       Jeff Moss     I & C          01/02/2004
          839398       Mark Rittman  SME            01/02/2004
  • 32. Dependent Predicates
      When multiple predicates exist, individual selectivities are combined using standard probability math (1):
        - P1 AND P2: S(P1 & P2) = S(P1) * S(P2)
        - P1 OR P2: S(P1 | P2) = S(P1) + S(P2) - [S(P1) * S(P2)]
      Only valid if the predicates are independent, otherwise…
        - Incorrect selectivity estimate
        - Incorrect cardinality estimate
        - Potentially suboptimal execution plan
      BETWEEN is multiple predicates!
      Also known as Correlated Columns (2)
      1 - Wolfgang Breitling, Fallacies Of The Cost Based Optimizer
      2 - Jonathan Lewis, Cost-Based Oracle Fundamentals, Chapter 6
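The two formulas above, and the way they go wrong for a date-range lookup, can be shown with a toy example. Everything here is illustrative: the four (from, to) ranges are made-up integers standing in for dates, and each selectivity is computed exactly from the data, which is not how the optimizer derives its figures from column statistics.

```python
def and_selectivity(s1, s2):
    """S(P1 & P2) = S(P1) * S(P2), valid only for independent predicates."""
    return s1 * s2


def or_selectivity(s1, s2):
    """S(P1 | P2) = S(P1) + S(P2) - S(P1) * S(P2)."""
    return s1 + s2 - s1 * s2


# Hypothetical SCD-style rows: non-overlapping (valid_from, valid_to) ranges.
rows = [(1, 3), (4, 6), (7, 9), (10, 12)]
point = 5  # "point BETWEEN valid_from AND valid_to" is really two predicates

s_from = sum(1 for valid_from, _ in rows if valid_from <= point) / len(rows)
s_to = sum(1 for _, valid_to in rows if valid_to >= point) / len(rows)

estimated_rows = and_selectivity(s_from, s_to) * len(rows)
actual_rows = sum(1 for valid_from, valid_to in rows
                  if valid_from <= point <= valid_to)

print(s_from, s_to)                  # 0.5 0.75
print(estimated_rows, actual_rows)   # 1.5 1
```

Because valid_from and valid_to move together, treating the two halves of the BETWEEN as independent overestimates the row count here; on larger dimension tables the same effect produces the misestimates shown on the following slides.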
  • 33. Some Test Tables… Consider these 3 test tables… 12 records in an SCD type table TEST_12_DISTINCT_TD TEST_2_DISTINCT_TD TEST_1_DISTINCT_TD
  • 34. Optimizer Gets Incorrect Cardinality
      select * from test_1_distinct_td
      where to_date('09-OCT-2005','DD-MON-YYYY') between from_date and to_date;
             KEY NON_KEY_AT FROM_DATE TO_DATE
      ---------- ---------- --------- ---------
               1 Jeff       01-JAN-05 31-DEC-05
               2 Mark       01-FEB-05 31-DEC-05
               3 Doug       01-MAR-05 31-DEC-05
               4 Niall      01-APR-05 31-DEC-05
               5 Tom        01-MAY-05 31-DEC-05
               6 Jonathan   01-JUN-05 31-DEC-05
               7 Lisa       01-JUL-05 31-DEC-05
               8 Cary       01-AUG-05 31-DEC-05
               9 Mogens     01-SEP-05 31-DEC-05
              10 Anjo       01-OCT-05 31-DEC-05
      10 rows selected.
      Execution Plan
      ----------------------------------------------------------------------------------------
      | Id  | Operation         | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
      ----------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT  |                    |    11 |   264 |     3   (0)| 00:00:01 |
      |*  1 |  TABLE ACCESS FULL| TEST_1_DISTINCT_TD |    11 |   264 |     3   (0)| 00:00:01 |
      ----------------------------------------------------------------------------------------
  • 35. …And Again
      select * from test_2_distinct_td
      where to_date('09-OCT-2005','DD-MON-YYYY') between from_date and to_date;
             KEY NON_KEY_AT FROM_DATE TO_DATE
      ---------- ---------- --------- ---------
               7 Lisa       01-JUL-05 31-DEC-05
               8 Cary       01-AUG-05 31-DEC-05
               9 Mogens     01-SEP-05 31-DEC-05
              10 Anjo       01-OCT-05 31-DEC-05
      4 rows selected.
      Execution Plan
      ----------------------------------------------------------------------------------------
      | Id  | Operation         | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
      ----------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT  |                    |    11 |   264 |     3   (0)| 00:00:01 |
      |*  1 |  TABLE ACCESS FULL| TEST_2_DISTINCT_TD |    11 |   264 |     3   (0)| 00:00:01 |
      ----------------------------------------------------------------------------------------
  • 36. …And Again
      select * from test_12_distinct_td
      where to_date('09-OCT-2005','DD-MON-YYYY') between from_date and to_date;
             KEY NON_KEY_AT FROM_DATE TO_DATE
      ---------- ---------- --------- ---------
              10 Anjo       01-OCT-05 31-OCT-05
      1 row selected.
      Execution Plan
      -----------------------------------------------------------------------------------------
      | Id  | Operation         | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
      -----------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT  |                     |     4 |    96 |     3   (0)| 00:00:01 |
      |*  1 |  TABLE ACCESS FULL| TEST_12_DISTINCT_TD |     4 |    96 |     3   (0)| 00:00:01 |
      -----------------------------------------------------------------------------------------
  • 37. Workarounds
      Ignore it
        - If your query still gets the right plan of course!
      Hints
        - Force the optimizer to do as you tell it
      Stored outlines
      Adjust statistics held against the table
        - Affects any SQL that accesses that object
      Optimizer Profile (10g)
        - Offline Optimisation (1)
      Dynamic sampling level 4 or above
        - Samples “single table predicates that reference 2 or more columns”
        - Takes extra time during the parse: minimal, but often worth it
      1 - Jonathan Lewis: Cost-Based Oracle Fundamentals, Chapter 2
  • 38. Dynamic Sampling With A Hint
      select /*+ dynamic_sampling(test_1_distinct_td,4) */ *
      from test_1_distinct_td
      where to_date('09-OCT-2005','DD-MON-YYYY') between from_date and to_date;
             KEY NON_KEY_AT FROM_DATE TO_DATE
      ---------- ---------- --------- ---------
               1 Jeff       01-JAN-05 31-DEC-05
               2 Mark       01-FEB-05 31-DEC-05
               3 Doug       01-MAR-05 31-DEC-05
               4 Niall      01-APR-05 31-DEC-05
               5 Tom        01-MAY-05 31-DEC-05
               6 Jonathan   01-JUN-05 31-DEC-05
               7 Lisa       01-JUL-05 31-DEC-05
               8 Cary       01-AUG-05 31-DEC-05
               9 Mogens     01-SEP-05 31-DEC-05
              10 Anjo       01-OCT-05 31-DEC-05
      10 rows selected.
      Execution Plan
      ----------------------------------------------------------------------------------------
      | Id  | Operation         | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
      ----------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT  |                    |    10 |   240 |     3   (0)| 00:00:01 |
      |*  1 |  TABLE ACCESS FULL| TEST_1_DISTINCT_TD |    10 |   240 |     3   (0)| 00:00:01 |
      ----------------------------------------------------------------------------------------
  • 39. Find Out Where Your Query Is At Data Warehouses are big, big, BIG! Big on rows Big on disk storage Big on hardware Big SQL statements issued Lots of data to scan, join and sort Many operations Long running So where is my long running query at ? No solid answers here, just food for thought…
  • 40. A “Big” Query Execution Plan
      | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
      ----------------------------------------------------------------------
      | 0 | SELECT STATEMENT | | 1 | 124 | | 49722 (10)|
      | 1 | PX COORDINATOR | | | | | |
      | 2 | PX SEND QC (RANDOM) | :TQ20006 | 1 | 124 | | 49722 (10)|
      | 3 | HASH JOIN | | 1 | 124 | | 49722 (10)|
      | 4 | BUFFER SORT | | | | | |
      | 5 | PX RECEIVE | | 207K| 9510K| | 25982 (9)|
      | 6 | PX SEND BROADCAST | :TQ20000 | 207K| 9510K| | 25982 (9)|
      | 7 | VIEW | | 207K| 9510K| | 25982 (9)|
      | 8 | WINDOW SORT | | 207K| 10M| 26M| 25982 (9)|
      | 9 | MERGE JOIN | | 207K| 10M| | 25976 (9)|
      | 10 | TABLE ACCESS BY INDEX ROWID| AML_T_ANALYSIS_DATE | 1 | 22 | | 2 (0)|
      | 11 | INDEX UNIQUE SCAN | AML_I_ANL_PK | 1 | | | 0 (0)|
      | 12 | SORT AGGREGATE | | 1 | 9 | | |
      | 13 | PX COORDINATOR | | | | | |
      | 14 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 9 | | |
      | 15 | SORT AGGREGATE | | 1 | 9 | | |
      | 16 | PX BLOCK ITERATOR | | 1 | 9 | | 2 (0)|
      | 17 | TABLE ACCESS FULL | AML_T_ANALYSIS_DATE | 1 | 9 | | 2 (0)|
      | 18 | FILTER | | | | | |
      | 19 | FILTER | | | | | |
      | 20 | TABLE ACCESS FULL | AML_T_BILLING_ACCOUNT_DIM| 82M| 2371M| | 5457 (5)|
      | 21 | HASH JOIN | | 18M| 1340M| | 23704 (10)|
      | 22 | HASH JOIN | | 10M| 500M| | 17005 (11)|
      | 23 | PX RECEIVE | | 10M| 265M| | 11304 (14)|
      | 24 | PX SEND HASH | :TQ20003 | 10M| 265M| | 11304 (14)|
      | 25 | BUFFER SORT | | 1 | 124 | | |
      | 26 | VIEW | AML_V_MD_CUH_SID | 10M| 265M| | 11304 (14)|
      | 27 | HASH JOIN | | 10M| 337M| | 11304 (14)|
      | 28 | PX RECEIVE | | 17M| 310M| | 5228 (18)|
      | 29 | PX SEND HASH | :TQ20001 | 17M| 310M| | 5228 (18)|
      | 30 | PX BLOCK ITERATOR | | 17M| 310M| | 5228 (18)|
      | 31 | TABLE ACCESS FULL | AML_T_MEASURE_DIM | 17M| 310M| | 5228 (18)|
      | 32 | PX RECEIVE | | 34M| 461M| | 5958 (10)|
      | 33 | PX SEND HASH | :TQ20002 | 34M| 461M| | 5958 (10)|
      | 34 | PX BLOCK ITERATOR | | 34M| 461M| | 5958 (10)|
      | 35 | TABLE ACCESS FULL | AML_T_CUSTOMER_DIM | 34M| 461M| | 5958 (10)|
      | 36 | PX RECEIVE | | 55M| 1212M| | 5562 (3)|
      | 37 | PX SEND HASH | :TQ20004 | 55M| 1212M| | 5562 (3)|
      | 38 | PX BLOCK ITERATOR | | 55M| 1212M| | 5562 (3)|
      | 39 | TABLE ACCESS FULL | AML_T_CUSTOMER_DIM | 55M| 1212M| | 5562 (3)|
      | 40 | PX RECEIVE | | 94M| 2516M| | 6483 (5)|
      | 41 | PX SEND HASH | :TQ20005 | 94M| 2516M| | 6483 (5)|
      | 42 | PX BLOCK ITERATOR | | 94M| 2516M| | 6483 (5)|
      | 43 | MAT_VIEW ACCESS FULL | AML_M_CD_BAD | 94M| 2516M| | 6483 (5)|
      Features called out on the plan: Sorts, Aggregations, Hash joins, Merge joins, Table scans, Materialized View scans, Analytics, Parallel Query, Pruning, Temp Space Use
  • 41. V$ Views To The Rescue?
      V$SESSION: identify your session
        - Columns: SID, SERIAL#, PROGRAM, USERNAME, SQL_ID, SQL_CHILD_NUMBER, SQL_ADDRESS, SQL_HASH_VALUE
      V$SQL_PLAN: get the execution plan operations
        - Columns: SQL_ID, CHILD_NUMBER, ADDRESS, HASH_VALUE, OPERATION, ID, PARENT_ID
      V$SQL_WORKAREA: get all the work areas which will be required
        - Columns: SQL_ID, CHILD_NUMBER, WORKAREA_ADDRESS, OPERATION_ID, OPERATION_TYPE
      V$SESSION_LONGOPS: get information on long plan operations
        - Columns: SID, SERIAL#, OPNAME, TARGET, MESSAGE, SQL_ID, SQL_ADDRESS, SQL_HASH_VALUE, ELAPSED_SECONDS
      V$SQL_WORKAREA_ACTIVE: get the work area(s) being used right now
        - Columns: SQL_ID, SQL_HASH_VALUE, WORKAREA_ADDRESS, OPERATION_ID, OPERATION_TYPE, POLICY, SID, QCSID, ACTIVE_TIME
  • 43. Problems
      V$SQL_PLAN Bug *
        - Service Request: 4990863.992
        - Broken in 10gR1, works in 10gR2
        - PARENT_ID corruption: can’t link rows in this view to their parents as the values are corrupted due to this bug
        - Shows up in TEMP TABLE TRANSFORMATION operations
      Multiple Work Areas can be active…or None
      Some operations are not shown in Long ops
      V$SESSION sql_id may not be the executing cursor
        - E.g. for refreshing Materialized View
      * Test case for bug: http://www.oramoss.demon.co.uk/Code/test_error_v_sql_plan.sql
  • 45. References: Papers
      Table Compression in Oracle 9iR2: A Performance Analysis
      Table Compression in Oracle 9iR2: An Oracle White Paper
      “Fallacies Of The Cost Based Optimizer”, Wolfgang Breitling
      “Scaling To Infinity, Partitioning In Oracle Data Warehouses”, Tim Gorman
      Advanced Management Of Working Areas In Oracle 9i/10g, UKOUG 2005, Jože Senegačnik
      Oracle9i Memory Management: Easier Than Ever, Oracle Open World 2002, Sushil Kumar
      Working with Automatic PGA, Christo Kutrovsky
      Optimising Oracle9i Instance Memory, Ramaswamy, Ramesh
      Oracle Metalink Note 223730.1: Automatic PGA Memory Management in 9i
      Oracle Metalink Note 147806.1: Oracle9i New Feature: Automated SQL Execution Memory Management
      Oracle Metalink Note 148346.1: Oracle9i Monitoring Automated SQL Execution Memory Management
      Memory Management and Latching Improvements in Oracle Database 9i and 10g, Oracle Open World 2005, Tanel Pőder
      If Your Memory Serves You Right…, IOUG Live! 2004, April 2004, Toronto, Canada, Richmond Shee
      Decision Speed: Table Compression In Action
  • 46. References: Online Presentation / Code
      http://www.oramoss.demon.co.uk/presentations/fivetuningtipsforyourdatawarehouse.ppt
      http://www.oramoss.demon.co.uk/Code/mgmt_p_get_max_compression_order.prc
      http://www.oramoss.demon.co.uk/Code/test_dml_performance_delete.sql
      http://www.oramoss.demon.co.uk/Code/test_dml_performance_insert.sql
      http://www.oramoss.demon.co.uk/Code/test_dml_performance_update.sql
      http://www.oramoss.demon.co.uk/Code/test_error_v_sql_plan.sql
      http://www.oramoss.demon.co.uk/Code/run_big_query.sql
      http://www.oramoss.demon.co.uk/Code/run_big_query_parallel.sql
      http://www.oramoss.demon.co.uk/Code/get_query_progress.sql