Spool Space in Teradata

               - Nazir Iqbal
                 07-March-2012




Table of Contents

  1.   Introduction

       1.1   Spool Space
       1.2   Spool Space and Capacity Planning
       1.3   Spool Space Categories
       1.4   Spool Space Allocation

  2.   Causes of spool space error and how to minimize it

  3.   Know the data

  4.   Primary Index

  5.   Multiset or Set table

  6.   Collect Statistics

  7.   Skewing

  8.   Conclusion

1. INTRODUCTION:

1.1 SPOOL SPACE:

TERADATA Spool Space is unused Perm Space that is used for running queries. Spool Space holds
intermediate rows during processing and the rows in the answer set of a transaction.
TERADATA recommends that 20% of the available perm space be allocated for Spool Space, but this
varies across applications.

In the majority of cases, well written SQL queries should not use huge amounts of spool space. A
poor choice of join column, product joins and a lack of statistics are the main reasons for excessive
spool space consumption. Each user can be given a spool space limit. In later versions of TERADATA,
this is often set in the user's profile.

An insufficient spool error is usually the result of poor table design, poor data distribution, or a
poorly written query. Running out of Spool Space gives the user error code 2646.


1.2 SPOOL SPACE AND CAPACITY PLANNING:

Spool Space and Capacity Planning are mutually dependent concepts.

Spool space is critical to the operation of Teradata RDBMS, yet it is frequently overlooked in
capacity planning. Size requirements vary from user to user, table to table and application to
application.
For instance,
• The Spool space of a user holds the response rows of every query run by that user during a
session. Thus, each user needs a spool allocation high enough to contain the biggest anticipated
answer set.
• Tables containing large volumes of data require more available spool space than smaller tables,
because intermediate rows are held in spool space during query execution.




1.3 SPOOL SPACE CATEGORIES:

Spool falls into three categories:

• Volatile: Volatile spool is retained until the transaction completes (unless the table was created
  with ON COMMIT PRESERVE ROWS), the table is dropped manually during the session, the session
  ends, or the Teradata RDBMS resets.
• Intermediate: Intermediate spool results are retained until they are no longer needed. We can
  determine when intermediate spool is flushed by examining the output of an EXPLAIN.
• Output: Output spool results are either the final rows returned in the answer set for a query, or
  rows updated within, inserted into, or deleted from a base table.




1.4 SPOOL SPACE ALLOCATION:

Teradata RDBMS allocates spool space dynamically only from disk cylinders that are not being used
for permanent or temporary data. Permanent, temporary, and spool data blocks cannot co-exist on
the same cylinder. Spool space is not reserved. All unused space in the Teradata RDBMS is
considered available spool space. When spool is released, the file system returns the cylinders it was
using to the free cylinder list.

We allocate spool space for a database, a user, or a user profile, not at the table level.

A SPOOL limit defined in a profile takes effect upon completion of a:
• CREATE/MODIFY USER statement that assigns the profile to a user.
• MODIFY PROFILE statement that changes the spool space limit.
If the user is logged on, the profile specification affects the current session.

Inefficient SQL queries generally cause Capacity Planning and Spool Space Allocation to go
wayward and throw up the Spool Space Error, which is one of the most common errors encountered by
a Teradata SQL programmer.


2. CAUSES OF SPOOL SPACE ERROR AND HOW TO MINIMIZE IT:
When resource thresholds are met, for example when Spool Space is exceeded, either a warning is
given by the DBAs or the query is aborted by them. Different thresholds are set for tactical,
decision support, and ad-hoc scenarios.

High skew is another cause of Spool Space being exceeded.
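
Why skew alone can exhaust spool is easier to see with numbers. The sketch below assumes, as Teradata behaviour is commonly described, that a user's spool limit is divided evenly across the AMPs and that a query fails as soon as any single AMP exceeds its share. The function names, the 40 GB limit and the four-AMP system are hypothetical and chosen only for illustration.

```python
def per_amp_spool_limit(user_limit_bytes: int, num_amps: int) -> float:
    """Share of the user's spool limit available on each AMP (assumed even split)."""
    return user_limit_bytes / num_amps


def first_amp_over_limit(spool_by_amp, user_limit_bytes):
    """Return the index of the first AMP whose spool exceeds its share, or None."""
    limit = per_amp_spool_limit(user_limit_bytes, len(spool_by_amp))
    for amp, used in enumerate(spool_by_amp):
        if used > limit:
            return amp
    return None


GB = 1024 ** 3
user_limit = 40 * GB                         # hypothetical limit: 10 GB per AMP on 4 AMPs

even = [8 * GB, 8 * GB, 8 * GB, 8 * GB]      # 32 GB in total, evenly spread
skewed = [2 * GB, 2 * GB, 2 * GB, 12 * GB]   # only 18 GB in total, but one AMP holds 12 GB

print(first_amp_over_limit(even, user_limit))    # None: every AMP stays under its share
print(first_amp_over_limit(skewed, user_limit))  # 3: the last AMP is over its share
```

Under these assumptions the skewed query fails even though it uses less total spool than the evenly distributed one.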

Not all alerts or warnings indicate there is a problem, as some transactions use high CPU and spool
because of large data volumes and complexity of the code.

Often we have encountered scenarios where a SQL query has been running for a long time.
The reasons may be:
                • Missing or aged statistics.
                • Large product joins.
                • Merge joins where there is a many-to-many relationship.
                • Set tables that should be Multiset.
                • Statistics that reflect zero rows for a table that is not empty.
                • A change in data volume that requires additional statistics and will generate a
                  different explain plan.
                • Unbalanced parentheses.

The key is to know the data before writing SQL code.



3. KNOW THE DATA
Below are a few questions that should always be considered so that SQL code is efficient and
does not exceed the thresholds for Spool Space or CPU.


     1. How many rows exist on the tables in the query?
     2. What columns are we joining on?
     3. Do we need to add filters or additional joins to reduce volume?
     4. How many unique values exist on columns?
     5. How many rows exist on tables that are duplicated?
     6. Queries having derived tables will often show no confidence because the optimizer does not
        know how many rows are in a derived table.
     7. High estimated time can indicate aged stats i.e. stats should be collected again.
     8. What type of join is performed?

        • Product Join: This is a cross join in which every row from one table is joined to every
          row of the second table. The spool file is as large as (No. of rows table_one * No. of
          rows table_two), so large product joins (billions of rows) should be avoided. Product
          joins are most efficient when a SMALL lookup table is duplicated. Product joins are
          inefficient when large fact tables are duplicated (this can indicate aged or missing
          stats).

        • Merge Join: Requires a sort of the spool files. Merge joins are efficient when there is
          not a many-to-many relationship on the columns involved in the join. If there is a
          many-to-many relationship, try to aggregate the columns on one table to reduce the
          volume by creating a volatile table, derived table or work table.

        • Hash Join: The tables do not have to be sorted, and the smaller table can be much larger
          than for a product join. The smaller table/spool is "hashed" into memory. Then the
          larger table is scanned and, for each row, the matching row from the smaller table is
          looked up in the hash table that was created in memory. If the smaller table is broken
          into partitions to fit into memory, the larger table must also be broken into the same
          partitions prior to the join.
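
The product join arithmetic can be made concrete with a short sketch. The row counts and the 100-byte average row width below are made-up figures, and the result is an upper bound before any join condition filters rows out.

```python
def product_join_rows(rows_table_one: int, rows_table_two: int) -> int:
    """Rows produced when every row of one table is paired with every row of the other."""
    return rows_table_one * rows_table_two


def spool_bytes(rows: int, avg_row_bytes: int) -> int:
    """Rough spool size: row count times an assumed average row width."""
    return rows * avg_row_bytes


# A small lookup table against a fact table: manageable.
small = product_join_rows(50, 1_000_000)
# Two large tables: far beyond billions of rows, the case to avoid.
large = product_join_rows(2_000_000, 1_000_000)

print(f"{small:,} rows")
print(f"{large:,} rows")
print(f"{spool_bytes(large, 100) / 1024**4:,.1f} TiB at an assumed 100 bytes per row")
```

Running the same estimate with real row counts before submitting a query is a quick way to judge whether a product join in the EXPLAIN output is harmless or a problem.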


4. PRIMARY INDEX
A poor primary index that distributes data unevenly ("lumpy" distribution) can cause a query to run
for several hours when it should execute in seconds or minutes. Hence, we should choose a single
column or multiple columns that distribute the data evenly across all AMPs.

Eliminate columns from the primary index that have a lot of null values. The rate at which values
change should be low, ideally never. The column(s) should be frequently used in join constraints.
Teradata is a massively parallel system, so a query runs only as fast as its SLOWEST AMP. If the
table joins to a similar table having the same columns, the primary index on both tables should be
the same.




 For example, if AMP-4 holds much more data than AMP-1, AMP-2 and AMP-3, the uneven load can
 cause a Spool Space Error. The primary index should be chosen to avoid such uneven data
 distribution across AMPs.
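
The effect of the primary index choice on distribution can be sketched with a small simulation. Teradata's row hash is proprietary, so MD5 is used here purely as a stand-in; the column names (order_ids, status_flags), the row counts and the four-AMP system are hypothetical.

```python
import hashlib
from collections import Counter


def amp_for(value, num_amps: int) -> int:
    """Map a primary index value to an AMP. MD5 is only a stand-in for Teradata's row hash."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(pi_values, num_amps: int):
    """Count how many rows each AMP would receive for the given primary index values."""
    counts = Counter(amp_for(v, num_amps) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


def max_to_avg(counts) -> float:
    """Busiest AMP relative to the average AMP; 1.0 means perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


NUM_AMPS = 4
order_ids = range(10_000)  # near-unique candidate column
status_flags = ["OPEN" if i % 10 else "CLOSED" for i in range(10_000)]  # two distinct values

by_order_id = rows_per_amp(order_ids, NUM_AMPS)
by_status = rows_per_amp(status_flags, NUM_AMPS)

print(by_order_id, round(max_to_avg(by_order_id), 2))
print(by_status, round(max_to_avg(by_status), 2))
```

Because rows with the same primary index value always hash to the same AMP, a column with only two distinct values can reach at most two AMPs, however many AMPs the system has.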

5. MULTISET OR SET TABLE
 A Set table performs a duplicate row check. If there are a lot of non-unique values for a
 primary index, this can be very CPU intensive. For example, when 2,000 rows share the same
 primary index value, about 2,000,000 duplicate row checks are performed
 (1 + 2 + ... + 1,999 = 1,999,000). This is referred to as chaining. The first record is loaded.
 The next record to load with the same PI value is checked against all the columns of the first
 one to determine whether it is a duplicate. When the third record is loaded, it is checked
 against both the first and the second records, and so on.
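
Counting the comparisons exactly as the loading sequence above describes them gives n(n-1)/2 checks for n rows sharing one primary index value, a figure that grows with the square of n. The helper below is a hypothetical illustration of that count, not a Teradata function.

```python
def duplicate_row_checks(rows_with_same_pi: int) -> int:
    """Comparisons needed when each new row is checked against every row already loaded."""
    n = rows_with_same_pi
    return n * (n - 1) // 2


for n in (10, 500, 2000):
    print(f"{n:>5} rows per PI value -> {duplicate_row_checks(n):>9,} checks")
```

For 10, 500 and 2,000 rows per value this gives 45, 124,750 and 1,999,000 checks respectively, which is why the cost is negligible for nearly unique indexes and significant for highly non-unique ones.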

 A Multiset table allows duplicate rows, so the duplicate row check is omitted. If duplicates can
 be eliminated using a GROUP BY or filtered programmatically, a load to a Multiset table performs
 better.

 A Multiset table having a NUPI (non-unique primary index) with between 500 and 2,000
 occurrences per value is acceptable.

 For tables having a non-unique primary index where there are several hundred or a couple of
 thousand rows for a given primary index value, use a Multiset table.

 For tables having a more unique index, such as 1 to 10 rows for a given primary index value, use
 a Set table.

  Note: the FASTLOAD utility program will not allow duplicates, even if the target table is
  MULTISET.



6. COLLECT STATISTICS
  Poor, missing or aged statistics may cause a Spool Space Error.

  TERADATA recommends that COLLECT STATS should include:
  1. Individual columns in an index.
  2. All columns of a multi-column index collected together, where the combined size is less
     than 16 bytes.
  3. Join columns.
  4. Filter or qualifying columns.
  5. Secondary indexes.

  Statistics are not needed for temp tables that are not joined to other tables and only used
  for staging.

  Be careful NOT to over-collect statistics. If a table is updated by several inserts multiple
  times a day, the statistics do not need to be refreshed after each insert. One collection
  after the last insert is sufficient. For tables that are completely refreshed, the statistics
  are needed after the refresh.

  TOOL TO CHECK THE EFFICIENCY OF A QUERY:

  Run this diagnostic command before the EXPLAIN of the query.
  The statistics that are missing are listed at the bottom of the EXPLAIN output.

  DIAGNOSTIC HELPSTATS ON FOR SESSION;
  EXPLAIN
  <OUR SQL TEXT>

  WE SHOULD NOT COLLECT STATISTICS ON EMPTY TABLES. This will cause the optimizer to
  choose an inefficient path based on the information available to the parser. Statistics should
  be collected when a table is initially loaded and anytime the table’s demographics change by
  more than 10%. After the initial collect statistics on an object, the user can run the
  statement below to refresh the table’s statistics based on the (new) data.

  -- Refreshes all statistics, index and column, that were previously gathered on the table
  COLLECT STATISTICS ON databasename.tablename;

  To see the statistics that exist for a table, run the following:
  HELP STATISTICS databasename.tablename;
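
The collection rules in this section can be summarised as a small decision helper. This is a sketch only: it uses the row count as a simple proxy for a table's demographics, and the function name and its parameters are hypothetical rather than part of any Teradata tool.

```python
def should_collect_stats(rows_at_last_collect, rows_now, threshold=0.10):
    """Apply the rules of thumb above: skip empty tables, collect after the first load,
    and recollect once the row count has changed by more than the threshold."""
    if rows_now == 0:
        return False   # do not collect statistics on an empty table
    if not rows_at_last_collect:
        return True    # never collected, or last collected while the table was empty
    change = abs(rows_now - rows_at_last_collect) / rows_at_last_collect
    return change > threshold


print(should_collect_stats(None, 0))                # False: the table is still empty
print(should_collect_stats(None, 5_000))            # True: initial load
print(should_collect_stats(1_000_000, 1_050_000))   # False: 5% change
print(should_collect_stats(1_000_000, 1_200_000))   # True: 20% change
```

A check of this kind could run once after the last load of the day, which is consistent with the advice not to refresh statistics after every insert.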

7. SKEWING:
  Proper primary index specification should evenly distribute the rows of a table across the
  AMPs. This prevents skewing. The Query Log Information in SQL Assistant and other editors
  tells us about the degree of skewing in a query. A CPU Skew > 50 reflects worst-case scenarios,
  and generally any query having a CPU Skew > 4 is considered poorly performing. Hence, by
  looking at the CPU Skew in the Query Log Information, a programmer can easily identify
  which queries need to be fine-tuned to avoid high skew.
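
  The thresholds above can be applied mechanically once per-AMP CPU times are available. The sketch below assumes that CPU Skew is expressed as the busiest AMP's CPU time divided by the average across all AMPs; the exact formula used by a given query log tool may differ, and the function names and sample figures are hypothetical.

```python
def cpu_skew(cpu_seconds_by_amp) -> float:
    """Busiest AMP's CPU time relative to the average AMP (1.0 means perfectly even).
    Assumed definition; check how your query log tool computes its skew figure."""
    average = sum(cpu_seconds_by_amp) / len(cpu_seconds_by_amp)
    return max(cpu_seconds_by_amp) / average


def classify(skew: float) -> str:
    """Apply the thresholds quoted in the text above."""
    if skew > 50:
        return "worst case"
    if skew > 4:
        return "poor"
    return "acceptable"


balanced = [10.0] * 100          # every AMP does the same amount of work
skewed = [1.0] * 99 + [10.0]     # one AMP does ten times the work of the others

print(round(cpu_skew(balanced), 2), classify(cpu_skew(balanced)))
print(round(cpu_skew(skewed), 2), classify(cpu_skew(skewed)))
```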


8. CONCLUSION:
   Most performance related issues are caused by poor indexing, missing statistics, aged
   statistics, over-collecting statistics, mismatched data types, and missing filters and
   conditions in the WHERE clause. These can be eliminated if we follow the sound SQL practices
   discussed above. The key to efficient SQL coding is good knowledge of the database and an
   understanding of the various join constraints and mappings. Database knowledge, together with
   disciplined use of TERADATA's COLLECT STATISTICS feature, is the key to avoiding the Spool
   Space Error.






