::IBM Informix indexing techniques:
which one to use when?
Eric Vercelletto Session A12
Begooden IT Consulting 4/23/2013 3:35 PM
• Introduction to Response Time measuring
• Identify the relevant indexing techniques
• Describe implementation method
• Confirm/recognize its use by accurate monitoring
• Measure its efficiency in terms of response time and
effective use in the database (sqltrace, sqexplain)
• Identify pros and cons
Agenda / methodology
4/24/2013 Session F12 2
Introduction
• Begooden IT Consulting is an IBM ISV company, mainly
focused on Informix technology services.
• Our 15+ years of experience within Informix Software
France and Portugal helped us acquire in-depth
product knowledge as well as solid field experience.
• Our services include Informix implementation auditing,
performance tuning, issue management, administration
mentoring …
• We also happen to be the Querix reseller for France and
French speaking countries (except Québec and Louisiana)
• The company is based in Pont l’Abbé, Finistère, France
4/24/2013
3
Some basics not to forget about
There are 2 ways to measure response times
• The « cold » measure: the response time is measured just after
starting the engine, when data and index pages are not yet loaded
into Shared Memory IFMX buffers. Disk IO must be performed to
read the data and index pages, which will increase the RT.
• The « hot » measure: RT is measured when data and index pages
are loaded into SHMEM. No or few disk IO => RT is much shorter.
• This point often explains surprising RT differences depending on
how the data is accessed.
• Broad range or DS queries most often access data and/or indexes in
disk pages
• OLTP queries mostly access data and indexes in SHMEM pages
4
Derived thoughts and facts
• Reading data pages and/or index pages on disk always takes
longer than reading them in SHMEM. Full table scans can take minutes or
more, depending on table size
• Reading data pages in SHMEM is very fast. A full scan of a
table in SHMEM takes fractions of seconds or seconds, rarely
more.
• Reading index pages in SHMEM is also very fast. In addition,
due to the B-TREE structure, reading index pages
generally covers more content per read than reading data pages.
• This often makes it difficult to compare the efficiency
of 2 different indexes on the same table when reading in
SHMEM.
5
Derived thoughts and facts (continued)
• When running hot measures on indexes, the differences
can be as low as milliseconds BUT …
• Repeating 3 unnecessary milliseconds millions of times can
make a difference!
• When the Response Times get to such a low level, sqltrace
is the tool you need to understand the query behaviour.
• In certain situations, saving milliseconds on a query will
make the difference. In other situations, saving seconds will
not make the difference.
• A bad response time can be caused by inappropriate
indexing, but it can also be caused by some « unusual »
logic that adds unnecessary work for the
applications and the server.
6
Comparing cold measure with hot measure (1)
• full scan of a mid-sized table tpcc:order_line,
containing 24 million rows
select * from order_line
onstat -g his output
« Cold » read, performed just after oninit -v: many disk pages read, 47.4 secs
« Hot » read, performed after the first scan: zero disk pages read, all buffer reads, 19.4 secs
7
Comparing cold measure with hot measure (2)
• Cold use of a poor selectivity index
select * from order_line where ol_w_id = 10 ( duplicate index on w_id, 50 distinct values)
Cold read: many disk reads, execution time 5.9 secs
Hot read: few disk reads, execution time 1.1 secs
8
BATCHEDREAD_INDEX: description
• This feature has been taken from XPS and
introduced in 11.50xC5.
• The purpose is to optimize index key access
by grouping the reads of many index keys into
large buffers, then fetching the rows associated
with those keys
• This technique brings strong savings in terms of
CPU and IO, therefore reducing Response Time.
• This technique is suitable and efficient for
massive index reads (DS/OLAP), not for pinpoint-
type (OLTP) index access.
9
BATCHEDREAD_INDEX: the test
• We will run the following query against a 30
million row clients table. The table has an
index on ‘lastname’. Row size is 328 bytes
output to /dev/null
select lastname,count(*)
from clients
group by 1
• This query returns 2,188,286 rows
10
BATCHEDREAD_INDEX: facts
• All those response times are measured as « cold »
AUTO_READAHEAD 0
BATCHEDREAD_INDEX 0
• AUTO_READAHEAD 0
BATCHEDREAD_INDEX 1
• AUTO_READAHEAD 1
BATCHEDREAD_INDEX 1
See the difference
11
BATCHEDREAD_INDEX: how?
• BATCHEDREAD_INDEX can be set, as well as
BATCHEDREAD_TABLE, either in the onconfig file
• Or used as an environment variable before
launching the application
export IFX_BATCHEDREAD_INDEX=1
• Or as an SQL statement
SET ENVIRONMENT IFX_BATCHEDREAD_INDEX '1';
• Monitor index scan activity with onstat –g scn
12
Attached or Detached Index?
• The « Antique Informix Disk Layout » used to create the index pages in the same
extents as the data pages for attached indexes. The expected result was
reduced disk IO.
• This layout turned out to be a problem because the data pages were often
located far from the index pages, causing the opposite effect of increasing disk IO.
The official recommendation at that time was to create detached indexes for this
reason.
• Nowadays, index pages are created in a different partition than the data pages,
causing the attached indexes to have the same level of performance as the
detached indexes.
• But.. if you can create the data dbspaces and the index
dbspaces on independent disks and channels, you will increase your disk IO
performance by reducing disk contention (see the example below).
• This gain will be observed mainly during intensive sessions doing massive data
changes.
• Watch the output of onstat –g iof and look for low IO throughput per second.
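• As a hedged sketch (assuming a dedicated index dbspace named idx_dbs1 already exists on separate disks/channels), a detached index is simply created with an IN clause:
create index id_clients_03 on clients(lastname, firstname) in idx_dbs1;
-- the table data stays in its own dbspace; index IO now goes to the disks behind idx_dbs1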
13
Few columns or many columns in the same index?
Key points to consider
• Keep « cold » reads and « hot » reads in mind when
testing the efficiency of an index. Results can be
dramatically different between cold and hot.
• The choice is often a hard-to-reach trade-off, and
definitely a long subject to discuss!
• Many columns in an index can make it more selective, but it
will also consume more CPU/disk resources when updating
keys (see b-tree cleaner tuning, and the sketch after this list)
• Few columns in an index can make it less selective, but it
will consume fewer CPU/disk resources when updating keys
• Integrity constraints are not negotiable, but some integrity
constraints indexes can be negotiated…
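• Illustration only, with a hypothetical orders table, of the narrow vs wide trade-off discussed above:
create index ix_orders_1 on orders(customer_id);              -- fewer columns: cheaper key maintenance, less selective
create index ix_orders_2 on orders(customer_id, order_date);  -- more columns: more selective, more work on key updates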
14
Few columns or many columns?
Techniques to evaluate efficiency
• time dbaccess dbname queryfile gives an
indication of the efficiency of an index, but can be
misleading due to the huge differences between cold
and hot measures.
• onmode –Y sessnum 1 will identify which
index(es) are used, and will also report how many rows
were scanned versus how many rows were
returned
• onstat –g his (sqltrace) will give fine detail
about response time, buffer and disk access, lock waits
etc…
• A complete diagnosis combines the 3 tools (see the sketch below).
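• A minimal command-line sketch combining the 3 tools; dbname, query.sql and the session number are placeholders:
time dbaccess dbname query.sql   # overall elapsed time (beware of cold vs hot runs)
onstat -g ses                    # locate the session number of the running query
onmode -Y <sessnum> 1            # dynamically enable sqexplain output for that session
onstat -g his                    # sqltrace: per-statement RT, buffer/disk reads, lock waits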
15
Few columns or Many columns?
Let’s analyze a real case: one column
16
Rows scanned: 4913
Response time: 0.0368’’
1 column index
buffer reads: 5900
Few columns or many columns?
Same case, index with 2 columns
17
Rows scanned: 106
Response time: 0.0047’’
2 columns index
Buffer reads: 122
Highly duplicated lead columns
indexes: how was life before?
• The Antique Informix Rule stated to avoid multi-
column indexes with low selectivity on the
leading keys, due to poor efficiency.
Ex: warehouse_id,district_id,order_id,order_line
• Querying on order_line required specifying the
lead columns in the query predicate, or creating
another index with order_line as the lead column
• Restructuring indexes following those rules was a
complex, long and risky task, not to mention the
fact that any downtime due to index rebuilding
was poorly accepted by Operations Managers…
18
Index key first & self join : it’s magic!
• The key-first scan was introduced in 7.3. It has been enhanced so
that an index can be used even if the lead columns are not specified
in the where clause
• The index self join technique was introduced in IDS 11.10,
although many DBAs didn’t even notice it!
• By scanning subsets of the poorly selective composite index, the
engine manages to use keys on the non-leading columns as index
filters, effectively turning the index into a highly selective one.
• Hierarchical-like indexes with highly duplicated lead columns now
need no redefinition to be efficient.
• You do not need to build new indexes with highly selective lead
columns; this saves optimizer work and disk space.
• Index self join is enabled by default. You can, if you persist in not
using it, disable it either by setting INDEX_SELFJOIN 0 in onconfig or
with an optimizer directive {+AVOID_INDEX_SJ}
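• A hedged sketch of the per-query directive (assuming, as for the MULTI_INDEX directives shown later, that the table name is passed as argument); the directive goes right after the SELECT keyword:
select {+ AVOID_INDEX_SJ(order_line) } ol_d_id, ol_o_id, avg(ol_quantity), avg(ol_amount)
from order_line
group by 1, 2;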
19
Index self-join: the test
• We will use the order_line TPC-C table, which contains
23,735,211 rows
• The index follows the hierarchy, which was formerly
considered a poor implementation:
ol_w_id: warehouse id (50 distinct values)
ol_d_id: district id (10 distinct values)
ol_o_id: order number ( 9279 distinct values)
ol_number: order line number (14 distinct values)
• The challenging query is
SELECT ol_d_id,ol_o_id,avg(ol_quantity),avg(ol_amount)
FROM order_line
GROUP BY 1,2
ORDER BY 2,3
20
No Self join
• Use onmode -wm INDEX_SELFJOIN=0 to disable self join
21
Index is used, but only key-first
Many rows scanned
Response time: 11.258’’
Self join: find the differences!
22
Key-first + self join access
Rows scanned: =~ 100 times less
RT: 3.313’’
The Antique Informix Rule says:
“you will use only one index per table”
The AIR says:
“you will use only one index per table”
• The Antique Informix Rule stated that only one
index per table could be used
• The optimizer had to choose only one index
among several indexes for the same table,
although several indexes were needed.
• Many not-so-unrealistic query cases had to be
drastically rewritten in order to provide
acceptable response times
• The trick was generally to use a UNION or a
nested query, but the query code readability and
maintainability suffered from that.
24
What A.I.R. obliged you to do
• Generally, the best way to work around the RT
issue was to use either UNION or nested queries
• The trick could be efficient in terms of Response
Time, but the code became more complex to read and
to maintain
• This workaround required heavy changes to the
application code, and needed detailed and
accurate tests to obtain the same results as with
the initial query
25
The optimizer is constantly getting
smarter across releases
• An optimizer enhancement introduced the use
of several indexes on the same table, but only
if the where clauses were linked with the ‘OR’
operator.
• The query path is like a usual INDEX PATH, the
difference being the use of several indexes
26
Measure with INDEX PATH
Use of 3 indexes!
Simple INDEX PATH
Scanned rows: 376,000
RT: 2.489’’
27
Disk reads: 34136
Multi index: different path
33% gain in RT
Multi-index /skip scan enabled
Response Time is shorter
3 indexes used
Disk reads: 1984
28
Multiple indexes:
what should be done?
• Generally, the optimizer correctly decides which path is best
• You can compare the results with the UNION approach, then decide
whether keeping hard-to-maintain code is worth it
• You can nonetheless use optimizer directives to force the access
method, like
{+ AVOID_MULTI_INDEX (clients)}
To force INDEX PATH
• Or
{+ MULTI_INDEX (clients)}
to force the multi index SKIP SCAN path (see the example below)
• It can get tricky to make the choice yourself if AND and OR conditions are
mixed on the involved indexes
• The difference is hardly visible with a hot measure
• Statistics on indexes are very important: the access method can
change depending on them!
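• A hedged sketch on the clients table (column names are hypothetical; the predicates are OR-linked so the multi-index path can apply):
select {+ MULTI_INDEX(clients) } count(*)
from clients
where lastname = 'MARTIN' or firstname = 'PAUL' or zipcode = '29120';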
29
Star join
• Star join is an extension of the MULTI INDEX concept
• It combines this technique with DYNAMIC HASH JOINS
• The technique has been ported from XPS to IDS 11.70
• It is used exclusively for DS/OLAP queries where a FACT
table is the center point of many dimension tables
• Requires PDQPRIORITY (Ultimate Edition or Enterprise
Edition); see the sketch below
• If you consider using Star Join, you are an excellent
candidate to see a demo of Informix Warehouse
Accelerator!
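• Purely as an illustration of the PDQ requirement and of the star-schema query shape (fact and dimension table names are hypothetical), not of the exact plan the optimizer will pick:
set pdqpriority 50;   -- star join is only considered when PDQ is active
select d.region, p.category, sum(f.amount)
from sales f, dim_store d, dim_product p
where f.store_id = d.store_id
and f.product_id = p.product_id
group by 1, 2;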
30
The A.I.R. says:
« you will avoid indexes with too many tree levels »
• OK, but what can I do about it?
My indexes are built from the data they
contain, and little or almost nothing
can be done
• Databases and tables are getting
bigger and bigger, and
splitting/archiving part of the data is
not always an acceptable solution
31
FOREST OF TREES INDEXES
• The forest of trees index type has been
introduced in 11.70 xC1
• It replicates the model of a traditional B-
TREE, with several root nodes instead of
only one root node
• The forest of trees brings benefits when
contention against the root node is observed
32
Reducing the number of b-tree levels
on index « lastname,firstname »
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree
=> The initial number of b-tree levels is 6
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree hash on (lastname) with 10 buckets
=> The number of b-tree levels decreased to 5
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree hash on (lastname) with 100 buckets
=> The number of b-tree levels decreased to 4
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree hash on (lastname) with 1000 buckets
=> The number of b-tree levels decreased to 3
33
Tpcc with regular b-tree indexes
• Index iu_stock_01 has 4 levels
Tpcc result is 14093 tpmC
High contention on
iu_stock_01: 8,704,052 spins
in 4 min
34
Tpcc with FOT on iu_stock_01
• create unique index iu_stock_01 on stock (s_w_id,s_i_id)
using btree in data03 HASH on (s_w_id) with 50 buckets;
• Index iu_stock_01 has now 3 levels
Result grew to 16413 tpmC
Contention on iu_stock_01
decreased from 8,704,000
to 149,600 spins in 4 min
iu_oorder_01 is now a good
candidate for FOT!
35
Main facts on FOT indexes
• FOT is very efficient at reducing contention on index
access => better RT in an OLTP context
• FOT is very efficient at reducing the number of B-TREE levels => better
overall RT
• Ideal for primary keys and foreign keys in a high-
concurrency OLTP context
• Implementation is easy and fast
• Supports main index functionality: ER, PK, FK, b-tree
cleaning…
• Does not support aggregate queries or range scans on the HASH
ON columns
• Also does not support index clustering, index fillfactor and
functional(UDR based) indexes
36
Optimizing big index creation:
PSORT_NPROCS
• The PSORT_NPROCS env variable is used to allocate more
threads to the sort package, which is also used for parallel
index creation.
• Significant performance improvements on index creation
can be obtained on multi-core/multi-processor servers
• It can be used even with non PDQPRIORITY-enabled
editions if the server has more than one core/CPU.
• PSORT_NPROCS can drive up memory consumption:
please check the available memory on the server.
• The onconfig parameter DS_NONPDQ_QUERY_MEM has to
be checked if using PSORT_NPROCS.
37
Optimizing big index creation
DBSPACETEMP or PSORT_DBTEMP
• The DBSPACETEMP environment variable overrides the
onconfig parameter of the same name.
• Generally, raw-device based temp dbspaces offer
better performance than file-system based files.
• PSORT_DBTEMP writes temporary sort files to the
specified file-system directories instead of
DBSPACETEMP.
• It is useful for spreading the temporary sort files across a
wider list of directories mounted on different
spindles
38
PSORT_NPROCS/PSORT_DBTEMP:
facts
• create index id_clients_02 on clients(lastname,firstname)
• unset PSORT_NPROCS
unset PSORT_DBTEMP
=> 13m28.709s
• export PSORT_NPROCS=3
export PSORT_DBTEMP=
/tmp:/ids_chunks/ids_space01:/ids_chunks/ids_space02:/id
s_chunks/ids_space03
=> 6m19
• A ram disk, or even a SSD drive can improve performance a lot:
export PSORT_NPROCS=3
export PSORT_DBTEMP=/mnt/myramdisk
=> 4m22.030s
• To check the environment of the session:
onstat –g env SessionNumber
39
Index disable: What happens?
• Disabling an existing index will prevent the server from using this
index, but it will « remember » the index schema.
• This technique can be applied before executing massive data inserts
or updates, since it alleviates the index key update workload.
• Heavy side effects can be expected: loss of key uniqueness, loss of
performance…
• If you run a query on a disabled index, the optimizer will probably
choose a sequential scan unless a better path is found.
• The index will be shown as ‘disabled’ in dbschema, but will not be
seen in oncheck –pT nor oncheck –pe
• Disabling an index will make its former disk space available in the
dbspace
• Disabling an index is immediate
• Syntax is: set indexes IndexName disabled
40
Index enable: what happens?
• Enabling an index will rebuild the index physically,
with the same definition as before
• Enabling an index takes as much time as creating
the same index
• But the enable statement is simpler to type than the
create index statement :-)
• Plus, you do not have to remember the initial create
index statement :-)
• Syntax is: set indexes IndexName enabled
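• A minimal sketch of the disable / bulk-load / enable cycle around a massive insert (index, table and file names are hypothetical; remember that uniqueness is not checked while the index is disabled):
set indexes ix_orders_2 disabled;            -- immediate, frees the index disk space
load from "orders.unl" insert into orders;   -- massive insert with no index-key maintenance
set indexes ix_orders_2 enabled;             -- rebuilds the index, takes as long as a create index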
41
Digging for more performance:
Disable foreign key indexes
• Many times, foreign key indexes are a part of the same table’s primary
key.
• order_line primary key (ol_w_id,ol_d_id,ol_o_id,ol_number)
order_line foreign key (ol_w_id,ol_d_id,ol_o_id)
• Using INDEX DISABLED in the add constraint statement avoids the
creation of a redundant index, because its structure already exists
in the primary key.
• ALTER TABLE order_line ADD CONSTRAINT(FOREIGN KEY (ol_w_id,ol_d_id,ol_o_id)
REFERENCES oorder(o_w_id,o_d_id,o_id) CONSTRAINT ol2 INDEX DISABLED);
• This implementation will save disk space by avoiding one index
• CPU resources will be saved when updating/deleting/creating index keys,
• and consequently disk IO will also be saved.
• Check that disabling the constraint index has no hidden side effects: a
mistake can have expensive consequences!
42
I need to create a new index,
but users are always connected to the table!
• Sometimes a new index needs to be created, but
the tables are accessed by users or batches.
• IDS 11.10 introduced the possibility of creating an
index without placing an exclusive lock on the table,
known as create index online.
• Users can SELECT, INSERT, UPDATE or DELETE rows
in the table while the index is being created
• Syntax is:
create index id_clients_01 on clients(lastname,firstname) ONLINE
• Drop index online is also available under the same
conditions
43
Create index online:
precautions & restrictions
• Create index online is a complex operation, involving a
table snapshot, a base index build, a catch-up phase and more.
• It will request additional resources, such as disk space, CPU
and memory in order to make the operation safe and as
fast as possible.
• Long transactions may happen: check the logical log size
before diving in
• The index pre-image pool memory size is managed with the
onconfig parameter ONLIDX_MAXMEM, updatable with
onmode –wm
• Not applicable to cluster indexes, UDT columns, or UDR-based
indexes
• Only one create index online per table at a time
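• Relating to ONLIDX_MAXMEM above, a minimal sketch (the value, in KB, is an arbitrary example):
onmode -wm ONLIDX_MAXMEM=10240   # raise the pre-image pool for the running instance only
onmode -wf ONLIDX_MAXMEM=10240   # same change, also persisted in the onconfig file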
44
Index compression
• IDS introduced table compression in 11.50 xC4. This technology is now
used successfully in large database implementations.
• Index compression is a new feature of IDS 12.10. It is based on the
same technology as table compression.
• The principle is to compress the key column values at the b-tree leaf level,
but not the rowids attached to these key values
• Index compression is very effective for indexes having large key values:
names, item names etc…
• The compression dictionary must contain at least 2000 unique key
values
• Index compression is an excellent way to save disk space, and …
• Since more key values fit in an index page, more key values can be read
in one IO cycle => IO is more efficient
• Reducing IO should enhance index access performance in large queries
45
Index compression:
Disk space gained
• Execute function task ("index compress", "id_clients_01", "staging");
• Or
execute function task("index compress", "j", "testdb");
• Or
create index id_clients_01 on clients(lastname,firstname) compressed
More than 50% compression rate
46
Cluster index
• Creating a cluster index, or altering an index to cluster, physically sorts
the table data according to that index at creation
time
• Accessing a table data with a cluster index will read already
sorted data pages.
• Generally makes IO on data pages easier because they are
contiguous => decreases RT
• The clustering level will degrade as new rows are
inserted
• High cost of administration: re-clustering this index will
rewrite the table data pages
• A cluster index can be good for stable tables accessed in an
ordered, sequential way
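• A minimal sketch on the clients table; re-clustering rewrites the data pages, so keep it for a maintenance window:
create cluster index ic_clients_01 on clients(lastname, firstname);  -- sorts the table data at creation time
alter index ic_clients_01 to not cluster;  -- drops the clustering attribute, keeps the index
alter index ic_clients_01 to cluster;      -- re-clusters later: rewrites the table data pages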
47
Statistics on indexes
• Introduced in 11.70: when one creates an index,
the distributions for this index are automatically
created
• HIGH mode statistics are generated for the lead
column
• Index-level statistics are also generated in LOW
mode
• This does not exempt you from regularly updating
statistics for those indexes, but it is no longer
required to do it right after index creation (see the example below)
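• A minimal sketch of that regular maintenance, reusing the clients index from the earlier slides:
update statistics high for table clients(lastname) distributions only;  -- refresh the lead-column distributions
update statistics low for table clients;                                -- refresh table and index-level statistics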
Questions?
Indexing techniques: which one to use when
Eric Vercelletto Begooden IT Consulting eric.vercelletto@begooden-it.com