SQL Server Data Indexing
Clustered Tables vs Heap Tables
Heap Tables
• If a table has no indexes, or only has nonclustered indexes, it is called a heap.
An age-old question is whether or not a table must have a clustered index. The
answer is no, but in most cases it is a good idea to have a clustered index on the
table to store the data in a specific order.
Clustered Tables
• As the name suggests, these tables have a clustered index. Data is stored in a
specific order based on the clustered index key.
Clustered Tables vs Heap Tables
HEAP
• Data is not stored in any particular
order
• Specific data cannot be retrieved
quickly unless there are also
nonclustered indexes.
• Data pages are not linked, so
sequential access needs to refer back
to the index allocation map (IAM)
pages
• Since there is no clustered index,
additional time is not needed to
maintain the index
• Since there is no clustered index, there
is not the need for additional space to
store the clustered index tree
• These tables have an index_id value of 0
in the sys.indexes catalog view
Clustered Table
• Data is stored in order based on the
clustered index key
• Data can be retrieved quickly based on the
clustered index key, if the query uses the
indexed columns
• Data pages are linked for faster sequential
access
• Additional time is needed to maintain
clustered index based on INSERTS,
UPDATES and DELETES
• Additional space is needed to store
clustered index tree
• These tables have an index_id value of 1 in
the sys.indexes catalog view
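The index_id values above can be checked directly. A minimal sketch, assuming it is run inside a user database; it relies only on the sys.tables and sys.indexes catalog views:

```sql
-- List every user table and whether it is stored as a heap (index_id = 0)
-- or as a clustered table (index_id = 1).
SELECT
    SCHEMA_NAME(t.schema_id) AS schema_name,
    t.name                   AS table_name,
    i.index_id,
    i.type_desc              -- 'HEAP' or 'CLUSTERED' for rowstore tables
FROM sys.tables  AS t
JOIN sys.indexes AS i
    ON i.object_id = t.object_id
WHERE i.index_id IN (0, 1)
ORDER BY schema_name, table_name;
```

Every table returns exactly one row, because a table is either a heap or a clustered table, never both.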
Types of Indexes
• Clustered index
• Nonclustered index
• Unique index
• Filtered index
• Covering index
• Columnstore index
• Non-Key Index Columns
• Implied indexes
Created by some constraints
i. Primary Key
ii. Unique
• Full-text index
A special type of token-based functional index that is built and maintained by
the Microsoft Full-Text Engine for SQL Server. It provides efficient support for
sophisticated word searches in character string data.
• Spatial index
A spatial index provides the ability to perform certain operations more
efficiently on spatial objects (spatial data) in a column of the geometry or
geography data type.
SQL Server Index Basics
Clustered Index
• The top-most node of this tree is called
the "root node"
• The bottom level of the nodes is called
"leaf nodes"
• Any index level between the root node
and leaf node is called an "intermediate
level"
• The leaf nodes contain the data pages of
the table in the case of a clustered index.
• The root and intermediate nodes
contain index pages holding index
rows.
• Each index row contains a key value and
pointer to intermediate level pages of
the B-tree or leaf level of the index.
• The pages in each level of the index are
linked in a doubly-linked list.
Clustered Index
[Figure: clustered index B-tree. Root node: Abby, Bob, Carol, Dave. Intermediate node: Abby, Ada, Andy, Ann. Leaf node (the data pages): Ada, Alan, Amanda, Amy.]
• A clustered index
sorts and stores the
data rows of the table
or view in order based
on the clustered index
key.
• The clustered index is
implemented as a B-
tree index structure
that supports fast
retrieval of the rows,
based on their
clustered index key
values.
The basic syntax to create a clustered index is
CREATE CLUSTERED INDEX Index_Name ON Schema.TableName(Column);
• A clustered index stores the data for the table based on the columns defined in the
create index statement. As such, only one clustered index can be defined for the
table because the data can only be stored and sorted one way per table.
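As a concrete instance of the syntax above, here is a minimal sketch. The table dbo.Employee, its columns, and the index name are hypothetical and chosen only for illustration:

```sql
-- Hypothetical table used for illustration; it starts out as a heap.
CREATE TABLE dbo.Employee
(
    EmployeeID int         NOT NULL,
    LastName   varchar(50) NOT NULL,
    FirstName  varchar(50) NOT NULL
);

-- Rows are now stored in EmployeeID order; the leaf level of this index
-- is the table data itself.
CREATE CLUSTERED INDEX CIX_Employee_EmployeeID
    ON dbo.Employee (EmployeeID);

-- A second clustered index on the same table is rejected by SQL Server,
-- so this statement is left commented out:
-- CREATE CLUSTERED INDEX CIX_Employee_LastName ON dbo.Employee (LastName);
```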
Nonclustered Index
• Index Leaf Nodes and Corresponding Table Data
• Each index entry consists of the
indexed columns (the key) and
refers to the corresponding table
row through a row locator (a row
identifier, RID, when the table is a heap).
• Unlike the index, the table data in a
heap is not sorted at all.
• In a heap there is neither a relationship
between the rows stored in the
same data page nor any
connection between the pages.
Nonclustered Index
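[Figure: nonclustered index B-tree. Root node: Abby, Bob, Carol, Dave. Intermediate node: Abby, Ada, Andy, Ann. Leaf node: Ada, Alan, Amanda, Amy, each entry pointing to a row in the table data (Amy, Ada, Amanda, Alan), which is stored separately from the index.]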
• A nonclustered index can be
defined on a table or view
with a clustered index or on a
heap.
• Each index row in the
nonclustered index contains
the nonclustered key value
and a row locator
The basic syntax for a nonclustered index is
CREATE INDEX Index_Name ON Schema.TableName(Column);
• SQL Server supports
up to 999
nonclustered
indexes per table.
CLUSTERED VS. NONCLUSTERED INDEXES
• Clustered index: a SQL Server index that sorts and stores data
rows in a table, based on key values.
• Nonclustered index: a SQL Server index which contains a key
value and a pointer to the data in the heap or clustered index.
• The difference between clustered and nonclustered SQL
Server indexes is that
• a clustered index controls the physical order of the data pages.
• The data pages of a clustered index will always include all the columns
in the table, even if you only create the index on one column.
• The column(s) you specify as key columns affect how the pages are
stored in the B-tree index structure
• A nonclustered index does not affect the ordering and storing of the
data
Clustered and Nonclustered Indexes Interact
• Clustered indexes are always unique
– If you don’t specify unique when creating them, SQL Server may
add a “uniqueifier” to the index key
• Only used when there actually is a duplicate
• Adds 4 bytes to the key
• The clustering key is used in nonclustered indexes
– This allows SQL Server to go directly to the record from the
nonclustered index
– If there is no clustered index, a record identifier will be used instead
Leaf node of a clustered index on EmployeeID:
1 Jones John
2 Smith Mary
3 Adams Mark
4 Douglas Susan
Leaf node of a nonclustered index on LastName:
Adams 3
Douglas 4
Jones 1
Smith 2
Clustered and Nonclustered Indexes Interact
(continued)
• Another reason to keep the clustering key small!
• Consider the following query:
SELECT LastName, FirstName
FROM Employee
WHERE LastName = 'Douglas'
• When SQL Server uses the nonclustered index, it
– Traverses the nonclustered index until it finds the desired key
– Picks up the associated clustering key
– Traverses the clustered index to find the data
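The lookup path above can be reproduced with a small sketch. The temp table #Employee and the index name are hypothetical, and the rows mirror the slide's sample data. With so few rows the optimizer may simply scan the clustered index, so treat this as an illustration of the structures rather than a guaranteed plan:

```sql
-- Hypothetical temp table mirroring the slide's example data.
CREATE TABLE #Employee
(
    EmployeeID int         NOT NULL PRIMARY KEY CLUSTERED,  -- clustering key
    LastName   varchar(50) NOT NULL,
    FirstName  varchar(50) NOT NULL
);

INSERT INTO #Employee (EmployeeID, LastName, FirstName)
VALUES (1, 'Jones', 'John'), (2, 'Smith', 'Mary'),
       (3, 'Adams', 'Mark'), (4, 'Douglas', 'Susan');

-- Leaf rows of this index hold LastName plus the clustering key (EmployeeID).
CREATE NONCLUSTERED INDEX IX_Employee_LastName
    ON #Employee (LastName);

-- FirstName is not stored in the nonclustered index, so when that index is
-- used SQL Server follows EmployeeID into the clustered index (a key lookup).
SELECT LastName, FirstName
FROM #Employee
WHERE LastName = 'Douglas';

DROP TABLE #Employee;
```

Because every nonclustered leaf row carries the clustering key, a wide clustering key makes every nonclustered index on the table wider as well.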
Deciding what indexes go where?
• Indexes speed access, but are costly to maintain
– Almost every update to a table requires altering both data pages
and every index.
• All inserts and deletions affect all indexes
• Many updates will affect non-clustered indexes
• Sometimes less is more
– Not creating an index sometimes may be best
• Does the code for the transaction have a WHERE clause? Which columns are used?
Is a sort required?
• Selectivity
– Indexes, particularly non-clustered indexes, are primarily beneficial in
situations where there is a reasonably HIGH LEVEL of Selectivity within
the index.
• % of values in column that are unique
• The higher the percentage of unique values, the higher the selectivity
– If 80% of parts are either ‘red’ or ‘green’, an index on color is not very selective
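Selectivity as defined above (distinct values divided by total rows) can be measured before deciding on an index. A minimal sketch, using a hypothetical #Parts table in which 8 of 10 rows are 'red' or 'green':

```sql
-- Hypothetical parts table: 10 rows, 8 of them 'red' or 'green'.
CREATE TABLE #Parts
(
    PartID int         NOT NULL PRIMARY KEY,
    Color  varchar(20) NOT NULL
);

INSERT INTO #Parts (PartID, Color)
VALUES (1, 'red'),   (2, 'red'),   (3, 'red'),   (4, 'red'),
       (5, 'green'), (6, 'green'), (7, 'green'), (8, 'green'),
       (9, 'blue'),  (10, 'black');

-- Selectivity = distinct values / total rows (multiply by 1.0 to avoid
-- integer division).
SELECT
    COUNT(DISTINCT Color)                   AS distinct_colors,
    COUNT(*)                                AS total_rows,
    COUNT(DISTINCT Color)  * 1.0 / COUNT(*) AS color_selectivity,
    COUNT(DISTINCT PartID) * 1.0 / COUNT(*) AS partid_selectivity
FROM #Parts;

DROP TABLE #Parts;
```

Here Color has a selectivity of 0.4 (4 distinct values in 10 rows) while PartID has a selectivity of 1.0, which makes PartID the far better candidate for a nonclustered index.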
Choosing Clustered Index
• Only one per table! - Choose wisely
• By default, a primary key creates a clustered index
– Do you really want your primary key to be the clustered index?
– Option: create table myfooExample
(column1 int identity
primary key nonclustered,
column2 ….
)
– Changing the clustered index can be costly
• How long will it take? Is there enough space?
Clustered Indexes Pros & Cons
• Pros
– Clustered indexes best for queries where columns in question will
frequently be the subject of
• RANGE query (e.g., between)
• Group by with max, min, count
– Search can go straight to particular point in data and just keep reading
sequentially from there.
– Clustered indexes helpful with order by based on clustered key
Clustered Indexes Pros & Cons
• The Cons: two situations
– Don’t put a clustered index on a column just because it seems the thing to do
(e.g., the primary key default)
– Lots of inserts in non-sequential order
• Constant page splits, which involve data pages as well as index pages
• Choose a clustered key whose values are inserted sequentially
• Perhaps don’t use a clustered index at all?
Limits to indexes
There are a few limits to indexes.
• There can be only one clustered index per table.
• SQL Server supports up to 999 nonclustered indexes per table.
• An index key, clustered or nonclustered, can be a maximum of 16 columns and
900 bytes. (SQL Server 2016 and later raise these limits to 32 key columns, and to
1,700 bytes for nonclustered index keys.)
These are limits, not goals. Every index you create takes up space in your
database. The index also needs to be modified when inserts, updates, and
deletes are performed. This leads to CPU and disk overhead, so craft indexes
carefully and test them thoroughly.
PRIMARY KEY AS A CLUSTERED INDEX
• Primary key: a constraint to enforce uniqueness in a table. The primary key
columns cannot hold NULL values.
• In SQL Server, when you create a primary key on a table, if a clustered index
is not defined and a nonclustered index is not specified, a unique clustered
index is created to enforce the constraint.
• However, there is no guarantee that this is the best choice for a clustered
index for that table.
• Make sure you are carefully considering this in your indexing strategy.
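A minimal sketch of overriding the default, so that the primary key is enforced by a nonclustered index and the clustered index goes on a different column. The table dbo.OrderHeader and all object names are hypothetical:

```sql
-- Hypothetical order table. The primary key is enforced by a unique
-- NONCLUSTERED index, leaving the clustered index free for another column.
CREATE TABLE dbo.OrderHeader
(
    OrderID    int      IDENTITY(1,1) NOT NULL,
    OrderDate  datetime NOT NULL,
    CustomerID int      NOT NULL,
    CONSTRAINT PK_OrderHeader PRIMARY KEY NONCLUSTERED (OrderID)
);

-- Cluster on OrderDate instead, for example to support date-range queries.
CREATE CLUSTERED INDEX CIX_OrderHeader_OrderDate
    ON dbo.OrderHeader (OrderDate);
```

Whether OrderDate is a better clustering key than OrderID depends on the workload; the point of the sketch is only that the choice is yours to make.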
Unique Index
• An index that ensures the uniqueness of each value in the indexed column.
• If the index is a composite, the uniqueness is enforced across the columns as a whole,
not on the individual columns.
• For example, if you were to create a unique index on the FirstName and LastName
columns in a table, the names together must be unique, but the
individual names can be duplicated.
• A unique index is automatically created when you define a primary key or unique
constraint:
• Primary key: When you define a primary key constraint on one or more
columns, SQL Server automatically creates a unique, clustered index if a
clustered index does not already exist on the table or view. However, you can
override the default behavior and define a unique, nonclustered index on the
primary key.
• Unique: When you define a unique constraint, SQL Server automatically creates
a unique, nonclustered index. You can specify that a unique clustered index be
created if a clustered index does not already exist on the table.
• A unique index ensures that the index key contains no duplicate values.
Both clustered and nonclustered indexes can be unique.
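The composite case can be sketched as follows. The temp table #Person and the index name are hypothetical:

```sql
-- Hypothetical temp table for illustration.
CREATE TABLE #Person
(
    PersonID  int         NOT NULL PRIMARY KEY,
    FirstName varchar(50) NOT NULL,
    LastName  varchar(50) NOT NULL
);

-- Uniqueness is enforced on the (FirstName, LastName) pair as a whole.
CREATE UNIQUE NONCLUSTERED INDEX UX_Person_Name
    ON #Person (FirstName, LastName);

INSERT INTO #Person (PersonID, FirstName, LastName) VALUES (1, 'Mary', 'Smith');
INSERT INTO #Person (PersonID, FirstName, LastName) VALUES (2, 'Mary', 'Jones');  -- OK: FirstName repeats
INSERT INTO #Person (PersonID, FirstName, LastName) VALUES (3, 'John', 'Smith');  -- OK: LastName repeats

-- This row repeats the full pair and would fail with a duplicate-key error,
-- so it is left commented out:
-- INSERT INTO #Person (PersonID, FirstName, LastName) VALUES (4, 'Mary', 'Smith');

DROP TABLE #Person;
```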
Filtered index
• An optimized nonclustered index, especially suited to cover queries that select from a
well-defined subset of data.
• SQL Server 2008 introduced filtered indexes: a filtered index is an index with a WHERE clause
• Filtered indexes can provide the following advantages over full-table indexes:
• Improved query performance and plan quality
• Reduced index maintenance costs
• Reduced index storage costs
A well-designed filtered index improves query performance and execution plan quality
because it is smaller than a full-table nonclustered index and has filtered statistics
An index is maintained only when data manipulation language (DML) statements affect
the data in the index. A filtered index reduces index maintenance costs compared with a
full-table nonclustered index because it is smaller and is only maintained when the data
in the index is changed.
Creating a filtered index can reduce disk storage for nonclustered indexes when a full-table
index is not necessary.
Filtered index
Design Considerations
• When a column only has a small number of relevant values for queries, you can create a
filtered index on the subset of values. For example, when the values in a column are mostly
NULL and the query selects only from the non-NULL values, you can create a filtered index for
the non-NULL data rows. The resulting index will be smaller and cost less to maintain than a
full-table nonclustered index defined on the same key columns.
• When a table has heterogeneous data rows, you can create a filtered index for one or more
categories of data. This can improve the performance of queries on these data rows by narrowing
the focus of a query to a specific area of the table. Again, the resulting index will be smaller and
cost less to maintain than a full-table nonclustered index.
CREATE NONCLUSTERED INDEX
FIBillOfMaterialsWithEndDate ON
Production.BillOfMaterials (ComponentID,
StartDate) WHERE EndDate IS NOT NULL ;
To ensure that a filtered index is used in a SQL query, name it in an index hint:
SELECT ComponentID, StartDate FROM Production.BillOfMaterials
WITH ( INDEX ( FIBillOfMaterialsWithEndDate ) ) WHERE EndDate
IN ('20000825', '20000908', '20000918');
Covering Indexes
• When a nonclustered index includes all the data requested in a query (both the items
in the SELECT list and the WHERE clause), it is called a covering index
• With a covering index, there is no need to access the actual data pages
– Only the leaf nodes of the nonclustered index are accessed
– For example, your query might retrieve the FirstName, LastName and DOB columns from a
table, based on a value in the ContactID column. You can create a covering index keyed on
ContactID that includes all three columns.
• Because the leaf node of a clustered index is the data itself, a clustered index covers all
queries
Leaf node of a nonclustered index on LastName, FirstName, Birthdate
Adams Mark 1/14/1956 3
Douglas Susan 12/12/1947 4
Jones John 4/15/1967 1
Smith Mary 7/14/1970 2
The last column is EmployeeID.
Remember that the clustering key
is always included in a
nonclustered index.
Non-Key Index Columns
• SQL Server 2005 and later allow you to include columns in a non-clustered
index that are not part of the key
– Allows the index to cover more queries
– Included columns only appear in the leaf level of the index
– Up to 1,023 additional columns
– Can include data types that cannot be key columns
• Except text, ntext, and image data types
• Syntax
CREATE [ UNIQUE ] NONCLUSTERED INDEX index_name
ON <object> ( column [ ASC | DESC ] [ ,...n ] )
[ INCLUDE ( column_name [ ,...n ] ) ]
• Example
CREATE NONCLUSTERED INDEX NameRegion_IDX
ON Employees(LastName)
INCLUDE (Region)
KEY VS. NONKEY COLUMNS
• Key columns: the columns specified to create a clustered or nonclustered index.
• Nonkey columns: columns added to the INCLUDE clause of a nonclustered index.
• The basic syntax to create a nonclustered index with nonkey columns is:
• CREATE INDEX Index_Name ON Schema.TableName(Column) INCLUDE
(ColumnA, ColumnB);
• A column cannot be both a key and a non-key. It is either a key column or a non-
key, included column.
• The difference lies in where the data about the column is stored in the B-tree.
Clustered and nonclustered key columns are stored at every level of the index –
the columns appear on the leaf and all intermediate levels. A nonkey column will
only be stored at the leaf level, however.
• There are benefits to using non-key columns.
• Columns can be accessed with an index scan.
• Data types not allowed in key columns are allowed in nonkey columns. All data
types but text, ntext, and image are allowed.
• Included columns do not count against the 900 byte index key limit enforced by
SQL Server.
COVERING INDEXES EXAMPLES
The query we want to use is
SELECT ProductID, Name, ProductNumber, Color
FROM dbo.Products
WHERE Color = 'Black';
The first index is nonclustered, with two key columns:
CREATE INDEX IX_Products_Name_ProductNumber ON dbo.Products(Name,
ProductNumber);
The second is also nonclustered, with two key columns and three nonkey
columns:
CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON
dbo.Products(Name, ProductNumber)
INCLUDE (Color, Class, Style);
In this case, the first index would not be a covering index for that query because it
does not contain Color. The second index would be a covering index for that specific
query, assuming ProductID is the clustering key and is therefore carried in every
nonclustered index.
Column Store Index Basic
There are two types of storage available in the database; RowStore and ColumnStore.
In RowStore, data rows are placed sequentially on a page while in ColumnStore values
from a single column, but from multiple rows are stored adjacently. So a ColumnStore
Index works using ColumnStore storage.
In SQL Server 2012, we cannot perform DML (Insert, Update, Delete) operations on a
table having a nonclustered ColumnStore Index, because the index puts the data in a
read-only mode. SQL Server 2016 and later allow updates to such tables.
So one big advantage of using this feature is in a Data Warehouse, where most operations
are read only.
Creating Column Store Index
Creating a ColumnStore Index is the same as creating a NonClustered Index except we
need to add the ColumnStore keyword as shown below.
The syntax of a ColumnStore Index is:
CREATE NONCLUSTERED COLUMNSTORE INDEX Index_Name ON Table_Name
(Column1,Column2,... Column N)
Example:
-- Creating a nonclustered ColumnStore Index on 3 columns
CREATE NONCLUSTERED COLUMNSTORE INDEX [ColumnStore__Test_Person] ON
[dbo].[Test_Person]([FirstName], [MiddleName], [LastName])
• In this example, the query cost when using the ColumnStore index is about 4 times
lower than with the traditional non-clustered index. The gain depends on the data and
the query.
Fill Factor
• When you create an index the fill factor option indicates how full the
leaf level pages are when the index is created or rebuilt.
• Valid values are 0 to 100.
• A fill factor of 0 means that all of the leaf level pages are full.
• If data is always inserted at the end of the table, then the fill factor could be
between 90 to 100 percent since the data will never be inserted into the
middle of a page.
• If the data can be inserted anywhere in the table then a fill factor of 60 to 80
percent could be appropriate based on the INSERT, UPDATE and DELETE
activity.
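A minimal sketch of setting a fill factor, both at creation time and on rebuild. The table dbo.FillFactorDemo and the index name are hypothetical, and the value 80 is only an example for keys that arrive in random order:

```sql
-- Hypothetical table and index names, used only for illustration.
CREATE TABLE dbo.FillFactorDemo
(
    CustomerID int         NOT NULL PRIMARY KEY CLUSTERED,
    LastName   varchar(50) NOT NULL
);

-- LastName values arrive in random order, so leave 20% free space
-- in each leaf page to absorb inserts without page splits.
CREATE NONCLUSTERED INDEX IX_FillFactorDemo_LastName
    ON dbo.FillFactorDemo (LastName)
    WITH (FILLFACTOR = 80);

-- The fill factor is applied only when the index is created or rebuilt.
ALTER INDEX IX_FillFactorDemo_LastName
    ON dbo.FillFactorDemo
    REBUILD WITH (FILLFACTOR = 80);

-- Inspect the stored setting; 0 means the default (leaf pages filled completely).
SELECT name, fill_factor
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.FillFactorDemo');
```

The free space is not maintained between rebuilds: pages fill up as rows are inserted, which is why the setting is usually paired with periodic index maintenance.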
Introduction of sql server indexing
How SQL Server Indexes Work
B-Tree Index Data Structure
• SQL Server indexes are based on B-trees
– Special records called nodes that allow keyed access to data
– Two kinds of nodes are special
• Root
• Leaf
[Figure: B-tree with a root node, intermediate nodes, and leaf nodes whose entries point to the data pages.]
• If there are enough records, intermediate levels may be added as well.
• Clustered index leaf-level pages contain the data in the table.
• Nonclustered index leaf-level pages contain the key value and a pointer to the
data row in the clustered index or heap.
SQL Server B-Tree Rules
• Root and intermediate nodes point only to other nodes
• Only leaf nodes point to data
• The number of nodes between the root and any leaf is the same for all leaves
• A B+tree node can hold more than one key. In practice a node typically stores
thousands of keys, so the branching factor of a B+tree is very large.
• B-trees are always sorted
• The tree will be maintained during insertion, deletion, and updating so that these rules are
met
– When records are inserted or updated, nodes may split
– When records are deleted, nodes may be collapsed
• B+trees have all the key values in their leaf nodes. All the leaf nodes of a B+tree are at
the same height, which implies that every index lookup visits the same number of
nodes to find a value.
• Within a B+tree all leaf nodes are linked together in a linked list, left to right, and since
the values at the leaf nodes are sorted, range lookups are very efficient.
What Is a Node?
• A page that contains key and pointer pairs
[Figure: a node drawn as a page holding a column of (Key, Pointer) pairs.]
Splitting a B-Tree Node
[Figure: B-tree before the insert. Root (Level 0): Abby, Bob, Carol, Dave. Node (Level 1): Abby, Ada, Andy, Ann. Leaf (Level 2): Ada, Alan, Amanda, Amy. Leaf entries point to the rows in the database.]
Let’s Add Alice
• Step 1: Split the leaf node
[Figure: the full leaf page (Ada, Alan, Amanda, Amy) splits to make room for Alice, giving the pages Ada, Alan, Alice and Amanda, Amy.]
• Step 2: Split the next level up
[Figure: the Level 1 node gains an entry for the new leaf page (Abby, Ada, Amanda, Andy, Ann), overflows, and splits into Abby, Ada, Amanda and Andy, Ann.]
• Step 3: Split the root
[Figure: the root gains an entry for the new Level 1 node (Abby, Andy, Bob, Carol, Dave), overflows, and splits into Abby, Andy, Bob and Carol, Dave.]
• When the root splits, the tree grows another level
[Figure: B-tree after the insert. Root (Level 0): Abby, Carol. Nodes (Level 1): Abby, Andy, Bob and Carol, Dave. Nodes (Level 2): Abby, Ada, Amanda and Andy, Ann. Leaves (Level 3): Ada, Alan, Alice and Amanda, Amy.]
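The number of levels in a real index can be inspected with the sys.dm_db_index_physical_stats function. A minimal sketch, assuming a table named dbo.Employee exists in the current database (the table name is a placeholder):

```sql
-- One row per level of every index on the (hypothetical) dbo.Employee table.
-- 'DETAILED' mode is needed to see the non-leaf levels.
-- If the table does not exist, OBJECT_ID returns NULL and the function
-- reports on every table in the database instead.
SELECT
    i.name AS index_name,
    ps.index_depth,     -- total number of levels in the index
    ps.index_level,     -- 0 = leaf level; the highest value is the root
    ps.page_count,
    ps.record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Employee'), NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
    ON  i.object_id = ps.object_id
    AND i.index_id  = ps.index_id
ORDER BY i.index_id, ps.index_level DESC;
```

Note that this function numbers levels from the leaf upward, the opposite of the slides, where the root is Level 0.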
Page splits cause fragmentation
• Two types of fragmentation
– Data pages in a clustered table
– Index pages in all indexes
• Fragmentation happens because these pages must be kept in order
• Data page fragmentation happens when a new record must be added to a page that is full
– Consider an Employee table with a clustered index on LastName, FirstName
– A new employee, Peter Dent, is hired
[Figure: one extent before the insert, with data pages holding, in order: Adams, Carol; Ally, Kent; Baccus, Mary; David, Sue; Dulles, Kelly; Edom, Mike; Farly, Lee; Frank, Joe; Ollen, Carol; Oppus, Larry; ...]
Data Page Fragmentation
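[Figure: after the insert. The original extent holds Adams, Carol; Ally, Kent; Baccus, Mary; David, Sue; Dent, Peter; Farly, Lee; Frank, Joe; Ollen, Carol; Oppus, Larry; ... The rows Dulles, Kelly; Edom, Mike; ... have moved to a page in a new extent.]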
Index Fragmentation
• Index page fragmentation occurs when a new key-pointer pair must be
added to an index page that is full
– Consider an Employee table with a nonclustered index on Social
Security Number
• Employee 048-12-9875 is added
[Figure: one extent of index pages before the insert, holding key-pointer pairs in order: 036-11-9987; 036-33-9874; 038-87-8373; 046-11-9987; 048-33-9874; 052-87-8373; 116-11-9987; 116-33-9874; ...; 124-11-9987; 124-33-9874; 125-87-8373.]
Index Fragmentation
(continued)
[Figure: after the insert. The full index page splits: 048-12-9875 is stored after 046-11-9987 in the original extent, and the entries 048-33-9874 and 052-87-8373 move to a page in a new extent.]
How B+tree Indexes Impact Performance
Why use B+tree?
• B+trees are used for an obvious reason: speed.
• Memory is limited, so not all of the data can reside in memory, and the
majority of the data has to be saved on disk.
• A disk is a lot slower than memory because it has
moving parts.
• If there were no tree structure for lookups, then to find a value in a
database, the DBMS would have to do a sequential scan of all the records.
• Now imagine a data size of a billion rows, and you can clearly see that a
sequential scan is going to take very long.
• With a B+tree, it is possible to store a
billion key values (with pointers to a billion
rows) at a height of 3, 4 or 5, so that every
key lookup out of the billion keys
takes 3, 4 or 5 disk accesses, which is a
huge saving.
This shows the effectiveness of a
B+tree index: with 4-byte keys and 16KB pages, more than 16 million key
values can be stored in a B+tree of height
1, and every key value can be reached in
exactly 2 lookups.
How is B+tree structured?
• B+trees are normally structured in such a
way that the size of a node is chosen
according to the page size.
• Why? Because whenever data is accessed
on disk, instead of reading a few bits, a
whole page of data is read, because that is
much cheaper.
• Let us look at an example.
Consider InnoDB, whose page size is 16KB,
and suppose we have an index on an integer
column of size 4 bytes. (SQL Server pages are 8KB, but the reasoning is the same.)
• Ignoring pointer and page overhead, a node can contain at most 16 * 1024 /
4 = 4096 keys, and a node can have at
most 4097 children.
• So for a B+tree of height 1, the root node
has 4096 keys and the nodes at height 1
(the leaf nodes) have 4096 * 4097 =
16781312 key values.
• So the size of the index values has a direct bearing on performance!
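The arithmetic above generalizes to any height. As a sketch under the same idealized assumptions (page size P, key size s, pointer and page-header overhead ignored; the symbols k, h and N_leaf are introduced here only for illustration):

```latex
\begin{align*}
k &= \frac{P}{s} = \frac{16 \times 1024}{4} = 4096
  && \text{keys per node} \\
k + 1 &= 4097
  && \text{children per node} \\
N_{\text{leaf}}(h) &= k\,(k+1)^{h}
  && \text{key values at the leaf level of a tree of height } h \\
N_{\text{leaf}}(1) &= 4096 \times 4097 = 16\,781\,312 \\
N_{\text{leaf}}(2) &= 4096 \times 4097^{2} = 68\,753\,035\,264
\end{align*}
```

Doubling the key size to 8 bytes halves k to 2048, so a tree of height 1 holds only 2048 × 2049 = 4,196,352 leaf keys, roughly a quarter as many. Real indexes also store pointers and page headers, so actual capacities are lower than these figures.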
How important is the size of the index values?
As can be seen from the above example, the size of the index values plays a very
important role for the following reasons:
• The longer the index key, the fewer values fit in a node,
and hence the greater the height of the B+tree.
• The greater the height of the tree, the more disk accesses are needed.
• The more disk accesses, the lower the performance.
Index Design
Index design should take into account a number of considerations.
• For tables that are heavily updated, use as few columns as possible in the
index, and don’t over-index the tables.
• If a table contains a lot of data but data modifications are low, use as many
indexes as necessary to improve query performance.
• For clustered indexes, try to keep the length of the indexed columns as short
as possible. Ideally, try to implement your clustered indexes on unique
columns that do not permit null values.
• The uniqueness of values in a column affects index performance. In general,
the more duplicate values you have in a column, the more poorly the index
performs.
• In addition, indexes are automatically updated when the data rows themselves
are updated, which can lead to additional overhead and can affect
performance.
• Because of the storage and sorting impacts, be sure to carefully determine the best
column for the clustered index.
• The number of columns in the clustered (or non clustered) index can have
significant performance implications with heavy INSERT, UPDATE and DELETE
activity in your database.
• For composite indexes, take into consideration the order of the columns in the
index definition. Columns that will be used in comparison expressions in the
WHERE clause (such as WHERE FirstName = 'Charlie') should be listed first.
• You can also index computed columns if they meet certain requirements. For
example, the expression used to generate the values must be deterministic
(which means it always returns the same result for a specified set of inputs).
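A minimal sketch of indexing a computed column. The table dbo.OrderLine and its columns are hypothetical; the expression is deterministic because the same Quantity and UnitPrice always give the same result. It assumes the session uses the default SET options that indexed computed columns require (for example QUOTED_IDENTIFIER and ANSI_NULLS set to ON):

```sql
-- Hypothetical table for illustration.
CREATE TABLE dbo.OrderLine
(
    OrderLineID int   NOT NULL PRIMARY KEY,
    Quantity    int   NOT NULL,
    UnitPrice   money NOT NULL,
    -- Deterministic expression: it depends only on columns of the same row.
    LineTotal AS (Quantity * UnitPrice) PERSISTED
);

-- The computed column can now be used as an index key.
CREATE NONCLUSTERED INDEX IX_OrderLine_LineTotal
    ON dbo.OrderLine (LineTotal);

-- A query that filters on the computed value can use the index.
SELECT OrderLineID, LineTotal
FROM dbo.OrderLine
WHERE LineTotal > 1000;
```

An expression based on a nondeterministic function such as GETDATE() could not be indexed this way.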
Identifying Fragmentation vs. page splits
DBCC SHOWCONTIG (deprecated; sys.dm_db_index_physical_stats replaces it in SQL Server 2005 and later)
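On current versions, fragmentation is usually read from the sys.dm_db_index_physical_stats function. A minimal sketch that reports every index in the current database, most fragmented first:

```sql
-- Fragmentation of all indexes (and heaps) in the current database.
-- 'LIMITED' is the fastest scan mode and is enough for fragmentation figures.
SELECT
    OBJECT_NAME(ps.object_id)        AS table_name,
    i.name                           AS index_name,   -- NULL for a heap
    ps.index_type_desc,
    ps.avg_fragmentation_in_percent,
    ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON  i.object_id = ps.object_id
    AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

Very small indexes often show high percentages that do not matter in practice, so page_count is worth reading alongside the fragmentation figure.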
Resolving Fragmentation
Heap Tables:
• For heap tables this is not as easy. The following are different options you can
take to resolve the fragmentation:
• Create a clustered index
• Create a new table and insert data from the heap table into the new table
based on some sort order
• Export the data, truncate the table and import the data back into the table
Clustered Tables:
• Resolving the fragmentation for a clustered table can be done easily by
rebuilding or reorganizing your clustered index. This was shown in this
previous tip: SQL Server 2000 to 2005 Crosswalk - Index Rebuilds.
DBCC DBREINDEX
DBCC INDEXDEFRAG
( { database_name | database_id | 0 }
, { table_name | table_id}
, { index_name | index_id }
)
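DBCC DBREINDEX and DBCC INDEXDEFRAG are deprecated; since SQL Server 2005 the same work is done with ALTER INDEX. A minimal sketch, in which the table dbo.MaintDemo and its index are hypothetical and are created only so that the statements can run:

```sql
-- Hypothetical objects, created here only so the statements below can run.
CREATE TABLE dbo.MaintDemo
(
    ID       int         NOT NULL PRIMARY KEY CLUSTERED,
    LastName varchar(50) NOT NULL
);

CREATE NONCLUSTERED INDEX IX_MaintDemo_LastName
    ON dbo.MaintDemo (LastName);

-- Reorganize: defragments the leaf level in place
-- (the counterpart of DBCC INDEXDEFRAG).
ALTER INDEX IX_MaintDemo_LastName ON dbo.MaintDemo REORGANIZE;

-- Rebuild: re-creates the index from scratch
-- (the counterpart of DBCC DBREINDEX).
ALTER INDEX IX_MaintDemo_LastName ON dbo.MaintDemo REBUILD;

-- Rebuild every index on the table, including the clustered index.
ALTER INDEX ALL ON dbo.MaintDemo REBUILD;
```

Reorganizing is the lighter operation and suits moderate fragmentation; rebuilding costs more but also resets the fill factor.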
Introduction of sql server indexing
Mahabubur Rahaman
Senior Database Architect
Orion Informatics Ltd

More Related Content

PPTX
PPTX
Introduction to Data Science
PPTX
Caching
PPTX
Cloud storage presentation.pptx
PDF
Introduction to Microsoft Azure Cloud
PPTX
Machine learning in Cyber Security
PPTX
Ppt on data science
PPTX
Normalization in DBMS
Introduction to Data Science
Caching
Cloud storage presentation.pptx
Introduction to Microsoft Azure Cloud
Machine learning in Cyber Security
Ppt on data science
Normalization in DBMS

What's hot (20)

DOCX
Index in sql server
PPT
SQL select clause
PPTX
Triggers
PDF
Triggers in SQL | Edureka
PPTX
Query Optimization in SQL Server
PPT
MySql slides (ppt)
PPT
Advanced sql
PPTX
Sql server ___________session_17(indexes)
PPT
Sql join
PPTX
Sql subquery
PPT
MYSQL.ppt
PPTX
Basic sql Commands
PPTX
An Introduction To Oracle Database
PPT
Introduction to structured query language (sql)
ODP
Partitioning
PPTX
SQL Join Basic
PPTX
SQL Joins.pptx
PPTX
Sql joins inner join self join outer joins
Index in sql server
SQL select clause
Triggers
Triggers in SQL | Edureka
Query Optimization in SQL Server
MySql slides (ppt)
Advanced sql
Sql server ___________session_17(indexes)
Sql join
Sql subquery
MYSQL.ppt
Basic sql Commands
An Introduction To Oracle Database
Introduction to structured query language (sql)
Partitioning
SQL Join Basic
SQL Joins.pptx
Sql joins inner join self join outer joins
Ad

Similar to Introduction of sql server indexing (20)

PPTX
9. index and index organized table
PDF
Database Indexes
PPTX
File Organization in database management.pptx
PDF
Introduction to NOSQL quadrants
PPTX
dotnetMALAGA - Sql query tuning guidelines
PPTX
Sql performance tuning
PPTX
We Don't Need Roads: A Developers Look Into SQL Server Indexes
PDF
Statistics and Indexes Internals
PDF
SQLDay2013_Denny Cherry - Table indexing for the .NET Developer
PPTX
SAG_Indexing and Query Optimization
PDF
DBMS and SQL Questions and Answers (1).pdf
PPTX
Index
PPTX
Types of no sql databases
PPTX
Lecture 17 (Week 11) - MYSQL INDEXES.pptx
PPT
No sql or Not only SQL
PDF
Indexing and-hashing
PPT
Tunning overview
PPTX
Database_Indexing_AND ITTS TYPES PRESENTATION
PPTX
Sql query performance analysis
PPTX
Database indexing techniques
9. index and index organized table
Database Indexes
File Organization in database management.pptx
Introduction to NOSQL quadrants
dotnetMALAGA - Sql query tuning guidelines
Sql performance tuning
We Don't Need Roads: A Developers Look Into SQL Server Indexes
Statistics and Indexes Internals
SQLDay2013_Denny Cherry - Table indexing for the .NET Developer
SAG_Indexing and Query Optimization
DBMS and SQL Questions and Answers (1).pdf
Index
Types of no sql databases
Lecture 17 (Week 11) - MYSQL INDEXES.pptx
No sql or Not only SQL
Indexing and-hashing
Tunning overview
Database_Indexing_AND ITTS TYPES PRESENTATION
Sql query performance analysis
Database indexing techniques
Ad

More from Mahabubur Rahaman (6)

DOCX
Transaction isolationexamples
DOCX
Lock basicsexamples
PPTX
Sql server concurrency
DOCX
supporting t-sql scripts for IndexPage, Datapage and IndexDefragmentation
DOCX
supporting t-sql scripts for Heap vs clustered table
PPTX
Introduction to Apache Hadoop Ecosystem
Transaction isolationexamples
Lock basicsexamples
Sql server concurrency
supporting t-sql scripts for IndexPage, Datapage and IndexDefragmentation
supporting t-sql scripts for Heap vs clustered table
Introduction to Apache Hadoop Ecosystem

Recently uploaded (20)

PPT
Reliability_Chapter_ presentation 1221.5784
PPTX
climate analysis of Dhaka ,Banglades.pptx
PPTX
STUDY DESIGN details- Lt Col Maksud (21).pptx
PPT
Chapter 2 METAL FORMINGhhhhhhhjjjjmmmmmmmmm
PPTX
IB Computer Science - Internal Assessment.pptx
PPTX
Supervised vs unsupervised machine learning algorithms
PDF
Foundation of Data Science unit number two notes
PDF
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
PPTX
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
PPTX
Computer network topology notes for revision
PPTX
Introduction-to-Cloud-ComputingFinal.pptx
PDF
.pdf is not working space design for the following data for the following dat...
PPTX
Moving the Public Sector (Government) to a Digital Adoption
PPTX
Global journeys: estimating international migration
PDF
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
PDF
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
PPTX
Database Infoormation System (DBIS).pptx
PPTX
advance b rammar.pptxfdgdfgdfsgdfgsdgfdfgdfgsdfgdfgdfg
PDF
Galatica Smart Energy Infrastructure Startup Pitch Deck
PPTX
CEE 2 REPORT G7.pptxbdbshjdgsgjgsjfiuhsd
Reliability_Chapter_ presentation 1221.5784
climate analysis of Dhaka ,Banglades.pptx
STUDY DESIGN details- Lt Col Maksud (21).pptx
Chapter 2 METAL FORMINGhhhhhhhjjjjmmmmmmmmm
IB Computer Science - Internal Assessment.pptx
Supervised vs unsupervised machine learning algorithms
Foundation of Data Science unit number two notes
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
Computer network topology notes for revision
Introduction-to-Cloud-ComputingFinal.pptx
.pdf is not working space design for the following data for the following dat...
Moving the Public Sector (Government) to a Digital Adoption
Global journeys: estimating international migration
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
Database Infoormation System (DBIS).pptx
advance b rammar.pptxfdgdfgdfsgdfgsdgfdfgdfgsdfgdfgdfg
Galatica Smart Energy Infrastructure Startup Pitch Deck
CEE 2 REPORT G7.pptxbdbshjdgsgjgsjfiuhsd

Introduction of sql server indexing

  • 1. SQL Server Data Indexing
  • 2. Clustered Tables vs Heap Tables • Heap tables: if a table has no indexes, or has only nonclustered indexes, it is called a heap. An age-old question is whether a table must have a clustered index. The answer is no, but in most cases it is a good idea to have a clustered index on the table to store the data in a specific order. • Clustered tables: as the name suggests, these tables have a clustered index. Data is stored in a specific order based on the clustered index key.
  • 3. Clustered Tables vs Heap Tables HEAP • Data is not stored in any particular order • Specific data cannot be retrieved quickly unless there are also nonclustered indexes • Data pages are not linked, so sequential access needs to refer back to the index allocation map (IAM) pages • Since there is no clustered index, no additional time is needed to maintain it • Since there is no clustered index, no additional space is needed to store the clustered index tree • These tables have an index_id value of 0 in the sys.indexes catalog view
  • 4. Clustered Tables vs Heap Tables CLUSTERED TABLE • Data is stored in order based on the clustered index key • Data can be retrieved quickly based on the clustered index key, if the query uses the indexed columns • Data pages are linked for faster sequential access • Additional time is needed to maintain the clustered index on INSERTs, UPDATEs and DELETEs • Additional space is needed to store the clustered index tree • These tables have an index_id value of 1 in the sys.indexes catalog view
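The index_id values quoted for heaps and clustered tables can be checked directly in the catalog. The query below is a minimal sketch, assuming it is run in a user database where you may read sys.tables and sys.indexes; the column aliases are illustrative only.

```sql
-- List every user table and show whether it is a heap (index_id = 0)
-- or a clustered table (index_id = 1).
SELECT
    SCHEMA_NAME(t.schema_id) AS table_schema,
    t.name                   AS table_name,
    i.index_id,
    i.type_desc              -- 'HEAP' or 'CLUSTERED' for rowstore tables
FROM sys.tables  AS t
JOIN sys.indexes AS i
    ON i.object_id = t.object_id
WHERE i.index_id IN (0, 1)   -- exactly one such row exists per table
ORDER BY table_schema, table_name;
```

Nonclustered indexes have index_id values of 2 or higher, so they are filtered out here.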
  • 5. Types of Indexes • Clustered index • Nonclustered index • Unique index • Filtered index
  • 6. Types of Indexes • Covering index • Columnstore index • Non-key index columns • Implied indexes, created by some constraints: i. Primary key ii. Unique
  • 7. Types of Indexes • Full-text index: a special type of token-based functional index that is built and maintained by the Microsoft Full-Text Engine for SQL Server. It provides efficient support for sophisticated word searches in character string data. • Spatial index: a spatial index provides the ability to perform certain operations more efficiently on spatial objects (spatial data) in a column of the geometry data type.
  • 9. Clustered Index • The top-most node of the index tree is called the "root node" • The bottom level of nodes is called the "leaf nodes" • Any index level between the root node and the leaf nodes is called an "intermediate level" • In a clustered index, the leaf nodes contain the data pages of the table. • The root and intermediate nodes contain index pages holding index rows. • Each index row contains a key value and a pointer to either an intermediate-level page of the B-tree or a leaf-level page of the index. • The pages in each level of the index are linked in a doubly-linked list.
  • 10. Clustered Index [Figure: B-tree with a root page (Abby, Bob, Carol, Dave), an intermediate page (Abby, Ada, Andy, Ann), and a leaf page (Ada, Alan, Amanda, Amy) that is also the table data.] • A clustered index sorts and stores the data rows of the table or view in order based on the clustered index key. • The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values. • The basic syntax to create a clustered index is CREATE CLUSTERED INDEX Index_Name ON Schema.TableName(Column); • A clustered index stores the data for the table based on the columns defined in the CREATE INDEX statement. As such, only one clustered index can be defined for the table, because the data can only be stored and sorted one way per table.
  • 11. Nonclustered Index • Index Leaf Nodes and Corresponding Table Data • Each index entry consists of the indexed columns (the key, column 2) and refers to the corresponding table row (via ROWID or RID). • Unlike the index, table data stored in a heap structure is not sorted at all. • In a heap there is neither a relationship between the rows stored in the same table block nor any connection between the blocks.
  • 12. Nonclustered Index [Figure: nonclustered index B-tree with a root page (Abby, Bob, Carol, Dave), an intermediate page (Abby, Ada, Andy, Ann), and a leaf page (Ada, Alan, Amanda, Amy) pointing to unordered table rows (Amy, Ada, Amanda, Alan).] • A nonclustered index can be defined on a table or view with a clustered index or on a heap. • Each index row in the nonclustered index contains the nonclustered key value and a row locator. • The basic syntax for a nonclustered index is CREATE INDEX Index_Name ON Schema.TableName(Column); • SQL Server supports up to 999 nonclustered indexes per table.
  • 13. CLUSTERED VS. NONCLUSTERED INDEXES • Clustered index: a SQL Server index that sorts and stores the data rows of a table based on key values. • Nonclustered index: a SQL Server index that contains a key value and a pointer to the data in the heap or clustered index. • The differences between clustered and nonclustered SQL Server indexes: • A clustered index controls the physical order of the data pages. • The data pages of a clustered index always include all the columns in the table, even if you create the index on only one column. • The column(s) you specify as key columns affect how the pages are stored in the B-tree index structure. • A nonclustered index does not affect the ordering and storing of the data.
  • 14. Clustered and Nonclustered Indexes Interact • Clustered index keys are always unique – If you don’t specify UNIQUE when creating the index, SQL Server adds a “uniquifier” to the index key where needed • Only used when there actually is a duplicate • Adds 4 bytes to the key • The clustering key is used in nonclustered indexes – This allows SQL Server to go directly to the record from the nonclustered index – If there is no clustered index, a record identifier (RID) will be used instead • Leaf node of a clustered index on EmployeeID: (1, Jones, John), (2, Smith, Mary), (3, Adams, Mark), (4, Douglas, Susan) • Leaf node of a nonclustered index on LastName: (Adams, 3), (Douglas, 4), (Jones, 1), (Smith, 2)
  • 15. Clustered and Nonclustered Indexes Interact (continued) • Another reason to keep the clustering key small! • Consider the following query: SELECT LastName, FirstName FROM Employee WHERE LastName = 'Douglas' • When SQL Server uses the nonclustered index, it – Traverses the nonclustered index until it finds the desired key – Picks up the associated clustering key – Traverses the clustered index to find the data
  • 16. Deciding what indexes go where? • Indexes speed up access but are costly to maintain – Almost every update to a table requires altering both the data pages and the affected indexes • All inserts and deletions affect all indexes • Many updates will affect nonclustered indexes • Sometimes less is more – Not creating an index may sometimes be best • Questions to ask: Does the transaction code have a WHERE clause? Which columns are used? Is a sort required?
  • 17. Deciding what indexes go where? • Selectivity – Indexes, particularly nonclustered indexes, are primarily beneficial where there is a reasonably HIGH LEVEL of selectivity within the index. • Selectivity is the percentage of values in the column that are unique • The higher the percentage of unique values, the higher the selectivity – If 80% of parts are either ‘red’ or ‘green’, the color column is not very selective
  • 18. Choosing Clustered Index • Only one per table! Choose wisely • By default, a primary key creates a clustered index – Do you really want your primary key to be the clustered index? – Option: CREATE TABLE myfooExample (column1 int IDENTITY PRIMARY KEY NONCLUSTERED, column2 ….) – Changing the clustered index can be costly • How long will it take? Do I have enough space?
  • 19. Clustered Indexes Pros & Cons • Pros – Clustered indexes are best for queries where the columns in question will frequently be the subject of • a range query (e.g., BETWEEN) • GROUP BY with MAX, MIN, or COUNT – The search can go straight to a particular point in the data and keep reading sequentially from there. – Clustered indexes also help with ORDER BY on the clustered key
  • 20. Clustered Indexes Pros & Cons • The Cons – two situations – Don’t put a clustered index on a column just because it seems the thing to do (e.g., the primary key default) – Lots of inserts in non-sequential order • Constant page splits, which affect data pages as well as index pages • Choose a clustered key whose inserts will be sequential • Or perhaps don’t use a clustered index at all?
  • 21. Limits to indexes There are a few limits to indexes. • There can be only one clustered index per table. • SQL Server supports up to 999 nonclustered indexes per table. • An index key – clustered or nonclustered – can have a maximum of 16 columns and 900 bytes (SQL Server 2016 and later raise this to 32 key columns, and to 1,700 bytes for nonclustered index keys). These are limits, not goals. Every index you create takes up space in your database and must be modified when inserts, updates, and deletes are performed. This leads to CPU and disk overhead, so craft indexes carefully and test them thoroughly.
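Selectivity as defined on this slide can be estimated with a simple aggregate query. The sketch below assumes a hypothetical table dbo.Parts with a Color column; both names are illustrative and do not come from the deck.

```sql
-- Ratio of distinct values to total rows: closer to 1 means more selective.
SELECT
    COUNT(DISTINCT Color)                             AS distinct_values,
    COUNT(*)                                          AS total_rows,
    COUNT(DISTINCT Color) * 1.0 / NULLIF(COUNT(*), 0) AS selectivity
FROM dbo.Parts;

-- Distribution per value: reveals skew such as "80% are red or green".
SELECT
    Color,
    COUNT(*)                              AS rows_per_value,
    COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS pct_of_rows
FROM dbo.Parts
GROUP BY Color
ORDER BY rows_per_value DESC;
```

The first query gives a single number, while the second shows whether a few values dominate, which matters even when the distinct count looks reasonable.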
  • 22. PRIMARY KEY AS A CLUSTERED INDEX • Primary key: a constraint to enforce uniqueness in a table. The primary key columns cannot hold NULL values. • In SQL Server, when you create a primary key on a table, if a clustered index is not defined and a nonclustered index is not specified, a unique clustered index is created to enforce the constraint. • However, there is no guarantee that this is the best choice for a clustered index for that table. • Make sure you are carefully considering this in your indexing strategy.
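Overriding the default looks roughly like the following sketch. The table, column, constraint, and index names are hypothetical, and clustering on OrderDate is only an assumed design choice for a range-query workload.

```sql
-- Declare the primary key as NONCLUSTERED so it does not claim
-- the table's single clustered index.
CREATE TABLE dbo.OrderExample
(
    OrderID    int IDENTITY(1,1) NOT NULL,
    OrderDate  datetime2         NOT NULL,
    CustomerID int               NOT NULL,
    CONSTRAINT PK_OrderExample PRIMARY KEY NONCLUSTERED (OrderID)
);

-- The table is a heap at this point. Pick the clustering key deliberately,
-- here a column that range queries are assumed to filter on.
CREATE CLUSTERED INDEX CIX_OrderExample_OrderDate
    ON dbo.OrderExample (OrderDate);
```

Because this clustered index is not declared UNIQUE, SQL Server adds a uniquifier to rows that share an OrderDate value.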
  • 23. Unique Index • An index that ensures the uniqueness of each value in the indexed column. • If the index is a composite, the uniqueness is enforced across the columns as a whole, not on the individual columns. • For example, if you were to create a unique index on the FirstName and LastName columns in a table, the names together must be unique, but the individual names can be duplicated. • A unique index is automatically created when you define a primary key or unique constraint: • Primary key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behavior and define a unique, nonclustered index on the primary key. • Unique: When you define a unique constraint, SQL Server automatically creates a unique, nonclustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table. • A unique index ensures that the index key contains no duplicate values. Both clustered and nonclustered indexes can be unique.
  • 24. Filtered index • An optimized nonclustered index, especially suited to cover queries that select from a well-defined subset of data. • SQL Server 2008 introduced filtered indexes, which are nonclustered indexes defined with a WHERE clause. • Filtered indexes can provide the following advantages over full-table indexes: • Improved query performance and plan quality: a well-designed filtered index improves query performance and execution plan quality because it is smaller than a full-table nonclustered index and has filtered statistics. • Reduced index maintenance costs: an index is maintained only when data manipulation language (DML) statements affect the data in the index. A filtered index reduces index maintenance costs compared with a full-table nonclustered index because it is smaller and is only maintained when the data in the index is changed. • Reduced index storage costs: creating a filtered index can reduce disk storage for nonclustered indexes when a full-table index is not necessary.
  • 25. Filtered index Design Considerations • When a column only has a small number of relevant values for queries, you can create a filtered index on the subset of values. For example, when the values in a column are mostly NULL and the query selects only from the non-NULL values, you can create a filtered index for the non-NULL data rows. The resulting index will be smaller and cost less to maintain than a full-table nonclustered index defined on the same key columns. • When a table has heterogeneous data rows, you can create a filtered index for one or more categories of data. This can improve the performance of queries on these data rows by narrowing the focus of a query to a specific area of the table. Again, the resulting index will be smaller and cost less to maintain than a full-table nonclustered index. • Example of a filtered index on the non-NULL rows: CREATE NONCLUSTERED INDEX FIBillOfMaterialsWithEndDate ON Production.BillOfMaterials (ComponentID, StartDate) WHERE EndDate IS NOT NULL; • To ensure that a filtered index is used in a SQL query, name it in an index hint: SELECT ComponentID, StartDate FROM Production.BillOfMaterials WITH ( INDEX ( FIBillOfMaterialsWithEndDate ) ) WHERE EndDate IN ('20000825', '20000908', '20000918');
  • 26. Covering Indexes • When a nonclustered index includes all the data requested in a query (both the items in the SELECT list and the WHERE clause), it is called a covering index • With a covering index, there is no need to access the actual data pages – Only the leaf nodes of the nonclustered index are accessed – For example, your query might retrieve the FirstName, LastName and DOB columns from a table, based on a value in the ContactID column. You can create a covering index keyed on ContactID that also includes all three columns. • Because the leaf node of a clustered index is the data itself, a clustered index covers all queries • Leaf node of a nonclustered index on LastName, FirstName, Birthdate: (Adams, Mark, 1/14/1956, 3), (Douglas, Susan, 12/12/1947, 4), (Jones, John, 4/15/1967, 1), (Smith, Mary, 7/14/1970, 2). The last column is EmployeeID. Remember that the clustering key is always included in a nonclustered index.
  • 27. Non-Key Index Columns • SQL Server 2005 and later allow you to include columns in a non-clustered index that are not part of the key – Allows the index to cover more queries – Included columns only appear in the leaf level of the index – Up to 1,023 additional columns – Can include data types that cannot be key columns • Except text, ntext, and image data types • Syntax CREATE [ UNIQUE ] NONCLUSTERED INDEX index_name ON <object> ( column [ ASC | DESC ] [ ,...n ] ) [ INCLUDE ( column_name [ ,...n ] ) ] • Example CREATE NONCLUSTERED INDEX NameRegion_IDX ON Employees(LastName) INCLUDE (Region)
  • 28. KEY VS. NONKEY COLUMNS • Key columns: the columns specified to create a clustered or nonclustered index. • Nonkey columns: columns added to the INCLUDE clause of a nonclustered index. • The basic syntax to create a nonclustered index with nonkey columns is: • CREATE INDEX Index_Name ON Schema.TableName(Column) INCLUDE (ColumnA, ColumnB); • A column cannot be both a key and a nonkey column. It is either a key column or a nonkey, included column. • The difference lies in where the data about the column is stored in the B-tree. Clustered and nonclustered key columns are stored at every level of the index – the columns appear on the leaf and all intermediate levels. A nonkey column will only be stored at the leaf level, however. • There are benefits to using nonkey columns. • Columns can be accessed with an index scan. • Data types not allowed in key columns are allowed in nonkey columns. All data types but text, ntext, and image are allowed. • Included columns do not count against the 900 byte index key limit enforced by SQL Server.
  • 29. COVERING INDEXES EXAMPLES The query we want to use is SELECT ProductID, Name, ProductNumber, Color FROM dbo.Products WHERE Color = 'Black'; The first index is nonclustered, with two key columns: CREATE INDEX IX_Products_Name_ProductNumber ON dbo.Products(Name, ProductNumber); The second is also nonclustered, with two key columns and three nonkey columns: CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON dbo.Products(Name, ProductNumber) INCLUDE (Color, Class, Style); In this case, the first index would not be a covering index for that query, because it does not contain Color. The second index would be a covering index for that specific query, assuming ProductID is the clustering key and is therefore carried in every nonclustered index.
  • 30. Column Store Index Basic There are two types of storage available in the database: RowStore and ColumnStore. In RowStore, data rows are placed sequentially on a page, while in ColumnStore, values from a single column but from multiple rows are stored adjacently. A ColumnStore index works using ColumnStore storage. In SQL Server 2012, we cannot perform DML (INSERT, UPDATE, DELETE) operations on a table that has a nonclustered ColumnStore index, because the index puts the data in a read-only mode; later versions lift this restriction (updatable clustered ColumnStore indexes in SQL Server 2014, updatable nonclustered ColumnStore indexes in SQL Server 2016). So one big use for this feature is a data warehouse, where most operations are read-only.
  • 31. Creating Column Store Index Creating a ColumnStore index is the same as creating a nonclustered index, except that we need to add the COLUMNSTORE keyword as shown below. The syntax of a ColumnStore index is: CREATE NONCLUSTERED COLUMNSTORE INDEX Index_Name ON Table_Name (Column1, Column2, ... ColumnN) Example: -- Creating a nonclustered ColumnStore index on 3 columns CREATE NONCLUSTERED COLUMNSTORE INDEX [ColumnStore__Test_Person] ON [dbo].[Test_Person] ([FirstName], [MiddleName], [LastName]) • In this example, the query cost when using the ColumnStore index is about 4 times lower than with the traditional nonclustered index; the actual gain depends on the query and the data.
  • 32. Fill Factor • When you create an index, the fill factor option indicates how full the leaf-level pages are when the index is created or rebuilt. • Valid values are 0 to 100. • A fill factor of 0 means that all of the leaf-level pages are full (0 and 100 are equivalent). • If data is always inserted at the end of the table, then the fill factor could be between 90 and 100 percent, since data will never be inserted into the middle of a page. • If data can be inserted anywhere in the table, then a fill factor of 60 to 80 percent could be appropriate, depending on the INSERT, UPDATE and DELETE activity.
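Fill factor is set through the WITH clause when an index is created or rebuilt. The sketch below assumes a hypothetical dbo.Employee table with a LastName column and a hypothetical index name; the value 80 is simply one point in the 60 to 80 percent range suggested for tables with scattered inserts.

```sql
-- Leave roughly 20% free space on each leaf page at creation time.
CREATE NONCLUSTERED INDEX IX_Employee_LastName
    ON dbo.Employee (LastName)
    WITH (FILLFACTOR = 80);

-- The setting is applied only when the index is built or rebuilt,
-- so a rebuild is what restores the free space later on.
ALTER INDEX IX_Employee_LastName ON dbo.Employee
    REBUILD WITH (FILLFACTOR = 80);
```

SQL Server does not maintain the free space during normal inserts and updates, which is why fill factor is usually paired with periodic index maintenance.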
  • 34. How SQL Server Indexes Work
  • 35. B-Tree Index Data Structure • SQL Server indexes are based on B-trees – Special records called nodes that allow keyed access to data – Two kinds of nodes are special • Root • Leaf • If there are enough records, intermediate levels may be added as well. • Clustered index leaf-level pages contain the data in the table. • Nonclustered index leaf-level pages contain the key value and a pointer to the data row in the clustered index or heap. [Figure: B-tree with a root node, intermediate nodes, leaf nodes, and data pages, drawn with single-letter keys.]
  • 36. SQL Server B-Tree Rules • Root and intermediate nodes point only to other nodes • Only leaf nodes point to data • The number of nodes between the root and any leaf is the same for all leaves • A B+tree node can hold more than one key; in fact, thousands of keys are typically stored in a node, so the branching factor of a B+tree is very large. • B-trees are always sorted • The tree is maintained during insertion, deletion, and updating so that these rules are met – When records are inserted or updated, nodes may split – When records are deleted, nodes may be collapsed • B+trees have all the key values in their leaf nodes. All the leaf nodes of a B+tree are at the same height, which implies that every index lookup takes the same number of node reads to find a value. • Within a B+tree, all leaf nodes are linked together in a linked list, left to right, and since the values at the leaf nodes are sorted, range lookups are very efficient.
  • 37. What Is a Node? • A page that contains key and pointer pairs [Figure: a node drawn as a page holding a series of (Key, Pointer) pairs.]
  • 38. Splitting a B-Tree Node [Figure: three-level B-tree. Root (Level 0): Abby, Bob, Carol, Dave. Node (Level 1): Abby, Ada, Andy, Ann. Leaf (Level 2): Ada, Alan, Amanda, Amy. DB (data rows): Bob, Alan, Amanda, Carol, Amy, Dave, Ada.]
  • 39. Let’s Add Alice • Step 1: Split the leaf node [Figure: Alice is added to the data rows; the leaf would have to hold Ada, Alan, Alice, Amanda, Amy, so it splits.]
  • 40. Adding Alice • Step 2: Split the next level up [Figure: the Level 1 node would have to hold Abby, Ada, Amanda, Andy, Ann, so it splits as well.]
  • 41. Adding Alice (continued) • Step 3: Split the root [Figure: the root would have to hold Abby, Andy, Bob, Carol, Dave, so it splits into Abby, Andy, Bob and Carol, Dave.]
  • 42. Adding Alice (continued) • When the root splits, the tree grows another level [Figure: four-level B-tree. Root (Level 0): Abby, Carol. Node (Level 1): Abby, Andy, Bob and Carol, Dave. Node (Level 2): Abby, Ada, Amanda and Andy, Ann. Leaf (Level 3): Ada, Alan, Alice and Amanda, Amy. DB (data rows): Bob, Alan, Amanda, Carol, Amy, Dave, Ada, Alice.]
  • 43. Page splits cause fragmentation • Two types of fragmentation – Data pages in a clustered table – Index pages in all indexes • Fragmentation happens because these pages must be kept in order • Data page fragmentation happens when a new record must be added to a page that is full – Consider an Employee table with a clustered index on LastName, FirstName – A new employee, Peter Dent, is hired [Figure: one extent holding, in order: Adams, Carol; Ally, Kent; Baccus, Mary; David, Sue; Dulles, Kelly; Edom, Mike; Farly, Lee; Frank, Joe; Ollen, Carol; Oppus, Larry; ...]
  • 44. Data Page Fragmentation [Figure: after the page split, the original extent holds Adams, Carol; Ally, Kent; Baccus, Mary; David, Sue; Dent, Peter; Farly, Lee; Frank, Joe; Ollen, Carol; Oppus, Larry; ... and a second extent holds Dulles, Kelly; Edom, Mike; ...]
  • 45. Index Fragmentation • Index page fragmentation occurs when a new key-pointer pair must be added to an index page that is full – Consider an Employee table with a nonclustered index on Social Security Number • Employee 048-12-9875 is added [Figure: one extent of index pages holding (key, pointer) entries in order: 036-11-9987, 036-33-9874, 038-87-8373, 046-11-9987, 048-33-9874, 052-87-8373, 116-11-9987, 116-33-9874, ..., 124-11-9987, 124-33-9874, 125-87-8373.]
  • 46. Index Fragmentation (continued) [Figure: after the page split, the original extent holds 036-11-9987, 036-33-9874, 038-87-8373, 046-11-9987, 048-12-9875, 116-11-9987, 116-33-9874, ..., 124-11-9987, 124-33-9874, 125-87-8373, and a second extent holds 048-33-9874, 052-87-8373, ...]
  • 48. How B+tree Indexes Impact Performance
  • 49. Why use B+tree? • B+trees are used for one obvious reason: speed. • Memory is limited, so not all of the data can reside in memory, and the majority of it has to be saved on disk. • Disk is a lot slower than memory because it has moving parts. • If there were no tree structure for lookups, then to find a value in a database the DBMS would have to do a sequential scan of all the records. • Now imagine a data size of a billion rows, and you can clearly see that a sequential scan is going to take very long. • With a B+tree, it is possible to store a billion key values (with pointers to a billion rows) at a height of 3, 4 or 5, so that every key lookup among the billion keys takes 3, 4 or 5 disk accesses, which is a huge saving.
  • 50. How is B+tree structured? • B+trees are normally structured so that the size of a node is chosen according to the page size. • Why? Because whenever data is accessed on disk, a whole page of data is read instead of a few bits, because that is much cheaper. • Let us look at an example. Consider InnoDB, whose page size is 16 KB, • and suppose we have an index on an integer column of size 4 bytes. • Then a node can contain at most 16 * 1024 / 4 = 4096 keys, and a node can have at most 4097 children. • So for a B+tree of height 1, the root node has 4096 keys and the nodes at height 1 (the leaf nodes) have 4096 * 4097 = 16781312 key values. This goes to show the effectiveness of a B+tree index: more than 16 million key values can be stored in a B+tree of height 1, and every key value can be accessed in exactly 2 lookups.
  • 51. How important is the size of the index values? As can be seen from the above example, the size of the index values plays a very important role for the following reasons: • The longer the index values, the fewer values fit in a node, and hence the greater the height of the B+tree. • The greater the height of the tree, the more disk accesses are needed. • The more disk accesses, the lower the performance. • So the size of the index values has a direct bearing on performance!
  • 52. Index Design Index design should take into account a number of considerations. • For tables that are heavily updated, use as few columns as possible in the index, and don’t over-index the tables. • If a table contains a lot of data but data modifications are low, use as many indexes as necessary to improve query performance. • For clustered indexes, try to keep the length of the indexed columns as short as possible. Ideally, try to implement your clustered indexes on unique columns that do not permit null values. • The uniqueness of values in a column affects index performance. In general, the more duplicate values you have in a column, the more poorly the index performs.
  • 53. Index Design • In addition, indexes are automatically updated when the data rows themselves are updated, which can lead to additional overhead and can affect performance. • Because a clustered index determines how the table is stored and sorted, be sure to carefully determine the best column for it. • The number of columns in the clustered (or nonclustered) index can have significant performance implications with heavy INSERT, UPDATE and DELETE activity in your database. • For composite indexes, take into consideration the order of the columns in the index definition. Columns that will be used in comparison expressions in the WHERE clause (such as WHERE FirstName = 'Charlie') should be listed first. • You can also index computed columns if they meet certain requirements. For example, the expression used to generate the values must be deterministic (which means it always returns the same result for a specified set of inputs).
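The fan-out arithmetic on this slide can be reproduced in a few lines. This is a sketch of the slide's simplified model only: it ignores child-pointer size and page overhead, as the slide does, and the variable names are illustrative.

```sql
-- Simplified model from the slide: a node is one page filled with keys.
DECLARE @page_bytes int = 16 * 1024;   -- 16 KB page, as in the InnoDB example
DECLARE @key_bytes  int = 4;           -- 4-byte integer key

DECLARE @keys_per_node bigint = @page_bytes / @key_bytes;  -- 4096
DECLARE @children      bigint = @keys_per_node + 1;        -- 4097

SELECT
    @keys_per_node             AS keys_per_node,      -- 4096
    @children                  AS children_per_node,  -- 4097
    @keys_per_node * @children AS leaf_key_values;    -- 16781312
```

Real nodes hold fewer keys than this, because each key is stored with a pointer and every page carries a header, so the figures are an upper bound.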
  • 54. Identifying Fragmentation vs. page splits DBCC SHOWCONTIG (deprecated in later versions of SQL Server in favor of the sys.dm_db_index_physical_stats dynamic management function) Page 283
  • 55. Resolving Fragmentation Heap Tables: • For heap tables this is not as easy. The following are different options you can take to resolve the fragmentation: • Create a clustered index • Create a new table and insert data from the heap table into the new table based on some sort order • Export the data, truncate the table and import the data back into the table Clustered Tables: • Resolving the fragmentation for a clustered table can be done easily by rebuilding or reorganizing your clustered index. This was shown in this previous tip: SQL Server 2000 to 2005 Crosswalk - Index Rebuilds. The SQL Server 2000 commands are DBCC DBREINDEX and DBCC INDEXDEFRAG ( { database_name | database_id | 0 } , { table_name | table_id } , { index_name | index_id } ); from SQL Server 2005 onward, ALTER INDEX ... REBUILD and ALTER INDEX ... REORGANIZE replace them.
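On SQL Server 2005 and later, fragmentation is usually inspected with the sys.dm_db_index_physical_stats function. The query below is a minimal sketch for the current database; the 'LIMITED' scan mode and the column selection are choices made for this illustration, not something the deck prescribes.

```sql
-- Report fragmentation for every index and heap in the current database.
SELECT
    OBJECT_NAME(ips.object_id)       AS table_name,
    i.name                           AS index_name,   -- NULL for a heap
    ips.index_type_desc,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON  i.object_id = ips.object_id
    AND i.index_id  = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

Small indexes often show high percentages that do not matter in practice, so page_count is worth reading alongside the fragmentation figure.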
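With the ALTER INDEX syntax, the same maintenance can be sketched as follows. The table and index names (dbo.Employee, IX_Employee_LastName, dbo.StagingHeap) are hypothetical and assumed to exist.

```sql
-- Rebuild: recreates the index from scratch and removes fragmentation.
ALTER INDEX ALL ON dbo.Employee REBUILD;

-- Reorganize: a lighter, online defragmentation of the leaf level.
ALTER INDEX IX_Employee_LastName ON dbo.Employee REORGANIZE;

-- Heaps can be rebuilt in place on SQL Server 2008 and later,
-- as an alternative to the create-and-reload options listed above.
ALTER TABLE dbo.StagingHeap REBUILD;
```

Which command to use is a judgment call: reorganizing is cheaper and less disruptive, while rebuilding is more thorough and also reapplies the fill factor.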
  • 57. Mahabubur Rahaman Senior Database Architect Orion Informatics Ltd