ISUG Technical Journal
Using direct update to utilize datapage space more effectively and improve performance
Valapet Badri consults as a Sybase
architect, primarily for the financial
industry in New York and New Jersey.
He can be reached at 732-650-1702 or
valapet@iname.com.
The update algorithm used by the Sybase SQL server is a sophisticated mechanism designed to minimize the cost of update operations. It chooses an update strategy based on constraints imposed by the design and on the data values changed by the operation.
One of the primary goals of a
“physical database design” in Sybase
is to promote the use of the update
mechanism that is least expensive to the
SQL server. The update operation called
update-in-place is the least costly type,
performing data modification of a row by
changing the existing row in its existing
physical location. Update-in-place is markedly faster than the other types of updates. Benchmarks have shown more than a three-fold improvement in update performance using update-in-place on a 200-byte-row table with one clustered index and two nonclustered indexes.
Description of the Problem
The basic definition of the problem is as
follows: If a row occupying N bytes of
space is changed by an update operation
to occupy N+n bytes (where n>0), the
database server needs to have a mecha-
nism to provide space for storing the
extra n bytes. This allocation works under design constraints imposed by the database engine, so as not to break the existing allocation structures and the unit of storage. Other design
constraints are imposed to minimize the
cost involved in allocating and maintain-
ing the internal consistency of the
storage and the shuffling of existing rows.
Some of these constraints compete in weight and often work against one another.
Typically, the internal storage
mechanism needs to maintain the rows
within a page contiguously. In such cases,
the problem of allocating extra n bytes
becomes even more difficult. Let’s pre-
sume that Row 1 has been updated and
has increased by n bytes. In order to pro-
vide Row 1 with extra n bytes, Row 2 has
to be moved down n bytes. Thus, we see
that a change in Row 1 has introduced
an overhead of moving the subsequent
rows within a page (a page is the smallest
storage unit in Sybase).
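The row-shift overhead described above can be sketched with a toy model of a page. This is a minimal sketch under simplifying assumptions (a Python list standing in for contiguous bytes); it is not Sybase's actual page format.

```python
# Toy model of a data page: rows stored contiguously, with an offset
# table recording the starting byte of each row. Illustrative only --
# not the real Sybase on-disk layout.
PAGE_SIZE = 2048  # default 2K page

class Page:
    def __init__(self):
        self.rows = []      # row bodies, kept in contiguous storage order
        self.offsets = []   # starting byte number of each row

    def used(self):
        return sum(len(r) for r in self.rows)

    def insert(self, row):
        if self.used() + len(row) > PAGE_SIZE:
            raise OverflowError("page full: would need a page split")
        self.offsets.append(self.used())
        self.rows.append(row)

    def update(self, i, new_row):
        """Grow (or shrink) row i in place; every later row must shift."""
        delta = len(new_row) - len(self.rows[i])
        if self.used() + delta > PAGE_SIZE:
            raise OverflowError("page full: would need a page split")
        self.rows[i] = new_row
        # Shifting rows i+1.. means rewriting their offsets as well.
        for j in range(i + 1, len(self.offsets)):
            self.offsets[j] += delta
        return delta

page = Page()
page.insert(b"a" * 100)       # Row 0
page.insert(b"b" * 100)       # Row 1
before = list(page.offsets)
page.update(0, b"a" * 110)    # Row 0 grows by n = 10 bytes
print(before, page.offsets)   # Row 1's offset has moved down by 10
```

Note how a change to a single row forces bookkeeping work on every row after it: this is exactly the cost the update algorithm tries to avoid.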
Consider the case where the page is
almost full. The only way to provide the
extra space now would be to split the
existing page into two. If indexes are
defined on these rows and the rows get
shifted around, the update algorithm has
to maintain the consistency of the index
pages to reflect the changes made to its
datapages. This illustrates the often-com-
peting costs associated with determining
the optimum way to perform an update
operation.
Update Operations in Sybase SQL Server:
An Analysis of ASE 11.0, 11.5, and 11.9
By Valapet P. Badri
First Quarter 1999
Constraints and Solutions
There are many ways to resolve these competing requirements
and costs. The simplest mechanism is to split a single update
operation into a delete operation of the existing row, followed
by an insert of the new row. This mechanism of changing the
data value reallocates the space needed by the updated (new)
rows every time. This is called a deferred update in Sybase
terminology.
The downside to this simplistic approach is that no
attempt is made to use the existing space occupied by the
deleted row. This introduces the overhead of freeing up space
occupied by the deleted row and reallocation of space for the
newly inserted row. Additionally, we run into transaction
concurrency overheads (locking issues) of any changed pages
between the delete and insert operation, overhead of copying
unchanged columns from the delete to insert buffer, and
subsequent high costs in terms of poor response times and
throughput.
In contrast, consider an update operation that writes only the changes to an existing row. This is a less expensive operation with improved transaction concurrency, and it allows the extra n bytes to be allocated within the existing address space.
This type of update operation is called direct update-in-
place. Sybase version 10.0 and lower placed highly restrictive
conditions on performing an update-in-place operation.
However, later versions have been flexible enough to address
the performance impact by relaxing the rules for update-in-
place operation.
There are primarily two conditions under which a direct update-in-place is not allowed:
• The updated row has columns that do not fit into the existing space (meaning that a varchar column has increased in size or a NULLable column has changed from a NULL value to some finite value)
• Certain rules in the server explicitly require a delete and an insert record to be written to syslogs. The Replication Server (via its LTM, or Log Transfer Manager) requires the update to be broken into delete and insert operations and written to the transaction log (syslogs) for its use in replication to a remote system.
Let us now discuss the above-mentioned design constraints
introduced by the Sybase SQL Server.
NULL Storage Mechanism
The minimum unit of storage in the SQL server is, by default, a 2K page (except on the Stratus platform). This page contains the actual data and
control information about every row. The top of each page
contains the offset information of every row. The control
information includes the number of variable length columns
and the number of NULLs in a particular row.
Although NULL is notionally an undefined quantity, the storage engine needs to store some unique representation for it and retrieve it as NULL. Hence, the storage length of a NULLable field varies from a couple of bytes up to the length of the data value or the default size of the fixed-length field. NULLable fixed-length data types like char (NULL) or
binary (NULL) inherently expand in size from a NULL value
to the fixed length value. Internally, the server treats these
NULLable columns as variable-length columns even if they
were defined as fixed length data types. (A detailed descrip-
tion of storage and handling of NULL values is beyond the
scope of this paper.)
How does all this affect the type of update operation?
When a NULL value is changed to a non-NULL value,
extra storage space must be allocated. Hence, an update in
place cannot be performed.
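The storage effect of a NULL-to-value change can be sketched with a toy row-length model. The per-column sizes here (for instance, the one-byte cost of a stored NULL) are illustrative assumptions, not the server's exact on-disk format.

```python
# Toy model: a NULLable fixed-length column is stored like a
# variable-length column, so a NULL occupies almost no data space and
# expands to the full declared length when assigned a value.
# The NULL_COST figure is an assumption for illustration.
NULL_COST = 1

def stored_size(declared_len, value):
    return NULL_COST if value is None else declared_len

def row_size(columns):
    """columns: list of (declared_len, value) pairs."""
    return sum(stored_size(d, v) for d, v in columns)

before = row_size([(4, 1234), (20, None)])          # int + char(20) NULL
after  = row_size([(4, 1234), (20, "now filled")])  # NULL -> value
print(before, after, after > before)  # 5 24 True: row grew, so no in-place
```

Because the row length grows, the update no longer fits in the row's existing space, and an update-in-place is ruled out.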
Varchar Fields
Similar arguments can be drawn in the case of varchar fields, in which the storage size of a varchar column changes with an
update. This inherently implies that a data modification
(update) operation cannot always be performed in place.
Instead, it carries the overhead of being logged as a separate
delete and insert operation.
Any Operation Using Syslogs
There are certain design constraints in server operation that
normally require an update operation to be logged in syslogs as
a delete and insert operation. There are two such operations:
1. Triggers: An update trigger is triggered on an update
operation to its parent table. Triggers work off deleted
rows and inserted rows, which are really views of syslogs
entries visible inside the update trigger. In such cases, an
update-in-place that writes an update record to the trans-
action log is not a possibility, as the update trigger requires
a deleted and inserted row to be written. However, other direct-update types discussed later in this article are possible.
2. Replication Server: The Sybase replication server is
essentially designed as a log-based replication. The
replication server reads the transaction from the databases
(syslogs) and forwards it to the remote database. The
replication server component that reads the transaction
log and forwards it to other replication server components
is called Log Transfer Manager (LTM). LTM issues dbcc
commands to read the syslogs for any operation on the
replicated objects (such as tables or stored procedures).
In addition, it expects to see update operations on replicated tables written as delete and insert operations.
These requirements directly affect the type of update
operation performed.
New Locking Mechanisms in ASE 11.9
In ASE 11.9, tables that use page-level locking retain all of the pre-11.9 update behavior described later in this paper as their default. In ASE 11.9 terminology, such a table is called an Allpages-locked table.
In this type of locking operation, the space allocation and
deallocation algorithm maintains the contiguous nature of a
page. When a row is deleted, it forces a reorganization of the
rows in the page. The other rows move up in the page so that
space is contiguously filled from the top of the page.
This type of reorganization can also be triggered by an
update that changes the space occupied by an existing row.
The top of every page contains an entry (besides other
control information) for the starting byte number of every
row, known as the offset number. The offset value changes
whenever a row is shifted around in the page.
From a locking perspective, in an Allpages-locked table, the datapages and the index pages are locked using an exclusive page lock. This lock is transactional, meaning it is held until
the end of a transaction. For an update operation to succeed,
both index and data pages need to obtain exclusive locks
before any changes to the pages are performed.
Typically, the total size of the index columns is far smaller than that of the data row, so an index page can store upwards of 100 to 200 keys. Locking an index page can therefore block access to all rows referenced by that page. This creates concurrency problems on the index pages of an Allpages-locked table. ASE 11.9 addresses this problem by providing a new mechanism, called data-only locking,
for the table. (For a more detailed discussion of this topic,
also see Michael Mamet’s article on page 2.)
Data-Only Locking
There are two new types of data-only locking mechanisms
introduced in ASE 11.9: datapage locking and data-row
locking. These directly affect the way in which direct and
deferred updates are performed.
Datapage locking: When a row needs to be changed, the
entire data page is locked. However, the index pages are not
locked. The changes to the index pages are performed using a
latch. Latches are non-transactional and provide a synchro-
nization mechanism used to guarantee the physical consisten-
cy of a page. In datapage locking, latches are applied only for
the duration of time required to insert or change the index
pages. In contrast, locks remain in effect for the duration of
the transaction. Latches also minimize transaction concurren-
cy problems by reducing the contention for the index pages.
Data-row locking: This is essentially a row-level locking mechanism. Row-level locks are acquired on
individual rows on datapages as opposed to entire datapages.
Index rows and pages are not locked. Instead, latches are used
when changes are made.
Deletes in Data-Only Locked Tables
Before we discuss the implications of these new mechanisms for changing values, let us examine the new delete mechanism introduced in 11.9. A new logical delete mechanism has been introduced to improve the concurrency of deletes and changes to index pages. Now, a delete of a row does not immediately reorganize space on the page to make the data or index rows contiguous. Instead, it sets a bit in the row to indicate that the row has been logically deleted. The actual data or index row is not physically deleted from the page. This also makes rollback easier.
To facilitate the new locking mechanism, data-only locked tables use a new storage method that keeps a row ID the same for the life of the row. These internal row IDs do not
change as a result of new inserts, page splits, or updates that
change row length.
Update Operations on Data-Only Locked Tables
If an update operation increases the length of a row so that it
no longer fits on the same page, the following now happens in
ASE 11.9:
• The row is inserted into a different page (a forwarded row)
• A pointer to the row ID on the new page is stored in the original location
This maintains the changed row’s original row ID, and the
index pointers do not change. When further changes are
made to a forwarded row, only the pointers at the original row
location are changed. This ensures that the forwarded row is
never more than one hop from its original location.
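Row forwarding, and the one-hop guarantee, can be sketched as follows. This is a simplified model under stated assumptions; the dictionaries, the location counter, and the method names are all illustrative, not Sybase's internal structures.

```python
# Simplified row-forwarding model for a data-only locked table.
# A row keeps its original row ID for life; if it outgrows its home
# page, the home slot holds a forwarding pointer to the row's current
# location, and further moves re-point that one pointer (never a chain).
class Table:
    def __init__(self):
        self.home = {}      # row_id -> ("data", row) or ("fwd", location)
        self.pages = {}     # location -> row data on some other page
        self.next_loc = 0   # illustrative allocator for new locations

    def insert(self, row_id, row):
        self.home[row_id] = ("data", row)

    def update(self, row_id, new_row, fits_on_home_page):
        if fits_on_home_page:
            kind, payload = self.home[row_id]
            if kind == "fwd":
                self.pages[payload] = new_row   # update forwarded copy
            else:
                self.home[row_id] = ("data", new_row)
            return
        # Row must move: store it elsewhere, leave a pointer at home.
        kind, old = self.home[row_id]
        if kind == "fwd":
            del self.pages[old]                 # re-point, never chain hops
        loc = self.next_loc
        self.next_loc += 1
        self.pages[loc] = new_row
        self.home[row_id] = ("fwd", loc)        # always exactly one hop

    def read(self, row_id):
        kind, payload = self.home[row_id]
        return self.pages[payload] if kind == "fwd" else payload

t = Table()
t.insert(1, "short")
t.update(1, "a much longer row value", fits_on_home_page=False)
t.update(1, "an even longer row value still", fits_on_home_page=False)
print(t.read(1))  # row ID 1 never changed; still one hop from home
```

The design choice the sketch highlights: index entries point at the stable home slot, so repeated growth re-points one forwarding pointer instead of rewriting index pages.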
The concept of row forwarding introduces a change in the
way the clustered indexes are stored. A clustered index on a
data-only locked table uses a traditional (old) non-clustered
index structure. Therefore, the leaf-level page contains
pointers to the data itself.
These changes improve the overall performance of update operations by minimizing the number of I/Os necessary. The downside is that row forwarding may require additional I/O during a select operation. The clustering of data on a clustered index may also be affected by the relaxed storage mechanism, leading to increased logical and physical I/O on queries using clustered indexes.
This loosening of the strictly contiguous storage mechanism
introduces some administrative overhead. Commands such as reorg need to be executed periodically on tables with high update and delete transaction rates to avoid performance penalties from retained storage for logically deleted rows, as well as forwarded rows.
The next section discusses the Allpages-locked update operation and its various types. Note that ASE 11.0 and ASE 11.5 do not use this terminology; it applies only to ASE 11.9.
Types of Updates in 11.0 and 11.5
SQL Server 11.0 and ASE 11.5 support two basic types of updates: direct and deferred. As discussed earlier, the
SQL server tries to perform a direct update first. If that does
not satisfy the criteria, then it checks for the conditions to
perform other types of updates. Deferred update is the most
expensive update operation performed by the server.
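The selection order described above (and shown in Figure 1) can be sketched as a simple classifier. The condition flags are illustrative summaries of the rules discussed in this article, not the server's actual internal checks, which are derived from the query plan and row images.

```python
# Sketch of the update-type decision chain from Figure 1.
# Each boolean flag summarizes a condition discussed in the text;
# the flag names are this sketch's own, not server terminology.
def choose_update_type(row_length_changes, updates_index_key,
                       has_join, has_trigger_or_replication,
                       fits_on_same_page):
    # Update-in-place: no length change, no key change, no join,
    # no triggers or replication on the table.
    if (not row_length_changes and not updates_index_key
            and not has_join and not has_trigger_or_replication):
        return "update-in-place"
    # Cheap direct: row changed size but still fits on the same page
    # (triggers/replication are fine, since delete+insert are logged).
    if fits_on_same_page and not has_join and not updates_index_key:
        return "cheap direct"
    # Expensive direct: row must move to another page.
    if not has_join and not updates_index_key:
        return "expensive direct"
    # Otherwise: deferred update, the most expensive case.
    return "deferred"

print(choose_update_type(False, False, False, False, True))  # update-in-place
print(choose_update_type(True, False, False, True, True))    # cheap direct
print(choose_update_type(True, False, False, False, False))  # expensive direct
print(choose_update_type(True, True, True, False, False))    # deferred
```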
Direct Updates
These are single-pass operations, and may be one of three types:
1. In-place updates
2. Cheap direct updates
3. Expensive direct updates
In-Place Updates
The SQL server tries to perform an update-in-place as its first
choice among the various types of updates. Here, the change
is explicitly written first to the log as a modify (update)
statement. Then, the data values are modified directly in the
existing datapages. There may be additional records written
due to index delete and insert operations. The update does
not move any rows on the modified page. This implies that an in-place update is feasible only if the following conditions are met:
• The row being changed must not change in length
• The column being updated cannot be a key or part of one (including RI columns)
• The update statement does not include joins
• There are no update triggers on the table, and no replication is set up for the table
Briefly, let us explore the reasons behind the requirement that
update statements should not contain joins. Joins inherently
retrieve rows from two or more tables to determine the rows
that satisfy the join criteria. For every row satisfying the
search argument in the “outer table” (defined as the table the
join is pivoting from), the server examines the “inner table”
to determine the rows to be updated. This involves operating on a result set from the outer table and using it to find the affected rows in the inner table. Such an operation must be logged as separate delete and insert operations.
Similar logging arguments (deleted and inserted rows) explain why the column being updated cannot be a key or part of one.
Cheap Direct Updates
When the SQL server cannot perform an update-in-place, it
tries to perform a cheap direct update.

Figure 1 (Update Algorithm): the server checks the update-in-place condition first and performs an update-in-place if it is met; otherwise it checks the cheap direct update conditions and performs a cheap direct update; otherwise it checks the expensive direct update conditions and performs an expensive direct update; otherwise it performs a deferred update.

In a cheap direct update, the server tries to fit the changed row by moving the subsequent rows within the same page. This type of operation does involve logging the deleted and inserted rows to maintain the unit of work (transaction).
Compared to update-in-place, this operation has the following
conditions:
• The size of the updated row has changed (as with variable-length columns)
• There are no restrictions on triggers or replication, because the insert and delete records are written to the log
This type of update is as fast as update-in-place with respect to the total I/O performed, though it involves slightly more processing. Changes to index keys are handled as in-place updates.
Expensive Direct Updates
This is the third option available to the SQL server. The server performs an update that involves moving the data rows to a different page. The clustered index pointers therefore need to change, and the data and index rows must be deleted from their existing locations and inserted, as changed rows, into the new location. The following conditions need to be met:
• A data page splits on an update due to a longer row size
• The index used to find the row is not changed
• The usual restrictions on joins in the update statement and on RI apply
Expensive direct update is the third-fastest update type considered by the SQL server. Unlike a deferred update, the log records are not re-scanned, so it is less expensive than a deferred update. Full delete and insert records for the data and index rows are written to syslogs, so the replication and trigger restrictions do not apply.
Deferred Updates
This type of update is performed when an update operation does not meet the conditions for any of the direct update operations. The following conditions apply to all such cases:
• Updates with a join clause in the where clause
• Changes to columns used for RI
The discussion of constraints on space allocation and the steps involved in making changes to data pages also applies to index updates (known as deferred index updates). Deferred index inserts are triggered by updates that change the index used to find a row or that change the value in a unique index. These updates tend to move the index pages and force a deferred operation to be performed. This is the most expensive type of update operation performed by the SQL server.
The Costs of a Deferred Update
Deferred update occurs in two phases:
Phase 1
1. The server fetches the qualifying rows and pages and writes them to the transaction log.
2. The server scans the log for the qualified rows and pages and deletes the rows from the data and index pages.
Phase 2
1. The server re-scans the log records for the transaction and performs the inserts into the data pages and index rows.
2. On a commit operation, the transaction log is flushed.
Let us consider the cost involved in performing a deferred update on a table with four indexes. For each data row changed, a total of four log records needs to be written (once to determine the qualifying rows, once to re-fetch the data pages and write the changes to the log, and once to modify the data pages).
• For the four indexes, eight log records need to be written to syslogs
• 24 extra locks need to be applied for index traversal (presuming an index three levels deep)
• Multiple log pages need to be traversed to find the changes that must be applied to the table
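The arithmetic above can be checked with a small cost sketch. This is a back-of-the-envelope model of the figures quoted in this article, not a real costing formula, and the per-row constants are taken directly from the text.

```python
# Back-of-the-envelope cost model for a deferred update, using the
# figures quoted above: 4 log records per changed data row, 2 log
# records per index (delete + insert), and one lock per index level
# for both the delete and the insert traversal. Illustrative only.
def deferred_update_cost(num_indexes, index_depth):
    data_log_records = 4
    index_log_records = 2 * num_indexes
    index_locks = 2 * num_indexes * index_depth  # delete + insert passes
    return data_log_records, index_log_records, index_locks

data_lr, idx_lr, locks = deferred_update_cost(num_indexes=4, index_depth=3)
print(idx_lr, locks)  # 8 index log records, 24 extra locks per changed row
```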
Design Guidelines
What can be done to get around these restrictions and promote the use of direct updates?
• Create at least one unique index on the table
• Favor non-key columns in the where clause when updating a different key
• When defining tables, declare columns NOT NULL whenever feasible
Conclusion
In summary, ASE 11.9 gives the SQL server more choices of update operation. We have the older update types on an Allpages-locked table and newer mechanisms on data-only locked tables. The one best suited to a particular table should be evaluated not only from the perspective of its update performance, but also from the overall transaction profile (inserts, deletes, and updates on the table) of the client application. This helps ensure overall performance gains instead of skewed results.
U P D A T E O P E R A T I O N S I N S Q L S E R V E R

More Related Content

PDF
Whitepaper Performance Tuning using Upsert and SCD (Task Factory)
PDF
Best Practices in the Use of Columnar Databases
PPTX
Maryna Popova "Deep dive AWS Redshift"
PDF
Rails DB migrations
PDF
DB2 LUW - Backup and Recovery
PDF
IBM DB2 for z/OS Administration Basics
 
PPTX
SKILLWISE-DB2 DBA
PDF
Whitepaper Performance Tuning using Upsert and SCD (Task Factory)
Best Practices in the Use of Columnar Databases
Maryna Popova "Deep dive AWS Redshift"
Rails DB migrations
DB2 LUW - Backup and Recovery
IBM DB2 for z/OS Administration Basics
 
SKILLWISE-DB2 DBA

What's hot (18)

PDF
Practical Recipes for Daily DBA Activities using DB2 9 and 10 for z/OS
PDF
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
PDF
MySQL Overview
PPT
Introduction To Maxtable
PPT
8. column oriented databases
PPT
DB2UDB_the_Basics Day 4
PDF
Write intensive workloads and lsm trees
PPTX
The design and implementation of modern column oriented databases
PDF
How to Fine-Tune Performance Using Amazon Redshift
PDF
Db2 performance tuning for dummies
PPT
datastage training | datastage online training | datastage training videos | ...
PPTX
Tuning Apache Phoenix/HBase
PDF
Intro to HBase Internals & Schema Design (for HBase users)
PPTX
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
PDF
Identify SQL Tuning Opportunities
PPTX
SQL Server 2014 In-Memory OLTP
PPTX
Ibm db2
PDF
Ycsb benchmarking
Practical Recipes for Daily DBA Activities using DB2 9 and 10 for z/OS
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
MySQL Overview
Introduction To Maxtable
8. column oriented databases
DB2UDB_the_Basics Day 4
Write intensive workloads and lsm trees
The design and implementation of modern column oriented databases
How to Fine-Tune Performance Using Amazon Redshift
Db2 performance tuning for dummies
datastage training | datastage online training | datastage training videos | ...
Tuning Apache Phoenix/HBase
Intro to HBase Internals & Schema Design (for HBase users)
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
Identify SQL Tuning Opportunities
SQL Server 2014 In-Memory OLTP
Ibm db2
Ycsb benchmarking
Ad

Similar to UPD_OP_SQL (20)

PPTX
Lec 4 Recovery in database management system.pptx
PPT
Manipulating data
PPTX
Chapter22 database security in dbms.pptx
PDF
The Science of DBMS: Data Storage & Organization
PPT
PDF
Ijetr012023
PPTX
2. DML_INSERT_DELETE_UPDATE
PPT
Sql DML
PPT
Sql DML
PPT
SQL WORKSHOP::Lecture 9
PDF
Database recovery techniques
PPT
Intro to tsql unit 7
PPT
Less08 Schema
PPTX
LECTURE 11-Backup and Recovery - Dr Preeti Aggarwal.pptx
PDF
Chapter1.0 database management system
PPT
ch 5 Daatabase Recovery.ppt
PDF
DBMS Vardhaman.pdf
PPT
e computer notes - Manipulating data
PPTX
2. DBMS Experiment - Lab 2 Made in SQL Used
Lec 4 Recovery in database management system.pptx
Manipulating data
Chapter22 database security in dbms.pptx
The Science of DBMS: Data Storage & Organization
Ijetr012023
2. DML_INSERT_DELETE_UPDATE
Sql DML
Sql DML
SQL WORKSHOP::Lecture 9
Database recovery techniques
Intro to tsql unit 7
Less08 Schema
LECTURE 11-Backup and Recovery - Dr Preeti Aggarwal.pptx
Chapter1.0 database management system
ch 5 Daatabase Recovery.ppt
DBMS Vardhaman.pdf
e computer notes - Manipulating data
2. DBMS Experiment - Lab 2 Made in SQL Used
Ad

UPD_OP_SQL

  • 1. 10 I S U G T E C H N I C A L J O U R N A L Using direct update to utilize datapage space more effectively and improve performance Valapet Badri consults as a Sybase architect, primarily for the financial industry in New York and New Jersey. He can be reached at 732-650-1702 or valapet@iname.com. T he update algorithm used by the Sybase SQL server is a sophisticated mechanism designed to minimize the cost of updating the server. It chooses an update strategy based on constraints imposed by the design and the data values changed by the operation. One of the primary goals of a “physical database design” in Sybase is to promote the use of the update mechanism that is least expensive to the SQL server. The update operation called update-in-place is the least costly type, performing data modification of a row by changing the existing row in its existing physical location. Update-in-place is faster by magnitude when compared to other types of updates. Benchmarks have shown more than three-fold improve- ment in performance in updates using update-in-place with a 200-byte row table with one clustered index and two non- clustered indexes. Description of the Problem The basic definition of the problem is as follows: If a row occupying N bytes of space is changed by an update operation to occupy N+n bytes (where n>0), the database server needs to have a mecha- nism to provide space for storing the extra n bytes. This allocation works under the design constraints imposed by the database engine, in order to not break the existing allocation structures and the unit of storage. Other design constraints are imposed to minimize the cost involved in allocating and maintain- ing the internal consistency of the storage and the shuffling of existing rows. Some of these constraints compete in weightage and often are tangential to one another. Typically, the internal storage mechanism needs to maintain the rows within a page contiguously. 
In such cases, the problem of allocating extra n bytes becomes even more difficult. Let’s pre- sume that Row 1 has been updated and has increased by n bytes. In order to pro- vide Row 1 with extra n bytes, Row 2 has to be moved down n bytes. Thus, we see that a change in Row 1 has introduced an overhead of moving the subsequent rows within a page (a page is the smallest storage unit in Sybase). Consider the case where the page is almost full. The only way to provide the extra space now would be to split the existing page into two. If indexes are defined on these rows and the rows get shifted around, the update algorithm has to maintain the consistency of the index pages to reflect the changes made to its datapages. This illustrates the often-com- peting costs associated with determining the optimum way to perform an update operation. Update Operations in Sybase SQL Server: An Analysis of ASE 11.0, 11.5, and 11.9 By Valapet P. Badri
  • 2. F I R S T Q U A R T E R 1 9 9 9 11 Constraints and Solutions There are many ways to resolve these competing requirements and costs. The simplest mechanism is to split a single update operation into a delete operation of the existing row, followed by an insert of the new row. This mechanism of changing the data value reallocates the space needed by the updated (new) rows every time. This is called a deferred update in Sybase terminology. The downside to this simplistic approach is that no attempt is made to use the existing space occupied by the deleted row. This introduces the overhead of freeing up space occupied by the deleted row and reallocation of space for the newly inserted row. Additionally, we run into transaction concurrency overheads (locking issues) of any changed pages between the delete and insert operation, overhead of copying unchanged columns from the delete to insert buffer, and subsequent high costs in terms of poor response times and throughput. In contrast, consider an update operation that writes only the changes to an existing row. This is a less expensive operation and has an improved transaction concurrency—and it allows extra n bytes could be allocated within the existing address space. This type of update operation is called direct update-in- place. Sybase version 10.0 and lower placed highly restrictive conditions on performing an update-in-place operation. However, later versions have been flexible enough to address the performance impact by relaxing the rules for update-in- place operation. There are primarily two conditions under which a direct update-in-place is not allowed: x The updated row has columns that do not fit into existing space (meaning that varchar column has increased in size or the NULLable column has changed from a NULL value to some finite value) x There are certain rules in the server that explicitly require a delete and an insert statement to be written to syslogs. 
Replication server (LTM-Log transfer manager) requires the update to be broken into a delete and insert operation and written to the transaction log (syslogs) for its use in replication to a remote system. Let us now discuss the above-mentioned design constraints introduced by the Sybase SQL Server. NULL Storage Mechanism A minimum unit of storage in a SQL server is by default a 2k page (besides Stratus). This page contains the actual data and control information about every row. The top of each page contains the offset information of every row. The control information includes the number of variable length columns and the number of NULLs in a particular row. Although NULL is supposedly an undefined quantity, the storage needs to uniquely store some value for this and retrieve it as NULL. Hence, the storage length of a NULLable field varies from a couple of bytes to the length of the data value, or, the default size of the fixed length field. NULLable fixed-length data types like char (NULL) or binary (NULL) inherently expand in size from a NULL value to the fixed length value. Internally, the server treats these NULLable columns as variable-length columns even if they were defined as fixed length data types. (A detailed descrip- tion of storage and handling of NULL values is beyond the scope of this paper.) How does all this affect the type of update operation? When a NULL value is changed to a non-NULL value, extra storage space must be allocated. Hence, an update in place cannot be performed. Varchar Fields Similar arguments can be drawn in case of varchar fields, in which the storage size of a varchar column changes with an update. This inherently implies that a data modification (update) operation cannot always be performed in place. Instead, it carries the overhead of being logged as a separate delete and insert operation. 
Any Operation Using Syslogs There are certain design constraints in server operation that normally require an update operation to be logged in syslogs as a delete and insert operation. There are two such operations: 1. Triggers: An update trigger is triggered on an update operation to its parent table. Triggers work off deleted rows and inserted rows, which are really views of syslogs entries visible inside the update trigger. In such cases, an update-in-place that writes an update record to the trans- action log is not a possibility, as the update trigger requires a deleted and inserted row to be written. However, other direct-update types discussed later in this aricle are possible. U P D A T E O P E R A T I O N S I N S Q L S E R V E R
  • 3. 12 I S U G T E C H N I C A L J O U R N A L 2. Replication Server: The Sybase replication server is essentially designed as a log-based replication. The replication server reads the transaction from the databases (syslogs) and forwards it to the remote database. The replication server component that reads the transaction log and forwards it to other replication server components is called Log Transfer Manager (LTM). LTM issues dbcc commands to read the syslogs for any operation on the replicated objects (such as tables or stored procedures). In addition, it expects to see the update operations on replicated tables, such as delete and insert operations. These requirements directly affect the type of update operation performed. New Locking Mechanisms in ASE 11.9 In ASE 11.9, when working with tables that use page-level locking, all pre-11.9 behavior of update operation described later in this paper would be the default behavior. In ASE 11.9 terminology, it is called a Allpages locked table operation. In this type of locking operation, the space allocation and deallocation algorithm maintains the contiguous nature of a page. When a row is deleted, it forces a reorganization of the rows in the page. The other rows move up in the page so that space is contiguously filled from the top of the page. This type of reorganization can also be triggered by an update that changes the space occupied by an existing row. The top of every page contains an entry (besides other control information) for the starting byte number of every row, known as the offset number. The offset value changes whenever a row is shifted around in the page. From a locking perspective, in the Allpages locked table, the datapages and the index pages are locked using an exclu- sive page lock. This is transactional, meaning, it is held until the end of a transaction. 
For an update operation to succeed, both index and data pages must be exclusively locked before any changes to the pages are performed. Typically, the total size of the index columns is far smaller than that of the data rows, so an index page can store upwards of 100 to 200 keys. Locking an index page can therefore block access to all rows referenced by that page, which creates concurrency problems in the index pages of an Allpages-locked table. ASE 11.9 tries to address this problem by providing a new locking mechanism for tables called data-only locking. (For a more detailed discussion of this topic, also see Michael Mamet's article on page 2.)

Data-Only Locking

There are two new types of data-only locking mechanisms introduced in ASE 11.9: datapage locking and datarow locking. These directly affect the way in which direct and deferred updates are performed.

Datapage locking: When a row needs to be changed, the entire data page is locked. However, the index pages are not locked; changes to the index pages are performed using a latch. Latches are non-transactional and provide a synchronization mechanism used to guarantee the physical consistency of a page. In datapage locking, latches are applied only for the duration of time required to insert or change the index pages. In contrast, locks remain in effect for the duration of the transaction. Latches thus minimize transaction concurrency problems by reducing contention for the index pages.

Datarow locking: Datarow locking is essentially a row-level locking mechanism. Row-level locks are acquired on individual rows on datapages, as opposed to entire datapages. Index rows and pages are not locked; instead, latches are used when changes are made.

Deletes in Data-Only Locked Tables

Before we discuss the implications of these new mechanisms for changing values, let us examine the new delete mechanism introduced in 11.9.
A new logical delete mechanism has been introduced to improve concurrency for deletes and for changes to index pages. Now, a delete of a row does not immediately reorganize space on the page to make the data or index rows contiguous. Instead, a bit is set in the row to indicate that the row has been logically deleted; the actual data or index row is not physically deleted from the page. This allows an easier rollback mechanism. To facilitate the new locking mechanism, data-only locked tables use a new storage method that keeps row IDs the same for the life of the row. These internal row IDs do not change as a result of new inserts, page splits, or updates that change row length.

Update Operations on Data-Only Locked Tables

If an update operation increases the length of a row so that it no longer fits on the same page, the following now happens in ASE 11.9:

• The row is inserted into a different page (a forwarded row)
• A pointer to the row ID on the new page is stored in the original location
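The steps above, together with the logical-delete bit and stable row IDs, can be sketched as follows. This is a hypothetical model for illustration only; the names and structures are invented and do not reflect actual ASE internals.

```python
# Toy model of data-only-locked storage: row IDs stay fixed for the life of
# a row; deletes only set a bit; a row that outgrows its page is forwarded,
# and the ORIGINAL slot always holds the current location (one hop at most).

class DOLTable:
    def __init__(self):
        self.slots = {}   # row_id -> slot; row IDs never change

    def insert(self, row_id, data):
        self.slots[row_id] = {"data": data, "deleted": False, "forward": None}

    def delete(self, row_id):
        # Logical delete: mark the row, do not physically remove it.
        self.slots[row_id]["deleted"] = True

    def update(self, row_id, data, fits_on_page):
        slot = self.slots[row_id]
        if fits_on_page:
            slot["data"] = data
            slot["forward"] = None
        else:
            # Row forwarding: only the pointer at the original location
            # changes, so a lookup never chases more than one hop.
            slot["forward"] = ("other_page", data)

    def fetch(self, row_id):
        slot = self.slots[row_id]
        if slot["deleted"]:
            return None
        if slot["forward"] is not None:
            return slot["forward"][1]   # exactly one hop
        return slot["data"]

t = DOLTable()
t.insert(7, "short row")
t.update(7, "much longer row that no longer fits", fits_on_page=False)
print(t.fetch(7))   # found in one hop; row ID 7 is unchanged
```

Because the row ID never changes, index entries pointing at row 7 stay valid even after repeated forwarding.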
First Quarter 1999

This maintains the changed row's original row ID, and the index pointers do not change. When further changes are made to a forwarded row, only the pointer at the original row location is changed. This ensures that a forwarded row is never more than one hop from its original location.

The concept of row forwarding introduces a change in the way clustered indexes are stored. A clustered index on a data-only locked table uses a traditional non-clustered index structure, so the leaf-level page contains pointers to the data itself. These changes improve the overall performance of update operations by minimizing the number of I/Os necessary. The downside is that row forwarding may require additional I/O during a select operation. The clustering of data on a clustered index may also be affected by the relaxed storage mechanism, leading to increased I/O, both logical and physical, on queries using clustered indexes. This loosening of the strictly contiguous storage mechanism introduces some administration overhead as well: commands such as reorg need to be executed periodically on tables with high update and delete transaction rates, to avoid performance penalties from the continued storage of logically deleted rows as well as forwarded rows.

The next section discusses the Allpages-locked update operation and its various types. Note that in ASE 11.0 and ASE 11.5 this terminology is not used; it is applicable only in ASE 11.9.

Types of Updates in 11.0 and 11.5

SQL Server 11.0 and ASE 11.5 support two basic types of updates: direct and deferred. As discussed earlier, the SQL server tries to perform a direct update first. If the operation does not satisfy the criteria, the server then checks the conditions for the other types of updates. Deferred update is the most expensive update operation performed by the server.

Direct Updates

These are single-pass operations, and may be one of three types:
1. In-place updates
2. Cheap direct updates
3. Expensive direct updates

In-Place Updates

The SQL server tries to perform an update-in-place as its first choice among the various types of updates. Here, the change is first written to the log as a single modify (update) record; then the data values are modified directly in the existing datapages. (Additional log records may still be written for index delete and insert operations.) The update does not move any rows on the modified page. This is feasible only if the following conditions are met:

• The rows being changed must not change in length
• The column being updated cannot be a key, or part of a key (including referential integrity, or RI, columns)
• The update statement does not include joins
• There are no update triggers on the table, and no replication set up for the table

Briefly, let us explore the reasons behind the requirement that update statements not contain joins. Joins inherently retrieve rows from two or more tables to determine the rows that satisfy the join criteria. For every row satisfying the search argument in the "outer table" (the table the join is pivoting from), the server examines the "inner table" to determine the rows to be updated. This involves operating on a result set of the outer table and using it to find the affected rows in the inner tables. Such an operation must be logged as separate delete and insert operations. A similar logging argument (separate deleted and inserted rows) explains why the column being updated cannot be a key or part of one.

Cheap Direct Updates

When the SQL server cannot perform an update-in-place, it tries to perform a cheap direct update.
[Figure 1: Update Algorithm — a flowchart of the update decision. The update-in-place conditions are checked first; if they are met (YES), an update-in-place is performed. Otherwise (NO), the cheap direct update conditions are checked, then the expensive direct update conditions; the first set of conditions met determines the direct update performed. If none are met, a deferred update is performed.]

The server tries to fit the changed row by moving the subsequent rows in the same page. To maintain the unit of work (the transaction), this type of operation does log the deleted and inserted rows. Compared to update-in-place, this operation has the following conditions:

• The size of the updated row has changed (as with variable-length columns)
• There are no restrictions regarding triggers or the replication server, because the insert and delete records are written to the log

This type of update is as fast as update-in-place with respect to the total I/O performed, though it involves a little more processing. Changes to index keys are handled as in-place updates.

Expensive Direct Updates

This is the third option available to the SQL server. The server performs an update that involves moving the data rows to a different page. The clustered index key therefore needs to change, and the data and index rows must be deleted from their existing locations and inserted as changed rows into the new location. The following conditions need to be met:

• The data page splits on the update because of a longer row size
• The index used to find the row is not changed
• The usual restrictions on joins in the update statement and on RI columns apply

An expensive direct update is the third-fastest update considered by the SQL server. The log records are not re-scanned, as they are in a deferred update, so it is less expensive than a deferred update. Full delete and insert records for the data and index rows are written to syslogs, so the replication and trigger restrictions do not apply.

Deferred Updates

This type of update is performed when an update operation does not qualify under the conditions for any of the direct update operations.
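The selection cascade in Figure 1 can be sketched roughly as follows. This is a simplified illustration of the decision flow described in this article, not actual server code; the parameter names are invented, and the real server evaluates more conditions than shown here.

```python
# Simplified sketch of the Figure 1 cascade: conditions are checked
# cheapest-first, and the first update type that qualifies is used.
# All parameter names are hypothetical.

def choose_update_type(row_length_changes,
                       key_or_ri_column_changed,
                       has_join,
                       has_trigger_or_replication,
                       row_moves_to_new_page):
    # Update-in-place: nothing moves; a single modify record is logged.
    if (not row_length_changes and not key_or_ri_column_changed
            and not has_join and not has_trigger_or_replication):
        return "update-in-place"
    # Cheap direct: the row changed size but still fits on the same page.
    if (row_length_changes and not row_moves_to_new_page
            and not has_join and not key_or_ri_column_changed):
        return "cheap direct"
    # Expensive direct: the row must move to a different page.
    if (row_moves_to_new_page and not has_join
            and not key_or_ri_column_changed):
        return "expensive direct"
    # Otherwise: the two-phase deferred update.
    return "deferred"

print(choose_update_type(False, False, False, False, False))  # update-in-place
print(choose_update_type(True, False, False, False, False))   # cheap direct
print(choose_update_type(True, False, False, False, True))    # expensive direct
print(choose_update_type(False, True, True, False, False))    # deferred
```

The cheapest-first ordering mirrors the server's goal: fall back to a more expensive mechanism only when a cheaper one is ruled out.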
The following conditions apply in all such cases:

• The update has a join clause in the where clause
• The update changes columns used for RI

The discussion of constraints on space allocation and of the steps involved in making changes to data pages also applies to index updates (known as deferred index updates). Deferred index inserts are triggered by updates that change the index used to find a row, or that change the value in a unique index. These index updates tend to move the index pages and force a deferred operation to be performed. This is the most expensive type of update operation performed by the SQL server.

The Costs of a Deferred Update

A deferred update occurs in two phases:

Phase 1
1. The server fetches the qualifying rows and pages and writes them to the transaction log.
2. The server scans the log for the qualified rows and pages and performs the deletes on the data pages and index pages.

Phase 2
1. The server re-scans the log records for the transaction and performs the inserts to the data pages and index rows.
2. On a commit operation, the transaction log is flushed.

Let us consider the cost of a deferred update on a table with four indexes. For each data row changed, a total of four log records needs to be written (once to determine the qualifying rows, once to re-fetch the data pages and write changes to the log, once to modify the data pages). In addition:

• For the four indexes, eight log records need to be written to syslogs
• 24 extra locks need to be applied for index traversal (presuming a three-level-deep index)
• Multiple log pages need to be traversed to find the changes that must be applied to the table

Design Guidelines

What can be done to get past all these restrictions and promote the use of direct updates?
• Create at least one unique index on the table
• Promote the use of non-key columns in the where clause when updating a different key
• When defining tables, use NOT NULL for columns whenever feasible

Conclusion

In summary, ASE 11.9 gives the SQL server more choices of update operation: the older update types on Allpages-locked tables, and the newer mechanisms on data-only locked tables. The one best suited to a particular table should be evaluated not only from the perspective of update performance, but from the overall transaction profile (inserts, deletes, and updates on the table) of the client application. This helps ensure overall performance gains instead of skewed results.