ISUG Technical Journal
Using direct update to utilize datapage space more effectively and improve performance
Valapet Badri consults as a Sybase
architect, primarily for the financial
industry in New York and New Jersey.
He can be reached at 732-650-1702 or
valapet@iname.com.
The update algorithm used by the Sybase SQL server is a sophisticated mechanism designed to minimize the cost of update operations. It chooses an update strategy based on constraints imposed by the design and on the data values changed by the operation.
One of the primary goals of a
“physical database design” in Sybase
is to promote the use of the update
mechanism that is least expensive to the
SQL server. The update operation called
update-in-place is the least costly type,
performing data modification of a row by
changing the existing row in its existing
physical location. Update-in-place is markedly faster than the other types of updates. Benchmarks have shown more than a three-fold improvement in update performance using update-in-place on a 200-byte-row table with one clustered index and two nonclustered indexes.
Description of the Problem
The basic definition of the problem is as
follows: If a row occupying N bytes of
space is changed by an update operation
to occupy N+n bytes (where n>0), the
database server needs to have a mecha-
nism to provide space for storing the
extra n bytes. This allocation works under design constraints imposed by the database engine, so as not to break the existing allocation structures and the unit of storage. Other design
constraints are imposed to minimize the
cost involved in allocating and maintain-
ing the internal consistency of the
storage and the shuffling of existing rows.
Some of these constraints compete in weight and often work against one another.
Typically, the internal storage
mechanism needs to maintain the rows
within a page contiguously. In such cases,
the problem of allocating extra n bytes
becomes even more difficult. Let’s pre-
sume that Row 1 has been updated and
has increased by n bytes. In order to pro-
vide Row 1 with extra n bytes, Row 2 has
to be moved down n bytes. Thus, we see
that a change in Row 1 has introduced
an overhead of moving the subsequent
rows within a page (a page is the smallest
storage unit in Sybase).
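The row-shift overhead described above can be sketched with a toy model of a page. This is a minimal sketch under simplifying assumptions (a Python list standing in for contiguous bytes); it is not Sybase's actual page format.

```python
# Toy model of a data page: rows stored contiguously, with an offset
# table recording the starting byte of each row. Illustrative only --
# not the real Sybase on-disk layout.
PAGE_SIZE = 2048  # default 2K page

class Page:
    def __init__(self):
        self.rows = []      # row bodies, kept in contiguous storage order
        self.offsets = []   # starting byte number of each row

    def used(self):
        return sum(len(r) for r in self.rows)

    def insert(self, row):
        if self.used() + len(row) > PAGE_SIZE:
            raise OverflowError("page full: would need a page split")
        self.offsets.append(self.used())
        self.rows.append(row)

    def update(self, i, new_row):
        """Grow (or shrink) row i in place; every later row must shift."""
        delta = len(new_row) - len(self.rows[i])
        if self.used() + delta > PAGE_SIZE:
            raise OverflowError("page full: would need a page split")
        self.rows[i] = new_row
        # Shifting rows i+1.. means rewriting their offsets as well.
        for j in range(i + 1, len(self.offsets)):
            self.offsets[j] += delta
        return delta

page = Page()
page.insert(b"a" * 100)       # Row 0
page.insert(b"b" * 100)       # Row 1
before = list(page.offsets)
page.update(0, b"a" * 110)    # Row 0 grows by n = 10 bytes
print(before, page.offsets)   # Row 1's offset has moved down by 10
```

Note how a change to a single row forces bookkeeping work on every row after it: this is exactly the cost the update algorithm tries to avoid.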
Consider the case where the page is
almost full. The only way to provide the
extra space now would be to split the
existing page into two. If indexes are
defined on these rows and the rows get
shifted around, the update algorithm has
to maintain the consistency of the index
pages to reflect the changes made to its
datapages. This illustrates the often-com-
peting costs associated with determining
the optimum way to perform an update
operation.
Update Operations in Sybase SQL Server:
An Analysis of ASE 11.0, 11.5, and 11.9
By Valapet P. Badri
First Quarter 1999
Constraints and Solutions
There are many ways to resolve these competing requirements
and costs. The simplest mechanism is to split a single update
operation into a delete operation of the existing row, followed
by an insert of the new row. This mechanism of changing the
data value reallocates the space needed by the updated (new)
rows every time. This is called a deferred update in Sybase
terminology.
The downside to this simplistic approach is that no
attempt is made to use the existing space occupied by the
deleted row. This introduces the overhead of freeing up space
occupied by the deleted row and reallocation of space for the
newly inserted row. Additionally, we run into transaction
concurrency overheads (locking issues) of any changed pages
between the delete and insert operation, overhead of copying
unchanged columns from the delete to insert buffer, and
subsequent high costs in terms of poor response times and
throughput.
In contrast, consider an update operation that writes only the changes to an existing row. This is a less expensive operation with improved transaction concurrency, and it allows the extra n bytes to be allocated within the existing address space.
This type of update operation is called direct update-in-
place. Sybase version 10.0 and lower placed highly restrictive
conditions on performing an update-in-place operation.
However, later versions have been flexible enough to address
the performance impact by relaxing the rules for update-in-
place operation.
There are primarily two conditions under which a direct update-in-place is not allowed:
• The updated row has columns that do not fit into the existing space (meaning that a varchar column has increased in size or a NULLable column has changed from a NULL value to some finite value)
• Certain rules in the server explicitly require a delete and an insert record to be written to syslogs. The Replication Server (via its LTM, or Log Transfer Manager) requires the update to be broken into delete and insert operations and written to the transaction log (syslogs) for its use in replication to a remote system.
Let us now discuss the above-mentioned design constraints
introduced by the Sybase SQL Server.
NULL Storage Mechanism
The minimum unit of storage in the SQL server is, by default, a 2K page (except on the Stratus platform). This page contains the actual data and
control information about every row. The top of each page
contains the offset information of every row. The control
information includes the number of variable length columns
and the number of NULLs in a particular row.
Although NULL is notionally an undefined quantity, the storage engine needs to store some unique representation for it and retrieve it as NULL. Hence, the storage length of a NULLable field varies from a couple of bytes up to the length of the data value or the default size of the fixed-length field. NULLable fixed-length data types like char (NULL) or
binary (NULL) inherently expand in size from a NULL value
to the fixed length value. Internally, the server treats these
NULLable columns as variable-length columns even if they
were defined as fixed length data types. (A detailed descrip-
tion of storage and handling of NULL values is beyond the
scope of this paper.)
How does all this affect the type of update operation?
When a NULL value is changed to a non-NULL value,
extra storage space must be allocated. Hence, an update in
place cannot be performed.
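The storage effect of a NULL-to-value change can be sketched with a toy row-length model. The per-column sizes here (for instance, the one-byte cost of a stored NULL) are illustrative assumptions, not the server's exact on-disk format.

```python
# Toy model: a NULLable fixed-length column is stored like a
# variable-length column, so a NULL occupies almost no data space and
# expands to the full declared length when assigned a value.
# The NULL_COST figure is an assumption for illustration.
NULL_COST = 1

def stored_size(declared_len, value):
    return NULL_COST if value is None else declared_len

def row_size(columns):
    """columns: list of (declared_len, value) pairs."""
    return sum(stored_size(d, v) for d, v in columns)

before = row_size([(4, 1234), (20, None)])          # int + char(20) NULL
after  = row_size([(4, 1234), (20, "now filled")])  # NULL -> value
print(before, after, after > before)  # 5 24 True: row grew, so no in-place
```

Because the row length grows, the update no longer fits in the row's existing space, and an update-in-place is ruled out.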
Varchar Fields
Similar arguments can be drawn in the case of varchar fields, in which the storage size of a varchar column changes with an
update. This inherently implies that a data modification
(update) operation cannot always be performed in place.
Instead, it carries the overhead of being logged as a separate
delete and insert operation.
Any Operation Using Syslogs
There are certain design constraints in server operation that
normally require an update operation to be logged in syslogs as
a delete and insert operation. There are two such operations:
1. Triggers: An update trigger is triggered on an update
operation to its parent table. Triggers work off deleted
rows and inserted rows, which are really views of syslogs
entries visible inside the update trigger. In such cases, an
update-in-place that writes an update record to the trans-
action log is not a possibility, as the update trigger requires
a deleted and inserted row to be written. However, other direct-update types discussed later in this article are possible.
2. Replication Server: The Sybase replication server is
essentially designed as a log-based replication. The
replication server reads the transaction from the databases
(syslogs) and forwards it to the remote database. The
replication server component that reads the transaction
log and forwards it to other replication server components
is called Log Transfer Manager (LTM). LTM issues dbcc
commands to read the syslogs for any operation on the
replicated objects (such as tables or stored procedures).
In addition, it expects to see update operations on replicated tables written as delete and insert operations.
These requirements directly affect the type of update
operation performed.
New Locking Mechanisms in ASE 11.9
In ASE 11.9, tables that use page-level locking retain all of the pre-11.9 update behavior described later in this paper as their default. In ASE 11.9 terminology, such a table is called an Allpages-locked table.
In this type of locking operation, the space allocation and
deallocation algorithm maintains the contiguous nature of a
page. When a row is deleted, it forces a reorganization of the
rows in the page. The other rows move up in the page so that
space is contiguously filled from the top of the page.
This type of reorganization can also be triggered by an
update that changes the space occupied by an existing row.
The top of every page contains an entry (besides other
control information) for the starting byte number of every
row, known as the offset number. The offset value changes
whenever a row is shifted around in the page.
From a locking perspective, in an Allpages-locked table, the datapages and the index pages are locked using an exclusive page lock. This lock is transactional, meaning it is held until
the end of a transaction. For an update operation to succeed,
both index and data pages need to obtain exclusive locks
before any changes to the pages are performed.
Typically, the total size of the index columns is far smaller than that of the data row, so an index page can store upwards of 100 to 200 keys. Locking an index page can therefore block access to all rows referenced by that page. This creates concurrency problems on the index pages of an Allpages-locked table. ASE 11.9 addresses this problem by providing a new mechanism, called data-only locking,
for the table. (For a more detailed discussion of this topic,
also see Michael Mamet’s article on page 2.)
Data-Only Locking
There are two new types of data-only locking mechanisms
introduced in ASE 11.9: datapage locking and data-row
locking. These directly affect the way in which direct and
deferred updates are performed.
Datapage locking: When a row needs to be changed, the
entire data page is locked. However, the index pages are not
locked. The changes to the index pages are performed using a
latch. Latches are non-transactional and provide a synchro-
nization mechanism used to guarantee the physical consisten-
cy of a page. In datapage locking, latches are applied only for
the duration of time required to insert or change the index
pages. In contrast, locks remain in effect for the duration of
the transaction. Latches also minimize transaction concurren-
cy problems by reducing the contention for the index pages.
Data-row locking: This is essentially a row-level locking mechanism. Row-level locks are acquired on
individual rows on datapages as opposed to entire datapages.
Index rows and pages are not locked. Instead, latches are used
when changes are made.
Deletes in Data-Only Locked Tables
Before we discuss the implications of these new mechanisms for changing values, let us examine the new delete mechanism introduced in 11.9. A new logical delete mechanism has been introduced to improve the concurrency of deletes and changes to index pages. Now, a delete of a row does not immediately reorganize space on the page to make the data or index rows contiguous. Instead, it sets a bit in the row to indicate that the row has been logically deleted. The actual data or index row is not physically deleted from the page. This also makes rollback easier.
To facilitate the new locking mechanism, data-only locked tables use a new storage method that keeps a row ID the same for the life of the row. These internal row IDs do not
change as a result of new inserts, page splits, or updates that
change row length.
Update Operations on Data-Only Locked Tables
If an update operation increases the length of a row so that it
no longer fits on the same page, the following now happens in
ASE 11.9:
• The row is inserted into a different page (a forwarded row)
• A pointer to the row ID on the new page is stored in the original location
This maintains the changed row’s original row ID, and the
index pointers do not change. When further changes are
made to a forwarded row, only the pointers at the original row
location are changed. This ensures that the forwarded row is
never more than one hop from its original location.
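Row forwarding, and the one-hop guarantee, can be sketched as follows. This is a simplified model under stated assumptions; the dictionaries, the location counter, and the method names are all illustrative, not Sybase's internal structures.

```python
# Simplified row-forwarding model for a data-only locked table.
# A row keeps its original row ID for life; if it outgrows its home
# page, the home slot holds a forwarding pointer to the row's current
# location, and further moves re-point that one pointer (never a chain).
class Table:
    def __init__(self):
        self.home = {}      # row_id -> ("data", row) or ("fwd", location)
        self.pages = {}     # location -> row data on some other page
        self.next_loc = 0   # illustrative allocator for new locations

    def insert(self, row_id, row):
        self.home[row_id] = ("data", row)

    def update(self, row_id, new_row, fits_on_home_page):
        if fits_on_home_page:
            kind, payload = self.home[row_id]
            if kind == "fwd":
                self.pages[payload] = new_row   # update forwarded copy
            else:
                self.home[row_id] = ("data", new_row)
            return
        # Row must move: store it elsewhere, leave a pointer at home.
        kind, old = self.home[row_id]
        if kind == "fwd":
            del self.pages[old]                 # re-point, never chain hops
        loc = self.next_loc
        self.next_loc += 1
        self.pages[loc] = new_row
        self.home[row_id] = ("fwd", loc)        # always exactly one hop

    def read(self, row_id):
        kind, payload = self.home[row_id]
        return self.pages[payload] if kind == "fwd" else payload

t = Table()
t.insert(1, "short")
t.update(1, "a much longer row value", fits_on_home_page=False)
t.update(1, "an even longer row value still", fits_on_home_page=False)
print(t.read(1))  # row ID 1 never changed; still one hop from home
```

The design choice the sketch highlights: index entries point at the stable home slot, so repeated growth re-points one forwarding pointer instead of rewriting index pages.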
The concept of row forwarding introduces a change in the
way the clustered indexes are stored. A clustered index on a
data-only locked table uses a traditional (old) non-clustered
index structure. Therefore, the leaf-level page contains
pointers to the data itself.
These changes improve the overall performance of update operations by minimizing the number of I/Os necessary. The downside is that row forwarding may require additional I/O during a select operation. The clustering of data on a clustered index may also be affected by the relaxed storage mechanism, leading to increased logical and physical I/O on queries using clustered indexes.
This loosening of the strictly contiguous storage mechanism
introduces some administrative overhead. Commands such as reorg need to be executed periodically on tables with high update and delete transaction rates to avoid performance penalties from retained storage for logically deleted rows, as well as forwarded rows.
The next section discusses the Allpages-locked update operation and its various types. Note that ASE 11.0 and ASE 11.5 do not use this terminology; it applies only to ASE 11.9.
Types of Updates in 11.0 and 11.5
SQL Server 11.0 and ASE 11.5 support two basic types of updates: direct and deferred. As discussed earlier, the
SQL server tries to perform a direct update first. If that does
not satisfy the criteria, then it checks for the conditions to
perform other types of updates. Deferred update is the most
expensive update operation performed by the server.
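The selection order described above (and shown in Figure 1) can be sketched as a simple classifier. The condition flags are illustrative summaries of the rules discussed in this article, not the server's actual internal checks, which are derived from the query plan and row images.

```python
# Sketch of the update-type decision chain from Figure 1.
# Each boolean flag summarizes a condition discussed in the text;
# the flag names are this sketch's own, not server terminology.
def choose_update_type(row_length_changes, updates_index_key,
                       has_join, has_trigger_or_replication,
                       fits_on_same_page):
    # Update-in-place: no length change, no key change, no join,
    # no triggers or replication on the table.
    if (not row_length_changes and not updates_index_key
            and not has_join and not has_trigger_or_replication):
        return "update-in-place"
    # Cheap direct: row changed size but still fits on the same page
    # (triggers/replication are fine, since delete+insert are logged).
    if fits_on_same_page and not has_join and not updates_index_key:
        return "cheap direct"
    # Expensive direct: row must move to another page.
    if not has_join and not updates_index_key:
        return "expensive direct"
    # Otherwise: deferred update, the most expensive case.
    return "deferred"

print(choose_update_type(False, False, False, False, True))  # update-in-place
print(choose_update_type(True, False, False, True, True))    # cheap direct
print(choose_update_type(True, False, False, False, False))  # expensive direct
print(choose_update_type(True, True, True, False, False))    # deferred
```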
Direct Updates
These are single-pass operations, and may be one of three types:
1. In-place updates
2. Cheap direct updates
3. Expensive direct updates
In-Place Updates
The SQL server tries to perform an update-in-place as its first
choice among the various types of updates. Here, the change
is explicitly written first to the log as a modify (update)
statement. Then, the data values are modified directly in the
existing datapages. There may be additional records written
due to index delete and insert operations. The update does
not move any rows on the modified page. This implies that an in-place update is feasible only if the following conditions are met:
• The row being changed must not change in length
• The column being updated cannot be a key or part of one (including RI columns)
• The update statement does not include joins
• There are no update triggers on the table, and no replication is set up for the table
Briefly, let us explore the reasons behind the requirement that
update statements should not contain joins. Joins inherently
retrieve rows from two or more tables to determine the rows
that satisfy the join criteria. For every row satisfying the
search argument in the “outer table” (defined as the table the
join is pivoting from), the server examines the “inner table”
to determine the rows to be updated. This involves operating on a result set from the outer table and using it to find the affected rows in the inner table. Such an operation must be logged as separate delete and insert operations.
Similar logging arguments (deleted and inserted rows) explain why the column being updated cannot be a key or part of one.
Cheap Direct Updates
When the SQL server cannot perform an update-in-place, it
tries to perform a cheap direct update.

Figure 1 (Update Algorithm): the server checks the update-in-place condition first and performs an update-in-place if it is met; otherwise it checks the cheap direct update conditions and performs a cheap direct update; otherwise it checks the expensive direct update conditions and performs an expensive direct update; otherwise it performs a deferred update.

In a cheap direct update, the server tries to fit the changed row by moving the subsequent rows within the same page. This type of operation does involve logging the deleted and inserted rows to maintain the unit of work (transaction).
Compared to update-in-place, this operation has the following
conditions:
• The size of the updated row has changed (as with variable-length columns)
• There are no restrictions on triggers or replication, because the insert and delete records are written to the log
This type of update is as fast as update-in-place with respect to the total I/O performed, though it involves slightly more processing. Changes to index keys are handled as in-place updates.
Expensive Direct Updates
This is the third option available to the SQL server. The server performs an update that involves moving the data rows to a different page. The clustered index pointers therefore need to change, and the data and index rows must be deleted from their existing locations and inserted, as changed rows, into the new location. The following conditions need to be met:
• A data page splits on an update due to a longer row size
• The index used to find the row is not changed
• The usual restrictions on joins in the update statement and on RI apply
Expensive direct update is the third-fastest update type considered by the SQL server. Unlike a deferred update, the log records are not re-scanned, so it is less expensive than a deferred update. Full delete and insert records for the data and index rows are written to syslogs, so the replication and trigger restrictions do not apply.
Deferred Updates
This type of update is performed when an update operation does not meet the conditions for any of the direct update operations. The following conditions apply to all such cases:
• Updates with a join clause in the where clause
• Changes to columns used for RI
The discussion of constraints on space allocation and the steps involved in making changes to data pages also applies to index updates (known as deferred index updates). Deferred index inserts are triggered by updates that change the index used to find a row or that change the value in a unique index. These updates tend to move the index pages and force a deferred operation to be performed. This is the most expensive type of update operation performed by the SQL server.
The Costs of a Deferred Update
Deferred update occurs in two phases:
Phase 1
1. The server fetches the qualifying rows and pages and writes them to the transaction log.
2. The server scans the log for the qualified rows and pages and deletes the rows from the data and index pages.
Phase 2
1. The server re-scans the log records for the transaction and performs the inserts into the data pages and index rows.
2. On a commit operation, the transaction log is flushed.
Let us consider the cost involved in performing a deferred update on a table with four indexes. For each data row changed, a total of four log records needs to be written (once to determine the qualifying rows, once to re-fetch the data pages and write the changes to the log, and once to modify the data pages).
• For the four indexes, eight log records need to be written to syslogs
• 24 extra locks need to be applied for index traversal (presuming an index three levels deep)
• Multiple log pages need to be traversed to find the changes that must be applied to the table
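The arithmetic above can be checked with a small cost sketch. This is a back-of-the-envelope model of the figures quoted in this article, not a real costing formula, and the per-row constants are taken directly from the text.

```python
# Back-of-the-envelope cost model for a deferred update, using the
# figures quoted above: 4 log records per changed data row, 2 log
# records per index (delete + insert), and one lock per index level
# for both the delete and the insert traversal. Illustrative only.
def deferred_update_cost(num_indexes, index_depth):
    data_log_records = 4
    index_log_records = 2 * num_indexes
    index_locks = 2 * num_indexes * index_depth  # delete + insert passes
    return data_log_records, index_log_records, index_locks

data_lr, idx_lr, locks = deferred_update_cost(num_indexes=4, index_depth=3)
print(idx_lr, locks)  # 8 index log records, 24 extra locks per changed row
```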
Design Guidelines
What can be done to get around these restrictions and promote the use of direct updates?
• Create at least one unique index on the table
• Favor non-key columns in the where clause when updating a different key
• When defining tables, declare columns NOT NULL whenever feasible
Conclusion
In summary, ASE 11.9 gives the SQL server more choices of update operation. We have the older update types on an Allpages-locked table and newer mechanisms on data-only locked tables. The one best suited to a particular table should be evaluated not only from the perspective of its update performance, but also from the overall transaction profile (inserts, deletes, and updates on the table) of the client application. This helps ensure overall performance gains instead of skewed results.
U P D A T E O P E R A T I O N S I N S Q L S E R V E R

More Related Content

PDF
Whitepaper Performance Tuning using Upsert and SCD (Task Factory)
PDF
Best Practices in the Use of Columnar Databases
PPTX
Maryna Popova "Deep dive AWS Redshift"
PDF
Rails DB migrations
PDF
DB2 LUW - Backup and Recovery
PDF
IBM DB2 for z/OS Administration Basics
 
PPTX
SKILLWISE-DB2 DBA
PDF
Whitepaper Performance Tuning using Upsert and SCD (Task Factory)
Best Practices in the Use of Columnar Databases
Maryna Popova "Deep dive AWS Redshift"
Rails DB migrations
DB2 LUW - Backup and Recovery
IBM DB2 for z/OS Administration Basics
 
SKILLWISE-DB2 DBA

What's hot (18)

PDF
Practical Recipes for Daily DBA Activities using DB2 9 and 10 for z/OS
PDF
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
PDF
MySQL Overview
PPT
Introduction To Maxtable
PPT
8. column oriented databases
PPT
DB2UDB_the_Basics Day 4
PDF
Write intensive workloads and lsm trees
PPTX
The design and implementation of modern column oriented databases
PDF
How to Fine-Tune Performance Using Amazon Redshift
PDF
Db2 performance tuning for dummies
PPT
datastage training | datastage online training | datastage training videos | ...
PPTX
Tuning Apache Phoenix/HBase
PDF
Intro to HBase Internals & Schema Design (for HBase users)
PPTX
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
PDF
Identify SQL Tuning Opportunities
PPTX
SQL Server 2014 In-Memory OLTP
PPTX
Ibm db2
PDF
Ycsb benchmarking
Practical Recipes for Daily DBA Activities using DB2 9 and 10 for z/OS
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
MySQL Overview
Introduction To Maxtable
8. column oriented databases
DB2UDB_the_Basics Day 4
Write intensive workloads and lsm trees
The design and implementation of modern column oriented databases
How to Fine-Tune Performance Using Amazon Redshift
Db2 performance tuning for dummies
datastage training | datastage online training | datastage training videos | ...
Tuning Apache Phoenix/HBase
Intro to HBase Internals & Schema Design (for HBase users)
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
Identify SQL Tuning Opportunities
SQL Server 2014 In-Memory OLTP
Ibm db2
Ycsb benchmarking
Ad

Similar to UPD_OP_SQL (20)

PPTX
Lec 4 Recovery in database management system.pptx
PPT
Manipulating data
PPTX
Chapter22 database security in dbms.pptx
PDF
The Science of DBMS: Data Storage & Organization
PPT
PDF
Ijetr012023
PPTX
2. DML_INSERT_DELETE_UPDATE
PPT
Sql DML
PPT
Sql DML
PPT
SQL WORKSHOP::Lecture 9
PDF
Database recovery techniques
PPT
Intro to tsql unit 7
PPT
Less08 Schema
PPTX
LECTURE 11-Backup and Recovery - Dr Preeti Aggarwal.pptx
PDF
Chapter1.0 database management system
PPT
ch 5 Daatabase Recovery.ppt
PDF
DBMS Vardhaman.pdf
PPT
e computer notes - Manipulating data
PPTX
2. DBMS Experiment - Lab 2 Made in SQL Used
Lec 4 Recovery in database management system.pptx
Manipulating data
Chapter22 database security in dbms.pptx
The Science of DBMS: Data Storage & Organization
Ijetr012023
2. DML_INSERT_DELETE_UPDATE
Sql DML
Sql DML
SQL WORKSHOP::Lecture 9
Database recovery techniques
Intro to tsql unit 7
Less08 Schema
LECTURE 11-Backup and Recovery - Dr Preeti Aggarwal.pptx
Chapter1.0 database management system
ch 5 Daatabase Recovery.ppt
DBMS Vardhaman.pdf
e computer notes - Manipulating data
2. DBMS Experiment - Lab 2 Made in SQL Used
Ad

UPD_OP_SQL

  • 1. 10 I S U G T E C H N I C A L J O U R N A L Using direct update to utilize datapage space more effectively and improve performance Valapet Badri consults as a Sybase architect, primarily for the financial industry in New York and New Jersey. He can be reached at 732-650-1702 or valapet@iname.com. T he update algorithm used by the Sybase SQL server is a sophisticated mechanism designed to minimize the cost of updating the server. It chooses an update strategy based on constraints imposed by the design and the data values changed by the operation. One of the primary goals of a “physical database design” in Sybase is to promote the use of the update mechanism that is least expensive to the SQL server. The update operation called update-in-place is the least costly type, performing data modification of a row by changing the existing row in its existing physical location. Update-in-place is faster by magnitude when compared to other types of updates. Benchmarks have shown more than three-fold improve- ment in performance in updates using update-in-place with a 200-byte row table with one clustered index and two non- clustered indexes. Description of the Problem The basic definition of the problem is as follows: If a row occupying N bytes of space is changed by an update operation to occupy N+n bytes (where n>0), the database server needs to have a mecha- nism to provide space for storing the extra n bytes. This allocation works under the design constraints imposed by the database engine, in order to not break the existing allocation structures and the unit of storage. Other design constraints are imposed to minimize the cost involved in allocating and maintain- ing the internal consistency of the storage and the shuffling of existing rows. Some of these constraints compete in weightage and often are tangential to one another. Typically, the internal storage mechanism needs to maintain the rows within a page contiguously. 
In such cases, the problem of allocating extra n bytes becomes even more difficult. Let’s pre- sume that Row 1 has been updated and has increased by n bytes. In order to pro- vide Row 1 with extra n bytes, Row 2 has to be moved down n bytes. Thus, we see that a change in Row 1 has introduced an overhead of moving the subsequent rows within a page (a page is the smallest storage unit in Sybase). Consider the case where the page is almost full. The only way to provide the extra space now would be to split the existing page into two. If indexes are defined on these rows and the rows get shifted around, the update algorithm has to maintain the consistency of the index pages to reflect the changes made to its datapages. This illustrates the often-com- peting costs associated with determining the optimum way to perform an update operation. Update Operations in Sybase SQL Server: An Analysis of ASE 11.0, 11.5, and 11.9 By Valapet P. Badri
  • 2. F I R S T Q U A R T E R 1 9 9 9 11 Constraints and Solutions There are many ways to resolve these competing requirements and costs. The simplest mechanism is to split a single update operation into a delete operation of the existing row, followed by an insert of the new row. This mechanism of changing the data value reallocates the space needed by the updated (new) rows every time. This is called a deferred update in Sybase terminology. The downside to this simplistic approach is that no attempt is made to use the existing space occupied by the deleted row. This introduces the overhead of freeing up space occupied by the deleted row and reallocation of space for the newly inserted row. Additionally, we run into transaction concurrency overheads (locking issues) of any changed pages between the delete and insert operation, overhead of copying unchanged columns from the delete to insert buffer, and subsequent high costs in terms of poor response times and throughput. In contrast, consider an update operation that writes only the changes to an existing row. This is a less expensive operation and has an improved transaction concurrency—and it allows extra n bytes could be allocated within the existing address space. This type of update operation is called direct update-in- place. Sybase version 10.0 and lower placed highly restrictive conditions on performing an update-in-place operation. However, later versions have been flexible enough to address the performance impact by relaxing the rules for update-in- place operation. There are primarily two conditions under which a direct update-in-place is not allowed: x The updated row has columns that do not fit into existing space (meaning that varchar column has increased in size or the NULLable column has changed from a NULL value to some finite value) x There are certain rules in the server that explicitly require a delete and an insert statement to be written to syslogs. 
Replication server (LTM-Log transfer manager) requires the update to be broken into a delete and insert operation and written to the transaction log (syslogs) for its use in replication to a remote system. Let us now discuss the above-mentioned design constraints introduced by the Sybase SQL Server. NULL Storage Mechanism A minimum unit of storage in a SQL server is by default a 2k page (besides Stratus). This page contains the actual data and control information about every row. The top of each page contains the offset information of every row. The control information includes the number of variable length columns and the number of NULLs in a particular row. Although NULL is supposedly an undefined quantity, the storage needs to uniquely store some value for this and retrieve it as NULL. Hence, the storage length of a NULLable field varies from a couple of bytes to the length of the data value, or, the default size of the fixed length field. NULLable fixed-length data types like char (NULL) or binary (NULL) inherently expand in size from a NULL value to the fixed length value. Internally, the server treats these NULLable columns as variable-length columns even if they were defined as fixed length data types. (A detailed descrip- tion of storage and handling of NULL values is beyond the scope of this paper.) How does all this affect the type of update operation? When a NULL value is changed to a non-NULL value, extra storage space must be allocated. Hence, an update in place cannot be performed. Varchar Fields Similar arguments can be drawn in case of varchar fields, in which the storage size of a varchar column changes with an update. This inherently implies that a data modification (update) operation cannot always be performed in place. Instead, it carries the overhead of being logged as a separate delete and insert operation. 
Any Operation Using Syslogs There are certain design constraints in server operation that normally require an update operation to be logged in syslogs as a delete and insert operation. There are two such operations: 1. Triggers: An update trigger is triggered on an update operation to its parent table. Triggers work off deleted rows and inserted rows, which are really views of syslogs entries visible inside the update trigger. In such cases, an update-in-place that writes an update record to the trans- action log is not a possibility, as the update trigger requires a deleted and inserted row to be written. However, other direct-update types discussed later in this aricle are possible. U P D A T E O P E R A T I O N S I N S Q L S E R V E R
  • 3. 12 I S U G T E C H N I C A L J O U R N A L 2. Replication Server: The Sybase replication server is essentially designed as a log-based replication. The replication server reads the transaction from the databases (syslogs) and forwards it to the remote database. The replication server component that reads the transaction log and forwards it to other replication server components is called Log Transfer Manager (LTM). LTM issues dbcc commands to read the syslogs for any operation on the replicated objects (such as tables or stored procedures). In addition, it expects to see the update operations on replicated tables, such as delete and insert operations. These requirements directly affect the type of update operation performed. New Locking Mechanisms in ASE 11.9 In ASE 11.9, when working with tables that use page-level locking, all pre-11.9 behavior of update operation described later in this paper would be the default behavior. In ASE 11.9 terminology, it is called a Allpages locked table operation. In this type of locking operation, the space allocation and deallocation algorithm maintains the contiguous nature of a page. When a row is deleted, it forces a reorganization of the rows in the page. The other rows move up in the page so that space is contiguously filled from the top of the page. This type of reorganization can also be triggered by an update that changes the space occupied by an existing row. The top of every page contains an entry (besides other control information) for the starting byte number of every row, known as the offset number. The offset value changes whenever a row is shifted around in the page. From a locking perspective, in the Allpages locked table, the datapages and the index pages are locked using an exclu- sive page lock. This is transactional, meaning, it is held until the end of a transaction. 
For an update operation to succeed, both index and data pages must be exclusively locked before any changes to the pages are performed. Typically, the total size of the index columns is far smaller than that of the data rows, so an index page can store upwards of 100 to 200 keys. Locking an index page can therefore block access to all rows referenced by that page, which creates concurrency problems in the index pages of an Allpages-locked table. ASE 11.9 tries to address this problem by providing a new locking mechanism for tables called data-only locking. (For a more detailed discussion of this topic, also see Michael Mamet's article on page 2.)

Data-Only Locking

There are two new types of data-only locking mechanisms introduced in ASE 11.9: datapage locking and datarow locking. These directly affect the way in which direct and deferred updates are performed.

Datapage locking: When a row needs to be changed, the entire data page is locked. However, the index pages are not locked; changes to the index pages are performed using a latch. Latches are non-transactional and provide a synchronization mechanism used to guarantee the physical consistency of a page. In datapage locking, latches are applied only for the duration of time required to insert or change the index pages. In contrast, locks remain in effect for the duration of the transaction. Latches thus minimize transaction concurrency problems by reducing contention for the index pages.

Datarow locking: Datarow locking is essentially a row-level locking mechanism. Row-level locks are acquired on individual rows on datapages, as opposed to entire datapages. Index rows and pages are not locked; instead, latches are used when changes are made.

Deletes in Data-Only Locked Tables

Before we discuss the implications of these new mechanisms for changing values, let us examine the new delete mechanism introduced in 11.9.
A new logical delete mechanism has been introduced to improve concurrency for deletes and for changes to index pages. Now, a delete of a row does not immediately reorganize space on the page to make the data or index rows contiguous. Instead, a bit is set in the row to indicate that the row has been logically deleted; the actual data or index row is not physically deleted from the page. This allows an easier rollback mechanism. To facilitate the new locking mechanism, data-only locked tables use a new storage method that keeps row IDs the same for the life of the row. These internal row IDs do not change as a result of new inserts, page splits, or updates that change row length.

Update Operations on Data-Only Locked Tables

If an update operation increases the length of a row so that it no longer fits on the same page, the following now happens in ASE 11.9:

• The row is inserted into a different page (a forwarded row)
• A pointer to the row ID on the new page is stored in the original location
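The steps above, together with the logical-delete bit and stable row IDs, can be sketched as follows. This is a hypothetical model for illustration only; the names and structures are invented and do not reflect actual ASE internals.

```python
# Toy model of data-only-locked storage: row IDs stay fixed for the life of
# a row; deletes only set a bit; a row that outgrows its page is forwarded,
# and the ORIGINAL slot always holds the current location (one hop at most).

class DOLTable:
    def __init__(self):
        self.slots = {}   # row_id -> slot; row IDs never change

    def insert(self, row_id, data):
        self.slots[row_id] = {"data": data, "deleted": False, "forward": None}

    def delete(self, row_id):
        # Logical delete: mark the row, do not physically remove it.
        self.slots[row_id]["deleted"] = True

    def update(self, row_id, data, fits_on_page):
        slot = self.slots[row_id]
        if fits_on_page:
            slot["data"] = data
            slot["forward"] = None
        else:
            # Row forwarding: only the pointer at the original location
            # changes, so a lookup never chases more than one hop.
            slot["forward"] = ("other_page", data)

    def fetch(self, row_id):
        slot = self.slots[row_id]
        if slot["deleted"]:
            return None
        if slot["forward"] is not None:
            return slot["forward"][1]   # exactly one hop
        return slot["data"]

t = DOLTable()
t.insert(7, "short row")
t.update(7, "much longer row that no longer fits", fits_on_page=False)
print(t.fetch(7))   # found in one hop; row ID 7 is unchanged
```

Because the row ID never changes, index entries pointing at row 7 stay valid even after repeated forwarding.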
First Quarter 1999

This maintains the changed row's original row ID, and the index pointers do not change. When further changes are made to a forwarded row, only the pointer at the original row location is changed. This ensures that a forwarded row is never more than one hop from its original location.

The concept of row forwarding introduces a change in the way clustered indexes are stored. A clustered index on a data-only locked table uses a traditional non-clustered index structure, so the leaf-level page contains pointers to the data itself. These changes improve the overall performance of update operations by minimizing the number of I/Os necessary. The downside is that row forwarding may require additional I/O during a select operation. The clustering of data on a clustered index may also be affected by the relaxed storage mechanism, leading to increased I/O, both logical and physical, on queries using clustered indexes. This loosening of the strictly contiguous storage mechanism introduces some administration overhead as well: commands such as reorg need to be executed periodically on tables with high update and delete transaction rates, to avoid performance penalties from the continued storage of logically deleted rows as well as forwarded rows.

The next section discusses the Allpages-locked update operation and its various types. Note that in ASE 11.0 and ASE 11.5 this terminology is not used; it is applicable only in ASE 11.9.

Types of Updates in 11.0 and 11.5

SQL Server 11.0 and ASE 11.5 support two basic types of updates: direct and deferred. As discussed earlier, the SQL server tries to perform a direct update first. If the operation does not satisfy the criteria, the server then checks the conditions for the other types of updates. Deferred update is the most expensive update operation performed by the server.

Direct Updates

These are single-pass operations, and may be one of three types:
1. In-place updates
2. Cheap direct updates
3. Expensive direct updates

In-Place Updates

The SQL server tries to perform an update-in-place as its first choice among the various types of updates. Here, the change is first written to the log as a single modify (update) record; then the data values are modified directly in the existing datapages. (Additional log records may still be written for index delete and insert operations.) The update does not move any rows on the modified page. This is feasible only if the following conditions are met:

• The rows being changed must not change in length
• The column being updated cannot be a key, or part of a key (including referential integrity, or RI, columns)
• The update statement does not include joins
• There are no update triggers on the table, and no replication set up for the table

Briefly, let us explore the reasons behind the requirement that update statements not contain joins. Joins inherently retrieve rows from two or more tables to determine the rows that satisfy the join criteria. For every row satisfying the search argument in the "outer table" (the table the join is pivoting from), the server examines the "inner table" to determine the rows to be updated. This involves operating on a result set of the outer table and using it to find the affected rows in the inner tables. Such an operation must be logged as separate delete and insert operations. A similar logging argument (separate deleted and inserted rows) explains why the column being updated cannot be a key or part of one.

Cheap Direct Updates

When the SQL server cannot perform an update-in-place, it tries to perform a cheap direct update.
[Figure 1: Update Algorithm — a flowchart of the update decision. The update-in-place conditions are checked first; if they are met (YES), an update-in-place is performed. Otherwise (NO), the cheap direct update conditions are checked, then the expensive direct update conditions; the first set of conditions met determines the direct update performed. If none are met, a deferred update is performed.]

The server tries to fit the changed row by moving the subsequent rows in the same page. To maintain the unit of work (the transaction), this type of operation does log the deleted and inserted rows. Compared to update-in-place, this operation has the following conditions:

• The size of the updated row has changed (as with variable-length columns)
• There are no restrictions regarding triggers or the replication server, because the insert and delete records are written to the log

This type of update is as fast as update-in-place with respect to the total I/O performed, though it involves a little more processing. Changes to index keys are handled as in-place updates.

Expensive Direct Updates

This is the third option available to the SQL server. The server performs an update that involves moving the data rows to a different page. The clustered index key therefore needs to change, and the data and index rows must be deleted from their existing locations and inserted as changed rows into the new location. The following conditions need to be met:

• The data page splits on the update because of a longer row size
• The index used to find the row is not changed
• The usual restrictions on joins in the update statement and on RI columns apply

An expensive direct update is the third-fastest update considered by the SQL server. The log records are not re-scanned, as they are in a deferred update, so it is less expensive than a deferred update. Full delete and insert records for the data and index rows are written to syslogs, so the replication and trigger restrictions do not apply.

Deferred Updates

This type of update is performed when an update operation does not qualify under the conditions for any of the direct update operations.
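The selection cascade in Figure 1 can be sketched roughly as follows. This is a simplified illustration of the decision flow described in this article, not actual server code; the parameter names are invented, and the real server evaluates more conditions than shown here.

```python
# Simplified sketch of the Figure 1 cascade: conditions are checked
# cheapest-first, and the first update type that qualifies is used.
# All parameter names are hypothetical.

def choose_update_type(row_length_changes,
                       key_or_ri_column_changed,
                       has_join,
                       has_trigger_or_replication,
                       row_moves_to_new_page):
    # Update-in-place: nothing moves; a single modify record is logged.
    if (not row_length_changes and not key_or_ri_column_changed
            and not has_join and not has_trigger_or_replication):
        return "update-in-place"
    # Cheap direct: the row changed size but still fits on the same page.
    if (row_length_changes and not row_moves_to_new_page
            and not has_join and not key_or_ri_column_changed):
        return "cheap direct"
    # Expensive direct: the row must move to a different page.
    if (row_moves_to_new_page and not has_join
            and not key_or_ri_column_changed):
        return "expensive direct"
    # Otherwise: the two-phase deferred update.
    return "deferred"

print(choose_update_type(False, False, False, False, False))  # update-in-place
print(choose_update_type(True, False, False, False, False))   # cheap direct
print(choose_update_type(True, False, False, False, True))    # expensive direct
print(choose_update_type(False, True, True, False, False))    # deferred
```

The cheapest-first ordering mirrors the server's goal: fall back to a more expensive mechanism only when a cheaper one is ruled out.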
The following conditions apply in all such cases:

• The update has a join clause in the where clause
• The update changes columns used for RI

The discussion of constraints on space allocation and of the steps involved in making changes to data pages also applies to index updates (known as deferred index updates). Deferred index inserts are triggered by updates that change the index used to find a row, or that change the value in a unique index. These index updates tend to move the index pages and force a deferred operation to be performed. This is the most expensive type of update operation performed by the SQL server.

The Costs of a Deferred Update

A deferred update occurs in two phases:

Phase 1
1. The server fetches the qualifying rows and pages and writes them to the transaction log.
2. The server scans the log for the qualified rows and pages and performs the deletes on the data pages and index pages.

Phase 2
1. The server re-scans the log records for the transaction and performs the inserts to the data pages and index rows.
2. On a commit operation, the transaction log is flushed.

Let us consider the cost of a deferred update on a table with four indexes. For each data row changed, a total of four log records needs to be written (once to determine the qualifying rows, once to re-fetch the data pages and write changes to the log, once to modify the data pages). In addition:

• For the four indexes, eight log records need to be written to syslogs
• 24 extra locks need to be applied for index traversal (presuming a three-level-deep index)
• Multiple log pages need to be traversed to find the changes that must be applied to the table

Design Guidelines

What can be done to get past all these restrictions and promote the use of direct updates?
• Create at least one unique index on the table
• Promote the use of non-key columns in the where clause when updating a different key
• When defining tables, use NOT NULL for columns whenever feasible

Conclusion

In summary, ASE 11.9 gives the SQL server more choices of update operation: the older update types on Allpages-locked tables, and the newer mechanisms on data-only locked tables. The one best suited to a particular table should be evaluated not only from the perspective of update performance, but from the overall transaction profile (inserts, deletes, and updates on the table) of the client application. This helps ensure overall performance gains instead of skewed results.