An In-Depth Look at SAP SQL Anywhere Performance Features

(c) 2015 Independent SAP Technical User Group Annual Conference, 2015
SQL Anywhere Performance Features
Jason Hinsperger
Product Manager
SAP

Agenda
Review
SQL Anywhere design goals
Why self-management is important
Query processing in SQL Anywhere
SQL Anywhere performance
Sequential scans vs index scans
Multiprogramming level
Cache management
Adaptive query execution
Statistics management

Design Goals of SQL Anywhere
Ease of administration
Comprehensive yet comprehensible tools
Good out-of-the-box performance
“Embeddability” features => self-tuning
Many environments have no DBAs
Cross-platform support
Interoperability
A Holistic Approach to Autonomic Database Management

Autonomic Database Management
Self-Managing/Self-Configuring
Self-Tuning/Self-Adapting
Self-Healing
Monitoring and correcting/advising on problems
Self-protecting
Ease of administration
Goal: zero (manual) administration
Design Automation
Management Tools
Index consultant, application profiler, …

Why is Self-management Important?
In a word: complexity
Application development is becoming more complex: new development paradigms
such as ORM toolkits, distributed computation with synchronization amongst
database replicas, and so on
Databases are now ubiquitous in IT because they solve a variety of difficult problems
Yet most companies continue to own and manage a variety of different DBMS
products, which increases administrative costs
Ubiquity brings scale, in several ways
To keep TCO constant, one must improve the productivity of each developer

Query Processing in SQL Anywhere
[Figure: query processing phases. Labels: Scan SQL, Parse, Prepare Parse Tree, Semantic Transformations, Pre-Optimization, QOG Build, Join Enumeration, Post-Optimization, DFO Build, Cursor, Open, Execute, Close]

SQL Anywhere is designed to get good performance with very little tuning
• Many auto-tuning and self-management capabilities designed to adapt:
Self-managing buffer pool: size and contents
Dynamic tuning of multi-programming level
Automatic statistics gathering, monitoring and healing
Self-tuning query optimization
Query optimization bypass for simple statements
Intra-query parallelism
IO intelligence for certain operations
Cache warming on startup and to steady state

BUT…
Auto-tuning and self-management capabilities are designed to adapt to:
Hardware – CPU, I/O, memory of the machine
Queries being requested
Application logic and concurrency attributes
SQL Anywhere will adapt to different deployment environments
BUT: some adaptations may produce unacceptable performance
E.g., low-memory execution strategies
Many things can be done at development time to improve the performance of
application and database interactions
Capacity planning, performance analysis and improvements, scalability testing

How Fast is SQL Anywhere?
A typical conversation about performance:
C: “So, how fast is SQL Anywhere?”
S: “Well, it depends on a variety of factors. Your database design, number of
concurrent users, hardware, server cache contents, etc…”
C: “I understand those things are important, but can’t you just tell me how
many rows the server can fetch per second?”
S: “Well, in this test, we can fetch between 80 and 30 million rows per
second”
C: “What? That makes no sense. My application can’t get anywhere near 30
million rows per second. Something must be broken. Can you fix it?”
S: “Well, it depends …”
…

How Many Rows Can We Read per Second?
Rows read per second (thousands) on Z820 server (256GB, 32 threads, SSD)
Access method            Cold cache   Hot cache
Seq scan                 8,161        31,272
Non-clustered IDX 1%     125          5,083
Clustered IDX 10%        1,111        5,616
One-row statement        0.08

How Many Rows Can We Read per Second? (cont)
Rows read per second (thousands) on T520 laptop (8GB, 4 threads, HDD)
Access method            Cold cache   Hot cache
Seq scan                 733          704
Non-clustered IDX 1%     1            3,933
Clustered IDX 10%        108          3,163
One-row statement        5.10

Cold Cache Performance on Two Hosts
I/O dominates cold cache performance
On HDD, sequential is much faster
Clustering of indexes is very important
Buffer size affects how many pages are re-read
SSD has much better performance
Excellent throughput and random seeks
CPU speed/number is important when I/O is fast
[Chart: T520 (laptop, 8GB, 4-thread, HDD). Categories: Seq Scan, NonCluIX .1%, NonCluIX 1%, CluIX 1%, CluIX 10%. Data labels: 81.9, 0.1, 492.0, 3.5, 55.4]
[Chart: Z820 (server, 256GB, 32-thread, SSD). Categories: Seq Scan, NonCluIX .1%, NonCluIX 1%, CluIX 1%, CluIX 10%. Data labels: 7.4, 0.0, 4.8, 0.4, 5.4]

Warm Cache Performance On Two Hosts
When data is in cache, CPU is the major factor
Parallelism is available but has overheads
Clock speed is an important factor
Buffer pool contents have huge impact
[Chart: T520 (laptop, 8GB, 4-thread, HDD). Categories: Seq Scan, NonCluIX .1%, NonCluIX 1%, CluIX 1%, CluIX 10%. Data labels: 85.3, 0.0, 0.2, 0.2, 1.9]
[Chart: Z820 (server, 256GB, 32-thread, SSD). Categories: Seq Scan, NonCluIX .1%, NonCluIX 1%, CluIX 1%, CluIX 10%. Data labels: 1.9, 0.0, 0.1, 0.2, 1.1]

Access Methods
[Figure: Full Table Scan compared with Index Scan through an index]

Deciding About Access Method
Full Table Scan
Reads all pages in a table => unnecessary I/O 
Processes all rows => more CPU 
But it benefits from sequential I/O 
Index Scan
Reads only required pages 
Processes only required rows => Less CPU 
Suffers from Random I/O 
Needs to read index pages in addition to table pages 
Might need to re-fetch the same table pages 
When selectivity is large enough, it might need to read every page of the table
[Figure: runtime vs. selectivity (%) for Index Scan and Full Table Scan, marking the selectivity break-even point]
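The break-even point can be illustrated with a deliberately simplified cost model. The sketch below is a toy under stated assumptions, not the SQL Anywhere cost model: it assumes a cold cache and a non-clustered index where every matching row costs one random page read, and it ignores index pages and re-fetched pages. All function names and cost parameters are hypothetical.

```python
import math


def full_table_scan_cost(table_pages, seq_page_cost):
    """Cost of reading every table page once with sequential I/O."""
    return table_pages * seq_page_cost


def index_scan_cost(table_pages, rows_per_page, selectivity, rand_page_cost):
    """Cost of fetching each matching row with one random page read.

    Assumption: cold cache, non-clustered index, index pages ignored.
    """
    matching_rows = selectivity * table_pages * rows_per_page
    return matching_rows * rand_page_cost


def break_even_selectivity(rows_per_page, seq_page_cost, rand_page_cost):
    """Selectivity (0..1) at which both access methods cost the same."""
    return min(1.0, seq_page_cost / (rows_per_page * rand_page_cost))


if __name__ == "__main__":
    # Hypothetical device: a random read costs 50x a sequential read.
    for rpp in (1, 33, 500):
        s = break_even_selectivity(rpp, seq_page_cost=1.0, rand_page_cost=50.0)
        print(f"rows/page={rpp:>3}: break-even at {100 * s:.4f}% selectivity")
```

In this toy model, fewer rows per page (larger rows) and cheaper random reads both push the break-even point to the right, which is the direction the slides describe.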

Factors For Choosing Between Index and Table Scan
Selectivity
Larger selectivity => table scan
Smaller selectivity => index scan
Row size (the number of rows per page)
Larger row size => shifts the break-even point to the right (index scan performs better)
Cache contents
With more of the table in cache, more reads are satisfied from the cache
Available memory
More available memory => shifts the break-even point to the right (index scan performs better)
What about I/O Parallelism?

Parallel Index Scan in SAP SQL Anywhere
[Figure: an index and its leaf nodes]

HDD – Parallel Index Scan Moves Break-Even Point
[Figure: time (seconds) vs. selectivity (%), 0 to 0.19%, for IS, PIS32, FTS and PFTS32, marking the non-parallel and parallel break-even points]

SSD – Break-Even Point Moves Further to Right
[Figure: time (seconds) vs. selectivity (%), 0 to 5.8%, for IS, PIS32, FTS and PFTS32, marking the non-parallel and parallel break-even points]

Shift in Break Even Point
          NP-HDD    P-HDD     NP-SSD    P-SSD
RPP=1     0.55%     1.4%      8%        48%
RPP=33    0.02%     0.05%     0.4%      2.1%
RPP=500   0.0045%   0.005%    0.15%     0.5%
(RPP = rows per page; NP = non-parallel; P = parallel)
On SSD, the shift in the selectivity break-even point is significantly larger
So the query optimizer needs to be aware of the impact of parallel I/O
Otherwise, we end up with non-optimal execution plans up to ~20 times worse than optimal
Takeaway: consider ALTER DATABASE CALIBRATE SERVER
If you know the database will run on one configuration, consider calibrating
If you have little control over disk configuration, the default calibration is the best available
Elements of a Database System
[Figure: a client process sends statements and parameters over the network to the server process and receives status and results; the server process works with the buffer pool and the DB]

Anatomy of a single statement
[Figure: timeline of a single statement across client, network and server. Labels: Input, Form SQL, Prepare, Describe, Open, Execute, Read results, Format Output, Output, Close]

Concurrent requests
[Figure: clients C1, C2, ..., CN sending concurrent requests to the server]

SQL Anywhere Scheduler

Task Scheduling
Database server has a worker-per-request architecture
A worker pool and a request queue
Each worker picks up and completes one request at a time.
No guarantee that the same worker will service the same connection
A small pool of workers executes requests
Scheduler dynamically assigns work across workers - Cooperative multitasking
Unassigned requests wait for an available thread
Server supports dynamic intra-query parallelism
Degree of parallelism varies based on available resources
Pool size establishes the multiprogramming level
SA Default: 20
Priorities can be set on connections
Adjusts the number of time slices that any given request will get
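The worker-pool-plus-request-queue idea can be sketched with ordinary threads. This is a minimal illustration of the general pattern, not the server's scheduler: it has no priorities, time slices or intra-query parallelism, and the function and variable names are hypothetical.

```python
import queue
import threading


def run_worker_pool(requests, pool_size):
    """Run (request_id, callable) pairs on a fixed pool of workers.

    Returns {request_id: (worker_id, result)}. Any worker may serve any
    request, so the same client is not tied to the same worker.
    """
    pending = queue.Queue()
    for request in requests:
        pending.put(request)

    results = {}
    results_lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                request_id, work = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            value = work()  # one request at a time per worker
            with results_lock:
                results[request_id] = (worker_id, value)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(pool_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    reqs = [(i, (lambda i=i: i * i)) for i in range(100)]
    done = run_worker_pool(reqs, pool_size=20)  # 20 mirrors the default quoted above
    print(len(done), "requests completed by",
          len({w for w, _ in done.values()}), "workers")
```

The pool size caps how many requests run at once; everything else waits in the queue, which is the sense in which pool size sets the multiprogramming level.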

Worker-per-Request Architecture
How to choose the size of the worker pool?
A large worker pool:
Increases the concurrency level of the server
Increases contention on server resources
Increases working set size of server
A smaller worker pool:
Underutilizes hardware resources
Limits the concurrency level of the workload
Risks a server hang when no workers are available to handle outstanding requests

Dynamic Worker Pool Management
Dynamically adjust the size of the worker pool
Based on workload throughput monitoring and number of requests pending
Benefits of dynamic MPL:
One less parameter for DBAs to worry about
Improves server throughput for different workloads
Handles changes in the workload's transaction mix better
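One way to picture throughput-driven pool sizing is a generic hill-climbing rule. The slides only say that the adjustment is based on throughput monitoring and the number of pending requests; the specific rule, step size and bounds below are assumptions made for illustration and are not claimed to be what the server does.

```python
def adjust_pool_size(size, last_step, prev_throughput, cur_throughput,
                     pending_requests, lo=2, hi=64):
    """Return (new_size, step_direction) for the next monitoring interval.

    Hypothetical rule: keep moving in the same direction while throughput
    does not drop, reverse when it drops, and never grow when no requests
    are waiting for a worker.
    """
    if cur_throughput >= prev_throughput:
        step = last_step       # the last change helped or was neutral
    else:
        step = -last_step      # the last change hurt: back off
    if step > 0 and pending_requests == 0:
        step = 0               # nothing is queued, extra workers cannot help
    new_size = max(lo, min(hi, size + step))
    return new_size, (step if step != 0 else last_step)


if __name__ == "__main__":
    size, step = 20, 1
    samples = [(100, 120, 5), (120, 130, 3), (130, 110, 4), (110, 115, 0)]
    for prev, cur, pending in samples:
        size, step = adjust_pool_size(size, step, prev, cur, pending)
        print(f"throughput {prev}->{cur}, pending={pending}: pool size {size}")
```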

Dynamic Memory Management
A SQL Anywhere server will grow and shrink the buffer pool as necessary to
accommodate both
Database server load
Physical memory requirements of other applications
Enabled by default on all supported platforms
Users can set lower and upper bounds and an initial size

Dynamic Memory Management – Adjust Buffer Pool Size
Basic idea: match buffer pool size to SA's working set as determined
by the operating system plus the OS free pool
• Feedback control loop
[Figure: feedback control loop between the Buffer Pool Governor, the Buffer Pool Manager and the operating system. Labels: Buffer Miss Rate, OS Working Set Size, Amount of Free Physical Memory, Database file sizes, Adjusted Memory Target, Grow/Shrink Amount, New Buffer Pool Size]
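The basic idea (working set plus OS free memory, kept within user bounds) can be written down as a small calculation. This is a sketch under assumptions: capping the target at the database file size and moving only part of the way per step (damping) are guesses at how the figure's inputs might be used, and every name is hypothetical.

```python
def target_buffer_pool_bytes(working_set, os_free, lower, upper, db_file_bytes=None):
    """Target size: OS-reported working set plus the OS free pool, clamped.

    Assumption: the pool never needs to exceed the database file size.
    """
    target = working_set + os_free
    if db_file_bytes is not None:
        target = min(target, db_file_bytes)
    return max(lower, min(upper, target))


def resize_amount(current, target, damping=0.5):
    """Signed grow/shrink amount; damping avoids oscillation (assumed)."""
    return int((target - current) * damping)


if __name__ == "__main__":
    GiB = 1024 ** 3
    current = 2 * GiB
    target = target_buffer_pool_bytes(
        working_set=2 * GiB, os_free=3 * GiB, lower=1 * GiB, upper=8 * GiB
    )
    print("target GiB:", target / GiB)
    print("grow by GiB:", resize_amount(current, target) / GiB)
```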

SQL Anywhere Memory Management
Single heterogeneous buffer pool with few predefined limits
Buffer pool comprises
• Table and index data pages
• Checkpoint log pages
• Bitmap pages
• Heap pages (data structures for query execution plans, optimization graphs,
connection structures, stored procedures, triggers)
• Free (available) pages
All page frames are the same size
Fully self-contained memory manager
• Self-managed memory footprint

Cache Warming
Startup cache warming
Record the pages referenced during the “startup period”
Read these pages in on future startups
Meant to quickly load data needed for the first few requests
Steady state cache warming
Record an approximation of the steady state of the cache
After startup warming is done, load pages expected to be needed in the background
Should be included in V17

Cache contents estimation
Every table and index maintains a count of pages currently in cache
• This is incremented/decremented when pages are read/evicted
The cost model estimates how many disk reads are needed
• Estimates the number of distinct pages referenced by the plan
• Estimates how many are likely already in the buffer pool
• Estimates how many of those read multiple times will remain in the buffer pool
Takeaway: Consider buffer pool contents when evaluating performance
• Consider flushing or warming cache before experiments to stabilize state
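The per-table page counters and the disk-read estimate can be sketched as follows. The counter behaviour follows the slide; the estimate itself assumes that every page of the table is equally likely to be cached, which is a simplification chosen for this example, and the class and function names are hypothetical.

```python
class TableCacheCounter:
    """Tracks how many pages of one table (or index) are currently cached."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.cached_pages = 0

    def page_read(self):
        # Called when a page of this table is brought into the buffer pool.
        self.cached_pages = min(self.total_pages, self.cached_pages + 1)

    def page_evicted(self):
        # Called when a page of this table leaves the buffer pool.
        self.cached_pages = max(0, self.cached_pages - 1)

    def cached_fraction(self):
        if self.total_pages == 0:
            return 0.0
        return self.cached_pages / self.total_pages


def estimated_disk_reads(distinct_pages_referenced, counter):
    """Pages the plan touches that are probably not in the buffer pool.

    Assumption: cached pages are spread uniformly over the table.
    """
    return distinct_pages_referenced * (1.0 - counter.cached_fraction())


if __name__ == "__main__":
    orders = TableCacheCounter(total_pages=1000)
    for _ in range(250):
        orders.page_read()
    print("estimated disk reads:", estimated_disk_reads(400, orders))
```

Because the estimate moves with the counters, the same query can be costed differently on a cold and a warm server, which is why the takeaway recommends stabilizing cache state before measuring.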

SQL Anywhere Query Optimizer
SA optimizes requests each time they are executed
Takes into account server context
Optimization process includes both heuristic and cost-based rewrites
No hard limits – tested with 500 quantifiers in a single block
Advantages
Plans are responsive to server environment, buffer pool contents/size, data skew
No need to administer ‘packages’ (pre-optimized SQL)
Optimization effort adapts to expected query cost and benefit of optimization
Simple statements bypass optimizer
Cheap but complex statements use plan cache
Optimizer considers multiple join enumeration approaches depending on expected benefit

Bypassing the Query Optimizer
Single-table queries without “complications” bypass the optimizer:
If they have a specific form (select * from T where pk = value), use a single “bypass cache” plan
If there is only one reasonable plan (WHERE clause specifies a unique row), use the “bypass heuristic”
Otherwise, “bypass costed” compares alternative indexes and sequential scan
A subset are “bypass costed simple”, where we can skip trying predicate optimizations or semantic transforms
If the bypass optimizer finds a plan > 5 seconds, it re-optimizes with the full optimizer
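The order of the bypass checks can be summarized as a decision function. The path names come from the slide; the boolean inputs are hypothetical stand-ins for analysis the server performs internally, so this is a reading aid rather than a description of the implementation.

```python
def choose_optimizer_path(single_table, has_complications, is_pk_equality_form,
                          identifies_unique_row, needs_predicate_transforms):
    """Map query properties (hypothetical flags) to the path named on the slide."""
    if not single_table or has_complications:
        return "full optimizer"
    if is_pk_equality_form:
        return "bypass cache"
    if identifies_unique_row:
        return "bypass heuristic"
    if not needs_predicate_transforms:
        return "bypass costed simple"
    return "bypass costed"


def needs_full_reoptimization(plan_seconds, threshold_seconds=5.0):
    """A bypass plan above the 5-second threshold goes to the full optimizer."""
    return plan_seconds > threshold_seconds


if __name__ == "__main__":
    print(choose_optimizer_path(True, False, True, True, False))
    print(choose_optimizer_path(True, False, False, False, True))
    print(needs_full_reoptimization(7.5))
```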

Plan Caching and Auto-Parameterization
Access plans for queries in stored procedures/triggers/events are cached
and reused for future executions
Plans undergo a ‘training period’ where plan variance is determined
If no variance (even without variable values), plan is cached and reused
Query is periodically re-optimized on a logarithmic scale to ensure plan does not
become sub-optimal
Improvements in V16 and V17 avoid plan caching when it degrades performance
Takeaway: Do not set max_plans_cached=0
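The training period and the logarithmic re-optimization schedule can be sketched in a few lines. The slides do not give the exact schedule, so the power-of-two spacing below is an assumption, as are the function names and the use of plain strings to stand in for access plans.

```python
def plan_is_stable(training_plans):
    """True when every plan produced during the training period is identical."""
    return len(training_plans) > 0 and len(set(training_plans)) == 1


def should_reoptimize(executions_since_cached):
    """Re-optimize at executions 1, 2, 4, 8, ... (assumed power-of-two spacing)."""
    n = executions_since_cached
    return n > 0 and (n & (n - 1)) == 0


if __name__ == "__main__":
    training = ["IndexScan(t.pk)", "IndexScan(t.pk)", "IndexScan(t.pk)"]
    if plan_is_stable(training):
        checks = [n for n in range(1, 101) if should_reoptimize(n)]
        print("plan cached; re-optimized at executions", checks)
```

With a schedule like this, re-optimization costs shrink toward zero per execution as a statement is reused, while a plan that has gone stale is still revisited eventually.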

Adaptive Query Processing
Alternative access plans can be executed if actual intermediate result sizes
are poorly estimated
Server switches to alternative plan automatically at run time
Low-memory strategies used when buffer pool utilization is high
Parallelize access plan when doing so is advantageous
The degree of parallelism is determined based on cost during enumeration process
Work is partitioned independently of worker pool size
Plans are largely self-tuning with respect to degree of parallelism
Prevents starvation of query fragments when the number of available workers is less
than optimal for some period

Automatic Statistics Management
Self-tuning column histograms
On both base and temporary tables
Statistics are updated on-the-fly automatically
Join histograms built for intermediate result analysis during an optimization
process
Not persisted
Server maintains persistent index statistics in real-time
Index sampling during optimization
If there is no histogram or it reports “no confidence”
If there is an index with two or more predicates covered (better than combining single-
column estimates)

Column Histograms
Updated in real-time with the results of predicate evaluation and update
DML statements
By default, statistics are computed during the execution of every DML
request
Histograms computed automatically on LOAD TABLE or CREATE INDEX
statements
Can be created/dropped explicitly if necessary
Retained by default across unload/reload

Motivation for Self Healing Statistics
Quality of self tuned statistics can degrade arbitrarily
Can get out of sync in the face of rollbacks
Statistics generation looks at data once, out of order
Goal is not to be perfect with self tuning
Can get out of sync in the face of severe data skew
Self-tuning may not be able to “keep up” on busy servers
The system needs to monitor and correct itself

Self Healing Statistics
An internal system of background server processes
Low overhead to the engine and query execution
Statistics Governor
Categorize and record estimation errors during QP
Self-monitors “quality” of statistics as they are used
Self-heals “poor” statistics
Removes “bad” statistics

SQL Anywhere Solution
Statistics Flusher
Unloads unused statistics from memory
Advises on the health of column statistics
Advises on column statistics usage
Advises on whether to create or drop statistics
Runs every 30 minutes
Statistics Cleaner
Triggered by the flusher process to fix statistics that cannot be fixed otherwise
Keeps track of the table IDs where bad statistics are found
Runs with background priority

Fixing Statistics
Several methods used for automatically improving quality of statistics
Piggyback off user queries
Exploit access plans that see a large portion of the table
Perform in-line statistics collection during query execution
Replace or fix in-situ
Recreate from indexes
Fallback mechanism for piggybacking
Use a shallow index scan to recreate histogram
Perform a sampled table scan
If the table column does not have an index, then we must scan the table to get the statistics
Read a random sample of a small number of table pages
Detect pathological situations and prevent self-healing or even drop histograms
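The sampled table scan can be illustrated by estimating value frequencies from a random subset of pages. This is a sketch only: pages are modelled as plain lists of column values, the sampling is uniform over pages, and the names are hypothetical; it is not the server's histogram format.

```python
import random
from collections import Counter


def sample_value_frequencies(pages, sample_size, seed=0):
    """Estimate the fraction of rows holding each value from sampled pages.

    `pages` is a list of pages, each a list of column values (assumed layout).
    """
    rng = random.Random(seed)
    sampled = rng.sample(pages, min(sample_size, len(pages)))
    counts = Counter(value for page in sampled for value in page)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical table: 1,000 pages, each holding the same mix of values.
    table_pages = [["a", "a", "b", "a"] for _ in range(1000)]
    print(sample_value_frequencies(table_pages, sample_size=10))
```

Reading a handful of pages is cheap but can mislead when values are clustered on disk, which is one reason the slides treat it as a fallback after piggybacking and index-based recreation.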

Conclusions
How fast is SQL Anywhere?
“It depends” is the right answer!
The optimizer is coordinating changing data from multiple sources in real time
in order to provide and maintain the best performance it can at that
point in time
But it is not perfect!
Specific Takeaways
Consider ALTER DATABASE CALIBRATE SERVER
Consider buffer pool contents when evaluating performance
Do not set max_plans_cached=0

Questions?
Jason Hinsperger
jason.hinsperger@sap.com
