Performance and Scalability Tuning

Scalable Performance

Building enterprise-scale web applications that perform

Scalability vs. Performance

• Scalability
o Ratio of the increase in throughput to an increase in resources
o Support additional users at the least incremental cost
o Predictability of application behavior as users are added
• Performance
o Serving a single request in the shortest amount of time
o Inverse of latency
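A minimal sketch of the two definitions. It assumes the scalability ratio is taken on relative increases, so 1.0 means linear scaling and anything below 1.0 is sub-linear; the class and method names are hypothetical.

```java
/** Hypothetical helpers for the scalability and performance definitions (names are illustrative). */
final class ScalingMetrics {
    private ScalingMetrics() {}

    /**
     * Relative throughput increase divided by relative resource increase.
     * 1.0 means linear scaling; below 1.0 means each added resource buys
     * proportionally less throughput.
     */
    static double scalabilityRatio(double baseThroughput, double newThroughput,
                                   double baseResources, double newResources) {
        if (baseThroughput <= 0 || baseResources <= 0 || newResources <= baseResources) {
            throw new IllegalArgumentException("need positive baselines and added resources");
        }
        double throughputIncrease = (newThroughput - baseThroughput) / baseThroughput;
        double resourceIncrease = (newResources - baseResources) / baseResources;
        return throughputIncrease / resourceIncrease;
    }

    /** Performance as the inverse of latency: requests per second for a single request stream. */
    static double performance(double latencySeconds) {
        if (latencySeconds <= 0) {
            throw new IllegalArgumentException("latency must be positive");
        }
        return 1.0 / latencySeconds;
    }

    public static void main(String[] args) {
        // Going from 2 to 4 servers raised throughput from 500 to 900 req/s.
        System.out.println(scalabilityRatio(500, 900, 2, 4)); // 0.8 (sub-linear)
        System.out.println(performance(0.25));                // 4.0
    }
}
```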

Scalable Performance

• Number of requests that can be concurrently served
(throughput) while meeting a minimum level of service
(response time)
• Measuring:
o Resource utilization
o Throughput
o Response time
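To make the measurement concrete, here is a sketch that summarizes one load-test window: throughput, a nearest-rank response-time percentile, and a check against a response-time target. The class name and percentile method are illustrative assumptions, and the mean-concurrency estimate uses Little's law, which only holds for a steady-state window.

```java
import java.util.Arrays;

/** Hypothetical summary of one load-test measurement window. */
final class LoadWindow {
    private final double[] responseTimesMs; // one entry per completed request, sorted
    private final double windowSeconds;     // length of the measurement window

    LoadWindow(double[] responseTimesMs, double windowSeconds) {
        if (responseTimesMs.length == 0 || windowSeconds <= 0) {
            throw new IllegalArgumentException("need samples and a positive window");
        }
        this.responseTimesMs = responseTimesMs.clone();
        Arrays.sort(this.responseTimesMs);
        this.windowSeconds = windowSeconds;
    }

    /** Completed requests per second over the window. */
    double throughput() {
        return responseTimesMs.length / windowSeconds;
    }

    /** Nearest-rank percentile, e.g. p = 95 for the 95th percentile (0 < p <= 100). */
    double percentileMs(double p) {
        int rank = (int) Math.ceil(p / 100.0 * responseTimesMs.length);
        return responseTimesMs[Math.max(0, rank - 1)];
    }

    /** Average requests in flight (Little's law: L = throughput x mean response time). */
    double meanConcurrency() {
        double meanMs = Arrays.stream(responseTimesMs).average().orElse(0);
        return throughput() * meanMs / 1000.0;
    }

    /** True when the chosen percentile stays within the minimum level of service. */
    boolean meetsTarget(double p, double targetMs) {
        return percentileMs(p) <= targetMs;
    }
}
```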

Horizontal vs. Vertical Scaling

• Horizontal scaling
o Add more servers (increase the number of hardware resources)
o Separate types of processing into tiers
o Commodity hardware gives a predictable cost per user
o Limitations to scaling are dictated by the application architecture
o Can degrade performance (extra network hops between tiers)
• Vertical scaling
o Improve hardware capabilities (CPU, RAM, storage, etc.)
o No network bottleneck
o Becomes increasingly expensive
o Practical and financial limitations to the ability to scale
o Typically increases performance

General Observations

• Per-request performance decreases in each later tier (LB > web > app > DB) due to increasing complexity
o Service requests in the earliest possible tier
• The cost of scaling increases in each later tier (LB < web < app < DB)
o Architect bottlenecks into the earliest possible tier
• Scalable performance is ultimately limited by the operations that do not scale linearly
• Ideally, each request that reaches the database tier would have its own connection
o Realistically, because database scaling is constrained, this means serving as many requests as possible in earlier tiers
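One common way to quantify the point that operations which do not scale linearly set the ceiling is Amdahl's law (brought in here as an illustration; the slides do not name it). Assuming a fraction p of the work spreads across n resources and the rest stays serialized (for example, on a shared database), the speedup is 1 / ((1 - p) + p / n), so it can never exceed 1 / (1 - p).

```java
/** Amdahl's law: the serialized share of the work caps the achievable speedup. */
final class Amdahl {
    private Amdahl() {}

    /**
     * @param parallelFraction share of the work that scales with added resources (0..1)
     * @param resources        number of resources (servers, CPUs) applied
     * @return speedup relative to a single resource
     */
    static double speedup(double parallelFraction, int resources) {
        if (parallelFraction < 0 || parallelFraction > 1 || resources < 1) {
            throw new IllegalArgumentException("fraction must be in [0,1] and resources >= 1");
        }
        double serial = 1.0 - parallelFraction;
        return 1.0 / (serial + parallelFraction / resources);
    }

    public static void main(String[] args) {
        // With 10% of the work serialized, no number of servers gets past a 10x ceiling.
        for (int n : new int[] {1, 10, 100}) {
            System.out.printf("%d servers -> %.2fx%n", n, speedup(0.9, n));
        }
    }
}
```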

Application Bottlenecks

• Thread starvation
• Thread contention
• IO contention
• IO performance
• Memory limitations
• Data access

Resource Capacity Settings

• Database CPUs
• Database connection pool
• Application server CPUs
• Application server thread pool
• JVM Heap settings
• Web server thread pool

Resource Capacity Settings

• Walk through the application architecture and identify the
points where a request could potentially wait.
• Open all wait points.
• Generate balanced and representative load against the
environment.
• Identify the limiting wait point’s saturation point.
• Tighten all wait points to admit only the maximum load of the limiting wait point.
• Force all pending requests to wait at the web server.
• Add more resources.
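The procedure above can be sketched as follows, assuming every request passes through every wait point and that each point's saturation (the concurrency at which response time starts to degrade) has already been measured under load. WaitPointPlan and its methods are hypothetical; a fair java.util.concurrent.Semaphore stands in for the first-tier limit where excess requests wait.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical capacity plan built from measured wait-point saturation. */
final class WaitPointPlan {
    // Wait point name -> max concurrent requests before response time degrades.
    private final Map<String, Integer> saturation = new LinkedHashMap<>();

    WaitPointPlan add(String waitPoint, int maxConcurrent) {
        saturation.put(waitPoint, maxConcurrent);
        return this;
    }

    /** The wait point that saturates first limits the whole request path. */
    Map.Entry<String, Integer> limitingWaitPoint() {
        return saturation.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .orElseThrow(() -> new IllegalStateException("no wait points measured"));
    }

    /** Tighten every wait point to the limiting wait point's capacity. */
    Map<String, Integer> tightened() {
        int limit = limitingWaitPoint().getValue();
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String waitPoint : saturation.keySet()) {
            result.put(waitPoint, limit);
        }
        return result;
    }

    /** A fair gate for the first tier, so pending requests queue there rather than deeper in. */
    Semaphore frontGate() {
        return new Semaphore(limitingWaitPoint().getValue(), true);
    }

    public static void main(String[] args) throws InterruptedException {
        WaitPointPlan plan = new WaitPointPlan()
                .add("web threads", 400)
                .add("app threads", 150)
                .add("db connections", 60);
        System.out.println(plan.limitingWaitPoint()); // db connections=60
        Semaphore gate = plan.frontGate();
        if (gate.tryAcquire(2, TimeUnit.SECONDS)) {
            try {
                // forward the request to the next tier here
            } finally {
                gate.release();
            }
        }
    }
}
```

Setting every wait point to exactly the same number is a simplification; if some requests never reach the database, the earlier tiers can reasonably be sized somewhat larger.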

Profiling

• Long-running HTTP requests
• Long-running methods
• Memory leaks
• Deadlocks
• Long-running queries
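A full profiler remains the right tool for these problems. As a lightweight complement, the sketch below times calls and records the ones that exceed a threshold, which is one way to surface long-running methods or queries from a running system. SlowCallRecorder is a hypothetical name, and the threshold is an assumption to tune per application.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical timer that records calls slower than a threshold. */
final class SlowCallRecorder {
    private final long thresholdNanos;
    private final List<String> slowCalls = new CopyOnWriteArrayList<>(); // safe for concurrent requests

    SlowCallRecorder(long thresholdMillis) {
        this.thresholdNanos = TimeUnit.MILLISECONDS.toNanos(thresholdMillis);
    }

    /** Runs the call, timing it with the monotonic nanoTime clock; slow failures are recorded too. */
    <T> T time(String name, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            if (elapsed >= thresholdNanos) {
                slowCalls.add(name + " took " + TimeUnit.NANOSECONDS.toMillis(elapsed) + " ms");
            }
        }
    }

    List<String> slowCalls() {
        return slowCalls;
    }
}
```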

Database Tier

• System of record
• Difficult to scale horizontally and expensive to scale
vertically
• Keep connections limited to what the server will support
o Block at the app tier
• Perform data processing on a staging server
• Each database has its own optimization techniques
o Use the explain plan to locate and eliminate full table scans
o Query and table caches
o Buffer sizes
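"Block at the app tier" can be illustrated with a counting semaphore in front of database access: at most maxConnections calls run at once, and extra callers wait in the application tier (up to a timeout) instead of opening connections the database server cannot support. This is a hypothetical sketch; in practice the application server's connection pool usually enforces the limit.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical gate that caps concurrent database work in the app tier.
 * Callers beyond the cap block here instead of piling onto the database.
 */
final class DbGate {
    private final Semaphore permits;
    private final long timeoutMillis;

    DbGate(int maxConnections, long timeoutMillis) {
        this.permits = new Semaphore(maxConnections, true); // fair: waiters served in arrival order
        this.timeoutMillis = timeoutMillis;
    }

    /** Runs one database call while holding a permit; gives up if none frees up in time. */
    <T> T withConnection(Callable<T> dbCall) throws Exception {
        if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("no database connection available");
        }
        try {
            return dbCall.call();
        } finally {
            permits.release(); // always return the permit, even if the call throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```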

Application Tier

• Generally cannot be stateless due to security and business
requirements
• Sticky vs. clustered sessions
o Use the HttpSession sparingly
o If the application does not need high availability (HA), it may not require clean failover and can drop sessions
• Scaling the app tier horizontally can put more load on the database
o Caching can be used to offset the load
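A sketch of sticky routing, assuming a load balancer that can see the session ID: requests in the same session always land on the same app server, while session-less requests are spread round-robin. StickyRouter is hypothetical, and hashing the ID is a simplification; real balancers typically use a route marker carried in the session cookie (for example, Tomcat's jvmRoute suffix on the session ID).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sticky-session router in front of an app-server cluster. */
final class StickyRouter {
    private final List<String> appServers;
    private int next; // round-robin cursor for requests without a session

    StickyRouter(List<String> appServers) {
        if (appServers.isEmpty()) {
            throw new IllegalArgumentException("no app servers");
        }
        this.appServers = new ArrayList<>(appServers);
    }

    /** Same session ID -> same server; null session ID -> next server in turn. */
    synchronized String route(String sessionId) {
        if (sessionId == null) {
            String server = appServers.get(next);
            next = (next + 1) % appServers.size();
            return server;
        }
        // floorMod keeps the index non-negative when hashCode() is negative.
        return appServers.get(Math.floorMod(sessionId.hashCode(), appServers.size()));
    }
}
```

With sticky sessions alone, a failed server takes its sessions with it; clustered (replicated) sessions avoid that at the cost of extra memory and network traffic, which is why an application that does not need HA may accept dropping them.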

Application Caching

• Read-only data can be cached like static data in the web
tier
• Writable data can be cached, but doing so imposes limitations on clustering
o The cache will probably need to be kept in sync across the cluster or turned off completely
• Filters can provide caching of dynamic, secure data at the
earliest point in this tier
o Caches entire response
o Not recommended for user-specific data
• Service layer or data access caches provide a simple way
to stop requests from continuing to the database
o Transparent to the calling code
o Cache interceptors
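Cache interceptors can be sketched with a JDK dynamic proxy (java.lang.reflect.Proxy): callers keep using the service interface, and repeated calls with equal arguments are answered from memory before they reach the database. CachingProxy is hypothetical; it never expires entries, so it only fits read-only data, and array arguments would be compared by identity rather than content.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache interceptor for read-only service interfaces. */
final class CachingProxy {
    private CachingProxy() {}

    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target) {
        Map<List<Object>, Object> cache = new ConcurrentHashMap<>();
        return (T) Proxy.newProxyInstance(
                target.getClass().getClassLoader(), // loader of the target's class
                new Class<?>[] {iface},
                (proxy, method, args) -> {
                    if (method.getDeclaringClass() == Object.class) {
                        return method.invoke(target, args); // don't cache equals/hashCode/toString
                    }
                    // Cache key: the method plus its argument values.
                    List<Object> key = new ArrayList<>();
                    key.add(method);
                    if (args != null) {
                        key.addAll(Arrays.asList(args));
                    }
                    Object cached = cache.get(key);
                    if (cached != null) {
                        return cached; // hit: the call never reaches the target
                    }
                    try {
                        Object result = method.invoke(target, args);
                        if (result != null) {
                            cache.put(key, result); // ConcurrentHashMap does not allow null values
                        }
                        return result;
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the service's own exception
                    }
                });
    }
}
```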

Tomcat Tuning

• maxThreads controls actively served connections
• backlog controls the number of connections that can be
queued
• maxThreads + backlog = total accepted connections
• connectionTimeout can be used to drop faulty connections
• bufferSize defaults to -1 (no output buffering)
• Keep the heap size manageable (< 2GB)
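The maxThreads + backlog arithmetic can be extended with Little's law to estimate throughput and queueing delay. This sketch assumes steady load and a known average service time per request; ConnectorCapacity is a hypothetical class, not a Tomcat API.

```java
/** Hypothetical back-of-the-envelope numbers for one Tomcat connector. */
final class ConnectorCapacity {
    final int maxThreads;      // requests actively being processed
    final int backlog;         // connections queued once all threads are busy
    final double serviceSecs;  // assumed average time a thread spends on one request

    ConnectorCapacity(int maxThreads, int backlog, double serviceSecs) {
        if (maxThreads < 1 || backlog < 0 || serviceSecs <= 0) {
            throw new IllegalArgumentException("invalid connector settings");
        }
        this.maxThreads = maxThreads;
        this.backlog = backlog;
        this.serviceSecs = serviceSecs;
    }

    /** maxThreads + backlog: connections accepted before new ones are refused. */
    int totalAccepted() {
        return maxThreads + backlog;
    }

    /** Little's law: throughput = concurrency / time in system. */
    double maxThroughputPerSec() {
        return maxThreads / serviceSecs;
    }

    /** Approximate extra wait for the last connection in a full backlog. */
    double worstQueueWaitSecs() {
        return backlog / maxThroughputPerSec();
    }

    public static void main(String[] args) {
        ConnectorCapacity c = new ConnectorCapacity(200, 100, 0.5);
        System.out.println(c.totalAccepted());       // 300
        System.out.println(c.maxThroughputPerSec()); // 400.0
        System.out.println(c.worstQueueWaitSecs());  // 0.25
    }
}
```

If the estimated queue wait approaches connectionTimeout, queued connections risk being dropped, which is one reason to keep the backlog modest.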

Web Tier

• Clustering is easiest in the web tier because web servers are generally stateless
• Web tier clustering can provide super-linear scaling, since each server sees less IO contention and lower context-switching costs
• The web tier should serve all static content (images, static HTML)
• The web tier can serve dynamic requests by caching non-secure data that depends only on URL parameters
o Squid reverse proxy
o Apache mod_cache
o Memcached
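A caching reverse proxy keys responses on the URL, so the sketch below normalizes the query string (sorting its parameters) and serves repeat requests from memory. It is a hypothetical illustration of the idea, not how Squid or mod_cache are implemented; those tools also honor HTTP caching headers. The sketch ignores the host name, has no expiry, and should only hold public, non-user-specific content.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical URL-keyed response cache for public, non-secure pages. */
final class UrlResponseCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    /** Path plus sorted query parameters, so ?b=2&a=1 and ?a=1&b=2 share one entry. */
    static String cacheKey(String url) {
        URI uri = URI.create(url);
        String query = uri.getRawQuery();
        if (query == null || query.isEmpty()) {
            return uri.getRawPath();
        }
        String[] params = query.split("&");
        Arrays.sort(params);
        return uri.getRawPath() + "?" + String.join("&", params);
    }

    /** Serves a cached body, or renders it once through the origin (the app tier). */
    String get(String url, Function<String, String> origin) {
        return cache.computeIfAbsent(cacheKey(url), key -> origin.apply(url));
    }
}
```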

Apache Tuning

• Limit connections in the web tier to prevent overloading
later tiers (MaxClients)
• ServerLimit x memory per process < available RAM, to avoid swapping
• Avg connections per app server host = ThreadsPerChild x Apache hosts / app server hosts
• ProxyPass max = ThreadsPerChild
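The sizing rules above reduce to simple arithmetic. This sketch reads "Avg connections" as the average number of proxied connections each app server host receives, which is an assumption; ApacheSizing is a hypothetical class and the numbers in main are made up for illustration.

```java
/** Hypothetical checks for the Apache sizing rules on this slide. */
final class ApacheSizing {
    private ApacheSizing() {}

    /** ServerLimit x memory per process must stay below available RAM to avoid swapping. */
    static boolean fitsInRam(int serverLimit, int mbPerProcess, int ramMb) {
        return (long) serverLimit * mbPerProcess < ramMb;
    }

    /** Avg connections per app server host = ThreadsPerChild x Apache hosts / app server hosts. */
    static double avgConnectionsPerAppServer(int threadsPerChild, int apacheHosts, int appServerHosts) {
        if (appServerHosts < 1) {
            throw new IllegalArgumentException("need at least one app server host");
        }
        return (double) threadsPerChild * apacheHosts / appServerHosts;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 4 Apache hosts proxying to 2 app server hosts.
        System.out.println(fitsInRam(150, 10, 2048));             // true (1500 MB < 2048 MB)
        System.out.println(avgConnectionsPerAppServer(25, 4, 2)); // 50.0
    }
}
```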

Other

• Grid Caches
o GigaSpaces
o Coherence
• CDN
o Akamai
• Compute Appliances
o Azul Systems
