Java SE 6 Performance White Paper


 Java™ Platform Performance Engineering
 Sun Microsystems, Inc.




 Table of Contents
 1 Introduction
 2 New Features and Performance Enhancements
 2.1 Runtime Performance Improvements
 2.1.1 Biased Locking
 2.1.2 Lock Coarsening
 2.1.3 Adaptive Spinning
 2.1.4 Support for large page heap on x86 and amd64 platforms
 2.1.5 Array Copy Performance Improvements
 2.1.6 Background Compilation in HotSpot ™ Client compiler
 2.1.7 New Linear Scan Register Allocation Algorithm for the HotSpot™ Client Compiler
 2.2 Garbage Collection Performance Enhancements
 2.2.1 Parallel Compaction Collector
 2.2.2 Concurrent Low Pause Garbage Collector Improvements
 2.3 Ergonomics in the 6.0 Java Virtual Machine
 2.4 Client-side Performance Features and Improvements
 2.4.1 New Class List for Class Data Sharing
 2.4.2 Performance improvements to the boot class loader
 2.4.3 Splash Screen Functionality
 2.4.4 Swing's true double buffering
 2.4.5 Improving rendering on Windows systems
 3 New Platform Support
 3.1 Operating Environments
 3.1.1 Windows Vista
 4 Going Further
 4.1 Java Performance Portal
 4.1.1 Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning
 4.1.2 jvmstat 3.0
 4.2 Java SE 6 Documentation
 4.3 Performance Monitoring and Management
 4.3.1 DTrace Probes in HotSpot VM
 4.3.2 New monitoring, management and diagnosability features
 4.3.3 Observability Using Java SE 6 on Solaris OS
 4.4 Benchmark Disclosure
 4.4.1 SPECjbb 2005
 4.4.2 VolanoMark™ 2.5



1 Introduction

One of the principal design centers for Java Platform, Standard Edition 6 (Java SE 6) was to
improve performance and scalability by targeting performance deficiencies highlighted by
some of the most popular Java benchmarks currently available and also by working closely
with the Java community to determine key areas where performance enhancements would
have the most impact.

This guide gives an overview of the new performance and scalability improvements in Java
Standard Edition 6 along with various industry standard and internally developed
benchmark results to demonstrate the impact of these improvements.



2 New Features and Performance Enhancements

Java SE 6 includes several new features and enhancements to improve performance in many
areas of the platform. Improvements include: synchronization performance optimizations,
compiler performance optimizations, the new Parallel Compaction Collector, better
ergonomics for the Concurrent Low Pause Collector and application start-up performance.

2.1 Runtime performance optimizations



2.1.1 Biased locking


Biased Locking is a class of optimizations that improves uncontended synchronization
performance by eliminating atomic operations associated with the Java language’s
synchronization primitives. These optimizations rely on the property that not only are most
monitors uncontended, they are locked by at most one thread during their lifetime.
An object is "biased" toward the thread which first acquires its monitor via a monitorenter
bytecode or synchronized method invocation; subsequent monitor-related operations can be
performed by that thread without using atomic operations resulting in much better
performance, particularly on multiprocessor machines.

Locking attempts by threads other than the one toward which the object is "biased" will
cause a relatively expensive operation whereby the bias is revoked. The benefit of the
elimination of atomic operations must exceed the penalty of revocation for this optimization
to be profitable.

Applications with substantial amounts of uncontended synchronization may attain
significant speedups while others with certain patterns of locking may see slowdowns.

Biased Locking is enabled by default in Java SE 6 and later. To disable Biased Locking,
add -XX:-UseBiasedLocking to the command line.

For more on Biased Locking, please refer to the ACM OOPSLA 2006 paper by Kenneth
Russell and David Detlefs: "Eliminating Synchronization-Related Atomic Operations with
Biased Locking and Bulk Rebiasing".
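
As a rough illustration of the locking pattern described above, the following sketch has a
single thread acquire the same monitor over and over. The class and method names are
hypothetical and not part of the JDK; the code only shows the uncontended pattern and does
not measure or prove that biasing took place.

```java
// Illustrative sketch: one thread repeatedly locks one monitor, which is the
// uncontended pattern the text says Biased Locking targets.
// UncontendedCounter and countTo are hypothetical names.
final class UncontendedCounter {
    private long value;

    // Each call enters and exits this object's monitor.
    synchronized void increment() {
        value++;
    }

    synchronized long get() {
        return value;
    }

    // Performs n lock acquisitions on a single thread and returns the count.
    static long countTo(int n) {
        UncontendedCounter counter = new UncontendedCounter();
        for (int i = 0; i < n; i++) {
            counter.increment();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = countTo(10000000);
        long elapsedMs = (System.nanoTime() - start) / 1000000;
        System.out.println(result + " increments in " + elapsedMs + " ms");
    }
}
```

On a Java SE 6 VM, running this once with default settings and once with
-XX:-UseBiasedLocking gives a crude comparison; timings from such a micro-test are only
indicative.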

2.1.2 Lock coarsening


Some locking patterns release a lock and then reacquire it within a piece of code where no
observable operations occur in between. The lock coarsening optimization implemented in
HotSpot eliminates the unlock and relock operations in those situations, reducing the
amount of synchronization work by enlarging an existing synchronized region. Doing this
around a loop could cause a lock to be held for long periods of time, so the technique is
only used on non-looping control flow.

This feature is on by default. To disable it, please add the following option to the command
line: -XX:-EliminateLocks
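
The kind of code that can benefit might look like the sketch below. The first method makes
three back-to-back synchronized calls on the same StringBuffer; the second is a hand-written
version of what a coarsened region would look like. The class and method names are
hypothetical, and whether the compiler actually coarsens a given call site is up to the VM.

```java
// Illustrative sketch of a lock coarsening candidate.
// CoarseningExample and its methods are hypothetical names.
final class CoarseningExample {

    // StringBuffer.append is synchronized, so this method locks and unlocks
    // the same monitor three times with nothing observable in between.
    static void appendGreeting(StringBuffer sb, String name) {
        sb.append("Hello, ");
        sb.append(name);
        sb.append('!');
    }

    // Hand-written equivalent of the coarsened form: one enlarged
    // synchronized region around the same three calls. The inner calls
    // simply re-enter the monitor that is already held.
    static void appendGreetingCoarsened(StringBuffer sb, String name) {
        synchronized (sb) {
            sb.append("Hello, ");
            sb.append(name);
            sb.append('!');
        }
    }

    public static void main(String[] args) {
        StringBuffer a = new StringBuffer();
        StringBuffer b = new StringBuffer();
        appendGreeting(a, "Duke");
        appendGreetingCoarsened(b, "Duke");
        System.out.println(a + " / " + b);
    }
}
```

Both methods produce the same text; only the number of monitor enter and exit operations
differs.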

2.1.3 Adaptive spinning


Adaptive spinning is an optimization technique where a two-phase spin-then-block strategy
is used by threads attempting a contended synchronized enter operation. This technique
enables threads to avoid undesirable effects that impact performance such as context
switching and repopulation of Translation Lookaside Buffers (TLBs). It is "adaptive"
because the duration of the spin is determined by policy decisions based on factors such as
the rate of success and/or failure of recent spin attempts on the same monitor and the state of
the current lock owner.

For more on Adaptive Spinning, please refer to the presentation by Dave Dice:
"Synchronization in Java SE 6".
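
The two-phase idea can be sketched at the application level with
java.util.concurrent.locks.ReentrantLock. This is not how HotSpot implements monitors; it is
a simplified model in which the spin budget grows after a successful spin and shrinks after
a failed one. The class name, constants and adaptation rule are all assumptions made for
illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative model of a spin-then-block lock with a crude adaptive policy.
// This is NOT the HotSpot implementation; names and constants are hypothetical.
final class SpinThenBlockLock {
    private static final int MIN_SPINS = 1;
    private static final int MAX_SPINS = 1000;

    private final ReentrantLock inner = new ReentrantLock();
    // Current spin budget; racy updates are harmless because it is only a hint.
    private volatile int spinLimit = 100;

    void lock() {
        int limit = spinLimit;
        // Phase 1: spin, retrying a non-blocking acquire.
        for (int i = 0; i < limit; i++) {
            if (inner.tryLock()) {
                // Spinning paid off: allow a slightly longer spin next time.
                spinLimit = Math.min(limit + 10, MAX_SPINS);
                return;
            }
        }
        // Phase 2: spinning failed, so shorten future spins and block.
        spinLimit = Math.max(limit / 2, MIN_SPINS);
        inner.lock();
    }

    void unlock() {
        inner.unlock();
    }

    // Runs several threads that increment a shared counter under the lock.
    static long contendedCount(int threads, int perThread) {
        SpinThenBlockLock lock = new SpinThenBlockLock();
        long[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(contendedCount(4, 100000));
    }
}
```

The policy here looks only at whether the most recent spin succeeded; the text above
describes a richer policy that also considers the state of the current lock owner.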

2.1.4 Support for large page heap on x86 and amd64 platforms


Java SE 6 supports large page heaps on x86 and amd64 platforms. Because a single
Translation Lookaside Buffer (TLB) entry can then represent a larger memory range, large
pages reduce costly TLB misses and help memory-intensive applications perform better.

Please note that large page memory can sometimes negatively impact system performance.
For example, when a large amount of memory is pinned by an application, it may create a
shortage of regular memory, cause excessive paging in other applications and slow down
the entire system. Also note that on a system that has been up for a long time, excessive
fragmentation can make it impossible to reserve enough large page memory. When this
happens, the OS may revert to using regular pages. This effect can be minimized by setting
-Xms == -Xmx, -XX:PermSize == -XX:MaxPermSize and
-XX:InitialCodeCacheSize == -XX:ReservedCodeCacheSize.

Another possible drawback of large pages is that the default sizes of the perm gen and code
cache might be larger as a result of using a large page; this is particularly noticeable with
page sizes that are larger than the default sizes for these memory areas.

Support for large pages is enabled by default on Solaris. It's off by default on Windows and
Linux. Please add to the command line -XX:+UseLargePages to enable this feature. Please
note that Operating System configuration changes may be required to enable large pages.
For more information, please refer to the documentation on Java Support for Large Memory
Pages on Sun Developer Network.
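
To check from inside a running application whether large pages were requested on the
command line, one option is to inspect the JVM input arguments through the standard
java.lang.management API. The sketch below only detects an explicit flag; it says nothing
about platform defaults or about whether the operating system actually granted large pages.
The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative sketch: report whether -XX:+UseLargePages was passed explicitly.
// LargePageFlagCheck and requestsLargePages are hypothetical names.
final class LargePageFlagCheck {

    // Returns true if the last UseLargePages switch in the argument list turns
    // the feature on. Arguments that are absent are treated as "not requested".
    static boolean requestsLargePages(List<String> jvmArgs) {
        boolean requested = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+UseLargePages")) {
                requested = true;
            } else if (arg.equals("-XX:-UseLargePages")) {
                requested = false;
            }
        }
        return requested;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Large pages explicitly requested: " + requestsLargePages(jvmArgs));
    }
}
```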



2.1.5 Array Copy Performance Improvements


The System.arraycopy() method was further enhanced in Java SE 6. Hand-coded
assembly stubs are now used for each type size when no overlap occurs.
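
For reference, the sketch below shows the two situations the text distinguishes: a copy
between two distinct arrays, where the ranges cannot overlap, and a copy within a single
array, where they do. The class and method names are hypothetical; which internal stub the
VM selects for each call is not visible from Java code.

```java
import java.util.Arrays;

// Illustrative sketch of non-overlapping and overlapping System.arraycopy calls.
// ArrayCopyExample and its methods are hypothetical names.
final class ArrayCopyExample {

    // Source and destination are different arrays, so no overlap can occur.
    static int[] copyOf(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    // Source and destination ranges overlap inside one array. The API
    // specifies that the copy behaves as if it went through a temporary array.
    static int[] shiftRightByOne(int[] a) {
        if (a.length > 1) {
            System.arraycopy(a, 0, a, 1, a.length - 1);
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(copyOf(new int[] {1, 2, 3})));
        System.out.println(Arrays.toString(shiftRightByOne(new int[] {1, 2, 3, 4})));
    }
}
```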

2.1.6 Background Compilation in HotSpot™ Client Compiler


Prior to Java SE 6, the HotSpot Client compiler did not compile Java methods in the
background by default. As a consequence, Hyperthreaded or Multi-processing systems
couldn't take advantage of spare CPU cycles to optimize Java code execution speed.
Background compilation is now enabled in the Java SE 6 HotSpot client compiler.

2.1.7 New Linear Scan Register Allocation Algorithm for the HotSpot™ Client Compiler


The HotSpot client compiler features a new linear scan register allocation algorithm that
relies on static single assignment (SSA) form. This has the added advantage of providing a
simplified data flow analysis and shorter live intervals, which yield a better tradeoff
between compilation time and program runtime. This new algorithm has provided
performance improvements of about 10% on many internal and industry-standard
benchmarks.

For more information on this new feature, please refer to the following paper: Linear
Scan Register Allocation for the Java HotSpot™ Client Compiler.

2.2 Garbage Collection

2.2.1 Parallel Compaction Collector


Parallel compaction is a feature that enables the parallel collector to perform major
collections in parallel resulting in lower garbage collection overhead and better application
performance particularly for applications with large heaps. It is best suited to platforms with
two or more processors or hardware threads.

Prior to Java SE 6, while the young generation was collected in parallel, major
collections were performed using a single thread. For applications with frequent major
collections, this adversely affected scalability.

Parallel compaction is used by default in JDK 6. In JDK 5 update 6 and later, it can be
enabled by adding the option -XX:+UseParallelOldGC to the command line.

Please note that parallel compaction is not available in combination with the concurrent
mark sweep collector; it can only be used with the parallel young generation collector
(-XX:+UseParallelGC). The documents referenced below provide more information on the
available collectors and recommendations for their use.

For more on the Parallel Compaction Collector, please refer to the Java SE 6 release notes.
For more information on garbage collection in general, the HotSpot memory management
whitepaper describes the various collectors available in HotSpot and includes
recommendations on when to use parallel compaction as well as a high-level description of
the algorithm.
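
When experimenting with collector options such as those above, it can help to print which
collectors the running VM has registered. A minimal sketch using the standard
java.lang.management API follows. The names it prints are chosen by the VM and may not
distinguish every collector variant; the class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: list the garbage collectors registered in this VM.
// CollectorNames and activeCollectors are hypothetical names.
final class CollectorNames {

    // Returns one line per collector with its VM-assigned name and the number
    // of collections it has performed so far.
    static List<String> activeCollectors() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : activeCollectors()) {
            System.out.println(line);
        }
    }
}
```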

2.2.2 Concurrent Low Pause Collector: Concurrent Mark Sweep Collector Enhancements


The Concurrent Mark Sweep Collector has been enhanced to provide concurrent collection
for the System.gc() and Runtime.getRuntime().gc() methods. Prior to Java SE 6,
these methods stopped all application threads in order to collect the entire heap, which
sometimes resulted in lengthy pause times in applications with large heaps. In line with the
goals of the Concurrent Mark Sweep Collector, this new feature enables the collector to
keep pauses as short as possible during full heap collection.

To enable this feature, add the option -XX:+ExplicitGCInvokesConcurrent to the Java
command line.

The concurrent marking task in the CMS collector is now performed in parallel on platforms
with multiple processors. This significantly reduces the duration of the concurrent marking
cycle and enables the collector to better support applications with larger numbers of threads
and high object allocation rates, particularly on large multiprocessor machines.

For more on these new features, please refer to the Java SE 6 release notes.

2.3 Ergonomics in the 6.0 Java Virtual Machine




In Java SE 5, platform-dependent default selections for the garbage collector, heap size, and
runtime compiler were introduced to better match the needs of different types of
applications while requiring less command-line tuning. New tuning flags were also
introduced to allow users to specify a desired behavior which in turn enabled the garbage
collector to dynamically tune the size of the heap to meet the specified behavior. In Java SE
6, the default selections have been further enhanced to improve application runtime
performance and garbage collector efficiency.
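
To see what the VM selected when no tuning flags are given, a small report like the one
sketched below can be run on the target machine. It uses only Runtime and standard system
properties; the class and method names are hypothetical, and the values reflect whatever the
running VM chose rather than any particular default documented here.

```java
// Illustrative sketch: print a few values that ergonomics decisions depend on
// or produce. ErgonomicsReport and report are hypothetical names.
final class ErgonomicsReport {

    static String report() {
        Runtime rt = Runtime.getRuntime();
        long maxHeapMb = rt.maxMemory() / (1024 * 1024);
        return "vm=" + System.getProperty("java.vm.name")
                + ", processors=" + rt.availableProcessors()
                + ", maxHeapMB=" + maxHeapMb;
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```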

The chart below compares out-of-the-box SPECjbb2005™ performance between Java SE 5
and Java SE 6 Update 2. This test was conducted on a Sun Fire V890 with 24 x 1.5 GHz
UltraSparc CPUs and 64 GB RAM running Solaris 10.
In each case the benchmarks were run without any performance flags. Please see the
SPECjbb 2005 Benchmark Disclosure.

We also compared I/O performance between Java SE 5 and Java SE 6 Update 2. This test
was conducted on a Sun Fire V890 with 24 x 1.5 GHz UltraSparc CPUs and 64 GB RAM
running Solaris 10.
In each case the benchmarks were run without any performance flags.

We compared VolanoMark™ 2.5 performance between Java SE 5 and Java SE 6.
VolanoMark is a pure Java benchmark that measures both (a) raw server performance and
(b) server network scalability performance. In this benchmark, the client side simulates up
to 4,000 concurrent socket connections. Only those VMs that successfully scale up to 4,000
connections pass the test. In both the raw performance and network scalability tests, the
higher the score, the better the result.


This test was conducted on a Sun Fire V890 with 24 x 1.5 GHz UltraSparc CPUs and 64
GB RAM running Solaris 10.
In each case we ran the benchmark in loopback mode without any performance flags. The
result shown is based upon relative throughput (messages per second with 400 loopback
connections).

The full Java version for Java SE 5 is:

java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-b64)
Java HotSpot(TM) Client VM (build 1.5.0-b64, mixed mode)



The full Java version for Java SE 6 is:

java version "1.6.0_02"
Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
Java HotSpot(TM) Client VM (build 1.6.0_02-b05, mixed mode)



Please see the VolanoMark™ 2.5 Benchmark Disclosure.

Some other improvements in Java SE 6 include:

     * On server-class machines, a specified maximum pause time goal of less than or
       equal to 1 second will enable the Concurrent Mark Sweep Collector.
     * The garbage collector is allowed to move the boundary between the tenured
       generation and the young generation as needed (within prescribed limits) to better
       achieve performance goals. This mechanism is off by default; to activate it, add
       the option -XX:+UseAdaptiveGCBoundary to the command line.
     * Promotion failure handling is turned on by default for the serial
       (-XX:+UseSerialGC) and Parallel Young Generation (-XX:+UseParNewGC) collectors.
       This feature allows the collector to start a minor collection and then back out of it if
       there is not enough space in the tenured generation to promote all the objects that
       need to be promoted.
     * An alternative order for copying objects from the young to the tenured generation in
       the parallel scavenge collector has been implemented. The intent of this feature is to
       decrease cache misses for objects accessed in the tenured generation. This feature is
       on by default. To disable it, add -XX:-UseDepthFirstScavengeOrder to the command
       line.
     * The default young generation size has been increased to 1MB on x86 platforms.
     * The Concurrent Mark Sweep Collector's default Young Generation size has been
       increased:
         - The minimum young generation size was increased from 4MB to 16MB.
         - The proportion of the overall heap used for the young generation was increased
           from 1/15 to 1/7.
     * The CMS collector is now using the survivor spaces by default, and their default size
       was increased.

       The primary effect of these changes is to improve application performance by
       reducing garbage collection overhead. However, because the default young
       generation size is larger, applications may also see larger young generation pause
       times and a larger memory footprint.
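
The generation sizes that result from these defaults can be inspected at run time. The
sketch below lists the heap memory pools (for example the young generation's eden and
survivor spaces and the tenured generation) with their committed and maximum sizes, using
the standard java.lang.management API. Pool names vary by collector and VM; the class and
method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: summarize the heap memory pools of the running VM.
// HeapPools and heapPoolSummary are hypothetical names.
final class HeapPools {

    static List<String> heapPoolSummary() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            lines.add(pool.getName()
                    + ": committedKB=" + usage.getCommitted() / 1024
                    + ", maxKB=" + (usage.getMax() < 0 ? "undefined" : String.valueOf(usage.getMax() / 1024)));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : heapPoolSummary()) {
            System.out.println(line);
        }
    }
}
```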

2.4 Client-side Performance Features and Improvements



2.4.1 New class list for Class Data Sharing


To reduce application startup time and footprint, Java SE 5.0 introduced a feature called
"class data sharing" (CDS). On 32-bit platforms, this mechanism works as follows: the Sun
provided installer loads a set of classes from the system jar (the jar file containing all the
Java class library, called rt.jar) file into a private internal representation, and dumps that
representation to a file, called a "shared archive". On subsequent JVM invocations, the
shared archive is memory-mapped in, saving the cost of loading those classes and allowing
much of the Java Virtual Machine's metadata for these classes to be shared among multiple
JVM processes.

In Java SE 6.0, the list of classes in the "shared archive" has been updated to better reflect
the changes to the system jar file.
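
One informal way to see whether a HotSpot VM is running with class data sharing is to look
at the java.vm.info system property, which on HotSpot typically mirrors the mode string
printed by java -version (for example "mixed mode, sharing"). Treat the exact wording as an
assumption about the reader's VM; the class and method names below are hypothetical.

```java
// Illustrative sketch: guess from java.vm.info whether class data sharing is
// in use. Relies on the HotSpot convention of including the word "sharing".
// SharingCheck and mentionsSharing are hypothetical names.
final class SharingCheck {

    static boolean mentionsSharing(String vmInfo) {
        return vmInfo != null && vmInfo.contains("sharing");
    }

    public static void main(String[] args) {
        String vmInfo = System.getProperty("java.vm.info");
        System.out.println("java.vm.info = " + vmInfo);
        System.out.println("Class data sharing appears enabled: " + mentionsSharing(vmInfo));
    }
}
```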

2.4.2 Improvements to the boot class loader


The Java Virtual Machine's boot and extension class loaders have been enhanced to improve
the cold-start time of Java applications. Prior to Java SE 6, opening the system jar file
caused the Java Virtual Machine to read a one-megabyte ZIP index file that translated into a
lot of disk seek activity when the file was not in the disk cache. With "class data sharing"
enabled, the Java Virtual Machine is now provided with a "meta-index" file (located in
jre/lib) that contains high-level information about which packages (or package prefixes) are
contained in which jar files.

This helps the JVM avoid opening all of the jar files on the boot and extension class paths
when a Java application class is loaded. Check bug 6278968 for more details.

Below we show a chart comparing application start-up time performance between Java SE 5
and Java SE 6 Update 2. This test was conducted on an Intel Core 2 Duo 2.66GHz desktop
machine with 1GB of memory.
The application start-up comparison above shows relative performance (smaller is better)
and in each case the benchmarks were run without any performance flags.



We also compared the memory footprint required by Java SE 5 and Java SE 6 Update 2.
This test was conducted on an Intel Core 2 Duo 2.66GHz desktop machine with 1GB of
memory.
The footprint comparison above shows relative performance (smaller is better) and in each
case the benchmarks were run without any performance flags.


Despite the addition of many new features, the Java Virtual Machine's core memory usage
has been pared down to make the actual memory impact on your system even lower than
with Java SE 5.

2.4.3 Splash Screen Functionality

Java SE 6 provides a solution that allows an application to show a splash screen before the
virtual machine starts. Now, a Java application launcher is able to decode an image and
display it in a simple non-decorated window.
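
Once the application is running, it can reach the splash screen shown by the launcher
through java.awt.SplashScreen, which was added in Java SE 6. The sketch below assumes the
application was started with a splash image, for example via the launcher's -splash: option
or a SplashScreen-Image manifest entry; the image file name and the class and method names
are hypothetical.

```java
import java.awt.SplashScreen;

// Illustrative sketch: inspect and close the launcher-provided splash screen.
// Example launch (file name is hypothetical): java -splash:logo.png SplashInfo
final class SplashInfo {

    // Describes the splash screen, or reports that none was shown.
    static String describe(SplashScreen splash) {
        if (splash == null) {
            return "no splash screen";
        }
        return "splash visible=" + splash.isVisible();
    }

    public static void main(String[] args) {
        SplashScreen splash = null;
        try {
            // Returns null when no splash image was specified at launch.
            splash = SplashScreen.getSplashScreen();
        } catch (UnsupportedOperationException e) {
            // Thrown when splash screens are unsupported, e.g. in headless mode.
        }
        System.out.println(describe(splash));
        if (splash != null) {
            splash.close(); // hide it once the application is ready
        }
    }
}
```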

2.4.4 Swing's true double buffering

Swing's true double buffering has now been enabled. Swing used to provide double
buffering on a per-application basis; it now provides it on a per-window basis, and native
exposed events are copied directly from the double buffer. This significantly improves
Swing performance, especially on remote servers.

Please see Scott Violet's blog for full details.

2.4.5 Improving rendering on Windows systems

The UxTheme API, which allows standard Look & Feel rendering of Windows controls on
Microsoft Windows systems, has been adopted to improve the fidelity of Swing's system
Look & Feels.




3 New Platform Support
Please see the Supported System Configurations chart for full details.

3.1 Operating Environments



3.1.1 Windows Vista

Java SE 6 is supported on Windows Vista Ultimate Edition, Home Premium Edition, Home
Basic Edition, Enterprise Edition and Business Edition in addition to Windows XP Home
and Professional, 2000 Professional, 2000 Server, and 2003 Server.



4 Going Further
4.1 Java Performance Portal


For the latest in Java Performance best practices, documentation, tools, FAQs, code
samples, White Papers and other Java performance news, check out the Java Performance
Portal.

Especially relevant performance links for Java SE 6 are given here:

4.1.1 Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning



The Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning document expands
on GC tuning concepts and techniques for Java SE 6 that were introduced in the Tuning
Garbage Collection with the 5.0 Java Virtual Machine document.

4.1.2 jvmstat 3.0



The jvmstat 3.0 home page documents the lightweight performance monitoring capabilities
that are built into Java SE 6 and explains how to use these tools to monitor not only the
6.0 HotSpot Java Virtual Machine but also the HotSpot 1.5.0, 1.4.2 and 1.4.1 JVMs.

4.2 Java SE 6 Documentation


Be sure to check out the wealth of Java SE 6 Documentation including the New Features
and Enhancements and the Java Platform, Standard Edition 6 Overview.

4.3 Performance Monitoring and Management



4.3.1 DTrace Probes in HotSpot VM




4.3.2 New monitoring, management and diagnosability features




4.3.3 Observability Using Java SE 6 on Solaris OS




4.4 Benchmark Disclosure



4.4.1 SPECjbb 2005



SPECjbb2005 is a benchmark from the Standard Performance Evaluation Corporation
(SPEC). The performance referenced is based on Sun internal software testing conforming
to the testing methodologies listed above.

For the latest SPECjbb2005 results visit http://www.spec.org/osg/jbb2005.

4.4.2 VolanoMark™ 2.5



VolanoMark™ version 2.5 is a benchmark from Volano LLC ( http://www.volano.com/ ).

More Related Content

PDF
Performance Tuning Oracle Weblogic Server 12c
PPT
RAC - Test
PDF
RAC Attack 12c Installation Instruction
PDF
Best practices for_large_oracle_apps_r12_implementations
PPT
Shopzilla On Concurrency
PDF
Double the Performance of Oracle SOA Suite 11g? Absolutely!
PDF
Understanding Oracle RAC 12c Internals OOW13 [CON8806]
PDF
Oracle RAC 12c New Features List OOW13
Performance Tuning Oracle Weblogic Server 12c
RAC - Test
RAC Attack 12c Installation Instruction
Best practices for_large_oracle_apps_r12_implementations
Shopzilla On Concurrency
Double the Performance of Oracle SOA Suite 11g? Absolutely!
Understanding Oracle RAC 12c Internals OOW13 [CON8806]
Oracle RAC 12c New Features List OOW13

What's hot (17)

PDF
Oracle Flex ASM - What’s New and Best Practices by Jim Williams
PDF
Oracle RAC 12c Best Practices Sanger OOW13 [CON8805]
PDF
MySQL Group Replication - Ready For Production? (2018-04)
PDF
Oracle soa suite 12c upgrade types
DOC
Garbage Collection, Tuning And Monitoring JVM In EBS 11i And R12
PPSX
Sql Server 2008 Enhancements
PDF
Oracle Failover Database Cluster with Grid Infrastructure 12c
DOCX
Mirroring in SQL Server 2012 R2
PDF
Java Tuning White Paper
PDF
MySQL Replication Performance in the Cloud
PDF
Oracle WebLogic Server: Remote Monitoring and Management
PPTX
Database Mirror for the exceptional DBA – David Izahk
PDF
MySQL InnoDB Cluster / ReplicaSet - Tutorial
PDF
VMworld 2013: vSphere Flash Read Cache Technical Overview
PDF
EAP6 performance Tuning
PPTX
Weblogic12 c installation guide
PPTX
Crack the complexity of oracle applications r12 workload v2
Oracle Flex ASM - What’s New and Best Practices by Jim Williams
Oracle RAC 12c Best Practices Sanger OOW13 [CON8805]
MySQL Group Replication - Ready For Production? (2018-04)
Oracle soa suite 12c upgrade types
Garbage Collection, Tuning And Monitoring JVM In EBS 11i And R12
Sql Server 2008 Enhancements
Oracle Failover Database Cluster with Grid Infrastructure 12c
Mirroring in SQL Server 2012 R2
Java Tuning White Paper
MySQL Replication Performance in the Cloud
Oracle WebLogic Server: Remote Monitoring and Management
Database Mirror for the exceptional DBA – David Izahk
MySQL InnoDB Cluster / ReplicaSet - Tutorial
VMworld 2013: vSphere Flash Read Cache Technical Overview
EAP6 performance Tuning
Weblogic12 c installation guide
Crack the complexity of oracle applications r12 workload v2
Ad

Similar to Java Standard Edition 6 Performance (20)

PDF
Java Standard Edition 5 Performance
PDF
Java Programming - 01 intro to java
PDF
JEE Programming - 01 Introduction
PPT
Best Practices for performance evaluation and diagnosis of Java Applications ...
DOCX
java full 1.docx
DOCX
java full.docx
PDF
Why should i switch to Java SE 7
DOCX
java completed units.docx
PDF
Designing and coding Series 40 Java apps for high performance
PDF
Software Profiling: Java Performance, Profiling and Flamegraphs
DOCX
java full 1 (Recovered).docx
PPTX
The Java Story
PPTX
55 New Features in Java SE 8
PDF
Java keynote preso
PDF
The State of Java under Oracle at JCertif 2011
PDF
Java 8 in Anger, Devoxx France
PDF
JVM Under the Hood
PPT
Jvm Performance Tunning
PPT
Jvm Performance Tunning
PDF
Java 8 in Anger (QCon London)
Java Standard Edition 5 Performance
Java Programming - 01 intro to java
JEE Programming - 01 Introduction
Best Practices for performance evaluation and diagnosis of Java Applications ...
java full 1.docx
java full.docx
Why should i switch to Java SE 7
java completed units.docx
Designing and coding Series 40 Java apps for high performance
Software Profiling: Java Performance, Profiling and Flamegraphs
java full 1 (Recovered).docx
The Java Story
55 New Features in Java SE 8
Java keynote preso
The State of Java under Oracle at JCertif 2011
Java 8 in Anger, Devoxx France
JVM Under the Hood
Jvm Performance Tunning
Jvm Performance Tunning
Java 8 in Anger (QCon London)
Ad

More from white paper (20)

PDF
Secure Computing With Java
PDF
Java Security Overview
PDF
Platform Migration Guide
PDF
Java Standard Edition 6 Performance
PDF
Java Standard Edition 6 Performance
PDF
Java Standard Edition 6 Performance
PDF
Java Standard Edition 6 Performance
PDF
Memory Management in the Java HotSpot Virtual Machine
PDF
J2 Se 5.0 Name And Version Change
PDF
Java Web Start
PDF
Java Apis For Imaging Enterprise-Scale, Distributed 2d Applications
ZIP
Introduction to the Java(TM) Advanced Imaging API
PDF
* Evaluation of Java Advanced Imaging (1.0.2) as a Basis for Image Proce...
PDF
Java 2D API: Enhanced Graphics and Imaging for the Java Platform
PDF
Concurrency Utilities Overview
PDF
Defining a Summative Usability Test for Voting Systems
PDF
Usability Performance Benchmarks
PDF
The Effect of Culture on Usability
PDF
Principles of Web Usability I - Summer 2006
PDF
Principles of Web Usabilty II - Fall 2007
Secure Computing With Java
Java Security Overview
Platform Migration Guide
Java Standard Edition 6 Performance
Java Standard Edition 6 Performance
Java Standard Edition 6 Performance
Java Standard Edition 6 Performance
Memory Management in the Java HotSpot Virtual Machine
J2 Se 5.0 Name And Version Change
Java Web Start
Java Apis For Imaging Enterprise-Scale, Distributed 2d Applications
Introduction to the Java(TM) Advanced Imaging API
* Evaluation of Java Advanced Imaging (1.0.2) as a Basis for Image Proce...
Java 2D API: Enhanced Graphics and Imaging for the Java Platform
Concurrency Utilities Overview
Defining a Summative Usability Test for Voting Systems
Usability Performance Benchmarks
The Effect of Culture on Usability
Principles of Web Usability I - Summer 2006
Principles of Web Usabilty II - Fall 2007

Recently uploaded (20)

PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PPTX
Cloud computing and distributed systems.
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
Electronic commerce courselecture one. Pdf
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
KodekX | Application Modernization Development
PPTX
sap open course for s4hana steps from ECC to s4
PPTX
Spectroscopy.pptx food analysis technology
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PPTX
MYSQL Presentation for SQL database connectivity
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Chapter 3 Spatial Domain Image Processing.pdf
The Rise and Fall of 3GPP – Time for a Sabbatical?
Cloud computing and distributed systems.
Building Integrated photovoltaic BIPV_UPV.pdf
Electronic commerce courselecture one. Pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Mobile App Security Testing_ A Comprehensive Guide.pdf
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
KodekX | Application Modernization Development
sap open course for s4hana steps from ECC to s4
Spectroscopy.pptx food analysis technology
Network Security Unit 5.pdf for BCA BBA.
Diabetes mellitus diagnosis method based random forest with bat algorithm
MYSQL Presentation for SQL database connectivity
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Agricultural_Statistics_at_a_Glance_2022_0.pdf
The AUB Centre for AI in Media Proposal.docx
Per capita expenditure prediction using model stacking based on satellite ima...
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx

Java Standard Edition 6 Performance

  • 1. Java SE 6 Performance White Paper Java™ Platform Performance Engineering Sun Microsystems, Inc. Table of Contents 1 Introduction 2 New Features and Performance Enhancements 2.1 Runtime Performance Improvements 2.1.1 Biased Locking 2.1.2 Lock Coarsening 2.1.3 Adaptive Spinning 2.1.4 Support for large page heap on x86 and amd64 platforms 2.1.5 Array Copy Performance Improvements 5007322|6245890|6245890 2.1.6 Background Compilation in HotSpot ™ Client compiler 2.1.7 New Linear Scan Register Allocation Algorithm for the HotSpot™ Client Compiler 2.2 Garbage Collection Performance Enhancements 2.2.1 Parallel Compaction Collector 2.2.2 Concurrent Low Pause Garbage Collector Improvements 2.3 Ergonomics in the 6.0 Java Virtual Machine 2.4 Client-side Performance Features and Improvements 2.4.1 New Class List for Class Data Sharing 2.4.2 Performance improvements to the boot class loader 2.4.3 Splash Screen Functionality 2.4.4 Swing's true double buffering 2.4.5 Improving rendering on windows systems 3 New Platform Support 3.1 Operating Environments 3.1.1 Windows Vista 4 Going Further 4.1 Java Performance Portal 4.2 jvmstat 3.0 4.3 Java SE 6 Documentation 4.4 Performance Monitoring and Management 4.4.1 DTrace Probes in HotSpot VM 4.4.2 New monitoring, management and diagnosability features
  • 2. 4.4.3 Observability Using Java SE 6 on Solaris OS 4.5 Benchmark Disclosures 4.5.1 SPECjbb 2005 4.5.2 VolanoMark™ 2.5 1 Introduction One of the principal design centers for Java Platform, Standard Edition 6 (Java SE 6) was to improve performance and scalability by targeting performance deficiencies highlighted by some of the most popular Java benchmarks currently available and also by working closely with the Java community to determine key areas where performance enhancements would have the most impact. This guide gives an overview of the new performance and scalability improvements in Java Standard Edition 6 along with various industry standard and internally developed benchmark results to demonstrate the impact of these improvements. 2 New Features and Performance Enhancements Java SE 6 includes several new features and enhancements to improve performance in many areas of the platform. Improvements include: synchronization performance optimizations, compiler performance optimizations, the new Parallel Compaction Collector, better ergonomics for the Concurrent Low Pause Collector and application start-up performance. 2.1 Runtime performance optimizations 2.1.1 Biased locking Biased Locking is a class of optimizations that improves uncontended synchronization performance by eliminating atomic operations associated with the Java language’s synchronization primitives. These optimizations rely on the property that not only are most monitors uncontended, they are locked by at most one thread during their lifetime.
  • 3. An object is "biased" toward the thread which first acquires its monitor via a monitorenter bytecode or synchronized method invocation; subsequent monitor-related operations can be performed by that thread without using atomic operations, resulting in much better performance, particularly on multiprocessor machines. Locking attempts by threads other than the one toward which the object is "biased" will cause a relatively expensive operation whereby the bias is revoked. The benefit of the elimination of atomic operations must exceed the penalty of revocation for this optimization to be profitable. Applications with substantial amounts of uncontended synchronization may attain significant speedups while others with certain patterns of locking may see slowdowns. Biased Locking is enabled by default in Java SE 6 and later. To disable Biased Locking, add -XX:-UseBiasedLocking to the command line. For more on Biased Locking, please refer to the ACM OOPSLA 2006 paper by Kenneth Russell and David Detlefs: "Eliminating Synchronization-Related Atomic Operations with Biased Locking and Bulk Rebiasing". 2.1.2 Lock coarsening There are some patterns of locking where a lock is released and then reacquired within a piece of code where no observable operations occur in between. The lock coarsening optimization technique implemented in HotSpot eliminates the unlock and relock operations in those situations (when a lock is released and then reacquired with no meaningful work done in between those operations). It reduces the amount of synchronization work by enlarging an existing synchronized region. Doing this around a loop could cause a lock to be held for long periods of time, so the technique is only used on non-looping control flow. This feature is on by default. 
To disable it, please add the following option to the command line: -XX:-EliminateLocks 2.1.3 Adaptive spinning Adaptive spinning is an optimization technique where a two-phase spin-then-block strategy is used by threads attempting a contended synchronized enter operation. This technique enables threads to avoid undesirable effects that impact performance such as context switching and repopulation of Translation Lookaside Buffers (TLBs). It is "adaptive" because the duration of the spin is determined by policy decisions based on factors such as the rate of success and/or failure of recent spin attempts on the same monitor and the state of the current lock owner.
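The spin-then-block idea can be sketched at the library level. The example below is only an illustration of the two-phase strategy, not how HotSpot implements monitor entry: the class name SpinThenBlock, the helper acquire, and the fixed spin limit are assumptions made for this sketch, and the spin count here is constant rather than adaptive.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of a two-phase "spin, then block" acquisition.
// HotSpot applies a similar idea inside the JVM for contended monitors;
// this sketch uses java.util.concurrent and a fixed (non-adaptive) spin limit.
final class SpinThenBlock {

    // Phase 1: try the lock a bounded number of times without blocking.
    // Phase 2: fall back to a blocking acquire if spinning did not succeed.
    static void acquire(ReentrantLock lock, int maxSpins) {
        for (int i = 0; i < maxSpins; i++) {
            if (lock.tryLock()) {
                return; // acquired while spinning, no context switch needed
            }
        }
        lock.lock(); // may park the thread until the lock is released
    }

    // Increments a shared counter from several threads using acquire().
    static int countWithThreads(int threads, final int perThread) {
        final ReentrantLock lock = new ReentrantLock();
        final int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        acquire(lock, 100);
                        try {
                            counter[0]++;
                        } finally {
                            lock.unlock();
                        }
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10000));
    }
}
```

An adaptive policy would adjust the spin limit from recent success or failure on the same lock; the constant used here simply keeps the sketch short.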
  • 4. For more on Adaptive Spinning, please refer to the presentation by Dave Dice: "Synchronization in Java SE 6" 2.1.4 Support for large page heap on x86 and amd64 platforms Java SE 6 supports large page heaps on x86 and amd64 platforms. Large page heaps help the Operating System avoid costly Translation-Lookaside Buffer (TLB) misses, enabling memory-intensive applications to perform better (a single TLB entry can represent a larger memory range). Please note that large page memory can sometimes negatively impact system performance. For example, when a large amount of memory is pinned by an application, it may create a shortage of regular memory and cause excessive paging in other applications and slow down the entire system. Also please note that on a system that has been up for a long time, excessive fragmentation can make it impossible to reserve enough large page memory. When this happens, the OS may revert to using regular pages. Furthermore, this effect can be minimized by setting -Xms == -Xmx, -XX:PermSize == -XX:MaxPermSize and -XX:InitialCodeCacheSize == -XX:ReservedCodeCacheSize. Another possible drawback of large pages is that the default sizes of the perm gen and code cache might be larger as a result of using a large page; this is particularly noticeable with page sizes that are larger than the default sizes for these memory areas. Support for large pages is enabled by default on Solaris. It is off by default on Windows and Linux. Please add -XX:+UseLargePages to the command line to enable this feature. Please note that Operating System configuration changes may be required to enable large pages. For more information, please refer to the documentation on Java Support for Large Memory Pages on Sun Developer Network. 2.1.5 Array Copy Performance Improvements The System.arraycopy() method was further enhanced in Java SE 6. Hand-coded assembly stubs are now used for each type size when no overlap occurs. 
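As a brief illustration of the two situations mentioned for System.arraycopy(), the sketch below shows a copy between distinct arrays (no overlap) and a copy within a single array (overlapping ranges). The class and method names are hypothetical; which internal stub the JVM selects for each call is not observable from Java code and is not asserted here.

```java
import java.util.Arrays;

// Hypothetical helper class illustrating non-overlapping and overlapping
// uses of System.arraycopy(). Only the documented API behavior is shown.
final class ArrayCopyDemo {

    // Non-overlapping copy: source and destination are different arrays.
    static int[] copyOf(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    // Overlapping copy: shifts elements one slot to the right inside the
    // same array. System.arraycopy() behaves as if the source range were
    // first copied to a temporary array, so the result is well defined.
    static int[] shiftRight(int[] a) {
        int[] r = a.clone();
        if (r.length > 1) {
            System.arraycopy(r, 0, r, 1, r.length - 1);
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(copyOf(new int[] {1, 2, 3})));
        System.out.println(Arrays.toString(shiftRight(new int[] {1, 2, 3, 4})));
    }
}
```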
2.1.6 Background Compilation in HotSpot™ Client Compiler Prior to Java SE 6, the HotSpot Client compiler did not compile Java methods in the background by default. As a consequence, Hyperthreaded or Multi-processing systems couldn't take advantage of spare CPU cycles to optimize Java code execution speed.
  • 5. Background compilation is now enabled in the Java SE 6 HotSpot client compiler. 2.1.7 New Linear Scan Register Allocation Algorithm for the HotSpot™ Client Compiler The HotSpot client compiler features a new linear scan register allocation algorithm that relies on static single assignment (SSA) form. This has the added advantage of providing a simplified data flow analysis and shorter live intervals, which yield a better tradeoff between compilation time and program runtime. This new algorithm has provided performance improvements of about 10% on many internal and industry-standard benchmarks. For more information on this new feature, please refer to the following paper: Linear Scan Register Allocation for the Java HotSpot™ Client Compiler 2.2 Garbage Collection 2.2.1 Parallel Compaction Collector Parallel compaction is a feature that enables the parallel collector to perform major collections in parallel, resulting in lower garbage collection overhead and better application performance, particularly for applications with large heaps. It is best suited to platforms with two or more processors or hardware threads. Previous to Java SE 6, while the young generation was collected in parallel, major collections were performed using a single thread. For applications with frequent major collections, this adversely affected scalability. Parallel compaction is used by default in JDK 6, and can be enabled in JDK 5 update 6 and later by adding the option -XX:+UseParallelOldGC to the command line. Please note that parallel compaction is not available in combination with the concurrent mark sweep collector; it can only be used with the parallel young generation collector (-XX:+UseParallelGC). The documents referenced below provide more information on the available collectors and recommendations for their use. For more on the Parallel Compaction Collector, please refer to the Java SE 6 release notes. 
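To check which collectors a given JVM invocation is actually using, the standard java.lang.management API can be queried from inside the application. The sketch below is one possible way to do this; the class name CollectorReport is hypothetical, and the collector names that are printed depend on the JVM and the flags in effect.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the garbage collectors reported by the
// running JVM. The names returned are implementation-specific.
final class CollectorReport {

    // Returns the name of every collector the platform exposes.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() and getCollectionTime() return -1 when undefined.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
    }
}
```

Running the same program with and without -XX:+UseParallelOldGC and comparing the output is a simple way to confirm that the option took effect.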
For more information on garbage collection in general, the HotSpot memory management whitepaper describes the various collectors available in HotSpot and includes recommendations on when to use parallel compaction as well as a high-level description of the algorithm. 2.2.2 Concurrent Low Pause Collector: Concurrent Mark Sweep Collector
  • 6. Enhancements The Concurrent Mark Sweep Collector has been enhanced to provide concurrent collection for the System.gc() and Runtime.getRuntime().gc() methods. Prior to Java SE 6, these methods stopped all application threads in order to collect the entire heap, which sometimes resulted in lengthy pause times in applications with large heaps. In line with the goals of the Concurrent Mark Sweep Collector, this new feature enables the collector to keep pauses as short as possible during full heap collection. To enable this feature, add the option -XX:+ExplicitGCInvokesConcurrent to the Java command line. The concurrent marking task in the CMS collector is now performed in parallel on platforms with multiple processors. This significantly reduces the duration of the concurrent marking cycle and enables the collector to better support applications with larger numbers of threads and high object allocation rates, particularly on large multiprocessor machines. For more on these new features, please refer to the Java SE 6 release notes. 2.3 Ergonomics in the 6.0 Java Virtual Machine In Java SE 5, platform-dependent default selections for the garbage collector, heap size, and runtime compiler were introduced to better match the needs of different types of applications while requiring less command-line tuning. New tuning flags were also introduced to allow users to specify a desired behavior, which in turn enabled the garbage collector to dynamically tune the size of the heap to meet the specified behavior. In Java SE 6, the default selections have been further enhanced to improve application runtime performance and garbage collector efficiency. The chart below compares out-of-the-box SPECjbb2005™ performance between Java SE 5 and Java SE 6 Update 2. This test was conducted on a Sun Fire V890 with 24 x 1.5 GHz UltraSPARC CPUs and 64 GB RAM running Solaris 10:
  • 7. In each case the benchmarks were run without any performance flags. Please see the SPECjbb 2005 Benchmark Disclosure. We also compared I/O performance between Java SE 5 and Java SE 6 Update 2. This test was conducted on a Sun Fire V890 with 24 x 1.5 GHz UltraSPARC CPUs and 64 GB RAM running Solaris 10:
  • 8. In each case the benchmarks were run without any performance flags. We compared VolanoMark™ 2.5 performance between Java SE 5 and Java SE 6. VolanoMark is a pure Java benchmark that measures both (a) raw server performance and (b) server network scalability performance. In this benchmark, the client side simulates up to 4,000 concurrent socket connections. Only those VMs that successfully scale up to 4,000 connections pass the test. In both the raw performance and network scalability tests, the higher the score, the better the result. This test was conducted on a Sun Fire V890 with 24 x 1.5 GHz UltraSPARC CPUs and 64 GB RAM running Solaris 10:
  • 9. In each case we ran the benchmark in loopback mode without any performance flags. The result shown is based upon relative throughput (messages per second with 400 loopback connections). The full Java version for Java SE 5 is: java version "1.5.0" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-b64) Java HotSpot(TM) Client VM (build 1.5.0-b64, mixed mode) The full Java version for Java SE 6 is: java version "1.6.0_02" Java(TM) SE Runtime Environment (build 1.6.0_02-b05) Java HotSpot(TM) Client VM (build 1.6.0_02-b05, mixed mode) Please see the VolanoMark™ 2.5 Benchmark Disclosure
  • 10. Some other improvements in Java SE 6 include: On server-class machines, a specified maximum pause time goal of less than or equal to 1 second will enable the Concurrent Mark Sweep Collector. The garbage collector is allowed to move the boundary between the tenured generation and the young generation as needed (within prescribed limits) to better achieve performance goals. This mechanism is off by default; to activate it, add the option -XX:+UseAdaptiveGCBoundary to the command line. Promotion failure handling is turned on by default for the serial (-XX:+UseSerialGC) and Parallel Young Generation (-XX:+UseParNewGC) collectors. This feature allows the collector to start a minor collection and then back out of it if there is not enough space in the tenured generation to promote all the objects that need to be promoted. An alternative order for copying objects from the young to the tenured generation in the parallel scavenge collector has been implemented. The intent of this feature is to decrease cache misses for objects accessed in the tenured generation. This feature is on by default. To disable it, please add -XX:-UseDepthFirstScavengeOrder to the command line. The default young generation size has been increased to 1MB on x86 platforms. The Concurrent Mark Sweep Collector's default young generation size has been increased. The minimum young generation size was increased from 4MB to 16MB. The proportion of the overall heap used for the young generation was increased from 1/15 to 1/7. The CMS collector now uses the survivor spaces by default, and their default size was increased. The primary effect of these changes is to improve application performance by reducing garbage collection overhead. However, because the default young generation size is larger, applications may also see larger young generation pause times and a larger memory footprint. 
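Because these defaults vary by platform and collector, it can be useful to inspect the heap layout a particular JVM has chosen. The following sketch uses the standard MemoryPoolMXBean API to list the heap memory pools and their sizes; the class name HeapPoolReport is hypothetical, and the pool names (for example, which pool corresponds to the young generation) are JVM-specific and are not assumed here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that summarizes the heap memory pools of the running
// JVM, which makes the effect of default or tuned generation sizes visible.
final class HeapPoolReport {

    static List<String> heapPoolSummary() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            lines.add(pool.getName()
                    + ": committed=" + usage.getCommitted() + " bytes"
                    + ", max=" + usage.getMax() + " bytes");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : heapPoolSummary()) {
            System.out.println(line);
        }
    }
}
```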
2.4 Client-side Performance Features and Improvements 2.4.1 New class list for Class Data Sharing To reduce application startup time and footprint, Java SE 5.0 introduced a feature called "class data sharing" (CDS). On 32-bit platforms, this mechanism works as follows: the Sun-provided installer loads a set of classes from the system jar file (rt.jar, the jar file containing the entire Java class library) into a private internal representation, and dumps that representation to a file, called a "shared archive". On subsequent JVM invocations, the shared archive is memory-mapped in, saving the cost of loading those classes and allowing much of the Java Virtual Machine's metadata for these classes to be shared among multiple
  • 11. JVM processes. In Java SE 6.0, the list of classes in the "shared archive" has been updated to better reflect the changes to the system jar file. 2.4.2 Improvements to the boot class loader The Java Virtual Machine's boot and extension class loaders have been enhanced to improve the cold-start time of Java applications. Prior to Java SE 6, opening the system jar file caused the Java Virtual Machine to read a one-megabyte ZIP index file that translated into a lot of disk seek activity when the file was not in the disk cache. With "class data sharing" enabled, the Java Virtual Machine is now provided with a "meta-index" file (located in jre/lib) that contains high-level information about which packages (or package prefixes) are contained in which jar files. This helps the JVM avoid opening all of the jar files on the boot and extension class paths when a Java application class is loaded. Check bug 6278968 for more details. Below we show a chart comparing application start-up time performance between Java SE 5 and Java SE 6 Update 2. This test was conducted on an Intel Core 2 Duo 2.66GHz desktop machine with 1GB of memory:
  • 12. The application start-up comparison above shows relative performance (smaller is better) and in each case the benchmarks were run without any performance flags. We also compared memory footprint size required between Java SE 5 and Java SE 6 Update 2. This test was conducted on an Intel Core 2 Duo 2.66GHz desktop machine with 1GB of memory:
  • 13. The footprint comparison above shows relative performance (smaller is better) and in each case the benchmarks were run without any performance flags. Despite the addition of many new features, the Java Virtual Machine's core memory usage has been pared down to make the actual memory impact on your system even lower than with Java SE 5. 2.4.3 Splash Screen Functionality Java SE 6 provides a solution that allows an application to show a splash screen before the virtual machine starts. Now, a Java application launcher is able to decode an image and display it in a simple non-decorated window. 2.4.4 Swing's true double buffering Swing's true double buffering has now been enabled. Swing used to provide double buffering on a per-application basis; it now provides it on a per-window basis, and native exposed events are copied directly from the double buffer. This significantly improves
  • 14. Swing performance, especially on remote servers. Please see Scott Violet's blog for full details. 2.4.5 Improving rendering on Windows systems The UxTheme API, which allows standard Look & Feel rendering of Windows controls on Microsoft Windows systems, has been adopted to improve the fidelity of Swing's system Look & Feels. 3 New Platform Support Please see the Supported System Configurations chart for full details. 3.1 Operating Environments 3.1.1 Windows Vista Java SE 6 is supported on Windows Vista Ultimate Edition, Home Premium Edition, Home Basic Edition, Enterprise Edition and Business Edition in addition to Windows XP Home and Professional, 2000 Professional, 2000 Server, and 2003 Server. 4 Going Further 4.1 Java Performance Portal For the latest in Java performance best practices, documentation, tools, FAQs, code samples, white papers and other Java performance news, check out the Java Performance Portal. Three especially relevant performance links for Java SE 6.0 are given here: 4.1.1 Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning The Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning document expands
  • 15. on GC tuning concepts and techniques for Java SE 6 that were introduced in the Tuning Garbage Collection with the 5.0 Java Virtual Machine document. 4.1.2 jvmstat 3.0 The jvmstat 3.0 home page documents the lightweight performance monitoring capabilities that are built into Java SE 6 and explains how to use these tools to monitor not only the 6.0 HotSpot Java Virtual Machine but also the HotSpot 1.5.0, 1.4.2 and 1.4.1 JVMs. 4.2 Java SE 6 Documentation Be sure to check out the wealth of Java SE 6 Documentation including the New Features and Enhancements and the Java Platform, Standard Edition 6 Overview. 4.3 Performance Monitoring and Management 4.3.1 DTrace Probes in HotSpot VM 4.3.2 New monitoring, management and diagnosability features 4.3.3 Observability Using Java SE 6 on Solaris OS 4.4 Benchmark Disclosure 4.4.1 SPECjbb 2005 SPECjbb2005 is a benchmark from the Standard Performance Evaluation Corporation (SPEC). The performance referenced is based on Sun internal software testing conforming to the testing methodologies listed above. For the latest SPECjbb2005 results visit http://www.spec.org/osg/jbb2005.
  • 16. 4.4.2 VolanoMark™ 2.5 VolanoMark™ version 2.5 is a benchmark from Volano LLC (http://www.volano.com/).