The JVM Magic
Baruch Sadogursky
Consultant & Architect, AlphaCSP
Agenda
  • Introduction
  • GC Magic 101
  • General Optimizations
  • Compiler Optimizations
  • What can I do?
      • Programming tips
      • JVM configuration flags
Introduction
Introduction
  • In the past, the JVM was considered by many to be Java's Achilles’ heel
      • Interpreter?!
  • The JVM team improved performance by a factor of 300 to 3000
      • JDK 1.6 compared to JDK 1.0
  • Java is measured to run at 50% to 100+% of the speed of C and C++
      • Jake2 vs Quake2
  • How can it be?
Java Virtual Machines Zoo
  CEE-J, Excelsior JET, Hewlett-Packard, J9 (IBM), Jbed, Jblend, Jrockit, MRJ,
  MicroJvm, MS JVM, OJVM, PERC, Blackdown Java, CVM, Gemstone,
  Golden Code Development, Intent, Novell, NSIcom CrE-ME, ChaiVM, HotSpot,
  AegisVM, Apache Harmony, CACAO, Dalvik, IcedTea, IKVM.NET, Jamiga, JamVM,
  Jaos, JC, Jelatine JVM, JESSICA, Jikes RVM, Jnode, JOP, Juice, Jupiter, JX,
  Kaffe, leJOS, Mika VM, Mysaifu, NanoVM, SableVM, Squawk virtual machine,
  SuperWaba, TinyVM, VMkit of Low Level Virtual Machine, Wonka VM, Xam
HotSpot Virtual Machine
  • Originally developed by Longview Technologies; first released in 1999
  • Contains:
      • Class loader
      • Bytecode interpreter
      • 2 virtual machines
      • 7 garbage collectors
      • 2 compilers
      • Runtime libraries
HotSpot Virtual Machine
  • Configured by hundreds of -XX flags
  • Reminder:
      • -X options are non-standard
      • -XX options have specific system requirements for correct operation
      • Both are subject to change without notice
GC Magic 101
GC Is Slow?
  • GC has a bad performance reputation
      • Reduces throughput
      • Introduces pauses
      • Unpredictable
      • Uncontrolled
      • Performance degradation is proportional to object count
  • Just give me the damn free() and malloc()! I’ll be just fine!
  • Is it so?
Generational Collectors
  • Weak generational hypothesis
      • Most objects die young (AKA infant mortality)
      • Few old-to-young references
  • Generations: regions holding objects of different ages
      • GC is done separately once a generation fills
      • Different GC algorithms
  • The young (nursery) generation
      • Collected by “Minor garbage collection”
  • The old (tenured) generation
      • Collected by “Major garbage collection”
GC Magic 101
  • Young is better than Tenured
  • Let your objects die in the young generation
      • When possible and when it makes sense
GC Magic 101
  • Swapping is bad
  • The application's memory footprint should not exceed the available physical memory
GC Magic 101
  • Choose:
      • Throughput (client)
      • Low-pause (server)
GC Magic 101
  • http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
Tracing Collector Algorithms
  • Mark-Sweep collector
      • Mark phase marks each reachable object
      • Sweep phase “sweeps” the heap
      • Unmarked objects are reclaimed as garbage
  • Copying collector
      • Heap is divided into two equal spaces
      • When the active space fills, live objects are copied to the unused space
      • Only live objects are examined
      • The roles of the spaces are then flipped
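The mark and sweep phases can be sketched on a toy object graph. This is only an illustrative model, not HotSpot code: Node, ToyMarkSweep, mark and sweep are hypothetical names, and the "heap" is just a list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy heap object: a name plus outgoing references.
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }
}

final class ToyMarkSweep {
    // Mark phase: collect every node reachable from the roots.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit only
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    // Sweep phase: walk the whole heap and drop every unmarked node.
    // Returns the number of reclaimed nodes.
    static int sweep(List<Node> heap, Set<Node> marked) {
        int before = heap.size();
        heap.removeIf(n -> !marked.contains(n));
        return before - heap.size();
    }
}
```

Note that an unreachable cycle (two dead objects pointing at each other) is still reclaimed, because marking starts from the roots rather than counting references.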
Compaction
  • Compaction: the collector moves all live objects to the bottom of the heap
      • The remaining memory is reclaimed
  • Reduces the cost of object allocation
  • No potential fragmentation
  • The drawback is slower completion of GC
The Young generation
  • Consists of Eden + two survivor spaces
  • Objects are initially allocated in Eden
  • All HotSpot young collectors are stop-the-world copying collectors
      • Done in parallel for the parallel garbage collectors
  • Collections are relatively fast and proportional to the number of live objects
The Young generation
The Tenured generation
  • Objects surviving several GC cycles are promoted to the tenured generation
      • Use -XX:MaxTenuringThreshold=# to change
  • The collector algorithms used are variations of Mark-Sweep
      • More space efficient
  • Characteristics
      • Lower garbage density
      • Bigger heap space
      • Fewer GC cycles
Generation Collectors
Garbage Collectors
GC Flags
When to Use
Garbage First (G1)
  • New in JDK 1.6 u14 (May 29th)
  • All memory is divided into 1MB buckets
  • Calculates object liveness in buckets
      • Drops “dead” buckets
      • If a bucket is not total garbage, it’s not dropped
  • Collects the buckets with the most garbage first
  • Pauses only on “mark”
      • No sweep
  • User can provide pause time goals
      • Actual seconds or percentage of runtime
      • G1 records bucket collection time and can estimate how many buckets to collect during a pause
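The "most garbage first, within a pause goal" idea can be sketched as a greedy selection. This is a toy model under stated assumptions, not HotSpot's actual heuristics: Bucket, GarbageFirstSketch, chooseCollectionSet and the per-bucket cost estimate are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one heap bucket.
final class Bucket {
    final int id;
    final int capacityBytes;
    final int liveBytes;           // bytes still reachable in this bucket
    final double estimatedMillis;  // assumed predicted cost of collecting it

    Bucket(int id, int capacityBytes, int liveBytes, double estimatedMillis) {
        this.id = id;
        this.capacityBytes = capacityBytes;
        this.liveBytes = liveBytes;
        this.estimatedMillis = estimatedMillis;
    }

    int garbageBytes() {
        return capacityBytes - liveBytes;
    }
}

final class GarbageFirstSketch {
    // Pick buckets in order of most garbage first, as long as the
    // estimated cost still fits into the remaining pause budget.
    static List<Bucket> chooseCollectionSet(List<Bucket> buckets, double pauseGoalMillis) {
        List<Bucket> sorted = new ArrayList<>(buckets);
        sorted.sort((x, y) -> Integer.compare(y.garbageBytes(), x.garbageBytes()));

        List<Bucket> chosen = new ArrayList<>();
        double budget = pauseGoalMillis;
        for (Bucket b : sorted) {
            if (b.garbageBytes() == 0) {
                break;                     // nothing left worth collecting
            }
            if (b.estimatedMillis <= budget) {
                chosen.add(b);
                budget -= b.estimatedMillis;
            }
        }
        return chosen;
    }
}
```

Buckets that are skipped because they do not fit the budget simply stay in the heap and remain candidates for a later pause.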
Garbage First (G1)
  • Targets multi-processor machines and large heaps
  • G1 will be the long-term replacement for the CMS collector
  • Unlike CMS, it compacts to battle fragmentation
      • A bucket’s space is fully reclaimed
  • Better throughput
  • Predictable pauses (with high probability)
      • Garbage is left in buckets with a high live ratio
      • It may be collected later
Benefits of G1
  • No imbalance between the young and tenured generations
      • Generations are only logical
      • Generations are merely sets of buckets
  • More predictable GC pauses
  • Parallelism and concurrency in collections
  • No fragmentation due to compaction
  • Better heap utilization
  • Better GC ergonomics
Young GCs in G1
  • Done using evacuation pauses
  • Stop-the-world parallel collections
  • Evacuates surviving objects between sets of buckets
Old GCs in G1
  • Drops dead buckets
  • Calculates liveness info per bucket
  • Identifies the best buckets for subsequent evacuation pauses
  • Collects them piggy-backed on young GCs
GC Ergonomics
GC Ergonomics
  • The goal of ergonomics is to provide good performance with little or no tuning
  • Better matches the needs of different application types
  • The HotSpot VM, garbage collector and heap size are chosen automatically
      • Based on OS, RAM and number of CPUs
  • Server vs. client class machine
      • Hints at the characteristics of the application
GC Ergonomics
GC Ergonomics
  • With the parallel collectors, one can specify performance goals
      • In contrast to specifying the heap size
      • Improves performance for large applications
  • Max Pause Time Goal
      • Use -XX:MaxGCPauseMillis=<N>
      • Applies to both generations separately
      • Or: average + variance
      • No pause time goal by default
GC Ergonomics
  • Throughput Goal
      • Use -XX:GCTimeRatio=<N>
      • The GC time goal, as a fraction of total time, is 1/(1+N)
      • If N=19, the GC time goal is 1/(1+19) or 5%
      • Default N is 99, meaning a GC time goal of 1%
  • Minimum Footprint Goal
  • Priority of goals
      • Maximum pause time goal
      • Throughput goal
      • Minimum footprint goal
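The throughput-goal arithmetic is small enough to check directly. A minimal sketch, where GcTimeRatio and gcTimeFraction are hypothetical names used only for illustration:

```java
final class GcTimeRatio {
    // Fraction of total time the collector aims to spend in GC
    // when the JVM is started with -XX:GCTimeRatio=N.
    static double gcTimeFraction(int n) {
        return 1.0 / (1.0 + n);
    }

    public static void main(String[] args) {
        System.out.printf("N=19 -> %.0f%% of time in GC%n", 100 * gcTimeFraction(19));
        System.out.printf("N=99 -> %.0f%% of time in GC%n", 100 * gcTimeFraction(99));
    }
}
```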
GC Ergonomics
  • Performance goals may not be met
  • Pause time and throughput goals are somewhat contradictory
      • The pause time goal shrinks the generation
      • The throughput goal grows the generation
  • Statistics are kept by the GC
      • Adaptive to changes in application behavior
GC Tweaking
Heap Size
  • The larger the heap space, the better
      • For both the young and the old generation
      • Larger space: less frequent GCs, lower GC overhead, objects more likely to become garbage
      • Smaller space: faster GCs (not always! see later)
  • Sometimes the max heap size is dictated by available memory and/or the max space the JVM can address
  • You have to find a good balance between young and old generation size
Heap Size
  • Maximize the number of objects reclaimed in the young generation
  • The application's memory footprint should not exceed the available physical memory
      • Swapping is bad
  • The above applies to all our GCs
Heap Size
  • -Xmx<size> : max heap size
      • young generation + old generation
  • -Xms<size> : initial heap size
      • young generation + old generation
  • -Xmn<size> : young generation size
  • -XX:PermSize=<size> : permanent generation initial size
  • -XX:MaxPermSize=<size> : permanent generation max size
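To see what a given set of sizing flags actually produced, the running JVM can be asked through the standard java.lang.management API. A minimal sketch; the class name HeapSizes and the helper maxHeapBytes are hypothetical, and the pool names printed depend on the JVM and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class HeapSizes {
    // Upper bound the JVM will try to use for the heap (roughly -Xmx).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap bytes:       " + rt.maxMemory());
        System.out.println("committed heap bytes: " + rt.totalMemory());

        // One line per memory pool (young, old, and others).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            System.out.println(pool.getName() + " [" + pool.getType() + "]"
                    + " init=" + usage.getInit()
                    + " max=" + usage.getMax());
        }
    }
}
```

Running it once with default settings and once with explicit -Xms, -Xmx and -Xmn values makes the effect of each flag visible.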
Heap Size
  • When -Xms != -Xmx, heap growth or shrinking requires a Full GC
  • Set -Xms to the desired heap size
  • Set -Xmx even higher “just in case”
      • Even a full GC is better than an OOM crash
  • Same for -XX:PermSize and -XX:MaxPermSize
  • Same for -XX:NewSize and -XX:MaxNewSize
      • -Xmn combines both
Tenuring
  • Measure tenuring with -XX:+PrintTenuringDistribution
  • Avoid tenuring for short or even medium-lived objects!
      • Less promotion into the old generation
      • Less frequent old GCs
  • Promote long-lived objects ASAP
      • Yeah, this conflicts with the previous bullet
  • Better to copy more than to promote more
  • -XX:TargetSurvivorRatio=<percent>, e.g., 50
      • How much of the survivor space should be filled
      • Typically leave extra space to deal with “spikes”
Permanent Space
  • Classes aren’t unloaded by default
      • Use -XX:+CMSClassUnloadingEnabled to enable
  • The classloader should be collected
      • It holds references to classes
      • Each object holds a reference to its classloader
GC Options
GC Statistics Options
  • GC logging has extremely low / non-existent overhead
  • It’s very helpful when diagnosing production issues
  • Enable it
      • In production too!
  • -XX:+PrintGC
  • -XX:+PrintGCDetails
  • -XX:+PrintGCTimeStamps
  • -XX:+PrintTenuringDistribution
      • Shows the tenuring threshold and the ages of objects in the new generation
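Alongside the log flags, coarse GC counters are also available in-process through the standard GarbageCollectorMXBean interface. A minimal sketch, with the hypothetical class name GcStats; it reports far less detail than the GC log and is no replacement for it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStats {
    // One line per collector registered by the running JVM:
    // its name, number of collections so far, and accumulated time.
    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", timeMillis=").append(gc.getCollectionTime())
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summary());
    }
}
```

The collector names in the output vary with the collector selected, which also makes this a quick way to confirm which young and old collectors are active.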
GC Is Slow? – The Answers
  • Reduces throughput
      • You choose
  • Introduces pauses
      • You choose
  • Unpredictable
      • Not any more
  • Uncontrolled
      • Configurable
  • Performance degradation is proportional to object count
      • Not true
  • Just give me the damn free() and malloc()! I’ll be just fine!
      • Bad idea (see more later)
General Optimizations
HotSpot Optimizations
  • JIT Compilation
      • Compiler Optimizations
      • Generates more performant code than you could write in native
  • Adaptive Optimization
  • Split Time Verification
  • Class Data Sharing
Two Virtual Machines?
  • Client VM
      • Reduces start-up time and memory footprint
      • -client command-line flag
  • Server VM
      • Maximum program execution speed
      • -server command-line flag
  • Auto-detection
      • Server: >1 CPUs & >=2GB of physical memory
      • Win32: always detected as client
      • Many 64-bit OSes don’t have client VMs
Just-In-Time Compilation
  • Everyone knows about JIT!
  • Hot code is compiled to native
  • What is “hot”?
      • Server VM: 10000 invocations
      • Client VM: 1500 invocations
      • Use -XX:CompileThreshold=# to change
      • More invocations: better optimizations
      • Fewer invocations: shorter warm-up time
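A small experiment can make "hot" code visible: call a tiny method many times and ask the standard CompilationMXBean how much time the JIT has spent compiling. This is a sketch only; WarmUp, square and callManyTimes are hypothetical names, and whether and when compilation kicks in depends on the JVM and its flags.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

final class WarmUp {
    static int square(int x) {
        return x * x;
    }

    // Invokes square() the given number of times so it can become "hot".
    static long callManyTimes(int invocations) {
        long sum = 0;
        for (int i = 0; i < invocations; i++) {
            sum += square(i % 100);
        }
        return sum;
    }

    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        boolean timed = jit != null && jit.isCompilationTimeMonitoringSupported();

        long before = timed ? jit.getTotalCompilationTime() : -1;
        long sum = callManyTimes(20_000);
        long after = timed ? jit.getTotalCompilationTime() : -1;

        System.out.println("sum = " + sum);
        if (timed) {
            System.out.println("JIT time during the loop (ms): " + (after - before));
        } else {
            System.out.println("compilation time monitoring is not available");
        }
    }
}
```

Trying different -XX:CompileThreshold values, or more or fewer invocations, shows the warm-up trade-off the slide describes.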
Just-In-Time Compilation
  • The code is being optimized by the compiler
      • Coming soon…
Adaptive Optimization
  • Allows HotSpot to uncompile previously compiled code
  • Much more aggressive, even speculative, optimizations may be performed
      • And rolled back if something goes wrong or new data is gathered
  • E.g. classloading might invalidate inlining
Split Time Verification
  • Java suffers from long boot time
  • One of the reasons is bytecode verification
      • Valid flow control
      • Type safety
      • Visibility
  • To ease the load on the weak KVM, J2ME started performing part of the verification at compile time
  • It’s good, so now it’s in Java SE 6 too
Class Data Sharing
  • Helps improve startup time
  • During JDK installation, part of rt.jar is preloaded into a shared memory file, which is attached at runtime
  • No need to reload and reverify those classes every time
Compiler Optimizations
Two Types of Optimizations
  • Java has two compilers:
      • javac bytecode compiler
      • HotSpot VM JIT compiler
  • Both implement similar optimizations
  • The bytecode compiler is limited
      • Dynamic linking
      • Can apply only static optimizations
Warning
  • Caution! Don’t try this at home yourself!
  • The source code you are about to see is not real!
      • It’s pseudo assembly code
  • Don’t write such code!
  • Source code should be readable and object-oriented
      • Bytecode will become performant automagically
Optimization Rules
  • Make the common case fast
      • Don't worry about the uncommon/infrequent case
  • Defer optimization decisions
      • Until you have data
  • Revisit decisions if the data warrants
Null Check Elimination
  • Java is a null-safe language
      • A pointer can’t point to a meaningless portion of memory
      • Null checks are added by the compiler, and a NullPointerException is thrown
  • The JVM’s profiler can eliminate those checks
Example – Original Source
Example – Null Check Elimination
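As a hypothetical illustration of null check elimination (not the slide's own listing), the sketch below spells out the implicit checks in source form. Point and NullCheckSketch are made-up names; the real transformation happens in JIT-compiled code, not in Java source.

```java
final class Point {
    int x;
    int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

final class NullCheckSketch {
    // What the programmer writes. Each field access carries an implicit null check.
    static int sumAsWritten(Point p) {
        return p.x + p.y;
    }

    // Conceptual view with one explicit check per dereference.
    static int sumWithExplicitChecks(Point p) {
        if (p == null) throw new NullPointerException();
        int x = p.x;
        if (p == null) throw new NullPointerException(); // redundant: p is already known to be non-null
        int y = p.y;
        return x + y;
    }

    // Conceptual view after elimination: the redundant second check is gone.
    static int sumAfterElimination(Point p) {
        if (p == null) throw new NullPointerException();
        return p.x + p.y;
    }
}
```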
Inlining
  • Love encapsulation?
      • Getters and setters
  • Love clean and simple code?
      • Small methods
  • Use static code analysis?
      • Small methods
  • No penalty for using those!
  • The JIT brings the implementation of these methods into the containing method
  • This optimization is known as “Inlining”
Inlining
  • Not just about eliminating call overhead
  • Provides the optimizer with bigger blocks
  • Enables other optimizations
      • Hoisting, dead code elimination, code motion, strength reduction
Inlining
  • But wait, all public non-final methods in Java are virtual!
      • HotSpot examines the exact case in place
      • In most cases there is only one implementation, which can be inlined
  • But wait, more implementations may be loaded later!
      • In that case HotSpot undoes the inlining
      • Speculative inlining
  • By default limited to 35 bytes of bytecode
  • Use -XX:MaxInlineSize=# to change
Example – Inlining
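A hypothetical before/after pair for inlining (again not the slide's own listing). Counter and its two bump methods are made-up names; the second method shows by hand roughly what the JIT produces from the first.

```java
final class Counter {
    private int value;

    int getValue() {
        return value;
    }

    void setValue(int v) {
        value = v;
    }

    // As written: clean code that goes through the accessors.
    static int bumpAsWritten(Counter c) {
        c.setValue(c.getValue() + 1);
        return c.getValue();
    }

    // Roughly what inlining makes of it: the accessor bodies are pasted
    // into the caller, leaving plain field reads and writes.
    static int bumpAfterInlining(Counter c) {
        c.value = c.value + 1;
        return c.value;
    }
}
```

Once the calls are gone, the optimizer sees one larger block and can go on to apply further optimizations to it.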
Example – Source Code Revision
Example – Source Code Revision
Code Hoisting
  • Hoist = to raise or lift
  • Size optimization
  • Eliminates duplicate code in method bodies by hoisting expressions or statements
  • Duplicate bytecode, not necessarily source code
Example – Code Hoisting
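A hypothetical sketch of code hoisting as described above: an expression duplicated in both branches is lifted to a single place. HoistingSketch and its method names are made up for illustration.

```java
final class HoistingSketch {
    // As written: the same multiplication appears in both branches.
    static int asWritten(int a, int b, boolean flag) {
        int result;
        if (flag) {
            int base = a * b;   // duplicate
            result = base + 1;
        } else {
            int base = a * b;   // duplicate
            result = base - 1;
        }
        return result;
    }

    // After hoisting: the shared expression is computed once, above the branch.
    static int afterHoisting(int a, int b, boolean flag) {
        int base = a * b;
        return flag ? base + 1 : base - 1;
    }
}
```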
Bounds Check Elimination
  • Java promises automatic boundary checks for arrays
      • An exception is thrown
  • If the programmer already checks the array boundaries in the code, the automatic check can be eliminated
Example – Bounds Check Elimination
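A hypothetical sketch of bounds check elimination. The loop condition already guarantees that the index is in range, so the per-access check, written out explicitly in the second method, can never fail and may be dropped. BoundsCheckSketch and its method names are made up.

```java
final class BoundsCheckSketch {
    // As written: the loop condition keeps i within [0, data.length).
    static int sum(int[] data) {
        int total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    // Conceptual view of the automatic check on every access.
    // Inside this loop the condition is always false, which is
    // what allows the compiler to remove it.
    static int sumWithExplicitChecks(int[] data) {
        int total = 0;
        for (int i = 0; i < data.length; i++) {
            if (i < 0 || i >= data.length) {
                throw new ArrayIndexOutOfBoundsException(i);
            }
            total += data[i];
        }
        return total;
    }
}
```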
Sub-Expression Elimination
  • Avoids redundant memory access
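A hypothetical sketch of sub-expression elimination: the same array element is read three times in the source, and once after the optimization. CseSketch and its method names are made up.

```java
final class CseSketch {
    // As written: v[i] is loaded from memory three times.
    static int asWritten(int[] v, int i) {
        return v[i] * v[i] + v[i];
    }

    // After elimination: one load, and the value is reused from a local.
    static int afterElimination(int[] v, int i) {
        int t = v[i];
        return t * t + t;
    }
}
```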
Loop Unrolling
  • Some loops shouldn’t be loops
      • In terms of performance, not code readability
  • Those can be unrolled into a set of statements
  • If the boundaries are dynamic, a partial unroll will occur
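A hypothetical sketch of a partial unroll for a loop with dynamic boundaries: the body is repeated four times per iteration, and a short tail loop handles the leftover elements. UnrollSketch, its method names and the unroll factor of 4 are assumptions for illustration only.

```java
final class UnrollSketch {
    // As written: one element per iteration.
    static int sumAsWritten(int[] a) {
        int total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i];
        }
        return total;
    }

    // Partially unrolled by hand: four elements per iteration,
    // then a tail loop for the remaining 0 to 3 elements.
    static int sumPartiallyUnrolled(int[] a) {
        int total = 0;
        int i = 0;
        int limit = a.length - (a.length % 4);
        for (; i < limit; i += 4) {
            total += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        }
        for (; i < a.length; i++) {
            total += a[i];
        }
        return total;
    }
}
```

As the Warning slide says, this is not code to write by hand; the readable version is the one to keep in source.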

JVM Magic

  • 1. The JVM MagicBaruch SadogurskyConsultant & Architect, AlphaCSP
  • 2. AgendaIntroductionGC Magic 101General OptimizationsCompiler OptimizationsWhat can I do?Programming tipsJVM configuration flags2
  • 4. IntroductionIn the past, JVM was considered by many as Java Achilles’ heelInterpreter?!JVM team improved performance in 300 to 3000 timesJDK 1.6 compared to JDK 1.0Java is measured to be 50% to 100+% the speed of C and C++Jake2 vs Quake2How can it be?
  • 5. Java Virtual Machines ZooCEE-J Excelsior JETHewlett-PackardJ9 (IBM)JbedJblendJrockitMRJMicroJvmMS JVMOJVMPERCBlackdown JavaCVMGemstoneGolden Code DevelopmentIntentNovellNSIcomCrE-MEChaiVMHotSpotAegisVMApache HarmonyCACAODalvikIcedTeaIKVM.NETJamigaJamVMJaosJCJelatine JVMJESSICAJikes RVMJnodeJOPJuiceJupiterJXKaffeleJOSMika VMMysaifuNanoVMSableVMSquawk virtual machineSuperWabaTinyVMVMkit of Low Level Virtual MachineWonka VMXam5
  • 6. HotSpot Virtual MachineDeveloped by Longview Technologies back in 1999Contains:Class loaderBytecode interpreter2 Virtual machines7 Garbage collectors2 CompilersRuntime libraries
  • 7. HotSpot Virtual MachineConfigured by hundreds of –XX flagsReminder -X options are non-standard-XX options have specific system requirements for correct operationsBoth are subject to change without notice
  • 9. GC Is Slow?GC has bad performance reputationReduces throughputIntroduces pausesUnpredictableUncontrolledPerformance degradation is proportional to objects countJust give me the damn free() and malloc()! I’ll be just fine!Is it so?
  • 10. Generational CollectorsWeak generational hypothesisMost objects die young (AKA Infant mortality)Few old to young referencesGenerations: regions holding objects of different agesGC is done separately once a generation fillsDifferent GC algorithmsThe young (nursery) generationCollected by “Minor garbage collection”The old (tenured) generationCollected by “Minor garbage collection”
  • 11. GC Magic 101vsYoung is better than TenuredLet your objects die in young generationWhen possible and makes sense11
  • 12. GC Magic 10112vsSwapping is badApplication's memory footprint should not exceed the available physical memory
  • 13. GC Magic 10113vsChoose:Throughput (client)Low-pause (server)
  • 15. Tracking Collectors AlgorithmsMark-Sweep collectorMark phase marks each reachable objectSweep phase “sweeps” the heapNon marked objects reclaimed as garbageCopying collectorHeap is divided into two equal spacesWhen active space fills, live objects are copied to the unused spaceOnly live objects are examinedThe roles of the spaces are then flipped
  • 16. CompactionCompaction: The collector moves all live objects to the bottom of the heapRemaining memory is reclaimedReduces the cost of objects allocationNo potential fragmentationThe drawback is slower completion of GC
  • 17. The Young generationConsists of Eden + two survivor spaces Objects are initially allocated in EdenAll HotSpot young collectors are stop-the-world copying collectorsDone is parallel for parallel garbage collectorsCollections are relatively fast and proportional to number of live objects
  • 19. The Tenured generationObjects surviving several GC cycles, are promoted to the tenured generation Use -XX:MaxTenuringThreshold=# to changeCollectors algorithms used are variations of Mark-SweepMore space efficientCharacteristicsLower garbage densityBigger heap space Fewer GC cycles
  • 24. Garbage First (G1)New in JDK 1.6 u14 (May 29th)All memory is divided to 1MB bucketsCalculates objects liveness in bucketsDrops “dead” bucketsIf a bucket is not total garbage, it’s not droppedCollects the most garbage buckets firstPauses only on “mark”No sweepUser can provide pause time goalsActual seconds or Percentage of runtimeG1 records bucket collection time and can estimate how many buckets to collect during pause
  • 25. Garbage First (G1)Targets multi-process machines and large heapsG1 will be the long-term replacement for the CMS collectorUnlike CMS, compacts to battle fragmentationA bucket’s space is fully reclaimedBetter throughputPredictable pauses (high probability)Garbage left in buckets with high live ratioMay be collected later
  • 26. Benefits of G1No imbalance of young-tenured generationGenerations are only logical Generations are merely sets of buckets More predictable GC pausesParallelism and concurrency in collections No fragmentation due to compactionBetter heap utilization Better GC ergonomics
  • 27. Young GCs in G1Done using evacuation pausesStop-The-World parallel collectionsEvacuates surviving objects between sets of buckets
  • 28. Old GCs in G1Drops dead bucketsCalculates liveness info per bucketIdentifies best buckets for subsequent eviction pausesCollect them piggy-backed on young GCs
  • 30. GC ErgonomicsErgonomics goal is to provide good performance with little or no tuningBetter matches the needs of different application typesThe HotSpot, garbage collector and heap size are automatically chosenBased on OS, RAM and no# CPUServer Vs. Client class machineHints the characteristics of the application
  • 32. GC ErgonomicsWith the parallel collectors, one can specify performance goalsIn contrast to specifying the heap sizeImproves performance for large applicationsMax Pause Time GoalUse -XX:MaxGCPauseMillis=<N>Both generation separatelyOr: Average + VarianceNo pause time goal by default
  • 33. GC ErgonomicsThroughput GoalUse -XX:GCTimeRatio=<N>The ratio of GC Vs. application time is 1/(1+N)If N=19, GC time goal is 1/(1+19) or 5%Default N is 99, meaning GC time is 1% Minimum Footprint GoalPriority of goalsMaximum pause time goalThroughput goalMinimum footprint goal
  • 34. GC ErgonomicsPerformance goals may not be metPause time and throughput goals are somewhat contradictingThe pause time goal shrinks the generationThe throughput goal grows the generationStatistics are kept by the GCAdaptive to changes in application behavior
  • 36. Heap SizeThe larger the heap space, the betterFor both young and old generationLarger space: less frequent GCs, lower GC overhead, objects more likely to become garbageSmaller space: faster GCs (not always! see later)Sometimes max heap size is dictated by available memory and/or max space the JVM can addressYou have to find a good balance between young and old generation size
  • 37. Heap SizeMaximize the number of objects reclaimed in the young generationApplication's memory footprint should not exceed the available physical memorySwapping is badThe above apply to all our GCs37
  • 38. Heap Size-Xmx<size> : max heap sizeyoung generation + old generation-Xms<size> : initial heap sizeyoung generation + old generation-Xmn<size> : young generation size-XX:PermSize=<size> : permanent generation initial size-XX:MaxPermSize=<size> : permanent generation max size38
  • 39. Heap SizeWhen -Xms != -Xmx, heap growth or shrinking requires a Full GCSet -Xms to desired heap size Set –Xmx even higher “just in case”Even full GC is better than OOM crashSame for -XX:PermSize and -XX:MaxPermSizeSame for -XX:NewSize and-XX:MaxNewSize-Xmn Combines both39
  • 40. TenuringMeasure tenuring with - XX:+PrintTenuringDistributionAvoid tenuring for short or even medium-lived objects!Less promotion into the old generationLess frequent old GCsPromote long-lived objects ASAPYeah, conflict with previous bulletBetter copy more, than promote more-XX:TargetSurvivorRatio=<percent>, e.g., 50How much of the survivor space should be filledTypically leave extra space to deal with “spikes”40
  • 41. Permanent SpaceClasses aren’t unloaded by default-XX:+CMSClassUnloadingEnabled to enableClassloader should be collectedIt holds references to classesEach object holds reference to classloader41
  • 43. GC Statistics OptionsGC logging has extremely low / non-existent overheadIt’s very helpful when diagnosing production issuesEnable itIn production too!-XX:+PrintGCPrintGCDetailsPrintGCTimeStampsPrintTenuringDistributionShow this threshold and the ages of objects in the new generation43
  • 44. GC Is Slow? – The AnswersReduces throughputYou chooseIntroduces pausesYou chooseUnpredictableNot any moreUncontrolledConfigurablePerformance degradation is proportional to objects countNot trueJust give me the damn free() and malloc()! I’ll be just fine!Bad idea (see more later)
  • 46. HotSpot OptimizationsJIT CompilationCompiler OptimizationsGenerates more performant code that you could write in nativeAdaptive OptimizationSplit Time VerificationClass Data Sharing
  • 47. Two Virtual Machines?Client VMReducing start-up time and memory footprint-client CL flagServer VMMaximum program execution speed-server CL flagAuto-detectionServer: >1 CPUs & >=2GB of physical memoryWin32 – always detected as clientMany 64bit OSes don’t have client VMs47
  • 48. Just-In-Time CompilationEveryone knows about JIT!Hot code is compiled to nativeWhat is “hot”?Server VM – 10000 invocationsClient VM – 1500 invocationsUse -XX:CompileThreshold=# to changeMore invocations – better optimizationsLess invocations – shorter warmup time
  • 49. Just-In-Time CompilationThe code is being optimized by the compilerComing soon…
  • 50. Adaptive OptimizationAllows HotSpot to uncompile previously compiled codeMuch more aggressive, even speculative optimizations may be performedAnd rolled back if something goes wrong or new data gatheredE.g. classloading might invalidate inlining
  • 51. Split Time VerificationJava suffers from long boot timeOne of the reasons is bytecode verificationValid flow controlType safetyVisibilityIn order to ease on the weak KVM, J2ME started performing part of the verification in compile timeIt’s good, so now it’s in Java SE 6 too
  • 52. Class Data SharingHelps improve startup timeDuring JDK installation part of rt.jar is preloaded into shared memory file which is attached in runtimeNo need to reload and reverify those classes every time
  • 54. Two Types of Optimizations. Java has two compilers: the javac bytecode compiler and the HotSpot VM JIT compiler. Both implement similar optimizations. The bytecode compiler is limited: because of dynamic linking, it can apply only static optimizations.
  • 55. Warning. Caution! Don't try this at home! The source code you are about to see is not real! It's pseudo assembly code. Don't write such code! Source code should be readable and object-oriented. Bytecode will become performant automagically.
  • 56. Optimization Rules. Make the common case fast: don't worry about the uncommon/infrequent case. Defer optimization decisions until you have data. Revisit decisions if data warrants.
  • 57. Null Check Elimination. Java is a null-safe language: a pointer can't point to a meaningless portion of memory. Null checks are added by the compiler, and NullPointerException is thrown when they fail. The JVM's profiler can eliminate those checks.
  • 59. Example – Null Check Elimination
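The original example slide did not survive extraction. As a stand-in, the sketch below shows the idea at source level under stated assumptions: NullCheckDemo, Point and both method names are hypothetical, and the real JIT works on compiled code rather than rewriting Java source. The point is only that one check is enough to cover several accesses to the same reference.

```java
// Hypothetical illustration; not the code from the original slide.
class NullCheckDemo {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // As written: conceptually, every field access carries its own implicit null check.
    static int sumAsWritten(Point p) {
        return p.x + p.y; // implicit check before p.x and again before p.y
    }

    // Conceptual result: after one check, p is known to be non-null,
    // so the check guarding the second access is redundant and can be dropped.
    static int sumAfterElimination(Point p) {
        if (p == null) {
            throw new NullPointerException();
        }
        return p.x + p.y; // no further checks needed here
    }
}
```

Both methods behave the same from the caller's point of view: they return the sum for a real object and throw NullPointerException for null.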
  • 60. Inlining. Love encapsulation? Getters and setters. Love clean and simple code? Small methods. Use static code analysis? Small methods. No penalty for using those! The JIT brings the implementation of these methods into the containing method. This optimization is known as “inlining”.
  • 61. Inlining. Not just about eliminating call overhead. Provides the optimizer with bigger blocks. Enables other optimizations: hoisting, dead code elimination, code motion, strength reduction.
  • 62. Inlining. But wait, all public non-final methods in Java are virtual! HotSpot examines the exact case in place. In most cases there is only one implementation, which can be inlined. But wait, more implementations may be loaded later! In such a case HotSpot undoes the inlining (speculative inlining). By default inlining is limited to 35 bytes of bytecode. Use -XX:MaxInlineSize=# to change.
  • 64. Example – Source Code Revision
  • 65. Example – Source Code Revision
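The example code for these slides is missing from the transcript. The sketch below is a hypothetical reconstruction of what getter inlining amounts to, written as ordinary Java for readability. InliningDemo, Counter and the method names are assumptions; HotSpot performs this on compiled code, and only when the method is small and hot enough.

```java
// Hypothetical illustration; not the code from the original slides.
class InliningDemo {
    static final class Counter {
        private final int value;

        Counter(int value) {
            this.value = value;
        }

        // A tiny getter: a typical inlining candidate.
        int getValue() {
            return value;
        }
    }

    // As written: one method call per element.
    static int totalAsWritten(Counter[] counters) {
        int sum = 0;
        for (Counter c : counters) {
            sum += c.getValue();
        }
        return sum;
    }

    // Conceptual result after inlining: the getter body replaces the call,
    // giving the optimizer one bigger block to work on.
    static int totalAfterInlining(Counter[] counters) {
        int sum = 0;
        for (Counter c : counters) {
            sum += c.value;
        }
        return sum;
    }
}
```

Counter is final here, so there is only one possible implementation of getValue; for non-final classes the slide's point about speculative inlining applies.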
  • 66. Code Hoisting. Hoist = to raise or lift. A size optimization. Eliminates duplicate code in method bodies by hoisting expressions or statements. Duplicate bytecode, not necessarily duplicate source code.
  • 67. Example – Code Hoisting
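Since the slide's example is not in the transcript, here is a small hypothetical sketch of hoisting as the previous slide describes it: the same expression appears in both branches and is lifted above the branch. HoistingDemo and its methods are assumed names, and the duplication is shown in source only for readability.

```java
// Hypothetical illustration; not the code from the original slide.
class HoistingDemo {
    // As written: a * b + 1 is computed in both branches.
    static int asWritten(int a, int b, boolean flag) {
        int result;
        if (flag) {
            int t = a * b + 1;
            result = t + 10;
        } else {
            int t = a * b + 1;
            result = t - 10;
        }
        return result;
    }

    // Conceptual result: the duplicated computation is hoisted above the branch.
    static int afterHoisting(int a, int b, boolean flag) {
        int t = a * b + 1;
        return flag ? t + 10 : t - 10;
    }
}
```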
  • 68. Bounds Check Elimination. Java promises automatic bounds checks for arrays; an exception is thrown on violation. If the programmer's own code already checks the array bounds, the automatic check can be eliminated.
  • 69. Example – Bounds Check Elimination
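The slide's example did not survive, so the following is an assumed illustration. The first method spells out the check that Java conceptually performs on every array access; the second is the usual loop, whose condition already proves the index is in range, which is the situation in which the automatic check becomes redundant. BoundsCheckDemo and its method names are hypothetical.

```java
// Hypothetical illustration; not the code from the original slide.
class BoundsCheckDemo {
    // Conceptual view: what every a[i] implies when nothing is known about i.
    static int sumWithExplicitChecks(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            if (i < 0 || i >= a.length) {
                throw new ArrayIndexOutOfBoundsException(i);
            }
            sum += a[i];
        }
        return sum;
    }

    // As normally written: the loop condition guarantees 0 <= i < a.length,
    // so a per-access check can never fail and is a candidate for elimination.
    static int sumAsWritten(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }
}
```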
  • 71. Loop Unrolling. Some loops shouldn't be loops, in terms of performance, not code readability. Those can be unrolled into a sequence of statements. If the boundaries are dynamic, a partial unroll will occur.
  • 72. Example – Loop Unrolling
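In place of the missing slide content, here is a hedged sketch of a partial unroll for a loop with dynamic bounds: the body is repeated four times per iteration and a short tail loop handles the leftovers. The factor 4 and the names in LoopUnrollingDemo are assumptions chosen for illustration; the JIT picks its own unroll factor.

```java
// Hypothetical illustration; not the code from the original slide.
class LoopUnrollingDemo {
    // As written: one addition and one loop test per element.
    static int sumAsWritten(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Conceptual partial unroll by 4: fewer loop tests and jumps per element.
    static int sumUnrolledBy4(int[] a) {
        int sum = 0;
        int i = 0;
        int limit = a.length - (a.length % 4); // largest multiple of 4 not above the length
        for (; i < limit; i += 4) {
            sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        }
        for (; i < a.length; i++) { // tail loop for the remaining 0 to 3 elements
            sum += a[i];
        }
        return sum;
    }
}
```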
  • 74. Escape Analysis. Escape analysis is not an optimization. It is a check that an object does not escape local scope, e.g. it is created in a private method, assigned to a local variable and not returned. Escape analysis opens up possibilities for lots of optimizations.
  • 75. Scalar Replacement. Remember the rule “new == always a new object”? False! The JVM can optimize away allocations. Fields are hoisted into registers and the object becomes unneeded. But object creation is cheap! Yep, but GC is not so cheap…
  • 76. Example – Source Code Revision
  • 77. Example – Scalar Replacement
  • 78. Example – Scalar Replacement
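The code on these example slides is not in the transcript. The sketch below is an assumed illustration of scalar replacement: a small object that never leaves the method is replaced by plain local values, so no allocation is needed. ScalarReplacementDemo, Point and the method names are hypothetical, and whether the JVM actually applies this depends on escape analysis succeeding.

```java
// Hypothetical illustration; not the code from the original slides.
class ScalarReplacementDemo {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // As written: allocates a Point that never escapes this method.
    static int distanceSquaredAsWritten(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    // Conceptual result: the fields become locals (registers), and the object disappears.
    static int distanceSquaredAfterReplacement(int x, int y) {
        int px = x;
        int py = y;
        return px * px + py * py;
    }
}
```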
  • 79. Lock Coarsening. HotSpot merges adjacent synchronized blocks that use the same lock. The compiler is allowed to move statements into the merged coarse blocks. A tradeoff between performance and responsiveness: reduces instruction count, but locks are held longer.
  • 80. Example – Source Code Revision
  • 81. Example – Lock Coarsening
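As the slide code is missing, the following hypothetical sketch shows the shape of the transformation: two adjacent blocks on the same lock become one. LockCoarseningDemo and its members are assumed names for illustration only.

```java
// Hypothetical illustration; not the code from the original slides.
class LockCoarseningDemo {
    private final Object lock = new Object();
    private int a;
    private int b;

    // As written: the same lock is acquired and released twice in a row.
    void updateAsWritten() {
        synchronized (lock) {
            a++;
        }
        synchronized (lock) {
            b++;
        }
    }

    // Conceptual result: one acquire/release pair, with the lock held slightly longer.
    void updateAfterCoarsening() {
        synchronized (lock) {
            a++;
            b++;
        }
    }

    int sum() {
        synchronized (lock) {
            return a + b;
        }
    }
}
```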
  • 82. Lock Elision. A thread enters a lock that no other thread will synchronize on, so synchronization has no effect. This can be deduced using escape analysis. Such locks can be elided. Elides 4 StringBuffer synchronized calls:
  • 83. Example – Lock Elision
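The slide's code is not preserved. A plausible reconstruction, offered only as an assumption, is a method-local StringBuffer: three append calls plus toString are four synchronized calls on an object no other thread can ever see. LockElisionDemo and greet are hypothetical names.

```java
// Hypothetical reconstruction; the original slide code is not available.
class LockElisionDemo {
    static String greet(String name) {
        // sb never escapes this method, so no other thread can lock it.
        StringBuffer sb = new StringBuffer();
        sb.append("Hello, "); // synchronized call 1
        sb.append(name);      // synchronized call 2
        sb.append("!");       // synchronized call 3
        return sb.toString(); // synchronized call 4
    }

    public static void main(String[] args) {
        System.out.println(greet("Joe"));
    }
}
```

Only the resulting String leaves the method, which is why escape analysis can prove that the locks on sb have no effect.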
  • 84. Constants Folding. A trivial optimization. How many constants are there? More than you think! Inlining generates constants. Unrolling generates constants. Escape analysis generates constants. The JIT determines what is constant at runtime: whatever doesn't change.
  • 85. Constants Folding. Literals folding: before: int foo = 9*10; after: int foo = 90; String folding or StringBuilder-ing: before: String foo = "hi Joe " + (9*10); after: String foo = new StringBuilder().append("hi Joe ").append(9 * 10).toString(); after: String foo = "hi Joe 90";
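The compile-time part of this can be observed directly. In the sketch below (ConstantFoldingDemo and its members are assumed names), both the folded expression and the literal are compile-time constants, and the Java Language Specification requires constant strings to be interned, so they refer to the same String instance. Comparing with == is used here purely as a probe; normal code should use equals.

```java
// Hypothetical demo class; names are illustrative and not from the original slides.
class ConstantFoldingDemo {
    // Folded by javac: the class file contains the constant 90.
    static final int FOO = 9 * 10;

    // Also a compile-time constant: folded to the single literal "hi Joe 90".
    static final String GREETING = "hi Joe " + (9 * 10);

    // True when the folded constant and the literal are the same interned instance.
    static boolean sameInstanceAsLiteral() {
        return GREETING == "hi Joe 90";
    }

    public static void main(String[] args) {
        System.out.println(FOO);
        System.out.println(sameInstanceAsLiteral());
    }
}
```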
  • 87. Dead Code Elimination. Dead code: code that has no effect on the outcome of the program execution. public static void main(String[] args) { long start = System.nanoTime(); int result = 0; for (int i = 0; i < 10 * 1000 * 1000; i++) { result += Math.sqrt(i); } long duration = (System.nanoTime() - start) / 1000000; System.out.format("Test duration: %d (ms) %n", duration); }
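In the snippet above, result is never read after the loop, so the whole loop has no effect on the program's outcome and the JIT is free to remove it, which would make the measured duration meaningless. A minimal sketch of one way to keep the work alive is to return and print the result. SqrtBenchmark and sumOfSqrts are assumed names, and this remains a naive timing loop rather than a robust benchmark.

```java
// Hypothetical variant of the slide's snippet; class and method names are assumptions.
class SqrtBenchmark {
    // Same loop as on the slide, but the result is returned so it cannot be treated as dead.
    static int sumOfSqrts(int n) {
        int result = 0;
        for (int i = 0; i < n; i++) {
            result += Math.sqrt(i); // compound assignment narrows the double sum back to int
        }
        return result;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int result = sumOfSqrts(10 * 1000 * 1000);
        long duration = (System.nanoTime() - start) / 1000000;
        // Printing the result makes the loop's work observable.
        System.out.format("Result: %d, test duration: %d (ms) %n", result, duration);
    }
}
```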
  • 88. OSR – On Stack Replacement. Normally code is switched from interpretation to native in heap context: before entering the method. OSR switches from interpretation to compiled code in local context: in the middle of a method call. The JVM tracks code block execution counts. Fewer optimizations: OSR may prevent bounds check elimination and loop unrolling.
  • 91. Programming & Tuning Tips. How Can I Help? Just write good-quality Java code: object orientation, polymorphism, abstraction, encapsulation, DRY, KISS. Let HotSpot optimize.
  • 92. How Can I Help? The final keyword. For fields: allows caching; allows lock coarsening. For methods: simplifies inlining decisions. Immutable objects die younger.
  • 93. JVM Tuning Tips. Reminder: -XX options are non-standard. They were added for HotSpot development purposes, are mostly tested on Solaris 10, and are platform dependent. Some options may contradict each other. Know and experiment with these options.
  • 95. References: The HotSpot Home Page; Java HotSpot VM Options; Dynamic compilation and performance measurement; Urban performance legends, revisited; Synchronization optimizations in Mustang; Robust Java benchmarking; Garbage Collection Tuning.
  • 96. References. JavaOne 2009 sessions: Garbage Collection Tuning in the Java HotSpot™ Virtual Machine; Under the Hood: Inside a High-Performance JVM™ Machine; Practical Lessons in Memory Analysis; Debugging Your Production JVM™ Machine; Inside Out: A Modern Virtual Machine Revealed.
  • 97. Thank you for your attention. Thanks to Ori Dar!