High Performance
Managed Languages
Martin Thompson - @mjpt777
Really, what is your preferred
platform for building HFT
applications?
Why would you build
low-latency applications on a
GC’ed platform?
Agenda
1. Some Context
2. Runtime Optimisation
3. Garbage Collection
4. Algorithms & Design
Some Context
Let’s be clear
A Managed Runtime is not
always a good choice…
Latency Arbitrage?
Are native languages faster?
Time?
Skills & Resources?
CPU vs Memory
Performance
Time to add 2 integers?
1 CPU Cycle : < 1ns
http://www.agner.org/optimize/instruction_tables.pdf
Time per operation to sum the
values in an array of integers?
Access Pattern Benchmark
Benchmark             Score     Error  Units
==========================================
sequential            0.832 ± 0.006  ns/op
Really???
Less than 1ns per operation?
What if the access pattern
is different?
Access Pattern Benchmark
Benchmark             Score     Error  Units
==========================================
sequential            0.832 ± 0.006  ns/op
randomPage            2.703 ± 0.025  ns/op
dependentRandomPage   7.102 ± 0.326  ns/op
randomHeap           19.896 ± 3.110  ns/op
dependentRandomHeap  89.516 ± 4.573  ns/op
Data Dependent Loads
aka “Pointer Chasing”!!!
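The slides do not include the benchmark source, so the following is only a minimal sketch of the two kinds of access the table contrasts. The class and method names are hypothetical, and no timings are implied. In the first loop each load address is known in advance; in the second, each load produces the index of the next load, which is what "pointer chasing" refers to.

```java
// Sketch only: contrasts independent loads with data-dependent loads.
// All names here are illustrative and not taken from the talk.
final class AccessPatternSketch
{
    // Independent loads: the address of element i + 1 does not depend on
    // the value of element i, so loads can be prefetched and overlapped.
    static long sumSequential(final int[] values)
    {
        long sum = 0;
        for (int i = 0; i < values.length; i++)
        {
            sum += values[i];
        }
        return sum;
    }

    // Dependent loads: each value read is the index of the next read,
    // so a load cannot start until the previous one has completed.
    static long sumDependent(final int[] next, final int steps)
    {
        long sum = 0;
        int index = 0;
        for (int i = 0; i < steps; i++)
        {
            index = next[index];
            sum += index;
        }
        return sum;
    }

    // Builds one random cycle over all indices (Sattolo's algorithm),
    // so a chase starting at 0 visits every slot exactly once per lap.
    static int[] randomCycle(final int size, final long seed)
    {
        final int[] next = new int[size];
        for (int i = 0; i < size; i++)
        {
            next[i] = i;
        }
        final java.util.Random random = new java.util.Random(seed);
        for (int i = size - 1; i > 0; i--)
        {
            final int j = random.nextInt(i); // 0 <= j < i
            final int tmp = next[i];
            next[i] = next[j];
            next[j] = tmp;
        }
        return next;
    }
}
```

To compare the two under a harness such as JMH, the array would need to be much larger than the CPU caches; that setup is left out here.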
Performance 101
1. Memory is transported in Cachelines
2. Memory is managed in OS Pages
3. Memory is pre-fetched on predictable access patterns
Runtime Optimisation
Runtime JIT
1. Profile guided optimisations
2. Bets can be taken and later revoked
Branches
void foo()
{
    // code
    if (condition)
    {
        // code
    }
    // code
}
[Diagram: code layout in the order Block A, Block C, Block B]
Subtle Branches
int result = (i > 7) ? a : b;
CMOV vs Branch Prediction?
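As a rough illustration of the question on this slide, the sketch below puts the ternary next to a hand-written branch-free select. Whether a JIT turns the ternary into a conditional move or a predicted branch depends on the compiler and the profile it has gathered; the class and method names are hypothetical and the mask trick is only one possible way to avoid a branch.

```java
// Sketch only: two ways to express the same selection.
final class SelectSketch
{
    // Form from the slide; the compiler chooses between a branch and a conditional move.
    static int selectTernary(final int i, final int a, final int b)
    {
        return (i > 7) ? a : b;
    }

    // Branch-free by construction: mask is all ones when i > 7, otherwise zero.
    // The subtraction is done in long arithmetic so it cannot overflow.
    static int selectBranchFree(final int i, final int a, final int b)
    {
        final int mask = (int)((7L - i) >> 63);
        return (a & mask) | (b & ~mask);
    }
}
```

Which form is faster depends on how predictable the condition is, so it is worth measuring rather than assuming.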
Method/Function Inlining
void foo()
{
    // code
    bar();
    // code
}
i-cache & code bloat?
“Inlining is THE optimisation.”
- Cliff Click
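One way to read the quote is that inlining matters mostly because of the further optimisations it exposes. The sketch below is a hand-written before and after, assuming a hypothetical helper named scale; it shows the kind of simplification a compiler can make once the callee body and its constant argument are visible at the call site. It is an illustration, not a description of what any particular JIT emits.

```java
// Sketch only: what becomes possible once a callee is inlined.
final class InliningSketch
{
    // Hypothetical small helper that is a candidate for inlining.
    static int scale(final int value, final int factor)
    {
        return value * factor;
    }

    // Before: the constant 8 is hidden behind a call.
    static int beforeInlining(final int value)
    {
        return scale(value, 8);
    }

    // After: with the body inlined, the multiply by a known power of two
    // can be reduced to a shift, and the call overhead disappears.
    static int afterInlining(final int value)
    {
        return value << 3;
    }
}
```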
Loops
void foo(int[] array, int length)
{
    for (int i = 0; i < length; i++)
    {
        bar(Integer.bitCount(array[i]));
    }
}
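void foo(int[] array, int length)
{
    // The same loop unrolled by 4, as a JIT compiler might transform it
    int i = 0;
    for (; i < length - 3; i += 4)
    {
        bar(Integer.bitCount(array[i]));
        bar(Integer.bitCount(array[i + 1]));
        bar(Integer.bitCount(array[i + 2]));
        bar(Integer.bitCount(array[i + 3]));
    }
    // Remainder when length is not a multiple of 4
    for (; i < length; i++)
    {
        bar(Integer.bitCount(array[i]));
    }
}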
Intrinsics
Integer.bitCount() can be compiled to a single instruction (for example POPCNT on x86)
Subtype Polymorphism
void draw(Shape[] shapes)
{
    for (int i = 0; i < shapes.length; i++)
    {
        shapes[i].draw();
    }
}
void bar(Shape shape)
{
    bar(shape.isVisible());
}
Class Hierarchy Analysis & Inline Caching
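To make inline caching a little more concrete, here is a hand-written approximation of what a monomorphic call site amounts to: a cheap type check guarding an inlined method body, with ordinary virtual dispatch as the fallback. The Shape, Circle and Square types and their drawn counters are assumptions made for this sketch; a real JIT works on compiled code and may deoptimise rather than fall back.

```java
// Sketch only: a guarded, inlined call written out by hand.
final class InlineCacheSketch
{
    interface Shape
    {
        void draw();
    }

    static final class Circle implements Shape
    {
        int drawn;

        public void draw()
        {
            drawn++;
        }
    }

    static final class Square implements Shape
    {
        int drawn;

        public void draw()
        {
            drawn++;
        }
    }

    // Assumes profiling showed Circle to be the common receiver type.
    static void drawGuarded(final Shape[] shapes)
    {
        for (int i = 0; i < shapes.length; i++)
        {
            final Shape shape = shapes[i];
            if (shape.getClass() == Circle.class)
            {
                ((Circle)shape).drawn++; // inlined body of Circle.draw()
            }
            else
            {
                shape.draw(); // slow path: normal virtual dispatch
            }
        }
    }
}
```

Class Hierarchy Analysis goes one step further: if only one implementation of Shape were loaded, the guard itself could be dropped until another implementation appears.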
Runtime JIT
1. Profile guided optimisations
2. Bets can be taken and later revoked
Garbage Collection
Generational Garbage Collection
“Only the good die young”
- Billy Joel
[Diagram: Young/New Generation made up of Eden (containing TLABs), Survivor 0, Survivor 1 and Virtual space; Old Generation made up of Tenured and Virtual space]
Modern Hardware (Intel Sandy Bridge EP)
Level              Latency
Registers/Buffers  <1ns
L1                 ~4 cycles, ~1ns
L2                 ~12 cycles, ~3ns
L3                 ~40 cycles, ~15ns
L3 (dirty hit)     ~75 cycles, ~25ns
DRAM               ~65ns
QPI                ~40ns
[Diagram: two sockets, each with cores C1 to Cn, L1 and L2 per core, a shared L3, a memory controller (MC) attached to DRAM, QPI links, and PCI-e 3 with 40X IO]
* Assumption: 3GHz Processor
Broadwell EX – 24 cores & 60MB L3 Cache
Thread Local Allocation Buffers
[Diagram: TLABs within Eden in the Young/New Generation]
• Affords locality of reference
• Avoids false sharing
• Can have NUMA aware allocation
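The idea behind a TLAB can be sketched as bump-pointer allocation: each thread carves objects out of its own buffer with a plain pointer increment, and only touches shared state when the buffer runs out. The simulation below is an assumption-laden toy (addresses are just numbers, TLAB_SIZE is an arbitrary value, and the class names are hypothetical); it is not how any JVM is implemented internally.

```java
// Sketch only: simulated TLAB-style bump-pointer allocation.
final class TlabSketch
{
    static final int TLAB_SIZE = 1024; // assumed buffer size for the sketch

    // Shared top of Eden: only updated when a thread needs a fresh buffer.
    private final java.util.concurrent.atomic.AtomicLong edenTop =
        new java.util.concurrent.atomic.AtomicLong();

    // Intended to be owned by a single thread, so no synchronisation on the fast path.
    final class Tlab
    {
        private long top;
        private long end;

        long allocate(final int size)
        {
            if (size <= 0 || size > TLAB_SIZE)
            {
                throw new IllegalArgumentException("size out of range for this sketch: " + size);
            }
            if (top + size > end)
            {
                // Slow path: claim a new buffer from shared Eden with one atomic add.
                top = edenTop.getAndAdd(TLAB_SIZE);
                end = top + TLAB_SIZE;
            }
            final long address = top;
            top += size; // fast path: bump the pointer
            return address;
        }
    }

    Tlab newTlab()
    {
        return new Tlab();
    }
}
```

Because consecutive allocations by one thread land next to each other and away from other threads' buffers, the sketch also hints at why the bullets above mention locality and false sharing.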
Object Survival
[Diagram: Young/New Generation with Eden (containing TLABs), Survivor 0, Survivor 1 and Virtual space]
• Aging Policies
• Compacting Copy
• NUMA Interleave
• Fast Parallel Scavenging
• Only the survivors require work
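The last bullet can be illustrated with a toy scavenge that traces from the roots: unreachable objects are never visited, so the work done is proportional to what survives. Everything in this sketch is a simplification and an assumption: the Obj type, the TENURING_THRESHOLD value, counting instead of actually copying, and tracing through tenured objects where a real collector would rely on card marking.

```java
// Sketch only: a toy young-generation collection driven from the roots.
final class ScavengeSketch
{
    static final int TENURING_THRESHOLD = 2; // assumed aging policy for the sketch

    static final class Obj
    {
        final java.util.List<Obj> refs = new java.util.ArrayList<Obj>();
        int age;
        boolean tenured;
    }

    final java.util.List<Obj> roots = new java.util.ArrayList<Obj>();

    // Returns how many young objects were "copied" (to a survivor space or promoted).
    // Garbage is never touched because it is not reachable from the roots.
    int minorCollect()
    {
        int copied = 0;
        final java.util.Set<Obj> seen = java.util.Collections.newSetFromMap(
            new java.util.IdentityHashMap<Obj, Boolean>());
        final java.util.ArrayDeque<Obj> pending = new java.util.ArrayDeque<Obj>(roots);

        while (!pending.isEmpty())
        {
            final Obj obj = pending.pop();
            if (!seen.add(obj))
            {
                continue; // already visited in this collection
            }
            if (!obj.tenured)
            {
                copied++; // stands in for the compacting copy
                if (++obj.age >= TENURING_THRESHOLD)
                {
                    obj.tenured = true; // promotion to the old generation
                }
            }
            pending.addAll(obj.refs);
        }
        return copied;
    }
}
```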
Object Promotion
[Diagram: Young/New Generation (Eden with TLABs, Survivor 0, Survivor 1, Virtual) and Old Generation (Tenured, Virtual)]
• Concurrent Collection
• String Deduplication
Compacting Collections
Compacting Collections – Depth first copy
OS Pages and cache lines?
G1 – Concurrent Compaction?
[Diagram: G1 heap divided into regions labelled Eden (E), Survivor (S), Old (O), Humongous (H) and Unused]
Azul Zing C4
True Concurrent Compacting Collector
Where next for GC?
Object Inlining/Aggregation
GC vs Manual Memory Management
Not easy to pick a clear winner…
Managed GC
• GC Implementation
• Card Marking
• Read/Write Barriers
• Object Headers
• Background Overhead on CPU and Memory
Native
• Malloc Implementation
• Arena/pool contention
• Bin Wastage
• Fragmentation
• Debugging Effort
• Inter-thread costs
Algorithms & Design
What is most important to
performance?
Time
“If I had more time, I would
have written a shorter letter.”
- Blaise Pascal
• Avoiding duplicate work
• Avoiding cache misses
• Avoiding contention
• Strength reduction
• Amortising expensive operations
• Mechanical Sympathy
• Choice of algorithms & data structures
• API design
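As one small example of strength reduction from the list above, mapping an ever-increasing sequence onto a buffer slot can use a bitwise AND instead of a remainder when the capacity is a power of two. The class and method names are hypothetical, and the equivalence assumes a non-negative sequence and a power-of-two capacity.

```java
// Sketch only: replacing a remainder with a cheaper mask.
final class StrengthReductionSketch
{
    // General form: works for any positive capacity.
    static int indexByModulo(final long sequence, final int capacity)
    {
        return (int)(sequence % capacity);
    }

    // Reduced form: valid only when capacity is a power of two and sequence >= 0.
    static int indexByMask(final long sequence, final int capacity)
    {
        return (int)(sequence & (capacity - 1));
    }
}
```

The design cost is the constraint on capacity, which is usually acceptable for ring buffers and hash tables.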
In a large codebase it is really
difficult to do everything well
It also takes some “uncommon”
disciplines such as:
profiling, telemetry, modelling…
The story of Aeron
https://github.com/real-logic/Aeron
Aeron is an interesting lesson in
“time to performance”
Lots of others exist, such as the
C# Roslyn compiler
Time spent on
Mechanical Sympathy
vs
Debugging Pointers
???
GC
Immutable Data & Concurrency
In Closing …
What does the future hold?
Remember
Assembly vs Compiled
Languages?
What about
footprint, startup, warm up, etc.
???
Blog: http://mechanical-sympathy.blogspot.com/
Twitter: @mjpt777
“Any intelligent fool can make things bigger, more
complex, and more violent.
It takes a touch of genius, and a lot of courage, to move
in the opposite direction.”
- Albert Einstein
Questions?
