Java Caching, Turbo Charged
JavaDevRoom, FOSDEM 2015
Jens Wilke, headissue GmbH
twitter.com/cruftex
github.com/cruftex
http://cache2k.org
cache2k Overview
● Started in 2000 as an in-house product, evolving ever since
● Focus on in-memory (in-heap) caching; persistence and off-heap storage are on the way
● Research on optimized performance / modern eviction policies
● Open sourced 2013
● Contains features not found in (all) cache products, e.g.:
– On time expiry
– Extensive statistics
– Support for exceptions and nulls
– Blocking fetch for multiple requests on the same key
(read through configuration)
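The "blocking fetch" item can be illustrated with a minimal sketch. It is not the cache2k API: the class name `BlockingReadThroughCache` and the `source` function are hypothetical, and the sketch has no eviction, expiry, or support for nulls and exceptions. It only relies on `ConcurrentHashMap.computeIfAbsent`, which runs the mapping function at most once for an absent key while other threads asking for the same key wait.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical read-through cache; illustrates blocking fetch only. */
class BlockingReadThroughCache<K, V> {
  private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
  private final Function<K, V> source; // the "cache source" that loads a value

  BlockingReadThroughCache(Function<K, V> source) {
    this.source = source;
  }

  /**
   * Returns the cached value, loading it through the source on a miss.
   * Concurrent requests for the same absent key trigger a single load;
   * the other callers block until that load has finished.
   */
  V get(K key) {
    return map.computeIfAbsent(key, source);
  }
}
```

With a source such as `k -> k * 2`, two calls to `get(21)` invoke the source once and both return 42.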
Eviction Algorithms
flickr:alexander
LRU
[Figure: entry list 1 2 3 4 5 6 7 becomes 1 2 3 5 6 7 4 after entry 4 is accessed. Labels: "LRU Entry"; "cache access => move to front"]
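A minimal LRU sketch, assuming the JDK's `LinkedHashMap` in access order as the underlying list (the class name `LruCacheSketch` is made up for this example). Every `get` moves the entry to the most recently used end, and the eldest entry is dropped once the capacity is exceeded. `LinkedHashMap` is not thread-safe, so concurrent use would need exclusive access around each call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache sketch; single-threaded use only. */
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
  private final int capacity;

  LruCacheSketch(int capacity) {
    // accessOrder = true: each get() moves the entry to the end of the list
    super(16, 0.75f, true);
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Invoked after each put(); evicts the least recently used entry
    return size() > capacity;
  }
}
```

With capacity 2, inserting keys 1 and 2, reading key 1, and then inserting key 3 evicts key 2, because key 1 was used more recently.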
CLOCK
[Figure: cache entries arranged in a circle, each marked "1=hit" or "0=no hit", with a hand pointing at one entry]
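A simplified, single-threaded CLOCK sketch. It assumes a fixed array instead of a cyclic linked list, and all names (`ClockCacheSketch`, `source`, `contains`) are hypothetical. On a hit only the hit bit is set; on a miss the hand advances, clearing hit bits, until it reaches a free slot or an entry whose bit is 0, which is then replaced.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Simplified CLOCK sketch; capacity must be at least 1. */
class ClockCacheSketch<K, V> {
  private static final class Entry<K, V> {
    final K key;
    final V value;
    boolean hit; // set on access, cleared when the hand passes
    Entry(K key, V value) { this.key = key; this.value = value; }
  }

  private final Entry<K, V>[] slots;
  private final Map<K, Entry<K, V>> index = new HashMap<>();
  private final Function<K, V> source;
  private int hand = 0;

  @SuppressWarnings("unchecked")
  ClockCacheSketch(int capacity, Function<K, V> source) {
    this.slots = (Entry<K, V>[]) new Entry[capacity];
    this.source = source;
  }

  V get(K key) {
    Entry<K, V> e = index.get(key);
    if (e != null) {
      e.hit = true; // the only work done on a hit
      return e.value;
    }
    V value = source.apply(key);
    insert(new Entry<>(key, value));
    return value;
  }

  private void insert(Entry<K, V> fresh) {
    // Skip entries that were hit since the last pass, clearing their bit
    while (slots[hand] != null && slots[hand].hit) {
      slots[hand].hit = false;
      hand = (hand + 1) % slots.length;
    }
    if (slots[hand] != null) {
      index.remove(slots[hand].key); // evict the entry without a hit
    }
    slots[hand] = fresh;
    index.put(fresh.key, fresh);
    hand = (hand + 1) % slots.length;
  }

  boolean contains(K key) { return index.containsKey(key); }
}
```

With capacity 2 and the request sequence 1, 2, 1, 3, key 1 survives because its hit bit was set, and key 2 is evicted.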
Improving on LRU...
protect the working set
● For completeness: Least frequently used
– LFU
– LRFU
– …
● Split set of entries into cold and hot, to protect the working set
– 2Q
– LIRS
– ARC – Adaptive Replacement Cache
● Nimrod Megiddo and Dharmendra S. Modha (Usenix 2003) – patented by
IBM
– Clock-Pro
● Song Jiang, Feng Chen and Xiaodong Zhang (Usenix 2005)
cold set hot set
Improving on LRU...
history of seen entries
● Keep an LRU list of the evicted keys
● If seen again, insert directly into hot set
cold set hot set
ghost set (only keys)
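The cold/hot/ghost idea can be sketched as follows. This is a deliberately simplified illustration, not 2Q, ARC, or Clock-Pro: the class name, the fixed set sizes, and the choice to drop hot evictions without a ghost record are all assumptions made for the example, and values are assumed to be non-null.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.function.Function;

/** Hypothetical cold/hot cache with a ghost history of evicted keys. */
class GhostHistoryCacheSketch<K, V> {
  private final int coldCapacity;
  private final int hotCapacity;
  private final int ghostCapacity;
  // accessOrder = true keeps both sets in LRU order
  private final LinkedHashMap<K, V> cold = new LinkedHashMap<>(16, 0.75f, true);
  private final LinkedHashMap<K, V> hot = new LinkedHashMap<>(16, 0.75f, true);
  // Keys only, oldest eviction first
  private final LinkedHashSet<K> ghost = new LinkedHashSet<>();
  private final Function<K, V> source;

  GhostHistoryCacheSketch(int coldCapacity, int hotCapacity, int ghostCapacity,
                          Function<K, V> source) {
    this.coldCapacity = coldCapacity;
    this.hotCapacity = hotCapacity;
    this.ghostCapacity = ghostCapacity;
    this.source = source;
  }

  V get(K key) {
    V value = hot.get(key);
    if (value == null) {
      value = cold.get(key);
    }
    if (value != null) {
      return value; // hit in the hot or cold set
    }
    value = source.apply(key);
    if (ghost.remove(key)) {
      // Evicted earlier and seen again: insert directly into the hot set
      hot.put(key, value);
      if (hot.size() > hotCapacity) {
        removeEldest(hot.keySet().iterator());
      }
    } else {
      cold.put(key, value);
      if (cold.size() > coldCapacity) {
        // Remember the key of the evicted cold entry in the ghost set
        ghost.add(removeEldest(cold.keySet().iterator()));
        if (ghost.size() > ghostCapacity) {
          removeEldest(ghost.iterator());
        }
      }
    }
    return value;
  }

  private static <T> T removeEldest(Iterator<T> it) {
    T eldest = it.next();
    it.remove();
    return eldest;
  }

  boolean inHot(K key) { return hot.contains​Key(key); }
  boolean inCold(K key) { return cold.containsKey(key); }
  boolean inGhost(K key) { return ghost.contains(key); }
}
```

With all capacities small (cold 1, hot 1, ghost 2), the sequence 1, 2, 1 first evicts key 1 into the ghost set and then, on its return, places it in the hot set while key 2 stays cold.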
Clock-Pro+
[Figure: Clock-Pro+ with a "Hot" clock and a "Cold" clock, each with its own hand; entries are labeled with hit counts from 0 to 5]
Clock-Pro+ Evaluation
– Only an inexpensive operation on access,
no exclusive access needed
– Better efficiency than LRU for most analyzed workloads
– Downside
● Eviction overhead increases as the achievable hitrate gets high
(e.g. 3 entries scanned per eviction at 50% hitrate, 10 entries
scanned at 95%)
● High complexity, no straightforward by-the-book implementation,
lots of tuning needed (and possible)
– Still missing:
● Optimal selection of cold / hot space sizes
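To make the first point concrete, here is a sketch of a hit path that needs no exclusive access: a concurrent lookup followed by a counter increment, with no list manipulation. It is an assumption-laden illustration, not the cache2k implementation; the class `HitCountingIndexSketch` and the use of `AtomicInteger` for the hit counter are choices made for this example, and eviction is left out entirely.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical index whose hit path only increments a per-entry counter. */
class HitCountingIndexSketch<K, V> {
  private static final class Entry<V> {
    final V value;
    final AtomicInteger hits = new AtomicInteger();
    Entry(V value) { this.value = value; }
  }

  private final ConcurrentHashMap<K, Entry<V>> index = new ConcurrentHashMap<>();

  void put(K key, V value) {
    index.put(key, new Entry<>(value));
  }

  /** Hit path: lookup plus counter increment, no lock and no reordering. */
  V get(K key) {
    Entry<V> e = index.get(key);
    if (e == null) {
      return null; // miss; loading and eviction are outside this sketch
    }
    e.hits.incrementAndGet();
    return e.value;
  }

  /** Hit count an eviction scan could use to pick the entry with the fewest hits. */
  int hits(K key) {
    Entry<V> e = index.get(key);
    return e == null ? 0 : e.hits.get();
  }
}
```

Compared with LRU, where every access relinks a list node, the access cost here stays constant regardless of how eviction is organized; the price is paid later, when the eviction scan has to inspect several entries.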
Benchmarks
flickr:bantam10
Benchmark Setup
● Cache implementations:
– cache2k Version 0.21 (to be released next week)
– EHCache Version 2.9.0
– Guava 18
– Infinispan 7.1.0.CR2
● Oracle JRE 1.8-25
● Hardware
– Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
Test workload
– Keys and values are integers
– Read through configuration, the cache source
just returns the key
– Not a practical workload: it emphasizes the caching overhead
// Run the benchmark: replay the trace against the cache
Integer[] trace = ….
for (Integer v : trace) {
  cache.get(v);
}

// Implementation of the cache source: count the miss and return the key as the value
public Integer get(Integer o) {
  incrementMissCount();
  return o;
}
Runtime for artificial traces
3 million requests on cache with 500 capacity
Except Hits2000: cache with 2000 capacity
Hits: repeat different 500 values
Random: random select from 1000 values
Eff90 / Eff95: random trace with approx.
90% and 95% hitrate on LRU
[Chart: Runtime of 3 million cache requests. Axis: runtime in seconds, 0 to 6. Series: cache2k/CLOCK, cache2k/CP+, cache2k/ARC, EHCache, Infinispan, Guava]
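The Hits and Random traces described on this slide could be generated and checked roughly as follows. This is a sketch under assumptions: the actual trace generators live in the benchmark sources and may differ, and the names `RandomTraceSketch`, `randomTrace`, `repeatingTrace`, and `lruHitRate` are invented here. The hit rate is measured against a plain `LinkedHashMap`-based LRU, not against any of the benchmarked caches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical trace generators plus an LRU hit rate measurement. */
class RandomTraceSketch {
  /** "Random" style trace: each request picks uniformly from valueCount keys. */
  static int[] randomTrace(int length, int valueCount, long seed) {
    Random random = new Random(seed);
    int[] trace = new int[length];
    for (int i = 0; i < length; i++) {
      trace[i] = random.nextInt(valueCount);
    }
    return trace;
  }

  /** "Hits" style trace: the same valueCount keys repeated in order. */
  static int[] repeatingTrace(int length, int valueCount) {
    int[] trace = new int[length];
    for (int i = 0; i < length; i++) {
      trace[i] = i % valueCount;
    }
    return trace;
  }

  /** Replays the trace against an LRU cache and returns hits / requests. */
  static double lruHitRate(int[] trace, int capacity) {
    Map<Integer, Integer> lru = new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
        return size() > capacity;
      }
    };
    long hits = 0;
    for (int key : trace) {
      if (lru.get(key) != null) {
        hits++;
      } else {
        lru.put(key, key); // read through: the value is just the key
      }
    }
    return (double) hits / trace.length;
  }

  public static void main(String[] args) {
    int[] random = randomTrace(3_000_000, 1000, 42L);
    System.out.println("Random, capacity 500: " + lruHitRate(random, 500));
    int[] hits = repeatingTrace(3_000_000, 500);
    System.out.println("Hits, capacity 500: " + lruHitRate(hits, 500));
  }
}
```

For a uniform random trace over 1000 keys and a capacity of 500, LRU should land near a 50% hit rate, while the repeating trace misses only on the first pass over its 500 keys.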
Runtime for mostly hits
[Chart: Runtime of 3 million cache hits. Axis: runtime in seconds, 0 to 2. Series: HashMap+Counter, cache2k/CLOCK, cache2k/CP+, cache2k/ARC, EHCache, Infinispan, Guava]
The first four times for Hits:
20ms, 50ms, 50ms, 70ms
Runtime with two threads
[Chart: 3 million cache requests Eff95 per thread count. Axis: runtime in seconds, 0 to 2.5. Series: cache2k/CLOCK, cache2k/CP+, cache2k/ARC, EHCache, Infinispan, Guava]
Some CPU-consuming computation is done on each cache miss
Eff95Threads2: the same trace is executed in a
second thread with an index offset
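A sketch of how a trace might be replayed by several threads with an index offset. The offset scheme (an even split of the trace length) and the names `OffsetReplaySketch` and `cacheGet` are assumptions; the slide does not say which offset was used. The cache under test is passed in as a function so the sketch stays independent of any cache library.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical multi-threaded trace replay with per-thread start offsets. */
class OffsetReplaySketch {
  /**
   * Replays the whole trace once per thread. Thread t starts at an offset
   * into the trace and wraps around, so the threads request different keys
   * at any given moment while still sharing the same working set.
   */
  static void replay(int[] trace, int threadCount, IntUnaryOperator cacheGet) {
    Thread[] workers = new Thread[threadCount];
    for (int t = 0; t < threadCount; t++) {
      final int offset = t * (trace.length / threadCount); // assumed offset scheme
      workers[t] = new Thread(() -> {
        for (int i = 0; i < trace.length; i++) {
          cacheGet.applyAsInt(trace[(i + offset) % trace.length]);
        }
      });
      workers[t].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("replay interrupted", e);
      }
    }
  }
}
```

Timing the call to `replay` with `System.nanoTime()` before and after would give a runtime comparable in spirit to the chart, subject to the usual warm-up and timer resolution caveats.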
Hitrate comparison -
Artificial traces
[Chart: Hitrate of 3 million cache requests. Axis: 0 to 100 (labeled "runtime in seconds" on the slide). Series: cache2k/CLOCK, cache2k/CP+, cache2k/ARC, EHCache, Infinispan, Guava]
Hitrate comparison -
Multi2 trace
[Chart: Hitrates for Multi2 trace. Axis: 0 to 80. Series: OPT, LRU, CLOCK, CP+, ARC, EHCache, Infinispan, Guava, RAND]
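Assuming OPT in the legend refers to the offline optimal policy (Belady's algorithm), which evicts the entry whose next use lies farthest in the future, its hit rate for a recorded trace can be computed with a sketch like the one below. The class and method names are hypothetical. Because OPT needs the complete trace in advance, it serves as an upper bound for comparison rather than as a usable eviction policy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical offline simulation of Belady's optimal replacement. */
class OptHitRateSketch {
  static double optHitRate(int[] trace, int capacity) {
    int n = trace.length;
    if (n == 0) {
      return 0.0;
    }
    // nextUse[i] = position of the next request for trace[i];
    // keys never requested again get a unique position beyond the trace
    int[] nextUse = new int[n];
    Map<Integer, Integer> nextSeen = new HashMap<>();
    for (int i = n - 1; i >= 0; i--) {
      Integer next = nextSeen.get(trace[i]);
      nextUse[i] = (next == null) ? n + i : next;
      nextSeen.put(trace[i], i);
    }

    TreeMap<Integer, Integer> byNextUse = new TreeMap<>(); // next use position -> key
    Map<Integer, Integer> cached = new HashMap<>();        // key -> next use position
    long hits = 0;
    for (int i = 0; i < n; i++) {
      int key = trace[i];
      Integer scheduled = cached.get(key);
      if (scheduled != null) {
        hits++;
        byNextUse.remove(scheduled); // its scheduled use is this request
      } else if (cached.size() >= capacity) {
        // Evict the cached key whose next use is farthest away
        Map.Entry<Integer, Integer> victim = byNextUse.pollLastEntry();
        cached.remove(victim.getValue());
      }
      cached.put(key, nextUse[i]);
      byNextUse.put(nextUse[i], key);
    }
    return (double) hits / n;
  }
}
```

For the trace 1, 2, 3, 1, 2, 3 with capacity 2, this yields 2 hits out of 6 requests, whereas LRU gets no hits at all on the same trace.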
Hitrates comparison -
Web12 trace
[Chart: Hitrates for Web12 trace. Axis: 0 to 90. Series: OPT, LRU, CLOCK, CP+, ARC, EHCache, Infinispan, Guava, RAND]
Hitrate comparison -
Sprite trace
[Chart: Hitrates for Sprite trace. Axis: 0 to 100. Series: OPT, LRU, CLOCK, CP+, ARC, EHCache, Infinispan, Guava, RAND]
Take away
● The goal:
– Eviction algorithm doing better than LRU
– Self tuning / adapting
– Minimal overhead on cache access
Clock-Pro+ comes quite close
Get involved...
● Try it: cache2k is on Maven Central
● Source on GitHub:
● http://github.com/headissue/cache2k
● http://github.com/headissue/cache2k-benchmarks
● Ask questions on Stack Overflow!
Thanks & Enjoy Life!
http://cruftex.net http://cache2k.org


Editor's Notes

  • #2: Hello! Jens
  • #4: When the cache becomes full, which entry should be removed? This is also called the replacement policy. It is the heart of the cache.
  • #5: Implementation: each access moves the entry to the front of a doubly linked list. Eviction: the LRU entry is at the tail of the list. Evaluation: simple; list manipulation needs exclusive access; does not work well for some workloads, in particular it is not scan resistant. Around since 1965.
  • #6: Implementation: cache entries are kept in a cyclic linked list; the hand points into the list and is moved forward for eviction. Insert: the entry is inserted before the hand. Eviction: move the hand; reset the hit bit, or evict the entry if its hit bit is 0. Evaluation: inexpensive operation on access (just set the hit bit); usually not as effective as LRU; not scan resistant.
  • #9: Implementation: three clocks for cold, hot and ghost entries; the hit counter is incremented on access. Eviction: "shuffle" entries between cold and hot and select the entry with the lowest hits for eviction. Insert: check for a ghost, then insert into the cold or hot set.
  • #14: 3 million accesses on a 500-entry cache. First values are: 30ms, 50ms, 50ms, 70ms. 32x more effective than Infinispan. The ARC implementation uses synchronized for the LRU operation. Single-threaded benchmark. Java 8 does a good job of optimizing synchronized. GC time for boxed integer keys is significant. A benchmark is (always) questionable: timer resolution!
  • #16: About 95% hitrate. The second thread executes the same trace with an offset. More realistic: the cache source does more work (generates 1000 random numbers per request). cache2k uses no segmentation; the hitrate and the cost of generating the cached value influence how much concurrency is possible.