IMC Summit 2016 Breakout - Per Minborg - Work with Multiple Hot Terabytes in JVMs

WORK WITH MULTIPLE HOT TERABYTES IN JVMS
PER MINBORG
@PMINBORG
CTO, SPEEDMENT, INC.
See all the presentations from the In-Memory Computing
Summit at http://imcsummit.org

SCENARIO
>1TB
Application
Source of Truth
In-JVM-Cache
In-Memory Solution
Web Shop
Stock Trade
Bank
Machine learning
Etc.

PROS OF IN-MEMORY
 Improved performance
 Consistent performance
 Cost reduction (server, AWS and licenses)

CHALLENGES OF IN-MEMORY
 Optimized Speed
 Cost and size of Memory
 Consistency, Restart, DB impact, etc.
 Organization and size of JVMs

OPTIMIZED SPEED
 No matter how advanced a database you use, it is data locality that really counts
 Eventually, memory will cost less than x $/GB (pick any x)

LATENCIES USING THE SPEED OF LIGHT
 Database query (1 s)
 Disk seek - LA
 TCP (DC) - SJ
 SSD - Oakland
 Main memory
 CPU L3 cache
 CPU L2 cache
 CPU L1 cache

HOW MUCH DOES 1 GB COST?

BACK TO THE FUTURE
$ 5
$ 0.04
$ 720,000
$ 67,000,000,000
Source: http://www.jcmit.com/memoryprice.htm

CACHE SYNCHRONIZATION STRATEGIES
DUMP AND LOAD
• Dumps are reloaded periodically
• All data elements are reloaded
• Data remains unchanged between reloads
• System restart is just a reload
POLL
• Data is evicted, refreshed or marked as old
• Evicted elements are reloaded
• Data changes all the time
• System restart either warms up the cache or uses a cold cache

CACHE SYNCHRONIZATION STRATEGIES
REACTIVE PERSISTENT CACHING
• Changed data is captured in the database
• Changed data events are pushed into the cache
• Events are grouped in transactions
• Cache updates are persisted
• Data changes all the time
• On system restart, the missed events are replayed
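The event flow on this slide can be sketched roughly as follows. This is only an illustration under assumptions, not Speedment's API: `ReactiveCacheSketch`, `ChangeEvent`, `Transaction` and the sequence-number scheme are hypothetical names chosen for the example, and the cache is a plain in-heap map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not Speedment's API.
final class ReactiveCacheSketch {

    enum Op { INSERT, UPDATE, DELETE }

    // One captured row change.
    static final class ChangeEvent {
        final Op op;
        final String key;
        final String value;
        ChangeEvent(Op op, String key, String value) {
            this.op = op; this.key = key; this.value = value;
        }
    }

    // Events grouped by the database transaction that produced them.
    static final class Transaction {
        final long sequence;
        final List<ChangeEvent> events;
        Transaction(long sequence, ChangeEvent... events) {
            this.sequence = sequence; this.events = Arrays.asList(events);
        }
    }

    private final Map<String, String> cache = new HashMap<>();
    private long lastApplied = 0; // a real system would persist this with the cache

    // Apply a whole transaction under one lock so readers never see half of it.
    synchronized void apply(Transaction tx) {
        if (tx.sequence <= lastApplied) return; // already applied, skip on replay
        for (ChangeEvent e : tx.events) {
            if (e.op == Op.DELETE) cache.remove(e.key);
            else cache.put(e.key, e.value);
        }
        lastApplied = tx.sequence;
    }

    // After a restart, only the transactions that were missed have any effect.
    void replay(List<Transaction> log) {
        for (Transaction tx : log) apply(tx);
    }

    synchronized String get(String key) { return cache.get(key); }
    synchronized long lastApplied() { return lastApplied; }

    public static void main(String[] args) {
        ReactiveCacheSketch c = new ReactiveCacheSketch();
        List<Transaction> log = new ArrayList<>();
        log.add(new Transaction(1, new ChangeEvent(Op.INSERT, "a", "1")));
        log.add(new Transaction(2, new ChangeEvent(Op.UPDATE, "a", "2"),
                                   new ChangeEvent(Op.INSERT, "b", "3")));
        c.apply(log.get(0));
        c.replay(log); // transaction 1 is skipped, transaction 2 is applied
        System.out.println(c.get("a") + " " + c.get("b")); // prints "2 3"
    }
}
```

Keeping a per-transaction sequence number is one simple way to make replay idempotent; other schemes (log offsets, timestamps) would work as well.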

COMPARISON
Criterion | Dump and Load Caching | Poll Caching | Reactive Persistent Caching
Max Data Age | Dump period | Eviction time | Replication latency
Lookup Performance | Consistently instant | ~20% slow | Consistently instant
Consistency | Eventually consistent | Inconsistent (stale) | Eventually consistent
Database Cache Update Load | Total size | Depends on eviction time and access | Rate of change
Restart | Complete reload | Eviction time | Down time update -> 10% of down time *

BIG JVMS WITH TERABYTES OF DATA
 Scale Up
 One large JVM handles all data
 Map memory to (SSD backed) files
 Several JVMs can share data via the file system
 Instant restart
 Scale Out
 Have several JVMs in a network
 Use sharding between nodes
 Redundant nodes
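As a minimal sketch of the "map memory to files" idea, the standard `java.nio` API can map a file into memory and see the same bytes again after a remap, which is what makes a restart cheap. The class and method names (`MappedStoreSketch`, `writeThenRemap`) and the temp file are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; names are for illustration only.
final class MappedStoreSketch {

    // Map the first `size` bytes of the file read-write.
    // A single mapping is limited to Integer.MAX_VALUE bytes.
    static MappedByteBuffer map(Path file, int size) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write a value through one mapping, then read it back through a fresh one,
    // standing in for "write, restart, read without reloading from the database".
    static long writeThenRemap(long value) {
        Path file;
        try {
            file = Files.createTempFile("hot-data", ".bin");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        file.toFile().deleteOnExit();
        MappedByteBuffer first = map(file, 1024);
        first.putLong(0, value);
        first.force(); // flush dirty pages to the backing device
        MappedByteBuffer second = map(file, 1024);
        return second.getLong(0);
    }

    public static void main(String[] args) {
        System.out.println(writeThenRemap(42L)); // prints 42
    }
}
```

Because one `MappedByteBuffer` cannot exceed 2 GB, a terabyte-scale store built this way would need many mappings or a library that manages them.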

CONVENTIONAL JAVA APPLICATIONS
 Java objects live on the heap and are garbage collected periodically
 Garbage collection time increases with the Java heap size
 Garbage collection time increases with the Java heap mutation rate
 “The app has hit the GC wall”
 Hard to meet reasonable SLAs with JVMs larger than about 16 GB
 10 TB data and 10 GB JVMs -> ~1000 JVMs

OFF HEAP STORAGE
 Stores data outside of the Java heap
 The Garbage Collector does not see the content
 Scales up to terabytes of main memory in a single JVM
 Use any number of nodes for scale out solutions
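A minimal sketch of off-heap storage using only the standard library: a direct `ByteBuffer` holds the data outside the Java heap, so the garbage collector tracks just the small buffer object. `OffHeapLongArray` is a hypothetical name, and real off-heap stores typically use more elaborate layouts than this.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-size array of longs kept outside the Java heap.
final class OffHeapLongArray {
    private final ByteBuffer buffer;

    OffHeapLongArray(int length) {
        // allocateDirect reserves native memory; its contents are zero-initialized.
        buffer = ByteBuffer.allocateDirect(Math.multiplyExact(length, Long.BYTES));
    }

    void set(int index, long value) { buffer.putLong(index * Long.BYTES, value); }

    long get(int index) { return buffer.getLong(index * Long.BYTES); }

    int length() { return buffer.capacity() / Long.BYTES; }

    public static void main(String[] args) {
        OffHeapLongArray a = new OffHeapLongArray(1_000_000);
        a.set(999_999, 123L);
        System.out.println(a.get(999_999) + " of " + a.length()); // prints "123 of 1000000"
    }
}
```

A single direct buffer is capped at 2 GB and the total is bounded by `-XX:MaxDirectMemorySize`, so larger stores combine many buffers or memory-mapped files.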

PERSISTENT SCALE OUT CACHE
 Persists data in files or memory mapped files
 SSD backing device recommended
 1.3 GB/s reload per node
 10 GB in 6s
 100 GB in 1 min
 1 TB in 10 min
 6.5 GB/s reload in a system with 10 nodes (each active node paired with 1 backup, so 5 nodes reload in parallel)
 10 GB in 1 s
 100 GB in 12 s
 1 TB in 2 min
 65 GB/s reload in a system with 100 nodes, 1 TB in 12 s
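The reload figures above follow from dividing data size by aggregate throughput. The sketch below assumes, as the 6.5 GB/s and 65 GB/s numbers suggest, that half of the nodes are backups and only the active half reloads in parallel; `ReloadTime` and its methods are hypothetical names.

```java
// Hypothetical calculator for the reload figures on the slide.
final class ReloadTime {
    static final double PER_NODE_GB_PER_S = 1.3; // per-node figure from the slide

    // Assumption: every active node has one backup, so half the nodes reload.
    static double aggregateGbPerSecond(int totalNodes) {
        return (totalNodes / 2) * PER_NODE_GB_PER_S;
    }

    static double seconds(double dataGb, double gbPerSecond) {
        return dataGb / gbPerSecond;
    }

    public static void main(String[] args) {
        double[] sizesGb = {10, 100, 1000};
        for (double gb : sizesGb) {
            System.out.printf("%.0f GB: single node %.0f s, 10 nodes %.0f s, 100 nodes %.1f s%n",
                    gb,
                    seconds(gb, PER_NODE_GB_PER_S),
                    seconds(gb, aggregateGbPerSecond(10)),
                    seconds(gb, aggregateGbPerSecond(100)));
        }
    }
}
```

Straight division gives somewhat longer times than the slide lists (for example roughly 13 minutes rather than 10 for 1 TB on one node), so the slide's figures appear to be rounded.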

COMPRESSED OOPS IN JAVA 8
 On HotSpot, compressed oops are on by default in 64-bit Java 8 for heaps below 32 GB; the larger object alignment is set explicitly:
-XX:+UseCompressedOops
-XX:ObjectAlignmentInBytes=16
 A 64-bit JVM can then use “compressed” 32-bit object references.
 With 16-byte alignment the heap can be up to 64 GB without the overhead of 64-bit object references (32 GB with the default 8-byte alignment).
 As all objects must be 8- or 16-byte aligned, the lower 3 or 4 bits of the address are always zero and don’t need to be stored. A 32-bit reference can therefore address 4 billion * 16 bytes = 64 GB.
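The heap limit is plain arithmetic: a 32-bit reference counts alignment units rather than bytes. A small sketch (the class name `CompressedOopsLimit` is made up for this example):

```java
// Hypothetical helper showing where the 32 GB and 64 GB limits come from.
final class CompressedOopsLimit {

    // 2^32 distinct references, each pointing at an aligned address.
    static long maxHeapBytes(int objectAlignmentInBytes) {
        return (1L << 32) * objectAlignmentInBytes;
    }

    public static void main(String[] args) {
        System.out.println((maxHeapBytes(8) >> 30) + " GB");  // prints "32 GB"
        System.out.println((maxHeapBytes(16) >> 30) + " GB"); // prints "64 GB"
    }
}
```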

JVM SIZE SWEET SPOT
 50 GB off heap per node
 20 nodes per terabyte
 40 nodes per terabyte with minimum redundancy
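The node counts follow from dividing the data size by the per-node capacity and multiplying by the number of copies. A sketch, taking 1 TB as 1000 GB and using the hypothetical name `NodeCount`:

```java
// Hypothetical sizing helper for scale-out clusters.
final class NodeCount {

    // copies = 1 for no redundancy, 2 for one backup per node.
    static long nodes(long dataGb, long perNodeGb, int copies) {
        long primaries = (dataGb + perNodeGb - 1) / perNodeGb; // round up
        return primaries * copies;
    }

    public static void main(String[] args) {
        System.out.println(nodes(1000, 50, 1));  // prints 20
        System.out.println(nodes(1000, 50, 2));  // prints 40
        System.out.println(nodes(10000, 10, 1)); // prints 1000
    }
}
```

The last line reproduces the earlier estimate of about 1000 on-heap JVMs for 10 TB at 10 GB each.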

CONCLUSIONS
 Get speed by keeping your data close to the application
 RAM is cheap, and it keeps getting bigger and cheaper
 Consistent solution with Reactive Persistent Caching
 Reactive Persistent Caching imposes minimum load on restart and on the DB
 Scale up solutions can be in the terabytes with virtual memory or file mapped memory
 Scale out solutions can use nodes of roughly 50 GB

SOLUTION
>1TB
Application
In-JVM-Cache
Web Shop
Stock Trade
Bank
Machine learning
Etc.
Source of Truth

SPEEDMENT
 Java Application Development Tool
 In-JVM-memory cache
 Database SQL Reflector (CDC, Change Data Capture)
 Pluggable storage engines (Speedment, Chronicle Map, Hazelcast, GridGain, etc.)
 Code generation tool -> Automatic domain model extraction from databases
 Transaction-aware

SPEEDMENT SCALE UP ULTRA-LOW LATENCY CACHE
 Ultra-low latency (Runs in the same JVM as the application)
 Millions of TPS
 Latencies measured in microseconds
 Supports file mapping
 Terabytes of data
 O(1) for equality operations
 O(log(N)) for other operations
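The slide does not say how these complexities are achieved. As an illustration only, standard Java collections show the same trade-off: a hash index gives expected O(1) equality lookups, while a sorted index finds the start of a range in O(log N). `IndexSketch` and its methods are hypothetical and unrelated to Speedment's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical pair of indexes over the same data.
final class IndexSketch {
    private final Map<Integer, String> byId = new HashMap<>();       // equality lookups
    private final TreeMap<Integer, String> sorted = new TreeMap<>(); // range lookups

    void put(int id, String value) {
        byId.put(id, value);
        sorted.put(id, value);
    }

    // Expected O(1): one hash lookup.
    String equalTo(int id) { return byId.get(id); }

    // O(log N) to locate the range start in the red-black tree.
    SortedMap<Integer, String> between(int fromInclusive, int toExclusive) {
        return sorted.subMap(fromInclusive, toExclusive);
    }

    public static void main(String[] args) {
        IndexSketch idx = new IndexSketch();
        for (int i = 1; i <= 5; i++) idx.put(i, "row" + i);
        System.out.println(idx.equalTo(3));    // prints "row3"
        System.out.println(idx.between(2, 4)); // prints "{2=row2, 3=row3}"
    }
}
```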

SPEEDMENT SQL REFLECTOR
 Detects changes in a database
 Buffers the changes
 Can replay the changes later on
 Will preserve order
 Will preserve transactions
 Sees data as it was persisted
 Detects changes from any source
Database: INSERT, UPDATE, DELETE

DOWNLOAD TRIAL @ WWW.SPEEDMENT.COM

CONNECT TO YOUR EXISTING SQL DB

OFFERINGS
 Complete solutions for in-memory hot big data
 Software licenses
 Service and support
 Consulting

sales@speedment.com
@Speedment
www.speedment.com
