TeraCache: Efficient Caching over
Fast Storage Devices
Iacovos G. Kolokasis1,2, Anastasios Papagiannis1,2, Foivos Zakkak3, Shoaib Akram4,
Christos Kozanitis2, Polyvios Pratikakis1,2, and Angelos Bilas1,2
1University of Crete
2Foundation of Research and Technology Hellas (FORTH), Greece
3Red Hat, Inc.
4Australian National University
Spark Caching Mechanism
▪ Stores the result of an RDD computation
▪ Essential when an RDD is used across multiple Spark jobs
▪ Caching avoids recomputation and reduces execution time
▪ Effective for iterative workloads (e.g., ML, graph processing)
▪ How much data do we need to cache?
Storage Level
MEMORY_ONLY
MEMORY_AND_DISK
DISK_ONLY
OFF_HEAP
Source: https://spark.apache.org/docs/latest/rdd-programming-guide.html
2
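The persist/unpersist contract above can be sketched as a toy block cache in plain Java. This is not Spark's actual BlockManager; the class and method names here are hypothetical, and the sketch only models the semantics: a persisted RDD stays cached across jobs until unpersist() releases it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of Spark's persist/unpersist semantics: a persisted RDD's
// result stays cached across jobs until unpersist() makes it reclaimable.
public class ToyRddCache {
    private final Map<Integer, long[]> cached = new HashMap<>();

    // persist(): compute once, keep the result for later jobs
    public long[] persist(int rddId, long[] computed) {
        return cached.computeIfAbsent(rddId, id -> computed);
    }

    // Later jobs hit the cache instead of recomputing
    public long[] getOrRecompute(int rddId, java.util.function.Supplier<long[]> recompute) {
        long[] hit = cached.get(rddId);
        return hit != null ? hit : recompute.get();
    }

    // unpersist(): the cached result becomes reclaimable
    public void unpersist(int rddId) {
        cached.remove(rddId);
    }

    public boolean isCached(int rddId) {
        return cached.containsKey(rddId);
    }
}
```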
Increasing Memory Demands!
▪ Analytics datasets grow at a high rate
▪ Today: ~50 ZB
▪ By 2025: ~175 ZB (a 3.5x increase)
▪ Typical deployments use roughly as much DRAM as the input dataset
▪ Typically, cached data is even larger than the input dataset
Source: Seagate – The Digitization of the World
3
Cached Data Size Matters
▪ In-memory caching needs a lot of DRAM
▪ DRAM density is difficult to increase
▪ Fast storage (NVMe) scales to TBs per device
▪ Spark already uses fast storage for cached data – however, at high cost
Workload                    Input Dataset (GB)   Cached RDDs (GB)
Linear Regression (LR)      64                   182
Logistic Regression (LgR)   64                   160
SVM                         64                   188

(Cached RDDs are roughly 3x larger than the input dataset.)
4
Dilemma: On-heap vs Off-heap NVMe Caching
[Diagram: executor memory split into Execution Memory and Storage Memory; the on-heap cache incurs GC, the off-heap cache incurs serialization/deserialization]

                  Pros               Cons
On-heap cache     No serialization   High GC
Off-heap cache    Low GC             High serialization

Can we avoid serialization and reduce GC?
5
Cached Objects Behave Differently

[Diagram: RDD lifecycle in the Java heap – dataset → create RDDs → persist → operations → unpersist → GC]

▪ GC between persist and unpersist is extremely wasteful
▪ GC scans all objects in the heap, including cached RDDs
▪ GC can reclaim cached RDDs only after unpersist
6
Our Approach: Treat Cached Objects Differently
▪ Objects in Java follow the generational hypothesis
▪ Opportunity: cached objects instead follow a nomadic hypothesis
▪ Spark cached objects are
▪ Long-lived: used across multiple Spark jobs (cache)
▪ Intermittently accessed: long intervals without access (NVMe)
▪ Grouped lifetimes: an RDD's objects leave the cache at the same time (unpersist)
▪ Place cached objects in a special heap
10
TeraCache: Introduce a Second JVM heap on NVMe
▪ The execution heap remains a garbage-collected heap
▪ Maintains the JVM heap for execution purposes
▪ The second heap, TeraCache, has two significant advantages
▪ No GC: use persist/unpersist semantics to avoid GC
▪ No serialization/deserialization: use memory-mapped I/O
11
TeraCache Design Overview
TeraCache: Design Overview
[Diagram: Spark executor with two heaps – the JVM heap (Execution Memory, DRAM region DR1) and the TeraCache heap (Storage Memory, DRAM region DR2, mmap()'d from an NVMe SSD)]
13
Spark Knocks on the JVM Door

rdd.persist():
- Store the RDD in Storage Memory
- Notify the JVM to mark the RDD object

▪ Spark notifies the JVM about RDD caching
▪ At persist/unpersist operations
▪ Add a new TeraFlag word to JVM objects
▪ The JVM allocates the object and sets its TeraFlag
▪ Marked objects move to TeraCache during the next full GC
14
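The flag-and-migrate mechanism above can be sketched in plain Java. The real change lives in the HotSpot object header and collector; this sketch (class and field names are hypothetical) only models the flow: persist() sets a flag, and the next full GC moves flagged objects out of the GC-managed heap into TeraCache.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the TeraFlag idea: persist() marks an object, and the next
// full GC migrates marked objects from the JVM heap into TeraCache.
public class TeraFlagSketch {
    static class Obj {
        final String name;
        boolean teraFlag;                 // set when Spark calls persist()
        Obj(String name) { this.name = name; }
    }

    final List<Obj> jvmHeap = new ArrayList<>();
    final List<Obj> teraCache = new ArrayList<>();

    Obj allocate(String name) { Obj o = new Obj(name); jvmHeap.add(o); return o; }

    void persist(Obj o) { o.teraFlag = true; }   // Spark notifies the "JVM"

    // At the next full GC, move every flagged object to TeraCache
    void fullGc() {
        jvmHeap.removeIf(o -> {
            if (o.teraFlag) { teraCache.add(o); return true; }
            return false;
        });
    }
}
```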
TeraCache Design: Avoid GC
How to Avoid GC in TeraCache?

▪ Disallow backward pointers from TeraCache into the JVM heap
▪ Move the transitive closure into TeraCache
▪ Allow forward pointers from the JVM heap
▪ Objects in TeraCache do not move
▪ Fence GC from following forward pointers into TeraCache
18
Organize TeraCache in Regions

▪ Objects that belong to the same RDD have similar lifetimes
▪ Organize TeraCache into regions
▪ Place objects into regions based on lifetime
▪ Regions are sized dynamically
▪ Bulk free: reclaim an entire region at once
19
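Region-based allocation with bulk free can be sketched as follows. This is a plain-Java model (the region map and names are hypothetical bookkeeping, not the real allocator): objects of one RDD land in one region, so unpersist() drops the whole region at once instead of tracing individual objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of region-based allocation in TeraCache: objects with the same
// lifetime (same RDD) share a region, enabling O(1)-per-region bulk free.
public class RegionSketch {
    private final Map<Integer, List<String>> regions = new HashMap<>();

    // Place an object in the region of its RDD (similar lifetime)
    public void allocate(int rddId, String obj) {
        regions.computeIfAbsent(rddId, id -> new ArrayList<>()).add(obj);
    }

    // Bulk free: drop the entire region at unpersist, no per-object tracing
    public int unpersist(int rddId) {
        List<String> region = regions.remove(rddId);
        return region == null ? 0 : region.size();
    }

    public int liveRegions() { return regions.size(); }
}
```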
Bulk Free Regions

▪ To make bulk free correct
▪ Allow only pointers within a region
▪ Merge regions with crossing pointers when objects move to TeraCache
▪ Keep a bitmap of live regions
▪ Track regions reachable from the JVM heap on every GC
▪ During the GC marking phase, identify active regions
▪ Set a region's bit if there is a pointer from the JVM heap into that TeraCache region
20
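The live-region bitmap can be sketched with `java.util.BitSet`. This is a hypothetical model of the bookkeeping, not the HotSpot code: during the marking phase, each observed heap-to-TeraCache pointer sets the bit of the target region, and unpersisted regions whose bit stays clear can be reclaimed in bulk.

```java
import java.util.BitSet;

// Sketch of the live-region bitmap: the GC marker sets the bit of every
// TeraCache region that a JVM-heap pointer reaches; unmarked, unpersisted
// regions are safe to bulk free.
public class LiveRegionBitmap {
    private final BitSet live;

    public LiveRegionBitmap(int numRegions) { live = new BitSet(numRegions); }

    // Called by the GC marker for each heap -> TeraCache pointer it sees
    public void markPointerInto(int regionId) { live.set(regionId); }

    public boolean isLive(int regionId) { return live.get(regionId); }

    // Reset before the next marking cycle
    public void clear() { live.clear(); }
}
```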
TeraCache Design: Avoid Serialization
No Serialization → Memory-Mapped I/O

▪ Memory-mapped I/O keeps the same data format in memory and on the device
▪ No explicit device I/O – only accesses using loads/stores
▪ The Linux kernel already supports the required mechanisms
▪ We use FastMap [ATC'20], which optimizes the scalability of Linux memory-mapped I/O
22
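Memory-mapped file I/O is available in standard Java via `FileChannel.map`, which illustrates the mechanism TeraCache relies on: data keeps one format in memory and on the device, and is accessed with plain loads/stores (put/get) rather than explicit read()/write() calls. The demo class below is hypothetical; the real system maps the TeraCache heap over NVMe via mmap() inside the JVM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Memory-mapped I/O in Java: store and load a value through a mapping,
// with no serialization step and no explicit device read/write.
public class MmapDemo {
    public static long roundTrip() {
        try {
            Path file = Files.createTempFile("teracache-demo", ".bin");
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                buf.putLong(0, 0xCAFEL);   // store: reaches the device via the page cache
                return buf.getLong(0);     // load: same format, no deserialization
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```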
Competition for DRAM Resource
▪ Execution Memory must reside in DRAM
▪ A lot of short-lived data
▪ We need large DR1
▪ Cached objects are accessed as well
▪ E.g., Iterative jobs reuse cached data
▪ We need large DR2
▪ Can we statically divide DRAM between
the heaps?
[Diagram: executor with the JVM heap (DR1 in DRAM) and TeraCache (DR2 in DRAM, mmap()'d from an NVMe SSD)]
23
Dividing DRAM between Heaps
▪ KMeans (KM) jobs produce more short-lived data
▪ More minor GCs
▪ More space needed for DR1
▪ Linear Regression (LR) jobs reuse more cached data
▪ More page faults per second
▪ More space needed for DR2
▪ Dynamically resize DR1 and DR2
▪ Based on the page-fault rate of memory-mapped I/O
▪ Based on the minor GC rate
[Chart: execution time vs. DR1 size with total DRAM = 32 GB; a poor static split costs up to 3x/2x in execution time]
24
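The dynamic resizing policy above can be sketched as a simple feedback loop. The thresholds, step size, and field names below are hypothetical tuning knobs, not TeraCache's actual policy: grow the GC'd heap's DRAM share (DR1) when minor GCs dominate, and grow TeraCache's DRAM share (DR2) when mmap page faults dominate.

```java
// Sketch of dynamic DR1/DR2 balancing over a fixed DRAM budget.
public class DramBalancer {
    static final int TOTAL_GB = 32, STEP_GB = 2, MIN_GB = 4;
    int dr1Gb = TOTAL_GB / 2;              // DRAM for the JVM heap

    int dr2Gb() { return TOTAL_GB - dr1Gb; }   // DRAM for TeraCache

    // Called periodically with rates observed since the last adjustment
    void rebalance(double minorGcPerSec, double pageFaultsPerSec) {
        if (minorGcPerSec > pageFaultsPerSec && dr2Gb() > MIN_GB) {
            dr1Gb += STEP_GB;              // execution pressure: grow DR1
        } else if (pageFaultsPerSec > minorGcPerSec && dr1Gb > MIN_GB) {
            dr1Gb -= STEP_GB;              // cache reuse pressure: grow DR2
        }
    }
}
```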
Preliminary Evaluation
Early Prototype Implementation
▪ We implement a prototype of TeraCache based on ParallelGC
▪ Place the new generation on DRAM
▪ Place the old generation on a fast storage device
▪ Explicitly disable GC on the old generation
▪ Remaining to be implemented
▪ Reclamation of cached RDDs
▪ Dynamic DR1/DR2 resizing
▪ Evaluation focuses on
▪ GC overhead
▪ Serialization overhead
26
TeraCache Improves Performance by 25%
▪ Compared to serialization (HY): TC is better by up to 37% (25% on average)
▪ Compared to GC + Linux swap (SW): TC is better by up to 2x
SW – Linux kernel swap
HY – MEMORY_AND_DISK
TC – TeraCache
27
TeraCache Reduces GC Time by up to 50%
HY – MEMORY_AND_DISK
TC – TeraCache
28
Conclusions
TeraCache: Efficient Caching over Fast Storage
▪ Spark incurs high overhead for caching RDDs
▪ We observe that Spark cached data follow a nomadic hypothesis
▪ We introduce TeraCache, which reduces GC and eliminates serialization by using two heaps (generational and nomadic)
▪ We improve the performance of Spark ML workloads by 25% on average
▪ We are currently working on the full prototype
30
Iacovos G. Kolokasis
kolokasis@ics.forth.gr
www.csd.uoc.gr/~kolokasis
Thank you for your attention
This work is supported by the EU Horizon 2020 Evolve project (#825061).
Anastasios Papagiannis is supported by a Facebook Graduate Fellowship.