Jayjeet Chakraborty
Towards an Arrow-Native Storage System
SkyhookDM
Mentored by: Carlos Maltzahn, Ivo Jimenez, Jeff LeFevre
Who am I?
• Incoming Grad Student at UC Santa Cruz
• CS Graduate from NIT Durgapur, India
• IRIS-HEP Fellow Summer 2020
• Twitter: @heyjc25
• GitHub: JayjeetAtGithub
• LinkedIn: https://www.linkedin.com/in/jayjeet-chakraborty-077579162/
• E-Mail: jchakra1@ucsc.edu
Problem
• With high-speed network and storage devices, the CPU is the new bottleneck.
• Client-side processing of data stored in highly efficient formats such as Parquet and ORC exhausts the client CPUs.
• Scalability is severely hampered.
Our Solution
• Offload computation from the client to the storage layer.
• Take advantage of the idle CPUs of storage systems for increased processing rates and faster queries.
• Results in less data movement and network traffic.
Introduction to Ceph
1. Provides three types of storage interface: file, object, and block.
2. No central point of failure. Object placement is computed with CRUSH: each client holds a copy of the CRUSH map, works out the object-to-OSD mapping itself, and talks directly to the OSDs.
3. Highly extensible object storage layer via the Ceph Object Classes SDK.
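The idea that every client can compute an object's location on its own, with no central lookup, can be sketched with a toy placement function. This is rendezvous hashing, used here only as a stand-in: it is not the CRUSH algorithm, which also accounts for device weights, failure domains, and placement groups. The function name `place` and the OSD names are hypothetical.

```python
import hashlib

def place(object_name, osds, replicas=3):
    """Rank OSDs by a hash of (object, OSD) and keep the top `replicas`.

    Toy rendezvous hashing: a stand-in for CRUSH, not CRUSH itself.
    """
    def score(osd):
        digest = hashlib.sha256(f"{object_name}:{osd}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(osds, key=score, reverse=True)[:replicas]

osds = ["osd.0", "osd.1", "osd.2", "osd.3"]  # hypothetical cluster

# Two independent clients that hold the same map compute the same placement,
# regardless of the order in which they list the OSDs.
client_a = place("10000000001.00000000", osds)
client_b = place("10000000001.00000000", list(reversed(osds)))
```

Because the result depends only on the object name and the shared map, no client needs to ask a central server where an object lives.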
Apache Arrow
• Language-independent columnar memory format for flat and hierarchical data, organised for efficient analytic operations on modern hardware.
• Share data between processes without serialization overhead.
[Figure: before Arrow vs. after Arrow]
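A minimal sketch of the two ideas on this slide, using only the Python standard library as a stand-in for Arrow (this is not the Arrow library or its format): each field is kept in one contiguous typed buffer, and a consumer reads that buffer through a view instead of a serialized copy. The column names are made up for illustration.

```python
from array import array

# Row-oriented: one Python tuple per record.
rows = [(1, 10.0), (2, 20.0), (3, 30.0)]

# Column-oriented: one contiguous typed buffer per field.
ids = array("q", (r[0] for r in rows))      # 64-bit integers
prices = array("d", (r[1] for r in rows))   # 64-bit floats

# A memoryview exposes the column's buffer without copying it.
view = memoryview(prices)
total = sum(view)

# A write to the column is visible through the view: same memory, no copy.
prices[0] = 15.0
shares_memory = view[0] == 15.0
```

An analytic operation such as the sum touches only the bytes of the column it needs, which is the property the columnar layout is organised around.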
Components of Arrow
[Figure: Arrow components, with those used by Skyhook marked]
Design Paradigm
• Extend the client and storage layers of programmable storage systems with data access libraries.
• Embed an FS shim inside storage nodes to provide a file-like view over objects.
• Allow direct interaction with objects in the object store, bypassing the filesystem layer by using FS metadata.
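Using FS metadata to reach objects directly can be sketched as follows. CephFS names a file's data objects after the file's inode number in hexadecimal, followed by an object index, so a `stat` call is enough to derive the object name. The helper `object_name` is hypothetical, and that SkyhookDM derives names in exactly this way is an assumption of this sketch.

```python
import os
import tempfile

def object_name(inode, index=0):
    """Name of the index-th data object of a file, CephFS-style:
    inode in hex, a dot, then the object index as 8 hex digits."""
    return f"{inode:x}.{index:08x}"

# Any file demonstrates the stat step; on a real CephFS mount, st_ino
# would be the CephFS inode number of the file.
with tempfile.NamedTemporaryFile() as f:
    inode = os.stat(f.name).st_ino
    name = object_name(inode)

example = object_name(0x10000000001)  # "10000000001.00000000"
```

With the name in hand, a client can address the object in the object store without reading the data through the filesystem.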
Architecture
• Arrow data access libraries are embedded inside Ceph OSDs so that file fragments can be scanned inside the storage layer.
• The functionality is exposed through the Arrow Dataset API via a new file format abstraction, “RadosParquetFileFormat”.
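The division of labour described above can be sketched in plain Python. This is a toy model, not the Arrow Dataset API or SkyhookDM code: `scan_fragment` stands in for the scan that runs inside an OSD, `client_query` for the client that sends the request and merges results, and the fragment contents are invented.

```python
def scan_fragment(fragment, columns, predicate):
    """Runs where the data lives: filter rows, keep only requested columns."""
    num_rows = len(next(iter(fragment.values())))
    keep = [
        i for i in range(num_rows)
        if predicate({name: values[i] for name, values in fragment.items()})
    ]
    return {c: [fragment[c][i] for i in keep] for c in columns}

def client_query(fragments, columns, predicate):
    """Client side: send the scan request to each fragment, merge the results."""
    partials = [scan_fragment(f, columns, predicate) for f in fragments]
    return {c: [x for p in partials for x in p[c]] for c in columns}

# Two hypothetical fragments, each standing in for one object on one OSD.
fragments = [
    {"id": [1, 2, 3], "fare": [5.0, 42.0, 7.5]},
    {"id": [4, 5], "fare": [60.0, 3.0]},
]

result = client_query(fragments, ["id"], lambda row: row["fare"] > 40)
```

Only the filtered, projected rows leave `scan_fragment`, so the client never sees the `fare` column or the rows that fail the predicate.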
File Layout Design
• Large multi-gigabyte Parquet files are split into smaller Parquet files of about 128 MB each.
• Each of these Parquet files is stored in a single RADOS object for SkyhookDM to access.
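One way such a split could be planned is sketched below. The slides do not say how the split is done; this sketch assumes it happens on row-group boundaries and greedily packs consecutive row groups into files of at most the target size. The function name `plan_split` and the row-group sizes are hypothetical.

```python
TARGET = 128 * 1024 * 1024  # about 128 MB per output file, as on the slide

def plan_split(row_group_sizes, target=TARGET):
    """Greedily pack consecutive row groups into files of at most `target` bytes.

    Returns a list of files, each a list of row-group indices.
    A row group larger than `target` gets a file of its own.
    """
    files, current, current_size = [], [], 0
    for idx, size in enumerate(row_group_sizes):
        if current and current_size + size > target:
            files.append(current)
            current, current_size = [], 0
        current.append(idx)
        current_size += size
    if current:
        files.append(current)
    return files

MB = 1024 * 1024
plan = plan_split([64 * MB, 64 * MB, 100 * MB, 20 * MB, 30 * MB])
# plan == [[0, 1], [2, 3], [4]]
```

Each inner list would then be written out as one small Parquet file and stored as one object.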
Experiments: Latency
• Offloading makes more selective queries (those that return a smaller fraction of the data) faster, because less data is moved around the system. Less time also goes into data (de)serialization and more into processing.
• With LZ4-compressed Arrow IPC files (bottom), SkyhookDM performs better than with Parquet files (top), since IPC files are faster to read and write.
[Figure: latency. Top: Parquet on disk. Bottom: LZ4 IPC on disk.]
Experiments: CPU Usage
• SkyhookDM offloads CPU usage from the client layer to the storage layer, for example with 4 OSDs and 100% selectivity:
[Figure: CPU usage without Skyhook vs. with Skyhook]
Experiments: Network Traffic
• SkyhookDM saves network bandwidth by transferring only the data that the client requested.
• At 100% selectivity we end up transferring slightly more data, because LZ4-compressed Arrow is larger than Parquet binary data.
[Figure: network traffic at 1%, 10%, and 100% selectivity]
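The trade-off in these two bullets can be put into a back-of-the-envelope model. It is a simplification under stated assumptions: without offloading, all of the Parquet data travels to the client; with offloading, only the selected fraction travels, encoded as Arrow. The size ratio of 1.2 is a made-up number for illustration, not a measurement from these experiments.

```python
def bytes_moved(parquet_bytes, selectivity, arrow_ratio):
    """Return (client_side, offloaded) bytes sent over the network.

    client_side: the whole Parquet data travels to the client.
    offloaded:   only the selected fraction travels, encoded as Arrow,
                 which is `arrow_ratio` times larger than Parquet.
    """
    client_side = parquet_bytes
    offloaded = parquet_bytes * selectivity * arrow_ratio
    return client_side, offloaded

RATIO = 1.2  # hypothetical size ratio, not a measured value

for s in (0.01, 0.10, 1.00):
    baseline, offloaded = bytes_moved(1_000_000, s, RATIO)
    print(f"{s:>4.0%}: client-side {baseline:,} B, offloaded {offloaded:,.0f} B")

# Offloading moves less data whenever selectivity * ratio < 1.
break_even = 1 / RATIO
```

In this model offloading wins at 1% and 10% selectivity and loses slightly at 100%, which matches the direction of the result reported on the slide.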
Experiments: Crash Recovery
• In SkyhookDM, processing is colocated with the storage nodes, so the crash recovery and consistency semantics of the storage layer apply naturally to query processing.
[Figure: crash point marked]
Coffea + SkyhookDM
• Implemented a run_parquet_job executor method in Coffea that reads Parquet files using the Arrow Dataset API. This in turn allowed Coffea to be integrated with SkyhookDM seamlessly.
Ongoing Work
• Arrow’s memory layout requires internal memory copies to serialize it to a contiguous on-the-wire format, and this has a very high overhead.
Sending uncompressed IPC:
  41.5%     [6] Serialize Result Table
  30.5%     [5] Scan Parquet Data
  24.6%     [7] Result Transfer
  3.34%     [4] Disk I/O
  0.103%    [3] Deserialize Scan Request
  0.0324%   [1] Stat Fragment
  0.00855%  [8] Deserialize Result Table
  0.00511%  [2] Serialize Scan Request
Sending LZ4 compressed IPC:
  48.3%     [5] Scan Parquet Data
  29.5%     [6] Serialize Result Table
  11.7%     [7] Result Transfer
  5.37%     [8] Deserialize Result Table
  5.11%     [4] Disk I/O
  0.0513%   [3] Deserialize Scan Request
  0.0304%   [1] Stat Fragment
  0.00771%  [2] Serialize Scan Request
• Collaborating with the ServiceX and Coffea teams to integrate SkyhookDM into the larger analysis facility ecosystem.
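Where the serialization copies come from can be sketched with a toy wire format. The layout below (a buffer count, then each buffer prefixed by its length) is invented for this sketch and is not the Arrow IPC format. The point it illustrates is that separately allocated buffers have to be copied into one contiguous message on the sending side, while the receiving side can hand out views into that message without copying.

```python
import struct

def serialize(buffers):
    """Copy separate buffers into one contiguous message.

    Layout (made up for this sketch): a 4-byte count, then each buffer
    as a 4-byte little-endian length followed by its bytes.
    """
    out = bytearray(struct.pack("<I", len(buffers)))
    for buf in buffers:
        out += struct.pack("<I", len(buf))
        out += buf  # the extra memory copy happens here
    return out

def deserialize(message):
    """Return zero-copy views into the contiguous message."""
    view = memoryview(message)
    (count,) = struct.unpack_from("<I", view, 0)
    offset, parts = 4, []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", view, offset)
        offset += 4
        parts.append(view[offset:offset + length])
        offset += length
    return parts

columns = [b"\x01\x02\x03\x04", b"fare-bytes"]  # stand-ins for column buffers
wire = serialize(columns)
parts = deserialize(wire)
```

The cost of `serialize` grows with the size of the result, whereas `deserialize` only does bookkeeping per buffer.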
Check out our work
• GitHub repository: https://github.com/uccross/skyhookdm-arrow
• Docker containers: https://github.com/uccross/skyhookdm-arrow-docker
• arXiv paper: https://arxiv.org/pdf/2105.09894.pdf
• Coffea Skyhook plugin: https://github.com/CoffeaTeam/coffea/tree/master/docker/coffea_rados_parquet
• Several bugs found and reported in Apache Arrow: ARROW-13161, ARROW-13126, ARROW-13088.
Thank You

Questions?
