DAOS For Applications
Mohamad Chaarawi
Extreme Scale Architecture & Development
Intel Corporation, Cloud & Enterprise Solutions Group
DAOS Storage Architecture
[Diagram: compute nodes run the AI/analytics/simulation workflow through POSIX I/O, MPI-I/O, HDF5, Python, Spark, … on top of the DAOS library. Libfabric connects them to the storage nodes, where the DAOS Storage Engine uses PMDK (memory interface) to reach Intel® Optane Memory for metadata, low-latency I/Os & indexing/query, and SPDK (NVMe interface) to reach 3D-NAND/Optane SSD for bulk data. HDD also appears in the diagram.]
Compute Nodes
• DAOS library directly linked with the applications
• No need for dedicated cores
• Low memory/CPU footprint
• End-to-end OS bypass
• Non-blocking, lockless, snapshot support, …
Libfabric
• Low-latency & high-message-rate communications
• Native support for RDMA & scalable collective operations
• Support for iWarp, RoCE, Infiniband, OPA, Slingshot, …
Storage Nodes
• Fine-grained I/O with media selection strategy
• Only application data on SSD to maximize throughput
• Small I/Os aggregated in pmem & migrated to SSD in large chunks
• Full userspace model with no system calls on I/O path
• Built-in storage management infrastructure (control plane)
• NFSv4-like ACL
Delivers high-IOPS, high-bandwidth and low-latency storage with advanced features in a single tier
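The media selection and aggregation bullets on this slide can be pictured with a small toy model. This is a sketch only, not DAOS code: the `TieredStore` class, the `SMALL_IO_THRESHOLD` value and the explicit `aggregate()` call are assumptions made for illustration, and the slides do not state an actual size cutoff.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff: writes smaller than this land in persistent memory.
SMALL_IO_THRESHOLD = 4096  # bytes, an assumed value for illustration only


@dataclass
class TieredStore:
    """Toy model of size-based media selection with later aggregation."""

    pmem: list = field(default_factory=list)  # small writes awaiting aggregation
    ssd: list = field(default_factory=list)   # large chunks of application data

    def write(self, data: bytes) -> str:
        """Route a write to pmem or SSD based on its size; return the tier used."""
        if len(data) < SMALL_IO_THRESHOLD:
            self.pmem.append(data)
            return "pmem"
        self.ssd.append(data)
        return "ssd"

    def aggregate(self) -> int:
        """Merge all small writes held in pmem into one SSD chunk.

        Returns the size of the chunk written, or 0 if pmem held nothing.
        """
        if not self.pmem:
            return 0
        chunk = b"".join(self.pmem)
        self.ssd.append(chunk)
        self.pmem.clear()  # frees pmem space for new small I/Os
        return len(chunk)


store = TieredStore()
print(store.write(b"x" * 100))        # small write is held in pmem
print(store.write(b"y" * 1_000_000))  # large write goes straight to SSD
print(store.aggregate())              # small writes move to SSD as one chunk
```

In a real system the aggregation would run in the background rather than on an explicit call; the explicit method just keeps the sketch deterministic.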
Storage Containers
Aggregate related datasets into manageable and coherent entities
• Distributed consistency & automated recovery
• Full versioning
• Simplified data management
• Snapshot
• Cross-tier migration
• Indexing
[Diagram: example DAOS container layouts: encapsulated POSIX namespace (root, directories, files and their data), file-per-process (a flat set of files), HDF5 « File » (groups and datasets), key-value store (key/value pairs), graph (nodes), and columnar database (keys with columns of values).]
DAOS Objects
Native support for structured, semi-structured & unstructured data models
• Built on top of DCPMM
• Unconstrained by POSIX serialization
• Custom attributes
• Data access time orders of magnitude faster (µs)
• Scalable concurrent updates & high IOPS
• Non-blocking
• Enable in-storage computing
[Diagram: applications use a data model library/framework on top of the DAOS Storage Engine (open source, Apache 2.0 license), which stores data on NVMe SSD. Object types shown: array, KV store and multi-level KV store, drawn as trees from a root to key1/val1, key2/val2 (with val2 continued in a second record) and key3/val3.]
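To make the multi-level KV store in the diagram concrete, here is a minimal in-memory sketch of an object addressed by two key levels. It is an illustration under stated assumptions, not the DAOS API: the class and method names are hypothetical, and the two levels are called `dkey` and `akey` only to echo the terms the DAOS API uses for its key levels.

```python
from collections import defaultdict


class MultiLevelKV:
    """Toy in-memory object addressed by (dkey, akey) pairs."""

    def __init__(self):
        # First-level key -> second-level key -> value.
        self._records = defaultdict(dict)

    def update(self, dkey: str, akey: str, value: bytes) -> None:
        """Insert or overwrite the value stored under (dkey, akey)."""
        self._records[dkey][akey] = value

    def fetch(self, dkey: str, akey: str) -> bytes:
        """Return the value under (dkey, akey); raises KeyError if absent."""
        return self._records[dkey][akey]

    def list_akeys(self, dkey: str) -> list:
        """List second-level keys under dkey without creating an empty entry."""
        return sorted(self._records.get(dkey, {}))


obj = MultiLevelKV()
obj.update("particle-17", "position", b"\x00\x01\x02")
obj.update("particle-17", "velocity", b"\x03\x04\x05")
print(obj.list_akeys("particle-17"))
print(obj.fetch("particle-17", "position"))
```

A single-level KV store is the special case where every first-level key holds exactly one second-level key.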
DAOS Tools
Tool: dmg
Target: Administrators
Lustre equivalent: lctl/mkfs/mount/IML
Functionality:
• Storage provisioning
• Burn-in
• Firmware update
• Data plane mgmt & monitoring
• Configure/monitor scrubbing
• Pool mgmt
• Telemetry
Tool: daos
Target: Users
Lustre equivalent: lfs
Functionality:
• Pool query
• Container mgmt
• Unified namespace mgmt
• Container user attributes
• Snapshots
• Object debugging
• POSIX container configuration
Application Interface
[Diagram: HPC apps and analytics/AI apps reach the DAOS Storage Engine (open source, Apache 2.0 license) through POSIX I/O, HDF5, MPI-IO, Python, Apache Spark, Apache Arrow, TensorFlow and SEGY. A dataset mover connects DAOS to a capacity tier (PFS, S3, HSM, …).]
POSIX I/O Support
The DAOS API is new and very flexible
• Multi-level keys
• Different value types supported
• Can build all data models / I/O middleware on top of it
Most applications are still based on POSIX
• Need a smooth migration path with little to no application changes
• Quick path to realize performance of DCPMM & DAOS
POSIX is implemented as a middleware instead of being the building block of all data models.
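[Diagram: software stack labeled "single process address space": application / framework, then the interception library and dfuse, then the DAOS File System (libdfs), then the DAOS library (libdaos). libdaos reaches the DAOS Storage Engine over RPC and RDMA, end-to-end in user space with no system calls. Storage media shown: Intel® QLC 3D NAND SSD.]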
MPI-IO Driver for DAOS
The DAOS MPI-IO driver is implemented within ROMIO, the MPI-IO library in MPICH.
• Added as an ADIO driver
• Portable to Open MPI, Intel MPI, etc.
• https://github.com/pmodels/mpich
MPI files use the same DFS mapping to the DAOS object model
• MPI files can be accessed through the DFS API
• MPI files can be accessed through regular POSIX with a dfuse mount over the container.
Applications work seamlessly: select the driver by prefixing the file path with “daos:”.
MPI-IO ROMIO driver: https://github.com/pmodels/mpich/tree/master/src/mpi/romio/adio/ad_daos
A POSIX / MPI-IO file maps to a DAOS byte array object, a special DAOS object with:
• 1-level key
• 1-byte records
• Configurable chunk size
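One way to read "1-level key, 1-byte records, configurable chunk size" is that a byte offset in the file resolves to a chunk (the key) plus a record index inside that chunk. The sketch below works through that arithmetic under this assumption; the function names and the 1 MiB chunk size are illustrative and are not taken from libdfs or the ROMIO driver.

```python
def locate(offset: int, chunk_size: int) -> tuple:
    """Map a file byte offset to (chunk index, record index inside the chunk)."""
    if offset < 0 or chunk_size <= 0:
        raise ValueError("offset must be >= 0 and chunk_size must be > 0")
    return divmod(offset, chunk_size)


def split_extent(offset: int, length: int, chunk_size: int) -> list:
    """Split the byte range [offset, offset + length) at chunk boundaries.

    Returns a list of (chunk index, start inside chunk, number of bytes).
    """
    pieces = []
    end = offset + length
    while offset < end:
        chunk, start = locate(offset, chunk_size)
        nbytes = min(chunk_size - start, end - offset)
        pieces.append((chunk, start, nbytes))
        offset += nbytes
    return pieces


CHUNK = 1 << 20  # 1 MiB, an assumed chunk size
# A 200,000-byte write starting at offset 1,000,000 crosses one chunk boundary.
for piece in split_extent(1_000_000, 200_000, CHUNK):
    print(piece)
```

Because the chunk size is configurable, an I/O pattern with large contiguous accesses could favor bigger chunks, while small strided accesses could favor smaller ones.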
HDF5 VOL Architecture
Three main components:
• HDF5 Library
• DAOS VOL Connector
• (External) HDF5 Test Suite
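[Diagram: HDF5 tools and the external test suite call the HDF5 API, which sits on the VOL layer of the core HDF5 library. The native VOL connector goes through the VFD layer (SEC2, MPIO, …) to a file system via the POSIX API or through MPI I/O. The external DAOS VOL connector reaches DAOS via the DAOS API. Legend: new component, enhanced component, native component.]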
HDF5 DAOS VOL Connector – Current Status
No longer requires a separate version of HDF5
• Compatible with main develop branch of HDF5
• Compatible with 1.12.x release series of HDF5 with VOL support
Currently supported features
• New H5M MAP API to expose K/V interface to HDF5 users
• Variable-length datatypes are now also supported
• Chunking is the recommended storage layout to get the most out of DAOS performance
• Tools support (h5dump, h5ls, h5repack, etc.)
Coming by end of the year
• Independent metadata writes (= independent object creation)
• Asynchronous I/O
Available from: https://git.hdfgroup.org/projects/HDF5VOL/repos/daos-vol/
• See the user’s guide for a more detailed list of supported features
Unified Namespace
DAOS Object Store:
• Addresses pools and containers with UUIDs, and objects with 128-bit IDs.
• Applications and users are used to accessing files and directories in a traditional namespace.
Unified Namespace allows users to create links between a file/dir in a system namespace and DAOS pools & containers:
• daos container create --path=/mnt/project1/userA/NS1 --pool=uuid --type=POSIX/HDF5/etc.
• The path created above becomes a special file or directory (depending on container type) with an extended attribute holding the pool and container information.
• Accessing that path from DAOS-aware middleware makes the link on the fly through the DAOS UNS library.
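The resolution step described on this slide can be sketched as a lookup of an extended attribute that names the pool and container. Everything specific here is an assumption for illustration: the attribute name `user.example.daos`, the `TYPE:pool/container` value layout, the choice to walk up parent directories, and the use of a plain dict in place of real file system extended attributes. The actual DAOS UNS library defines its own attribute name and format.

```python
from typing import NamedTuple, Optional

# Hypothetical attribute name, invented for this sketch.
UNS_XATTR = "user.example.daos"


class ContainerRef(NamedTuple):
    cont_type: str
    pool: str
    container: str


def parse_uns_value(value: str) -> ContainerRef:
    """Parse 'TYPE:pool/container' (an assumed layout) into its parts."""
    cont_type, _, rest = value.partition(":")
    pool, _, container = rest.partition("/")
    if not (cont_type and pool and container):
        raise ValueError(f"malformed namespace attribute: {value!r}")
    return ContainerRef(cont_type, pool, container)


def resolve(path: str, xattrs: dict) -> Optional[ContainerRef]:
    """Walk up from path until an entry carrying the attribute is found.

    xattrs maps a path to its {attribute name: value} dict and stands in
    for the extended attributes a real file system would hold.
    """
    current = path.rstrip("/") or "/"
    while True:
        attrs = xattrs.get(current, {})
        if UNS_XATTR in attrs:
            return parse_uns_value(attrs[UNS_XATTR])
        if current == "/":
            return None  # reached the root without finding a link
        current = current.rsplit("/", 1)[0] or "/"


fake_fs = {"/mnt/project1/userA/NS1": {UNS_XATTR: "POSIX:pool-uuid/cont-uuid"}}
print(resolve("/mnt/project1/userA/NS1/results/run1.dat", fake_fs))
print(resolve("/tmp/unrelated", fake_fs))
```

The point of the design is that users keep working with ordinary paths while middleware translates them to pool and container identifiers behind the scenes.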
Spark Input/Output Support
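[Diagram: Spark I/O paths, with a legend of current Spark, implemented, and planned. Current Spark: scan/parse feeding project, join, ML, etc., reading from HDFS on disks through the Hadoop FileSystem abstract class. Implemented: a DAOS FileSystem under the same abstract class, built on a Java wrapper over the DAOS POSIX API. Planned: an Apache Arrow data source over a DAOS/Arrow wrapper, and a DAOS native data source (Parquet, ORC, etc.) over a DAOS API wrapper, with native scan/parse (CSV, Parquet, ORC, etc.). The DAOS paths go through the DAOS library (DPDK/PMDK/SPDK used, kernel bypass, zero copy) to DAOS storage.]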
Python Support
Pythonic bindings called pydaos
• Export key-value store objects
• Integrated with Python dictionaries
• Support Python iterators, direct assignments, …
• Scalable & performant
• Bulk insert/retrieve
• Core written in C
• Python 2.7 & 3 support
Python integration
• dbm
• pyprob
TODO
• Expose snapshots
• Integration with NumPy, …
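The shape of a dictionary-style key-value interface with iteration, direct assignment and bulk operations can be shown with an in-memory stand-in. This is not pydaos and does not talk to DAOS: the `KVObject` class and the `bulk_put` / `bulk_get` method names are hypothetical, chosen only to mirror the capabilities listed on the slide.

```python
from collections.abc import MutableMapping


class KVObject(MutableMapping):
    """In-memory stand-in for a key-value store object with a dict interface."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def bulk_put(self, items: dict) -> None:
        """Insert many key/value pairs in one call."""
        self._data.update(items)

    def bulk_get(self, keys) -> dict:
        """Retrieve many keys in one call; keys that are missing are skipped."""
        return {k: self._data[k] for k in keys if k in self._data}


kv = KVObject()
kv["run"] = b"42"                            # direct assignment
kv.bulk_put({"a": b"1", "b": b"2"})          # bulk insert
for key in kv:                               # Python iterator support
    print(key, kv[key])
print(kv.bulk_get(["a", "b", "missing"]))    # bulk retrieve
```

Deriving from `MutableMapping` means helpers such as `in`, `.get()`, `.items()` and `.pop()` come for free, which is what makes such an object feel like a native dictionary.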
Resources
Source code on GitHub
• https://github.com/daos-stack/daos
Documentation
• http://daos.io
Community mailing list
• https://daos.groups.io
DAOS solution brief
• https://www.intel.com/content/www/us/en/high-performance-computing/

DAOS Middleware overview

Editor's Notes

  • #3: Overall DAOS architecture. DAOS is a distributed object store built from the ground up for NVM technology. It uses two types of storage. The first is SSDs with different media (3D-NAND, Optane), available with an NVMe interface and providing fast block-based storage; Intel is developing a new user-space NVMe driver for them, SPDK. The second is persistent memory (Optane pmem): fine-grained storage that is directly memory mapped, so software can issue direct loads and stores to it with no block abstraction; Intel is developing a new storage stack, PMDK, to manage transactional updates to pmem. No existing object store leverages these technologies: Lustre and GPFS are built into the kernel, and Ceph still uses kernel I/O by going through the Linux block layer. In DAOS everything runs in user space. DAOS is a new object store built on top of PMDK and SPDK that delivers high IOPS and high bandwidth, with a more advanced storage API (not just byte arrays, but also KV, etc.). Communication still needs attention, so DAOS reuses the several decades of effort spent optimizing latency and bandwidth for MPI by using the same communication middleware as MPI (libfabric). The client side is very lightweight and supports many middleware layers. All metadata is stored on pmem. Small I/Os land in pmem and are later aggregated onto SSD in the background to free space in pmem. Large application data goes to the SSD.
  • #4: The baseline API is based on the storage container, an object address space inside a pool. A container can be an encapsulated POSIX namespace or just a flat one. HDF5 has its own representation for a single HDF5 file. A KV store can be a container, as can a database or an ACG 1 graph. The container provides a baseline API with objects. The unit of storage management is the container, not the file. With millions of files, data migration has to be done on every one of them; DAOS simplifies that. There are fewer containers than files, so they can be indexed, and data migration operates on an entire container. The container is also the unit of snapshot. Snapshots were not typically available in the past (they used to be at the file system level), but are now taken at the container level and exposed to the user.
  • #5: POSIX is no longer the foundation. DAOS provides a new key-array store API with advanced features (ad-hoc concurrency control, snapshot support, asynchronous API, ...). The DAOS API is meant to be integrated directly with rich data models for better performance and extended capabilities. POSIX (i.e. file) support is built as middleware over the DAOS API. DAOS has a native KV API, which means that the wire protocol is KV fetch/lookup. On the server, the tree structure is maintained in DCPM and values are stored in either DCPM or SSD depending on their size. No file serialization is required. A client can directly look up or insert a key/value pair over the wire, and many clients can do so concurrently. Keys can be ordered, so range queries are supported. KV stores are partitioned/sharded/replicated/erasure coded over multiple servers for performance & resilience. DAOS can also support complex queries since it understands the key-value structure. The native indexing can also be augmented with ad-hoc application-driven indexes.
  • #13: Spark integration with DAOS is done by a different team at Intel. They have written a Hadoop FileSystem abstraction class for DAOS, which is integrated in the DAOS source code. With this Hadoop connector, Spark uses DAOS in user space through libdfs transparently. Spark can gain more value and acceleration by integrating DAOS as a data source through Parquet / Apache Arrow, or through the native DAOS API.
  • #14: Native Python bindings (pydaos) integrate KV store objects directly with Python dictionaries. The API is very Pythonic: it supports iterators, direct assignment, and bulk insertion/retrieval. It lets people who are not very familiar with C use a high-level language. There is some integration with dbm and pyprob, with more work to do.
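The note on DAOS objects mentions that ordered keys enable range queries and that KV stores are sharded over multiple servers. A minimal sketch of both ideas follows. It is an illustration, not DAOS's algorithm: the `OrderedKV` class, the sorted-list index, and the hash-based `shard_for` placement are all assumptions made for this example.

```python
import bisect
import hashlib


class OrderedKV:
    """Toy KV store that keeps keys sorted so range queries are cheap."""

    def __init__(self):
        self._keys = []    # always kept in sorted order
        self._values = {}

    def put(self, key: str, value) -> None:
        """Insert or overwrite a key, preserving the sorted key index."""
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range(self, lo: str, hi: str) -> list:
        """Return (key, value) pairs with lo <= key < hi, in key order."""
        start = bisect.bisect_left(self._keys, lo)
        stop = bisect.bisect_left(self._keys, hi)
        return [(k, self._values[k]) for k in self._keys[start:stop]]


def shard_for(key: str, num_servers: int) -> int:
    """Pick a server index by hashing the key (a stand-in placement policy)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


kv = OrderedKV()
for name in ["delta", "alpha", "charlie", "bravo"]:
    kv.put(name, len(name))
print(kv.range("b", "d"))         # keys from "b" up to, not including, "d"
print(shard_for("alpha", 4))      # index of the server that would hold "alpha"
```

With hash-based placement a range query has to gather results from every shard, which is one reason a real system may choose a placement scheme that takes key order into account.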