© 2024 KIOXIA America, Inc. All Rights Reserved.
Redefining Data Redundancy with RAID Offload
Mahinder Saluja
Director of Strategy, SSD BU
KIOXIA America, Inc.
Agenda
• Storage Services Evolution
• Data Redundancy Compute Challenges
• Offload to SSD
• Why xPUs Should Leverage SSD Offload
Storage Services Evolution
[Figure: storage cluster — controllers fronted by NVMe-oF SmartNIC / xPUs and PCI switches, connected over a fabric; each controller runs storage services with RAID/EC]
• Data redundancy in storage services demands high compute resources
• xPUs are making inroads to offload and accelerate the storage services stack
• xPUs will be challenged for performance in the future
– NVMe™ performance continues to double with every PCIe® generation*
Storage Services Stack*
Block Devices / File Systems / Database App
Storage Pool / Virtual Volume
Data Reduction
RAID / Erasure Coding (EC)
Data Scrubbing

* Notational architecture, implementation dependent. Based on the PCIe evolution as published by PCI-SIG.
xPU represents a portfolio of architectures (i.e., CPU, GPU, FPGA and other accelerators), depending on the application.
NVMe and NVMe-oF are trademarks of NVM Express, Inc. PCIe is a registered trademark of PCI-SIG.
What are Data Redundancy Challenges?
• Parity compute is memory bandwidth and CPU intensive
• A RAID 5 partial stripe write requires ~10x DRAM throughput; for example, a modest 4KB-block RAID 5 write consumes 40KB of DRAM bandwidth
• The problem worsens with RAID 6 / erasure coding (EC)
• System resources are overprovisioned to meet these demands
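The ~10x figure above can be made concrete with a rough traffic tally. The sketch below is a hedged model, not the deck's exact accounting: it counts each time a block-sized buffer crosses DRAM during a RAID 5 read-modify-write of one block, and the step breakdown is an assumption about a typical host-side implementation.

```python
# Hedged sketch: rough DRAM-traffic model for a RAID 5 partial stripe
# (read-modify-write) of one block. Each step that moves a block into or
# out of DRAM is counted once; the exact multiplier is implementation
# dependent, so treat the tally as illustrative.

BLOCK_KB = 4

def rmw_dram_traffic_kb(block_kb: int = BLOCK_KB) -> int:
    traffic = 0
    traffic += block_kb      # new data lands in DRAM from the network/app
    traffic += block_kb * 2  # old data read from SSD: DMA write + CPU read for XOR
    traffic += block_kb * 2  # old parity read from SSD: DMA write + CPU read for XOR
    traffic += block_kb      # CPU reads the new data as an XOR operand
    traffic += block_kb * 2  # new parity: CPU write + DMA read out to SSD
    traffic += block_kb * 2  # new data staged for the drive: CPU read + DMA read
    return traffic

print(rmw_dram_traffic_kb())  # 40 (KB) for a 4KB block, i.e. the ~10x in the slide
```

Under these assumptions a 4KB logical write turns into 40KB of DRAM traffic, which is where the slide's ~10x amplification comes from.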
[Figure: RAID 5 full stripe write memory overhead — the CPU XORs data blocks D0–D3 in DRAM to produce parity P; the DRAM throughput required to compute parity for a 4+1 disk RAID with a 128K segment is 1792KB (14x)]
DRAM Throughput for RAID Write at 1 GB/s (GB/s; Source: KIOXIA)

Number of Parities   Full stripe write   50% full stripe write
1                    4.49                6.93
2                    8.98                13.86
4                    17.96               27.72
How xPUs Leverage RAID
[Figure: two storage service stacks compared. Left: a conventional xPU stack (storage pool software, EC/RAID driver, NVMe-oF protocol, NVMe™ driver) exposing an NVMe-oF virtual device and computing XOR in system DRAM. Right: an offloaded stack backed by a CMB memory pool, where each attached SSD contributes its own CMB, XOR engine, and DMAC]
• KIOXIA NVMe SSD features:
– Controller memory buffers (CMB) to offload DRAM
– Exclusive OR (XOR) engine to compute up to 8 parities
– Direct memory access controller (DMAC) to place data in host address space (including remote CMB)
• RAID Offload enables parallel compute and linear scaling
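The parity math that the SSD's XOR engine performs is simple to state in software. The sketch below is a host-side illustration of the computation only, not the device interface: the drive computes this inside its controller so the operands never have to transit host DRAM.

```python
# Hedged sketch: the byte-wise XOR parity that the SSD's XOR engine
# computes, shown as plain Python. P = D0 ^ D1 ^ ... ^ Dn, and any one
# lost block is recoverable by XORing the parity with the survivors.

def xor_parity(blocks: list[bytes]) -> bytes:
    assert blocks and all(len(b) == len(blocks[0]) for b in blocks)
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

d = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
p = xor_parity(d)
# Recover d[0] from the parity and the surviving blocks:
assert xor_parity([p, d[1], d[2]]) == d[0]
```

Because XOR is associative and commutative, each drive can compute its share of the parity independently, which is what makes the per-SSD offload scale with drive count.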
How xPUs Can Leverage RAID – Sample Command Flow

RAID 5 read / modify / write sequence (new data arrives from the NVMe-oF virtual device via the RAID driver):
Step 1 – Data In: move the new data into the CMB; read the old data from the data SSD and the old parity from the parity SSD into the CMB
Step 2 – Compute: issue a parity compute command; the SSD's XOR engine calculates the new parity
Step 3 – Write: write the new data to the data SSD (SSDd) and the new parity to the parity SSD (SSDp)

NVMe and NVMe-oF are trademarks of NVM Express, Inc.
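The three-step sequence can be sketched with the standard RAID 5 parity-update identity. This is a hedged, host-side model using byte strings in place of CMB buffers; the buffer names are illustrative, not part of any device API.

```python
# Hedged sketch of the RAID 5 read-modify-write flow above, using plain
# byte strings in place of CMB buffers. The parity update identity is:
#   new_parity = old_parity ^ old_data ^ new_data

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Step 1 - Data In: stage the three operands (toy 4-byte "blocks")
old_data   = bytes([0x11, 0x22, 0x33, 0x44])
old_parity = bytes([0xAA, 0xBB, 0xCC, 0xDD])
new_data   = bytes([0x55, 0x66, 0x77, 0x88])

# Step 2 - Compute: the XOR engine would produce the new parity
new_parity = xor(xor(old_parity, old_data), new_data)

# Step 3 - Write: new_data and new_parity go to their SSDs.
# Sanity check: subtracting the new data's contribution from the new
# parity leaves the same residue as subtracting the old data's
# contribution from the old parity, so the stripe stays consistent.
assert xor(new_parity, new_data) == xor(old_parity, old_data)
```

Only the one data block and one parity block are touched, which is why a partial stripe write needs just two reads and two writes regardless of stripe width.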
Why Should xPUs Leverage SSD RAID Offload?
[Figure: the storage service stack (storage pool software, EC/RAID driver, NVMe-oF protocol, CMB memory pool, NVMe driver) exposing an NVMe-oF virtual device, with parity compute fanned out across the attached SSDs rather than centralized in the xPU]
xPUs can leverage their own accelerators, but why offload to SSD?
• Accelerator performance is fixed by design-time considerations
• High memory bandwidth requirements increase the cost of xPUs
• SSD offload scales linearly with every SSD added to the cluster

With offload…
• Save compute and memory bandwidth for value-add storage functions
• Throwaway operations like data scrubbing can be offloaded to the SSD, reducing data movement for scrubbing by 99%
• Develop cost-effective data processing systems and solutions
• xPUs can scale the RAID solution by leveraging their Remote Direct Memory Access (RDMA) capabilities
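The scrubbing data-movement claim follows from a simple observation: host-side scrubbing must pull every stripe across PCIe to verify parity, while an offloaded scrub verifies inside the drives and returns only status. The sketch below is a back-of-the-envelope model with assumed stripe and status-record sizes, not measured behavior.

```python
# Hedged model of the scrubbing offload: host-side scrubbing moves every
# stripe over the bus; offloaded scrubbing moves only a small per-stripe
# status record. STRIPE_KB and STATUS_BYTES are illustrative assumptions.

STRIPE_KB = 512          # assumed stripe size
STATUS_BYTES = 64        # assumed per-stripe status record

def host_scrub_traffic_kb(stripes: int) -> float:
    return stripes * STRIPE_KB              # all stripe data crosses the bus

def offload_scrub_traffic_kb(stripes: int) -> float:
    return stripes * STATUS_BYTES / 1024    # only status crosses the bus

n = 100_000
reduction = 1 - offload_scrub_traffic_kb(n) / host_scrub_traffic_kb(n)
print(f"data movement reduction: {reduction:.2%}")
```

Under these assumptions the reduction exceeds 99%, consistent with the slide's figure; the exact percentage depends on the stripe and status sizes chosen.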
Dell is a registered trademark and PowerEdge is a trademark of Dell Inc. Xeon is a registered trademark of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
RAID Offload Proof of Concept (PoC) results on CPU
System: Dell® PowerEdge R650xs, Xeon® Gold 6338N 2.2GHz (2 sockets, 32 cores), PCIe® 4.0; SSDs: 5x KIOXIA CM7 Gen4 (1.92TB)
IO workload: FIO 512K random write @ 950MB/s
RAID Offload PoC Results (KIOXIA CM7 Gen4 x4, mdraid5, CPU attached)

Metric                    mdRAID 5   RAID Offload   Benefit
CPU Utilization           42         37             12% reduction
DRAM Bandwidth (MiB/s)    3450       340            91% reduction

Data Scrubbing PoC Results (CPU attached)

Metric                    Offload Disabled   Offload Enabled
Scrubbing Time            129s               91s
DRAM Bandwidth            10.24 GB/s         1.43 GB/s
Total CPU Utilization     99.5%              ~70%
L3 Cache Misses           14.7M              4M
Total PCIe® Write         3694 MB/s          159 MB/s
Additionally, KIOXIA is exploring offload functions beyond RAID Offload.
xPU and SSD can team up to build a
cost-effective storage services solution.
For more information,
read our RAID Offload brief.
For more details,
visit KIOXIA FMS Booth #307.