Revolutionary Storage
For Modern Applications
Sanjay Sabnis
@sabhub1
Big Data Science Meetup
@Paviliondata
05/25/2018
Agenda
• Welcome
• Big Data Application Demands
• Next Generation Storage for Big Data Applications
• Big Data Use Cases
It is About Data
Expectations
Speed
Performance
Latency
Accuracy
Solving the Scalability Problem
RACK
Adding More Nodes
Add More memory
Upgrade Networking
• Need more storage? Add more nodes ↔ Comes with compute as well
• Under-utilization of storage - Islands of storage still exist
• Limited by Rack Level Data Management at Scale
• Network is not up to date to utilize new features of hardware
Infrastructure 2.0 (Connectivity): How do we connect the world?
Infrastructure 3.0 (Cognition): How do we make sense of the world?
Crossing the Chasm
AI
Autonomous Vehicles
Image Recognition
Von Neumann
Time to rethink infrastructure, tooling, and
development practices.
Modern Applications Requirements
• Compute/Network/Storage
• Rack Awareness/DC Awareness – Data Locality/HA
• Master/Slave - Scalable
• Master-less - Scalable
• IOPS, Bandwidth – High Data Transfer Rates – 25/50/100 Gb Ethernet is standard now.
• Storage Awareness? – This is something new!
• Non-compute Centric Data Management
The Compute & Storage Disconnect
• Compute and Storage Age Differently
• Compute has Moore’s Law, what about storage?
• Replacing Compute calls for replacing disks – Fixed Density, more $$$
• We all have been using SATA drives
• There is a new interface for SSDs called NVMe (Non-Volatile Memory Express). It is a logical device interface specification.
Storage Protocol Differences
SATA NVMe
End of Freeway
Comparing NVMe to SATA
Metric                       SATA SSD   NVMe SSD   NVMe Difference
Read BW (MB/s)               500        3300       6.6X
4K Read IOPS                 64K        830K       13X
Write BW (MB/s)              475        2100       4.4X
4K Write IOPS                5K         200K       40X
4K Mixed IOPS (70:30 R:W)    11K        550K       55X
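The "NVMe Difference" column is the NVMe figure divided by the SATA figure. A minimal sketch that recomputes it from the numbers listed on this slide (the dictionary and function names are illustrative, not from the deck). Note that the mixed-IOPS row works out to 50X from the listed 11K and 550K, while the slide states 55X; which of the three figures is off cannot be determined from the slide alone.

```python
# Figures copied from the comparison slide as (SATA SSD, NVMe SSD).
# IOPS values are in thousands (K); bandwidth is in MB/s.
METRICS = {
    "Read BW (MB/s)": (500, 3300),
    "4K Read IOPS (K)": (64, 830),
    "Write BW (MB/s)": (475, 2100),
    "4K Write IOPS (K)": (5, 200),
    "4K Mixed IOPS 70:30 (K)": (11, 550),
}


def speedup(sata, nvme):
    """Return how many times larger the NVMe figure is than the SATA figure."""
    return nvme / sata


def speedups(metrics=METRICS):
    """Map each metric name to its NVMe/SATA ratio, rounded to one decimal."""
    return {name: round(speedup(s, n), 1) for name, (s, n) in metrics.items()}


if __name__ == "__main__":
    for name, ratio in speedups().items():
        print(f"{name:26s} {ratio:5.1f}X")
```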
DATA INTENSIVE WORKLOADS
Analytics, IoT, Streaming Media, AI/ML, Databases
NVMe-oF SSD Array
• High Performance
• Cost Efficient
• High Utilization
• Scalable (14 TB to 1 PB)
• HA
• PaYG Model (Pay as You Grow)
NVMe-Based Storage for Big Data
Pavilion All-NVMe Storage Array
Advantages for Big Data Deployments
Reduce per-rack costs up to 72%
Improve Storage Utilization 2X+
Free up stranded capacity residing on DAS
Management Flexibility
Less raw storage deployed lowers IT Admin Costs
Move data sets from one server to another without copying
Reduce Infrastructure
Fewer servers required, or consolidate more DB instances per server
Eliminate DAS SSDs
Leverage Full-Performance, Space-Efficient Copies
Performance (Latency & Bandwidth) of Direct Attached Storage
Serviceability and Data Management of Shared Storage
DAS
Performance and Cost Advantages
Index More Data With Splunk
Lower Costs of
noSQL Deployments
Using networked Pavilion storage instead of direct-attached SSDs gives better performance per
server, allowing you to reduce server count and size, plus gain the cost advantages of a SAN
Pavilion All-NVMe Storage for Big Data
PERFORMANCE: 120 GB/s
MODULAR: Up to 40 x 100GbE ports
CAPACITY: 14 TB – 1 PB
RESILIENCY: Up to 20 active-active controllers
DENSITY: 4 RU
DATA MANAGEMENT: RAID-6, snapshots, thin provisioning
100% STANDARDS COMPLIANT: NVMe & NVMe-oF
STANDARD OFF-THE-SHELF COMPONENTS: x86, 2.5” NVMe SSD
DISRUPTIVE ECONOMICS: 1/10th $ cost/IOPS
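As a rough sanity check on the bandwidth and port figures above, the sketch below converts Ethernet line rate to bytes per second. It assumes raw line rate with no encoding or protocol overhead, so real throughput per port would be somewhat lower; the function names are illustrative.

```python
import math

BITS_PER_BYTE = 8


def port_line_rate_gbytes(port_gbit=100):
    """Raw line rate of one Ethernet port in GB/s (overhead ignored)."""
    return port_gbit / BITS_PER_BYTE


def ports_needed(target_gbytes, port_gbit=100):
    """Minimum number of ports whose raw line rate covers the target GB/s."""
    return math.ceil(target_gbytes / port_line_rate_gbytes(port_gbit))


def aggregate_line_rate_gbytes(ports, port_gbit=100):
    """Combined raw line rate of several ports in GB/s."""
    return ports * port_line_rate_gbytes(port_gbit)


if __name__ == "__main__":
    print(port_line_rate_gbytes())         # one 100GbE port
    print(ports_needed(120))               # ports needed for 120 GB/s
    print(aggregate_line_rate_gbytes(40))  # all 40 ports together
```

Under these assumptions one 100GbE port carries 12.5 GB/s, so the quoted 120 GB/s needs at least 10 ports, and 40 ports offer 500 GB/s of raw network capacity, well above the array's stated throughput.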
Shared Block Storage For Big Data Applications
• Hosts connected using 25/40/50/100Gb Ethernet
• NVMe block storage presented to host servers using community/standard NVMeoF driver
• No custom host software required
• 10s of micro-second latency
• Latency of DAS SSDs
• Full HA capability and hot-pluggable components
Thin-Provisioned
NVMe volumes
presented to the
host server
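With the standard Linux NVMe-oF host driver, a remote volume is typically attached with the `nvme connect` command from nvme-cli. The sketch below only builds that command line and does not run it. The target address, the subsystem NQN, and the choice of the `rdma` transport are assumptions for illustration; the deck does not give Pavilion's actual values.

```python
import shlex


def nvme_connect_command(traddr, nqn, transport="rdma", trsvcid="4420"):
    """Build an nvme-cli 'connect' command as an argument list.

    traddr    -- target IP address (hypothetical in the example below)
    nqn       -- NVMe Qualified Name of the subsystem (hypothetical below)
    transport -- fabric transport; 'rdma' is an assumption here
    trsvcid   -- transport service ID; 4420 is the usual NVMe-oF port
    """
    return [
        "nvme", "connect",
        "-t", transport,
        "-a", traddr,
        "-s", trsvcid,
        "-n", nqn,
    ]


if __name__ == "__main__":
    # Placeholder values only: 192.0.2.10 is a documentation address.
    cmd = nvme_connect_command("192.0.2.10", "nqn.2018-05.com.example:volume1")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Once connected, the volume appears on the host as an ordinary NVMe block device, which is why no custom host software is needed.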
Management Integration
Rest API
Use Cases
Cassandra
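[Diagram: three Cassandra (C*) nodes, numbered 1 to 3. Each node gets its own set of volumes: a = Commit Log, b = Data, c = Log. Volumes for Node1 are a1, b1, c1; volumes for Node2 are a2, b2, c2.]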
Rack Scale
• Dense Compute Rack
• Easy to Add or Replace nodes
• Integrates into DevOps using Rest API
• Thin provisioning to save flash resources
• Increase Volume Size Dynamically
• Manage instant data copies using Rest API
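The per-node volume layout on this slide (a = commit log, b = data, c = log) can be sketched as a small mapping. `commitlog_directory` and `data_file_directories` are real `cassandra.yaml` settings; the `/mnt/...` mount points and the `log_volume` key are hypothetical names chosen for illustration, since Cassandra's log location is not set in `cassandra.yaml`.

```python
def volumes_for_node(node_id):
    """Return the volume layout for one Cassandra node.

    Follows the slide's naming: a<N> = commit log, b<N> = data, c<N> = log.
    The /mnt/... mount points are hypothetical.
    """
    return {
        # Real cassandra.yaml keys:
        "commitlog_directory": f"/mnt/a{node_id}",
        "data_file_directories": [f"/mnt/b{node_id}"],
        # Not a cassandra.yaml key; illustrative label for the log volume:
        "log_volume": f"/mnt/c{node_id}",
    }


if __name__ == "__main__":
    for node in (1, 2, 3):
        print(node, volumes_for_node(node))
```

Keeping the commit log and data files on separate volumes is a common Cassandra practice, and with networked volumes a replacement node can be given the same three volumes instead of being rebuilt from replicas.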
~ 1PB Storage
Snapshot/Clone
Data Backup/Restore
Rack 1 Rack 2
Adding a New Shard
• Adding Shard to the Cluster
• Add shard to scale the MongoDB cluster horizontally
• Affects the balance of chunks among the shards of a cluster for all
existing sharded collections.
• The balancer will begin migrating chunks so that the cluster will
achieve balance
• Rebalance will affect existing Read/Write and IOPS performance.
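In MongoDB, a replica set is added as a shard by running the `addShard` command against the `admin` database through a `mongos` router. The sketch below only builds the command document; the replica set name and host names are hypothetical.

```python
def add_shard_command(replica_set, hosts):
    """Build the MongoDB addShard command document.

    The shard is identified as '<replicaSetName>/<host:port>,<host:port>,...'.
    """
    if not hosts:
        raise ValueError("at least one replica set member is required")
    return {"addShard": f"{replica_set}/{','.join(hosts)}"}


if __name__ == "__main__":
    # Hypothetical names for the third shard and its members.
    cmd = add_shard_command(
        "rs3",
        ["mongo3a.example:27018", "mongo3b.example:27018", "mongo3c.example:27018"],
    )
    print(cmd)
    # With PyMongo installed and a mongos reachable, this document could be
    # passed to MongoClient(...).admin.command(cmd).
```

Once the shard is added, the balancer starts migrating chunks on its own, which is the extra read/write load the bullets above describe.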
[Diagram: app servers route to three shards. Shard 1 (Replica Set 1), Shard 2 (Replica Set 2), and the new Shard 3 (Replica Set 3) each consist of one PRIMARY and two SECONDARY nodes. Individual volumes are presented for each node.]
Speed Shard Rebalancing
• Pavilion Advantages
• No sizing activity required.
• The number of parallel chunk migrations has no impact: all get the same IOPS with 40 ports
• Pre-configure Pavilion volumes for future shard expansion to automate the scaling activity
• Over-provision the volume size to sustain IOPS performance as data grows
MongoDB - Leveraging Snapshots and Clones
[Diagram: a PRODUCTION replica set (PRIMARY plus SECONDARY nodes) with point-in-time instant Pavilion snapshots used for Backup/Archive. Pavilion instant clones are used in two ways: to scale the replica set (a cloned SECONDARY joins through replication) and to spin up DEV/QA/PREPROD replica sets quickly.]
• Scale MongoDB infrastructure without downtime
• Rapid volume cloning capabilities allow for new backup and deployment strategies
• Instant cloning makes node recovery and replacement easy
Reduce Splunk Indexer Sprawl
PAVILION DATA CONFIDENTIAL & PROPRIETARY
Splunk bucket   Storage tier
HOT             Tier 1 - $$$$
WARM            Tier 2 - $$$
COLD            Tier 3 - $$
FROZEN          Tier 4 - $
Backup: Read-Only Snapshots
QA/Dev/PreProd Testing: R/W Clones
Consolidate All Splunk Data on One High-Speed Storage Platform, Simplify Backup and Copy Management
Addressing Splunk Challenges
Splunk Solution Design Considerations
• Insufficient disk I/O is the most common limitation in Splunk infrastructure.
  Pavilion delivers over 100 GB/s of bandwidth and 20 million IOPS from a compact 4U chassis, which can power even the largest Splunk deployments.
• Review the disk subsystem requirements before provisioning your hardware.
  Pavilion’s scalable platform allows you to focus on the needs of the compute infrastructure instead of storage.
• More disks (specifically, more spindles) are better for indexing performance.
  Pavilion’s low-latency storage platform eliminates storage as the indexing bottleneck.
• Total throughput of the entire system is important.
  Pavilion delivers significant improvements in performance and improves decision times.
• The ratio of disks to disk controllers in a particular system should be higher, similar to how you provision a database host.
  Pavilion’s performance and capacity allow for easy storage configuration.
• Hot buckets cannot be backed up.
  Take a backup of any volume at any time without performance overhead on indexing nodes by using the Pavilion Snapshot feature.
Modernize Database Deployments
• Simplify Infrastructure by disaggregating storage into a centralized, rack-scale appliance
• Leverage shared storage resources at the speed and latency of local SSDs
• Reduce raw flash required
• Independently scale compute, networking and storage to maximize flexibility
• Move to ‘storage-less’ 1U servers to increase compute density per rack
• Centralize storage resources to facilitate easy backup and restore
• Instantly deploy new copies of the database for test/dev/QA purposes
DENSE
Compute CLUSTER
Other Use Cases
…………
New Data Architectures
Centralized Logging
“We are a log management company that happens to stream videos”
- Netflix Chief Architect
Log Monitoring/Forwarding/….
No Log Forwarding from each Node
Save CPU Cycles
Container Architecture - Cloud
• Fits into Kubernetes or OpenStack implementations
• Integrate the Pavilion REST API with the Cinder wrapper provided by Pavilion
• Storage can be used as Static or Dynamic Volume provisioning
• Fits readily into DevOps CI/CD setup with provided REST API interfaces
• Utilize the Pavilion Snapshot, Clone and volume migration features to manage data beyond lifecycle of the virtual image
• Supports Block Storage, NFS (S3 support in near future).
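For the dynamic provisioning case in Kubernetes, an application requests storage through a PersistentVolumeClaim that names a storage class. A minimal sketch that builds such a manifest as JSON, which `kubectl apply -f` accepts alongside YAML. The claim name and the storage class name `pavilion-nvme` are hypothetical; the deck does not name the class its CSI wrapper would expose.

```python
import json


def pvc_manifest(name, size_gi, storage_class):
    """Build a Kubernetes PersistentVolumeClaim manifest as a dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Block volumes are normally mounted read-write by one node.
            "accessModes": ["ReadWriteOnce"],
            # Hypothetical class name in the example below.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


if __name__ == "__main__":
    manifest = pvc_manifest("mongo-data-0", 500, "pavilion-nvme")
    print(json.dumps(manifest, indent=2))
```

With static provisioning, an administrator would instead create the PersistentVolume objects ahead of time and claims would bind to them.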
[Diagram: a Kubernetes cluster in the datacenter (Docker containers in pods, persistent volumes through a CSI wrapper) and OpenStack (Nova for boot/launch, Keystone for authentication, Cinder block storage volumes), both backed by the rack-scale flash array.]
Docker’s Containers-as-a-Service (CaaS) platform can run atop cloud-based infrastructure such as OpenStack, or on bare-metal infrastructure, providing complete application lifecycle management for container deployments.
HiBD (High-Performance Big Data)
• NVMe-oF opens up opportunity for commoditizing the HiBD
• RDMA + NVMe = Killer IOPS & Bandwidth
• Lots of Development has been done using RDMA-based HiBD
Apache Crail - Incubating
Pavilion - 120 GB/S
With DAS Latency
Crail is designed from ground up
for modern high-performance
networking and storage
hardware (RDMA, NVMe, NVMf,
etc.). It leverages user-level I/O to
access hardware directly from
the application context, providing
bare-metal I/O performance to
analytics workloads.
Storage Awareness
Revolutionary Storage for Modern Databases, Applications and Infrastructure
