JOSHUA ROBINSON
FLASHBLADE ENGINEERING, PURE STORAGE
From Big Data To Big Intelligence:
Spark Meets FlashBlade
A Pure Engineering Use Case
© 2017 PURE STORAGE INC.
ALL-FLASH STORAGE FOR
DATA-INTENSIVE COMPUTING
FLASHBLADE FOR BIG DATA ANALYTICS
FAST DATA x BIG DATA x AGILE DATA = DATA ADVANTAGE
TRADITIONAL DATA ANALYTICS EXAMPLE
SALES FEED/PIPELINE
CRM
ENGINEERING TICKETS
EXTRACT
AGGREGATE
ANALYTICS PARAMETERS
PRODUCT LOGS
RAW LOG STORAGE
GREP, AWK, ETC
STORAGE
COMPUTE
MODERN BIG DATA ANALYTICS EXAMPLE
SALES FEED/PIPELINE
CRM
ENGINEERING TICKETS
PRODUCT LOGS
EXTRACT
AGGREGATE
INFRASTRUCTURE OF BIG
DATA WAREHOUSES
>6PBs ACROSS 100s OF
HETEROGENEOUS DATA SILOS
BIG DATA WAREHOUSES

INFRASTRUCTURE
Compute
Storage
BIG FAST SIMPLE
INTRODUCING FLASHBLADE™
ALL-FLASH FILE AND OBJECT STORAGE
BIG 
Up to 8 PBs
FAST 
75 GBps / 8M IOPS
SIMPLE 
Seamlessly scalable 
BLADE
 PURITY
 FABRIC
Automating triage of test
failures in SW development
A Pure Engineering Use Case
THE PROBLEM
Handful
1 Test coordinator (Jenkins)
Handful
Handful
100s of tests
THE PROBLEM
1,000 test failures
20,000+ tests / day
20 Engineers
2x in the next 12 months
1,000+ VMs
120+ FBs
20+ Jenkins
400+ clients
100+ Engineers
THE DREAM
1.  Automate triaging of failures as much as possible
2.  Extract performance metrics from the logs
3.  Save our logs for future use
4.  Do all of this in a scalable system
5.  Real-time results!
OUR DATA ANALYTICS PIPELINE
[Diagram labels: 10 FB, 20 clients, 100+ tests, rsyslog]
OUR DATA ANALYTICS PIPELINE
[Diagram labels: 100 FB, 200 clients, 1,000+ tests, rsyslog]
OUR DATA ANALYTICS PIPELINE
[Diagram labels: 100 FB, 200 clients, 1,000+ tests, rsyslog]
OUR DATA ANALYTICS PIPELINE
[Diagram labels: 120+ FB, 400+ clients, 4,000+ tests, rsyslog]
OUR DATA ANALYTICS PIPELINE
[Diagram labels: 1,000+ VMs, 120+ FBs, 20+ Jenkins, 400+ clients, 20,000+ tests, rsyslog, 18T, 6T, 6G]
Custom code
✓ Duplicate bug
✓ Infrastructure failure
✓ Performance regression
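
The three triage outcomes listed on this slide could, for illustration, be produced by signature rules applied to failure lines pulled from the logs. The sketch below is only an assumption about how such custom code might look: the regular expressions, the `BUG-1` identifier and the sample log lines are hypothetical and do not come from the talk.

```python
import re

# Hypothetical rules: each triage category maps to regex patterns that
# would have to be tuned against real test logs.
RULES = {
    "infrastructure failure": [
        re.compile(r"connection (refused|timed out)", re.I),
        re.compile(r"no space left on device", re.I),
    ],
    "performance regression": [
        re.compile(r"latency .* exceeded", re.I),
        re.compile(r"throughput below", re.I),
    ],
}


def classify(log_line, known_bug_signatures):
    """Return a triage label for one failure line, or 'needs triage'."""
    # A failure whose text matches an already-filed bug is a duplicate.
    for bug_id, pattern in known_bug_signatures.items():
        if pattern.search(log_line):
            return f"duplicate bug ({bug_id})"
    # Otherwise fall back to the category rules, in declaration order.
    for label, patterns in RULES.items():
        if any(p.search(log_line) for p in patterns):
            return label
    return "needs triage"


# Hypothetical signature of a bug that has already been filed.
known = {"BUG-1": re.compile(r"assertion failed: blade_count", re.I)}

print(classify("ERROR assertion failed: blade_count == 15", known))
print(classify("ssh: connect to host vm42: Connection refused", known))
print(classify("p99 latency 12ms exceeded budget 5ms", known))
```

Anything that matches no rule is returned as `needs triage`, so an engineer only looks at failures the rules cannot explain.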
Processed: 18 TB, 30 billion events per day
Extracted: 6 GB, 8 million events per day
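
To put these daily totals in perspective, the short calculation below converts them into average rates and reduction ratios. It uses only the four figures on the slide. Decimal units (1 TB = 10^12 bytes) and an even spread over 24 hours are assumptions, so real peak rates would be higher than these averages.

```python
# Figures from the slide (decimal units assumed: 1 TB = 10**12 bytes).
processed_bytes = 18 * 10**12   # 18 TB per day
processed_events = 30 * 10**9   # 30 billion events per day
extracted_bytes = 6 * 10**9     # 6 GB per day
extracted_events = 8 * 10**6    # 8 million events per day
SECONDS_PER_DAY = 24 * 60 * 60

# Average rates, assuming the load is spread evenly over the day.
avg_ingest_mb_per_s = processed_bytes / SECONDS_PER_DAY / 10**6
avg_events_per_s = processed_events / SECONDS_PER_DAY

# How much the pipeline shrinks the data it reads.
byte_reduction = processed_bytes / extracted_bytes
event_reduction = processed_events / extracted_events

print(f"average ingest:  {avg_ingest_mb_per_s:.0f} MB/s")
print(f"average events:  {avg_events_per_s:,.0f} per second")
print(f"byte reduction:  {byte_reduction:,.0f}x")
print(f"event reduction: {event_reduction:,.0f}x")
```

Under these assumptions the pipeline reads roughly 208 MB/s on average and keeps about one event in every 3,750.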
THE POWER OF DATA ANALYTICS
20,000+ tests -> 1,000 test failures
20,000+ tests -> Data Analytics Pipeline -> ~30 distinct test failures
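
One way a pipeline could collapse many raw failures into a few distinct ones is to normalize each failure message into a signature and group by it. The sketch below illustrates that idea only. The normalization rules, function names and sample messages are hypothetical, and the talk does not say that the Pure pipeline works this way.

```python
import re
from collections import defaultdict


def signature(message):
    """Normalize a failure message so repeats of one bug collapse together."""
    sig = message.lower()
    sig = re.sub(r"0x[0-9a-f]+", "<hex>", sig)  # addresses, hashes
    sig = re.sub(r"\d+", "<n>", sig)            # counters, ids, timings
    return re.sub(r"\s+", " ", sig).strip()


def group_failures(failures):
    """Map each distinct signature to the list of tests that hit it."""
    groups = defaultdict(list)
    for test_name, message in failures:
        groups[signature(message)].append(test_name)
    return dict(groups)


# Hypothetical failure messages from three tests.
failures = [
    ("test_a", "Timeout after 30s waiting for blade 7"),
    ("test_b", "Timeout after 45s waiting for blade 12"),
    ("test_c", "Segfault at 0x7ffe12ab in allocator"),
]
groups = group_failures(failures)
for sig, tests in groups.items():
    print(len(tests), sig)
```

Here three raw failures reduce to two distinct signatures, because the two timeouts differ only in their numbers.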
Shared Storage Benefits
SCALING
STORAGE
FLASHBLADE GUI
SCALING
COMPUTE
OUR DATA ANALYTICS PIPELINE
[Diagram labels: 1,000+ VMs, 120+ FBs, 20+ Jenkins, 400+ clients, 20,000+ tests, rsyslog, 18T, 6T, 4G, Custom code]
ANALYTICS PIPELINE
SCALING COMPUTE

1.  Download docker image
2.  Mount FlashBlade on container
3.  Hot-add to Spark cluster
ALL OF THIS CAN BE DONE IN A SINGLE COMMAND 
WITHOUT DISRUPTING YOUR SPARK JOBS!
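
The three steps above could be folded into a single `docker run` invocation, sketched here as a Python helper that only builds and prints the command. Several details are assumptions rather than facts from the talk: the FlashBlade file system is taken to be NFS-mounted on the host already, Spark is taken to run in standalone mode with its files under `/opt/spark` in the image, and the image name, master address and mount point are hypothetical.

```python
import shlex


def spark_worker_command(image, master_url, host_mount, container_mount="/data"):
    """Build a `docker run` command that starts one Spark standalone worker.

    Assumes the shared file system is already mounted on the host at
    `host_mount`; the bind mount exposes it inside the container.
    """
    return [
        "docker", "run", "--detach", "--network", "host",
        "--volume", f"{host_mount}:{container_mount}",
        image,  # docker pulls the image first if it is not cached locally
        # Foreground worker process; it registers with the running master.
        "/opt/spark/bin/spark-class",  # assumed Spark install path in the image
        "org.apache.spark.deploy.worker.Worker",
        master_url,
    ]


cmd = spark_worker_command(
    image="example/spark-worker:latest",     # hypothetical image name
    master_url="spark://spark-master:7077",  # hypothetical master address
    host_mount="/mnt/flashblade",            # hypothetical host mount point
)
print(shlex.join(cmd))
```

Because a standalone worker registers itself with the master when it starts, adding one this way does not require restarting the master or the jobs already running.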
OUR DATA ANALYTICS PIPELINE
[Diagram labels: 1,000+ VMs, 120+ FBs, 20+ Jenkins, 400+ clients, 20,000+ tests, rsyslog, 18T, 6T, 6G, Custom code]
INFRASTRUCTURE AGILITY
AD-HOC AND BURSTY ANALYTICS
rsyslog
1000-CORE SPARK CLUSTER
ON FLASHBLADE
INFRASTRUCTURE SIMPLICITY
⎯  Physical Consolidation
⎯  Density: Multiple racks to a single FlashBlade
⎯  Management Consolidation 
⎯  Non-disruptive upgrades
⎯  Storage capacity planning, data access, security
⎯  Backups and Restores
FLASHBLADE
File & Object
2.5 PBs (1:1) AND N+2 REDUNDANCY
Purity PLUS Pure1
17TB / 52TB BLADES
Power: 1150 Watt/PB
PERFORMANCE: 8M IOPs AND 75 GB/s
RESOURCES

Apache Spark White Papers:
1)  Guide to Supporting On-Premise Spark
Deployments with a Cloud-Scale Data Platform 
2) Engineering Unplugged: A Discussion with Pure
Storage's Brian Gold on Big Data Analytics for
Apache Spark 
Big Data Analytics: purestorage.com/analytics
FlashBlade Product Info: purestorage.com/flashblade
Storage for big-data by Joshua Robinson
