Dipti Borkar | Head of Product, Alluxio
Shailesh Manjrekar | Head of AI/ML Product and Solutions, SwiftStack
NextGen Data Analytics Stack – Alluxio and SwiftStack
Edge to Core to Cloud
Unstoppable Data Growth – Edge to core to cloud
Emphasis on capturing value and showing return on investment (ROI) from data
Value capture is key
By 2020:
30B IoT connected devices*
100-250EB big data storage capacity*
44ZB data created*
* Source: IDC Worldwide Storage in Big Data Forecast, 2015-2019, October 2015, and IDC Directions
Status quo - Existing solutions
Business leaders and storage architects struggling to show return on investment
DAS silos: SILO 1, SILO 2, SILO 3, SILO 4, SILO 5 (Compute + Storage in each silo)
Poor Utilization
DIY Fatigue
High CapEx
High OpEx
Data Gravity
INEFFICIENT AND EXPENSIVE
4 big trends driving the need for a new data analytics stack
Separation of Compute & Storage
Hybrid – Multi cloud environments
Self-service data across the enterprise
Rise of the object store
Customer Challenges with existing solutions
Lack of enterprise-ready products and continued pressure from ever-increasing cloud OpEx
CHALLENGE 1: Ever-increasing operating expenditures on existing (a) poorly utilized DAS solutions or (b) cloud storage deployments at scale
CHALLENGE 2: Need for a high-throughput stack with API compatibility to support batch, interactive, and advanced analytical workloads
CHALLENGE 3: Lack of enterprise-ready, multi-cloud data lake systems: at-scale deployments with lifecycle management, self-healing, geo-replication, and faster rebuilds
Data Ecosystem - Beta
Data Ecosystem 1.0
COMPUTE
STORAGE STORAGE
COMPUTE
Big data journey and innovation options for enterprises
1. Co-located: compute & HDFS on the same cluster (MR / Hive on HDFS)
§ Typically compute-bound clusters over 100% capacity
§ Compute & I/O need to be scaled together even when not needed
2. Disaggregated: compute & HDFS on separate clusters (Hive on HDFS)
§ Compute & I/O can be scaled independently, but I/O is still needed on HDFS, which is expensive
3. HDFS for Hybrid Cloud: burst HDFS data in the cloud, public or private
4. Support more frameworks: support Presto, Spark and other computes without app changes
5. Transition to Object store: enable & accelerate big data on object stores
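The scaling point on this slide can be illustrated with a little arithmetic. The sketch below is only an illustration: the function names and every workload figure (cores, terabytes, node sizes) are hypothetical and do not come from the deck.

```python
import math


def nodes_colocated(cores_needed, storage_tb_needed, cores_per_node, tb_per_node):
    """Co-located cluster: every node carries compute and storage,
    so the node count is set by whichever resource runs out first."""
    return max(
        math.ceil(cores_needed / cores_per_node),
        math.ceil(storage_tb_needed / tb_per_node),
    )


def nodes_disaggregated(cores_needed, storage_tb_needed,
                        cores_per_compute_node, tb_per_storage_node):
    """Disaggregated cluster: compute and storage tiers are sized separately.
    Returns (compute_nodes, storage_nodes)."""
    return (
        math.ceil(cores_needed / cores_per_compute_node),
        math.ceil(storage_tb_needed / tb_per_storage_node),
    )


# Hypothetical compute-bound workload: 960 cores but only 400 TB of data.
colocated = nodes_colocated(960, 400, cores_per_node=32, tb_per_node=40)
compute_nodes, storage_nodes = nodes_disaggregated(960, 400, 32, 40)

# Storage capacity bought but not needed in the co-located design.
idle_storage_tb = colocated * 40 - 400

print(colocated, compute_nodes, storage_nodes, idle_storage_tb)
```

With these made-up inputs the co-located cluster is sized by compute alone, so it carries far more disk than the data requires, while the disaggregated design buys only the storage nodes it needs.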
Multi-cloud storage and Data Management
Interfaces: Java File API, HDFS Interface, S3 Interface, REST API, FUSE Interface
Drivers: HDFS Driver, Swift Driver, S3 Driver
The SwiftStack Data Analytics Solution with Alluxio
Accelerated Compute, Data accessibility and Elasticity
SwiftStack Data Analytics Solution – business use cases
Customer, Security and Fraud Analysis
Precision Medicine and Bio-Informatics
Customer Churn / Sentiment Analysis
Analytics as a Service, Operational Analytics
Internet of Things / Everything
Financial Services, FSI
Healthcare and Life Sciences, Genomics
Cloud Service Providers
Oil and Gas, Industrial Internet and Manufacturing
Media and Entertainment
SwiftStack Data Analytics solution – Value to be Captured by Enterprises
Data and analytics as a source of competitive advantage
Source: IDC Directions
“Organizations that analyze all relevant data and deliver actionable information will achieve extra $430B in productivity gains over less analytically oriented peers by 2020”
IDC: Worldwide Big Data and Analytics 2016 Predictions
Value can be created in the following ways with some industry relevance:
✓ Improve operational efficiency
✓ Reduce cost
✓ New product development
✓ Insights into new services
✓ Better Customer Experience
Alluxio and SwiftStack partnership
Originated as the Tachyon project at UC Berkeley’s AMPLab by then Ph.D. student & now Alluxio CTO, Haoyuan (H.Y.) Li.
2014
2015
Open source project established & company to commercialize Alluxio founded
Goal: Orchestrate Data at Memory Speed for the Cloud for data-driven apps such as Big Data Analytics, ML and AI.
2018 20192018
Alluxio – Key innovations
Data Locality with Intelligent Multi-tiering: accelerate big data workloads with transparent tiered local data
Data Accessibility for popular APIs & API translation: run Spark, Hive, Presto, ML workloads on your data located anywhere
Data Elasticity with a unified namespace: abstract data silos & storage systems to independently scale data on-demand with compute
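One way to picture the unified namespace is as a mount table that maps paths in a single logical tree onto different under stores. The sketch below is a toy model, not Alluxio's API: the `MountTable` class, the mount points, the bucket name, and the namenode address are all hypothetical.

```python
class MountTable:
    """Toy unified namespace: logical path prefixes map to under-store URIs."""

    def __init__(self):
        self._mounts = {}  # logical prefix -> under-store base URI

    def mount(self, prefix, ufs_uri):
        # Normalise so that "/lake/" and "/lake" are the same mount point.
        self._mounts[prefix.rstrip("/") or "/"] = ufs_uri.rstrip("/")

    def resolve(self, path):
        """Translate a logical path into the URI of the backing store.
        The longest matching mount prefix wins."""
        best = None
        for prefix in self._mounts:
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        remainder = path[len(best):].lstrip("/")
        base = self._mounts[best]
        return f"{base}/{remainder}" if remainder else base


table = MountTable()
table.mount("/archive", "hdfs://namenode:8020/warehouse")  # hypothetical HDFS cluster
table.mount("/lake", "s3://analytics-bucket")              # hypothetical object store bucket

print(table.resolve("/lake/events/part-0.parquet"))
print(table.resolve("/archive/2019/orders.csv"))
```

An application only ever sees `/lake/...` and `/archive/...`; which storage system sits behind each prefix can change without touching the application's paths.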
[Figure: Alluxio Reference Architecture]
Components shown: Alluxio Master and Standby Master (Zookeeper / RAFT); Applications, each with an Alluxio Client; Alluxio Workers (RAM / SSD / HDD); WAN; Under Store 1 and Under Store 2
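The worker tiers in the reference architecture (RAM / SSD / HDD in front of an under store) can be modelled as a small read-through cache. This is a simplified sketch under stated assumptions: the `TieredWorker` class is hypothetical, capacities are counted in blocks, and the least-recently-used demotion policy is chosen for illustration rather than taken from Alluxio.

```python
from collections import OrderedDict


class TieredWorker:
    """Toy worker: blocks live in ordered tiers, fastest first.
    A miss in every tier falls through to the under store."""

    def __init__(self, under_store, capacities):
        # capacities: list of (tier_name, max_blocks), fastest tier first
        self.under_store = under_store
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def read(self, block_id):
        """Return (data, source), where source names the tier that served the read."""
        for name, _, blocks in self.tiers:
            if block_id in blocks:
                data = blocks.pop(block_id)
                self._insert(0, block_id, data)  # promote a hit to the top tier
                return data, name
        data = self.under_store[block_id]        # slow path, e.g. across the WAN
        self._insert(0, block_id, data)
        return data, "under store"

    def _insert(self, level, block_id, data):
        if level >= len(self.tiers):
            return  # dropped from the last tier; the under store still holds it
        _, cap, blocks = self.tiers[level]
        blocks[block_id] = data
        blocks.move_to_end(block_id)
        if len(blocks) > cap:
            # Demote the least recently used block to the next tier down.
            victim_id, victim = blocks.popitem(last=False)
            self._insert(level + 1, victim_id, victim)


store = {"a": b"A", "b": b"B", "c": b"C"}
worker = TieredWorker(store, [("RAM", 1), ("SSD", 1), ("HDD", 1)])
for block in ["a", "a", "b", "a"]:
    print(block, worker.read(block)[1])
```

The first read of a block pays the under-store cost; repeated reads are served from whichever local tier currently holds the block.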
“Infrastructure challenges are the primary inhibitor for broader adoption of AI/ML workflows. SwiftStack’s Multi-Cloud data management solution is first of its kind in the industry and effectively handles storage I/O challenges faced by edge to core to cloud, large scale AI/ML data pipelines”
Amita Potnis, Research Director at IDC’s Infrastructure System Platform and Technologies Group
Property of SwiftStack Inc. 15
Multi-Cloud Storage and Data Management
Storage and multi-cloud data management for
data-driven applications and workflows
SwiftStack
SwiftStack Storage: On-premises cloud storage
• Highest throughput performance
• Easy to deploy, operate, and scale
• From tens of terabytes to hundreds of petabytes
• Spans multiple geographic regions
• Proven platform to realize more value from data!
SwiftStack 1space: Multi-cloud data management
• Transparent access to a single storage namespace
• Public and private infrastructure
• Policy-driven data placement
• Metadata search across the namespace
• Leverage unique services across clouds!
SwiftStack Object Storage Architecture
Continuous Auditing, Automatic Replication, Fault Tolerant
• Automated storage system management for standard servers
• Masterless, quorum writes
• Nearest reads
• As-dispersed-as-possible data placement across nodes/zones/regions
• Distributed partitions in a modified consistent hash ring
Replicas, Erasure Codes
Direct-Attached Storage
Replication, Reconstruction, Auditing
Device Inventory Management, Storage System Metrics Collection, Hardware Fault Detection
Standard Servers, Drives & Networking
Site 1, Site 2, Site 3
SwiftStack Storage
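The hash-ring and quorum bullets on this slide can be sketched in a few lines. This is a minimal model in the spirit of partitioned hash rings, not SwiftStack's implementation: the partition power, the device names, the majority-quorum rule, and the round-robin placement are assumptions made for illustration (a real ring also weighs devices and spreads replicas across zones and regions).

```python
import hashlib
import struct

PART_POWER = 10  # assumption: 2**10 = 1024 partitions in this toy ring


def partition_for(object_path, part_power=PART_POWER):
    """Map an object path to a partition using the top bits of its MD5 hash."""
    digest = hashlib.md5(object_path.encode("utf-8")).digest()
    top_32_bits = struct.unpack(">I", digest[:4])[0]
    return top_32_bits >> (32 - part_power)


def replica_devices(partition, devices, replicas=3):
    """Toy placement: take consecutive devices starting at the partition's offset,
    so the replicas of one partition always land on distinct devices."""
    if replicas > len(devices):
        raise ValueError("need at least as many devices as replicas")
    start = partition % len(devices)
    return [devices[(start + r) % len(devices)] for r in range(replicas)]


def quorum_size(replicas):
    """Majority quorum: more than half of the replicas must acknowledge."""
    return replicas // 2 + 1


def write_succeeds(acks, replicas=3):
    """A masterless write is accepted once a quorum of replicas has acknowledged it."""
    return acks >= quorum_size(replicas)


# Hypothetical devices, one per site.
devices = ["site1-disk0", "site2-disk0", "site3-disk0"]
part = partition_for("/account/container/object.parquet")
print(part, replica_devices(part, devices), write_succeeds(acks=2))
```

Because every node can compute the partition from the object path alone, no central master is needed to locate data, and a write can complete even while one of three replicas is unavailable.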
SwiftStack Differentiation
Data Analytics Hub – Total Cost Of Ownership (TCO) Analysis
The 5-year TCO for the hosted private cloud solution is one quarter that of the same solution hosted on a public cloud
3 key use cases - SwiftStack Data Analytics Solution
Cloud native applications with SwiftStack Data Analytics solution
Cloud bursting with SwiftStack Data Analytics solution; compute in any public cloud
HDFS off-load to SwiftStack Data Analytics Solution
© 2016 Western Digital Corporation All rights reserved
1. SwiftStack Data Analytics solution – On-premise deployment
• Customers starting their on-premise analytic journeys
• Benefits
  • Same performance as HDFS
  • No more HDFS: operational simplicity
  • Compute can be fully virtualized / containerized!
  • Durability++ (Erasure Coding)
  • Scale (billions of objects / racks / Geo)
Dramatically speed up big data on object stores on premise
[Diagram: Alluxio in the same container / machine as each compute engine (Presto; Spark / Presto / Hive)]
2. Cloud bursting with SwiftStack Data Analytics solution
• Hybrid workflow
  • Customers hosting data on-premise and leveraging public cloud for economies of scale
  • Alluxio for data locality
• Benefits
  • Data as strategic asset stays on-premise
  • Leverage cloud economies of scale for compute
[Diagram: Hadoop cluster node with Compute (Spark, Presto, Hive, …) and Alluxio, addressed via “alluxio://”; WAN between Private Cloud and Public Cloud]
3. HDFS off-load to SwiftStack Data Analytics solution
• HDFS off-load
  • Existing HDFS customers on DAS looking to move to S3 need a migration
  • Leverage DistCp (distributed copy) as the data mover, then keep the same workflow using Alluxio
• Benefits
  • Known and well-understood process for administrators: existing HDFS workflows + rsync-like backup workflow
Enable big data on object stores across single or multiple clouds
[Diagram: Co-located environment (Impala, Hive, Spark); same data center / region: Presto and Spark, each with Alluxio]
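The DistCp step mentioned on this slide could be scripted roughly as follows. The sketch only assembles the command line: `hadoop distcp` and its `-update` option are standard Hadoop, but the helper name, the namenode address, and the bucket are hypothetical, and a real run would also need the S3A connector configured for the target S3 endpoint.

```python
import shlex


def build_distcp_command(source_uri, target_uri, update=True):
    """Assemble a `hadoop distcp` invocation that copies source_uri to target_uri.
    With update=True, -update copies only files that are missing or differ
    at the target, which suits repeated, rsync-like runs."""
    argv = ["hadoop", "distcp"]
    if update:
        argv.append("-update")
    argv += [source_uri, target_uri]
    return argv


cmd = build_distcp_command(
    "hdfs://namenode:8020/warehouse/events",    # hypothetical source on the DAS-backed HDFS
    "s3a://analytics-bucket/warehouse/events",  # hypothetical bucket on the S3-compatible store
)

# Print the command for review; pass `cmd` to subprocess.run on a Hadoop client to execute it.
print(shlex.join(cmd))
```

Once the data is in the object store, the existing jobs can keep their workflow by reading through Alluxio instead of directly from HDFS.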
Enterprises moving towards independent compute & storage
Customers
§ Come talk to us about analytics on SwiftStack
§ Data Analytics Solution: Alluxio and SwiftStack deliver a winning combination of performance and capacity – “deliver on the promise of a future ready data lake”
§ Multiple use cases across industry verticals show the success of a highly scalable, lowest-TCO big data solution
Get started with the Data Analytics Solution
SwiftStack Confidential
Questions?

Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with Disaggregated Compute and Storage
