Alluxio Product School
Alluxio and NetApp
Modern Data Architecture Requirements
• Resiliency
• Data Placement
• Scale
Joseph Kandatilparambil
Solutions Architect, StorageGRID Software Group, NetApp
Joseph leads the Big Data and Analytics strategy for StorageGRID and has been with the StorageGRID team for two years.
LinkedIn: https://www.linkedin.com/in/jkandatilparambil/

Michael Waldrop
Mike leads the Global Solutions Engineering team for Alluxio. He works with enterprises to solve complex data problems and modernize their Big Data platforms.
LinkedIn: https://www.linkedin.com/in/mikewaldrop
Object Use Cases Are Evolving
• Cold Archive
• Active Archive and Blob Store
• App Storage
• Data Streaming
• Analytics
• HPC
Object Storage Growth is Accelerating
• Massive unstructured data growth continues to drive adoption
• S3 adoption across workloads and applications has reached its tipping point
• Hybrid cloud is the standard
• Migrations from existing / legacy object installations (not just greenfield)
87%
Common User Journey
• Mount HDFS and object storage into a common namespace to facilitate migration to object store without changing analytics tools
• Take advantage of all the benefits of object storage
• Burst compute to cloud to scale quickly
• Leverage multiple cloud providers to get best-of-breed analytics tools without vendor lock-in
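The idea of mounting HDFS and object storage into a common namespace can be sketched as a small mount table. This is an illustrative toy, not Alluxio's API: the `MountTable` class, the logical paths, the NameNode address, and the bucket name are all hypothetical. It shows why analytics tools need no changes during migration: they keep using the same logical path while the mount behind it is switched from HDFS to S3.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MountTable:
    """Maps logical path prefixes to underlying storage URIs (hypothetical)."""

    mounts: dict[str, str] = field(default_factory=dict)

    def mount(self, logical_prefix: str, storage_uri: str) -> None:
        # Normalize so "/data/" and "/data" name the same mount point.
        self.mounts[logical_prefix.rstrip("/") or "/"] = storage_uri.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        # Longest-prefix match, so a nested mount overrides its parent.
        for prefix in sorted(self.mounts, key=len, reverse=True):
            base = prefix.rstrip("/")
            if logical_path == prefix or logical_path.startswith(base + "/"):
                return self.mounts[prefix] + logical_path[len(base):]
        raise KeyError(f"no mount covers {logical_path}")


ns = MountTable()
# Assumed endpoints, for illustration only.
ns.mount("/data/sales", "hdfs://namenode:8020/warehouse/sales")
before = ns.resolve("/data/sales/2020.parquet")

# After migration, the same logical prefix points at the object store.
ns.mount("/data/sales", "s3://analytics-bucket/warehouse/sales")
after = ns.resolve("/data/sales/2020.parquet")

print(before)
print(after)
```

The application-facing path `/data/sales/2020.parquet` is identical before and after the remount; only the resolved backend URI changes.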
What is Data Orchestration?
Modern Data Analytics Architecture
[Figure: architecture diagram. Labels: StorageGRID S3 Data Lake; On-Prem (three instances)]
Why NetApp StorageGRID for your Data Lake?
• Software-defined object storage
• Policy-based information lifecycle management at scale
• Global namespace and true multi-tenancy
• Durable, available, and scale-out
• Cloud integrations: AWS SNS notification, cloud mirroring, metadata search
[Figure: Data Fabric diagram. Labels: Seattle, Denver, New York; New Apps; StorageGRID (up to 16 logical/physical sites); S3.company.com; Public Cloud S3; Developer; Data Scientist]
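Policy-based lifecycle management can be pictured as an ordered list of rules that decide which sites hold an object. The sketch below is a simplified model, not StorageGRID's rule syntax or API: the `ObjectInfo` and `PlacementRule` types, the bucket name, the 30-day threshold, and the "first matching rule wins" evaluation are assumptions made for illustration. Only the site names come from the slide.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ObjectInfo:
    bucket: str
    key: str
    age_days: int


@dataclass(frozen=True)
class PlacementRule:
    name: str
    matches: Callable[[ObjectInfo], bool]  # filter on object metadata
    sites: tuple[str, ...]                 # sites that should hold a copy


def place(obj: ObjectInfo, rules: list[PlacementRule]) -> tuple[str, ...]:
    # Assumption of this sketch: rules are checked in order, first match wins.
    for rule in rules:
        if rule.matches(obj):
            return rule.sites
    raise LookupError("no rule matched; end the list with a catch-all rule")


rules = [
    # Hypothetical rule: keep recent analytics data at every site.
    PlacementRule(
        "hot-analytics",
        lambda o: o.bucket == "analytics" and o.age_days < 30,
        ("Seattle", "Denver", "New York"),
    ),
    # Hypothetical catch-all: everything else lives at one site.
    PlacementRule("default", lambda o: True, ("Denver",)),
]

fresh = place(ObjectInfo("analytics", "events/today.parquet", 2), rules)
stale = place(ObjectInfo("analytics", "events/old.parquet", 400), rules)
print(fresh, stale)
```

Keeping the catch-all rule last means every object gets a placement, and tightening or loosening data locality becomes a matter of editing rules rather than moving data by hand.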
Use case 1: Migration from HDFS to Object Storage
• Reduces capacity overhead costs
• Decouples compute and storage
• Performance and scale
• Policy-based data migration to S3
[Figure label: StorageGRID S3 Data Lake]
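The "reduces capacity overhead costs" point can be made concrete with back-of-the-envelope arithmetic. HDFS defaults to three replicas per block; the erasure-coding layout used below (6 data fragments plus 3 parity fragments) is an assumption chosen for illustration, not a figure from the slides, and the function names are hypothetical. Metadata and multi-site copies are not modeled.

```python
def raw_capacity_replication(usable_tb: float, copies: int) -> float:
    """Raw capacity needed when every byte is stored `copies` times."""
    return usable_tb * copies


def raw_capacity_erasure_coding(
    usable_tb: float, data_fragments: int, parity_fragments: int
) -> float:
    """Raw capacity needed for a k+m erasure-coded layout: (k + m) / k per byte."""
    return usable_tb * (data_fragments + parity_fragments) / data_fragments


usable = 1000.0  # TB of user data, an arbitrary example size

hdfs_raw = raw_capacity_replication(usable, copies=3)          # 3x replication
ec_raw = raw_capacity_erasure_coding(usable, 6, 3)             # assumed 6+3 scheme

print(f"3x replication: {hdfs_raw:.0f} TB raw")
print(f"6+3 erasure coding: {ec_raw:.0f} TB raw")
print(f"raw capacity saved: {hdfs_raw - ec_raw:.0f} TB")
```

Under these assumptions the same 1000 TB of data needs 3000 TB of raw disk with triple replication and 1500 TB with the 6+3 layout, which is where the overhead saving comes from.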
Use case 2: Expose data on-prem to compute in cloud
On-premises
• Enable storage at a fraction of the cost
• Scale compute on demand
• Data is always protected: dual-layered protection
• Have control over data locality
Use case 3: Enable Multi-Cloud workloads
● Minimize data movement
● Have full control over data placement
● Avoid vendor lock-in
● Adapt to new requirements
[Figure labels: Compute; On-Prem; StorageGRID S3 Data Lake]
StorageGRID data lake storage is designed for high performance, fault tolerance, and scale with low-touch operations.
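One way to picture "minimize data movement" is a read-through cache that sits next to the compute cluster: the first read of an object crosses the network to the data lake, and later reads are served locally. The sketch below is a toy model under stated assumptions, not Alluxio's implementation: `ReadThroughCache` and the in-memory `remote_store` dictionary standing in for the object store are hypothetical, and eviction is deliberately left out.

```python
from __future__ import annotations

from typing import Callable


class ReadThroughCache:
    """Serves repeated reads locally; fetches from remote storage only on a miss."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote
        self._local: dict[str, bytes] = {}
        self.remote_reads = 0  # counts how often data crossed the network

    def read(self, key: str) -> bytes:
        if key not in self._local:
            # Cache miss: pull the object once from the remote data lake.
            self._local[key] = self._fetch_remote(key)
            self.remote_reads += 1
        return self._local[key]


# Stand-in for the remote object store (assumed content, for illustration).
remote_store = {"sales/2020.parquet": b"...column data..."}

cache = ReadThroughCache(lambda key: remote_store[key])
first = cache.read("sales/2020.parquet")   # goes to the remote store
second = cache.read("sales/2020.parquet")  # served from the local copy

print(cache.remote_reads)
```

However many compute jobs in the same cluster reread the object, it is transferred from the data lake only once in this model; a production cache would also need eviction and invalidation, which are omitted here.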
alluxio.io/slack
www.alluxio.io
twitter.com/alluxio
linkedin.com/alluxio
www.storagegrid.com
twitter.com/NetApp
linkedin.com/NetApp
joseph.kandatilparambil@netapp.com
michael.waldrop@alluxio.com


Geo-distributed Analytics with NetApp StorageGRID and Alluxio
