Building a Scalable Analytics Environment
to Support Diverse Workloads
WHO WE ARE
Aunalytics
Key Stats
Aunalytics provides a leading-edge cloud platform
to help companies leverage data, algorithms, and
high-performance computing to help their teams
answer questions and perform tasks more
efficiently.
Our side-by-side digital transformation model
provides on-demand access to technology, data
science, and AI experts to help transform the
way our clients work.
> 200 Employees
> 1,000 Customers
Financial
Institution
partners
THE SOLUTION
daybreak
Daybreak is a data platform powered by financial
industry intelligence and smart features that enable a
variety of analytics solutions across the enterprise.
SQL
UNIVERSAL ACCESS TO DATA
Access all your data in one
shared location
Securely connect your existing systems with a
data-source-agnostic product, and then quickly put
your data to use with everything you need in one
place.
Give everyone on the team access to the latest and
most accurate data, so they can answer their pressing
questions.
Use Daybreak as a single source of information.
Whether you use Tableau or Power BI, or feed data into
a third-party system, you can pull from a single source.
Simplify the information. Get everyone on the
same page.
SQL
FASTER INSIGHTS
Get the right data at the
right time
Get the updated data you need, delivered promptly and
consistently every day.
Convert rich, transactional data about your
customers into actionable insights.
Avoid wasting time wrangling data or straining your
IT department and focus on advancing your strategic
business priorities.
Make it easier to quickly understand your data and
save time with automated reporting and clean data.
Scale insights across the organization quickly
Leverage data insights and efficiently answer your
daily questions.
SMART
FEATURES
DATA MARTS
ARTIFICIAL INTELLIGENCE/
MACHINE LEARNING
MEMBER
LIBRARY
SERVICES
LIBRARY
TRANSACTION
LIBRARY
CORE
LENDING
MOBILE BANKING
ATM/ITM
WEALTH AND TRUST
CRM
ACCOUNT
LIBRARY
MEMBER-CENTRIC VIEW
DAYBREAK DATA WAREHOUSE
INSIGHTS
A new era for analytics
SIDE-BY-SIDE CLIENT SUCCESS
Support from a team of
data experts
Get tools, resources, and support throughout
our end-to-end process.
Integrate, enrich, and utilize data marts with
our team beside you, so you can get better
answers to the questions you have.
Be ready for your AI, machine learning, and
predictive analytics journey with the right
foundation.
Our talented team of data scientists and
analysts is here to help.
DATA
SCIENTISTS
CLIENT SUCCESS
MANAGER
BUSINESS
ANALYSTS
CLIENT
ADVISORY
TEAM
RELATIONSHIP
MANAGER
DATA ENGINEERS
ENGINEERS
CLIENT
INFRASTRUCTURE
INGESTION
SOFTWARE
SECURITY
PROJECT
MANAGER
The Challenge
Requirement: Data availability across a diverse
set of dynamic services
Based on
Requirement: Parallel and scalable data access layer
required, but not for all data all of the time
Typical Parallel File
System
All fast, all the time.
Tiering cost/benefit is
negligible and overhead
cost is high.
Alluxio as deployed
• Data in use is fast
• Invisible Upstream
• Scale based on
performance
• Scale de-coupled from
amount of storage
Building a scalable analytics environment to support diverse workloads
CLOUD HOSTING/ANALYTICS
Legacy Hadoop Platform
Hadoop
Cluster ONE
Hadoop
Cluster TWO
Hadoop
Cluster THREE
Small Containerization
Platform Kubernetes
Job Controller: low volume
workloads (low lift activity)
Limitations
Data Stored in triplicate
Requires high speed
storage
Requires high IOPS storage
Requires many spindles
Costly Hadoop nodes
Storage is still performant
even when you are not
using it !!!
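As a rough illustration of the "data stored in triplicate" limitation: HDFS replicates each block three times by default, so raw disk capacity must be about three times the logical data size. The sketch below is only back-of-the-envelope arithmetic; the function names and the 100 TB input are hypothetical and do not come from this deck.

```python
def raw_capacity_tb(logical_tb: float, replication_factor: int = 3) -> float:
    """Raw disk (TB) needed to hold `logical_tb` of data at a replication factor."""
    if logical_tb < 0 or replication_factor < 1:
        raise ValueError("logical_tb must be >= 0 and replication_factor >= 1")
    return logical_tb * replication_factor


def overhead_tb(logical_tb: float, replication_factor: int = 3) -> float:
    """Extra disk (TB) consumed purely by the additional replicas."""
    return raw_capacity_tb(logical_tb, replication_factor) - logical_tb


if __name__ == "__main__":
    logical = 100.0  # hypothetical data set size in TB (assumption)
    print(raw_capacity_tb(logical))  # 300.0
    print(overhead_tb(logical))      # 200.0
```

Every terabyte in such a cluster sits on the same high-speed tier whether or not it is being processed, which is the cost the slide is pointing at.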
Heavy Lift Area
Lots of performant
storage
Lots of performant LAN
Legacy Platform
CLOUD HOSTING/ANALYTICS
Commercial Boutique Storage Proposal
Diskless Physical Hadoop
Nodes
Hadoop processing nodes
connected to remote
boutique storage
Limitations
Extreme cost storage
All nodes have singular
purpose
Requires high speed
dedicated LAN/FIBER
Requires many spindles
Storage vendor lock in
Storage vendor support
All data on HP storage
always
Storage is still performant
even when you are not
using it !!!
Heavy Lift Storage Area
Lots of performant
storage
Lots of performant LAN
(Fiber possibly)
Lots of replication
Extreme performance
storage
Commercial performance
storage
Option ONE
CLOUD HOSTING/ANALYTICS
Open-Source Storage Proposal
Diskless Physical Hadoop
Nodes
Hadoop processing nodes
connected to remote
boutique storage
Limitations
Learning Curve
Internal Staff cost/training
All nodes have singular
purpose
Requires high speed
dedicated LAN/FIBER
Requires many spindles
All data on HP storage
always
Storage is still performant
even when you are not
using it !!!
Heavy Lift Storage Area
Lots of performant
storage
Lots of performant LAN
(Fiber possibly)
Lots of replication
Extreme performance
storage
CEPH, Gluster, Lustre, DPFS
Open-Source Storage
Option TWO
CLOUD HOSTING/ANALYTICS
Data Cache Layer Extreme Speed Storage (Abstraction Layer)
200 Cores
6TB ALL FLASH
12 million read IOPS
40 GB per second sustained read performance
Cost effective
Average Transfer Speeds
Low IOPS requirement
Highly Available
Built in DR functionality
NFS
● Scalable Caching Layer
● RAM/FLASH based
● Compensates for lower
speed/cost underlying storage
● Supports Spark/MR
● Replaces Physical HDFS
Kubernetes Heavy Lift
Platform
Alluxio Caching Layer
Final Design Choice
NFS
NFS
20 Hadoop Clusters
Same Hardware as 2
Legacy Clusters
CLOUD HOSTING/ANALYTICS
Kubernetes Platform Handles Heavy Lift
Object Store
or NFS
Alluxio
Data Cache
GPU
Containerization Platform
(DC/OS) Kubernetes
High Volume Transient
Workloads
Enterprise Cloud Services
Static Critical Management
Workloads
25 Servers (Can scale
to thousands)
1400 Cores
100% Memory
No spinning Disk
Hadoop
Map Reduce
Spark
Aunsight Tasks
Apache Drill
All heavy lifting data
processing
Adaptive Read/Write Methods
Local Object Store
(S3 Compatible)
NFS
Cloud Object Store
(Amazon/Azure)
• All Flash
• 600GB Aggregate
LAN Speed
• Extreme IOPS
• Low Latency
• Temp storage for
processing loads
• All NVMe/Flash
• High RAM nodes
• High Core Density
Pre Staged Read Methodology
NFS
NFS
1) Data written to NFS
2) Alluxio copies data into
Flash to pre-stage for
processing
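The two pre-staging steps above can be sketched as a toy model. This is not Alluxio's API: `PreStagedCache`, `pre_stage`, and `read` are hypothetical names, and plain dictionaries stand in for the NFS store and the flash tier. The point is only that data copied into the fast tier ahead of time causes no slow-tier reads once the job runs.

```python
from typing import Dict, Iterable


class PreStagedCache:
    """Toy model: a fast tier (flash) in front of a slow store (NFS)."""

    def __init__(self, slow_store: Dict[str, bytes]) -> None:
        self.slow_store = slow_store  # stands in for NFS (assumption)
        self.fast_tier = {}           # stands in for the flash cache
        self.misses = 0               # job-time reads that had to hit the slow store

    def pre_stage(self, paths: Iterable[str]) -> None:
        """Step 2: copy data into the fast tier before processing starts."""
        for path in paths:
            self.fast_tier[path] = self.slow_store[path]

    def read(self, path: str) -> bytes:
        """Job-time read: served from the fast tier, falling back to the slow store."""
        if path not in self.fast_tier:
            self.misses += 1
            self.fast_tier[path] = self.slow_store[path]
        return self.fast_tier[path]


if __name__ == "__main__":
    # Step 1: data is written to NFS (hypothetical paths and contents).
    nfs = {"/nfs/daily/a.csv": b"a", "/nfs/daily/b.csv": b"b"}

    staged = PreStagedCache(nfs)
    staged.pre_stage(nfs.keys())
    for p in nfs:
        staged.read(p)
    print(staged.misses)  # 0

    cold = PreStagedCache(nfs)
    for p in nfs:
        cold.read(p)
    print(cold.misses)    # 2
```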
Adaptive Write Methods
• All Flash
• 600GB Aggregate
LAN Speed
• Extreme IOPS
• Low Latency
• Temp storage for
processing loads
NFS
NFS
Write to Alluxio only (Must
Cache)
Any Temp File (High Use)
Write through to UFS (Cache
Through)
(Rare Use)
Write Back to UFS (Async
Through)
Cache/Persist Later (High Use)
Write to UFS Only (Through)
(Rare Use)
Write modes embedded into
each write provide
maximum efficiency
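A minimal sketch of how the four write modes differ in where the data lands. The mode names follow the slide (Must Cache, Cache Through, Async Through, Through); everything else, including the `TieredStore` class, its `write` and `flush` methods, and the dictionaries standing in for the cache and the under file system (UFS), is a hypothetical simulation rather than Alluxio's client API.

```python
from enum import Enum


class WriteType(Enum):
    MUST_CACHE = "MUST_CACHE"        # cache only, never persisted
    CACHE_THROUGH = "CACHE_THROUGH"  # cache and UFS, synchronously
    ASYNC_THROUGH = "ASYNC_THROUGH"  # cache now, persist to UFS later
    THROUGH = "THROUGH"              # UFS only, bypassing the cache


class TieredStore:
    """Toy model of a cache tier in front of an under file system (UFS)."""

    def __init__(self) -> None:
        self.cache = {}
        self.ufs = {}
        self.pending = []  # paths waiting for asynchronous persistence

    def write(self, path: str, data: bytes, write_type: WriteType) -> None:
        # Each write carries its own mode, as the slide describes.
        if write_type is not WriteType.THROUGH:
            self.cache[path] = data
        if write_type in (WriteType.CACHE_THROUGH, WriteType.THROUGH):
            self.ufs[path] = data
        if write_type is WriteType.ASYNC_THROUGH:
            self.pending.append(path)

    def flush(self) -> None:
        """Persist queued ASYNC_THROUGH writes (stands in for the background job)."""
        for path in self.pending:
            self.ufs[path] = self.cache[path]
        self.pending.clear()


if __name__ == "__main__":
    store = TieredStore()
    store.write("/tmp/shuffle.bin", b"t", WriteType.MUST_CACHE)    # temp file
    store.write("/out/result.parquet", b"r", WriteType.ASYNC_THROUGH)
    print(sorted(store.ufs))  # []
    store.flush()
    print(sorted(store.ufs))  # ['/out/result.parquet']
```

In this model, temporary files written with `MUST_CACHE` never touch the slower store, while `ASYNC_THROUGH` results are available at cache speed immediately and become durable after the flush.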
Aunalytics
Use Case
Conclusions
• We have mass quantities of historical data that must be
stored but a much smaller amount of data that must be
processed daily
• The (relatively) small amount of data that we must
process daily requires parallelism from its underlying
storage in order to run in our required time frame
• ALL data must be quickly available for high speed
processing if required
• Alluxio allows for in-memory storage performance levels in a
controlled, tunable, and independently scalable way.
