Coexistence and Migration of Vendor HPC-based Infrastructure to Hadoop Ecosystem/YARN Solution
S&P Capital IQ
Friday 22nd May, 2015
Agenda
The HPC Inheritance
The Need
Integrating the Hadoop Ecosystem
Integrating the Vendor HPC Infrastructure and the Hadoop Ecosystem via a YARN AM
Advantages and Potential Drawbacks
Closing & Questions
The HPC Inheritance
Preexisting
HPC distributed computing infrastructure established 2003-2007
Usually 500-5,000 cores, some instances up to 100K, but not under a single (HA) resource manager
Vendor products, usually closed source
No separate resource schedulers; a notable exception is EGO (Platform Computing)
The HPC Inheritance
Preexisting
The HPC applications
HPC systems built with MPICH, Open MPI, ACE-TAO, or plain sockets
A few applications consume 80% of the computational resources (the 80/20, or Pareto, principle)
Designed for compute-heavy applications with low I/O and demand concentrated over periods of hours
Low latency/high throughput, with some variance
Built against a particular vendor API implementation, using callbacks
Continuous optimization cycles at both the algorithmic and the infrastructure level
The Need
Engineer a new system for distributed computing and data at reasonable cost
Reuse the existing infrastructure
Reuse internal knowledge already built up
Current HPC applications should not experience noticeable slowdown
Growing awareness that heavy compute- and data-oriented applications need to be built in a distributed fashion, sharing resources
Efficient resource utilization
Integrating the Hadoop Ecosystem
Apache Hadoop on the existing HPC infrastructure: hardware coexistence, one-to-one resource mapping
A .bashrc user account profile sets up the environment for both systems
YARN used as the resource scheduler for both systems
May need OS tuning because of the added I/O load
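One possible reading of "one-to-one resource mapping" with YARN scheduling both systems: every HPC compute host is also a YARN NodeManager, all of its cores are registered with YARN (on a real NodeManager via `yarn.nodemanager.resource.cpu-vcores`), and one HPC slot corresponds to one single-vcore container held by the hpc AM. The sketch below models that bookkeeping under those assumptions; `HostLedger`, `grant_to_hpc`, `release_from_hpc`, and `build_cluster` are hypothetical names, not part of YARN or any vendor API.

```python
from dataclasses import dataclass


@dataclass
class HostLedger:
    """Hypothetical per-host ledger: YARN owns every core, the hpc AM borrows some."""
    host: str
    vcores: int         # all physical cores registered with the YARN NodeManager
    hpc_slots: int = 0  # cores held by the hpc AM and opened as HPC slots (assumed 1 slot = 1 vcore)

    @property
    def free_for_yarn(self) -> int:
        """Cores still available to ordinary YARN containers."""
        return self.vcores - self.hpc_slots

    def grant_to_hpc(self, n: int) -> int:
        """Open up to n HPC slots from free cores; return how many were opened."""
        granted = max(0, min(n, self.free_for_yarn))
        self.hpc_slots += granted
        return granted

    def release_from_hpc(self, n: int) -> int:
        """Close up to n HPC slots and return the cores to YARN; return how many."""
        released = max(0, min(n, self.hpc_slots))
        self.hpc_slots -= released
        return released


def build_cluster(hpc_hosts, yarn_hosts, cores_per_host: int) -> dict:
    """Enforce the one-to-one host mapping: every HPC host is also a YARN NodeManager."""
    if set(hpc_hosts) != set(yarn_hosts):
        raise ValueError("HPC and YARN host lists must map one-to-one")
    return {h: HostLedger(h, cores_per_host) for h in sorted(hpc_hosts)}
```

For example, `build_cluster(["r1", "r2"], ["r2", "r1"], 16)` accepts the host lists because they describe the same machines, and a later `grant_to_hpc(20)` on `r1` opens only 16 slots, leaving nothing for YARN on that host until slots are released.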
Integrating the Vendor HPC Infrastructure and the Hadoop Ecosystem via a YARN AM
Building the YARN AM as a valve for the computation flow to HPC
Build the AM on the vendor API to control the HPC computational processes, allocating on demand while accounting for HPC specifics
AMRMClientAsync handles the AM's communication with the RM and requires a CallbackHandler implementation
Depending on the HPC API: queue polling or event notification
Process up/down: memory-efficient, but process start-up is slow
Open/close: fast, but leaves a memory footprint
The HPC YARN AM uses YARN API calls and the HPC management API; many resource allocation/release combinations are possible:
fixed, fixed + incremental, incremental only
scheduling based on job patterns, predictive scheduling (an art in itself)
combinations of the above
Handling YARN's callbacks for resource management
Recoverable after an AM crash: the state is simple, derived from configuration parameters and the HPC scheduler's queue state
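The allocation modes above can be read as different rules for how many containers the hpc AM asks YARN to hold. The sketch below is one possible interpretation, not the presenters' implementation: `fixed` holds a constant baseline, `fixed + incremental` grows above the baseline when the HPC queue demands it, and `incremental only` follows the queue alone, all capped at a maximum. Because the target depends only on configuration and the HPC scheduler's queue state, an AM restarted after a crash can simply recompute it, which is one way to read the recovery bullet. `Mode`, `AmConfig`, `target_containers`, `plan_delta`, and `tasks_per_container` are hypothetical names.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FIXED = "fixed"
    FIXED_PLUS_INCREMENTAL = "fixed + incremental"
    INCREMENTAL_ONLY = "incremental only"


@dataclass(frozen=True)
class AmConfig:
    """Hypothetical hpc AM configuration (the 'config parameters' on the slide)."""
    mode: Mode
    baseline: int             # containers held regardless of demand in the fixed modes
    max_containers: int       # cap so regular YARN workloads are never starved
    tasks_per_container: int  # HPC tasks one container can serve at a time


def target_containers(cfg: AmConfig, queued_tasks: int, running_tasks: int) -> int:
    """Containers the AM should hold, derived only from config and HPC queue state."""
    demand = math.ceil((queued_tasks + running_tasks) / cfg.tasks_per_container)
    if cfg.mode is Mode.FIXED:
        target = cfg.baseline
    elif cfg.mode is Mode.FIXED_PLUS_INCREMENTAL:
        target = max(cfg.baseline, demand)  # baseline, plus whatever the queue adds on top
    else:  # Mode.INCREMENTAL_ONLY: follow the queue, hold nothing when it is empty
        target = demand
    return min(target, cfg.max_containers)


def plan_delta(cfg: AmConfig, held: int, queued_tasks: int, running_tasks: int) -> int:
    """Positive: containers to request from the RM; negative: idle ones to release.

    No history is involved, so an AM restarted after a crash can call this with
    the queue state read back from the HPC scheduler and resume from there.
    """
    return target_containers(cfg, queued_tasks, running_tasks) - held
```

Scheduling based on job patterns or predictions would replace `demand` with a forecast of the HPC queue; that part is deliberately left out of this sketch.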
Building the YARN AM as a valve for the computation flow to HPC (diagram)
[Figure, shown at t, t+1, t+2, ..., t+n: HPC Scheduler; YARN RM1 and RM2; HA NameNodes (NNs); ZooKeeper (ZK) nodes; resources R1, R2, R4, R5; the hpc AM; HDFS]
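The diagram frames at t, t+1, t+2, ..., t+n show the hpc AM opening and closing the valve over time. Combined with the callback and process-management bullets of the AM slide, a toy model might look like the following. It only mirrors the shape of two methods of Hadoop's `AMRMClientAsync.CallbackHandler` (`onContainersAllocated`, `onContainersCompleted`) in plain Python and does not call Hadoop. `Container`, `HpcValve`, and the engine bookkeeping are hypothetical; the point is the trade-off between process up/down (memory freed, slow restart) and open/close (fast reuse, memory kept occupied).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """Stand-in for a YARN container; only the fields this sketch uses."""
    container_id: str
    host: str


class HpcValve:
    """Toy hpc AM callback target (hypothetical, not the Hadoop API).

    keep_resident=True models open/close: engines stay resident when a container
    is returned, so reuse is fast but memory stays occupied on the node.
    keep_resident=False models process up/down: engines are stopped, memory is
    freed, and the next allocation pays for a slow process start.
    """

    def __init__(self, keep_resident: bool):
        self.keep_resident = keep_resident
        self.active = {}        # container_id -> host currently serving HPC work
        self.idle_engines = {}  # host -> resident engines that are closed but alive
        self.cold_starts = 0    # engine processes launched from scratch (slow path)
        self.reopens = 0        # resident engines reopened (fast path)

    def on_containers_allocated(self, containers):
        """Mirrors CallbackHandler.onContainersAllocated: the valve opens."""
        for c in containers:
            if self.idle_engines.get(c.host, 0) > 0:
                self.idle_engines[c.host] -= 1
                self.reopens += 1
            else:
                self.cold_starts += 1
            # A real AM would now announce this slot to the vendor HPC scheduler.
            self.active[c.container_id] = c.host

    def on_containers_completed(self, container_ids):
        """Mirrors CallbackHandler.onContainersCompleted (which actually receives
        ContainerStatus objects; plain ids are used here): the valve closes."""
        for cid in container_ids:
            host = self.active.pop(cid, None)
            if host is not None and self.keep_resident:
                self.idle_engines[host] = self.idle_engines.get(host, 0) + 1

    def resident_engines(self) -> int:
        """Engine processes alive on the nodes, active or idle."""
        return len(self.active) + sum(self.idle_engines.values())
```

Running the same open, close, reopen sequence against both settings makes the trade-off visible: with `keep_resident=True` the second allocation on `r1` is a fast reopen but two engines stay resident while the valve is closed; with `keep_resident=False` nothing stays resident, and every allocation is a cold start.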
Advantages and Potential Drawbacks
Advantages
Sharing resources: Apache Hadoop coexisting with HPC increases resource utilization
Pattern changes are visible at the node compute resources, in contrast to the network, which can have quite complex topology and behavior
Allows new infrastructure to grow out of the existing one
Potential Drawbacks
Sharing resources: the HPC AM logic adds complexity, which in some cases may be considerable
The work is somewhat slower: changes are implemented gradually while observing system behavior under the actual job patterns
May impose additional data block transfers on the network
Closing & Questions
Integrating the HPC RM/schedulers with the Hadoop ecosystem via a custom AM valve is an optimal way to make HPC aware of YARN
Slows the need for hardware expansion and enables efficient resource utilization
Q&A
