Nikolay Malitsky
Approaching the Fifth Paradigm
#SAISEco4
Outline
▪ Paradigm Shift
▪ Spark-MPI Approach
▪ MPI-Based Deep Learning Applications
▪ Next: Reinforcement Learning Applications
Four Science Paradigms*
1. Experimental: describe empirical facts and test hypotheses
since: thousands of years ago
2. Theoretical: explain and predict natural phenomena using
models and abstractions
since: several hundred years ago
3. Computational: simulate theoretical models using computers
since: second half of the 20th century
4. Data-Intensive: scientific discoveries based on Big Data analytics
since: around 15 years ago
*Jim Gray and Alex Szalay, eScience – A Transformed Scientific Method, NRC-CSTB, 2007
Paradigm Shift
▪ The fourth paradigm of data-intensive science rapidly became a major conceptual
approach for multiple application domains encompassing and generating large-scale
scientific drivers such as fusion reactors and light source facilities.
▪ The success of data-intensive projects subsequently triggered an explosion
of numerous machine learning approaches addressing a wide range of
industrial and scientific applications such as computer vision, self-driving
cars, and brain modelling.
▪ The next generation of artificial intelligence systems clearly represents
a paradigm shift from data processing pipelines towards cognitive
knowledge-centric applications.
[Diagram: the 5th Paradigm (Cognitive Computing) emerging from the overlap of the 3rd Paradigm (Computational Science) and the 4th Paradigm (Data-Intensive Science), with DeepMind AlphaGo, IBM Watson DeepQA, and the Human Brain Project as examples]
▪ As shown in Fig. 1, AI systems broke the boundaries of computational and
data-intensive paradigms and began to form a new ecosystem by merging and
extending existing technologies.
Figure 1: The Fifth Paradigm*
*N. Malitsky, R. Castain, and M. Cowan, Spark-MPI: Approaching the Fifth Paradigm of Cognitive Applications, arXiv:1806.01110, 2018
Knowledge
▪ In his original talk, Jim Gray discussed “objectifying” knowledge within the field of ontology to
provide a structured representation of abstract concepts and physical entities. This direction is
related to the development of structured knowledge bases and associated technologies such as the
Semantic Web and Linked Data.
▪ Existing structured resources, however, capture only a tiny subset of available information. Therefore,
advanced question-answering (QA) systems* augmented them with corpora of raw text and processing
pipelines consisting of multiple stages that combine hundreds of different cooperating algorithms from
various fields.
As a result, emerging AI-oriented applications imply a more general and practical definition of knowledge:
Knowledge is a multifaceted substance distributed among heterogeneous information networks and
associated processing platforms. The structure and relationships among the components of such
a composite representation are dynamic, continuously shaped and consolidated by machine learning
processes.
*D. A. Ferrucci, Introduction to “This is Watson”, IBM Journal of Research and Development, 2012
From Processing Pipelines to Rational Agents
[Diagram: three architectures side by side — data-intensive processing pipelines (data nodes D flowing through workers W to an output O); deep learning model-centric applications (workers W exchanging updates around a shared model M); reinforcement learning agent-oriented applications (workers W coupling data D and a model M in a closed loop)]
Approaching the Fifth Paradigm of Cognitive Applications
*Dharshan Kumaran, Demis Hassabis, and James L. McClelland, What Learning Systems do Intelligent Agents Need?
Complementary Learning Systems, Trends in Cognitive Sciences, 2016
[Figure 2: Complementary Learning Systems* — the neocortex, mapped to a heterogeneous knowledge and information network, and the hippocampus, mapped to a streaming pipeline]
The consolidation of HPC and Big Data machine learning technologies is the
prerequisite for developing the next paradigm of cognitive applications.
[Figure 1 (repeated): the 5th Paradigm of Cognitive Applications emerging from the overlap of the 3rd Paradigm (Computational Science) and the 4th Paradigm (Data-Intensive Science)]
Spark-MPI Approach
Closing the gap between Big Data and HPC computing
*Geoffrey Fox et al. HPC-ABDC High Performance Computing Enhanced Apache Big Data Stack, CCGrid, 2015
[Diagram: the Spark (Big Data) and MPI (HPC Computing) ecosystems* converging on new frontiers]
MPI: Message Passing Interface
Application Programming Interface:
▪ peer-to-peer: allreduce
▪ master-workers: scatter, gather, reduce
▪ point-to-point: send, receive
▪ remote memory access: put, get
Portable Access Layer for various communication protocols:
▪ RDMA
▪ GPUDirect RDMA
▪ TCP/IP
▪ shared memory
Process Management Interface:
▪ address exchange service
▪ …
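The semantics of these API families can be illustrated without an MPI runtime. The functions below are a pure-Python simulation of the collective operations (lists indexed by rank), not calls into an MPI library:

```python
from functools import reduce

def allreduce(values, op=lambda a, b: a + b):
    """Peer-to-peer: every rank receives the reduction of all ranks' values."""
    total = reduce(op, values)
    return [total for _ in values]

def scatter(data, n_ranks):
    """Master-workers: the root splits its data into one chunk per rank."""
    chunk = len(data) // n_ranks
    return [data[i * chunk:(i + 1) * chunk] for i in range(n_ranks)]

def gather(chunks):
    """Master-workers: the root concatenates one chunk from each rank."""
    return [x for c in chunks for x in c]

# Four ranks each contribute a local value; after allreduce every
# rank holds the same global sum.
print(allreduce([1, 2, 3, 4]))          # [10, 10, 10, 10]
print(scatter([0, 1, 2, 3, 4, 5], 3))   # [[0, 1], [2, 3], [4, 5]]
print(gather([[0, 1], [2, 3]]))         # [0, 1, 2, 3]
```

In a real MPI program these would be communication calls executed by separate processes; the simulation only captures the data movement each operation performs.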
PMI-based Spark-MPI Approach
[Diagram: a Spark Driver and a PMI Server connected to four Spark Workers through three interfaces — Spark driver-worker, PMI server-worker, and MPI inter-worker]
▪ Process Management Interface (PMI): originally developed by the MPICH team(2) and used for exchanging wireup information among processes.
▪ PMI-Exascale (PMIx): created by the Open MPI team(3) in response to the ever-increasing scale of supercomputing clusters.
(2) P. Balaji et al. PMI: A Scalable Parallel Process-Management Interface for Extreme-Scale Systems, 2010
(3) R. Castain, D. Solt, J. Hursey, and A. Bouteiller, PMIx: Process Management for Exascale Environment, 2017
▪ The PMIx community has therefore focused on extending the earlier PMI work, adding flexibility to existing APIs (e.g., to support asynchronous operations) as well as new APIs that broaden the range of interactions with the resident resource manager.
▪ Spark-MPI(1) encompasses three interfaces. Specifically, it complements the conventional Spark driver-worker model with the PMI server-worker interface for establishing MPI inter-worker communications.
(1) N. Malitsky et al. Building Near-Real-Time Processing Pipelines with the Spark-MPI platform, NYSDS, 2017
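In this scheme, a Spark task only needs to export the PMI wireup variables before the MPI library initializes. A minimal sketch of that setup follows; the slides confirm PMIX_RANK, while the server-address variable name, host, and port here are illustrative assumptions, not Spark-MPI's exact API:

```python
import os

def make_pmi_env(rank, server_host, server_port):
    """Build the PMI-related environment variables a Spark map task would
    export before MPI initialization (illustrative names and values)."""
    return {
        "PMIX_RANK": str(rank),                             # this worker's MPI rank
        "PMIX_SERVER_URI": f"{server_host}:{server_port}",  # assumed server-address variable
    }

def train(rank):
    """What a Spark map function would do first: publish the PMI variables
    so the MPI library can wire up inter-worker communication."""
    os.environ.update(make_pmi_env(rank, "driver-node", 41000))
    # ... MPI-based computation would start here ...
    return os.environ["PMIX_RANK"]

print([train(r) for r in range(4)])  # ['0', '1', '2', '3']
```

Each Spark worker would execute `train` with its own rank, so that the PMI server can hand every MPI process the addresses of its peers.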
Open MPI*
Open MPI was derived as a generalization of four projects bringing together over 40 frameworks. It introduced
a Modular Component Architecture (MCA) that utilized components (a.k.a. plugins) to provide alternative
implementations of key functional blocks such as message transport, mapping, algorithms, and collective
operations.
*E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. Squyres, V. Sahay, P. Kambadur, B. Barrett, A. Lumsdaine, R. H. Castain,
D. J. Daniel, R. L. Graham, and T. S. Woodall, Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, 2004
[Diagram — Architecture: an MPI application running on the Modular Component Architecture (MCA), where each framework consists of a base and interchangeable components. Implementation: an MPI application running on the Open MPI core (OPAL, ORTE, and OMPI layers), with a sparkmpi component added to the OpenRTE Daemon’s Launch Subsystem (odls) framework alongside the default component, and the MPI byte transfer layer (btl) framework providing tcp, ofi, and smcuda components]
Spark-MPI Integrated Platform
N. Malitsky, R. Castain, and M. Cowan, Spark-MPI: Approaching the Fifth Paradigm of Cognitive Applications, arXiv:1806.01110, 2018
[Diagram: the Spark Platform (Streaming Sources feeding Receivers and Connectors beneath the Resilient Distributed Dataset API) combined with HPC Extensions — MPI-Based Algorithms on top of the Process Management Interface (PMI), SLURM, and Parallel File Systems]
MPI-Based Deep Learning Applications
Deep Learning Training as a Third Paradigm
Computational Application
[Diagram: two distributed training topologies — the Parameter Server-based Data Parallel Model* (DL workers W exchanging gradients with a parameter server P) and the All-Reduce Model (DL workers W connected directly in a ring). W: DL Worker; P: Parameter Server]
*Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2015
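The parameter-server topology can be sketched as a single-process toy in plain Python (the gradients and learning rate are illustrative; this is not a distributed implementation): each DL worker reports a gradient on its data shard, and the server averages the gradients and applies one SGD step.

```python
def server_step(weights, worker_grads, lr=0.1):
    """Parameter server: average the workers' gradients and apply one
    SGD step; the new weights would then be broadcast to every worker."""
    avg = [sum(g) / len(worker_grads) for g in zip(*worker_grads)]
    return [w - lr * g for w, g in zip(weights, avg)]

# Three workers report gradients for a two-parameter model.
weights = [10.0, 20.0]
grads = [[3, 6], [6, 9], [0, 3]]          # one gradient vector per worker
weights = server_step(weights, grads, lr=1.0)
print(weights)  # [7.0, 14.0]
```

The all-reduce model computes the same average, but via a collective operation among the workers themselves, removing the central server as a bandwidth bottleneck.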
(Some of the) MPI DL Projects
▪ CNTK(1): Microsoft Cognitive Toolkit
▪ TensorFlow-MaTEx(2): added two new TensorFlow operators, Global_Broadcast and
MPI_Allreduce
▪ S-Caffe(3): scaled Caffe with an MPI-level hierarchical reduction design
▪ Horovod(4): adopted Baidu’s approach based on the ring-allreduce algorithm and further
developed its implementation with NVIDIA’s NCCL library for collective operations
▪ CPE ML Plugin(5): Cray Programming Environment Machine Learning Plugin
(1) A. Agarwal et al. An Introduction to Computational Networks and the Computational Network Toolkit, 2014
(2) A. Vishnu et al. User-transparent Distributed TensorFlow, 2017
(3) A. A. Awan et al. S-Caffe: Co-designing MPI Runtime and Caffe for Scalable Deep Learning on Modern GPU Clusters, 2017
(4) A. Sergeev and M. Del Balso. Horovod: Fast and Easy Distributed Deep Learning in TensorFlow, 2018
(5) P. Mendygral. Scaling Deep Learning, 2018
Spark-MPI-Horovod
[Script screenshot with callouts: initialize the PMI environment variables; initialize Horovod and MPI; extract the MNIST dataset; build the DL model; create the TF optimizer; wrap it with Horovod; run the Horovod MPI-based training on Spark workers]
The Horovod MPI-based training framework replaces
the TensorFlow parameter servers with the ring-allreduce
approach for averaging gradients among
TensorFlow workers.
For users, the corresponding integration consists of two
primary steps, as illustrated by the script: (1) initializing
Horovod with hvd.init() and (2) wrapping the TensorFlow
worker’s optimizer with hvd.DistributedOptimizer().
The Spark-MPI pipelines make it possible to run the Horovod
training on Spark workers with Map operations. To
establish MPI communication among the Spark workers,
the Map operation (e.g. train()) only needs to define
PMI-related environment variables (such as
PMIX_RANK and a port number).
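Horovod's ring-allreduce can be illustrated with a single-process simulation in plain Python (no Horovod or MPI required; one chunk per rank is assumed for brevity): a reduce-scatter phase circulates and accumulates chunks around the ring, then an allgather phase circulates the fully reduced chunks, so that every rank ends with the element-wise sum.

```python
def ring_allreduce(vectors):
    """Single-process simulation of ring-allreduce (the algorithm Horovod
    adopted from Baidu). Assumes len(vector) == number of ranks."""
    n = len(vectors)
    data = [list(v) for v in vectors]
    # Phase 1, reduce-scatter: in each of n-1 steps, rank r adds the chunk
    # received from its left neighbour into its own copy of that chunk.
    for s in range(n - 1):
        for r in range(n):
            c = (r - s - 1) % n
            data[r][c] += data[(r - 1) % n][c]
    # Phase 2, allgather: circulate the fully reduced chunks around the ring.
    for s in range(n - 1):
        for r in range(n):
            c = (r - s) % n
            data[r][c] = data[(r - 1) % n][c]
    return data

# Four ranks, each holding a 4-element gradient vector.
grads = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]]
print(ring_allreduce(grads))  # every rank ends with [10, 10, 10, 10]
```

Each rank sends and receives only one chunk per step, which is what makes the ring variant bandwidth-optimal compared with funneling all gradients through a parameter server.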
Next
Deep Reinforcement Learning
*A. Nair et al. Massively Parallel Methods for Deep Reinforcement Learning, ICML, 2015
**R. Nishihara et al. Real-Time Machine Learning: The Missing Pieces, arXiv:1703.03924, 2017
[Figure 1: Gorila* (General Reinforcement Learning Architecture) — an Actor selects actions argmax_a Q(s, a; θ) in the Environment, (s, a, r, s′) transitions are stored in a Replay Memory, and a DQN Learner samples them and pushes updates to a Parameter Server that synchronizes the Agent’s network]
System Requirements**:
• Low latency
• High throughput
• Dynamic task creation
• Heterogeneous tasks
• Arbitrary dataflow dependencies
• Transparent fault tolerance
• Debuggability and profiling
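The (s, a, r, s′) loop in the Gorila architecture can be sketched with a tabular Q-learning toy (a single-process stand-in for the DQN learner; the two-state environment, seed, and hyperparameters are illustrative assumptions):

```python
import random
from collections import deque

random.seed(0)
memory = deque(maxlen=1000)   # replay memory of (s, a, r, s') tuples
Q = {}                        # tabular stand-in for Q(s, a; θ)

def q(s, a):
    return Q.get((s, a), 0.0)

def learner_step(batch, alpha=0.5, gamma=0.9, actions=(0, 1)):
    """DQN-style update on a sampled minibatch: move Q(s, a) towards
    r + γ · max_a' Q(s', a')."""
    for s, a, r, s2 in batch:
        target = r + gamma * max(q(s2, a2) for a2 in actions)
        Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))

# Actor loop: a toy environment where action 1 in state 0 yields a reward.
for _ in range(200):
    s = random.choice([0, 1])
    a = random.choice([0, 1])
    r = 1.0 if (s == 0 and a == 1) else 0.0
    memory.append((s, a, r, s))                 # toy transition: state unchanged
    learner_step(random.sample(list(memory), min(8, len(memory))))

print(q(0, 1) > q(0, 0))  # the rewarded action dominates -> True
```

Gorila distributes exactly these pieces: many actors fill the replay memory, many learners sample it, and the parameter server aggregates their updates, which is why the system requirements above (low latency, dynamic tasks, heterogeneous dataflow) go beyond a batch pipeline.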
(Some of the) RL Applications*
(1) V. Mnih et al. Playing Atari with Deep Reinforcement Learning, NIPS, 2013
(2) D. Silver et al. Mastering the game of Go with deep neural networks and tree search, Nature, 2016
▪ Atari Games1
▪ AlphaGo2
▪ Robotics
▪ Self-driving vehicles
▪ Autonomous UAVs
…
[Image: Pterodactylus antiquus, “the first pterosaur species to be named and identified as a flying reptile … 150.8–148.5 million years ago” (Wikipedia) — illustrating the range of RL applications]
Summary
▪ Emerging AI projects represent a paradigm shift from data processing pipelines towards
the fifth paradigm of cognitive knowledge-centric applications.
▪ The new generation of AI composite applications requires the integration of Big Data and
HPC technologies. For example, MPI was originally introduced within the computational
paradigm ecosystem for developing HPC scientific applications, but it has recently been
applied successfully to extend the scale of deep learning applications.
▪ Knowledge is a multifaceted substance distributed among heterogeneous information
networks and associated processing platforms. The structure and relationships among
the components are dynamic, continuously shaped and consolidated by machine
learning processes.
▪ Spark-MPI addresses this strategic direction by extending the Spark platform with
MPI-based HPC applications using the Process Management Interface (PMI).