Nikolay Malitsky
Approaching the Fifth Paradigm
#SAISEco4
Outline
▪ Paradigm Shift
▪ Spark-MPI Approach
▪ MPI-Based Deep Learning Applications
▪ Next: Reinforcement Learning Applications
Four Science Paradigms*
1. Experimental: describe empirical facts and test hypotheses
since: thousands of years ago
2. Theoretical: explain and predict natural phenomena using
models and abstractions
since: several hundred years ago
3. Computational: simulate theoretical models using computers
since: second half of the 20th century
4. Data-Intensive: scientific discoveries based on Big Data analytics
since: around 15 years ago
*Jim Gray and Alex Szalay, eScience – A Transformed Scientific Method, NRC-CSTB, 2007
Paradigm Shift
▪ The fourth paradigm of data-intensive science rapidly became a major conceptual
approach for multiple application domains encompassing and generating large-scale
scientific drivers such as fusion reactors and light source facilities.
▪ The success of data-intensive projects subsequently triggered an explosion
of numerous machine learning approaches addressing a wide range of
industrial and scientific applications such as computer vision, self-driving
cars, and brain modelling.
▪ The next generation of artificial intelligence systems clearly represents
a paradigm shift from data processing pipelines towards cognitive
knowledge-centric applications.
[Diagram: the 5th Paradigm (Cognitive Computing) emerging from the overlap of the 3rd Paradigm (Computational Science) and the 4th Paradigm (Data-Intensive Science), with DeepMind AlphaGo, IBM Watson DeepQA, and the Human Brain Project as examples]
▪ As shown in Fig. 1, AI systems broke the boundaries of computational and
data-intensive paradigms and began to form a new ecosystem by merging and
extending existing technologies.
Figure 1: The Fifth Paradigm*
*N. Malitsky, R. Castain, and M. Cowan, Spark-MPI: Approaching the Fifth Paradigm of Cognitive Applications, arXiv:1806.01110, 2018
Knowledge
▪ In his original talk, Jim Gray discussed “objectifying” knowledge within the field of ontology to
provide a structured representation of abstract concepts and physical entities. This direction is
related to the development of structured knowledge bases and associated technologies such as the
Semantic Web and Linked Data.
▪ Existing structured resources, however, capture only a tiny subset of available information. Therefore,
advanced question-answering (QA) systems* augmented them with corpora of raw text and processing
pipelines consisting of multiple stages that combine hundreds of different cooperating algorithms from
various fields.
As a result, emerging AI-oriented applications imply a more general and practical definition of knowledge:
Knowledge is a multifaceted substance distributed among heterogeneous information networks and
associated processing platforms. The structure and relationships among the components of such
a composite representation are dynamic, continuously shaped and consolidated by machine learning
processes.
*D. A. Ferrucci, Introduction to “This is Watson”, IBM Journal of Research and Development, 2012
From Processing Pipelines to Rational Agents
[Diagram: three architectures side by side — data-intensive processing pipelines (data nodes D flowing through workers W to an output O); deep learning model-centric applications (workers W exchanging updates around a shared model M); reinforcement learning agent-oriented applications (workers W coupling data D and a model M in a closed loop)]
Approaching the Fifth Paradigm of Cognitive Applications
*Dharshan Kumaran, Demis Hassabis, and James L. McClelland, What Learning Systems do Intelligent Agents Need?
Complementary Learning Systems, Trends in Cognitive Sciences, 2016
[Figure 2: Complementary Learning Systems* — the neocortex, mapped to a heterogeneous knowledge and information network, and the hippocampus, mapped to a streaming pipeline]
The consolidation of HPC and Big Data machine learning technologies is the
prerequisite for developing the next paradigm of cognitive applications.
[Figure 1 (repeated): the 5th Paradigm of Cognitive Applications emerging from the overlap of the 3rd Paradigm (Computational Science) and the 4th Paradigm (Data-Intensive Science)]
Spark-MPI Approach
Closing the gap between Big Data and HPC computing
*Geoffrey Fox et al. HPC-ABDC High Performance Computing Enhanced Apache Big Data Stack, CCGrid, 2015
[Diagram: the Spark (Big Data) and MPI (HPC Computing) ecosystems* converging on new frontiers]
MPI: Message Passing Interface
Application Programming Interface:
▪ peer-to-peer: allreduce
▪ master-workers: scatter, gather, reduce
▪ point-to-point: send, receive
▪ remote memory access: put, get
Portable Access Layer for various communication protocols:
▪ RDMA
▪ GPUDirect RDMA
▪ TCP/IP
▪ shared memory
Process Management Interface:
▪ address exchange service
▪ …
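The semantics of these API families can be illustrated without an MPI runtime. The functions below are a pure-Python simulation of the collective operations (lists indexed by rank), not calls into an MPI library:

```python
from functools import reduce

def allreduce(values, op=lambda a, b: a + b):
    """Peer-to-peer: every rank receives the reduction of all ranks' values."""
    total = reduce(op, values)
    return [total for _ in values]

def scatter(data, n_ranks):
    """Master-workers: the root splits its data into one chunk per rank."""
    chunk = len(data) // n_ranks
    return [data[i * chunk:(i + 1) * chunk] for i in range(n_ranks)]

def gather(chunks):
    """Master-workers: the root concatenates one chunk from each rank."""
    return [x for c in chunks for x in c]

# Four ranks each contribute a local value; after allreduce every
# rank holds the same global sum.
print(allreduce([1, 2, 3, 4]))          # [10, 10, 10, 10]
print(scatter([0, 1, 2, 3, 4, 5], 3))   # [[0, 1], [2, 3], [4, 5]]
print(gather([[0, 1], [2, 3]]))         # [0, 1, 2, 3]
```

In a real MPI program these would be communication calls executed by separate processes; the simulation only captures the data movement each operation performs.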
PMI-based Spark-MPI Approach
[Diagram: a Spark Driver and a PMI Server connected to four Spark Workers through three interfaces — Spark driver-worker, PMI server-worker, and MPI inter-worker]
▪ Process Management Interface (PMI): originally developed by the MPICH team(2) and used for exchanging wireup information among processes.
▪ PMI-Exascale (PMIx): created by the Open MPI team(3) in response to the ever-increasing scale of supercomputing clusters.
(2) P. Balaji et al. PMI: A Scalable Parallel Process-Management Interface for Extreme-Scale Systems, 2010
(3) R. Castain, D. Solt, J. Hursey, and A. Bouteiller, PMIx: Process Management for Exascale Environment, 2017
▪ The PMIx community has therefore focused on extending the earlier PMI work, adding flexibility to existing APIs (e.g., to support asynchronous operations) as well as new APIs that broaden the range of interactions with the resident resource manager.
▪ Spark-MPI(1) encompasses three interfaces. Specifically, it complements the conventional Spark driver-worker model with the PMI server-worker interface for establishing MPI inter-worker communications.
(1) N. Malitsky et al. Building Near-Real-Time Processing Pipelines with the Spark-MPI platform, NYSDS, 2017
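In this scheme, a Spark task only needs to export the PMI wireup variables before the MPI library initializes. A minimal sketch of that setup follows; the slides confirm PMIX_RANK, while the server-address variable name, host, and port here are illustrative assumptions, not Spark-MPI's exact API:

```python
import os

def make_pmi_env(rank, server_host, server_port):
    """Build the PMI-related environment variables a Spark map task would
    export before MPI initialization (illustrative names and values)."""
    return {
        "PMIX_RANK": str(rank),                             # this worker's MPI rank
        "PMIX_SERVER_URI": f"{server_host}:{server_port}",  # assumed server-address variable
    }

def train(rank):
    """What a Spark map function would do first: publish the PMI variables
    so the MPI library can wire up inter-worker communication."""
    os.environ.update(make_pmi_env(rank, "driver-node", 41000))
    # ... MPI-based computation would start here ...
    return os.environ["PMIX_RANK"]

print([train(r) for r in range(4)])  # ['0', '1', '2', '3']
```

Each Spark worker would execute `train` with its own rank, so that the PMI server can hand every MPI process the addresses of its peers.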
Open MPI*
Open MPI was derived as a generalization of four projects bringing together over 40 frameworks. It introduced
a Modular Component Architecture (MCA) that utilized components (a.k.a. plugins) to provide alternative
implementations of key functional blocks such as message transport, mapping, algorithms, and collective
operations.
*E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. Squyres, V. Sahay, P. Kambadur, B. Barrett, A. Lumsdaine, R. H. Castain,
D. J. Daniel, R. L. Graham, and T. S. Woodall, Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, 2004
[Diagram — Architecture: an MPI application running on the Modular Component Architecture (MCA), where each framework consists of a base and interchangeable components. Implementation: an MPI application running on the Open MPI core (OPAL, ORTE, and OMPI layers), with a sparkmpi component added to the OpenRTE Daemon’s Launch Subsystem (odls) framework alongside the default component, and the MPI byte transfer layer (btl) framework providing tcp, ofi, and smcuda components]
Spark-MPI Integrated Platform
N. Malitsky, R. Castain, and M. Cowan, Spark-MPI: Approaching the Fifth Paradigm of Cognitive Applications, arXiv:1806.01110, 2018
[Diagram: the Spark Platform (Streaming Sources feeding Receivers and Connectors beneath the Resilient Distributed Dataset API) combined with HPC Extensions — MPI-Based Algorithms on top of the Process Management Interface (PMI), SLURM, and Parallel File Systems]
MPI-Based Deep Learning Applications
Deep Learning Training as a Third Paradigm
Computational Application
[Diagram: two distributed training topologies — the Parameter Server-based Data Parallel Model* (DL workers W exchanging gradients with a parameter server P) and the All-Reduce Model (DL workers W connected directly in a ring). W: DL Worker; P: Parameter Server]
*Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2015
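The parameter-server topology can be sketched as a single-process toy in plain Python (the gradients and learning rate are illustrative; this is not a distributed implementation): each DL worker reports a gradient on its data shard, and the server averages the gradients and applies one SGD step.

```python
def server_step(weights, worker_grads, lr=0.1):
    """Parameter server: average the workers' gradients and apply one
    SGD step; the new weights would then be broadcast to every worker."""
    avg = [sum(g) / len(worker_grads) for g in zip(*worker_grads)]
    return [w - lr * g for w, g in zip(weights, avg)]

# Three workers report gradients for a two-parameter model.
weights = [10.0, 20.0]
grads = [[3, 6], [6, 9], [0, 3]]          # one gradient vector per worker
weights = server_step(weights, grads, lr=1.0)
print(weights)  # [7.0, 14.0]
```

The all-reduce model computes the same average, but via a collective operation among the workers themselves, removing the central server as a bandwidth bottleneck.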
(Some of the) MPI DL Projects
▪ CNTK(1): Microsoft Cognitive Toolkit
▪ TensorFlow-MaTEx(2): added two new TensorFlow operators, Global_Broadcast and
MPI_Allreduce
▪ S-Caffe(3): scaled Caffe with an MPI-level hierarchical reduction design
▪ Horovod(4): adopted Baidu’s approach based on the ring-allreduce algorithm and further
developed its implementation with NVIDIA’s NCCL library for collective operations
▪ CPE ML Plugin(5): Cray Programming Environment Machine Learning Plugin
(1) A. Agarwal et al. An Introduction to Computational Networks and the Computational Network Toolkit, 2014
(2) A. Vishnu et al. User-transparent Distributed TensorFlow, 2017
(3) A. A. Awan et al. S-Caffe: Co-designing MPI Runtime and Caffe for Scalable Deep Learning on Modern GPU Clusters, 2017
(4) A. Sergeev and M. Del Balso. Horovod: Fast and Easy Distributed Deep Learning in TensorFlow, 2018
(5) P. Mendygral. Scaling Deep Learning, 2018
Spark-MPI-Horovod
[Script screenshot with callouts: initialize the PMI environment variables; initialize Horovod and MPI; extract the MNIST dataset; build the DL model; create the TF optimizer; wrap it with Horovod; run the Horovod MPI-based training on Spark workers]
The Horovod MPI-based training framework replaces
the TensorFlow parameter servers with the ring-allreduce
approach for averaging gradients among
TensorFlow workers.
For users, the corresponding integration consists of two
primary steps, as illustrated by the script: (1) initializing
Horovod with hvd.init() and (2) wrapping the TensorFlow
worker’s optimizer with hvd.DistributedOptimizer().
The Spark-MPI pipelines make it possible to run the Horovod
training on Spark workers with Map operations. To
establish MPI communication among the Spark workers,
the Map operation (e.g. train()) only needs to define
PMI-related environment variables (such as
PMIX_RANK and a port number).
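Horovod's ring-allreduce can be illustrated with a single-process simulation in plain Python (no Horovod or MPI required; one chunk per rank is assumed for brevity): a reduce-scatter phase circulates and accumulates chunks around the ring, then an allgather phase circulates the fully reduced chunks, so that every rank ends with the element-wise sum.

```python
def ring_allreduce(vectors):
    """Single-process simulation of ring-allreduce (the algorithm Horovod
    adopted from Baidu). Assumes len(vector) == number of ranks."""
    n = len(vectors)
    data = [list(v) for v in vectors]
    # Phase 1, reduce-scatter: in each of n-1 steps, rank r adds the chunk
    # received from its left neighbour into its own copy of that chunk.
    for s in range(n - 1):
        for r in range(n):
            c = (r - s - 1) % n
            data[r][c] += data[(r - 1) % n][c]
    # Phase 2, allgather: circulate the fully reduced chunks around the ring.
    for s in range(n - 1):
        for r in range(n):
            c = (r - s) % n
            data[r][c] = data[(r - 1) % n][c]
    return data

# Four ranks, each holding a 4-element gradient vector.
grads = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]]
print(ring_allreduce(grads))  # every rank ends with [10, 10, 10, 10]
```

Each rank sends and receives only one chunk per step, which is what makes the ring variant bandwidth-optimal compared with funneling all gradients through a parameter server.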
Next
Deep Reinforcement Learning
*A. Nair et al. Massively Parallel Methods for Deep Reinforcement Learning, ICML, 2015
**R. Nishihara et al. Real-Time Machine Learning: The Missing Pieces, arXiv:1703.03924, 2017
[Figure 1: Gorila* (General Reinforcement Learning Architecture) — an Actor selects actions argmax_a Q(s, a; θ) in the Environment, (s, a, r, s′) transitions are stored in a Replay Memory, and a DQN Learner samples them and pushes updates to a Parameter Server that synchronizes the Agent’s network]
System Requirements**:
• Low latency
• High throughput
• Dynamic task creation
• Heterogeneous tasks
• Arbitrary dataflow dependencies
• Transparent fault tolerance
• Debuggability and profiling
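The (s, a, r, s′) loop in the Gorila architecture can be sketched with a tabular Q-learning toy (a single-process stand-in for the DQN learner; the two-state environment, seed, and hyperparameters are illustrative assumptions):

```python
import random
from collections import deque

random.seed(0)
memory = deque(maxlen=1000)   # replay memory of (s, a, r, s') tuples
Q = {}                        # tabular stand-in for Q(s, a; θ)

def q(s, a):
    return Q.get((s, a), 0.0)

def learner_step(batch, alpha=0.5, gamma=0.9, actions=(0, 1)):
    """DQN-style update on a sampled minibatch: move Q(s, a) towards
    r + γ · max_a' Q(s', a')."""
    for s, a, r, s2 in batch:
        target = r + gamma * max(q(s2, a2) for a2 in actions)
        Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))

# Actor loop: a toy environment where action 1 in state 0 yields a reward.
for _ in range(200):
    s = random.choice([0, 1])
    a = random.choice([0, 1])
    r = 1.0 if (s == 0 and a == 1) else 0.0
    memory.append((s, a, r, s))                 # toy transition: state unchanged
    learner_step(random.sample(list(memory), min(8, len(memory))))

print(q(0, 1) > q(0, 0))  # the rewarded action dominates -> True
```

Gorila distributes exactly these pieces: many actors fill the replay memory, many learners sample it, and the parameter server aggregates their updates, which is why the system requirements above (low latency, dynamic tasks, heterogeneous dataflow) go beyond a batch pipeline.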
(Some of the) RL Applications*
(1) V. Mnih et al. Playing Atari with Deep Reinforcement Learning, NIPS, 2013
(2) D. Silver et al. Mastering the game of Go with deep neural networks and tree search, Nature, 2016
▪ Atari Games1
▪ AlphaGo2
▪ Robotics
▪ Self-driving vehicles
▪ Autonomous UAVs
…
[Image: Pterodactylus antiquus, “the first pterosaur species to be named and identified as a flying reptile … 150.8–148.5 million years ago” (Wikipedia) — illustrating the range of RL applications]
Summary
▪ Emerging AI projects represent a paradigm shift from data processing pipelines towards
the fifth paradigm of cognitive knowledge-centric applications.
▪ The new generation of AI composite applications requires the integration of Big Data and
HPC technologies. For example, MPI was originally introduced within the computational
paradigm ecosystem for developing HPC scientific applications, but it has recently been
applied successfully to extend the scale of deep learning applications.
▪ Knowledge is a multifaceted substance distributed among heterogeneous information
networks and associated processing platforms. The structure and relationships among
the components are dynamic, continuously shaped and consolidated by machine
learning processes.
▪ Spark-MPI addresses this strategic direction by extending the Spark platform with
MPI-based HPC applications using the Process Management Interface (PMI).