Ralph H. Castain
Intel
PMIx: Bridging the Container Boundary
Origin: Changing Landscape
• Launch time limiting scale
• Legion
• Programming model & runtime proliferation
• Container technologies
• Hybrid applications
• Model-specific tools
Start Someplace!
• Resolve launch scaling
  - Pre-load information known to RM/scheduler
  - Pre-assign communication endpoints
  - Eliminate data exchange during init
  - Orchestrate launch procedure
PMIx Launch Sequence
*RM daemon, mpirun-daemon, etc.
Three Distinct Entities
• PMIx Standard
  - Defined set of APIs, attribute strings
  - Nothing about implementation
• PMIx Reference Library
  - A full-featured implementation of the Standard
  - Intended to ease adoption
• PMIx Reference RTE
  - Full-featured “shim” to a non-PMIx RM
  - Provides development environment
v3.1 just released!
Where Is It Used?
• Libraries
  - OMPI, MPICH, Intel MPI, HPE-MPI, Spectrum MPI, Fujitsu MPI
  - OSHMEM, SOS, OpenSHMEM, …
• RMs
  - Slurm, Fujitsu, IBM’s JSM, PBSPro (2019), Kubernetes(?)
  - Slurm enhancement (LANL/ECP)
• New use-cases
  - Spark, TensorFlow
  - Debuggers (TotalView, DDT)
  - MPI
    - Re-ordering for load balance (UTK/ECP)
    - Fault management (UTK)
    - On-the-fly session formation/teardown (MPIF)
  - Logging information
  - Containers
    - Singularity, Docker, Amazon
Build Upon It
• Async event notification
• Cross-model notification
  - Announce model type, characteristics
  - Coordinate resource utilization, programming blocks
• Generalized tool support
  - Co-launch daemons with job
  - Forward stdio channels
  - Query job, system info, network traffic, process counters, etc.
  - Standardized attachment, launch methods
Sprinkle Some Magic Dust
• Allocation support
  - Dynamically add/remove/loan nodes
  - Register pre-emption acceptance, handshake
• Dynamic process groups
  - Async group construct/destruct
  - Notification of process departure/failure
• File system integration
  - Pre-cache files, specify storage strategies
PMIx-SMS Interactions
[Figure: diagram with labels RM, PMIx Client, FS, Fabric, RAS, APP, Orchestration Requests, Responses, NIC, Fabric Mgr, PMIx Server, MPI, OpenMP, Job Script, System Management Stack, Tool Support]
PMIx-SMS Interactions
[Figure: same diagram as the previous slide, with a “Container!” label added]
Container Issues
• Version tracking across container boundary
  - Different pieces moving at different rates
• Container managers vs HPC schedulers
  - Dynamic, service related vs static, application focus
• Uneven adoption rates
  - Different environments adopt features at different times, different combinations
Version Tracking
• Auto-negotiate messaging protocol
• Client starts
  - Envar indicates server capabilities
  - Select highest support in common
  - Convey selection in connection handshake
• Server follows client’s lead
  - Per-client messaging protocol
  - Support mix of client versions
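The negotiation described on this slide can be sketched roughly as follows. This is an illustrative model only, not the PMIx wire protocol or library code: the environment variable name `EXAMPLE_SERVER_PROTOCOLS`, the integer version numbers, and the `Server` class are all assumptions made for the example.

```python
# Hypothetical variable name, used only for this sketch.
ENVAR = "EXAMPLE_SERVER_PROTOCOLS"


def advertised_protocols(environ):
    """Parse the server's advertised protocol versions, e.g. "1,2,3"."""
    raw = environ.get(ENVAR, "")
    return {int(tok) for tok in raw.split(",") if tok.strip()}


def select_protocol(client_supported, server_advertised):
    """Return the highest version both sides support, or None if none match."""
    common = set(client_supported) & set(server_advertised)
    return max(common) if common else None


class Server:
    """Tracks one messaging protocol per connected client."""

    def __init__(self, supported):
        self.supported = set(supported)
        self.protocol_by_client = {}

    def handshake(self, client_id, selected):
        # The server follows the client's lead, provided it can speak that version.
        if selected not in self.supported:
            raise ValueError(f"unsupported protocol {selected}")
        self.protocol_by_client[client_id] = selected
        return selected


if __name__ == "__main__":
    server = Server({1, 2, 3})
    env = {ENVAR: "1,2,3"}  # what the server would place in the client's environment
    for client_id, client_versions in [("old", {1, 2}), ("new", {2, 3, 4})]:
        choice = select_protocol(client_versions, advertised_protocols(env))
        server.handshake(client_id, choice)
    print(server.protocol_by_client)
```

Because the server records the selection per client, an older client inside one container and a newer client inside another can talk to the same server at the same time.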
Container Issues
• Version tracking across container boundary
  - Different pieces moving at different rates
• Uneven adoption rates
  - Different environments adopt features at different times, different combinations
• Container managers vs HPC schedulers
  - Dynamic, service related vs static, application focus
  - Mismatched capabilities
EPYX
• PMIx relay daemon/server
• Integrated into container
• Sense what SMS supports
  - From nothing to everything
• Supported requests
  - Relay requests/responses
• Unsupported requests
  - Execute internally
  - Return “not supported”
*RM can perform request, but doesn’t have PMIx backend support for it
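The relay behaviour listed for EPYX (sense what the SMS supports, forward supported requests, otherwise execute internally or return “not supported”) might be modelled along these lines. This is a minimal sketch under stated assumptions: the `Relay` class, the request names, and the callback signatures are hypothetical and are not taken from EPYX or the PMIx library.

```python
NOT_SUPPORTED = "not supported"


class Relay:
    """Hypothetical relay sitting between a client and the host SMS."""

    def __init__(self, probe_sms, forward_to_sms, internal_handlers=None):
        # Sense once what the SMS supports; the set may be empty ("nothing")
        # or cover every request ("everything").
        self.sms_supported = set(probe_sms())
        self.forward_to_sms = forward_to_sms
        self.internal_handlers = dict(internal_handlers or {})

    def handle(self, request, payload=None):
        # Supported by the SMS: relay the request and pass back the response.
        if request in self.sms_supported:
            return self.forward_to_sms(request, payload)
        # Unsupported by the SMS: execute internally when the relay knows how.
        handler = self.internal_handlers.get(request)
        if handler is not None:
            return handler(payload)
        # Otherwise tell the client the request is not supported.
        return NOT_SUPPORTED


if __name__ == "__main__":
    relay = Relay(
        probe_sms=lambda: {"query"},
        forward_to_sms=lambda req, data: ("sms", req, data),
        internal_handlers={"publish": lambda data: ("local", data)},
    )
    for name in ("query", "publish", "allocate"):
        print(name, "->", relay.handle(name, "payload"))
```

Keeping the internal handlers in a plain mapping is one way to let the same relay run unchanged on systems whose SMS supports anything from no requests to all of them.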
EPYX: Filling the Gaps
• Call intrinsic APIs to execute PMIx requests from client
• Treat the RM as an equal member of SMS
• Pros
  - Allows more transparent movement of containers across systems
  - Reduces obstacles
• Cons
  - Reduces pressure on SMS vendors to integrate
Who writes these drivers?
HPC on Container Mgrs
• Allocation request
  - Stabilize allocation for some period of time
• Event notification
  - Handshake need to break commitment
  - Notify when restored
  - Use new FT/Sessions methods for flexible members
[Figure labels: EPYX, CRI]
Services on HPC Systems
• Job control API
  - Register as a service
  - Request auto-restart, multiple replicas
  - Set up parallel duplicate IO streams
• IO Forwarding APIs
  - Construct data flows between processes
• Storage APIs
• Publish/Lookup APIs
  - Service discovery, rendezvous
[Figure label: EPYX]
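As a rough illustration of how publish/lookup enables service discovery and rendezvous, the sketch below uses an in-memory registry. It is not the PMIx Publish/Lookup API: the `Registry` class, the key name, and the endpoint string are hypothetical stand-ins chosen for the example.

```python
class Registry:
    """In-memory stand-in for a publish/lookup data store."""

    def __init__(self):
        self._data = {}

    def publish(self, key, value):
        # Refuse to overwrite silently so two services cannot claim one name.
        if key in self._data:
            raise KeyError(f"{key!r} already published")
        self._data[key] = value

    def lookup(self, key):
        # Returns None until some process has published the key.
        return self._data.get(key)

    def unpublish(self, key):
        self._data.pop(key, None)


if __name__ == "__main__":
    registry = Registry()
    # Before the service starts, a client finds nothing to connect to.
    print(registry.lookup("example-service"))
    # The service publishes its (hypothetical) contact endpoint.
    registry.publish("example-service", "tcp://node01:5000")
    # A client started later discovers the endpoint by name and can rendezvous.
    print(registry.lookup("example-service"))
```

A real deployment would place the store behind the resource manager or a relay rather than in one process; the name-based lookup is the part this sketch is meant to show.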
Why Enable This?
[Figure labels: EPYX, CRI, Interchangeable]
Summary
• Avoid having to write entire runtimes just to do something a little different
• Portability (HPC Service Mgrs)
• Generalized tools
• Scalable operations
• Async event notification
• Full system orchestration
Come Join Us!
https://pmix.org
https://github.com/pmix
Slack: pmix-workspace.slack.com
Q&A
Useful Links:
General Information: https://pmix.org/
PMIx Library: https://github.com/pmix/pmix
PMIx Reference RunTime Environment (PRRTE): https://github.com/pmix/prrte
PMIx Standard: https://github.com/pmix/pmix-standard
Slack: pmix-workspace.slack.com
Overview Paper
Ralph H. Castain, Aurelien Bouteiller, Joshua Hursey, David Solt, “PMIx: Process management for exascale environments”, Parallel Computing, 2018. https://doi.org/10.1016/j.parco.2018.08.002
