JEREMY EDER - RED HAT PERFORMANCE ENGINEERING
Enabling GPU-as-a-Service Providers with Red Hat OpenShift
@jeremyeder
Senior Principal Software Engineer, Red Hat
March 2018
Agenda
● OpenShift Cluster Overview
● Infrastructure Abstraction
● High Performance Features
● GPU Overview
Community Powered Innovation
What does an OpenShift Cluster look like?
Diagram: a MASTER (API/Authentication, Data Store, Scheduler, Health/Scaling) manages a set of RHEL NODEs, each running containers (C). The cluster also includes a Service Layer, a Routing Layer, a Registry, and Persistent Storage. Everything runs on Red Hat Enterprise Linux across Physical, Virtual, Private, Public, or Hybrid infrastructure.
Abstract away any infrastructure
Diagram: the Service Layer and Routing Layer on top of Physical, Virtual, Private, Public, or Hybrid infrastructure, such as:
● Bare Metal
● RHV
● OpenStack
● VMware
● GCE
● Azure
● AWS
● BYO nodes...
One Platform to...
OpenShift is the single platform to run any application:
● Old or new
● Monolithic/Microservice
● Any vertical: Big Data, NFV, FSI, Animation, ISVs, HPC, Machine Learning
High Performance RFEs by Vertical
Feature                                  FSI    NFV    ISV    BD/ML  ANIM   HPC
NUMA (cpuset.cpus and cpuset.mems)       Yes    Yes    Yes    Maybe  Maybe  Yes
Device Passthrough (NIC/Disk/GPU, etc.)  Yes    Yes    Yes    Maybe  Maybe  Yes
sysctl Support (non-namespaced too)      Yes    Yes    Yes    Yes    Yes    Yes
Separation of control- and data-plane    Yes    Yes    Yes    Yes    Yes    Yes
Node “fitness” (extended health info)    Yes    Yes    Maybe  Maybe  Maybe  Yes
Multi-homed pods                         Yes    Yes    Maybe  Yes    Yes    Yes
Kernel Modules (DKMS-ish)                Yes    Yes    Maybe  Maybe  Yes    Maybe
Hugepages                                Yes    Yes    Yes    Yes    Maybe  Maybe
(RFE = request for enhancement; FSI = financial services industry; NFV = network functions virtualization; ISV = independent software vendors; BD/ML = big data/machine learning; ANIM = animation; HPC = high-performance computing)
Enable containerization of Infrastructure Software
● Software-defined Storage and Networking
● Packet switching and routing tiers
● Very different workloads within a single cluster
○ Layered schedulers (HPC/grid)
● Many more...
Why do this?
Enable containerization of Red Hat’s products
● Gluster/Container Native Storage
● Ceph
● OpenStack
● radanalytics
● KubeVirt
Upstream First: Kubernetes Working Groups
● Resource Management Working Group
○ Features Delivered
■ Device Plugins (GPU/Bypass/FPGA)
■ CPU Manager (exclusive cores)
■ Huge Pages Support
○ Extensive Roadmap
● Participants: Intel, IBM, Google, NVIDIA, Red Hat, and many more
Upstream First: Kubernetes Working Groups
● Network Plumbing Working Group
○ Formalized Dec 2017
● Goal: implement a pseudo-standard collection of CRDs for attaching multiple networks to pods, owned by sig-network but kept *out of tree*
● Separate control- and data-plane, overlapping IPs, fast data-plane
● Participants include IBM, Intel, Red Hat, Huawei, Cisco, Tigera, and others
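The working group's model is that extra networks are described by CRD objects and a pod opts into them by name, leaving the default cluster network for the control plane. The Python sketch below only builds and reads a pod manifest as a plain dict. The annotation key `k8s.v1.cni.cncf.io/networks` is the one the group's specification later settled on, which is an assumption relative to the December 2017 state; the pod, image, and network names are hypothetical.

```python
# Minimal sketch: asking for additional pod networks through an annotation.
# Assumptions (not from the slides): the annotation key follows the Network
# Plumbing Working Group's later spec; pod, image, and network names are
# hypothetical placeholders.
import json

NETWORKS_ANNOTATION = "k8s.v1.cni.cncf.io/networks"


def pod_with_networks(name, image, networks):
    """Build a pod manifest (as a dict) that requests extra networks by name."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Comma-separated names of network attachment objects (CRD instances).
            "annotations": {NETWORKS_ANNOTATION: ",".join(networks)},
        },
        "spec": {"containers": [{"name": name, "image": image}]},
    }


def requested_networks(pod):
    """Read the annotation back into a list of network names."""
    annotations = pod.get("metadata", {}).get("annotations", {})
    value = annotations.get(NETWORKS_ANNOTATION, "")
    return [n.strip() for n in value.split(",") if n.strip()]


if __name__ == "__main__":
    pod = pod_with_networks("dataplane-app", "example.com/dataplane:latest",
                            ["fast-dataplane", "storage-net"])
    print(json.dumps(pod, indent=2))
    print(requested_networks(pod))
```

The default pod network stays untouched; the data-plane attachments are additive, which is what lets control- and data-plane traffic be separated.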
GPU CLUSTER TOPOLOGY
OpenShift Cluster Topology
Diagram: Control Plane (three master and etcd hosts), Infrastructure (three registry and router hosts plus a load balancer, LB), and Compute Nodes and Storage Tier.
Compute Nodes...
● How to enable software to take advantage of “special” hardware
● Create Node Pools
○ Mark them as “special”
○ Taints/Tolerations
○ ExtendedResourceToleration
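A dedicated GPU pool is typically built by tainting those nodes so ordinary pods stay off, then tolerating the taint only in pods that need the hardware. The ExtendedResourceToleration admission plugin automates the second half: when a pod requests an extended resource such as `nvidia.com/gpu`, it adds a toleration for a `NoSchedule` taint whose key is that resource name. The Python sketch below is a simplified model of those matching rules for illustration, not the scheduler's or the admission plugin's actual code; the extended-resource test (any domain-prefixed name outside `kubernetes.io`) is an assumption.

```python
# Simplified model of taint/toleration matching for a dedicated GPU node pool.
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"      # NoSchedule, PreferNoSchedule, or NoExecute


@dataclass(frozen=True)
class Toleration:
    key: str = ""                   # empty key with operator Exists matches every taint
    operator: str = "Equal"         # "Equal" or "Exists"
    value: str = ""
    effect: str = ""                # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Does one toleration match one taint?"""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value


def schedulable(node_taints: List[Taint], tolerations: List[Toleration]) -> bool:
    """Hard filter: every NoSchedule/NoExecute taint must be tolerated."""
    hard = [t for t in node_taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


def add_extended_resource_tolerations(requests: Dict[str, object],
                                      tolerations: List[Toleration]) -> List[Toleration]:
    """Mimic ExtendedResourceToleration: tolerate taints keyed by requested extended resources."""
    result = list(tolerations)
    for name in sorted(requests):
        # Simplification: any domain-prefixed name outside kubernetes.io counts as extended.
        if "/" in name and not name.startswith("kubernetes.io/"):
            tol = Toleration(key=name, operator="Exists", effect="NoSchedule")
            if tol not in result:
                result.append(tol)
    return result


if __name__ == "__main__":
    gpu_pool = [Taint("nvidia.com/gpu")]
    plain = add_extended_resource_tolerations({"cpu": 2}, [])
    gpu = add_extended_resource_tolerations({"cpu": 2, "nvidia.com/gpu": 1}, [])
    print("plain pod fits GPU pool:", schedulable(gpu_pool, plain))
    print("GPU pod fits GPU pool:", schedulable(gpu_pool, gpu))
```

Keying the taint by the resource name is the design choice that makes the admission plugin work: users only declare `nvidia.com/gpu` in their limits and never write the toleration themselves.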
Compute Nodes...
● How to enable software to take advantage of “special” hardware
● Tune/Configure the OS
○ Tuned Profiles
○ CPU Isolation
○ sysctls
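One way to apply CPU isolation and node-level sysctls together is a custom Tuned profile. The Python sketch below renders a `tuned.conf` that inherits from an existing profile, sets sysctls in a `[sysctl]` section, and passes `isolcpus` on the kernel command line through the `[bootloader]` section. The parent profile (`throughput-performance`), the CPU range, and the `kernel.numa_balancing` value are placeholder assumptions, not tuning advice.

```python
# Minimal sketch: render a Tuned profile that isolates CPUs and sets sysctls.
# Assumptions: the summary, parent profile, CPU range, and sysctl values are
# illustrative placeholders.


def to_cpulist(cpus):
    """Format CPU ids in the kernel's cpulist syntax, e.g. {2, 3, 4, 7} -> '2-4,7'."""
    ranges = []
    for cpu in sorted(set(cpus)):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current run
        else:
            ranges.append([cpu, cpu])    # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def render_tuned_profile(parent, isolated_cpus, sysctls):
    """Return the text of a tuned.conf that inherits `parent` and adds isolation."""
    lines = ["[main]", "summary=Example GPU compute node profile", f"include={parent}", ""]
    lines.append("[sysctl]")
    lines += [f"{key}={value}" for key, value in sorted(sysctls.items())]
    lines += ["", "[bootloader]", f"cmdline=isolcpus={to_cpulist(isolated_cpus)}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_tuned_profile("throughput-performance", range(2, 8),
                               {"kernel.numa_balancing": 0}))
```

Saving the output as `/etc/tuned/<profile-name>/tuned.conf` and running `tuned-adm profile <profile-name>` would activate it; the kernel command-line change only takes effect after a reboot.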
In OpenShift, there are three “types” of sysctls
Safe
● Enabled by default
● kernel.shm_rmid_forced
● net.ipv4.ip_local_port_range
● net.ipv4.tcp_syncookies
Unsafe
● Must be allowed with an experimental kubelet flag
● kernel.sem*
● kernel.shm*
● kernel.msg*
● fs.mqueue.*
● net.*
Node-level
● Can’t be set from a pod
● Potentially affect other pods
● Many interesting sysctls are node-level
● Use TuneD
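The three categories can be expressed as a small classifier: safe sysctls can be set from a pod directly, unsafe ones only where the node has allowed them, and node-level ones belong in host tuning. This Python sketch encodes only the lists on the slide and checks the safe set first, because `kernel.shm_rmid_forced` also matches `kernel.shm*`. It is not the kubelet's own whitelist logic, and the sysctl names in the demo are arbitrary examples.

```python
# Sketch: sort requested sysctls into the slide's three categories.
import fnmatch

# Lists copied from the slide; a real kubelet maintains its own whitelist.
SAFE_SYSCTLS = {
    "kernel.shm_rmid_forced",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_syncookies",
}
UNSAFE_PATTERNS = ["kernel.sem*", "kernel.shm*", "kernel.msg*", "fs.mqueue.*", "net.*"]


def classify_sysctl(name):
    """Return 'safe', 'unsafe', or 'node-level' for a sysctl name."""
    if name in SAFE_SYSCTLS:  # checked first: kernel.shm_rmid_forced also matches kernel.shm*
        return "safe"
    if any(fnmatch.fnmatchcase(name, p) for p in UNSAFE_PATTERNS):
        return "unsafe"
    return "node-level"


def plan_sysctls(names):
    """Group requested sysctls by how they would have to be applied."""
    plan = {"safe": [], "unsafe": [], "node-level": []}
    for name in names:
        plan[classify_sysctl(name)].append(name)
    return plan


if __name__ == "__main__":
    wanted = ["net.ipv4.tcp_syncookies", "kernel.msgmax", "net.core.somaxconn", "vm.swappiness"]
    for kind, names in plan_sysctls(wanted).items():
        print(f"{kind}: {names}")
```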
Compute Nodes...
● How to enable software to take advantage of “special” hardware
● Optimize your workload
○ Dedicate CPU cores
○ Consume hugepages
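Both optimizations are requested declaratively in the pod spec. With the kubelet's static CPU manager policy (`--cpu-manager-policy=static`), a container gets exclusive cores only when its pod is in the Guaranteed QoS class (requests equal limits for CPU and memory in every container) and its CPU request is a whole number. Hugepages are requested as a resource such as `hugepages-2Mi` and mounted through an `emptyDir` volume with `medium: HugePages`. The Python sketch below builds such a pod as a dict and checks the eligibility rule. It assumes the node has pre-allocated 2 MiB hugepages, compares quantities as written rather than normalizing them, and uses a hypothetical pod name, image, and sizes.

```python
# Minimal sketch: a pod that asks for exclusive cores and 2 MiB hugepages.
# Assumptions: kubelet runs --cpu-manager-policy=static, the node has
# pre-allocated 2 MiB hugepages, and the name/image/sizes are hypothetical.


def cpu_millis(quantity) -> int:
    """Convert a CPU quantity such as 4, '4', '1.5', or '1500m' to millicores."""
    q = str(quantity)
    if q.endswith("m"):
        return int(q[:-1])
    return round(float(q) * 1000)


def exclusive_core_pod(name, image, cpus, memory, hugepages_2mi):
    # Identical requests and limits put the pod in the Guaranteed QoS class.
    resources = {"cpu": str(cpus), "memory": memory, "hugepages-2Mi": hugepages_2mi}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "resources": {"requests": dict(resources), "limits": dict(resources)},
                "volumeMounts": [{"name": "hugepages", "mountPath": "/dev/hugepages"}],
            }],
            # Hugepage-backed volume the application can map memory from.
            "volumes": [{"name": "hugepages", "emptyDir": {"medium": "HugePages"}}],
        },
    }


def pod_is_guaranteed(pod) -> bool:
    """Every container sets CPU and memory limits, and requests equal limits."""
    for c in pod["spec"]["containers"]:
        res = c.get("resources", {})
        limits, requests = res.get("limits", {}), res.get("requests", {})
        for key in ("cpu", "memory"):
            # Missing requests default to limits; strings are compared as written.
            if key not in limits or requests.get(key, limits[key]) != limits[key]:
                return False
    return True


def exclusive_cpu_containers(pod):
    """Containers the static CPU manager would pin: Guaranteed pod, whole CPUs."""
    if not pod_is_guaranteed(pod):
        return []
    return [c["name"] for c in pod["spec"]["containers"]
            if cpu_millis(c["resources"]["limits"]["cpu"]) % 1000 == 0]


if __name__ == "__main__":
    pod = exclusive_core_pod("stac-a2", "example.com/stac-a2:latest", 4, "8Gi", "1Gi")
    print(exclusive_cpu_containers(pod))
```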
Compute Nodes...
● How to enable software to take advantage of “special” hardware
● Enable the Hardware
○ Install drivers
○ Deploy Device Plugin
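A device plugin usually runs as a DaemonSet so that one copy lands on every GPU node. It registers with the kubelet through a Unix socket under `/var/lib/kubelet/device-plugins` and then advertises resources such as `nvidia.com/gpu`. The Python sketch below builds such a DaemonSet as JSON. The image name and node label are hypothetical placeholders, and the toleration assumes the GPU pool is tainted with key `nvidia.com/gpu`. It also uses `apps/v1`, which a Kubernetes 1.8 cluster would not yet serve; that release offered beta DaemonSet API versions instead.

```python
# Minimal sketch: a DaemonSet that runs a GPU device plugin on labeled nodes.
# Assumptions: image name and node label are hypothetical; the hostPath is the
# directory where the kubelet exposes its device plugin registration socket.
import json

DEVICE_PLUGIN_DIR = "/var/lib/kubelet/device-plugins"


def device_plugin_daemonset(image, node_selector):
    labels = {"name": "gpu-device-plugin"}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "gpu-device-plugin", "namespace": "kube-system"},
        "spec": {
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    # Only run on nodes that were labeled as GPU nodes.
                    "nodeSelector": dict(node_selector),
                    # Tolerate the taint used to dedicate the GPU node pool.
                    "tolerations": [{"key": "nvidia.com/gpu", "operator": "Exists",
                                     "effect": "NoSchedule"}],
                    "containers": [{
                        "name": "device-plugin",
                        "image": image,
                        "volumeMounts": [{"name": "device-plugin",
                                          "mountPath": DEVICE_PLUGIN_DIR}],
                    }],
                    "volumes": [{"name": "device-plugin",
                                 "hostPath": {"path": DEVICE_PLUGIN_DIR}}],
                },
            },
        },
    }


if __name__ == "__main__":
    ds = device_plugin_daemonset("example.com/gpu-device-plugin:latest",
                                 {"example.com/gpu": "true"})
    print(json.dumps(ds, indent=2))  # e.g. pipe into: kubectl create -f -
```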
Compute Nodes...
● How to enable software to take advantage of “special” hardware
● Consume the Device
○ Kubeflow template deployment
Kubernetes Deployment for STAC-A2
● All-in-One Kubernetes installation (hack/local-up-cluster.sh)
● Node labeled
● Containers:
○ RHEL7+CUDA9
○ RHEL7+CUDA9+DEVICE-PLUGIN
○ RHEL7+CUDA9+STAC-A2
● CUDA 9
● 8 x NVIDIA Tesla V100 (Volta) GPUs
● HPE Apollo 6500 with XL270d Gen9
● Red Hat Enterprise Linux 7.4
● Kubernetes 1.8 (setup info)
● nvidia-smi --applications-clocks=877,1380 (memory,graphics application clocks in MHz)
● https://rhelblog.redhat.com/2017/11/21/red-hat-and-partners-deliver-new-performance-records-on-prominent-risk-analytics-benchmark/
● https://news.developer.nvidia.com/a-new-stac-a2-record/
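Before the benchmark runs, it helps to confirm that the driver sees all eight GPUs and that application clocks are pinned the same way on each. The Python sketch below parses `nvidia-smi -L` output and prints, but does not run, one clock-setting command per GPU. The regex assumes the usual `GPU <n>: <model> (UUID: GPU-...)` line format, the 877/1380 defaults are taken from the setup above, and actually changing clocks requires root.

```python
# Sketch: check GPU visibility and prepare per-GPU application clock commands.
import re
import shutil
import subprocess

# Matches lines such as: GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-1234abcd-...)
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9A-Fa-f-]+)\)\s*$", re.MULTILINE)


def parse_gpu_list(text):
    """Return (index, model, uuid) tuples parsed from `nvidia-smi -L` output."""
    return [(int(i), model, uuid) for i, model, uuid in GPU_LINE.findall(text)]


def clock_commands(gpus, mem_mhz=877, graphics_mhz=1380):
    """Build one nvidia-smi command per GPU to pin memory,graphics application clocks."""
    return [["nvidia-smi", "-i", str(index), f"--applications-clocks={mem_mhz},{graphics_mhz}"]
            for index, _, _ in gpus]


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        out = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                             text=True, check=True).stdout
        gpus = parse_gpu_list(out)
        print(f"{len(gpus)} GPU(s) visible")
        for cmd in clock_commands(gpus):
            print(" ".join(cmd))  # printed, not executed: changing clocks needs root
    else:
        print("nvidia-smi not found")
```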
Kubernetes Deployment for STAC-A2
Diagram: a single node with eight Volta GPUs, its Kubelet, and the Device Plugin (daemonset); the Kube Scheduler places the Benchmark pod, created with kubectl create, which requests all eight GPUs:
resources:
  limits:
    nvidia.com/gpu: 8
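The diagram reflects two scheduling levels. The Kube Scheduler only counts how many `nvidia.com/gpu` units a node has free, while the Kubelet, working with the device plugin, decides which physical GPUs the pod receives. The Python sketch below models that split under stated assumptions: node selection is first-fit rather than the scheduler's filter-and-score process, the device IDs are made up, and the real second step is the kubelet calling the device plugin's `Allocate` RPC with the chosen IDs.

```python
# Minimal sketch of the two scheduling levels in the STAC-A2 diagram.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

GPU = "nvidia.com/gpu"


@dataclass
class Node:
    name: str
    devices: List[str]  # device IDs advertised by the device plugin
    allocated: Dict[str, List[str]] = field(default_factory=dict)  # pod -> device IDs

    def free_devices(self) -> List[str]:
        used = {d for ids in self.allocated.values() for d in ids}
        return [d for d in self.devices if d not in used]


def schedule(limits: Dict[str, int], nodes: List[Node]) -> Optional[Node]:
    """Level 1 (kube-scheduler): choose a node by counting free GPUs only."""
    wanted = limits.get(GPU, 0)
    for node in nodes:
        if len(node.free_devices()) >= wanted:
            return node
    return None


def allocate(pod: str, limits: Dict[str, int], node: Node) -> List[str]:
    """Level 2 (kubelet + device plugin): bind specific GPUs on the chosen node."""
    wanted = limits.get(GPU, 0)
    free = node.free_devices()
    if len(free) < wanted:
        raise RuntimeError(f"{node.name} has only {len(free)} free GPUs")
    node.allocated[pod] = free[:wanted]
    return node.allocated[pod]


if __name__ == "__main__":
    apollo = Node("apollo-6500", [f"GPU-{i}" for i in range(8)])
    limits = {GPU: 8}  # matches the Benchmark pod's limits
    node = schedule(limits, [apollo])
    print(node.name, allocate("benchmark", limits, node))
    print(schedule({GPU: 1}, [apollo]))  # no node left with a free GPU
```

Keeping the scheduler at the level of counts is what makes placement fast, while deferring device identity to the node keeps the choice consistent with what is physically free there.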
Recent GPU-related work on OpenShift
● Early Kubeflow involvement
● radanalytics templates for ML workflows on OpenShift
● Machine Learning OpenShift Commons
● Demo repositories
○ https://github.com/zvonkok/nvidia-k8s
○ https://github.com/redhat-performance/openshift-psap
THANK YOU
plus.google.com/+RedHat
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHatNews
Commoditizing GPU-as-a-Service Providers with Red Hat OpenShift
Tuesday, Mar 27, 1:00 PM - 1:25 PM, Room 210E
Red Hat OpenShift Container Platform, with Kubernetes at its core, can play an important role in building flexible hybrid cloud infrastructure. By abstracting infrastructure away from developers, it makes workloads portable across any cloud. With NVIDIA Volta GPUs now available in every public cloud [1], as well as from every computer maker, an abstraction layer like OpenShift becomes even more valuable. Through demonstrations, this session will introduce you to declarative models for consuming GPUs via OpenShift, as well as the two-level scheduling decisions that provide fast placement and stability.
NVIDIA GTC 2018: Enabling GPU-as-a-Service Providers with Red Hat OpenShift
