5 NEW HIGH-PERFORMANCE
FEATURES IN RED HAT OPENSHIFT
Patterns and technology to run critical, high
performance line-of-business applications on Red
Hat OpenShift Container Platform
Derek Carr, Jeremy Eder
Red Hat Product Engineering
May 2018
A SHORT STORY
I WANT TO RUN ON
OPENSHIFT
BUT MY WORKLOAD IS
SPECIAL
IT IS LATENCY SENSITIVE
IT HAS A HUGE CACHE
IT REQUIRES HOST
LEVEL TUNING
IT NEEDS A SPECIAL
DEVICE
IT’S A SNOWFLAKE
ACTUALLY, I HAVE A
LOT OF WORKLOADS
LIKE THIS!
Red Hat Summit 2018 5 New High Performance Features in OpenShift
WHAT HAS CHANGED
COMMUNITY
sig-node, sig-scheduling, wg-resource-mgmt
● Expand the set of workloads runnable on the platform
● Maintain reliability
● Keep it simple
WORKLOAD TYPES
Going beyond generic web hosting workloads
Big Data, NFV, FSI, Animation, ISVs, HPC, Machine Learning
● Identify requirement overlap across verticals
● Plumb enhancements generically
● Allow flexibility
DRILL DOWN ON OVERLAP
| Feature                             | FSI | NFV | ISV | BD/ML | ANIM  | HPC   |
|-------------------------------------|-----|-----|-----|-------|-------|-------|
| CPU pinning (cpuset)                | Yes | Yes | Yes | Maybe | Maybe | Yes   |
| Device passthrough (GPU, NIC, etc.) | Yes | Yes | Yes | Yes   | Maybe | Yes   |
| Sysctl support                      | Yes | Yes | Yes | Yes   | Yes   | Yes   |
| HugePages                           | Yes | Yes | Yes | Yes   | Maybe | Maybe |
| NUMA                                | Yes | Yes | Yes | Maybe | Maybe | Yes   |
| Separate control and data plane     | Yes | Yes | Yes | Yes   | Yes   | Yes   |
| Kernel module loading               | Yes | Yes | Yes | Maybe | Yes   | Maybe |
PROGRESS REPORT
What has been done in the last year?
● CPU manager (static pinning)
● HugePages
● Device Plugins (GPU, etc.)
● Sysctl support
● Extended Resources
RESOURCE MANAGEMENT: PRIMER
RESOURCES AND TUNING OPTIONS
Natively understood
● CPU
● Memory
● Ephemeral storage
● Persistent storage
● HugePages
● Device Plugins
● Extended Resources
Control knobs available
● CPU Manager policy (none, static)
● sysctls (safe / unsafe)
RESOURCE REQUIREMENTS
Describes the compute resources needed by a pod
Limits
● Maximum burst (if available)
Requests
● Minimum amount
(guaranteed)
Overcommit
● Ratio of limit to request
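Requests and limits are declared per container in the pod spec. A minimal sketch (container name, image, and values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
    resources:
      requests:          # minimum, used by the scheduler
        cpu: "500m"
        memory: "256Mi"
      limits:            # maximum burst, enforced at runtime
        cpu: "1"
        memory: "512Mi"
```

With requests below limits, the CPU overcommit ratio (limit / request) here is 2.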
QUALITY OF SERVICE
Keep the end-user API simple, let the platform optimize for SLA guarantees
Burstable
● Requests < Limits
Best Effort
● No requests or limits
Guaranteed
● Requests = Limits
Beware
● Workload SLA
● Eviction
Future
● QoS is an abstraction that lets kubelets support different tuning
options for particular resource types in the future while keeping the
API simple
CLUSTER TOPOLOGY
Control Plane: three masters, each co-located with etcd
Infrastructure: load balancer (LB) fronting three registry-and-router nodes
Compute Nodes and Storage Tier
NODE BOOTSTRAPPING
Compute nodes fetch configuration from the server via per-group Config Maps
(node-compute, node-cpu-bound, node-master, node-highmem), each carrying a
node-config.yaml:
● Default kubelet arguments
● Default labels
● Default taints
● Changes are kept in sync
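Each node group's Config Map embeds a node-config.yaml. A minimal sketch for a hypothetical cpu-bound group, following the OpenShift 3.x kubeletArguments format (the group name, label, and reserved-CPU value are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-cpu-bound
  namespace: openshift-node
data:
  node-config.yaml: |
    kubeletArguments:
      cpu-manager-policy:
      - "static"
      kube-reserved:          # the static policy needs CPU reserved for the system
      - "cpu=500m"
      node-labels:
      - "workload=cpu-bound"
```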
FEATURE 1: CPU MANAGER
CPU
How is it exposed?
● Compressible
● Measured in cores
● Not normalized for clock speed
● Use node labels to differentiate
● Assigned per container
CPU
How is it enforced?
Completely Fair Scheduler (CFS)
● Requests via cpu.shares
● Limits via cpu.cfs_quota_us
Result
● Distributed across all cores
● Throttling
Challenge
● CPU-bound workloads (cache affinity and scheduling latency) are impacted
CPU MANAGEMENT POLICY
Tuning the node for cpu-bound workloads
Supported policies
● none is the default policy (just integrates with CFS)
● static grants containers in Guaranteed pods with integer CPU requests
exclusive CPUs on the node, enforced via the cpuset cgroup controller
Benefits
● End-user API is simple (kubelet option)
● Increases CPU affinity and decreases context switches
● Kubelet manages local node topology (important when doing devices)
● More dynamic policies could be introduced in the future
DEMO 1 - CPU Pinning
Enable cpu pinning via dynamic node config: Demo
● Inspect node config map, see kubeletArguments for --cpu-manager-policy=static
● Inspect cpuset.cpus of pod containers assigned either shared or exclusive cores
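With the static policy enabled, a Guaranteed pod with an integer CPU request receives exclusive cores. A minimal sketch (pod name, image, and sizes are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned
spec:
  containers:
  - name: worker
    image: registry.example.com/worker:latest
    resources:
      requests:
        cpu: "2"          # integer CPU count -> eligible for pinning
        memory: "1Gi"
      limits:
        cpu: "2"          # equal to requests -> Guaranteed QoS
        memory: "1Gi"
```

The assigned cores can be checked from the container's cpuset.cpus file under the cpuset cgroup hierarchy.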
FEATURE 2: HUGE PAGES
HUGE PAGES
Supports the allocation and consumption of pre-allocated huge pages
Scenario
● Large memory working set sensitive to TLB misses
(RDBMS, JVM, cache, packet processors)
HUGE PAGES
Example Pod
Usage
● Pod request
● Node must pre-allocate
● EmptyDir (medium: HugePages)
● shmget w/ SHM_HUGETLB
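An example pod combining a hugepages resource request with a HugePages-backed EmptyDir (pod name, image, and sizes are illustrative; the node must have 2Mi pages pre-allocated):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-example
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
    resources:
      limits:
        hugepages-2Mi: "100Mi"
        memory: "256Mi"     # regular memory must still be specified
    volumeMounts:
    - name: hugepage
      mountPath: /hugepages
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
```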
DEMO 2 - Pod that requires huge pages
Dynamically pre-allocate huge pages and schedule a pod: Demo
● Deploy DaemonSet to pre-allocate huge pages
● Inspect node allocatable
● Deploy a pod that consumes huge pages
FEATURES 3, 4:
EXTENDED RESOURCES
and
DEVICE PLUGINS
DEMO 3 - Extended Resources
Counting dongles: Demo
● Implementation detail
○ For device plugins
● Industry leading UX!
○ (PATCH via curl)
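Advertising an extended resource is a JSON-Patch against the node status, which can be issued with curl through a `kubectl proxy` (the resource name and count are illustrative; `<node-name>` is a placeholder):

```shell
# In another terminal: kubectl proxy
# "~1" encodes "/" in a JSON-Pointer path, so
# example.com~1dongle means example.com/dongle.
curl --header "Content-Type: application/json-patch+json" \
     --request PATCH \
     --data '[{"op": "add", "path": "/status/capacity/example.com~1dongle", "value": "4"}]' \
     http://localhost:8001/api/v1/nodes/<node-name>/status
```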
DEVICE PLUGINS
gRPC service to expose devices to kubelet
Initialization
● Is the device healthy?
Registration
● Register with kubelet
Serving mode
● Monitor device health
● Allocate device
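The registration and serving phases map onto a pair of gRPC services. A trimmed sketch of the v1beta1 device plugin API (message fields and optional RPCs omitted):

```protobuf
// Plugin calls Register on the kubelet's socket to announce itself.
service Registration {
  rpc Register(RegisterRequest) returns (Empty) {}
}

// Kubelet then talks to the plugin over these RPCs.
service DevicePlugin {
  // Stream the device list and health updates to the kubelet.
  rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
  // Called during container creation to assign specific devices.
  rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
}
```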
DEMO 4, 5 - GPUs
Consume a GPU in OpenShift: Infrastructure Demo, Multi-GPU Jupyter/Caffe Demo
● Deploy nvidia-device-plugin DaemonSet
● Inspect node allocatable
● Deploy a pod that consumes a GPU
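Once the plugin DaemonSet is running, a pod consumes a GPU through the nvidia.com/gpu extended resource. A minimal sketch (pod name and image tag are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:9.0-base    # image tag illustrative
    resources:
      limits:
        nvidia.com/gpu: 1   # advertised by the nvidia-device-plugin
```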
FEATURE 5: SYSCTL
SYSCTL
The three types
Unsafe
● Experimental Kubelet Flag
● kernel.sem*
● kernel.shm*
● kernel.msg*
● fs.mqueue.*
● net.*
Safe
● Enabled by default
● kernel.shm_rmid_forced
● net.ipv4.ip_local_port_range
● net.ipv4.tcp_syncookies
Node-level
● Can’t set from a pod
● Potentially affects other pods
● Many interesting sysctls
● Use TuneD
Red Hat is working to graduate this feature to Beta in the Kubernetes 1.11 release
● KEP: Promote sysctl annotations to fields
● Feedback welcome!
DEMO 6 - SYSCTL
Usage in a pod: Demo
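Under the KEP, sysctls move from the security.alpha.kubernetes.io/sysctls annotation to a field in the pod's securityContext. A minimal sketch of the field form (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
spec:
  securityContext:
    sysctls:
    - name: kernel.shm_rmid_forced   # a safe sysctl, enabled by default
      value: "1"
  containers:
  - name: app
    image: registry.example.com/app:latest
```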
WHAT’S NEXT
ROADMAP
Red Hat continues to invest in evolving support
Topic areas
● NUMA
● Co-located device scheduling
● External device monitoring
● Resource API V2
WORKLOAD COVERAGE: Metal, KVM, Containers
Code Path Coverage
● CPU – linpack, lmbench
● Memory – lmbench, McCalpin STREAM
● Disk IO – iozone, fio
● Networks – netperf – 10/40Gbit, Infiniband/RoCE, Bypass
Application Performance
● Linpack MPI, HPC workloads
● Database: DB2, Oracle 11/12, Sybase 15.x, MySQL, MariaDB, Postgres, MongoDB
● OLTP – TPC-C, TPC-VMS
● DSS – TPC-H/xDS
● Big Data – TPCx-HS, Bigbench
● SPEC cpu, jbb, sfs, virt, cloud
● SAP – SLCS, SD
● STAC – FSI (STAC-N, A2)
● SAS mixed Analytic, SAS grid (gfs2)
QUESTIONS?
plus.google.com/+RedHat
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHat
THANK YOU
