Scheduling a Kubernetes Federation
with Admiralty
OSG All-Hands Meeting 2020 - USCMS-USATLAS Session
September 2020
PRP, Nautilus and Kubernetes
• The Pacific Research Platform (PRP) has been using Kubernetes
since 2016
• Started as a way to conveniently schedule
network test services
• Evolved into a convenient
platform for ML research
• OSG has had a CE gathering
opportunistic cycles
for over a year now
• As well as orchestrating some
of its services
e.g. StashCache and Frontends
[Map: PRP's Nautilus Hypercluster, connected by PRP's use of the CENIC 100G network and its national and international partner networks.
32-Location Nautilus Cluster: 6918 CPU Cores on 187 Hosts, 2.1 PB Storage, 550 GPUs.
Sites shown include Caltech, Calit2/UCI, UCSD, SDSC @ UCSD, UCR, USC, UCLA, Stanford U, UCSB, UCSC, UCM, SDSU, CSUSB (a Minority Serving Institution), NPS and USD, with FIONA nodes on 10G-100G links and local storage ranging from 3TB to 2PB.]
Why federation?
• PRP/Nautilus has been steadily growing
• It now also has nodes in Asia, Europe and Australia
• While successful, we understand that not everyone will want to join the club
• Separate administration domains
• We even have the use case at UCSD
• PRP Nautilus and SDSC Expanse will operate separately,
but will work together through federation
• Multiple platforms
• PRP has an IoT component, where ARM CPUs rule
• Having a dedicated ARM k3s cluster and federating with it ended up being simpler
Driving principles
• We wanted a “native Kubernetes” solution
• I.e. kubectl should be all that the user needs
• We did not want a centralized solution
• All participating Kubernetes clusters should be on an equal playing field
• Each Kubernetes cluster should be able to participate
in any number of federations
• We did not want to do any development ourselves
• Helping with testing OK
• Occasional patch OK
• But no long-term maintenance
Admiralty’s Multicluster-Scheduler
https://admiralty.io
Admiralty’s Multicluster-Scheduler
Admiralty on Nautilus
• Currently running 0.10.0-rc1
• Have been federating with
• ARM-based k3s
• PacificWave Kubernetes cluster
• Google Cloud Kubernetes cluster
• Kubernetes Cluster inside Azure
• Getting ready to federate with
• Expanse’s Kubernetes partition
• A Windows-based Kubernetes cluster
Installing Admiralty
• Pretty well documented on GitHub:
https://github.com/admiraltyio/multicluster-scheduler/tree/v0.10.0-rc.1
• Source and target clusters both need Admiralty installed
helm install cert-manager …
helm install multicluster-scheduler admiralty/multicluster-scheduler …
• Create secret in target cluster and propagate to source cluster
(target) kubemcsa export -n klum c1 --as c2 > s.yaml
(source) kubectl -n admiralty apply -f s.yaml
• Whitelist the target cluster in the source cluster (helm upgrade …)
• You are pretty much good to go!
• Pods in source cluster just need to add an annotation
metadata:
  annotations:
    multicluster.admiralty.io/elect: ""
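For concreteness, a minimal sketch of an opted-in pod, applied from the source cluster; the pod name, namespace and image are placeholders, and only the annotation is Admiralty-specific:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: federation-hello        # hypothetical name
  namespace: default
  annotations:
    multicluster.admiralty.io/elect: ""
spec:
  restartPolicy: Never
  containers:
  - name: main
    image: busybox
    command: ["sh", "-c", "echo hello from the federation && sleep 30"]
EOF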
Installing Admiralty
• Admiralty creates a set of new resource types
• Target clusters can be seen as virtual nodes
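One way to see this from the source cluster (a sketch; the actual virtual-node names depend on the Admiralty version and on how the targets were named):

# Each whitelisted target shows up as an extra (virtual) node
kubectl get nodes

# The virtual node advertises the target's aggregate capacity, labels and taints
kubectl describe node <virtual-node-name>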
Installing Admiralty
• We have been mostly using one-way federation
• Nautilus as source, others as targets
• Nautilus can easily be the target, too
• Admiralty allows for arbitrary mesh
• Federation with SDSC Expanse is expected to be both ways
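As a sketch, a two-way federation is just the one-way recipe applied in both directions: a secret exported from each side, imported on the other, and each peer whitelisted as a target (the namespace and c1/c2 names below reuse the earlier illustrative ones; the reverse direction is an assumption, not tested configuration):

# Nautilus as source (as before)
(target)   kubemcsa export -n klum c1 --as c2 > to-nautilus.yaml
(nautilus) kubectl -n admiralty apply -f to-nautilus.yaml

# Nautilus as target (the reverse direction)
(nautilus) kubemcsa export -n klum c2 --as c1 > to-peer.yaml
(target)   kubectl -n admiralty apply -f to-peer.yaml

# Finally, whitelist the peer on each side (helm upgrade … on both clusters)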
Scheduling to target clusters
• Admiralty’s Multicluster-Scheduler is a real Kubernetes scheduler
• Users do not get to explicitly pick the target cluster
• Offload happens based on standard requirements and preferences
• Users just have to opt-in
• When there are nodes in multiple possible clusters that match
• Admiralty will consider only clusters that have free matching nodes
• Which target cluster will be picked is (mostly) non-deterministic
• If no target clusters have any available matching nodes,
the pod remains pending in the source cluster (only)
• Priorities and preemption work as you would expect them to
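As an example of the opt-in plus standard-requirements model, a GPU request narrows the candidate set to clusters that actually have a free GPU. This is a hedged sketch; the pod name and image tag are placeholders, and the resource name is the standard nvidia.com/gpu one, nothing Admiralty-specific:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-federation-test     # hypothetical name
  annotations:
    multicluster.admiralty.io/elect: ""
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base   # placeholder image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1          # only clusters with a free GPU remain candidates
EOF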
Scheduling to target clusters
Under the hood, Admiralty uses the standard k8s filtering and scoring mechanisms:
https://kubernetes.io/docs/concepts/scheduling-eviction/kube-scheduler/#kube-scheduler-implementation
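That also means soft preferences behave as usual: a preferred node-affinity term (the label key and value below are illustrative) only biases scoring toward clusters whose nodes carry it, without excluding the rest:

spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/region   # illustrative label
            operator: In
            values: ["us-west"]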
Scheduling to target clusters
Other features
• Admiralty has several other features we have not explored yet
• Three potentially interesting options:
• Multi-cluster services, using
load-balancing across a Cilium cluster mesh
• Identity federation (instead of shared secrets)
• Federation with Targets lacking a public IP (reversed connectivity)
Conclusion
• Admiralty has been in use in the PRP k8s cluster/Nautilus
for some time now
• Works as advertised for our main use cases
• We are planning to use it to expand to more clusters in the future
Acknowledgments
• This work was partially funded by the
US National Science Foundation (NSF)
under grants OAC-1826967, OAC-1541349,
MPS-1148698 and OAC-1841530.