© 2023 Akamai
Power and protect
life online
Crossplane and a story
about scaling Kubernetes
custom resources
Richárd [mhmxs] Kovács
Boring Slide
• At work
○ Staff Kubernetes Engineer
○ @Akamai > Linode
○ Go microservices, Kubernetes controllers
• In the IT space
○ Many years of DevOps and automation
○ Several years of Go, Java … who's counting
○ OSS devotee
○ Known as @mhmxs
A long time ago on a cluster far, far away…
Agenda
• Shallow dive into Kubernetes
• What are custom resources good for?
• What the heck is a Kubernetes operator?
• Crossplane in a nutshell
• Scaling characteristics
• Scaling in action
Shallow dive into
Kubernetes
What is Kubernetes?
Kubernetes is an open-source system for automating deployment, scaling, and
management of containerized* applications.
It is like an operating system that usually spans more than one computer.
● Its regular “processes” are groups of containers.
*Containers are lightweight, portable, and isolated execution environments that encapsulate applications and their dependencies.
K8OS
Similarities
Resource Management
Both allocate and manage resources like CPU, memory, and storage to ensure optimal performance.
Process Isolation
Both systems provide a level of process isolation to prevent conflicts and ensure applications run independently.
Scheduling
Both systems implement scheduling algorithms to efficiently manage the execution of processes or containers.
Security and AuthZ
Both focus on security measures, controlling access to resources, and maintaining the integrity of the system.
Lifecycle Management
Both Kubernetes and traditional operating systems oversee the complete lifecycle of applications.
Declarative
Kubernetes follows a declarative approach, where users specify the desired state of their applications.
The platform then automatically works to ensure the actual state aligns with this defined configuration.
Controllers are control loop processes that continuously work to maintain the desired state of resources by
reconciling the actual state with the specified configuration.
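The control loop described above can be sketched in a few lines. This is a minimal illustration in Python, not a real controller (those are usually written in Go); `Cluster`, `reconcile`, and `control_loop` are hypothetical names, and the "cluster" is just an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for the actual state of a cluster."""
    running: list = field(default_factory=list)


def reconcile(desired_replicas: int, cluster: Cluster) -> str:
    """Move the actual state one step towards the desired state."""
    actual = len(cluster.running)
    if actual < desired_replicas:
        cluster.running.append(f"pod-{actual}")
        return "created"
    if actual > desired_replicas:
        cluster.running.pop()
        return "deleted"
    return "in-sync"


def control_loop(desired_replicas: int, cluster: Cluster, max_steps: int = 100) -> int:
    """Call reconcile until nothing is left to do; return the number of changes."""
    changes = 0
    for _ in range(max_steps):
        if reconcile(desired_replicas, cluster) == "in-sync":
            break
        changes += 1
    return changes
```

The user only ever changes `desired_replicas`; the loop decides whether that means creating or deleting something.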
Architecture
Source: https://platform9.com/blog/kubernetes-enterprise-chapter-2-kubernetes-architecture-concepts/
Average backend service
Managed Resources
Gateway
Service
Deployment
Pod
What are custom
resources good for?
Custom Resources
● Custom Resources (CRs) are extensions of the Kubernetes API.
● A Custom Resource Definition (CRD) describes the structure of a custom resource.
○ Basic validation
○ Validating webhook
● The Kubernetes API Server primarily functions as a data store.
○ It lets users manage their own custom objects (CRUD).
○ It only takes care of the desired state.
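To make the CRD structure and the "basic validation" point concrete, here is a sketch written as a Python dictionary that mirrors the `apiextensions.k8s.io/v1` manifest layout. The `Bucket` kind, the `example.org` group, and the `region` and `sizeGiB` fields are invented for illustration, and `validate_spec` is a toy that covers only a tiny part of what the API server checks.

```python
# Hypothetical CRD for a made-up "Bucket" kind in the made-up group "example.org".
BUCKET_CRD = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "buckets.example.org"},  # <plural>.<group>
    "spec": {
        "group": "example.org",
        "scope": "Namespaced",
        "names": {"plural": "buckets", "singular": "bucket", "kind": "Bucket"},
        "versions": [{
            "name": "v1alpha1",
            "served": True,
            "storage": True,
            "schema": {"openAPIV3Schema": {
                "type": "object",
                "properties": {"spec": {
                    "type": "object",
                    "required": ["region"],
                    "properties": {
                        "region": {"type": "string"},
                        "sizeGiB": {"type": "integer", "minimum": 1},
                    },
                }},
            }},
        }],
    },
}


def validate_spec(spec: dict, schema: dict) -> list:
    """Toy subset of schema validation: required fields, types, minimum."""
    types = {"string": str, "integer": int}
    errors = [f"{name}: required" for name in schema.get("required", []) if name not in spec]
    for name, value in spec.items():
        rule = schema["properties"].get(name)
        if rule is None:
            continue  # unknown fields are simply ignored in this sketch
        if not isinstance(value, types[rule["type"]]):
            errors.append(f"{name}: expected {rule['type']}")
        elif "minimum" in rule and value < rule["minimum"]:
            errors.append(f"{name}: must be >= {rule['minimum']}")
    return errors


SPEC_SCHEMA = BUCKET_CRD["spec"]["versions"][0]["schema"]["openAPIV3Schema"]["properties"]["spec"]
```

Anything a schema like this cannot express, such as cross-field or cross-object rules, is where a validating webhook comes in.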
Custom Resource Definitions
What the heck is a
Kubernetes operator?
Kubernetes Operators
● A group of custom controllers that ensure the desired state of core or custom
resources.
● A Kubernetes Operator is a method of packaging, deploying, and
managing applications on top of Kubernetes.
● It extends the Kubernetes API to create, configure, and manage
instances of complex applications.
● It actively watches resources via the Kubernetes API server for real-time
monitoring of changes and deviations in the system's state.
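One common way to connect "watching" to "reconciling" is a work queue keyed by object name, so that a burst of events for the same object results in a single reconcile. The sketch below assumes that pattern; `WorkQueue` and `handle_events` are hypothetical names, and the events are plain tuples instead of real API server watch events.

```python
from collections import OrderedDict


class WorkQueue:
    """FIFO queue of object keys; a key that is already pending is stored only once."""

    def __init__(self):
        self._pending = OrderedDict()

    def add(self, key: str) -> None:
        self._pending.setdefault(key, None)

    def pop(self) -> str:
        key, _ = self._pending.popitem(last=False)
        return key

    def __len__(self) -> int:
        return len(self._pending)


def handle_events(events, queue: WorkQueue) -> None:
    """Turn watch events into work items.

    The event type is dropped on purpose: the reconciler is expected to read
    the current state of the object instead of trusting the event payload.
    """
    for _event_type, namespace, name in events:
        queue.add(f"{namespace}/{name}")
```

With this shape, the number of reconciles depends on how many objects changed, not on how many events were delivered.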
Crossplane in a nutshell
Crossplane
In a nutshell: Crossplane is an operator framework on steroids.
Some random points:
● Operator for external resources.
● Uses Kubernetes as its backend.
● Provides CRs as the API for everything.
● Able to bundle complex applications into one package.
● Combines multiple CRDs into one.
● Advanced RBAC support by design.
● Examples: Kubernetes, Helm, Linode (beta), Terraform, Argo CD, CAPI, …
○ Marketplace: https://marketplace.upbound.io/
Architecture
https://github.com/linode/provider-ceph
Scaling characteristics
Crossplane
It is a plain old controller
Kubernetes
No more than 150,000 total pods
No more than 300,000 total containers
No more than 5,000 nodes
No more than 110 pods per node
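These four limits interact. At 5,000 nodes the per-node limit would allow 5,000 × 110 = 550,000 pods, far above the 150,000 total, so at full node count the average is 30 pods per node. A small sketch of that arithmetic, with `within_limits` as a hypothetical helper:

```python
MAX_PODS = 150_000
MAX_CONTAINERS = 300_000
MAX_NODES = 5_000
MAX_PODS_PER_NODE = 110


def within_limits(nodes: int, pods: int, containers: int) -> bool:
    """Return True if a planned cluster size respects all four limits.

    The last check is only a necessary condition: the pods must at least fit
    when spread evenly under the per-node limit.
    """
    return (
        nodes <= MAX_NODES
        and pods <= MAX_PODS
        and containers <= MAX_CONTAINERS
        and pods <= nodes * MAX_PODS_PER_NODE
    )
```

Note that none of these limits mention custom resources, which is the gap the rest of the talk explores.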
Dimensions
Number of CRDs
Each Custom Resource Definition (CRD) contributes to the overall complexity of the API server, potentially impacting its performance and responsiveness.
Expanding the variety of CRDs often entails the creation of more controllers, adding to the management overhead within the Kubernetes cluster.
API discovery becomes pivotal for managing the growing complexity and ensuring streamlined access to extended functionality.
Number of CRs
As the quantity of custom resources grows, resource utilization has to be considered: storage, network overhead, memory usage, and processing power.
The larger the number of controller instances, the more significant the impact on the cluster's operational aspects, including scheduling, monitoring, and event handling.
Throughput
Horizontal
Number of CRDs
● Problems tend to start once you have more than roughly 500 CRDs.
● Some Crossplane providers ship 800+ CRDs.
○ Crossplane has a new feature for partial deployment of CRDs.
● Take care with client-side rate limits, most importantly burst, to avoid overloading the Kubernetes API server.
● Some clients (kubectl, for example) were designed on the assumption that no more than 50-100 CRDs are installed on a cluster.
● Use Kubernetes v1.26.0+ [https://blog.upbound.io/scaling-kubernetes-to-thousands-of-crds]
● Expect longer startups and heavy CPU and memory peaks on the Kubernetes API server.
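Client-side rate limiting with a sustained rate and a burst is commonly modelled as a token bucket. The sketch below is a generic illustration of that model, not the code of any particular client; the `TokenBucket` class and the numbers in the usage note are illustrative assumptions. In Go, client-go exposes the two knobs as the `QPS` and `Burst` fields of `rest.Config`.

```python
class TokenBucket:
    """Client-side rate limiter with a sustained rate (qps) and a burst size."""

    def __init__(self, qps: float, burst: int, now: float = 0.0):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start full, so a burst is allowed right away
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True if a request may be sent at time `now` (in seconds)."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `TokenBucket(qps=5, burst=10)`, ten requests pass immediately and the rest wait for refills. Burst matters for CRD-heavy clusters because discovery tends to fire many requests at once.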
Vertical
Number of CRs
etcd data store
● Storage speed is critical.
● How to go further:
○ Kine, an etcd shim that provides a flexible storage backend for Kubernetes.
■ Only a few databases are supported yet …
■ Implement your own database integration.
■ Implement routing logic to distribute load.
● etcd cluster per namespace
● etcd cluster per resource
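The routing idea could look roughly like this: a dedicated datastore for selected resources, and a hash of the namespace for everything else. All endpoint names and the `pick_backend` function are hypothetical; the talk does not show its routing code.

```python
import hashlib

# Hypothetical backend endpoints; none of these names come from the talk.
PER_RESOURCE = {"buckets.example.org": "etcd-buckets:2379"}
SHARDS = ["etcd-0:2379", "etcd-1:2379", "etcd-2:2379"]


def pick_backend(resource: str, namespace: str) -> str:
    """Route to a dedicated cluster per resource, otherwise shard by namespace."""
    if resource in PER_RESOURCE:
        return PER_RESOURCE[resource]
    # A stable hash keeps every namespace on the same shard across restarts.
    digest = hashlib.sha256(namespace.encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```

A plain modulo is the simplest choice, but changing the number of shards then moves most namespaces, so data has to be migrated or a more stable scheme used.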
Operator side
● Memory is likely to be the main bottleneck.
● Configure rate limits, bursts and timeouts of clients.
● Retry failed actions to reduce CPU load.
● Find and set the best cache sync periods.
● Filter watched resources by label (for example, to pause reconciliation).
● Expect longer startups.
● Forget leader-elected operators.
● Tune reconciliation concurrency.
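Two of these points can be sketched briefly: skipping paused objects by label, and retrying failures with a growing delay instead of in a tight loop. Reading "retry" as exponential backoff is an assumption here, and the label key, function names, and default values are all hypothetical.

```python
PAUSE_LABEL = "example.org/paused"  # hypothetical label key


def should_reconcile(labels: dict) -> bool:
    """Skip objects that carry the pause label with the value "true"."""
    return labels.get(PAUSE_LABEL) != "true"


def backoff_delay(failures: int, base: float = 0.5, cap: float = 300.0) -> float:
    """Seconds to wait before the next retry; doubles per failure, up to `cap`."""
    return min(cap, base * (2 ** max(failures - 1, 0)))
```

Against a real API server the label filter is better applied as a label selector on the list and watch requests, so that paused objects are never sent to the operator and never occupy its cache.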
Kubernetes API server
● Network bandwidth
○ Number of active connections: watchers are everywhere.
● CPU and memory: infinite, if possible.
● Only a few configuration options available.
○ watch-cache [on/off]
○ event-ttl [1h]
○ endpoint-reconciler-type [lease]
○ enable-aggregator-routing [EP IP]
○ http2-max-streams-per-connection
○ max-[mutating-]requests-inflight
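For reference, the options above are ordinary kube-apiserver command-line flags. The helper below only renders a settings mapping as arguments. The flag names are the ones listed; `1h` and `lease` are taken from the list, while the numeric values are placeholders, not recommendations.

```python
def apiserver_args(settings: dict) -> list:
    """Render a settings mapping as `--flag=value` command-line arguments."""
    args = []
    for flag, value in sorted(settings.items()):
        if isinstance(value, bool):
            value = "true" if value else "false"
        args.append(f"--{flag}={value}")
    return args


# Flag names are the ones listed above; the numeric values are placeholders.
EXAMPLE = {
    "watch-cache": True,
    "event-ttl": "1h",
    "endpoint-reconciler-type": "lease",
    "enable-aggregator-routing": True,
    "http2-max-streams-per-connection": 1000,
    "max-requests-inflight": 800,
    "max-mutating-requests-inflight": 400,
}
```

Good values depend on the cluster, so they need to be measured, not copied.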
● How to go further:
○ Bring your own cluster.
○ Scale vertically; there is only one leader.
○ Run a proxy in front of the API server.
○ Delegate load via the aggregation API.
Scaling in action
Overall memory footprint
The main reason for watch cache
is to speed up list operations …
Summary
● The Kubernetes API server leader is a single point of failure (SPOF).
● The Kubernetes API server doesn’t scale well.
● Object reconciliation pause logic is mandatory.
● Millions of custom resources need ~infinite resources.
● Some clients may not tolerate huge datasets or large timeouts.
● It is pretty easy to kill the Kubernetes API server with unfiltered list operations.
● Middle tier performance
○ requires lots of care and understanding.
○ needs complex architecture.
○ is impossible without custom components and business logic.
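On the unfiltered list point: the Kubernetes list API accepts `limit` and `continue` parameters, so a client can page through a large collection instead of requesting it in one response. The sketch below imitates that protocol against an in-memory list. `list_page` and `list_all` are hypothetical helpers, and the integer token stands in for the opaque continue token a real server returns.

```python
def list_page(store: list, limit: int, continue_token: int = 0):
    """Hypothetical server call: return up to `limit` items and the next token (or None)."""
    page = store[continue_token:continue_token + limit]
    next_token = continue_token + limit
    return page, (next_token if next_token < len(store) else None)


def list_all(store: list, limit: int = 500):
    """Yield every item page by page, so no single response holds the full dataset."""
    token = 0
    while token is not None:
        page, token = list_page(store, limit, token)
        yield from page
```

Paging bounds the size of each response; combining it with label or field selectors also reduces how much the server has to return in total.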