Cost Optimization in Kubernetes:
Unlocking the Power of Machine Learning
DISCLAIMER: The information presented here is for informational
purposes only and should not be construed as a product commitment
or roadmap statement.
Alex Glikson
Chief Architect, Spot by NetApp
DevOps Days TLV, October 2024
© 2024 NetApp, Inc. All rights reserved.
Outline
• Cost Reduction vs Cost Optimization
• Cost Optimization in Kubernetes: Techniques and Trade-Offs
• Machine Learning: Uncertainty and the Power of Prediction
Cost Reduction vs Cost Optimization
Cost Optimization in Kubernetes
Technique: Challenges & Trade-Offs (performance, availability)
1. Cost visibility
• Fine-grained resource & cost monitoring (per-resource, per-workload)
2. Use spot instances
• Risk of application downtime due to node interruptions
3. Leverage RIs/SPs (Reserved Instances / Savings Plans)
• Balance between Spot instances and RIs/SPs (incl. renewals, etc.)
4. Cluster auto-scaling
• Adding/removing nodes may affect application performance/availability
• Distinguish between environments with different “SLA” (e.g., dev/test)
5. Node right-sizing
• Changes over time are likely to cause resource fragmentation
6. Node over-commitment
• Risk of throttling (and possibly application downtime) at peak demand
7. Application auto-scaling
• Adding/removing Pods may affect application performance/availability
• Risks due to uneven load balancing, non-CPU/memory bottlenecks
8. Application right-sizing
• Changing resource ‘requests’ still requires a restart (as of Oct’24)
• Limits set too low can disrupt the application (e.g., CPU throttling, OOM kills for memory)
9. Network-aware placement
• Balance between application availability and minimizing cross-AZ traffic
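As a rough illustration of the application right-sizing trade-off (item 8), the sketch below derives a CPU request from observed usage samples and reports how often demand would have exceeded it. The function names, the nearest-rank percentile, and the 15% headroom are assumptions made for this example, not a method described in the talk.

```python
import math


def recommend_request(usage_samples, percentile=95.0, headroom=0.15):
    """Suggest a resource request: a usage percentile plus relative headroom.

    Hypothetical helper for illustration only.
    """
    if not usage_samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_samples)
    # Nearest-rank percentile: the smallest sample such that at least
    # `percentile` percent of all samples are at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100.0))
    return ordered[rank - 1] * (1.0 + headroom)


def exceeded_fraction(usage_samples, limit):
    """Fraction of samples in which demand was above `limit`."""
    if not usage_samples:
        raise ValueError("need at least one usage sample")
    return sum(1 for u in usage_samples if u > limit) / len(usage_samples)


if __name__ == "__main__":
    # CPU usage in millicores: mostly idle with one short burst.
    samples = [100] * 19 + [400]
    for pct in (95.0, 100.0):
        request = recommend_request(samples, percentile=pct)
        risk = exceeded_fraction(samples, request)
        print(f"p{pct:g}: request={request:.0f}m, exceeded in {risk:.0%} of samples")
```

A lower percentile reserves less capacity but leaves more samples above the request, which is the cost versus performance tension the table describes.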
Prediction and Machine Learning
• Spot market behavior
  • Cost, interruptions
• Application behavior
  • Resource usage patterns, load patterns
  • Correlations, seasonality, changes, etc.
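To make the seasonality point concrete, here is a minimal baseline sketch that forecasts resource usage by averaging past values observed at the same phase of a repeating cycle. It is a simple statistical baseline rather than a machine learning model, and the function name and the fixed `season_length` are assumptions for illustration; the talk does not specify which prediction methods are used.

```python
def seasonal_mean_forecast(history, season_length, horizon):
    """Forecast `horizon` future points from per-phase averages.

    Hypothetical helper: each future point is the mean of all past
    values that fall on the same position within the season.
    """
    if season_length <= 0 or len(history) < season_length:
        raise ValueError("need at least one full season of history")
    forecast = []
    for step in range(horizon):
        # Position of the future point within the repeating cycle.
        phase = (len(history) + step) % season_length
        same_phase = history[phase::season_length]
        forecast.append(sum(same_phase) / len(same_phase))
    return forecast


if __name__ == "__main__":
    # Two observed cycles of three intervals each (e.g., CPU cores in use).
    usage = [10, 50, 10, 12, 54, 14]
    print(seasonal_mean_forecast(usage, season_length=3, horizon=3))
```

A baseline like this cannot react to trend changes or one-off events, which is where the correlations and changes listed above would need a richer model.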
Many of the optimization challenges are hard because of uncertainty. Can we predict the future?
“Prediction is very difficult, especially about the future.” -- Niels Bohr
Alex Glikson, Chief Architect, Spot by NetApp
alex.glikson@netapp.com
Booth G14
Thank You!
© 2024 NetApp, Inc. All rights reserved. NETAPP CONFIDENTIAL

