3 Pitfalls Everyone
Should Avoid with
Cloud Data
Eric D. Schabell
Portfolio Architect Technical Director
@ericschabell
Roel Hodzelmans
Manager, Specialist Solution Architecture
@roelhodzelmans
OpenShift Commons Dublin 2022 - 3 Pitfalls Everyone Should Avoid with Cloud Data
Cloud & data
Does your company know their bandwidth
and storage needs?
Question #1
Something to think about...
De Persgroep
(Based on an audience question from the ‘Power to Innovate’ talk)
“Why don’t you go 100% to cloud for hosting your news group assets?” The
public cloud providers’ quoted prices for hosting were OK, but the bandwidth
quotes were off the charts.
Shortly thereafter, during the Paris attacks (Charlie Hebdo), the French
people crashed all local news sites. The Walloon (FR Belgium) sites hosted by
De Persgroep received an extra 1.2 million unique visitors (810K BE, 450K
NL).
What would that have meant to your bandwidth costs in the cloud?
Observability is data
Will your organization have observability
across their cloud landscape?
Question #2
Something to think about...
“It’s remarkable how common this
situation is, where an organization is
paying more for their observability
data, than they do for their
production infrastructure.”
-- The Growth of Observability Data is out of Control
Data complexity
Experiment:
- Hello World app on a 4-node
Kubernetes cluster with Tracing,
End User Metrics (EUM), Logs,
Metrics (containers / nodes)
- 30 days == +450 GB
FinOps the crucial Ops
Question #3
Do you know (who observes) the cost of
observability metrics data?
Something to think about...
Dedicated FinOps
“By 2023, 80% of organizations using
cloud services will establish a dedicated
FinOps function to automate
policy-driven observability and optimization of
cloud resources to maximize value.”
-- Source: IDC 2022
Bonus Question #4
Does your organization have a baseline of
your cloud landscape?
Something to think about...
1. Determine your goals of migrating data to the cloud
2. Assess your current situation
3. Select the right cloud migration partner
4. Create your business case for the cloud
5. Select the type of cloud environment needed – public, private, hybrid or
hybrid-multi?
6. Determine the specific cloud components necessary
7. Choose the right cloud provider
8. Plan the cloud approach
9. Execute the migration
10. Observability of the production environment
Don’t forget the baseline!
commons.openshift.org
youtube.com/OpenShift
facebook.com/openshift
twitter.com/openshift
Commons builds connections and collaboration
across OpenShift communities, projects and
stakeholders.
In doing so we'll enable the success of customers,
users, partners, and contributors as we deepen our
knowledge and experiences together.
Thank you


Editor's Notes

  • #2: The daily hype is all around you. From cloud, multicloud, to hybrid cloud, this is the path to your digital future. These choices you’ve got to make don’t preclude the daily work of enhancing your customer's experience and agile delivery of those applications. With all this delivery and infrastructure, there is a lot of data to consider when engaging with any cloud experience. Regulatory and compliance pressures force us to store audit and observability data. Understanding the pitfalls around the collection, storage, and maintenance of our data can mean the difference between bankruptcy and success with our cloud strategy. Let us take you on a journey, looking closely at the decisions being made for delivering and dealing with monitoring those applications. Join us for an hour of power, where real customer experiences are used to highlight the three top lessons learned as they transitioned their data needs into cloud environments.
  • #3: 3 pitfalls you should understand when dealing with customers and prospects looking for your strategic insights into cloud data: (1) Cloud and data, (2) Observability is data, (3) FinOps, the crucial Ops. The rest of this session will cover these three pitfalls after first setting the stage with definitions and positioning for cloud and cloud data.
  • #4: It started with this session back in 2018. Five attendees insisted that we share this talk with their CIO/CTO onsite, resulting in a two-week tour through the Midwest in the US. This led to a book in the Dummies series and several articles online. But this topic included one key element that kept gnawing at us… data. Since 2018, data has quietly become a huge issue in organizations making use of the cloud in any form and at any size. It’s not what you think… storage is easy to solve or accept in the pricing picture; it’s more about fully understanding what data in the cloud can mean for you. This talk evolved from the roots of hybrid multicloud and explores the modern-day pitfalls solely based on data in the cloud… all kinds of data in the cloud. (The top-rated Red Hat Summit 2018 session recording (https://youtu.be/eACHhV_uxTE) is available, as is a free online ebook download (https://www.redhat.com/en/engage/multicloud-portability-dummies-s-201903060959?sc_cid=701f20000012pHcAAI).)
  • #5: The first pitfall is understanding that cloud providers make their money on the transportation of data… water in the pipeline == $$$. This means we need to rethink our architecture and usage of the data pipeline and it’s not just about storage.
  • #6: Asking the audience to give their accounts, customers and personal experiences as feedback.
  • #7: Example of a company in BE that decided not to go 100% into the cloud and instead hosts its own sites in data centers, because under the cloud pricing models the story here would have meant bankruptcy. http://powertoinnovate.nl/presentaties-powertoinnovate/customer-case-de-persgroep.pdf
  • #8: From the presentation, the numbers that showcase extra load running on BE sites, with cloud pricing it would have meant bankruptcy. Note, nothing is running in containers…. wow. http://powertoinnovate.nl/presentaties-powertoinnovate/customer-case-de-persgroep.pdf
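A rough way to approach the question on the slide is to estimate the egress bill for a traffic spike. The sketch below is only an illustration: the 1.2 million extra unique visitors come from the talk, while the pages per visitor, the page weight, and the price per GB are hypothetical placeholders, not figures from De Persgroep or from any provider's price list.

```python
def egress_cost_usd(visitors, pages_per_visitor, page_weight_mb, price_per_gb_usd):
    """Estimate the egress cost of a traffic spike (decimal units: 1 GB = 1000 MB)."""
    transferred_gb = visitors * pages_per_visitor * page_weight_mb / 1000
    return transferred_gb * price_per_gb_usd


extra_visitors = 1_200_000  # from the talk

# Everything below is an assumed placeholder; replace with your own measurements and quotes.
cost = egress_cost_usd(
    extra_visitors,
    pages_per_visitor=5,     # hypothetical
    page_weight_mb=3.0,      # hypothetical
    price_per_gb_usd=0.09,   # hypothetical
)
print(f"Estimated egress cost for the spike: ${cost:,.2f}")
```

The value of the exercise is less the single number than seeing which input dominates: media-heavy pages or repeated spikes change the result linearly, which is why a bandwidth baseline matters before asking for quotes.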
  • #9: The second pitfall is understanding that observability and metrics collection is cloud data.
  • #10: Asking the audience to give their accounts, customers and personal experiences as feedback.
  • #11: And for what purpose? If these organizations could draw a straight line from more data to better outcomes — higher levels of availability, happier customers, faster remediation, more revenue — this tradeoff might make sense. But in many cases, this isn’t true. “Paying more for logging/metrics/tracing doesn’t equate to a positive user experience. Consider how much data can be generated and shipped. $$$. You still need good people to turn data into action.” It’s remarkable how common this situation is, where an organization is paying more for their observability data (typically metrics, logs, traces, and sometimes events), than they do for their production infrastructure. -- The Growth of Observability Data is out of Control
  • #12: The explosion of observability metric data will cause plenty of issues, not to mention costs… dare to flip the switch on new data collection?

    An experiment: a Hello World application was deployed to a four-node Kubernetes cluster on GKE. Load was generated using the script that comes with the app. Additional scripting scraped the Prometheus endpoints and recorded the size of the data payloads. Another script accepted Jaeger tracing spans and EUM beacons, recording the size of the data payloads. Fluentd collected all the logs and concatenated them into one flat file. Using the timestamps from the log file, one hour was extracted into a new file, which was then measured.

    Observability Data Volume: Tracing. At a rate of 1 trace per second, over 24 hours per day and 30 days in a month, the total number of traces is 2.5 million. The average trace size was 66kB. Therefore, the total data size for traces was 161GB. Looks like my estimate of fitting inside 100GB has already been proved wrong. While tracing can be sampled at source, that would mean having to throw away nearly half of the data to fit inside the original estimate of 100GB.

    Observability Data Volume: EUM. Each back-end call is triggered by a user interaction at the browser, which produces an EUM beacon, conveniently making the number of beacons generated the same as the number of traces: 2.5 million. The average size of an End User Metrics (EUM) beacon is a lot smaller at 397 bytes, making our total data size for a month of EUM beacons 1 GB.

    Observability Data Volume: Logs. For logs, especially when it comes to data volumes, your mileage may vary depending on your app, configuration settings, etc. The application logs generate quite a bit at INFO level, though not nearly as much as some other real-world applications. From the experiment, the log file size for one hour was 5 MB, making the total log volume for one month 3.4 GB.

    Observability Data Volume: Metrics. Metrics were collected with Prometheus from every container, each worker node, and kube-state-metrics for the cluster, giving a total of 1.1 MB per sample period. With a sample every ten seconds, that’s 259,200 samples per month, which results in a total data volume of 285 GB.

    Total Observability Data Volumes. The grand total across all datasets is 452 GB per month for a simple Hello World application running on a small Kubernetes cluster.

    A note on data granularity: Instana collects all metrics at 1-second granularity. Doing this with Prometheus would badly skew the experiment results, since Prometheus has none of the optimizations built into the Instana sensors and agents. Thus, the experiment was conducted at a 10-second sample rate for Prometheus metrics. The load generation script produces one request per second to the application back-end services.

    (Source: The Hidden Cost of Data Observability)
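The arithmetic behind these estimates can be reproduced with a short sketch. The rates and average sizes are the ones quoted in the note above; the decimal units (1 GB = 10^9 bytes) and the exact 30-day month are assumptions. Because the note reports rounded inputs, the recomputed values differ somewhat from the reported figures (the tracing estimate in particular comes out higher), so treat this as a sanity check rather than a reproduction of the experiment.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in a 30-day month
GB = 1e9                               # assumption: decimal gigabytes


def monthly_gb(events_per_month, bytes_per_event):
    """Monthly data volume in GB for a stream of fixed-size events."""
    return events_per_month * bytes_per_event / GB


traces_per_month = SECONDS_PER_MONTH * 1  # 1 trace per second

volumes = {
    "tracing": monthly_gb(traces_per_month, 66_000),            # 66 kB average trace
    "eum": monthly_gb(traces_per_month, 397),                   # one 397-byte beacon per trace
    "logs": monthly_gb(30 * 24, 5_000_000),                     # 5 MB of logs per hour
    "metrics": monthly_gb(SECONDS_PER_MONTH // 10, 1_100_000),  # 1.1 MB every 10 seconds
}
total = sum(volumes.values())

for name, gb in volumes.items():
    print(f"{name:8s} {gb:8.1f} GB")
print(f"{'total':8s} {total:8.1f} GB")
```

Changing a single input, such as the scrape interval or the trace sampling rate, shows immediately which signal dominates the monthly volume.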
  • #13: Most companies default to 13 months retention for all data. But in the modern cloud native architecture, where we are deploying multiple times a day, and a container is only around for a couple of hours, a huge amount of that modern observability data does not need to be retained for 13 months. One tactic for reducing your data footprint is setting the optimal retention period for each data type. For example, you might only need to keep observability data from your lab environment for two weeks if the environment is torn down and rebuilt on a bi-weekly basis anyways. Source -- The Growth of Observability Data is out of Control
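One way to reason about per-type retention is the steady-state footprint, which is roughly the monthly ingest multiplied by the retention period in months. In the sketch below, the monthly volumes are the experiment figures quoted in note #12; the tuned retention periods are hypothetical values chosen only to contrast with a blanket 13 months, not recommendations.

```python
# Monthly ingest in GB, as quoted in note #12.
MONTHLY_GB = {"tracing": 161, "eum": 1, "logs": 3.4, "metrics": 285}


def steady_state_gb(monthly_gb, retention_months):
    """Data held once ingest and expiry balance out: ingest x retention per data type."""
    return sum(gb * retention_months[kind] for kind, gb in monthly_gb.items())


# Default policy described in the note: 13 months for everything.
blanket = {kind: 13 for kind in MONTHLY_GB}

# Hypothetical per-type retention, in months (assumed for illustration only).
tuned = {"tracing": 0.5, "eum": 3, "logs": 13, "metrics": 1}

before = steady_state_gb(MONTHLY_GB, blanket)
after = steady_state_gb(MONTHLY_GB, tuned)

print(f"blanket 13 months: {before:,.0f} GB")
print(f"tuned retention:   {after:,.0f} GB")
```

The model ignores compression, downsampling, and storage tiers, all of which would need to be added for a real estimate.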
  • #14: The third pitfall is how crucial FinOps is going to be.
  • #15: Asking the audience to give their accounts, customers and personal experiences as feedback. A banking customer’s OpSec team wanted to leverage the cloud provider’s observability in the load balancers by using a load balancer label per application. However, the number of load balancer labels was limited, so even though utilization was < 10%, the Dev Team had to get more load balancers (one of the most expensive components in the cloud) to meet that expectation. After many escalations that was solved. However, they then ran into another issue: the load balancers could support only a limited number of context paths. So again they had to multiply the number of load balancers, without hitting the traffic limit. The alternative, a simple NGINX behind the cloud load balancer, was not permitted because of life cycle management (LCM): nobody wanted to LCM that instance. Who observes the cost of subpar architectural decisions? Auditing, Monitoring, Tracing: beautiful capabilities, highly necessary for proper observation of the health of the app and its capability to serve our customers in a timely and secure manner. But if each customer engagement for a purchase means N logs, then those will grow exponentially. Who owns that data strategy? It looks like we’ll be kicking off with yet another buzzword in this industry: FinOps.
  • #16: Many of us are re-architecting our apps & devs to be cloud native, our ops to be platform providers, building a SRE org to close the feedback loop between the platform consumers and providers. We talk (hopefully) about data portability, exit strategies and baselines. We talk about security, we talk about LCM, we talk about Utilization. And when we talk about the last 3, those should include not “just” the customer data - though that is extremely important - but also the auditing, tracing and logging data strategies & architectures. Netflix already talked about this, how many already implemented a strategy? Semi related (they use auto remediation, but still it’s a crap ton of data) https://www.infoq.com/presentations/netflix-streaming-data-infrastructure/
  • #17: Asking the audience to give their accounts, customers and personal experiences as feedback.
  • #18: A thorough assessment of your current situation is imperative, as it will lay the foundation for many important decisions you’ll need to make. A deep understanding of what applications you need to migrate to the cloud, your current IT environment, and the present level of resources and costs will help you make informed choices. A banking customer asked RH to pilot a business case for public cloud, OCP and OSP, looking at the cost of running containers vs running VMs. Without a baseline you can’t scope anything…. A government agency running containers on OCP also had no baseline when asked. All decisions are based on this! BTW: that banking customer has now decided to go full cloud, just leaving the lights on for the traditional DC. Their own team says it’s a bad idea, but they are going anyway. What are critical apps? What needs clustering? What can run in cloud (certified) and what not? https://www.thorntech.com/2016/07/10-steps-cloud-computing-migration/#execute