Komodor <> Epsagon | May 2021
Chaos & Order: Breaking and Fixing
Things in K8s Environments
with Gremlin & Komodor
Epic | February 2021
Why is it hard to troubleshoot?
Issues happen on an hourly basis and it’s almost
impossible to understand what causes them.
85% of incidents can be traced to system changes:
Blind spot
Changes are
unaudited or hidden
Fragmented data
Events are scattered between
hundreds of different tools
Butterfly effect
Distributed systems make it
harder to understand the
effect of a single change
Introduction
Troubleshooting Today
Understand who
changed what
Check the CI
pipeline
Check pods
status
Check current
alert
Explore relevant
exceptions
Review the alert’s
metrics
Check account
activity
Review the latest
code changes
Komodor tracks
changes across tools &
teams, understands
their ripple effect and
gives users the context
they need to
troubleshoot efficiently.
We track down
cross-services
cascading failures
We are service-centric,
showing the full activity
timeline per service
We help you find
the root cause
across all systems
How does it work?
Collect cross-system
events
Provide a complete overview of all services
and their relations in a single place
For each service, we build a comprehensive
timeline: deploys, config changes, alerts and more
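As a rough illustration of the per-service timeline idea, the sketch below groups events from several tools by service and orders each group by time. The `Event` fields, the source labels and the sample data are hypothetical and are not taken from Komodor's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    service: str   # service the event belongs to
    source: str    # hypothetical source label, e.g. "ci", "k8s", "alerting"
    kind: str      # e.g. "deploy", "config_change", "alert"
    at: datetime   # when the event happened


def build_timelines(events):
    """Group events by service and sort each group chronologically."""
    timelines = defaultdict(list)
    for event in events:
        timelines[event.service].append(event)
    for service_events in timelines.values():
        service_events.sort(key=lambda e: e.at)
    return dict(timelines)


# Made-up sample events arriving out of order from different tools.
events = [
    Event("checkout", "alerting", "alert", datetime(2021, 5, 1, 10, 7)),
    Event("checkout", "ci", "deploy", datetime(2021, 5, 1, 10, 0)),
    Event("cart", "k8s", "config_change", datetime(2021, 5, 1, 9, 55)),
]

timelines = build_timelines(events)
for event in timelines["checkout"]:
    print(event.at.isoformat(), event.source, event.kind)
```

Reading a single service's timeline top to bottom puts the deploy directly before the alert that followed it, which is the kind of context the slide describes.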
Installation
and integration
● Komodor takes about 5 minutes to
install.
● K8s agent documentation can be
found here:
https://github.com/komodorio/helm-charts/tree/master/charts/k8s-watcher
● Komodor integrates with all of your
favorite DevOps tools
Service
Explorer
We collect data from Kubernetes
and enrich it with observability, code
repository, CI/CD and alerting tools.
The data is organized in
a comprehensive way, ready for a
drill down from the big picture to
its details.
Related
Services
Troubleshooting microservices
requires a deep understanding of
connections and dependencies.
In one click, you can add more
services to the service view, so they’re
correlated on one timeline.
Events
View
The ‘Events’ feature offers a panoramic view
of all occurrences across your entire K8s
environment.
With this system-wide visibility, Komodor
Events makes it easier to troubleshoot
elusive issues, particularly those that aren’t
traced to any one specific service or cluster.
Pod Status
and logs
‘Pod Status and Logs’ enables you to
quickly drill down into the pods of an
unhealthy service. This offers quick access
to all of the pod-level data you’ll need for
troubleshooting, including:
● Overview of all pods running the
service
● Pod details, similar to what you would
get with kubectl describe
● Live view of all events
● Pod containers’ logs
Workflows
OOMKilled
1. Detect Kubernetes issues (e.g., health events, schedulable
resources and so on)
2. Correlate the information with data from external sources
(e.g., cloud providers, source code and feature flags)
3. Run sequences of checks that quickly pinpoint the exact
root cause
4. Use all of the information acquired to deliver
made-to-measure instructions for remediation
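The four steps above can be sketched for the OOMKilled case as follows. This is a minimal illustration, not Komodor's implementation: the pod dictionary follows the JSON field names of the Kubernetes Pod object (`status.containerStatuses[].lastState.terminated.reason`, `spec.containers[].resources.limits.memory`), while the `recent_changes` records, the helper names and the advice strings are hypothetical.

```python
def find_oomkilled(pod):
    """Step 1: return names of containers last terminated with reason OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names


def memory_limit(pod, container_name):
    """Look up the configured memory limit of a container, if any."""
    for container in pod.get("spec", {}).get("containers", []):
        if container["name"] == container_name:
            return container.get("resources", {}).get("limits", {}).get("memory")
    return None


def diagnose(pod, recent_changes):
    """Steps 2-4: correlate with change records, run checks, suggest a next step."""
    findings = []
    for name in find_oomkilled(pod):
        limit = memory_limit(pod, name)
        # Step 2: correlate with (hypothetical) change records for this container.
        related = [c["summary"] for c in recent_changes if c["container"] == name]
        # Step 3: a short sequence of checks that narrows down the likely cause.
        if related:
            advice = "Review the recent change before raising the limit."
        elif limit is None:
            advice = "No memory limit is set; check memory pressure on the node."
        else:
            advice = f"Compare actual memory usage with the {limit} limit."
        # Step 4: collect everything needed for a remediation suggestion.
        findings.append({"container": name, "limit": limit,
                         "related_changes": related, "advice": advice})
    return findings


# Made-up pod and change data for illustration.
pod = {
    "spec": {"containers": [
        {"name": "api", "resources": {"limits": {"memory": "256Mi"}}},
    ]},
    "status": {"containerStatuses": [
        {"name": "api",
         "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
    ]},
}
changes = [{"container": "api",
            "summary": "memory limit lowered from 512Mi to 256Mi"}]

report = diagnose(pod, changes)
for finding in report:
    print(finding["container"], finding["limit"], finding["advice"])
```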
Failures are inherent to complex
systems and will cause downtime
unless tested for.
What is Chaos Engineering?
Thoughtful, planned experiments
designed to reveal weakness
in our systems.
Start Small &
Increase the
Blast Radius
Development Staging Production
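One way to read "start small and increase the blast radius" is as a staged rollout of the same experiment that stops at the first environment where a weakness appears. The sketch below assumes that reading; `run_progressively` and `sample_experiment` are hypothetical names, and a real experiment would inject an actual failure (for example through a chaos engineering tool) instead of returning a canned result.

```python
STAGES = ["Development", "Staging", "Production"]


def run_progressively(experiment, stages=STAGES):
    """Run the experiment stage by stage and stop at the first failure.

    Returns the stages that passed and the stage that failed (or None).
    """
    completed = []
    for stage in stages:
        if not experiment(stage):
            return completed, stage
        completed.append(stage)
    return completed, None


def sample_experiment(stage):
    # Hypothetical stand-in: pretend the system only shows a weakness in Staging.
    return stage != "Staging"


completed, failed_at = run_progressively(sample_experiment)
print("passed:", completed, "failed at:", failed_at)
```

Stopping at the first failing stage keeps the blast radius limited to the smallest environment in which the weakness shows up.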
Chaos
Engineering
01 Resource failure
02 Service failure
03 Dependency failure
04 Application failure
05 Continuous Chaos
