PSUG 5 - 2025-01-20 - Splunk Observability And Digital Resilience

© 2023 SPLUNK INC.
Prague Splunk User Group #5
20/1/2025
Splunk Observability and Business Resilience
Innogy | ALEF | Splunk

Agenda
15:30 - 15:50 (20 min) Check-in
15:50 - 16:00 (10 min) Opening
16:00 - 17:00 (60 min) Part 1: Splunk Observability Vision and Roadmap (English, remote)
Ian Wells, Observability Advisory Director, Splunk
17:00 - 17:20 (20 min) Coffee Break 1
17:20 - 18:05 (45 min) Part 2: Splunk Synthetic Monitoring, innogy ČR (Czech)
Lukáš Gottesman, innogy, Radek Filip/Jakub Tamchyna, ALEF
18:05 - 18:25 (20 min) Coffee Break 2
18:25 - 18:50 (25 min) Part 3: Splunk OpenTelemetry (English)
Houssem Eddine Djlassi, Splunk
18:50 - 19:00 (10 min) Wrap-up
19:00 - 22:00 (3 hours) Dinner - Pivo Karlín (same building)

Splunk User Group Community
From Splunkers To Splunkers
✓ No sales
✓ No marketing
✓ It’s about YOU!
✓ Ask!

Prague SUG Team
Tomáš Moser, Sr. Solutions Engineer - GSS, Splunk, tmoser@splunk.com
Ingrid Nemečková, Technical Support Engineer, Splunk, inemeckova@splunk.com
Radek Filip, Splunk Consultant, ALEF NULA, radek.filip@alef.com
Michał Skórczewski, Sr. Solutions Engineer, Splunk, mskorczewsky@splunk.com

Splunk Observability Vision & Roadmap
Ian Wells, Director - Splunk Advisory Observability
“Observability is a way to investigate unknown unknowns by instrumenting everything”

Splunk Observability (“O11y”) Vision & Roadmap
Ian Wells, Director - Observability Advisory
2025

What is Observability?
Depends on who you ask, but . . .
“Metrics, Traces, and Logs”
“The ability to infer the state of a system by examining its output”
“A great new thing to spend money on”
The real definition:
Observability is a way to investigate unknown unknowns by instrumenting everything

Observability: The Three Pillars
● METRICS (Detect): What's happening?
● TRACES (Troubleshoot): Where is it happening?
● EVENTS / LOGS (Pinpoint): Why is it happening?

Cloud Transformation drives O11y
To increase velocity, agility and responsiveness
Stages (DEV and OPS shown for each stage):
● Retain & Optimize: tightly coupled apps, slow deployment cycles
● Lift & Shift: primarily using cloud IaaS
● Re-Factor: more modular, but dependent app components
● Re-Architect / Cloud-Native: loosely coupled microservices and serverless functions
Other diagram labels: Cloud Managed (e.g. RDS, DynamoDB, SaaS), Cloud First Architecture
[Diagram: VMs spread across private and public cloud]

Cloud Native monitoring is a challenge
[Diagram: Kubernetes cluster with control plane (master), node, pods and containers; Micro-Service A and Micro-Service B with addresses 10.10.9.1, 10.10.9.2 and 10.10.10.1 to 10.10.10.4]
● Metrics Tsunami: huge volume of high-cardinality (very unique) metrics
● Ephemerality: why monitor every 5 minutes something that can be deployed, killed and redeployed in seconds?
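To make the "metrics tsunami" concrete, here is a minimal Python sketch of how ephemeral pods inflate cardinality. It is an illustration only: the metric, service and pod names are invented, and a "series" is simply taken to be one unique combination of label values.

```python
def count_series(samples):
    """Count unique time series: one per distinct combination of labels."""
    return len({tuple(sorted(sample.items())) for sample in samples})


def pod_samples(generation, replicas=3):
    """One CPU sample per pod; every redeploy gives the pods new names (hypothetical naming)."""
    return [
        {"metric": "cpu", "service": "checkout", "pod": f"checkout-{generation}-{i}"}
        for i in range(replicas)
    ]


samples = []
for generation in range(4):  # four redeployments of the same 3-replica service
    samples.extend(pod_samples(generation))

print(count_series(samples))  # 12
```

The service never runs more than three pods at once, yet the backend has to store twelve distinct series after four redeployments.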

Not “observable” (a long minute)
Monitoring polls every 1 minute:
● 11:05:00 AM: OK
● 11:06:00 AM: OK
Between the two polls (timeline marks 11:05:13, 11:05:26, 11:05:45): the application inside the container crashed and will be restarted by K8s
Diagram layers: MONITORING, INFRASTRUCTURE, USERS
Ephemerality: why monitor every 5 minutes something that can be deployed, killed and redeployed in seconds?
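The polling gap can be reproduced with a few lines of Python. This is a sketch under assumptions: the slide only shows the timestamps, so treating 11:05:13 as the crash and 11:05:45 as the recovery is a guess, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta


def polls(start, end, interval_s):
    """Yield the poll times from start to end (inclusive)."""
    t = start
    while t <= end:
        yield t
        t += timedelta(seconds=interval_s)


def outage_seen(start, end, interval_s, down_from, down_until):
    """True if at least one poll falls inside the outage window."""
    return any(down_from <= t < down_until for t in polls(start, end, interval_s))


day = datetime(2025, 1, 20)
start = day.replace(hour=11, minute=5, second=0)
end = day.replace(hour=11, minute=6, second=0)
down_from = day.replace(hour=11, minute=5, second=13)   # assumed crash time
down_until = day.replace(hour=11, minute=5, second=45)  # assumed recovery time

print(outage_seen(start, end, 60, down_from, down_until))  # False
print(outage_seen(start, end, 10, down_from, down_until))  # True
```

With one-minute polling both samples report OK; with ten-second resolution the same outage is caught.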

Monitoring Applications in the new world is a Challenge
● Monolith: single process, single data store
● Microservices: airport kiosk, mobile app and online booking front ends; CHECKOUT, MILES, FLIGHT INFO, BAGGAGE and boarding pass services; written in Java, Node JS, .Net and Golang
[Diagram: components labelled Q and A inside the monolith and between the microservices]

Core Principles of Observability
1. Data & Event Correlation: Correlate logs, metrics and traces to provide a holistic view of system behavior.
2. Comprehensive Data Collection: Implement OpenTelemetry instrumentation across your application stack to collect distributed traces, metrics, and logs.
3. Full Fidelity Real-time Monitoring: Engineers should be able to access real-time dashboards, alerts, and notifications to respond to incidents promptly. No data sampling should be required to achieve scale.
4. Centralised Data Platform: All data to be captured within a single common data platform which can ideally extend beyond the O11y use case. Able to handle structured and unstructured data types in real time at high cardinality and be accessible by multiple persona types. Will encourage collaboration between development, operations, and security teams plus allow sharing of observability insights and data to facilitate joint troubleshooting and problem resolution.
5. Distributed Tracing: Utilize OpenTelemetry's distributed tracing capabilities to trace end-to-end journeys of financial transactions, allowing you to identify bottlenecks and monitor the performance of critical services.
6. Automation and Self-Healing: Implement automation for common operational tasks and self-healing mechanisms that can address issues without manual intervention. Move to O11y as code for adoption & implementation.

Why do you need Observability?

What problems do we need to solve?
Quite a lot actually ….
Themes: It’s a Data Problem | Environment | People & Process
● Security is a Data Problem
● Observability is a Data Problem
● Resilience is a Data Problem
● Infrastructure is changing
● Applications are changing
● We work differently - Agile vs Waterfall
● “You Build it you Run it”
● Expectations are high - Digital Experience
● Siloed Teams
● Siloed Tools
● Siloed Data

Where does the journey begin …..

Solving the problem is hard …..
Dashboards are Green
Nobody will admit to a problem
But the Main Service is Down …..

You have something like this…..
Security & IT Logs, Infra Data, App Data
Security Monitoring
[Diagram: many separate agents, one per tool]

You could have this …..
● Security & IT Logs
● Infrastructure Data (Metrics)
● Application Data (Traces)
Collected with OpenTelemetry into Splunk Security and Splunk Observability (Observability Cloud)

Observability Cloud: What does this all do?
“I am an SRE” “I want to know….”
● Real Time
● What went wrong?
● How do I fix it?
● How do I improve it?
Questions it answers:
● Is my infrastructure performing as I expect it to? Servers, VMs, OpenShift, K8S, Cloud, Databases etc…
● Is my application performing as I expect it to? How is my service interacting with others?
● Is my Application / Site / Service working as I expect it to? Is it down or slow? What are the users doing with my application?
Areas: Infrastructure, Application, Log Analytics

Who are you?
How do you want to consume your information?
“The Engineer”
● Real Time
● What went wrong?
● How do I fix it?
“The Service Owner”
● Full Visibility
● How is the Service Running?
● Is there any action we need to take?
Views: Infrastructure, Application, User Experience, Log Analytics, Security Insights
You now have this…

Why Splunk Observability for Splunk Platform users?

Combining Platform and Observability helps address 3 pain points
● Growing data volume and costs: the explosion of telemetry and tool sprawl have resulted in demand for tool consolidation and better data management
● Increased complexity of environments: most organizations today are hybrid cloud and deploy and manage multiple environments
● Fragmented admin experience: the explosion of solutions makes user and data management challenging for admins

Splunk Observability completes Splunk Platform (and vice versa)
Add powerful metric and trace analytics to provide out-of-the-box visibility, anomaly detection and directed troubleshooting for hybrid infrastructure and applications
Without Splunk Observability Cloud:
● Splunk Platform fed by Splunk Forwarders and proprietary agent(s)
● Data: Logs, Metrics
● On top: ES/ITSI, Content Packs, Splunk Apps, Third-Party APM, Third-Party Apps
With Splunk Observability Cloud:
● Splunk Platform (Splunk Enterprise 9.0, Splunk Cloud Platform) fed by Splunk Forwarders and the OpenTelemetry Collector agent
● Data: Logs, Metrics, Traces
● On top: Content Packs, On-Call, ES/ITSI, Third-Party APM, Splunk Apps, Splunk IM, Splunk APM, Splunk DEM

Unified platform for end-to-end workflows
Splunk Enterprise/Cloud and Splunk Observability Cloud
System health monitoring
● Customizable dashboarding for better insights
● Accurate system view with full-fidelity and scalable data platform
Early issue detection
● Schema-on-the-fly and SPL
● Alert action automation
● Proactive and predictive, ML-based alerting and notifications (ITSI)
Fast root-cause analysis
● Log analytics at scale
● Related Content for added context
Contextualization
● Purpose-built views of infrastructure, application and end-user experience
● Real-time analytics for traces and metrics
● Visibility across any type of workloads
Guided troubleshooting
● No code/intuitive interface
● Dynamic Service Map and AlwaysOn Profiling for high and granular level views
● Distributed Tracing

Splunk Platform and Observability Architecture
Splunk Observability Cloud: APM, Infrastructure Monitoring, Real-User Monitoring, Synthetic Monitoring, Log Observer UI, Observability Cloud metrics store
Splunk Cloud/Enterprise Platform: Splunk ITSI
Integration points shown in the diagram:
● Observability Related Content (APM, Infra Monitoring) in Splunk Cloud
● Log Observer Connect (via Service-Account or Unified Identity)
● Logs2Metrics (Victoria Experience)
● Observability Content Pack (KPIs, Alerts)
● Unified Identity (SSO+RBAC)
● Dashboard Studio
Data collection: UF/HF and OpenTelemetry as TA, from public, private and hybrid cloud; metrics & traces and logs

Unlock more use cases
● Centralize data and workflows for full visibility into digital systems
● Optimize monitoring costs to achieve better economies of scale
● Improve data and user management by alleviating admin efforts

Why Build your Observability Practice with Splunk?

Why Build a leading observability practice with Splunk
● Complete business visibility across any environment and any stack
● Earlier detection & faster investigation of business-impacting issues
● Better control of your data and costs

How Splunk helps you get there
Complete business visibility across any environment and any stack
● Easily monitor critical business processes like sales, orders, abandonment & customer behavior
● See every user transaction, with no blind spots
● Visibility across COTS & homegrown apps & infrastructure, spanning monoliths to microservices
“We have so much information at our fingertips thanks to Splunk… we’re constantly solving business problems in creative ways.”
Don Mahler | Director of Performance Management | Leidos

Earlier detection & faster investigation of business-impacting issues
● Search & analyze unstructured & structured data at petabyte-scale
● Dynamically updated visualizations, out-of-the-box
● Real-time detection and guided root cause analysis
● Event correlation for alert noise reduction
● Instrument, tag and analyze high-cardinality metrics
“Splunk Observability Cloud helps us make blazing-faster decisions…our development team gains instant intelligence to support our goals of always offering customers outstanding services.”
Jose Felipe Lopez | EVP | Engineering | Rappi

Better control of your data and costs
● OpenTelemetry native - avoid vendor lock-in
● Reduce toil by instrumenting once as you build new apps
● Instrument everything flexibly, pay only for what’s needed
● Optimize telemetry volume and costs with flexible data management capabilities
● Enterprise controls enable self-service observability
“We could bake OpenTelemetry into our architecture from day one because we have Splunk, who is the number-one contributor to OpenTelemetry and way ahead of the curve on this.”
Sean Schade | Principal Architect, Care.com
[Diagram: Splunk Observability built on open-standards data collection]

Integrated Full-Stack Observability
A view of the combined portfolio
● Observability Cloud: cloud-native, microservices environments
● AppDynamics: traditional three-tier environments (On-prem | SaaS)
● IT Service Intelligence: business service monitoring, event aggregation / AIOps, network monitoring
● Platform: logs integration with both Observability Cloud and AppDynamics
● Unified Experience (common look & feel, SSO, unified AI, deep links)
● Private and public cloud

Integrated Full-Stack Observability
A view of key capabilities & integration roadmap
● SPLUNK OBSERVABILITY CLOUD (Cloud Native Applications): Infrastructure Monitoring, Microservices APM, Digital Experience Monitoring
● SPLUNK APPDYNAMICS (Three-Tier Applications): Business Performance Monitoring, APM for Three-Tier Apps, Digital Experience Monitoring, App Security & Risk Mgmt
● SPLUNK IT SERVICE INTELLIGENCE: Service Monitoring, AIOps Event Intelligence
● Splunk Platform: Log Analytics
● Network Monitoring via ThousandEyes & Meraki
● Private and public cloud

Resilience provides the advantage
Resilience = O11y + Security
● Customers want Resilience
● Splunk is the only vendor to lead in O11y + Security

The Foundation for Digital Resilience
Uniﬁed Platform for Observability AND Security

Add Observability Cloud + IT Service Intelligence
Enable Digital Resilience

Thank you!

17:00 - 17:20 Coffee Break 1

Splunk Observability Cloud - Case Study: Splunk DEM
Lukáš Gottesman, IT Specialist, Innogy ČR
Jakub Tamchyna, Sr. Solutions Architect, ALEF NULA

Splunk Observability Cloud DEM
Lukáš Gottesman, IT Specialist
Jakub Tamchyna, Senior Solution Architect
1.2025

Splunk Observability Cloud components
● Real User Monitoring
● Synthetic Monitoring
● APM
● Infrastructure Monitoring
● Log Analysis
● Incident Response
Full-Stack | Real-Time | Analytics-Powered | Enterprise-Grade | OpenTelemetry-Native
On-Prem | Hybrid Cloud | Multi-Cloud | Cloud-Native

DEM, or Digital Experience Monitoring
Ensuring optimal performance and user satisfaction in today's digital landscape

RUM vs Synthetic
• Real User Monitoring (RUM) injects an agent into each page of a website or application. The agent reports real page-load data for every request actually made to each page.
• Synthetic Monitoring generates synthetic traffic (not data from real users or interactions) to collect data on page performance. It runs periodically on remote (often global) infrastructure.
Main goals: while Synthetic monitoring helps diagnose and solve shorter-term performance problems, RUM offers insight into long-term trends.

Splunk Synthetic Monitoring key features
Proactive end-to-end monitoring
• SLA/SLO tracking
• Detailed performance metrics
• Business transactions (journeys)
Visualization and Reporting
Alerting and Notifications
• Web site uptime tests
• Browser tests
• Backend API tests
Global monitoring locations

External website monitoring at innogy
Splunk Synthetic Monitoring → o11y
innogy · 20 January 2025

1. Website monitoring in general: what it is about and what it can offer …
2. We keep the websites running reliably …
3. A few numbers on monitoring …
4. Why did we choose Splunk SM?
5. Highlights from the implementation …
6. Migration from Splunk SM → O11y

We keep the websites running reliably …
There are 11 of us and we look after websites, mobile apps and integrations.
We try to minimize the impact of unavailability or long response times of our web applications caused by, for example:
- deployment of new features, infrastructure changes (servers, security policies, …), editorial changes, etc.
Websites
About 23 websites on innogy.cz and its subdomains (homepage, B2B/B2C portals, emobility, CNG, …)
+ test websites + about 19 domains with a redirect configured
Mobile apps
iOS + Android: each has its own way of integrating with the SAP systems
Integrations
SAP systems + third-party services (Mluvii, Kadlec, MPSV, …)

A few numbers on monitoring …
54 uptime checks (front-end monitoring)
• 32 priority websites (every 1 minute)
• 3 websites with a longer interval (every 5/15 minutes)
• 19 redirected domains (every 24 hours)
20 API checks (back-end monitoring)
• 14 services on the SAP systems (CRM/ISU, portal/app) (every 10 minutes)
• 6 third-party services (Kadlec, Mluvii, OTE, …)
7 browser checks (user journey scenarios)
• e.g. user login/logout, product calculator, ...
In 2024, about 1,020 error notifications (there were more errors; in most cases a notification is not sent right after the first error)
• longer response times (>3 s)
• unavailability
• planned outages (releases, maintenance, ...)
Every week we review the response times of the websites and BE services and deal with any deviations from normal.
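The remark that a notification is usually not sent after the first error can be sketched as a consecutive-failure rule. The following Python is only an illustration, not innogy's or Splunk's implementation: the 3-second threshold comes from the slide, while `failures_needed=2` and all names are assumptions.

```python
from dataclasses import dataclass

SLOW_THRESHOLD_S = 3.0  # from the slide: responses longer than 3 s count as errors


@dataclass
class CheckResult:
    ok: bool           # did the request succeed at all?
    response_s: float  # measured response time in seconds


def is_error(result: CheckResult) -> bool:
    """A run is an error if it failed or was slower than the threshold."""
    return (not result.ok) or result.response_s > SLOW_THRESHOLD_S


def notifications(results, failures_needed=2):
    """Return the indexes of the runs at which a notification would be sent."""
    sent = []
    streak = 0
    for i, result in enumerate(results):
        streak = streak + 1 if is_error(result) else 0
        if streak == failures_needed:  # notify once per failure streak
            sent.append(i)
    return sent


run = [
    CheckResult(True, 0.4),
    CheckResult(True, 3.5),   # slow once, then recovers: no notification
    CheckResult(True, 0.5),
    CheckResult(False, 0.0),  # unavailable ...
    CheckResult(True, 4.2),   # ... then slow: second error in a row
    CheckResult(True, 0.3),
]
print(notifications(run))  # [4]
```

Three of the six runs are errors, but only one notification goes out, which matches the observation that errors outnumber notifications.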

Why did we choose Splunk Synthetic Monitoring?
We had been using New Relic for external monitoring
• The subscription model changed → a significant price increase
We looked for a possible replacement
• Splunk had just acquired Rigor and started offering synthetic monitoring
• We also considered a possible integration with Splunk Enterprise
We prepared a POC on Splunk Synthetic Monitoring
→ the service suits us; integration with the "big Splunk" will not be pursued

Highlights from the implementation …
Migration from New Relic
• Export/import of browser checks using Selenium scripts: not 100% reliable, fine-tuning was needed
Notifications
• The SMS notifications we originally wanted were not supported for Czech operators → we had to find another channel (email is not sufficient)
• We chose MS Teams, integrated via webhooks
Creating browser checks
• It is often not easy to pick the right element on the page, or the path by which to "grab" it
• Hard-coded wait time before the next step (a problem when the BE responds slowly); in o11y this is now configurable
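For readers who have not used webhooks, the MS Teams channel can be pictured with a short Python sketch. It assumes an incoming webhook that accepts a simple JSON body with a `text` field; the payload a particular Teams webhook or workflow expects may differ, and the check name, function names and message wording are invented for illustration.

```python
import json
import urllib.request


def build_teams_message(check_name: str, status: str, response_s: float) -> bytes:
    """Build a minimal JSON body with a single 'text' field (assumed format)."""
    text = f"Synthetic check '{check_name}' is {status} (response time {response_s:.1f} s)"
    return json.dumps({"text": text}).encode("utf-8")


def send_to_teams(webhook_url: str, payload: bytes) -> int:
    """POST the payload to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


payload = build_teams_message("homepage uptime", "DOWN", 5.2)
print(payload.decode("utf-8"))
```

Calling `send_to_teams` with the webhook URL generated in Teams would deliver the message; it is not called here so the sketch runs without network access.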

Migration from Splunk SM → O11y
The automatic migration itself went smoothly
• only occasionally did some checks/tests need minor fine-tuning
Not everything was migrated
• response monitor, alerts (detectors): these had to be set up again manually for every test
Open issues
• notifications arriving with a significant delay
• auto-retry not working
• no "Run now" option
• no way to set "Blackout periods"
• dashboards

18:05 - 18:25 Coffee Break 2

OpenTelemetry And Splunk
Houssem Eddine Djlassi, Solutions Engineer - GSS, Splunk

Link to this Presentation

OpenTelemetry & Splunk
January 2025
Houssem Djlassi - Solutions Engineer

Forward-looking statements
This presentation may contain forward-looking statements regarding future events, plans or the expected financial
performance of our company, including our expectations regarding our products, technology, strategy, customers,
markets, acquisitions and investments. These statements reflect management’s current expectations, estimates and
assumptions based on the information currently available to us. These forward-looking statements are not guarantees of
future performance and involve significant risks, uncertainties and other factors that may cause our actual results,
performance or achievements to be materially different from results, performance or achievements expressed or implied
by the forward-looking statements contained in this presentation.
For additional information about factors that could cause actual results to differ materially from those described in the
forward-looking statements made in this presentation, please refer to our periodic reports and other filings with the SEC,
including the risk factors identified in our most recent quarterly reports on Form 10-Q and annual reports on Form 10-K,
copies of which may be obtained by visiting the Splunk Investor Relations website at www.investors.splunk.com or the
SEC's website at www.sec.gov. The forward-looking statements made in this presentation are made as of the time and
date of this presentation. If reviewed after the initial presentation, even if made available by us, on our website or
otherwise, it may not contain current or accurate information. We disclaim any obligation to update or revise any
forward-looking statement based on new information, future events or otherwise, except as required by applicable law.
In addition, any information about our roadmap outlines our general product direction and is subject to change at any
time without notice. It is for informational purposes only and shall not be incorporated into any contract or other
commitment. We undertake no obligation either to develop the features or functionalities described, in beta or in preview
(used interchangeably), or to include any such feature or functionality in a future release.
Splunk, Splunk> and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the United States
and other countries. All other brand names, product names or trademarks belong to their respective owners.
© 2024 Splunk Inc. All rights reserved.

What is OpenTelemetry?
OpenTelemetry is a collection of tools, APIs and software development kits (SDKs) used to instrument, generate, collect and export telemetry data (metrics, logs, traces, and more) that helps you analyze your software’s performance and behavior.

Why it matters?
● Fragmentation in Observability Tools: inconsistent instrumentation across different services and applications.
● Vendor Lock-In: difficulty in switching between observability backends or using multiple tools simultaneously.
● Limited Data Collection Capabilities: inability to collect all three types of telemetry data (Logs, Metrics and Traces) using a single framework.
● Lack of Standardization in Telemetry Data: inconsistent data formats and semantics across different observability solutions.
● Complex Integration with Cloud-Native Architectures: challenges in instrumenting and monitoring distributed systems, especially in cloud-native and microservices environments.

Main OpenTelemetry components:
An Observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs.
OpenTelemetry consists of the following major components:
❏ A specification for all components.
❏ A standard protocol that defines the shape of telemetry data.
❏ Semantic conventions that define a standard naming scheme for common telemetry data types.
❏ APIs that define how to generate telemetry data.
❏ Language SDKs that implement the specification, APIs, and export of telemetry data.
❏ A library ecosystem that implements instrumentation for common libraries and frameworks.
❏ Automatic instrumentation components that generate telemetry data without requiring code changes.
❏ The OpenTelemetry Collector, a proxy that receives, processes, and exports telemetry data.
❏ Various other tools, such as the OpenTelemetry Operator for Kubernetes, OpenTelemetry Helm Charts, and community assets for FaaS.

Collector Components
Examples
Receivers:
● OTLP: supports gRPC and HTTP protocols.
● Prometheus: scrapes metrics from Prometheus-instrumented targets.
● Filelog: tails and parses logs from files.
Processors:
● Batch: groups data into batches for more efficient processing.
● Memory Limiter: limits the amount of memory used by the collector.
● Transform: modifies data format or adds metadata.
Exporters:
● OTLP: sends telemetry via the OpenTelemetry Protocol.
● Splunk HEC: sends telemetry to Splunk HTTP Event Collector endpoints.
● Jaeger: sends trace data to Jaeger backends.
Collector Components - Host Metrics Receiver
The Host Metrics receiver generates metrics about the host system, scraped from various sources, and emits host entity events as logs. It is intended to be used when the collector is deployed as an agent.

Collector Components - K8s Attributes Processor
The Kubernetes attributes processor automatically sets resource attributes with k8s metadata on spans, metrics and logs.
The processor automatically discovers* k8s resources (pods), extracts metadata from them and adds the extracted metadata to the relevant spans, metrics and logs as resource attributes.
*: The processor uses the Kubernetes API to discover all pods running in a cluster and keeps a record of their IP addresses, pod UIDs and interesting metadata.
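The enrichment step described above can be pictured as a lookup keyed by pod IP. This Python sketch is only a conceptual model, not the processor's code: the pod index is hard-coded here (the real processor builds it from the Kubernetes API), and the pod name, namespace and UID values are invented. The attribute keys follow the OpenTelemetry semantic conventions for Kubernetes.

```python
# Hypothetical index of pods, keyed by pod IP address.
POD_INDEX = {
    "10.10.10.2": {
        "k8s.pod.name": "checkout-7d9f",
        "k8s.namespace.name": "shop",
        "k8s.pod.uid": "3f2b6c1e-0000-0000-0000-000000000001",
    },
}


def enrich(record, source_ip, pod_index):
    """Return a copy of the record with pod metadata merged into its resource attributes."""
    metadata = pod_index.get(source_ip)
    if metadata is None:
        return record  # unknown sender: leave the record untouched
    enriched = dict(record)
    enriched["resource"] = {**record.get("resource", {}), **metadata}
    return enriched


span = {"name": "GET /cart", "resource": {"service.name": "checkout"}}
print(enrich(span, "10.10.10.2", POD_INDEX)["resource"]["k8s.pod.name"])  # checkout-7d9f
```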

About instrumentation: Automatic vs Manual
OpenTelemetry provides more than just zero-code and code-based telemetry solutions. The following things are also a part of OpenTelemetry:
➔ Libraries can leverage the OpenTelemetry API as a dependency, which will have no impact on applications using that library unless the OpenTelemetry SDK is imported.
➔ For each signal (traces, metrics, logs) you have several methods at your disposal to create, process, and export them.
➔ With context propagation built into the implementations, you can correlate signals regardless of where they are generated.
➔ Resources and Instrumentation Scopes allow grouping of signals by different entities, such as the host, operating system or K8s cluster.
➔ Each language-specific implementation of the API and SDK follows the requirements and expectations of the OpenTelemetry specification.
➔ Semantic Conventions provide a common naming schema that can be used for standardization across code bases and platforms.
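Context propagation is easiest to see on the wire. By default OpenTelemetry propagates trace context in the W3C Trace Context `traceparent` HTTP header, which has the form `00-<trace-id>-<parent-span-id>-<flags>`. The Python sketch below builds and continues such a header with the standard library only; it is an illustration of the header format, not the OpenTelemetry propagator API, and the function names are made up.

```python
import re
import secrets

# version 00, 32 hex chars trace id, 16 hex chars span id, 2 hex chars flags
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Start a new trace: fresh trace id and span id."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(parent):
    """Continue a trace in a downstream service: same trace id, new span id."""
    match = TRACEPARENT.match(parent)
    if match is None:
        raise ValueError("not a version-00 traceparent header")
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = new_traceparent()
outgoing = child_traceparent(incoming)
print(incoming)
print(outgoing)
```

Because both headers share the trace id, spans, logs and metrics emitted by either service can be correlated afterwards.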

About instrumentation: Automatic vs Manual
In order to make a system observable, it must be instrumented: that is, code from the system’s components must emit traces, metrics, and logs.
Automatic/Zero-Code:
★ Patches / attaches to a library
★ Collects data on library activity at runtime
★ Produces spans based on the specification and semantic conventions
★ May offer additional configuration / features
★ List of all auto-instrumentation: https://opentelemetry.io/registry/
Manual/Code-based:
★ Application developer writes dedicated code
★ Starts and ends spans, sets status
★ Adds attributes and events
Benefits
1) Avoid vendor lock-in
Choose the tools that are right for your business
2) Data Flexibility
Collect and analyze custom metrics
3) Easy set up
Instrument only once the way that works best for you
4) Operate at scale
Scale without tool-specific considerations
5) Community
Second most active CNCF project, behind Kubernetes

Use Cases
Examples
Infrastructure:
● Perform version audits to ensure zero vulnerabilities and make sure configurations are working
● Identify configuration changes leading to performance degradation
● Check for misconfiguration of your domain name system (DNS) that causes apps to be inaccessible
Back End:
● Detect faulty logic or incorrect user input, which leads to exceptions being thrown
● Identify improperly implemented API calls to the back end
● Uncover poorly performing code on an API, which also leads to longer response times
Front End:
● Detect faulty logic or incorrect user input leading to errors
● Find poorly implemented code, which makes your UI extremely slow despite having fast APIs
● Locate geo-specific lag requiring geo-distribution

Status and Roadmap
Spans / Distributed Traces
Generally available across ~all languages, the Collector, and other components
Infrastructure Metrics, Application Metrics, Custom Metrics
Generally available across ~all languages, the Collector, and other components
Logs
Generally available across most languages, the Collector, and other components
Profiles
New, still being designed

Observability Cloud GDI Roadmap
OpenTelemetry zero-configuration
Automatically instrument custom applications, databases, message queues, etc.
OpenTelemetry Agent Management
View every VM, container cluster, and service that you own, the OTel agent or SDK instrumenting each one (and
a list of ones that aren’t instrumented), and the health and status of each agent and SDK
Enhancements for Splunk Enterprise and Splunk Cloud Customers
Making the Collector deployable and configurable with Splunk Deployment Server, making the Collector fully
supported for all data types and all Splunk products, adding metrics and traces to Edge Processor

Splunk and OpenTelemetry: The Splunk Advantage
Built-in standard for observability | Powering end-to-end observability
● Splunk Observability needs no proprietary agents: data collection is enabled with OpenTelemetry.
● Splunk supports automatic trace instrumentation and configuration to make it easy to get started.
● You can customize what's included by building from the community source.
● View the status, interactions, dashboards and logs from all of your infrastructure in Splunk Observability Cloud or other observability tools.
● You can use OpenTelemetry to capture traces, metrics and logs from OpenTelemetry SDKs on the same host or over the network, or from hundreds of sources, including databases, network proxies, Prometheus, Jaeger and more.

The Splunk Distribution of the OpenTelemetry Collector
The Splunk Distribution of the OpenTelemetry Collector supports automatic (no code modification) trace instrumentation and comes with default configuration and out-of-the-box support for Splunk Application Performance Monitoring and Splunk Infrastructure Monitoring, making it easier than ever to get started.
[Diagram: Splunk Observability built on open-standards data collection]

Collector deployment modes
● No collector: this pattern consists of applications instrumented with an OpenTelemetry SDK that export telemetry signals (traces, metrics, logs) directly into a backend.
● Agent: the agent collector deployment pattern consists of applications instrumented with an OpenTelemetry SDK using the OpenTelemetry protocol (OTLP), or other collectors (using the OTLP exporter), that send telemetry signals to a collector instance running with the application or on the same host as the application (such as a sidecar or a daemonset).
● Gateway: the gateway collector deployment pattern consists of applications (or other collectors) sending telemetry signals to a single OTLP endpoint provided by one or more collector instances running as a standalone service (for example, a deployment in Kubernetes), typically per cluster, per data center or per region.
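From the application's point of view, the three patterns differ mainly in where the SDK sends its OTLP data. The Python sketch below picks an endpoint per pattern and exposes it through `OTEL_EXPORTER_OTLP_ENDPOINT`, the standard environment variable read by OpenTelemetry SDKs. Port 4317 is the default OTLP gRPC port; the backend URL and gateway host name are placeholders, and the helper functions are hypothetical.

```python
OTLP_GRPC_PORT = 4317  # default OTLP/gRPC port


def otlp_endpoint(
    mode,
    backend_url="https://backend.example.com:4317",   # placeholder backend
    gateway_host="otel-gateway.example.internal",     # placeholder gateway
):
    """Return the OTLP endpoint an application would export to in each pattern."""
    if mode == "no-collector":
        return backend_url  # SDK exports straight to the backend
    if mode == "agent":
        return f"http://localhost:{OTLP_GRPC_PORT}"  # collector on the same host or pod
    if mode == "gateway":
        return f"http://{gateway_host}:{OTLP_GRPC_PORT}"  # shared standalone collector service
    raise ValueError(f"unknown deployment mode: {mode}")


def sdk_environment(mode):
    """Environment variables to set for the instrumented application."""
    return {"OTEL_EXPORTER_OTLP_ENDPOINT": otlp_endpoint(mode)}


for mode in ("no-collector", "agent", "gateway"):
    print(mode, sdk_environment(mode))
```

The agent and gateway patterns can be combined: agents on each host forward to a gateway, and only the gateway holds backend credentials.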

Recommended Deployment Option for K8s
For Kubernetes environments:
● Use the Helm chart to rapidly deploy the collector to all nodes in the cluster.
● Utilize zero-config to automatically instrument applications.
The collector will capture metrics, traces, and logs for both applications and infra.
[Diagram: Kubernetes cluster running the OTel Collector (.yaml config), Splunk Cloud and Observability Cloud; flow labels: reads logs from; push, pull metrics, metadata; spans, metrics via OTLP; logs via HEC; logs via S2S; Edge Processor; Heavy Forwarder]

Recommended Deployment Option for Linux and Windows Hosts with UF
For Linux and Windows hosts where the UF is already deployed:
● Use the Splunk Deployment Server with the Splunk Add-On for OpenTelemetry Collector to deploy and manage the collector.
UF will continue to capture logs, while the collector will capture metrics and traces for both applications and infra.
[Diagram: host running the UF (.conf, TAs) and the OTel Collector (.yaml), both of which the Splunk Deployment Server can manage; Splunk Cloud and Observability Cloud; flow labels: reads logs from; spans, metrics via OTLP; push, pull metrics, metadata; logs via HEC; Edge Processor; Heavy Forwarder]

Alternate Deployment Option for Linux and Windows Hosts
For customers with Linux and Windows hosts that want to use OpenTelemetry to collect 100% of their data:
● Deploy the Splunk Distribution of the OpenTelemetry Collector using a configuration management tool of choice (Ansible, Puppet, etc.)
● Utilize zero-config to automatically instrument applications.
The collector will capture metrics, traces, and logs for both applications and infra.
[Diagram: host running the OTel Collector (.yaml), Splunk Cloud and Observability Cloud; flow labels: reads logs from; spans, metrics via OTLP; push, pull metrics, metadata; logs via HEC; Edge Processor; Heavy Forwarder]

What’s next?
OpenTelemetry official documentation: https://opentelemetry.io/docs/
Accelerating an implementation of OpenTelemetry in Splunk Observability Cloud
https://lantern.splunk.com/Observability/Getting_Started/Accelerating_an_implementation_of_OpenTelemetry_in_Splunk_Observability_Cloud
Using OpenTelemetry to get data into Splunk Cloud Platform
https://lantern.splunk.com/Splunk_Platform/Product_Tips/Data_Management/Using_OpenTelemetry_to_get_data_into_Splunk_Cloud_Platform
For a better understanding of the OpenTelemetry configuration file: https://www.otelbin.io/

Wrap-Up
● That’s it :-)
● Please fill in the post-event survey. Check your mailboxes!
● Slides will be shared on the SUG #5 event page and SlideShare as usual.
● We forgot to record Ian’s O11y session. We are very sorry :(
● Talk to us!
PSUG @ Slack
Register and subscribe to the #prague-sug Slack channel
Talk to us!
Tomáš Moser tmoser@splunk.com, tommoser@cisco.com
Ingrid Nemečková inemeckova@splunk.com, inemecko@cisco.com
Michal Skorczewski mskorczewski@splunk.com, mskorcze@cisco.com
Radek Filip radek.filip@alef.com
PSUG @ LinkedIn
https://www.linkedin.com/groups/9544692/
