© 2023 SPLUNK INC.
Prague Splunk User Group #5
20/1/2025
Splunk Observability and Business Resilience
Innogy | ALEF | Splunk
Agenda
15:30 - 15:50 (20 min) Check-in
15:50 - 16:00 (10 min) Opening
16:00 - 17:00 (60 min) Part 1: Splunk Observability Vision and Roadmap (English, remote)
Ian Wells, Observability Advisory Director, Splunk
17:00 - 17:20 (20 min) Coffee Break 1
17:20 - 18:05 (45 min) Part 2: Splunk Synthetic Monitoring, innogy ČR (Czech)
Lukáš Gottesman, innogy, Radek Filip/Jakub Tamchyna, ALEF
18:05 - 18:25 (20 min) Coffee Break 2
18:25 - 18:50 (25 min) Part 3: Splunk OpenTelemetry (English)
Houssem Eddine Djlassi, Splunk
18:50 - 19:00 (10 min) Wrap-up
19:00 - 22:00 (3 hours) Dinner - Pivo Karlín (same building)
Splunk User Group Community
From Splunkers To Splunkers
✓ No sales
✓ No marketing
✓ It’s about YOU!
✓ Ask!
Prague SUG Team
Tomáš Moser, Sr. Solutions Engineer - GSS, Splunk, tmoser@splunk.com
Ingrid Nemečková, Technical Support Engineer, Splunk, inemeckova@splunk.com
Radek Filip, Splunk Consultant, ALEF NULA, radek.filip@alef.com
Michał Skórczewski, Sr. Solutions Engineer, Splunk, mskorczewski@splunk.com
Splunk Observability Vision & Roadmap
Ian Wells, Director - Splunk Advisory Observability
“Observability is a way to investigate unknown unknowns by instrumenting everything”
Splunk Observability “O11y” Vision & Roadmap
Ian Wells, Director - Observability Advisory
2025
What is Observability?
Depends on who you ask, but . . .
“Metrics, Traces, and Logs”
“The ability to infer the state of a system by examining its output”
“A great new thing to spend money on”
The real definition:
Observability is a way to investigate unknown unknowns by instrumenting everything
Observability: The Three Pillars
METRICS: What’s happening? (Detect)
TRACES: Where is it happening? (Troubleshoot)
EVENTS / LOGS: Why is it happening? (Pinpoint)
Cloud Transformation drives O11y
To increase velocity, agility and responsiveness
Retain & Optimize: Tightly Coupled Apps, Slow Deployment Cycles
Lift & Shift: Primarily using Cloud IaaS
Re-Factor: More Modular, but Dependent App Components
Re-Architect / Cloud-Native: Loosely Coupled Microservices, and Serverless Functions
Cloud Managed e.g. RDS, DynamoDB, SaaS
Cloud First Architecture
[Figure: for each stage, DEV and OPS teams and VMs spread across private and public cloud]
Cloud Native monitoring is a challenge
[Figure: Kubernetes cluster with a control plane (master) and nodes; pods and containers running Micro-Service A and Micro-Service B, each with its own IP address (10.10.9.x, 10.10.10.x)]
Metrics Tsunami: Huge volume of high cardinality (very unique) metrics
Ephemerality: Why monitor every 5 min something that can be deployed, killed and redeployed in seconds?
[Figure: timeline of monitoring, infrastructure and users. Monitoring checks at 11:05:00 AM (OK) and 11:06:00 AM (OK) are 1 min apart; the events at 11:05:13, 11:05:26 and 11:05:45 fall between them]
Application inside the container crashed & will be restarted by K8s
Not “observable” for a long minute
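The timeline can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the crash window runs from 11:05:13 to 11:05:45 (the slide does not label which timestamp is which), and `polls_inside` is a hypothetical helper, not part of any Splunk product.

```python
from datetime import datetime, timedelta


def polls_inside(start, end, first_poll, interval, count):
    """Return the poll times that fall inside the outage window [start, end]."""
    polls = [first_poll + i * interval for i in range(count)]
    return [p for p in polls if start <= p <= end]


day = datetime(2025, 1, 20)  # arbitrary date, only the time of day matters
crash_start = day.replace(hour=11, minute=5, second=13)  # assumed crash time
crash_end = day.replace(hour=11, minute=5, second=45)    # assumed recovery time
first_poll = day.replace(hour=11, minute=5, second=0)

# Polling once a minute: the checks at 11:05:00 and 11:06:00 both miss the crash.
minute_hits = polls_inside(crash_start, crash_end, first_poll, timedelta(minutes=1), 2)

# Polling every 10 seconds over the same minute lands inside the window.
ten_second_hits = polls_inside(crash_start, crash_end, first_poll, timedelta(seconds=10), 7)

print(len(minute_hits), len(ten_second_hits))
```

With one-minute polling both checks report OK although the application was down in between; a finer interval is the only way this style of check would see the crash.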
Monitoring Applications in the new world is a Challenge
[Figure: Monolith (a single process with one data store) versus Microservices: airport kiosk, mobile app and online booking front ends calling CHECKOUT, MILES, FLIGHT INFO, BAGGAGE and Boarding pass services written in Java, Node JS, .Net and Golang, with components labelled Q and A]
Core Principles of Observability
● Comprehensive Data Collection: Implement OpenTelemetry instrumentation across your application stack to collect distributed traces, metrics, and logs.
● Distributed Tracing: Utilize OpenTelemetry's distributed tracing capabilities to trace end-to-end journeys of financial transactions, allowing you to identify bottlenecks and monitor the performance of critical services.
● Data & Event Correlation: Correlate logs, metrics and traces to provide a holistic view of system behavior.
● Full Fidelity Real-time Monitoring: Engineers should be able to access real-time dashboards, alerts, and notifications to respond to incidents promptly. No data sampling should be required to achieve scale.
● Automation and Self-Healing: Implement automation for common operational tasks and self-healing mechanisms that can address issues without manual intervention. Move to O11y as code for adoption & implementation.
● Centralised Data Platform: All data to be captured within a single common data platform which can ideally extend beyond the O11y use case. Able to handle structured and unstructured data types in real time at high cardinality and be accessible by multiple persona types. Will encourage collaboration between development, operations, and security teams plus allow sharing of observability insights and data to facilitate joint troubleshooting and problem resolution.
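To make the "Data & Event Correlation" principle concrete, here is a minimal sketch that joins log records to spans through a shared trace ID. The record shapes, field names and sample values are assumptions made for illustration; they are not a Splunk or OpenTelemetry data model.

```python
from collections import defaultdict

# Hypothetical sample telemetry; field names are illustrative assumptions.
spans = [
    {"trace_id": "t1", "service": "checkout", "duration_ms": 1840},
    {"trace_id": "t2", "service": "checkout", "duration_ms": 95},
]
logs = [
    {"trace_id": "t1", "level": "ERROR", "message": "payment gateway timeout"},
    {"trace_id": "t2", "level": "INFO", "message": "order placed"},
    {"trace_id": "t1", "level": "WARN", "message": "retrying payment"},
]


def correlate(spans, logs):
    """Group log records under the span that shares their trace ID."""
    logs_by_trace = defaultdict(list)
    for record in logs:
        logs_by_trace[record["trace_id"]].append(record)
    return [
        {**span, "logs": logs_by_trace.get(span["trace_id"], [])}
        for span in spans
    ]


def slow_traces(correlated, threshold_ms):
    """Return slow spans together with the log lines recorded for them."""
    return [c for c in correlated if c["duration_ms"] > threshold_ms]


for item in slow_traces(correlate(spans, logs), threshold_ms=1000):
    print(item["service"], item["duration_ms"], [entry["message"] for entry in item["logs"]])
```

The point of the join is that a slow trace arrives with its own log lines attached, so the engineer does not have to search a separate tool by timestamp.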
Why do you need Observability?
What problems do we need to solve?
Quite a lot actually ….
It’s a Data Problem | Environment | People & Process
● Security is a Data Problem
● Observability is a Data Problem
● Resilience is a Data Problem
● Infrastructure is changing
● Applications are changing
● We work differently - Agile vs Waterfall
● “You Build it you Run it”
● Expectations are high - Digital Experience
● Siloed Teams
● Siloed Tools
● Siloed Data
Where does the journey begin …..
Solving the problem is hard …..
Dashboards are Green
Nobody will admit to a problem
But the Main Service is Down …..
You have something like this…..
Security & IT Logs, Infra Data, App Data
Security Monitoring
[Figure: many separate agents]
You could have this …..
Security & IT Logs Infrastructure Data
(Metrics)
Application Data
(Traces)
Observability Cloud
Open Telemetry
Splunk Security
Open Telemetry
Splunk Observability
What does this all do?
Observability Cloud
“I am an SRE” “I want to know….”
● Real Time
● What went wrong?
● How do I fix it?
● How do I improve it?
Infrastructure: Is my Infrastructure performing as I expect it to? Servers, VMs, OpenShift, K8S, Cloud, Databases etc…
Application: Is my Application performing as I expect it to? How is my service interacting with others?
Is my Application / Site / Service working as I expect it to? Is it down or slow? What are the users doing with my application?
Log Analytics
Who are you?
How do you want to consume your information?
“The Engineer”
● Real Time
● What went wrong?
● How do I fix it?
“The Service Owner”
● Full Visibility
● How is the Service Running?
● Is there any action we need to take?
[Figure: views offered to the two personas: Infrastructure, Application, User Experience, Log Analytics, Security Insights]
You now have this…
Why Splunk Observability for Splunk Platform users?
Combining Platform and Observability helps address 3 pain points
● Growing data volume and costs: Explosion of telemetry and tool sprawl have resulted in demand for tool consolidation and better data management
● Increased complexity of environments: Most organizations today are hybrid cloud and deploy and manage multiple environments
● Fragmented admin experience: Explosion of solutions makes user and data management challenging for admins
Splunk Observability completes Splunk Platform (and vice versa)
Add powerful metric and trace analytics to provide out-of-the-box visibility, anomaly detection and directed troubleshooting for hybrid infrastructure and applications
[Figure, without Splunk Observability Cloud. Components shown: Splunk Platform, Splunk Forwarders, Proprietary Agent(s), Logs, Metrics, ES/ITSI, Content Packs, Splunk Apps, Third Party Apps, Third Party APM]
[Figure, with Splunk Observability Cloud. Components shown: Splunk Platform (Splunk Enterprise 9.0, Splunk Cloud Platform), Splunk Forwarders, OpenTelemetry Collector, Agent, Logs, Metrics, Traces, ES/ITSI, Content Packs, On-Call, Splunk Apps, Third Party APM, Splunk IM, Splunk APM, Splunk DEM]
Unified platform for end-to-end workflows
Splunk Enterprise/Cloud | Splunk Observability Cloud
System health monitoring
● Customizable dashboarding for better insights
● Accurate system view with full-fidelity and scalable data platform
Early issue detection
● Schema-on-the-fly and SPL
● Alert action automation
● Proactive and Predictive, ML-based alerting and notifications (ITSI)
Fast root-cause analysis
● Log analytics at scale
● Related Content for added context
Contextualization
● Purpose-built views of infrastructure, application and end-user experience
● Real-time analytics for traces and metrics
● Visibility across any type of workloads
Guided troubleshooting
● No code/intuitive interface
● Dynamic Service Map and AlwaysOn Profiling for high and granular level views
● Distributed Tracing
Splunk Platform and Observability Architecture
[Figure: architecture diagram across public, private and hybrid cloud. Splunk Observability Cloud: APM, Infrastructure Monitoring, Real-User Monitoring, Synthetic Monitoring, Log Observer UI, Observability Cloud metrics store. Splunk Cloud/Enterprise Platform: Splunk ITSI, Dashboard Studio. Data collection: UF/HF, OpenTelemetry as TA (logs, metrics & traces). Integrations: Observability Related Content (APM, Infra Monitoring) in Splunk Cloud; Log Observer Connect (via Service-Account or Unified Identity); Logs2Metrics (Victoria Experience); Observability Content Pack (KPIs, Alerts); Unified Identity (SSO+RBAC)]
Unlock more use cases
● Centralize data and workflows for full visibility into digital systems
● Optimize monitoring costs to achieve better economies of scale
● Improve data and user management by alleviating admin efforts
Why Build your Observability Practice with Splunk?
Why Build a leading observability practice with Splunk
● Complete business visibility across any environment and any stack
● Earlier detection & faster investigation of business-impacting issues
● Better control of your data and costs
How Splunk helps you get there
Complete business visibility across any environment and any stack
● Easily monitor critical business processes like sales, orders, abandonment & customer behavior
● See every user transaction, with no blind spots
● Visibility across COTS & homegrown apps & infrastructure, spanning monoliths to microservices
“We have so much information at our fingertips thanks to Splunk… we’re constantly solving business problems in creative ways.”
Don Mahler | Director of Performance Management | Leidos
How Splunk helps you get there
Earlier detection & faster investigation of business-impacting issues
● Search & analyze unstructured & structured data at petabyte-scale
● Dynamically updated visualizations, out-of-the-box
● Real-time detection and guided root cause analysis
● Event correlation for alert noise reduction
● Instrument, tag and analyze high-cardinality metrics
“Splunk Observability Cloud helps us make blazing-faster decisions…our development team gains instant intelligence to support our goals of always offering customers outstanding services.”
Jose Felipe Lopez | EVP | Engineering | Rappi
How Splunk helps you get there
Better control of your data and costs
● OpenTelemetry native - avoid vendor lock-in
● Reduce toil by instrumenting once as you build new apps
● Instrument everything flexibly, pay only for what’s needed
● Optimize telemetry volume and costs with flexible data management capabilities
● Enterprise controls enable self-service observability
“We could bake OpenTelemetry into our architecture from day one because we have Splunk, who is the number-one contributor to OpenTelemetry and way ahead of the curve on this.”
Sean Schade | Principal Architect, Care.com
[Figure: Splunk Observability, Open Standards Data Collection]
Integrated Full-Stack Observability
A view of the combined portfolio
[Figure: Observability Cloud (cloud-native, microservices environments) and AppDynamics (traditional three-tier environments, on-prem | SaaS), each with a logs integration to the Platform, under a unified experience (common look & feel, SSO, unified AI, deep links); IT Service Intelligence provides business service monitoring, event aggregation / AIOps and network monitoring; across private and public cloud]
Integrated Full-Stack Observability
A view of key capabilities & integration roadmap
[Figure: SPLUNK OBSERVABILITY CLOUD for cloud native applications (Infrastructure Monitoring, Microservices APM, Digital Experience Monitoring); SPLUNK APPDYNAMICS for three-tier applications (Business Performance Monitoring, APM for Three-Tier Apps, Digital Experience Monitoring, App Security & Risk Mgmt); SPLUNK IT SERVICE INTELLIGENCE (Service Monitoring, AIOps Event Intelligence); Splunk Platform (Log Analytics); Network Monitoring via ThousandEyes & Meraki; across private and public cloud]
Resilience provides the advantage
Resilience = O11y + Security
● Customers want Resilience
● Splunk is the only vendor to lead in O11y + Security
The Foundation for Digital Resilience
Unified Platform for Observability AND Security
Add Observability Cloud + IT Service Intelligence
Enable Digital Resilience
Thank You!
17:00 - 17:20
Splunk Observability Cloud - Case Study
Lukáš Gottesman, IT Specialist, Innogy ČR
Jakub Tamchyna, Sr. Solutions Architect, ALEF NULA
Splunk DEM
Lukáš Gottesman, IT Specialist
1.2025
Splunk Observability Cloud DEM
Jakub Tamchyna, Senior Solution Architect
Splunk Observability Cloud components
Real User Monitoring | Synthetic Monitoring | APM | Infrastructure Monitoring | Log Analysis | Incident Response
Full-Stack | Real-Time | Analytics-Powered | Enterprise-Grade | OpenTelemetry-Native
On-Prem | Hybrid Cloud | Multi-Cloud | Cloud-Native
DEM: Digital Experience Monitoring
Ensuring optimal performance and user satisfaction in today's digital landscape
RUM vs Synthetic
• Real User Monitoring (RUM) - injects an agent into each page of a website or application. The agent reports real page-load data for every request actually made for each page.
• Synthetic Monitoring - generates synthetic traffic (not data from real users or interactions) to collect data on page performance. Runs periodically on remote (often global) infrastructure.
Main goals - while Synthetic monitoring helps diagnose and solve shorter-term performance problems, RUM offers insight into long-term trends.
Splunk Synthetic Monitoring key features
Proactive end-to-end monitoring
• SLA/SLO tracking
• Detailed performance metrics
• Business transactions (journeys)
Visualization and Reporting
Alerting and Notifications
• Web site uptime tests
• Browser tests
• Backend API tests
Global monitoring locations
External website monitoring at innogy
Splunk Synthetic Monitoring → O11y
innogy · 20 January 2025
1. Website monitoring in general: what it is about and what it can offer …
2. We take care of reliable website operation …
3. A few monitoring numbers …
4. Why did we choose Splunk SM?
5. Implementation tidbits …
6. Migrating Splunk SM → O11y
We take care of reliable website operation …
There are 11 of us, and we look after websites, mobile apps and integrations.
We try to minimize the impact of outages or slow responses of our web applications caused by, for example, deployment of new features, infrastructure changes (servers, security policies, …), editorial changes, etc.
Websites
About 23 websites on innogy.cz and its subdomains (homepage, B2B/B2C portals, e-mobility, CNG, …)
+ test websites + about 19 domains with a redirect configured
Mobile apps
iOS + Android, each with its own specific way of integrating with the SAP systems
Integrations
SAP systems + third-party services (Mluvii, Kadlec, MPSV, …)
A few monitoring numbers …
54 uptime checks (front-end monitoring)
• 32 priority websites (every minute)
• 3 websites with a longer interval (every 5/15 minutes)
• 19 redirected domains (every 24 hours)
20 API checks (back-end monitoring)
• 14 services on the SAP systems (CRM/ISU, portal/app) (every 10 minutes)
• 6 third-party services (Kadlec, Mluvii, OTE, …)
7 browser checks (user journey scenarios)
• e.g. user login/logout, product calculator, ...
In 2024, about 1020 error notifications (there were more errors; a notification is usually not sent right after the first error)
• slow responses (>3 s)
• unavailability
• planned outages (releases, maintenance, ...)
Every week we review the response times of the websites/BE services and address any deviations from normal.
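The remark that a notification is usually not sent right after the first error can be sketched as a consecutive-failure rule. This is only an illustration: the threshold of two failures is an assumption (the slides do not state innogy's actual setting), the 3-second limit is taken from the slide, and `should_notify` is a hypothetical helper rather than a Splunk Synthetic Monitoring API.

```python
SLOW_RESPONSE_SECONDS = 3.0  # from the slide: responses longer than 3 s count as errors
FAILURES_BEFORE_NOTIFY = 2   # assumed value; the real setting is not given


def is_failure(available, response_seconds):
    """A check run fails when the site is down or responds too slowly."""
    return (not available) or response_seconds > SLOW_RESPONSE_SECONDS


def should_notify(runs, threshold=FAILURES_BEFORE_NOTIFY):
    """Yield True for the run that completes `threshold` consecutive failures."""
    streak = 0
    for available, response_seconds in runs:
        streak = streak + 1 if is_failure(available, response_seconds) else 0
        # Notify once per incident, exactly when the streak reaches the threshold.
        yield streak == threshold


# (available, response time in seconds) for six consecutive runs of one check
runs = [(True, 0.8), (True, 3.4), (True, 0.9), (False, 0.0), (True, 4.1), (True, 1.0)]
print(list(should_notify(runs)))
```

In this sample three runs fail but only one notification is raised, which is why the number of notifications is lower than the number of errors.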
Why did we choose Splunk Synthetic Monitoring?
For external monitoring we had been using New Relic
• The subscription model changed → a significant price increase
We looked for a possible replacement
• Splunk had recently acquired Rigor and started offering synthetic monitoring
• We also considered a possible integration with Splunk Enterprise
We prepared a POC of Splunk Synthetic Monitoring
→ the service suits us; integration with the “big Splunk” will not be pursued
Implementation tidbits …
Migration from New Relic
• Export/import of browser checks via Selenium scripts was not 100% reliable; fine-tuning was needed
Notifications
• The SMS notifications we originally wanted were not supported for Czech operators → we had to find another channel (email is not sufficient)
• MS Teams was chosen, integrated via webhooks
Creating browser checks
• It is often not easy to select the right element on the page or to find a way to “grab” it
• A hard-coded wait time before the next step (a problem when the BE responds slowly); in O11y this is now configurable
Migrating Splunk SM → O11y
The automatic migration itself went smoothly
• only occasionally minor fine-tuning of some checks/tests
Not everything was migrated
• response monitor, alerts (detectors): these had to be set up again manually for all tests
Open issues
• notifications arriving with a significant delay
• auto-retry not working
• no “Run now” option
• no way to set “Blackout periods”
• dashboards
Thank you!
18:05 - 18:25
OpenTelemetry And Splunk
Houssem Eddine Djlassi, Solutions Engineer - GSS, Splunk
Link to this Presentation
OpenTelemetry & Splunk
January 2025
Houssem Djlassi - Solutions Engineer
Forward-looking statements
This presentation may contain forward-looking statements regarding future events, plans or the expected financial
performance of our company, including our expectations regarding our products, technology, strategy, customers,
markets, acquisitions and investments. These statements reflect management’s current expectations, estimates and
assumptions based on the information currently available to us. These forward-looking statements are not guarantees of
future performance and involve significant risks, uncertainties and other factors that may cause our actual results,
performance or achievements to be materially different from results, performance or achievements expressed or implied
by the forward-looking statements contained in this presentation.
For additional information about factors that could cause actual results to differ materially from those described in the
forward-looking statements made in this presentation, please refer to our periodic reports and other filings with the SEC,
including the risk factors identified in our most recent quarterly reports on Form 10-Q and annual reports on Form 10-K,
copies of which may be obtained by visiting the Splunk Investor Relations website at www.investors.splunk.com or the
SEC's website at www.sec.gov. The forward-looking statements made in this presentation are made as of the time and
date of this presentation. If reviewed after the initial presentation, even if made available by us, on our website or
otherwise, it may not contain current or accurate information. We disclaim any obligation to update or revise any
forward-looking statement based on new information, future events or otherwise, except as required by applicable law.
In addition, any information about our roadmap outlines our general product direction and is subject to change at any
time without notice. It is for informational purposes only and shall not be incorporated into any contract or other
commitment. We undertake no obligation either to develop the features or functionalities described, in beta or in preview
(used interchangeably), or to include any such feature or functionality in a future release.
Splunk, Splunk> and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the United States
and other countries. All other brand names, product names or trademarks belong to their respective owners.
© 2024 Splunk Inc. All rights reserved.
What is OpenTelemetry?
OpenTelemetry is a collection of tools, APIs and software development kits (SDKs) used to instrument, generate, collect and export telemetry data (metrics, logs, traces, and more) that helps you analyze your software’s performance and behavior.
Why does it matter?
● Fragmentation in Observability Tools: Inconsistent instrumentation across different services and applications.
● Vendor Lock-In: Difficulty in switching between observability backends or using multiple tools simultaneously.
● Limited data collection capabilities: Inability to collect all three types of telemetry data (Logs, Metrics and Traces) using a single framework.
● Lack of Standardization in Telemetry Data: Inconsistent data formats and semantics across different observability solutions.
● Complex integration with Cloud-Native Architectures: Challenges in instrumenting and monitoring distributed systems, especially in cloud-native and microservices environments.
OpenTelemetry Project Components
Collector Components
Main OpenTelemetry components
An Observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs.
OpenTelemetry consists of the following major components:
❏ A specification for all components.
❏ A standard protocol that defines the shape of telemetry data.
❏ Semantic conventions that define a standard naming scheme for common telemetry data types.
❏ APIs that define how to generate telemetry data.
❏ Language SDKs that implement the specification, APIs, and export of telemetry data.
❏ A library ecosystem that implements instrumentation for common libraries and frameworks.
❏ Automatic instrumentation components that generate telemetry data without requiring code changes.
❏ The OpenTelemetry Collector, a proxy that receives, processes, and exports telemetry data.
❏ Various other tools, such as the OpenTelemetry Operator for Kubernetes, OpenTelemetry Helm Charts, and community assets for FaaS.
Collector Components: Examples
Receivers:
● OTLP: supports gRPC and HTTP protocols.
● Prometheus: scrapes metrics from Prometheus-instrumented targets.
● Filelog: tails and parses logs from files.
Processors:
● Batch: groups data into batches for more efficient processing.
● Memory Limiter: limits the amount of memory used by the collector.
● Transform: modifies data format or adds metadata.
Exporters:
● OTLP: sends telemetry via the OpenTelemetry Protocol.
● Splunk HEC: sends telemetry to Splunk HTTP Event Collector endpoints.
● Jaeger: sends trace data to Jaeger backends.
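The receiver, processor, exporter flow can be pictured with a toy pipeline. The sketch below is not Collector code (the real Collector is written in Go and configured in YAML); all function names are hypothetical and only mimic the roles of a receiver, a batch processor and an exporter.

```python
from typing import Callable, Iterable, Iterator, List


def receiver(lines: Iterable[str]) -> Iterator[dict]:
    """Toy 'filelog'-style receiver: turns raw lines into log records."""
    for line in lines:
        yield {"body": line.strip()}


def batch_processor(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Toy batch processor: groups records into batches of at most `size`."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def run_pipeline(lines, size, exporter: Callable[[List[dict]], None]) -> None:
    """Wire receiver -> processor -> exporter, one export call per batch."""
    for batch in batch_processor(receiver(lines), size):
        exporter(batch)


sent: List[List[dict]] = []  # stands in for a backend such as an HEC or OTLP endpoint
run_pipeline(["a\n", "b\n", "c\n", "d\n", "e\n"], size=2, exporter=sent.append)
print([len(batch) for batch in sent])
```

Batching means five records cost three export calls instead of five, which is the efficiency gain the Batch processor entry refers to.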
Collector Components - Host Metrics Receiver
The Host Metrics receiver generates metrics about the host system, scraped from various sources, and emits host entity events as logs. It is intended to be used when the collector is deployed as an agent.
Collector Components - K8s attributes Processor
The Kubernetes attributes processor automatically sets resource attributes on spans, metrics and logs using k8s metadata.
The processor automatically discovers* k8s resources (pods), extracts metadata from them and adds the extracted metadata to the relevant spans, metrics and logs as resource attributes.
*: The processor uses the Kubernetes API to discover all pods running in a cluster and keeps a record of their IP addresses, pod UIDs and interesting metadata.
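A minimal sketch of the idea: keep a lookup table from pod IP to pod metadata and copy that metadata onto incoming telemetry as resource attributes. The pod table below is hard-coded sample data (the real processor fills it from the Kubernetes API), the names and UIDs are made up, and `enrich` is a hypothetical function. The attribute keys follow the OpenTelemetry semantic conventions for Kubernetes.

```python
# Sample pod table keyed by pod IP; in the real processor this is discovered
# through the Kubernetes API. All values here are made up for illustration.
PODS_BY_IP = {
    "10.10.10.2": {
        "k8s.pod.name": "micro-service-a-7d9f",
        "k8s.pod.uid": "uid-a-1",
        "k8s.namespace.name": "shop",
    },
    "10.10.10.3": {
        "k8s.pod.name": "micro-service-b-5c2e",
        "k8s.pod.uid": "uid-b-1",
        "k8s.namespace.name": "shop",
    },
}


def enrich(record, source_ip, pods_by_ip=PODS_BY_IP):
    """Return a copy of `record` with pod metadata added as resource attributes."""
    metadata = pods_by_ip.get(source_ip, {})  # unknown IPs are passed through unchanged
    resource = {**record.get("resource", {}), **metadata}
    return {**record, "resource": resource}


span = {"name": "GET /checkout", "resource": {"service.name": "checkout"}}
enriched = enrich(span, "10.10.10.2")
print(enriched["resource"]["k8s.pod.name"])
```

Because the enrichment happens in the collector, the application itself does not need to know which pod it runs in.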
About instrumentation: Automatic vs Manual
OpenTelemetry provides more than just zero-code and code-based telemetry solutions. The following are also part of OpenTelemetry:
➔ Libraries can use the OpenTelemetry API as a dependency, which has no impact on applications using that library unless the OpenTelemetry SDK is imported.
➔ For each signal (traces, metrics, logs) you have several methods at your disposal to create, process, and export them.
➔ With context propagation built into the implementations, you can correlate signals regardless of where they are generated.
➔ Resources and Instrumentation Scopes allow grouping of signals by different entities, such as the host, operating system or K8s cluster.
➔ Each language-specific implementation of the API and SDK follows the requirements and expectations of the OpenTelemetry specification.
➔ Semantic Conventions provide a common naming schema that can be used for standardization across code bases and platforms.
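Context propagation is what lets signals from different services be stitched together. As a rough illustration, the sketch below builds and parses a W3C Trace Context `traceparent` header, which is among the default propagation formats in OpenTelemetry. It is a simplified stand-alone example, not the OpenTelemetry SDK: the helper names are hypothetical and validation is reduced to the basic field layout.

```python
import re
import secrets

# version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_span_id():
    return secrets.token_hex(8)  # 8 random bytes -> 16 hex characters


def make_traceparent(trace_id, span_id, sampled=True):
    """Build the header a service attaches to its outgoing requests."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, sampled) or None if malformed."""
    match = TRACEPARENT.match(header)
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


# Service A starts a trace and calls service B.
trace_id = secrets.token_hex(16)
span_a = new_span_id()
header = make_traceparent(trace_id, span_a)

# Service B continues the same trace with its own span ID.
incoming_trace, parent, sampled = parse_traceparent(header)
span_b = new_span_id()
print(incoming_trace == trace_id, parent == span_a, sampled)
```

Both services end up reporting spans under the same trace ID, which is the property that makes cross-service correlation possible.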
OTel Collector vs. Universal/Heavy Forwarder
About instrumentation: Automatic vs Manual
In order to make a system observable, it must be instrumented: that is, code from the system’s components must emit traces, metrics, and logs.
Automatic/Zero-Code:
★ Patches / attaches to a library
★ Collects data on library activity at runtime
★ Produces spans based on the specification and semantic conventions
★ May offer additional configuration / features
★ List of all auto-instrumentation: https://opentelemetry.io/registry/
Manual/Code-based:
★ Application developer writes dedicated code
★ Starts and ends spans, sets status
★ Adds attributes and events
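What code-based instrumentation asks of the developer (start and end a span, set its status, add attributes and events) can be shown with a toy tracer. This is deliberately not the OpenTelemetry API: `ToySpan` and `start_span` are hypothetical stand-ins written with the standard library only, so the example runs without any dependency.

```python
import time
from contextlib import contextmanager


class ToySpan:
    """Hypothetical stand-in for a tracing span (not the OpenTelemetry class)."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}
        self.events = []
        self.status = "UNSET"
        self.start = time.monotonic()
        self.end = None

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def add_event(self, name):
        self.events.append(name)


@contextmanager
def start_span(name, finished):
    """Start a span, end it when the block exits, and record errors as status."""
    span = ToySpan(name)
    try:
        yield span
        span.status = "OK"
    except Exception:
        span.status = "ERROR"
        raise
    finally:
        span.end = time.monotonic()
        finished.append(span)  # stands in for handing the span to an exporter


finished = []
with start_span("calculate-price", finished) as span:
    span.set_attribute("product.id", "gas-basic")  # made-up attribute
    span.add_event("tariff loaded")

print(finished[0].name, finished[0].status, finished[0].events)
```

Zero-code instrumentation produces the same kind of span without the `with` block being written by hand, at the cost of less control over which attributes and events are recorded.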
Benefits
1) Avoid vendor lock-in
Choose the tools that are right for your business
2) Data Flexibility
Collect and analyze custom metrics
3) Easy set up
Instrument only once the way that works best for you
4) Operate at scale
Scale without tool-specific considerations
5) Community
Second most popular project behind Kubernetes
Use Cases: Examples
Infrastructure:
● Perform version audits to ensure zero vulnerabilities and make sure configurations are working
● Identify configuration changes leading to performance degradation
● Check for misconfiguration of your domain name system (DNS) that makes apps inaccessible
Back End:
● Detect faulty logic or incorrect user input that leads to exceptions being thrown
● Identify improperly implemented API calls to the back end
● Uncover poorly performing code on an API, which also leads to longer response times
Front End:
● Detect faulty logic or incorrect user input that leads to errors
● Find poorly implemented code that makes your UI extremely slow despite having fast APIs
● Locate geo-specific lag requiring geo-distribution
Status and Roadmap
Spans / Distributed Traces
Generally available across ~all languages, the Collector, and other components
Infrastructure Metrics, Application Metrics, Custom Metrics
Generally available across ~all languages, the Collector, and other components
Logs
Generally available across most languages, the Collector, and other components
Profiles
New, still being designed
Observability Cloud GDI Roadmap
OpenTelemetry zero-configuration
Automatically instrument custom applications, databases, message queues, etc.
OpenTelemetry Agent Management
View every VM, container cluster, and service that you own, the OTel agent or SDK instrumenting each one (and
a list of ones that aren’t instrumented), and the health and status of each agent and SDK
Enhancements for Splunk Enterprise and Splunk Cloud Customers
Making the Collector deployable and configurable with Splunk Deployment Server, making the Collector fully
supported for all data types and all Splunk products, adding metrics and traces to Edge Processor
Splunk and OpenTelemetry: The Splunk Advantage
Built-in standard for observability
● Splunk Observability = no proprietary agents to enable data collection with OpenTelemetry.
● Splunk supports automatic trace instrumentation and configuration to make it easy to get started.
● You can customize what's included by building from the community source.
Powering end-to-end observability
● View the status, interactions, dashboards and logs from all of your infrastructure in Splunk Observability Cloud or other observability tools.
● You can use OpenTelemetry to capture traces, metrics and logs from OpenTelemetry SDKs on the same host or over the network, or from hundreds of sources, including databases, network proxies, Prometheus, Jaeger and more.
The Splunk Distribution of the OpenTelemetry Collector
The Splunk Distribution of OpenTelemetry Collector supports automatic (no code modification) trace instrumentation and comes with default configuration and out-of-the-box support for Splunk Application Performance Monitoring and Splunk Infrastructure Monitoring, making it easier than ever to get started.
[Figure: Splunk Observability, Open Standards Data Collection]
Setup & Configuration
Collector deployment modes
No collector: this pattern consists of applications instrumented with an OpenTelemetry SDK that export telemetry signals (traces, metrics, logs) directly into a backend.
Agent: the agent collector deployment pattern consists of applications instrumented with an OpenTelemetry SDK using OpenTelemetry protocol (OTLP), or other collectors (using the OTLP exporter), that send telemetry signals to a collector instance running with the application or on the same host as the application (such as a sidecar or a daemonset).
Gateway: the gateway collector deployment pattern consists of applications (or other collectors) sending telemetry signals to a single OTLP endpoint provided by one or more collector instances running as a standalone service (for example, a deployment in Kubernetes), typically per cluster, per data center or per region.
Recommended Deployment Option for K8s
For Kubernetes environments:
● Use the helm chart to rapidly deploy the collector to all nodes in the cluster.
● Utilize zero-config to automatically instrument applications.
The collector will capture metrics, traces, and logs for both applications and infra.
[Figure: Kubernetes cluster running the OTel Collector (.yaml), with Splunk Cloud and Observability Cloud as destinations. Labels: reads logs from; push, pull metrics, metadata; spans, metrics via OTLP; logs via HEC; logs via S2S; Edge Processor; Heavy Forwarder]
Recommended Deployment Option for Linux and Windows Hosts with UF
For Linux and Windows hosts where the UF is already deployed:
● Use the Splunk Deployment Server with the Splunk Add-On for OpenTelemetry Collector to deploy and manage the collector.
UF will continue to capture logs, while the collector will capture metrics and traces for both applications and infra.
[Figure: host running the UF (.conf, TAs) and the OTel Collector (.yaml), both of which the Splunk Deployment Server can manage, with Splunk Cloud and Observability Cloud as destinations. Labels: reads logs from; push, pull metrics, metadata; spans, metrics via OTLP; logs via HEC; Edge Processor; Heavy Forwarder]
Alternate Deployment Option for
Linux and Windows Hosts
For customers with Linux and Windows
hosts that want to use OpenTelemetry to
collect 100% of their data:
● Deploy the Splunk Distribution of the
OpenTelemetry Collector using a
configuration management tool of
choice (Ansible, Puppet, etc.).
● Utilize zero-config to automatically
instrument applications.
The collector will capture metrics, traces,
and logs for both applications and infra.
[Diagram: Host running the OTel Collector (.yaml configuration). Labeled flows: reads logs from; spans, metrics via OTLP; push, pull metrics, metadata; logs via HEC. Also shown: Splunk Cloud, Observability Cloud, Edge Processor, Heavy Forwarder.]
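The data flows named on this slide (spans and metrics via OTLP, logs via HEC) could map onto collector pipelines roughly as sketched below. This is a schematic, not the Splunk distribution's default configuration: the component types (`otlp`, `hostmetrics`, `filelog`, `memory_limiter`, `batch`, `otlphttp`, `splunk_hec`) are real collector components, but the endpoints, token reference, log path and instance names such as `otlphttp/o11y` are placeholders invented for illustration. The config is built as a Python dict and printed as JSON, which a YAML parser also accepts. The `undeclared_components` helper is likewise hypothetical; it only checks that every pipeline references a declared component.

```python
import json

# Schematic collector configuration. All endpoints, tokens and paths
# are placeholders (assumptions), not values from the presentation.
CONFIG = {
    "receivers": {
        "otlp": {"protocols": {"grpc": {}, "http": {}}},
        "hostmetrics": {
            "collection_interval": "10s",
            "scrapers": {"cpu": {}, "memory": {}},
        },
        "filelog": {"include": ["/var/log/myapp/*.log"]},
    },
    "processors": {
        "memory_limiter": {"check_interval": "2s", "limit_mib": 512},
        "batch": {},
    },
    "exporters": {
        "otlphttp/o11y": {"endpoint": "https://otlp.example.com"},
        "splunk_hec/platform": {
            "endpoint": "https://hec.example.com:8088/services/collector",
            "token": "${env:SPLUNK_HEC_TOKEN}",
        },
    },
    "service": {
        "pipelines": {
            # Spans and metrics leave via OTLP.
            "traces": {
                "receivers": ["otlp"],
                "processors": ["memory_limiter", "batch"],
                "exporters": ["otlphttp/o11y"],
            },
            "metrics": {
                "receivers": ["otlp", "hostmetrics"],
                "processors": ["memory_limiter", "batch"],
                "exporters": ["otlphttp/o11y"],
            },
            # Logs leave via HEC.
            "logs": {
                "receivers": ["otlp", "filelog"],
                "processors": ["memory_limiter", "batch"],
                "exporters": ["splunk_hec/platform"],
            },
        }
    },
}


def undeclared_components(config: dict) -> list:
    """List pipeline references that have no matching top-level declaration."""
    missing = []
    for name, pipeline in config["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            for component in pipeline.get(kind, []):
                if component not in config.get(kind, {}):
                    missing.append(f"{name}: {kind}/{component}")
    return missing


if __name__ == "__main__":
    assert not undeclared_components(CONFIG)
    print(json.dumps(CONFIG, indent=2))
```

A tool such as OTelBin, listed under "What's next", can visualize a configuration of this shape.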
What’s next?
OpenTelemetry official documentation: https://opentelemetry.io/docs/
Accelerating an implementation of OpenTelemetry in Splunk Observability Cloud
https://lantern.splunk.com/Observability/Getting_Started/Accelerating_an_implementation_of_OpenTelemetry_in_Splunk_Observability_Cloud
Using OpenTelemetry to get data into Splunk Cloud Platform
https://lantern.splunk.com/Splunk_Platform/Product_Tips/Data_Management/Using_OpenTelemetry_to_get_data_into_Splunk_Cloud_Platform
For a better understanding of the OpenTelemetry configuration file: https://www.otelbin.io/
Thank You!
Wrap-Up
● That’s it :-)
● Please fill in the post-event survey. Check your mailboxes!
● Slides will be shared on the SUG #5 event page and on SlideShare, as usual.
● We forgot to record Ian’s O11y session. We are very sorry :(
● Talk to us!
PSUG @ Slack
Register and subscribe to #prague-sug Slack channel
Talk to us!
Tomáš Moser tmoser@splunk.com, tommoser@cisco.com
Ingrid Nemečková inemeckova@splunk.com, inemecko@cisco.com
Michal Skorczewski mskorczewski@splunk.com, mskorcze@cisco.com
Radek Filip radek.filip@alef.com
PSUG @ LinkedIn
https://www.linkedin.com/groups/9544692/
See you soon!

PSUG 5 - 2025-01-20 - Splunk Observability And Digital Resilience

  • 1. © 2023 SPLUNK INC. Prague Splunk User Group #5 20/1/2025 Splunk Observability and Business Resilience Innogy | ALEF | Splunk
  • 2. © 2023 SPLUNK INC. 15:30 - 15:50 (20 min) Check-in 15:50 - 16:00 (10 min) Opening 16:00 - 17:00 (60 min) Part 1: Splunk Observability Vision and Roadmap (English, remote) Ian Wells, Observability Advisory Director, Splunk 17:00 - 17:20 (20 min) Coffee Break 1 17:20 - 18:05 (45 min) Part 2: Splunk Synthetic Monitoring, innogy ČR (Czech) Lukáš Gottesman, innogy, Radek Filip/Jakub Tamchyna, ALEF 18:05 - 18:25 (20 min) Coffee Break 2 18:25 - 18:50 (25 min) Part 3: Splunk OpenTelemetry (English) Houssem Eddine Djlassy, Splunk 18:50 - 19:00 (10 min) Wrap-up 19:00 - 22:00 (3 hours) Dinner - Pivo Karlín (same building) Agenda
  • 3. © 2023 SPLUNK INC. Splunk User Group Community From Splunkers To Splunkers ✓ No sales ✓ No marketing ✓ It’s about YOU! ✓ Ask!
  • 4. © 2023 SPLUNK INC. Prague SUG Team Tomáš Moser Sr. Solutions Engineer - GSS, Splunk tmoser@splunk.com Technical Support Engineer, Splunk inemeckova@splunk.com Ingrid Nemečková Splunk Consultant, ALEF NULA radek.filip@alef.com Radek Filip Michał Skórczewski Sr. Solutions Engineer, Splunk mskorczewsky@splunk.com
  • 5. © 2023 SPLUNK INC. Splunk Observability Vision & Roadmap Director - Splunk Advisory Observability Ian Wells “Observability is a way to investigate unknown unknowns By instrumenting everything”
  • 6. © 2024 SPLUNK INC. Splunk Observability “O11y” Vision & Roadmap Ian Wells Director - Observability Advisory 2025
  • 7. © 2024 SPLUNK INC. What is Observability? Depends on who you ask, but . . . “Metrics, Traces, and Logs” “The ability to infer the state of a system by examining its output” “A great new thing to spend money on” The real definition: Observability is a way to investigate unknown unknowns by instrumenting everything
  • 8. © 2024 SPLUNK INC. WHAT’S HAPPENING? Observability The Three Pillars METRICS Detect WHERE IS IT HAPPENING? TRACES Troubleshoot WHY IS IT HAPPENING? EVENTS / LOGS Pinpoint
  • 9. © 2024 SPLUNK INC. Cloud Transformation drives O11y To increase velocity, agility and responsiveness Retain & Optimize Lift & Shift Re-Factor Re-Architect / Cloud-Native DEV OPS DEV OPS DEV OPS DEV OPS Cloud Managed e.g. RDS, DynamoDB, SaaS Cloud First Architecture Tightly Coupled Apps, Slow Deployment Cycles Primarily using Cloud IaaS More Modular, but Dependent App Components Loosely Coupled Microservices, and Serverless Functions VM VM VM VM VM VM VM VM VM Private Public VM VM VM VM VM VM Private Public Private Public
  • 10. © 2020 SPLUNK INC. Cluster Control plane Containers Pods Micro-Service A 10.10.9.2 10.10.10.2 10.10.10.3 10.10.10.4 Micro-Service B 10.10.9.1 10.10.10.1 Node Master Cloud Native monitoring is a challenge Huge volume of high cardinality (very unique) metrics Metrics Tsunami Why monitor every 5mn something that can be deployed, killed and redeployed in seconds Ephemerality
  • 11. © 2020 SPLUNK INC. 11:05:00 AM OK 11:06:00 AM OK 11:05:13 11:05:45 11:05:26 Application inside the container crashed & will be restarted by K8s Not “observable” long minute MONITORING INFRASTRUCTURE USERS 1 mn Why monitor every 5mn something that can be deployed, killed and redeployed in seconds Ephemerality
  • 12. © 2020 SPLUNK INC. Monolith Monolith Data Single process Q A A Q Q Airport kiosk Mobile app Online booking CHECKOUT MILES FLIGHT INFO BAGAGE </> Java </> Node JS </> .Net Boarding pass </> Golang A Q Q Q Q A A A Microservices Monitoring Applications in the new world is a Challenge
  • 13. © 2024 SPLUNK INC. Comprehensive Data Collection: Implement OpenTelemetry instrumentation across your application stack to collect distributed traces, metrics, and logs. 2 Distributed Tracing: Utilize OpenTelemetry's distributed tracing capabilities to trace end-to-end journeys of financial transactions, allowing you to identify bottlenecks and monitor the performance of critical services. 5 Data & Event Correlation: Correlate logs, metrics and traces to provide a holistic view of system behavior. 1 Full Fidelity Real-time Monitoring: Engineers should be able to access real-time dashboards, alerts, and notifications to respond to incidents promptly. No data sampling should be required to achieve scale. 3 Automation and Self-Healing: Implement automation for common operational tasks and self-healing mechanisms that can address issues without manual intervention. Move to O11y as code for adoption & implementation. 6 Centralised Data Platform: All data to be captured within a single common data platform which can ideally extend beyond the O11y use case. Able to handle structured and unstructured data types in real time at high cardinality and be accessible by multiple persona types. Will encourage collaboration between development, operations, and security teams plus allow sharing of observability insights and data to facilitate joint troubleshooting and problem resolution. 4 Core Principles of Observability
  • 14. © 2024 SPLUNK INC. | Splunk Confidential and Internal - Do Not Distribute Why do you need Observability?
  • 15. © 2024 SPLUNK INC. What problems do we need to solve? Quite a lot actually …. Security is a Data Problem Observability is a Data Problem Resilience is a Data Problem Infrastructure is changing Applications are changing We work differently - Agile vs Waterfall “You Build it you Run it” Expectations are high - Digital Experience Siloed Teams Siloed Tools Siloed Data It’s a Data Problem Environment People & Process
  • 16. © 2024 SPLUNK INC. Where does the journey begin …..
  • 17. © 2024 SPLUNK INC. Solving the problem is hard ….. Dashboards are Green Nobody will admit to a problem But the Main Service is Down …..
  • 18. © 2024 SPLUNK INC. You have something like this….. Security & IT Logs, Infra Data, App Data Security Monitoring Agent Agent Agent Agent Agent Agent Agent Agent Agent
  • 19. © 2024 SPLUNK INC. You could have this ….. Security & IT Logs Infrastructure Data (Metrics) Application Data (Traces) Observability Cloud Open Telemetry Splunk Security Open Telemetry Splunk Observability
  • 20. © 2024 SPLUNK INC. Observability Cloud Is my Infrastucture performing as I expect it to? Servers, VMs, OpenShift, K8S, Cloud, Databases etc… Is my Application performing as I expect it to? How is my service interacting with others? Is my Application / Site / Service working as I expect it to? Is it down or slow? What are the users doing with my application? “I am an SRE” “I want to know….” ● Real Time ● What went wrong? ● How do I fix it? ● How do I improve it? Infrastructure Application Log Analytics What does this all do?
  • 21. © 2024 SPLUNK INC. © 2023 SPLUNK INC. “The Engineer” ● Real Time ● What went wrong? ● How do I fix it? Who are you? How do you want to consume your information? Infrastructure Application User Experience Infrastructure Application Log Analytics “The Service Owner” ● Full Visibility ● How is the Service Running? ● Is there any action we need to take? Log Analytics Security Insights
  • 22. © 2024 SPLUNK INC. You now have this…
  • 23. © 2024 SPLUNK INC. | Splunk Confidential and Internal - Do Not Distribute Why Splunk Observability for Splunk Platform users ?
  • 24. © 2024 SPLUNK INC. | Splunk Confidential and Internal - Do Not Combining Platform and Observability help address 3 pain points Explosion of telemetry and tool sprawl have resulted in demand for tool consolidation and better data management Growing data volume and costs Increased complexity of environments Fragmented admin experience Most organizations today are hybrid cloud and deploy and manage multiple environments Explosion of solutions makes user and data management challenging for admins
  • 25. © 2024 SPLUNK INC. | Splunk Confidential and Internal - Do Not Splunk Observability completes Splunk Platform (and vice versa) ES/ITSI Content Packs Splunk Apps Third Party APM Splunk Platform Splunk Forwarders Proprietary Agent(s) Logs Metrics Without Splunk Observability Cloud With Splunk Observability Cloud Third Party Apps Add powerful metric and trace analytics to provide out-of-the-box visibility, anomaly detection and directed troubleshooting for hybrid infrastructure and applications Splunk Platform Splunk Forwarders OpenTelemetry Collector Logs Agent Metrics Traces Content Packs On-Call ES/ITSI Third Party APM Splunk Enterprise 9.0, Splunk Cloud Platform Splunk Apps Splunk IM Splunk APM Splunk DEM
  • 26. © 2024 SPLUNK INC. | Splunk Confidential and Internal - Do Not System health monitoring ● Customizable dashboarding for better insights ● Accurate system view with full-fidelity and scalable data platform Splunk Enterprise/ Cloud Splunk Observability Cloud Unified platform for end-to-end workflows Early issue detection ● Schema-on-the-fly and SPL ● Alert action automation ● Proactive and Predictive, ML-based alerting and notifications (ITSI) Fast root-cause analysis ● Log analytics at scale ● Related Content for added context Contextualization ● Purpose-built views of infrastructure, application and end-user experience ● Real-time analytics for traces and metrics ● Visibility across any type of workloads Guided troubleshooting ● No code/intuitive interface ● Dynamic Service Map and AlwaysOn Profiling for high and granular level views ● Distributed Tracing
  • 27. © 2024 SPLUNK INC. | Splunk Confidential and Internal - Do Not Splunk Platform and Observability Architecture APM Infrastructure Monitoring Splunk Observability Cloud Splunk ITSI Splunk Cloud/Enterprise Platform Observability Related Content (APM, Infra Monitoring) in Splunk Cloud Log Observer Connect (via Service-Account or Unified Identity) Logs2Metrics (Victoria Experience) Observability Content Pack (KPIs, Alerts) Real-User Monitoring Synthetic Monitoring UF/HF OpenTelemetry as TA Public Private Hybrid Cloud Unified Identity (SSO+RBAC) Metrics & Traces Logs Observability Cloud metrics store Dashboard Studio Log Observer UI
  • 28. © 2024 SPLUNK INC. Unlock more use cases Centralize data and workflows for full visibility into digital systems Optimize monitoring costs to achieve better economies of scale Improve data and user management by alleviating admin efforts
  • 29. © 2024 SPLUNK INC. | Splunk Confidential and Internal - Do Not Distribute Why Build your Observability Practice with Splunk ?
  • 30. © 2024 SPLUNK INC. Why Build a leading observability practice with Splunk Complete business visibility across any environment and any stack Earlier detection & faster investigation of business-impacting issues Better control of your data and costs
  • 31. © 2024 SPLUNK INC. across any environment and any stack How Splunk helps you get there Complete business visibility ● Easily monitor critical business processes like sales, orders, abandonment & customer behavior ● See every user transaction, with no blind spots ● Visibility across COTS & homegrown apps & infrastructure, spanning monoliths to microservices “We have so much information at our fingertips thanks to Splunk… we’re constantly solving business problems in creative ways.” Don Mahler | Director of Performance Management | Leidos
  • 32. © 2024 SPLUNK INC. of business-impacting issues How Splunk helps you get there Earlier detection & faster investigation ● Search & analyze unstructured & structured data at petabyte-scale ● Dynamically updated visualizations, out-of-the-box ● Real-time detection and guided root cause analysis ● Event correlation for alert noise reduction ● Instrument, tag and analyze high-cardinality metrics “Splunk Observability Cloud helps us make blazing-faster decisions…our development team gains instant intelligence to support our goals of always offering customers outstanding services.” Jose Felipe Lopez | EVP | Engineering | Rappi
  • 33. © 2024 SPLUNK INC. of your data and costs How Splunk helps you get there Better control ● OpenTelemetry native - avoid vendor lock-in ● Reduce toil by instrumenting once as you build new apps ● Instrument everything flexibly, pay only for what’s needed ● Optimize telemetry volume and costs with flexible data management capabilities ● Enterprise controls enable self-service observability “We could bake OpenTelemetry into our architecture from day one because we have Splunk, who is the number-one contributor to OpenTelemetry and way ahead of the curve on this.” Splunk Observability Open Standards Data Collection Sean Schade | Principal Architect, Care.com
  • 34. © 2024 SPLUNK INC. Logs Integration Cloud-native, microservices environments Unified Experience (Common look & feel, SSO, unified AI, deep links) Traditional three-tier environments Logs Integration On-prem | SaaS Business service monitoring, event aggregation / AIOps, network monitoring IT Service Intelligence Observability Cloud Platform Private Public Integrated Full-Stack Observability A view of the combined portfolio AppDynamics
  • 35. © 2024 SPLUNK INC. Cloud Native Applications Infrastructure Monitoring Microservices APM Digital Experience Monitoring Splunk Platform Log Analytics Business Performance Monitoring APM for Three-Tier Apps Digital Experience Monitoring App Security & Risk Mgmt Three-Tier Applications SPLUNK OBSERVABILITY CLOUD SPLUNK APPDYNAMICS Service Monitoring AIOps Event Intelligence SPLUNK IT SERVICE INTELLIGENCE Private Public Integrated Full-Stack Observability A view of key capabilities & integration roadmap Network Monitoring via ThousandEyes & Meraki
  • 36. © 2024 SPLUNK INC. Resilience provides the advantage Resilience = O11y + Security ● Customers want Resilience ● Splunk is the only vendor to lead in O11y + Security
  • 37. © 2024 SPLUNK INC. The Foundation for Digital Resilience Unified Platform for Observability AND Security
  • 38. © 2024 SPLUNK INC. Add Observability Cloud + IT Service Intelligence Enable Digital Resilience
  • 39. © 2024 SPLUNK INC. Thank You! Thank you
  • 40. © 2023 SPLUNK INC. 17:00 - 17:20
  • 41. © 2023 SPLUNK INC. Splunk Observability Cloud - Case Study IT Specialist, Innogy ČR Lukáš Gottesman Sr. Solutions Architect, ALEF NULA Jakub Tamchyna Splunk DEM
  • 43. 43 Splunk Observability Real User Monitoring APM Infrastructure Monitoring Incident Response Log Analysis Synthetic Monitoring On-Prem | Hybrid Cloud | Multi-Cloud | Cloud-Native Real-Time Analytics-Powered Enterprise-Grade OpenTelemetry-Nativ e Full-Stack Splunk Observability Cloud components
  • 44. Ensuring optimal performance and user satisfaction in today's digital landscape DEM alias Digital Experience Monitoring
  • 45. 45 • Real User Monitoring - (RUM) injects an agent on each page of a website or application. The agent reports real page load data for every request that is really made for each page. • Synthetic Monitoring - generates synthetic (not data from real users or interactions) traffic data to collect data on page performance. Runs periodically on remote (often global) infrastructure. RUM vs Synthetic Main goals - while Synthetic monitoring helps diagnose and solve shorter-term performance problems, RUM offers insight into long-term trends.
  • 46. 46 Splunk Synthetic Monitoring key features Proactive end-to-end monitoring • SLA/SLO tracking • Detailed performance metrics • Business transactions (journeys) Visualization and Reporting Alerting and Notifications • Web site uptime tests • Browser tests • Backend API tests Global monitoring locations
  • 47. Externí monitoring webů v innogy Splunk Synthetic Monitoring 🡪 o11y innogy · 20. ledna 2025
  • 48. 1 Obecně k monitoringu webů O co jde a co může nabídnout … 2 Staráme se o spolehlivý běh webů … 3 Několik čísel k monitoringu … 4 Proč padla volba na Splunk SM? 5 Perličky z implementace … 6 Migrace Splunk SM 🡪 O11y innogy · Externí monitoring webů v innogy · 20.ledna 2025
  • 49. innogy · Externí monitoring webů v innogy · 20.ledna 2025 49 Je nás 11 a staráme se o weby, mobilní aplikace a integrace Snažíme minimalizovat důsledky nedostupnosti či dlouhých odezev našich webových aplikací způsobené např.: - nasazením nových funkcí, změnami na infrastruktuře (servery, security politiky, …), redaktorskými úpravami, apod. Staráme se o spolehlivý běh webů … Weby Cca 23 webů na innogy.cz a subdoménách (homepage, B2B/B2C portály, emobility, CNG, …) + Testovací weby + Cca 19 domén kde je nastaven redirect Mobilní aplikace iOS + Android – každá má specifický způsob integrace na SAPy Integrace SAPy + služby třetích stran (Mluvii, Kadlec, MPSV, …)
  • 50. innogy · Externí monitoring webů v innogy · 20.ledna 2025 50 54 Uptime checků – monitoring FE •32 prioritních webů (co 1 minutu) •3 weby s delší frekvencí (co 5/15 minut) •19 redirektovaných domén (co 24 hodin) Několik čísel k monitoringu … 20 API checků – monitoring BE •14x služby na SAPy (CRM/ISU, portál/app) (co 10 minut) •6x služby třetích stran (Kadlec, Mluvii, OTE, …) 7 Browser checků – scénář průchodu uživatele •např. přihlášení/odhlášení uživatele, kalkulačka produktů, ... Za rok 2024 cca 1020 notifikací o chybě (chyb více – většinou nechodí notifikace hned po první chybě) • delší odezvy (>3s) • nedostupnost • plánované odstávky (releasy, údržba,...) Na týdenní bázi hodnotíme odezvy webů/BE služeb a řešíme případné odchylky od normálu.
  • 51. innogy · Externí monitoring webů v innogy · 20.ledna 2025 51 Pro externí monitoring jsem využívali službu New Relic •Došlo ke změně v modelu předplatného 🡪 výrazné zdražení Hledali jsme možnou náhradu •Splunk čerstvě převzal službu Rigor a začal poskytovat syntetický monitoring •Uvažovali jsme i o možné integraci se Splunk Enterprise Připravili jsme POC na Splunk Synthetic Monitoring 🡪 služba nám vyhovuje, integrace s „velkým splunkem“ se řešit nebude Proč padla volba na Splunk Synthetic Monitoring?
  • 52. innogy · Externí monitoring webů v innogy · 20.ledna 2025 52 Migrace z New Relic •Export/Import browser checků pomocí Selenium skriptů – nebyl 100%, bylo nutné doladit Notifikace •Původně chtěné SMS nebyly u českých operátorů podporovány 🡪 nutno hledat jiný kanál (email je nedostačující) •Zvolen MS Teams a integrace přes Webhooky Vytváření browser checků •Často není jednoduché vybrat správný element na stránce či cestu jak se ho „chytit“ •Na pevno nastavený čas čekání na další krok (problém u pomalejší odezvy BE) – v o11y už je možnost volby Perličky z implementace …
  • 53. innogy · Externí monitoring webů v innogy · 20.ledna 2025 53 Vlastní automatická migrace proběhla hladce •jen výjimečně drobné doladění u některých checků/testů Nebylo zmigrováno vše •response monitor, alerty (detektory) – bylo nutné u všech testů znovu ručně nastavit Problémy k řešení • notifikace s výrazným odstupem • nefunkční auto-retry • nemožnost "Run now“ • nemožnost nastavit „Blackout periods“ • dashboardy Migrace Splunk SM 🡪 O11y
  • 55. © 2023 SPLUNK INC. 18:05 - 18:25
  • 56. © 2023 SPLUNK INC. OpenTelemetry And Splunk Solutions Engineer - GSS, Splunk Houssem Eddine Djlassi
  • 57. © 2024 SPLUNK INC. Link to this Presentation
  • 58. © 2024 SPLUNK INC. OpenTelemetry & Splunk January 2025 Houssem Djlassi - Solutions Engineer Houssem Djlassi - Solutions Engineer
  • 59. Forward- looking statements This presentation may contain forward-looking statements regarding future events, plans or the expected financial performance of our company, including our expectations regarding our products, technology, strategy, customers, markets, acquisitions and investments. These statements reflect management’s current expectations, estimates and assumptions based on the information currently available to us. These forward-looking statements are not guarantees of future performance and involve significant risks, uncertainties and other factors that may cause our actual results, performance or achievements to be materially different from results, performance or achievements expressed or implied by the forward-looking statements contained in this presentation. For additional information about factors that could cause actual results to differ materially from those described in the forward-looking statements made in this presentation, please refer to our periodic reports and other filings with the SEC, including the risk factors identified in our most recent quarterly reports on Form 10-Q and annual reports on Form 10-K, copies of which may be obtained by visiting the Splunk Investor Relations website at www.investors.splunk.com or the SEC's website at www.sec.gov. The forward-looking statements made in this presentation are made as of the time and date of this presentation. If reviewed after the initial presentation, even if made available by us, on our website or otherwise, it may not contain current or accurate information. We disclaim any obligation to update or revise any forward-looking statement based on new information, future events or otherwise, except as required by applicable law. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. 
We undertake no obligation either to develop the features or functionalities described, in beta or in preview (used interchangeably), or to include any such feature or functionality in a future release. Splunk, Splunk> and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names or trademarks belong to their respective owners. © 2024 Splunk Inc. All rights reserved. © 2024 SPLUNK INC.
  • 60. © 2024 SPLUNK INC. What is OpenTelemetry? OpenTelemetry is a collection of tools, APIs and software development kits (SDKs) used to instrument, generate, collect and export telemetry data (metrics, logs, traces, and more) that helps you analyze your software’s performance and behavior.
  • 61. © 2024 SPLUNK INC. Why it matters? Inconsistent instrumentation across different services and applications. Fragmentation in Observability Tools Difficulty in switching between observability backends or using multiple tools simultaneously. Vendor Lock-In Inability to collect all three types of telemetry data (Logs, Metrics and Traces) using a single framework. Limited Data collection capabilities Inconsistent data formats and semantics across different observability solutions. Lack of Standardization in Telemetry Data Challenges in instrumentation and monitoring distributed systems, especially in cloud-native and microservices environments. Complex integration with Cloud-Native Architectures
  • 62. © 2024 SPLUNK INC. OpenTelemetry Project Components
  • 63. © 2024 SPLUNK INC. OpenTelemetry Project Components
  • 64. © 2024 SPLUNK INC. Collector Components
  • 65. © 2024 SPLUNK INC. Main OpenTelemetry components: ❏ A specification for all components. ❏ Semantic conventions that define a standard naming scheme for common telemetry data types. ❏ Language SDKs that implement the specification, APIs, and export of telemetry data. ❏ Automatic instrumentation components that generate telemetry data without requiring code changes. ❏ Various other tools, such as the OpenTelemetry Operator for Kubernetes, OpenTelemetry Helm Charts, and community assets for FaaS ❏ A standard protocol that defines the shape of telemetry data. ❏ APIs that define how to generate telemetry data. ❏ A library ecosystem that implements instrumentation for common libraries and frameworks. ❏ The OpenTelemetry Collector, a proxy that receives, processes, and exports telemetry data. OpenTelemetry consists of the following major components: An Observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs.
  • 66. © 2024 SPLUNK INC. Exporters: ● OTLP: sends telemetry via the OpenTelemetry Protocol. ● Splunk HEC: sends telemetry to Splunk HTTP Event Collector endpoints. ● Jaeger: send traces data to Jaeger backends. Processors ● Batch: groups data into batches for more efficient processing. ● Memory Limiter: limits the amount of memory used by the collector. ● Transform: modifies data format or adds metadata. Receivers: ● OTLP: supports gRPC and HTTP protocols. ● Prometheus: scrapes metrics from Prometheus-instrumented targets. ● Filelog: tails and parses logs from files. Examples Collector Components
  • 67. © 2024 SPLUNK INC. Collector Components - Host Metric Receiver The Host Metrics receiver generates metrics about the host system scraped from various sources and host entity event as log. This is intended to be used when the collector is deployed as an agent.
  • 68. © 2024 SPLUNK INC. Collector Components - K8s attributes Processor Kubernetes attributes processor allow automatic setting of spans, metrics and logs resource attributes with k8s metadata. The processor automatically discovers* k8s resources (pods), extracts metadata from them and adds the extracted metadata to the relevant spans, metrics and logs as resource attributes. *: The processor uses the kubernetes API to discover all pods running in a cluster, keeps a record of their IP addresses, pod UIDs and interesting metadata.
  • 69. © 2024 SPLUNK INC. About instrumentation: Automatic vs Manual. OpenTelemetry provides more than just zero-code and code-based telemetry solutions. The following are also part of OpenTelemetry: ➔ Libraries can use the OpenTelemetry API as a dependency; this has no impact on applications using that library unless the OpenTelemetry SDK is imported. ➔ For each signal (traces, metrics, logs), you have several methods at your disposal to create, process, and export it. ➔ With context propagation built into the implementations, you can correlate signals regardless of where they are generated. ➔ Resources and Instrumentation Scopes allow signals to be grouped by different entities, such as the host, operating system, or K8s cluster. ➔ Each language-specific implementation of the API and SDK follows the requirements and expectations of the OpenTelemetry specification. ➔ Semantic Conventions provide a common naming schema that can be used for standardization across code bases and platforms.
  • 70. © 2024 SPLUNK INC. OTel Collector vs. Universal/Heavy Forwarder
  • 71. © 2024 SPLUNK INC. About instrumentation: Automatic vs Manual. In order to make a system observable, it must be instrumented: that is, code from the system’s components must emit traces, metrics, and logs. Automatic/Zero-Code: ★ Patches / attaches to a library ★ Collects data on library activity at runtime ★ Produces spans based on the specification and semantic conventions ★ May offer additional configuration / features ★ List of all auto-instrumentation: https://opentelemetry.io/registry/ Manual/Code-based: ★ Application developer writes dedicated code ★ Starts and ends spans, sets status ★ Adds attributes and events
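The manual steps listed on this slide (start and end a span, set status, add attributes and events) can be sketched with a small stand-in written against the Python standard library only. This is not the OpenTelemetry API: `Span`, `start_span`, `finished_spans`, and `charge_card` are hypothetical names, and a real application would use a tracer from the OpenTelemetry SDK instead. Marking the span as an error when an exception escapes is a choice made for this sketch.

```python
import time
from contextlib import contextmanager


class Span:
    """Toy span: a name, timing, attributes, events, and a status."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}
        self.events = []
        self.status = "UNSET"
        self.start_ns = time.time_ns()
        self.end_ns = None

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def add_event(self, name):
        self.events.append((time.time_ns(), name))

    def end(self):
        self.end_ns = time.time_ns()


finished_spans = []  # stand-in for an exporter


@contextmanager
def start_span(name):
    span = Span(name)                    # the span starts here
    try:
        yield span
    except Exception:
        span.status = "ERROR"            # record the failure, then re-raise
        span.add_event("exception")
        raise
    finally:
        span.end()                       # the span always ends, even on error
        finished_spans.append(span)


def charge_card(amount):
    with start_span("charge_card") as span:
        span.set_attribute("payment.amount", amount)
        if amount <= 0:
            raise ValueError("amount must be positive")
        span.add_event("card charged")
        return "ok"


if __name__ == "__main__":
    charge_card(25)
    print(finished_spans[-1].name, finished_spans[-1].attributes)
```

Zero-code instrumentation produces equivalent spans by patching libraries at runtime, so the developer writes none of the code above.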
  • 72. © 2019 SPLUNK INC. Benefits 1) Avoid vendor lock-in: Choose the tools that are right for your business 2) Data flexibility: Collect and analyze custom metrics 3) Easy setup: Instrument only once, in the way that works best for you 4) Operate at scale: Scale without tool-specific considerations 5) Community: Second most active CNCF project behind Kubernetes
  • 73. © 2024 SPLUNK INC. Use Cases: Examples. Infrastructure: ● Perform version audits to ensure zero vulnerabilities and make sure configurations are working ● Identify configuration changes leading to performance degradation ● Check for domain name system (DNS) misconfigurations that make apps inaccessible Back End: ● Detect faulty logic or incorrect user input that leads to exceptions being thrown ● Identify improperly implemented API calls to the back end ● Uncover poorly performing code on an API, which also leads to longer response times Front End: ● Detect faulty logic or incorrect user input that leads to errors ● Find poorly implemented code that makes your UI extremely slow despite having fast APIs ● Locate geo-specific lag requiring geo-distribution
  • 74. © 2024 SPLUNK INC. Status and Roadmap ● Spans / Distributed Traces: generally available across ~all languages, the Collector, and other components ● Infrastructure Metrics, Application Metrics, Custom Metrics: generally available across ~all languages, the Collector, and other components ● Logs: generally available across most languages, the Collector, and other components ● Profiles: new, still being designed
  • 75. © 2024 SPLUNK INC. Observability Cloud GDI Roadmap ● OpenTelemetry zero-configuration: automatically instrument custom applications, databases, message queues, etc. ● OpenTelemetry Agent Management: view every VM, container cluster, and service that you own, the OTel agent or SDK instrumenting each one (and a list of ones that aren’t instrumented), and the health and status of each agent and SDK. ● Enhancements for Splunk Enterprise and Splunk Cloud customers: making the Collector deployable and configurable with Splunk Deployment Server, making the Collector fully supported for all data types and all Splunk products, and adding metrics and traces to Edge Processor.
  • 76. © 2024 SPLUNK INC. Splunk and OpenTelemetry ● Splunk Observability uses OpenTelemetry for data collection, so no proprietary agents are required. ● Splunk supports automatic trace instrumentation and configuration to make it easy to get started. ● You can customize what's included by building from the community source. ● View the status, interactions, dashboards, and logs from all of your infrastructure in Splunk Observability Cloud or other observability tools. ● You can use OpenTelemetry to capture traces, metrics, and logs from OpenTelemetry SDKs on the same host or over the network, or from hundreds of sources, including databases, network proxies, Prometheus, Jaeger, and more. Built-in standard for observability Powering end-to-end observability The Splunk Advantage
  • 77. © 2024 SPLUNK INC. The Splunk Distribution of the OpenTelemetry Collector. The Splunk Distribution of the OpenTelemetry Collector supports automatic (no code modification) trace instrumentation and comes with a default configuration and out-of-the-box support for Splunk Application Performance Monitoring and Splunk Infrastructure Monitoring, making it easier than ever to get started. Splunk Observability Open Standards Data Collection
  • 78. © 2024 SPLUNK INC. Setup & Configuration
  • 79. © 2024 SPLUNK INC. Collector deployment modes. No collector: this pattern consists of applications instrumented with an OpenTelemetry SDK that export telemetry signals (traces, metrics, logs) directly into a backend. Agent: the agent collector deployment pattern consists of applications instrumented with an OpenTelemetry SDK using the OpenTelemetry protocol (OTLP), or other collectors (using the OTLP exporter), that send telemetry signals to a collector instance running with the application or on the same host as the application (such as a sidecar or a daemonset). Gateway: the gateway collector deployment pattern consists of applications (or other collectors) sending telemetry signals to a single OTLP endpoint provided by one or more collector instances running as a standalone service (for example, a deployment in Kubernetes), typically per cluster, per data center, or per region.
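From the application's point of view, the three deployment patterns differ mainly in where the SDK sends its OTLP data. The sketch below expresses that as a choice of endpoint. The environment variable names `OTEL_SERVICE_NAME` and `OTEL_EXPORTER_OTLP_ENDPOINT` and the default OTLP/gRPC port 4317 are standard OpenTelemetry settings; the functions `otlp_endpoint` and `sdk_environment` and all host names are hypothetical.

```python
OTLP_GRPC_PORT = 4317  # default port for OTLP over gRPC


def otlp_endpoint(mode,
                  backend="https://ingest.example.com:443",          # hypothetical backend
                  gateway_host="otel-gateway.example.internal"):     # hypothetical gateway
    """Return the endpoint an SDK would target under each deployment pattern."""
    if mode == "no_collector":
        return backend                                   # SDK exports straight to the backend
    if mode == "agent":
        # Collector on the same host or as a sidecar; a daemonset would
        # typically be reached through the node's address instead.
        return f"http://localhost:{OTLP_GRPC_PORT}"
    if mode == "gateway":
        return f"http://{gateway_host}:{OTLP_GRPC_PORT}"  # shared standalone service
    raise ValueError(f"unknown deployment mode: {mode!r}")


def sdk_environment(mode, service_name):
    """Environment variables an instrumented application could be started with."""
    return {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_EXPORTER_OTLP_ENDPOINT": otlp_endpoint(mode),
    }


if __name__ == "__main__":
    for mode in ("no_collector", "agent", "gateway"):
        print(mode, sdk_environment(mode, "checkout"))
```

The agent and gateway patterns can be combined: agents on each host forward to a gateway, which then exports to the backend.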
  • 80. © 2024 SPLUNK INC. Recommended Deployment Option for K8s. For Kubernetes environments: ● Use the Helm chart to rapidly deploy the collector to all nodes in the cluster. ● Utilize zero-config to automatically instrument applications. The collector will capture metrics, traces, and logs for both applications and infra. [Diagram labels: Kubernetes cluster; OTel Coll. (.yaml); Splunk Cloud; Observability Cloud; Edge Processor; Heavy Forwarder. Flows: Reads logs from; Push, pull metrics, metadata; Spans, metrics via OTLP; Logs via HEC; Logs via S2S.]
  • 81. © 2024 SPLUNK INC. Recommended Deployment Option for Linux and Windows Hosts with UF. For Linux and Windows hosts where the UF is already deployed: ● Use the Splunk Deployment Server with the Splunk Add-On for OpenTelemetry Collector to deploy and manage the collector. The UF will continue to capture logs, while the collector will capture metrics and traces for both applications and infra. [Diagram labels: Host; UF (.conf, TAs); OTel Coll. (.yaml); Splunk Deployment Server; Splunk Cloud; Observability Cloud; Edge Processor; Heavy Forwarder. Flows: Can manage; Reads logs from; Spans, metrics via OTLP; Push, pull metrics, metadata; Logs via HEC.]
  • 82. © 2024 SPLUNK INC. Alternate Deployment Option for Linux and Windows Hosts. For customers with Linux and Windows hosts that want to use OpenTelemetry to collect 100% of their data: ● Deploy the Splunk Distribution of the OpenTelemetry Collector using a configuration management tool of choice (Ansible, Puppet, etc.). ● Utilize zero-config to automatically instrument applications. The collector will capture metrics, traces, and logs for both applications and infra. [Diagram labels: Host; OTel Coll. (.yaml); Splunk Cloud; Observability Cloud; Edge Processor; Heavy Forwarder. Flows: Reads logs from; Spans, metrics via OTLP; Push, pull metrics, metadata; Logs via HEC.]
  • 83. © 2024 SPLUNK INC. What’s next? ● OpenTelemetry official documentation: https://opentelemetry.io/docs/ ● Accelerating an implementation of OpenTelemetry in Splunk Observability Cloud: https://lantern.splunk.com/Observability/Getting_Started/Accelerating_an_implementation_of_OpenTelemetry_in_Splunk_Observability_Cloud ● Using OpenTelemetry to get data into Splunk Cloud Platform: https://lantern.splunk.com/Splunk_Platform/Product_Tips/Data_Management/Using_OpenTelemetry_to_get_data_into_Splunk_Cloud_Platform ● For a better understanding of the OpenTelemetry configuration file: https://www.otelbin.io/
  • 84. © 2024 SPLUNK INC. Thank You!
  • 85. © 2023 SPLUNK INC. Wrap-Up ● That’s it :-) ● Please fill in the post-event survey. Check your mailboxes! ● Slides will be shared on the SUG #5 event page and SlideShare as usual. ● We forgot to record Ian’s O11y session. We are very sorry :( ● Talk to us! PSUG @ Slack: register and subscribe to the #prague-sug Slack channel. PSUG @ LinkedIn: https://www.linkedin.com/groups/9544692/ Contacts: Tomáš Moser (tmoser@splunk.com, tommoser@cisco.com); Ingrid Nemečková (inemeckova@splunk.com, inemecko@cisco.com); Michal Skorczewski (mskorczewski@splunk.com, mskorcze@cisco.com); Radek Filip (radek.filip@alef.com)
  • 86. © 2024 SPLUNK INC. See you soon!