Copyright © 2015 Splunk Inc.
IT Service Intelligence
Splunk Live 2016
Michael Donnelly
Staff Architect, IT Operations Analytics
ITSI Core Concepts
What is a Service?
(Diagram: a Service as a black box that receives Requests and returns Responses)
In ITSI, a Service is a logical group of technology components that a user deems should be monitored together.
It can often be generalized as a “black box” to which we send requests and from which we expect responses.
What is a Service?
(Diagram: Technical Services DNS, Auth, and Web, each receiving Requests and returning Responses)
Services can be lower level (technical) …
What is a Service?
(Diagram: Technical Services DNS, Auth, and Web alongside Business Services Customer Transactions and Support Desk, each receiving Requests and returning Responses)
Services can also be higher level (business) …
What is a Service?
(Diagram: service tiers including Packet Network, Hypervisor and Hosts, RDBMSs, Storage Tier, API Services, Web Services, Customer Transactions, Mobile, API/Middleware, Partner Portal, and DNS)
Services can encompass multiple tiers of the IT domain.
Services may also depend upon other services.
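Because services may depend on other services, a failure low in the stack can ripple upward. The sketch below is a minimal Python illustration of that idea, not ITSI's data model: the function `impacted_by` and every dependency edge in `DEPENDS_ON` are hypothetical, since the diagram does not spell out which tier depends on which. It inverts the dependency graph and walks it breadth-first to list every service a failing tier would affect.

```python
from collections import deque


def impacted_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every service that depends on `failed`, directly or transitively."""
    # Invert the edges so we can walk from a failing tier up to its consumers.
    dependents: dict[str, list[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first search over the inverted graph; `impacted` also guards against cycles.
    impacted: set[str] = set()
    queue = deque([failed])
    while queue:
        for parent in dependents.get(queue.popleft(), []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted


# Hypothetical edges, loosely based on the tiers named in the diagram.
DEPENDS_ON = {
    "Customer Transactions": ["Web Services", "API Services"],
    "Partner Portal": ["API Services"],
    "Mobile": ["API Services"],
    "Web Services": ["API/Middleware", "DNS"],
    "API Services": ["API/Middleware"],
    "API/Middleware": ["RDBMSs"],
    "RDBMSs": ["Storage Tier", "Hypervisor and Hosts"],
}

print(sorted(impacted_by("DNS", DEPENDS_ON)))  # ['Customer Transactions', 'Web Services']
```

With these assumed edges, a DNS problem touches only the web path, while an RDBMS problem would reach every customer-facing service above it.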
What is a KPI?
(Diagram: the DNS and Customer Transactions services, each receiving Requests and returning Responses, annotated with their KPIs)
DNS KPIs:
● Number of requests
● Error rate
● Average response time
● Server CPU load
● Server network I/F errors
Customer Transactions KPIs:
● Number of transactions
● Error rate
● Average response time
● Count of incident tickets
● Synthetic transaction health
KPIs and health scores constitute the means by which services are monitored.
Key Performance Indicators (KPIs)
A Key Performance Indicator (KPI) is a Splunk saved search, created within the ITSI UI, that helps monitor a specific field such as CPU, memory, number of errors, and so on. KPIs are contained within services.
Service Health Scores
A health score is a score from 0 to 100 (0 being critical and 100 being normal) that helps determine the health of a service. It is calculated once every minute from the importance and the current status (e.g. green, orange, red) of each of the service's KPIs.
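To make "importance plus status" concrete, here is a minimal sketch of an importance-weighted average. It is an assumption-laden illustration, not ITSI's actual health score formula: the status-to-score mapping `STATUS_SCORE` and the names `KpiState` and `health_score` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping from KPI status color to a 0-100 score.
# ITSI's real severity values and weighting rules may differ; this is only an illustration.
STATUS_SCORE = {"green": 100.0, "yellow": 75.0, "orange": 50.0, "red": 0.0}


@dataclass
class KpiState:
    name: str
    importance: int  # weight: a higher value makes the KPI count more
    status: str      # one of the keys of STATUS_SCORE


def health_score(kpis: list[KpiState]) -> float:
    """Importance-weighted average of KPI status scores, between 0 and 100."""
    weighted = [(k.importance, STATUS_SCORE[k.status]) for k in kpis if k.importance > 0]
    total_weight = sum(w for w, _ in weighted)
    if total_weight == 0:
        return 100.0  # assumption: with no weighted KPIs, nothing indicates trouble
    return sum(w * s for w, s in weighted) / total_weight


kpis = [
    KpiState("Error rate", 8, "red"),
    KpiState("Average response time", 5, "green"),
    KpiState("Number of requests", 3, "green"),
]
print(health_score(kpis))  # 50.0
```

One heavily weighted red KPI drags the score to 50 even though two of the three KPIs are green, which is the intuition behind weighting by importance.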
ITSI Tour
Service Decomposition
1 - What is a high-value business service?
(“Online Store” in Buttercup Games)
Service Decomposition
1 - What is a high-value business service? (Online Store)
2 - Process flow and underlying sub-services?
(Web -> Middleware -> DB -> Middleware -> Web)
Service Decomposition
1 - What is a high-value business service? (Online Store)
2 - Process flow and underlying sub-services? (Web -> Middleware …)
3 - For each (sub)service: KPIs to show health & status?
(Database: errors, SQL hits, response time, …)
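Steps 1 to 3 produce a small tree: a business service, its sub-services, and each sub-service's KPIs. A minimal sketch of that tree as plain Python data follows; the `Kpi` and `Service` classes are hypothetical and are not ITSI's API. Web and Middleware are given no KPIs here only because the slides do not list any for them.

```python
from dataclasses import dataclass, field


@dataclass
class Kpi:
    name: str
    search: str = ""  # the Splunk search behind the KPI, filled in at step 4


@dataclass
class Service:
    name: str
    kpis: list[Kpi] = field(default_factory=list)
    subservices: list["Service"] = field(default_factory=list)

    def all_kpis(self) -> list[tuple[str, str]]:
        """(service, KPI) pairs for this service and all of its sub-services."""
        pairs = [(self.name, kpi.name) for kpi in self.kpis]
        for sub in self.subservices:
            pairs.extend(sub.all_kpis())
        return pairs


# Step 1: the business service; step 2: its sub-services; step 3: KPIs per sub-service.
db = Service("DB", kpis=[Kpi("Errors"), Kpi("SQL hits"), Kpi("Response time")])
online_store = Service(
    "Online Store",
    subservices=[Service("Web"), Service("Middleware"), db],
)
print(online_store.all_kpis())
# [('DB', 'Errors'), ('DB', 'SQL hits'), ('DB', 'Response time')]
```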
Service Decomposition
1 - What is a high-value business service? (Online Store)
2 - Process flow & underlying sub-services? (Web -> Middleware …)
3 - For each (sub)service: KPIs? (Database: errors, SQL hits, …)
4 - For each KPI: Need a Splunk search
(index=DB (warn* OR error*) | stats count)
Service Decomposition (Refresher)
1 - What is a high-value business service? (Online Store)
2 - Process flow & underlying sub-services? (Web -> Middleware …)
3 - For each (sub)service: KPIs? (Database: errors, SQL hits, …)
4 - For each KPI: Need a Splunk search (index=DB (warn* OR error*) | stats count)
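To show what the step-4 search computes, here is a rough Python emulation over in-memory events. It is only a sketch under stated assumptions: the event shape (`index` and `raw` keys) and the function name are hypothetical, and the regular expression merely approximates Splunk's wildcard terms `warn*` and `error*` as case-insensitive word prefixes. Splunk's real tokenization is more nuanced.

```python
import re

# Approximates the terms warn* and error*: a word that starts with "warn" or "error",
# matched case-insensitively. Splunk's real segmentation rules are more nuanced.
WARN_OR_ERROR = re.compile(r"\b(?:warn|error)", re.IGNORECASE)


def count_warn_or_error(events: list[dict[str, str]], index: str = "DB") -> int:
    """Emulate `index=DB (warn* OR error*) | stats count` over in-memory events."""
    return sum(
        1
        for event in events
        if event["index"] == index and WARN_OR_ERROR.search(event["raw"])
    )


events = [
    {"index": "DB", "raw": "WARNING: slow query on orders"},
    {"index": "DB", "raw": "query ok"},
    {"index": "DB", "raw": "ERROR 1205: lock wait timeout"},
    {"index": "web", "raw": "error 500 on /cart"},
]
print(count_warn_or_error(events))  # 2
```

The count of 2 becomes the KPI value for that run; scheduling the search turns it into a time series.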
Service Decomp: The Business Processes
Service Decomp: End-To-End Process Flow
New Requirements!
● Create a new KPI for the DB Service: Network Utilization
● Modify the Executive Glass Table to show off the services you slave over
“We only have about 15 min to do WHAT???”
How long would this take you today?
Configuration of DB Service
Click Configure > Services
Let’s Talk Entities
● Select the DB Service
● Entities are the relevant things that support this service (usually hosts)
● Select the right entities using filters combined with ANDs and ORs
● The original entity list can come from a CMDB, a spreadsheet, a Splunk search, or other sources
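Selecting entities with ANDs and ORs amounts to a boolean filter over entity attributes. The sketch below is a hypothetical illustration, not ITSI's exact entity-rule semantics: it assumes entities are dicts of string attributes and expresses a filter as a list of OR'd groups, each group holding AND'd `field=value` conditions. The attribute names (`role`, `env`) and hosts are invented for the example.

```python
Entity = dict[str, str]
# A filter is a list of OR'd groups; every field=value condition inside a group is AND'd.
EntityFilter = list[dict[str, str]]


def matches(entity: Entity, entity_filter: EntityFilter) -> bool:
    """True if the entity satisfies at least one group of conditions."""
    return any(
        all(entity.get(key) == value for key, value in group.items())
        for group in entity_filter
    )


def select_entities(entities: list[Entity], entity_filter: EntityFilter) -> list[str]:
    """Hosts of the entities that pass the filter, in input order."""
    return [e["host"] for e in entities if matches(e, entity_filter)]


# Hypothetical entity list, e.g. imported from a CMDB or a spreadsheet.
entities = [
    {"host": "db01", "role": "database", "env": "prod"},
    {"host": "db02", "role": "database", "env": "test"},
    {"host": "web01", "role": "web", "env": "prod"},
    {"host": "dbbak01", "role": "backup", "env": "prod"},
]
# (role=database AND env=prod) OR host=dbbak01
db_filter = [{"role": "database", "env": "prod"}, {"host": "dbbak01"}]
print(select_entities(entities, db_filter))  # ['db01', 'dbbak01']
```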
A KPI in 5 minutes? Absolutely!
Click New > Generic KPI, then select the data model:
● Host Operating System
● Network
● # bytes
● Next
KPI Continued…
Splunk builds the searches for you. Oh yeah, that’s happening!
● Select Yes for the Split by and Filter options
● Select host for the Entity Lookup and Alias options
● Click Next
Almost There…
Select
● KPI Search Schedule: Every Minute
● Entity Calculation: Average
● Service/Agg Calculation: Average
● Calculation Window: Last Minute
● Next
● Unit: Bps
● Next
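The settings above describe a two-stage calculation: average the values per entity (host) within the last minute, then average across entities for the service. A minimal sketch of that arithmetic follows. It assumes a hypothetical event shape of `(timestamp_s, host, nbytes)` tuples and assumes the aggregate is taken over the per-entity values; the search ITSI actually generates is not shown here, and the unit (Bps) is not modeled.

```python
from collections import defaultdict
from statistics import mean


def kpi_values(events, now, window_s=60):
    """Per-entity and aggregate KPI values for the last `window_s` seconds.

    events: iterable of (timestamp_s, host, nbytes) tuples, a hypothetical shape.
    Returns (per_entity, aggregate); aggregate is None if no event fell in the window.
    """
    by_host = defaultdict(list)
    for ts, host, nbytes in events:
        if now - window_s <= ts < now:  # Calculation Window: Last Minute
            by_host[host].append(nbytes)  # split by entity, keyed on host
    # Entity Calculation: Average
    per_entity = {host: mean(values) for host, values in by_host.items()}
    # Service/Aggregate Calculation: Average (taken here over the entity values)
    aggregate = mean(list(per_entity.values())) if per_entity else None
    return per_entity, aggregate


events = [
    (950, "db01", 100),
    (970, "db01", 300),
    (980, "db02", 600),
    (930, "db02", 50),     # older than one minute: ignored
    (1000, "db01", 9999),  # belongs to the next window: ignored
]
per_entity, aggregate = kpi_values(events, now=1000)
print(per_entity, aggregate)
```

Running this on a schedule of every minute yields one per-entity value per host and one aggregate value per run.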
Final Steps …
Set your thresholds:
● Aggregate (All)
● Per Entity
● Click “Add Threshold” TWICE
● Make the Neapolitan ice cream colors: Yellow, Green, Yellow
● Drag the sliders around to get the current data graph entirely inside the Green (normal) band
● Finish
● Other options are also available,
including adaptive thresholds and
anomaly detection
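Static thresholds split the value axis into bands, each with a color. A minimal sketch of that lookup, using Python's standard `bisect` module, is below; the function name and the 1000/5000 Bps boundaries are hypothetical values chosen to mirror the Yellow, Green, Yellow setup.

```python
import bisect


def severity(value: float, edges: list[float], bands: list[str]) -> str:
    """Map a KPI value to a threshold band.

    edges: sorted lower edges of every band after the first.
    bands: len(edges) + 1 band names, lowest values first.
    A value exactly on an edge falls into the higher band.
    """
    return bands[bisect.bisect_right(edges, value)]


# Hypothetical Bps thresholds: Yellow below 1000, Green from 1000 to 5000, Yellow above.
EDGES = [1000.0, 5000.0]
BANDS = ["yellow", "green", "yellow"]
print([severity(v, EDGES, BANDS) for v in (500, 1000, 3000, 7000)])
# ['yellow', 'green', 'green', 'yellow']
```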
Adaptive Thresholds
What if your KPI data looks like this?
Static thresholds will not work…
Adaptive thresholding works beautifully with cyclical data
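The general idea behind adaptive thresholds is to learn what "normal" looks like for each part of the cycle instead of using one fixed band. The sketch below shows that idea with a per-hour-of-day band of mean ± k standard deviations; it is a simplified stand-in, not ITSI's implementation, and the names `hourly_bounds` and `is_normal`, the input shape, and `k=2.0` are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_bounds(history, k=2.0):
    """Learn a normal band, mean +/- k standard deviations, for each hour of day.

    history: iterable of (hour_of_day, value) pairs, a hypothetical input shape.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    bounds = {}
    for hour, values in by_hour.items():
        center, spread = mean(values), pstdev(values)
        bounds[hour] = (center - k * spread, center + k * spread)
    return bounds


def is_normal(hour, value, bounds):
    low, high = bounds[hour]
    return low <= value <= high


# Cyclical traffic: quiet at 03:00, busy at 14:00.
history = [(3, 100), (3, 110), (3, 90), (14, 1000), (14, 1100), (14, 900)]
bounds = hourly_bounds(history)
print(is_normal(3, 1000, bounds), is_normal(14, 1000, bounds))  # False True
```

The same value of 1000 is alarming at 03:00 and routine at 14:00, which a single static band cannot express.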
Anomaly Detection
● Machine Learning
● Works well for data with patterns
● Requires some “training” (trial and error) to zero in on the best sensitivity
● More sophisticated capabilities coming! (multivariate, more algorithms, etc.)
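Tuning sensitivity by trial and error is easier to picture with a toy detector. The sketch below swaps in a simple seasonal-difference method for illustration; it is not one of ITSI's machine-learning algorithms. It compares each point with the point one period earlier and flags residuals larger than `threshold` times the median absolute residual. The function name, the series, and the threshold values are hypothetical.

```python
from statistics import median


def seasonal_anomalies(series: list[float], period: int, threshold: float) -> list[int]:
    """Indices whose value departs unusually from the value one period earlier.

    Each residual series[i] - series[i - period] is compared with the median absolute
    residual; lowering `threshold` makes the detector more sensitive (more flags).
    """
    residuals = [series[i] - series[i - period] for i in range(period, len(series))]
    if not residuals:
        return []
    scale = median(abs(r) for r in residuals) or 1.0  # avoid a zero scale
    return [i + period for i, r in enumerate(residuals) if abs(r) > threshold * scale]


# A repeating low/high pattern (period 2) with a spike in the last point.
series = [10, 50, 12, 51, 9, 49, 11, 90]
print(seasonal_anomalies(series, period=2, threshold=3))  # [7]
print(seasonal_anomalies(series, period=2, threshold=1))  # [4, 7]
```

At the looser setting only the spike is flagged; tightening it also flags the small dip at index 4, which is the kind of trade-off the "training" step explores.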
Let’s Fix that Glass Table
Clone the Glass Table
Return to the Saved Glass Tables page
(click Glass Tables in the upper menu bar)
Click Edit for “Buttercup Games Business Process”
• Select Clone
• Title: add your username to the front
• Permissions: Shared in App
• Clone Page
• Click your new Glass Table in the list to view it
Edit & Have Fun!
Click Edit in the upper right corner of your Glass Table
Use the “Services” panel on the left to select individual KPIs or aggregate Service Health Scores
• Choose 2 KPIs from Online Store that would be useful in the “Order Process” section
• Drag the selected widgets onto the canvas, positioning them in the gray oval
• What’s the difference between the two tools at the top left?
More Fun with the Glass Table Editor…
Use the Configurations panel on the right to edit a selected widget
• You can change the visualization type, drilldown behavior, and other settings
• You should hit Save frequently
• I wonder what Auto Layout does?
• (YIKES!) Revert All Changes might be helpful
Finishing up …
• Add a ServiceHealthScore widget for Online Store under Buttercup
• Choose a Viz Type with a sparkline graph, then resize it to make it look pretty
• Modify the Custom Drilldown action to go to the saved glass table, Buttercup Games Online Store
• Bonus points: make the label bigger and more readable
• Save
• View when done
A Troubleshooting Exercise
Let’s use ITSI to troubleshoot an outage
● Start at your Glass Table, “<UserName> Buttercup Games Business Process”
● Customer Care reports that unhappy customers are complaining of failures and long delays when trying to purchase
● The calls began coming in at around 40 minutes after the (previous) hour
● In the upper right corner of the Glass Table, change the time picker from Now to XX:40:00.0, where XX is the previous hour. For example, if it is currently 14:05, set the time picker to 13:40:00.0, then click Apply
● This is how we can “time travel” back to see conditions at the time of a particular outage. Oh yeah!
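The "previous hour, minute 40" rule is easy to get wrong near midnight. A minimal sketch with Python's standard `datetime` module is below; the function name `time_picker_value` is hypothetical. Subtracting a `timedelta` handles the rollover to the previous day.

```python
from datetime import datetime, timedelta


def time_picker_value(now: datetime, minute: int = 40) -> datetime:
    """Minute `minute` of the hour before `now`; e.g. 14:05 -> 13:40:00."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return (top_of_hour - timedelta(hours=1)).replace(minute=minute)


print(time_picker_value(datetime(2016, 5, 10, 14, 5)).strftime("%H:%M:%S"))  # 13:40:00
print(time_picker_value(datetime(2016, 5, 10, 0, 5)))  # 2016-05-09 23:40:00
```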
A Troubleshooting Exercise, cont’d
● The Online Store seems to be degraded, just as Customer Care reported.
Click on the widget under Buttercup to drill down further
A Troubleshooting Exercise, cont’d
● The Online Store Glass Table shows a much more detailed view, including the impacted customer-facing KPIs
at the far left (Revenue, etc)
● Based on this view of all the relevant
services, where do you think the root cause
lies?
● Which service should we troubleshoot first?
● Click the Health widget for that service to drill down to a Deep Dive
Deep Dive
● Deep Dive shows multiple KPIs and Health Scores in parallel “swim lanes”. The initial time span shown is 15 minutes.
● The Health Score for this DB Service is the top swim lane. Can you see when it begins to degrade from 100?
● Mousing over this point in time, can you spot the KPI with the leading fault indication? In other words, what busted first?
● To improve readability, change the Primary Time Range (lower left corner) to Presets > Last 60 minutes
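Spotting "what busted first" means finding the swim lane whose status left normal earliest. A minimal sketch of that scan follows; the function name, the lane data, and the status strings are hypothetical, and in ITSI you would do this visually in the Deep Dive rather than in code.

```python
def first_to_fail(lanes):
    """Return (kpi, timestamp) for the KPI that left 'normal' earliest, or None.

    lanes: dict mapping KPI name -> time-ordered list of (timestamp, status) pairs.
    Ties go to whichever KPI appears first in `lanes`.
    """
    first = None
    for kpi, points in lanes.items():
        for ts, status in points:
            if status != "normal":
                if first is None or ts < first[1]:
                    first = (kpi, ts)
                break  # later points in this lane cannot be earlier
    return first


lanes = {
    "DB response time": [(0, "normal"), (60, "normal"), (120, "high")],
    "DB errors": [(0, "normal"), (60, "critical"), (120, "critical")],
    "Network I/O": [(0, "normal"), (60, "normal"), (120, "normal")],
}
print(first_to_fail(lanes))  # ('DB errors', 60)
```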
Multi-KPI Alerts and Notable Events
● Click Notable Events Review
● Multiple KPIs and health scores can be combined in sophisticated ways to create Multi-KPI alerts
● When a Multi-KPI alert fires, one of the outcomes is the creation of a Notable Event
● Notable Events allow NOC personnel and others to triage and coordinate event management efforts
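One way to see how combining KPIs reduces alert noise is a "fire only when several KPIs agree" rule. The sketch below is a hypothetical illustration of that idea, not how ITSI configures Multi-KPI alerts (that is done in its UI): the severity ordering, the `NotableEvent` class, and `multi_kpi_alert` are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical severity ordering used only for this sketch.
SEVERITY_RANK = {"normal": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass
class NotableEvent:
    title: str
    kpis: list[str]


def multi_kpi_alert(statuses, watched, min_severity="high", min_count=2):
    """Return a NotableEvent when at least `min_count` watched KPIs are at or above
    `min_severity`; a single misbehaving KPI on its own does not fire."""
    floor = SEVERITY_RANK[min_severity]
    hot = [k for k in watched if SEVERITY_RANK[statuses.get(k, "normal")] >= floor]
    if len(hot) >= min_count:
        return NotableEvent(f"{len(hot)} KPIs at {min_severity} or worse", hot)
    return None


watched = ["DB errors", "DB response time", "Revenue"]
print(multi_kpi_alert({"DB errors": "critical", "Revenue": "normal"}, watched))  # None
print(multi_kpi_alert({"DB errors": "critical", "DB response time": "high"}, watched))
# NotableEvent(title='2 KPIs at high or worse', kpis=['DB errors', 'DB response time'])
```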
Service Analyzer
● Click on Service Analyzer > Default Service Analyzer
● Back where we started!
● This view shows a “no-frills” list of
services (top) and hottest KPIs
(bottom)
● Provides a quick jumping off point
into Deep Dives and the Notable
Events Review
● It is useful for NOCs and others
who need a high-level situational
view
Review
● High-value services can be decomposed and modeled in ITSI, using machine data
from the relevant systems
● Services and KPIs can be created in minutes, with sophisticated thresholding
techniques to distinguish “normal” from “not normal”
● Glass Tables allow service health and KPI metrics to be displayed in a way that
makes sense to specific groups, such as Executive Leadership, Business Service
Owners, the NOC, DevOps, and others
● Deep Dives allow KPIs to be compared side-by-side across any time range,
accelerating root cause analysis and significantly reducing MTTR
● Multi-KPI Alerts and Notable Events reduce alert noise, producing actionable
events and a means to manage them
● … and it’s fun to build!
Self-paced Hands-on!
You can have your very own 7-day free eval sandbox to test these features and more:
● http://splunk.com/ITSI Then select:
Use this guidebook to help you explore ITSI’s capabilities:
● https://splunk.box.com/ITSI-Sandbox-Guidebook
SEPT 26-29, 2016
WALT DISNEY WORLD, ORLANDO
SWAN AND DOLPHIN RESORTS
• 5000+ IT & Business Professionals
• 3 days of technical content
• 165+ sessions
• 80+ Customer Speakers
• 35+ Apps in Splunk Apps Showcase
• 75+ Technology Partners
• 1:1 networking: Ask The Experts and Security
Experts, Birds of a Feather and Chalk Talks
• NEW hands-on labs!
• Expanded show floor, Dashboards Control
Room & Clinic, and MORE!
The 7th Annual Splunk Worldwide Users’ Conference
PLUS Splunk University
• Three days: Sept 24-26, 2016
• Get Splunk Certified for FREE!
• Get CPE credits for CISSP, CAP, SSCP
• Save thousands on Splunk education!
Thank You
Michael Donnelly
Staff Architect, IT Operations Analytics

IT Service Intelligence Hands On Breakout Session


Editor's Notes

  • #2: FOR THE PRESENTER: With only 60 minutes available, time is critical in this presentation. With multiple users all attempting to access remote ITSI instances simultaneously, delays, problems and questions are likely to arise at almost ANY POINT during this presentation. You must be ready to fill “wait time” at any point; know which topics you can pivot to, even if they’re slightly out of sequence. This workshop requires the presenter to toggle deftly between slides and browser, often. This is even trickier when using “full screen” mode for the browser and slides. For this reason, I recommend using a PDF version of the slides (rather than PowerPoint), since Acrobat is simpler to operate, especially when connected to an external projector. However you choose to display the slides and browser, you should practice toggling quickly between the two. You should have a timer visible (to you only), counting down from 60 min, to help with pacing. PREP: Recruit some Splunk/ITSI technical helpers to move around the room and assist with problems and student issues. Laura Snow can assist with spinning up the VMs ahead of time, with the proper ITSI “hands-on” package installed. Two students per VM works comfortably, though more users per VM could be tolerated. RECOMMEND: configure four user accounts on each VM, in case you have more students than expected and have to “double up”. Find a way to print out the VM IP addresses and usernames, and hand them out to the students as they enter the room. Since the VMs may be spun up “last minute” on the morning of the SplunkLive, plan how you will acquire the addresses at that time, create print-outs for the students, and print them at the hotel/venue.
  • #3: FOR THE PRESENTER: Check your audience to find out how many have seen ITSI, and indeed, how many have even seen core Splunk before. The more newbies you have, the more deeply you should cover core concepts. This deck does not cover core Splunk concepts, so you may need to cover them on your own. Although these concepts are important for the students to be able to understand the later exercises, do not spend too much time in this section.
  • #11: FOR THE PRESENTER: This entire Tour section should last no more than 10 min. Describe how Glass Tables (GTs) can show KPIs and health scores to any audience/group/team. Show these GTs: Buttercup Games Business Process (executives, business service owners); On Line Transaction Service (NOC, Tier 2; “can use Visio diagrams…”); Buttercup Games Online Store (service flow, sub-services). Show the saved Deep Dive “DB Deep Dive” and BRIEFLY describe Deep Dive (DD) functionality (you will be able to go into more depth later). Show Notable Event Review and BRIEFLY describe it (you will be able to go into more depth later). Show Service Analyzer and briefly describe it. Ask if the students have questions.
  • #12: “While you continue to open those Glass Tables in separate browser tabs, let’s review ‘service decomposition’, discussed in the earlier session…” The next five slides tie the theoretical service decomposition exercise to the real world: how do you do “service decomp” in ITSI? Our chosen “high-value” service is “Online Store”. This process should be undertaken by BOTH business service people AND technical IT people, working together. ITSI has the rare ability to bridge the chasm which often exists between “Business Types” and “Technical Types”. It is critical that high-value business services (the ones which affect revenue, customer satisfaction, SLA performance, etc.) be identified by the Business Types, along with “interesting” KPIs such as revenue, and that the relevant technical services (and KPIs) be identified by the Technical Types. Because ITSI provides flexibility in how these services and KPIs are defined, it is possible to satisfy BOTH Business AND Technical Types. A miracle!
  • #13: Within our chosen high-level service (Online Store), what are the relevant sub-services, and how does the process flow?
  • #14: For a given sub-service, such as “Database”, what are some useful KPIs which would describe its health, status and performance? These KPI metrics are based on Splunk searches, so they can be almost anything. Be creative!
  • #15: For a particular KPI, what is the Splunk search to generate the KPI metrics? The example here could be used for the Database KPI, “DB errors”.
  • #16: In the “real world”, it will probably be necessary to iterate up & down these steps a few times. For example, what if a KPI requires data which is not being collected by Splunk?
  • #17: TO STUDENTS: You have this glass table on your own system. This Glass Table shows the high-level business process for Buttercup Games. Does anyone notice anything missing? (no info in Order Entry) We need better visibility into our Online Store, which is part of the Order Entry process.
  • #18: TO STUDENTS: You have this glass table on your own system. This Glass Table shows a more detailed process flow for the Online Store service. Notice the sub-services which make up our Online Store service, and how the process flows.
  • #19: Based on a recent DB outage which was caused by a saturated network interface, we’ve decided that network utilization would be a handy KPI for our Database Service. We’re also going to tweak the high-level Business Process Glass Table to provide more visibility into the Online Store service. And we’re going to do it in 15 minutes!
  • #20: FOR THE PRESENTER: Remind the students that they can refer to their own locally-downloaded slides for “click-by-click” reference for the process of adding a KPI. Then switch to your own browser and demonstrate these steps “live”. Have fun with the concept that a roomful of people can build a new KPI in only a few minutes, and that “the clock is ticking”.
  • #21: FOR THE PRESENTER: SHORT discussion of entities
  • #22: FOR THE PRESENTER: Briefly cover “data model” vs “ad hoc search”. Don’t spend a lot of time here.
  • #23: FOR THE PRESENTER: Briefly cover the concepts on this page, and point out the “Generated Search” window at the bottom and how cool it is that Splunk builds the search for you; does anyone in the audience have users who could benefit from this? QUICK TANGENT: In the typical working environment, which often has a chasm between the “Business Types” and the “Tech Types”, how long would it take to map services to actual infrastructure? “Many quarters, and possibly a year, on the conservative side, right?” To quantify that, by show of hands, has anyone here been involved in an IT Service Management / Business Management team trying to map every server to a service or business function? Did you sustain any long-term injuries? And even IF you are successful in this effort, as soon as you finish you have to start over. ITSI is remarkable because it can allow the Business teams and Technical teams to map out the important services realistically and effectively, in DAYS and WEEKS. We offer a Glass Table Workshop to facilitate such an exercise on YOUR services and YOUR data, in a single day.
  • #24: Keep moving…
  • #25: FOR THE PRESENTER: It might take a while (1-2 minutes, typically) for “waiting for data” to produce an actual graph for the students. Tell the students that it will take a couple of minutes for the data to appear, and not to click on anything in the meantime. Then skip to the Adaptive Thresholds and Anomaly Detection slides and discussion while the students wait. Afterwards, it can be helpful to gauge progress by asking for a show of hands to see how many students are still waiting. If necessary, simply show the students how to set thresholds (on your own browser), then move forward.
  • #26: FOR THE PRESENTER: Talk through-- NOT HANDS ON
  • #27: FOR THE PRESENTER: Talk through-- NOT HANDS ON
  • #28: FOR THE PRESENTER: Talk through-- NOT HANDS ON
  • #29: Talk through NOT WORK
  • #30: We’ve already discussed the high-level business process for Buttercup Games. We need better visibility into our Online Store, which is part of the Order Entry process.
  • #31: FOR THE PRESENTER: As before, switch to your own browser and demonstrate these steps “live”. Have fun with the concept of saving a copy before editing, so that you don’t muck it up.
  • #32: FOR THE PRESENTER: Have fun with this GT editor section. The GT editor is a bit twitchy, so exploit the humor and have fun with the students. GOALS (for the next 3 slides): (1) Identify 2 “interesting/useful” KPIs from the Online Store service to position in the gray “Order Entry” oval; let the students choose details and viz types. (2) Put a ServiceHealthScore widget (from Online Store) under the pony, to show overall health of the service. (3) Modify the “custom drilldown” to land on the “Buttercup Games Online Store” GT. (4) Encourage the students to use text boxes and other techniques to make the widget more readable and prettier to look at. Remind the students that “the boss’ boss” will be looking at this GT, and we want to make sure that they’ve got good visibility into “our” service (Online Store).
  • #33: FOR THE PRESENTER: If you use the “Auto Layout” gag (i.e., hinting that the students should click on this, resulting in total destruction of their GT), MAKE SURE that everyone has SAVED before doing so. This gag can be fun, especially pointing out how deceitful/evil the instructor is.
  • #34: FOR THE PRESENTER: When finished (after everyone has hit ‘Save’ and ‘View’, and is looking at their own beautiful GTs): How long did it take to create a new KPI and make major changes to a Glass Table? Pretty cool! Ask the students if this (ITSI) could be useful in their own environments. If you have more than 15 min of remaining time, speak through some actual (referenceable) customer ITSI use cases.
  • #35: FOR THE PRESENTER: This hands-on section can be very powerful for the students. This allows them to “put it all together”, driving ITSI with their own fingers. As before, switch to your own browser and demonstrate these troubleshooting steps “live”. The corresponding slides are intended as reference for the students. If pressed for time (i.e., less than about 10 min), talk through and show this process, but don’t have the students attempt to “click along” in real time.
  • #36: If pressed for time, talk through and show this process, but don’t have the students attempt to “click along” in real time.
  • #37: Note that this “drill down” has inherited the same time selection (i.e., an earlier outage); pretty cool! FOR THE PRESENTER: The major points here: During the heat of battle, when troubleshooting an outage, being able to visualize the entire service flow is extremely valuable. By being able to see the health status of all the underlying services, we can quickly choose where and how best to proceed. Potentially huge time savings: customers report major reductions in MTTR.
  • #38: FOR THE PRESENTER: This is a good “variable time” section. You can spend as little or as much time as you choose, depending on how much time you have remaining in the session. Remind the students that they will have more time to play with DD later (yes, they might be confused by this, since only a few minutes remain in the session).
  • #39: FOR THE PRESENTER: This is another good “variable time” section. You can spend as little or as much time as you choose, depending on how much time you have remaining in the session. Remind the students that they will have more time to play with this stuff later (yes, they might be confused by this, since only a few minutes remain in the session).
  • #42: Look! Students have more time to play in their own sandbox environment, after all.
  • #43: We’re headed to the East Coast! 2 inspired Keynotes – General Session and Security Keynote + Super Sessions with Splunk Leadership in Cloud, IT Ops, Security and Business Analytics! 165+ Breakout sessions addressing all areas and levels of Operational Intelligence – IT, Business Analytics, Mobile, Cloud, IoT, Security…and MORE! 30+ hours of invaluable networking time with industry thought leaders, technologists, and other Splunk Ninjas and Champions waiting to share their business wins with you! Join the 50%+ of Fortune 100 companies who attended .conf2015 to get hands on with Splunk. You’ll be surrounded by thousands of other like-minded individuals who are ready to share exciting and cutting edge use cases and best practices. You can also deep dive on all things Splunk products together with your favorite Splunkers. Head back to your company with both practical and inspired new uses for Splunk, ready to unlock the unimaginable power of your data! Arrive in Orlando a Splunk user, leave Orlando a Splunk Ninja! REGISTRATION OPENS IN MARCH 2016 – STAY TUNED FOR NEWS ON OUR BEST REGISTRATION RATES – COMING SOON!