Copyright © 2016 Splunk Inc.
Operationalizing Machine Learning
Adrish Sannyasi
Staff Solutions Architect, Healthcare
Splunk, Inc.
Dr. Tom LaGatta
Staff Data Scientist
Splunk, Inc.
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events
or the expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results
could differ materially. For important factors that may cause actual results to differ from those contained
in our forward-looking statements, please review our filings with the SEC. The forward-looking
statements made in this presentation are being made as of the time and date of its live presentation.
If reviewed after its live presentation, this presentation may not contain current or accurate information.
We do not assume any obligation to update any forward-looking statements we may make.
In addition, any information about our roadmap outlines our general product direction and is subject to
change at any time without notice. It is for informational purposes only and shall not be incorporated
into any contract or other commitment. Splunk undertakes no obligation either to develop the features
or functionality described or to include any such feature or functionality in a future release.
Why do we need ML?
[Diagram labels: Historical Data; Real-time Data; Statistical Models; DB, Hadoop/S3/NoSQL, Splunk; Machine Learning; timeline from "T – a few days" to "T + a few days"]
Why is this so challenging using traditional methods?
• DATA IS STILL IN MOTION, still in a BUSINESS PROCESS.
• Enrich real-time MACHINE DATA with structured HISTORICAL DATA
• Make decisions IN REAL TIME using ALL THE DATA
• Combine LEADING and LAGGING INDICATORS (KPIs)
[Diagram labels: Splunk; Security Operations Center; Network Operations Center; Business Operations Center]
What is ML?
ML 101: What is it?
• Machine Learning (ML) is a process for generalizing from examples
– Examples = example or “training” data
– Generalizing = building “statistical models” to capture correlations
– Process = ML is never done, you must keep validating & refitting models
• Simple ML workflow:
– Explore data
– FIT models based on data
– APPLY models in production
– Keep validating models
“All models are wrong, but some are useful.”
- George Box
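The FIT / APPLY / validate workflow above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the CPU and latency numbers are invented, and `fit_line` and `apply_model` are hypothetical helper names, not Splunk commands.

```python
def fit_line(xs, ys):
    """FIT: ordinary least squares for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical training examples: CPU load (%) vs. response latency (ms).
train_cpu = [10, 20, 30, 40, 50]
train_latency = [120, 140, 160, 180, 200]
slope, intercept = fit_line(train_cpu, train_latency)

def apply_model(cpu):
    """APPLY: predict latency for a CPU load the model has not seen."""
    return slope * cpu + intercept

# Keep validating: compare predictions with newly observed values.
new_cpu = [60, 70]
new_latency = [221, 238]
errors = [obs - apply_model(cpu) for cpu, obs in zip(new_cpu, new_latency)]
```

If the errors grow over time, that is the signal to refit the model on fresher examples.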
3 Types of Machine Learning
1. Supervised Learning: generalizing from labeled data
2. Unsupervised Learning: generalizing from unlabeled data
3. Reinforcement Learning: generalizing from rewards in time
– Examples: Leitner system, recommender systems
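As a rough illustration of learning from rewards over time, the Leitner system named above can be sketched as follows. The rule used here (a correct answer advances a card one box, an incorrect answer sends it back to the first box) follows the description in the speaker notes; the number of boxes, the card names, and the `review` helper are assumptions made for the example.

```python
NUM_BOXES = 3  # assumed number of boxes; the deck does not specify one

def review(boxes, card, correct):
    """Move one card after a review and return its new box index."""
    if correct:
        # Positive reward: advance, but never past the last box.
        boxes[card] = min(boxes[card] + 1, NUM_BOXES - 1)
    else:
        # Negative reward: go back to the beginning.
        boxes[card] = 0
    return boxes[card]

# Hypothetical flashcards, all starting in the first box.
boxes = {"fit": 0, "apply": 0}
history = [("fit", True), ("fit", True), ("apply", True),
           ("apply", False), ("fit", True)]
for card, correct in history:
    review(boxes, card, correct)
```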
ML Use Cases
IT Ops: Predictive Maintenance
Problem: Network outages and truck rolls cause big time & money expense
Solution: Build predictive model to forecast outage scenarios, act pre-emptively & learn
1. Get resource usage data (CPU, latency, outage reports)
2. Explore data, and fit predictive models on past / real-time data
3. Apply & validate models until predictions are accurate
4. Forecast resource saturation, demand & usage
5. Surface incidents to IT Ops, who INVESTIGATES & ACTS
Security: Find Insider Threats
Problem: Security breaches cause big time & money expense
Solution: Build predictive model to forecast threat scenarios, act pre-emptively & learn
1. Get security data (data transfers, authentication, incidents)
2. Explore data, and fit predictive models on past / real-time data
3. Apply & validate models until predictions are accurate
4. Forecast abnormal behavior, risk scores & notable events
5. Surface incidents to Security Ops, who INVESTIGATES & ACTS
Business Analytics: Predict Customer Churn
Problem: Customer churn causes big time & money expense
Solution: Build predictive model to forecast possible churn, act pre-emptively & learn
1. Get customer data (set-top boxes, web logs, transaction history)
2. Explore data, and fit predictive models on past / real-time data
3. Apply & validate models until predictions are accurate
4. Forecast churn rate & identify customers likely to churn
5. Surface incidents to Business Ops, who INVESTIGATES & ACTS
Summary: The ML Process
Problem: <Stuff in the world> causes big time & money expense
Solution: Build predictive model to forecast <possible incidents>, act pre-emptively & learn
1. Get all data relevant to the problem
2. Explore data, and fit predictive models on past / real-time data
3. Apply & validate models until predictions are accurate
4. Forecast KPIs & notable events associated with the use case
5. Surface incidents to X Ops, who INVESTIGATES & ACTS
Operationalize
ML with Splunk
Splunk User Behavior Analytics (UBA)
• ~100% of breaches involve valid credentials (Mandiant Report)
• Need to understand normal & anomalous behaviors for ALL users
• UBA detects Advanced Cyberattacks and Malicious Insider Threats
• Lots of ML under the hood:
– Behavior Baselining & Modeling
– Anomaly Detection (30+ models)
– Advanced Threat Detection
• E.g., Data Exfil Threat:
– “Saw this strange login & data transfer
for user mpittman at 3am in China…”
– Surface threat to SOC Analysts
Machine Learning in Splunk ITSI
Adaptive Thresholding:
• Learn baselines & dynamic thresholds
• Alert & act on deviations
• Manage thresholds for 1000s of KPIs & entities
• Stdev/Avg, Quartile/Median, Range
Anomaly Detection:
• Find “hiccups” in expected patterns
• Catches deviations beyond thresholds
• Uses Holt-Winters algorithm
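To make the adaptive-thresholding idea concrete, here is a minimal sketch that learns a baseline from past KPI values and flags deviations, once with an average/standard-deviation rule and once with a quartile rule. It is not ITSI's implementation: the multipliers `k`, the sample values, and the function names are assumptions, and the quartile rule shown is the common interquartile-range fence.

```python
from statistics import mean, quantiles, stdev

def stdev_thresholds(history, k=2.0):
    """Baseline = average; thresholds = average +/- k standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma

def quartile_thresholds(history, k=1.5):
    """Thresholds from quartiles: k interquartile ranges beyond Q1 and Q3."""
    q1, _median, q3 = quantiles(history, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def deviations(history, new_values, thresholds):
    """Return the new values that fall outside the learned thresholds."""
    low, high = thresholds(history)
    return [v for v in new_values if v < low or v > high]

# Hypothetical KPI history (e.g. CPU %) and three new observations.
history = [50, 52, 49, 51, 50, 48, 53, 50]
alerts = deviations(history, [51, 75, 30], stdev_thresholds)
```

Re-running the threshold functions on a sliding window of recent history is one simple way to keep the thresholds dynamic.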
ML Toolkit & Showcase
• Splunk-supported framework for building ML Apps
– Get it for free: http://tiny.cc/splunkmlapp
• Leverages Python for Scientific Computing (PSC) add-on:
– Open-source Python data science ecosystem
– NumPy, SciPy, scikit-learn, pandas, statsmodels
• Showcase use cases: Predict Hard Drive Failure, Server Power
Consumption, Application Usage, Customer Churn & more
• Standard algorithms out of the box:
– Supervised: Logistic Regression, SVM, Linear Regression, Random Forest, etc.
– Unsupervised: KMeans, DBSCAN, Spectral Clustering, PCA, KernelPCA, etc.
• Implement one of 300+ algorithms by editing Python scripts
Building ML Apps
1. Get Data & Find Decision-Makers
[Diagram labels. Users: IT Users, Analysts, Business Users. Interfaces: ODBC, SDK, API, DB Connect, Look-Ups. Uses: Ad Hoc Search, Monitor and Alert, Reports / Analyze, Custom Dashboards. Data sources: GPS / Cellular, Devices, Networks, Hadoop, Servers, Applications, Online Shopping Carts, Clickstreams. Structured Data Sources: CRM, ERP, HR, Billing, Product, Finance, Data Warehouse.]
2. Explore Data, Build Searches & Dashboards
• Start with the Exploratory Data Analysis phase
– “80% of data science is sourcing, cleaning, and preparing the data”
– Tip: leverage ITSI KPIs – lots of domain knowledge
• For each data source, build “data diagnostic” dashboard
– What’s interesting? Throw up some basic charts.
– What’s relevant for this use case?
– Any anomalies? Are thresholds useful?
• Mix data streams & compute aggregates
– Compute KPIs & statistics w/ stats, eventstats, etc.
– Enrich data streams with useful structured data
– stats count by X Y, where X and Y come from different sources
– Build new KPIs from what you find
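The "enrich, then count by two fields from different sources" pattern above can be illustrated outside Splunk with a short Python sketch. The events, the lookup table, and the field names (`user`, `status`, `plan`) are all invented for the example; in Splunk the same idea would be expressed with a lookup followed by `stats count by`.

```python
from collections import Counter

# Hypothetical machine-data events (e.g. parsed web logs).
events = [
    {"user": "alice", "status": 500},
    {"user": "bob", "status": 200},
    {"user": "alice", "status": 500},
    {"user": "carol", "status": 200},
]

# Hypothetical structured data (e.g. a CRM export) used as a lookup.
plan_by_user = {"alice": "premium", "bob": "basic", "carol": "premium"}

# Enrich each event with a field from the structured source.
enriched = [{**e, "plan": plan_by_user.get(e["user"], "unknown")} for e in events]

# Count by one field from each source: plan (CRM) and status (logs).
counts = Counter((e["plan"], e["status"]) for e in enriched)
```

A count such as errors per customer plan is an example of a new KPI that only exists once the two sources are combined.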
3. Fit, Apply & Validate Models
• ML SPL – New grammar for doing ML in Splunk
• fit – fit models based on training data
– [training data] | fit LinearRegression costly_KPI from feature1 feature2 feature3 into my_model
• apply – apply models on testing and production data
– [testing/production data] | apply my_model
• Validate Your Model (The Hard Part)
– Why hard? Because statistics is hard! Also: model error ≠ real world risk.
– Analyze residuals, mean-square error, goodness of fit, cross-validate, etc.
– Take Splunk’s Analytics & Data Science Education course
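The validation checks listed above (residuals, mean-square error, goodness of fit, cross-validation) can be sketched for a one-feature linear model in plain Python. This is a teaching sketch, not what the `fit` command does internally; the data values are invented, and `feature` and `costly_kpi` only echo the placeholder names used on the slide.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(xs, ys, slope, intercept):
    """Mean-square error: the average squared residual."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals) / len(residuals)

def r_squared(xs, ys, slope, intercept):
    """Goodness of fit: share of variance explained by the model."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def cross_validate(xs, ys, folds=3):
    """Average held-out MSE over interleaved folds."""
    scores = []
    for k in range(folds):
        train = [i for i in range(len(xs)) if i % folds != k]
        test = [i for i in range(len(xs)) if i % folds == k]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse([xs[i] for i in test], [ys[i] for i in test], slope, intercept))
    return sum(scores) / folds

# Hypothetical training data.
feature = [1, 2, 3, 4, 5, 6]
costly_kpi = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

slope, intercept = fit_line(feature, costly_kpi)
in_sample_mse = mse(feature, costly_kpi, slope, intercept)
fit_quality = r_squared(feature, costly_kpi, slope, intercept)
held_out_mse = cross_validate(feature, costly_kpi)
```

The held-out error is usually the more honest number, since it is measured on data the model did not see while fitting; even so, a low error says nothing by itself about real-world risk.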
4. Predict & Act
• Forecast KPIs & predict notable events
– When will my system have a critical error?
– In which service or process?
– What’s the probable root cause?
• How will people act on predictions?
– Is this a Sev 1/2/3 event? Who responds?
– Deliver via Notable Events or dashboard?
– Human response or automated response?
• How do you improve the models?
– Iterate, add more data, extract more features
– Keep track of true/false positives
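One simple way to keep track of true and false positives is to log each prediction next to what actually happened and summarize the log periodically. The sketch below assumes a hypothetical list of (predicted, actual) pairs, for example taken from closed tickets; `score_alerts` is an illustrative helper name.

```python
def score_alerts(outcomes):
    """Summarize (predicted_incident, actual_incident) boolean pairs."""
    tp = sum(1 for p, a in outcomes if p and a)          # true positives
    fp = sum(1 for p, a in outcomes if p and not a)      # false positives
    fn = sum(1 for p, a in outcomes if not p and a)      # missed incidents
    tn = sum(1 for p, a in outcomes if not p and not a)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}

# Hypothetical log: what the model predicted vs. what responders confirmed.
outcomes = [(True, True), (True, False), (False, True),
            (True, True), (False, False)]
report = score_alerts(outcomes)
```

Falling precision suggests responders are seeing too many false alarms; falling recall suggests real incidents are being missed.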
5. Operationalize Your Models
• Operationalizing closes the loop of the ML Process:
1. Get data
2. Explore data & fit models
3. Apply & validate models
4. Forecast KPIs & events
5. Surface incidents to Ops team
• When you deliver the outcome, keep track of the response
– Human-generated response (detailed journal logs, etc)
– Machine-generated response (workflow actions, etc)
– External knowledge (closed tickets data, DB records, etc)
• Then operationalize: feed back Ops analysis to data inputs, repeat
• Lots of hard work & stats, but lots of value will come out.
Operationalize
Show me the ML!
Next Steps with Splunk ML
• Reach out to your Tech Team! We can help architect ML workflows.
• Lots of ML commands in Core Splunk (predict, anomalydetection, stats)
• ML Toolkit & Showcase – available and free, ready to use
– Get it for free: http://tiny.cc/splunkmlapp
• Splunk UBA: Applied ML for Security
– Unsupervised learning of Users & Entities
– Surfaces Anomalies & Threats
• Splunk ITSI: Applied ML for ITOA use cases
– Manage 1000s of KPIs & alerts
– Adaptive Thresholding & Anomaly Detection
• ML New Product Initiative (NPI) Program:
– Connect with Product & Engineering teams - mlprogram@splunk.com
SEPT 26-29, 2016
WALT DISNEY WORLD, ORLANDO
SWAN AND DOLPHIN RESORTS
• 5000+ IT & Business Professionals
• 3 days of technical content
• 165+ sessions
• 80+ Customer Speakers
• 35+ Apps in Splunk Apps Showcase
• 75+ Technology Partners
• 1:1 networking: Ask The Experts and Security
Experts, Birds of a Feather and Chalk Talks
• NEW hands-on labs!
• Expanded show floor, Dashboards Control
Room & Clinic, and MORE!
The 7th Annual Splunk Worldwide Users’ Conference
PLUS Splunk University
• Three days: Sept 24-26, 2016
• Get Splunk Certified for FREE!
• Get CPE credits for CISSP, CAP, SSCP
• Save thousands on Splunk education!


Machine Learning and Analytics in Splunk


Editor's Notes

  • #5: [Shawn] Q: Why does BA matter? A1: BA can drive VOLUME in your accounts. Customer examples to come. A2: BA can drive VALUE in your accounts. Strategic, high-level, high-profile use cases. Use HIGH-VALUE SMALL DATA to enrich LARGE VOLUMES of MACHINE
  • #7: Q: What is a statistical model? A: A model is a little copy of the world you can hold in your hands. Formal: A model is a parametrized relationship between variables. FITTING a model sets the parameters using feature variables & observed values. APPLYING a model fills in predicted values using feature variables. Image source: http://phdp.github.io/posts/2013-07-05-dtl.html
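The difference between FITTING and APPLYING can be made concrete with a tiny example. The sketch below assumes the simplest possible parametrized relationship, a straight line with two parameters, fitted by ordinary least squares; the class name `LinearModel` and the sample numbers are hypothetical and only illustrate the two steps.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Parametrized relationship: predicted = slope * feature + intercept."""
    slope: float = 0.0
    intercept: float = 0.0

    def fit(self, features, observed):
        """FIT: set the parameters from feature variables and observed values."""
        n = len(features)
        mean_x = sum(features) / n
        mean_y = sum(observed) / n
        sxx = sum((x - mean_x) ** 2 for x in features)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, observed))
        self.slope = sxy / sxx  # assumes the features are not all identical
        self.intercept = mean_y - self.slope * mean_x
        return self

    def apply(self, features):
        """APPLY: fill in predicted values using feature variables only."""
        return [self.slope * x + self.intercept for x in features]


# Fit on historical data, then apply to new feature values.
model = LinearModel().fit([1, 2, 3, 4], [3, 5, 7, 9])
predictions = model.apply([5, 6])
print(model, predictions)
```

Note that `apply` never sees observed values: once the parameters are set, predictions depend only on the features.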
  • #8: Supervised learning is where you have existing LABELS in the data to help you out. Example: If you’re training a model for CUSTOMER CHURN, historically you know which customers stayed and which left. You can build a model to correlate historical churn with other features in the data. Then you can PREDICT churn for each customer based on everything they’re doing in real-time and have done in the past.
  • #9: Unsupervised learning is where you have NO LABELS to help you out. You have to figure out patterns. Example: If you’re trying to do BEHAVIORAL ANALYTICS, you might just have a big confusing pile of IT & Security data to wade through. Unsupervised learning is the art & science of finding PATTERNS, BASELINES and ANOMALIES in the data. Once you understand all this (that’s hard!) you can try to predict possible INCIDENTS and THREATS. Good ML involves FEEDBACK loops. Best bet is to incorporate INCIDENT RESPONSE data and learn from what analysts have done in the past. [NEXT SLIDE: Reinforcement Learning]
  • #10: Reinforcement Learning is basically Supervised Learning where LABELS = REWARDS, and there is a strong focus on TIME and FEEDBACK LOOPS. This is how you OPERATIONALIZE machine learning: by looping back results of analysis and workflow and LEARN from interactions with the world. Rewards can be POSITIVE or NEGATIVE. Image: The Leitner system is reinforcement learning for flashcards. Correct answers “advance” and accumulate more points. Incorrect answers go back to the beginning. https://en.wikipedia.org/wiki/Leitner_system Reinforcement learning is rooted in behavioral psychology. Humans & animals are hard-wired for rewards https://en.wikipedia.org/wiki/Reinforcement_learning
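The flashcard rule in this note (correct answers advance, incorrect answers go back to the beginning) can be sketched in a few lines of Python. The function name `review`, the three-box default and the sample cards are hypothetical choices for illustration; real Leitner schedules also vary how often each box is reviewed, which this sketch leaves out.

```python
def review(boxes, card, correct, num_boxes=3):
    """Record one review of `card` and return its new box index (0 = first box)."""
    current = boxes.get(card, 0)  # unseen cards start in the first box
    if correct:
        new_box = min(current + 1, num_boxes - 1)  # positive reward: advance
    else:
        new_box = 0  # negative reward: back to the beginning
    boxes[card] = new_box
    return new_box


# Hypothetical review session: (card, answered correctly?)
boxes = {}
session = [("fit", True), ("fit", True), ("apply", True), ("fit", False)]
for card, correct in session:
    review(boxes, card, correct)
print(boxes)
```

The feedback loop is the point: each outcome changes the state that decides what happens next.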
  • #13: Q: How is this slide similar to the previous one? (go back and forth) A: The ML Process is the same, it’s just that the data & the operations teams are different. Also different will be the actual analysis in the middle, but the *process* of doing that analysis is the same.
  • #15: The ML process is itself a generalization of the different use cases. ML spans domains! The arrow means OPERATIONALIZE. Feed back incident data & other high-level analysis back into the ML Process. Keep exploring that data & fitting better models to align with reality. Loop Step #5 (Act) back to Step #1 (Data).
  • #16: Reinforcement learning lets us OPERATIONALIZE machine learning. When the machine recommends something to an analyst, the model can LEARN from the outcome of their work. Create a culture of REWARDS for your analytics team, not punishments. The machines can/should learn from ALL the available data. You might have to build complex ML workflows. Want good Splunk admin to help architect. Want VIRTUOUS CYCLE between human-machine interaction
  • #18: Re: 100% of breaches involve valid credentials: "Mandiant is the leader in incident response. They are the best of the best. They're brought in to deal with the largest, highest profile, most damaging breaches. When you read a news headline about a large organization being compromised, there is a great chance that Mandiant is working behind the scenes to eradicate the attackers from the environment. In a recent yearly report, when they looked across all the very damaging attacks they responded to, they noticed that valid credentials were used at some point in every single one of them. Why do we care? Well, we care because it means that we cannot use simple techniques like counting failed logins to detect an attack. In fact, based on this stat, there may not even be any failed logins to count! Instead we need to be much smarter. For example, how can we look at 1000 successful logins and determine which of them was the malicious one? We do that through behavior analytics, baseline/outlier, ML, etc."
  • #19: Adaptive Thresholding extra details:
      – Adaptive: baselines evolve & learn in time
      – Time-variate: different profiles for days/hours
      – Looks at training data over the last 7, 14, 30 or 60 days
      – Allows a user to select adaptive thresholding method of choice
      – Generates thresholds based off the user settings and updates every night
      – User can switch to Static to stop the auto threshold updates, i.e. apply only once initially
    Anomaly Detection extra details:
      – For each KPI, looks at historical data over the last 7, 2 or 1 day and retrains nightly
      – Continuously makes predictions based on historical data and generates an anomaly score
      – After 24 hours of switching it ON, users can tune sensitivity to anomaly score to avoid false positives
      – Users can enable alerting to start getting alerts for anomalies detected
    Super-high level: ITSI AD guesses the next value of the KPI based on what that KPI has done in the past. If this guess is especially bad, compared with guesses for that KPI in the past, it’s an anomaly.
    High-level: Given a single KPI, anomaly detection in ITSI looks over past data and uses properties like the average value or the long-term trend to forecast what the next value will be. It keeps track of how accurate these forecasts have been in the past. When it makes a forecast that turns out to be remarkably bad, compared with those past forecasts, the value gets a high anomaly score. The sensitivity setting determines how high the anomaly score has to be before ITSI will raise an alert.
    Concise but technical: ITSI AD uses double-exponential smoothing (Holt-Winters) to model a KPI's behavior and to forecast the next value. It keeps track of forecasting errors (the difference between the forecast and the actual value) for each KPI, and assigns each value an anomaly score that represents the fraction of previous forecasts with lower error.
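The "concise but technical" description can be sketched in Python. This is not ITSI's implementation: it is a minimal illustration that assumes the two-parameter (level + trend) form of double-exponential smoothing with no seasonal component, a simple initialization from the first two points, and hypothetical names (`anomaly_scores`, `alpha`, `beta`) and sample data.

```python
def anomaly_scores(values, alpha=0.5, beta=0.5):
    """Score each value (from the third on) by how bad its forecast was.

    The score is the fraction of previous forecasts with lower error,
    so 1.0 means "worse than every forecast made so far".
    """
    if len(values) < 3:
        return []
    # Assumed initialization: level = second point, trend = first difference.
    level, trend = values[1], values[1] - values[0]
    past_errors = []
    scores = []
    for actual in values[2:]:
        forecast = level + trend  # one-step-ahead forecast
        error = abs(actual - forecast)
        if past_errors:
            score = sum(e < error for e in past_errors) / len(past_errors)
        else:
            score = 0.0  # nothing to compare against yet
        scores.append(score)
        past_errors.append(error)
        # Double-exponential smoothing update of level and trend.
        new_level = alpha * actual + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return scores


# Hypothetical KPI: a steady upward trend with one spike.
kpi = [10, 11, 12, 13, 14, 15, 40, 17]
scores = anomaly_scores(kpi)
print(scores)
```

A sensitivity setting, as described in the note, would then be a threshold on this score: alert only when the score exceeds the chosen value.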
  • #20: Free app! Toolkit & PSC are both free. Go to ML App link above, and click Documentation. Links for all distros. Q: Why standalone SH? A: Don’t want ML exploration & production to bring down other Splunk workloads. Can use standalone 6.4 SH with older version SH cluster & indexers.
  • #22: Re: ML App v0.9. To be updated after new release. Stay tuned! Lots to come w/ Splunk ML. Image modified from cover of book Protecting Study Volunteers in Research Publisher: CenterWatch LLC; 4th Edition edition (June 15, 2012) NEXT: either leave slide & discuss OR show ML demo
  • #24: Before you do machine learning, you need DATA and DECISION-MAKERS. Walk before you can run! Start with useful data sources that can help people solve problems, and build basic dashboards correlating different things in the data. This is called EXPLORATORY DATA ANALYSIS. Once you do that, THEN try to fit models based on what seems to correlate. Interviewing & iterating with decision-makers is key. DATA: ML isn’t magic. You need good data to learn from. DECISION-MAKERS: Once you find patterns, anomalies, etc., who are you going to deliver them to? How do they want information presented? Emails? Dashboards? Incident tickets?
  • #25: Walk before running! Precursor to building models & doing ML. Source for “80% of data science is EDA” quote: http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html?_r=0 Image: OpenStreetMap logo, from Wikipedia. Creative Commons
  • #26: Remember: Machine Learning is a PROCESS. Takes a lot of work & elbow grease to get from Exploratory Data Analysis to ML Models in Production. Q: Why hard? A1: Statistics is hard. Subtle questions re: model error & statistical assumptions. Remember: “All models are wrong, some are useful” A2: Validation is also difficult because not everyone has the same requirements. For example, for some users false positives may be much more expensive than false negatives; for others, the opposite may be true. For some users, being 2X wrong is twice as bad as being X wrong; for others, there may be a non-linear relationship between error and badness.
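The point that validation depends on each user's requirements can be illustrated with two small cost calculations. Everything here is a hypothetical sketch: the function names, the cost weights and the sample data are assumptions chosen to show how the same predictions rank differently under different requirements.

```python
def classification_cost(actual, predicted, fp_cost=1.0, fn_cost=1.0):
    """Total cost of the mistakes, with separate prices for each kind."""
    cost = 0.0
    for happened, flagged in zip(actual, predicted):
        if flagged and not happened:
            cost += fp_cost  # false positive
        elif happened and not flagged:
            cost += fn_cost  # false negative
    return cost


def linear_loss(error):
    """Being 2X wrong is twice as bad as being X wrong."""
    return abs(error)


def quadratic_loss(error):
    """Non-linear: being 2X wrong is four times as bad as being X wrong."""
    return error ** 2


# Same predictions (2 false positives, 1 false negative), two different users.
actual = [1, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1]
cost_fp_averse = classification_cost(actual, predicted, fp_cost=5.0, fn_cost=1.0)
cost_fn_averse = classification_cost(actual, predicted, fp_cost=1.0, fn_cost=5.0)
print(cost_fp_averse, cost_fn_averse)
print(linear_loss(2.0) / linear_loss(1.0), quadratic_loss(2.0) / quadratic_loss(1.0))
```

Agreeing on the cost weights and the loss shape with decision-makers up front makes it clear what "a better model" means for them.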
  • #27: Also: Output an aggregate table for further analysis? (send via ODBC Driver, DB Connect or Hunk/Hadoop)
  • #28: Re: ML App v0.9. To be updated after new release. Time estimate: “soon, stay tuned!” If you want to use ML in production, let us know! We have customers using ML in production TODAY. e.g., New York Air Brake
  • #29: Time for ML demo! Get the ML App: http://tiny.cc/splunkmlapp Want more? Take Splunk’s Analytics & Data Science course! Course prework: http://bit.ly/splunkanalytics
  • #30: A direct customer-Splunk engagement focused on real-world use of the Splunk Enterprise - Machine Learning Toolkit and Showcase app and related SPL commands.
    Objectives
      • Help the customer to be successful in the impactful use of ML
      • Help Splunk to understand customer use cases and product requirements
    Details
      • Splunk Account SE plus PM/Engineering work directly with customer to guide usage, provide support, note analytics and product requirements, and refine product where feasible
      • Customer participates in the above, developing 1 or more models and putting them in production
      • Customer agrees to be referenced publicly, sharing reasonable detail and business impact
      • Customer agrees to participate in a set of activities that may include: case study, press quote, use of logo, PR/AR reference call, video profile
  • #31: We’re headed to the East Coast! 2 inspired Keynotes – General Session and Security Keynote + Super Sessions with Splunk Leadership in Cloud, IT Ops, Security and Business Analytics! 165+ Breakout sessions addressing all areas and levels of Operational Intelligence – IT, Business Analytics, Mobile, Cloud, IoT, Security…and MORE! 30+ hours of invaluable networking time with industry thought leaders, technologists, and other Splunk Ninjas and Champions waiting to share their business wins with you! Join the 50%+ of Fortune 100 companies who attended .conf2015 to get hands on with Splunk. You’ll be surrounded by thousands of other like-minded individuals who are ready to share exciting and cutting edge use cases and best practices. You can also deep dive on all things Splunk products together with your favorite Splunkers. Head back to your company with both practical and inspired new uses for Splunk, ready to unlock the unimaginable power of your data! Arrive in Orlando a Splunk user, leave Orlando a Splunk Ninja! REGISTRATION OPENS IN MARCH 2016 – STAY TUNED FOR NEWS ON OUR BEST REGISTRATION RATES – COMING SOON!