Data Onboarding Overview
During the course of this presentation, we may make forward-looking statements regarding future events or
the expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results could
differ materially. For important factors that may cause actual results to differ from those contained in our
forward-looking statements, please review our filings with the SEC.
The forward-looking statements made in this presentation are being made as of the time and date of its live
presentation. If reviewed after its live presentation, this presentation may not contain current or accurate
information. We do not assume any obligation to update any forward-looking statements we may make. In
addition, any information about our roadmap outlines our general product direction and is subject to change
at any time without notice. It is for informational purposes only and shall not be incorporated into any contract
or other commitment. Splunk undertakes no obligation either to develop the features or functionality
described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in
the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved.
Safe Harbor Statement
© 2018 SPLUNK INC.
1. Splunk Data Collection Architecture
2. Apps and Technology Add-ons
3. Demos / Examples
4. Best Practices
5. Resources and Q&A
We Will Discuss:
Splunk Data
Collection
Architecture
Basic Architecture Refresh
How Splunk works at a high level
distributed search
auto-load balanced indexing
Example data sources: change tickets, web access logs, Windows event logs / perfmon, Linux logs, VMware logs, configs and metrics, firewall data, app server logs, JMX and JVM metrics, database logs and metrics, product pricing
Search Head - Splunk’s UI
Indexer – Data Store/Processing
Forwarder - Collect & Send
Agentless
What can Splunk Ingest?
Agentless and Forwarder Approaches for Flexibility and Optimization
syslog
TCP/UDP
Event Logs, Active Directory, OS Stats
Unix, Linux and Windows hosts
Universal Forwarder
syslog hosts
and network devices
Local File Monitoring
Universal Forwarder
Aggregation
Windows host
Aggregated/API Data Sources
Pre-filtering, API subscriptions
Heavy Forwarder
Mainframes, *nix
Wire Data
Splunk Stream
Universal Forwarder or
HTTP Event Collector
DevOps, IoT,
Containers
HTTP Event Collector
(Agentless)
shell
API
perf
Collects Data From Remote Sources
• Splunk Universal Forwarders collect data from a local data source and send it to
one or more Splunk indexers.
Scalable
• Thousands of universal forwarders can be installed with little impact on network
and host performance.
Broad Platform Support
• Available for installation on diverse computing platforms and architectures, with a
small CPU/disk/memory footprint.
Splunk Universal Forwarder
The Splunk Universal Forwarder is a Separate Download
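As a sketch of what local file monitoring looks like on a Universal Forwarder (the paths, index, and sourcetype below are hypothetical; adjust for your environment), a minimal inputs.conf stanza:

```ini
# inputs.conf on the Universal Forwarder (hypothetical paths/names)
[monitor:///var/log/myapp/*.log]
sourcetype = myapp:log
index = main
disabled = false
```

The forwarder tails matching files continuously and sends new events to the indexers defined in its outputs.conf.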
Also Collects Data From Remote Sources...
• ...but is typically used for data aggregation for passage through firewalls, data
routing and/or filtering, scripted/modular inputs, or for HEC endpoints (more on this
in a bit).
Often run as a “data collection node” for API/scripted data access
• A heavy forwarder is typically run as a “data collection node” for technologies
requiring access via API, and not for collection of data from the node itself
Platform Support limited to that of Splunk Enterprise
• Being standalone, Heavy Forwarders are typically run on Linux VMs...
Splunk Heavy Forwarder
Configured via the regular Splunk Enterprise download
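Since a heavy forwarder is often used for routing and filtering, here is a minimal sketch of index-time filtering (the sourcetype and regex are hypothetical): events matching the regex are routed to nullQueue and discarded before they ever reach an indexer.

```ini
# props.conf (hypothetical sourcetype)
[myapp:log]
TRANSFORMS-drop_debug = drop_debug_events

# transforms.conf -- send DEBUG-level events to nullQueue (i.e., drop them)
[drop_debug_events]
REGEX = \sDEBUG\s
DEST_KEY = queue
FORMAT = nullQueue
```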
Large-Scale Data Collection Directly from Applications
• Provides a simple, load-balancer-friendly, secure way (token-based JSON or RAW
API) to send data at scale from applications directly to Splunk
Agentless
• Data at scale can be sent directly to indexer tier, bypassing forwarder layer
Broad Development Platform Support
• Logging drivers available for many platforms (Docker, AWS Lambda, etc.) and a
simple HTTP endpoint compatible with all development environments
Splunk HTTP Event Collector (HEC)
The Newest Way to Collect Data at Scale
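As a sketch of the token-based JSON API, a HEC event is an HTTP POST carrying a `Splunk <token>` Authorization header and a small JSON body. The URL and token below are placeholders, not real values:

```python
import json

# Hypothetical HEC endpoint and token -- substitute real values for your deployment.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def build_hec_request(event, sourcetype, index="main"):
    """Build the Authorization header and JSON body for a HEC event POST."""
    headers = {"Authorization": "Splunk " + HEC_TOKEN}
    body = json.dumps({"event": event, "sourcetype": sourcetype, "index": index})
    return headers, body

headers, body = build_hec_request({"msg": "app started"}, "myapp:log")
# Send with any HTTP client, e.g.:
#   requests.post(HEC_URL, headers=headers, data=body)
```

Because the endpoint is plain HTTPS, the same request works from a shell with curl, a logging driver, or any language's HTTP client.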
Apps and
Add-Ons
Apps vs. Add-ons
▶ Your first choice when onboarding
new data
• Clean and ready to go out-of-the-box
▶ App is a complete solution
• Typically uses one or more TAs
▶ Add-on
• Abstracts collection methodology (log file, API,
scripted input, HEC)
• Typically includes relevant field extractions
(schema-on-the-fly)
• Includes relevant config files (props/transforms)
and ancillary scripts/binaries
Where do you get Apps? Splunkbase!
Thriving Community
dev.splunk.com
75,000+ questions
and answers
1,000+ apps
Local User Groups &
SplunkLive! events
Data Onboarding:
Demos
▶ Using the Data Previewer
• Upload a File (You did this in the Getting Started Hands-on Session!)
▶ Installing and using Apps and Add-ons
▶ Continuous Local File Monitoring (Universal Forwarder)
• Monitor a directory and multiple files in real-time
• Most common architecture for syslog-based sourcetypes
What You Will See
Data Onboarding
Best Practices
Components of a Splunk Success Program
Architecture & Infrastructure | Operations & Supporting Tools | Staffing | Data Onboarding | User Onboarding | Inform
▶ Architect
• Design and optimize Splunk architecture for large-scale/distributed
deployments.
▶ System Administrator
• Implement and maintain Splunk infrastructure and configuration
▶ Search Expert
▶ App Developer
▶ Knowledge Manager
• Perform data interpretation, classification and enrichment
• Work with System Administrator to properly onboard data
Typical Splunk Staffing Roles
▶ Define on-boarding process for
new data sources / apps
▶ Repeatable, documented
process
▶ Provide customer interview
forum or survey
▶ Integrate with service workflow
Data Onboarding Tasks
New Data Source Request
• Provide a data sample
• Describe the data’s structure: timestamp | timezone, single-/multi-line, sourcetype, interesting fields
• Describe initial uses for the data: searches | alerts | reports | dashboards
• How will the data be collected? UF | syslog | API
• How long should the data be retained?
• Who should have access?
• Apply the Common Information Model: are there TAs available?
• Validate
Ladies and Gentlemen, We’ll be Boarding Soon!
Six Things to Get Right at Index Time
Source | Host | Index | Sourcetype | Event Boundary / Line Breaking | Date/Timestamp
▶ Gather info (New Data Source Request):
• Where does this data originate/reside? How will Splunk collect it?
• Which users/groups will need access to this data? Access controls?
• Determine the indexing volume and data retention requirements
• Will this data need to drive existing dashboards (ES, PCI, etc.)?
• Who is the Owner/SME for this data?
▶ Map it out:
• Get a "big enough" sample of the event data
• Identify and map out fields (ensure CIM compliance)
• Assign sourcetype and TA names according to CIM conventions
Pre-Board Essentials
▶ Identify the specific sourcetype(s) - onboard each separately
• Important – syslog is not a sourcetype!
• More on this later
▶ Check for pre-existing app/add-on on splunk.com – don't
reinvent the wheel!
▶ Start with a “Test” index and verify index-time settings are correct
(previous slide)
• Try the Data Previewer first
• Tweak props/transforms “by hand” only if absolutely necessary
Pre-Board Essentials (cont.)
▶ Find and fix index-time problems BEFORE
polluting your index
▶ A try-it-before-you-fry-it interface for figuring out
• Event breaking
• Timestamp recognition
• Timezone assignment
▶ Provides most necessary props.conf parameter settings
Your Friend, the Data Previewer
If you have to get into the weeds...
Always set these six parameters in props.conf
# props.conf stanza for the SL17 sourcetype
[SL17]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
SHOULD_LINEMERGE = False
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}
TRUNCATE = 10000
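Before shipping such a stanza, it is worth sanity-checking the format strings outside Splunk. A small Python sketch (the sample events are made up) that mirrors the timestamp format and the line-breaker regex, written with the backslash escapes that props.conf requires:

```python
import re
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
# Same pattern as LINE_BREAKER: a run of newlines followed by the next timestamp.
LINE_BREAKER = re.compile(r"([\r\n]+)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}")

sample = "2018-05-15 09:30:01 INFO starting up\n2018-05-15 09:30:02 WARN low disk"

# MAX_TIMESTAMP_LOOKAHEAD = 19: the timestamp occupies the first 19 characters.
ts = datetime.strptime(sample[:19], TIME_FORMAT)
print(ts)  # 2018-05-15 09:30:01

# The breaker should find exactly one event boundary in the two-event sample.
print(len(LINE_BREAKER.findall(sample)))  # 1
```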
▶ The Common Information Model (CIM) defines relationships in
the underlying data, while leaving the raw machine data intact
▶ A naming convention for fields, eventtypes & tags
▶ More advanced reporting and correlation requires that the data
be normalized, categorized and parsed
▶ CIM-compliant data sources can drive CIM-based dashboards
(ES, PCI, others)
What Is the CIM and Why Should I Care?
▶ Syslog is a protocol – not a sourcetype
▶ Syslog typically carries multiple sourcetypes
▶ Best to pre-filter syslog traffic using syslog-ng or rsyslog
• Do not send syslog data directly to Splunk over a network port (514)
▶ Use a UF or HEC to transport data to Splunk (next slide)
• Ensures proper load balancing and data distribution
• Secure and efficient
• Insulates against Splunk component failures
▶ See https://www.splunk.com/blog/2017/03/30/syslog-ng-and-hec-scalable-aggregated-data-collection-in-splunk.html for more info on this topic
A special note on Syslog
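One common pattern from the blog post above: terminate syslog with rsyslog (or syslog-ng), write one file per sending host, and let a Universal Forwarder monitor those files. A sketch of the rsyslog side (paths and ports are illustrative, not a drop-in config):

```ini
# /etc/rsyslog.d/splunk.conf (sketch -- adjust paths and ports for your site)
# Listen on UDP and TCP 514 and write one file per sending host,
# which a Universal Forwarder then monitors.
module(load="imudp")
input(type="imudp" port="514")
module(load="imtcp")
input(type="imtcp" port="514")

template(name="PerHostFile" type="string"
         string="/var/log/remote/%HOSTNAME%/syslog.log")
action(type="omfile" dynaFile="PerHostFile")
```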
Recommended syslog architectures
Learn More
From Today
▶ https://splunkbase.splunk.com/app/2962/
▶ For creating REST API, Scripted or Modular Inputs through a GUI
▶ Helps your Add-ons get Certified
▶ Can also be used on sample data to build out configs
Check Out the New Add-on Builder!
▶ Videos!
• http://www.splunk.com/view/education-videos/SP-CAAAGB6
▶ Getting Data In – Splunk Docs
• http://docs.splunk.com/Documentation/Splunk/latest/Data/WhatSplunkcanmonitor
▶ Date and time format variables
• http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Commontimeformatvariables
▶ Getting Data In – Dev Manual (very thorough!)
• http://dev.splunk.com/view/dev-guide/SP-CAAAE3A
▶ HTTP Event Collector
• http://docs.splunk.com/Documentation/Splunk/latest/Data/UsetheHTTPEventCollector
▶ .conf Sessions
• https://conf.splunk.com/session/2015/conf2015_Aduca_Splunk_Delpoying_OnboardingDataIntoSplunk.pdf
▶ GOOGLE!
Where to Go to Learn More
ORLANDO FLORIDA
Walt Disney World Swan and Dolphin Hotels
.conf18:
Monday, October 1 – Thursday, October 4
Splunk University:
Saturday, September 29 – Monday, October 1
Q&A
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
Editor's Notes
  • #4: Today’s goal is to talk about Data Onboarding or “Getting Data Into Splunk” from a ”New to Splunk” perspective. More specifically we’ll talk about the following and then do a little bit of demo. 1. Splunk Platform – a refresher You’ve seen the Splunk Overview, but I want to quickly go through a few overview slides and relate why data onboarding is important to them 2. What can Splunk Eat Then we’ll identify not only the data sources that Splunk can collect, but the methods of collection as well 3. Apps and Add-ons Next we’ll discuss how Apps and Add-ons from the ecosystem play a role 4. Data Onboarding Examples/Demos We’ll get into a few demos 5. Data Onboarding Best Practices and Next Steps And finally we’ll get into some common best practices and what to do from here!
  • #6: 1. Explain the different components at a high level 2. The forwarder is one of the many ways to collect data in Splunk – we will discuss setting up and using a forwarder in more detail later in the presentation
  • #7: 1. Spend some time talking about each collection method 2. Today we will concentrate on and demo the ones highlighted in blue
  • #8: Universal Forwarders provide reliable, secure data collection from remote sources and forward that data into Splunk software for indexing and consolidation. They can scale to tens of thousands of remote systems, collecting terabytes of data.
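  A minimal sketch of a Universal Forwarder setup for this slide's point. The file paths, sourcetype, index name, and indexer hostnames below are illustrative assumptions, not values from the presentation:

```ini
# inputs.conf (on the Universal Forwarder) -- tail a local log file
[monitor:///var/log/messages]
sourcetype = linux_messages_syslog
index = os
disabled = 0

# outputs.conf (on the Universal Forwarder) -- auto-load-balance
# events across the indexing tier mentioned in the architecture slide
[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
```

With two servers in the `tcpout` stanza, the forwarder rotates between them automatically, which is the "auto-load balanced indexing" behavior shown in the architecture diagram.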
  • #9: Heavy forwarders allow for the aggregation, filtering and routing of data, as well as serving as a “data collection node” for applications such as DB Connect and other API-driven data sources. They are typically *not* used for local data collection.
  • #10: HTTP Event Collector (HEC, pronounced H-E-C) is a new, robust, token-based JSON/raw API for sending events to Splunk from anywhere without requiring a forwarder. It is designed for performance and scale. With a load balancer in front, it can be deployed to handle millions of events per second. It is highly available and secure. It is easy to configure, easy to use, and best of all it works out of the box. A few other cool tidbits: it supports gzip compression, batching, HTTP keep-alive, and both HTTP and HTTPS.
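  As a hedged sketch of what a HEC client does: it POSTs a JSON body with an `event` payload plus optional metadata (`sourcetype`, `index`, `host`) to the collector endpoint, authenticated with a `Splunk <token>` header. The endpoint URL and token below are placeholders, not real values:

```python
import json

# Hypothetical endpoint and token -- substitute your own HEC values.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def build_hec_event(event, sourcetype, index=None, host=None):
    """Build the JSON body HEC expects for a single event."""
    body = {"event": event, "sourcetype": sourcetype}
    if index:
        body["index"] = index
    if host:
        body["host"] = host
    return body

payload = json.dumps(build_hec_event(
    {"action": "login", "user": "alice"},
    sourcetype="myapp:events",
    index="main",
))

# Sending requires a reachable HEC endpoint, e.g. with urllib:
#   req = urllib.request.Request(
#       HEC_URL, data=payload.encode(),
#       headers={"Authorization": "Splunk " + HEC_TOKEN})
#   urllib.request.urlopen(req)
print(payload)
```

Because the transport is plain HTTPS plus a token header, any language or agent that can make an HTTP request can send events, which is why no forwarder is required.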
  • #12: Splunk apps and add-ons: what and why? Splunk apps allow developers to extend the data ingestion and processing capabilities of Splunk Enterprise for your specific needs. Apps facilitate more efficient completion of domain-specific tasks by the end user. High-level perspective: a Splunk app is a prebuilt collection of additional capabilities packaged for a specific technology or use case, which enables more effective use of Splunk Enterprise. You can use Splunk apps to gain the specific insights you need from your machine data. Depending on the type and complexity of those use cases, and on whether the developer wants certain app parts to be configured or distributed separately (potentially by a third party), an app may rely on various add-ons. An add-on is a technical component that can be reused across a number of different use cases and packaged with one or more Splunk apps. Add-ons may contain one or more knowledge objects, which encapsulate a specific functionality focused on a single concern and its configuration. Using an add-on should help reduce the technical risk and cost of building an app.
  • #14: Additionally we have the community! The community provides thousands of apps and add-ons that can help you onboard and ingest thousands of different data types, and new content is added every day!
  • #15: Let’s look at how we would use an Add-on from Splunkbase to get data in. Use an example that you are comfortable with and showcases using an add-on to get data in and mapped properly.
  • #16: < If you have another data source or want to improvise a little here feel free – otherwise you can use the following demo flow below > < Support files can be found here: LINK >
    1. Install an instance of Splunk on your laptop.
    2. Create an inputs.conf that monitors a directory that will contain the PANW log files, using the PANW sourcetype from the TA. Leave the directory empty for now.
    3. Show the data preview wizard with the apache data. Show how Splunk understands (and assigns an appropriate sourcetype to) the data. Show proper field extractions when ingest is complete.
    4. Use the wizard to upload one of the 5 PANW data files. Show how the sourcetype is *not* automatically set, and that there are no relevant choices in the sourcetype picker in the wizard. Set the sourcetype to some arbitrary value. Show that there are no relevant field extractions after ingest.
    5. Now, install the PANW app. Make sure to RESTART.
    6. Use the wizard to import the next PANW data file. Now show that there *is* a relevant sourcetype in the picker. Select it.
    7. Show how fields are extracted properly in the data. *HOWEVER* -- note that the original sourcetype is automatically changed by the TA, and you will get no results when jumping from the wizard into the search window. Instead, show the 5 or 6 new sourcetypes that get generated as a result of the TA doing its thing.
    8. Lastly, deposit the 3rd PANW data file into the monitor directory set up earlier. Show the data in search, correctly sourcetyped.
    9. Move the file to a “backup” filename in the monitored directory. Show how Splunk does *not* reingest the data.
    10. Add the 4th PANW data sample. Show how the UF handles this.
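  The monitor stanza from step 2 of the demo might look like the sketch below. The directory path and index are illustrative, and the exact PANW sourcetype name should be taken from the TA's own documentation:

```ini
# inputs.conf -- monitor an (initially empty) directory for PANW log files
[monitor:///opt/demo/panw]
sourcetype = pan:log
index = main
disabled = 0
```

This also sets up steps 8–10: any file dropped into the monitored directory is picked up automatically, and Splunk's file-tracking keeps a renamed copy of already-read data from being reingested.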
  • #17: In this next section we are only looking at the tip of the iceberg. Data onboarding can quickly become an advanced topic, so the point of this section is to introduce you to some of the most important key points to get you started. After that you’ll need to do some research and learn the specifics yourself.
  • #18: These are the components that make up a successful Splunk program – both large and small. In a very large deployment, individual people (or more) can be dedicated to each of these components.
  • #19: Appropriate staffing will ensure these components are properly addressed. The person responsible for data onboarding from an architectural perspective is the Knowledge Manager
  • #20: It is important to have a defined, documented, and repeatable process for data onboarding.
  • #21: Explain Index Time Spend some time saying why these are so important for Splunk. Mention there will be references and resources at the end of the presentation to help dive deeper into these topics.
  • #22: It is important to not only get the technical details right, but also the data stewardship issues: Who owns the data, who can see it, and how long to keep it?
  • #23: It is important to “do the homework” prior to onboarding, not only to get the index-time parameters correct (previous slide) but also to ensure the resultant data in Splunk will be of value to the widest variety of people and use cases
  • #24: Make sure to show this in the demo, this slide is just a follow up reminder
  • #25: These are the minimum parameters that should be set when creating a new data source. Again, as I said when I flashed up the Splunk Apps site: find something similar to your source and rework it, but make sure it includes these parameters.
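  A hedged props.conf sketch of the kind of minimum index-time settings this slide is pointing at. The sourcetype name is made up, and the timestamp settings assume an apache-style `[10/Oct/2017:13:55:36 -0700]` prefix; adjust all values to the real data:

```ini
# props.conf -- pin down timestamping and line breaking for a new sourcetype
[myapp:access]
TIME_PREFIX = ^\[
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 30
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 10000
```

Setting these explicitly, rather than relying on auto-detection, is what makes onboarding repeatable: the same data always breaks and timestamps the same way.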
  • #26: Normalizes data from different sources – Host and hostname discussion
  • #27: Syslog represents almost 50% of a typical Splunk installation’s data. And yet syslog itself is simply the protocol over which a number of devices’ log data flows. Be sure *not* to use syslog as the sourcetype, but rather that of the originating data. Use appropriate syslog tools to pre-filter data. They’re good at it, they’re free, they’re well documented, and they integrate well with Splunk.
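  One common way to follow that advice, sketched with assumed paths and names (the syslog-ng source `s_net` and the `cisco:asa` sourcetype are illustrative): have the syslog daemon split incoming traffic into one file per sending host, then monitor those files with per-device sourcetypes instead of a generic "syslog" sourcetype.

```
# syslog-ng.conf (sketch) -- write each sending host to its own file
destination d_per_host {
    file("/var/log/remote/${HOST}/messages" create-dirs(yes));
};
log { source(s_net); destination(d_per_host); };
```

```ini
# inputs.conf (sketch) -- monitor those files; host_segment = 4 takes the
# hostname from the 4th path segment (/var=1/log=2/remote=3/<host>=4)
[monitor:///var/log/remote/*/messages]
host_segment = 4
sourcetype = cisco:asa
```

Splitting by host (or by device type into separate directories) is what makes it possible to assign the originating device’s sourcetype rather than lumping everything under "syslog".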
  • #28: HEC is the newest, and most scalable, way to collect syslog-based data.
  • #29: In addition to live events, .conf, docs, Answers, meetups, etc.