© 2017 SPLUNK INC.
MAY 9TH, 2017 | ZURICH
Data Obfuscation in Splunk Enterprise
Dirk Nitschke | Senior Sales Engineer
Forward-Looking Statements
During the course of this presentation, we may make forward-looking statements regarding future events or
the expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results could
differ materially. For important factors that may cause actual results to differ from those contained in our
forward-looking statements, please review our filings with the SEC.
The forward-looking statements made in this presentation are being made as of the time and date of its live
presentation. If reviewed after its live presentation, this presentation may not contain current or accurate
information. We do not assume any obligation to update any forward-looking statements we may make. In
addition, any information about our roadmap outlines our general product direction and is subject to change
at any time without notice. It is for informational purposes only and shall not be incorporated into any contract
or other commitment. Splunk undertakes no obligation either to develop the features or functionality
described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in
the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved.
Agenda
Best practices to protect your machine data
▶ The Drivers
▶ Role Based User Access
▶ Data-in-Flight
▶ Data-at-Rest
▶ Data Obfuscation within Splunk Enterprise
• Anonymization
• Pseudonymization
▶ Summing Up
▶ Demonstration
The Drivers
Risk minimization strategy
The Drivers
Different stakeholders have different requirements
Driver/Requirements matrix*
Requirements: Role based Access, Anonymization, Pseudonymization, Encryption, RAW Event Archival
Drivers: Workers Council, Reputational Risk, GDPR, Cross Country Access, PCI
*Examples only | Your legal department will assist you.
You need a flexible platform that fits your needs, even if your needs change!
Spoilt for Choice
What: Confidentiality, Integrity, Authenticity
Where: At Source, In Flight, At Rest, Presentation Layer
How: Anonymization, Pseudonymization
Impact: Usability, Maintainability, Cost
Role Based User Access
Integrate Users and Roles
Integrate authentication with LDAP and Active Directory.
Map LDAP & AD groups to flexible Splunk roles. Define any search as a filter.
▶ LDAP/AD users and groups are mapped to Splunk flexible roles
▶ Capabilities: Save Searches, Share Searches, Manage Users, Manage Indexes
▶ Filters, e.g.: NOT tag=PCI, app=ERP, …
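The following minimal Python sketch shows how group-to-role mapping plus a role search filter narrows what a user sees. It models the idea only and is not Splunk's implementation. The group names, role names, and the `tag`/`app` event fields are hypothetical. In Splunk itself the filter is a search string attached to the role, such as `NOT tag=PCI`.

```python
from dataclasses import dataclass
from typing import Callable

Event = dict[str, str]

@dataclass
class Role:
    name: str
    # Hypothetical stand-in for the role's search filter, as a predicate on an event.
    search_filter: Callable[[Event], bool]

# Hypothetical mapping of LDAP/AD groups to Splunk roles.
GROUP_TO_ROLE = {
    "cn=erp-analysts": Role("erp_analyst", lambda e: e.get("app") == "ERP"),  # app=ERP
    "cn=helpdesk": Role("helpdesk", lambda e: e.get("tag") != "PCI"),         # NOT tag=PCI
}

def visible_events(user_groups: list[str], events: list[Event]) -> list[Event]:
    """Keep events that pass the filter of at least one role mapped from the user's groups."""
    roles = [GROUP_TO_ROLE[g] for g in user_groups if g in GROUP_TO_ROLE]
    return [e for e in events if any(r.search_filter(e) for r in roles)]
```

A user with no mapped group sees nothing. A user in both groups sees the union of what each role allows. This sketch combines filters with OR; check that against how your Splunk version combines the search filters of multiple roles.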
Data-in-Flight
Ways to secure your connections to Splunk Enterprise
▶ Encryption and/or authentication using your own SSL certificates for:
• Communication between the browser and Splunk Web
• Communication from Splunk forwarders to indexers
• Other types of communication, such as communication between Splunk instances over the management port
Type of exchange | Client function | Server function | Encryption | Certificate Authentication | Common Name checking | Type of data exchanged
Browser to Splunk Web | Browser | Splunk Web | NOT enabled by default | dictated by client (browser) | dictated by client (browser) | search term results
Inter-Splunk communication | Splunk Web | splunkd | enabled by default | NOT enabled by default | NOT enabled by default | search term results
Forwarding | splunkd as a forwarder | splunkd as an indexer | NOT enabled by default | NOT enabled by default | NOT enabled by default | data to be indexed
Deployment server to indexers | splunkd as a forwarder | splunkd as an indexer | NOT enabled by default | NOT enabled by default | NOT enabled by default | Not recommended. Use Pass4SymmKey instead.
http://docs.splunk.com/Documentation/Splunk/latest/Security/AboutsecuringyourSplunkconfigurationwithSSL
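The "Certificate Authentication" and "Common Name checking" columns describe what a TLS client must switch on to trust the server. This is a generic illustration with Python's standard `ssl` module, not Splunk configuration. It builds a client context that requires a valid server certificate and checks that the certificate matches the host name. The optional CA bundle path is a hypothetical placeholder.

```python
import ssl
from typing import Optional

def make_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS client context that verifies the server certificate and host name."""
    # create_default_context() already enables CERT_REQUIRED and check_hostname.
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions
    # Stated explicitly for readability; these match the defaults set above.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx
```

The context is then used as `ctx.wrap_socket(sock, server_hostname="indexer.example.com")`, where the host name is a hypothetical example. Without `server_hostname`, host name checking cannot take place.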
Data-at-Rest
Integrity
Ways to ensure the integrity of your machine data stored in Splunk
▶ Compute a SHA256 hash for every slice in a hot bucket
▶ When a bucket rolls from hot to warm, create a SHA256 hash of the file containing the hashes of the individual slices
▶ Integrity can be verified from the CLI
▶ Enabled for an entire index
http://docs.splunk.com/Documentation/Splunk/latest/Security/Dataintegritycontrol
http://blogs.splunk.com/2015/10/28/data-integrity-is-back-baby/
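The two-level hashing scheme can be illustrated with `hashlib`. This is a simplified model of the idea, not Splunk's on-disk format. The slice size, the hash-file layout (one hex digest per line) and the function names are assumptions. In Splunk itself the feature is enabled per index with the `enableDataIntegrityControl` setting in indexes.conf and checked with `splunk check-integrity`.

```python
import hashlib

SLICE_SIZE = 128 * 1024  # assumed slice size, for this illustration only

def slice_hashes(raw: bytes, slice_size: int = SLICE_SIZE) -> list[str]:
    """SHA-256 digest of every fixed-size slice of the raw data (hot bucket)."""
    return [hashlib.sha256(raw[i:i + slice_size]).hexdigest()
            for i in range(0, len(raw), slice_size)]

def hash_file_digest(hashes: list[str]) -> str:
    """SHA-256 over the file of slice hashes, computed when the bucket rolls to warm."""
    hash_file = "\n".join(hashes).encode("ascii")
    return hashlib.sha256(hash_file).hexdigest()

def verify(raw: bytes, stored_hashes: list[str], stored_digest: str,
           slice_size: int = SLICE_SIZE) -> bool:
    """Recompute both levels and compare them with the stored values."""
    if hash_file_digest(stored_hashes) != stored_digest:
        return False  # the hash file itself was altered
    return slice_hashes(raw, slice_size) == stored_hashes
```

The second-level digest protects the list of slice hashes. Tampering therefore requires changing the raw data, the slice hashes and the final digest consistently.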
Data-at-Rest Encryption
Entire data set
▶ Encryption of all data Splunk writes to disk (index, raw data, metadata)
▶ Pros:
• Easy to implement with OS or storage-device means
• Covers all data
• Transparent to Splunk
▶ Cons:
• Limited granularity
• Performance overhead
• Limited security against rogue users (anyone with access to the running system sees the data in cleartext)
Data-at-Rest Encryption
Used and available in Splunk Cloud
https://www.vormetric.com/sites/default/files/wp-splunk-vormetric.pdf
Data Obfuscation within Splunk: Anonymization and Pseudonymization
What is Anonymization?
▶ Anonymization of data means processing it with the aim of irreversibly preventing the identification of the individual to whom it relates.
Example (before / after):
2016-12-24 09:00 host1 mm28522 login successful
2016-12-24 09:00 host1 ****** login successful
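The following minimal Python sketch reproduces the example. The user ID is replaced by a fixed mask and no mapping back to the original is kept, which is what makes the step irreversible. The user-ID pattern (two lowercase letters followed by five digits) is an assumption inferred from the single sample `mm28522`.

```python
import re

# Assumed user-ID format, inferred from the sample value "mm28522".
USER_ID = re.compile(r"\b[a-z]{2}\d{5}\b")

def anonymize(event: str) -> str:
    """Replace every user ID with a constant mask; nothing is stored to undo it."""
    return USER_ID.sub("******", event)

print(anonymize("2016-12-24 09:00 host1 mm28522 login successful"))
# 2016-12-24 09:00 host1 ****** login successful
```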
What is Pseudonymization?
▶ Pseudonymization of data means replacing any identifying characteristics of data with a pseudonym, or, in other words, a value which does not allow the data subject to be directly identified.
Example (before / after):
2016-12-24 09:00 host1 mm28522 login successful
2016-12-24 09:00 host1 0fc43cd589ec74dd login successful
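A common way to derive such a pseudonym is a keyed hash. This sketch uses HMAC-SHA256 from the Python standard library and keeps the first 16 hex characters, matching the length of the sample pseudonym. It does not reproduce the exact value shown, and the key is a hypothetical placeholder that must be managed securely. A keyed hash is used instead of a plain one because short, structured IDs like `mm28522` could otherwise be recovered by hashing every possible ID.

```python
import hashlib
import hmac
import re

USER_ID = re.compile(r"\b[a-z]{2}\d{5}\b")       # assumed format, as in the sample
SECRET_KEY = b"replace-with-a-managed-secret"    # hypothetical key

def pseudonym(value: str, key: bytes = SECRET_KEY, length: int = 16) -> str:
    """Deterministic keyed pseudonym: same input and key give the same output."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]

def pseudonymize(event: str, key: bytes = SECRET_KEY) -> str:
    """Replace each user ID with its pseudonym, so events stay correlatable."""
    return USER_ID.sub(lambda m: pseudonym(m.group(0), key), event)

line = "2016-12-24 09:00 host1 mm28522 login successful"
print(pseudonymize(line))  # the user ID is replaced by a 16-character hex pseudonym
```

Because the mapping is deterministic, searches such as "all logins of one user" still work on the pseudonymized data. Whoever holds the key can re-identify a known ID by recomputing its pseudonym.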
Anonymization
At Indexing Time
▶ Use SEDCMD or TRANSFORMS at indexing time
▶ Pros:
• Easy to implement and maintain, easy to use, low complexity
• No impact on licensing
▶ Cons:
• Modifies raw events
• Anonymization means less information is available
https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata
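In props.conf an index-time substitution has the form `SEDCMD-<class> = s/<regex>/<replacement>/g`. The sketch below emulates such an `s///` expression in Python so that a rule can be tried on sample events before it is deployed. It is only an approximation: it ignores escaped delimiters, numeric flags, sed's `&` and the `y` command. Python's regex dialect also differs in places from the one Splunk uses. The sourcetype and class name in the comment are hypothetical.

```python
import re

# Hypothetical props.conf stanza this emulates:
#   [my_sourcetype]
#   SEDCMD-mask_uid = s/\b[a-z]{2}\d{5}\b/******/g

def apply_sedcmd(expr: str, event: str) -> str:
    """Apply a simple sed-style 's/regex/replacement/flags' expression to one event."""
    if not expr.startswith("s/"):
        raise ValueError("only 's/regex/replacement/flags' is supported here")
    _, pattern, replacement, flags = expr.split("/", 3)
    if "/" in flags:
        raise ValueError("escaped or extra delimiters are not supported")
    # 'g' replaces all matches; without it only the first match is replaced.
    count = 0 if "g" in flags else 1
    return re.sub(pattern, replacement, event, count=count)

print(apply_sedcmd(r"s/\b[a-z]{2}\d{5}\b/******/g",
                   "2016-12-24 09:00 host1 mm28522 login successful"))
# 2016-12-24 09:00 host1 ****** login successful
```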
Pseudonymization
At Source (Application Layer)
▶ Pseudonymize data before Splunk picks it up
▶ Pros:
• Handled as early as possible in the process
• Data source owner is responsible
• Data-privacy challenge is also solved for the data stored at the source
▶ Cons:
• Individual solution required per data source/type/method
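Pseudonymizing at the source means the application never writes the identifier in cleartext. This sketch assumes the application logs through Python's standard `logging` module. A filter then rewrites each record before any handler formats it. The user-ID pattern and the key are hypothetical.

```python
import hashlib
import hmac
import logging
import re

USER_ID = re.compile(r"\b[a-z]{2}\d{5}\b")    # assumed user-ID format
KEY = b"replace-with-a-managed-secret"       # hypothetical key

class PseudonymizeFilter(logging.Filter):
    """Replace user IDs in the final log message with keyed 16-hex-char pseudonyms."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg and args first
        record.msg = USER_ID.sub(
            lambda m: hmac.new(KEY, m.group(0).encode("utf-8"), hashlib.sha256).hexdigest()[:16],
            message,
        )
        record.args = None  # args are already merged into msg
        return True  # keep the record, only its message was rewritten
```

Attach the filter to the handler that writes the log file with `handler.addFilter(PseudonymizeFilter())`, not to a logger. Logger-level filters do not see records propagated from child loggers.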
Presentation Layer
▶ Hide data at the presentation layer
▶ Locked-down user
• Pre-defined app with dashboard access only
• No search app, no raw search, no raw event drill-down
▶ | eval username="****"
▶ | eval username=sha256(username)
▶ or use your own custom search command
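The two eval variants differ in what the dashboard user can still do with the field. The sketch below mimics them in Python on result rows that have already been retrieved. It assumes that SPL's `sha256()` returns a hex-encoded SHA-256 of the string. Only the presented copy changes; the stored events are untouched. An unsalted hash of a low-entropy value such as a user ID can be reversed by brute force, so the hashed variant offers limited protection against a determined user.

```python
import hashlib

def mask(rows: list[dict], field: str) -> list[dict]:
    """Like `| eval username="****"`: constant value, no correlation possible."""
    return [{**row, field: "****"} for row in rows]

def hash_field(rows: list[dict], field: str) -> list[dict]:
    """Like `| eval username=sha256(username)`: same input gives the same output."""
    return [{**row, field: hashlib.sha256(row[field].encode("utf-8")).hexdigest()}
            for row in rows]

results = [{"username": "mm28522", "action": "login"},
           {"username": "mm28522", "action": "logout"}]
print(mask(results, "username")[0])
# {'username': '****', 'action': 'login'}
```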
Event Duplication
▶ Duplicate each event and apply SEDCMD or TRANSFORMS to the copy
▶ Store the original and the pseudonymized event in separate indexes (idx_cleartext, idx_pseudonym)
▶ Pros:
• Easy to implement and maintain, easy to use, low complexity
▶ Cons:
• Storage costs (can be limited with tsidx retention, at the cost of slower searches)
• License costs
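Event duplication routes two copies of every event to different indexes. In Splunk this is configured on the indexing tier. The Python sketch below models only the data flow. In-memory lists stand in for the two indexes, and the HMAC pseudonym function and its key are hypothetical. The sketch also makes the cost visible: every event is stored twice.

```python
import hashlib
import hmac
import re
from collections import defaultdict

USER_ID = re.compile(r"\b[a-z]{2}\d{5}\b")   # assumed user-ID format
KEY = b"replace-with-a-managed-secret"      # hypothetical key

def pseudonymize(event: str) -> str:
    """Replace user IDs with keyed 16-hex-char pseudonyms."""
    return USER_ID.sub(
        lambda m: hmac.new(KEY, m.group(0).encode("utf-8"), hashlib.sha256).hexdigest()[:16],
        event)

def ingest(events: list[str]) -> dict[str, list[str]]:
    """Write each event unchanged to idx_cleartext and a pseudonymized copy to idx_pseudonym."""
    indexes: dict[str, list[str]] = defaultdict(list)
    for event in events:
        indexes["idx_cleartext"].append(event)
        indexes["idx_pseudonym"].append(pseudonymize(event))
    return indexes
```

Access is then controlled per index through roles. Most users only get idx_pseudonym, and a small, restricted role can read idx_cleartext.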
Summary Index
▶ A scheduled summary search transforms the data and stores it in a new summary index (reading idx_cleartext, writing idx_pseudonym)
▶ Pros:
• Summary index data does not count against the license
• Everything is managed through the GUI
• Allows grouped aggregation (which can anonymize, too)
▶ Cons:
• The regular search consumes resources
• Breaks out-of-the-box CIM compatibility (source=search name, sourcetype=stash, original sourcetype moved to orig_sourcetype)
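A scheduled summary search typically aggregates, so individual events disappear from the summary. This Python sketch models one run of such a search. It groups events by sourcetype, host and action and emits count records. The records are tagged as described above: source set to the search name, sourcetype=stash, and the original sourcetype kept in orig_sourcetype. The event fields and the search name are hypothetical.

```python
from collections import Counter

def summarize(events: list[dict], search_name: str) -> list[dict]:
    """Aggregate events per (sourcetype, host, action); user-level detail is dropped."""
    counts = Counter((e["sourcetype"], e["host"], e["action"]) for e in events)
    return [
        {
            "host": host,
            "action": action,
            "count": n,
            "source": search_name,       # source = search name
            "sourcetype": "stash",       # summary events use the stash sourcetype
            "orig_sourcetype": st,       # original sourcetype is preserved here
        }
        for (st, host, action), n in sorted(counts.items())
    ]
```

Because the `user` field is not part of the grouping, it does not reach the summary at all. That is the "grouped aggregation (which can anonymize, too)" point.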
Input Layer
▶ Data is piped through a custom method, decentralized, using a modular input
▶ Pros:
• High flexibility regarding encryption, hashing and other methods and requirements
• Processing can be decentralized at each forwarder to distribute the processing load
▶ Cons:
• Scripting required for modular inputs
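A modular input is a script that Splunk runs; its standard output becomes event data. The sketch below shows only the processing core of such a script, under these assumptions:
- It reads lines from a source stream.
- It replaces the values of hypothetical `key=value` fields (named after the demo fields) with keyed hashes. The demo uses AES encryption instead.
- It writes the result to an output stream.

A real modular input must also handle the scheme introspection and configuration that Splunk passes to it. That part is omitted here.

```python
import hashlib
import hmac
import re
from typing import IO

# Hypothetical: protect the values of these key=value fields.
SENSITIVE = re.compile(r"\b(firstname|lastname|dob)=(\S+)")
KEY = b"replace-with-a-managed-secret"  # hypothetical key

def protect(line: str, key: bytes = KEY) -> str:
    """Replace each sensitive field value with a 16-hex-char keyed hash."""
    def repl(m: re.Match) -> str:
        digest = hmac.new(key, m.group(2).encode("utf-8"), hashlib.sha256).hexdigest()
        return f"{m.group(1)}={digest[:16]}"
    return SENSITIVE.sub(repl, line)

def stream(src: IO[str], dst: IO[str]) -> None:
    """Process line by line, so memory use stays flat on large inputs."""
    for line in src:
        dst.write(protect(line))
```

Inside the script this would be called as `stream(sys.stdin, sys.stdout)`. Running it on each forwarder spreads the hashing load, as described above.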
Summing Up
1. Many possible ways, each with pros and cons
2. Anonymization
• Data aggregation might be needed as an additional layer, because specific access to a specific file from a specific host can still allow identification of an individual
3. Pseudonymization
• Requires a proper concept, so that the pros and cons are known and accepted in advance and the impact and additional complexity are understood for production and operational use
4. Choose and mix
Demo: Data Obfuscation
Log File clear.log
Field | Description | Action to take
firstname | First Name | Encrypt with AES
lastname | Last Name | Encrypt with AES
dob | Date of Birth | Encrypt with AES
uid | Employee ID | Anonymize
Demo Scenario
▶ Encryption (Modular Input)
• Log file with sensitive data
• File Monitor input (UF) reads the log file data
• Modular input encrypts field values
• Data sent to the pipeline
▶ Decryption (Custom Search Command)
• Events in Splunk with encrypted field values
• User is authorized to use the custom search command
• Custom search command decrypts fields
▶ Anonymization (SEDCMD)
• Log file with sensitive data
• File Monitor input (UF) reads the log file data
• Pipeline applies SEDCMD and replaces data
• Data stored
Modular Input?
http://docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/ModInputsIntro
Modular Input? Splunkbase!
https://splunkbase.splunk.com/apps/#/search/Modular%20Input/
Protocol Data Input: Data Handler
▶ Different input protocols
▶ A custom data handler allows you to pre-process data
• Polyglot: many programming languages can be used, e.g. Java, JavaScript, Python, …
▶ Different output protocols
https://splunkbase.splunk.com/app/1901/
UF File Monitor
inputs.conf – outputs.conf
IDX Anonymization SEDCMD
props.conf
Create PDI Custom Data Handler
Receiver – Protocol Data Input
PDI Configuration – Protocols
PDI Configuration – Data Handler
▶ PDI custom data handler
• AESEncryptionHandler (Java)
▶ Parameters for the data handler
• REGEX: identifies the fields to encrypt
• *_KeyFile: encryption keys
Processed Raw Event: encrypted and anonymized
Custom Search Command: Decrypt data on the fly
Custom Search Command: Encrypt data on the fly
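The demo implements these commands with AES, so only users allowed to run the command see cleartext. The sketch below shows the authorization idea without an external crypto library, and it substitutes a different, plainly named technique: tokenization with an in-memory lookup table (a token vault). The role names and the permission check are hypothetical. A real custom search command would be built with Splunk's SDK and would protect its keys or vault properly.

```python
import secrets

class TokenVault:
    """Reversible pseudonymization via random tokens stored in a lookup table."""

    def __init__(self, authorized_roles: set[str]):
        self._to_token: dict[str, str] = {}
        self._to_value: dict[str, str] = {}
        self._authorized = authorized_roles

    def tokenize(self, value: str) -> str:
        """'Encrypt on the fly': the same value always maps to the same random token."""
        if value not in self._to_token:
            token = secrets.token_hex(8)
            while token in self._to_value:  # avoid (very unlikely) collisions
                token = secrets.token_hex(8)
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def detokenize(self, token: str, user_roles: set[str]) -> str:
        """'Decrypt on the fly': only for users holding an authorized role."""
        if not user_roles & self._authorized:
            raise PermissionError("user is not authorized to reveal cleartext")
        return self._to_value[token]
```

Unlike encryption, the tokens carry no information about the value. Re-identification is only possible through the vault, which then becomes the asset to protect.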
Statistics: Cleartext
Statistics: Pseudonymized?
Statistics: Decryption on the fly
.conf2017: The 8th Annual Splunk Conference
SEPT 25-28, 2017 | Walter E. Washington Convention Center, Washington, D.C. | conf.splunk.com
SAVE OVER $450
You will receive an email after registration opens with a link to save over $450 on the full conference rate. You’ll have 30 days to take advantage of this special promotional rate!
Take the Survey on Pony Poll
ponypoll.com/zurich2017
THANK YOU

SplunkLive! Zurich 2017 - Data Obfuscation in Splunk Enterprise


Editor's Notes

  • #8: Different stakeholders have different requirements for the same set of data. With a centralized logging solution, an organization can ensure that a standard set of guidelines is followed and that no department has to reinvent the wheel for the data sources it owns. However, requirements may clash: for example, the business and a regulation may require that the chain of custody is never broken, while at the same time the data should be pseudonymized. In that case, a combination of several techniques needs to be applied.
  • #12: Splunk allows you to extend your existing AAA systems into the Splunk search system for both security and convenience. Splunk can connect to your LDAP based systems, like AD, and directly map your groups and users to Splunk users and roles. From there, define what users and groups can access Splunk, which apps and searches they have access to, and automatically (and transparently) filter their results by any search you can define. That allows you to not only exclude whole events that are inappropriate for a user to see, but also mask or hide specific fields in the data – such as customer names or credit card numbers – from those not authorized to see the entire event.
  • #16: For stored data, one question is how to detect whether it has been changed after it was initially stored. This could be the famous flipped bit, but also a change made accidentally or deliberately by a user. For this, Splunk offers the so-called Data Integrity Control: SHA-256 hashes are computed for the slices in the hot buckets of an index and stored in a file. When the buckets roll to the warm state, a checksum of the file containing the hashes is additionally computed and stored. The integrity of the slices can then be verified on the command line with the command splunk check-integrity.
  • #17: To achieve confidentiality, you can consider encrypting all data that Splunk stores. Many storage systems and file systems offer this option, which has several advantages: it is easy to implement; all data can be encrypted; and encryption and decryption are transparent to Splunk and require no changes. As usual, this method also has its downsides: the granularity is usually limited to entire file systems or LUNs; encryption at the operating system level can affect system performance; and because encryption is transparent to the application, it only protects against selected threats such as the theft of a storage medium. Anyone with access to the running system can access the data in cleartext.
  • #18: Additional tools that provide extended access control on top of transparent encryption can help here, for example Vormetric Transparent Encryption. But that is a topic of its own.
  • #23: On the right-hand side we see the data flow through Splunk. An application delivers data, which is sent to Splunk through a so-called input. The data then passes through the various pipelines in Splunk and is finally stored in an index. The user accesses the data through the search head and gets a view of the results.   Out of the box, Splunk can modify raw events before they are written to an index. This is done with SEDCMD or TRANSFORMS settings, which are applied in the typing pipeline.   In the example, for a nine-digit number following ssn=, the first five digits are replaced with x, and the modified raw event is then stored.
  • #24: On the right-hand side we see the data flow through Splunk. An application delivers data, which is sent to Splunk through a so-called input. The data then passes through the various pipelines in Splunk and is finally stored in an index. The user accesses the data through the search head and gets a view of the results.   Out of the box, Splunk can modify raw events before they are written to an index. This is done with SEDCMD or TRANSFORMS settings, which are applied in the typing pipeline.   This is easy to implement and maintain and not particularly complex. On the other hand, the raw events are modified before they are stored and, as mentioned before, anonymization means that information is lost.
  • #26: The application sits at the start of the data flow. Where possible, pseudonymization can be performed before Splunk processes the data at all.   The advantages of this method are that pseudonymization happens early in the overall process and that responsibility lies with the data owners.   On the other hand, every application has to be checked as to whether it can provide the data in this form at all.
  • #27: We can also act at the other end of the data flow, i.e. in the presentation layer, and, for example, replace a user name with a hash or use a custom search command, for example one that encrypts the user name. However, this only makes sense if the user's capabilities are restricted at the same time, so that they have no access to the unmodified data.
  • #28: What can we do at the inputs?   Event routing allows events to be duplicated and, for example, written to different indexes. One index contains pseudonymized data, the other cleartext. Access rules on the indexes control which users can access which index.   This is easy to configure and maintain.   However, the indexed data volume increases, which leads to higher license and storage costs. On the storage side, tsidx retention can help, but it can lead to longer search times.
  • #29: Alternatively, you can define summary index searches whose results contain no sensitive data. Here, too, access to the index with the cleartext data is restricted.   The advantage is that summary indexes do not count against the daily indexed data volume. On the other hand, summary searches consume system resources, and analysis can become harder: the sourcetype changes to "stash", and the original sourcetype is found in the field orig_sourcetype.
  • #30: If the approaches presented so far do not meet the requirements, you can start at the level of the inputs.   Besides file inputs, network inputs and scripted inputs, Splunk provides so-called modular inputs. These have the advantage of being very flexible. However, you have to do the necessary scripting or programming yourself. The Splunk Add-on Builder can help here. The demonstration shows an example of how such a modular input can be used to obfuscate data.
  • #34: Here we see the log file. Each event consists of one line. The line starts with the timestamp. The remaining data consists of key/value pairs separated by &.   The fields firstname, lastname and dob are considered sensitive and are to be encrypted with AES. The field uid is to be anonymized.
  • #35: The requirements have several parts:   We have a search head and an indexer. A log file on a universal forwarder contains data, part of which has to be encrypted. We want to implement the idea described earlier of using a modular input to pseudonymize the data, while keeping the effort as low as possible, i.e. using as much functionality as possible that already exists in Splunk. The basic idea is therefore to have a file monitor on the universal forwarder watch the log file and then send the data to the modular input. Authorized persons should be able to decrypt the previously encrypted data. This is implemented with a custom search command and the corresponding permissions. Parts of the data have to be anonymized. This is done simply with SEDCMD.
  • #36: But what is a modular input, and how do you build one? First of all, a modular input is another way to get data into Splunk. If we look at the Splunk documentation on modular inputs, one of the listed use cases is reformatting complex data, which includes obfuscation, for example.
  • #37: As usual, you can try to find out whether someone has already solved the problem. Splunkbase offers numerous examples of modular inputs.
  • #38: The Protocol Data Inputs app caught my particular attention, because this modular input can receive data over various protocols such as TCP, UDP and HTTP. This data is then processed by a so-called data handler. The data handler can be written in different programming languages such as Java, JavaScript, Python and others. The data is then output in different ways as well, namely to STDOUT, over TCP or to the HTTP Event Collector.   Input and output are thus already taken care of, and you can concentrate on processing the data, i.e. encrypting it. That looks like a good approach for our task!
  • #39: A file input is defined on the universal forwarder. The data is sent in raw format to a server/port.
  • #40: The corresponding SEDCMD for anonymizing the uid is defined in props.conf on the indexer.
  • #41: Now it is time to create the data handler for the Protocol Data Input.
  • #42: Next, the Protocol Data Input has to be defined and configured. The PDI listens on port 41002, to which the UF sends the data in raw format.
  • #43: Data is received on TCP port 41002 and written to STDOUT (and thus ends up in the data pipeline).
  • #44: The data is processed by the custom data handler. In this case, it is a Java class that encrypts the incoming data with AES.   Configuration parameters can be passed to the custom data handler in JSON format. This data handler lets you describe with a regular expression which data should be encrypted, namely the data in the capturing group encrypt. The groups pre and post describe text that is output before and after the encrypted and additionally Base64-encoded text. Other parameters include, for example, the name of the key file.
  • #51: And of course, the live expression of our community is our user conference. Journalists last year said it was more like a family reunion than a technology conference, and we take that as a compliment. It's the best place to share best practices and new ideas and to learn directly from the smartest people in the Splunk ecosystem. It doesn't matter whether you're just getting started with Splunk or are a veteran user: everyone learns something and gets reenergized at .conf. 4 inspired keynotes; 165+ breakout sessions addressing all areas and levels of Operational Intelligence: IT, Business Analytics, Mobile, Cloud, IoT, Security… and MORE! 30+ hours of invaluable networking time with industry thought leaders, technologists, and other Splunk Ninjas and Champions waiting to share their business wins with you! Join the 50%+ of Fortune 100 companies who attended .conf2015 to get hands-on with Splunk. You'll be surrounded by thousands of other like-minded individuals who are ready to share exciting, cutting-edge use cases and best practices. You can also take a deep dive into all things Splunk together with your favorite Splunkers. Head back to your company with both practical and inspired new uses for Splunk, ready to unlock the unimaginable power of your data! Arrive in Orlando a Splunk user, leave Orlando a Splunk Ninja! REGISTRATION IS OPEN; sessions will be posted by the end of June.
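Note #16 describes a two-level integrity scheme: SHA-256 hashes per slice, plus a checksum over the file of hashes once the bucket rolls to warm. The following Python sketch illustrates only that idea. The slice contents, the newline-joined hash list and the helper names (`slice_hashes`, `hash_list_checksum`, `check_integrity`) are assumptions made for illustration; they do not reproduce Splunk's on-disk format or the behavior of `splunk check-integrity`.

```python
import hashlib
from typing import List


def slice_hashes(slices: List[bytes]) -> List[str]:
    """Level 1: one SHA-256 digest per slice of raw data (hypothetical helper)."""
    return [hashlib.sha256(s).hexdigest() for s in slices]


def hash_list_checksum(hashes: List[str]) -> str:
    """Level 2: a SHA-256 checksum over the stored list of slice hashes.

    Joining the hex digests with newlines is an assumed serialization.
    """
    return hashlib.sha256("\n".join(hashes).encode("ascii")).hexdigest()


def check_integrity(slices: List[bytes], hashes: List[str], checksum: str) -> List[int]:
    """Return the indices of slices whose content no longer matches its stored hash.

    Raises ValueError if the hash list itself changed after the checksum was taken.
    """
    if hash_list_checksum(hashes) != checksum:
        raise ValueError("stored slice hashes were modified")
    if len(slices) != len(hashes):
        raise ValueError("number of slices and hashes differ")
    return [
        i
        for i, (data, stored) in enumerate(zip(slices, hashes))
        if hashlib.sha256(data).hexdigest() != stored
    ]


if __name__ == "__main__":
    slices = [b"event 1\n", b"event 2\n", b"event 3\n"]
    hashes = slice_hashes(slices)
    checksum = hash_list_checksum(hashes)
    slices[1] = b"event 2 (tampered)\n"  # simulate a flipped bit or a manual edit
    print(check_integrity(slices, hashes, checksum))  # -> [1]
```

In this sketch, the second level only proves that the hash list is unchanged. Someone who can rewrite the slices, the hash list and the checksum together would go undetected, so the checksum needs to be protected or stored separately.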
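Note #23 describes a SEDCMD that masks the first five digits of a nine-digit number following `ssn=`. In props.conf, such a rule takes the form `SEDCMD-<class> = s/<regex>/<replacement>/g`. The Python sketch below applies the same kind of substitution with the standard `re` module so that its effect can be checked offline. The pattern and the function name `anonymize_ssn` are illustrative assumptions rather than the configuration used in the talk, and the `(?!\d)` guard, which leaves longer numbers untouched, is an addition of this sketch.

```python
import re

# Sed-style rule s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g expressed as a Python regex,
# plus a (?!\d) guard so that only numbers with exactly nine digits are masked.
SSN_PATTERN = re.compile(r"ssn=\d{5}(\d{4})(?!\d)")


def anonymize_ssn(raw_event: str) -> str:
    """Replace the first five digits after 'ssn=' with 'x', keeping the last four."""
    return SSN_PATTERN.sub(r"ssn=xxxxx\1", raw_event)


if __name__ == "__main__":
    print(anonymize_ssn("user=bob ssn=123456789 action=login"))
    # -> user=bob ssn=xxxxx6789 action=login
```

As note #24 points out, the masked digits are gone for good once the event is indexed. This is anonymization, not pseudonymization.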
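Note #26 leaves open how an application should pseudonymize data before it reaches Splunk. One common choice is keyed hashing with HMAC-SHA256, sketched below with Python's standard library. This is a swapped-in technique, not the AES encryption used later in the demonstration. It is one-way, so a decrypting search command cannot reverse it. It is also deterministic, so per-user counts and similar statistics survive, which is the question raised on the "Pseudonymized Statistics?" slide. The key value, the 16-character truncation and the names `pseudonymize` and `pseudonymize_all` are assumptions.

```python
import hashlib
import hmac
from collections import Counter
from typing import List


def pseudonymize(value: str, key: bytes) -> str:
    """Return a deterministic keyed pseudonym for value (HMAC-SHA256, hex, truncated).

    The same value and key always give the same token. Without the key, a
    guessed value cannot be turned into the matching token.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # 64 bits: shorter tokens are easier to read but collide sooner


def pseudonymize_all(values: List[str], key: bytes) -> List[str]:
    """Pseudonymize a list of values with the same key (hypothetical helper)."""
    return [pseudonymize(v, key) for v in values]


if __name__ == "__main__":
    key = b"example-key-held-by-the-data-owner"  # hypothetical key for illustration
    users = ["alice", "bob", "alice", "carol", "alice"]
    tokens = pseudonymize_all(users, key)
    # Per-user counts have the same shape before and after pseudonymization.
    print(sorted(Counter(users).values()) == sorted(Counter(tokens).values()))  # -> True
```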
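Note #34 describes the demo log file: one event per line, a leading timestamp, then key/value pairs separated by `&`. The fields firstname, lastname and dob are to be encrypted, and uid is to be anonymized. The parser below is a sketch that makes these assumptions: the timestamp contains no spaces, a single space separates it from the pairs, and keys are unique. The sample line and the helper names `parse_line` and `classify` are made up for illustration.

```python
from typing import Dict, Tuple

# Field roles as described in note #34.
ENCRYPT_FIELDS = ("firstname", "lastname", "dob")
ANONYMIZE_FIELDS = ("uid",)


def parse_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split '<timestamp> k1=v1&k2=v2...' into the timestamp and the key/value pairs.

    Assumes the timestamp has no internal spaces (e.g. ISO 8601 with 'T').
    """
    timestamp, _, payload = line.strip().partition(" ")
    fields: Dict[str, str] = {}
    for pair in payload.split("&"):
        if not pair:
            continue  # tolerate a trailing or doubled '&'
        key, _, value = pair.partition("=")
        fields[key] = value
    return timestamp, fields


def classify(fields: Dict[str, str]) -> Dict[str, str]:
    """Map each field name to its treatment: 'encrypt', 'anonymize' or 'keep'."""
    return {
        key: "encrypt" if key in ENCRYPT_FIELDS
        else "anonymize" if key in ANONYMIZE_FIELDS
        else "keep"
        for key in fields
    }


if __name__ == "__main__":
    sample = "2017-05-09T10:15:00 uid=4711&firstname=Anna&lastname=Muster&dob=1980-01-01&action=login"
    ts, fields = parse_line(sample)
    print(ts, classify(fields))
```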
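Note #44 explains that the Java AESEncryptionHandler is driven by a regular expression with the capturing groups pre, encrypt and post, and that it outputs pre + Base64(encrypted value) + post. The Python sketch below mirrors only that rewriting step. It is not the handler's code, and the field list in `FIELD_REGEX` and the name `protect` are assumptions. Python's standard library has no AES, so the cipher is passed in as a function. The usage example plugs in an HMAC-SHA256 digest, a one-way swap-in that pseudonymizes but cannot be decrypted. The demonstration relies on AES precisely so that authorized users can decrypt later.

```python
import base64
import hashlib
import hmac
import re
from typing import Callable

# Hypothetical handler configuration: protect the values of firstname, lastname and dob.
#   pre     = field name and '=' (kept as is)
#   encrypt = the value to protect
#   post    = the '&' separator, if any (kept as is)
FIELD_REGEX = re.compile(r"(?P<pre>\b(?:firstname|lastname|dob)=)(?P<encrypt>[^&]*)(?P<post>&?)")


def protect(raw: str, regex: re.Pattern, transform: Callable[[bytes], bytes]) -> str:
    """Rewrite every match as pre + Base64(transform(encrypt)) + post."""

    def _replace(match: re.Match) -> str:
        protected = transform(match.group("encrypt").encode("utf-8"))
        token = base64.b64encode(protected).decode("ascii")
        return match.group("pre") + token + match.group("post")

    return regex.sub(_replace, raw)


if __name__ == "__main__":
    key = b"example-key"  # hypothetical; stands in for the handler's key file

    def hmac_transform(value: bytes) -> bytes:
        # One-way stand-in for AES encryption; the result cannot be decrypted.
        return hmac.new(key, value, hashlib.sha256).digest()

    line = "2017-05-09T10:15:00 uid=4711&firstname=Anna&lastname=Muster&dob=1980-01-01"
    print(protect(line, FIELD_REGEX, hmac_transform))
```

Passing the transform in keeps the regex logic independent of the cipher. Swapping in a real AES function from a cryptography library would leave `protect` unchanged.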