SplunkLive! Zurich 2017 - Data Obfuscation in Splunk Enterprise

© 2017 SPLUNK INC.
MAY 9TH, 2017 | ZURICH

Data Obfuscation in Splunk Enterprise
Dirk Nitschke | Senior Sales Engineer

Forward-Looking Statements
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC.
The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved.

Agenda
▶ The Drivers
▶ Role-Based User Access
▶ Data-in-Flight
▶ Data-at-Rest
▶ Data Obfuscation within Splunk Enterprise
• Anonymization
• Pseudonymization
▶ Summing Up
▶ Demonstration

Best practices to protect your machine data

The Drivers
Risk minimization strategy

The Drivers
Different stakeholders have different requirements
Driver / Requirements*
Drivers: Workers' Council, Reputational Risk, GDPR, Cross-Country Access, PCI
Requirements: Role-Based Access, Anonymization, Pseudonymization, Encryption, RAW Event Archival
*Examples only | Your legal department will assist you.

You need a flexible platform that fits your needs, even when your needs change!

Spoilt for Choice
What: Confidentiality, Integrity, Authenticity
Where: At Source, In Flight, At Rest, Presentation Layer
How: Anonymization, Pseudonymization
Impact: Usability, Maintainability, Cost

Role-Based User Access

Integrate Users and Roles
Integrate authentication with LDAP and Active Directory.
[Diagram: LDAP/AD users and groups are mapped to flexible Splunk roles. Roles grant capabilities such as Save Searches, Share Searches, Manage Users and Manage Indexes, and can carry search filters such as NOT tag=PCI or app=ERP, …]
Map LDAP & AD groups to flexible Splunk roles. Define any search as a filter.
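To make the interplay of group mapping and search filters concrete, here is a minimal in-memory sketch in Python. Everything in it is hypothetical: the group DNs, role names and event fields are made up, the filters are Python predicates standing in for Splunk search filters, and treating multiple roles as OR-combined is an assumption of this sketch. In Splunk itself the mapping and the filters are configured on the roles and enforced by the search engine, not by user code.

```python
from typing import Callable, Dict, List, Set

Event = Dict[str, str]
SearchFilter = Callable[[Event], bool]

# Hypothetical LDAP/AD group -> Splunk role mapping.
GROUP_TO_ROLE: Dict[str, str] = {
    "cn=erp-support,ou=groups": "erp_analyst",
    "cn=pci-auditors,ou=groups": "pci_auditor",
}

# Hypothetical role search filters, as predicates standing in for
# "app=ERP NOT tag=PCI" and "tag=PCI".
ROLE_FILTERS: Dict[str, SearchFilter] = {
    "erp_analyst": lambda e: e.get("app") == "ERP" and e.get("tag") != "PCI",
    "pci_auditor": lambda e: e.get("tag") == "PCI",
}


def roles_for(groups: Set[str]) -> Set[str]:
    """Resolve a user's directory groups to Splunk roles; unknown groups are ignored."""
    return {GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE}


def visible_events(groups: Set[str], events: List[Event]) -> List[Event]:
    """Events the user may search: an event passes if any role filter accepts it.

    A user without a mapped role sees nothing (any() over no filters is False).
    """
    filters = [ROLE_FILTERS[r] for r in sorted(roles_for(groups))]
    return [e for e in events if any(f(e) for f in filters)]


if __name__ == "__main__":
    sample = [
        {"app": "ERP", "tag": "PCI"},
        {"app": "ERP", "tag": "ops"},
        {"app": "CRM", "tag": "PCI"},
    ]
    print(visible_events({"cn=erp-support,ou=groups"}, sample))
```

The deny-by-default behavior (no role, no events) mirrors the intent of the slide: access is granted only through an explicit group-to-role mapping.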

Data-in-Flight

Data-in-Flight
Ways to secure your connections to Splunk Enterprise
▶ Encryption and/or authentication using your own SSL certificates for:
• Communication between the browser and Splunk Web
• Communication from Splunk forwarders to indexers
• Other types of communication, such as communication between Splunk instances over the management port
Type of exchange | Client function | Server function | Encryption | Certificate authentication | Common Name checking | Type of data exchanged
Browser to Splunk Web | Browser | Splunk Web | NOT enabled by default | Dictated by client (browser) | Dictated by client (browser) | Search term results
Inter-Splunk communication | Splunk Web | splunkd | Enabled by default | NOT enabled by default | NOT enabled by default | Search term results
Forwarding | splunkd as a forwarder | splunkd as an indexer | NOT enabled by default | NOT enabled by default | NOT enabled by default | Data to be indexed
Deployment server to indexers | splunkd as a forwarder | splunkd as an indexer | NOT enabled by default | NOT enabled by default | NOT enabled by default | Not recommended. Use pass4SymmKey instead.
http://docs.splunk.com/Documentation/Splunk/latest/Security/AboutsecuringyourSplunkconfigurationwithSSL
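What the three security columns mean on the client side (encryption, certificate authentication, name checking) can be illustrated with Python's standard `ssl` module. This is a generic TLS sketch, not Splunk configuration: Splunk is configured through its own .conf files as described in the linked documentation. The function names and file paths are hypothetical. Python's `check_hostname` matches the server name against the names in the server certificate, which plays the role of the table's Common Name checking.

```python
import socket
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context that encrypts, verifies the server certificate, and checks its name."""
    # PROTOCOL_TLS_CLIENT turns on CERT_REQUIRED and check_hostname by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # trust only your own CA
    else:
        ctx.load_default_certs()  # fall back to the system trust store
    if cert_file:
        # Present a client certificate so the server can authenticate us, too.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def open_tls(host: str, port: int, ctx: ssl.SSLContext) -> ssl.SSLSocket:
    """Connect and complete the TLS handshake; raises ssl.SSLError if verification fails."""
    raw = socket.create_connection((host, port), timeout=10)
    try:
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

For example, `open_tls("splunk.example.com", 8089, make_client_context("/path/to/myCA.pem"))` would connect to a hypothetical host on the default splunkd management port and fail the handshake unless the server presents a certificate issued by your CA for that name.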

Data-at-Rest

Integrity
Ways to ensure the integrity of your machine data stored in Splunk
▶ Compute a SHA256 hash for every slice in the hot bucket
▶ When a bucket rolls from hot to warm, create a SHA256 hash of the file containing the hashes of the individual slices
▶ Verify integrity from the CLI
▶ Enable for an entire index
http://docs.splunk.com/Documentation/Splunk/latest/Security/Dataintegritycontrol
http://blogs.splunk.com/2015/10/28/data-integrity-is-back-baby/
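The two-level hashing scheme in the bullets can be modelled in a few lines of Python to show why it detects tampering with either the slices or the hash file. The in-memory slices, the one-digest-per-line file layout and the function names are assumptions for illustration; this is not Splunk's on-disk format, and in practice the check is run with Splunk's own CLI as described in the linked documentation.

```python
import hashlib
from typing import List, Sequence, Tuple


def slice_hashes(slices: Sequence[bytes]) -> List[str]:
    """Level 1: one SHA256 digest per slice written to the hot bucket."""
    return [hashlib.sha256(s).hexdigest() for s in slices]


def seal_bucket(slices: Sequence[bytes]) -> Tuple[bytes, str]:
    """On roll to warm: write the slice hashes to a file and hash that file (level 2)."""
    hash_file = "".join(h + "\n" for h in slice_hashes(slices)).encode("ascii")
    return hash_file, hashlib.sha256(hash_file).hexdigest()


def verify_bucket(slices: Sequence[bytes], hash_file: bytes, sealed: str) -> bool:
    """True only if neither the hash file nor any slice was modified."""
    if hashlib.sha256(hash_file).hexdigest() != sealed:
        return False  # the file of slice hashes itself was altered
    expected = hash_file.decode("ascii").splitlines()
    return slice_hashes(slices) == expected  # detects altered, added or removed slices
```

Hashing the hash file means an attacker who edits a slice cannot simply recompute its level-1 digest: the sealed level-2 digest would no longer match.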

Data-at-Rest Encryption
Entire data set
▶ Encryption of all data Splunk writes to disk (index, raw data, metadata)
▶ Pros:
• Easy to implement with OS- or device-level encryption
• Covers all data
• Transparent to Splunk
▶ Cons:
• Limited granularity
• Performance overhead
• Limited protection against rogue users

Data-at-Rest Encryption
Used and available in Splunk Cloud
https://www.vormetric.com/sites/default/files/wp-splunk-vormetric.pdf

Data Obfuscation within Splunk
Anonymization
Pseudonymization

What is Anonymization?
▶ Anonymization of data means processing it with the aim of irreversibly preventing the identification of the individual to whom it relates.
Before: 2016-12-24 09:00 host1 mm28522 login successful
After: 2016-12-24 09:00 host1 ****** login successful

What is Pseudonymization?
▶ Pseudonymization of data means replacing any identifying characteristics of data with a pseudonym, or, in other words, a value which does not allow the data subject to be directly identified.
Before: 2016-12-24 09:00 host1 mm28522 login successful
After: 2016-12-24 09:00 host1 0fc43cd589ec74dd login successful
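The slide does not say how the pseudonym 0fc43cd589ec74dd was derived. One common approach, assumed here, is a keyed hash: HMAC-SHA256 of the identifier, truncated to 16 hex characters to match the example's length (it will not reproduce that exact value). Because the function is deterministic, the same person always gets the same pseudonym, so events can still be correlated. Using a secret key matters: a plain hash of a short employee ID could be reversed by hashing every plausible ID. Whoever holds the key can recompute pseudonyms for known IDs, so the key must be protected.

```python
import hashlib
import hmac


def pseudonym(value: str, key: bytes, length: int = 16) -> str:
    """Deterministic pseudonym: HMAC-SHA256 of the value, truncated to `length` hex chars."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def pseudonymize_user(event: str, user: str, key: bytes) -> str:
    """Replace one known identifier in an event line with its pseudonym."""
    return event.replace(user, pseudonym(user, key))


if __name__ == "__main__":
    key = b"example-key-keep-this-secret"  # hypothetical; store real keys securely
    line = "2016-12-24 09:00 host1 mm28522 login successful"
    print(pseudonymize_user(line, "mm28522", key))
```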

Anonymization

At Indexing Time
▶ Use SEDCMD or TRANSFORMS at indexing time
▶ Pros:
• Easy to implement and maintain, easy to use, low complexity
• No impact on licensing
▶ Cons:
• Modifies raw events
• Anonymization means less information is available
https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata
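A SEDCMD in props.conf applies a sed-style substitution to the raw event at indexing time, with the form `SEDCMD-<class> = s/<regex>/<replacement>/g`. Before deploying such a rule it can help to preview its effect on sample events in Python. The rule and class name in the comment are hypothetical, and the pattern assumes employee IDs look like "mm" plus five digits, inferred from the single example on the previous slides. Python's `re` and Splunk's regex engine should agree for a simple pattern like this one, but verify against sample data in Splunk.

```python
import re

# Python equivalent of a hypothetical props.conf rule for the sourcetype:
#   SEDCMD-mask_uid = s/\bmm\d{5}\b/******/g
# The pattern assumes IDs look like "mm" followed by five digits.
UID_PATTERN = re.compile(r"\bmm\d{5}\b")


def anonymize(raw: str) -> str:
    """Mask every employee ID in the raw event; no mapping back is kept."""
    return UID_PATTERN.sub("******", raw)


if __name__ == "__main__":
    samples = [
        "2016-12-24 09:00 host1 mm28522 login successful",
        "2016-12-24 09:01 host2 service restarted",
    ]
    for raw in samples:
        print(anonymize(raw))
```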

Pseudonymization

Application Layer
At Source
▶ Pseudonymize data before Splunk picks it up
▶ Pros:
• Handled as early as possible in the process
• Data source owner is responsible
• Data privacy challenge is solved for the data stored at the source as well
▶ Cons:
• Individual solution required per data source, type, or method
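Pseudonymization at the source can be sketched for an application that logs through Python's standard `logging` module: a filter on the logger rewrites each message before any handler writes it, so the clear-text ID never reaches the log file that Splunk monitors. The ID pattern, the key and the truncated HMAC-SHA256 pseudonym are assumptions for illustration; as the cons above note, every data source needs its own solution.

```python
import hashlib
import hmac
import logging
import re

UID_PATTERN = re.compile(r"\bmm\d{5}\b")  # hypothetical employee-ID format


class PseudonymizingFilter(logging.Filter):
    """Replaces employee IDs with keyed pseudonyms before the record is written."""

    def __init__(self, key: bytes) -> None:
        super().__init__()
        self._key = key

    def _pseudonym(self, match: "re.Match[str]") -> str:
        uid = match.group(0).encode("utf-8")
        return hmac.new(self._key, uid, hashlib.sha256).hexdigest()[:16]

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message once, rewrite it, and clear args so it is not re-formatted.
        record.msg = UID_PATTERN.sub(self._pseudonym, record.getMessage())
        record.args = ()
        return True  # never drop the record, only rewrite it


if __name__ == "__main__":
    log = logging.getLogger("app.audit")
    log.addFilter(PseudonymizingFilter(b"example-key"))  # hypothetical key
    log.addHandler(logging.StreamHandler())
    log.setLevel(logging.INFO)
    log.info("host1 %s login successful", "mm28522")
```

A filter attached to a logger only sees records logged directly through that logger, not records propagated from child loggers, so it has to be attached wherever sensitive messages originate.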

Presentation Layer
▶ Hide data at the presentation layer
▶ Locked-down users
• Predefined app with dashboard access only
• No search app, no raw search, no drill-down to raw events
▶ | eval username="****"
▶ | eval username=sha256(username)
▶ Or use your own custom search command
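The per-record logic of such a custom search command is small. The sketch below mirrors the two `eval` variants on the slide in plain Python; the function name and modes are hypothetical. In the Splunk SDK for Python, logic like this would typically live in the `stream()` method of a `StreamingCommand` subclass; that wiring and the command's registration are left out here.

```python
import hashlib
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]


def obfuscate_field(records: Iterable[Record], field: str, mode: str = "sha256") -> Iterator[Record]:
    """Per-record logic of a search-time obfuscation command.

    mode="mask"   behaves like | eval <field>="****"
    mode="sha256" behaves like | eval <field>=sha256(<field>)
    Records without the field pass through unchanged.
    """
    if mode not in ("mask", "sha256"):
        raise ValueError(f"unknown mode: {mode!r}")
    for record in records:
        if field in record:
            record = dict(record)  # copy, so the caller's data is not modified
            if mode == "mask":
                record[field] = "****"
            else:
                record[field] = hashlib.sha256(record[field].encode("utf-8")).hexdigest()
        yield record
```

Two caveats follow from the design: the raw events still contain clear text, which is why the locked-down role matters, and an unkeyed SHA256 of a short username can be reversed by hashing candidate names, so it is weaker than it looks.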

Event Duplication
▶ Duplicate each event and apply SEDCMD or TRANSFORMS to the copy
▶ Store the original and the pseudonymized event in separate indexes (idx_cleartext, idx_pseudonym)
▶ Pros:
• Easy to implement and maintain, easy to use, low complexity
▶ Cons:
• Storage costs (can be limited with tsidx retention, at the cost of slower searches)
• License costs
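The resulting data flow can be modelled in Python: every incoming event produces two indexed copies. In Splunk, one way to obtain the second copy at ingest is the `CLONE_SOURCETYPE` setting in transforms.conf, followed by SEDCMD or TRANSFORMS on the cloned sourcetype and routing to the second index; check the transforms.conf documentation for the exact settings. The sketch only models the outcome; the ID pattern, key and pseudonym function are hypothetical.

```python
import hashlib
import hmac
import re
from typing import Dict, List

UID_PATTERN = re.compile(r"\bmm\d{5}\b")  # hypothetical employee-ID format


def duplicate_event(raw: str, key: bytes) -> List[Dict[str, str]]:
    """Route the original to idx_cleartext and a pseudonymized copy to idx_pseudonym."""

    def pseudonym(match: "re.Match[str]") -> str:
        return hmac.new(key, match.group(0).encode("utf-8"), hashlib.sha256).hexdigest()[:16]

    return [
        {"index": "idx_cleartext", "_raw": raw},
        {"index": "idx_pseudonym", "_raw": UID_PATTERN.sub(pseudonym, raw)},
    ]


def indexed_bytes(events: List[Dict[str, str]]) -> int:
    """Total raw size across all copies, which storage and license volume roughly track."""
    return sum(len(e["_raw"].encode("utf-8")) for e in events)
```

Because both copies are indexed, the indexed volume is at least twice the original, which is the storage and license cost listed above.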

Summary Index
▶ A scheduled summary search transforms the data and stores it in a new summary index (idx_cleartext -> idx_pseudonym)
▶ Pros:
• Summary index data does not count against the license
• Everything managed from the GUI
• Allows grouped aggregation (and thus anonymization, too)
▶ Cons:
• The scheduled search consumes resources regularly
• Breaks out-of-the-box CIM compatibility (source=search name, sourcetype=stash, original sourcetype moved to orig_sourcetype)
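Grouped aggregation is what lets a summary index double as anonymization: only the group-by fields and a count survive. The Python sketch below models roughly what a scheduled search such as `index=idx_cleartext | stats count by host, action` would write to the summary index. Field names and sample data are hypothetical, and Splunk adds its own fields to summary events (such as the stash sourcetype noted above), which the sketch ignores.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Union

Row = Dict[str, Union[str, int]]


def summarize(events: Iterable[Dict[str, str]], group_by: Sequence[str]) -> List[Row]:
    """Aggregate events into summary rows: the group-by fields plus a count.

    Every field not listed in group_by (for example a user name) is dropped,
    which is what makes aggregation usable as a form of anonymization.
    """
    counts = Counter(tuple(e.get(f, "") for f in group_by) for e in events)
    rows: List[Row] = []
    for key, n in sorted(counts.items()):
        row: Row = dict(zip(group_by, key))
        row["count"] = n
        rows.append(row)
    return rows
```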

Input Layer
▶ Data is piped through a custom method implemented as a modular input, in a decentralized way
▶ Pros:
• High flexibility regarding encryption, hashing, and other methods and requirements
• Processing can be done decentrally on each forwarder to distribute the processing load
▶ Cons:
• Scripting required for modular inputs
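The flexibility of the input layer comes from a pluggable processing step between reading the source and emitting events. The Python sketch below shows only that step as a standalone script, assuming the modular input runs in simple streaming mode, where plain lines written to stdout become event data. The transform, the ID pattern and the function names are hypothetical, and the scheme definition and `splunklib.modularinput` wiring are omitted.

```python
import re
import sys
from typing import Callable, Iterable, Iterator

UID_PATTERN = re.compile(r"\bmm\d{5}\b")  # hypothetical employee-ID format


def mask_uids(line: str) -> str:
    """Example transform: irreversibly mask employee IDs."""
    return UID_PATTERN.sub("******", line)


def process(lines: Iterable[str], transform: Callable[[str], str]) -> Iterator[str]:
    """Apply `transform` to every non-empty source line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            yield transform(line)


if __name__ == "__main__":
    # Hypothetical wiring: source lines on stdin, processed events on stdout.
    for event in process(sys.stdin, mask_uids):
        sys.stdout.write(event + "\n")
    sys.stdout.flush()
```

Swapping `mask_uids` for a hashing or encryption function changes the method without touching the input plumbing, and because the script runs on each forwarder, the processing load is spread out.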

Summing Up

Summing Up
1. Many possible ways; each has pros and cons
2. Anonymization
• Data aggregation might be needed as an additional layer, because knowing that a specific file was accessed from a specific host can still allow identification of an individual
3. Pseudonymization
• Requires a proper concept so that the pros and cons are known and accepted in advance, and the impact and additional complexity are understood for production and operational use
4. Choose and mix
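The aggregation point under Anonymization can be illustrated with a minimum group size rule borrowed from k-anonymity (named here for illustration; the slides do not prescribe a specific technique): an aggregate is only published if it is backed by at least k distinct users. The field names, the default threshold of 3 and the sample paths are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple, Union

Row = Dict[str, Union[str, int]]


def aggregate_min_users(events: Iterable[Dict[str, str]], group_by: Sequence[str],
                        user_field: str = "user", k: int = 3) -> List[Row]:
    """Aggregate per group, suppressing groups backed by fewer than k distinct users."""
    users: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    counts: Dict[Tuple[str, ...], int] = defaultdict(int)
    for e in events:
        key = tuple(e.get(f, "") for f in group_by)
        users[key].add(e.get(user_field, ""))
        counts[key] += 1
    rows: List[Row] = []
    for key in sorted(counts):
        if len(users[key]) >= k:  # small groups could point back to one person
            row: Row = dict(zip(group_by, key))
            row["events"] = counts[key]
            row["users"] = len(users[key])
            rows.append(row)
    return rows
```

With the sample data in mind, a single user opening a sensitive file on one host forms a group of one, which is exactly the case the summary warns about; the threshold suppresses it.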

Demo
Data Obfuscation

Log file: clear.log
Field | Description | Action to take
firstname | First Name | Encrypt with AES
lastname | Last Name | Encrypt with AES
dob | Date of Birth | Encrypt with AES
uid | Employee ID | Anonymize

Demo Scenario
Encryption (Modular Input): Log file with sensitive data -> File Monitor input (UF) reads log file data -> Modular Input encrypts field values -> Data sent to pipeline
Decryption (Custom Search Command): Events in Splunk with encrypted field values -> User is authorized to use custom search command -> Custom search command decrypts fields
Anonymization (SEDCMD): Log file with sensitive data -> File Monitor input (UF) reads log file data -> Pipeline applies SEDCMD and replaces data -> Data stored

Modular Input?
http://docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/ModInputsIntro

Modular Input? Splunkbase!
https://splunkbase.splunk.com/apps/#/search/Modular%20Input/

Protocol Data Input
Data Handler
▶ Different input protocols
▶ Custom data handler allows pre-processing of data
• Polyglot: many programming languages can be used, e.g. Java, JavaScript, Python, …
▶ Different output protocols
https://splunkbase.splunk.com/app/1901/

PDI Configuration: Data Handler
▶ PDI Custom Data Handler
• AESEncryptionHandler (Java)
▶ Parameters for Data Handler
• REGEX: identify fields to encrypt
• *_KeyFile: encryption keys
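What the REGEX parameter does, selecting which field values get encrypted, can be sketched in Python. The key=value line format of clear.log, the pattern and the function names are assumptions; the demo's actual handler is the Java AESEncryptionHandler, and its key loading from the *_KeyFile parameters is not reproduced here. The cipher is passed in as a function so that a real AES implementation (for example from a third-party library such as `cryptography`) can be plugged in. The string reversal in the usage block is only a stand-in that shows which values are replaced; it is not encryption.

```python
import re
from typing import Callable

# Hypothetical REGEX parameter: names of the fields to protect and their values.
FIELDS_TO_ENCRYPT = re.compile(r"\b(?P<name>firstname|lastname|dob)=(?P<value>\S+)")


def encrypt_fields(line: str, encrypt: Callable[[str], str],
                   pattern: "re.Pattern[str]" = FIELDS_TO_ENCRYPT) -> str:
    """Replace the value of every field matched by `pattern` with encrypt(value).

    Fields the pattern does not name (for example uid) are left untouched.
    """
    def repl(match: "re.Match[str]") -> str:
        return f"{match.group('name')}={encrypt(match.group('value'))}"

    return pattern.sub(repl, line)


if __name__ == "__main__":
    # Stand-in "cipher" to show which values are replaced; NOT encryption.
    print(encrypt_fields("firstname=Max lastname=Muster dob=1980-01-01 uid=mm28522",
                         lambda v: v[::-1]))
```

In the demo, uid is not encrypted at this stage; it is anonymized separately with SEDCMD, which is why the pattern leaves it alone.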

.conf2017
The 8th Annual Splunk Conference
SEPT 25-28, 2017
Walter E. Washington Convention Center
Washington, D.C.
conf.splunk.com
SAVE OVER $450
You will receive an email after registration opens with a link to save over $450 on the full conference rate.
You'll have 30 days to take advantage of this special promotional rate!
