1
© Copyright 2015 EMC Corporation. All rights reserved.
Building Massive and Efficient Indexer Storage Environments for Splunk
Don Green – SAS Specialist
@Texas_Don
2
Agenda
• Data and Storage Tech Trends
• Splunk Architecture
• Why Flash your Home Path?
• The Big, Cold Data Lake
• Converged Solutions
• Resources – Sweet Apps
3
DATA GROWTH
Source: IDC
Total Capacity Shipped, Worldwide, and % of Unstructured Data
• 2015: 71 EB shipped, 75% unstructured
• 2016: 106 EB shipped, 78% unstructured
• 2017: 133 EB shipped, 80% unstructured
4
Do more with less…
5
Architecture Matters…
Scale-up Scale-Out
6
Enter SPLUNK ENTERPRISE
INDUSTRY-LEADING PLATFORM FOR MACHINE DATA
Enterprise Scalability
Operational Intelligence
• Search & Investigation
• Proactive Monitoring
• Operational Visibility
• Real-time Business Insights
Any Machine Data
• Online Services, Web Services, Servers, Security, GPS Location, Storage, Desktops, Networks
• Packaged Applications, Custom Applications, Messaging, Telecoms, Online Shopping Cart, Web Clickstreams
• Databases, Energy Meters, Call Detail Records, Smartphones and Devices, RFID
Datacenter, Private Cloud, Public Cloud
7
SPLUNK ARCHITECTURE
• Search Heads: Query information across indexers and are usually CPU and memory intensive.
• Indexers: Write data to disk and are both CPU and I/O intensive.
• Forwarders: Collect and forward data; usually lightweight and not resource intensive.
http://docs.splunk.com/Documentation/Splunk/latest/Overview/AboutSplunkEnterprisedeployments
8
SPLUNK STORAGE REQUIREMENTS
ENTERPRISE PERFORMANCE AND DATA SERVICES
• High-Performance Storage
– Rare & Sparse Searches
• High-Capacity Storage
– Long-Term Retention
• Scale-Out Infrastructure
– Indexer & Search Heads
• De-dupe & Compression
– Clustered Indexer Deployments
• Backup & Security
– Data Protection & Compliance
Diagram labels: Indexers, Search Heads; HOT, WARM, COLD; Capacity Triggered, Age Triggered
9
Splunk Indexer Buckets
HOT / WARM
• Recent searches/dashboards
• Usually block LUN
• High random reads
• Sequential reads / writes
COLD
• Rare searches
• Usually NAS share
• Light random reads
• Sequential reads / writes
FROZEN
• Not searchable
• Usually offline media
• Only sequential write
Splunk moves index buckets from hot/warm to cold to frozen based on user configuration.
Size of the buckets impacts performance.
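The hot/warm, cold, frozen lifecycle on this slide can be sketched as a small lookup in Python. This is a simplified illustration only: `cold_after_days` and `frozen_after_days` are hypothetical parameters, and the age-only policy is an assumption. Real deployments also roll buckets on size and count limits, as set in the index configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierProfile:
    searchable: bool
    typical_storage: str
    io_pattern: str


# Characteristics copied from the slide.
TIERS = {
    "hot/warm": TierProfile(True, "block LUN", "high random reads, sequential reads/writes"),
    "cold": TierProfile(True, "NAS share", "light random reads, sequential reads/writes"),
    "frozen": TierProfile(False, "offline media", "sequential writes only"),
}


def tier_for_age(age_days: int, cold_after_days: int, frozen_after_days: int) -> str:
    """Pick a tier from bucket age alone (hypothetical, age-only policy)."""
    if cold_after_days > frozen_after_days:
        raise ValueError("cold_after_days must not exceed frozen_after_days")
    if age_days >= frozen_after_days:
        return "frozen"
    if age_days >= cold_after_days:
        return "cold"
    return "hot/warm"


# Example thresholds are illustrative, not recommendations.
for age in (3, 45, 400):
    tier = tier_for_age(age, cold_after_days=30, frozen_after_days=365)
    print(age, tier, TIERS[tier].typical_storage, TIERS[tier].searchable)
```

Keeping the storage profile next to the tier name makes it easy to see why each tier maps to a different class of storage.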
10
Indexer Storage Capacity
1TB Ingested Data → Written Data = ½ Ingested Data = 500GB
• Raw Data (*.gz): compressed raw data, 30% of written data → 150GB
• Indexes (*.tsidx): uncompressed ‘indexes’, 70% of written data → 350GB
11
How much storage do you need?
= Daily indexing rate x ½ x Retention policy
1TB Data Ingested Daily = 1TB * ½ * 60 days = 30TB
• Raw Data: 150GB/Day * 60 Days = 9TB
• Indexes: 350GB/Day * 60 Days = 21TB
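The sizing rule above (daily indexing rate x ½ x retention) can be checked with a few lines of Python. This is a minimal sketch under the slide's own rules of thumb: written data is about half of ingested data, split roughly 30% raw and 70% index. The function and parameter names are illustrative, and 1TB is treated as 1000GB as on the slide.

```python
def sizing_tb(ingested_gb_per_day: float,
              retention_days: int,
              written_ratio: float = 0.5,
              raw_percent: int = 30,
              index_percent: int = 70) -> dict:
    """Estimate single-copy indexer storage in TB from the slide's ratios."""
    # Data written to disk over the whole retention window, in GB.
    written_gb = ingested_gb_per_day * written_ratio * retention_days
    return {
        "raw_tb": written_gb * raw_percent / 100 / 1000,
        "index_tb": written_gb * index_percent / 100 / 1000,
        "total_tb": written_gb / 1000,
    }


# The slide's example: 1TB ingested per day, kept for 60 days.
estimate = sizing_tb(ingested_gb_per_day=1000, retention_days=60)
print(estimate)
```

The ratios are defaults rather than constants because actual compression depends on the data being indexed.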
12
SPLUNK INDEXER AVAILABILITY
Multiple copies of index and raw data
• Replication factor (RF) = # of copies of raw data
• Search factor (SF) = # of copies of indexes
30TB written → 30TB replicated (SF=2 / RF=2; each copy: Raw 9TB, Index 21TB)
• 1TB * 60 days x ½ x 2 = 60TB (RF/SF=2) ** doubled **
• 1TB * 60 days x ½ x 3 = 90TB (RF/SF=3) ** tripled **
STORAGE CAPACITY MULTIPLIES!
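Because RF counts copies of raw data and SF counts copies of indexes, the multiplication on this slide can be written so the two factors are applied separately. The Python sketch below assumes the slide's single-copy figures (9TB raw, 21TB index); the function name is illustrative.

```python
def clustered_capacity_tb(raw_tb: float, index_tb: float,
                          replication_factor: int, search_factor: int) -> float:
    """Total capacity when raw data is kept RF times and indexes SF times."""
    if not 1 <= search_factor <= replication_factor:
        raise ValueError("search factor must be between 1 and the replication factor")
    return raw_tb * replication_factor + index_tb * search_factor


# Single-copy figures from the slide: 9TB raw + 21TB index = 30TB.
for rf, sf in [(1, 1), (2, 2), (3, 3), (3, 2)]:
    print(f"RF={rf} SF={sf}: {clustered_capacity_tb(9, 21, rf, sf)} TB")
```

The last case (RF=3, SF=2) is not on the slide. It is included to show that lowering SF below RF saves index space while keeping the same number of raw copies.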
13
Just how much storage?
Storage Requirements in TB
Rows: Splunk license (GB/day). Columns: retention (days); 365, 730, 1095, 1460 and 1825 days correspond to 1, 2, 3, 4 and 5 years.
License \ Retention 1 7 14 30 90 180 365 730 1095 1460 1825
25 0.025 0.175 0.35 0.75 2.25 4.5 9.125 18.25 27.375 36.5 45.625
50 0.05 0.35 0.7 1.5 4.5 9 18.25 36.5 54.75 73 91.25
100 0.1 0.7 1.4 3 9 18 36.5 73 109.5 146 182.5
250 0.25 1.75 3.5 7.5 22.5 45 91.25 182.5 273.75 365 456.25
500 0.5 3.5 7 15 45 90 182.5 365 547.5 730 912.5
1000 1 7 14 30 90 180 365 730 1095 1460 1825
2000 2 14 28 60 180 360 730 1460 2190 2920 3650
3000 3 21 42 90 270 540 1095 2190 3285 4380 5475
4000 4 28 56 120 360 720 1460 2920 4380 5840 7300
5000 5 35 70 150 450 900 1825 3650 5475 7300 9125
6000 6 42 84 180 540 1080 2190 4380 6570 8760 10950
7000 7 49 98 210 630 1260 2555 5110 7665 10220 12775
10000 10 70 140 300 900 1800 3650 7300 10950 14600 18250
*Assumes RF/SF = 2
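Any row of this table can be regenerated from the license size. The Python sketch below assumes the same rule as the earlier slides (written data is half of ingested data) and RF/SF = 2, as stated in the footnote. The names are illustrative.

```python
RETENTION_DAYS = [1, 7, 14, 30, 90, 180, 365, 730, 1095, 1460, 1825]


def storage_row_tb(license_gb_per_day: float,
                   copies: int = 2,
                   written_ratio: float = 0.5) -> list:
    """Storage in TB for each retention period, for one license size."""
    return [
        license_gb_per_day * days * written_ratio * copies / 1000
        for days in RETENTION_DAYS
    ]


for license_gb in (25, 1000, 10000):
    print(license_gb, storage_row_tb(license_gb))
```

With two copies and a written ratio of one half, the two factors cancel, which is why each cell equals the license size times the retention days.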
14
DAS PRESENTS CHALLENGES
SPLUNK DAS ENVIRONMENT
1. Dedicated Storage Infrastructure
• Silo that only runs Splunk
2. Compromised Availability
• SSDs & servers fail
• Index rebuilds can take hours to days
3. Lack of Enterprise Data Protection
• No Snapshots or Compliance
• DR limited to Multisite Clustering
4. Poor Storage Efficiency
• Multiple copies of data
• Multisite Clustering Increases Overhead
5. Non-Optimized Growth
• Fixed compute to storage ratio
• Servers must maintain storage symmetry
6. Management complexity
• Multiple management points
15
WHY EMC FOR SPLUNK
OPTIMIZED INFRASTRUCTURE FOR BIG & FAST DATA
Optimized Shared Storage & Tiering
• Hot & Warm Data Deployed On XtremIO or ScaleIO
• Cold & Frozen Data Deployed On Isilon
Powerful Data Services
• Encryption & Security
• Index File Compression
• Deduplication Of Clustered Indexes
• Snapshots For Backups
Cost-Effective & Flexible Scale-Out
• Scale-Out Capacity & Compute Independently Or As Converged Platform
16
Why Flash?!?
Economic Influences
• Consumer Demand
• Data Services Allowing Free Copies of Application Data
• Flash technology has improved at a faster rate than Moore’s Law
Chart labels: Intelligent Scale-out Flash, HDD
17
XTREMIO DATA SERVICES
ALWAYS-ON, INLINE, ZERO PENALTY, FREE
• AGILE WRITEABLE SNAPSHOTS
• INLINE DATA AT REST ENCRYPTION
• XTREMIO DATA PROTECTION
• INLINE DEDUPLICATION
• INLINE COMPRESSION
• ALWAYS-ON THIN PROVISIONING
18
EMC XTREMIO & SPLUNK
ALL-FLASH INFRASTRUCTURE FOR HOT & WARM DATA
Data Services For Hot & Warm Data
• Self-Encrypting Flash Drives
• Index File Compression
• Dedupe Clustered Index Copies
• In-Memory Data Copy Services
Scale-Out Flash For I/O-Bound Data
• >1M IOPS & <1ms Latencies
High-Speed Search
• Accelerate SuperSparse & Rare Searches
Diagram labels: Indexers, Search Heads
19
XTREMIO & INDEXER AVAILABILITY
30TB written → 30TB replicated → 30TB replicated (each copy: Raw 9TB, Index 21TB)
• RF/SF=1: 1TB * 60 days x ½ x 1 = 30TB; with XtremIO inline de-dup = 30TB
• RF/SF=2: 1TB * 60 days x ½ x 2 = 60TB ** doubled **; with XtremIO inline de-dup = 31.25TB
• RF/SF=3: 1TB * 60 days x ½ x 3 = 90TB ** tripled **; with XtremIO inline de-dup = 32.5TB
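The de-dup figures on this slide (30TB, 31.25TB, 32.5TB) grow by 1.25TB per additional copy. The Python sketch below fits a linear model to those three points so the logical and physical capacities can be compared. The linear model and the names are assumptions inferred from the slide's numbers, not a statement of how the array behaves on other data sets.

```python
SINGLE_COPY_TB = 30.0
# Inferred from the slide: 31.25 - 30 and 32.5 - 31.25.
OVERHEAD_PER_EXTRA_COPY_TB = 1.25


def logical_tb(copies: int) -> float:
    """Capacity written by Splunk across all copies, before de-dup."""
    return SINGLE_COPY_TB * copies


def physical_with_dedupe_tb(copies: int) -> float:
    """Capacity after inline de-dup, per the linear fit to the slide."""
    return SINGLE_COPY_TB + (copies - 1) * OVERHEAD_PER_EXTRA_COPY_TB


for copies in (1, 2, 3):
    logical = logical_tb(copies)
    physical = physical_with_dedupe_tb(copies)
    saved_percent = 100 * (logical - physical) / logical
    print(f"RF/SF={copies}: {logical} TB logical, {physical} TB physical, "
          f"{saved_percent:.1f}% saved")
```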
20
EMC SCALEIO & SPLUNK
CONVERGED ARCHITECTURE FOR HOT & WARM DATA
Converged Splunk Architecture
• Leveraging Existing Hardware Investments
Shared Capacity & Performance
• Remove Silos & Increase ROI On DAS Capacity & No Single Point Of Failure
• 5 x (5K IOPS, 1 TB) pooled into 25K IOPS & 5TB
Diagram labels: Indexers, Search Heads; Servers, Network, Storage
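The pooling arithmetic on this slide (five contributions of 5K IOPS and 1 TB adding up to 25K IOPS and 5TB) can be sketched in Python. This assumes performance and capacity add linearly across nodes and ignores any data protection overhead. The `Node` type is illustrative, not part of any product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    iops: int
    capacity_tb: int


def pooled(nodes: list) -> Node:
    """Sum per-node IOPS and capacity into one shared pool (linear model)."""
    return Node(
        iops=sum(n.iops for n in nodes),
        capacity_tb=sum(n.capacity_tb for n in nodes),
    )


# Five identical nodes, as drawn on the slide.
cluster = [Node(iops=5_000, capacity_tb=1) for _ in range(5)]
pool = pooled(cluster)
print(pool.iops, pool.capacity_tb)
```

Adding a node to the list grows both totals, which is the scale-out property the slide highlights.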
21
EMC Isilon – Deep and WIDE Storage
OneFS
• Single Volume/File System
• Policy based Tiering
• Simplicity & Ease of Use
• Linear Scalability
• Multi-protocol support
• High Performance
• Unmatched Efficiency
• Easy Growth
22
EMC ISILON & SPLUNK
LOW-COST & SECURE SCALE-OUT FOR COLD DATA
Consolidate, Protect & Secure Cold Data
• SmartLock Protects Cold & Frozen Data
• SmartDedupe For Clustered Indexes
• SnapshotIQ For Backups
High-Speed Ingest & Long-Term Retention With Native HDFS Integration
Scale-Out Capacity
• Up To 50PB Of Highly Available Capacity
Self-Encrypting Drives
Diagram labels: Indexers, Search Heads
23
Data Silos vs Consolidated Data Lake
NEXT-GEN WORKLOADS / TRADITIONAL WORKLOADS
24
Data Silos vs Consolidated Data Lake
Isilon Scale-Out Data Lake
25
EMC Isilon Next-Gen Access Methods
• One instance of the file services all dependent workloads simultaneously
Diagram label: FILE
26
EMC REFERENCE ARCHITECTURES FOR SPLUNK ENTERPRISE
• XtremIO and Isilon Reference Architecture
• ScaleIO and Isilon Reference Architecture
27
EMC REFERENCE ARCHITECTURES
Single-Instance / Distributed
HOT & WARM
• Mostly searches
• Heavy random reads
• Sequential writes
COLD
• Ad hoc searches
• Light random reads
• Sequential writes
XtremIO
• Scale-Out 160 TB Flash & No Tuning
• No RAID Configuration Needed
• Many Copies & No Overhead
• Deduplication & Compression
Isilon
• Multi-Protocol = Always Searchable
• Tier “Frozen” Data Without Migration
• Scale-Out 50 PB Of Cold & Frozen Data
• SmartDedupe & SmartLock
Diagram labels: Indexers, Search
28
VBLOCK® SYSTEMS – THE ONLY TRUE CONVERGED INFRASTRUCTURE
• Application Optimization
• Lifecycle System Assurance
• API Enabled, Converged Management
• Integrated Protection and Workload Mobility Solutions
• Pre-engineered, Pre-validated, Pre-tested
• Best-of-breed Technology
Customer Experience
• Fastest Time-to-Business
• Highest Performance
• Highest Availability
• Converged Management
• Lowest Risk
• Lowest TCO
29
WHY VCE FOR SPLUNK?
FOCUS ON BUSINESS OPERATIONS, NOT MAINTAINING INFRASTRUCTURE
• Factory Physical and Logical Build
• Roadmap and New Feature Planning
• Compliance-Ready
• Configuration and Patch Management
• Performance and Availability
• Single Support Through VCE
30
VCE to Address Three Opportunities Splunk
• VCE™ technology extension for EMC® Isilon® Storage
• Vblock/VxBlock System 540
• Vblock/VxBlock System 340
Diagram labels: HOT/WARM, COLD; Rack-Scale Bundle, Block/Scale-Out Bundle
31
SPLUNK APP FOR VCE Vision
• VCE Systems presented as an entity
• Compliance history – what has been changed?
• System inventory and health
• KPI dashboards
• One command to configure system logs and events
32
SPLUNK APPS
https://splunkbase.splunk.com/app/2812/
https://splunkbase.splunk.com/app/2688/
Allows Splunk to monitor Isilon performance
XtremIO app out now too!
33
How We Size Splunk Infrastructure
• We Use Splunk Best Practices “Religiously”
• We build “Converged Systems” first
• We have our own Ninjas to help!
• http://splunk-sizing.appspot.com/ (Not Officially Splunk-Supported)
EMC Sponsored Session- Building Massive + Efficient Indexer Storage Environments for Splunk
