© 2019 SPLUNK INC.
Session FN1435:
Sizing Splunk SmartStore - Spend Less and Get More Out of Splunk
Make your infra $$ work harder for you
Bharath Aleti
Director Product Management | Splunk Inc.
Jon Rust
Splunk Architect | ADP
Jane Joki
Offering Manager | IBM Cloud Object Storage
During the course of this presentation, we may make forward-looking statements regarding future
events or the expected performance of the company. We caution you that such statements reflect our
current expectations and estimates based on factors currently known to us and that actual events or
results could differ materially. For important factors that may cause actual results to differ from those
contained in our forward-looking statements, please review our filings with the SEC.
The forward-looking statements made in this presentation are being made as of the time and date of
its live presentation. If reviewed after its live presentation, this presentation may not contain current
or accurate information. We do not assume any obligation to update any forward-looking statements
we may make. In addition, any information about our roadmap outlines our general product direction
and is subject to change at any time without notice. It is for informational purposes only and shall
not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to
develop the features or functionality described or to include any such feature or functionality in a
future release.
Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL
are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All
other brand names, product names, or trademarks belong to their respective owners. © 2019 Splunk
Inc. All rights reserved.
Forward-Looking Statements
Agenda
▶ SmartStore Overview
▶ Sizing, Performance & TCO Savings
▶ Customer story - ADP
▶ Storage Partner - IBM COS
▶ Wrap-up
SmartStore Overview
Growing data volumes require $$$ infra spend
• Adding new indexers in response to data growth is expensive => High cost
• Searches typically run over only a partial subset of data => Inefficient utilization
• Distributed scale-out architecture => No longer a good fit for growing data volumes
[Diagram: Search Tier above an Events Indexing Tier, with each indexer holding its own hot, warm and cold data]
Splunk SmartStore: Decoupled Compute & Storage
Scale compute and storage independently at significantly reduced cost
• Scale storage for higher data volumes
• Scale compute for performance
• Leverages cost economies from cloud and on-prem object storage innovations
• Cost savings w/ smaller indexer footprint
[Diagram: compute nodes scaling independently of cloud or on-premise object storage]
Splunk SmartStore: Application- and Data-Aware Cache
Dynamic data placement with the active dataset in local cache
• Recently ingested data and recently used data reside in local cache
• Cache data based on age, access patterns and admin-defined (business) priority
• Only brings in datasets required by search
[Diagram: compute nodes, each with a local cache, backed by cloud or on-premise storage]
Splunk SmartStore: Brings Data Closer to Compute On-Demand
Optimized search performance for both local and remote data
• Unused datasets are evicted from cache
• Brings older data from remote storage into cache on demand
• Uses prefetch and parallel fetch to minimize the impact of a cache miss
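The cache behavior on these two slides can be sketched as a toy model: a bucket is downloaded from remote storage on a cache miss, and the least recently used bucket is evicted once the cache exceeds its capacity. This is a simplification under stated assumptions, not Splunk's cache manager: the real cache also weighs bucket age and admin-defined priority, while this sketch uses recency only, and the `BucketCache` class and its methods are hypothetical.

```python
from collections import OrderedDict


class BucketCache:
    """Toy LRU cache of buckets in front of a remote object store (hypothetical model)."""

    def __init__(self, capacity_gb: float, remote: dict):
        self.capacity_gb = capacity_gb
        self.remote = remote              # bucket id -> size in GB (the one full copy)
        self.local = OrderedDict()        # cached buckets, least recently used first
        self.hits = 0
        self.misses = 0

    def used_gb(self) -> float:
        return sum(self.local.values())

    def read(self, bucket: str) -> None:
        """A search touches one bucket; fetch it from remote storage on a miss."""
        if bucket in self.local:
            self.hits += 1
            self.local.move_to_end(bucket)        # now the most recently used
            return
        self.misses += 1
        self.local[bucket] = self.remote[bucket]  # simulated download
        # Evict least recently used buckets until the cache fits again,
        # but never the bucket that was just fetched.
        while self.used_gb() > self.capacity_gb and len(self.local) > 1:
            self.local.popitem(last=False)


# Ten 750MB buckets in remote storage; room for only two in the cache.
remote = {f"b{i}": 0.75 for i in range(10)}
cache = BucketCache(capacity_gb=2.0, remote=remote)
for bucket in ["b0", "b1", "b0", "b2", "b3", "b0"]:
    cache.read(bucket)
```

Re-reading `b0` right after `b1` is a hit; once `b2` and `b3` push it out, the final read of `b0` is a miss again, which is the "sharp spike on cache miss" pattern discussed in the performance section.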
Splunk SmartStore: Stateless Architecture Accelerates Recovery and Simplifies Management
Higher agility & lower administrative overhead
• Upgrade or replace h/w by shutting down all compute nodes w/o data impact
• Set up or tear down compute nodes on demand for investigative use cases
• Faster indexer recovery and data rebalance
Splunk SmartStore: Migrate One Index or All Indexes
Simple migration path for existing Splunk deployments
• SmartStore-enabled indexes can co-exist with non-SmartStore indexes
• Start small and expand to the entire cluster
Splunk SmartStore makes it cost effective to retain voluminous data & unravel data insights
Brings data closer to compute on demand for large-scale data processing
[Diagram: a search tier over an indexer tier. Hot and recently accessed data stay on the indexers, where the Cache Manager loads the active dataset; warm/cold data is kept as a single full copy in remote storage on S3 or S3 API compliant object stores.]
• Decoupled storage and compute
• Longer data retention by independently scaling storage
• Scale out compute based on performance demands
• Lower TCO with S3 & S3 API compliant object stores
• Fewer indexers required with only one full copy of warm/cold data, further reducing TCO
• Faster node recovery and rebalance operations
• Application and data (age, access patterns, priority) aware cache
TCO & Sizing
SmartStore Cost Savings (reference only; may vary based on your pricing)
Assumptions: Ingestion Rate: 1TB/day | Total Retention: 365 days | Replication Factor: 2 | Max Search Concurrency: 64
Non-SmartStore Infrastructure Costs
Server on-demand pricing/hr: $1.38
Server cost/year: $12,088.80
Storage per node: 12,000 GB
Indexers required: 31
Indexer cost/year: $374,753
Total cost/year: $374,753
SmartStore Infrastructure Costs
Server (SSD) on-demand pricing/hr: $0.624
Server (SSD) cost/year: $5,466
Cache required: 15,500 GB
Min indexers required: 8
Indexer cost/year: $43,730
Remote storage pricing/GB/month: $0.021
Remote storage cost/year: $45,990
Total cost/year: $89,720
Non-SmartStore Infrastructure Cost
• At 1TB/day for 365 days with RF=2 (and a 0.5 compression factor), the storage capacity requirement is 365TB
• With 12TB per indexer, this requires 31 indexers
• At a server cost of ~$12K/year, this comes to $374K/year
SmartStore Infrastructure Cost
• With 30 days of cache retention, the indexer footprint is reduced to 8
• With 2TB per indexer (SSD), the annual cost of indexers is ~$43K
• Remote storage (one compressed copy, 182.5TB) costs ~$46K/year, for a total cost of ~$90K/year
• SmartStore approx. cost savings: 75%
Deployment
• More performance => Add indexers
• More storage capacity => Add storage
• Cost savings go down as the number of indexers increases, and go up with higher ingest rate/retention requirements
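The cost comparison above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the slide's reference prices, 8,760 on-demand hours per year, and the 0.5 compression factor used in the cache sizing guidelines (which is what turns 1TB/day x 365 days x RF 2 into 365TB); the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 on-demand hours


def classic_cost(ingest_gb_day, retention_days, rf, cf, disk_per_node_gb, price_hr):
    """Non-SmartStore: every replica of every bucket lives on indexer disk."""
    storage_gb = ingest_gb_day * retention_days * rf * cf
    indexers = math.ceil(storage_gb / disk_per_node_gb)
    return indexers, indexers * price_hr * HOURS_PER_YEAR


def smartstore_cost(ingest_gb_day, retention_days, rf, cf, cache_days,
                    cache_per_node_gb, price_hr, s3_gb_month):
    """SmartStore: indexers hold only the cache; remote storage holds one full copy."""
    cache_gb = ingest_gb_day * rf * cf + ingest_gb_day * cf * (cache_days - 1)
    indexers = math.ceil(cache_gb / cache_per_node_gb)
    indexer_cost = indexers * price_hr * HOURS_PER_YEAR
    remote_gb = ingest_gb_day * retention_days * cf   # single copy: no RF multiplier
    remote_cost = remote_gb * s3_gb_month * 12
    return indexers, indexer_cost, remote_cost


# Slide inputs: 1TB/day, 365-day retention, RF=2, CF=0.5, 30-day cache.
n_classic, classic = classic_cost(1_000, 365, 2, 0.5, 12_000, 1.38)
n_s2, s2_idx, s2_remote = smartstore_cost(1_000, 365, 2, 0.5, 30, 2_000, 0.624, 0.021)
savings = 1 - (s2_idx + s2_remote) / classic
```

With these inputs the sketch yields 31 vs. 8 indexers and roughly $375K vs. $90K per year (about 76% savings, in line with the slide's ~75%). Because remote storage carries no replication multiplier, the savings grow with retention, which is the trend the Deployment bullets describe.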
SmartStore Cache Sizing Guidelines
Daily Ingestion Rate (I)
Cache Retention (C) = 1/10/30 days or more
Available disk space (D) on your indexers (assuming homogeneous disk space)
Replication Factor (R) = 2
Compression Factor (CF) = 0.5
Min required cache size = I*R*CF + I*CF*(C-1) (assuming 24-hour hot bucket rollover: one day of hot data with R copies, plus C-1 days of warm data with a single cached copy)
Min required indexers = Min required cache size / D (rounded up)
Also factor in ingestion throughput requirements (~300GB/day/indexer) to determine the number of indexers
Set max_cache_size (server.conf, [cachemanager]) to 80% of total disk capacity on the indexer
SmartStore Sizing
Scenario | 1TB/day, 7-day cache | 1TB/day, 10-day cache | 1TB/day, 30-day cache | 10TB/day, 10-day cache | 10TB/day, 30-day cache
Ingest/Day (GB) | 1,000 | 1,000 | 1,000 | 10,000 | 10,000
Storage/Indexer (GB) | 2,000 | 2,000 | 2,000 | 2,000 | 2,000
Cache Retention (days) | 7 | 10 | 30 | 10 | 30
Replication Factor | 2 | 2 | 2 | 2 | 2
Hot data storage (GB), I*R*CF | 1,000 | 1,000 | 1,000 | 10,000 | 10,000
Warm data storage (GB), I*CF*(C-1) | 3,000 | 4,500 | 14,500 | 45,000 | 145,000
Min Required Cache (GB) | 4,000 | 5,500 | 15,500 | 55,000 | 155,000
Min Required #Indexers | 3* | 3 | 8 | 34 | 78
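The sizing formula and the two indexer bounds from the guidelines can be sketched as follows, assuming the stated defaults (R = 2, CF = 0.5, ~300GB/day of ingest per indexer); the function names are hypothetical. The sketch returns the cache-capacity bound and the ingest-throughput bound separately, since the guidelines say to factor in both; in practice you would provision the larger of the two.

```python
import math


def min_cache_gb(ingest_gb, cache_days, rf=2, cf=0.5):
    """One day of replicated hot data plus one cached copy of the remaining warm days."""
    hot = ingest_gb * rf * cf
    warm = ingest_gb * cf * (cache_days - 1)
    return hot + warm


def min_indexers(ingest_gb, cache_days, disk_gb, rf=2, cf=0.5, gb_per_indexer_day=300):
    """Return (indexers needed for cache capacity, indexers needed for ingest throughput)."""
    by_cache = math.ceil(min_cache_gb(ingest_gb, cache_days, rf, cf) / disk_gb)
    by_ingest = math.ceil(ingest_gb / gb_per_indexer_day)
    return by_cache, by_ingest


# (ingest GB/day, cache retention days) for the five table columns, 2,000 GB per indexer.
SCENARIOS = [(1_000, 7), (1_000, 10), (1_000, 30), (10_000, 10), (10_000, 30)]
results = {s: (min_cache_gb(*s), min_indexers(*s, disk_gb=2_000)) for s in SCENARIOS}
```

The cache sizes match the table exactly. The table's 34 indexers for 10TB/day with a 10-day cache matches the ingest bound (10,000 / 300 ≈ 33.3), while 78 for the 30-day cache matches the cache bound (155,000 / 2,000 = 77.5); the 1TB/day columns appear to list the cache bound only.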
Performance
[Charts: Mixed Search Workload; Dense Search Workload]
• 100% cached: search time grows linearly with the time range
• Sharp spikes on cache miss when hitting non-cached data
• Impact is lower for dense searches due to data locality and prefetch
• On a cache miss, search time may increase from 2s to >100s, depending on the search
– E.g., fetching a single 750MB bucket over a 1 Gbps network takes about 7.5s (6s at the theoretical 125MB/s line rate; 7.5s corresponds to ~100MB/s of effective throughput)
• Prefetching reduces the overall impact on search response time by overlapping downloads with CPU/IO operations
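The fetch latency and the benefit of prefetching can be estimated with a simple model. This is a sketch under stated assumptions: the `efficiency` factor (0.8, i.e. ~100MB/s usable on a 1 Gbps link) is an assumption that reproduces the slide's 7.5s, and the search model ignores parallel fetch and treats each bucket as "download, then scan"; the function names are hypothetical.

```python
def fetch_seconds(bucket_mb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Time to download one bucket: megabytes -> megabits over the usable link rate."""
    return (bucket_mb * 8) / (link_gbps * 1000 * efficiency)


def search_seconds(n_buckets: int, fetch_s: float, process_s: float, prefetch: bool) -> float:
    """Simplified model of a search that must download and then scan n buckets.

    Without prefetch, each bucket is fetched and then processed, strictly in turn.
    With prefetch, downloading the next bucket overlaps processing the current one,
    so after the first download each step costs max(fetch, process).
    """
    if n_buckets == 0:
        return 0.0
    if not prefetch:
        return n_buckets * (fetch_s + process_s)
    return fetch_s + (n_buckets - 1) * max(fetch_s, process_s) + process_s


per_bucket = fetch_seconds(750, 1.0, efficiency=0.8)   # the slide's 750MB / 1 Gbps example
serial = search_seconds(4, per_bucket, 2.5, prefetch=False)
pipelined = search_seconds(4, per_bucket, 2.5, prefetch=True)
```

In this model, four cache-missing buckets at 7.5s download plus 2.5s scan each take 40s serially and 32.5s with overlap; prefetch hides the CPU work behind the network, but the download time still dominates.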
Monitoring Console Additions
SmartStore in Production
• 95% of Splunk Cloud prod stacks running on SmartStore
• Support for Enterprise Security (RA/DMA) added in Splunk 7.3
• Successful adoption at key customer accounts and more in the pipeline
• ADP, Lawrence Livermore National Labs speaking at Conf ….
• 100+ on-prem deployments
• Quotes
• “SmartStore working like a dream”
• “Saving many millions per year with AWS S3 storage”
• “No longer worried about running out of disk space for long term retention”
• “Easy to scale storage independent of compute means now we can increase long term
retention on-demand, i.e. within minutes instead of waiting for days/months”
• “Not only is S3 cheaper than disk, but you also need 50% less as replication is built into S3”
SmartStore in Production at ADP
Jon Rust
Splunk Admin, ADP
Overview - Usage
20 TB license, 11 TB/day average, 19 TB recent peak
500 TB of retention (growing since implementing S2, i.e. SmartStore)
600,000 searches per day
• Avg runtime 4.0s, unchanged since S2
5500 users
80 groups (each group gets a Splunk app)
1000 indexes (each group gets multiple indexes)
• Largest cluster has 300 indexes
Overview - Infrastructure
72 physical indexers and 2 VM indexers (lab) across 7 environments
• Largest clusters are 25 and 29 indexers
16 VM search heads
• Largest search head cluster is 9
Overview – Basic Cluster
Most traffic still comes through the SUF (Splunk Universal Forwarder)
Growing HEC (HTTP Event Collector) share, close to 50% lately
Separate HEC HF (heavy forwarder) farm
• Flexibility
• HEC overuse doesn't impact indexers
COS: IBM Cloud Object Storage
• Formerly known as Cleversafe
Overview – Production
“Indexers are too expensive”
Management unhappy with the cost of Splunk
• $50k per indexer, 20 cores
• 15 TB of usable RAID10 SSD
• Under-provisioned and looking to expand
With SmartStore (S2)
• $12k per indexer, 36 cores
• 30+ days cache retention
• 7 TB of usable RAID0 SSD
– BUT! S2 provides the redundancy (the remote store keeps the durable copy)
• COS disk cost is about $0.35/GB
• 2x indexer count, almost 4x core count
– Still < 50% of the $$
• Extended data retention (almost free!!)
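The "still < 50% of the $$" and "almost 4x core count" claims follow from the per-server figures on the slide. A minimal sketch, counting server cost only (COS capacity at ~$0.35/GB is a separate line item, not included here); the dictionary and function names are hypothetical.

```python
# Per-server figures from the ADP slide.
CLASSIC = {"cost_usd": 50_000, "cores": 20}
S2 = {"cost_usd": 12_000, "cores": 36}


def compare(n_s2_per_classic: int) -> tuple:
    """Cost ratio and core multiple when one classic indexer is replaced by n S2 indexers."""
    cost_ratio = n_s2_per_classic * S2["cost_usd"] / CLASSIC["cost_usd"]
    core_multiple = n_s2_per_classic * S2["cores"] / CLASSIC["cores"]
    return cost_ratio, core_multiple


cost_ratio, core_multiple = compare(2)   # "2x indexer count"
```

Doubling the indexer count gives 48% of the server spend and 3.6x the cores.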
More than money management: Agility!
• Increase or decrease peer count very quickly
• Random other example, “re-RAID project Q1 2019”
– Management forced us to use RAID5 during initial build-out
– RAID5 needs to die in a fire
– We eventually hit the IO wall
– With S2, rebuilding RAID volumes was pretty painless!
splunk offline
# take the mount offline and rebuild the volume as RAID10
splunk restart
# repeat for each indexer
12 indexers/site in the cluster, less than 2 hours of work, no service interruption
But how does it search?
Most common searches are unchanged
• Recent data is in cache and performs as before, or faster with more h/w
• Historic searches are okay, but it depends
– Big-window searches over old data can trigger large downloads from the remote store
• We've had zero complaints about search performance since updating to S2
• Most users have no idea anything changed
Was migration difficult?
Mostly turn-key
• A few beta/early release issues (since solved)
• When migrating a cluster
– Chose 1 index first and verified
– Good? Chose 5 more and verified
– Good? Rolled the rest
– 500TB of data migrated!!
• Upload concurrency during migration (max_concurrent_uploads in server.conf)
– We turned this down from the default of 8 to 4
– Our COS infra wasn't designed to handle so much upload data all at once
– Consider your network and S3 limits before migration
– Normal day-to-day use spreads out uploads pretty nicely
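ADP's phased rollout (one index, then five more, then the rest) can be planned mechanically. This sketch only computes the waves; the verification between waves stays a manual step, and the function name and index names are hypothetical.

```python
def migration_waves(indexes: list, first: int = 1, second: int = 5) -> list:
    """Split indexes into waves: a pilot, a small follow-up batch, then everything else."""
    waves = []
    start = 0
    for size in (first, second):
        wave = indexes[start:start + size]
        if wave:
            waves.append(wave)
        start += size
    rest = indexes[start:]
    if rest:
        waves.append(rest)
    return waves


# Hypothetical index names for illustration.
demo = migration_waves([f"idx_{i:02d}" for i in range(10)])
```

Keeping the early waves small limits how much data is uploaded at once while you confirm that searches and uploads behave as expected.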
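Sample SmartStore config (indexes.conf)
[volume:remote_store]
storageType = remote
path = s3://splunk-s2-webtier-dc2
remote.s3.access_key = **key**
remote.s3.secret_key = **key**
remote.s3.endpoint = https://internalS3.endpoint
remote.s3.signature_version = v2
[some_index]
remotePath = volume:remote_store/$_index_name
# volume:hot and volume:cold must be defined elsewhere in indexes.conf
homePath = volume:hot/$_index_name
# maximum total size of this index's warm data across the cluster (SmartStore only)
maxGlobalDataSizeMB = 175000
# 12096000 seconds = 140 days
frozenTimePeriodInSecs = 12096000
# required, but only used during migration; no data will land here after migration
coldPath = volume:cold/$_index_name
# required for every index; cannot reference a volume
thawedPath = $SPLUNK_DB/$_index_name/thaweddb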
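A few rules behind this sample are easy to miss: every index needs homePath, coldPath and thawedPath, thawedPath cannot point at a volume, and remotePath must reference a volume with `storageType = remote`. The hypothetical lint below checks just those rules using Python's standard `configparser`; it is an illustration, not a Splunk tool, and Splunk's own btool remains the authoritative view of merged configuration. It does not verify local volumes such as `volume:hot`.

```python
import configparser

SAMPLE = """
[volume:remote_store]
storageType = remote
path = s3://splunk-s2-webtier-dc2

[some_index]
remotePath = volume:remote_store/$_index_name
homePath = volume:hot/$_index_name
coldPath = volume:cold/$_index_name
thawedPath = $SPLUNK_DB/$_index_name/thaweddb
"""


def check_smartstore(conf_text: str) -> list:
    """Return problems found in SmartStore index stanzas (hypothetical lint)."""
    parser = configparser.ConfigParser(interpolation=None)  # leave $_index_name alone
    parser.optionxform = str                                # keep camelCase keys as written
    parser.read_string(conf_text)
    problems = []
    for name in parser.sections():
        section = parser[name]
        if name.startswith("volume:") or "remotePath" not in section:
            continue  # only check SmartStore-enabled index stanzas
        for key in ("homePath", "coldPath", "thawedPath"):
            if key not in section:
                problems.append(f"[{name}] missing {key}")
        if section.get("thawedPath", "").startswith("volume:"):
            problems.append(f"[{name}] thawedPath cannot use a volume")
        volume = section["remotePath"].split("/", 1)[0]     # e.g. "volume:remote_store"
        if volume not in parser or parser[volume].get("storageType") != "remote":
            problems.append(f"[{name}] remotePath must point at a storageType = remote volume")
    return problems
```

Running it on the trimmed sample returns an empty list; deleting the thawedPath line produces a single "missing thawedPath" finding.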
Dashboard: SmartStore Traffic
https://github.com/camrunr/s2_traffic_report
Splunk SmartStore and IBM Cloud Object Storage
A Game Changer for Your Splunk Environment
Jane Joki
Offering Manager, IBM Cloud Object Storage Solutions
Topics
• Brief Overview of IBM Cloud Object Storage
• Solution Highlights
Efficiency of IBM Cloud Object Storage
Metric | RAID 6 + Replication | Software Defined Solutions (IBM COS)
Usable Storage | 1 PB | 1 PB
Raw Storage | 3.6 PB (original, onsite mirror and remote copy at 1.20 PB raw each) | 1.7 PB
4TB Disks | 900 | 432
Racks Required | 3.6x | 1.7x
Floor Space | 3.6x | 1.7x
Ops Staffing | 3 FTE | 0.5 FTE
Extra Software | Replication/backup | None
TCO Savings ($$): 70%+
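The raw-capacity and disk counts in this comparison follow from two different overhead models: full copies on RAID 6 versus a single erasure-coded copy. A sketch, assuming RAID 6 overhead of 1.2 per copy (so three 1.20 PB copies make 3.6 PB) and that the COS column corresponds to the 12/7/9 IDA shown later for ~1 PB (1,008 TB usable, 1,728 TB raw); function names are hypothetical.

```python
import math


def raw_for_copies(usable_pb: float, copies: int, raid_overhead: float) -> float:
    """Raw PB when each full copy sits on RAID with the given overhead (1.2 here for RAID 6)."""
    return usable_pb * copies * raid_overhead


def raw_for_erasure(usable_pb: float, expansion: float) -> float:
    """Raw PB for one erasure-coded copy with the given expansion factor."""
    return usable_pb * expansion


def disks(raw_pb: float, disk_tb: float) -> int:
    """Disks needed; round first so float noise (e.g. 899.9999...) does not skew ceil()."""
    return math.ceil(round(raw_pb * 1000 / disk_tb, 6))


replicated = raw_for_copies(1.0, copies=3, raid_overhead=1.2)   # original + mirror + remote
dispersed = raw_for_erasure(1.008, expansion=12 / 7)            # 12/7/9 IDA, 1,008 TB usable
```

Both columns come out as on the slide: 3.6 PB on 900 4TB disks versus 1.728 PB on 432 4TB disks, roughly a 2x difference in raw capacity for the same usable space.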
Why is Cloud Object Storage a good fit for Unstructured Data?
• IBM Cloud Object Storage Industry Leader
– IDC and Gartner Market leader for over 6 years
• Simplified Distributed Architecture
– Access from anywhere
– Reduce points of failure
– Enhanced durability w/ consistency checks
• Simplify management
– Much less to tune (no controller nodes or replication)
– No snapshots or backup copies
• Virtually infinite scalability
– Scale Capacity to Exabytes
– Flexible addition/removal
• Reduced cost
– Commodity hardware
– Single copy protection
• No file system limitations
– Number of files per directory: no limit
– Total objects in a volume and max size
– Single volume max capacity
• Custom metadata
– Ready for AI/Analytics
– Stored with object for new use cases
How IBM Cloud Object Storage Works
[Diagram: Splunk, App #2 and App #3 each connect over S3 to an Accesser cluster, which stores data on Slicestors]
Notes:
• All deployment models supported: On Premise, Hybrid, Public Cloud
• Available as software only; supported on approved customer x86 platforms
• IBM appliances also available
Content Transformation
IBM COS software encrypts and slices the data, applying Information Dispersal Algorithms (IDAs), otherwise known as erasure coding policies.
Physical Distribution
Slices are distributed to separate disks on industry-standard x86 hardware across geographic locations.
Reliable Retrieval
An operator-defined subset of slices is needed to retrieve the data bit-perfectly in real time.
Benefits
The level of resiliency is fully customizable, resulting in a massively reliable and efficient way to store data at scale, as opposed to RAID and replication techniques.
[Diagram: on ingest, Accesser software slices the data and stores the slices on Slicestor software running on storage nodes at Site 1, Site 2 and Site 3; on retrieval, the Accesser reads back enough slices to rebuild the data]
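Why a subset of slices is enough can be shown with the simplest possible erasure code. The toy below is a single XOR parity scheme (k data slices plus 1 parity slice, so any k of the k+1 slices rebuild the data); it is much weaker than the configurable IDAs IBM COS uses and is only meant to illustrate the principle. Function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size slices (zero-padded) plus one XOR parity slice."""
    size = -(-len(data) // k)                      # ceiling division
    padded = data.ljust(size * k, b"\0")
    slices = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*slices))
    return slices + [parity]


def recover(slices: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one slice (data or parity) is missing."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("single-parity code can tolerate only one lost slice")
    if missing:
        # XOR of all surviving slices equals the missing one.
        present = [s for s in slices if s is not None]
        rebuilt = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*present))
        slices = list(slices)
        slices[missing[0]] = rebuilt
    return b"".join(slices[:-1])[:original_len]   # drop parity, then padding


data = b"splunk smartstore"
slices = split_with_parity(data, k=4)
```

Losing any one of the five slices still returns the original bytes; an IDA such as 12/7/9 generalizes this so that any 7 of 12 slices suffice.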
Example of 1PB Data Use Case with SmartStore and COS
[Diagram: a Splunk search head and clustered indexers, each indexer with local storage (cache) and a cache management layer, use a dsNet storage system (Dispersed Storage Network, 12/7/9 IDA) as SmartStore remote storage. Two load balancers front the dsNet, which spans Sites A, B and C; each site has 4 Slicestors and 2 Accessers, and Site B also hosts the Manager.]
• Event data is read and written locally on the indexers
• The cache management layer copies warm data to remote storage
• It moves data from remote storage to local storage when searches need it
COS Configuration
• IDA: 12/7/9
• Data Reliability: 10 9’s
• Expansion: 1.71
• 12 TB HDDs
• Usable: 1008 TB
• Primary Raw: 1728 TB
• Managers: 1
• Accessers: 6
• Slicestors: 12
• Number of Accessers can be scaled to handle throughput
• Each Accesser handles approx. 750MB/sec; varies depending on object size
• Slicestors can be scaled for capacity
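The configuration numbers above are related by simple IDA arithmetic. This sketch assumes the common reading of "12/7/9" as width / read threshold / write threshold (12 slices per object, any 7 to read, 9 stored for a write to succeed), and uses the slide's ~750MB/s per Accesser, which varies with object size; the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Ida:
    """Width / read threshold / write threshold, as in "12/7/9" (assumed notation)."""
    width: int
    read_threshold: int
    write_threshold: int

    @property
    def expansion(self) -> float:
        return self.width / self.read_threshold

    @property
    def slices_lost_readable(self) -> int:
        """Slices that can be unavailable while data stays readable."""
        return self.width - self.read_threshold

    @property
    def slices_lost_writable(self) -> int:
        """Slices that can be unavailable while writes still succeed."""
        return self.width - self.write_threshold


def accessers_needed(target_mb_s: float, per_accesser_mb_s: float = 750) -> int:
    """Accessers to sustain a target aggregate throughput."""
    return math.ceil(target_mb_s / per_accesser_mb_s)


ida = Ida(12, 7, 9)
usable_tb = 1008
raw_tb = usable_tb * ida.expansion   # slide: Primary Raw 1728 TB
```

The expansion of 12/7 ≈ 1.71 turns 1,008 TB usable into 1,728 TB raw, i.e. 144 12TB drives or 12 per Slicestor; at ~750MB/s each, the configuration's 6 Accessers correspond to roughly 4.5GB/s of aggregate throughput.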
Highlights of Splunk SmartStore with IBM COS
Splunk administrators can seamlessly increase storage capacity and storage performance with IBM COS without having to scale up compute at the same time
Both Splunk and IBM COS are highly flexible and extremely scalable without any downtime
• Scaling COS performance is as simple as adding more Accessers serving the storage pool
• If the dsNet becomes storage-pool constrained, IBM COS allows real-time addition of more sets of Slicestors to the storage pool to increase storage pool performance
• Another way to scale performance from a COS perspective: use SmartStore's ability to have a different endpoint for each volume; e.g., one set of indexes uses one dsNet and other indexes use another dsNet
Performance
• Can be as performant as Splunk's traditional architecture, with a minimal performance delta from SmartStore remote storage
• ADP use case success story
Benefits of On Prem deployments
• Lower capacity costs
• No retrieval charges (egress bandwidth and operational requests)
• Higher reliability
• Data in your control
• Performance that you control and that is more predictable
Unlock the Value of Splunk SmartStore with IBM COS
• Take advantage of the SmartStore feature in Splunk Enterprise with IBM Cloud Object Storage
• Lower TCO
– Scale the Warm tier (IBM COS) independently of adding more indexing servers
– Optimize Hot tier servers for performance
• Extend Data Retention and Maximize Data Accessibility
– Hot tier remains the same as in the classic architecture
– Everything else is on IBM COS, which is WARM and SEARCHABLE (Warm/Cold = Warm)
• Agility of Infrastructure: data not tied to servers; no downtime; seamless scalability
– Take advantage of intrinsic HA capabilities provided by IBM COS as Warm tier remote storage
– Simplify the data management and deployment model with only 2 tiers: Hot and Warm
• Architected for Massive Scale
– No size limitations on ingest with SmartStore; setup parameters must be set according to the chosen architecture
– Can be implemented on a per-index basis, i.e. deployments do not have to be “all Classic” or “all SmartStore”
Splunk SmartStore
Achieve massive scale with lower TCO
Lower TCO
• Decoupled compute and storage
• Scale storage for longer retention & indexers on performance demand
• Reduced indexer footprint for warm/cold data
Performance at Scale
• Brings data closer to compute on demand
• Application and data aware cache
• Cache data based on age, priority and access patterns
Faster Failure Recovery
• Faster indexer recovery
• Faster data rebalance
On-Demand Cluster
• Add/remove indexers on demand
• Set up/tear down cluster on demand
Splunk SmartStore Key Takeaways
1. Decoupled compute and storage w/ SmartStore provides scale and performance at low cost
2. Size the local cache to cover the search timespan of the majority (95%) of your searches (30/90 days)
3. Increased infrastructure agility to scale up/down to meet business needs
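Takeaway 2 amounts to picking a percentile of search lookback windows. A minimal sketch, assuming you have already extracted each search's lookback (in days) from your own search logs; the lookback list below is hypothetical, and the function uses the nearest-rank percentile.

```python
import math


def cache_days_for(lookbacks_days: list, coverage: float = 0.95) -> float:
    """Smallest lookback (nearest-rank percentile) that covers `coverage` of searches."""
    if not lookbacks_days:
        raise ValueError("need at least one search")
    ordered = sorted(lookbacks_days)
    rank = math.ceil(coverage * len(ordered))   # 1-based nearest rank
    return ordered[rank - 1]


# Hypothetical mix of 100 searches: most look back a day or a week, a few look back a year.
demo = [1] * 60 + [7] * 25 + [30] * 10 + [365] * 5
```

For this mix, a 30-day cache covers 95% of searches; covering the remaining 5% would need a year of cache, which is the case SmartStore serves from remote storage instead.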
More details:
FN2168: SmartStore deep dive and performance numbers
Wednesday, October 23, 12:30 PM - 01:15 PM
Q&A

More Related Content

PDF
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
PDF
Ceph and RocksDB
PDF
Deep Dive into GPU Support in Apache Spark 3.x
PPTX
Taking Splunk to the Next Level - Architecture Breakout Session
PDF
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
PPTX
Real Time analytics with Druid, Apache Spark and Kafka
PPTX
NetApp Se training storage grid webscale technical overview
PDF
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Ceph and RocksDB
Deep Dive into GPU Support in Apache Spark 3.x
Taking Splunk to the Next Level - Architecture Breakout Session
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Real Time analytics with Druid, Apache Spark and Kafka
NetApp Se training storage grid webscale technical overview
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...

What's hot (20)

PDF
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
PDF
Building real time analytics applications using pinot : A LinkedIn case study
PDF
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
PDF
Kubernetes scheduling and QoS
PDF
Building an open data platform with apache iceberg
PDF
Architect’s Open-Source Guide for a Data Mesh Architecture
PDF
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
PPTX
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
PDF
NATS Streaming - an alternative to Apache Kafka?
PPTX
Apache Arrow: In Theory, In Practice
PPTX
Strata sf - Amundsen presentation
PPTX
Kolla talk at OpenStack Summit 2017 in Sydney
PDF
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
PDF
Introduction to Apache NiFi dws19 DWS - DC 2019
PPTX
Cloudera Hadoop Distribution
PDF
Let’s get to know Snowflake
PPTX
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
PDF
Data ingestion and distribution with apache NiFi
PDF
Apache Nifi Crash Course
PDF
Airflow presentation
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Building real time analytics applications using pinot : A LinkedIn case study
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Kubernetes scheduling and QoS
Building an open data platform with apache iceberg
Architect’s Open-Source Guide for a Data Mesh Architecture
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
NATS Streaming - an alternative to Apache Kafka?
Apache Arrow: In Theory, In Practice
Strata sf - Amundsen presentation
Kolla talk at OpenStack Summit 2017 in Sydney
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Introduction to Apache NiFi dws19 DWS - DC 2019
Cloudera Hadoop Distribution
Let’s get to know Snowflake
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
Data ingestion and distribution with apache NiFi
Apache Nifi Crash Course
Airflow presentation
Ad

Similar to Sizing Splunk SmartStore - Spend Less and Get More Out of Splunk (20)

PPTX
Taking Splunk to the Next Level - Architecture
PPTX
Taking Splunk to the Next Level - Architecture Breakout Session
PPTX
Taking Splunk to the Next Level - Architecture Breakout Session
PPTX
Cloudian Webinar - 7 Key Reasons why Object Storage lowers Storage TCO
PDF
Reduce Costs and Complexity with Backup-Free Storage
PPTX
Deploying All-Flash Cloud Infrastructure without Breaking the Bank
PDF
Scaling Apache Pulsar to 10 Petabytes/Day - Pulsar Summit NA 2021 Keynote
PDF
Scaling Apache Pulsar to 10 PB/day
PPTX
What's New with the Latest Splunk Platform Release
PPTX
Splunk Cloud and Splunk Enterprise 7.2
PPTX
Splunk Cloud and Splunk Enterprise 7.2
PPTX
Splunk Cloud and Splunk Enterprise 7.2
PPT
SunGard Storage Solutions
PPTX
Stor simple presentation customers
PPTX
How to Lower TCO and Avoid Cloud Lock-in

PPTX
Alle Neuigkeiten im letzten Plattform Release
PPTX
Collecting AWS Logs & Introducing Splunk New S3 Compatible Storage (SmartStore)
PPTX
Taking Splunk to the Next Level - Architecture Breakout Session
PPTX
Cost Optimizations In Cloud The best way to run a continuous optimization cycle
PDF
IBM System Storage DS8000 with SSDs An In-Depth Look at SSD Performance in th...
Taking Splunk to the Next Level - Architecture
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
Cloudian Webinar - 7 Key Reasons why Object Storage lowers Storage TCO
Reduce Costs and Complexity with Backup-Free Storage
Deploying All-Flash Cloud Infrastructure without Breaking the Bank
Scaling Apache Pulsar to 10 Petabytes/Day - Pulsar Summit NA 2021 Keynote
Scaling Apache Pulsar to 10 PB/day
What's New with the Latest Splunk Platform Release
Splunk Cloud and Splunk Enterprise 7.2
Splunk Cloud and Splunk Enterprise 7.2
Splunk Cloud and Splunk Enterprise 7.2
SunGard Storage Solutions
Stor simple presentation customers
How to Lower TCO and Avoid Cloud Lock-in

Alle Neuigkeiten im letzten Plattform Release
Collecting AWS Logs & Introducing Splunk New S3 Compatible Storage (SmartStore)
Taking Splunk to the Next Level - Architecture Breakout Session
Cost Optimizations In Cloud The best way to run a continuous optimization cycle
IBM System Storage DS8000 with SSDs An In-Depth Look at SSD Performance in th...
Ad

More from Paula Koziol (20)

PDF
AI Scalability for the Next Decade
PDF
Delivering Modern Data Protection for VMware Environments
PDF
IBM Storage for SAP HANA Deployments
PDF
IBM Power Systems at FIS InFocus 2019
PDF
IBM Storage at FIS InFocus 2019
PDF
Unlock Real Value from Back Up Data with IBM Spectrum Protect Plus
PDF
A Winning Combination: IBM Storage and VMware
PDF
Data Protection Modernization - Restore, Reuse, Reinvent
PDF
IBM Storage at Fiserv Forum 2018
PDF
IBM Storage at FIS Connect 2018
PDF
Addressing VMware Data Backup and Availability Challenges with IBM Spectrum P...
PDF
IBM & Veeam: Bridging the availability gap
PDF
Transform to Cognitive Healthcare with IBM Software Defined Infrastructure an...
PDF
Accelerate Your Signature Banking Applications with IBM Storage Offerings
PDF
Implementing a Disaster Recovery Solution using VMware Site Recovery Manager ...
PDF
IBM Storage and VMware – A Winning Combination
PDF
Scalable Data Computing for Healthcare and Life Sciences Industry
PDF
Future Proof Your Data: IBM Storage at VeeamON
PDF
IBM Storage at SAPPHIRE 2017
PDF
Optimize Your VMware SDDC with IBM Infrastructure
AI Scalability for the Next Decade
Delivering Modern Data Protection for VMware Environments
IBM Storage for SAP HANA Deployments
IBM Power Systems at FIS InFocus 2019
IBM Storage at FIS InFocus 2019
Unlock Real Value from Back Up Data with IBM Spectrum Protect Plus
A Winning Combination: IBM Storage and VMware
Data Protection Modernization - Restore, Reuse, Reinvent
IBM Storage at Fiserv Forum 2018
IBM Storage at FIS Connect 2018
Addressing VMware Data Backup and Availability Challenges with IBM Spectrum P...
IBM & Veeam: Bridging the availability gap
Transform to Cognitive Healthcare with IBM Software Defined Infrastructure an...
Accelerate Your Signature Banking Applications with IBM Storage Offerings
Implementing a Disaster Recovery Solution using VMware Site Recovery Manager ...
IBM Storage and VMware – A Winning Combination
Scalable Data Computing for Healthcare and Life Sciences Industry
Future Proof Your Data: IBM Storage at VeeamON
IBM Storage at SAPPHIRE 2017
Optimize Your VMware SDDC with IBM Infrastructure

Recently uploaded (20)

PDF
Machine learning based COVID-19 study performance prediction
PPTX
A Presentation on Artificial Intelligence
PDF
Empathic Computing: Creating Shared Understanding
PDF
NewMind AI Monthly Chronicles - July 2025
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PPT
Teaching material agriculture food technology
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Bridging biosciences and deep learning for revolutionary discoveries: a compr...
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
Cloud computing and distributed systems.
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
Machine learning based COVID-19 study performance prediction
A Presentation on Artificial Intelligence
Empathic Computing: Creating Shared Understanding
NewMind AI Monthly Chronicles - July 2025
Building Integrated photovoltaic BIPV_UPV.pdf
Teaching material agriculture food technology
CIFDAQ's Market Insight: SEC Turns Pro Crypto
Encapsulation_ Review paper, used for researhc scholars
Bridging biosciences and deep learning for revolutionary discoveries: a compr...
Digital-Transformation-Roadmap-for-Companies.pptx
Spectral efficient network and resource selection model in 5G networks
Unlocking AI with Model Context Protocol (MCP)
Cloud computing and distributed systems.
The AUB Centre for AI in Media Proposal.docx
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Review of recent advances in non-invasive hemoglobin estimation
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
“AI and Expert System Decision Support & Business Intelligence Systems”
Diabetes mellitus diagnosis method based random forest with bat algorithm
Per capita expenditure prediction using model stacking based on satellite ima...

Sizing Splunk SmartStore - Spend Less and Get More Out of Splunk

  • 1. © 2019 SPLUNK INC. Session FN1435: Sizing Splunk SmartStore - Spend Less and Get More Out of Splunk Make your infra $$ work harder for you
  • 2. Bharath Aleti Director Product Management | Splunk Inc.
  • 3. © 2019 SPLUNK INC. Splunk Architect | ADP Jon Rust Offering Manager | IBM Cloud Object Storage Jane Joki
  • 4. During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release. Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2019 Splunk Inc. All rights reserved. Forward- Looking Statements
  • 5. © 2019 SPLUNK INC.© 2019 SPLUNK INC. ▶ SmartStore Overview ▶ Sizing, Performance & TCO Savings ▶ Customer story - ADP ▶ Storage Partner - IBM COS ▶ Wrap-up Agenda
  • 6. © 2019 SPLUNK INC. SmartStore Overview
  • 7. Growing data volumes requires $$$ infra spend Events Indexing Tier Search Tier ……… Adding new indexers in response to data growth is expensive => High cost Searches typically run over only on a partial subset of data => Inefficient utilization Distributed scale out architecture => No longer a good fit for growing data volumes cold warm hot
  • 8. Cloud Storage On-premise Storage Splunk SmartStore Decoupled Compute & Storage Compute Storage Compute Storage Compute Storage Compute Storage Scale storage for higher data volumes Scale compute for performance Compute Compute  Scale compute and storage independently at significantly reduced cost • Leverages cost economies from cloud and on-prem object storage innovations • Cost savings w/ smaller indexer footprint
  • 9. Splunk SmartStore: application- and data-aware cache. [Diagram: compute nodes, each with a local cache, backed by cloud or on-premises storage.] Dynamic data placement, with the active dataset in local cache • Recently ingested and recently used data reside in the local cache • Data is cached based on age, access patterns, and admin-defined (business) priority • Only the datasets a search requires are brought in
  • 10. Splunk SmartStore: brings data closer to compute on demand. [Diagram: compute nodes with local caches, backed by cloud or on-premises storage.] Optimized search performance for both local and remote data • Unused datasets are evicted from the cache • Older data is brought from remote storage into the cache on demand • Prefetch and parallel fetch minimize the impact of a cache miss
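The cache behavior described on slides 9 and 10 (keep recent data local, evict unused buckets, fetch older buckets on demand) can be pictured with a small model. This is a minimal sketch under stated assumptions, not Splunk's cache manager: the `BucketCache` class, the `remote` catalog of `(size_gb, age_days)` tuples, and the `pin_age_days` rule are all hypothetical, and admin-defined priority and prefetch are left out.

```python
from collections import OrderedDict


class BucketCache:
    """Toy bucket cache: LRU eviction, fetch on miss, recently ingested buckets pinned.

    Hypothetical illustration; not Splunk's cache manager.
    """

    def __init__(self, capacity_gb, remote, pin_age_days=1):
        self.capacity_gb = capacity_gb
        self.remote = remote              # hypothetical catalog: bucket_id -> (size_gb, age_days)
        self.pin_age_days = pin_age_days  # buckets younger than this are never evicted
        self.local = OrderedDict()        # bucket_id -> size_gb, least recently used first
        self.fetches = 0

    def used_gb(self):
        return sum(self.local.values())

    def get(self, bucket_id):
        if bucket_id in self.local:
            self.local.move_to_end(bucket_id)   # hit: now the most recently used
            return "hit"
        size_gb, _ = self.remote[bucket_id]
        self._make_room(size_gb)
        self.local[bucket_id] = size_gb         # miss: "download" from remote storage
        self.fetches += 1
        return "miss"

    def _make_room(self, needed_gb):
        for victim in list(self.local):         # least recently used first
            if self.used_gb() + needed_gb <= self.capacity_gb:
                return
            if self.remote[victim][1] < self.pin_age_days:
                continue                        # keep recently ingested data local
            del self.local[victim]
        if self.used_gb() + needed_gb > self.capacity_gb:
            raise RuntimeError("cache too small for pinned buckets plus this one")


remote = {"b0": (1, 0), "b1": (1, 10), "b2": (1, 20)}
cache = BucketCache(capacity_gb=2, remote=remote)
print([cache.get(b) for b in ["b0", "b1", "b2", "b0", "b1"]])
```

LRU order stands in for "access patterns" and the pin rule for "age"; a real cache would also weigh per-index priority.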
  • 11. Splunk SmartStore: a stateless architecture accelerates recovery and simplifies management. [Diagram: compute nodes with local caches, backed by cloud or on-premises storage.] Higher agility & lower administrative overhead • Upgrade or replace hardware by shutting down all compute nodes without data impact • Set up or tear down compute nodes on demand for investigative use cases • Faster indexer recovery and data rebalance
  • 12. Splunk SmartStore: migrate one index or all indexes. [Diagram: compute nodes with caches and local storage, backed by cloud or on-premises storage.] Simple migration path for existing Splunk deployments • SmartStore-enabled indexes can coexist with non-SmartStore indexes • Start small and expand to the entire cluster
  • 13. Splunk SmartStore: brings data closer to compute on demand for large-scale data processing. [Diagram: a search tier over an indexer tier; hot and recently accessed data stay on the indexers, warm/cold data (buckets and their metadata) live in remote storage on S3 or S3 API-compliant object stores, and the cache manager loads the active dataset onto the indexers.] • Decoupled storage and compute • Longer data retention by independently scaling storage • Scale out compute based on performance demands • Lower TCO with S3 and S3 API-compliant object stores • Fewer indexers required, since only one full copy of warm/cold data is kept, which further reduces TCO • Faster node recovery and rebalance operations • Application- and data-aware cache (age, access patterns, priority) • Splunk SmartStore makes it cost-effective to retain voluminous data and unravel data insights
  • 14. TCO & Sizing
  • 15. SmartStore Cost Savings (reference only; may vary based on your pricing)
    Assumptions: ingestion rate 1 TB/day, total retention 365 days, replication factor 2, max search concurrency 64
    Non-SmartStore infrastructure costs:
      Server on-demand pricing: $1.38/hr
      Server cost/year: $12,088.80
      Storage per node: 12,000 GB
      Indexers required: 31
      Indexer cost/year: $374,753
      Total cost/year: $374,753
    SmartStore infrastructure costs:
      Server (SSD) on-demand pricing: $0.624/hr
      Server (SSD) cost/year: $5,466
      Cache required: 15,500 GB
      Min indexers required: 8
      Indexer cost/year: $43,730
      Remote storage pricing: $0.021/GB/month
      Remote storage cost/year: $45,990
      Total cost/year: $89,720
    Non-SmartStore: at 1 TB/day for 365 days with RF=2 (and a 0.5 compression factor), the storage capacity requirement is 365 TB. With 12 TB per indexer, this requires 31 indexers. At a server cost of about $12K/year, this comes to $374K.
    SmartStore: with 30 days of cache retention, the indexer footprint is reduced to 8. With 2 TB (SSD) per indexer, the annual cost of indexers is $43K. Remote storage costs $46K/year, for a total cost of about $90K.
    Approximate SmartStore cost savings: 75%.
    Deployment: more performance => add indexers; more storage capacity => add storage.
    Cost savings decrease as the number of indexers grows, and increase with higher ingest rate/retention requirements.
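The arithmetic behind the cost table can be reproduced in a few lines. This is a sketch, assuming (as the table's numbers imply) that on-demand servers run all 8,760 hours of the year, that the classic tier stores RF=2 copies at a 0.5 compression factor, and that the object store holds a single compressed copy (182,500 GB). The function names are illustrative.

```python
import math

HOURS_PER_YEAR = 24 * 365  # on-demand servers billed for every hour of the year


def classic_annual_cost(ingest_gb_day, retention_days, rf, cf, gb_per_indexer, price_per_hr):
    """Classic architecture: every replicated copy of every bucket lives on indexer disk."""
    storage_gb = ingest_gb_day * retention_days * rf * cf
    indexers = math.ceil(storage_gb / gb_per_indexer)
    return indexers, indexers * price_per_hr * HOURS_PER_YEAR


def smartstore_annual_cost(ingest_gb_day, retention_days, cf, cache_gb, gb_per_indexer,
                           price_per_hr, remote_price_gb_month):
    """SmartStore: indexers sized for the cache; one compressed copy in the object store."""
    indexers = math.ceil(cache_gb / gb_per_indexer)
    compute = indexers * price_per_hr * HOURS_PER_YEAR
    remote_gb = ingest_gb_day * retention_days * cf
    storage = remote_gb * remote_price_gb_month * 12
    return indexers, compute + storage


classic_n, classic_cost = classic_annual_cost(1000, 365, 2, 0.5, 12000, 1.38)
smart_n, smart_cost = smartstore_annual_cost(1000, 365, 0.5, 15500, 2000, 0.624, 0.021)
savings = 1 - smart_cost / classic_cost
print(classic_n, round(classic_cost), smart_n, round(smart_cost), f"{savings:.0%}")
```

With the slide's inputs this prints `31 374753 8 89720 76%`, close to the approximate 75% quoted on the slide.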
  • 16. SmartStore Cache Sizing Guidelines
    Inputs: daily ingestion rate (I); cache retention (C) = 1/10/30 days or more; available disk space (D) on your indexers (assuming homogeneous disk space); replication factor (R) = 2; compression factor (CF) = 0.5
    Min required cache size = I*R*CF + I*CF*(C-1) (assuming 24-hour hot bucket rollover)
    Min required indexers = min required cache size / D
    Also factor in ingestion throughput requirements (~300 GB/day/indexer) to determine the number of indexers.
    Set min_cache_size to 80% of total disk capacity on the indexer.
    SmartStore sizing examples (storage/indexer 2,000 GB and RF 2 in every case; hot data = I*R*CF, warm data = I*CF*(C-1)):
      1TBDay_7DayCache: ingest 1,000 GB/day, 7-day cache; hot 1,000 GB + warm 3,000 GB = min cache 4,000 GB; min indexers 3*
      1TBDay_10DayCache: ingest 1,000 GB/day, 10-day cache; hot 1,000 GB + warm 4,500 GB = min cache 5,500 GB; min indexers 3
      1TBDay_30DayCache: ingest 1,000 GB/day, 30-day cache; hot 1,000 GB + warm 14,500 GB = min cache 15,500 GB; min indexers 8
      10TBday_10DayCache: ingest 10,000 GB/day, 10-day cache; hot 10,000 GB + warm 45,000 GB = min cache 55,000 GB; min indexers 34
      10TBDay_30DayCache: ingest 10,000 GB/day, 30-day cache; hot 10,000 GB + warm 145,000 GB = min cache 155,000 GB; min indexers 78
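The sizing rules translate directly into code. In this minimal sketch, `min_cache_gb` is the slide's formula. Taking the larger of the cache-based and ingest-based (~300 GB/day per indexer) counts is one assumed reading of "also factor in ingestion throughput".

```python
import math


def min_cache_gb(ingest_gb_day, cache_days, rf=2, cf=0.5):
    """Slide formula: hot data I*R*CF (24h hot-bucket rollover) plus warm data I*CF*(C-1)."""
    hot = ingest_gb_day * rf * cf
    warm = ingest_gb_day * cf * (cache_days - 1)
    return hot + warm


def min_indexers(ingest_gb_day, cache_days, disk_gb, gb_day_per_indexer=300, rf=2, cf=0.5):
    """Return (indexers, count needed for cache, count needed for ingest throughput)."""
    by_cache = math.ceil(min_cache_gb(ingest_gb_day, cache_days, rf, cf) / disk_gb)
    by_ingest = math.ceil(ingest_gb_day / gb_day_per_indexer)  # ~300 GB/day/indexer guideline
    return max(by_cache, by_ingest), by_cache, by_ingest


for ingest, days in [(1000, 30), (10000, 10), (10000, 30)]:
    total, by_cache, by_ingest = min_indexers(ingest, days, disk_gb=2000)
    print(f"{ingest} GB/day, {days}-day cache: min cache {min_cache_gb(ingest, days):,.0f} GB, "
          f"{by_cache} by cache, {by_ingest} by ingest -> {total} indexers")
```

This reproduces the 1TBDay_30DayCache row (8) and both 10 TB/day rows (34 and 78). For the smaller 1 TB/day rows, a strict 300 GB/day ceiling would give 4 indexers rather than the slide's 3, so treat the throughput figure as approximate.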
  • 17. Performance [Charts: mixed search workload; dense search workload] • 100% cached: search time grows linearly with the time range • Sharp spikes on a cache miss when hitting non-cached data • The impact is lower for dense searches due to data locality and prefetch • On a cache miss, search time may increase from 2 s to more than 100 s, depending on the search • E.g., fetching a single 750 MB bucket over a 1 Gbps network takes about 7.5 s • Prefetching reduces the overall impact on search response time by overlapping fetches with CPU/IO operations
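The 750 MB example works out as follows: 750 MB × 8 = 6,000 Mb, which takes 6 s at a full 1 Gb/s, so the slide's 7.5 s corresponds to about 800 Mb/s of effective throughput. The sketch below makes that explicit and adds an idealized two-stage pipeline to show why prefetch helps: while one bucket is being scanned, the next one is already downloading. The 80% efficiency and the 2 s per-bucket scan time are assumptions for illustration, not measurements.

```python
def fetch_seconds(size_mb, link_gbps, efficiency=1.0):
    """Seconds to move one bucket over the network; efficiency < 1 models protocol overhead."""
    megabits = size_mb * 8
    return megabits / (link_gbps * 1000 * efficiency)


def search_seconds(n_buckets, fetch_s, scan_s, prefetch):
    """Idealized search over n non-cached buckets.

    Without prefetch, each bucket is fetched and then scanned. With prefetch, the
    next fetch overlaps the current scan, so it behaves like a two-stage pipeline.
    """
    if not prefetch:
        return n_buckets * (fetch_s + scan_s)
    return fetch_s + (n_buckets - 1) * max(fetch_s, scan_s) + scan_s


print(fetch_seconds(750, 1))        # 6.0 at full line rate
print(fetch_seconds(750, 1, 0.8))   # 7.5 at ~800 Mb/s effective
fetch = fetch_seconds(750, 1, 0.8)
print(search_seconds(10, fetch, 2.0, prefetch=False), search_seconds(10, fetch, 2.0, prefetch=True))
```

In this model, ten cold buckets take 95.0 s without prefetch and 77.0 s with it. The network-bound stage dominates either way, which is why the slide stresses that cache misses, not CPU, drive the spikes.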
  • 19. SmartStore in Production • 95% of Splunk Cloud production stacks run on SmartStore • Support for Enterprise Security (RA/DMA: report acceleration and data model acceleration) added in Splunk 7.3 • Successful adoption at key customer accounts, with more in the pipeline • ADP and Lawrence Livermore National Labs speaking at .conf …. • 100+ on-prem deployments • Quotes: • “SmartStore working like a dream” • “Saving many millions per year with AWS S3 storage” • “No longer worried about running out of disk space for long term retention” • “Easy to scale storage independent of compute means now we can increase long term retention on-demand i.e., within minutes instead of waiting for days/months” • “Not only is S3 cheaper than disk, but you also need 50% less as replication is built into S3”
  • 20. SmartStore in Production at ADP: Jon Rust, Splunk Admin, ADP
  • 21. Overview - Usage • 20 TB license, 11 TB average per day, 19 TB recent peak • 500 TB of retention (growing since implementing S2) • 600,000 searches per day – average runtime 4.0 s, unchanged since S2 • 5,500 users • 80 groups (each group gets a Splunk app) • 1,000 indexes (each group gets multiple indexes) – largest cluster has 300 indexes
  • 22. Overview - Infrastructure • 72 physical indexers and 2 VM (lab) indexers across 7 environments – largest clusters are 25 and 29 indexers • 16 VM search heads – largest search head cluster has 9
  • 23. Overview – Basic Cluster • Most traffic still comes through SUF (Splunk Universal Forwarder) • Growing HEC (HTTP Event Collector) traffic, close to 50% lately • Separate HEC heavy forwarder (HF) farm – flexibility – HEC overuse doesn't impact indexers • COS: Cloud Object Storage from IBM – formerly known as Cleversafe
  • 25. “Indexers are too expensive” • Management was unhappy with the cost of Splunk – $50k per indexer, 20 cores – 15 TB of usable RAID10 SSD – under-provisioned and looking to expand • With SmartStore (S2) – $12k per indexer, 36 cores – 30+ days of cache retention – 7 TB of usable RAID0 SSD, BUT! S2 provides the redundancy – COS disk cost is about $0.35/GB – 2x the indexer count and almost 4x the core count, still < 50% of the $$ – extended data retention (almost free!!)
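The "2x indexers, almost 4x cores, still < 50% of the $$" claim follows from the per-indexer figures on the slide. A quick check, assuming each old indexer is replaced by two SmartStore indexers and counting indexer hardware only (the COS disk cost of about $0.35/GB is separate):

```python
def refresh_ratios(old_price, old_cores, new_price, new_cores, new_per_old):
    """Spend and core-count ratios when each old indexer is replaced by `new_per_old` new ones."""
    spend = new_price * new_per_old / old_price
    cores = new_cores * new_per_old / old_cores
    return spend, cores


spend, cores = refresh_ratios(50_000, 20, 12_000, 36, new_per_old=2)
print(f"{spend:.0%} of the old indexer spend, {cores:.1f}x the cores")
```

It prints `48% of the old indexer spend, 3.6x the cores`, which matches "almost 4x" and "< 50%".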
  • 26. More than money management: Agility! • Increase or decrease peer count very quickly • Random other example, the “re-RAID project Q1 2019” – Management forced us to use RAID5 during the initial build-out – RAID5 needs to die in a fire – We eventually hit the IO wall – With S2, rebuilding RAID volumes was pretty painless:
      splunk offline
      Take the mount offline, rebuild the volume as RAID10
      splunk restart
      <repeat for each indexer>
    12 indexers/site in the cluster, less than 2 hours of work, no service interruption
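The re-RAID procedure is a rolling loop that handles one indexer at a time. The sketch below only builds the ordered plan and runs nothing. `splunk offline` and `splunk restart` come from the slide; the host names, the rebuild placeholder, and the explicit wait-for-health step are assumptions.

```python
def rolling_rebuild_plan(indexers, rebuild_step):
    """Ordered (host, action) steps that take one indexer out of service at a time."""
    plan = []
    for host in indexers:
        plan.append((host, "splunk offline"))   # from the slide: gracefully take the peer down
        plan.append((host, rebuild_step))       # hypothetical placeholder for site-specific work
        plan.append((host, "splunk restart"))   # from the slide: bring the peer back
        # Assumption: confirm cluster health before moving to the next peer.
        plan.append((host, "wait for cluster to report healthy"))
    return plan


hosts = [f"idx{n:02d}" for n in range(1, 13)]  # hypothetical names: 12 indexers in one site
plan = rolling_rebuild_plan(hosts, "unmount, rebuild volume as RAID10, remount")
for host, action in plan[:4]:
    print(host, "->", action)
```

Keeping the plan as data makes it easy to review, or to hand to whatever automation a site already uses.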
  • 27. But how does it search? • Most common searches are unchanged – recent data is in cache and performs as before, or faster with more h/w • Historic searches are okay, but it depends – big-window searches over old data can trigger large downloads from the remote store • We've had zero complaints about search performance since updating to S2 • Most users have no idea
  • 28. Was migration difficult? • Mostly turn-key • A few beta/early-release issues (since solved) • When migrating a cluster – chose 1 index first and verified – good? Chose 5 more and verified – good? Rolled the rest – 500 TB of data migrated!! • Upload concurrency during migration – we turned this down (from the default of 8 to 4) – our COS infrastructure wasn't designed to handle so much upload data all at once – consider your network and S3 limits before migration – normal day-to-day use spreads out uploads pretty nicely
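The staged rollout (1 index, then 5 more, then the rest) can be written as a small driver that stops at the first failed check. In this sketch, `migrate_wave` and `verify_wave` are hypothetical callbacks that stand in for whatever enabling remotePath and verifying an index means in your environment.

```python
def migration_waves(indexes, first=1, second=5):
    """Split indexes into the slide's rollout: 1 index, then 5 more, then the rest."""
    return [indexes[:first], indexes[first:first + second], indexes[first + second:]]


def migrate(indexes, migrate_wave, verify_wave):
    """Migrate wave by wave, stopping at the first wave that fails verification."""
    done = []
    for wave in migration_waves(indexes):
        if not wave:
            continue
        migrate_wave(wave)               # hypothetical: enable remotePath for these indexes
        if not verify_wave(wave):        # hypothetical: check uploads and search results
            raise RuntimeError(f"verification failed for {wave}; halting rollout")
        done.extend(wave)
    return done


log = []
names = [f"index_{i}" for i in range(10)]
migrated = migrate(names, log.append, lambda wave: True)
print([len(wave) for wave in log])  # [1, 5, 4]
```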
  • 29. Sample SmartStore config
      [volume:remote_store]
      storageType = remote
      path = s3://splunk-s2-webtier-dc2
      remote.s3.access_key = **key**
      remote.s3.secret_key = **key**
      remote.s3.endpoint = https://internalS3.endpoint
      remote.s3.signature_version = v2

      # volume:hot and volume:cold must be defined elsewhere in indexes.conf
      [some_index]
      remotePath = volume:remote_store/$_index_name
      homePath = volume:hot/$_index_name
      # max total MB of this index's warm buckets across the cluster
      maxGlobalDataSizeMB = 175000
      # 12096000 s = 140 days
      frozenTimePeriodInSecs = 12096000
      # required, but only used during migration; no data will land here after migration
      coldPath = volume:cold/$_index_name
      # thawedPath is also required and cannot reference a volume
      thawedPath = $SPLUNK_DB/some_index/thaweddb
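Before rolling out a config like this, a quick sanity check can catch a remotePath that points at an undefined or non-remote volume. Below is a minimal sketch using Python's standard `configparser`. The checks are assumptions about what is worth verifying, not Splunk's own validation; `splunk btool` remains the way to see the merged configuration Splunk actually uses.

```python
import configparser

SAMPLE = """
[volume:remote_store]
storageType = remote
path = s3://splunk-s2-webtier-dc2
remote.s3.endpoint = https://internalS3.endpoint

[some_index]
remotePath = volume:remote_store/$_index_name
homePath = volume:hot/$_index_name
coldPath = volume:cold/$_index_name
maxGlobalDataSizeMB = 175000
frozenTimePeriodInSecs = 12096000
"""


def check_smartstore_indexes(text):
    """Return {index: summary} for stanzas with remotePath; raise on a bad volume reference."""
    # Only "=" separates keys from values, so "s3://..." is not split; $_index_name stays literal.
    cfg = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    cfg.read_string(text)
    report = {}
    for name in cfg.sections():
        remote = cfg[name].get("remotePath")
        if name.startswith("volume:") or remote is None:
            continue  # volume definitions and non-SmartStore indexes
        volume = remote.split("/", 1)[0]  # e.g. "volume:remote_store"
        if not cfg.has_section(volume):
            raise ValueError(f"{name}: remotePath uses undefined {volume}")
        if cfg[volume].get("storageType") != "remote":
            raise ValueError(f"{name}: {volume} is not storageType = remote")
        secs = cfg[name].getint("frozenTimePeriodInSecs", fallback=None)
        report[name] = {
            "remote_volume": volume,
            "retention_days": None if secs is None else secs / 86400,
        }
    return report


print(check_smartstore_indexes(SAMPLE))
```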
  • 31. Splunk SmartStore and IBM Cloud Object Storage: A Gamechanger for Your Splunk Environment. Jane Jokl, Offering Manager, IBM Cloud Object Storage Solutions
  • 32. Topics • Brief Overview of IBM Cloud Object Storage • Solution Highlights
  • 33. Efficiency of IBM Cloud Object Storage: 70%+ TCO savings
                            RAID 6 + Replication    Software Defined Solutions
      Usable storage        1 PB                    1 PB
      Raw storage           3.6 PB                  1.7 PB
      4 TB disks            900                     432
      Racks required        3.6x                    1.7x
      Floor space           3.6x                    1.7x
      Ops staffing          3 FTE                   0.5 FTE
      Extra software $$     Replication/backup      None
    RAID 6 + Replication raw storage = original (1.20 PB raw) + onsite mirror (1.20 PB raw) + remote copy (1.20 PB raw)
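The RAID 6 + replication column consists of three full copies, each carrying RAID 6 overhead. The sketch below works through that arithmetic, assuming 10+2 RAID 6 groups. This is an assumption that reproduces the slide's 1.2 PB raw per 1 PB copy.

```python
import math


def raid6_replicated_raw_tb(usable_tb, data_disks=10, parity_disks=2, copies=3):
    """Raw TB when every copy sits on RAID 6 groups of data_disks + parity_disks."""
    return usable_tb * (data_disks + parity_disks) * copies / data_disks


def disks_needed(raw_tb, disk_tb=4):
    return math.ceil(raw_tb / disk_tb)


raw = raid6_replicated_raw_tb(1000)  # original + onsite mirror + remote copy
print(raw, disks_needed(raw))        # 3600.0 900
print(disks_needed(1728))            # 432 disks for the 1,728 TB dispersal layout
```

The 432-disk figure for the software-defined column matches the 1,728 TB primary raw capacity listed for the 12/7/9 configuration on slide 36.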
  • 34. Why is Cloud Object Storage a good fit for unstructured data? • IBM Cloud Object Storage industry leader – IDC and Gartner market leader for over 6 years • Simplified distributed architecture – access from anywhere – reduced points of failure – enhanced durability with consistency checks • Simplified management – much less to tune (no controller nodes or replication) – no snapshots or backup copies • Virtually infinite scalability – scale capacity to exabytes – flexible addition/removal • Reduced cost – commodity hardware – single-copy protection • No file system limitations – number of files per directory: no limit – total objects in a volume and max size – single-volume max capacity • Custom metadata – ready for AI/analytics – stored with the object for new use cases • [Diagram: Splunk and other S3 apps (#2, #3) access Slicestors through an Accesser cluster over S3.] Notes: • All deployment models supported – on-premises, hybrid, public cloud • Available as software only; supported on approved customer x86 platforms • IBM appliances also available
  • 35. How IBM Cloud Object Storage Works • Data ingest, content transformation: IBM COS Accesser software encrypts and slices the data and applies Information Dispersal Algorithms, otherwise known as erasure coding policies. • Physical distribution: slices are distributed to separate disks on industry-standard x86 hardware (Slicestor storage nodes) across geographic locations (sites 1, 2, and 3). • Data retrieval, reliable retrieval: an operator-defined subset of slices is needed to retrieve data bit-perfectly in real time. • Benefits: the level of resiliency is fully customizable, resulting in a massively reliable and efficient way to store data at scale, as opposed to RAID and replication techniques.
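Erasure coding is what lets an operator-defined subset of slices rebuild the data. The sketch below is a toy Reed-Solomon-style code over the prime field GF(257), not IBM's IDA. Each byte of a k-byte chunk becomes one slice, the extra slices are evaluations of the same polynomial, and any k surviving slices recover the chunk. Width 12 with threshold 7 mirrors the 12/7/9 example on slide 36.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through `points`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def disperse(chunk: bytes, width: int) -> dict:
    """Systematic encoding: slices 1..k hold the bytes, slices k+1..width hold parity."""
    if not 0 < len(chunk) <= width < P:
        raise ValueError("need 0 < len(chunk) <= width < 257")
    data_points = [(x, b) for x, b in enumerate(chunk, start=1)]
    return {x: _interpolate(data_points, x) for x in range(1, width + 1)}


def reassemble(slices: dict, threshold: int) -> bytes:
    """Any `threshold` surviving slices determine the polynomial, hence the data."""
    if len(slices) < threshold:
        raise ValueError(f"need {threshold} slices, have {len(slices)}")
    points = list(slices.items())[:threshold]
    return bytes(_interpolate(points, x) for x in range(1, threshold + 1))


slices = disperse(b"SPLUNK!", width=12)                 # 7 data bytes -> 12 slices
survivors = {x: v for x, v in slices.items() if x > 5}  # lose slices 1-5, keep 7
print(reassemble(survivors, threshold=7))
```

Parity symbols can take the value 256, so slices are kept as integers. Production codes work over GF(2^8) and encode whole stripes at a time.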
  • 36. Example of 1 PB Data Use Case with SmartStore and COS
    [Diagram: a dsNet storage system (dispersed storage network) with a 12/7/9 IDA spans sites A, B, and C; each site has four Slicestors and two Accessers behind load balancers, and one site also hosts the Manager. On the Splunk side, a search head and clustered indexers read and write data locally in local storage (cache); the SmartStore cache management layer copies warm data to remote storage and moves data from remote storage to local storage.]
    COS configuration: • IDA: 12/7/9 • Data reliability: 10 9's • Expansion: 1.71 • 12 TB HDDs • Usable: 1,008 TB • Primary raw: 1,728 TB • Managers: 1 • Accessers: 6 • Slicestors: 12
    • The number of Accessers can be scaled to handle throughput • Each Accesser handles approx. 750 MB/sec, varying with object size • Slicestors can be scaled for capacity
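The COS configuration numbers are internally consistent. Reading 12/7/9 as width / read threshold / write threshold (assumed here to be the usual IBM COS notation), expansion is 12/7 ≈ 1.71, and 1,008 TB usable needs 1,728 TB raw. The sketch below covers that arithmetic, plus Accesser sizing based on the slide's approx. 750 MB/s per Accesser. The 4,500 MB/s target is a hypothetical figure.

```python
import math


def ida_summary(width, read_threshold, write_threshold, usable_tb):
    """Capacity and fault-tolerance figures for a width/read/write-threshold IDA."""
    return {
        "expansion": width / read_threshold,            # raw bytes stored per usable byte
        "raw_tb": usable_tb * width / read_threshold,
        "lost_slices_still_readable": width - read_threshold,
        "lost_slices_still_writable": width - write_threshold,
    }


def accessers_needed(target_mb_s, per_accesser_mb_s=750):
    """Accessers for a target throughput, using the slide's ~750 MB/s per Accesser."""
    return math.ceil(target_mb_s / per_accesser_mb_s)


s = ida_summary(12, 7, 9, usable_tb=1008)
print(round(s["expansion"], 2), s["raw_tb"])   # 1.71 1728.0
print(accessers_needed(4500))                   # 6
```

Under this reading, with 12 slices, data stays readable after losing up to 5 of them and writable after losing up to 3.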
  • 37. Highlights of Splunk SmartStore with IBM COS • Splunk administrators can seamlessly increase storage capacity and storage performance with IBM COS without having to scale up compute at the same time • Both Splunk and IBM COS are highly flexible and extremely scalable, without any downtime – scaling COS performance is as simple as adding more Accessers serving the storage pool – if the dsNet becomes storage-pool constrained, IBM COS allows real-time addition of more sets of Slicestors to the storage pool to increase its performance – another way to scale performance on the COS side is SmartStore's ability to use a different endpoint for each volume, e.g., one set of indexes uses one dsNet and other indexes use another dsNet • Performance – can be as performant as Splunk's traditional architecture, with a minimal performance delta for SmartStore remote storage – ADP use case success story • Benefits of on-prem deployments – lower capacity costs – no retrieval charges (egress bandwidth and operational requests) – higher reliability – data in your control – performance you control, and more predictable
  • 38. Unlock the Value of Splunk SmartStore with IBM COS • Take advantage of the SmartStore feature in Splunk Enterprise with IBM Cloud Object Storage • Lower TCO – scale the warm tier (IBM COS) without adding more indexing servers – optimize hot-tier servers for performance • Extend data retention and maximize data accessibility – the hot tier remains the same as in the classic architecture – everything else is on IBM COS, which is WARM and SEARCHABLE (warm/cold = warm) • Agility of infrastructure – data not tied to servers; no downtime; seamless scalability – take advantage of the intrinsic HA capabilities IBM COS provides as warm-tier remote storage – simplify data management and the deployment model with only 2 tiers: hot and warm • Architected for massive scale – no size limitations on ingest with SmartStore; setup parameters must be set according to whichever architecture is used • Can be implemented on a per-index basis, i.e., deployments do not have to be “all classic” or “all SmartStore”
  • 39. Splunk SmartStore: achieve massive scale with lower TCO • Lower TCO – decoupled compute and storage – scale storage for longer retention and indexers for performance demand – reduced indexer footprint for warm/cold data • Performance at scale – brings data closer to compute on demand – application- and data-aware cache – caches data based on age, priority, and access patterns • Faster failure recovery – faster indexer recovery – faster data rebalance • On-demand cluster – add/remove indexers on demand – set up/tear down a cluster on demand
  • 40. Splunk SmartStore Key Takeaways 1. Decoupled compute and storage with SmartStore provides scale and performance at low cost 2. Size the local cache to cover the majority (95%) of your search timespans (30/90 days) 3. Increased infrastructure agility to scale up/down to meet business needs
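One way to apply takeaway 2 is to look at how far back your searches actually reach and pick the retention that covers 95% of them. The sketch below uses the nearest-rank percentile; the lookback sample is invented for illustration.

```python
def cache_days_for_coverage(lookback_days, coverage_pct=95):
    """Nearest-rank percentile: the smallest lookback covering coverage_pct% of searches."""
    ordered = sorted(lookback_days)
    rank = (coverage_pct * len(ordered) + 99) // 100  # ceil(p * n / 100) in integer math
    return ordered[rank - 1]


# Hypothetical sample: lookback windows (days) for 100 searches.
lookbacks = [1] * 60 + [7] * 25 + [30] * 11 + [90] * 3 + [365]
print(cache_days_for_coverage(lookbacks))      # 30
print(cache_days_for_coverage(lookbacks, 99))  # 90
```

The resulting number of days can then be used as the cache retention (C) in the sizing formula from slide 16.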
  • 42. More details: FN2168, SmartStore deep dive and performance numbers, Wednesday, October 23, 12:30 PM - 01:15 PM
  • 43. Q&A