In-Flux Limiting for a Multi-Tenant Logging Service
Ambud Sharma & Suma Cherukuri
Cloud Platform Engineering @ Symantec
In-Flux Limiting for a Multi-Tenant Logging Service 1
Overview
• Who are we?
• Architecture
• Streaming Pipeline
• Influx Issue
• Influx Limiting Design & Solution
• Conclusion
• Q & A
Who are we?
• Symantec’s internal cloud team
• Host applications generating over $1B in revenue
• Team
– Logging as a Service (LaaS) – Elasticsearch/Kibana
– Metering as a Service (MaaS) – InfluxDB/Grafana
– Alerting as a Service (AaaS) – Hendrix
We are hiring!
Also check out Hendrix: https://github.com/Symantec/hendrix
Our Data
Logs
• Application and system log data from VMs and containers
• Used for troubleshooting

Metrics
• Application and system telemetry
• Used for Application Performance Monitoring (APM)
Log Event:
{
  "message": "User logged in from 1.1.1.1",
  "@version": "1",
  "@timestamp": "2014-07-16T06:49:39.919Z",
  "host": "value",
  "path": "/opt/logstash/sample.log",
  "tenant_id": "291167ebed3221a006eb",
  "apikey": "06be8a-28ef-4568-8cb8-612",
  "string_boolean": "true",
  "host_ip": "192.168.99.01"
}

Metric Event:
{
  "@version": "1",
  "@timestamp": "2014-07-16T06:49:39.919Z",
  "host": "host1.symantec.com",
  "tenant_id": "291167ebed3221a006ebf6",
  "apikey": "06be8a-28ef-4568-8cb8-618",
  "value": 0.65,
  "name": "cpu"
}
LMM Architecture
Customer Agents (e.g. Logstash) → Kafka (open to customers)
Kafka → Log Topology → Elasticsearch
Kafka → Metrics Topology → InfluxDB
Redis: tenant IDs and API keys
Users query and graph the stored data
Streaming Pipeline
• Validate events against the published schema to optimize indexing
• Authenticate events to route data to the correct index
• One index per day per tenant

Kafka → Validate → Auth → Index
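The deck doesn't show how the daily per-tenant index name is derived; here is a hedged sketch, assuming a `logs-<tenant>-<YYYY.MM.DD>` naming pattern (the pattern itself is an illustration, not the actual scheme used in the service):

```python
from datetime import datetime, timezone

def index_name(tenant_id: str, event_ts: str) -> str:
    """Derive a daily, per-tenant index name from an event's @timestamp.

    The "logs-<tenant>-<YYYY.MM.DD>" pattern is illustrative only.
    """
    # Events carry ISO 8601 timestamps with a trailing "Z" (UTC).
    day = datetime.fromisoformat(event_ts.replace("Z", "+00:00")).astimezone(timezone.utc)
    return f"logs-{tenant_id}-{day:%Y.%m.%d}"
```

For the sample log event above, `index_name("291167ebed3221a006eb", "2014-07-16T06:49:39.919Z")` yields `"logs-291167ebed3221a006eb-2014.07.16"`, so each tenant's events land in their own daily index.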
Influx Issue
• You know your data store's performance limits (find EPS from benchmarks/capacity planning)
• Tenants send a lot of data, and the ingestion rate is never linear
• Ingestion spikes are bound to happen in a real-time streaming application
• Wouldn't it be great if you could normalize these spikes?
Influx Limiting
• Normalize the EPS curve using buffers
• Like a hydro dam, explicitly allocate the EPS resource to tenants

(Before/after EPS graphs shown on the slide.)
Design - Options
Approach 1: route excess traffic to a separate Kafka topic
• No back-pressure in the primary queue
• Secondary queue is drained at a slower pace
• Events may appear out of order

Approach 2: controlled back-pressure in the primary queue
• Selectively reduce the ingestion rate for tenants
• Events will always appear in order
Customer Requirements
• Customers want threshold quotas defined for them
• Thresholds are defined as policies (window duration in seconds)
• Policies are saved in a data store
Tenant A: { "threshold": 100, "window": 90 }
Tenant B: { "threshold": 700, "window": 10 }
Tenant C: { "threshold": 900, "window": 1 }
Bolt Design
1. Track the event rate for each tenant over the policy window
2. If the threshold is exceeded, throttle; otherwise allow the events
3. Reset the window when the time interval completes (tumbling window)

Kafka → Validate → Auth → Throttle → Index
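The three steps above can be sketched as a small tumbling-window throttler (this is an assumption-laden illustration, not the actual Storm bolt code; the real bolt is driven by tick tuples and Kafka offsets):

```python
from collections import defaultdict

class Throttler:
    """Tumbling-window throttle: allow up to a tenant's threshold per window."""

    def __init__(self, policies: dict):
        self.policies = policies          # tenant_id -> {"threshold", "window"}
        self.counters = defaultdict(int)  # tenant_id -> events seen this window

    def allow(self, tenant_id: str) -> bool:
        """Count the event; True if it is within this window's threshold."""
        self.counters[tenant_id] += 1
        return self.counters[tenant_id] <= self.policies[tenant_id]["threshold"]

    def reset(self, tenant_id: str) -> None:
        """Called when the tenant's window tumbles (step 3)."""
        self.counters[tenant_id] = 0
```

With a threshold of 2 per window, the third event in a window is throttled; after `reset()` the tenant's quota is restored for the next window.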
Scheduled-task design pattern
• Clock is maintained using the Storm tick tuple
• A tenant's counter is incremented when an event is received from it
• Counters are reset when the modulated value matches: is time % throttle duration == 0?

(Diagram: the clock time is taken modulo each tenant's throttle duration; when the result is 0, counters for each tenant in that slice are reset; otherwise there is nothing to reset.)
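The scheduled-task pattern above can be sketched without extra threads: group tenants by their policy window, and on every one-second tick reset only the tenants whose window divides the current clock value. This is a simplification — in the real topology the clock is a Storm tick tuple, and the class name here is made up for illustration:

```python
from collections import defaultdict

class TickResetter:
    """Single-threaded reset schedule driven by a once-per-second tick."""

    def __init__(self, policies: dict):
        # Bucket tenants by their throttle duration (window, in seconds).
        self.by_window = defaultdict(list)
        for tenant, p in policies.items():
            self.by_window[p["window"]].append(tenant)
        self.clock = 0

    def on_tick(self) -> list:
        """Called once per second (the tick tuple). Returns tenants whose
        counters should be reset on this tick: time % duration == 0."""
        self.clock += 1
        due = []
        for window, tenants in self.by_window.items():
            if self.clock % window == 0:
                due.extend(tenants)
        return due
```

With a 2-second and a 3-second policy, tenant "a" is reset at t=2, 4, 6, … and tenant "b" at t=3, 6, …; no per-tenant timers or threads are needed, only one modulo check per distinct window length per tick.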
Results
• Reduced EPS to Elasticsearch
• We can normalize the flow rate based on load
Conclusion
• Overview of real-time log and metric indexing
• Approaches to rate limiting in a real-time streaming application
• A design pattern to efficiently perform counting in Storm
That’s all folks!
Questions?
Editor's Notes
  • #2: Welcome to the talk, In-Flux Limiting for a Multi-Tenant Logging Service. Introduce yourselves; we are from Cloud Platform Engineering at Symantec.
  • #3: Today we are here to talk about how we do event throttling and rate limiting for real-time streaming. We will go over the architecture and internal details of the streaming pipeline, the influx issue, and the different approaches to solving the problem, and show you the results. We also want to cover an efficient pattern for counting. If there are any pressing questions, please feel free to stop us, but we would prefer to take questions at the end.
  • #4: We are part of Symantec's internal cloud team, which hosts applications generating over $1B in revenue. Specifically, our team builds, owns, and runs three primary services, which we call Logging as a Service, Metering as a Service, and Alerting as a Service. Side note: we are hiring — if anyone is interested in joining the effort to build the biggest security data lake in the world, please stop by after the presentation. Another side note: we have open-sourced a project called Hendrix, which is our Alerting as a Service; please feel free to check it out at github.com/Symantec/hendrix.
  • #5: Before we jump into the actual design and architecture of our system, let's talk about the data we get and the problem we are solving. We offer logging and APM (application performance monitoring) as a service. Our customers are Symantec product teams, and they send us application and system logs generated on VMs and containers; the teams use these for troubleshooting their applications. This is basically our own version of Splunk. On the metrics side, we get application and system telemetry, which the teams use for application performance monitoring. Here are our sample events: we accept data in JSON format, and this is what it looks like — on the left is the log event and on the right is the metric event. If you look at the log event, you will notice two special fields: one called tenant_id and the other the API key. So what are a tenant and an API key? Every customer — that is, a P&S team at Symantec — has something called tenants; the concept comes from our OpenStack cloud. A given P&S team can have more than one tenant; for example, their production app A can have one tenant and production app B another. Every tenant is a unit of isolation for us. An API key is a token used to allow and revoke the flow of logs for a given tenant: if you want to stop a tenant from sending data, you can revoke the API key, which means we start discarding its events. We call this process event authentication.
  • #6: Now let's get into our architecture. Customers run agents like Flume, Logstash, collectd, and StatsD, which send data to our Kafka cluster, exposed over load balancers. We then run a set of Storm topologies that write the data to the destination data stores: Elasticsearch for logs and InfluxDB for metrics. We use Kibana as a front end for Elasticsearch and Grafana as a front end for InfluxDB so that customers can graph and query the data. Redis is where we store tenant IDs and API keys.
  • #7: Here's what happens inside our streaming pipeline, that is, the Storm topology. First, as we showed earlier, events arrive in Kafka; we use the Storm Kafka spout to read them, and then we validate the events against the format and schema specifications we publish to our customers — for example, if an event is malformed JSON, we drop it. Next, we check whether the tenant_id and API key are valid. Lastly, we index the data into Elasticsearch or insert it into InfluxDB. Each of these stages is a separate Storm bolt.
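  The validate and auth stages described in this note can be sketched as follows. The required-field set and the API-key lookup are simplified assumptions (the real bolts validate a published schema and read keys from Redis):

```python
import json

# Assumed minimal schema; the real published schema has more fields.
REQUIRED_FIELDS = {"@timestamp", "tenant_id", "apikey"}

def validate(raw: str):
    """Parse one raw event; return the dict, or None if it should be dropped."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed JSON is dropped, as the note describes
    if not REQUIRED_FIELDS <= event.keys():
        return None  # missing required fields
    return event

def authenticate(event: dict, apikeys: dict) -> bool:
    """Check the event's apikey against the key stored for its tenant
    (stored in Redis in the actual system; a plain dict here)."""
    return apikeys.get(event["tenant_id"]) == event["apikey"]
```

  Revoking a tenant's API key then amounts to removing (or changing) its entry in the key store, after which `authenticate` fails and the pipeline discards that tenant's events — the "event authentication" process from note #5.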
  • #8: Now that you have a fair idea of our pipeline, let's understand the influx issue. Influx means the arrival of a large quantity of something in a short time — in this case, events. When you are writing data to a data store like HBase, Cassandra, or Elasticsearch, you provision capacity in the cluster: your cluster will have X nodes and can support, say, 10,000 inserts per second. You can gauge this number by running benchmarks. For us, these inserts per second can also be referred to as events per second, or EPS. The EPS sent by our tenants is never linear; it fluctuates quite a lot, as you can see in the graph on the right, where each line represents the EPS from one tenant. At times we get spikes, which is bound to happen in any real-time event processing system: when load increases on the applications, they generate a lot more logs, and we get a spike. When a spike happens, we don't have the provisioned capacity to index the additional influx of data. So wouldn't it be great if you could normalize these spikes — that is, have an almost flat EPS curve for every tenant?
  • #9: Let's understand how we can limit the influx and normalize these spikes. Think of event streams as a river: if there's a cloudburst (no pun intended), the river gets temporarily flooded, so much so that the banks overflow. To fix this, we can build a dam, which buffers the additional influx of water, and we can control the rate at which the dam is drained. Since we are using Kafka, we already have a buffer; however, we have no control over it, because it is governed by the back pressure our Elasticsearch cluster creates, since it can take only so many writes. The purpose of this work was to have controlled back pressure into Kafka for our streaming pipelines, letting us quantitatively determine how many events to let flow through the pipeline into Elasticsearch. And we want to do this on a tenant-by-tenant basis: as you can see in the diagrams, different tenants send different quantities of data, shown in different colors. With a controlled system, we can normalize and evenly divide the capacity among all of them, or knowingly make it uneven — if one customer has more need, we allocate more capacity to them.
  • #10: How can we do this? There are two approaches we thought of. One is to write a substream: if a tenant exceeds their allocated throughput capacity, we divert the extra event traffic to a separate queue, which we then drain at a slower pace; technically, that means a separate Kafka topic and a separate Storm topology with a lower parallelism configuration. The other way is to pause the processing of events in the existing streaming pipeline for the tenant that is sending more data. Both approaches have pros and cons. With the first, you will see events out of order: some data appears right away because it flows through the main pipeline, while some is delayed because it flows through the slower pipeline. With the second approach, you always see events in order, but if the queue back-pressures too much, you may lose data — though that is true for either approach if you share the Kafka cluster, because for a given cluster your disk space is limited.
  • #13: What do we do inside the bolt, and how do we track the event rate for tenants? To track event counts, we keep a hashtable of tenant ID to an integer counter, which we increment every time we see an event from that tenant. But our customers wanted policies that define event rates differently for every tenant: one wants to be allowed to send 300 events in 2 minutes, while another wants 5,000 in 10 minutes — not the same EPS. So we had to come up with a way to track this per tenant, and we found an interesting way of solving the problem without using multiple threads. What we built is logically a sort of merry-go-round where every tenant is allowed to go around once. Every tenant's influx-limiting policy has two parts: the number of events they would like to send, and the time duration for those events. We take the time duration and place tenants on this virtual merry-go-round based on their policy's duration.