Sherlock: an anomaly detection service on top of Druid

Sherlock: Automated Anomaly Detector

Introduction
Jigar Patel
Software Engineer
Oath/Yahoo Inc.
Job description:
❏ Working on stream and batch data processing
❏ Automated Anomaly Detection using Sherlock
❏ Multi-armed bandit testing in user engagement
Technologies:
Kafka, Storm, Druid, Tranquility stream push, stream pull using the Kafka indexing service in Druid, HDFS, Hive/Pig ETL, Chef, Kubernetes, Apache Superset

Introduction
David Servose
Software Engineer
Oath/Yahoo Inc.
Job description:
❏ Working on batch data processing
❏ Automated Anomaly Detection using Sherlock
❏ GDPR data retrieval system
Technologies:
Druid, HDFS, Hive/Pig ETL, Kubernetes, Apache Superset

Motivation
Complex User Engagement Patterns
Experimentation such as A/B testing and multi-armed bandit testing, seasonal events such as the NBA and NFL seasons, and holidays cause unusual deviations in user engagement.
Online learning and reinforcement learning use real-time feedback from the system, which requires continuous monitoring to check product health.
Growth in Big Data and Analytics
Technological advances in big data and analytics made it possible to process very large amounts of data and run analytics on top of it. That resulted in a large number of dashboards with KPIs.
Instrumentation bugs affect business KPIs
Bugs in software instrumentation sometimes affect user experience directly, hurting revenue numbers.
Continuous Monitoring Requirement

Current Practices and Challenges
Passive Monitoring
◆ Not feasible for a large number of dashboards
◆ Not scalable across all dimension cardinalities
Thresholding Time-series Values for Alerts
◆ Not scalable to many time-series with different value scales
◆ Large number of false positives
◆ Does not take into account seasonality and trend in time-series
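To make the seasonality point concrete, here is a toy comparison. The daily values, the fixed bounds, and the "same time last week" rule are assumptions chosen for illustration; the seasonal rule is a naive baseline, not what Sherlock or EGADS does.

```python
from typing import List


def static_threshold_alerts(series: List[float], low: float, high: float) -> List[int]:
    """Flag every index whose value falls outside fixed [low, high] bounds."""
    return [i for i, v in enumerate(series) if v < low or v > high]


def same_time_last_period_alerts(series: List[float], period: int,
                                 tolerance: float) -> List[int]:
    """Flag an index when it deviates from the value one period earlier
    by more than `tolerance` (a fraction of that earlier value)."""
    alerts = []
    for i in range(period, len(series)):
        reference = series[i - period]
        if abs(series[i] - reference) > tolerance * abs(reference):
            alerts.append(i)
    return alerts


# Three weeks of hypothetical daily engagement: weekdays 100, weekends 40.
week = [100, 100, 100, 100, 100, 40, 40]
series = week * 3

print(static_threshold_alerts(series, low=60, high=150))
# [5, 6, 12, 13, 19, 20]: every weekend day is a false positive
print(same_time_last_period_alerts(series, period=7, tolerance=0.2))
# []: the weekly pattern is expected
```

The naive seasonal rule cannot judge the first period and compares against a single past value, so one real anomaly also distorts the next week's reference; smoother seasonal models are meant to address exactly that.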

Methods & Technologies
Druid: easy and faster data exploration and aggregation
EGADS: learn seasonality and trend for time-series data
Sherlock API: automate the monitoring of a large number of time-series

Druid
Data store designed for sub-second queries on real-time and historical data
Scalable to trillions of events, petabytes of data, and thousands of concurrent users
Used for business intelligence (OLAP) queries on event-level data
Supports DataSketches aggregators to compute set cardinalities, frequency estimates, and more in linear time
http://druid.io/
https://datasketches.github.io/
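DataSketches' theta sketches build on the K Minimum Values (KMV) idea: hash every item to [0, 1), keep only the k smallest hashes, and estimate the distinct count from how tightly they are packed. The sketch below is a from-scratch illustration of that estimator, not the DataSketches library or Druid's thetaSketch aggregator; the SHA-1-based hash and the value of k are arbitrary choices.

```python
import hashlib
import heapq
from typing import List, Set


def unit_hash(item: str) -> float:
    """Map an item to a pseudo-random float in [0, 1) via SHA-1 (53 bits kept)."""
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


class KMVSketch:
    """K Minimum Values distinct-count sketch (illustrative only)."""

    def __init__(self, k: int = 256) -> None:
        self.k = k
        self._heap: List[float] = []    # negated hashes: a max-heap of the k smallest
        self._kept: Set[float] = set()  # hashes currently retained, to skip repeats

    def update(self, item: str) -> None:
        h = unit_hash(item)
        if h in self._kept:
            return  # this item was already counted
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._kept.add(h)
        elif h < -self._heap[0]:
            # Replace the largest retained hash with the new, smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._kept.discard(evicted)
            self._kept.add(h)

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            return float(len(self._heap))  # exact below k distinct items
        theta = -self._heap[0]  # k-th smallest hash
        return (self.k - 1) / theta


sketch = KMVSketch(k=512)
for i in range(20_000):
    sketch.update(f"user-{i}")
    sketch.update(f"user-{i}")  # repeats do not change the estimate
print(round(sketch.estimate()))  # roughly 20000; relative error is about 1/sqrt(k)
```

Each update is O(log k) and the sketch never holds more than k hashes, which is why one pass over the data (linear time) is enough regardless of how many distinct values there are.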

EGADS (Extensible Generic Anomaly Detection System)
- Open-source Java library to automatically detect anomalies in large-scale time-series data
https://github.com/yahoo/egads
Time-Series Modeling
● Olympic model
● Exponential smoothing
● Moving average
● ARIMA
● Regression models
● Spectral Kalman filter
Anomaly Detection Module
● Extreme Low Density Outlier
● Change Point Detection
● KSigma Outlier
Alerting Module
This module uses the error metrics produced by the anomaly detection models, outputs candidate anomalies based on dynamically learned thresholds, and filters out irrelevant anomalies.
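The Olympic model forecasts a point from the values observed at the same position in earlier seasonal periods. As a rough illustration, the sketch below averages those past values after dropping the highest and lowest, in the spirit of Olympic scoring; the parameter names and the drop rule are assumptions, and EGADS' own Olympic model may differ in detail.

```python
import math
from typing import List


def olympic_forecast(series: List[float], period: int, num_periods: int,
                     drop_high: int = 1, drop_low: int = 1) -> List[float]:
    """Forecast each point from the same seasonal offset in earlier periods.

    Returns NaN where too little history remains after trimming.
    """
    forecasts: List[float] = []
    for t in range(len(series)):
        # Values at the same offset in up to `num_periods` previous periods.
        history = [series[t - k * period]
                   for k in range(1, num_periods + 1) if t - k * period >= 0]
        if len(history) <= drop_high + drop_low:
            forecasts.append(math.nan)
            continue
        # Olympic-style trimming: discard the extremes, average the rest.
        trimmed = sorted(history)[drop_low:len(history) - drop_high]
        forecasts.append(sum(trimmed) / len(trimmed))
    return forecasts


# Hypothetical daily series with a weekly pattern over five weeks.
week = [10, 20, 30, 40, 50, 60, 70]
series = week * 5
series[23] = 300  # spike in week 4; trimming keeps it out of week 5's forecast
forecast = olympic_forecast(series, period=7, num_periods=3)
# Forecast errors: the kind of input a detector such as KSigma works on.
errors = [a - f for a, f in zip(series, forecast)]
```

Trimming the extremes means a single past anomaly does not drag the next period's forecast, at the cost of needing at least drop_high + drop_low + 1 past periods before any forecast exists.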

Sherlock
Sherlock is an anomaly detection service built on top of Druid.
It uses EGADS for time-series modeling and for detecting anomalous behaviour.
● Ad hoc anomaly detection
● Cron-style anomaly job scheduling
● Email alerting
https://github.com/yahoo/sherlock
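Scheduled jobs run at a fixed frequency: hourly, daily, weekly, or monthly. A minimal sketch of computing a job's next runtime follows; the function and frequency names are hypothetical, it does not parse cron expressions, and it is not Sherlock's scheduler. Monthly runs need calendar arithmetic because months differ in length.

```python
import calendar
from datetime import datetime, timedelta

FIXED_STEPS = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def next_runtime(last_run: datetime, frequency: str) -> datetime:
    """Return the next scheduled run after `last_run` for a named frequency."""
    if frequency in FIXED_STEPS:
        return last_run + FIXED_STEPS[frequency]
    if frequency == "month":
        # Advance one calendar month, clamping the day (e.g. Jan 31 -> Feb 28).
        year = last_run.year + last_run.month // 12
        month = last_run.month % 12 + 1
        day = min(last_run.day, calendar.monthrange(year, month)[1])
        return last_run.replace(year=year, month=month, day=day)
    raise ValueError(f"unknown frequency: {frequency}")


print(next_runtime(datetime(2017, 12, 31, 6), "month"))  # 2018-01-31 06:00:00
```

Clamping drifts (Jan 31 -> Feb 28 -> Mar 28); a scheduler that should stay on the last day of the month would need to remember the originally requested day.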

Architecture
[Diagram: Sherlock Client and Sherlock Server; Jobs (priority queue based on next runtime); Persistent Data (job info, Druid cluster info, reports); Job metadata; Druid query]
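The server keeps jobs in a priority queue ordered by next runtime. A minimal sketch using Python's heapq follows; the class and method names are hypothetical, frequencies are fixed timedelta steps (so calendar-month runs are out of scope), and runs missed while the server was down are skipped rather than replayed, which is a design assumption.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass(order=True)
class ScheduledJob:
    next_runtime: datetime
    seq: int  # tie-breaker: jobs due at the same time keep insertion order
    job_id: str = field(compare=False)
    frequency: timedelta = field(compare=False)


class JobQueue:
    """Min-heap of jobs ordered by next runtime (hypothetical helper)."""

    def __init__(self) -> None:
        self._heap: List[ScheduledJob] = []
        self._counter = itertools.count()

    def schedule(self, job_id: str, first_run: datetime, frequency: timedelta) -> None:
        if frequency <= timedelta(0):
            raise ValueError("frequency must be positive")
        job = ScheduledJob(first_run, next(self._counter), job_id, frequency)
        heapq.heappush(self._heap, job)

    def pop_due(self, now: datetime) -> List[str]:
        """Return ids of jobs due at `now`, rescheduling each past `now`."""
        due: List[str] = []
        while self._heap and self._heap[0].next_runtime <= now:
            job = heapq.heappop(self._heap)
            due.append(job.job_id)
            # Advance past `now`; missed runs are skipped, not replayed.
            while job.next_runtime <= now:
                job.next_runtime += job.frequency
            heapq.heappush(self._heap, job)
        return due
```

Peeking at the heap top is O(1), so a polling loop can cheaply check whether anything is due, and each popped job costs O(log n) to reschedule.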

Job life cycle
[Diagram: Job from priority queue → Job start → query → Druid response → Time-series parser → time-series → EGADS → anomalies → Report Generation → Store to persistent storage; alerts (emails) → Email Service]
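The life cycle reads as a pipeline. The sketch below wires the stages together with injected callables so each stage can be swapped or stubbed; every function name and the report shape are hypothetical and do not reflect Sherlock's actual code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Job:
    job_id: str
    owner_email: str
    druid_query: Dict[str, Any]


def run_job(job: Job,
            query_druid: Callable[[Dict[str, Any]], Any],
            parse_time_series: Callable[[Any], Dict[str, List[float]]],
            detect: Callable[[List[float]], List[int]],
            store_report: Callable[[Dict[str, Any]], None],
            send_email: Callable[[str, Dict[str, Any]], None]) -> Dict[str, Any]:
    """One pass through the life cycle: query, parse, detect, report, alert."""
    response = query_druid(job.druid_query)        # query -> Druid response
    series_by_group = parse_time_series(response)  # time-series parser
    anomalies: Dict[str, List[int]] = {}
    for group, values in series_by_group.items():  # detection per time-series
        flagged = detect(values)
        if flagged:
            anomalies[group] = flagged
    report = {"job_id": job.job_id, "anomalies": anomalies}  # report generation
    store_report(report)                           # persistent storage
    if anomalies:
        send_email(job.owner_email, report)        # alerts via the email service
    return report


# Stubbed stages for a dry run (all values hypothetical).
stored: List[Dict[str, Any]] = []
sent: List[str] = []
report = run_job(
    Job("job-1", "oncall@example.com", {"queryType": "topN"}),
    query_druid=lambda q: {"v1d1 and v1d2": [8, 8, 8, 30],
                           "v1d1 and v2d2": [4.5, 4.5, 4.5, 4.5]},
    parse_time_series=lambda resp: resp,
    detect=lambda values: [i for i, v in enumerate(values) if v > 2 * values[0]],
    store_report=stored.append,
    send_email=lambda to, rep: sent.append(to),
)
```

Injecting the stages keeps the orchestration testable without a Druid cluster or mail server.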

Druid Query
{
  "metric": "metric(m1/m2)",
  "aggregations": [
    {
      "filter": { … },
      …
    }
  ],
  "dimension": ["d1", "d2"],
  "intervals": "2017-10-11/2017-12-11",
  "dataSource": "",
  "granularity": {
    "timeZone": "UTC",
    "type": "period",
    "period": "P1D"
  },
  "threshold": 50,
  "postAggregations": [ … ],
  "queryType": "topN"
}
● It computes metric(m1/m2)
● Groups by dimensions: d1, d2
● Query interval: 2017-10-11 (inclusive) to 2017-12-11 (exclusive)
● Granularity: daily (P1D)
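A query like this can be assembled programmatically and POSTed as JSON to a Druid broker's /druid/v2/ endpoint. The sketch below uses only the standard library; the data source name, broker URL, and helper names are hypothetical, filters are omitted for brevity, and the thetaSketch fieldName is assumed. One caveat: Druid's native topN ranks the values of a single dimension, so grouping by d1 and d2 together would normally use a groupBy query (with a "dimensions" list) instead; the sketch therefore takes one dimension.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_topn_query(data_source: str, dimension: str, metric: str,
                     interval: str, aggregations: List[Dict[str, Any]],
                     post_aggregations: List[Dict[str, Any]],
                     threshold: int = 50, period: str = "P1D") -> Dict[str, Any]:
    """Assemble a Druid topN query with UTC period granularity."""
    return {
        "queryType": "topN",
        "dataSource": data_source,
        "dimension": dimension,
        "metric": metric,         # name of the (post-)aggregation to rank by
        "threshold": threshold,   # the N in topN
        "intervals": [interval],  # ISO-8601; the end is exclusive
        "granularity": {"type": "period", "period": period, "timeZone": "UTC"},
        "aggregations": aggregations,
        "postAggregations": post_aggregations,
    }


def make_request(query: Dict[str, Any], broker_url: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST of the query to the broker."""
    return urllib.request.Request(
        broker_url.rstrip("/") + "/druid/v2/",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


query = build_topn_query(
    data_source="events",  # hypothetical; the slide leaves dataSource blank
    dimension="d1",
    metric="metric(m1/m2)",
    interval="2017-10-11/2017-12-11",
    aggregations=[
        {"type": "longSum", "name": "f1", "fieldName": "f1"},
        {"type": "thetaSketch", "name": "f2", "fieldName": "f2"},  # fieldName assumed
    ],
    post_aggregations=[{
        "type": "arithmetic", "name": "metric(m1/m2)", "fn": "/",
        "fields": [
            {"type": "fieldAccess", "name": "m1", "fieldName": "f1"},
            {"type": "thetaSketchEstimate", "name": "m2",
             "field": {"type": "fieldAccess", "fieldName": "f2"}},
        ],
    }],
)
request = make_request(query, "http://localhost:8082")  # hypothetical broker address
# urllib.request.urlopen(request) would send it and return the JSON response.
```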

Druid Query
"aggregations": [
  {
    "type": "filtered",
    "filter": { … },
    "aggregator": {
      "type": "longSum",
      "name": "f1",
      "fieldName": "f1"
    }
  },
  {
    "type": "filtered",
    "filter": { … },
    "aggregator": {
      "type": "thetaSketch",
      "name": "f2",
      "fieldName": "f2"
    }
  }
]
{
  "aggregations": [
    {
      "filter": { … },
      …
    }
  ],
  "intervals": "2017-10-11/2017-12-11",
  "dataSource": "",
  "granularity": {
    "timeZone": "UTC",
    "type": "period",
    "period": "P1D"
  },
  "threshold": 50,
  "queryType": "topN"
}
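In Druid, a "filtered" aggregator wraps a single aggregator, so computing both f1 and f2 under the same filter takes two filtered entries. A small helper that applies one filter to several aggregators might look like this; the function name is hypothetical, and the specs reuse the slide's f1 (longSum) and f2 (thetaSketch) with assumed fieldName values.

```python
from typing import Any, Dict, List

Spec = Dict[str, Any]


def filtered_aggregations(flt: Spec, aggregators: List[Spec]) -> List[Spec]:
    """Wrap each aggregator in its own Druid "filtered" aggregator sharing `flt`."""
    return [{"type": "filtered", "filter": flt, "aggregator": agg}
            for agg in aggregators]


same_filter = {"type": "selector", "dimension": "d4", "value": "value1"}
aggregations = filtered_aggregations(same_filter, [
    {"type": "longSum", "name": "f1", "fieldName": "f1"},
    {"type": "thetaSketch", "name": "f2", "fieldName": "f2"},  # fieldName assumed
])
```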

Druid Query
"aggregations": [
  {
    "filter": {
      "type": "and",
      "fields": [
        {
          "type": "selector",
          "dimension": "d4",
          "value": "value1"
        },
        … (same for d(n))
      ]
    },
    …
  }
]
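The "… (same for d(n))" pattern is easy to generate. A hypothetical helper that turns a mapping of dimension to required value into an "and" of "selector" filters could look like this; the extra dimension d5 and value2 in the usage line are made up.

```python
from typing import Any, Dict


def and_of_selectors(required: Dict[str, str]) -> Dict[str, Any]:
    """Build a filter requiring dimension == value for every entry."""
    if not required:
        raise ValueError("at least one dimension/value pair is required")
    selectors = [{"type": "selector", "dimension": dim, "value": val}
                 for dim, val in required.items()]
    if len(selectors) == 1:
        return selectors[0]  # a single condition needs no "and" wrapper
    return {"type": "and", "fields": selectors}


flt = and_of_selectors({"d4": "value1", "d5": "value2"})  # d5/value2 hypothetical
```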

Druid Query
"postAggregations": [
  {
    "type": "arithmetic",
    "name": "metric(m1/m2)",
    "fn": "/",
    "fields": [
      {
        "type": "fieldAccess",
        "name": "m1",
        "fieldName": "f1"
      },
      {
        "type": "thetaSketchEstimate",
        "name": "m2",
        "field": {
          "type": "fieldAccess",
          "fieldName": "f2"
        }
      }
    ]
  }
]
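To see what the arithmetic post-aggregation computes per row, here is a tiny local evaluator. It is a sketch, not Druid code: it supports only the spec types used on these slides plus "constant", it treats an already-estimated sketch value in the row as the thetaSketchEstimate result (an assumption), and it folds multiple fields left to right. Druid's documentation states that "/" returns 0 when dividing by 0, which the sketch mirrors.

```python
import operator
from functools import reduce
from typing import Any, Callable, Dict

Row = Dict[str, float]


def _druid_divide(a: float, b: float) -> float:
    # Mirrors Druid's "/" behaviour: dividing by 0 yields 0.
    return 0.0 if b == 0 else a / b


ARITHMETIC_FNS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": _druid_divide,
}


def evaluate(spec: Dict[str, Any], row: Row) -> float:
    """Evaluate a post-aggregation spec against one row of aggregated values."""
    kind = spec["type"]
    if kind == "fieldAccess":
        return row[spec["fieldName"]]
    if kind == "thetaSketchEstimate":
        # Assumption: `row` already holds the sketch's numeric estimate.
        return evaluate(spec["field"], row)
    if kind == "constant":
        return spec["value"]
    if kind == "arithmetic":
        values = [evaluate(f, row) for f in spec["fields"]]
        return reduce(ARITHMETIC_FNS[spec["fn"]], values)  # left-to-right fold
    raise ValueError(f"unsupported post-aggregator type: {kind}")


ratio_spec = {
    "type": "arithmetic", "name": "metric(m1/m2)", "fn": "/",
    "fields": [
        {"type": "fieldAccess", "name": "m1", "fieldName": "f1"},
        {"type": "thetaSketchEstimate", "name": "m2",
         "field": {"type": "fieldAccess", "fieldName": "f2"}},
    ],
}
print(evaluate(ratio_spec, {"f1": 128, "f2": 16}))  # 8.0
```

The zero-on-divide-by-zero rule matters for anomaly detection: an empty denominator shows up as a drop to 0 in the metric rather than as a missing point.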

Druid Response
[
  {
    "timestamp": "2017-10-11T00:00:00.000Z",
    "result": [
      {
        "groupByDimension": "v1d1 and v1d2",
        "metric(m1/m2)": 8,
        "m1": 128,
        "m2": 16
      },
      {
        "groupByDimension": "v1d1 and v2d2",
        "metric(m1/m2)": 4.5,
        "m1": 42,
        "m2": 9.33
      }
    ]
  },
  … (data for 2017-10-12, 2017-10-13, …, 2017-12-10)
]
Here, d1 has one distinct value (v1d1) and d2 has two distinct values (v1d2, v2d2), so each daily bucket holds two groups.
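The time-series parser turns this bucketed response into one series per group. A minimal sketch, assuming the field names in the sample response; since topN returns only the top N groups within each time bucket, a group can be missing from some buckets, leaving gaps in its series that a real parser would need to handle.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Point = Tuple[str, float]


def parse_topn_response(response: List[Dict[str, Any]], group_key: str,
                        metric: str) -> Dict[str, List[Point]]:
    """Group topN rows into per-group lists of (timestamp, metric value)."""
    series: Dict[str, List[Point]] = defaultdict(list)
    for bucket in response:
        timestamp = bucket["timestamp"]
        for row in bucket["result"]:
            series[row[group_key]].append((timestamp, row[metric]))
    return dict(series)


sample = [{
    "timestamp": "2017-10-11T00:00:00.000Z",
    "result": [
        {"groupByDimension": "v1d1 and v1d2", "metric(m1/m2)": 8, "m1": 128, "m2": 16},
        {"groupByDimension": "v1d1 and v2d2", "metric(m1/m2)": 4.5, "m1": 42, "m2": 9.33},
    ],
}]
series = parse_topn_response(sample, "groupByDimension", "metric(m1/m2)")
```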

EGADS Anomaly Detection
[Figures: example detections from the KSigma, Kernel-based Change Point Detection, and Density-based models, with sensitivity settings]
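KSigma-style detection flags a point whose forecast error lies more than k standard deviations from the mean error, so k acts as a sensitivity knob. The sketch below is a simplified illustration, not EGADS' KSigma model; using the population standard deviation over the whole window and the default k = 3 are assumptions.

```python
import statistics
from typing import List


def ksigma_anomalies(actual: List[float], expected: List[float],
                     k: float = 3.0) -> List[int]:
    """Return indices whose forecast error is more than k standard deviations
    away from the mean error (simplified KSigma-style rule)."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    errors = [a - e for a, e in zip(actual, expected)]
    mean_error = statistics.mean(errors)
    sigma = statistics.pstdev(errors)
    if sigma == 0:
        return []  # perfectly flat errors: nothing stands out
    return [i for i, err in enumerate(errors) if abs(err - mean_error) > k * sigma]


expected = [10.0] * 21
actual = [10.0] * 20 + [50.0]
print(ksigma_anomalies(actual, expected))         # [20]
print(ksigma_anomalies(actual, expected, k=5.0))  # []: lower sensitivity
```

Raising k trades missed anomalies for fewer false positives; learning k or the error distribution from history is what makes the threshold dynamic rather than fixed.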

Why Sherlock is better
Automated Monitoring
Job scheduling allows periodic monitoring: hourly, daily, weekly, or monthly.
Fewer False Positives
EGADS anomaly detection models filter out unwanted anomalies using dynamic threshold bounds on error metrics.
Learns Trends and Seasonality
Time-series modeling algorithms adapt to changing patterns and trends in the data.
Faster & More Scalable
Druid's fast data aggregation and EGADS' lightweight models enable learning and detection in near real time.

DEMO

Q/A
