Statistical Learning Based Anomaly Detection @ Twitter
Arun Kejariwal 
(@arun_kejariwal) 
Joint work with Jordan Hochenbaum and Owen Vallis 
November 2014
Internet trends
• Real-time
[1] http://techcrunch.com/2014/05/05/amazon-extends-its-shopping-cart-to-twitter/
Twitter: Global Town Square
Data Fidelity
• Data-driven decision making 
q Evolving product landscape 
• Data partners 
q Nielsen 
q Dataminr 
• Operational 
q Performance and Availability 
Data Fidelity: Challenges
• Anomalies 
q Exogenic factors 
§ User behavior 
§ Events 
§ Data center 
q Endogenic factors 
§ Agile development 
o Fail fast 
§ Data collection 
• Millions of time series [1,2] 
q Scalability 
[1] http://strata.oreilly.com/2013/09/how-twitter-monitors-millions-of-time-series.html
[2] http://strataconf.com/strata2014/public/schedule/detail/32431
Anomaly Detection: Why Bother?
• Analyze User Engagement 
q Events 
§ Super Bowl, Japanese New Year 
q Year over year analysis (input to forecasting) 
• Identify Attacks 
q DoS 
q Malware attacks 
• Identify Bots 
q Separating actual users from spam 
Anomaly Detection
• Visual 
q Prone to errors 
q Not scalable 
§ Machine-generated data: 11% of the digital universe in 2005, growing to > 40% by 2020 [1]
§ Cloud Infrastructure 2013-2017 CAGR ~50% [2] 
• Algorithmic approach 
q Automate! 
[1] http://www.emc.com/about/news/press/2012/20121211-01.htm
[2] http://www.forbes.com/sites/gilpress/2013/12/12/16-1-billion-big-data-market-2014-predictions-from-idc-and-iia/
Anomaly Detection: Background
• Over 50 years of research [1] 
q Statistics 
§ Extreme Value Theory 
§ Robust Statistics, Grubbs' Test, ESD
q Econometrics 
q Finance 
§ Value at Risk (VaR) 
q Signal Processing 
q Music Information Retrieval 
q Networking 
q E- Commerce 
q Performance Regression 
[1] “Anomaly Detection” by Chandola et al. ACM Computing Surveys, 2009.
[Photos: Jon from Etsy; Toufic from Metafor]
Anomaly Detection: Overview
• Definition 
q “An anomaly is an observation that deviates so much from other observations so 
as to arouse suspicions that it is was generated by a different mechanism” [1,2] 
[1] “Identification of Outliers” by Douglas M. Hawkins. London: Chapman and Hall, 1980.
[2] “Outlier Analysis” by Charu C. Aggarwal. Springer, 2013.
Anomaly Detection
• Characterization 
q Magnitude 
q Width 
q Frequency 
q Direction 
Anomaly Detection (contd.)
• Two flavors 
q Global 
§ Max Value 
q Local 
§ Intra-day 
[Figure labels: Global, Local]
• Traditional Approaches 
q Metrics 
§ Mean μ
§ Standard deviation σ
q Rule of thumb 
§ μ + 3*σ 
q Which time series? 
§ Raw 
§ Moving Averages 
o SMA, EWMA, PEWMA 
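The rule of thumb above can be sketched in a few lines. This is a minimal illustration, not the method used in production: the function names, the window length, the smoothing factor `alpha`, and the toy series are all assumptions made for the example. The same check can be run on the raw series or on a smoothed one (SMA or EWMA; PEWMA is not shown).

```python
import statistics

def three_sigma_outliers(values, k=3.0):
    """Indices whose value lies more than k standard deviations from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]

def sma(values, window):
    """Simple moving average over a trailing window (shorter at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def ewma(values, alpha=0.3):
    """Exponentially weighted moving average, seeded with the first value."""
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

# Toy series (hypothetical data): steady around 10 with one spike at the end.
series = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11,
          10, 9, 12, 10, 11, 9, 10, 11, 10, 60]

print(three_sigma_outliers(series))  # → [19]
```

Note that the spike itself pulls μ and σ upward, which is one reason the later slides move to robust statistics.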
• Impact of multi-modal distribution 
q μ Shift ~ 0.2% 
q Inflates σ by 4.5% 
§ Miss quite a few anomalies 
q What do multiple modes correspond to? 
§ Seasonality 
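A small sketch of why multiple modes hide anomalies. The data is made up for illustration (a day level near 100 and a night level near 20, repeated over four cycles) and the helper name is an assumption. A night-time value that jumps from about 20 to 45 is invisible to a global 3σ check, because the two modes inflate σ, but it stands out once only the night-time mode is considered.

```python
import statistics

def three_sigma(values, k=3.0):
    """Indices whose value lies more than k standard deviations from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]

# Hypothetical bimodal series: 6 "day" points then 6 "night" points per cycle.
day = [100, 102, 98, 101, 99, 100]
night = [20, 22, 18, 21, 19, 20]
series = (day + night) * 4
series[30] = 45  # local anomaly: a night-time point far above its usual level

# Global check: sigma is dominated by the gap between the two modes.
global_hits = three_sigma(series)

# Per-mode check: look only at night-time positions, then map back.
night_idx = [i for i in range(len(series)) if i % 12 >= 6]
night_hits = [night_idx[j] for j in three_sigma([series[i] for i in night_idx])]

print(global_hits)  # → []
print(night_hits)   # → [30]
```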
• Robust Statistics 
q MAD 
§ Robust Breakdown point 
o Median 50% vs. Mean 0% 
q σMAD 
§ K = 1.4826 for normally distributed data 
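A minimal sketch of the robust scale estimate, assuming MAD means the median absolute deviation from the median. The function names and the sample values are illustrative. It also shows the breakdown-point contrast: one extreme value drags the mean far away while the median and MAD do not move.

```python
import statistics

K = 1.4826  # consistency constant for normally distributed data

def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def sigma_mad(values):
    """Robust estimate of the standard deviation: K * MAD."""
    return K * mad(values)

clean = [10, 11, 9, 10, 12, 10, 11, 9, 10]
dirty = clean + [1000]  # a single extreme observation

print(statistics.mean(clean), statistics.mean(dirty))      # mean moves a lot
print(statistics.median(clean), statistics.median(dirty))  # median does not
print(sigma_mad(clean), sigma_mad(dirty))                  # same robust scale
```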
• Limitations of using MAD 
• Grubbs' Test
q Critical value is derived from the data for a chosen significance level (α)
• Limitations 
q Assumes data distribution is normal 
q Good for detecting ONLY 1 outlier 
q Seasonality unaware 
• ESD (Generalized Extreme Studentized Deviate) [1] 
q Critical value (λi) re-calculated every iteration 
q Largest i such that Ri > λi determines # of anomalies 
q An upper-bound on the number of anomalies is an input parameter 
• Limitations 
q Generalized ESD assumes a “normal” distribution 
q Seasonality unaware 
[1] Rosner, Bernard. “Percentage Points for a Generalized ESD Many-outlier Procedure.” Technometrics 25, no. 2 (1983): 165–172.
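The iteration described on this slide can be sketched as below. It is a hedged illustration rather than a reference implementation: the function name and the sample data are made up, and the Student-t quantile that Rosner's critical value calls for is approximated here by a normal quantile so the example needs only the Python standard library (this makes λ_i slightly smaller than the exact value, so the sketch is a little more liberal).

```python
import math
import statistics

def generalized_esd(values, max_anomalies, alpha=0.05):
    """Return sorted indices flagged by a generalized-ESD-style procedure."""
    n = len(values)
    remaining = list(range(n))  # indices not yet removed
    removed = []                # indices in order of removal
    num_anomalies = 0
    # Stop early enough that at least 3 points remain in every iteration.
    for i in range(1, min(max_anomalies, n - 2) + 1):
        xs = [values[j] for j in remaining]
        mean = statistics.mean(xs)
        sd = statistics.stdev(xs)
        if sd == 0:
            break
        # R_i: largest studentized deviation among the remaining points.
        worst = max(remaining, key=lambda j: abs(values[j] - mean))
        r_i = abs(values[worst] - mean) / sd
        # lambda_i: critical value, re-calculated every iteration.
        p = 1 - alpha / (2 * (n - i + 1))
        t = statistics.NormalDist().inv_cdf(p)  # assumption: normal approx. of t
        lam_i = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        removed.append(worst)
        remaining.remove(worst)
        # The largest i with R_i > lambda_i determines the number of anomalies.
        if r_i > lam_i:
            num_anomalies = i
    return sorted(removed[:num_anomalies])

# Hypothetical data: steady around 10 with one high and one low outlier.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50,
        10, 11, 9, 10, 11, 10, 9, 12, 10, -30]

print(generalized_esd(data, max_anomalies=3))  # → [9, 19]
```

The upper bound `max_anomalies` is the input parameter mentioned above; the procedure removes that many candidates but reports only those up to the last iteration that exceeded its critical value.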
Our Approach
• Addressing Seasonality 
q Key Idea 
§ Time Series Decomposition 
• Determining seasonal component 
q Regression on sub-cycle plots [1] 
[1] “STL: A seasonal-trend decomposition procedure based on loess” by Cleveland, et al. Journal of Official Statistics, Vol. 6, Issue 1, 1990.
• Impact of removing the seasonal and trend components
q Transforms our multi-modal data into unimodal data. 
§ Amenable to ESD/MAD! 
The decomposed residual becomes unimodal, which significantly shrinks the value of sigma.
The original multi-modal raw data has a much larger sigma, leading ESD to miss many of the outliers.
Trend Smoothing Distortion Creates “Phantom” Anomalies
• Challenges remain! 
• Marrying Robust Statistics with Seasonal Decomposition 
Median is Free from Distortion
• Applying ESD on the Residual 
Decomposition Exposes Anomalies
• Recap 
q Extract the seasonal component using STL 
§ Filters out periodic spikes 
q Residual = Raw - Seasonalraw- Medianraw 
q Run ESD on residual (using median and MAD) 
• Illustrative example 
• Applications 
q Three perspectives 
§ Capacity 
o CPU utilization 
o Garbage collection 
o Network activity 
§ User behavior 
o Events 
• Impressions 
• Link clicks 
o Spam 
§ Forecasting 
• Deployed in production 
q Used by large number of services at Twitter 
q Automatic e-mail notification 
§ Only sent if anomalies are present 
§ Anomalies annotated 
§ CSV with anomaly locations attached 
Open Sourcing
• Skyline from Etsy
q https://github.com/etsy/skyline/blob/master/src/analyzer/algorithms.py
• Coming soon!
q R package
Join the Flock
Like problem solving?
Like challenges?
Be at the cutting edge
Make an impact
• We are hiring!!
q https://twitter.com/JoinTheFlock
q https://twitter.com/jobs
q Contact us: @arun_kejariwal
