THE NUMENTA ANOMALY BENCHMARK
EVALUATING REAL-TIME ANOMALY DETECTION
SF Data Science Meetup
November 19, 2015
Alexander Lavin
alavin@numenta.com
REAL-TIME ANOMALY DETECTION
•  Exponential growth in IoT, sensors, and real-time data collection is driving an
explosion of streaming data
•  The biggest application for machine learning is anomaly detection
•  Example applications: monitoring IT infrastructure, uncovering fraudulent
transactions, tracking vehicles, real-time health monitoring, monitoring energy
consumption
Detection is necessary, but prevention is often the goal
EXAMPLE: PREVENTATIVE MAINTENANCE
(Figure: a sensor stream annotated with a planned shutdown, a behavioral change
preceding failure, and a catastrophic failure)
TYPES OF ANOMALIES IN STREAMING DATA
•  Point anomalies
•  Temporal (contextual/conditional) anomalies
ANOMALY DETECTION TECHNIQUES
•  Traditional techniques
•  Classification-based
•  Clustering & nearest-neighbor
•  Statistical techniques
•  Chandola et al., “Anomaly Detection: A Survey”
•  In streaming we typically see a collection of statistical techniques
•  time-series modeling and forecasting models (e.g. ARIMA)
•  change point detection
•  outlier tests (e.g. ESD, k-sigma; a minimal sketch follows below)
•  Most techniques not suitable for streaming data
•  new approaches needed
•  non-streaming benchmarks aren't very useful
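As a concrete illustration of the streaming constraint, below is a minimal sketch of a k-sigma outlier test that updates its statistics one record at a time. It is illustrative only: the threshold k and the exact update scheme are arbitrary choices, not code from any of the libraries or papers cited here.

# Minimal streaming k-sigma outlier test: maintain running statistics
# incrementally (Welford's algorithm) and flag points more than k standard
# deviations from the running mean. Illustrative sketch, not NAB code.
import math

class KSigmaDetector(object):
    def __init__(self, k=3.0):
        self.k = k
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x):
        """Process one value as it streams in; return True if it looks anomalous."""
        is_anomaly = False
        if self.n > 1:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.k * std:
                is_anomaly = True
        # Update the running statistics after scoring the point.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

detector = KSigmaDetector(k=3.0)
flags = [detector.update(v) for v in [10, 11, 10, 12, 11, 50, 10]]   # flags[5] is True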
WHY CREATE A BENCHMARK?
•  A benchmark consists of:
•  Labeled data files
•  Scoring mechanism
•  Versioning system
•  Most existing benchmarks are designed for batch data, not
streaming data
•  We saw a need for a benchmark that is designed to test anomaly
detection algorithms on real-time, streaming data
•  Hard to find benchmarks containing real world data labeled with
anomalies
•  Impact of published techniques suffers because researchers use different data,
and/or completely artificial data
•  A standard community benchmark could spur innovation in real-time anomaly
detection algorithms
NUMENTA ANOMALY BENCHMARK (NAB)
•  NAB: a rigorous benchmark for anomaly
detection in streaming applications
•  Real-world benchmark dataset
•  58 labeled data streams
(47 real-world, 11 artificial streams)
•  Total of 365,551 data points
•  Scoring mechanism
•  Custom scoring function
•  Reward early detection
•  Anomaly windows
•  Different “application profiles”
•  Open resource
•  AGPL repository contains data, source code,
and documentation
•  github.com/numenta/NAB
EXAMPLE: LOAD BALANCER HEALTH
(Figure annotation: unusually high load balancer latency)
EXAMPLE: NYC TAXI HOURLY SERVICE DEMAND
(Figure annotations: spike in demand; unusually low demand)
EXAMPLE: PRODUCTION SERVER CPU
(Figure annotations: spike anomaly; spiking behavior becomes the new norm)
HOW SHOULD WE SCORE ANOMALIES?
•  The perfect detector
•  Detects every anomaly
•  Detects anomalies as soon as possible
•  there is tremendous value in detecting anomalies ahead of time
•  Provides detections in real time
•  Triggers no false alarms
•  Requires no parameter tuning
•  parameters can’t be tuned manually when there are potentially thousands of models
•  Automatically adapts to changing statistics
•  e.g. servers get new software
HOW SHOULD WE SCORE ANOMALIES?
•  Scoring methods in traditional benchmarks are insufficient
•  Precision, recall, and F1-score do not incorporate the value of time
•  early detections are not rewarded
•  Artificial separation into training and test sets does not handle continuous learning
•  Batch data files allow look ahead and multiple passes through the data
•  this is unrealistic for real-world use
WHERE IS THE ANOMALY?
NAB DEFINES ANOMALY WINDOWS
SCORING FUNCTION
•  The effect of each detection is scaled relative to its position within the window:
•  Detections outside the window are false positives (scored low)
•  Multiple detections within a window are ignored (only the earliest one counts)
•  Total score is the sum of scaled detections + a weighted sum of missed windows (false negatives)
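To make this concrete, here is an illustrative sketch in the spirit of NAB's scaled sigmoid scoring (the real scoring function, with its actual constants and normalization, lives in nab/scorer.py in the repository). Credit for a detection is a sigmoid of its position relative to the window: detections near the start of the window earn close to the full true-positive weight, detections after the window decay toward the false-positive penalty, and each window missed entirely would additionally contribute a false-negative penalty.

# Illustrative window-position scoring inspired by NAB's scaled sigmoid.
# The weights and the constant 5.0 are made-up illustrative values, not the
# ones used by nab/scorer.py.
import math

def scaled_sigmoid_score(relative_position, a_tp=1.0, a_fp=-1.0):
    """relative_position = -1.0 at the window start, 0.0 at the window end,
    and > 0 after the window (measured in window lengths)."""
    sigmoid = 1.0 / (1.0 + math.exp(5.0 * relative_position))
    # Ranges from ~a_tp early in the window down to ~a_fp long after it.
    return (a_tp - a_fp) * sigmoid + a_fp

print(scaled_sigmoid_score(-0.9))   # ~ +0.98: detection just after the window opens
print(scaled_sigmoid_score(-0.5))   # ~ +0.85: detection mid-window
print(scaled_sigmoid_score(2.0))    # ~ -1.00: detection long after the window (an FP)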
OTHER DETAILS
•  Application profiles
•  Application profiles assign different weightings based on the tradeoff between false
positives and false negatives.
•  For EKG data on a cardiac patient, missed anomalies (FNs) are far more costly, so FPs are tolerated.
•  IT / DevOps professionals hate FPs.
•  Three application profiles: standard, favor low false positives, favor low false negatives (an illustrative weighting sketch follows this list).
•  NAB emulates practical real-time scenarios
•  Look ahead not allowed for algorithms. Detections must be made on the fly.
•  No separation between training and test files. Invoke model, start streaming, and go.
•  No batch, per-data-file parameter tuning. Detectors must be fully automated with a single set of
parameters across all data files. Any further parameter tuning must be done on the fly.
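To make the profile idea concrete, here is a toy sketch. The real weights live in config/profiles.json in the NAB repo and are applied together with the window-position scaling above, so both the profile names and the numbers below are placeholders for illustration only.

# Hypothetical application-profile weights (placeholders, NOT the values in
# NAB's config/profiles.json). Each profile trades off the cost of false
# positives against the cost of false negatives.
PROFILES = {
    "standard": {"tp": 1.0, "fp": -0.50, "fn": -1.0},
    "low_FP":   {"tp": 1.0, "fp": -1.00, "fn": -0.5},   # favor low false positives
    "low_FN":   {"tp": 1.0, "fp": -0.25, "fn": -2.0},   # favor low false negatives
}

def profile_score(n_tp, n_fp, n_fn, profile):
    """Simplified count-based score; NAB also scales each TP/FP by window position."""
    w = PROFILES[profile]
    return n_tp * w["tp"] + n_fp * w["fp"] + n_fn * w["fn"]

# The same detector output is rewarded differently under each profile:
for name in ("standard", "low_FP", "low_FN"):
    print(name, profile_score(n_tp=8, n_fp=5, n_fn=2, profile=name))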
TESTING ALGORITHMS WITH NAB
•  NAB is a community effort
•  The goal is to have researchers independently evaluate a large number of algorithms
•  Very easy to plug in and test new algorithms
•  Seed results with three algorithms:
•  Hierarchical Temporal Memory
•  Numenta’s open source streaming anomaly detection algorithm
•  Models temporal sequences in data, continuously learning
•  Etsy Skyline
•  Popular open source anomaly detection technique
•  Mixture of statistical experts, continuously learning
•  Twitter AnomalyDetection
•  Open source anomaly detection released earlier this year
•  Robust outlier statistics + piecewise approximation
NAB V1.0 RESULTS
A lot of room for improvement!
DETECTION RESULTS: CPU USAGE ON PRODUCTION SERVER
(Figure: detections by Etsy Skyline, Numenta HTM, and Twitter ADVec; red marks
denote false positives. Annotated events: a simple spike that all 3 algorithms
detect, and a shift in usage.)
DETECTION RESULTS: MACHINE TEMPERATURE READINGS
(Figure: detections by Etsy Skyline, Numenta HTM, and Twitter ADVec; red marks
denote false positives. HTM detects a purely temporal anomaly; all 3 detect the
catastrophic failure.)
DETECTION RESULTS: TEMPORAL CHANGES IN BEHAVIOR OFTEN PRECEDE A LARGER SHIFT
(Figure: detections by Etsy Skyline, Numenta HTM, and Twitter ADVec; red marks
denote false positives. HTM detects the anomaly 3 hours earlier.)
SUMMARY
•  Anomaly detection is the most common application for streaming analytics
•  NAB is a community benchmark for streaming anomaly detection
•  Includes a labeled dataset with real data
•  Scoring methodology designed for practical real-time applications
•  Fully open source codebase
•  What can you get out of NAB?
•  Test and improve your algorithms
•  Contribute and improve NAB
•  Learn about streaming anomaly detection
SUMMARY
•  What’s next for NAB?
•  We hope to see researchers test additional algorithms
•  We hope to spark improved algorithms for streaming
•  More data sets!
•  Could incorporate the UC Irvine dataset and the Yahoo Labs dataset (not open source)
•  Would love to get more labeled streaming datasets from you
•  Add support for multivariate anomaly detection
•  Any changes that affect the results will be released with v2.0
NAB RESOURCES
Repository: github.com/numenta/NAB
Paper:
A. Lavin and S. Ahmad, “Evaluating Real-time Anomaly Detection Algorithms –
the Numenta Anomaly Benchmark,” to appear in 14th International Conference
on Machine Learning and Applications (IEEE ICMLA’15), 2015.
Preprint available: arxiv.org/abs/1510.03336
Presentation from MLConf:
https://www.youtube.com/watch?v=SxtsCrTHz-4
Contact info:
nab@numenta.org
alavin@numenta.com, sahmad@numenta.com
THANK YOU!
QUESTIONS?
NUMENTA RESOURCES
•  “Properties of Sparse Distributed Representations and their Application to
Hierarchical Temporal Memory”: http://arxiv.org/abs/1503.07469
•  “Why Neurons Have Thousands of Synapses, A Theory of Sequence
Memory in Neocortex”: http://arxiv.org/abs/1511.00083
•  NuPIC: Numenta Platform for Intelligent Computing open source repo
•  https://github.com/numenta/nupic
•  http://numenta.org/
•  Numenta
•  http://numenta.com/
•  HTM Whitepaper:
http://numenta.com/learn/hierarchical-temporal-memory-white-paper.html
NAB EXAMPLES
•  Figs. 1, 2, 5 from the paper: plot.ly/~alavin/3767
•  Fig. 4 from the paper: plot.ly/~alavin/3753
•  Fig. 6 from the paper: plot.ly/~alavin/3706
•  Subtle change in CPU utilization that precedes a much larger anomaly: plot.ly/~alavin/3720
•  An anomaly preceding a much larger drop in CPU utilization: plot.ly/~alavin/3717
•  All three detectors get the two TPs, but in different orders: plot.ly/~alavin/3741
•  Good detections by HTM, but a lot of FPs: plot.ly/~alavin/3711
•  Noisy, difficult CPU utilization data: plot.ly/~alavin/3761
•  Temporal anomalies in spiking social media data: plot.ly/~alavin/3815
•  No true anomalies, but FP detections in CPU utilization data: https://plot.ly/~alavin/3723
CUSTOM DETECTOR
How to enter a custom anomaly detection algorithm into NAB
Please follow a path for your detector under test (DUT). File paths are given relative to the NAB/ directory.
Path I: create a detector
Subclass detectors/base.py for your detector “alpha”, and add it as detectors/alpha/alpha_detector.py. Then execute on the console: python run.py -d alpha (a skeleton sketch follows below)
Path II: give anomaly scores
Use your algorithm to create anomaly scores in the file format specified in Appendix F of the NAB writeup, then execute from the console: python run.py -d alpha --optimize --score --normalize
Path III: give detections
Use your algorithm to create anomaly detections in the file format specified in Appendix F of the NAB writeup, then execute from the console: python run.py -d alpha --score --normalize
(Flow diagram: the NAB data corpus (data/) and detectors (detectors/) produce anomaly scores (results/); raw labels (labels/raw/) are preprocessed into combined labels (labels/); the combined labels, application profiles (config/), and anomaly scores feed threshold optimization (nab/runner.py) and the scorer (nab/scorer.py), which write final scores (results/).)
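Below is a minimal sketch of Path I. It assumes the base-class interface described in the NAB documentation, in which a detector subclasses AnomalyDetector and implements handleRecord(), returning a list whose first element is the anomaly score for the current record; check detectors/base.py in the repository for the exact signature and class-naming convention before using it.

# detectors/alpha/alpha_detector.py -- illustrative sketch only.
# Assumes NAB's documented base-class interface (AnomalyDetector with a
# handleRecord() method); verify against detectors/base.py in the repository.
from detectors.base import AnomalyDetector   # newer NAB layouts: nab.detectors.base


class AlphaDetector(AnomalyDetector):
    """Toy detector: scores each value by its distance from a running mean."""

    def __init__(self, *args, **kwargs):
        super(AlphaDetector, self).__init__(*args, **kwargs)
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations (Welford's algorithm)

    def handleRecord(self, inputData):
        """Return [anomalyScore] in [0, 1] for a single streaming record."""
        value = inputData["value"]
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

        if self.count < 2:
            return [0.0]
        std = (self.m2 / (self.count - 1)) ** 0.5
        if std == 0.0:
            return [0.0]
        # Squash the z-score into [0, 1]; 4 sigma and beyond saturates at 1.
        z = abs(value - self.mean) / std
        return [min(z / 4.0, 1.0)]

Running python run.py -d alpha as in Path I then streams each data file through the detector one record at a time; no look-ahead or per-file tuning is possible by construction.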
SCALED SIGMOID SCORING FUNCTION
•  Scoring example:
a)  FP before the window
b)  TP in the window
c)  additional TP (not counted)
d)  FP soon after the window
e)  FP long after the window
⇒  total score = -1.809
•  Missing a window completely (i.e. an FN) penalizes the score by -1.0
ANOMALY DETECTION WITH HTM
•  How do we turn a data stream into anomaly scores?
(Pipeline: Data → Encoder → SDR → HTM Algorithms → Predictions → Raw anomaly score → Anomaly likelihood)
CALCULATING RAW ANOMALY SCORE
• Raw anomaly score is the fraction of active columns that were not predicted.
• This is high when the spatial or temporal patterns deviate from the norm.
rawAnomalyScore = |A_t − (P_{t−1} ∩ A_t)| / |A_t|
where P_t = predicted columns at time t, A_t = active columns at time t
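A direct transcription of the formula into code (illustrative only, not the NuPIC implementation) needs just the set of currently active columns and the set of columns predicted at the previous time step:

# Illustrative computation of the raw anomaly score from column indices.
def raw_anomaly_score(active_t, predicted_t_minus_1):
    """Fraction of the currently active columns that were not predicted."""
    active = set(active_t)
    predicted = set(predicted_t_minus_1)
    if not active:
        return 0.0                       # no activity, nothing unexpected
    unpredicted = active - predicted     # A_t - (P_{t-1} intersect A_t)
    return len(unpredicted) / float(len(active))

# Example: 40 active columns, 32 of which were predicted -> score = 0.2
print(raw_anomaly_score(active_t=range(40), predicted_t_minus_1=range(32)))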
RAW ANOMALY SCORES EXAMPLE
(Figure: machine temperature readings and the corresponding raw anomaly scores, ranging from 0 to 1)
CALCULATING ANOMALY LIKELIHOOD
• Compute a normal distribution over a history of raw anomaly scores
• Compute the probability of each new point relative to that distribution
µ = Σ x·P(x),   σ² = E[(X − µ)²]
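A simplified version in code (illustrative only; NuPIC's anomaly-likelihood implementation also averages recent raw scores and adds refinements not shown here) keeps a sliding window of raw scores, fits a Gaussian to it, and reports how improbable the newest score is:

# Simplified anomaly likelihood: fit a Gaussian to a sliding window of raw
# anomaly scores and return how unlikely the newest score is under that fit.
# Illustrative sketch, not the NuPIC implementation.
import math
from collections import deque

class AnomalyLikelihood(object):
    def __init__(self, history=500):
        self.history = deque(maxlen=history)   # recent raw anomaly scores

    def compute(self, raw_score):
        self.history.append(raw_score)
        n = len(self.history)
        if n < 10:
            return 0.5                         # not enough history yet
        mean = sum(self.history) / float(n)
        var = sum((x - mean) ** 2 for x in self.history) / float(n)
        std = math.sqrt(var) if var > 0 else 1e-6
        # Gaussian tail probability P(X >= raw_score) via the complementary
        # error function; the likelihood is its complement.
        tail = 0.5 * math.erfc((raw_score - mean) / (std * math.sqrt(2.0)))
        return 1.0 - tail

likelihood = AnomalyLikelihood()
for raw in [0.02, 0.0, 0.05, 0.01, 0.03] * 20 + [0.9]:
    value = likelihood.compute(raw)
print(value)   # close to 1.0: a raw score of 0.9 is far outside the history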
CALCULATING ANOMALY LIKELIHOOD
(Figure: probability distribution of raw anomaly scores; mean 0.0201, std. dev. 0.1237)