BASLE BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA
HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Anomaly Detection
database integrated
Dr. Olaf Nimz
Our company.
anomaly detection2 08/01/2018
Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on [...] and [...] technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields:
Trivadis Services takes over the interacting operation of your IT systems.
OPERATION
Scoring Engine for R
EXEC sp_execute_external_script
    @language = N'R',
    -- SQL part: the query result is passed to @script as InputDataSet
    @input_data_1 = N'SELECT 1 AS Installed',
    -- R part: receives @input_data_1 and returns it unchanged as OutputDataSet
    @script = N'OutputDataSet <- InputDataSet'
WITH RESULT SETS (([Installed] int NOT NULL));
GO
Microsoft R Server, Launchpad (BxlServer and SQL Satellite, Rserver.dll)
Agenda
1. Prioritise data quality effort
Cleansing DWH
IoT Streams
Online Learning
2. Unsupervised Measures
Mahalanobis distance
Clustering
Local Outlier Factor
Isolation Forest
Variational AutoEncoders
Novelty, Noise, Outlier, Anomaly, Fraud, Instability
1. Special Observations
in Relation to Baseline
Contextual
2. Suspicious Observations
Novelty
Outlier / Anomaly
Data quality issue
Unstable process
Random Noise
Local - Global
High dimensional: distances lose contrast, so the nearest and the farthest neighbour end up almost equally far away

Approaches for Detection
1. Statistical Distribution
Entropy
2. Deviation from Normal
Sequence of Events
Conditional (temporal, spatial context)
Collective (e.g. a DoS attack)
3. Distance to neighborhood
4. Local Density
5. High-dimensional adaptations
Subspace projection
Angle based
 


Univariate Extreme Values
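IQR = interquartile range (middle 50% of the data)
Median = 50th percentile
Within 2 stdev of the mean: ~95% of normally distributed data
Beyond 2 stdev: ~2% in each tail
Grubbs' test per point
Scaling by z-scores (robust variant: median and median absolute deviation)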
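The univariate rules on this slide can be sketched in a few lines. The example below is in plain Python (standard library only) rather than the deck's R; the sensor readings, the fence factor 1.5 and the cut-off of 3 are illustrative assumptions, not values from the talk.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: the quartiles widened by k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def robust_z_scores(values):
    """z-scores built on the median and the median absolute deviation (MAD)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    # 1.4826 rescales the MAD so it is comparable to a standard deviation
    # for normally distributed data. Assumes mad > 0.
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]

# Hypothetical sensor readings with one extreme value.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0]

low, high = iqr_fences(readings)
flagged_iqr = [v for v in readings if v < low or v > high]
flagged_mad = [v for v, z in zip(readings, robust_z_scores(readings)) if abs(z) > 3]
print(flagged_iqr, flagged_mad)
```

Both rules rely on the median and quartiles, so a single extreme reading barely shifts the baseline it is judged against, which is the point of the "robust" variant.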
Multivariate data
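Robust Mahalanobis distance
chisq.plot(): ordered robust Mahalanobis distances (squared) against quantiles of the χ² distribution (degrees of freedom = number of dimensions)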
No outlier ?
SCADA of Wind Turbine
Power Curve – Deviation from Prediction
lm(power ~ wind_speed + I(wind_speed^2) + I(wind_speed^3), data)
Adjusted R² = 95%
Sample
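A minimal sketch of the "deviation from prediction" idea: fit the same cubic in wind speed by least squares, then flag readings whose residual is unusually large. It is written in plain Python rather than the deck's R. The idealised power curve, the injected faulty reading and the 3-stdev cut-off are assumptions made for illustration, not SCADA data from the talk.

```python
import statistics

def fit_cubic(xs, ys):
    """Least-squares fit of y ~ b0 + b1*x + b2*x^2 + b3*x^3 via the normal equations."""
    rows = [[1.0, x, x**2, x**3] for x in xs]
    n = 4
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (a[i][n] - sum(a[i][j] * coef[j] for j in range(i + 1, n))) / a[i][i]
    return coef

def predict(coef, x):
    return sum(b * x**k for k, b in enumerate(coef))

# Hypothetical data: an idealised cubic power curve with one faulty reading.
wind_speed = [3 + 0.5 * i for i in range(23)]   # 3.0 .. 14.0
power = [0.5 * v**3 for v in wind_speed]
power[12] -= 300.0                              # injected under-performance

coef = fit_cubic(wind_speed, power)
residuals = [p - predict(coef, v) for v, p in zip(wind_speed, power)]
sd = statistics.stdev(residuals)
suspicious = [i for i, r in enumerate(residuals) if abs(r) > 3 * sd]
print(suspicious)
```

Because the outlier also pulls the fitted curve towards itself, a robust fit or a threshold taken from a clean reference period would be a safer choice on real data.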
Boxplot – univariate
[Figure: boxplots of Power, WindSpeed and Temperature, with mean and median marked]
Wind distribution
Mahalanobis Distance
Multivariate: takes several dimensions into account at once
Scale: how many standard deviations away from the center?
[Figure: outlier candidates by Mahalanobis distance]
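To make "how many stdev away from the center" concrete, here is a small sketch of the squared Mahalanobis distance for two variables, in plain Python rather than the deck's R. It uses the classical mean and covariance, not the robust estimate mentioned earlier in the deck, and the data points are made up. For two dimensions the χ² cut-off has a closed form, so no statistics library is needed.

```python
import math

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2          # determinant of the 2x2 covariance matrix
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # (dx, dy) * inverse(covariance) * (dx, dy)'
        out.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return out

# Hypothetical, strongly correlated data plus one point that breaks the correlation.
# The odd point is unremarkable on each axis taken alone.
points = [(float(x), 2.0 * x + (0.1 if x % 2 else -0.1)) for x in range(1, 11)]
points.append((5.5, 5.0))

d2 = mahalanobis_sq_2d(points)
# 97.5% quantile of the chi-squared distribution with 2 degrees of freedom.
cutoff = -2.0 * math.log(1.0 - 0.975)
flagged = [i for i, d in enumerate(d2) if d > cutoff]
print(flagged)
```

The flagged point lies inside the range of both variables, so a per-variable boxplot would not catch it; only the distance that accounts for the correlation does.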
Outliers in Original Space
Eigenvector Space
Overview of Outliers
Leland Wilkinson's probabilistic HDoutliers model
=> for mixtures of numeric and categorical variables
[Figure panels: 1D, 2D, 3D, 4D]
HDBSCAN
Only Borderline & Core cases
Scaled by z-Scores
Local Outlier Factor
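K-nearest neighbors: each point is compared with its k closest points.
Reachability distance: the distance at which a point can be "reached" from a neighbor, i.e. the larger of the actual distance and that neighbor's own k-distance.
LOF: a point's reachability relative to that of its neighbors. Values near 1 are typical; values well above 1 indicate an outlier.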
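The Local Outlier Factor can be written out directly from these definitions. The sketch below is plain Python rather than the deck's R, uses brute-force distances, and takes exactly k neighbours per point (ties broken by index), which is a simplification of the usual definition. The grid data and k = 3 are assumptions for illustration.

```python
import math

def lof_scores(points, k=3):
    """Local Outlier Factor of each point (assumes no duplicate points)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding the point itself
    neighbours = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
                  for i in range(n)]
    # k-distance: distance to the k-th nearest neighbour
    k_distance = [dist[i][neighbours[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        # reachability distance of i from j: never smaller than j's own k-distance
        return max(k_distance[j], dist[i][j])

    # local reachability density: inverse of the mean reachability distance
    lrd = [k / sum(reach_dist(i, j) for j in neighbours[i]) for i in range(n)]
    # LOF: mean density of the neighbours relative to the point's own density
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical data: a dense 3x3 grid plus one isolated point.
points = [(x, y) for x in range(3) for y in range(3)] + [(6, 6)]
scores = lof_scores(points, k=3)
print([round(s, 2) for s in scores])
```

The grid points score close to 1 because their density matches that of their neighbours, while the isolated point scores several times higher.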
Hierarchical Clustering
MDS plot coloured by cutree() cluster
Isolation Forest - Ensemble
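As a rough illustration of the ensemble idea: an isolation forest grows many random trees and scores a point by how few random splits are needed to separate it from the rest. The sketch below is a simplified plain-Python version (not the deck's R, and not any particular library's implementation). It builds every tree on the full data set instead of a subsample, and the data, tree count and seeds are assumptions.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649   # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(data, rng, depth, max_depth):
    """Grow one isolation tree using random features and random split values."""
    if len(data) <= 1 or depth >= max_depth:
        return {"size": len(data)}
    feature = rng.randrange(len(data[0]))
    lo = min(row[feature] for row in data)
    hi = max(row[feature] for row in data)
    if lo == hi:
        return {"size": len(data)}
    split = rng.uniform(lo, hi)
    left = [row for row in data if row[feature] < split]
    right = [row for row in data if row[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, rng, depth + 1, max_depth),
            "right": build_tree(right, rng, depth + 1, max_depth)}

def path_length(tree, point, depth=0):
    """Depth at which the point lands in a leaf, adjusted for the leaf's size."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if point[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], point, depth + 1)

def isolation_scores(data, n_trees=100, seed=0):
    """Anomaly score in (0, 1]; shorter average paths give scores closer to 1."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(data)))
    trees = [build_tree(data, rng, 0, max_depth) for _ in range(n_trees)]
    norm = c_factor(len(data))
    return [2.0 ** (-sum(path_length(t, p) for t in trees) / (n_trees * norm))
            for p in data]

# Hypothetical data: a Gaussian cluster plus one distant point.
rng = random.Random(42)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)] + [(8.0, 8.0)]
scores = isolation_scores(data)
most_anomalous = max(range(len(data)), key=scores.__getitem__)
print(most_anomalous, round(scores[most_anomalous], 2))
```

No distance computation is involved, which is why the method is often considered for the high-dimensional settings mentioned earlier in the deck.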
Challenges
1. Manual threshold – automatic is expensive
2. Mixed data type (numeric & categorical)
3. High dimensional spaces
4. High Cardinality (Granularity)
5. Multi-Modal: Global vs Local Scope
6. Online
http://projects.rajivshah.com/shiny/outlier/
