Ágnes Salánki
salanki.agnes@gmail.com
Budapest BI Forum 2015
• PhD student in Computer Engineering
• Fault Tolerant Systems Research Group
• Availability of 99.99%
• 2011: “We need a method to detect erroneous observations.”
“An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism” (Hawkins 1980)
Looking for Something Special -- Outlier Detection in R
[Figure: timeline of outlier detection methods, 1970-2010: isodepth, MVE, DB, LOF, MCD, BACON, S-H-ESD, FAST-MCD]
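The “db” entry on the timeline refers to distance-based outliers, usually credited to Knorr and Ng: an object is a DB(p, D)-outlier if at least a fraction p of the data set lies farther than distance D from it. The talk’s own code is in R; the following is only a naive O(n²) Python sketch of that definition, assuming Euclidean distance. The helper name `db_outliers` and the toy data are hypothetical.

```python
import math


def db_outliers(points, p, D):
    """Naive DB(p, D)-outlier check (hypothetical helper, Euclidean distance).

    A point is flagged when at least a fraction p of all points lie
    farther than distance D from it.
    """
    n = len(points)
    flagged = []
    for i, a in enumerate(points):
        # Count points strictly farther than D (the point itself is at distance 0).
        far = sum(1 for b in points if math.dist(a, b) > D)
        if far >= p * n:
            flagged.append(i)
    return flagged


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(db_outliers(pts, p=0.75, D=5.0))  # [4]
```

Both parameters make the test stricter as they grow (a larger p demands more distant points, a larger D counts fewer points as distant), so choosing them requires some feel for the scale of the data.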
• PISA 2012 results
• Children’s math and reading scores by country
[Figure: PISA 2012 math and reading scores by country, shown next to the 1970-2010 method timeline (isodepth, MVE, DB, LOF; 1987 marked). Countries labeled across the panels: China-Shanghai, Qatar, Peru, Japan, Indonesia, Colombia / China-Shanghai, Kazakhstan, Japan, Costa Rica, Colombia / China-Shanghai, Kazakhstan, Montenegro, Peru, Albania, Qatar / China-Shanghai, Japan, Costa Rica / China-Shanghai, Macao, Liechtenstein]
• Hans Rosling’s TED talk in 2006
• Still one of the most popular talks (as of Oct. 2015)
“Happy families are all alike; every unhappy family is unhappy in its own way.” (Anna Karenina)
• Fault Tolerant Systems Research Group
• Outliers: high communication workload
• The only cause: planned system maintenance that moved lots of data
Algorithm    R    scikit-learn    RapidMiner    WEKA    ELKI
isodepth  ✓
MVE  ✓
DB  ✓
LOF  ✓ ✓ ✓
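LOF (Local Outlier Factor, Breunig et al. 2000) compares the local density around a point with the densities around its k nearest neighbours. Scores near 1 mean the point is about as dense as its neighbourhood, while scores well above 1 suggest an outlier. The following is a simplified pure-Python sketch: it takes exactly k neighbours (ties broken by index) instead of the full k-distance neighbourhood of the original definition, and it assumes there are no duplicate points, which would make a reachability density infinite. The helper name `lof_scores` and the toy data are hypothetical.

```python
import math


def lof_scores(points, k):
    """Simplified Local Outlier Factor (hypothetical helper).

    Uses exactly k nearest neighbours per point; assumes no duplicate points.
    """
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    # k nearest neighbours of each point, excluding the point itself
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    # k-distance: distance to the k-th nearest neighbour
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]
    # local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, o) = max(k-distance(o), d(i, o))
    lrd = [k / sum(max(k_dist[o], dist[i][o]) for o in knn[i]) for i in range(n)]
    # LOF: average ratio of the neighbours' density to the point's own density
    return [sum(lrd[o] for o in knn[i]) / (k * lrd[i]) for i in range(n)]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(pts, k=2)
print([round(s, 2) for s in scores])  # [1.0, 1.0, 1.0, 1.0, 6.03]
```

The choice of k matters. If it is too small, noise dominates the density estimates. If it is too large, a small group of outliers can be absorbed into its surroundings.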
salankia
• R packages: depth, fields, robustX, DMwR
• Pictures
• Forrest Gump, Judit Polgár, Garry Kasparov
• Outlier detection applications in finance, security, medicine, police surveillance
• 1977 and 1987 pictures
• GitHub code: https://github.com/salankia/OutlierDetection-Budapest-BI-2015
