Anomaly Detection via Online Over-Sampling Principal Component Analysis
NAME            USN
Kumara BG       1NT11CS408
Mahesha GR      1NT11CS409
Mallikarjun S   1NT11CS410
Deepak Kumar    1NT10CS129
Guide: Ms. Nirmala, Senior Lecturer, Dept. of CSE
Problem Statement
• We propose an online over-sampling principal component analysis (osPCA) algorithm that detects the presence of outliers in a large amount of data. Unlike prior PCA-based approaches, we do not store the entire data matrix or covariance matrix, so our approach is especially suited to online or large-scale problems.
Introduction
• We are drowning in the deluge of data being collected worldwide, while at the same time starving for knowledge.
• Anomalous events occur relatively infrequently.
What are Anomalies?
• An anomaly is a pattern in the data that does not conform to the expected behaviour.
• Anomalies are also referred to as outliers, exceptions, peculiarities, surprises, etc.
• Anomalies translate to significant (often critical) real-life events, for example:
◦ Credit card fraud
◦ An abnormally high purchase made on a credit card
Motivation
National / International Journals
Objectives
• The aim of this project is to detect the presence of outliers in a very large sample of data by finding:
◦ the covariance matrix
◦ the eigenvalues
◦ the eigenvectors, which give the directions of the principal components
◦ the coordinates of each point along the principal components
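The objectives above can be sketched in plain Java; the deck targets JDK 1.6, so the sketch avoids lambdas and streams. This is a minimal illustration under our own assumptions, not the project's code: the class `PcaBasics` and its methods are hypothetical names, the covariance is normalised by n, and eigenpairs are found by power iteration with deflation instead of a full eigen-solver.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch class: PCA on plain double[][] arrays, no external library.
class PcaBasics {

    // Column means of an n x p data matrix.
    static double[] mean(double[][] x) {
        int n = x.length, p = x[0].length;
        double[] mu = new double[p];
        for (double[] row : x)
            for (int j = 0; j < p; j++) mu[j] += row[j] / n;
        return mu;
    }

    // p x p covariance matrix, normalised by n.
    static double[][] covariance(double[][] x) {
        int n = x.length, p = x[0].length;
        double[] mu = mean(x);
        double[][] c = new double[p][p];
        for (double[] row : x)
            for (int a = 0; a < p; a++)
                for (int b = 0; b < p; b++)
                    c[a][b] += (row[a] - mu[a]) * (row[b] - mu[b]) / n;
        return c;
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static double[] multiply(double[][] m, double[] v) {
        double[] w = new double[m.length];
        for (int a = 0; a < m.length; a++) w[a] = dot(m[a], v);
        return w;
    }

    // Unit eigenvector for the largest eigenvalue of a symmetric PSD matrix (power iteration).
    static double[] dominantEigenvector(double[][] c, int iters) {
        Random rnd = new Random(42); // fixed seed so runs are reproducible
        double[] v = new double[c.length];
        for (int a = 0; a < v.length; a++) v[a] = rnd.nextGaussian();
        double norm = Math.sqrt(dot(v, v));
        for (int a = 0; a < v.length; a++) v[a] /= norm;
        for (int it = 0; it < iters; it++) {
            double[] w = multiply(c, v);
            norm = Math.sqrt(dot(w, w));
            if (norm == 0) break; // c v = 0: keep the current v
            for (int a = 0; a < v.length; a++) v[a] = w[a] / norm;
        }
        return v;
    }

    // Top-k eigenpairs via power iteration plus deflation (C <- C - lambda v v^T).
    // Eigenvalues go into lambdas; row j of the result is the j-th principal direction.
    static double[][] topEigenvectors(double[][] cov, int k, double[] lambdas) {
        int p = cov.length;
        double[][] c = new double[p][];
        for (int a = 0; a < p; a++) c[a] = cov[a].clone(); // leave caller's matrix intact
        double[][] dirs = new double[k][];
        for (int j = 0; j < k; j++) {
            double[] v = dominantEigenvector(c, 500);
            double lambda = dot(v, multiply(c, v)); // Rayleigh quotient
            for (int a = 0; a < p; a++)
                for (int b = 0; b < p; b++) c[a][b] -= lambda * v[a] * v[b];
            dirs[j] = v;
            lambdas[j] = lambda;
        }
        return dirs;
    }

    // Coordinates of each mean-centred point along each principal direction.
    static double[][] project(double[][] x, double[][] dirs) {
        double[] mu = mean(x);
        double[][] y = new double[x.length][dirs.length];
        for (int i = 0; i < x.length; i++) {
            double[] centred = new double[mu.length];
            for (int a = 0; a < mu.length; a++) centred[a] = x[i][a] - mu[a];
            for (int j = 0; j < dirs.length; j++) y[i][j] = dot(centred, dirs[j]);
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {1, 1}, {2, 2}, {3, 3.5}};
        double[] lambdas = new double[2];
        double[][] dirs = topEigenvectors(covariance(data), 2, lambdas);
        System.out.println("eigenvalues: " + Arrays.toString(lambdas));
        System.out.println("first direction: " + Arrays.toString(dirs[0]));
        System.out.println("coordinates: " + Arrays.deepToString(project(data, dirs)));
    }
}
```

Power iteration converges quickly when the largest eigenvalue is well separated from the next one; for real workloads a dedicated linear-algebra library would be the safer choice.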
Hardware Specification:
• Processor - Pentium IV
• RAM - 256 MB (min)
• Hard Disk - 20 GB
• Keyboard - Standard Windows keyboard
• Mouse - Two- or three-button mouse
Software Specification:
• Operating System: Windows XP
• Programming Language: Java
• Java Version: JDK 1.6 and above
• IDE: Eclipse
Literature Survey:
• Research paper referred:
◦ Yuh-Jye Lee, Yi-Ren Yeh and Yu-Chiang Frank Wang, "Anomaly Detection via Online Oversampling Principal Component Analysis"
• Other references:
◦ V. Gunamani, M. Abarna, "A Survey on Intrusion Detection Using Outlier Detection Techniques"
Design of the Project:
Algorithm - Principal Component Analysis:
• PCA is a dimension-reduction method.
• PCA is sensitive to outliers, and only a few principal components are needed to represent the main data structure.
• An outlier or deviated instance therefore has a larger effect on these principal directions than a normal instance.
• With PCA, outliers are detected by means of a "leave one out" (LOO) procedure.
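The leave-one-out idea can be illustrated with a small sketch: compute the dominant principal direction u of all the data, recompute it with instance i left out, and score i by how far the direction rotates, here 1 - |cos(u, u_i)|. The class `LooPca` and this score formula are our illustrative choices, not the project's code. Each removal recomputes the covariance from scratch, so the sketch costs O(n^2 p^2) overall and is only meant for small examples.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch class: leave-one-out score_i = 1 - |cos(u, u_without_i)|.
class LooPca {

    // Dominant principal direction of the rows of x, ignoring row `skip` (-1 keeps all rows).
    static double[] direction(double[][] x, int skip) {
        int p = x[0].length, m = 0;
        double[] mu = new double[p];
        for (int i = 0; i < x.length; i++) {
            if (i == skip) continue;
            m++;
            for (int a = 0; a < p; a++) mu[a] += x[i][a];
        }
        for (int a = 0; a < p; a++) mu[a] /= m;
        double[][] c = new double[p][p];
        for (int i = 0; i < x.length; i++) {
            if (i == skip) continue;
            for (int a = 0; a < p; a++)
                for (int b = 0; b < p; b++)
                    c[a][b] += (x[i][a] - mu[a]) * (x[i][b] - mu[b]) / m;
        }
        return powerIteration(c, 500);
    }

    // Unit eigenvector for the largest eigenvalue of a symmetric PSD matrix.
    static double[] powerIteration(double[][] c, int iters) {
        int p = c.length;
        Random rnd = new Random(42); // fixed seed: reproducible start vector
        double[] v = new double[p];
        for (int a = 0; a < p; a++) v[a] = rnd.nextGaussian();
        normalise(v);
        for (int it = 0; it < iters; it++) {
            double[] w = new double[p];
            for (int a = 0; a < p; a++)
                for (int b = 0; b < p; b++) w[a] += c[a][b] * v[b];
            if (dot(w, w) == 0) break; // c v = 0: keep the current v
            normalise(w);
            v = w;
        }
        return v;
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static void normalise(double[] v) {
        double n = Math.sqrt(dot(v, v));
        for (int i = 0; i < v.length; i++) v[i] /= n;
    }

    // One score per instance; larger means removing it rotates u more.
    static double[] looScores(double[][] x) {
        double[] u = direction(x, -1);
        double[] s = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double[] ui = direction(x, i);
            // Both vectors have unit length, so |u . ui| equals |cos(angle)|.
            s[i] = 1.0 - Math.abs(dot(u, ui));
        }
        return s;
    }

    public static void main(String[] args) {
        double[][] data = new double[12][];
        for (int i = 0; i < 11; i++) data[i] = new double[] {i - 5, 0}; // points on a line
        data[11] = new double[] {2, 4};                                  // point off the line
        System.out.println(Arrays.toString(looScores(data)));
    }
}
```

In the example, the point (2, 4) receives the largest score; with many more normal points, however, its score shrinks as well, which is the weakness of plain LOO that over-sampling addresses.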
• We explore the variation of the principal directions when a data point is removed or added, and use this information to identify outliers and detect newly arriving deviated data.
• The effect of LOO on a particular instance may be diminished when the data set is large.
• To identify an outlier with the LOO strategy, we therefore duplicate the target instance instead of removing it.
• In particular, we duplicate the target instance many times (10% of the whole data in our experiments) and observe how much the principal directions vary.
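Duplicating the target x_t a total of r*n times does not require building a larger data set. Adding those copies to the n original points gives the over-sampled mean (mu + r x_t) / (1 + r) and second-moment matrix (M + r x_t x_t^T) / (1 + r), where M = (1/n) sum_i x_i x_i^T; the over-sampled covariance is the second of these minus the outer product of the first with itself. The sketch below uses this to score every instance with r = 0.1, matching the 10% setting. It is our own derivation and illustration (the class `OsPca` is hypothetical), and it still keeps the p x p matrix M, so it shows the over-sampling effect rather than the memory savings of the on-line phase.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch class: osPCA score = 1 - |cos(u, u_t)|, where u_t is the
// dominant direction after x_t is duplicated r*n times (closed-form update).
class OsPca {

    static double[] scores(double[][] x, double r) {
        int n = x.length, p = x[0].length;
        double[] mu = new double[p];
        double[][] m2 = new double[p][p]; // M = (1/n) * sum_i x_i x_i^T
        for (double[] row : x)
            for (int a = 0; a < p; a++) {
                mu[a] += row[a] / n;
                for (int b = 0; b < p; b++) m2[a][b] += row[a] * row[b] / n;
            }
        double[] u = powerIteration(overSampledCov(mu, m2, null, 0));
        double[] s = new double[n];
        for (int t = 0; t < n; t++) {
            double[] ut = powerIteration(overSampledCov(mu, m2, x[t], r));
            s[t] = 1.0 - Math.abs(dot(u, ut)); // unit vectors: |u . ut| = |cos|
        }
        return s;
    }

    // Covariance after adding r*n copies of xt; xt == null gives the original covariance.
    static double[][] overSampledCov(double[] mu, double[][] m2, double[] xt, double r) {
        int p = mu.length;
        double[] muT = new double[p];
        for (int a = 0; a < p; a++)
            muT[a] = (xt == null) ? mu[a] : (mu[a] + r * xt[a]) / (1 + r);
        double[][] c = new double[p][p];
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++) {
                double second = (xt == null) ? m2[a][b] : (m2[a][b] + r * xt[a] * xt[b]) / (1 + r);
                c[a][b] = second - muT[a] * muT[b];
            }
        return c;
    }

    // Unit eigenvector for the largest eigenvalue (power iteration, 500 steps).
    static double[] powerIteration(double[][] c) {
        int p = c.length;
        Random rnd = new Random(42); // fixed seed: reproducible start vector
        double[] v = new double[p];
        for (int a = 0; a < p; a++) v[a] = rnd.nextGaussian();
        normalise(v);
        for (int it = 0; it < 500; it++) {
            double[] w = new double[p];
            for (int a = 0; a < p; a++)
                for (int b = 0; b < p; b++) w[a] += c[a][b] * v[b];
            if (dot(w, w) == 0) break; // c v = 0: keep the current v
            normalise(w);
            v = w;
        }
        return v;
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static void normalise(double[] v) {
        double n = Math.sqrt(dot(v, v));
        for (int i = 0; i < v.length; i++) v[i] /= n;
    }

    public static void main(String[] args) {
        double[][] data = new double[12][];
        for (int i = 0; i < 11; i++) data[i] = new double[] {i - 5, 0}; // points on a line
        data[11] = new double[] {2, 4};                                  // point off the line
        System.out.println(Arrays.toString(scores(data, 0.1)));
    }
}
```

Computing the covariance as E[x x^T] - mu mu^T is convenient for the closed-form update but can lose precision when the data have a large offset; centring the data first helps.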
Implementation:
• It includes two steps:
◦ Data Cleaning Phase
◦ On-line Anomaly Detection Phase
• Data Cleaning Phase: osPCA is applied to the data set to find the principal direction. In this method the target instance is duplicated multiple times, the idea being to amplify the effect of an outlier rather than that of normal data. Then, using the leave-one-out (LOO) strategy, the angle difference is measured: adding or removing one data instance changes the principal direction, and the size of that change indicates how anomalous the instance is.
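The slides do not state how the cleaning phase turns scores into a decision. One simple rule, which is purely our assumption, is to flag the instances with the largest osPCA scores (a fixed fraction of the data) as outliers and to keep the largest remaining score as the threshold for the on-line phase. The class `DataCleaning` is a hypothetical sketch of that rule.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical cleaning rule (our assumption, not from the slides): flag the top
// `fraction` of scores as outliers; the largest kept score becomes the threshold.
class DataCleaning {

    // true = flagged as an outlier during cleaning.
    static boolean[] flagTopFraction(final double[] scores, double fraction) {
        int n = scores.length;
        int k = (int) Math.min(n, Math.max(0, Math.round(fraction * n)));
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        // Sort indices by descending score (anonymous class keeps JDK 1.6 compatibility).
        Arrays.sort(idx, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                return Double.compare(scores[b], scores[a]);
            }
        });
        boolean[] flagged = new boolean[n];
        for (int j = 0; j < k; j++) flagged[idx[j]] = true;
        return flagged;
    }

    // Largest score among instances kept as normal (-Infinity if none are kept).
    static double threshold(double[] scores, boolean[] flagged) {
        double t = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < scores.length; i++)
            if (!flagged[i]) t = Math.max(t, scores[i]);
        return t;
    }

    public static void main(String[] args) {
        double[] scores = {0.01, 0.5, 0.02, 0.03, 0.9, 0.0, 0.04, 0.05, 0.06, 0.07};
        boolean[] flagged = flagTopFraction(scores, 0.2);
        System.out.println("flagged: " + Arrays.toString(flagged));
        System.out.println("threshold: " + threshold(scores, flagged));
    }
}
```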
• On-line Anomaly Detection Phase: In this phase, the goal is to identify newly arriving abnormal instances. The quick updating of the principal directions in this approach satisfies the demands of on-line detection. A newly arriving instance is marked as an anomaly if it causes a large variation in the principal direction.
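For the on-line phase, one way to update the principal direction without storing the data or covariance matrix is a least-squares (projection approximation) step. Treat the new, centred instance xbar_t as duplicated r*n times and keep u fixed inside the projections y_i = u . xbar_i; minimising the reconstruction error then gives a new direction proportional to beta * sum_i y_i xbar_i + y_t xbar_t, with beta = 1 / (r n). Only the p-vector sum_i y_i xbar_i has to be stored. This is our own sketch in the spirit of the referenced osPCA paper, not its exact formulation: the class `OnlineOsPca` is hypothetical, centring uses the stored training mean instead of an updated one, and a covariance matrix is built once, offline, only to obtain the initial direction.

```java
import java.util.Random;

// Hypothetical sketch class: after training, only p-vectors are kept for on-line scoring.
class OnlineOsPca {
    private final double[] mean;  // mean of the cleaned training data
    private final double[] u;     // unit dominant principal direction
    private final double[] xProj; // sum_i y_i * xbar_i, with y_i = u . xbar_i
    private final double beta;    // 1 / (r * n)

    OnlineOsPca(double[][] clean, double r) {
        int n = clean.length, p = clean[0].length;
        mean = new double[p];
        for (double[] row : clean)
            for (int a = 0; a < p; a++) mean[a] += row[a] / n;
        // Offline, once: covariance of the clean data, used only to obtain u.
        double[][] c = new double[p][p];
        for (double[] row : clean)
            for (int a = 0; a < p; a++)
                for (int b = 0; b < p; b++)
                    c[a][b] += (row[a] - mean[a]) * (row[b] - mean[b]) / n;
        u = powerIteration(c);
        xProj = new double[p];
        for (double[] row : clean) {
            double[] xb = centre(row);
            double y = dot(u, xb);
            for (int a = 0; a < p; a++) xProj[a] += y * xb[a];
        }
        beta = 1.0 / (r * n);
    }

    // Score for a newly arriving x: 1 - |cos(u, uNew)|, with uNew proportional to
    // beta * xProj + y_t * xbar_t. The scalar denominator (beta * sum y_i^2 + y_t^2)
    // does not change the direction, so it is omitted.
    double score(double[] x) {
        double[] xb = centre(x);
        double yt = dot(u, xb);
        double[] uNew = new double[xb.length];
        for (int a = 0; a < xb.length; a++) uNew[a] = beta * xProj[a] + yt * xb[a];
        return 1.0 - Math.abs(dot(u, uNew)) / Math.sqrt(dot(uNew, uNew));
    }

    boolean isAnomaly(double[] x, double threshold) {
        return score(x) > threshold;
    }

    private double[] centre(double[] x) {
        double[] xb = new double[x.length];
        for (int a = 0; a < x.length; a++) xb[a] = x[a] - mean[a];
        return xb;
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Unit eigenvector for the largest eigenvalue (power iteration, 500 steps).
    static double[] powerIteration(double[][] c) {
        int p = c.length;
        Random rnd = new Random(42); // fixed seed: reproducible start vector
        double[] v = new double[p];
        for (int a = 0; a < p; a++) v[a] = rnd.nextGaussian();
        for (int it = 0; it <= 500; it++) {
            double norm = Math.sqrt(dot(v, v));
            if (norm == 0) break;
            for (int a = 0; a < p; a++) v[a] /= norm;
            if (it == 500) break; // last pass only normalises
            double[] w = new double[p];
            for (int a = 0; a < p; a++)
                for (int b = 0; b < p; b++) w[a] += c[a][b] * v[b];
            if (dot(w, w) == 0) break; // c v = 0: keep the current (unit) v
            v = w;
        }
        return v;
    }

    public static void main(String[] args) {
        double[][] clean = new double[11][];
        for (int i = 0; i < 11; i++) clean[i] = new double[] {i - 5, 0};
        OnlineOsPca detector = new OnlineOsPca(clean, 0.1);
        System.out.println("on the line:  " + detector.score(new double[] {3, 0}));
        System.out.println("off the line: " + detector.score(new double[] {2, 4}));
    }
}
```

With clean data lying on a line, a new point on that line scores (numerically) zero, while a point off the line scores higher; `isAnomaly` compares the score with a threshold such as one chosen during data cleaning.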
Snapshots:
[Screenshots: Anomaly Detection Via PCA]
Outcomes
• We have explored the variation of principal directions in the leave-one-out scenario.
• We demonstrated that the variation of principal directions caused by outliers can indeed help us detect anomalies.
• We used over-sampling PCA to enlarge the outlierness of an outlier.
Conclusion:
• This project has attempted to establish the significance of anomaly detection using the osPCA technique.
• Our method does not need to keep the entire covariance or data matrix during the online detection process.
• Compared with other anomaly detection methods, our approach achieves satisfactory results while significantly reducing computational cost and memory requirements.
Future Enhancement:
• In this project we work on a particular data set obtained from an online website; in future we plan to extend the work to detect anomalies in any data set.
Thank You
