Anomaly Detection Via PCA

Anomaly Detection via Online Over-Sampling Principal Component Analysis

NAME             USN
Kumara BG        1NT11CS408
Mahesha GR       1NT11CS409
Mallikarjun S    1NT11CS410
Deepak Kumar     1NT10CS129

Guide
Ms. Nirmala
Senior Lecturer
Dept. of CSE

Problem Statement
• We propose an online over-sampling principal component analysis (osPCA) algorithm for detecting the presence of outliers in a large amount of data. Unlike prior PCA-based approaches, it does not store the entire data matrix or covariance matrix, which makes it especially suited to online or large-scale problems.

Introduction
• We are drowning in the deluge of data being collected world-wide, while starving for knowledge at the same time.
• Anomalous events occur relatively infrequently.

What are Anomalies?
• An anomaly is a pattern in the data that does not conform to the expected behaviour.
• Anomalies are also referred to as outliers, exceptions, peculiarities, surprises, etc.
• Anomalies translate to significant (often critical) real-life entities:
◦ Credit card fraud
◦ An abnormally high purchase made on a credit card

Motivation
National / International Journals

Objectives
• The aim of this project is to detect the presence of outliers in a very large sampled data set by finding the :
◦ Covariance matrix
◦ Eigenvalues
◦ Eigenvectors, which give the directions of the principal components
◦ Coordinates of each point along the direction of the principal component
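The four quantities listed above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the project's code. The class name `PcaBasics` and its methods are hypothetical. The covariance divides by n. The dominant eigenvector is found by plain power iteration, which assumes that the starting vector is not orthogonal to it and that the largest eigenvalue is clearly separated from the rest.

```java
import java.util.Arrays;

class PcaBasics {
    /** Column means of an n x d data matrix. */
    static double[] mean(double[][] x) {
        double[] m = new double[x[0].length];
        for (double[] row : x)
            for (int j = 0; j < m.length; j++) m[j] += row[j] / x.length;
        return m;
    }

    /** Covariance matrix of the rows of x (divides by n). */
    static double[][] covariance(double[][] x) {
        double[] m = mean(x);
        int d = m.length;
        double[][] c = new double[d][d];
        for (double[] row : x)
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++)
                    c[i][j] += (row[i] - m[i]) * (row[j] - m[j]) / x.length;
        return c;
    }

    /** Dominant eigenvector of c (unit length), by power iteration. */
    static double[] firstDirection(double[][] c) {
        int d = c.length;
        double[] u = new double[d];
        Arrays.fill(u, 1.0 / Math.sqrt(d)); // assumed starting vector
        for (int it = 0; it < 1000; it++) {
            double[] v = new double[d];
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++) v[i] += c[i][j] * u[j];
            double norm = 0;
            for (double a : v) norm += a * a;
            norm = Math.sqrt(norm);
            if (norm == 0) break; // zero covariance: keep the current vector
            for (int i = 0; i < d; i++) u[i] = v[i] / norm;
        }
        return u;
    }

    /** Eigenvalue belonging to a unit eigenvector u: the quotient u^T C u. */
    static double eigenvalue(double[][] c, double[] u) {
        double lambda = 0;
        for (int i = 0; i < u.length; i++)
            for (int j = 0; j < u.length; j++) lambda += u[i] * c[i][j] * u[j];
        return lambda;
    }

    /** Coordinate of point p along direction u, after centering by mean m. */
    static double project(double[] p, double[] m, double[] u) {
        double y = 0;
        for (int j = 0; j < p.length; j++) y += (p[j] - m[j]) * u[j];
        return y;
    }

    public static void main(String[] args) {
        double[][] x = {{2, 0}, {0, 0}, {-2, 0}, {0, 1}, {0, -1}};
        double[][] c = covariance(x);
        double[] u = firstDirection(c);
        System.out.println("direction  = " + Arrays.toString(u));
        System.out.println("eigenvalue = " + eigenvalue(c, u));
        System.out.println("coordinate = " + project(x[0], mean(x), u));
    }
}
```

For higher-dimensional or ill-conditioned data, a tested linear algebra library would be a safer choice than hand-written power iteration.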

Hardware Specification:
• Processor - Pentium IV
• RAM - 256 MB (min)
• Hard Disk - 20 GB
• Keyboard - Standard Windows Keyboard
• Mouse - Two or Three Button Mouse

Software Specification
• Operating System : Windows XP
• Programming Language : Java
• Java Version : JDK 1.6 and above
• IDE tool : Eclipse

Literature Survey:
• Research Paper Referred :
◦ Anomaly Detection via Online Oversampling Principal Component Analysis, by Yuh-Jye Lee, Yi-Ren Yeh, and Yu-Chiang Frank Wang
• Other References :
◦ A Survey on Intrusion Detection Using Outlier Detection Techniques, by V. Gunamani and M. Abarna

Algorithm - Principal Component Analysis :
• PCA is a dimension reduction method.
• PCA is sensitive to outliers, and only a few principal components are needed to represent the main structure of the data.
• An outlier, or deviated instance, has a larger effect on these principal directions than a normal instance does.
• With PCA, outliers are detected by means of a "Leave One Out" (LOO) procedure.

• We explore how the principal directions vary when a data point is removed or added, and use this information to identify outliers and to detect newly arriving deviated data.
• The effect of LOO on a particular instance may be diminished when the data set is large.
• To identify an outlier via the LOO strategy in that case, we duplicate the target instance instead of removing it.
• Finally, we duplicate the target instance many times (10% of the whole data in our experiments) and observe how much the principal directions vary.
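The duplication idea can be sketched as follows. This is an illustration under assumptions rather than the osPCA implementation. The class `OversamplingScoreSketch` and its methods are hypothetical names. The score is taken here to be one minus the absolute cosine between the first principal direction before and after duplication, and the covariance is recomputed from scratch each time instead of being updated incrementally.

```java
import java.util.Arrays;

class OversamplingScoreSketch {
    /** First principal direction (unit vector) of the rows of x, by power iteration. */
    static double[] firstDirection(double[][] x) {
        int n = x.length, d = x[0].length;
        double[] mean = new double[d];
        for (double[] row : x)
            for (int j = 0; j < d; j++) mean[j] += row[j] / n;
        double[][] c = new double[d][d];
        for (double[] row : x)
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++)
                    c[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]) / n;
        double[] u = new double[d];
        Arrays.fill(u, 1.0 / Math.sqrt(d)); // assumed starting vector
        for (int it = 0; it < 1000; it++) {
            double[] v = new double[d];
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++) v[i] += c[i][j] * u[j];
            double norm = 0;
            for (double a : v) norm += a * a;
            norm = Math.sqrt(norm);
            if (norm == 0) break;
            for (int i = 0; i < d; i++) u[i] = v[i] / norm;
        }
        return u;
    }

    /**
     * Duplicates instance t until the copies make up about `ratio` of the
     * data, then returns 1 - |cos| between the old and new first direction.
     * Values near 0 mean the direction barely moved.
     */
    static double score(double[][] x, int t, double ratio) {
        int n = x.length;
        int copies = (int) Math.ceil(ratio * n);
        double[][] over = Arrays.copyOf(x, n + copies);
        for (int i = n; i < n + copies; i++) over[i] = x[t];
        double[] u = firstDirection(x);
        double[] v = firstDirection(over);
        double dot = 0;
        for (int i = 0; i < u.length; i++) dot += u[i] * v[i];
        return 1.0 - Math.abs(dot);
    }

    public static void main(String[] args) {
        // Eleven points on the x-axis plus one point far off the axis.
        double[][] x = new double[12][];
        for (int i = 0; i < 11; i++) x[i] = new double[] {i - 5, 0};
        x[11] = new double[] {0, 8};
        System.out.println("normal point : " + score(x, 0, 0.1));
        System.out.println("deviated one : " + score(x, 11, 0.1));
    }
}
```

In this toy data set, duplicating the off-axis point is enough to swing the first direction away from the x-axis, while duplicating an on-axis point tilts it only slightly.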

Implementation:
• It includes two steps :
◦ Data Cleaning Phase
◦ On-line Anomaly Detection Phase
• Data Cleaning Phase : osPCA is applied to the data set to find the principal direction. The target instance is duplicated multiple times, the idea being to amplify the effect of an outlier rather than that of normal data. The angle between the principal directions obtained with and without the duplicated instance is then measured, following the Leave One Out (LOO) strategy: adding or removing a single data instance changes the direction, and an outlier changes it more.
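The slides do not say how instances are filtered once the angle differences are known. As one possible reading, the sketch below assumes that a score per instance is already available and keeps only the instances whose score is at most the mean plus k standard deviations. The class `DataCleaningSketch`, the method `keepIndices`, the cut-off rule and the example scores are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class DataCleaningSketch {
    /**
     * Returns the indices of the instances to keep: those whose score does
     * not exceed mean + k * standard deviation of all scores.
     */
    static int[] keepIndices(double[] scores, double k) {
        double mean = 0;
        for (double s : scores) mean += s / scores.length;
        double var = 0;
        for (double s : scores) var += (s - mean) * (s - mean) / scores.length;
        double cutoff = mean + k * Math.sqrt(var);

        List<Integer> kept = new ArrayList<Integer>();
        for (int i = 0; i < scores.length; i++)
            if (scores[i] <= cutoff) kept.add(i);

        int[] out = new int[kept.size()];
        for (int i = 0; i < out.length; i++) out[i] = kept.get(i);
        return out;
    }

    public static void main(String[] args) {
        // Made-up scores: instance 3 moved the principal direction far more than the rest.
        double[] scores = {0.01, 0.02, 0.015, 0.9, 0.012, 0.018, 0.011, 0.02};
        int[] kept = keepIndices(scores, 2.0);
        System.out.println(kept.length + " of " + scores.length + " instances kept");
    }
}
```

A mean-based cut-off is itself affected by extreme scores, so a percentile rule may behave better on real data.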

• On-line Anomaly Detection Phase : In the on-line anomaly detection phase, the goal is to identify newly arriving abnormal instances. The quick updating of the principal directions provided by this approach satisfies the demands of on-line detection. Each newly arriving instance is then marked as normal or anomalous.
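One way to picture the on-line phase is a detector that keeps only running sums, so no past instance has to be stored. The sketch below is a simplification and does not reproduce the osPCA update rule: it keeps a d x d matrix of running sums, whereas the slides state that osPCA avoids keeping even the covariance matrix. The class `OnlineDetectorSketch`, the fixed threshold, and the choice to absorb instances judged normal into the running sums are all assumptions.

```java
import java.util.Arrays;

class OnlineDetectorSketch {
    private final int d;
    private final double ratio;      // oversampling ratio, e.g. 0.1
    private final double threshold;  // hypothetical cut-off on the score
    private long n;                  // number of absorbed instances
    private final double[] sum;      // running sum of x
    private final double[][] outer;  // running sum of x x^T

    OnlineDetectorSketch(double[][] cleanData, double ratio, double threshold) {
        this.d = cleanData[0].length;
        this.ratio = ratio;
        this.threshold = threshold;
        this.sum = new double[d];
        this.outer = new double[d][d];
        for (double[] x : cleanData) absorb(x);
    }

    private void absorb(double[] x) {
        n++;
        for (int i = 0; i < d; i++) {
            sum[i] += x[i];
            for (int j = 0; j < d; j++) outer[i][j] += x[i] * x[j];
        }
    }

    /** First direction of the covariance after adding k copies of x (k may be 0). */
    private double[] direction(double[] x, long k) {
        double m = n + k;
        double[][] c = new double[d][d];
        for (int i = 0; i < d; i++)
            for (int j = 0; j < d; j++) {
                double mi = (sum[i] + k * x[i]) / m;
                double mj = (sum[j] + k * x[j]) / m;
                c[i][j] = (outer[i][j] + k * x[i] * x[j]) / m - mi * mj;
            }
        double[] u = new double[d];
        Arrays.fill(u, 1.0 / Math.sqrt(d)); // assumed starting vector
        for (int it = 0; it < 1000; it++) {
            double[] v = new double[d];
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++) v[i] += c[i][j] * u[j];
            double norm = 0;
            for (double a : v) norm += a * a;
            norm = Math.sqrt(norm);
            if (norm == 0) break;
            for (int i = 0; i < d; i++) u[i] = v[i] / norm;
        }
        return u;
    }

    /** 1 - |cos| between the direction without x and with x oversampled. */
    double score(double[] x) {
        long k = Math.max(1L, (long) Math.ceil(ratio * n));
        double[] u = direction(x, 0);
        double[] v = direction(x, k);
        double dot = 0;
        for (int i = 0; i < d; i++) dot += u[i] * v[i];
        return 1.0 - Math.abs(dot);
    }

    /** Marks x as anomalous or normal; normal instances update the running sums. */
    boolean isAnomaly(double[] x) {
        boolean flagged = score(x) > threshold;
        if (!flagged) absorb(x);
        return flagged;
    }

    public static void main(String[] args) {
        double[][] clean = new double[11][];
        for (int i = 0; i < 11; i++) clean[i] = new double[] {i - 5, 0};
        OnlineDetectorSketch det = new OnlineDetectorSketch(clean, 0.1, 0.1);
        System.out.println("(0, 10) anomalous? " + det.isAnomaly(new double[] {0, 10}));
        System.out.println("(2, 0)  anomalous? " + det.isAnomaly(new double[] {2, 0}));
    }
}
```

Memory use depends only on the dimension d, not on the number of instances seen, which is the property the on-line phase relies on.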

Outcomes
• We have explored the variation of the principal directions in the leave-one-out scenario.
• We demonstrated that the variation of the principal directions caused by outliers can indeed help us detect anomalies.
• We used over-sampling PCA to enlarge the outlierness of an outlier.

Conclusion :
• This project has attempted to establish the significance of anomaly detection using the osPCA technique.
• Our method does not need to keep the entire covariance or data matrices during the online detection process.
• Compared with other anomaly detection methods, our approach is able to achieve satisfactory results while significantly reducing computational costs and memory requirements.

Future Enhancement :
• In this project we work on one particular data set obtained from an online website; in future we will extend the work to detect anomalies in any data set.
