VENKAT PROJECTS
Email:venkatjavaprojects@gmail.com Mobile No: +91 9966499110
Website: www.venkatjavaprojects.com What‘s app: +91 9966499110
Local Dynamic Neighborhood Based Outlier Detection Approach
and its Framework for Large-Scale Datasets
ABSTRACT :
Local outlier detection is an active research area and a major challenge in data mining, especially for
large-scale datasets. On the one hand, traditional algorithms often produce low-quality detection results
and are sensitive to neighborhood size. On the other hand, they are infeasible for large-scale datasets
because they require at least O(N^2) time and space. In light of these problems, we propose a new local
outlier detection algorithm built on a new stable neighborhood strategy, dynamic references nearest
neighbors (DRNN). We also present a new detection framework for large-scale datasets that combines the
proposed approach with k-means. Experimental results demonstrate that the proposed algorithm produces
higher-quality and more robust detection results than several classic methods, and that the new detection
framework significantly improves detection efficiency without sacrificing accuracy.
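To make the O(N^2) space cost concrete, the short sketch below estimates the memory needed to hold every pairwise distance at once. It assumes a dense matrix of 8-byte floating-point values; the function name and the sample sizes are illustrative and do not come from the original work.

```python
def dense_distance_matrix_bytes(n, bytes_per_value=8):
    """Bytes needed to store all n x n pairwise distances (assumed dense, float64)."""
    return n * n * bytes_per_value


# Illustrative dataset sizes only.
for n in (10_000, 100_000, 1_000_000):
    gib = dense_distance_matrix_bytes(n) / 2**30
    print(f"N = {n:>9,}: about {gib:,.1f} GiB")
```

Under this assumption the matrix takes about 0.75 GiB at N = 10,000 but about 7,450 GiB at N = 1,000,000, which is why a quadratic method stops being practical well before the dataset itself becomes large.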
EXISTING SYSTEM :
The local outlier factor (LOF) [14] is the best-known local outlier detection algorithm and was the
first to introduce the idea of a local outlier. LOF can be viewed as a ratio of local densities, and a
higher LOF value indicates that a point is more likely to be a local outlier. Many variants have been
built on the idea of LOF, such as the connectivity-based outlier factor (COF) [15], local
correlation integral (LOCI) [16], influenced outlierness (INFLO) [17], local outlier probability
(LoOP) [18], and local distance-based outlier factor (LDOF) [19]. COF is similar to LOF,
but it estimates the local density of a data record using a set-based nearest trail (SBN-trail)
approach. Compared to LOF, COF indicates how far a data instance shifts from a
pattern. An interesting contribution of LOCI is the LOCI plot, which summarizes a wealth of
information about the data in the vicinity of a point and provides an intuitive understanding of
why specific data points should be recognized as outliers. When a dataset contains clusters with
different densities that lie close to each other, some traditional methods, such as LOF,
fail to score the points at the borders of the clusters. INFLO combines the kNN and the reverse
k-nearest neighbors (RkNN) [20,21] to compute the outlier score. With this
strategy, outliers between clusters of different densities are detected more accurately. To
address the issue of threshold selection, LoOP outputs an anomaly
probability instead of an outlier score for each data point. Because of implicit data patterns and
parameter setting issues, existing outlier detection algorithms are ineffective on scattered real-world
datasets. Zhang et al. proposed the LDOF method to measure the outlier scores of objects in
scattered datasets; it uses the location of an instance relative to its kNN neighbors to
determine the degree to which the instance deviates from its neighborhood. The methods
mentioned above usually use kNN or RkNN to define their neighborhoods. In recent years,
Zhu et al. proposed natural neighbors (NaN) [22] and the natural outlier factor (NOF) [23] to
improve the robustness of local detection. NaN is designed by integrating kNN and RkNN
once a stable searching state is reached, and the NOF algorithm can detect outliers without
the parameter k.
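The two building blocks discussed above, LOF as a ratio of local densities and the kNN/RkNN neighborhoods, can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation from [14]: it assumes Euclidean distance, takes exactly k neighbors (ties are broken by index rather than kept), and assumes there are no duplicate points. The function names and the toy data are hypothetical.

```python
import math


def knn_indices(points, k):
    """For every point, the indices of its k nearest neighbors (itself excluded)."""
    result = []
    for i, p in enumerate(points):
        by_distance = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in by_distance[:k]])
    return result


def reverse_knn(neighbors):
    """RkNN: for every point, the points that list it among their own kNN."""
    reverse = [[] for _ in neighbors]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            reverse[j].append(i)
    return reverse


def lof_scores(points, k):
    """Simplified LOF: mean neighbor density divided by the point's own density."""
    neighbors = knn_indices(points, k)
    n = len(points)
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [math.dist(points[i], points[neighbors[i][-1]]) for i in range(n)]

    def reach_dist(a, b):
        # Reachability distance of a with respect to b.
        return max(k_dist[b], math.dist(points[a], points[b]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in neighbors[i]) for i in range(n)]
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Toy data (illustrative): a tight group of five points and one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
scores = lof_scores(points, k=3)
reverse = reverse_knn(knn_indices(points, k=3))
print("highest LOF at index", scores.index(max(scores)))
print("reverse neighbors of the far point:", reverse[5])
```

On this toy data the far point gets by far the largest score while the grouped points stay close to 1, and no other point counts the far point among its own nearest neighbors. An empty or small reverse-neighbor set is the kind of signal that RkNN-based methods can exploit.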
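DISADVANTAGES OF EXISTING SYSTEM :
1) Lower accuracy: detection results are often low-quality and sensitive to neighborhood size
2) Low efficiency: at least O(N^2) time and space complexity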
PROPOSED SYSTEM:
In this section, the proposed algorithm (LDNOD) and its framework (LDNOD-km) are
introduced in detail. LDNOD produces high-quality and robust detection results.
The detection framework, which integrates LDNOD with k-means, handles large-scale
datasets efficiently without sacrificing accuracy.
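The general idea of pairing k-means with a local detector can be sketched as follows. This is only a sketch under stated assumptions: the text above does not spell out how LDNOD-km combines the two steps, so the code assumes the simplest scheme (partition with k-means, then score each partition independently), and it uses the mean distance to the k nearest neighbors as a placeholder scorer in place of LDNOD. All function names, the initial centers, and the toy data are hypothetical.

```python
import math


def kmeans(points, init_centers, iters=20):
    """Plain Lloyd's k-means from the given initial centers; returns one label per point."""
    centers = [tuple(c) for c in init_centers]
    k = len(centers)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def knn_mean_distance(cluster, k):
    """Placeholder scorer (NOT LDNOD): mean distance to the k nearest neighbors."""
    scores = []
    for i, p in enumerate(cluster):
        nearest = sorted(math.dist(p, q) for j, q in enumerate(cluster) if j != i)[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


def partition_then_score(points, init_centers, k_neighbors, scorer=knn_mean_distance):
    """Assumed scheme: cluster first, then run the local scorer inside each cluster."""
    labels = kmeans(points, init_centers)
    scores = [0.0] * len(points)
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        for i, s in zip(idx, scorer([points[i] for i in idx], k_neighbors)):
            scores[i] = s
    return labels, scores


# Toy data (illustrative): two tight groups plus one stray point at (3, 3).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (3, 3)]
labels, scores = partition_then_score(points, [(0, 0), (10, 10)], k_neighbors=2)
print("most outlying point:", points[scores.index(max(scores))])
```

The efficiency argument behind this kind of scheme is simple arithmetic: if N points are split into c clusters of roughly equal size, the neighbor searches compare about c * (N/c)^2 = N^2/c pairs instead of N^2. Whether accuracy is preserved depends on the real detector and on how points near cluster borders are handled, which this placeholder does not address.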
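ADVANTAGES OF PROPOSED SYSTEM :
1) High accuracy: LDNOD produces high-quality and robust detection results
2) High efficiency: LDNOD-km handles large-scale datasets without sacrificing accuracy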
SYSTEM ARCHITECTURE :
HARDWARE & SOFTWARE REQUIREMENTS:
HARDWARE REQUIREMENTS :
• System : i3 or above
• RAM : 4 GB
• Hard disk : 40 GB
SOFTWARE REQUIREMENTS :
• Operating system : Windows
• Coding language : Python
