What is Noise in Data Mining?
• Noisy data are data that contain a large amount of additional meaningless
information, called noise. Noise includes data corruption, and "noisy data" is
often used as a synonym for corrupt data. It also includes any data that
a user's system cannot understand or interpret correctly.
• Improper (or improperly documented) procedures for removing the noise
from data can lead to a false sense of accuracy or to
false conclusions.
• Data = true signal + noise
• Noisy data unnecessarily increase the amount of storage space
required and can adversely affect the results of any data mining analysis.
How to Manage Noisy Data?
Removing noise from a data set is termed data smoothing. The following
methods can be used for smoothing:
1. Binning
There are three methods for smoothing the data within a bin.
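The slide does not name the three methods. The sketch below assumes they are the ones commonly described for binning: smoothing by bin means, by bin medians, and by bin boundaries. The function names, the equal-frequency partitioning, and the sample values are illustrative assumptions, not taken from the slides.

```python
from statistics import mean, median


def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of (nearly) equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    """Replace every value in a bin with the bin median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of its bin's minimum or maximum."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # each bin is already sorted
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


# Hypothetical sample data, assumed to have at least as many values as bins.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)

print(bins)                        # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_means(bins))       # [[9, 9, 9], [22, 22, 22], [29, 29, 29]]
print(smooth_by_medians(bins))     # [[8, 8, 8], [21, 21, 21], [28, 28, 28]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```

In each case the values are smoothed using only their neighbors in the same bin, so the result depends on how many bins are chosen.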
• Density-based methods typically measure the distances between individual data
points and the rest of their respective groups, and define anomalies as
observations of low probability, that is, points lying in low-density regions.
• A distance-based outlier detection method consults
the neighborhood of an object, which is defined by a given radius. An
object is considered an outlier if its neighborhood does not contain
enough other points.
• The grid-based method of outlier detection divides the data space
into a grid of many cells, each containing a group of data points.
The number of data points, and hence the density, in each cell is noted. The data
points in cells with low density are identified as outliers.
• Deviation-based methods can reveal surprising facts hidden inside data.
CMSR Data Miner (Cramer Modeling, Segmentation and Rules) provides tools that can be
used to detect deviations, anomalies, and outliers.
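As a minimal sketch of the distance-based and grid-based ideas above, the following Python code flags outliers in a small two-dimensional data set. The function names, the thresholds (radius, min_neighbors, cell_size, min_count), and the sample points are all assumptions chosen for illustration. They are not taken from the slides or from any particular tool.

```python
from collections import Counter
from math import dist, floor


def distance_based_outliers(points, radius, min_neighbors):
    """Flag points with fewer than min_neighbors other points within radius."""
    outliers = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points)
            if j != i and dist(p, q) <= radius
        )
        if neighbors < min_neighbors:
            outliers.append(p)
    return outliers


def grid_based_outliers(points, cell_size, min_count):
    """Flag points that fall in grid cells holding fewer than min_count points."""
    def cell_of(p):
        # Map each coordinate to the index of the cell that contains it.
        return tuple(floor(c / cell_size) for c in p)

    counts = Counter(cell_of(p) for p in points)
    return [p for p in points if counts[cell_of(p)] < min_count]


# Hypothetical data: a tight cluster near (1, 1) and one isolated point.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (8.0, 8.0)]

print(distance_based_outliers(points, radius=1.0, min_neighbors=2))
# [(8.0, 8.0)]
print(grid_based_outliers(points, cell_size=2.0, min_count=2))
# [(8.0, 8.0)]
```

The distance-based version compares every pair of points, so its cost grows quadratically with the number of points. The grid-based version only counts points per cell, but its result can change with the cell size and with where the cell boundaries fall.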
