Imbalanced multiple noisy labeling

Abstract:
Multiple noisy labels for the same object are easy to collect through
Internet-based crowdsourcing systems. Labelers may be biased when
labeling, owing to a lack of expertise or dedication, or to personal
preference. This bias causes Imbalanced Multiple Noisy Labeling. In most
cases, we have no information about the labeling quality of the labelers
or the underlying class distribution, so it is important to design agnostic
solutions that use these noisy labels for supervised learning. We first
investigate how imbalanced multiple noisy labeling affects the class
distribution of training sets and the performance of classification. We
then propose an agnostic algorithm, Positive LAbel frequency Threshold
(PLAT), to deal with the imbalanced labeling issue. Simulations on eight
UCI data sets with different underlying class distributions show that
PLAT not only handles imbalanced multiple noisy labeling problems that
off-the-shelf agnostic methods cannot cope with, but also performs nearly
the same as majority voting when there is no imbalance. We also apply
PLAT to eight real-world data sets with imbalanced labels collected from
Amazon Mechanical Turk; the experimental results show that PLAT is
efficient and outperforms other ground truth inference algorithms.

Existing System:
Previous work implicitly assumed that mislabeling is uniformly distributed
across all data points, and concluded that as long as the labeling quality
is greater than 50 percent (better than random guessing in binary
labeling), the integrated labeling quality and the performance of the
learned model improve as more labels are obtained. In reality, however,
mislabeling is usually not uniformly distributed. Because of a lack of
expert knowledge, personal preference, or other factors, most labelers
tend to make shallow judgments based on common sense or simply repeat
what others say. This causes Imbalanced Multiple Noisy Labeling. Taking
binary classification as an example, we usually find that labeling of the
minority class is error-prone. Thus, in our study we treat the minority
class as the positive examples.
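
The 50 percent condition can be checked numerically. The sketch below is an
illustration only, not part of the original work. It assumes that every
label for an example is drawn independently and is correct with the same
probability p, and that each example receives an odd number of labels; the
function name majority_vote_accuracy is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent labels is correct,
    when each label is correct with probability p (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of labels to avoid ties")
    need = n // 2 + 1  # smallest count of correct labels that wins the vote
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


# Compare a per-label quality above 0.5 with one below 0.5
# as the number of labels per example grows.
for n in (1, 3, 5, 11):
    good = majority_vote_accuracy(0.7, n)
    bad = majority_vote_accuracy(0.4, n)
    print(f"n={n:2d}  p=0.7 -> {good:.3f}   p=0.4 -> {bad:.3f}")
```

Under these assumptions, the integrated quality rises with more labels when
p is above 0.5 and falls when p is below 0.5. The second case is the one
that matters for a minority class whose labels are biased toward the
majority class.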
Proposed System:
The goal of our work is to generate a good training set, in which the
integrated labels of the examples are as close as possible to their true
values. To this end, we propose an agnostic algorithm that uses these
skewed labels to induce an integrated label for each example.
It addresses the problem that minority (assumed positive) examples rarely
occur in the training set as a result of imbalanced multiple noisy
labeling.
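
To make the idea of thresholding the positive label frequency concrete,
here is a minimal sketch. It is not the PLAT algorithm itself: how PLAT
chooses its threshold is not described here, so the sketch simply takes the
threshold as a hypothetical parameter. Labels are assumed to be binary
(1 = positive, 0 = negative), and the function names and sample data are
illustrative only.

```python
from typing import List, Sequence


def positive_frequency(labels: Sequence[int]) -> float:
    """Fraction of positive (1) labels among the noisy labels of one example."""
    if not labels:
        raise ValueError("each example needs at least one label")
    return sum(labels) / len(labels)


def integrate_labels(label_sets: Sequence[Sequence[int]],
                     threshold: float = 0.5) -> List[int]:
    """Assign 1 to an example when its positive label frequency exceeds
    the threshold, otherwise 0.

    threshold=0.5 behaves like majority voting (ties go to negative).
    A lower value is a hypothetical setting that compensates for labelers
    who are biased toward the negative class.
    """
    return [1 if positive_frequency(ls) > threshold else 0 for ls in label_sets]


# Made-up noisy labels: four examples, five labels each.
noisy = [
    [0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 0],
]

print(integrate_labels(noisy))        # majority voting
print(integrate_labels(noisy, 0.3))   # lower, illustrative threshold
```

With the lower threshold, the first example is integrated as positive even
though fewer than half of its labels are positive, which is the kind of
correction a skewed labeling process calls for.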

Hardware Requirements:
• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Floppy Drive : 1.44 MB.
• Monitor : 15" VGA Colour.
• Mouse : Logitech.
• RAM : 256 MB.
Software Requirements:
• Operating system : - Windows XP.
• Front End : - JSP or .Net
• Back End : - SQL Server
