This document describes a method for improving classifier accuracy by using unlabeled data in addition to a small set of labeled data. The algorithm first trains an initial classifier on the labeled data alone, then uses that classifier to assign labels to a larger set of unlabeled data. A new classifier is then trained on the union of the original labeled data and the newly labeled examples. Experiments with three common learning algorithms (neural networks, Naive Bayes, and C4.5) on 10 datasets show average accuracy improvements of 5%, 3%, and 8%, respectively, when unlabeled data is incorporated. The results indicate that leveraging unlabeled data can substantially improve classifier performance when labeled data is limited.
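The three steps described above (train on labeled data, label the unlabeled data, retrain on both) can be illustrated with a minimal sketch. It is not the implementation evaluated in the experiments: a nearest-centroid classifier stands in for the neural network, Naive Bayes, and C4.5 learners only to keep the example dependency-free, and the names `fit_centroids`, `predict`, and `self_train`, as well as the toy data, are assumptions made for illustration.

```python
from collections import defaultdict


def fit_centroids(X, y):
    """Train a nearest-centroid classifier: one mean vector per class.

    Stand-in learner (an assumption); any classifier could take its place.
    """
    sums, counts = {}, defaultdict(int)
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi] = [0.0] * len(xi)
        sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
        counts[yi] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, X):
    """Assign each point the label of its closest class centroid."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda c: sqdist(centroids[c], xi)) for xi in X]


def self_train(X_lab, y_lab, X_unlab):
    # Step 1: build the initial classifier from the labeled data only.
    initial = fit_centroids(X_lab, y_lab)
    # Step 2: use it to label the unlabeled examples.
    pseudo = predict(initial, X_unlab)
    # Step 3: build a new classifier from the labeled data plus the
    # newly labeled examples.
    final = fit_centroids(X_lab + X_unlab, y_lab + pseudo)
    return initial, final, pseudo


# Toy data (hypothetical): two labeled points, four unlabeled points.
X_lab = [[0.0, 0.0], [4.0, 4.0]]
y_lab = ["a", "b"]
X_unlab = [[0.5, 0.5], [1.0, 0.0], [3.5, 4.5], [5.0, 4.0]]

initial, final, pseudo = self_train(X_lab, y_lab, X_unlab)
print(pseudo)
```

In this sketch every predicted label is accepted as-is. Whether the described method filters predictions by confidence or repeats the labeling step is not stated here, so neither is assumed.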