The document presents a supervised learning algorithm for classifying research articles with Apache Spark, aimed at large-scale, high-dimensional data. It uses the Open Academic Graph dataset, which contains 167 million publications, and draws on features such as keywords and author history to improve classification accuracy. The proposed method is reported to outperform the existing classifiers in Spark's MLlib, with higher accuracy and efficiency in both the original and the reduced feature spaces.