This paper analyzes k-nearest neighbor (kNN) operations in the context of MapReduce, comparing existing methods both theoretically and experimentally. It identifies three main steps in kNN computation (data preprocessing, data partitioning, and the computation itself) and assesses each algorithm in terms of load balancing, accuracy, and complexity. Extensive experiments show how data volume and dimensionality affect the algorithms' performance, offering guidance for practical applications in big data environments.
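To make the three steps concrete, the following is a minimal single-process sketch in Python that imitates the preprocessing, partitioning, and computation phases of a kNN join between two point sets. It is an illustration under stated assumptions, not any of the algorithms the paper evaluates: the function names (`preprocess`, `partition`, `knn_join`), the round-robin partitioning, and the exhaustive pairing of blocks are all hypothetical choices made for brevity.

```python
import heapq
import math
from collections import defaultdict


def preprocess(points):
    # Step 1 (preprocessing): here we only tag each point with an id.
    # Real methods might reduce dimensionality or select pivots instead.
    return list(enumerate(points))


def partition(tagged, num_parts):
    # Step 2 (partitioning): assumed round-robin split into num_parts blocks.
    parts = defaultdict(list)
    for pid, point in tagged:
        parts[pid % num_parts].append((pid, point))
    return parts


def knn_join(R, S, k, num_parts=2):
    # Step 3 (computation): for every r in R, find its k nearest points in S.
    r_parts = partition(preprocess(R), num_parts)
    s_parts = partition(preprocess(S), num_parts)

    # First phase: each (R block, S block) pair produces local top-k candidates,
    # which is the work a single reducer would do in a MapReduce job.
    candidates = defaultdict(list)
    for r_block in r_parts.values():
        for s_block in s_parts.values():
            for rid, r in r_block:
                local = heapq.nsmallest(
                    k, ((math.dist(r, s), sid) for sid, s in s_block)
                )
                candidates[rid].extend(local)

    # Second phase: merge the local candidates into the global top-k per point.
    return {
        rid: [sid for _, sid in heapq.nsmallest(k, cands)]
        for rid, cands in candidates.items()
    }
```

Because every block of R is paired with every block of S, this sketch returns exact results at the cost of comparing all pairs of points. Approximate methods trade some of that accuracy for less computation and data movement, which is why the choice of partitioning matters for both load balancing and accuracy.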