The document presents a multi-core k-means implementation for large datasets that combines MIMD (multi-core) and SIMD (vectorized) parallelism while avoiding branch instructions. It evaluates performance on synthetic and real datasets and shows the effectiveness of techniques such as AVX vectorization and backpacked cluster-ID coding. The reported measurements show substantial runtime improvements as the number of threads increases and when the optimized coding strategies are used, giving insight into scalability and processing time.
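The summary does not describe the encoding in detail. The sketch below is a minimal scalar illustration that rests on an assumption: "backpacked" cluster-ID coding is taken to mean storing the cluster ID in the low-order bits of a non-negative float distance. Under that assumption, one unsigned minimum selects the nearest centroid and its ID together, and the minimum is computed with a bit mask rather than an if/else. The names `ID_BITS`, `pack_dist_id`, `min_u32` and `assign_point` are hypothetical and do not come from the original work.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption (hypothetical scheme): the low ID_BITS bits of the distance's
   bit pattern are overwritten with the cluster ID. */
#define ID_BITS 8u
#define ID_MASK ((1u << ID_BITS) - 1u)

/* Pack a cluster ID into the low mantissa bits of a non-negative float.
   For non-negative IEEE-754 floats, comparing the bit patterns as unsigned
   integers gives the same order as comparing the float values. */
static uint32_t pack_dist_id(float dist, uint32_t id) {
    uint32_t bits;
    memcpy(&bits, &dist, sizeof bits);  /* reinterpret bits without aliasing issues */
    return (bits & ~ID_MASK) | (id & ID_MASK);
}

/* Select the smaller of two values with a mask instead of an if/else branch. */
static uint32_t min_u32(uint32_t a, uint32_t b) {
    uint32_t take_a = 0u - (uint32_t)(a < b);  /* all ones if a < b, else zero */
    return (a & take_a) | (b & ~take_a);
}

/* Return the index of the centroid nearest to point x (squared Euclidean
   distance). centroids is a k-by-d array stored row-major. */
uint32_t assign_point(const float *x, const float *centroids, uint32_t k, uint32_t d) {
    assert(k >= 1u && k <= ID_MASK + 1u);  /* every ID must fit in ID_BITS */
    uint32_t best = UINT32_MAX;
    for (uint32_t c = 0; c < k; ++c) {
        float dist = 0.0f;
        for (uint32_t j = 0; j < d; ++j) {
            float diff = x[j] - centroids[(size_t)c * d + j];
            dist += diff * diff;
        }
        /* A single comparison carries both the distance and its cluster ID. */
        best = min_u32(best, pack_dist_id(dist, c));
    }
    return best & ID_MASK;  /* recover the ID of the winning centroid */
}
```

Take the point (4, 1) and the centroids (0, 0), (10, 10) and (5, 0). The squared distances are 17, 117 and 2, so `assign_point` returns 2. Overwriting the low mantissa bits costs a little precision, so two distances that differ only in those bits are ordered by cluster ID. The loops themselves still branch, and only the minimum selection is branch-free. A SIMD version would presumably take the same unsigned minimum across vector lanes.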