This document summarizes a study on using the Mahout machine learning library to perform K-means clustering on large datasets with Hadoop. The study measured the performance of Mahout's K-means clustering on Amazon EC2 instances using a 1.1 GB network intrusion detection dataset. The results showed that Mahout scaled across multiple nodes and significantly reduced clustering time as the dataset size and the number of nodes increased. Specifically, for the full 1.1 GB dataset, the reported improvement in clustering time exceeded 350% when using 5 nodes rather than a single node. Clustering quality, measured by the sum of squared distances between centroids, was maintained as Mahout leveraged Hadoop and multiple nodes to efficiently perform