This document describes HaLoop, a system that extends MapReduce to support iterative data processing efficiently on large clusters. HaLoop adds caching mechanisms that let loop-invariant data be reused across iterations without being reloaded or reshuffled, which improves performance for iterative algorithms such as PageRank, transitive closure, and k-means clustering. The largest gains come from the reducer input cache, which keeps invariant data available so that it is not loaded and shuffled again on every iteration. In some cases, HaLoop also eliminates the extra MapReduce jobs otherwise needed for termination checking. Overall, HaLoop shows that minimal extensions to MapReduce can efficiently support a wide range of recursive programs and languages on large-scale clusters.
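The idea of caching loop-invariant data can be illustrated with a small single-process sketch. This is not HaLoop's API and it does not run on a cluster. It is a toy PageRank in plain Python in which the edge list (the invariant data) is grouped by source node once, before the loop, and then reused on every iteration, while the convergence test runs inside the loop rather than as a separate pass. The names `build_link_cache` and `pagerank`, the damping factor, and the tolerance are all assumptions chosen for illustration, and the sketch assumes every node has at least one outgoing edge.

```python
from collections import defaultdict


def build_link_cache(edges):
    """Group the loop-invariant edge list by source node, once.

    Hypothetical stand-in for a cache of invariant data: the grouping
    ("shuffle") happens a single time instead of on every iteration.
    """
    cache = defaultdict(list)
    for src, dst in edges:
        cache[src].append(dst)
    return dict(cache)


def pagerank(edges, damping=0.85, tol=1e-6, max_iters=100):
    """Toy PageRank; assumes every node has at least one outgoing edge."""
    link_cache = build_link_cache(edges)  # built once, reused below
    nodes = sorted({n for edge in edges for n in edge})
    ranks = {n: 1.0 / len(nodes) for n in nodes}

    iterations = 0
    for iterations in range(1, max_iters + 1):
        # Only the small, changing rank table is combined with the
        # cached link table; the edges are never regrouped.
        contrib = defaultdict(float)
        for src, dsts in link_cache.items():
            share = ranks[src] / len(dsts)
            for dst in dsts:
                contrib[dst] += share

        new_ranks = {
            n: (1 - damping) / len(nodes) + damping * contrib[n]
            for n in nodes
        }

        # Termination check done in the same pass, not as an extra job.
        delta = sum(abs(new_ranks[n] - ranks[n]) for n in nodes)
        ranks = new_ranks
        if delta < tol:
            break

    return ranks, iterations


if __name__ == "__main__":
    sample_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
    final_ranks, n_iters = pagerank(sample_edges)
    print(n_iters, final_ranks)
```

In a real MapReduce setting the saving would come from avoiding disk reads and network shuffles of the invariant table; here the analogous saving is only that `build_link_cache` is called once instead of inside the loop.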