The document describes a system for analyzing, in parallel, very large datasets that are distributed across many machines. It splits each analysis into two phases: a filtering phase, in which a query written in a new programming language, Sawzall, examines each record independently and emits intermediate values, and an aggregation phase, in which those intermediate values are collected and combined across machines. Because both phases are distributed, the system can exploit parallelism across hundreds or thousands of machines. The document introduces the Sawzall language used in the filtering phase and explains how analyses are executed fault-tolerantly at large scale.
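The two-phase split can be illustrated with a small Python sketch. This is not Sawzall syntax and not the system's actual implementation; it only shows the idea under stated assumptions. The names `filter_record`, `run_shard`, `aggregate`, and `analyze` are hypothetical, threads stand in for separate machines, and the local pre-aggregation inside each shard is an assumption of this sketch (similar in spirit to a combiner). The query computes a count, a total, and a sum of squares over numeric records.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical record format for this sketch: each record is a string holding one number.

def filter_record(record):
    """Filtering phase: inspect ONE record in isolation and emit
    (table_name, value) pairs. No state is shared between records."""
    x = float(record)
    yield ("count", 1)
    yield ("total", x)
    yield ("sum_of_squares", x * x)

def run_shard(records):
    """Run the filter over one shard (standing in for one machine).
    Assumption: partial sums are combined locally before being sent on,
    which is safe because addition is commutative and associative."""
    partial = defaultdict(float)
    for record in records:
        for table, value in filter_record(record):
            partial[table] += value
    return dict(partial)

def aggregate(partials):
    """Aggregation phase: merge the partial results from every shard."""
    result = defaultdict(float)
    for partial in partials:
        for table, value in partial.items():
            result[table] += value
    return dict(result)

def analyze(shards, workers=4):
    """Process all shards in parallel, then aggregate their outputs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(run_shard, shards))
    return aggregate(partials)

if __name__ == "__main__":
    shards = [["1", "2"], ["3"], ["4", "5", "6"]]
    print(analyze(shards))  # {'count': 6.0, 'total': 21.0, 'sum_of_squares': 91.0}
```

Because the filter sees only one record at a time and the aggregation is order-independent, the result does not depend on how records are partitioned across shards. That property is what lets both phases be spread over many machines.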