This document provides a technical introduction to Hadoop. Its main points are:
- Hadoop has been tested on a 4,000-node cluster with 32,000 cores and 16 petabytes of storage.
- Key Hadoop concepts are explained, including jobs, tasks, task attempts, mappers, reducers, and the JobTracker and TaskTracker processes.
- The flow of a MapReduce job is described: the client submits the job to the JobTracker, TaskTrackers run tasks on data splits using the mapper and reducer classes, and the outputs are written.
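
To make the job flow in the last point concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word count as the example. It does not use Hadoop or its API. The names `mapper`, `reducer`, `run_job`, and the sample `splits` are hypothetical, chosen only for illustration. In a real cluster the JobTracker would assign each split to a TaskTracker, and the phases would run on different machines.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce step: combine all values seen for one key into one output record."""
    return word, sum(counts)


def run_job(splits):
    """Simulate one job: a map task per split, then a shuffle, then reduce."""
    # Map phase: each split stands in for the input of one map task.
    intermediate = []
    for split in splits:
        for line in split:
            intermediate.extend(mapper(line))

    # Shuffle phase: group intermediate values by key.
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)

    # Reduce phase: one reducer call per distinct key, in sorted key order.
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


# Hypothetical input: two splits, each a list of text lines.
splits = [
    ["the quick brown fox", "the lazy dog"],
    ["the fox jumps"],
]
result = run_job(splits)
print(result)
```

The sketch leaves out everything that makes Hadoop distributed, such as scheduling, task attempts and retries, and storage of the outputs. It shows only how data moves from mappers to reducers.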