This document discusses data management on Hadoop at Yahoo. Data volumes are growing steadily, with more than 30 terabytes moving into Hadoop grids each day. The document describes a Data Acquisition Service that moves data from source warehouses into Hadoop clusters and replicates it across clusters. The service uses pluggable interfaces to support different data sources and runs MapReduce jobs to load, convert, and copy data between clusters, while keeping each source warehouse and cluster isolated from the others. It also throttles jobs to manage resource usage.
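
The combination of pluggable source interfaces, per-source isolation, and job throttling can be illustrated with a minimal sketch. It is not the service's actual design: every name here (`DataSource`, `StaticSource`, `ThrottledRunner`, `copy_feed`, `acquire`, `max_jobs_per_source`) is hypothetical, and a short sleep stands in for submitting a real MapReduce job. The sketch assumes throttling means capping the number of concurrently running jobs, and that isolation means each source gets its own cap.

```python
import threading
import time
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class DataSource(ABC):
    """Hypothetical plug-in interface: one implementation per source warehouse."""

    @abstractmethod
    def list_feeds(self):
        """Return the names of the feeds this source wants copied."""


class StaticSource(DataSource):
    """Toy source backed by a fixed list of feed names."""

    def __init__(self, name, feeds):
        self.name = name
        self._feeds = list(feeds)

    def list_feeds(self):
        return [f"{self.name}/{feed}" for feed in self._feeds]


class ThrottledRunner:
    """Caps how many jobs run at once (assumed meaning of 'job throttling')."""

    def __init__(self, max_concurrent_jobs):
        self._slots = threading.Semaphore(max_concurrent_jobs)
        self._lock = threading.Lock()
        self._running = 0
        self.peak_running = 0  # highest concurrency observed, for inspection

    def run(self, job, *args):
        with self._slots:  # blocks while all slots are in use
            with self._lock:
                self._running += 1
                self.peak_running = max(self.peak_running, self._running)
            try:
                return job(*args)
            finally:
                with self._lock:
                    self._running -= 1


def copy_feed(feed):
    # Stand-in for submitting a MapReduce load/convert/copy job.
    time.sleep(0.01)
    return f"copied {feed}"


def acquire(sources, max_jobs_per_source=2):
    """Copy every feed of every source, with a separate throttle per source."""
    results = []
    for source in sources:
        # One runner per source, so a busy source cannot use another's slots.
        runner = ThrottledRunner(max_jobs_per_source)
        with ThreadPoolExecutor(max_workers=8) as pool:
            results.extend(
                pool.map(lambda f: runner.run(copy_feed, f), source.list_feeds())
            )
    return results
```

Supporting a new kind of warehouse in this sketch means adding another `DataSource` subclass; `acquire` and the throttle stay unchanged, which is the point of a pluggable interface.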