This document discusses Hadoop, an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It describes Hadoop's two core components: HDFS for storage and MapReduce for processing. HDFS stores large files as blocks distributed across the cluster and provides fault tolerance through replication, while MapReduce enables parallel processing of datasets using map and reduce functions. The document also walks through an example of using MapReduce to find the highest CGPA for each year from a dataset of student grades.
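The document's exact code is not reproduced here, so the following is a minimal sketch of how that highest-CGPA-per-year job could look in Java's standard Hadoop MapReduce API. It assumes input lines of the form "year,cgpa" (e.g. "2019,8.7"); the class names (MaxCgpa, CgpaMapper, MaxReducer) and the input format are illustrative assumptions, not the document's own naming.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxCgpa {

    // Mapper: parses each input line "year,cgpa" (assumed format)
    // and emits (year, cgpa) pairs.
    public static class CgpaMapper
            extends Mapper<LongWritable, Text, Text, FloatWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");
            if (fields.length == 2) {
                context.write(new Text(fields[0].trim()),
                        new FloatWritable(Float.parseFloat(fields[1].trim())));
            }
        }
    }

    // Reducer: receives all CGPAs for a given year and keeps the maximum.
    public static class MaxReducer
            extends Reducer<Text, FloatWritable, Text, FloatWritable> {
        @Override
        protected void reduce(Text year, Iterable<FloatWritable> cgpas, Context context)
                throws IOException, InterruptedException {
            float max = Float.NEGATIVE_INFINITY;
            for (FloatWritable cgpa : cgpas) {
                max = Math.max(max, cgpa.get());
            }
            context.write(year, new FloatWritable(max));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "max cgpa per year");
        job.setJarByClass(MaxCgpa.class);
        job.setMapperClass(CgpaMapper.class);
        // Taking a maximum is associative, so the reducer doubles as a combiner.
        job.setCombinerClass(MaxReducer.class);
        job.setReducerClass(MaxReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FloatWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Given input like "2019,8.7", "2019,9.1", "2020,8.4", the job would output one line per year with the highest CGPA seen, e.g. "2019 9.1". Reusing the reducer as a combiner is a common optimization for max/min aggregations, since it pre-aggregates map output before the shuffle.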