Hadoop is an open-source framework for storing and processing very large datasets in a distributed fashion across clusters of commodity servers. It has two core components: the Hadoop Distributed File System (HDFS), which stores data as replicated blocks spread across the cluster, and MapReduce, a parallel processing engine. A MapReduce program splits the input, runs a map function on each split in parallel across nodes, then shuffles the intermediate results and aggregates them with a reduce function, so massive datasets can be processed in parallel. Together these give Hadoop scalability, fault tolerance, and a comparatively simple programming model for distributed storage and processing of big data.
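To make the map/shuffle/reduce flow concrete, here is a minimal sketch of the classic word-count job written against Hadoop's `org.apache.hadoop.mapreduce` API. The class and job names are illustrative, and the input and output locations are hypothetical HDFS paths passed in as command-line arguments:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs in parallel on each input split,
  // emitting (word, 1) for every word in each line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: after the shuffle groups values by key,
  // sum the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input path
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output path
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a JAR, a job like this would typically be submitted with something like `hadoop jar wordcount.jar WordCount /input /output`; note that Hadoop requires the output directory not to exist beforehand, since each job writes its results there as a fresh set of part files.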