Apache Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of commodity computers. It consists of four core modules: Hadoop Common (shared libraries and utilities), HDFS (the Hadoop Distributed File System), YARN (cluster resource management), and MapReduce (a programming model for parallel processing). Hadoop is designed to tolerate failures of individual machines, or entire racks of machines, by detecting and handling those failures in software rather than relying on specialized hardware. Through Hadoop Streaming, jobs can be written in any language that reads from standard input and writes to standard output, and related projects expose higher-level interfaces such as Pig Latin (Apache Pig) and SQL-like queries (Apache Hive).
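To illustrate the Streaming model described above, here is a minimal sketch of a word-count job's mapper and reducer in Python. The function names and the simulated shuffle step are illustrative assumptions; in a real job, Hadoop itself sorts the mapper output by key and pipes it to the reducer over stdin/stdout as tab-separated lines.

```python
#!/usr/bin/env python3
# Sketch of a Hadoop Streaming-style word count. The mapper and reducer
# exchange tab-separated "key\tvalue" lines, which is why Streaming works
# with any language that can read stdin and write stdout.
from itertools import groupby

def mapper(lines):
    # Emit "word\t1" for every word in the input.
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(lines):
    # Hadoop sorts mapper output by key before the reduce phase,
    # so all counts for a given word arrive consecutively.
    parsed = (line.rsplit("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

if __name__ == "__main__":
    # Simulate the framework's shuffle locally: map, sort by key, reduce.
    text = ["the quick brown fox", "the lazy dog"]
    shuffled = sorted(mapper(text))
    for line in reducer(shuffled):
        print(line)
```

Run under Hadoop, the same two functions would be split into separate executables and passed to the streaming jar as the `-mapper` and `-reducer` arguments; the local `sorted()` call stands in for the shuffle phase the framework performs between them.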