This document provides an introduction to Hadoop, including:
- A brief history of Hadoop and how it was created to address limitations of relational databases for big data.
- An overview of core Hadoop concepts, such as its shared-nothing architecture and running computation close to where the data is stored.
- Descriptions of HDFS for distributed storage and MapReduce as the original programming framework.
- How the Hadoop ecosystem has grown to include additional frameworks such as Hive, Pig, and HBase, and tools such as Sqoop and ZooKeeper.
- A discussion of YARN, which separates cluster resource management from job scheduling and monitoring in Hadoop.