This document summarizes how the components of the Hadoop ecosystem evolved over the years:
- MapReduce was introduced in a 2004 Google paper, and work on what became Hadoop began in 2005. The core components, including HDFS, were released as the Hadoop project in 2006.
- YARN was introduced in 2013 to address limitations of the original MapReduce framework. It separates resource management and job scheduling from data processing, so processing engines other than MapReduce can share the same cluster.
- Other projects joined the ecosystem over time, such as Spark in 2014, Flink in 2015, and Kubernetes integration in 2017. Hadoop has expanded from a batch compute framework into a full "data operating system".
- New components continue to be added to improve performance and ease of use and to support additional workloads such as stream processing and SQL.