This document describes how to build a data pipeline with tools from the Apache Hadoop ecosystem. It opens with an introduction to the speaker and the reasons Hadoop is well suited to data-pipeline work, then provides a matrix comparing the major Hadoop distributions and the components each one includes. It outlines the tiers of projects in the Hadoop ecosystem, noting that the survey makes no claim to completeness, and presents the typical data lifecycle: capture, enrichment, analysis, presentation, reporting, archival, and removal. The document closes with a reference to demo code and an invitation for questions.