This document discusses how to improve the reliability and availability of Hadoop clusters. It notes that while Hadoop is taking on more database-like features, uptime remains an afterthought for many Hadoop clusters, which often run without SLAs. It proposes separating compute from storage, as cloud Hadoop offerings do, to improve availability. It also suggests building KPIs and monitoring around Hadoop clusters, similar to how many companies monitor their data warehouses. Finally, it presents centralizing Hadoop infrastructure management into a "Big Data as a Service" model as another way to improve reliability.