This document provides guidance on building an enterprise-grade data lake for analytics workloads with IBM Spectrum Scale and Hortonworks Data Platform (HDP). It covers the benefits of the integrated solution and its deployment models. Key points:
1) IBM Spectrum Scale provides extreme scalability, a global namespace, and a reduced data center footprint for HDP analytics.
2) There are two deployment models: a shared storage model, in which IBM Elastic Storage Server sits behind an HDP cluster, and a shared-nothing storage model, in which IBM Spectrum Scale runs directly on the storage servers.
3) Cluster configuration guidelines are provided for using IBM Elastic Storage Server as centralized back-end storage, with HDP compute nodes connected to it over the network.