The document discusses optimizing Hadoop clusters on AWS for bursty analysis demands. It presents three solutions: a permanent EC2 cluster, Elastic MapReduce with data stored in S3, and an EBS-backed HDFS cluster with task-only nodes. Performance results for different workloads show that the EBS-backed HDFS solution has the lowest cost in most scenarios. Tips are also provided for setting up Hadoop on EC2.