This document describes a customer success story involving Cloudera and Xpand IT. It explains how Xpand IT developed a solution for near real-time monitoring and management of Hadoop clusters. The solution collects telemetry data from Hadoop jobs, stores it in Kafka for real-time access, and uses Spark to parse the logs and load the data into Impala and HBase. This enables fault-tolerant, real-time monitoring and control of ETL jobs that span multiple Hadoop components. The architecture follows lambda architecture principles so that it can handle both real-time and batch data processing.
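To make the lambda-architecture idea more concrete, here is a minimal pure-Python sketch. It is not Xpand IT's implementation. It assumes a hypothetical telemetry line format (`timestamp|job_id|component|status`) and shows three pieces: a parser that turns raw log lines into records and skips malformed ones, a batch view recomputed from historical data, and an incremental "speed" view for events newer than the batch cutoff. The two views are merged at query time. In the system described above, Kafka, Spark, Impala and HBase would fill these roles. All names here (`parse_line`, `SpeedView`, and so on) are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class JobEvent:
    """One parsed telemetry record (hypothetical schema)."""
    timestamp: int
    job_id: str
    component: str
    status: str


def parse_line(line: str) -> Optional[JobEvent]:
    """Parse a hypothetical 'timestamp|job_id|component|status' line.

    Malformed lines return None so one bad record cannot stop the pipeline.
    """
    fields = [f.strip() for f in line.strip().split("|")]
    if len(fields) != 4:
        return None
    try:
        ts = int(fields[0])
    except ValueError:
        return None
    return JobEvent(ts, fields[1], fields[2], fields[3])


def build_batch_view(events: Iterable[JobEvent]) -> Counter:
    """Batch layer: recompute (job_id, status) counts from historical data."""
    return Counter((e.job_id, e.status) for e in events)


class SpeedView:
    """Speed layer: incrementally count events newer than the batch cutoff."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.counts: Counter = Counter()

    def ingest(self, event: JobEvent) -> None:
        # Events at or before the cutoff are already covered by the batch view.
        if event.timestamp > self.cutoff:
            self.counts[(event.job_id, event.status)] += 1


def query(batch: Counter, speed: SpeedView, job_id: str, status: str) -> int:
    """Serving layer: merge batch and real-time counts for one key."""
    return batch[(job_id, status)] + speed.counts[(job_id, status)]


if __name__ == "__main__":
    raw = [
        "100|etl_orders|hive|SUCCEEDED",
        "105|etl_orders|spark|FAILED",
        "not a valid line",
        "210|etl_orders|spark|FAILED",
    ]
    events = [e for e in (parse_line(l) for l in raw) if e is not None]
    cutoff = 200  # the batch view covers events with timestamp <= 200
    batch = build_batch_view(e for e in events if e.timestamp <= cutoff)
    speed = SpeedView(cutoff)
    for e in events:
        speed.ingest(e)
    print(query(batch, speed, "etl_orders", "FAILED"))  # prints 2
```

The cutoff keeps the two layers from double-counting. When a new batch run completes, the cutoff advances and the speed view can be reset, which mirrors how a lambda architecture periodically replaces approximate real-time results with recomputed batch results.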