The document presents a novel approach to failure propagation analysis in cloud computing systems that integrates fault injection, distributed tracing, and anomaly detection with a probabilistic model. In a case study on OpenStack, the approach identifies actual failure symptoms with high accuracy while keeping false anomalies and computational cost low. The results suggest that the method effectively addresses the challenges of failure analysis in distributed systems.