Bruce Wong, an engineering manager at Netflix, discusses why chaos engineering matters when failures such as disk outages and software bugs are inevitable. Drawing on case studies, particularly ones involving cloud vulnerabilities, he emphasizes the need for resilience testing and for designing systems to tolerate failure. The presentation outlines strategies for getting started with chaos testing: begin with small experiments and gradually increase their complexity, building confidence in the system's ability to handle outages.