This document discusses fault tolerance and related concepts in distributed systems. It defines the key terms failure, error, and fault, and distinguishes between types of faults such as hard and soft faults. It covers reliability metrics such as MTBF (mean time between failures), MTTD (mean time to detect), and MTTR (mean time to repair), as well as failure models including fail-stop, omission, and Byzantine failures. The document then turns to distributed algorithms: their safety and liveness properties, and the synchronous, asynchronous, and partially synchronous timing models. Finally, it covers distributed consensus, explaining how consensus algorithms guarantee agreement, validity, and termination, and gives examples of synchronous consensus algorithms for both fail-stop and Byzantine failures.
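The summary only names the synchronous fail-stop consensus examples. For illustration, here is a minimal sketch of the classic FloodSet algorithm, which is not necessarily the algorithm the document itself presents. It assumes a synchronous, round-based system with at most `f` crash (fail-stop) failures. Every process floods the set of values it has seen for `f + 1` rounds and then decides the minimum. The function name `floodset` and the crash-schedule format are hypothetical, chosen only for this single-process simulation.

```python
from typing import Dict, List, Set, Tuple


def floodset(initial: List[int], f: int,
             crashes: Dict[int, Tuple[int, Set[int]]]) -> Dict[int, int]:
    """Simulate FloodSet consensus in a synchronous, fail-stop system.

    initial: initial value of each process (list index = process id).
    f: maximum number of crash failures tolerated.
    crashes: hypothetical crash schedule mapping pid -> (round, recipients):
             the process crashes during `round` after delivering its
             message only to the processes in `recipients`.
    Returns the decision of every process that never crashes.
    """
    n = len(initial)
    if len(crashes) > f:
        raise ValueError("crash schedule exceeds the fault bound f")

    known: List[Set[int]] = [{v} for v in initial]  # values seen so far
    alive = set(range(n))

    for rnd in range(1, f + 2):  # f + 1 rounds guarantee one crash-free round
        inbox: List[Set[int]] = [set() for _ in range(n)]
        # All sends in a round use the state from the start of the round.
        for p in sorted(alive):
            if p in crashes and crashes[p][0] == rnd:
                recipients = crashes[p][1]  # partial broadcast, then crash
            else:
                recipients = set(range(n))
            for q in recipients:
                inbox[q] |= known[p]
        # Processes that crashed this round take no further steps.
        alive -= {p for p, (r, _) in crashes.items() if r == rnd}
        for q in alive:
            known[q] |= inbox[q]

    # Deterministic decision rule: every survivor holds the same set.
    return {p: min(known[p]) for p in sorted(alive)}


if __name__ == "__main__":
    # Process 1 (value 1) crashes in round 1 after reaching only process 0.
    print(floodset([3, 1, 2], f=1, crashes={1: (1, {0})}))
```

In the usage example, process 2 misses process 1's value in round 1, but process 0 relays it in round 2, so both survivors decide 1 (agreement). Validity holds because every value in a process's set is some process's initial value. Termination holds because the loop always runs exactly `f + 1` rounds. Deciding the minimum is just one deterministic choice; any fixed choice function over the final set would work.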