This document discusses a proposed system that recovers quickly from operating system failures without losing file caches. The system runs two operating systems simultaneously on a single machine: an active OS and a backup OS. If the active OS fails, the backup OS takes over by migrating devices and file caches from the active OS and launching applications on the backup OS. An evaluation of a Linux-based prototype showed that failover completes in under one second, minimizing downtime for applications such as an NFS file server during recovery from OS failures.
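The failover sequence described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the summary does not say how the prototype implements migration, so the `OSInstance` class, the `failover` function, and the device, path, and application names below are all hypothetical, and the real system operates on kernel state rather than Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class OSInstance:
    """Toy stand-in for one OS instance; every field is illustrative."""
    name: str
    devices: set = field(default_factory=set)
    file_cache: dict = field(default_factory=dict)
    running_apps: list = field(default_factory=list)


def failover(active: OSInstance, backup: OSInstance, apps: list) -> OSInstance:
    """Hand state from the failed active OS to the backup OS.

    The three steps follow the order given in the summary: devices,
    file caches, then applications.
    """
    # 1. Migrate devices to the backup OS.
    backup.devices |= active.devices
    active.devices.clear()

    # 2. Migrate file caches so cached data survives the failure.
    backup.file_cache.update(active.file_cache)
    active.file_cache.clear()

    # 3. Launch the applications on the backup OS.
    backup.running_apps.extend(apps)
    active.running_apps.clear()

    return backup


# Hypothetical example state: an NFS server with one cached file.
active = OSInstance(
    "active",
    devices={"disk0", "eth0"},
    file_cache={"/export/a.txt": b"cached data"},
    running_apps=["nfsd"],
)
backup = OSInstance("backup")
new_active = failover(active, backup, apps=["nfsd"])
```

After the call, `new_active` holds the devices and the cached file, and the application is running on it, while the failed instance holds nothing. The model ignores timing entirely; the sub-second figure reported above comes from the prototype evaluation, not from anything this sketch measures.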