The paper presents Mekha, a novel hypervisor-level checkpoint-restart mechanism designed for virtual clusters (VCs) that aims to improve fault tolerance in cloud data centers by addressing high checkpoint downtimes and overheads from existing methods. Mekha employs a coordinated checkpointing protocol using memory-bound time-multiplexed data transfers to create shadow VMs for efficient state replication, significantly reducing checkpoint latencies and overheads by up to 90% compared to traditional techniques. Extensive experiments demonstrate Mekha's effectiveness in maintaining global consistency and handling message loss, making it a practical solution for cloud computing environments.