This document discusses reproducible distributed experiments. It motivates reproducibility in data science by contrasting analytical and empirical proofs and by pointing to the complexity of scheduling and fault tolerance. It defines reproducibility in terms of four elements: infrastructure, software, experiments, and data. It demonstrates a word count experiment on Karamel, a framework for reproducibility across bare metal and VMs, with software defined in Chef and hosted on GitHub. The Karamel Engine uses a DSL service and cloud clients to orchestrate the mapping onto physical resources. Orchestration follows a queuing model. Open challenges include scalability, fault recovery, elasticity, instrumentation, and language support.