This document discusses training deep learning models on multi-GPU systems. It introduces distributed training via data and model parallelism, contrasts synchronous and asynchronous training, and presents Horovod, a distributed training framework that plugs into TensorFlow, Keras, and other frameworks to enable efficient multi-GPU and multi-node training, using MPI for communication. Examples show how to use Horovod with Keras to distribute training across multiple GPUs on a single machine or across multiple machines.
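Since the Keras integration is described only at a high level here, a minimal sketch of the usual Horovod pattern may help: initialize Horovod, pin each process to one GPU, scale the learning rate by the worker count, wrap the optimizer in hvd.DistributedOptimizer so gradients are averaged across workers, and broadcast the initial weights from rank 0. This assumes TensorFlow 2.x with Horovod installed; the model and dataset below are placeholders chosen purely for illustration.

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

# Initialize Horovod; one process is launched per GPU.
hvd.init()

# Pin each process to a single local GPU.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

# A small model, used here only as an example.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Scale the learning rate by the number of workers, then wrap the
# optimizer so gradients are averaged across workers via allreduce.
opt = tf.keras.optimizers.SGD(learning_rate=0.01 * hvd.size())
opt = hvd.DistributedOptimizer(opt)

model.compile(loss='sparse_categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

callbacks = [
    # Broadcast initial variables from rank 0 so every worker
    # starts training from the same state.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]

# Example dataset for illustration.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0

# Print progress only on rank 0 to avoid duplicated logs.
model.fit(x_train, y_train,
          batch_size=128, epochs=1,
          callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```

The script is launched with horovodrun, which starts one process per GPU; the script name and host names below are hypothetical. For four GPUs on one machine: `horovodrun -np 4 python train.py`. For two machines with four GPUs each: `horovodrun -np 8 -H server1:4,server2:4 python train.py`.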