This paper examines the efficacy of the bfloat16 16-bit floating-point format for deep learning training, demonstrating that it achieves state-of-the-art results across a range of domains without any hyper-parameter tuning. Unlike the IEEE fp16 half-precision format, bfloat16 retains the full dynamic range of fp32, so fp32 hyper-parameters carry over directly and no loss-scaling tricks are needed, while tensor operations stay simple and training converges as it does in fp32. The authors implemented a library called quantlib to emulate bfloat16 operations inside major deep learning frameworks, and validated the approach across several neural network architectures.
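To make the relationship between bfloat16 and fp32 concrete, the sketch below emulates bfloat16 in NumPy by keeping fp32's sign and 8 exponent bits and rounding the mantissa down to 7 bits. This is only an illustrative emulation under stated assumptions, not the paper's quantlib implementation; the function name `to_bfloat16` and the round-to-nearest-even scheme are assumptions for the example.

```python
import numpy as np

def to_bfloat16(x: np.ndarray) -> np.ndarray:
    """Illustrative bfloat16 emulation (not the paper's quantlib):
    round fp32 values to the nearest bfloat16 and return them as fp32
    with the low 16 mantissa bits cleared."""
    bits = x.astype(np.float32).view(np.uint32)
    # Round-to-nearest-even on the 16 low bits that bfloat16 drops.
    # (NaN/inf handling is simplified for brevity.)
    rounding_bias = ((bits >> 16) & 1) + 0x7FFF
    rounded = (bits + rounding_bias) & 0xFFFF0000
    return rounded.astype(np.uint32).view(np.float32)

# Values spanning fp32's range survive (same 8 exponent bits),
# but only about 3 decimal digits of precision remain (7 mantissa bits).
x = np.array([3.14159265, 1e-30, 65504.0, 1e38], dtype=np.float32)
print(to_bfloat16(x))
```

Because the conversion is just a truncation/rounding of the fp32 bit pattern, values such as 1e38 or 1e-30 that would overflow or underflow in fp16 remain representable, which is the property that lets fp32 hyper-parameters be reused unchanged.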