This document discusses quantization techniques for convolutional neural networks, with the aim of reducing memory usage and accelerating inference. It examines converting models trained in floating-point precision to fixed-point representations. The quantization approaches of TensorFlow and Caffe Ristretto are described and tested on the MNIST and CIFAR-10 datasets. The results show that quantization reduces model size with minimal accuracy loss but increases inference time, likely due to the limited set of supported operations.
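As a rough illustration of the floating-point to fixed-point conversion summarized above, the following minimal sketch quantizes a few weights to signed 8-bit fixed point in plain Python. The function names, the 8-bit width, and the 6 fractional bits are assumptions chosen for illustration; this is not the TensorFlow or Caffe Ristretto API, and both tools choose their formats differently.

```python
# Hypothetical helpers for illustration only; not part of TensorFlow or Ristretto.

def quantize_fixed_point(values, total_bits=8, frac_bits=6):
    """Map floats to signed fixed-point integer codes, saturating at the range limits."""
    scale = 2 ** frac_bits                 # one integer step represents 1 / scale
    qmin = -(2 ** (total_bits - 1))        # smallest representable code
    qmax = 2 ** (total_bits - 1) - 1       # largest representable code
    return [max(qmin, min(qmax, round(v * scale))) for v in values]


def dequantize_fixed_point(codes, frac_bits=6):
    """Recover approximate float values from fixed-point integer codes."""
    scale = 2 ** frac_bits
    return [c / scale for c in codes]


weights = [0.5, -0.25, 1.0, 3.0]           # assumed example weights
codes = quantize_fixed_point(weights)
restored = dequantize_fixed_point(codes)

# Each code fits in one byte instead of the four bytes of a 32-bit float.
print(codes)
print(restored)
```

In this sketch the value 3.0 lies outside the representable range and saturates at the largest code, which shows one way accuracy can be lost when the fractional length is chosen poorly for a layer's value range.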