1. OpenCL caffe aims to enable cross-platform machine learning by porting the popular Caffe framework from CUDA to OpenCL. This allows deep learning models to be deployed on GPUs and other devices from multiple hardware vendors rather than on NVIDIA hardware only.
2. Performance optimizations included batching data to improve parallelism and using multiple command queues to increase task-level concurrency (a sketch of the multi-queue idea appears after this list). Together these optimizations provided up to a 4.5x speedup over the baseline clBLAS implementation.
3. While OpenCL caffe matched the performance of CUDA caffe, a roughly 2x gap remained versus the proprietary cuDNN library, indicating room for further hardware-specific optimizations. The work helps address the challenges of cross-platform deep learning.
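A minimal sketch (not the actual OpenCL caffe code) of the multi-command-queue optimization from point 2: independent sub-batches are enqueued on separate in-order queues before any of them is waited on, so the OpenCL runtime is free to overlap their execution. The `scale` kernel, the three-queue split, and the sub-batch size are illustrative assumptions.

```c
/* Illustrative multi-command-queue sketch; kernel, queue count, and sizes are assumptions. */
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>

#define NUM_QUEUES 3
#define SUB_BATCH  (1 << 16)   /* elements per sub-batch, illustrative */

static const char *kSrc =
    "__kernel void scale(__global float *x, float a) {\n"
    "    size_t i = get_global_id(0);\n"
    "    x[i] *= a;\n"
    "}\n";

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    cl_int err;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, NULL, &err);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "scale", &err);

    cl_command_queue queues[NUM_QUEUES];
    cl_mem bufs[NUM_QUEUES];
    size_t global = SUB_BATCH;
    float alpha = 0.5f;

    for (int q = 0; q < NUM_QUEUES; ++q) {
        /* One in-order queue per independent sub-batch. */
        queues[q] = clCreateCommandQueue(ctx, device, 0, &err);
        bufs[q] = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                 SUB_BATCH * sizeof(float), NULL, &err);
        /* Real code would upload input data here with clEnqueueWriteBuffer. */
    }

    /* Enqueue work on every queue before waiting on any of them, so the
       runtime can execute the sub-batches concurrently. */
    for (int q = 0; q < NUM_QUEUES; ++q) {
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &bufs[q]);
        clSetKernelArg(kernel, 1, sizeof(float), &alpha);
        clEnqueueNDRangeKernel(queues[q], kernel, 1, NULL, &global,
                               NULL, 0, NULL, NULL);
    }

    /* Synchronize only after all queues have work in flight. */
    for (int q = 0; q < NUM_QUEUES; ++q)
        clFinish(queues[q]);

    for (int q = 0; q < NUM_QUEUES; ++q) {
        clReleaseMemObject(bufs[q]);
        clReleaseCommandQueue(queues[q]);
    }
    clReleaseKernel(kernel);
    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}
```

The key design point is the ordering: all enqueues happen before any clFinish, which is what gives the driver the opportunity to overlap transfers and kernels from different queues on the same device.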