This document summarizes work optimizing a deep convolutional network for the Intel Xeon Phi coprocessor. The optimizations included loop unrolling, vectorization with SIMD intrinsics, and parallelization with OpenMP. Testing on an Intel Core i7 showed up to a 6.3x speedup from vectorization alone. Mapping the code to the Xeon Phi, whose SIMD units are 512 bits wide, yielded up to an 11x speedup over the unoptimized baseline. Roofline models showed that performance was bounded by memory bandwidth rather than compute. Overall, the work contributed optimized convolutional-network code running at up to 43 frames per second on a Xeon Phi.