This document summarizes a methodology for accelerating convolutional neural networks (CNNs) on FPGAs using a dataflow approach. The methodology rests on three ideas: exploiting the dataflow pattern of CNN operations, building the design from independent, parametrically scalable modules, and adopting a streaming computational paradigm with efficient memory access. Together these improve performance over large batches of data while keeping the design scalable within the FPGA's limited resources. Experimental results show throughput increases of over 10x compared to CPU/GPU implementations.
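
To make the streaming idea concrete, the following is a minimal software sketch, not the FPGA design itself. It assumes pixels arrive one at a time in raster order and that each stage is an independent module connected to the next by a stream, modeled here with Python generators. The names `pixel_stream`, `conv2d_stream`, and `relu_stream` are hypothetical and chosen only for illustration; the line-buffer scheme is a common way to realize streaming convolution and is assumed here rather than taken from the summarized work.

```python
from collections import deque


def pixel_stream(image):
    """Emit pixels one at a time in raster order (hypothetical stream source)."""
    for row in image:
        for px in row:
            yield px


def conv2d_stream(pixels, width, kernel):
    """Streaming 'valid' convolution (cross-correlation, as used in CNNs).

    Only the last (k-1)*width + k pixels are kept, which mimics a line
    buffer: the full image is never stored. `width` and the kernel size k
    are the parameters that scale this module.
    """
    k = len(kernel)
    span = (k - 1) * width + k
    buf = deque(maxlen=span)  # oldest pixel is dropped automatically
    for idx, px in enumerate(pixels):
        buf.append(px)
        row, col = divmod(idx, width)
        # A complete k x k window ends at (row, col) only past the border.
        if row >= k - 1 and col >= k - 1:
            acc = 0
            for i in range(k):
                for j in range(k):
                    # buf[0] is the top-left pixel of the current window.
                    acc += kernel[i][j] * buf[i * width + j]
            yield acc


def relu_stream(values):
    """Element-wise ReLU applied to a stream, one value in, one value out."""
    for v in values:
        yield max(0, v)


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    kernel = [[1, 1], [1, 1]]
    # Stages are chained; each consumes its input as soon as it is produced.
    pipeline = relu_stream(conv2d_stream(pixel_stream(image), 4, kernel))
    print(list(pipeline))
```

Because each stage pulls from the previous one lazily, no stage waits for a whole feature map, and the buffer size depends only on the image width and kernel size. In hardware the stages would run concurrently rather than interleaved as they are here.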