This document discusses customizing a deep learning accelerator. It opens with a demonstration of object detection on an FPGA-based prototype running a Tiny YOLO v1 model, then presents a three-step approach to designing a high-efficiency accelerator: 1) increasing the number of MAC processing elements and their utilization, 2) increasing data supply, and 3) improving energy efficiency. Several neural network models are profiled to analyze the tradeoff between memory bandwidth and computational power. The document then proposes a customizable architecture and covers techniques such as layer fusion, quantization-aware training, and post-training quantization. Finally, performance estimates for sample models, produced with an equation-based profiler, demonstrate the customized accelerator design.
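The equation-based profiling mentioned above can be illustrated with a minimal roofline-style sketch: per-layer latency is bounded by either compute throughput or memory bandwidth, whichever saturates first. The layer shape and hardware parameters below are illustrative assumptions, not values from the document.

```python
# Hypothetical equation-based performance profiler sketch (roofline-style).
# Latency of a layer = max(compute time, memory-transfer time).

def layer_latency_s(macs, bytes_moved, peak_macs_per_s, mem_bw_bytes_per_s):
    """Estimate layer latency bounded by the slower of compute and memory."""
    compute_time = macs / peak_macs_per_s
    memory_time = bytes_moved / mem_bw_bytes_per_s
    return max(compute_time, memory_time)

# Example: a 3x3 conv, 256 in/out channels, 56x56 output map
# (figures chosen for illustration only).
macs = 3 * 3 * 256 * 256 * 56 * 56
weight_bytes = 3 * 3 * 256 * 256       # 8-bit weights
act_bytes = 2 * 256 * 56 * 56          # 8-bit input + output activations

# Assumed accelerator: 1024 MACs at 500 MHz, 8 GB/s DRAM bandwidth.
peak = 1024 * 500e6
bw = 8e9

t = layer_latency_s(macs, weight_bytes + act_bytes, peak, bw)
print(f"estimated latency: {t * 1e3:.3f} ms")
```

Summing such per-layer estimates over a whole network gives the kind of performance estimate the document reports for its sample models.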
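Post-training quantization, one of the techniques listed above, can be sketched as a symmetric int8 mapping of a trained weight tensor. The per-tensor max-abs scale choice here is an illustrative assumption; real flows often use per-channel scales and calibration data.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: float weights -> int8 + scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for error analysis."""
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"max abs quantization error: {err:.5f}")
```

Because rounding is to the nearest quantization step, the worst-case reconstruction error is about half a step (scale / 2), which is why 8-bit post-training quantization often preserves accuracy without retraining.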