Optimized feedforward network of cnn with xnor v5

Optimized Feedforward Network
of CNN with XNOR
E 501 INTRO TO COMPUTER ENGINEERING
Kadupitiya Kadupitige
Vibhatha Abeykoon

Introduction
● What is a convolutional neural network (CNN), and what inspired it?
○ Supervised deep learning algorithm: feedforward (testing), backpropagation (training).
○ Biologically inspired model (behavior of optic nerves in living creatures).
○ Mostly trained on GPU nodes; uses pretrained weights for testing/prediction.
○ Can also use transfer learning to retrain the classification layer for new tasks.
● Applications
○ Most successful in analyzing visual imagery with spatial data (when positioning matters).
○ Sound processing, face detection, natural language processing, video scanning, and many more.

Versions of CNN
● Why binary networks?
○ CNN testing/prediction is what matters most for real-time applications (mobile devices, etc.).
○ Feature learning layers: convolution, activation (ReLU), and pooling.
○ Classification layers: fully connected, regularization (dropout), activation (softmax).
○ Of all these, the convolutional layers consume more than 90% of total test time.
● What is an XNOR-Network?
○ Both weights (filters) and inputs are binary.
● Our project:
○ Implementing an XNOR version of the convolution layer.
○ Dot product => XNOR followed by popcount.

Objectives
● Study the existing literature to understand state-of-the-art approaches to the problem.
● Understand the CNN and its binary-weight-network version.
● Implement an XNOR operation (VHDL).
● Implement a popcount/bitcount operation (VHDL).
● Implement file I/O handling (VHDL and Python).
● Combine the implemented operations to replicate the convolution operation (VHDL).
● Compare the output against a hand-calculated convolution operation (Python).

Literature Review
● XNOR-Net: ImageNet classification using binary convolutional neural networks
○ Rastegari et al. [1] studied Binary-Weight-Networks and XNOR-Networks for training CNNs.
● XNOR-POP: A processing-in-memory architecture for binary convolutional neural networks in Wide-IO2 DRAMs
○ Jiang et al. [2] improved CNN test performance using XNOR-POP.
○ Performance improved by 4× to 11× with small hardware and power overhead.
● Bitwise neural networks
○ Kim et al. [3]: the Bitwise Neural Network (BNN) is especially suitable for resource-constrained environments.
● Optimizing FPGA-based accelerator design for deep convolutional neural networks
○ Zhang et al. [4] quantitatively analyzed the computing throughput and required memory bandwidth of any potential solution of a CNN design on an FPGA platform using roofline analysis.
● According to the literature, XNOR-Networks can replace:
○ The expensive multiply-accumulate (MAC) operation found in convolution.
○ MAC => XNOR operation followed by a bitcount (popcount).
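
The reasoning behind this replacement fits in one line, assuming bit 0 encodes -1 and bit 1 encodes +1. The symbols a_i, b_i, P and N are notation introduced here for illustration: a_i and b_i are the ±1 values of the i-th bits of A and B, N is the number of bits, and P is the number of positions where the bits agree.

```latex
\sum_{i=1}^{N} a_i b_i \;=\; \underbrace{P}_{\text{agreements } (+1)} \;-\; \underbrace{(N - P)}_{\text{disagreements } (-1)} \;=\; 2P - N,
\qquad P = \operatorname{popcount}\bigl(\operatorname{XNOR}(A, B)\bigr)
```

Each product a_i b_i is +1 when the two bits agree and -1 when they differ. XNOR marks the agreeing positions with a 1, so counting those ones gives P directly.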

Proposed Methodology
1. The program initializes with a set of Python calls that:
a. Download ImageNet sample data.
b. Load the images and convert them to a binary format.
c. Save the images in matrix form as CSV files.
2. There are two input files: one contains the filter and the other contains the image matrix in binary format.
3. The filter is saved as a vector (1x9).
4. The image is saved with padding and windowing applied.
5. At the convolution stage, the binary matrix file and the filter are used to compute the convolution values.
6. The XNORPOP component performs the XNOR and popcount calculation. This is done on the VHDL side.
7. On the Python side, the windowing is reversed to recover the image layout after convolution.
8. Python is used only for pre-processing and post-processing of data. The core concept is implemented in VHDL.
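
The post-processing step that reverses the windowing can be sketched as below. This is a minimal illustration, not the project's actual API: the function name `unwindow` is hypothetical, and it assumes the VHDL side emits one convolution value per window in row-major order.

```python
import numpy as np

def unwindow(conv_values, height, width):
    """Reshape one convolution value per sliding window back into a 2-D image.

    Assumes (hypothetically) that windows were generated row by row,
    so the flat list of results is in row-major order.
    """
    values = np.asarray(conv_values)
    if values.size != height * width:
        raise ValueError("expected exactly one value per window position")
    return values.reshape(height, width)

# Tiny example: six window results laid back out as a 2x3 image.
out = unwindow([1, -3, 5, 9, -9, 1], height=2, width=3)
```

For a 3x3 filter each value lies between -9 and 9, so a display step would still need to map that range to pixel intensities.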

File Formats and Data: Filter
User input filter: the pre-processing API converts it to a vector format.
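
As a rough sketch of that conversion, a 3x3 filter can be flattened row by row into a 1x9 vector. The filter values and variable names here are hypothetical, and writing the vector as a delimiter-less string of bits is an assumption about the on-disk layout.

```python
import numpy as np

# Hypothetical 3x3 binary filter supplied by the user.
filter_3x3 = np.array([[0, 1, 0],
                       [1, 1, 1],
                       [0, 1, 0]], dtype=np.uint8)

# Row-major flattening gives the 1x9 vector form.
filter_vec = filter_3x3.reshape(1, 9)

# Assumed text form: nine bits with no delimiter.
filter_line = "".join(str(int(b)) for b in filter_vec[0])
```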

File Formats and Data: Image
1. The original image, converted to numeric form, contains values between 0 and 255.
2. The project needs only two levels, so we first convert the color image to grayscale; this is the grayscale output.
3. We still need a black-and-white image, which has only two possible values: 0 or 255.
Our Python ImageAPI reads...

File Formats and Data: Binarized Image (0-255)
User input image as numbers: the pre-processing API adds padding (important for convolution, to avoid losing edge information).

File Formats and Data: VHDL Binarized Image (0-1)
1. Image data points with low values go to 0, and higher values go to 1.
The pre-processing API converts the image to binary format.
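
The two binarization stages (grayscale to 0/255, then 0/255 to 0/1) can be sketched as follows. The function name `binarize` and the cut-off of 128 are assumptions for illustration; the slides do not state which threshold the project uses.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Return (black-and-white 0/255 image, 0/1 bit image).

    `threshold` is a hypothetical cut-off: pixels at or above it
    become 255 (bit 1), the rest become 0 (bit 0).
    """
    gray = np.asarray(gray)
    bw = np.where(gray >= threshold, 255, 0).astype(np.uint8)
    bits = (bw // 255).astype(np.uint8)
    return bw, bits

# Small grayscale patch with values in 0-255.
bw, bits = binarize([[0, 100], [128, 255]])
```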

File Formats and Data: Sliding Window VHDL Input
1. The input to the VHDL file reader is a delimiter-less matrix of size Mx9.
2. M is the number of sliding-window positions.
3. 9 is the filter vector size (as we chose a 3x3 matrix for the filter).
4. Here M = 160,000.
The pre-processing API generates the sliding-windowed matrix.
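
A minimal sketch of how such an Mx9 matrix could be produced is shown below. The function name `sliding_windows` is hypothetical, and zero-valued padding with a stride of 1 is assumed; the slides only say the image is padded by 1 on all sides.

```python
import numpy as np

def sliding_windows(bits, k=3, pad=1):
    """Return an (M, k*k) matrix with one flattened k x k window per row.

    Assumes zero padding and stride 1, so M equals the number of
    pixels in the unpadded image when pad == (k - 1) // 2.
    """
    padded = np.pad(np.asarray(bits, dtype=np.uint8), pad,
                    mode="constant", constant_values=0)
    h, w = padded.shape
    rows = [padded[r:r + k, c:c + k].ravel()
            for r in range(h - k + 1)
            for c in range(w - k + 1)]
    return np.array(rows, dtype=np.uint8)

# 2x2 image of ones: four window positions, nine bits each.
small = sliding_windows([[1, 1], [1, 1]])
first_row = "".join(str(int(b)) for b in small[0])
```

Under these assumptions a 400x400 image padded to 402x402 yields 400 * 400 = 160,000 rows, which matches the value of M given above.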

Input Processing
Image sizes across the steps: 500 x 472, 500 x 472, 400 x 400
1. Download images from ImageNet.
2. Convert to grayscale.
3. Crop a 400x400 image portion.
4. Transform to binary format (0, 255).
5. Scale down for XNOR-POP (0, 1).
Diagram labels: load binarized image (OpenCV), range of values 0-255; pad by 1 on all sides; scale to (0, 1), with lower values converted to 0 and higher values to 1; binary file (0, 1), scaled.

Input File Definitions
1. In this project we define a couple of file formats.
2. .sld files contain binarized images with sliding windowing applied, for ease of convolution computation. Binarized image files are converted to this format.
3. Filters use the .filter extension. A filter is also stored as a flattened matrix (in effect a vector), again for ease of computation.
4. The VHDL I/O component takes the .sld and .filter files as inputs and performs the convolution.
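
To make the file layout concrete, here is a rough sketch of writing and reading a .sld-style file. It assumes one window per text line, with nine '0'/'1' characters and no delimiter. The helper names `write_sld` and `read_sld` and the exact layout are illustrative guesses, not the project's confirmed format.

```python
import os
import tempfile

def write_sld(path, windows):
    """Write each window as one line of '0'/'1' characters, no delimiter."""
    with open(path, "w") as fh:
        for row in windows:
            fh.write("".join(str(int(b)) for b in row) + "\n")

def read_sld(path):
    """Read the file back into a list of bit lists, skipping blank lines."""
    with open(path) as fh:
        return [[int(ch) for ch in line.strip()] for line in fh if line.strip()]

# Two hypothetical 9-bit windows, written to a temporary file and read back.
rows = [[0, 0, 0, 0, 1, 1, 0, 1, 1],
        [1, 1, 0, 1, 1, 0, 1, 1, 1]]
path = os.path.join(tempfile.mkdtemp(), "example.sld")
write_sld(path, rows)
round_trip = read_sld(path)
```

A fixed-width, delimiter-less line keeps the reader on the VHDL side simple, since each character maps directly to one bit.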

Theory: Simple Example
1. Let's convolve A = 10010 and B = 01111: A ⊗ B.
○ Expensive multiply-accumulate (MAC) method:
A ⊗ B = (1 * 0) + (0 * 1) + (0 * 1) + (1 * 1) + (0 * 1) = +1
Considering 0 as -1: A ⊗ B = (1 * -1) + (-1 * 1) + (-1 * 1) + (1 * 1) + (-1 * 1) = -3
○ XNORPOP: with (0, 1) => (-1, 1), the scaled result is 2*P - N, where P is the popcount and N is the total number of bits.
POPCOUNT(XNOR(A, B)) = POPCOUNT(00010) = 1
Scaled result: 2*P - N = 2*1 - 5 = -3
2. Now let's convolve A = 110110111 and B = 010111010: A ⊗ B.
○ Expensive multiply-accumulate (MAC) method:
A ⊗ B = (1 * 0) + (1 * 1) + (0 * 0) + (1 * 1) + (1 * 1) + (0 * 1) + (1 * 0) + (1 * 1) + (1 * 0) = +4
Considering 0 as -1: A ⊗ B = (1 * -1) + (1 * 1) + (-1 * -1) + (1 * 1) + (1 * 1) + (-1 * 1) + (1 * -1) + (1 * 1) + (1 * -1) = +1
○ XNORPOP: with (0, 1) => (-1, 1), the scaled result is 2*P - N, where P is the popcount and N is the total number of bits.
POPCOUNT(XNOR(A, B)) = POPCOUNT(011110010) = 5
Scaled result: 2*P - N = 2*5 - 9 = +1
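
The two worked examples can be checked with a short Python sketch that computes the same dot product both ways. The function names are hypothetical, and this models only the arithmetic, not the project's VHDL implementation.

```python
def xnor_popcount_dot(a_bits, b_bits):
    """Dot product of two bit strings under the 0 -> -1, 1 -> +1 mapping,
    computed as 2*P - N with P = popcount(XNOR(A, B))."""
    if len(a_bits) != len(b_bits):
        raise ValueError("bit strings must have equal length")
    n = len(a_bits)
    mask = (1 << n) - 1                      # keep only the low N bits
    xnor = ~(int(a_bits, 2) ^ int(b_bits, 2)) & mask
    p = bin(xnor).count("1")                 # popcount
    return 2 * p - n

def mac_dot(a_bits, b_bits):
    """Reference multiply-accumulate over the same -1/+1 values."""
    def to_pm1(ch):
        return 1 if ch == "1" else -1
    return sum(to_pm1(x) * to_pm1(y) for x, y in zip(a_bits, b_bits))

example_1 = xnor_popcount_dot("10010", "01111")
example_2 = xnor_popcount_dot("110110111", "010111010")
```

The mask matters because Python integers are unbounded: without it, the bitwise NOT would produce a negative number rather than an N-bit XNOR.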

Convolution Results Comparison
With multiply-accumulate (MAC - Python)
With XNOR-POP (VHDL)

Screen capture of VHDL simulation for XNOR-POP

Enhanced Comparison
Panels: input, VHDL output, difference.
Difference between the input image and the convolved output after one layer.

Deliverables
● Data files:
○ Input image
○ Preprocessed input image
○ Filter image
○ Convolved output image (XNOR-POP)
○ Hand-calculated output image (using MAC)
● Source code:
○ Python APIs for preprocessing (Vibhatha) (Completed)
○ VHDL code for file reading (I) (Vibhatha) (Completed)
○ VHDL code for XNOR-POP (Kadupitiya) (Completed)
○ VHDL code for file writing (O) (Kadupitiya) (Completed)
○ Python API for post-processing (Vibhatha) (Completed)
○ Python API for hand-calculated output (Kadupitiya) (Completed)

Track Commits and Latest Releases of Project
1. GitHub code base (until 2018/March/20): https://github.com/vibhatha/Computer-Eng-Project
2. GitHub official code base: https://github.com/iuisefinalprojects/ice
3. Current released version is 3.0.0
a. https://github.com/iuisefinalprojects/ice/tree/master/release-3.0.0
4. GitHub.io: https://iuisefinalprojects.github.io/pages/projects/ice/ice-proj.html

References
[1] Rastegari, M., Ordonez, V., Redmon, J., & Farhadi, A. (2016, October). XNOR-Net: ImageNet classification using binary convolutional neural networks. In European Conference on Computer Vision (pp. 525-542). Springer, Cham.
[2] Jiang, L., Kim, M., Wen, W., & Wang, D. (2017, July). XNOR-POP: A processing-in-memory architecture for binary convolutional neural networks in Wide-IO2 DRAMs. In 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED) (pp. 1-6). IEEE.
[3] Kim, M., & Smaragdis, P. (2016). Bitwise neural networks. arXiv preprint arXiv:1601.06071.
[4] Zhang, C., Li, P., Sun, G., Guan, Y., Xiao, B., & Cong, J. (2015, February). Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 161-170). ACM.
