Optimized Feedforward Network of CNN with Xnor Final Presentation

E 501 INTRO TO COMPUTER ENGINEERING
Kadupitiya Kadupitige
Vibhatha Abeykoon

Introduction
● What is a convolutional neural network (CNN), and what inspired it?
○ Supervised deep learning algorithm: feedforward (testing) and backpropagation (training).
○ Biologically inspired model (behavior of optic nerves in living creatures).
○ Mostly trained on GPU nodes; pretrained weights are then used for testing/prediction.
○ Transfer learning can be used to retrain the classification layer for new tasks.
● Applications
○ Most successful in analyzing visual imagery with spatial data (where positioning matters).
○ Sound processing, face detection, natural language processing, video scanning, and many more.

Versions of CNN
● Why binary networks?
○ CNN testing/prediction is what matters most for real-time applications (mobile devices, etc.).
○ Feature learning layers: convolution, activation (ReLU), and pooling.
○ Classification layers: fully connected, regularization (dropout), activation (softmax).
○ Of all these, the convolutional layers consume more than 90% of total test time.
● What is an XNOR-Network?
○ Both weights (filters) and inputs are binary.
● Our project:
○ Implementing an XNOR version of the convolution layer.
○ Dot product => XNOR followed by popcount.

Objective
● Our project aims to create a dynamic framework that performs
convolution operations for different kernel sizes in a
hardware-based feedforward convolutional neural network
using XNOR-POP.

Scope of the Project
1. Data processing engine to download images from the ImageNet database.
2. Binarization mechanism to transform a color image into a black-and-white image without losing the
features of the original image.
3. Applying various binarization mechanisms to transform data into a VHDL-friendly vector form.
4. Dynamic windowing capability that uses a user-defined filter size to pre-process the downloaded
images into a vectorized binary file with 9-bit, 16-bit, 25-bit, 49-bit, and 64-bit configurations.
5. Data feeding engine that passes data to the hardware layer as files, so that experiments can be run
dynamically for various images.
6. Improving the performance of the XNOR popcount under three different criteria.
7. XNOR popcount implementation in VHDL for 9-bit, 16-bit, 25-bit, 49-bit, and 64-bit configurations.
8. VHDL output generation and image reconstruction after the convolution operation on the input data
set.
9. Comparison of the various approaches against the final outputs obtained by the convolution, in the
form of statistics and images.
10. Source and content publication via github.io.

Literature Review
● XNOR-Net: ImageNet classification using binary convolutional neural networks
○ Rastegari et al. [1] studied Binary-Weight-Networks and XNOR-Networks for training CNNs.
● XNOR-POP: A processing-in-memory architecture for binary convolutional neural networks in Wide-IO2 DRAMs
○ Jiang et al. [2] improved CNN test performance by using XNOR-POP.
○ Performance improved by 4× to 11× with small hardware and power overhead.
● Bitwise neural networks
○ Kim et al. [3]: a Bitwise Neural Network (BNN) is especially suitable for resource-constrained environments.
● Optimizing FPGA-based accelerator design for deep convolutional neural networks
○ Zhang et al. [4] quantitatively analyzed the computing throughput and required memory bandwidth of
any potential solution of a CNN design on an FPGA platform using roofline analysis.
● According to the literature, XNOR-Networks can replace:
○ The expensive multiply-accumulate (MAC) operation found in convolution.
○ MAC = XNOR operation followed by a bitcount (popcount).

Methodology:
1. The program initializes with a set of Python calls that:
a. Download ImageNet sample data.
b. Load the images and convert them to a binary format.
c. Save the images in matrix form as CSV files.
2. There are two input files: one contains the filter and
the other contains the image matrix in binary format.
3. The filter is saved as a vector (1xN).
4. The image is saved with padding and windowing applied.
5. At the convolution stage, the binary matrix file and the filter
are used to compute the convolution value.
6. The XNORPOP component performs the XNOR popcount
calculation. This is done on the VHDL side.
7. On the Python side, the windowing is reversed to recover
the image after the convolution.
8. We use Python only for pre-processing and post-processing
of data. The core concept is implemented in VHDL.
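
The step that reverses the windowing in Python can be sketched as follows. This is a minimal illustration, assuming the VHDL stage emits one value per window in row-major order; the function name `unwindow` and the sizes used are hypothetical and not taken from the project code.

```python
import numpy as np


def unwindow(outputs, padded_rows, padded_cols, window_size):
    """Reshape a flat sequence of per-window results into a 2-D image.

    Assumes one output per window, scanned row by row (row-major order).
    """
    out_rows = padded_rows - window_size + 1
    out_cols = padded_cols - window_size + 1
    return np.asarray(outputs).reshape(out_rows, out_cols)


# Hypothetical example: a 402x402 padded image scanned with a 3x3 window
flat = np.zeros(400 * 400, dtype=int)
image = unwindow(flat, 402, 402, 3)
print(image.shape)
```

The output side length is the padded side length minus the window size plus one, so larger windows yield slightly smaller output images.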

Input Processing Framework: Phase 1
[Diagram: Image URLs <User Input> -> Save Images Locally; ImageNet -> Save Images Locally]

import numpy as np
from numpy import genfromtxt


class FilterAPI:
    """Converts filter matrices to vectors and images to sliding-window matrices."""

    filter_path = 'filter/'
    filter_file = '8x8_cross.filter'

    def __init__(self, filter_path, filter_file):
        # Store the arguments so that mat2vec uses the caller's values.
        self.filter_path = filter_path
        self.filter_file = filter_file
        print("Filter Path : " + filter_path)
        print("Filter File : " + filter_file)

    def mat2vec(self):
        # Flatten the 2-D filter matrix into a 1xN vector and save it.
        file_path = self.filter_path + self.filter_file
        matrix = genfromtxt(file_path, delimiter=',')
        vector = matrix.flatten()
        print("Converted Vector : ")
        print(vector)
        dest_file = file_path.split(".")[0] + "_vector.filter"
        np.savetxt(dest_file, vector, delimiter=",")

    def sliding_window(self, arr, window_size=3):
        # Collect every window_size x window_size patch as one flattened row.
        rows = len(arr)
        cols = len(arr[0])
        print(rows, cols)
        sliders = []
        for i in range(0, rows - (window_size - 1)):
            for j in range(0, cols - (window_size - 1)):
                window = arr[i:i + window_size, j:j + window_size]
                sliders.append(window.flatten())
        return np.array(sliders)

    def save_sliding_window(self, window_size=3,
                            source_file='binaries/image_3_200x200_pad.pad',
                            dest_file='binaries/sliding/file1_slidingwindow'):
        arr = genfromtxt(source_file, delimiter=',')
        new_arr = self.sliding_window(arr, window_size)
        rows = len(new_arr)
        cols = len(new_arr[0])
        print(rows, cols)
        shape_tag = "__" + str(rows) + "x" + str(cols)
        dest_file = dest_file.split(".")[0] + shape_tag + ".sld"
        dest_file_trim = dest_file.split(".")[0] + shape_tag + "_trim.sld"
        print(new_arr)
        # Comma-separated version, and a delimiter-less version for VHDL.
        np.savetxt(dest_file, new_arr, delimiter=",", fmt='%d')
        np.savetxt(dest_file_trim, new_arr, delimiter='', fmt='%d')
import urllib.request

import numpy as np


class ImageAPI:
    """Loads image URLs, downloads the images, and converts them to grayscale."""

    def __init__(self, image_file_path, base_path, bin_dir, image_dir):
        self.image_urls = self.load_image_urls(image_file_path)
        self.base_path = base_path
        self.bin_dir = bin_dir
        self.image_dir = image_dir

    def load_image_urls(self, image_file_path):
        # Read one URL per line, dropping the trailing newline.
        print("Image Url File : " + image_file_path)
        with open(image_file_path, "r") as lines:
            self.image_urls = [line.strip() for line in lines if line.strip()]
        return self.image_urls

    def load_local_image_urls(self, image_file_path):
        # Same as load_image_urls, but does not update self.image_urls.
        print("Image Url File : " + image_file_path)
        with open(image_file_path, "r") as lines:
            return [line.split("\n")[0] for line in lines]

    def download_images(self):
        download_paths = []
        print("Downloading " + str(len(self.image_urls)) + " images ...")
        for count, image_url in enumerate(self.image_urls):
            print(image_url)
            dest_path = self.image_dir + "/" + "image_" + str(count) + ".jpg"
            # urllib.urlretrieve is Python 2 only; Python 3 uses urllib.request.
            urllib.request.urlretrieve(image_url, dest_path)
            print("Downloaded " + dest_path)
            download_paths.append(dest_path)
        return download_paths

    def rgb2gray(self, rgb):
        # Weighted sum of the R, G, and B channels (ITU-R BT.601 luma weights).
        return np.dot(rgb[..., :3], [0.299, 0.587, 0.114])
Listings above: Filtering API; Image Pre-Processing API

Input Processing Framework: Phase 2
[Diagram: Load Saved Images -> Crop Image and Generate Binaries <User-Defined Size> -> Save Cropped Images and Corresponding Binaries -> Sliding-Window-Added Binaries Generation]
import os

import scipy.misc  # imsave was removed in SciPy 1.2; an older SciPy is required
import numpy as np
from numpy import genfromtxt


class VhdlAPI:
    """Converts between OpenCV-style binaries (0/255) and VHDL binaries (0/1)."""

    # clip_array and remove_single_padding are called below, but their
    # definitions are not shown on the slide.
    source_bin_file = 'image_0_200x200_pad.min'
    source_bin_path = 'binaries/'

    def __init__(self, source_bin_path=source_bin_path,
                 source_bin_file=source_bin_file):
        self.source_bin_path = source_bin_path
        self.source_bin_file = source_bin_file

    def bin2vhdl(self, output_path=''):
        fnames = self.source_bin_file.split(".")
        print(fnames)
        source_file = self.source_bin_path + self.source_bin_file
        output_file = output_path + fnames[0] + ".vhdlbin"
        print('Converting to VHDL Binary Format')
        array = genfromtxt(source_file, delimiter=',')
        # array[array > 127] = 1
        # array[array < 127] = 0
        array = self.clip_array(array)
        np.savetxt(output_file, array, delimiter=",", fmt='%d')

    def padbin2jpg(self, bin_file, output_path=''):
        image_array = genfromtxt(bin_file, delimiter=',')
        # Take the file name up to the first ".", regardless of directory depth.
        file_name = os.path.basename(bin_file).split(".")[0]
        output_file = output_path + file_name + "_crop.jpg"
        scipy.misc.imsave(output_file, image_array)

    def vhdlbin2bin(self, source_file):
        # Converts the VHDL-format bin file to the normal bin file generated by OpenCV.
        # The data range in a VHDL bin is 0 or 1; in OpenCV it is 0 or 255.
        image_array = genfromtxt(source_file, delimiter=',')
        unpad_image_array = self.remove_single_padding(image_array)
        unpad_image_array[unpad_image_array == 1.0] = 255.0
        return unpad_image_array

    def vhdlbinimg2binimg(self, source_file, output_file):
        # Converts a VHDL binaries file to an OpenCV binarized image.
        new_array = self.vhdlbin2bin(source_file)
        scipy.misc.imsave(output_file, new_array)
import matplotlib.pyplot as plt
from skimage import io
from skimage.filters import threshold_niblack, threshold_otsu, threshold_sauvola

image_base = "images/crop/v5/"
image_file = "image_3_bin_400x400_crop.jpg"
image = io.imread(image_base + image_file)

# Global (Otsu) threshold
binary_global = image > threshold_otsu(image)
# binary_global = image > threshold_mean(image)

# Local thresholds computed over a window around each pixel
window_size = 63
thresh_niblack = threshold_niblack(image, window_size=window_size, k=0.8)
thresh_sauvola = threshold_sauvola(image, window_size=window_size)
binary_niblack = image > thresh_niblack
binary_sauvola = image > thresh_sauvola

# One subplot per result, so the panels do not overwrite each other
plt.figure(figsize=(8, 7))
plt.subplot(2, 2, 1)
plt.imshow(image, cmap=plt.cm.gray)
plt.title('Original')
plt.axis('off')
plt.subplot(2, 2, 2)
plt.title('Global Threshold')
plt.imshow(binary_global, cmap=plt.cm.gray)
plt.axis('off')
plt.subplot(2, 2, 3)
plt.imshow(binary_niblack, cmap=plt.cm.gray)
plt.title('Niblack Threshold')
plt.axis('off')
plt.subplot(2, 2, 4)
plt.imshow(binary_sauvola, cmap=plt.cm.gray)
plt.title('Sauvola Threshold')
plt.axis('off')
plt.show()
import cv2
import matplotlib.pyplot as plt
import numpy as np

image_base = "images/crop/v5/"
image_file = "image_3_bin_400x400_crop.jpg"
img = cv2.imread(image_base + image_file, 0)  # 0 = load as grayscale

# Three variants: fixed threshold at 127, Binary + Otsu, and Otsu alone
ret, imgf = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
ret, imgf2 = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
ret, imgf3 = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU)

plt.subplot(3, 1, 1), plt.imshow(img, cmap='gray')
plt.title('Original Noisy Image'), plt.xticks([]), plt.yticks([])
plt.subplot(3, 1, 2), plt.hist(img.ravel(), 256)
# ret holds the threshold chosen by the last (Otsu) call
plt.axvline(x=ret, color='r', linestyle='dashed', linewidth=2)
plt.title('Histogram'), plt.xticks([]), plt.yticks([])
# Show the Otsu result so that the panel matches its title
plt.subplot(3, 1, 3), plt.imshow(imgf3, cmap='gray')
plt.title('Otsu thresholding'), plt.xticks([]), plt.yticks([])
plt.show()

# Difference between the Otsu and Binary + Otsu outputs
imgf = imgf3 - imgf2
exp = "otsu-binary-otsu"
cv2.imwrite("binaries/test/" + exp + ".jpg", imgf)
bin_dir = "binaries/test/" + exp + ".txt"
np.savetxt(bin_dir, imgf, delimiter=',', fmt='%d')
References: OpenCV, Skimage
Listings above: VhdlInput API; Binarization Method 2; Binarization Method 3

File Formats and Data: Filter
Figure panels: User Input Filter; Pre-Processing API Converts to a Vector Format

File Formats and Data: Image
1. The original image, converted to numeric form, contains values between 0 and 255.
2. The project needs only two levels, so we first convert the color image to grayscale; this is the grayscale output.
3. We still need a black-and-white image, which has only two possible values: 0 or 255.
Our Python ImageAPI reads...

File Formats and Data: Binarized Image (0-255)
Figure panels: User Input Image in Terms of Numbers; Pre-Processing API Adds Padding (important for
convolution, to avoid losing edge information)
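
The padding step can be illustrated with a minimal sketch. It assumes a padding width of 1 on all sides and a pad value of 0 (the pad value is an assumption; the slides do not state it), and uses NumPy's `np.pad` rather than the project's own helper.

```python
import numpy as np

# Tiny stand-in for a binarized image with values 0 or 255
image = np.array([[255, 0],
                  [0, 255]])

# Add a one-pixel border of zeros so windows centered on edge pixels fit
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)
```

With a 3x3 window, one pixel of padding on each side keeps the output the same size as the unpadded input.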

File Formats and Data: VHDL Binarized Image (0-1)
1. Image data points with small values go to 0, and higher values go to 1.
Figure panel: Pre-Processing API Converts to Binary Format
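
A minimal sketch of this 0-255 to 0-1 mapping follows. The function name `to_vhdl_bits` is hypothetical, and the threshold of 127 is an assumption borrowed from the commented-out lines in the VhdlAPI listing; values equal to the threshold are mapped to 0 here.

```python
import numpy as np


def to_vhdl_bits(array, threshold=127):
    """Map values above the threshold to 1 and all other values to 0."""
    return (np.asarray(array) > threshold).astype(int)


bits = to_vhdl_bits([[0, 255], [127, 128]])
print(bits)
```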

File Formats and Data: Sliding Window VHDL Input
1. The input read by the VHDL file reader is a delimiter-less matrix of size Mx9.
2. M is the number of sliding-window positions.
3. 9 is the filter vector size (as we chose a 3x3 matrix for the filter).
4. Here M = 160,000.
Figure panel: Pre-Processing API Generates Sliding-Windowed Matrix
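
The Mx9 shape can be checked with a short sketch. It assumes a 402x402 padded image (400x400 plus one pixel of padding per side) and uses NumPy's `sliding_window_view` (available from NumPy 1.20) instead of the explicit loops in the FilterAPI listing.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Stand-in for a padded 0/1 image: 400x400 plus a one-pixel border
padded = np.zeros((402, 402), dtype=int)

# Every 3x3 window, flattened row by row into a 9-element vector
windows = sliding_window_view(padded, (3, 3)).reshape(-1, 9)
print(windows.shape)  # (160000, 9)

# One delimiter-less text row, as it would appear in the VHDL input file
row_text = "".join(str(bit) for bit in windows[0])
print(row_text)
```

The window count is (402 - 3 + 1) squared, which gives the M = 160,000 quoted above.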

Different Windowing Sizes for VHDL Input Generation (9 bit - 64 bit)
Figure panels: 4x4 window => 16 bit; 5x5 window => 25 bit; 6x6 window => 36 bit; 7x7 window => 49 bit; 8x8 window => 64 bit

Input Processing Summary
1. Download images from ImageNet.
2. Convert to grayscale.
3. Crop a 400x400 image portion.
4. Transform to binary format (0, 255).
5. Scale down for XNOR-POP (0, 1).
Figure panels (image sizes): 500 x 472; 500 x 472; 400 x 400
[Diagram: Load Binarized (OpenCV), range of values 0-255 -> Padding by 1 on all sides -> Scale to 0,1 (lower values converted to zero and higher values to 1) -> Binary File (0,1), scaled]

Input File Definitions
1. In this project we define a couple of file formats.
2. First, .sld files contain binary images with sliding windowing applied,
for ease of convolution computation. The binarized image files
are converted to this format.
3. Filters use the .filter extension. A filter is also a flattened matrix
(that is, a vector), again for ease of computation.
4. The VHDL I/O component takes the .sld and .filter files as inputs and performs the
convolution.
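
To make the delimiter-less .sld layout concrete, here is a minimal parsing sketch. It assumes each line holds one window as a run of 0/1 characters, which is what `np.savetxt` with `delimiter=''` and `fmt='%d'` writes in the FilterAPI listing; the function name `parse_sld_text` is hypothetical.

```python
import numpy as np


def parse_sld_text(text):
    """Turn delimiter-less rows such as '110110111' into a 2-D array of bits."""
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    return np.array([[int(ch) for ch in row] for row in rows])


# Two 9-bit windows, as they might appear in a trimmed .sld file
sample = "110110111\n010111010\n"
windows = parse_sld_text(sample)
print(windows.shape)
```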

Theory: Simple example
1. Let us convolve A = 10010 and B = 01111: A ⊗ B.
○ Expensive multiply-accumulate (MAC) method:
A ⊗ B = (1 * 0) + (0 * 1) + (0 * 1) + (1 * 1) + (0 * 1) = +1
Treating 0 as -1: A ⊗ B = (1 * -1) + (-1 * 1) + (-1 * 1) + (1 * 1) + (-1 * 1) = -3
○ XNORPOP: with P = POPCOUNT(XNOR(A, B)) and N the total number of bits, the scaled result is 2*P - N.
Each of the P matching positions contributes +1 and each of the N - P mismatches contributes -1, so the sum is P - (N - P) = 2*P - N.
P = POPCOUNT(XNOR(A, B)) = POPCOUNT(00010) = 1
Scaled result: 2*P - N = 2*1 - 5 = -3
2. Now let us convolve A = 110110111 and B = 010111010: A ⊗ B.
○ Expensive multiply-accumulate (MAC) method:
A ⊗ B = (1 * 0) + (1 * 1) + (0 * 0) + (1 * 1) + (1 * 1) + (0 * 1) + (1 * 0) + (1 * 1) + (1 * 0) = +4
Treating 0 as -1: A ⊗ B = (1 * -1) + (1 * 1) + (-1 * -1) + (1 * 1) + (1 * 1) + (-1 * 1) + (1 * -1) + (1 * 1) + (1 * -1) = +1
○ XNORPOP: replacing (0, 1) with (-1, 1), the scaled result is 2*P - N, where N is the total number of bits.
P = POPCOUNT(XNOR(A, B)) = POPCOUNT(011110010) = 5
Scaled result: 2*P - N = 2*5 - 9 = +1
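
The two worked examples can be reproduced in software. The following is a minimal Python sketch rather than the project's VHDL; the function names `xnor_popcount` and `mac_bipolar` are hypothetical, and bit strings are used for readability.

```python
def xnor_popcount(a_bits, b_bits):
    """Scaled XNOR-popcount result 2*P - N for two equal-length bit strings."""
    if len(a_bits) != len(b_bits):
        raise ValueError("bit strings must have the same length")
    n = len(a_bits)
    mask = (1 << n) - 1                      # keep only the low n bits
    xnor = ~(int(a_bits, 2) ^ int(b_bits, 2)) & mask
    p = bin(xnor).count("1")                 # popcount of the XNOR result
    return 2 * p - n


def mac_bipolar(a_bits, b_bits):
    """Reference multiply-accumulate with every 0 treated as -1."""
    def to_pm1(ch):
        return 1 if ch == "1" else -1
    return sum(to_pm1(x) * to_pm1(y) for x, y in zip(a_bits, b_bits))


for a, b in [("10010", "01111"), ("110110111", "010111010")]:
    print(a, b, xnor_popcount(a, b), mac_bipolar(a, b))
```

Both functions return -3 for the first pair and +1 for the second, matching the hand calculations above.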

VHDL code: XNOR-POP: Version 1

Convolution Results Comparison
With multiply-accumulate (MAC - Python)
With XNOR-POP (VHDL)

Enhanced Comparison
Figure panels: Input; VHDL output; Difference
Difference between the input image and the convolved output after 1 layer

Screen capture of VHDL simulation for XNOR-POP

Deliverables @ Proposal Presentation
● Data files:
○ Input image
○ Preprocessed input image
○ Filter image
○ Convolved output image (XNOR-POP)
○ Hand-calculated output image (using MAC)
● Source code:
○ Python APIs for preprocessing (Vibhatha) (Completed)
○ VHDL code for file reading (I) (Vibhatha) (Completed)
○ VHDL code for XNOR-POP (Kadupitiya) (Completed)
○ VHDL code for file writing (O) (Kadupitiya) (Completed)
○ Python API for post-processing (Vibhatha) (Completed)
○ Python API for hand-calculated output (Kadupitiya) (Completed)

After the Proposal Presentation

Binarization Methodologies
1. First we implemented our own idea: choosing an arbitrary threshold such as 127
and substituting 0 or 1 depending on whether a value is below or above the threshold.
(Proposal presentation approach)
2. In the second approach, Niblack thresholding was used.
3. In the third approach, Sauvola thresholding was used.
4. In the second and third approaches we used the scikit-image (skimage) implementations of these
thresholding mechanisms to generate the binary images.
5. As the fourth mechanism, we used Otsu binarization
along with a couple of variations:
a. Binary + Otsu binarization
b. Otsu binarization

Niblack and Sauvola Binarization Against Window Size
Figure panels: window_size = 3; window_size = 5; window_size = 7; window_size = 9; window_size = 63

SkImage Binarization
Figure panels: Pure Binary Version; Otsu + Binary; Otsu

Comparison of SkImage Binarization Outputs
We subtracted each binary output from the other binary outputs.
Figure panels: OTSU-(Binary+OTSU); OTSU-Binary
The binarization outputs from SkImage match across the variations.

Binarization Method Conclusion
1. Based on our experiments, we concluded that the default OpenCV binarization
mechanism should be used.
2. The default binary mechanism in OpenCV provided the same
binarization characteristics as the other approaches.

XNOR-POP: Version 2
The popcount is computed by adding the bits of the sequence together.
● This is much simpler than design 1; we discuss synthesis results later.
● Can we still achieve more performance?

XNOR-POP: Version 3 (Gate level code for design 2)
We can use a series of full adders (3 inputs each) and half
adders (2 inputs each).
● This is much more complex than design 2.
● Is it worth the effort?
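
The adder-based idea can be modeled in software. The sketch below is a Python model of a popcount built only from full and half adders, not the project's gate-level VHDL; the column-by-column reduction order and all function names are assumptions made for illustration.

```python
def full_adder(a, b, c):
    """Three input bits -> (sum bit, carry bit)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def half_adder(a, b):
    """Two input bits -> (sum bit, carry bit)."""
    return a ^ b, a & b


def popcount_adder_tree(bits):
    """Count the ones in `bits` using only full and half adders."""
    columns = {0: list(bits)}        # bits grouped by binary weight
    result = 0
    weight = 0
    while weight in columns:
        col = columns[weight]
        while len(col) > 1:
            if len(col) >= 3:
                s, c = full_adder(col.pop(), col.pop(), col.pop())
            else:
                s, c = half_adder(col.pop(), col.pop())
            col.append(s)                                   # sum stays in this column
            columns.setdefault(weight + 1, []).append(c)    # carry moves up one weight
        if col:
            result |= col[0] << weight
        weight += 1
    return result


# XNOR output from the 9-bit example: 011110010
print(popcount_adder_tree([0, 1, 1, 1, 1, 0, 0, 1, 0]))  # 5
```

Each adder preserves the weighted sum of its inputs, so the final bits form the binary count of ones. A hardware version would fix the adder arrangement at design time instead of looping.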

Synthesis Results
Name of Design | Combinational delay | Number of slice LUTs used
Design 1 | 6.642 ns | 47
● Design 2 is far better than design 1.
● Design 3 has a slight improvement over design 2.
● Design 2 was used for the 25, 36, 49 and 64 bit configurations, as design 3 was so complex.
● We checked the synthesis report for the popcount design (16-bit) using Xilinx.

Convolution for different filter sizes
● We implemented the convolution operation for 9, 16, 25, 36, 49 and 64 bit sizes.
● We tested our modules with two sets of filters.
○ Box filter (kernel): 3x3, 4x4, 5x5, 6x6, 7x7 and 8x8
○ Cross filter (kernel): 3x3, 4x4, 5x5, 6x6, 7x7 and 8x8

VHDL simulations for 9, 16, 25, 36, 49 and 64 bit popcounts
Figure panels: 3x3; 4x4; 5x5; 6x6; 7x7; 8x8

Python Comparison Module (VHDL vs MAC Output)
import matplotlib.pyplot as plt
import numpy as np
import scipy.misc  # imsave was removed in SciPy 1.2; an older SciPy is required


def conv(kernel, array):
    # MAC reference: dot product of each window (one row of array) with the
    # kernel, thresholded to 1 when positive and 0 otherwise.
    output1 = []
    rows = len(array)
    cols = len(array[0])
    for i in range(0, rows):
        count = 0
        for j in range(0, cols):
            count = count + array[i][j] * kernel[j]
        output1.append(1 if count > 0 else 0)
    return output1


filterSize = 8
suffix = str(filterSize) + 'x' + str(filterSize)
size = 395  # side length of the square output image

# Load the VHDL (XNOR-POP) output and reshape it into an image
vhdl_output_bin_file = 'output' + suffix + '.txt'
inputImagevd = np.loadtxt(vhdl_output_bin_file)
inputImage1 = np.reshape(inputImagevd, (size, size))
plt.imshow(inputImage1, cmap=plt.get_cmap('gray'))
# plt.show()

# Compute the MAC output in Python from the same windows and filter
input_bin_file = 'input_mac' + suffix + '.txt'
filter_bin_file = 'filter_mac' + suffix + '.txt'
output_bin_file = 'output_mac' + suffix + '.txt'
inputImage2read = np.loadtxt(input_bin_file, delimiter=',')
inputFilter = np.loadtxt(filter_bin_file)
inputImage2read[inputImage2read == 0] = -1  # treat 0 as -1
inputFilter[inputFilter == 0] = -1
ar1 = np.array(conv(inputFilter, inputImage2read))
np.savetxt(output_bin_file, ar1, delimiter="", fmt='%d')
inputImage2 = np.reshape(ar1, (size, size))
plt.imshow(inputImage2, cmap=plt.get_cmap('gray'))

# The two outputs should be identical
print("comparing " + suffix + " image outputs : "
      + str(np.array_equal(inputImage1, inputImage2)))
scipy.misc.imsave('image1_' + suffix + '.jpg', inputImage1)
scipy.misc.imsave('image2_' + suffix + '.jpg', inputImage2)
8x8 Window Size Output Comparison

Convolution Operation Results for Box-Shaped Filter
Figure panels: 3x3; 4x4; 5x5; 6x6; 7x7; 8x8

Convolution Operation Results for Cross-Shaped Filter
Figure panels: 3x3; 4x4; 5x5; 6x6; 7x7; 8x8

Convolution Operation Conclusion
1. The effect of the filter can be understood from the number of pixels
that are ones and zeros.
2. Depending on the percentage of ones in a filter, we observed
distinct behavior.
3. When the percentage of ones is less than or equal to 44%,
inversion and dilation occur. When the percentage of ones
is greater than 44%, erosion takes place.

Deliverables @ Final Presentation
● Data files:
○ Multiple input images
○ Preprocessed input images
○ Filter images
○ Convolved output image (XNOR-POP) (tested and enhanced in two different ways)
○ Hand-calculated output image (using MAC)
● Source code:
○ Dynamic Python APIs for preprocessing (Vibhatha) (Completed)
○ Python binarization mechanisms (Vibhatha) (Completed)
○ VHDL code for file reading (I) (Vibhatha) (Completed)
○ VHDL code for XNOR-POP (Kadupitiya) (Completed)
○ VHDL code for file writing (O) (Kadupitiya) (Completed)
○ VHDL enhanced popcount (Kadupitiya) (Completed)
○ Multiple window size comparison (Kadupitiya & Vibhatha) (Completed)
○ Python API for post-processing (Vibhatha) (Completed)
○ Python API for hand-calculated output (Kadupitiya) (Completed)

Track Commits and Latest Releases of Project
1. GitHub code base (until 2018/April/18): https://github.com/vibhatha/Computer-Eng-Project
2. GitHub official code base: https://github.com/iuisefinalprojects/ice
3. Current released version is 4.0.0
a. https://github.com/iuisefinalprojects/ice/tree/master/release-4.0.0
4. Github.io: https://iuisefinalprojects.github.io/pages/projects/ice/ice-proj.html

References
[1] Rastegari, M., Ordonez, V., Redmon, J., & Farhadi, A. (2016, October). XNOR-Net: ImageNet classification using binary
convolutional neural networks. In European Conference on Computer Vision (pp. 525-542). Springer, Cham.
[2] Jiang, L., Kim, M., Wen, W., & Wang, D. (2017, July). XNOR-POP: A processing-in-memory architecture for binary
convolutional neural networks in Wide-IO2 DRAMs. In 2017 IEEE/ACM International Symposium on Low Power
Electronics and Design (ISLPED) (pp. 1-6). IEEE.
[3] Kim, M., & Smaragdis, P. (2016). Bitwise neural networks. arXiv preprint arXiv:1601.06071.
[4] Zhang, C., Li, P., Sun, G., Guan, Y., Xiao, B., & Cong, J. (2015, February). Optimizing FPGA-based accelerator design for
deep convolutional neural networks. In Proceedings of the 2015 ACM/SIGDA International Symposium on
Field-Programmable Gate Arrays (pp. 161-170). ACM.
