Optimized Feedforward Network of CNN with XNOR: Final Presentation
E 501 INTRO TO COMPUTER ENGINEERING
Kadupitiya Kadupitige
Vibhatha Abeykoon
Introduction
● What is a convolutional neural network, and what inspired it?
○ A supervised deep learning algorithm: feedforward for testing, backpropagation for training.
○ A biologically inspired model (modeled on the visual system of living creatures).
○ Usually trained on GPU nodes; the pretrained weights are then used for testing/prediction.
○ Transfer learning can retrain just the classification layer for new tasks.
● Applications
○ Most successful in analyzing visual imagery and other spatial data (where position matters).
○ Sound processing, face detection, natural language processing, video scanning, and many more.
Versions of CNN
● Why binary networks?
○ Testing/prediction is what matters most for real-time CNN applications (mobile devices, etc.).
○ Feature-learning layers: convolution, activation (ReLU) and pooling.
○ Classification layers: fully connected, regularization (dropout), activation (softmax).
○ Of these, the convolutional layers consume more than 90% of the total test time.
● What is an XNOR-Network?
○ Both the weights (filters) and the inputs are binary.
● Our project:
○ Implementing an XNOR version of the convolution layer.
○ Dot product => XNOR followed by popcount.
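The last point rests on a simple identity: if each ±1 value is stored as one bit (1 for +1, 0 for -1), then for two N-bit words the dot product equals 2·POPCOUNT(XNOR(a, b)) - N, because every matching position contributes +1 and every mismatch contributes -1. A minimal Python sketch of that identity, with hypothetical helper names (it is not the project's VHDL):

```python
# Hypothetical helpers illustrating "dot product => XNOR followed by popcount".

def xnor_popcount_dot(a, b, n):
    """Dot product of two n-element {-1, +1} vectors packed as n-bit integers.

    A bit value of 1 encodes +1 and a bit value of 0 encodes -1.
    """
    mask = (1 << n) - 1            # keep only the low n bits
    xnor = ~(a ^ b) & mask         # 1 wherever the two bits agree
    p = bin(xnor).count("1")       # popcount: number of agreeing positions
    return 2 * p - n               # each agreement adds +1, each mismatch adds -1


def mac_dot(a, b, n):
    """Reference multiply-accumulate on the same packed vectors."""
    total = 0
    for i in range(n):
        x = 1 if (a >> i) & 1 else -1
        y = 1 if (b >> i) & 1 else -1
        total += x * y
    return total


# Exhaustive check over all pairs of 5-bit words
print(all(xnor_popcount_dot(a, b, 5) == mac_dot(a, b, 5)
          for a in range(32) for b in range(32)))  # True
```

The mask matters because Python integers are unbounded, so `~` would otherwise set infinitely many high bits; in hardware the word width plays the same role.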
Objective
● Our project aims to create a dynamic framework that performs convolution operations for different kernel sizes in a hardware-based feedforward convolutional neural network using XNOR-POP.
Scope of the Project
1. Data processing engine to download images from the ImageNet database.
2. Binarization mechanism to transform a color image into a black-and-white image while preserving the features of the original image.
3. Applying various binarization mechanisms to transform the data into a VHDL-friendly vector format.
4. Dynamic windowing capability that uses a user-defined filter size to preprocess the downloaded images into a vectorized binary file with 9-bit, 16-bit, 25-bit, 49-bit and 64-bit configurations.
5. Data feeding engine that passes data to the hardware layer as files, so experiments can be run dynamically on various images.
6. Improving the performance of the XNOR popcount under three different criteria.
7. XNOR popcount implementation in VHDL for 9-bit, 16-bit, 25-bit, 49-bit and 64-bit configurations.
8. VHDL output generation and image reconstruction after the convolution operation on the input data set.
9. Comparison of the various approaches against the final convolution outputs, in the form of statistics and images.
10. Publication of the source code and content via github.io.
Literature Review
● XNOR-Net: ImageNet classification using binary convolutional neural networks
○ Rastegari et al. [1] studied Binary-Weight-Networks and XNOR-Networks for training CNNs.
● XNOR-POP: A processing-in-memory architecture for binary convolutional neural networks in Wide-IO2 DRAMs
○ Jiang et al. [2] improved CNN test performance by using XNOR-POP.
○ Performance improved by 4× to 11× with small hardware and power overhead.
● Bitwise neural networks
○ Kim et al. [3]: the Bitwise Neural Network (BNN) is especially suitable for resource-constrained environments.
● Optimizing FPGA-based accelerator design for deep convolutional neural networks
○ Zhang et al. [4] quantitatively analyzed the computing throughput and required memory bandwidth of potential CNN designs on an FPGA platform using roofline analysis.
● According to the literature, XNOR-Networks can replace:
○ The expensive multiply-accumulate (MAC) operation found in convolution.
○ For binary operands, a MAC becomes an XNOR operation followed by a bitcount (popcount).
Methodology
1. The program starts with a set of Python calls that:
a. Download ImageNet sample data.
b. Load the images and convert them to a binary format.
c. Save the images as matrices in CSV files.
2. There are two input files: one contains a filter and the other contains the image matrix in binary format.
3. The filter is saved as a vector (1xN).
4. The image is saved with padding and windowing applied.
5. At the convolution stage, the binary matrix file and the filter are used to compute the convolution value.
6. The XNOR-POP component performs the XNOR popcount calculation on the VHDL side.
7. In Python, the windowing is reversed to reconstruct the image after the convolution.
8. Python is used only for preprocessing and postprocessing of the data; the core concept is implemented in VHDL.
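The steps above can be condensed into a small NumPy model of the whole path: pad, window, XNOR-POP, then reverse the windowing. It is a sketch under stated assumptions: zero padding, stride 1, and an output pixel of 1 when 2P - N > 0, which matches the `count > 0` rule in the Python comparison module. The function names are hypothetical, and `sliding_window_view` requires NumPy 1.20 or newer.

```python
# Hypothetical software model of the pipeline; the real XNOR-POP runs in VHDL.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def xnor_pop_conv(image01, filter01):
    """Stride-1 'valid' binary convolution using XNOR + popcount.

    image01: 2-D array of 0/1 (already padded); filter01: k x k array of 0/1.
    """
    k = filter01.shape[0]
    n = k * k                                              # bits per window (N)
    windows = sliding_window_view(image01, (k, k))         # shape (H-k+1, W-k+1, k, k)
    rows = windows.reshape(-1, n)                          # M x N matrix, one window per row
    agree = rows == filter01.reshape(1, n)                 # XNOR: True where the bits match
    p = agree.sum(axis=1)                                  # popcount per window (P)
    out = (2 * p - n > 0).astype(np.uint8)                 # sign of the +/-1 dot product
    return out.reshape(windows.shape[:2])                  # reverse the windowing


def mac_conv(image01, filter01):
    """Reference multiply-accumulate after mapping 0 -> -1 (same output rule)."""
    img = np.where(image01 == 1, 1, -1)
    flt = np.where(filter01 == 1, 1, -1)
    k = flt.shape[0]
    out_h, out_w = img.shape[0] - k + 1, img.shape[1] - k + 1
    out = np.zeros((out_h, out_w), dtype=np.uint8)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = 1 if np.sum(img[i:i + k, j:j + k] * flt) > 0 else 0
    return out


rng = np.random.default_rng(0)
image = np.pad(rng.integers(0, 2, size=(10, 10)), 1)       # 0/1 image, padded by 1 on all sides
kernel = rng.integers(0, 2, size=(3, 3))
print(np.array_equal(xnor_pop_conv(image, kernel), mac_conv(image, kernel)))  # True
```

The row-major window order produced by `reshape` is the same order in which a double loop over rows and columns would emit windows, so reshaping the output back to (H-k+1, W-k+1) restores the image layout.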
Input Processing Framework
[Figure: Phase 1. Image URLs (user input) → save images locally; ImageNet → save images locally]
import os

import numpy as np
from numpy import genfromtxt


class FilterAPI:
    # Flattens filters and builds sliding-window binaries for the VHDL stage

    def __init__(self, filter_path='filter/', filter_file='8x8_cross.filter'):
        # Store the arguments (previously they were only printed, so the
        # class-level defaults were always used)
        self.filter_path = filter_path
        self.filter_file = filter_file
        print("Filter Path : " + filter_path)
        print("Filter File : " + filter_file)

    def mat2vec(self):
        # k x k filter matrix -> 1 x N vector, saved as <name>_vector.filter
        file_path = os.path.join(self.filter_path, self.filter_file)
        matrix = genfromtxt(file_path, delimiter=',')
        vector = matrix.flatten()
        print("Converted Vector : ")
        print(vector)
        dest_file = os.path.splitext(file_path)[0] + "_vector.filter"
        np.savetxt(dest_file, vector, delimiter=",")

    def sliding_window(self, arr, window_size=3):
        # Flatten every window_size x window_size window (stride 1, row-major order)
        rows, cols = arr.shape
        sliders = []
        for i in range(rows - window_size + 1):
            for j in range(cols - window_size + 1):
                window = arr[i:i + window_size, j:j + window_size]
                sliders.append(window.flatten())
        return np.array(sliders)

    def save_sliding_window(self, window_size=3,
                            source_file='binaries/image_3_200x200_pad.pad',
                            dest_file='binaries/sliding/file1_slidingwindow'):
        arr = genfromtxt(source_file, delimiter=',')
        new_arr = self.sliding_window(arr, window_size)
        rows, cols = new_arr.shape
        print(rows, cols)
        # Derive both output names from the same base; the original built the
        # _trim name from the already-suffixed name, repeating the size suffix
        base = os.path.splitext(dest_file)[0] + "__" + str(rows) + "x" + str(cols)
        np.savetxt(base + ".sld", new_arr, delimiter=",", fmt='%d')      # comma-separated
        np.savetxt(base + "_trim.sld", new_arr, delimiter='', fmt='%d')  # delimiter-less VHDL input
import urllib.request

import numpy as np


class ImageAPI:
    # Downloads the images listed in a URL file and converts them to grayscale

    def __init__(self, image_file_path, base_path, bin_dir, image_dir):
        self.image_file_path = image_file_path
        self.base_path = base_path
        self.bin_dir = bin_dir
        self.image_dir = image_dir
        self.image_urls = self.load_image_urls(image_file_path)

    def load_image_urls(self, image_file_path):
        # One URL per line; strip the trailing newline so each URL is usable
        print("Image Url File : " + image_file_path)
        with open(image_file_path, "r") as lines:
            image_urls = [line.strip() for line in lines if line.strip()]
        self.image_urls = image_urls
        return self.image_urls

    def load_local_image_urls(self, image_file_path):
        # Same parsing as load_image_urls, but does not overwrite self.image_urls
        print("Image Url File : " + image_file_path)
        with open(image_file_path, "r") as lines:
            return [line.strip() for line in lines if line.strip()]

    def download_images(self):
        download_paths = []
        print("Downloading " + str(len(self.image_urls)) + " images ...")
        for count, image_url in enumerate(self.image_urls):
            dest_path = self.image_dir + "/" + "image_" + str(count) + ".jpg"
            urllib.request.urlretrieve(image_url, dest_path)  # Python 3 location of urlretrieve
            print("Downloaded " + image_url + " -> " + dest_path)
            download_paths.append(dest_path)
        return download_paths

    def rgb2gray(self, rgb):
        # Weighted sum of R, G and B (ITU-R BT.601 luma weights)
        return np.dot(rgb[..., :3], [0.299, 0.587, 0.114])
Code listings above: Filtering API (FilterAPI) and Image Pre-Processing API (ImageAPI).
Input Processing Framework
[Figure: Phase 2. Load saved images → crop images and generate binaries (user-defined size) → save cropped images and corresponding binaries → generate sliding-window binaries]
import os

import cv2
import numpy as np
from numpy import genfromtxt


class VhdlAPI:
    # Converts binarized images (0/255) to VHDL input (0/1) and back

    def __init__(self, source_bin_path='binaries/', source_bin_file='image_0_200x200_pad.min'):
        self.source_bin_path = source_bin_path
        self.source_bin_file = source_bin_file

    def clip_array(self, array):
        # Assumed implementation (the original helper is not shown): mirrors the
        # commented-out threshold, mapping values > 127 to 1 and the rest to 0
        return (array > 127).astype(int)

    def remove_single_padding(self, array):
        # Assumed implementation (not shown in the original): drop the 1-pixel border
        return array[1:-1, 1:-1]

    def bin2vhdl(self, output_path=''):
        source_file = os.path.join(self.source_bin_path, self.source_bin_file)
        output_file = output_path + os.path.splitext(self.source_bin_file)[0] + ".vhdlbin"
        print('Converting to VHDL Binary Format')
        array = genfromtxt(source_file, delimiter=',')
        array = self.clip_array(array)
        np.savetxt(output_file, array, delimiter=",", fmt='%d')

    def padbin2jpg(self, bin_file, output_path=''):
        # Save a padded 0/255 binary matrix as a JPEG image
        image_array = genfromtxt(bin_file, delimiter=',')
        file_name = os.path.splitext(os.path.basename(bin_file))[0]
        output_file = output_path + file_name + "_crop.jpg"
        cv2.imwrite(output_file, image_array.astype(np.uint8))

    def vhdlbin2bin(self, source_file):
        # Converts a VHDL-format bin file to the normal bin file generated by OpenCV:
        # the data range is 0 or 1 in the VHDL bin and 0 or 255 in OpenCV
        image_array = genfromtxt(source_file, delimiter=',')
        unpad_image_array = self.remove_single_padding(image_array)
        unpad_image_array[unpad_image_array == 1.0] = 255.0
        return unpad_image_array

    def vhdlbinimg2binimg(self, source_file, output_file):
        # Converts a VHDL binaries file to an OpenCV-style binarized image
        new_array = self.vhdlbin2bin(source_file)
        cv2.imwrite(output_file, new_array.astype(np.uint8))
import matplotlib.pyplot as plt
from skimage import io
from skimage.filters import threshold_otsu, threshold_niblack, threshold_sauvola

image_base = "images/crop/v5/"
image_file = "image_3_bin_400x400_crop.jpg"
image = io.imread(image_base + image_file, as_gray=True)

binary_global = image > threshold_otsu(image)     # one global (Otsu) threshold
# binary_global = image > threshold_mean(image)

window_size = 63
thresh_niblack = threshold_niblack(image, window_size=window_size, k=0.8)
thresh_sauvola = threshold_sauvola(image, window_size=window_size)
binary_niblack = image > thresh_niblack            # local thresholds: one value per pixel
binary_sauvola = image > thresh_sauvola

plt.figure(figsize=(8, 7))
panels = [(image, 'Original'), (binary_global, 'Global Threshold'),
          (binary_niblack, 'Niblack Threshold'), (binary_sauvola, 'Sauvola Threshold')]
for idx, (img, title) in enumerate(panels, start=1):
    plt.subplot(2, 2, idx)
    plt.imshow(img, cmap=plt.cm.gray)
    plt.title(title)
    plt.axis('off')
plt.show()
import cv2
import matplotlib.pyplot as plt
import numpy as np

image_base = "images/crop/v5/"
image_file = "image_3_bin_400x400_crop.jpg"
img = cv2.imread(image_base + image_file, 0)   # 0 = load as grayscale

ret, imgf = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)                   # fixed threshold
ret, imgf2 = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binary + Otsu
ret, imgf3 = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU)  # THRESH_BINARY is 0, so same as imgf2

plt.subplot(3, 1, 1), plt.imshow(img, cmap='gray')
plt.title('Original Noisy Image'), plt.xticks([]), plt.yticks([])
plt.subplot(3, 1, 2), plt.hist(img.ravel(), 256)
plt.axvline(x=ret, color='r', linestyle='dashed', linewidth=2)   # Otsu threshold
plt.title('Histogram'), plt.xticks([]), plt.yticks([])
plt.subplot(3, 1, 3), plt.imshow(imgf2, cmap='gray')
plt.title('Otsu thresholding'), plt.xticks([]), plt.yticks([])
plt.show()

# Difference between two variants; absdiff avoids uint8 wrap-around on subtraction
imgf = cv2.absdiff(imgf3, imgf2)
exp = "otsu-binary-otsu"
cv2.imwrite("binaries/test/" + exp + ".jpg", imgf)
binary_image_file_path = "binaries/test/otsu.jpg"
base_path = "binaries/test"
bin_dir = "binaries/test/" + exp + ".txt"
image_dir = "binaries/test"
experiment_base = image_dir
np.savetxt(bin_dir, imgf, delimiter=',', fmt='%d')
References: OpenCV, scikit-image
Code listings above: VHDL input API (VhdlAPI), binarization method 2 (scikit-image) and binarization method 3 (OpenCV).
File Formats and Data: Filter
[Figure: user-input filter → Pre-Processing API converts it to a vector format]
File Formats and Data: Image
1. The original image, converted to numeric form, contains values between 0 and 255.
2. The project needs only two levels, so we first convert the color image to grayscale; this is the grayscale output.
3. We still need a black-and-white image, which has only two values: 0 or 255.
Our Python ImageAPI reads...
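The color → grayscale → two-level steps above can be sketched in a few lines of NumPy. The grayscale weights are the ones used in `ImageAPI.rgb2gray`; the fixed threshold of 127 is the proposal-stage approach listed under Binarization Methodologies. The function name is hypothetical, and this is an illustration rather than the project's exact code path (which later switched to OpenCV thresholding).

```python
# Hypothetical helper: color image -> grayscale -> two-level (0/255) image.
import numpy as np


def rgb_to_binary_255(rgb, threshold=127):
    """Color (H x W x 3, values 0-255) -> grayscale -> 0/255 image."""
    gray = np.dot(rgb[..., :3], [0.299, 0.587, 0.114])   # same weights as ImageAPI.rgb2gray
    return np.where(gray > threshold, 255, 0).astype(np.uint8)


pixels = np.array([[[255, 255, 255], [255, 0, 0]],
                   [[0, 255, 0], [0, 0, 0]]], dtype=np.uint8)
print(rgb_to_binary_255(pixels).tolist())  # [[255, 0], [255, 0]]
```

Pure red maps to a gray level of about 76 and pure green to about 150, so the green weight dominates which saturated colors end up white.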
File Formats and Data: Binarized Image (0-255)
[Figure: user-input image in terms of numbers → Pre-Processing API adds padding (important for convolution, to avoid losing edge information)]
File Formats and Data: VHDL Binarized Image (0-1)
1. Low pixel values go to 0 and high values go to 1.
[Figure: Pre-Processing API converts to binary format]
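Scaling a 0/255 binarized image down to 0/1 and adding the one-pixel border can be sketched as follows. The padding value is not stated in the slides, so zero padding is an assumption here, and the function name is hypothetical.

```python
# Hypothetical helper: 0/255 binarized image -> padded 0/1 matrix for the VHDL stage.
import numpy as np


def to_vhdl_binary(binary_255, pad=1):
    """Map a 0/255 image to 0/1 and add `pad` pixels on every side (zero padding assumed)."""
    bits = (binary_255 > 127).astype(np.uint8)   # low values -> 0, high values -> 1
    return np.pad(bits, pad, mode="constant", constant_values=0)


out = to_vhdl_binary(np.array([[0, 255], [255, 0]], dtype=np.uint8))
print(out.shape)  # (4, 4)
```

For the 400x400 crop this produces a 402x402 matrix.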
File Formats and Data: Sliding Window VHDL Input
1. The input read by the VHDL file reader is a delimiter-less M x 9 matrix.
2. M is the number of sliding-window positions.
3. 9 is the filter vector size (we chose a 3x3 filter matrix).
4. Here M = 160,000.
[Figure: Pre-Processing API generates the sliding-windowed matrix]
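The delimiter-less layout is what `np.savetxt(..., delimiter='', fmt='%d')` produces in `FilterAPI.save_sliding_window`: each window becomes one line of N digits. A small round-trip sketch, with hypothetical helper names (the VHDL-side parser is not shown in the slides):

```python
# Hypothetical helpers: write and re-read the delimiter-less M x N window matrix.
import io

import numpy as np


def write_delimiterless(rows):
    """Write an M x N 0/1 matrix as M lines of N digits with no delimiter."""
    buf = io.StringIO()
    np.savetxt(buf, rows, delimiter="", fmt="%d")   # same call pattern as save_sliding_window
    return buf.getvalue()


def read_delimiterless(text):
    """Parse the delimiter-less text back into an M x N integer matrix."""
    return np.array([[int(ch) for ch in line] for line in text.splitlines() if line])


rows = np.array([[1, 1, 0, 1, 1, 0, 1, 1, 1],
                 [0, 1, 0, 1, 1, 1, 0, 1, 0]])
text = write_delimiterless(rows)
print(text.splitlines())  # ['110110111', '010111010']
```

Because every line has exactly N characters, a reader can consume one fixed-width line per window without splitting on separators.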
Different Windowing Sizes for VHDL Input Generation (9 bit to 64 bit)
[Figure: 4x4 window => 16 bit, 5x5 window => 25 bit, 6x6 window => 36 bit, 7x7 window => 49 bit, 8x8 window => 64 bit]
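How the window size drives both the bit width N and the row count M can be checked directly. The sketch assumes the 400x400 crop padded by one pixel on every side (402x402) and stride 1; under those assumptions a 3x3 window gives the M = 160,000 stated above, and an 8x8 window gives the 395x395 output used in the comparison module. The function name is hypothetical.

```python
# Hypothetical helper: window size -> (output side, number of windows M, bits per window N).

def window_stats(padded_size, k):
    """Square padded image, k x k window, stride 1."""
    side = padded_size - k + 1
    return side, side * side, k * k


for k in range(3, 9):
    side, m, n = window_stats(402, k)
    print(f"{k}x{k}: N = {n} bits, output {side}x{side}, M = {m}")
# 3x3 gives M = 160000 (400x400 output); 8x8 gives a 395x395 output
```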
Input Processing Summary
1. Download images from ImageNet.
2. Convert to grayscale.
3. Crop a 400x400 image portion.
4. Transform to binary format (0, 255).
5. Scale down to (0, 1) for XNOR-POP.
[Figure: 500 x 472 original → 500 x 472 grayscale → 400 x 400 crop → binarized with OpenCV (range of values: 0-255) → padded by 1 on all sides → scaled binary file (0, 1), with lower values converted to 0 and higher values to 1]
Input File Definitions
1. In this project we define a couple of file formats.
2. .sld files contain binarized images with sliding windowing applied, for ease of convolution computation. The binarized image files are converted to this format.
3. Filters use the .filter extension; each filter is a flattened matrix (that is, a vector), also for ease of computation.
4. The VHDL I/O component takes the .sld and .filter files as inputs and performs the convolution.
Theory: Simple Example
1. Let's convolve A = 10010 and B = 01111: A ⊗ B.
○ Expensive multiply-accumulate (MAC) method:
With 0/1 values: A ⊗ B = (1 * 0) + (0 * 1) + (0 * 1) + (1 * 1) + (0 * 1) = +1
Considering 0 as -1: A ⊗ B = (1 * -1) + (-1 * 1) + (-1 * 1) + (1 * 1) + (-1 * 1) = -3
○ XNOR-POP: let P = POPCOUNT(XNOR(A, B)) and N be the total number of bits; the scaled result is 2P - N.
P = POPCOUNT(XNOR(A, B)) = POPCOUNT(00010) = 1
Scaled result: 2P - N = 2*1 - 5 = -3
2. Now let's convolve A = 110110111 and B = 010111010: A ⊗ B.
○ Expensive multiply-accumulate (MAC) method:
With 0/1 values: A ⊗ B = (1 * 0) + (1 * 1) + (0 * 0) + (1 * 1) + (1 * 1) + (0 * 1) + (1 * 0) + (1 * 1) + (1 * 0) = +4
Considering 0 as -1: A ⊗ B = (1 * -1) + (1 * 1) + (-1 * -1) + (1 * 1) + (1 * 1) + (-1 * 1) + (1 * -1) + (1 * 1) + (1 * -1) = +1
○ XNOR-POP: interpreting (0, 1) as (-1, 1), P = POPCOUNT(XNOR(A, B)) = POPCOUNT(011110010) = 5
Scaled result: 2P - N = 2*5 - 9 = +1
In both examples the scaled XNOR-POP result equals the MAC result under the ±1 encoding, not the 0/1 one.
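Both worked examples can be reproduced mechanically. This short check computes the ±1 MAC, the XNOR string, P, and 2P - N for each pair; the helper names are hypothetical.

```python
# Hypothetical helpers reproducing the two worked examples.

def to_pm1(bit_string):
    """'10010' -> [1, -1, -1, 1, -1] (0 is treated as -1)."""
    return [1 if ch == "1" else -1 for ch in bit_string]


def mac(a, b):
    """Multiply-accumulate on the +/-1 encoded vectors."""
    return sum(x * y for x, y in zip(to_pm1(a), to_pm1(b)))


def xnor_pop(a, b):
    """Return (XNOR string, P, 2P - N) with P = POPCOUNT(XNOR(A, B))."""
    xnor = "".join("1" if x == y else "0" for x, y in zip(a, b))
    p = xnor.count("1")
    return xnor, p, 2 * p - len(a)


for a, b in [("10010", "01111"), ("110110111", "010111010")]:
    print(a, b, "MAC:", mac(a, b), "XNOR-POP:", xnor_pop(a, b))
# 10010 01111 MAC: -3 XNOR-POP: ('00010', 1, -3)
# 110110111 010111010 MAC: 1 XNOR-POP: ('011110010', 5, 1)
```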
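VHDL Code: XNOR-POP, Version 1
[Figure: VHDL source for XNOR-POP version 1]
VHDL Code: I/O Handling
[Figure: VHDL source for file input/output]
Convolution Results Comparison
[Figure: convolution output with multiply-accumulate (MAC, Python) and with XNOR-POP (VHDL)]
Enhanced Comparison
[Figure: input, VHDL output, and the difference between the input image and the convolved output after one layer]
[Figure: screen capture of the VHDL simulation for XNOR-POP]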
Deliverables @ Proposal Presentation
● Data files:
○ Input image
○ Preprocessed input image
○ Filter image
○ Convolved output image (XNOR-POP)
○ Hand-calculated output image (using MAC)
● Source code:
○ Python APIs for preprocessing (Vibhatha) (Completed)
○ VHDL code for file reading (I) (Vibhatha) (Completed)
○ VHDL code for XNOR-POP (Kadupitiya) (Completed)
○ VHDL code for file writing (O) (Kadupitiya) (Completed)
○ Python API for postprocessing (Vibhatha) (Completed)
○ Python API for hand-calculated output (Kadupitiya) (Completed)
After the Proposal Presentation
Binarization Methodologies
1. First, we implemented our own approach: choosing a fixed threshold (127) and substituting 0 or 1 depending on whether a pixel value is below or above the threshold. (Proposal presentation approach)
2. In the second approach, Niblack thresholding was used.
3. In the third approach, Sauvola thresholding was used.
4. For the second and third approaches, we used the scikit-image (SkImage) implementations of these thresholding mechanisms to generate the binary images.
5. For the fourth mechanism, we used Otsu binarization with a couple of variations:
a. Binary + Otsu binarization
b. Otsu binarization
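Otsu's method, used in the fourth approach, picks the single global threshold that best separates the gray-level histogram into two classes. The project called the OpenCV and scikit-image implementations; the from-scratch NumPy version below is only an illustration of the criterion (function name hypothetical), and library results can differ slightly in tie-breaking and binning.

```python
# Hypothetical from-scratch Otsu threshold, for illustration only.
import numpy as np


def otsu_threshold(gray):
    """Return the threshold t that Otsu's method picks for a uint8 image.

    Pixels with gray > t form one class and pixels with gray <= t the other;
    t maximizes the between-class variance w0 * w1 * (mu0 - mu1) ** 2.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()                 # normalized histogram
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = prob[: t + 1].sum()             # weight of the class <= t
        w1 = prob[t + 1 :].sum()             # weight of the class > t
        if w0 == 0 or w1 == 0:
            continue                         # one class is empty: not a valid split
        mu0 = (levels[: t + 1] * prob[: t + 1]).sum() / w0
        mu1 = (levels[t + 1 :] * prob[t + 1 :]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


gray = np.array([[10, 10, 200, 200]] * 4, dtype=np.uint8)
t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0)          # 0/255 output, like OpenCV's binary images
```

Unlike the fixed threshold of 127, the chosen value adapts to each image's histogram, which is why Otsu is a common default for bimodal images.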
Niblack and Sauvola Binarization Against Window Size
[Figure: Niblack and Sauvola outputs for window_size = 3, 5, 7, 9 and 63]
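The window size matters because both methods compute a separate threshold for every pixel from the mean m(x, y) and standard deviation s(x, y) of the window_size × window_size neighborhood around it. As documented for scikit-image's `threshold_niblack` and `threshold_sauvola`:

```latex
\begin{aligned}
T_{\mathrm{Niblack}}(x, y) &= m(x, y) - k\,s(x, y) \\
T_{\mathrm{Sauvola}}(x, y) &= m(x, y)\left[1 + k\left(\frac{s(x, y)}{R} - 1\right)\right]
\end{aligned}
```

Here R is the dynamic range of the standard deviation (scikit-image defaults it to half the image dtype range), and k defaults to 0.2 in both functions; the binarization script earlier passes k=0.8 to Niblack. A pixel becomes 1 when its value exceeds T(x, y). Small windows follow local texture and noise, while very large windows make m and s approach the global mean and standard deviation, so the result moves toward a single global threshold.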
SkImage Binarization
[Figure: pure binary version, Otsu + binary, and Otsu outputs]
Comparison of SkImage Binarization Outputs
We subtracted each binary output from the other binary outputs.
[Figure: difference images OTSU - (Binary + OTSU) and OTSU - Binary]
The binarization outputs from scikit-image match across each variation.
Binarization Method Conclusion
1. Based on these experiments, we decided to use the default OpenCV binarization mechanism.
2. The default binary mechanism in OpenCV provided the same binarization characteristics as the other approaches.
Box-Shaped Filters
Cross-Shaped Filters
XNOR-POP: Version 2
Popcount computed by adding the bits of the sequence together.
● This is much simpler than design 1; we discuss the synthesis results later.
● Can we achieve still more performance?
XNOR-POP: Version 3 (Gate-Level Code for Design 2)
We can use a series of full adders (3 inputs per block) and half adders (2 inputs per block).
● This is much more complex than design 2.
● Is it worth the effort?
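Functionally, design 3 counts ones using only full adders (3 inputs → sum, carry) and half adders (2 inputs → sum, carry). The Python model below reproduces that behavior by compressing bits of equal weight column by column; it is a behavioral sketch with hypothetical names, not the project's gate-level VHDL, and it says nothing about timing or LUT usage.

```python
# Hypothetical behavioral model of a full/half-adder popcount.

def half_adder(a, b):
    """2-input block: returns (sum, carry) with a + b == sum + 2 * carry."""
    return a ^ b, a & b


def full_adder(a, b, c):
    """3-input block: returns (sum, carry) with a + b + c == sum + 2 * carry."""
    s = a ^ b ^ c
    carry = (a & b) | (c & (a ^ b))
    return s, carry


def popcount_with_adders(bits):
    """Count the ones in `bits` using only full and half adders.

    Bits of equal weight are compressed column by column: three bits go
    through a full adder, a leftover pair through a half adder; each sum
    stays in its column and each carry moves to the next column.
    """
    columns = {0: list(bits)}
    weight, result = 0, 0
    while weight in columns:
        col = columns[weight]
        while len(col) > 1:
            if len(col) >= 3:
                s, carry = full_adder(col.pop(), col.pop(), col.pop())
            else:
                s, carry = half_adder(col.pop(), col.pop())
            col.append(s)
            columns.setdefault(weight + 1, []).append(carry)
        if col:
            result |= col[0] << weight   # one bit left: it is this output bit
        weight += 1
    return result


print(popcount_with_adders([1, 1, 0, 1, 1, 0, 1, 1, 1]))  # 7
```

Each adder preserves the arithmetic value of its inputs, so when every column is down to one bit, those bits spell out the count in binary.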
Synthesis Results
Design     Combinational delay   Slice LUTs used
Design 1   6.642 ns              47
Design 2   3.330 ns              25
Design 3   3.260 ns              21
● Design 2 is far better than design 1: roughly half the delay and about half the LUTs.
● Design 3 gives a slight further improvement over design 2 (about 2% less delay and 4 fewer LUTs).
● Design 2 was used for the 25-, 36-, 49- and 64-bit configurations because design 3 was too complex.
● These figures come from the Xilinx synthesis report for the 16-bit popcount design.
Convolution for Different Filter Sizes
● We implemented the convolution operation for 9-, 16-, 25-, 36-, 49- and 64-bit sizes.
● We tested our modules with two sets of filters:
○ Box filter (kernel): 3x3, 4x4, 5x5, 6x6, 7x7 and 8x8
○ Cross filter (kernel): 3x3, 4x4, 5x5, 6x6, 7x7 and 8x8
VHDL Simulations for 9-, 16-, 25-, 36-, 49- and 64-Bit Popcounts
[Figure: simulation results for the 3x3, 4x4, 5x5, 6x6, 7x7 and 8x8 windows]
Python Comparison Module (VHDL vs MAC Output)
import cv2
import matplotlib.pyplot as plt
import numpy as np


def conv(filter_vec, array):
    # MAC reference: one output bit per sliding-window row of `array`
    output1 = []
    for row in array:
        count = np.dot(row, filter_vec)        # multiply-accumulate over one window
        output1.append(1 if count > 0 else 0)  # keep only the sign (1 if positive, else 0)
    return output1


filterSize = 8
size = 395   # side length of the convolved output for this window size
tag = str(filterSize) + 'x' + str(filterSize)

vhdl_output_bin_file = 'output' + tag + '.txt'
inputImagevd = np.loadtxt(vhdl_output_bin_file)
inputImage1 = np.reshape(inputImagevd, (size, size))
plt.imshow(inputImage1, cmap='gray')
# plt.show()

input_bin_file = 'input_mac' + tag + '.txt'
filter_bin_file = 'filter_mac' + tag + '.txt'
output_bin_file = 'output_mac' + tag + '.txt'
inputImage2read = np.loadtxt(input_bin_file, delimiter=',')
inputFilter = np.loadtxt(filter_bin_file)
# Map the 0/1 encoding to -1/+1 before the multiply-accumulate
inputImage2read[inputImage2read == 0] = -1
inputFilter[inputFilter == 0] = -1
ar1 = np.array(conv(inputFilter, inputImage2read))
np.savetxt(output_bin_file, ar1, delimiter="", fmt='%d')
inputImage2 = np.reshape(ar1, (size, size))
plt.imshow(inputImage2, cmap='gray')
print("comparing " + tag + " image outputs : " + str(np.array_equal(inputImage1, inputImage2)))
# scipy.misc.imsave has been removed from SciPy; scale 0/1 to 0/255 and save with OpenCV
cv2.imwrite('image1_' + tag + '.jpg', (inputImage1 * 255).astype(np.uint8))
cv2.imwrite('image2_' + tag + '.jpg', (inputImage2 * 255).astype(np.uint8))
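8x8 Window Size Output Comparison
[Figure: VHDL and MAC outputs for the 8x8 window]
Convolution Operation Results for Box-Shaped Filter
[Figure: outputs for the 3x3, 4x4, 5x5, 6x6, 7x7 and 8x8 box filters]
Convolution Operation Results for Cross-Shaped Filter
[Figure: outputs for the 3x3, 4x4, 5x5, 6x6, 7x7 and 8x8 cross filters]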
Convolution Operation Conclusion
1. The effect of a filter can be understood from the number of ones and zeros it contains.
2. We observed distinct behavior depending on the percentage of ones in a filter.
3. When 44% or fewer of the filter entries are ones, inversion and dilation occur. When more than 44% are ones, erosion takes place.
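Part of this behavior follows from the output rule in the comparison module (0 → -1, output 1 when the sum is positive). Over a window that is entirely white (+1) or entirely black (-1), a filter with p ones out of N entries gives a sum of ±(2p - N), so the uniform-region output depends only on whether the ones ratio is above or below one half. The check below uses two hypothetical 3x3 filters, not the project's box or cross filters.

```python
# Hypothetical check of how the ones ratio affects uniform regions.
import numpy as np


def uniform_response(filter01, pixel):
    """Output bit (count > 0 rule) for a window whose pixels are all `pixel` (0 or 1).

    Uses the same 0 -> -1 mapping as the comparison module's conv().
    """
    f = np.where(np.asarray(filter01).ravel() == 1, 1, -1)
    x = 1 if pixel == 1 else -1
    count = int(np.sum(f * x))     # equals x * (2p - N), with p = number of ones
    return 1 if count > 0 else 0


def ones_percentage(filter01):
    return 100.0 * np.count_nonzero(filter01) / np.size(filter01)


four_ones = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]])   # 4/9 ones
five_ones = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 0]])   # 5/9 ones
for f in (four_ones, five_ones):
    print(round(ones_percentage(f)), uniform_response(f, 1), uniform_response(f, 0))
# 44 0 1  (white and black regions swap: inversion)
# 56 1 0  (uniform regions keep their value)
```

For a 3x3 filter the largest ones ratio below one half is 4/9 ≈ 44%, which lines up with the boundary reported above. Edges, where dilation and erosion appear, depend on the filter's shape and are not captured by this uniform-region check.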
Deliverables @ Final Presentation
● Data files:
○ Multiple input images
○ Preprocessed input images
○ Filter images
○ Convolved output image (XNOR-POP) (tested and enhanced in two different ways)
○ Hand-calculated output image (using MAC)
● Source code:
○ Dynamic Python APIs for preprocessing (Vibhatha) (Completed)
○ Python binarization mechanisms (Vibhatha) (Completed)
○ VHDL code for file reading (I) (Vibhatha) (Completed)
○ VHDL code for XNOR-POP (Kadupitiya) (Completed)
○ VHDL code for file writing (O) (Kadupitiya) (Completed)
○ VHDL enhanced popcount (Kadupitiya) (Completed)
○ Multiple window size comparison (Kadupitiya & Vibhatha) (Completed)
○ Python API for postprocessing (Vibhatha) (Completed)
○ Python API for hand-calculated output (Kadupitiya) (Completed)
Track Commits and Latest Releases of the Project
1. GitHub code base (until 18 April 2018): https://github.com/vibhatha/Computer-Eng-Project
2. Official GitHub code base: https://github.com/iuisefinalprojects/ice
3. The current released version is 4.0.0:
a. https://github.com/iuisefinalprojects/ice/tree/master/release-4.0.0
4. GitHub Pages: https://iuisefinalprojects.github.io/pages/projects/ice/ice-proj.html
References
[1] Rastegari, M., Ordonez, V., Redmon, J., & Farhadi, A. (2016, October). XNOR-Net: ImageNet classification using binary convolutional neural networks. In European Conference on Computer Vision (pp. 525-542). Springer, Cham.
[2] Jiang, L., Kim, M., Wen, W., & Wang, D. (2017, July). XNOR-POP: A processing-in-memory architecture for binary convolutional neural networks in Wide-IO2 DRAMs. In 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED) (pp. 1-6). IEEE.
[3] Kim, M., & Smaragdis, P. (2016). Bitwise neural networks. arXiv preprint arXiv:1601.06071.
[4] Zhang, C., Li, P., Sun, G., Guan, Y., Xiao, B., & Cong, J. (2015, February). Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 161-170). ACM.
Thank You

More Related Content

PDF
Optimizedfeedforwardnetworkofcnnwithxnorv5 180321130759
PPTX
Optimized feedforward network of cnn with xnor v5
PPTX
Working with images in matlab graphics
PDF
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
PDF
D0325016021
PDF
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
PDF
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
PDF
Modified approximate 8-point multiplier less DCT like transform
Optimizedfeedforwardnetworkofcnnwithxnorv5 180321130759
Optimized feedforward network of cnn with xnor v5
Working with images in matlab graphics
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
D0325016021
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
Modified approximate 8-point multiplier less DCT like transform

What's hot (20)

PDF
A flexible method to create wave file features
PDF
Image processing with matlab
PDF
T01022103108
PDF
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
PDF
Bt32444450
PPTX
Coin recognition using matlab
PDF
Techniques for effective and efficient fire detection from social media images
PDF
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
PDF
Paper id 37201520
PDF
Transfer Learning and Domain Adaptation (D2L3 2017 UPC Deep Learning for Comp...
PDF
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
PDF
A New Cross Diamond Search Motion Estimation Algorithm for HEVC
PDF
Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...
PDF
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
PDF
LeNet-5
PDF
Centernet
PPTX
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
PPTX
JPEG Image Compression
PDF
Ijetcas14 466
A flexible method to create wave file features
Image processing with matlab
T01022103108
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Bt32444450
Coin recognition using matlab
Techniques for effective and efficient fire detection from social media images
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
Paper id 37201520
Transfer Learning and Domain Adaptation (D2L3 2017 UPC Deep Learning for Comp...
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
A New Cross Diamond Search Motion Estimation Algorithm for HEVC
Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
LeNet-5
Centernet
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
JPEG Image Compression
Ijetcas14 466
Ad

Similar to Optimized Feedforward Network of CNN with Xnor Final Presentation (20)

PPTX
IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptx
PPTX
Image Encryption in java ppt.
PDF
OpenPOWER Workshop in Silicon Valley
PPTX
SeRanet introduction
PDF
Automated Image Captioning – Model Based on CNN – GRU Architecture
PDF
Metaphorical Analysis of diseases in Tomato leaves using Deep Learning Algori...
PPTX
Detection of medical instruments project- PART 1
PDF
AIML4 CNN lab256 1hr (111-1).pdf
PDF
Content Based Image Retrieval (CBIR)
PDF
Deep learning for molecules, introduction to chainer chemistry
PDF
物件偵測與辨識技術
PPTX
Presentation on BornoNet Research Paper and Python Basics
PPTX
Introduction to Machine Learning by MARK
PDF
Region-oriented Convolutional Networks for Object Retrieval
PDF
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
PPTX
Rapid object detection using boosted cascade of simple features
PPTX
Introduction to Convolutional Neural Networks (CNNs).pptx
PDF
Overview of Chainer and Its Features
PDF
NeuralProcessingofGeneralPurposeApproximatePrograms
PDF
Towards neuralprocessingofgeneralpurposeapproximateprograms
IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORK.P.pptx
Image Encryption in java ppt.
OpenPOWER Workshop in Silicon Valley
SeRanet introduction
Automated Image Captioning – Model Based on CNN – GRU Architecture
Metaphorical Analysis of diseases in Tomato leaves using Deep Learning Algori...
Detection of medical instruments project- PART 1
AIML4 CNN lab256 1hr (111-1).pdf
Content Based Image Retrieval (CBIR)
Deep learning for molecules, introduction to chainer chemistry
物件偵測與辨識技術
Presentation on BornoNet Research Paper and Python Basics
Introduction to Machine Learning by MARK
Region-oriented Convolutional Networks for Object Retrieval
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
Rapid object detection using boosted cascade of simple features
Introduction to Convolutional Neural Networks (CNNs).pptx
Overview of Chainer and Its Features
NeuralProcessingofGeneralPurposeApproximatePrograms
Towards neuralprocessingofgeneralpurposeapproximateprograms
Ad

Recently uploaded (20)

PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PPTX
Geodesy 1.pptx...............................................
PPTX
web development for engineering and engineering
PPTX
Internet of Things (IOT) - A guide to understanding
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
Sustainable Sites - Green Building Construction
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PDF
PPT on Performance Review to get promotions
PPTX
Safety Seminar civil to be ensured for safe working.
PDF
Digital Logic Computer Design lecture notes
PPTX
Current and future trends in Computer Vision.pptx
DOCX
573137875-Attendance-Management-System-original
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPT
Mechanical Engineering MATERIALS Selection
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPT
Project quality management in manufacturing
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
Geodesy 1.pptx...............................................
web development for engineering and engineering
Internet of Things (IOT) - A guide to understanding
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
UNIT 4 Total Quality Management .pptx
Sustainable Sites - Green Building Construction
Operating System & Kernel Study Guide-1 - converted.pdf
PPT on Performance Review to get promotions
Safety Seminar civil to be ensured for safe working.
Digital Logic Computer Design lecture notes
Current and future trends in Computer Vision.pptx
573137875-Attendance-Management-System-original
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Embodied AI: Ushering in the Next Era of Intelligent Systems
Mechanical Engineering MATERIALS Selection
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Project quality management in manufacturing

Optimized Feedforward Network of CNN with Xnor Final Presentation

  • 1. Optimized Feedforward Network of CNN with XNOR E 501 INTRO TO COMPUTER ENGINEERING Kadupitiya Kadupitige Vibhatha Abeykoon
  • 2. Introduction ● What is a Convolutional Networks and from what it is inspired? ○ Supervised deep learning algorithm: feedforward (testing), back propagation(training) ○ Biologically-inspired model (behavior of optic nerves in living creatures). ○ Mostly trained in GPU nodes and uses pretrained weights to do testing/prediction. ○ Even uses transfer learning to retrain the (classification layer: for new tasks) ● Applications ○ Most successful in analyzing visual imagery with spatial data(when positioning matters) ○ Sound processing, face detection,natural language processing, video scanning and many more .
  • 3. Versions of CNN ● Why Binary-Networks? ○ CNN Testing/prediction is what matters most for real time applications (Mobile devices, etc). ○ Feature learning layer : Convolution, Normalization(Relu) and Pooling ○ Classification layer : Fully Connected, Regularization(dropping), Activation(softmax). ○ Out of all these convolutional layers consume > 90% of total test time. ● What is a XNOR-Network? ○ Both weights(filters) and inputs are binary. ● Our project: ○ Implementing a XNOR version of Convolution layer ○ Dot product = > XNOR followed by popcount.
  • 4. Objective ● Our project claims on the fact of creating a dynamic framework which can aid to perform convolution operations for different sizes of kernels in hardware based Feedforward Convolutional Neural Network using XNOR-POP
  • 5. Scope of the Project 1. Data Processing Engine to download images from ImageNet Database 2. Binarization mechanism to transform a color image into a black and white image without losing the features in the original image. 3. Applying vivid binarization mechanisms to transform data into VHDL friendly vector mode. 4. Dynamic Windowing Capability to use a user defined filter size to pre-process the downloaded images to form a vectorized binary file with 9 bit, 16 bit 25 bit, 49 bit, and 64 bit configurations. 5. Data Feeding Engine to hardware layer in forms of Files and dynamical experiments can be run for vivid images. 6. Improving the performance of XNOR Pop count under three different criteria. 7. XNOR Pop Count implementation on VHDL for 9 bit, 16 bit 25 bit, 49 bit, and 64 bit configurations. 8. VHDL output generation and image reconstruction after the convolutions operation on the input data set. 9. Comparison of vivid approaches against final outputs obtained by the convolution in form of statistics and images. 10. Source and content publication via github.io
  • 6. Literature Review ● XNOR-Net: ImageNet classification using binary convolutional neural networks ○ Rastegari et al. [1] studied Binary-Weight-Networks and XNOR-Networks for training CNNs. ● XNOR-POP: A Processing-in-Memory Architecture for Binary Convolutional Neural Networks in Wide-IO2 DRAMs ○ Jiang et al. [2] improved CNN test (inference) performance by using XNOR-POP. ○ Performance improved by 4× to 11× with small hardware and power overhead. ● Bitwise Neural Networks ○ Kim et al. [3]: the Bitwise Neural Network (BNN) is especially suitable for resource-constrained environments. ● Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks ○ Zhang et al. [4] quantitatively analyzed the computing throughput and required memory bandwidth of any potential CNN design on an FPGA platform using roofline analysis. ● According to the literature, XNOR-Networks can replace: ○ The expensive multiply-accumulate (MAC) operation found in convolution. ○ MAC => XNOR operation followed by a bitcount (popcount).
  • 7. Methodology: 1. The program initializes with a set of Python calls: a. Download ImageNet sample data. b. Load the images and convert them to a binary format. c. Save the images in matrix format as CSV files. 2. There are two input files: one contains a filter and the other contains the image matrix in binary format. 3. The filter is saved as a vector (1xN). 4. The image is saved with padding and windowing added. 5. At the convolution stage, the binary matrix file and the filter are used to compute the convolution value. 6. The XNORPOP component performs the XNOR popcount calculation on the VHDL side. 7. On the Python side, the windowing is reversed to reconstruct the image after the convolution. 8. Python is used only for pre-processing and post-processing of data; the core concept is implemented in VHDL.
  • 8. Input Processing Framework (Phase 1) [Diagram: image URLs (user input) => download from ImageNet => save images locally]
  • 9. Filtering API and Image Pre-Processing API
```python
import urllib.request

import numpy as np
from numpy import genfromtxt


class FilterAPI:
    """Filtering API: converts filter matrices and padded images into vector form."""
    filter_path = 'filter/'
    filter_file = '8x8_cross.filter'

    def __init__(self, filter_path=filter_path, filter_file=filter_file):
        # Store the arguments (the slide version printed them but kept the class defaults)
        self.filter_path = filter_path
        self.filter_file = filter_file
        print("Filter Path : " + filter_path)
        print("Filter File : " + filter_file)

    def mat2vec(self):
        # Flatten an NxN filter matrix (CSV) into a vector and save it as *_vector.filter
        file_path = self.filter_path + self.filter_file
        matrix = genfromtxt(file_path, delimiter=',')
        vector = matrix.flatten()
        print("Converted Vector : ")
        print(vector)
        dest_file = file_path.split(".")[0] + "_vector.filter"
        np.savetxt(dest_file, vector, delimiter=",")

    def sliding_window(self, arr, window_size=3):
        # One row per window position: the flattened window_size x window_size patch
        rows, cols = arr.shape
        print(rows, cols)
        sliders = []
        for i in range(rows - window_size + 1):
            for j in range(cols - window_size + 1):
                window = arr[i:i + window_size, j:j + window_size]
                sliders.append(window.flatten())
        return np.array(sliders)

    def save_sliding_window(self, window_size=3,
                            source_file='binaries/image_3_200x200_pad.pad',
                            dest_file='binaries/sliding/file1_slidingwindow'):
        arr = genfromtxt(source_file, delimiter=',')
        new_arr = self.sliding_window(arr, window_size)
        rows, cols = new_arr.shape
        print(rows, cols)
        base = dest_file.split(".")[0] + "__" + str(rows) + "x" + str(cols)
        # Comma-delimited .sld file, plus a delimiter-less _trim.sld copy for the VHDL reader
        np.savetxt(base + ".sld", new_arr, delimiter=",", fmt='%d')
        np.savetxt(base + "_trim.sld", new_arr, delimiter='', fmt='%d')


class ImageAPI:
    """Image Pre-Processing API: loads image URLs, downloads images, converts to grayscale."""

    def __init__(self, image_file_path, base_path, bin_dir, image_dir):
        self.image_urls = self.load_image_urls(image_file_path)
        self.base_path = base_path
        self.bin_dir = bin_dir
        self.image_dir = image_dir

    def load_image_urls(self, image_file_path):
        # One URL per line; strip the trailing newline so urlretrieve gets a clean URL
        print("Image Url File : " + image_file_path)
        with open(image_file_path, "r") as lines:
            self.image_urls = [line.strip() for line in lines if line.strip()]
        return self.image_urls

    def load_local_image_urls(self, image_file_path):
        print("Image Url File : " + image_file_path)
        with open(image_file_path, "r") as lines:
            return [line.strip() for line in lines if line.strip()]

    def download_images(self):
        download_paths = []
        print("Downloading " + str(len(self.image_urls)) + " images ...")
        for count, image_url in enumerate(self.image_urls):
            print(image_url)
            dest_path = self.image_dir + "/image_" + str(count) + ".jpg"
            urllib.request.urlretrieve(image_url, dest_path)  # Python 3 location of urlretrieve
            print("Downloaded " + dest_path)
            download_paths.append(dest_path)
        return download_paths

    def rgb2gray(self, rgb):
        # Weighted sum of the R, G, B channels (ITU-R BT.601 luma weights)
        return np.dot(rgb[..., :3], [0.299, 0.587, 0.114])
```
  • 10. Input Processing Framework (Phase 2) [Diagram: load saved images => crop image and generate binaries (user-defined size) => save cropped images and corresponding binaries => generate binaries with sliding window added]
  • 11. VhdlInput API, Binarization Method 2 (scikit-image) and Binarization Method 3 (OpenCV)

VhdlInput API:
```python
import os

import cv2
import numpy as np
from numpy import genfromtxt


class VhdlAPI:
    """Converts between OpenCV binaries (0/255) and VHDL binaries (0/1)."""
    source_bin_file = 'image_0_200x200_pad.min'
    source_bin_path = 'binaries/'

    def __init__(self, source_bin_path=source_bin_path, source_bin_file=source_bin_file):
        self.source_bin_path = source_bin_path
        self.source_bin_file = source_bin_file

    def clip_array(self, array):
        # Assumed implementation (not shown on the slide): values above 127 -> 1, others -> 0
        return np.where(array > 127, 1, 0)

    def remove_single_padding(self, array):
        # Assumed implementation (not shown on the slide): drop the 1-pixel border
        return array[1:-1, 1:-1]

    def bin2vhdl(self, output_path=''):
        fnames = self.source_bin_file.split(".")
        print(fnames)
        source_file = self.source_bin_path + self.source_bin_file
        output_file = output_path + fnames[0] + ".vhdlbin"
        print('Converting to VHDL Binary Format')
        array = genfromtxt(source_file, delimiter=',')
        array = self.clip_array(array)
        np.savetxt(output_file, array, delimiter=",", fmt='%d')

    def padbin2jpg(self, bin_file, output_path=''):
        image_array = genfromtxt(bin_file, delimiter=',')
        # File name without directories or extension (the slide used a fixed path index [4])
        file_name = os.path.basename(bin_file).split(".")[0]
        output_file = output_path + file_name + "_crop.jpg"
        # scipy.misc.imsave has been removed from SciPy; write an 8-bit image with OpenCV
        cv2.imwrite(output_file, image_array.astype(np.uint8))

    def vhdlbin2bin(self, source_file):
        # Converts a VHDL-format bin file (0 or 1) to the OpenCV bin format (0 or 255)
        image_array = genfromtxt(source_file, delimiter=',')
        unpad_image_array = self.remove_single_padding(image_array)
        unpad_image_array[unpad_image_array == 1.0] = 255.0
        return unpad_image_array

    def vhdlbinimg2binimg(self, source_file, output_file):
        # Converts a VHDL binaries file to an OpenCV-style binarized image
        new_array = self.vhdlbin2bin(source_file)
        cv2.imwrite(output_file, new_array.astype(np.uint8))
```

Binarization Method 2:
```python
import matplotlib.pyplot as plt
from skimage import io
from skimage.filters import threshold_niblack, threshold_otsu, threshold_sauvola

image_base = "images/crop/v5/"
image_file = "image_3_bin_400x400_crop.jpg"
image = io.imread(image_base + image_file)

binary_global = image > threshold_otsu(image)
# binary_global = image > threshold_mean(image)

window_size = 63  # local window for Niblack/Sauvola; must be odd
thresh_niblack = threshold_niblack(image, window_size=window_size, k=0.8)
thresh_sauvola = threshold_sauvola(image, window_size=window_size)
binary_niblack = image > thresh_niblack
binary_sauvola = image > thresh_sauvola

plt.figure(figsize=(8, 7))
panels = [(image, 'Original'), (binary_global, 'Global Threshold'),
          (binary_niblack, 'Niblack Threshold'), (binary_sauvola, 'Sauvola Threshold')]
for idx, (data, title) in enumerate(panels, start=1):
    plt.subplot(2, 2, idx)
    plt.imshow(data, cmap=plt.cm.gray)
    plt.title(title)
    plt.axis('off')
plt.show()
```

Binarization Method 3:
```python
import cv2
import matplotlib.pyplot as plt
import numpy as np

image_base = "images/crop/v5/"
image_file = "image_3_bin_400x400_crop.jpg"
img = cv2.imread(image_base + image_file, 0)  # 0 = load as grayscale
ret, imgf = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)  # fixed threshold 127
ret, imgf2 = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
ret, imgf3 = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU)  # THRESH_BINARY is 0, so same as above

plt.subplot(3, 1, 1), plt.imshow(img, cmap='gray')
plt.title('Original Noisy Image'), plt.xticks([]), plt.yticks([])
plt.subplot(3, 1, 2), plt.hist(img.ravel(), 256)
plt.axvline(x=ret, color='r', linestyle='dashed', linewidth=2)  # Otsu threshold
plt.title('Histogram'), plt.xticks([]), plt.yticks([])
plt.subplot(3, 1, 3), plt.imshow(imgf2, cmap='gray')
plt.title('Otsu thresholding'), plt.xticks([]), plt.yticks([])
plt.show()

# Difference between the two Otsu variants; absdiff avoids uint8 wrap-around
exp = "otsu-binary-otsu"
imgf = cv2.absdiff(imgf3, imgf2)
cv2.imwrite("binaries/test/" + exp + ".jpg", imgf)
np.savetxt("binaries/test/" + exp + ".txt", imgf, delimiter=',', fmt='%d')
```
References: OpenCV, scikit-image
  • 12. File Formats and Data: Filter [Figure: user-input filter => Pre-Processing API => converted to vector format]
  • 13. File Formats and Data: Image 1. The original image, converted to numeric form, contains values between 0 and 255. 2. For the project we need only two levels, so we first convert the color image to grayscale; this is the grayscale output. 3. We still need a black-and-white image with only two values, 0 or 255. Our Python ImageAPI reads...
  • 14. File Formats and Data: Binarized Image (0-255) [Figure: user-input image in terms of numbers => Pre-Processing API adds padding (important for convolution, to avoid losing edge information)]
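A minimal NumPy sketch of the padding step, assuming a constant border of width 1 with value 0; the slides only state "padding by 1 on all sides", so the border value is an assumption, and `pad_image` is a hypothetical helper rather than the project's own code.

```python
import numpy as np


def pad_image(image: np.ndarray, pad: int = 1, value: int = 0) -> np.ndarray:
    """Surround a 2-D binary image with a constant border of width `pad`.

    Assumption: the border value is 0; the slides only say "padding by 1".
    """
    return np.pad(image, pad_width=pad, mode="constant", constant_values=value)


image = np.full((400, 400), 255, dtype=np.uint8)  # stand-in for a 400x400 binarized crop
padded = pad_image(image)
print(padded.shape)  # (402, 402)
```

With a 3x3 filter the padded 402x402 image yields 400x400 window positions, so the output keeps the original size and the edge pixels still contribute to the convolution.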
  • 15. File Formats and Data: VHDL Binarized Image (0-1) 1. Image data points with small values go to 0 and those with higher values go to 1. [Figure: Pre-Processing API converts to binary format]
  • 16. File Formats and Data: Sliding Window VHDL Input 1. The input to the VHDL file reader is a delimiter-less Mx9 matrix. 2. M is the number of sliding-window positions. 3. 9 is the filter vector size (we chose a 3x3 filter). 4. Here M = 160,000 (a 400x400 image padded by 1 gives 400 x 400 window positions). [Figure: Pre-Processing API generates the sliding-windowed matrix]
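The Mx9 layout can be sketched with NumPy's `sliding_window_view` (available from NumPy 1.20), assuming a padded 0/1 image. This is an alternative to the explicit loop in `FilterAPI.sliding_window`, not the project's code, and the helper names `window_matrix` and `to_vhdl_lines` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_matrix(padded: np.ndarray, k: int = 3) -> np.ndarray:
    """Return an (M, k*k) matrix: one flattened k x k window per row, in row-major order."""
    windows = sliding_window_view(padded, (k, k))  # shape (H-k+1, W-k+1, k, k)
    return windows.reshape(-1, k * k)


def to_vhdl_lines(matrix: np.ndarray) -> list:
    """Format each row as a delimiter-less bit string, e.g. '010111010'."""
    return ["".join(str(int(v)) for v in row) for row in matrix]


padded = np.zeros((402, 402), dtype=np.uint8)  # a 400x400 image padded by 1
m = window_matrix(padded, 3)
print(m.shape)  # (160000, 9)
```

The row order matches the nested `i`/`j` loop of `sliding_window`, so either version produces the same file contents.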
  • 17. Different Windowing Sizes for VHDL Input Generation (9 bit - 64 bit) ○ 4x4 window => 16 bit ○ 5x5 window => 25 bit ○ 6x6 window => 36 bit ○ 7x7 window => 49 bit ○ 8x8 window => 64 bit
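For a k x k window, each VHDL input row holds k^2 bits, and a padded n x n image yields (n - k + 1)^2 rows. A small sketch, assuming the 400x400 crop padded by 1 on all sides (402x402); `window_stats` is a hypothetical helper.

```python
def window_stats(image_side: int, pad: int, k: int) -> tuple:
    """Return (bits per window, output side, number of windows M) for a k x k filter."""
    padded_side = image_side + 2 * pad
    out_side = padded_side - k + 1
    return k * k, out_side, out_side * out_side


for k in range(3, 9):
    bits, side, m = window_stats(400, 1, k)
    print(f"{k}x{k}: {bits:2d} bit, output {side}x{side}, M = {m}")
```

For the 8x8 filter this gives a 395x395 output, which matches the `size = 395` reshape in the Python comparison module.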
  • 18. Input Processing Summary [Diagram: 1. Download images from ImageNet (500 x 472) 2. Convert to grayscale (500 x 472, range of values 0-255) 3. Crop a 400x400 image portion 4. Transform to binary format (0, 255) with OpenCV 5. Scale down for XNOR-POP to (0, 1): lower values become 0 and higher values become 1, then pad by 1 on all sides to produce the binary file]
  • 19. Input File Definitions 1. In this project we define a couple of file formats. 2. .sld files contain binarized images with sliding windowing applied, for ease of convolution computation; the binarized image files are converted to this format. 3. Filters use the .filter extension; each filter is a flattened matrix (a vector), also for ease of computation. 4. The VHDL I/O component takes the .sld and .filter files as inputs and performs the convolution.
  • 20. Theory: Simple Example
1. Let's convolve A = 10010 and B = 01111: A ⊗ B.
○ Expensive multiply-accumulate (MAC) method: A ⊗ B = (1 * 0) + (0 * 1) + (0 * 1) + (1 * 1) + (0 * 1) = +1. Considering 0 as -1: A ⊗ B = (1 * -1) + (-1 * 1) + (-1 * 1) + (1 * 1) + (-1 * 1) = -3
○ XNOR-POP: with P = POPCOUNT(XNOR(A, B)) and N the total number of bits, the scaled result is 2P - N. POPCOUNT(XNOR(A, B)) = POPCOUNT(00010) = 1. Scaled result: 2P - N = 2*1 - 5 = -3, the same as the MAC with 0 treated as -1.
2. Now let's convolve A = 110110111 and B = 010111010: A ⊗ B.
○ Expensive multiply-accumulate (MAC) method: A ⊗ B = (1 * 0) + (1 * 1) + (0 * 0) + (1 * 1) + (1 * 1) + (0 * 1) + (1 * 0) + (1 * 1) + (1 * 0) = +4. Considering 0 as -1: A ⊗ B = (1 * -1) + (1 * 1) + (-1 * -1) + (1 * 1) + (1 * 1) + (-1 * 1) + (1 * -1) + (1 * 1) + (1 * -1) = +1
○ XNOR-POP (replacing (0, 1) => (-1, 1)): POPCOUNT(XNOR(A, B)) = POPCOUNT(011110010) = 5. Scaled result: 2P - N = 2*5 - 9 = +1
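Why 2P - N recovers the bipolar dot product, written out for N-bit vectors (a short derivation; a_i and b_i denote the bipolar values of the bits of A and B, and P is the popcount used above):

```latex
\begin{aligned}
&a_i, b_i \in \{-1, +1\}, \qquad
a_i b_i =
\begin{cases}
+1, & a_i = b_i \ (\text{XNOR bit} = 1)\\
-1, & a_i \neq b_i \ (\text{XNOR bit} = 0)
\end{cases}\\
&P = \operatorname{POPCOUNT}\big(\operatorname{XNOR}(A, B)\big)\\
&A \otimes B = \sum_{i=1}^{N} a_i b_i = P \cdot (+1) + (N - P) \cdot (-1) = 2P - N
\end{aligned}
```

For the second example, P = 5 and N = 9 give 2P - N = 1, the same value as the MAC with 0 treated as -1.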
  • 21. VHDL Code: XNOR-POP Version 1
  • 22. VHDL Code: I/O Handling
  • 23. Convolution Results Comparison [Figure panels: with multiply-accumulate (MAC, Python); with XNOR-POP (VHDL)]
  • 24. Enhanced Comparison [Figure panels: input; VHDL output; difference between the input image and the convolved output after 1 layer]
  • 25. Screen capture of VHDL simulation for XNOR-POP
  • 26. Deliverables @ Proposal Presentation ● Data files: ○ Input image ○ Preprocessed input image ○ Filter image ○ Convolved output image (XNOR-POP) ○ Hand-calculated output image (using MAC) ● Source codes: ○ Python APIs for preprocessing (Vibhatha) (Completed) ○ VHDL code for file reading (I) (Vibhatha) (Completed) ○ VHDL code for XNOR-POP (Kadupitiya) (Completed) ○ VHDL code for file writing (O) (Kadupitiya) (Completed) ○ Python API for post-processing (Vibhatha) (Completed) ○ Python API for hand-calculated output (Kadupitiya) (Completed)
  • 28. Binarization Methodologies 1. First we implemented our own idea of choosing a fixed threshold such as 127 and assigning 0 or 1 depending on whether a pixel is below or above the threshold (proposal presentation approach). 2. In the second approach, Niblack thresholding was used. 3. In the third approach, Sauvola thresholding was used. 4. In the second and third approaches we used the scikit-image implementations of these thresholding mechanisms to generate the binary images. 5. For the fourth mechanism, we used Otsu binarization along with a couple of variations: a. Binary + Otsu binarization b. Otsu binarization
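To make the Otsu step concrete, here is a minimal pure-NumPy sketch of Otsu's method, which picks the threshold that maximizes the between-class variance over all 256 gray levels. It is not the OpenCV or scikit-image implementation the project used, and `otsu_threshold` and `binarize` are hypothetical names.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold t for a uint8 image; pixels > t form the foreground."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.int64)
    levels = np.arange(256)
    w0 = np.cumsum(hist)            # pixel count at or below each candidate t
    w1 = w0[-1] - w0                # pixel count above t
    s0 = np.cumsum(hist * levels)   # intensity sum at or below t
    s1 = s0[-1] - s0                # intensity sum above t
    valid = (w0 > 0) & (w1 > 0)     # both classes must be non-empty
    if not valid.any():             # constant image: nothing to separate
        return int(gray.flat[0])
    m0 = s0[valid] / w0[valid]      # class means
    m1 = s1[valid] / w1[valid]
    between = w0[valid] * w1[valid] * (m0 - m1) ** 2  # proportional to between-class variance
    return int(levels[valid][np.argmax(between)])


def binarize(gray: np.ndarray, t: int) -> np.ndarray:
    """Map pixels above t to 1 and the rest to 0 (the 0/1 format used for XNOR-POP)."""
    return (gray > t).astype(np.uint8)
```

The proposal-stage method 1 corresponds to `binarize(gray, 127)`; Otsu replaces the fixed 127 with a threshold computed from the image histogram.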
  • 29. Niblack and Sauvola Binarization Against Window Size [Figure panels: window_size = 3; window_size = 5]
  • 30. Niblack and Sauvola Binarization Against Window Size [Figure panels: window_size = 7; window_size = 9]
  • 31. Niblack and Sauvola Binarization Against Window Size [Figure panel: window_size = 63]
  • 32. SkImage Binarization [Figure panels: pure binary version; Otsu + binary; Otsu]
  • 33. Comparison of SkImage Binarization Outputs: We subtracted each binary output from the other binary outputs [Figure panels: OTSU - (Binary + OTSU); OTSU - Binary]. The binarization outputs from SkImage match across each thresholding variation.
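Subtracting two uint8 binary images directly wraps around (0 - 255 becomes 1), so a safer way to compare outputs is to work in a wider integer type and count the disagreeing pixels. A small sketch; `compare_binaries` is a hypothetical helper.

```python
import numpy as np


def compare_binaries(a: np.ndarray, b: np.ndarray) -> tuple:
    """Return (number of differing pixels, absolute difference image) for two binary images."""
    diff = np.abs(a.astype(np.int16) - b.astype(np.int16)).astype(np.uint8)
    return int(np.count_nonzero(diff)), diff


x = np.array([[0, 255], [255, 0]], dtype=np.uint8)
y = np.array([[0, 255], [0, 0]], dtype=np.uint8)
count, diff = compare_binaries(x, y)
print(count)  # 1
```

A count of zero means the two binarization variants produced identical images.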
  • 34. Binarization Method Conclusion 1. Based on these experiments, we concluded to use the OpenCV default binarization mechanism. 2. The default binary mechanism in OpenCV provided the same binarization characteristics as the other approaches.
  • 37. XNOR-POP: Version 2 (adding the bit sequence together) ● This is much simpler than design 1; synthesis results are discussed later. ● Can we achieve even more performance?
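A behavioral Python model of the version-2 idea, summing every bit of the XNOR result, assuming the result is given as an integer of known width. It mirrors only the arithmetic, not the VHDL on the slide, and `popcount_sum` is a hypothetical name.

```python
def popcount_sum(word: int, width: int) -> int:
    """Version-2 style popcount: add every bit; zero bits leave the sum unchanged."""
    total = 0
    for i in range(width):
        total += (word >> i) & 1  # one 1-bit addition per input position
    return total


print(popcount_sum(0b011110010, 9))  # 5
```

In hardware the same chain of 1-bit additions becomes one adder expression that the synthesis tool can optimize as a whole.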
  • 38. XNOR-POP: Version 3 (gate-level code for design 2) We can use a series of full adders (3 inputs per block) and half adders (2 inputs per block). ● This is much more complex than design 2. ● Is it worth the effort?
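The full-adder/half-adder structure can be modeled at bit level as column compression: a full adder turns three bits of one weight into a sum bit of that weight and a carry bit of the next weight, and a half adder does the same for two bits. The sketch below is a generic Wallace/Dadda-style reduction; the exact gate arrangement of the slide's VHDL is not reproduced here, so the arrangement and the helper names are assumptions.

```python
def full_adder(a: int, b: int, c: int) -> tuple:
    """Return (sum, carry) for three 1-bit inputs."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def half_adder(a: int, b: int) -> tuple:
    """Return (sum, carry) for two 1-bit inputs."""
    return a ^ b, a & b


def popcount_adder_tree(bits: list) -> int:
    """Count ones using only full and half adders on weighted bit columns."""
    columns = {0: list(bits)}  # weight w -> bits worth 2**w each
    weight = 0
    while weight in columns:
        col = columns[weight]
        while len(col) > 1:
            if len(col) >= 3:
                s, c = full_adder(col.pop(), col.pop(), col.pop())
            else:
                s, c = half_adder(col.pop(), col.pop())
            col.append(s)                                   # sum stays at this weight
            columns.setdefault(weight + 1, []).append(c)    # carry moves up one weight
        weight += 1
    # Each column now holds at most one bit: read them as a binary number
    return sum(col[0] << w for w, col in columns.items() if col)
```

Every adder preserves the weighted sum (a + b + c = s + 2 * carry), which is why the final columns spell out the popcount.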
  • 39. Synthesis Results (16-bit popcount)

| Name of Design | Combinational Delay | Number of Slice LUTs Used |
|---|---|---|
| Design 1 | 6.642 ns | 47 |
| Design 2 | 3.330 ns | 25 |
| Design 3 | 3.260 ns | 21 |

● Design 2 is far better than design 1: it roughly halves both the delay and the LUT count. ● Design 3 improves slightly on design 2 (about 2% less delay and 4 fewer LUTs). ● Design 2 was used for the 25, 36, 49 and 64-bit configurations because design 3 was too complex. ● We checked the synthesis report for the popcount design (16-bit) using Xilinx.
  • 40. Convolution for Different Filter Sizes ● We implemented the convolution operation for 9, 16, 25, 36, 49 and 64-bit sizes. ● We tested our modules with two sets of filters. ○ Box filter (kernel): 3x3, 4x4, 5x5, 6x6, 7x7 and 8x8 ○ Cross filter (kernel): 3x3, 4x4, 5x5, 6x6, 7x7 and 8x8
  • 41. VHDL Simulations for 9, 16, 25, 36, 49 and 64-bit Popcounts [Figure panels: 3x3, 4x4, 5x5, 6x6, 7x7, 8x8]
  • 42. Python Comparison Module (VHDL vs MAC Output): 8x8 Window Size Output Comparison
```python
import matplotlib.pyplot as plt
import numpy as np


def conv(filter_vec, array):
    # For each sliding-window row: bipolar multiply-accumulate, then 1 if positive else 0
    output1 = []
    for row in array:
        count = 0
        for j in range(len(row)):
            count = count + row[j] * filter_vec[j]
        output1.append(1 if count > 0 else 0)
    return output1


filterSize = 8
tag = str(filterSize) + 'x' + str(filterSize)
size = 395  # output side for the 8x8 filter (consistent with 400x400 padded by 1: 402 - 8 + 1)

# XNOR-POP output produced by the VHDL simulation
vhdl_output_bin_file = 'output' + tag + '.txt'
inputImagevd = np.loadtxt(vhdl_output_bin_file)
inputImage1 = np.reshape(inputImagevd, (size, size))
plt.imshow(inputImage1, cmap=plt.get_cmap('gray'))
# plt.show()

# Hand-calculated MAC output in Python
input_bin_file = 'input_mac' + tag + '.txt'
filter_bin_file = 'filter_mac' + tag + '.txt'
output_bin_file = 'output_mac' + tag + '.txt'
inputImage2read = np.loadtxt(input_bin_file, delimiter=',')
inputFilter = np.loadtxt(filter_bin_file)
# Bipolar notation: map 0 -> -1 in both the windows and the filter
inputImage2read[inputImage2read == 0] = -1
inputFilter[inputFilter == 0] = -1
ar1 = np.array(conv(inputFilter, inputImage2read))
np.savetxt(output_bin_file, ar1, delimiter="", fmt='%d')
inputImage2 = np.reshape(ar1, (size, size))
plt.imshow(inputImage2, cmap=plt.get_cmap('gray'))

print("comparing " + tag + " image outputs : " + str(np.array_equal(inputImage1, inputImage2)))
# scipy.misc.imsave has been removed from SciPy; plt.imsave writes the 0/1 arrays as grayscale images
plt.imsave('image1_' + tag + '.jpg', inputImage1, cmap='gray')
plt.imsave('image2_' + tag + '.jpg', inputImage2, cmap='gray')
```
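The loop in `conv` can also be written as a single matrix-vector product, which is convenient for the larger windows. A sketch assuming the same bipolar (-1/+1) window rows and filter vector; `conv_vectorized` is a hypothetical name and the random data only stands in for the real `.sld` and `.filter` contents.

```python
import numpy as np


def conv_vectorized(filter_vec: np.ndarray, windows: np.ndarray) -> np.ndarray:
    """Bipolar MAC for every window row at once, then 1 where the sum is positive, else 0."""
    return (windows @ filter_vec > 0).astype(int)


rng = np.random.default_rng(0)
windows = rng.choice([-1, 1], size=(1000, 9))  # stand-in for bipolar sliding-window rows
filter_vec = rng.choice([-1, 1], size=9)       # stand-in for a bipolar 3x3 filter vector
out = conv_vectorized(filter_vec, windows)
print(out.shape)  # (1000,)
```

Because XNOR-POP yields 2P - N, the condition "sum > 0" is equivalent to P > N/2, so the same decision could be taken directly on the popcount.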
  • 43. Convolution Operation Results for Box-Shaped Filter [Figure panels: 3x3, 4x4, 5x5, 6x6, 7x7, 8x8]
  • 44. Convolution Operation Results for Cross-Shaped Filter [Figure panels: 3x3, 4x4, 5x5, 6x6, 7x7, 8x8]
  • 45. Convolution Operation Conclusion 1. The effect of a filter can be understood from its number of ones and zeros. 2. Depending on the percentage of ones in a filter, we observed distinct behaviors. 3. When the percentage of ones is less than or equal to 44%, inversion and dilation occur; when it is greater than 44%, erosion takes place.
  • 46. Deliverables @ Final Presentation ● Data files: ○ Multiple input images ○ Preprocessed input images ○ Filter images ○ Convolved output image (XNOR-POP) (tested and enhanced in two different ways) ○ Hand-calculated output image (using MAC) ● Source codes: ○ Dynamic Python APIs for preprocessing (Vibhatha) (Completed) ○ Python binarization mechanisms (Vibhatha) (Completed) ○ VHDL code for file reading (I) (Vibhatha) (Completed) ○ VHDL code for XNOR-POP (Kadupitiya) (Completed) ○ VHDL code for file writing (O) (Kadupitiya) (Completed) ○ VHDL enhanced popcount (Kadupitiya) (Completed) ○ Multiple window size comparison (Kadupitiya & Vibhatha) (Completed) ○ Python API for post-processing (Vibhatha) (Completed) ○ Python API for hand-calculated output (Kadupitiya) (Completed)
  • 47. Track Commits and Latest Releases of the Project 1. GitHub code base (until 2018/April/18): https://github.com/vibhatha/Computer-Eng-Project 2. GitHub official code base: https://github.com/iuisefinalprojects/ice 3. The current released version is 4.0.0: a. https://github.com/iuisefinalprojects/ice/tree/master/release-4.0.0 4. GitHub.io: https://iuisefinalprojects.github.io/pages/projects/ice/ice-proj.html
  • 48. References
[1] Rastegari, M., Ordonez, V., Redmon, J., & Farhadi, A. (2016, October). XNOR-Net: ImageNet classification using binary convolutional neural networks. In European Conference on Computer Vision (pp. 525-542). Springer, Cham.
[2] Jiang, L., Kim, M., Wen, W., & Wang, D. (2017, July). XNOR-POP: A processing-in-memory architecture for binary convolutional neural networks in Wide-IO2 DRAMs. In 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED) (pp. 1-6). IEEE.
[3] Kim, M., & Smaragdis, P. (2016). Bitwise neural networks. arXiv preprint arXiv:1601.06071.
[4] Zhang, C., Li, P., Sun, G., Guan, Y., Xiao, B., & Cong, J. (2015, February). Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 161-170). ACM.

Editor's Notes

  • #19: The border vanishes from the 3rd image in the bottom row because of scaling to 0s and 1s. This causes some small details of the picture to be lost. Also, for a CNN classifying whether an image is a cat or not, the color doesn't matter; only the boundary conditions matter.
  • #21: Why bipolar notation (-1/+1) is used instead of 1/0: otherwise we can't replace MAC with XNOR-POP. Image values can be treated as positive or negative (-127 to +128 instead of 0-255). Taking the sign of all inputs and weights gives bipolar notation. XNOR-POP then represents -1 as 0, performs the operation, and finally converts the value to 2P - N.
  • #24: Averaging filter=> should remove the noise and smooth the image. Similar Laplacian filter
  • #26: Averaging filter=> should remove the noise and smooth the image. Similar Laplacian filter
  • #38: Another way to achieve our purpose is to add all the bits in our input. Think of it as a sequence of 16 one-bit adders. The zeros in the input vector do not change the sum, so effectively the sum equals the number of ones in the vector. Version 2 is much simpler than design 1, and the synthesis results showed that it is much faster and uses fewer LUTs.
  • #39: As you can see, the code for Design 3 looks much more complicated. Is it worth the effort?
  • #40: The trick is to click whatever module you want to get the slice count for and set it as the top level module by going to Source->Set as top level module. Once you do that, under the Processes pane (making sure the module is still highlighted in the Sources pane) go to the Synthesize - XST and double click 'View Synthesis Report'. The number of slices for that module is then listed in that report. Design Summary