The International Journal of Computational Science, Information Technology and Control Engineering
(IJCSITCE) Vol.6, No.1, January 2019
DOI: 10.5121/ijcsitce.2019.6101
CONTRAST OF RESNET AND DENSENET BASED ON
THE RECOGNITION OF SIMPLE FRUIT DATA SET
Ding Tianye
Hangzhou Foreign Language School, Hangzhou, Zhejiang, China
ABSTRACT
In this paper, a fruit image data set is used to compare the efficiency and accuracy of two widely used
convolutional neural networks, ResNet and DenseNet, in recognizing 50 different kinds of fruit. The
experiment uses the ResNet-34 and DenseNet_BC-121 (with bottleneck layer) structures. The mathematical
principles, experiment details and experiment results are explained through comparison.
KEYWORDS
Deep learning, Object recognition, Computer vision, Image processing, Convolutional Neural Networks.
1. INTRODUCTION
The aim of this paper is to compare the learning efficiency and convergence rate of ResNet and
DenseNet_BC through comparable experiments. The fruit image data set [1] used in the experiment
consists of 100*100 images with a black or white background and without noise interference. It
includes 50 different kinds of fruit, with 25,100 images in total as training data and 12,700 images
as testing data. In the experiment, the only consideration is how many training steps are needed for
the neural network to reach an accuracy of more than 98%.
Figure 1. One of the images in the image data set [1]
The first step in the training process is to pre-process the input images, using the Python module
OpenCV to turn the images into RGB channel images and dividing each channel value by 255 so that the
resulting value is in the range of 0 to 1. Images are combined into input batches of 50, and the loss
between each prediction and the actual value is determined by computing the cross entropy,
$H(y, \hat{y}) = -\sum_i y_i \log \hat{y}_i$, where $y$ is the actual label and $\hat{y}$ is the
predicted probability. After every 10 input iterations the recognition accuracy is computed and output
as a way of visualization, and the two networks are trained for 2,000 iterations before the testing
data is input. Each convolutional layer uses ReLU (Rectified Linear Unit) as its activation function,
with a layer of batch normalization between each code block and a layer of dropout with a keep
probability of 0.5 to avoid overfitting. The training of both neural networks uses the Adam optimizer
with an epsilon of 0.1, and the learning rate is set by exponential decay,
$\eta = \eta_0 \cdot \gamma^{\lfloor t / T \rfloor}$, where $\eta_0$ is the initial learning rate,
$\gamma$ the decay rate, $t$ the current training step and $T$ the decay step. The decay step is the
same as the total number of training steps, staircase is set to True (so the exponent is rounded
down), the initial learning rate is 1e-3, and the decay rate is 0.96. The hardware used in the
experiment is a GTX 1070 GPU with 5.0 GB of allocated memory, and the framework used is TensorFlow [2],
released by Google in 2015.
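The three numerical steps described above (pixel normalization, cross entropy and staircase exponential decay) can be sketched in plain Python as follows. This is only an illustrative sketch using the standard library, not the code used in the experiment; the function names and the small clipping constant `eps` are assumptions introduced here.

```python
import math


def normalize_pixel(value, max_value=255):
    """Scale one 8-bit channel value into the range 0 to 1."""
    return value / max_value


def cross_entropy(one_hot, probs, eps=1e-12):
    """Cross entropy between a one-hot label and predicted probabilities.

    eps (an assumption of this sketch) avoids log(0) for zero probabilities.
    """
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))


def exponential_decay(initial_lr, step, decay_steps, decay_rate, staircase=True):
    """Learning rate after `step` steps: initial_lr * decay_rate ** (step / decay_steps).

    With staircase=True the exponent is rounded down to an integer.
    """
    exponent = step / decay_steps
    if staircase:
        exponent = math.floor(exponent)
    return initial_lr * decay_rate ** exponent


if __name__ == "__main__":
    print(normalize_pixel(128))
    print(cross_entropy([0, 1, 0], [0.1, 0.8, 0.1]))
    # Settings quoted in the text: 1e-3 initial rate, 0.96 decay, 2,000 decay steps.
    print(exponential_decay(1e-3, 1999, 2000, 0.96))
```

With the settings quoted in the text (decay step equal to the 2,000 training steps and staircase enabled), the exponent stays at 0 for every step before step 2,000, so this formula returns the initial rate throughout that range.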
2. DEEP LEARNING
Deep learning neural networks usually consist of multiple layers, where the input of each layer is
the output of the previous layer. Compared with shallow learning, deep learning is usually regarded
as a great step from Weak AI towards Strong AI.
Convolutional Neural Networks (CNNs) can be categorized as supervised learning within deep learning.
This means that training requires not only the input data but also the actual (labelled) data, which
is used to calculate the loss between the predicted value and the actual value. Back propagation then
distributes the total loss over the neurons as gradients, and the optimizer uses these gradients to
update the parameters inside each neuron. After enough data has been fed to the network and it has
been trained for enough steps, the network tends towards a local or global optimum and achieves good
enough performance on the particular problem it is trained for.
CNNs are most widely used in the field of image recognition. A paper [3] supports the idea that
convolutional neural networks have better performance in object recognition, and advanced image
recognition networks such as ResNet and DenseNet are built on the basis of CNNs. That is one reason
why the fruit data set [1] was chosen as training data. The pooling layers in a CNN reinforce and
compress the features in each feature map, reducing the possibility that features disappear.
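How a pooling layer compresses a feature map while keeping its strongest responses can be illustrated with a small sketch. The text does not say which pooling type is meant here, so 2*2 max pooling with stride 2 is an assumption of this example, and `max_pool_2x2` is a hypothetical helper name.

```python
def max_pool_2x2(feature_map):
    """2*2 max pooling with stride 2 on a feature map given as a list of rows.

    A trailing odd row or column is ignored (no padding).
    """
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


if __name__ == "__main__":
    fmap = [
        [1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8],
    ]
    # Each 2*2 window is replaced by its largest value, halving both dimensions.
    print(max_pool_2x2(fmap))
```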
3. TWO CONVOLUTIONAL NETWORKS
In the neural networks previously used for computer vision and image recognition, increasing the
depth of the network brings vanishing or exploding gradients and OOM (Out of Memory) errors, which
reduce the accuracy of the network. This has become a difficulty that many teams and individuals are
trying to overcome.
2.1. ResNet
Figure 2. The parameters of different ResNet structures; the 34-layer structure was used
In 2015, the introduction of ResNet (Residual Network), which won first place in the classification
task of the ImageNet competition, greatly reduced this problem through a method called residual
transmission: simple code blocks pass the input of the previous layer, together with its output, as a
residual that forms the input of the next layer, as shown in Figure 3. This provides a channel through
which the input of each layer changes only in dimension and reaches the next layer without further
processing. The principle of ResNet is to stack simple code blocks and connect them by such channels.
This not only reduces the complexity of the neural network but also reduces the memory occupied by
the session while running, which greatly lowers the probability of OOM and improves the efficiency of
learning and the rate of convergence. For these reasons many programmers are keen on using ResNet in
supervised learning.
Figure 3. Visible structure of ResNet
The formula of ResNet is briefly introduced in the paper “Densely Connected Convolutional Networks” [4]
as “$x_\ell = H_\ell(x_{\ell-1}) + x_{\ell-1}$”, where $x_{\ell-1}$ represents the input of the current
layer $\ell$ and $H_\ell$ represents a non-linear transformation. The input of the current layer is thus
a residual formed by combining the input and output of the previous layer, and it is transmitted to
the next layer in the same way. Each code block in ResNet contains a residual block, used for
convolution, and an identity block, used to transmit the gradient directly. The resulting output of
each block is $H(x) = F(x) + x$, where $F(x)$ represents the residual part. Theoretically, as the depth
of the network increases, the output of some of the residual blocks gradually tends to 0, but the
channel through the identity block still transmits the gradient, so the vanishing gradient problem
does not occur and the network can stay in its optimal state.
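The effect of the identity channel on the gradient can be illustrated with a deliberately simplified scalar model. Assume (for this sketch only) that every layer is a single multiplication by a weight w: a plain stack then has the derivative w^L after L layers, while a residual stack, where each layer adds its input back, has (1 + w)^L. The function names are hypothetical.

```python
def plain_gradient(w, depth):
    """d(x_L)/d(x_0) for a plain stack of scalar layers x_l = w * x_{l-1}."""
    grad = 1.0
    for _ in range(depth):
        grad *= w
    return grad


def residual_gradient(w, depth):
    """Same derivative when each layer adds its input back:
    x_l = w * x_{l-1} + x_{l-1}."""
    grad = 1.0
    for _ in range(depth):
        grad *= (w + 1.0)
    return grad


if __name__ == "__main__":
    depth = 34
    for w in (0.0, 0.1):
        print(w, plain_gradient(w, depth), residual_gradient(w, depth))
```

In this toy model, a residual part that tends to 0 (w = 0) still leaves a gradient of exactly 1 through the identity channel, whereas the plain stack's gradient collapses towards 0 for small weights.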
When two adjacent blocks differ in convolutional kernel depth, an identity block with a
transformation operation changes the dimension of the input tensor, so that the dimensions of the
input tensor and the output tensor stay the same on the y-axis.
Except for the very early stage and the fully connected layers, all convolution layers of the
ResNet-34 in the experiment use a kernel size of 3*3. The numbers of output feature maps in the four
stages are 64, 128, 256 and 512 respectively, which ensures that computing speed and learning
efficiency are not affected by an overly large number of output feature maps. Normalizing the RGB
values into the range of 0 to 1 can reduce the computational complexity during training, and the
labels of the actual values are processed into one-hot codes.
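Processing a label into a one-hot code can be sketched as below. The default of 50 classes follows the number of fruit kinds stated in the paper; the function name and the zero-based class index are assumptions of this example.

```python
def one_hot(label, num_classes=50):
    """Return a list of length num_classes with 1.0 at index `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in the range [0, num_classes)")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


if __name__ == "__main__":
    code = one_hot(3)
    print(len(code), code[:5])
```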
During training, the output accuracy of the network does not increase much in the first 190 batch
iterations, staying at about 0% to 6%. After 190 iterations the recognition accuracy rises rapidly,
and it first reaches about 98% when the training step reaches about 630 to 640. In further training,
the accuracy stays at about 94% to 100%. After 2,000 training iterations, the output testing accuracy
is about 96% to 98%.
2.2. DenseNet
Figure 4. The parameters of different DenseNet structures, from “Densely Connected Convolutional
Networks” [4]; the structure used was DenseNet_BC-121
The CVPR 2017 oral paper “Densely Connected Convolutional Networks” [4], by Gao Huang, Zhuang Liu,
Kilian Q. Weinberger and Laurens van der Maaten, first introduced a completely new neural network
structure, DenseNet, which focuses on solving a problem of ResNet (the gradient passing through the
residual might be impeded as the network gets deeper).
Figure 5. The visible structure of DenseNet, from “Densely Connected Convolutional Networks” [4]
The paper describes the brief formula of DenseNet as “$x_\ell = H_\ell([x_0, x_1, \ldots, x_{\ell-1}])$”.
In the experiment this function was achieved by stacking the outputs of all previous layers along the
3rd axis as the input of the current layer. A growth rate of 32 was adopted, which means each layer
outputs 32 feature maps, together with a drop rate of 0.5, which means that in the transition layer
the number of output feature maps is reduced to half of the initial number. Theoretically, because
DenseNet is narrower than ResNet, a DenseNet with the same number of layers has fewer parameters. The
bottleneck layer and the small number of output feature maps (every layer usually has a small
convolutional kernel depth) also save memory while the session is running. Because every layer has
direct access to the inputs and output gradients of each other layer, the vanishing gradient problem
should not arise as the depth of the network increases.
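The way the growth rate and the drop rate shape the width of the network can be traced with simple arithmetic. The sketch below assumes the DenseNet-121 layout from the DenseNet paper [4], with dense blocks of 6, 12, 24 and 16 layers and an initial convolution producing twice the growth rate (64) feature maps; these figures are not stated in this text, and the function name is hypothetical.

```python
def densenet_channels(block_layers=(6, 12, 24, 16), growth_rate=32, drop=0.5):
    """Trace the number of feature maps after each dense block and transition layer."""
    channels = 2 * growth_rate  # assumed width of the initial convolution
    trace = []
    for index, layers in enumerate(block_layers):
        # Every layer in a dense block appends growth_rate new feature maps.
        channels += layers * growth_rate
        trace.append(("dense", channels))
        # A transition layer follows every dense block except the last one.
        if index < len(block_layers) - 1:
            channels = int(channels * drop)
            trace.append(("transition", channels))
    return trace


if __name__ == "__main__":
    for name, channels in densenet_channels():
        print(name, channels)
```

Under these assumptions the width never exceeds 1,024 feature maps, which is the sense in which the text calls DenseNet narrow.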
In the training process, the assumption was that DenseNet would train faster and take fewer steps to
converge to a higher accuracy than ResNet. However, the result was not as good as expected. The time
taken for the network to reach a given output accuracy was much longer than for ResNet-34, which
might be caused by the depth of the network. Furthermore, by about training step 200 there was no
obvious improvement in the training accuracy, and even after 2,000 training iterations the accuracy
had increased by only about 10% to 35%. Several attempts were then made with the bottleneck layer
removed, to keep the largest number of feature maps, but the training result was still not as good as
expected.
2.3. Code Segments
The experiment used the TensorFlow [2] framework, imported with import tensorflow as tf. The slim
module was also imported, with import tensorflow.contrib.slim as slim, to simplify the complex
procedure of defining a complete convolutional layer.
In both experiments, the tensors of weights and biases are initialized as below:
def weights(shape):
    # Tensor of the given shape drawn from a truncated normal with standard deviation 0.01
    init = tf.truncated_normal(shape, stddev=0.01)
    return tf.Variable(init)

def biases(shape):
    # Tensor of the given shape filled with the constant 0.02
    init = tf.constant(0.02, shape=shape)
    return tf.Variable(init)
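What the truncated normal initialization of the weights does can be shown without TensorFlow. TensorFlow's truncated normal re-draws any value lying more than two standard deviations from the mean; the sketch below reproduces that rule with the standard library. The function name and the `seed` parameter are assumptions introduced for this example.

```python
import random


def truncated_normal(count, stddev=0.01, mean=0.0, seed=None):
    """Draw `count` normal samples, re-drawing any sample that lies
    more than two standard deviations from the mean."""
    rng = random.Random(seed)
    values = []
    while len(values) < count:
        sample = rng.gauss(mean, stddev)
        if abs(sample - mean) <= 2 * stddev:
            values.append(sample)
    return values


if __name__ == "__main__":
    samples = truncated_normal(5, seed=0)
    print(samples)
```

With a standard deviation of 0.01, every initial weight therefore lies between -0.02 and 0.02.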
In the ResNet-34 experiment, the convolutional block with the transformation used in each stage is
defined as below:
def conv(inputs, out_size, k_size):
    x_short_cut = inputs
    # Main path: strided convolution, batch normalization and ReLU, then a second convolution.
    # tf.layers.batch_normalization is assumed for the "batch normalize on axis=3" step.
    conv1 = slim.conv2d(inputs, out_size, k_size, stride=2, padding='SAME', activation_fn=None)
    conv1_output = tf.nn.relu(tf.layers.batch_normalization(conv1, axis=3))
    conv2 = slim.conv2d(conv1_output, out_size, k_size, stride=1, padding='SAME', activation_fn=None)
    conv2_output = tf.layers.batch_normalization(conv2, axis=3)
    # Shortcut path: strided convolution so that both tensors have the same shape before adding
    input_conv = slim.conv2d(x_short_cut, out_size, k_size, stride=2, padding='SAME', activation_fn=None)
    input_reshape = tf.layers.batch_normalization(input_conv, axis=3)
    output = tf.nn.relu(input_reshape + conv2_output)
    return output
The identity block, which includes the channel for passing the residual of the input data to the
output, is defined as below:
def identity(inputs, out_size, k_size):
    x_short_cut = inputs
    # Two stride-1 convolutions keep the shape, so the input can be added back unchanged.
    # tf.layers.batch_normalization is assumed for the "batch normalize on axis=3" step.
    conv1 = slim.conv2d(inputs, out_size, k_size, stride=1, padding='SAME', activation_fn=None)
    conv1_output = tf.nn.relu(tf.layers.batch_normalization(conv1, axis=3))
    conv2 = slim.conv2d(conv1_output, out_size, k_size, stride=1, padding='SAME', activation_fn=None)
    conv2_BN = tf.layers.batch_normalization(conv2, axis=3)
    conv2_output = tf.nn.relu(conv2_BN + x_short_cut)
    return conv2_output
In the DenseNet_BC-121 experiment, the dense block with a growth rate of 32 and a bottleneck layer is
defined as below:
def dense(inputs, growth_rate=32, internal_layer=True, keep_prob=0.5):
    # keep_prob defaults to the keep probability of 0.5 stated in the introduction.
    # tf.layers.batch_normalization and tf.nn.dropout are assumed for the two named steps.
    x_input = tf.layers.batch_normalization(inputs, axis=3)
    x_relu = tf.nn.relu(x_input)
    # Bottleneck layer: 1*1 convolution producing 4 * growth_rate feature maps
    conv1 = slim.conv2d(x_relu, growth_rate * 4, [1, 1], stride=1, padding='SAME', activation_fn=None)
    conv1_dropout = tf.nn.dropout(conv1, keep_prob=keep_prob)
    conv1_relu = tf.nn.relu(tf.layers.batch_normalization(conv1_dropout, axis=3))
    conv2 = slim.conv2d(conv1_relu, growth_rate, [3, 3], stride=1, padding='SAME', activation_fn=None)
    conv2_dropout = tf.nn.dropout(conv2, keep_prob=keep_prob)
    if internal_layer:
        # Stack the block input and the new feature maps along the channel axis.
        # tf.concat is used because the two tensors differ in depth.
        output = tf.concat([inputs, conv2_dropout], axis=3)
    else:
        output = conv2_dropout
    return output
The transition block used to compress the dimension of the input tensor is defined as below:
def transition(inputs, drop=0.5):
    x_input = tf.layers.batch_normalization(inputs, axis=3)
    # 1*1 convolution reduces the number of feature maps by the factor drop
    out_size = int(x_input.get_shape().as_list()[3] * drop)
    conv = slim.conv2d(x_input, out_size, [1, 1], stride=1, padding='SAME', activation_fn=None)
    # Average pooling halves the height and width
    h1 = slim.avg_pool2d(conv, [2, 2], stride=2, padding='SAME')
    return h1
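The shape produced by the transition block can be checked by hand. With 'SAME' padding, TensorFlow computes each output spatial size as the input size divided by the stride, rounded up. The sketch below applies that rule to the 1*1 convolution (stride 1) and the 2*2 average pooling (stride 2) described above; the helper names are hypothetical.

```python
import math


def same_out_size(size, stride):
    """Output length of a 'SAME'-padded convolution or pooling: ceil(size / stride)."""
    return math.ceil(size / stride)


def transition_shape(height, width, channels, drop=0.5):
    """Shape (height, width, channels) after a 1*1 convolution that keeps
    int(channels * drop) feature maps, followed by stride-2 pooling."""
    out_channels = int(channels * drop)
    return same_out_size(height, 2), same_out_size(width, 2), out_channels


if __name__ == "__main__":
    # A 100*100 input with 256 feature maps, as an illustrative starting point.
    print(transition_shape(100, 100, 256))
```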
2.4. Experiment details
Parameter tuning in the experiment focused mainly on the learning rate of the neural networks; the
default decay rate in the exponential decay was 0.96. The tuning method follows A Practical Guide [5].
The learning rate was tuned within the range 1E-4 to 0.7, starting at 0.7 and observing the rate of
convergence through the output accuracy. With a bigger learning rate the accuracy rose more easily to
about 95%, but the prediction then started to move back and forth around the local optimum and hardly
made further improvement. The learning rate was then changed to 0.3, but the problem still occurred.
If the initial learning rate was too small, however, convergence took a long time from the very
beginning of the training process. Ultimately, after a limited number of experiments, 0.003 turned
out to be the most efficient learning rate for ResNet.
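The two failure modes described above (moving back and forth with a large learning rate, and slow convergence with a small one) can be reproduced on a toy problem. The sketch below runs plain gradient descent on f(x) = x^2, which is not the network or optimizer used in the experiment; the learning rates in the example are chosen for this toy function only, and the function name is hypothetical.

```python
def steps_to_converge(lr, start=1.0, tol=1e-3, max_steps=10000):
    """Gradient descent on f(x) = x**2 (gradient 2x).

    Returns (steps needed until |x| < tol, number of times x changed sign).
    """
    x = start
    sign_flips = 0
    for step in range(1, max_steps + 1):
        new_x = x - lr * 2 * x
        if new_x * x < 0:
            sign_flips += 1  # the estimate jumped over the optimum at 0
        x = new_x
        if abs(x) < tol:
            return step, sign_flips
    return max_steps, sign_flips


if __name__ == "__main__":
    for lr in (0.9, 0.3, 0.003):
        print(lr, steps_to_converge(lr))
```

In this toy setting the largest rate overshoots the optimum on every step, the smallest one approaches it without overshooting but needs far more steps, and the intermediate rate converges fastest.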
2.5 Limitations
On the one hand, the neural network structures used in the experiment were programmed by myself,
based on my understanding of the papers published by the developers of these two neural networks,
rather than taken from the existing code provided by the developers. On the other hand, one factor
that may cause the slow convergence of DenseNet compared with ResNet is the depth of the network: at
121 layers it is much deeper than the 34-layer ResNet and has more connections.
4. CONCLUSION
Through the comparison of the two experiments, ResNet-34 clearly performs better than
DenseNet_BC-121 in training on and adapting to the fruit data set [1]: it learns more efficiently,
fits the data set much faster and takes less time to return a resulting accuracy. It can be concluded
that ResNet performs much better than DenseNet on simple data sets. ResNet avoids vanishing gradients
through residual passing, so that the current layer receives both the input tensor and gradient of
the previous layer and its output tensor and gradient. In DenseNet, however, each layer needs to
process the input and output tensors and gradients of all previous layers. When applied to simple
data sets, this might increase the computational complexity and might also blur the processed output
tensor and gradient of the previous layer. This resulted in only a limited improvement in performance
after 2,000 training iterations, and might be one reason why the DenseNet structure is currently used
less than the ResNet structure. According to one of the developers of DenseNet [6], DenseNet
structures are usually narrower but much deeper, so that training DenseNet is more parameter-efficient
but less memory-efficient and speed-efficient, which is mainly why the training process was slow and
lacked improvement in accuracy.
ACKNOWLEDGEMENTS
It gives me great pleasure to acknowledge the support and help of Professor Juntao Ye. I would like
to thank my parents for supporting me when I met difficulties, and everyone who gave me assistance
when I ran into trouble.
REFERENCES
[1] Fruits-360, Version: 2018.07.01.0. https://www.kaggle.com/moltean/fruits. Last visited on 12.07.2018.
[2] TensorFlow. https://www.tensorflow.org. Last visited on 15.07.2018.
[3] M. Liang, X. Hu, Recurrent Convolutional Neural Network for Object Recognition, IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), Boston, pp. 3367-3375, 2015.
[4] G. Huang, Z. Liu, K. Weinberger, L. Maaten, Densely Connected Convolutional Networks, 2017.
[5] keitakurita, Learning Rate Tuning in Deep Learning: A Practical Guide, 2018.
http://mlexplained.com/2018/01/29/learning-rate-tuning-in-deep-learning-a-practical-guide/. Last
visited on 15.01.2019.
[6] Z. Liu, answering why DenseNet requires more memory in training, 2017.
https://www.reddit.com/r/MachineLearning/comments/67fds7/d_how_does_densenet_compare_to_resnet_and/.
Last visited on 12.01.2019.
AUTHORS
Ding Tianye, year 11 senior high student in Hangzhou Foreign Language School,
has a month of summer program experience in National Laboratory of Pattern
Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences
(CASIA). Currently the president of Developer Association, studying machine
learning.

More Related Content

PDF
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...
PDF
D028018022
PDF
Objective Evaluation of a Deep Neural Network Approach for Single-Channel Spe...
PDF
A survey research summary on neural networks
PDF
Ijetcas14 527
PDF
STUDY OF TASK SCHEDULING STRATEGY BASED ON TRUSTWORTHINESS
PDF
On Text Realization Image Steganography
PDF
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...
D028018022
Objective Evaluation of a Deep Neural Network Approach for Single-Channel Spe...
A survey research summary on neural networks
Ijetcas14 527
STUDY OF TASK SCHEDULING STRATEGY BASED ON TRUSTWORTHINESS
On Text Realization Image Steganography
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...

What's hot (20)

DOCX
Digit recognition using mnist database
PDF
A Survey of Deep Learning Algorithms for Malware Detection
PDF
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
PDF
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...
PDF
ANN based STLF of Power System
PDF
Bp34412415
PDF
Mobile Network Coverage Determination at 900MHz for Abuja Rural Areas using A...
PDF
Data clustering using kernel based
PPTX
Convolutional neural network from VGG to DenseNet
PDF
PDF
Hidden Layer Leraning Vector Quantizatio
PDF
Knowledge distillation deeplab
PDF
Efficient design of feedforward network for pattern classification
PDF
A novel secure image steganography method based on chaos theory in spatial do...
PDF
Reconfiguration layers of convolutional neural network for fundus patches cla...
PDF
F017533540
PDF
3ways to improve semantic segmentation
PDF
Neural network based image compression with lifting scheme and rlc
PPTX
Image Compression Using Neural Network
Digit recognition using mnist database
A Survey of Deep Learning Algorithms for Malware Detection
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...
ANN based STLF of Power System
Bp34412415
Mobile Network Coverage Determination at 900MHz for Abuja Rural Areas using A...
Data clustering using kernel based
Convolutional neural network from VGG to DenseNet
Hidden Layer Leraning Vector Quantizatio
Knowledge distillation deeplab
Efficient design of feedforward network for pattern classification
A novel secure image steganography method based on chaos theory in spatial do...
Reconfiguration layers of convolutional neural network for fundus patches cla...
F017533540
3ways to improve semantic segmentation
Neural network based image compression with lifting scheme and rlc
Image Compression Using Neural Network
Ad

Similar to CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA SET (20)

PDF
IRJET- Spatial Context Preservation and Propagation - Layer States in Convolu...
PPTX
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
PPTX
convolutional_neural_networks.pptx
PPTX
CNN Arcitecture Implementation Resnet CNN-RESNET
PDF
PDF
ResNet basics (Deep Residual Network for Image Recognition)
PDF
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
PDF
ImageNet Classification with Deep Convolutional Neural Networks
PDF
DLD meetup 2017, Efficient Deep Learning
PDF
CNNs: from the Basics to Recent Advances
PDF
Handwritten Digit Recognition using Convolutional Neural Networks
PDF
Finding the best solution for Image Processing
PPTX
ResNet.pptx
PDF
Convolutional Neural Networks : Popular Architectures
PPTX
CNN, Deep Learning ResNet_30_Slide_Presentation.pptx
PPTX
ResNet.pptx
PPTX
Multi-class Image Classification using deep convolutional networks on extreme...
PPTX
Multi-class Image Classification using Deep Convolutional Networks on extreme...
PDF
Convolutional neural networks for image classification — evidence from Kaggle...
PDF
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
IRJET- Spatial Context Preservation and Propagation - Layer States in Convolu...
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
convolutional_neural_networks.pptx
CNN Arcitecture Implementation Resnet CNN-RESNET
ResNet basics (Deep Residual Network for Image Recognition)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
ImageNet Classification with Deep Convolutional Neural Networks
DLD meetup 2017, Efficient Deep Learning
CNNs: from the Basics to Recent Advances
Handwritten Digit Recognition using Convolutional Neural Networks
Finding the best solution for Image Processing
ResNet.pptx
Convolutional Neural Networks : Popular Architectures
CNN, Deep Learning ResNet_30_Slide_Presentation.pptx
ResNet.pptx
Multi-class Image Classification using deep convolutional networks on extreme...
Multi-class Image Classification using Deep Convolutional Networks on extreme...
Convolutional neural networks for image classification — evidence from Kaggle...
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Ad

More from rinzindorjej (20)

PDF
14th International Conference on Advanced Computer Science and Information Te...
PDF
Using Gen AI Agents With GAE and VAE to Enhance Resilience of US Markets
PDF
Prediction Based Cloud Bandwidth and Costreduction System of Cloud Computing
PDF
14th International Conference on Information Technology Convergence and Servi...
PDF
Real-Time Mobile App Traffic Sign Recognition with YOLOv10 and CNN for Drivin...
PDF
12th International Conference on Computer Science, Engineering and Informatio...
PDF
On-Board Satellite Image Compression Using the Fourier Transform and Huffman ...
PDF
11th International Conference on Data Mining (DaMi 2025)
PDF
Ed-Edily M. Azhari, Mudzakkir M. Hatta, Zaw Zaw Htike and Shoon Lei Win, Inte...
PDF
Rim Hendel1, Farid Khaber1 and Najib Essounbouli2, 1University of Setif, Alge...
PDF
Design of Fast Transient Response, Low Dropout Regulator with Enhanced Steady...
PDF
A Study on Optical Character Recognition Techniques
PDF
AN EXQUISITE APPROACH FOR IMAGE COMPRESSION TECHNIQUE USING LOSSLESS COMPRESS...
PDF
Adaptive Type-2 Fuzzy Second Order Sliding Mode Control for Nonlinear Uncerta...
PDF
A Hybrid Critical Path Methodology - ABCP (As Built Critical Path); its Imple...
PDF
International Journal of Computational Science, Information Technology and Co...
PDF
Adaptive Type-2 Fuzzy Second Order Sliding Mode Control for Nonlinear Uncerta...
PDF
ADAPTIVE TYPE-2 FUZZY SECOND ORDER SLIDING MODE CONTROL FOR NONLINEAR UNCERTA...
PDF
Color Satellite Image Compression Using the Evidence Theory and Huffman Coding
PDF
Evaluating the Effects of Repetitive Task Execution on Performance and Learni...
14th International Conference on Advanced Computer Science and Information Te...
Using Gen AI Agents With GAE and VAE to Enhance Resilience of US Markets
Prediction Based Cloud Bandwidth and Costreduction System of Cloud Computing
14th International Conference on Information Technology Convergence and Servi...
Real-Time Mobile App Traffic Sign Recognition with YOLOv10 and CNN for Drivin...
12th International Conference on Computer Science, Engineering and Informatio...
On-Board Satellite Image Compression Using the Fourier Transform and Huffman ...
11th International Conference on Data Mining (DaMi 2025)
Ed-Edily M. Azhari, Mudzakkir M. Hatta, Zaw Zaw Htike and Shoon Lei Win, Inte...
Rim Hendel1, Farid Khaber1 and Najib Essounbouli2, 1University of Setif, Alge...
Design of Fast Transient Response, Low Dropout Regulator with Enhanced Steady...
A Study on Optical Character Recognition Techniques
AN EXQUISITE APPROACH FOR IMAGE COMPRESSION TECHNIQUE USING LOSSLESS COMPRESS...
Adaptive Type-2 Fuzzy Second Order Sliding Mode Control for Nonlinear Uncerta...
A Hybrid Critical Path Methodology - ABCP (As Built Critical Path); its Imple...
International Journal of Computational Science, Information Technology and Co...
Adaptive Type-2 Fuzzy Second Order Sliding Mode Control for Nonlinear Uncerta...
ADAPTIVE TYPE-2 FUZZY SECOND ORDER SLIDING MODE CONTROL FOR NONLINEAR UNCERTA...
Color Satellite Image Compression Using the Evidence Theory and Huffman Coding
Evaluating the Effects of Repetitive Task Execution on Performance and Learni...

Recently uploaded (20)

PDF
Spectral efficient network and resource selection model in 5G networks
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PPTX
MYSQL Presentation for SQL database connectivity
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
Empathic Computing: Creating Shared Understanding
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
KodekX | Application Modernization Development
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Spectral efficient network and resource selection model in 5G networks
Programs and apps: productivity, graphics, security and other tools
Building Integrated photovoltaic BIPV_UPV.pdf
Diabetes mellitus diagnosis method based random forest with bat algorithm
Chapter 3 Spatial Domain Image Processing.pdf
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
MYSQL Presentation for SQL database connectivity
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Empathic Computing: Creating Shared Understanding
Dropbox Q2 2025 Financial Results & Investor Presentation
KodekX | Application Modernization Development
NewMind AI Weekly Chronicles - August'25 Week I
Reach Out and Touch Someone: Haptics and Empathic Computing
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
The Rise and Fall of 3GPP – Time for a Sabbatical?
Per capita expenditure prediction using model stacking based on satellite ima...
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...

CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA SET

The International Journal of Computational Science, Information Technology and Control Engineering (IJCSITCE) Vol.6, No.1, January 2019
DOI: 10.5121/ijcsitce.2019.6101

Ding Tianye
Hangzhou Foreign Language School, Hangzhou, Zhejiang, China

ABSTRACT
In this paper, a fruit image data set is used to compare the efficiency and accuracy of two widely used convolutional neural networks, ResNet and DenseNet, on the recognition of 50 different kinds of fruit. The experiment uses ResNet-34 and DenseNet_BC-121 (with bottleneck layers). The mathematical principles, the experimental details and the experimental results are explained through comparison.

KEYWORDS
Deep learning, Object recognition, Computer vision, Image processing, Convolutional Neural Networks.

1. INTRODUCTION
The aim of this paper is to compare the learning efficiency and convergence rate of ResNet and DenseNet_BC through comparable experiments. The fruit image data set [1] used in the experiment consists of 100*100 images with a black/white background and without noise interference. It covers 50 kinds of fruit, with 25,100 images in total as training data and 12,700 images as testing data.

Figure 1. One of the images in the image data set [1]

In the experiment, the only consideration is how many training iterations are needed for the neural network to reach an accuracy of more than 98%. The first step in the training process is to pre-process the input images: the Python module OpenCV is used to convert the images into RGB channel images, and each channel value is divided by 255 so that the resulting value lies in the range 0 to 1. Fifty images are combined into one input batch, and the loss between each prediction and the actual value is determined by computing the cross entropy, $H(y, \hat{y}) = -\sum_i y_i \log \hat{y}_i$, where $y$ is the label and $\hat{y}$ is the prediction. After every 10 input iterations, the recognition accuracy is computed and output as a form of visualization, and the two networks are trained for 2,000 iterations before the testing data is input.

Each convolutional layer uses ReLU (Rectified Linear Unit) as its activation function. A batch normalization layer is placed between code blocks, and a dropout layer with a keep probability of 0.5 is used to avoid overfitting. Both networks are trained with the Adam optimizer with an epsilon of 0.1. The learning rate is set by exponential decay ($\eta_t = \eta_0 \cdot r^{t/T}$, where $\eta_0$ is the initial learning rate, $r$ the decay rate, $t$ the training step and $T$ the decay step): the decay step is the same as the total number of training steps, staircase is set to True, the initial learning rate is 1e-3 and the decay rate is 0.96. The experiment runs on a GTX 1070 GPU with 5.0 GB of allocated memory, and the framework used is TensorFlow [2], released by Google in 2015.

2. DEEP LEARNING
Deep learning neural networks usually consist of multiple layers, where the input of each layer is the output of the previous layer. Compared with shallow learning, deep learning is usually regarded as a great step from Weak AI towards Strong AI.
Convolutional Neural Networks (CNNs) fall under supervised learning within deep learning, which means that training requires not only the input data but also the actual labels, which are used to calculate the loss between the predicted value and the actual value. The optimizer uses a back-propagation algorithm to attribute the total loss to each neuron and to update the parameters of each neuron accordingly, so that, after enough data has been fed to the network and it has been trained for enough iterations, the network tends towards a local or global optimum and achieves good enough performance on the particular problem it is trained for.

CNNs are most widely used in the field of image recognition. A paper [3] supports the idea that convolutional neural networks achieve better recognition performance, and advanced image recognition networks such as ResNet and DenseNet are built on the basis of CNNs. That is one reason why the fruit data set [1] was chosen as training data. The pooling layers in a CNN reinforce and compress the features in each feature map to reduce the possibility of features disappearing.

3. TWO CONVOLUTIONAL NETWORKS
In the neural networks previously used for computer vision and image recognition, increasing the depth of the network led to vanishing or exploding gradients and to OOM (Out of Memory) errors, which reduced the accuracy of the network. This has become a difficulty that many teams and individuals are trying to overcome.
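The vanishing-gradient effect mentioned above can be illustrated with a small numerical sketch. It is a toy model rather than one of the networks used in the experiment: each "layer" is assumed to be a single scalar sigmoid unit with a fixed weight (the function name `plain_chain_gradient` and its defaults are hypothetical), and the gradient of the output with respect to the input is accumulated with the chain rule.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def plain_chain_gradient(depth, weight=1.0, x=0.5):
    """Return d(output)/d(input) for `depth` stacked layers a -> sigmoid(weight * a).

    Toy assumption: one scalar unit per layer and no skip connections.
    """
    grad = 1.0
    a = x
    for _ in range(depth):
        s = sigmoid(weight * a)
        grad *= weight * s * (1.0 - s)  # chain rule: local derivative of this layer
        a = s
    return grad


for depth in (1, 5, 20, 50):
    print(depth, plain_chain_gradient(depth))
```

Because the sigmoid derivative never exceeds 0.25, the product shrinks geometrically with depth in this setting, which is the behaviour that skip connections are designed to counter.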
2.1. ResNet

Figure 2. The parameters of different ResNet structures; the structure used was the 34-layer one

In 2015, ResNet (Residual Convolutional Network), which won the classification task of the ImageNet competition, greatly reduced this problem through a method called residual transmission: simple code blocks pass the input of the previous layer, together with its output, on as the input of the next layer, as shown in Figure 3. This provides a channel through which the input of each layer can reach the next layer with at most a change in dimension and no further processing. The principle of ResNet is simple code blocks, stacked and connected by channels. This not only reduces the complexity of the neural network but also reduces the memory occupied by the session at run time, which greatly reduces the probability of OOM and improves the efficiency of learning and the rate of gradient convergence, so many programmers are keen on using ResNet in supervised learning.

Figure 3. Visible structure of ResNet

The formula of ResNet is briefly introduced in the paper "Densely Connected Convolutional Networks" [4] as "$x_\ell = H_\ell(x_{\ell-1}) + x_{\ell-1}$", where $x_{\ell-1}$ represents the input data of the current layer and $H_\ell$ represents the non-linear transformation; the input of the current layer is the residual formed by combining the input and output of the previous layer, and the result is transmitted to the next layer. Each code block in ResNet also contains a residual block used for convolution and an identity block used to transmit the gradient directly; the resulting output of each block is $y = F(x) + x$, where $F(x)$ represents the residual part.

Theoretically, as the depth of the neural network increases, the output of some of the residual blocks will gradually tend to 0, but the channel through the identity block still transmits the gradient, so the vanishing-gradient problem does not occur in the network and the network can stay in the optimal state.
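The role of the identity path can be checked on a toy example. The sketch below is an illustration under stated assumptions, not the ResNet-34 used in the experiment: each block is a scalar map $a \mapsto a + F(a)$ with a hypothetical residual branch $F(a) = w \tanh(a)$, so the local derivative of every block is $1 + F'(a)$.

```python
import math


def residual_chain_gradient(depth, weight, x=0.5):
    """Return d(output)/d(input) for `depth` stacked blocks a -> a + weight * tanh(a).

    Toy assumption: scalar blocks whose residual branch is F(a) = weight * tanh(a).
    """
    grad = 1.0
    a = x
    for _ in range(depth):
        residual_slope = weight * (1.0 - math.tanh(a) ** 2)  # F'(a)
        grad *= 1.0 + residual_slope  # the identity path contributes the constant 1
        a = a + weight * math.tanh(a)
    return grad


# As the residual branch shrinks towards 0, the gradient tends to 1 instead of 0.
for weight in (0.1, 0.01, 0.0):
    print(weight, residual_chain_gradient(34, weight))
```

Even when the residual branch outputs exactly 0 (`weight=0.0`), the gradient through 34 blocks stays at 1, which matches the argument that the identity channel keeps transmitting the gradient.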
When two adjacent blocks differ in convolutional kernel depth, an identity block with a transformation operation changes the dimension of the input tensor, so that the input tensor and the output tensor keep the same dimension on the y-axis. Apart from the very early stage and the fully connected layers, the convolution layers of the ResNet-34 in the experiment all use a kernel size of 3*3, and the number of output feature maps in the four stages is 64, 128, 256 and 512 respectively, which ensures that computing speed and learning efficiency are not harmed by an overly large number of output feature maps. Normalizing the RGB values into the range 0 to 1 reduces the computational complexity of training, and the labels of the actual values are processed into one-hot codes.

During the training process, in the first 190 input batch iterations the output accuracy of the network does not increase much and stays at about 0% to 6%. After 190 iterations the recognition accuracy rises rapidly, and when the training step reaches about 630 to 640 the accuracy first reaches about 98%; in further training the accuracy is about 94% to 100%. After 2,000 training iterations, the output testing accuracy is about 96% to 98%.

2.2. DenseNet

Figure 4. The parameters of different DenseNet structures; the structure used was DenseNet_BC-121, from Densely Connected Convolutional Networks [4]

At CVPR 2017, the oral paper "Densely Connected Convolutional Networks" [4] by Gao Huang, Zhuang Liu, Kilian Q. Weinberger and Laurens van der Maaten first introduced a completely new neural network structure, DenseNet, which focuses on solving a problem of ResNet (the gradient passing through the residual path might be impeded when the network gets deeper).
Figure 5. The visible structure of DenseNet, from Densely Connected Convolutional Networks [4]

The paper describes the brief formula of DenseNet as "$x_\ell = H_\ell([x_0, x_1, \ldots, x_{\ell-1}])$". This function was achieved in the experiment by stacking the outputs of all previous layers along the 3rd axis as the input of the current layer. A growth rate of 32 is adopted, which means each layer outputs 32 feature maps, together with a drop rate of 0.5, which means that in the transition layer the number of output feature maps is reduced to half of the initial number.

Compared with ResNet, theoretically, because DenseNet is narrower, a DenseNet with the same number of layers has fewer parameters, and the bottleneck layer and the small number of output feature maps (every layer usually has a small convolutional kernel depth) save memory while running the session. Because every layer has direct access to the inputs, outputs and gradients of each preceding layer, the vanishing-gradient problem should not appear as the depth of the network increases.

In the training process, the assumption was that DenseNet would train faster and take fewer steps to converge to a higher accuracy than ResNet. However, the result fell short of this expectation: the time taken for the network to reach a given output accuracy was much longer than for ResNet-34, which might be caused by the depth of the network. Furthermore, when the training step reached about 200 there was no obvious improvement in the training accuracy, and even after 2,000 training iterations there was an increase in accuracy of only about 10% to 35%. Several attempts were then made to eliminate the bottleneck layer in order to keep the largest number of feature maps, but the training result was still not as good as expected.

2.3. Code Segments

In the experiment, the TensorFlow [2] framework was used (import tensorflow as tf), together with the slim module (import tensorflow.contrib.slim as slim), to simplify the complex procedure of defining a complete convolutional layer. In both experiments, the tensors of weights and biases are initialized as below:

define function weights with parameter (shape)
    init = tf.truncated_normal with parameters (shape, standard deviation=0.01)
        (tensor of the given shape filled with truncated normal random values with standard deviation 0.01)
    return variable init as the output of the function
define function biases with parameter (shape)
    init = tf.constant with parameters (0.02, shape)
        (tensor of the given shape filled with 0.02)
    return variable init as the output of the function

In the experiment of ResNet-34, the convolutional block with the transformation in each stage is defined as below:

define function conv with parameters (inputs, out_size, k_size)
    x_short_cut = inputs
    conv1 = 2-dimensional convolution (inputs, number of output feature maps=out_size, kernel size=k_size, stride=2, padding='SAME', without activation function)
    conv1_output = ReLU activation function (batch normalize (conv1 on axis=3))
    conv2 = 2-dimensional convolution (conv1_output, number of output feature maps=out_size, kernel size=k_size, stride=1, padding='SAME', without activation function)
    conv2_output = batch normalize (conv2 on axis=3)
    input_conv = 2-dimensional convolution (x_short_cut, number of output feature maps=out_size, kernel size=k_size, stride=2, padding='SAME', without activation function)
    input_reshape = batch normalize (input_conv on axis=3)
    output = ReLU activation function (input_reshape + conv2_output)
    return output as the output of the function

The identity block, which includes the channel for passing the residual of the input data and the output result, is defined as below:

define function identity with parameters (inputs, out_size, k_size)
    x_short_cut = inputs
    conv1 = 2-dimensional convolution (inputs, number of output feature maps=out_size, kernel size=k_size, stride=1, padding='SAME', without activation function)
    conv1_output = ReLU activation function (batch normalize (conv1 on axis=3))
    conv2 = 2-dimensional convolution (conv1_output, number of output feature maps=out_size, kernel size=k_size, stride=1, padding='SAME', without activation function)
    conv2_BN = batch normalize (conv2 on axis=3)
    conv2_output = ReLU activation function (conv2_BN + x_short_cut)
    return conv2_output as the output of the function

In the experiment of DenseNet_BC-121, the dense block with a growth rate of 32 and a bottleneck layer is defined as below:

define function dense with parameters (inputs, growth_rate=32, internal_layer=True, keep_prob)
    x_input = batch normalize (inputs on axis=3)
    x_relu = ReLU activation function (x_input)
    # bottleneck layer
    conv1 = 2-dimensional convolution (x_relu, number of output feature maps=growth_rate*4, kernel size=1*1, stride=1, padding='SAME', without activation function)
    conv1_dropout = Dropout (conv1, keep percentage=keep_prob)
    conv1_relu = ReLU activation function (batch normalize (conv1_dropout on axis=3))
    conv2 = 2-dimensional convolution (conv1_relu, number of output feature maps=growth_rate, kernel size=3*3, stride=1, padding='SAME', without activation function)
    conv2_dropout = Dropout (conv2, keep percentage=keep_prob)
    if internal_layer is True then
        output = stack ([inputs, conv2_dropout] on axis=3)
    else
        output = conv2_dropout
    return output as the output of the function

The transition block, used to compress the dimension of the input tensor, is defined as below:

define function transition with parameters (inputs, drop=0.5)
    x_input = batch normalize (inputs on axis=3)
    conv = 2-dimensional convolution (x_input, number of output feature maps=the depth of tensor x_input*drop, kernel size=1*1, stride=1, padding='SAME', without activation function)
    h1 = average pooling (conv, pooling kernel size=2*2, strides=2, padding='SAME')
    return h1 as the output of the function

2.4. Experiment details

Parameter tuning in the experiment focused mainly on the learning rate of the neural networks; the default decay rate in exponential decay was 0.96. The tuning method follows A Practical Guide [5]. The search range for the learning rate was set between 1E-4 and 0.7, starting with 0.7 and observing the rate of convergence through the output accuracy. With a larger learning rate the accuracy rose more easily to about 95%, but the prediction then started to move back and forth around the local optimum and hardly improved further. The learning rate was then changed to 0.3, but the problem persisted. However, if the initial learning rate was too small, convergence took a long time from the very beginning of the training process. Ultimately, after a limited number of experiments, 0.003 turned out to be the most efficient learning rate for ResNet.
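To see what the exponential decay settings do to the learning rate during a run, the schedule can be written out in plain Python. This is a minimal sketch of the formula documented for TensorFlow's `tf.train.exponential_decay`, not the code used in the experiment; the helper name `exponential_decay` is hypothetical, and `decay_steps=2000` assumes that "the decay step is the same as the total training time" means 2,000 steps.

```python
import math


def exponential_decay(initial_lr, step, decay_steps, decay_rate, staircase=False):
    """lr = initial_lr * decay_rate ** (step / decay_steps).

    With staircase=True the exponent is floored, so the rate drops in discrete jumps.
    """
    exponent = step / decay_steps
    if staircase:
        exponent = math.floor(exponent)
    return initial_lr * decay_rate ** exponent


# Assumed settings: initial rate 0.003, decay rate 0.96, decay_steps = 2000, staircase on.
for step in (0, 1000, 1999, 2000):
    print(step, exponential_decay(0.003, step, 2000, 0.96, staircase=True))
```

Under these assumptions the staircase exponent stays at 0 for steps 0 to 1,999, so the learning rate is effectively constant throughout a 2,000-step run, and the first decay to 0.96 times the initial value would only occur at step 2,000.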
2.5 Limitations

On the one hand, the neural network structures used in the experiment were programmed by myself, based on my understanding of the papers published by the developers of these two neural networks, rather than taken from the existing code provided by the developers. On the other hand, one factor that may cause the slow convergence of DenseNet compared with ResNet is the depth of the network, which was 121 layers, much deeper than the 34-layer ResNet and with more connections.

4. CONCLUSION

Through the comparison of the two experiments, it is clear that ResNet-34 performs better than DenseNet_BC-121 in training on and adapting to the fruit data set [1]: its learning is more efficient, it fits the data set much faster, and it takes less time to return a resulting accuracy. It can be concluded that ResNet performs much better than DenseNet on simple data sets, since ResNet avoids vanishing gradients through residual passing, so that the current layer receives both the input tensor and gradient of the previous layer and its output tensor and gradient. In DenseNet, however, each layer needs to process the input and output tensors and gradients of all previous layers. When applied to such simple data sets, this might increase the computational complexity and might also blur the processed output tensor and gradient of the previous layer, so it resulted in only a limited improvement in performance after 2,000 training iterations. This might be a reason why the DenseNet structure is currently used less than the ResNet structure. According to one of the developers of DenseNet [6], DenseNet structures are usually narrower but much deeper, so the training process of DenseNet is more parameter-efficient but less memory- and speed-efficient, which is mainly why the training process was slow and showed little increase in accuracy.
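To make concrete how much input each DenseNet layer has to take in, the feature-map bookkeeping can be worked through with the growth rate of 32 and the transition drop rate of 0.5 used in the experiment. The sketch below is an illustration only: the block sizes (6, 12, 24, 16) and the 64 initial feature maps are assumptions taken from the DenseNet-121 configuration in [4], and the helper `densenet_channels` is hypothetical.

```python
def densenet_channels(block_sizes=(6, 12, 24, 16), growth_rate=32,
                      compression=0.5, initial_channels=64):
    """Return (final channel count, widest layer input per dense block)."""
    channels = initial_channels
    widest_inputs = []
    for i, num_layers in enumerate(block_sizes):
        # Layer j of a block receives the block input plus j * growth_rate earlier maps.
        widest_inputs.append(channels + (num_layers - 1) * growth_rate)
        channels += num_layers * growth_rate
        if i < len(block_sizes) - 1:
            # Transition layer: keep only `compression` of the feature maps.
            channels = int(channels * compression)
    return channels, widest_inputs


final_channels, widest_inputs = densenet_channels()
print(final_channels, widest_inputs)
```

Under these assumptions each layer adds only 32 feature maps, yet the last layers of the third and fourth blocks read close to 1,000 concatenated maps, which is consistent with the remark above that DenseNet is parameter-efficient but costly in memory and speed.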
ACKNOWLEDGEMENTS

It gives me great pleasure to acknowledge the support and help of Professor Juntao Ye. I would like to thank my parents for supporting me when I met difficulties, and everyone who assisted me when I ran into trouble.

REFERENCES

[1] Fruits-360, Version: 2018.07.01.0. https://www.kaggle.com/moltean/fruits. Last visited on 12.07.2018.
[2] TensorFlow. https://www.tensorflow.org. Last visited on 15.07.2018.
[3] M. Liang, X. Hu, Recurrent Convolutional Neural Network for Object Recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, pp. 3367-3375, 2015.
[4] G. Huang, Z. Liu, K. Weinberger, L. Maaten, Densely Connected Convolutional Networks, 2017.
[5] keitakurita, Learning Rate Tuning in Deep Learning: A Practical Guide, 2018. http://mlexplained.com/2018/01/29/learning-rate-tuning-in-deep-learning-a-practical-guide/. Last visited on 15.01.2019.
[6] Z. Liu, answering why DenseNet requires more memory in training, 2017. https://www.reddit.com/r/MachineLearning/comments/67fds7/d_how_does_densenet_compare_to_resnet_and/. Last visited on 12.01.2019.

AUTHORS

Ding Tianye, a year 11 senior high school student at Hangzhou Foreign Language School, has a month of summer program experience at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA). Currently the president of the Developer Association, studying machine learning.