Classification of Images Using CNN Model and its Variants
Narayan Dhamala1, Krishna Prasad Acharya2
1Teaching Assistant, Department of Computer Science & Application, Mechi Multiple Campus, Jhapa, Nepal
2 Assistant Professor, Department of Computer Science & Application, Mechi Multiple Campus, Jhapa, Nepal
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Image classification is the task of assigning a label to an image, and deep learning is well suited to it because the spatial nature of images can be exploited by a massively parallel structure that learns many features. In this research, a Convolution Neural Network (CNN) model is presented with three configurations. The first configuration is a simple baseline, and the other two improve on it by using techniques to prevent overfitting. Training and testing are performed on the CIFAR-10 dataset, which consists of 60,000 images of 10 different objects. When the variants of the model are compared using different performance metrics, it is observed that the dropout regularization technique can make the model significantly more accurate. The results also show that a lower batch size can give better results than a higher batch size.
Key Words: Convolution Neural Network, Epochs, Pooling, ReLU, Softmax.
1. INTRODUCTION
Image classification is the intelligent task of classifying images into different classes of given objects based on their features. The classification problem can be binary or multi-class. Examples of binary classification are classifying between cat and dog images, or the absence or presence of cancer cells in medical images. Similarly, multi-class classification includes classifying different animals, digit recognition, etc. Image classification is used in the field of computer vision for analyzing various image data to get useful insight. It can differentiate between images based on tiny details that could be missed even by expert humans in the given domain. It is often confused with object recognition. Object recognition is a broader term covering a combination of computer vision tasks, including image classification, to detect and recognize different objects in an image. So, the main difference is that image classification only deals with assigning images to different types or classes, while object recognition involves detecting the various objects in an image and recognizing them. Image classification involves training a machine learning model on large datasets of images. The model learns the patterns and features present in the various classes of objects and can then predict the class of object in a previously unseen image.
1.1 Convolution Neural Network (CNN)
A CNN is a type of deep neural network mainly used for solving image recognition problems. It is a variant of the Multi-Layer Perceptron (MLP), also called a fully connected neural network, in which each neuron in one layer is connected to every neuron in the next layer. CNNs can build complex patterns from simpler ones, which makes them more effective at recognizing images with many features. The CNN architecture contains three different kinds of layers: the convolution layer, the pooling layer and the fully connected layer.
Figure 1: Basic architecture of a CNN
1.1.1 Convolutional layer
It is the first and most significant layer, and it is responsible for learning features from the input image. It takes an image as input, applies a kernel (filter) to it and produces an output. This operation is called convolution. It helps to reduce the shape of the image while retaining its features. Different filters can be stacked together to extract many features. In a CNN, a batch of images with shape (no. of images) x (image height) x (image width) x (no. of channels) is passed through a convolution layer, which in turn produces feature maps of shape (no. of images) x (feature map height) x (feature map width) x (no. of feature map channels).
Figure 2: Convolution operation
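As an illustration (not taken from the paper), the shape arithmetic above can be checked with a single Keras convolution layer applied to a dummy 32 x 32 RGB image; the layer parameters here are only assumptions for the sketch.

# A minimal sketch (not the paper's code): one convolution layer in Keras,
# assuming a 32x32 RGB input and 32 filters of size 3x3 with no padding.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(filters=32, kernel_size=(3, 3), padding="valid", activation="relu"),
])

# One dummy image with shape (no. of images) x (height) x (width) x (channels).
x = np.random.rand(1, 32, 32, 3).astype("float32")
feature_maps = model.predict(x)
print(feature_maps.shape)  # (1, 30, 30, 32): 30 = ((32 - 3)/1) + 1 with no padding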
1.1.2 Pooling layer
This layer down-samples the feature maps by summarizing the presence of features in patches of the feature map. Pooling is used to further reduce the size of the feature map. The pooling filter is typically a 2 by 2 window applied with a stride of 2 pixels.
The two most commonly used pooling methods are max pooling and average pooling. In max pooling, the 2 by 2 filter is slid over the feature map and the maximum value in each window is taken. In average pooling, the 2 by 2 filter is slid over the feature map and the average value in each window is chosen.
Figure 3: Max-pooling operation
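The following small NumPy sketch (illustrative only, not the authors' code) applies 2 by 2 max pooling with stride 2 to a toy 4 x 4 feature map.

# 2x2 max pooling with stride 2 on a single-channel feature map, in plain NumPy.
import numpy as np

feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 6, 8],
], dtype=float)

h, w = feature_map.shape
pooled = np.zeros((h // 2, w // 2))
for i in range(0, h, 2):
    for j in range(0, w, 2):
        # Take the maximum value inside each non-overlapping 2x2 window.
        pooled[i // 2, j // 2] = feature_map[i:i + 2, j:j + 2].max()

print(pooled)
# [[6. 4.]
#  [7. 9.]]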
1.1.3 ReLU layer
The Rectified Linear Unit (ReLU) removes negative values from the activation map by setting them to zero, which increases the non-linear properties of the network. It applies the activation function f(x) = max(0, x). It helps to overcome the limitations of other activation functions such as sigmoid and tanh, whose saturating behaviour means that layers deep in large networks fail to receive useful gradient information.
Figure 4: ReLU activation function
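As a quick, purely illustrative check of f(x) = max(0, x), the snippet below applies ReLU element-wise with NumPy.

# ReLU replaces negative activations with zero, element-wise.
import numpy as np

def relu(x):
    # f(x) = max(0, x)
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0.  0.  0.  1.5 3. ]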
1.1.4 Fully connected layer
This layer takes the output produced by the convolution and pooling layers and classifies the input image into different classes (labels). The result from the previous layer is flattened into a single vector of values, each representing a probability that a certain feature belongs to a label. For example, in a handwriting recognition system the labels can be the letters A to Z. If the input image is the letter 'P', then the probabilities of the features representing a circle and a line should be high.
1.3 Problem statement
Different machine learning algorithms used in the detection and recognition of objects include Support Vector Machines (SVM), K-Nearest Neighbor (KNN), Naïve Bayes (NB), Binary Decision Tree (BDT) and Discriminant Analysis (DA). Most of these traditional machine learning algorithms are not suitable for problems that are non-linear in nature: they often produce inaccurate models and suffer from high error rates. So, for solving such problems, a type of machine learning technique called deep learning is becoming more popular. Although neural networks were developed very early, they were not explored extensively due to a lack of sufficient computing power. With advancements in computer hardware and the rise of accelerated processing units such as GPUs and TPUs, such neural networks are being used to achieve greater performance in various intelligent tasks, and they can be combined with traditional techniques to make them more powerful. A neural network is a massively parallel network similar to a biological network of neurons. Due to the spatial nature of images, they can be processed efficiently by neural networks, but it is hard for a plain fully connected network to find the underlying patterns of features by analyzing each and every pixel of the image. So, this simple way of using neural networks cannot perform well on high-quality images having many features. To solve this problem, a more advanced type of neural network, the CNN, can be used.
1.4 Objectives
The main objectives of this research are as follows:
• To implement a Convolution Neural Network model and its variants for image classification.
• To compare and analyze the different variants of the model using different performance metrics, such as classification accuracy rate, error rate, sensitivity, specificity, precision and F1 score, and the effect of batch size on the given image dataset.
2. Literature Review
In 2021, Ruchika Arora et al. [1] proposed an optimized CNN for the classification of tuberculosis images using efficient tuning of hyperparameters based on a hyperband search optimization approach. The tuberculosis disease in chest X-ray images is trained and tested using the NLM China dataset. The efficient hyperparameters are chosen by a trial and error method and according to the experience of the designer. The experiment shows that this use of hyperparameters on the given dataset with the CNN method achieves 91.42% accuracy.
Alex Krizhevsky et al. [2] built a deep CNN for the ILSVRC-2012 ImageNet contest, where the task was to classify 1.2 million high resolution images into 1000 different classes. The authors achieved top-1 and top-5 error rates of 37.5% and 17.0% on the test data, considerably better than the previous state of the art. The CNN model consists of five convolutional layers, some of which are followed by max pooling layers, and three fully connected layers. The model uses an efficient GPU implementation to make training faster and utilizes the dropout regularization technique to reduce overfitting. This paper showed that a deep CNN is capable of achieving state-of-the-art results using purely supervised learning.
Md. Anwar Hossain et al. [3] proposed a CNN for image classification trained on the CIFAR-10 dataset. The architecture of the CNN model consists of three blocks of convolution and ReLU layers. The first block is followed by max pooling, the second by average pooling, and the last by a fully connected layer. The authors implemented this model using MatConvNet. In the experiment, a maximum accuracy of 93.47% was obtained with batch size 60, 300 epochs and a learning rate of 0.0001.
For the ImageNet challenge 2014, Karen Simonyan et al. [4] of the VGG team came up with more accurate CNN architectures which not only achieve state-of-the-art accuracy on the ILSVRC classification and localization tasks, but are also applicable to other image recognition datasets. The authors present different configurations of the network which differ only in depth, from 8 convolutional layers with 3 fully connected (FC) layers to 16 convolutional layers with 3 FC layers. The main highlight of this network is that it uses very small filters throughout the network, which helps achieve a significant improvement.
GoogLeNet [5] pushed the limit of CNN depth with a 22-layer structure. The authors found that deeper and wider layers help to improve accuracy. The main hallmark of this architecture is the improved utilization of computing resources in the network: the authors were able to increase the depth while keeping the computational requirement constant by improving the architecture. The paper discusses the idea of applying dimension reduction wherever computational requirements increase. The authors present a type of network configuration called the Inception network, which consists of modules stacked upon each other. Thus, this paper shows the strength of the Inception architecture and provides evidence that it can achieve similar results to more computationally expensive networks.
Shivam Kadam et al. [6] perform image classification using five different CNN architectures built by varying the number of convolution layers, fully connected layers and filter sizes. To perform the experiment, different hyperparameters such as activation function, optimizer, learning rate, dropout rate and batch size are considered. The results show that the choice of activation function, dropout rate and optimizer influences the accuracy of the architecture. The CNN model gives 99% accuracy on the MNIST dataset but lower than that on the Fashion-MNIST dataset, which may be due to the more complex nature of the data in Fashion-MNIST.
3. Methodology
3.1 Data Collection
The dataset used in this research is the CIFAR-10 dataset, one of the most widely used datasets for machine learning. It consists of 60,000 32 x 32 color images in 10 classes, with 6,000 images per class. The ten classes in the dataset are airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. Since the resolution of the images is low, the dataset can be used to test new machine learning algorithms quickly with very little computing power.
Figure 5: Some images on CIFAR-10 dataset
3.2 Tools used
Following are the hardware and software tools used to implement this research.
3.2.1 Hardware requirements
CPU: Intel x86_64
RAM: 4GB
HDD: 1 TB
3.2.2 Software requirements
Operating System: Windows 10 64bit
Developed In: Python 3.7
Libraries Used: Keras, Scikit-Learn, Matplotlib, Seaborn
3.3 Data preprocessing
It is necessary to apply some preprocessing to the data before feeding it into the network. The dataset of 60,000 images is split into 40,000 images for training, 10,000 for validation during training and 10,000 for testing. The pixel data in the x values is converted to floating point and normalized from the range 0-255 into the range 0-1. The output or y values consist of categorical class labels, which are converted into numeric form.
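A minimal preprocessing sketch consistent with this description is shown below; it assumes Keras' built-in CIFAR-10 loader and one-hot encoded labels, which the paper does not state explicitly.

# Load CIFAR-10, carve out a validation split, scale pixels to 0-1 and encode labels.
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

(x_train_full, y_train_full), (x_test, y_test) = cifar10.load_data()

# 40,000 for training, 10,000 for validation, 10,000 for testing.
x_train, x_val = x_train_full[:40000], x_train_full[40000:]
y_train, y_val = y_train_full[:40000], y_train_full[40000:]

# Convert pixels to float and normalize from 0-255 to the 0-1 range.
x_train = x_train.astype("float32") / 255.0
x_val = x_val.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Convert integer class labels (0-9) to one-hot vectors.
y_train, y_val, y_test = (to_categorical(y, 10) for y in (y_train, y_val, y_test))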
3.4 Architecture of CNN model
Figure 6: The architecture of CNN Model
The network of the model is structured in a 3-block VGG style with the sequence CONV-CONV-POOL-CONV-CONV-POOL-CONV-CONV-POOL-FC-FC. Each block consists of two convolution layers (CONV) followed by a max pooling layer (POOL), and there are two fully connected layers (FC) at the end of the network.
The first layer of the network is a convolutional layer into which the input images are fed. Each image is of size n*n = 32*32 with a depth of 3, one for each color channel. It uses an f*f = 3*3 kernel for the convolution operation and the ReLU (Rectified Linear Unit) activation function. The stride is 1, which means the filter window is moved by 1 pixel at a time. Since a 32*32 image would otherwise be reduced to 30*30, zero padding is applied around the input so that the output size is the same as the input; in this case padding = 1 is applied. 32 filters are applied in this layer, so it produces feature maps of size 32@32*32, where the first 32 is the number of feature maps and the 32 in 32*32 is given by ((n + 2p - f)/s) + 1; in this case ((32 + 2*1 - 3)/1) + 1 = 32. The size of the feature maps in the subsequent layers can be calculated using the same formula.
The second layer is also a convolutional layer. It takes the input from the previous layer and produces feature maps of size 32@30*30. It also applies 32 filters of size 3*3 and has stride = 1, but it does not apply any zero padding.
The third layer is a pooling layer which applies max pooling using a 2*2 filter with stride = 2. It produces output feature maps of size 32@15*15. This layer does not apply any activation function.
The fourth layer is a convolutional layer similar to the first layer. It applies 64 filters of size 3*3 and produces output feature maps of size 64@15*15. It has the same stride, padding and activation function as the first layer.
The fifth layer is a convolution layer like the second layer. It produces feature maps of size 64@13*13. Like the second layer, it does not have padding around the boundary, and it has the same stride and activation function as the second layer.
The sixth layer is a pooling layer like the third layer. It also applies max pooling using a 2*2 filter with stride = 2 and produces output feature maps of size 64@6*6.
The next three layers are similar to the fourth, fifth and sixth layers respectively. The seventh layer, a convolutional layer, produces feature maps of size 64@6*6 which are fed into the next convolutional layer, which outputs feature maps of size 64@4*4. The ninth layer, a pooling layer, then produces output of size 64@2*2.
The next layer is a fully connected layer which converts the 3D array into a 1D array of size 2*2*64 = 256; it is also called the flatten layer and uses the ReLU activation function. Finally, the last layer is also a fully connected layer which has 10 nodes, one for each class in the CIFAR-10 dataset, and uses the softmax activation function.
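A hedged Keras sketch of this architecture is given below. The padding scheme ("same" on the first convolution of each block, "valid" on the second) and the 256-unit dense layer with ReLU are inferred from the feature-map sizes reported above; they are assumptions, not the authors' code.

# 3-block VGG-style CNN for 32x32x3 CIFAR-10 images, as described in Section 3.4.
from tensorflow import keras
from tensorflow.keras import layers

def build_baseline():
    model = keras.Sequential([
        keras.Input(shape=(32, 32, 3)),
        # Block 1: 32@32*32 -> 32@30*30 -> 32@15*15
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(32, (3, 3), padding="valid", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),
        # Block 2: 64@15*15 -> 64@13*13 -> 64@6*6
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(64, (3, 3), padding="valid", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),
        # Block 3: 64@6*6 -> 64@4*4 -> 64@2*2
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(64, (3, 3), padding="valid", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),
        # Flatten to a 2*2*64 = 256 vector, then classify into 10 classes.
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    return model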
3.5 Variants of the model
3.5.1 Baseline model
This model is built using the architecture presented in the previous section. It is a 3-block VGG-style model which is modular and easy to implement. It uses Stochastic Gradient Descent (SGD) for optimization with a learning rate of 0.001. This variant of the model does not use any techniques for improving generalization. It is trained using batch sizes of 32 and 64 for 100 epochs. Batch size refers to the number of training examples processed before the model is updated. The number of epochs is defined as the number of times the algorithm passes over the whole training dataset. Generally, a higher number of epochs is chosen initially and can later be reduced to the number at which optimal performance is achieved, since too many epochs can lead to overfitting.
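Assuming the preprocessing and architecture sketches above, the baseline variant could be compiled and trained roughly as follows; the categorical cross-entropy loss is an assumption, since the paper does not name the loss function.

# Train the baseline: SGD with learning rate 0.001, batch size 32 (or 64), 100 epochs.
from tensorflow.keras.optimizers import SGD

model = build_baseline()  # from the architecture sketch in Section 3.4
model.compile(optimizer=SGD(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(x_train, y_train,
                    batch_size=32,          # repeated with batch_size=64
                    epochs=100,
                    validation_data=(x_val, y_val))

test_loss, test_acc = model.evaluate(x_test, y_test)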
3.5.2 Improved model I
This model uses the same architecture as the baseline model but applies a technique called dropout regularization, which randomly removes certain nodes from the network during training. It helps prevent overfitting, the condition in which the model memorizes or fits so closely to the training data that it performs poorly on unseen or test data. This model applies a dropout rate of 25% after each CONV-CONV-POOL block and 5% before the final fully connected layer.
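A possible Keras realization of this variant is sketched below, reusing the assumed baseline layers and inserting the stated dropout rates; it is an interpretation of the description, not the authors' code.

# Improved model I: baseline blocks plus Dropout(0.25) after each block and
# Dropout(0.05) before the final fully connected layer.
from tensorflow import keras
from tensorflow.keras import layers

def build_improved_I():
    model = keras.Sequential([
        keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(32, (3, 3), padding="valid", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(64, (3, 3), padding="valid", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(64, (3, 3), padding="valid", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.05),
        layers.Dense(10, activation="softmax"),
    ])
    return model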
3.5.3 Improved model II
This model further improves the previous model by applying an image augmentation technique, which increases the size of the training data by adding images obtained by transforming the training images. In this case the height and width of each image are shifted randomly by 1-10%.
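A sketch of this augmentation with Keras' ImageDataGenerator is given below; the 10% shift ranges and the surrounding training setup are assumptions based on the description above, not the authors' code.

# Improved model II: train the dropout model on randomly shifted training images.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import SGD

datagen = ImageDataGenerator(width_shift_range=0.1,    # shift width by up to 10%
                             height_shift_range=0.1)   # shift height by up to 10%
it_train = datagen.flow(x_train, y_train, batch_size=64)

model = build_improved_I()
model.compile(optimizer=SGD(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])

steps = len(x_train) // 64
history = model.fit(it_train, steps_per_epoch=steps, epochs=100,
                    validation_data=(x_val, y_val))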
3.6 Comparative Criteria
The variants of the model will be analyzed based on the
following performance metrics.
3.6.1 Classification accuracy rate
Classification accuracy = (TP + TN) / (TP + TN + FP + FN)
Where, True Positives (TP) = actual positives which are predicted positive
True Negatives (TN) = actual negatives which are predicted negative
False Positives (FP) = actual negatives which are predicted positive
False Negatives (FN) = actual positives which are predicted negative
3.6.2 Error rate
Error rate = (FP + FN) / (TP + TN + FP + FN)
3.6.3 True positive rate (TPR) or Sensitivity or Recall
TPR = TP / (TP + FN)
3.6.4 True negative rate (TNR) or Specificity
TNR = TN / (TN + FP)
3.6.5 Positive Predictive Value (PPV) or Precision
PPV or Precision = TP / (TP + FP)
3.6.6 F1 Score
For any general value β:
Fβ = (1 + β^2) * (Precision * Recall) / (β^2 * Precision + Recall)
For β = 1:
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
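For illustration, the metrics above can be computed from a trained model's predictions with scikit-learn as sketched below, reusing the names from the earlier sketches. Macro-averaging over the ten classes is an assumption, as the paper does not specify how per-class values are combined.

# Compute accuracy, error rate, sensitivity, specificity, precision and F1 score.
import numpy as np
from sklearn.metrics import (accuracy_score, recall_score,
                             precision_score, f1_score, confusion_matrix)

y_true = np.argmax(y_test, axis=1)                   # integer labels
y_pred = np.argmax(model.predict(x_test), axis=1)    # predicted labels

accuracy = accuracy_score(y_true, y_pred)             # (TP + TN) / total
error_rate = 1.0 - accuracy                           # (FP + FN) / total
sensitivity = recall_score(y_true, y_pred, average="macro")    # TPR
precision = precision_score(y_true, y_pred, average="macro")   # PPV
f1 = f1_score(y_true, y_pred, average="macro")

# Specificity (TNR) has no built-in helper; derive it per class from the
# confusion matrix as TN / (TN + FP), then average.
cm = confusion_matrix(y_true, y_pred)
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp
fn = cm.sum(axis=1) - tp
tn = cm.sum() - (tp + fp + fn)
specificity = np.mean(tn / (tn + fp))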
4. Result, Analysis and Comparison
In this section, the variants of the CNN model are compared along with training parameters such as batch size and number of epochs. The comparison is based on the performance metrics classification accuracy rate, error rate, TPR, TNR, PPV and F1 score. Table 1 shows the performance metrics of the variants of the model for batch sizes 32 and 64. The variants of the model were trained and tested for up to 100 epochs, and for each configuration of the model the performance metrics are presented.
Table 1: Performance metrics of different variants of the model
4.1 Comparison result of classification accuracy rate
Figure 7: Comparison graph of classification accuracy rate
The graph in Figure 7 shows that the accuracy rate is higher for the lower batch size (32) than for the higher batch size (64) for the first two variants; it decreases slightly in the last variant for batch size 64. For batch size 32, the rate increased by 2.116% from the baseline to improved model I and by a further 0.302% in improved model II. For batch size 64, the rate increased by 2.424% from the baseline to improved model I and by a further 0.038% in improved model II.
4.2 Comparison result of error rate
Figure 8: Comparison graph of classification error rate
Based on Figure 8, the second variant of the model significantly decreases the error rate; in the final variant the error rate decreases further for batch size 32 but increases slightly for batch size 64.
4.3 Comparison result of True Positive Rate
Figure 9: Comparison graph of TPR
Based on the graph in Figure 9, the TPR increases from the first variant to the final one except when the batch size is 64. For batch size 32, it increases by 10.58% from the baseline model to improved model I and again slightly by 1.51% in improved model II. For batch size 64, it increases by 12.12% from the baseline model to improved model I but decreases slightly by 0.19% in improved model II.
4.4 Comparison result of True Negative Rate
Figure 10: Comparison graph of TNR
The graph in Figure 10 shows that, for batch size 32, the TNR increases in each successive variant, by 1.176% and 0.168%. For batch size 64, it increases by 1.347% from the first to the second variant but decreases by 0.021% in the third variant.
4.5 Comparison result of Positive Predictive Value
Figure 11: Comparison graph of PPV
The graph in Figure 11 shows that, for batch size 32, the PPV increases in each successive variant, by 10.393% and a further 1.801%. For batch size 64, it increases by 12.516% from the first to the second variant but decreases by 0.513% in the third variant.
4.6 Comparison result of F1 score
Figure 12: Comparison graph of F1 Score
Based on Figure 12, for batch size 32 the F1 score increases by 5.275% in improved model I and again by 4.287% in improved model II. For batch size 64, it increases by 6.088% from the first to the second variant but decreases by 0.195% in the third variant.
5. Conclusion
In this research, a CNN model was built with three variants or configurations and tested on the image classification problem using the CIFAR-10 dataset. The first variant is a baseline model, and the two improved variants use techniques such as dropout regularization and image augmentation. The results show that the dropout technique greatly increased the performance of the model, while image augmentation improved it only slightly. They also show that performance can decrease if some parameter is not optimal, as in improved model II with batch size 64. This research also demonstrates that deep learning can be applied to solve vision-related tasks. It can further be concluded that the parameters of the model should be tweaked and tested by trial and error to get the optimal result from the network, and that a lower batch size gives higher classification accuracy than a higher batch size.
6. Future Works
This work can be further developed and extended to other datasets, and the model can be fine-tuned to get better results by adjusting various parameters. It can give more accurate results by using even more data and a deeper network. It was also realized that high-performance GPU computing is necessary to train the model to achieve significant results. The model can be further extended to solve other, more complex tasks such as the classification of medical images to detect diseases like cancer, cataract and pneumonia. Besides this, it can be developed for various other image classification tasks in the future.
REFERENCES
[1] Ruchika Arora, Indu Saini and Neetu Sood, "Efficient
Tuning of Hyper-parameters in Convolutional Neural
Network for Classification of Tuberculosis Images,"
Proceedings of International Conference on Women
Researchers in Electronics and Computing (WREC 2021)
April 22–24, 2021, DOI: 10.21467/proceedings.114.
[2] Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25 (NIPS 2012). 2012.
[3] Hossain, Md. Anwar & Sajib, Md. (2019). Classification of Image using Convolutional Neural Network (CNN). Global Journal of Computer Science and Technology. 19. 13-18. 10.34257/GJCSTDVOL19IS2PG13.
[4] Simonyan, Karen & Zisserman, Andrew. (2014). Very
Deep Convolutional Networks for Large-Scale Image
Recognition. arXiv 1409.1556.
[5] C. Szegedy et al., "Going deeper with convolutions," 2015
IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), Boston, MA, 2015, pp. 1-9, doi:
10.1109/CVPR.2015.7298594.
[6] Kadam, Shivam & Adamuthe, Amol & Patil, Ashwini. (2020). CNN Model for Image Classification on MNIST and Fashion-MNIST Dataset. Journal of Scientific Research. 64. 374-384. 10.37398/JSR.2020.640251.