https://medium.com/aiguys/deep-convolutional-neural-networks-dcnns-explained-in-layman-terms-b990b2818061
Deep Convolutional Neural Networks (DCNNs) explained in layman's terms
In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of artificial neural networks most commonly applied to analyzing visual imagery. CNNs are also known as shift-invariant or space-invariant artificial neural networks (SIANN). The name comes from the shared-weight architecture of their convolution kernels, or filters, which slide along the input features and produce translation-equivariant responses known as feature maps.
[Figure: Basic DCNN architecture]
DCNN Architecture
1. Convolutional Layer + ReLU
2. Pooling Layer
3. Fully Connected Layer
4. Dropout
5. Activation Functions
1. Convolutional Layer + ReLU
This is the first layer, and it extracts the various features from the input images. In this layer, the mathematical operation of convolution is performed between the input image and a filter of a particular size M×M. As the filter slides over the input image, the dot product is taken between the filter and each M×M patch of the image that it covers.
The output is called a feature map. It gives us information about the image, such as corners and edges. The feature map is then fed to later layers, which learn further features of the input image.
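The sliding dot product described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how a framework implements it. It assumes a single-channel image stored as a list of lists, no padding, and a configurable stride. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes cross-correlation, which is what CNN layers call "convolution". The function name `conv2d` and the vertical-edge filter are hypothetical choices for illustration.

```python
def conv2d(image, kernel, stride=1):
    """Slide an M x M kernel over a 2-D image (no padding) and return the feature map.

    Each output value is the dot product between the kernel and the image patch
    it currently covers. The kernel is not flipped (cross-correlation).
    """
    m = len(kernel)
    h, w = len(image), len(image[0])
    out_h = (h - m) // stride + 1
    out_w = (w - m) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r0, c0 = i * stride, j * stride  # top-left corner of the current patch
            total = 0
            for di in range(m):
                for dj in range(m):
                    total += image[r0 + di][c0 + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 4 x 6 image with a vertical edge: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
# Hypothetical 3 x 3 vertical-edge filter.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(conv2d(image, kernel))  # [[0, 3, 3, 0], [0, 3, 3, 0]]
```

The feature map is non-zero only where the filter straddles the edge, and it is zero over the flat regions. This is the kind of "edge" information the paragraph above refers to.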
2. Pooling Layer
In most cases, a convolutional layer is followed by a pooling layer. The primary aim of this layer is to decrease the size of the convolved feature map in order to reduce computational cost. It does this by summarizing small regions of each feature map with a single value, which reduces the number of connections to the following layers. Each feature map is pooled independently. There are several types of pooling, depending on how each region is summarized.
In max pooling, the largest element in each window of the feature map is taken. Average pooling calculates the average of the elements in each window of a predefined size, and sum pooling computes their total. The pooling layer usually serves as a bridge between the convolutional layer and the FC layer.
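The three pooling variants differ only in how each window is summarized. The following minimal sketch assumes a single 2-D feature map stored as a list of lists, a 2×2 window, a stride of 2 and no padding. The function name `pool2d` and its `mode` parameter are hypothetical.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample a 2-D feature map with max, average, or sum pooling.

    Each output value summarizes one size x size window; windows that would
    run past the edge are skipped (no padding).
    """
    reducers = {
        "max": max,
        "avg": lambda values: sum(values) / len(values),
        "sum": sum,
    }
    reduce_fn = reducers[mode]
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, h - size + 1, stride):
        row = []
        for c in range(0, w - size + 1, stride):
            window = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


fmap = [
    [1, 3, 2, 0],
    [5, 6, 1, 2],
    [0, 2, 4, 8],
    [1, 1, 3, 7],
]
print(pool2d(fmap, mode="max"))  # [[6, 2], [2, 8]]
print(pool2d(fmap, mode="avg"))  # [[3.75, 1.25], [1.0, 5.5]]
print(pool2d(fmap, mode="sum"))  # [[15, 5], [4, 22]]
```

All three modes shrink the 4×4 map to 2×2. They differ only in whether each window keeps its strongest response, its mean, or its total.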
3. Fully Connected Layer
A fully connected (FC) layer consists of neurons with their weights and biases, and it connects every neuron in one layer to every neuron in the next. These layers are usually placed before the output layer and form the last few layers of a CNN architecture.
Here, the output of the previous layers is flattened into a single vector and fed to the FC layer. The flattened vector then passes through one or more further FC layers, each of which computes weighted sums of its inputs plus biases. This is the stage where classification takes place.
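Each neuron in an FC layer computes a weighted sum of all its inputs plus a bias. The sketch below passes an already-flattened vector through two stacked FC layers, with a ReLU between them. All weights, biases and input values are made up for illustration (in a real network they are learned), and the helper names `dense` and `relu` are hypothetical.

```python
def dense(inputs, weights, biases):
    """Fully connected layer: each output neuron is a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def relu(vector):
    """Element-wise ReLU: negative values become 0."""
    return [max(0, v) for v in vector]


# Flattened features from earlier layers (illustrative values).
features = [1, 2, -1, 3]

# Hidden FC layer: 3 neurons, each connected to all 4 inputs (hypothetical weights).
w1 = [[1, 0, -1, 1],
      [0, 1, 1, -1],
      [-1, 1, 0, 0]]
b1 = [0, 1, 0]

# Output FC layer: 2 class scores, each connected to all 3 hidden neurons.
w2 = [[1, -1, 2],
      [-1, 1, 1]]
b2 = [0, 0]

hidden = relu(dense(features, w1, b1))
scores = dense(hidden, w2, b2)
print(hidden)  # [5, 0, 1]
print(scores)  # [7, -4]
```

The last FC layer produces one score per class. An activation such as softmax would then turn these scores into probabilities.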
4. Dropout
When all the features are connected to the FC layer, the model can overfit the training dataset. Overfitting occurs when a model fits the training data so closely that its performance on new, unseen data suffers.
To overcome this problem, a dropout layer is used. During training, a randomly chosen subset of neurons is temporarily ignored (dropped) at each step, so the network cannot rely too heavily on any single neuron. The model itself does not get smaller, and all neurons are used again at inference time. With a dropout rate of 0.3, each node is dropped with probability 0.3, so roughly 30% of the nodes are switched off on each training step.
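Here is a minimal sketch of a dropout rate of 0.3. It uses the "inverted dropout" variant that common frameworks use. Surviving activations are scaled by 1/(1 − rate) during training, so nothing needs to change at inference time. The function name `dropout` and its parameters are hypothetical. A seeded `random.Random` makes the run repeatable.

```python
import random


def dropout(activations, rate=0.3, training=True, rng=None):
    """Inverted dropout: zero each activation with probability `rate` during training.

    Surviving activations are scaled by 1 / (1 - rate) so their expected value is
    unchanged; at inference time (training=False) the input passes through untouched.
    """
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    # rng.random() is uniform in [0, 1), so each value is kept with probability keep_prob.
    return [a / keep_prob if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10_000
dropped = dropout(activations, rate=0.3, rng=rng)
fraction_dropped = dropped.count(0.0) / len(dropped)
print(f"fraction dropped: {fraction_dropped:.3f}")  # close to 0.3; exact value depends on the seed
print(dropout([1.0, 2.0, 3.0], rate=0.3, training=False))  # [1.0, 2.0, 3.0]
```

A different random subset is dropped on every call during training. The network keeps all of its weights, which is why dropout regularizes the model rather than shrinking it.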
5. Activation Functions
Finally, one of the most important design choices in a CNN model is the activation function. Activation functions let the network learn and approximate complex, continuous relationships between its variables. In simple words, an activation function decides which information a neuron passes forward and which it suppresses.
It adds non-linearity to the network. Commonly used activation functions include ReLU, softmax, tanh, and sigmoid, and each has a specific use. For a binary classification CNN model, the output layer usually uses a sigmoid (or, equivalently, a two-class softmax). For multi-class classification, softmax is used.
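The four activation functions named above are short enough to write out directly. This sketch uses only Python's `math` module. It also checks the binary-classification remark: a two-class softmax over the scores [z, 0] gives the same probability as sigmoid(z). The helper names are illustrative.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return math.tanh(x)


def softmax(scores):
    # Subtract the max score first for numerical stability; the result is unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Binary classification: one output score -> probability of the positive class.
p_positive = sigmoid(2.0)

# A two-class softmax over [z, 0] gives the same probability as sigmoid(z).
two_way = softmax([2.0, 0.0])[0]

# Multi-class classification: one score per class -> probabilities that sum to 1.
probs = softmax([2.0, 1.0, 0.1])

print(relu(-3.0), relu(3.0))          # 0.0 3.0
print(round(tanh(0.0), 3))            # 0.0
print(round(p_positive, 3))           # 0.881
print([round(p, 3) for p in probs])   # [0.659, 0.242, 0.099]
```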
How does a DCNN work?
A CNN compares images piece by piece. The pieces it looks for are called features. These are simply small M×M matrices of numbers (to a computer, an image is itself just a matrix of pixel values). By finding rough feature matches in roughly the same positions in two images, CNNs are much better at spotting similarities than schemes that match whole images at once. However, when presented with a new image, the CNN doesn't know exactly where these features will match. So it tries them everywhere, in every possible position, shifting the feature matrix across the image a fixed number of pixels (the stride) at a time. Computing how well a feature matches at every position across the whole image turns that feature into a filter. The math we use to do this is called convolution, from which Convolutional Neural Networks take their name.
The next step is to repeat the convolution process in its entirety for each of the other features.
The result is a set of filtered images, one for each of our filters. It’s convenient to think of this
whole collection of convolution operations as a single processing step.
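The "try every feature everywhere" idea can be made concrete with a small ±1 image of an "X". The following is a minimal sketch under assumptions not in the original: pixels are coded as +1 (ink) and −1 (background), and the match score is the average of the element-wise products, so a perfect match scores 1.0. This is the same sliding dot product, divided by the number of pixels. The three features and the helper names `filter_image` and `best_match` are hypothetical.

```python
def filter_image(image, feature, stride=1):
    """Score how well `feature` matches each patch of `image`.

    The score is the average of the element-wise products, so with pixels
    coded as +1 / -1 a perfect match scores 1.0 and a perfect mismatch -1.0.
    """
    m = len(feature)
    h, w = len(image), len(image[0])
    filtered = []
    for r in range(0, h - m + 1, stride):
        row = []
        for c in range(0, w - m + 1, stride):
            products = [image[r + i][c + j] * feature[i][j] for i in range(m) for j in range(m)]
            row.append(sum(products) / len(products))
        filtered.append(row)
    return filtered


def best_match(filtered):
    """Return (score, row, col) of the strongest response; ties go to the first position found."""
    candidates = ((score, r, c) for r, row in enumerate(filtered) for c, score in enumerate(row))
    return max(candidates, key=lambda item: item[0])


# A 5 x 5 "X" drawn with +1 (ink) on a -1 background.
image = [
    [ 1, -1, -1, -1,  1],
    [-1,  1, -1,  1, -1],
    [-1, -1,  1, -1, -1],
    [-1,  1, -1,  1, -1],
    [ 1, -1, -1, -1,  1],
]
# Hypothetical 3 x 3 features.
features = {
    "falling diagonal": [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]],
    "rising diagonal": [[-1, -1, 1], [-1, 1, -1], [1, -1, -1]],
    "small x": [[1, -1, 1], [-1, 1, -1], [1, -1, 1]],
}
# One filtered image per feature; together they form a stack of filtered images.
stack = {name: filter_image(image, feature) for name, feature in features.items()}
for name, filtered in stack.items():
    print(name, best_match(filtered))
# falling diagonal (1.0, 0, 0)
# rising diagonal (1.0, 0, 2)
# small x (1.0, 1, 1)
```

Each feature finds its perfect match at a different position. This is why the CNN has to try every feature at every position rather than in one fixed place.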
Next, we introduce so-called “non-linearity” into the model so that it can learn non-linear decision boundaries. A very common way to do this is to apply a non-linear function such as ReLU or GELU. The most popular is ReLU, which performs a simple math operation: wherever a negative number occurs, swap it out for a 0. Unlike sigmoid or tanh, ReLU does not saturate for positive inputs, so its gradient stays at 1 there. This helps keep gradients from vanishing toward 0 during training. Note that convolution + ReLU can produce very large feature maps, so it is important to reduce the feature map size while keeping the identified features intact.
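The sketch below shows ReLU rectifying a small feature map. It then compares the gradients of ReLU and sigmoid to illustrate why ReLU helps against vanishing gradients. The feature-map values are made up, and the helper names are illustrative. The ReLU derivative at exactly 0 is taken as 0 by convention.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 chosen at x == 0).
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


# Rectify a small feature map: negatives become 0, positives pass through unchanged.
feature_map = [[-2.0, 0.5], [3.0, -0.1]]
rectified = [[relu(v) for v in row] for row in feature_map]
print(rectified)  # [[0.0, 0.5], [3.0, 0.0]]

# Gradients at a few positive inputs: sigmoid's shrinks toward 0, ReLU's stays at 1.
for x in (0.5, 2.0, 10.0):
    print(x, relu_grad(x), round(sigmoid_grad(x), 5))
```

The sigmoid gradient never exceeds 0.25 and becomes tiny for large inputs. ReLU passes the gradient through unchanged for any positive input.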
Pooling is a way to take large images and shrink them down while preserving the most important information in them. Max pooling, the most common form, steps a small window across an image and takes the maximum value in the window at each step. In practice, a window of 2 or 3 pixels on a side with a stride of 2 pixels works well. A pooling layer simply performs pooling on an image or a collection of images. The output has the same number of images, but each has fewer pixels, which also helps manage the computational load.
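The effect of the suggested window and stride settings can be checked with simple arithmetic. Assuming no padding, an n×n map pooled with window size k and stride s has side length ⌊(n − k)/s⌋ + 1. The stack size (8 maps of 28×28) and the helper name `pooled_size` are hypothetical.

```python
def pooled_size(n, window=2, stride=2):
    """Side length after pooling an n x n map with no padding: floor((n - window) / stride) + 1."""
    return (n - window) // stride + 1


# Hypothetical stack: 8 feature maps, each 28 x 28, coming out of convolution + ReLU.
num_maps, side = 8, 28
for window in (2, 3):
    new_side = pooled_size(side, window=window, stride=2)
    before = num_maps * side * side
    after = num_maps * new_side * new_side
    print(f"window {window}, stride 2: {num_maps} maps of {side}x{side} ({before} values) "
          f"-> {num_maps} maps of {new_side}x{new_side} ({after} values)")
# window 2, stride 2: 8 maps of 28x28 (6272 values) -> 8 maps of 14x14 (1568 values)
# window 3, stride 2: 8 maps of 28x28 (6272 values) -> 8 maps of 13x13 (1352 values)
```

The number of maps stays at 8, but each map holds roughly a quarter as many pixels. This is where the computational savings come from.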
Once the desired number of convolution operations has been performed (this depends on the model design), it is time to pass the results to a fully connected neural network, which learns from the features extracted in the earlier stages. Before we pass the pooled feature maps to this network, we need to flatten them, because fully connected layers expect a one-dimensional input vector. So we unroll each feature map and line up all the values end to end in one long vector. In the end, raw images get filtered, rectified, and pooled to create a set of shrunken, feature-filtered images. Once flattened, these are ready to go into the world of neurons (the fully connected network).
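Flattening only rearranges the values and does not change them. This sketch assumes a stack of 2-D pooled maps stored as nested lists, unrolled map by map and row by row. Frameworks may use a different ordering (for example, channels last), and any fixed ordering works as long as it is used consistently. The name `flatten` and the example values are hypothetical.

```python
def flatten(feature_maps):
    """Unroll a stack of 2-D feature maps into one 1-D list, map by map, row by row."""
    return [value for fmap in feature_maps for row in fmap for value in row]


# Two hypothetical pooled feature maps, each 2 x 2.
pooled_maps = [
    [[6, 2],
     [2, 8]],
    [[1, 0],
     [4, 3]],
]
vector = flatten(pooled_maps)
print(vector)       # [6, 2, 2, 8, 1, 0, 4, 3]
print(len(vector))  # 8 (2 maps x 2 x 2)
```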
The fully connected layers take the high-level filtered images (the flattened, rectified, pooled feature maps) and translate them into votes (or signals). These votes are expressed as weights, or connection strengths, between each value and each category. When a new image is presented to the CNN, it percolates through the lower layers until it reaches the fully connected layer at the end. Then an election is held: the answer with the most votes wins and is declared the category of the input.
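The "election" can be sketched as a weighted sum per category followed by picking the highest total. This is a simplified illustration. It assumes a single output layer with no bias terms. The categories "X" and "O", their weights and the feature values are all hypothetical, and in a trained CNN the weights would be learned.

```python
def vote(features, category_weights):
    """Each category's score is the weighted sum of all feature values (its "votes")."""
    return {
        category: sum(w * x for w, x in zip(weights, features))
        for category, weights in category_weights.items()
    }


# Flattened, rectified, pooled features (illustrative values).
features = [0.9, 0.1, 0.8, 0.0]

# Hypothetical connection strengths between each value and each category.
category_weights = {
    "X": [1.0, -1.0, 1.0, -1.0],
    "O": [-1.0, 1.0, -1.0, 1.0],
}
scores = vote(features, category_weights)
winner = max(scores, key=scores.get)  # the category with the most votes wins
print(winner)  # X
```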
That is how a deep CNN works. The figure below summarizes what we have covered.
[Figure: DCNN process]
DCNN model considerations (hyperparameter tuning)
Unfortunately, not every aspect of a CNN can be learned in so straightforward a manner. There is still a long list of decisions that a CNN designer must make:
- For each convolution layer: how many features? How many pixels in each feature?
- For each pooling layer: what window size? What stride?
- Which activation function should I use? How many epochs? Any early stopping?
- For each extra fully connected layer: how many hidden neurons? And so on.
There are also higher-level architectural decisions to make. How many of each layer should you include, and in what order? There are also many tweaks to try, such as new layer types, more complex ways of connecting layers to each other, or simply increasing the number of epochs or changing the activation function.
The best way to decide is to experiment and see for yourself.
Here is a simple notebook that shows what this might look like and how you can arrive at the best CNN hyperparameter combination:
https://colab.research.google.com/drive/1gXenThfIViK2v14WJ2D-U9U3hcW5QjC3?usp=sharing
Note that it may be computationally heavy, so you may need to optimize your image size and batch size.
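To see why tuning gets computationally heavy, it helps to count how many configurations a small grid search produces. Each configuration would require training and evaluating a separate model. The candidate values below are hypothetical and are not taken from the notebook. The sketch only enumerates the grid with `itertools.product` and does not train anything.

```python
from itertools import product

# Hypothetical candidate values for a few of the decisions listed above.
search_space = {
    "filters_per_conv_layer": [16, 32, 64],
    "kernel_size": [3, 5],
    "pool_window": [2, 3],
    "activation": ["relu", "gelu", "tanh"],
    "epochs": [10, 20],
    "hidden_neurons": [64, 128, 256],
}

# Every combination of one value per hyperparameter.
combinations = [dict(zip(search_space, values)) for values in product(*search_space.values())]
print(len(combinations))  # 216 (3 * 2 * 2 * 3 * 2 * 3)
print(combinations[0])
```

Even this modest grid implies 216 training runs, and the count multiplies with every option added. In practice, a common way to cut the cost is to try only a random subset of the combinations.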
