Unit-2
Deep Learning
Introduction to CNNs
Deep Learning Unit Two Power Point Presentation
Let's see the workflow of a CNN, based on an image of a flower:
Another name for a Convolutional Neural Network is ConvNet.
Workflow of a CNN in pictorial representation:
Now, suppose I want to see the output as probabilities. In that case, I will apply an activation function
just before the output. Since I am dealing with two classes here, house and tree, I will use softmax, which gives
one probability per class, with the probabilities summing to 1. If I had only a single yes/no output (for example,
house or not a house), I would use sigmoid instead. Here, house and tree are both important classes, so I will
use softmax.
The model automatically computes a raw score (logit) for each class. After that, it applies the
softmax formula to turn these scores into probabilities.
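The softmax step described above can be sketched in a few lines. This is a minimal illustration, not the slides' own code: the raw scores 2.0 and 1.0 for "house" and "tree" are made-up values, and the function name `softmax` is our own.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    # and does not change the result.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the two classes (assumed values).
raw_scores = {"house": 2.0, "tree": 1.0}
probs = softmax(list(raw_scores.values()))

for name, p in zip(raw_scores, probs):
    print(f"{name}: {p:.3f}")
```

With these assumed scores, the larger raw score gets the larger probability, and the two probabilities add up to 1.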
We can draw either of them; it's up to us.
Now, let's go into more detail:
In the previous example, we discussed an image of a house and a tree with a background. Now, suppose I
have an image of the digit '8' with no background, just the '8' itself. In this case, how will convolution break
the image into smaller parts, and how will pooling select the relevant features?
We have a plain image of the digit "8". Even though it is a single object with no
background, convolution and pooling still apply effectively. The goal here is to extract
features from the shape and structure of the "8".
Input: First, the model takes the image as input.
Convolution:
The image is made up of small units called pixels, and
each pixel has a value.
• In grayscale images, each pixel has a single intensity value representing a shade of gray, typically
ranging from 0 (black) to 255 (white).
• In colour images, each pixel is represented by three values corresponding to the RGB channels (Red,
Green, and Blue), each with its own intensity ranging from 0 to 255. These values combine to form the
pixel's colour.
• How do we combine these three values into a single value?
Since we have a black-and-white image, the pixel values remain the same.
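For a colour image, one common way to combine the three channel values into a single value is a weighted sum. The sketch below assumes the widely used luma weights 0.299, 0.587 and 0.114 (ITU-R BT.601); the slides do not say which formula they have in mind, so treat this as one possible choice. The function name `rgb_to_gray` is our own.

```python
def rgb_to_gray(r, g, b):
    """Weighted sum of the R, G and B values (assumed BT.601 luma weights)."""
    return 0.299 * r + 0.587 * g + 0.114 * b

# Black stays 0 and white stays 255, because the weights sum to 1.
black = round(rgb_to_gray(0, 0, 0))
white = round(rgb_to_gray(255, 255, 255))
# Pure red becomes a fairly dark gray.
red = round(rgb_to_gray(255, 0, 0))

print(black, white, red)
```

Green gets the largest weight because the human eye is most sensitive to it.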
The pixel values are read directly from the image, so the model does not have to compute them. If the model is
dealing with a colour image, the three channel values are either combined into a single number beforehand
(grayscale conversion) or passed in as three channels, in which case each kernel spans all three channels and sums
the results into a single number.
Once the pixel values are available, the model applies a kernel (filter) to them.
• In a Convolutional Neural Network (CNN), the kernel values are learned by the model itself during training.
• Initially, the kernel values (weights) are randomly initialized.
• During training, these kernel weights are updated through backpropagation to minimize the error (loss) and better extract
features from the input data.
• Over time, the model "learns" kernel values that help identify relevant features such as edges, patterns, or other
characteristics in the image.
For very small inputs (like 3x3), a 2x2 kernel captures local features efficiently. For larger images, use a 3x3 or 5x5 kernel.
Larger kernels have more weights, so they increase computation and may increase the risk of overfitting. A 3x3 kernel is
common in many CNNs because it balances feature extraction and efficiency, while a 5x5 kernel can capture larger patterns
but should be used cautiously.
Although the model learns the kernel values (weights) automatically, we have to decide the kernel size ourselves,
for example 2 x 2 or 3 x 3. In most cases, a 3 x 3 kernel is recommended.
Now let's see how the kernel is combined with the pixel values to generate the convolved feature, also called the
activation map or feature map.
Suppose we have a 4 x 4 input image and we want to use a 2 x 2 kernel:
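The 4 x 4 input with a 2 x 2 kernel can be worked through in code. This is a minimal sketch: the pixel values and kernel weights below are made up for illustration (in a real CNN the kernel would be learned), the helper name `apply_kernel` is our own, and it assumes a square image, a square kernel, stride 1 and no padding.

```python
def apply_kernel(image, kernel, stride=1):
    """Slide the kernel over the image; at each position multiply
    element-wise and sum to get one value of the feature map."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    feature_map = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0
            for a in range(k):
                for b in range(k):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Hypothetical 4 x 4 input and 2 x 2 kernel (assumed values).
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 2],
    [2, 1, 0, 1],
    [1, 0, 2, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]

feature_map = apply_kernel(image, kernel)
for row in feature_map:
    print(row)
```

A 4 x 4 input with a 2 x 2 kernel and stride 1 gives a 3 x 3 feature map, because the kernel fits in three positions along each side.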
The behavior of the convolutional layer is controlled by us, mainly through the following hyperparameters:
1. Kernel Size: The kernel size is the size of the filter window that slides over the image (e.g., 3x3, 5x5). It defines
how many pixels the kernel looks at during each step of the convolution operation.
2. Stride: Stride refers to how far the kernel moves with each step.
For example, with a stride of 1, the kernel moves one pixel at a time, visiting every position in the image.
With a stride of 2, the kernel moves two pixels at a time, making the output smaller and faster to compute.
A stride of 1 is the most common choice because the kernel visits every position and so preserves the most detail.
3. Padding: Padding involves adding extra pixels (usually zeros) around the edges of the image to ensure that the
kernel can fully process all parts of the image, including the edges.
• Padding helps preserve the dimensions of the output feature map and allows the kernel to cover all areas of the input
image.
4. Number of Filters/Depth: Each filter produces its own feature map. If you use 3 filters, the convolutional layer will
produce 3 different feature maps, each representing a different characteristic of the image.
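The effect of kernel size, stride and padding on the output size follows the standard formula floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel size, s the stride and p the padding on each side. A small sketch, with a function name of our own choosing and square inputs assumed:

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Side length of the feature map for an n x n input and a k x k kernel."""
    return (n + 2 * padding - k) // stride + 1

# 4 x 4 input, 2 x 2 kernel, stride 1, no padding
size_a = conv_output_size(4, 2)
# Same input and kernel, but stride 2: the output shrinks
size_b = conv_output_size(4, 2, stride=2)
# 3 x 3 kernel with one pixel of zero padding keeps the size at 4
size_c = conv_output_size(4, 3, stride=1, padding=1)

print(size_a, size_b, size_c)
```

The number of filters does not change this side length; it only sets how many feature maps of that size the layer produces.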
After the convolution layer creates the feature map by sliding the kernel over small patches of the image and performing
element-wise multiplication and summation, the resulting feature map is passed to the activation layer (ReLU).
ReLU keeps the positive values, which mark where the kernel's pattern (such as an edge) was found, and sets all
negative values in the feature map to zero. This introduces non-linearity into the model and enables it to learn
complex patterns.
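ReLU itself is a one-line rule: keep positive values, replace negative values with zero. The sketch below applies it to a small feature map with made-up values; the function name `relu` is our own.

```python
def relu(feature_map):
    """Set every negative value in the feature map to zero."""
    return [[max(0, value) for value in row] for row in feature_map]

# Hypothetical 3 x 3 feature map containing negative values.
feature_map = [
    [0, -1, -2],
    [-1, 1, 2],
    [2, -1, -3],
]

activated = relu(feature_map)
for row in activated:
    print(row)
```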
The resulting feature map is forwarded to the pooling layer.
• The goal of the pooling layer is to retain the most relevant information and discard irrelevant details.
• In this scenario, the pooling layer operates directly on the values of the feature map.
The two primary types of pooling are max pooling and average pooling.
1. Max Pooling:-
Max Pooling (2x2) with Stride 2 in 4 x 4 input:-
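A 2x2 max pooling with stride 2 on a 4 x 4 input can be sketched as follows. The feature map values are made up for illustration and the function name `max_pool2d` is our own.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = [
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Hypothetical 4 x 4 feature map (assumed values).
feature_map = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]

pooled = max_pool2d(feature_map)
for row in pooled:
    print(row)
```

The 4 x 4 input is split into four non-overlapping 2 x 2 windows, so the output is 2 x 2, with one maximum per window.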
Coming to average pooling:-
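Average pooling uses the same sliding window but takes the mean of each window instead of the maximum. A minimal sketch with made-up values and a function name of our own, using a 2x2 window with stride 2:

```python
def avg_pool2d(feature_map, size=2, stride=2):
    """Replace each size x size window with the mean of its values."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = [
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled

# Hypothetical 4 x 4 feature map (assumed values).
feature_map = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]

averaged = avg_pool2d(feature_map)
for row in averaged:
    print(row)
```

Every value in a window contributes to the result, so a strong activation is diluted by the weaker values around it.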
After seeing examples of both max pooling and average pooling, here are our observations:
In most cases, max pooling is chosen because larger values in the feature map are considered more relevant. Larger
values indicate stronger activations, which are more important for feature detection. Max pooling ignores weak
activations and focuses on the strongest features.
In contrast, average pooling takes the mean of every value in the window, so noisy or irrelevant values also
influence the result.
Suppose we want to express the output as a probability; in that case, we can use either
sigmoid or softmax. Since we are dealing with a single output (the digit "8"), we will use sigmoid as
the activation function at the output.
The model automatically produces a raw value (logit) from the data it has. We substitute that value
into the activation function to obtain the probability. For example, suppose the raw value is 9:
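Substituting the raw value 9 into the sigmoid formula, sigmoid(z) = 1 / (1 + e^(-z)), can be checked with a short sketch. The function name `sigmoid` is our own.

```python
import math

def sigmoid(z):
    """Squash a raw value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

raw_value = 9
probability = sigmoid(raw_value)
print(f"{probability:.5f}")  # prints 0.99988
```

A raw value of 9 is far on the positive side, so the probability is very close to 1. A raw value of 0 would give exactly 0.5.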
So far, we have seen two ways to classify an image with a CNN.
So far, in the convolution step, we have used either a 3x3 kernel or a 5x5 kernel, with the values learned
automatically. Let me explain: there are two ways to apply the kernel to the image. We either use correlation (where
the kernel matrix is used as it is) or convolution (where the kernel matrix is flipped, that is, rotated by 180
degrees, before it is applied). Then, we apply the activation function as usual on the activation map. In most
situations, the operation is simply called convolution. Even when the computation is actually a correlation, the
layer is still called a convolution layer.
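The difference between the two operations is only the kernel flip, which a short sketch makes concrete. The image and kernel values are made up for illustration, the helper names are our own, and square inputs with stride 1 and no padding are assumed.

```python
def correlate2d(image, kernel):
    """Apply the kernel as it is: multiply element-wise and sum."""
    k = len(kernel)
    out = len(image) - k + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(out)
        ]
        for i in range(out)
    ]

def flip_kernel(kernel):
    """Rotate the kernel by 180 degrees (reverse the rows and the columns)."""
    return [row[::-1] for row in kernel[::-1]]

def convolve2d(image, kernel):
    """Convolution is correlation with the flipped kernel."""
    return correlate2d(image, flip_kernel(kernel))

# Hypothetical 3 x 3 image and 2 x 2 kernel (assumed values).
image = [
    [1, 0, 2],
    [0, 1, 0],
    [2, 0, 1],
]
kernel = [
    [1, 2],
    [3, 4],
]

print(correlate2d(image, kernel))  # [[5, 7], [8, 5]]
print(convolve2d(image, kernel))   # [[5, 8], [7, 5]]
```

For a symmetric kernel the two results are identical. For a kernel that is learned during training, the flip only changes which position each learned weight ends up in.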
Convolution and Correlation
We primarily use convolution layers in most cases because of their useful properties,
especially in the context of Convolutional Neural Networks (CNNs).
Here's why:
• 1. Feature Extraction: Sliding a small kernel over the image captures local patterns like edges, textures, and
other local features. This holds whether or not the kernel is flipped.
• 2. Translation: The same kernel is applied at every position, so a feature is detected regardless of where it
appears in the input. This comes from sharing the kernel weights across positions, not from flipping the kernel.
• 3. Efficient Computation: Because the same small kernel is reused everywhere, a convolution layer needs far fewer
weights than a fully connected layer. Without padding, or with a stride greater than 1, it also reduces the dimensions
of the input before pooling is applied.
• 4. Edge Detection: Convolution is well suited to edge detection. Edges are
crucial in understanding the structure of images, and suitable kernels capture them effectively.
• 5. Compatibility with Activation Functions: After applying convolution, the output is often passed through an
activation function like ReLU to introduce non-linearity, which improves the model's ability to learn complex patterns.
• 6. Correlation vs. Convolution: Correlation does not flip the kernel, while convolution does. Because the kernel
values are learned, the model can learn either version of a kernel, so the choice makes no difference to what a CNN can
detect. In practice, many deep learning libraries compute correlation inside their "convolution" layers.
• 7. Standard Practice: Most pre-trained models and standard CNN architectures are designed with convolution layers,
making it a widely adopted practice in machine learning and computer vision.
Let's see how convolution and correlation are used together (especially for edge detection):
Suppose we have an original picture of "8":
Pooling Layer
CNN Architecture
So far, we have seen how:
Detection and Segmentation
Image Classification
Image (or Text) Classification and Hyperparameter Tuning
Advanced CNNs for computer vision.
