ConvNet
S.Hemalatha
AP (Sr.G)
Dept. of Computer Applications
KEC
Convolutional Network
CNN
• A Convolutional Neural Network (ConvNet or CNN) is a class of deep neural networks primarily used for analyzing visual data such as images or videos, where patterns in the data play a crucial role.
• It is an advanced version of artificial neural networks (ANNs), primarily designed to extract features from grid-like matrix datasets.
• CNNs are widely used in computer vision applications.
• CNNs consist of multiple layers, such as the input layer, convolutional layers, pooling layers, and fully connected layers.
Architecture of CNN
Core Components and Layers of a ConvNet:
Input Layer:
• This layer receives the raw input data, typically an image represented as a multi-dimensional array (height x width x color channels).
• For example, for an image of dimension 32 x 32 x 3, this layer holds the raw input with width 32, height 32, and depth 3.
Convolutional Layer:
• This is the fundamental building block of a ConvNet.
• It applies a set of learnable filters (also called kernels) to the input volume. The filters are small matrices, usually of shape 2x2, 3x3, or 5x5.
• Each filter performs a convolution operation, sliding across the input and computing the dot product between the filter's weights
and the corresponding input region.
• This process extracts features like edges, textures, and patterns, generating feature maps that represent the presence of these
features at different locations.
Activation Function (e.g., ReLU):
• Typically applied after the convolutional layer.
• It introduces non-linearity into the model, allowing it to learn more complex relationships in the data.
• Rectified Linear Unit (ReLU) is a common choice, setting all negative values in the feature map to zero and keeping positive values
unchanged.
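The ReLU rule described above can be sketched in a few lines of NumPy. The feature map values here are made up for illustration only.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: negative values become 0, positive values pass through."""
    return np.maximum(x, 0)

# Illustrative 2x2 feature map (values are an assumption, not from the slides)
feature_map = np.array([[-2.0, 1.5],
                        [0.0, -0.5]])
activated = relu(feature_map)
print(activated)
```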
Pooling Layer:
• Its purpose is to reduce the spatial dimensions (height and width) of the feature maps, thereby decreasing computational complexity and the number of parameters needed in later layers.
• Common pooling operations include Max Pooling (selecting the maximum value within a defined window) and Average Pooling
(calculating the average value within a window).
Fully Connected Layer (FC Layer):
• Located at the end of the network, after several convolutional and pooling layers.
• The flattened output from the preceding layers is fed into this layer.
• Each neuron in a fully connected layer is connected to every neuron in the previous layer, allowing the network
to learn high-level representations and make predictions based on the extracted features.
Output Layer:
• The final layer of the ConvNet.
• It typically uses an activation function like Softmax for classification tasks, producing a probability
distribution over the possible classes.
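As a minimal sketch of how Softmax turns raw class scores into a probability distribution, assuming three hypothetical class scores:

```python
import numpy as np

def softmax(logits):
    """Convert a 1D vector of scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])      # hypothetical scores for 3 classes
probs = softmax(scores)
print(probs, probs.sum())
```

The class with the highest score keeps the highest probability; only the scale changes.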
Additional Components:
Dropout Layer:
A regularization technique often used in fully connected layers to prevent overfitting by randomly deactivating a
fraction of neurons during training.
Batch Normalization:
A technique to normalize the activations of a layer, which can improve training stability and speed.
This hierarchical structure allows ConvNets to automatically learn and extract increasingly complex features from
raw visual data, making them highly effective for tasks such as image classification, object detection, and
segmentation.
Detailed description of each layer
1. Input Layer
Purpose: Accepts raw image data (e.g., 224×224×3 for a color
image).
Data shape: (Height × Width × Channels).
Example: For grayscale: 28×28×1, for RGB: 32×32×3.
2. Convolutional Layers
Core operation: Applies a set of learnable filters (kernels) to
extract local features.
Key parameters:
Number of filters (e.g., 32, 64, 128…)
Kernel size (e.g., 3×3, 5×5)
Stride (step size of the filter movement, e.g., 1, 2, or 3)
Padding ("same" to preserve dimensions, "valid" to reduce)
Activation: Usually ReLU for non-linearity.
Outcome: Feature maps (spatial representation of learned
features).
3. Pooling Layers
Purpose: Reduce spatial dimensions to lower computational
cost and control overfitting.
Types:
Max Pooling: Keeps the largest value in each window.
Average Pooling: Takes the mean of values in the window.
Typical size: 2×2 with stride 2.
4. Dropout Layers (optional)
Purpose: Randomly "turns off" a fraction of neurons during
training to prevent overfitting.
Typical rate: 0.25–0.5.
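A rough sketch of dropout during training follows. It assumes the "inverted dropout" variant, which rescales the kept activations so their expected value is unchanged; the slides do not say which variant is meant, and the function name and arguments are illustrative.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout (assumed variant): zero a fraction `rate` of units
    during training and rescale the rest by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return activations              # dropout is disabled at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((4, 8))                     # dummy activations
dropped = dropout(x, rate=0.5, rng=rng)
print(dropped)
```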
5. Fully Connected (Dense) Layers
Purpose: Take the feature maps flattened into a vector (nD to 1D) and learn high-level representations.
Often use ReLU activation, with the final layer ending in Softmax (for classification).
6. Output Layer
Purpose: Produces final predictions.
Activation:
Softmax for multi-class classification.
Sigmoid for binary classification.
Architecture of the CNNs applied to digit
recognition
• Convolution: The term convolution refers to the mathematical combination of two functions to produce a third function.
• Pooling: The objective of pooling is to down-sample an input representation (image, hidden-layer output matrix, etc.), reducing its dimensions and allowing for assumptions to be made about features contained in the sub-regions created.
• Fully Connected Layers: Fully connected layers (FCL) in a neural network are those layers where all the inputs from one layer are connected to every activation unit of the next layer.
1. What is a Convolutional Layer?
A Convolutional Layer in a CNN applies small, trainable filters (also called kernels) over an input
image (or feature map) to detect features such as edges, textures, shapes, etc.
2. Key Components
a) Filter (Kernel)
• A small matrix of weights, e.g., 3×3 or 5×5.
• Scans across the image, multiplying values element-wise and summing them up.
• Each filter detects a specific pattern (e.g., vertical edges, curves).
• One convolutional layer typically has many filters (e.g., 32, 64…).
b) Stride
• How many pixels the filter moves at each step.
Stride = 1 → Filter moves one pixel at a time (more overlap, bigger output).
Stride = 2 → Filter moves two pixels at a time (less overlap, smaller output).
c) Padding
Decides what happens at the image borders:
Valid padding → No padding, output shrinks.
Same padding → Pads with zeros so output has the same spatial size as input.
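To make the two padding modes concrete, the sketch below zero-pads a 5×5 array by one pixel on every side, which is what "same" padding amounts to for a 3×3 filter with stride 1. The array contents are placeholders.

```python
import numpy as np

image = np.arange(25).reshape(5, 5)     # placeholder 5x5 input

# "Valid": no padding, a 3x3 filter fits in 5 - 3 + 1 = 3 positions per axis.
valid_out = image.shape[0] - 3 + 1

# "Same": pad (3 - 1) // 2 = 1 zero on each side, so the input becomes 7x7
# and the 3x3 filter fits in 7 - 3 + 1 = 5 positions per axis.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
same_out = padded.shape[0] - 3 + 1

print(valid_out, padded.shape, same_out)
```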
3. How the Process Works
Example Input
A 5×5 grayscale image:
1 1 1 0 0
0 1 1 1 0
0 0 1 1 1
0 0 1 1 0
0 1 1 0 0
Filter (Kernel)
A 3×3 filter:
1 0 1
0 1 0
1 0 1
Step-by-Step (Stride = 1, No Padding)
Place the filter on the top-left corner of the image.
Multiply each filter value with the overlapping image pixel.
Sum the results → This becomes one pixel in the output.
Slide the filter right by stride steps → Repeat until the end of the row.
Move down by stride steps and repeat for the next row.
First Position Calculation
Filter on top-left corner:
Image patch       Filter        Multiply & Sum
1 1 1             1 0 1         (1×1)+(1×0)+(1×1) +
0 1 1      ×      0 1 0    =    (0×0)+(1×1)+(1×0) +
0 0 1             1 0 1         (0×1)+(0×0)+(1×1)
= 1 + 0 + 1 + 0 + 1 + 0 + 0 + 0 + 1 = 4
So, the first output pixel = 4.
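The sliding procedure can be sketched for the whole image, not only the first position. This is a simple loop-based version for square inputs and filters; the function name is illustrative. As in most CNN descriptions, the filter is not flipped, which makes no difference here because this filter is symmetric.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` without padding; multiply element-wise and sum."""
    f = kernel.shape[0]
    out = (image.shape[0] - f) // stride + 1
    result = np.zeros((out, out), dtype=image.dtype)
    for i in range(out):
        for j in range(out):
            r, c = i * stride, j * stride
            patch = image[r:r + f, c:c + f]
            result[i, j] = np.sum(patch * kernel)
    return result

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

output = conv2d_valid(image, kernel)
print(output)
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```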
Output Size Formula
If:
n = input size
f = filter size
p = padding
s = stride
Then output size = ⌊(n + 2p − f) / s⌋ + 1
Example:
Input = 5×5, Filter = 3×3, Stride = 1, Padding = 0:
Output size = (5 + 2×0 − 3) / 1 + 1 = 3
So output is 3×3.
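A small helper for the standard output-size relation, (n + 2p − f) // s + 1 with integer (floor) division. The function name is illustrative, and the extra cases are added only as sanity checks.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output size along one axis for input n, filter f, padding p, stride s."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(5, 3, p=0, s=1))   # 3  (the example above)
print(conv_output_size(5, 3, p=1, s=1))   # 5  ("same" padding for a 3x3 filter)
print(conv_output_size(5, 2, p=0, s=2))   # 2  (2x2 window, stride 2, no padding)
```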
4. Multiple Filters
If the layer has 32 filters, it produces 32 different feature maps (one per filter), stacked
together as the output.
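The stacking of feature maps can be illustrated by applying 32 filters to one 5×5 input at once. The filters here are random placeholders rather than learned weights, and the vectorised approach (`sliding_window_view` plus `einsum`) is just one possible way to write it.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((5, 5))              # placeholder single-channel input
filters = rng.random((32, 3, 3))        # 32 hypothetical 3x3 filters

# All 3x3 windows of the image: shape (3, 3, 3, 3) = (out_h, out_w, kh, kw)
windows = np.lib.stride_tricks.sliding_window_view(image, (3, 3))

# For each window and each filter: multiply element-wise and sum.
feature_maps = np.einsum("ijkl,nkl->ijn", windows, filters)
print(feature_maps.shape)               # (3, 3, 32): one 3x3 map per filter
```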
What is a Pooling Layer?
Purpose: Reduce spatial dimensions of feature maps while keeping important
features.
Benefit: Fewer parameters → faster computation → less overfitting.
How: Applies a small window (e.g., 2×2) and replaces it with a single value.
Use the following 5×5 feature map (for example, the output of a convolution) for the pooling demonstration:
4 2 1 0 3
3 1 2 3 4
1 0 1 2 3
0 1 3 4 2
2 3 0 1 1
Max Pooling
Operation: Take the maximum value in each window.
Pooling size: 2×2, Stride: 1, Padding: None
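Step-by-step:
A 5×5 input with a 2×2 window and stride 1 has (5 − 2) / 1 + 1 = 4 window positions per row and per column, so the output is 4×4.
First Output Row (input rows 1–2)
(1,1) →
4 2
3 1
Max = 4
(1,2) →
2 1
1 2
Max = 2
(1,3) →
1 0
2 3
Max = 3
(1,4) →
0 3
3 4
Max = 4
Output row → [4, 2, 3, 4]
Second Output Row (input rows 2–3)
(2,1) →
3 1
1 0
Max = 3
(2,2) →
1 2
0 1
Max = 2
(2,3) →
2 3
1 2
Max = 3
(2,4) →
3 4
2 3
Max = 4
Output row → [3, 2, 3, 4]
Third Output Row (input rows 3–4)
(3,1) →
1 0
0 1
Max = 1
(3,2) →
0 1
1 3
Max = 3
(3,3) →
1 2
3 4
Max = 4
(3,4) →
2 3
4 2
Max = 4
Output row → [1, 3, 4, 4]
Fourth Output Row (input rows 4–5)
(4,1) →
0 1
2 3
Max = 3
(4,2) →
1 3
3 0
Max = 3
(4,3) →
3 4
0 1
Max = 4
(4,4) →
4 2
1 1
Max = 4
Output row → [3, 3, 4, 4]
Final Max Pooling Output (Stride = 1)
4 2 3 4
3 2 3 4
1 3 4 4
3 3 4 4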
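The window-by-window procedure can be checked with a short loop-based sketch (the function name is illustrative). With a 5×5 input, a 2×2 window and stride 1, there are (5 − 2) / 1 + 1 = 4 window positions along each axis, so the code produces a 4×4 result.

```python
import numpy as np

def max_pool2d(x, size=2, stride=1):
    """Max pooling without padding: keep the largest value in each window."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].max()
    return out

feature_map = np.array([[4, 2, 1, 0, 3],
                        [3, 1, 2, 3, 4],
                        [1, 0, 1, 2, 3],
                        [0, 1, 3, 4, 2],
                        [2, 3, 0, 1, 1]])

pooled = max_pool2d(feature_map, size=2, stride=1)
print(pooled)
# [[4 2 3 4]
#  [3 2 3 4]
#  [1 3 4 4]
#  [3 3 4 4]]
```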
Max Pooling with padding
• Max pooling with padding and stride = 2 on the same 5×5 feature map, step by step.
• Step 1: Add Padding
• With 2×2 pooling and stride = 2, the windows do not reach the last row and column of a 5×5 map. To cover all regions including the borders, add zero padding: 1 row at the bottom and 1 column on the right, so the pooling windows fit exactly.
• Padded matrix:
4 2 1 0 3 0
3 1 2 3 4 0
1 0 1 2 3 0
0 1 3 4 2 0
2 3 0 1 1 0
0 0 0 0 0 0
• Size → 6×6
• Pooling this 6×6 matrix with a 2×2 window and stride = 2 gives a 3×3 output:
4 3 4
1 4 3
3 1 1
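A sketch of the padded case: pad one row at the bottom and one column on the right with zeros, then take the maximum of each non-overlapping 2×2 block. The reshape trick used here is one possible implementation and relies on the window size equalling the stride. Zero padding is harmless in this example because all values are non-negative; with negative values, the padding would need to be −∞ so that padded cells never win the maximum.

```python
import numpy as np

feature_map = np.array([[4, 2, 1, 0, 3],
                        [3, 1, 2, 3, 4],
                        [1, 0, 1, 2, 3],
                        [0, 1, 3, 4, 2],
                        [2, 3, 0, 1, 1]])

# Pad 1 row at the bottom and 1 column on the right -> 6x6
padded = np.pad(feature_map, ((0, 1), (0, 1)), mode="constant", constant_values=0)

# Split into 3x3 blocks of size 2x2 and take the max inside each block.
pooled = padded.reshape(3, 2, 3, 2).max(axis=(1, 3))
print(pooled)
# [[4 3 4]
#  [1 4 3]
#  [3 1 1]]
```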
Max Pooling without padding
• Pool size: 2×2
• Stride: 2
• Padding: None
• Input: the same 5×5 feature map, unpadded.
Apply Max Pooling (2×2, stride=2, no padding)
Row 1 of Output
Window 1 (rows 1–2, cols 1–2):
4 2
3 1
Max = 4
Window 2 (rows 1–2, cols 3–4):
1 0
2 3
Max = 3
(No more columns for another 2×2 window.)
Row 2 of Output
Window 1 (rows 3–4, cols 1–2):
1 0
0 1
Max = 1
Window 2 (rows 3–4, cols 3–4):
1 2
3 4
Max = 4
(No more columns for another window.)
Final Max Pooling Output (stride=2, no padding)
4 3
1 4
Compared to stride=2 with padding, the output shrinks from 3×3 to 2×2.
The bottom and right edges are ignored since no padding is added.
Assignment:
Work out all four cases:
stride=1 no padding
stride=2 no padding
stride=1 with padding
stride=2 with padding
Collect the results into one comparison chart so you can see how the output size and values change. That makes the differences very clear.
Average Pooling
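Operation: Take the average of all values in each window.
Example: Pool size = 2×2, Stride = 2.
First window:
4 2
3 1
Average = (4+2+3+1) / 4 = 2.5
Average pooling output (no padding, 2×2):
2.5 1.5
0.5 2.5
If the bottom and right edge windows are also kept, averaging only over the real values in each partial window, the output is 3×3:
2.5 1.5 3.5
0.5 2.5 2.5
2.5 0.5 1.0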
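A sketch of 2×2 average pooling with stride 2 on the 5×5 feature map, covering two conventions. With no padding, the last row and column are dropped and the result is 2×2. With `keep_partial=True` (a hypothetical flag for this example), the edge windows are kept and averaged over the values that actually exist, giving 3×3. The sketch assumes the window size equals the stride.

```python
import numpy as np

def avg_pool2d(x, size=2, stride=2, keep_partial=False):
    """Average pooling; optionally keep partial windows at the bottom/right edge."""
    h, w = x.shape
    if keep_partial:
        out_h, out_w = -(-h // stride), -(-w // stride)   # ceiling division
    else:
        out_h, out_w = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Slicing past the edge simply returns the values that exist.
            out[i, j] = x[r:r + size, c:c + size].mean()
    return out

feature_map = np.array([[4, 2, 1, 0, 3],
                        [3, 1, 2, 3, 4],
                        [1, 0, 1, 2, 3],
                        [0, 1, 3, 4, 2],
                        [2, 3, 0, 1, 1]])

no_pad = avg_pool2d(feature_map)
with_edges = avg_pool2d(feature_map, keep_partial=True)
print(no_pad)       # [[2.5 1.5] [0.5 2.5]]
print(with_edges)   # [[2.5 1.5 3.5] [0.5 2.5 2.5] [2.5 0.5 1. ]]
```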
Global Pooling
Global Max Pooling: Takes the maximum from
the whole feature map (reduces entire map to
1 value per channel).
From our example → max = 4.
Global Average Pooling: Takes the mean of all values in the feature map.
From our example → sum of all values / total count = 47 / 25 = 1.88.
Global pooling is often applied just before the output layer in place of a Flatten → Dense layer.
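Both global pooling variants reduce the 5×5 example map to a single number, as this short sketch shows. With several channels, the same reduction would be applied per channel.

```python
import numpy as np

feature_map = np.array([[4, 2, 1, 0, 3],
                        [3, 1, 2, 3, 4],
                        [1, 0, 1, 2, 3],
                        [0, 1, 3, 4, 2],
                        [2, 3, 0, 1, 1]])

global_max = feature_map.max()     # largest value in the whole map
global_avg = feature_map.mean()    # sum of all values / 25

print(global_max)   # 4
print(global_avg)   # 1.88
```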
What happens between pooling layers and fully connected (dense) layers
How does the multi-dimensional output of the pooling layer become the 1D vector that a fully connected layer expects?
Suppose after the last pooling layer, your output shape is:
4 × 4 × 64
Height = 4
Width = 4
Channels (feature maps) = 64
This is not yet suitable for a fully connected layer, because a
Dense layer expects a 1D vector.
Flattening the Output
Use a Flatten operation to convert the 3D tensor into a 1D vector.
Example:
Before Flatten: 4 × 4 × 64
Number of elements = 4 × 4 × 64 = 1024
After Flatten: [x₁, x₂, x₃, ..., x₁₀₂₄] → Shape: (1024,)
In frameworks:
Keras/TensorFlow: Flatten() layer
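Flattening is only a reshape, which can be illustrated in plain NumPy without any framework. The tensor contents are placeholders.

```python
import numpy as np

# Placeholder output of the last pooling layer: height 4, width 4, 64 channels
pooled = np.arange(4 * 4 * 64, dtype=np.float32).reshape(4, 4, 64)

flat = pooled.reshape(-1)          # 3D tensor -> 1D vector, values unchanged
print(pooled.shape, "->", flat.shape)   # (4, 4, 64) -> (1024,)
```

Reshaping back with `flat.reshape(4, 4, 64)` recovers the original tensor, which shows that no feature values are lost.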
Feeding to Fully Connected Layer
Now the 1D vector becomes the input to the dense layer.
Each element of the vector is connected to every neuron in the FC
layer.
If the first Dense layer has 128 neurons:
Input size = 1024 (from flattening)
Weight matrix size = (1024 × 128)
Bias size = (128)
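The shapes above can be checked with a minimal dense-layer sketch. The weights are random placeholders rather than trained values, and ReLU is assumed as the activation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1024)                          # flattened input vector
W = rng.standard_normal((1024, 128)) * 0.01   # weight matrix: 1024 inputs x 128 neurons
b = np.zeros(128)                             # one bias per neuron

z = x @ W + b                # every input element contributes to every neuron
a = np.maximum(z, 0)         # ReLU activation (assumed)

n_params = W.size + b.size   # 1024 * 128 + 128
print(a.shape, n_params)     # (128,) 131200
```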
• Why Flatten Is Needed
• Convolutional and pooling layers maintain spatial structure
(H, W, Channels).
• Fully connected layers treat inputs as simple feature lists —
no spatial layout.
• Flatten bridges the gap by reshaping without losing the
learned feature values.
• Pooling → Matrix
Flatten → Vector
Vector → FC layer for final decision-making.
Some common CNN architectures
• LeNet-5
• AlexNet
• VGG-16
• Inception (GoogLeNet)
• ResNet
• DenseNet
Deep Learning Frameworks for CNNs
