Unit 3
Feature Detection
Prepared By:
Aarti Parekh
Contents
• Edge detection
• Corner detection
• Line and curve detection
• Active contours
• SIFT and HOG descriptors
• Shape context descriptors
• Morphological operations
What is a Feature?
Feature
• A feature is a piece of information that is relevant for solving the
computational task of a certain application. Features may be
specific structures in the image such as points, edges or objects.
Features may also be the result of a general neighborhood operation
or feature detection applied to the image.
Classification of Feature
• Local Feature
• Global Feature
Global Feature
Local Features
Simplified Explanation:
1. On the left, you have an image of a motorcycle.
2. The middle box represents a "feature extraction algorithm." This is a process that looks at the image and identifies important parts or patterns.
3. On the right, the image is broken into smaller parts, showing different sections of the motorcycle (like the wheels, exhaust, or seat). These smaller sections are the "features" that have been extracted from the image. They represent key details that describe parts of the motorcycle.
• Simplest Example:
• Global Feature:
• Example: The overall color of an image.
• If you have a photo of the ocean, a global feature might be that the image is
mostly blue. This captures a broad, high-level characteristic of the entire
image.
• Local Feature:
• Example: The corner of an object in the image.
• In the same ocean photo, a local feature could be a distinct corner or edge
of a boat within the image. It represents a more specific, detailed part of
the image, limited to a smaller region.
Edge Detection
• Edge detection is an image processing technique used to identify
points in a digital image with discontinuities, that is, sharp
changes in image brightness. These points where the image
brightness varies sharply are called the edges (or boundaries) of the
image.
• It is one of the basic steps in image processing, pattern recognition
and computer vision. When we process very high-resolution digital
images, convolution techniques come to our rescue. Let us
understand the convolution operation (represented using *) with an
example.
Various methods of edge detection
• Prewitt edge detection
• Sobel edge detection
• Laplacian edge detection
• Canny edge detection
Prewitt Edge Detection
• The Prewitt operator is a commonly used edge detector, mostly used to detect
horizontal and vertical edges in images. The following are the Prewitt
edge detection filters:
Prewitt Vertical Edge detection
Prewitt Horizontal Edge detection
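Since the filter images are not reproduced here, below is a minimal sketch (not from the original slides) of applying the standard 3×3 Prewitt kernels with OpenCV; the file name img.jpg is a placeholder.

```python
import cv2
import numpy as np

# Standard 3x3 Prewitt kernels
prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=np.float32)
prewitt_y = np.array([[-1, -1, -1],
                      [ 0,  0,  0],
                      [ 1,  1,  1]], dtype=np.float32)

img = cv2.imread("img.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
edges_x = cv2.filter2D(img, cv2.CV_32F, prewitt_x)  # responds to vertical edges
edges_y = cv2.filter2D(img, cv2.CV_32F, prewitt_y)  # responds to horizontal edges
```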
Sobel Edge Detection
• This uses a filter that gives more emphasis to the center of the filter. It
is one of the most commonly used edge detectors; it reduces
noise and differentiates the image at the same time, giving an edge
response. The following are the filters used in this method:
Sobel Vertical Edge detection
Sobel Horizontal Edge detection
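A minimal sketch of Sobel filtering with OpenCV's built-in cv2.Sobel; the image path is a placeholder.

```python
import cv2

img = cv2.imread("img.jpg", cv2.IMREAD_GRAYSCALE)    # placeholder path
# 3x3 Sobel derivatives; the centre row/column is weighted more heavily
sobel_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)  # vertical edges
sobel_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)  # horizontal edges
magnitude = cv2.magnitude(sobel_x, sobel_y)          # combined edge strength
```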
Laplacian Edge Detection
• The Laplacian edge detector differs from the previously discussed edge
detectors. This method uses only one filter (also called a kernel). In a
single pass, Laplacian edge detection computes second-order
derivatives and is therefore sensitive to noise. To avoid this sensitivity
to noise, Gaussian smoothing is performed on the image before
applying this method.
Laplacian Edge detection
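A minimal sketch, assuming OpenCV, of Gaussian smoothing followed by the Laplacian; the kernel sizes are illustrative.

```python
import cv2

img = cv2.imread("img.jpg", cv2.IMREAD_GRAYSCALE)     # placeholder path
# Smooth first, because the second derivative amplifies noise
blurred = cv2.GaussianBlur(img, (3, 3), 0)
laplacian = cv2.Laplacian(blurred, cv2.CV_64F, ksize=3)
```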
Canny Edge Detection
• This is the most commonly used method; it is highly effective but more complex than many
other methods. It is a multi-stage algorithm used to detect/identify a wide range of
edges. The following are the various stages of the Canny edge detection algorithm:
1. Convert the image to grayscale
2. Reduce noise – as edge detection using derivatives is sensitive to noise,
we reduce it.
3. Calculate the gradient – helps identify the edge intensity and direction.
4. Non-maximum suppression – to thin the edges of the image.
5. Double threshold – to identify the strong, weak and irrelevant pixels in the
images.
6. Hysteresis edge tracking – helps convert the weak pixels into strong ones only if
they have a strong pixel around them.
Canny Edge detection
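A minimal sketch of the Canny pipeline using OpenCV, which performs steps 3–6 internally; the thresholds and blur parameters are illustrative.

```python
import cv2

img = cv2.imread("img.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)        # step 2: noise reduction
# steps 3-6 (gradient, non-maximum suppression, double threshold,
# hysteresis edge tracking) are carried out inside cv2.Canny
edges = cv2.Canny(blurred, 100, 200)                # low/high thresholds
```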
Image Matching
• Image matching is an important task in computer vision. We need to
know whether two different images show the same scene or not.
• It is a challenging task. Challenges arise from different geometric and
photometric transformations. Geometric transformations include
translation, rotation and scaling.
• Photometric transformations include changes in brightness or exposure.
For example, the next two figures are images of the same scene. How can we
match them?
Matching Problem
Patch Matching
• The basic idea for image matching is patch matching. Patch matching is done
by selecting a patch (a square) in one image and matching it with a patch in the other
image.
• Which patch should we select?
• As seen in the next figure, the patch in the left image matches many
patches in the right image, so selecting such a patch is ambiguous. We need a
patch that is unique within the image.
Patch Matching
Not all Patches are created Equal!
What are Corners?
Harris Corner Detector: Basic Idea
Harris Corner Detector: Mathematics
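The Harris slides here are image-only, so as a hedged illustration, here is a minimal sketch of corner detection with OpenCV's cv2.cornerHarris; the file name and parameter values are placeholders.

```python
import cv2
import numpy as np

img = cv2.imread("img.jpg")                          # placeholder path
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# arguments: neighbourhood size, Sobel aperture, Harris constant k
response = cv2.cornerHarris(gray, 2, 3, 0.04)

# mark strong corner responses in red (threshold chosen arbitrarily)
img[response > 0.01 * response.max()] = [0, 0, 255]
```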
Line detection
• In general, for each point (x0, y0), we can define the family of lines that
passes through that point as:
rθ = x0·cos θ + y0·sin θ
• Meaning that each pair (rθ, θ) represents a line that passes through (x0, y0).
• A line can be detected by finding the number of intersections between
curves.
• The more curves intersecting, the more points the line represented by that
intersection has.
• In general, we can define a threshold on the minimum number of
intersections needed to detect a line.
• The Hough transform keeps track of the intersections between the curves of every point in the
image.
• If the number of intersections is above the threshold, it declares
a line with the parameters (θ, rθ) of the intersection point.
Detecting lines using Hough transform
• Using the Hough transform, show that (1,1), (2,2) and (3,3) are collinear.
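A small sketch of how this exercise could be checked numerically: compute the sinusoid rθ = x·cos θ + y·sin θ for each point and look for a common (θ, rθ); the 1° grid is an arbitrary choice.

```python
import numpy as np

points = [(1, 1), (2, 2), (3, 3)]
thetas = np.deg2rad(np.arange(0, 180))          # 1-degree resolution

# family of lines through each point: r = x*cos(theta) + y*sin(theta)
r_curves = {p: p[0] * np.cos(thetas) + p[1] * np.sin(thetas) for p in points}

# the three sinusoids intersect where r is identical for all points
diffs = np.abs(r_curves[(1, 1)] - r_curves[(2, 2)]) + \
        np.abs(r_curves[(1, 1)] - r_curves[(3, 3)])
idx = np.argmin(diffs)
print(np.rad2deg(thetas[idx]), r_curves[(1, 1)][idx])
# prints approximately 135.0 and 0.0: all three points lie on the line y = x
```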
Can you recognize these shapes?
Sometimes edge detectors find the boundary pretty well.
Sometimes it's not enough.
Active Contour
• Image segmentation is the part of image processing concerned with
separating the information of the target region from the rest of the
image.
• There are different techniques used for segmenting the pixels of
interest from the image.
• The active contour technique is applied to separate the foreground from
the background, and the segmented region of interest undergoes
further image analysis.
• Active contour is one of the active models in segmentation techniques; it
makes use of the energy constraints and forces in the image to separate the
region of interest.
• An active contour defines a separate boundary or curvature for the regions of the target
object to be segmented.
• In medical imaging, active contours are used to segment regions from
different medical images such as brain CT images, MRI images of different organs,
cardiac images and other images of regions of the human body.
Snake model
• The snake model is a technique with the potential to solve a wide
class of segmentation cases. The model mainly works to identify and
outline the target object considered for segmentation.
• It uses a certain amount of prior knowledge about the target object's
contour, especially for complex objects.
Active contours (Snakes)
• The snake model is designed to vary its shape and position while searching
for the minimal-energy state.
• When the snake model moves around a closed curve, it moves under the influence
of both internal and external energy so as to keep the total energy minimal.
• The total energy of the active snake model is a summation of three types of energy,
namely
(i) internal energy (Ei), which depends on the degree of the spline relating to the
shape of the target image;
(ii) external energy (Ee), which includes the external forces given by the user and
also energy from various other factors;
(iii) energy of the image under consideration (EI), which conveys valuable data on
the illumination of the spline representing the target object. The total energy
defined for contour formation in the snake model is
ET = Ei + Ee + EI
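A minimal sketch of a snake using scikit-image's active_contour (an assumption — the slides do not name a library); the initial circle and the weights α, β, γ are illustrative.

```python
import numpy as np
from skimage import data
from skimage.color import rgb2gray
from skimage.filters import gaussian
from skimage.segmentation import active_contour

img = rgb2gray(data.astronaut())                 # any grayscale image

# initial contour: a circle placed roughly around the object of interest
s = np.linspace(0, 2 * np.pi, 400)
rows = 100 + 100 * np.sin(s)
cols = 220 + 100 * np.cos(s)
init = np.array([rows, cols]).T                  # (row, col) in recent scikit-image

# alpha/beta weight the internal (elasticity / bending) energy;
# the smoothed image supplies the external image energy
snake = active_contour(gaussian(img, sigma=3), init,
                       alpha=0.015, beta=10, gamma=0.001)
```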
Gradient vector flow model
• The gradient vector flow model is an extended and well-defined
technique built on the snake or active contour models. The traditional
snake model possesses two limitations: poor convergence
of the contour into concave boundaries, and poor convergence when
the snake curve is initialized far from the minimum.
• The contour of the target object in the image is defined based on
the edge mapping function and the gradient vector flow field.
• Compared to the snake model, the gradient vector flow model segments
the exact target region more accurately.
• The gradient vector flow (GVF) field is determined based on the following steps.
• The primary step is to compute the edge mapping function f(x, y) from the
image I(x, y).
• The edge mapping function for binary images is described by
f(x, y) = −Gσ(x, y) ∗ I(x, y)
where Gσ(x, y) is a 2D Gaussian function with standard deviation σ and
∗ denotes convolution.
• The energy functional has two terms, a smoothing term and a data term,
whose balance depends on the parameter μ.
• The parameter value is chosen based on the noise level in the image: if the
noise level is high, the parameter has to be increased.
• The main limitation of gradient vector flow is the smoothing
term, which rounds the edges of the contour. Decreasing
the value of μ reduces the rounding of edges but weakens the smoothing
of the contour to a certain extent.
• This model helps in motion tracking of the various regions in the human
body especially pumping action of the heart and muscular activities of
various regions.
Mammogram segmentation using the gradient vector flow (GVF) model
Balloon Model
• A snake model is not attracted to distant edges. The snake model will
shrink inward if no substantial image forces act upon it.
• A snake larger than the minimum contour will eventually shrink into it,
but a snake smaller than the minimum contour will not find the minimum and
will instead continue to shrink.
• Skin lesion segmentation from dermal images is an application of balloon models.
• These contours are used for further processing and
prediction of skin cancer.
• The main disadvantages of the balloon model are slow
processing, difficulty in handling sharp edges, and the need for
manual object placement.
• The balloon model is widely used in analysing the extraction of
specific image contours.
SIFT( Scale Invariant Feature Transform)
• SIFT, or Scale Invariant Feature Transform, is a feature detection
algorithm in Computer Vision.
• SIFT helps locate the local features in an image, commonly known as
the ‘keypoints‘ of the image.
• These keypoints are scale and rotation invariant and can be used for
various computer vision applications, like image matching, object
detection, scene detection, etc.
• For example, take an image of the Eiffel Tower along with a smaller
version of it. The keypoints of the object in the first image are matched
with the keypoints found in the second image. The same holds for two
images when the object in the other image is slightly rotated.
• Let's understand how these keypoints are identified and what
techniques are used to ensure scale and rotation invariance. Broadly
speaking, the entire process can be divided into 4 parts:
1. Constructing a Scale Space: To make sure that features are scale-
independent
2. Keypoint Localisation: Identifying the suitable features or keypoints
3. Orientation Assignment: Ensure the keypoints are rotation invariant
4. Keypoint Descriptor: Assign a unique fingerprint to each keypoint
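A minimal sketch of running these four stages end-to-end with OpenCV's SIFT implementation (cv2.SIFT_create, available in OpenCV ≥ 4.4); the image paths and the 0.75 ratio-test threshold are placeholders.

```python
import cv2

img1 = cv2.imread("eiffel_large.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder paths
img2 = cv2.imread("eiffel_small.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)   # keypoints + 128-dim descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# brute-force matching with a ratio test to keep only distinctive matches
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(len(good), "good matches")
```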
1. Constructing the Scale Space
• Scale space is a collection of images having different scales, generated from a
single image.
• We need to identify the most distinct features in a given image while ignoring any
noise. Additionally, we need to ensure that the features are not scale-dependent.
These are critical concepts so let’s talk about them one-by-one.
• We use the Gaussian Blurring technique to reduce the noise in an image.
• So, for every pixel in an image, the Gaussian Blur calculates a value based on its
neighboring pixels. Below is an example of image before and after applying the
Gaussian Blur. As you can see, the texture and minor details are removed from
the image and only the relevant information like the shape and edges remain.
• These blurred images are created for multiple scales. To create a new set of images of
different scales, we take the original image and reduce the scale by half. For
each new image, we create blurred versions as we saw above.
• We have now created images at multiple scales (often represented by σ) and used
Gaussian blur on each of them to reduce the noise in the image. Next, we will try
to enhance the features using a technique called Difference of Gaussians, or DoG.
• Difference of Gaussian is a feature enhancement algorithm that involves the
subtraction of one blurred version of an original image from another, less blurred
version of the original.
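A minimal sketch of building one octave of blurred images and their Differences of Gaussians with OpenCV; the σ values and number of scales are illustrative.

```python
import cv2

img = cv2.imread("img.jpg", cv2.IMREAD_GRAYSCALE).astype("float32")  # placeholder path

sigmas = [1.0, 1.6, 2.56, 4.1]                  # increasing blur within one octave
blurred = [cv2.GaussianBlur(img, (0, 0), s) for s in sigmas]

# Difference of Gaussians: subtract each blurred image from the next, less blurred one
dog = [blurred[i + 1] - blurred[i] for i in range(len(blurred) - 1)]

# next octave: halve the image size and repeat the same blurring
half = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_NEAREST)
```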
2. Keypoint Localisation
• Once the images have been created, the next step is to find the
important keypoints from the image that can be used for feature
matching. The idea is to find the local maxima and minima for the
images. This part is divided into two steps:
1. Find the local maxima and minima
2. Remove low contrast keypoints (keypoint selection)
• To locate the local maxima and minima, we go through every pixel in the image
and compare it with its neighboring pixels.
• We have now successfully generated scale-invariant keypoints. But some of these
keypoints may not be robust to noise. This is why we need to perform a final
check to make sure that we have the most accurate keypoints to represent the
image features.
• We also perform a check to identify poorly located keypoints. These are the
keypoints that are close to an edge and have a high edge response but may not
be robust to a small amount of noise.
3. Orientation Assignment
• At this stage, we have a set of stable keypoints for the images. We will
now assign an orientation to each of these keypoints so that they are
invariant to rotation. We can again divide this step into two smaller
steps:
1. Calculate the magnitude and orientation
2. Create a histogram for magnitude and orientation
• Let’s say we want to find the magnitude and orientation for the pixel value in red.
For this, we will calculate the gradients in x and y directions by taking the
difference between 55 & 46 and 56 & 42. This comes out to be Gx = 9 and Gy = 14
respectively.
• Once we have the gradients, we can find the magnitude and orientation using the
following formulas:
• Magnitude = √[(Gx)² + (Gy)²] = √(9² + 14²) ≈ 16.64
• Φ = atan(Gy / Gx) = atan(14/9) ≈ 57°
• Creating a Histogram for Magnitude and Orientation
• On the x-axis, we will have bins for angle values, like 0–9, 10–19, 20–29,
up to 360. Since our angle value is 57, it falls in the 6th bin. The
6th bin value will be in proportion to the magnitude of the pixel, i.e.
16.64. We will do this for all the pixels around the keypoint.
• This is how we get the below histogram:
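Since the histogram figure itself is not reproduced here, this is a minimal NumPy sketch of how such a 36-bin orientation histogram can be accumulated from gradient magnitudes and orientations; the small gradient patch is illustrative.

```python
import numpy as np

# illustrative gradients for a small patch around a keypoint
gx = np.array([[9.0, 3.0], [5.0, 7.0]])
gy = np.array([[14.0, 2.0], [1.0, 6.0]])

magnitude = np.sqrt(gx ** 2 + gy ** 2)
orientation = np.degrees(np.arctan2(gy, gx)) % 360      # angles in 0..360 degrees

# 36 bins of 10 degrees each; every pixel votes with its magnitude
hist, _ = np.histogram(orientation, bins=36, range=(0, 360), weights=magnitude)
dominant_angle = np.argmax(hist) * 10                   # keypoint orientation
```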
4. Keypoint Descriptor
• This is the final step for SIFT. So far, we have stable keypoints that are scale-
invariant and rotation invariant. In this section, we will use the neighboring pixels,
their orientations, and magnitude, to generate a unique fingerprint for this
keypoint called a ‘descriptor’.
• Additionally, since we use the surrounding pixels, the descriptors will be partially
invariant to illumination or brightness of the images.
• We will first take a 16×16 neighborhood around the keypoint. This 16×16 block is
further divided into 4×4 sub-blocks and for each of these sub-blocks, we generate
the histogram using magnitude and orientation.
• At this stage, the bin size is increased and we take only 8 bins (not 36),
so each bin covers 45°. Each of these arrows represents one of the 8 bins and the
length of the arrow defines the magnitude. With 16 sub-blocks of 8 bins each,
we get a total of 128 bin values for every keypoint.
HOG ( Histogram of Oriented Gradients)
• HOG, or Histogram of Oriented Gradients, is a feature descriptor that
is often used to extract features from image data. It is widely used in
computer vision tasks for object detection.
• The HOG descriptor focuses on the structure or the shape of an
object.
• In the case of edge features, we only identify if the pixel is an edge or
not. HOG is able to provide the edge direction as well. This is done by
extracting the gradient and orientation (or you can say magnitude
and direction) of the edges
• Additionally, these orientations are calculated in ‘localized’ portions.
This means that the complete image is broken down into smaller
regions and for each region, the gradients and orientation are
calculated.
• Finally, HOG generates a histogram for each of these regions
separately. The histograms are created using the gradients and
orientations of the pixel values, hence the name 'Histogram of
Oriented Gradients'.
Step-by-step process to calculate HOG
• Consider the below image of size (180 x 280). Let us take a detailed
look at how the HOG features will be created for this image:
Step 1: Preprocess the Data (64 x 128)
• We need to preprocess the image and bring down the width to height
ratio to 1:2.
• The image size should preferably be 64 x 128. This is because we will
be dividing the image into 8*8 and 16*16 patches to extract the
features. Having the specified size (64 x 128) will make all our
calculations pretty simple.
Step 2: Calculating Gradients (direction x and y)
• The next step is to calculate the gradient for every pixel in the
image. Gradients are the small change in the x and y
directions. Here, take a small patch from the image and calculate the
gradients on that:
We will get the pixel values for this patch. Let’s say we generate the below
pixel matrix for the given patch (the matrix shown here is merely used as
an example and these are not the original pixel values for the given patch)
• Hence the resultant gradients in the x and y direction for this pixel are:
• Change in X direction(Gx) = 89 – 78 = 11
• Change in Y direction(Gy) = 68 – 56 = 8
• This process will give us two new matrices – one storing gradients in
the x-direction and the other storing gradients in the y direction. This
is similar to using a Sobel Kernel of size 1. The magnitude would be
higher when there is a sharp change in intensity, such as around the
edges.
• We have calculated the gradients in both x and y direction separately.
The same process is repeated for all the pixels in the image. The next
step would be to find the magnitude and orientation using these
values.
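A minimal sketch of this gradient step with OpenCV; cv2.Sobel with ksize=1 applies the simple [-1, 0, 1] kernels, and person.jpg is a placeholder path.

```python
import cv2

img = cv2.imread("person.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
img = cv2.resize(img, (64, 128)).astype("float32")     # step 1: 1:2 aspect ratio

# ksize=1 uses the simple [-1, 0, 1] kernels, giving per-pixel gradients
gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=1)         # gradient in x
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=1)         # gradient in y
```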
Step 3: Calculate the Magnitude and Orientation
• Using the gradients we calculated in the last step, we will now
determine the magnitude and direction for each pixel value. For this
step, we will be using the Pythagoras theorem
• The gradients are basically the base and perpendicular here. So, for the previous
example, we had Gx and Gy as 11 and 8.
• Let’s apply the Pythagoras theorem to calculate the total gradient magnitude:
Total Gradient Magnitude = √[(Gx)² + (Gy)²]
Total Gradient Magnitude = √[(11)² + (8)²] = √185 ≈ 13.6
• Next, calculate the orientation (or direction) for the same pixel. We know that we
can write the tan for the angles:
tan(Φ) = Gy / Gx
• Hence, the value of the angle would be:
Φ = atan(Gy / Gx)
• The orientation comes out to be approximately 36° when we plug in the values. So now, for every
pixel value, we have the total gradient (magnitude) and the orientation (direction).
We need to generate the histogram using these gradients and orientations.
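A quick NumPy check of the worked numbers above:

```python
import numpy as np

gx, gy = 11.0, 8.0
magnitude = np.hypot(gx, gy)                  # sqrt(11^2 + 8^2) = sqrt(185)
orientation = np.degrees(np.arctan2(gy, gx))  # atan(8 / 11) in degrees
print(round(magnitude, 1), round(orientation, 1))   # 13.6 36.0
```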
Step 4: Calculate Histogram of Gradients in 8×8 cells
• The histograms created in the HOG feature descriptor are not
generated for the whole image. Instead, the image is divided into 8×8
cells, and the histogram of oriented gradients is computed for each
cell.
• By doing so, we get the features (or histogram) for the smaller
patches which in turn represent the whole image. We can certainly
change this value here from 8 x 8 to 16 x 16 or 32 x 32.
• If we divide the image into 8×8 cells and generate the histograms, we
will get a 9 x 1 matrix for each cell.
• Once we have generated the HOG for the 8×8 patches in the image,
the next step is to normalize the histogram.
Step 5: Normalize gradients in 16×16 cell (36×1)
• Although we already have the HOG features created for the 8×8 cells of the
image, the gradients of the image are sensitive to the overall lighting. This means
that for a particular picture, some portion of the image would be very bright as
compared to the other portions.
• We cannot completely eliminate this from the image. But we can reduce this
lighting variation by normalizing the gradients by taking 16×16 blocks. Here is an
example that can explain how 16×16 blocks are created:
• Here, we will be combining four 8×8 cells to create a 16×16 block. And we already know
that each 8×8 cell has a 9×1 matrix for a histogram. So, we would have four 9×1 matrices
or a single 36×1 matrix. To normalize this matrix, we will divide each of these values by
the square root of the sum of squares of the values. Mathematically, for a given vector V:
V = [a1, a2, a3, …, a36]
We calculate the root of the sum of squares:
k = √[(a1)² + (a2)² + (a3)² + … + (a36)²]
And divide all the values in the vector V by this value k:
• The resultant would be a normalized vector of size 36×1.
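A minimal NumPy sketch of this block normalization for one 36×1 vector; the values are random placeholders.

```python
import numpy as np

# four 9-bin cell histograms concatenated into one 36-value block vector
block = np.random.rand(36).astype("float32")     # illustrative values

k = np.sqrt(np.sum(block ** 2))                  # root of the sum of squares
normalized = block / (k + 1e-6)                  # small epsilon avoids division by zero
```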
Step 6: Features for the complete image
• We are now at the final step of generating HOG features for the image. So far, we
have created features for 16×16 blocks of the image. Now, we will combine all
these to get the features for the final image.
• We would have 105 (7×15) blocks of 16×16. Each of these 105 blocks has a vector
of 36×1 as features. Hence, the total features for the image would be 105 x 36×1
= 3780 features.
• We will now generate HOG features for a single image and verify if we get the
same number of features at the end.
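A hedged sketch of that verification using scikit-image's hog function (an assumption — the slides do not name a library); the parameters mirror the 8×8 cells, 2×2-cell (16×16-pixel) blocks and 9 bins described above, and the image path is a placeholder.

```python
import cv2
from skimage.feature import hog

img = cv2.imread("person.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
img = cv2.resize(img, (64, 128))                       # width x height = 64 x 128

features = hog(img,
               orientations=9,                 # 9-bin histograms
               pixels_per_cell=(8, 8),         # 8x8 cells
               cells_per_block=(2, 2),         # 16x16-pixel blocks
               block_norm="L2")
print(features.shape)                          # (3780,) = 105 blocks x 36 values
```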
Morphological Operation
• It is a collection of non-linear operations related to the shape, or
morphology, of features in an image.
• Morphological operations in image processing pursue the goal of
removing imperfections (such as noise and small holes) by accounting for the form and
structure of the image.
• Morphological operations: 1) Dilation 2) Erosion 3) Opening 4) Closing
5) Gradient 6) Blackhat 7) Tophat
Structuring element
• The number of pixels added or removed from the objects in an image
depends on the size and shape of the structuring element
• It’s a matrix of 1’s and 0’s
• A small shape or template called a structuring element is used to probe
the image in these morphological techniques. It is a matrix that
identifies the pixel being processed and defines the neighbourhood
used in the processing of each pixel. It is positioned at all possible
locations in the input image and compared with the
corresponding neighbourhood of pixels.
• The center pixel of the structuring element is called the origin.
Probing of an image with a structuring element
Some Examples of structuring element
Origin of a Diamond-Shaped Structuring
Element
Dilation and Erosion
• Dilation and erosion are two fundamental morphological operations.
• Dilation adds pixels to the boundaries of objects in an image
• Erosion removes pixels on object boundaries.
• The number of pixels added or removed from the objects in an image
depends on the size and shape of the structuring element used to
process the image.
Dilation
• The basic effect of dilation on binary images is to enlarge the areas of
foreground pixels (i.e. white pixels) at their borders.
• The areas of foreground pixels thus grow in size, while the
background "holes" within them shrink.
• represented by the symbol ⊕
Dilation for bridging gaps in an Image
Example: dilation
Erosion
• The basic effect of erosion operator on a binary image is to erode
away the boundaries of foreground pixels (usually the white pixels).
• Thus areas of foreground pixels shrink in size, and "holes" within
those areas become larger.
• represented by the symbol ⊖
Erosion for Remove unwanted details
Example: Erosion
• Refer to the link for the other morphological operations: opening, closing, gradient,
blackhat, tophat
• https://docs.opencv.org/4.5.2/d9/d61/tutorial_py_morphological_ops.html
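A minimal sketch of dilation and erosion with OpenCV, using a rectangular structuring element; the file name, kernel size and threshold are illustrative.

```python
import cv2

img = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# 5x5 rectangular structuring element; its centre pixel is the origin
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))

dilated = cv2.dilate(binary, kernel, iterations=1)  # grows foreground, bridges gaps
eroded = cv2.erode(binary, kernel, iterations=1)    # shrinks foreground, removes details

# the other operations are built from these two, e.g. opening = erosion then dilation
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
```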