Unit 3
Feature Detection
Prepared By:
Aarti Parekh
Contents
• Edge detection
• Corner detection
• Line and curve detection
• Active contours
• SIFT and HOG descriptors
• Shape context descriptors
• Morphological operations
What is a Feature?
Feature
• A feature is a piece of information that is relevant for solving the
computational task of a certain application. Features may be
specific structures in the image such as points, edges or objects.
Features may also be the result of a general neighborhood operation
or feature detection applied to the image.
Classification of Feature
• Local Feature
• Global Feature
Global Feature
Local Features
Simplified Explanation:
1. On the left, you have an image of a motorcycle.
2. The middle box represents a "feature extraction algorithm." This is a process that looks at the image and identifies important parts or patterns.
3. On the right, the image is broken into smaller parts, showing different sections of the motorcycle (like the wheels, exhaust, or seat). These smaller sections are the "features" that have been extracted from the image. They represent key details that describe parts of the motorcycle.
• Simplest Example:
• Global Feature:
• Example: The overall color of an image.
• If you have a photo of the ocean, a global feature might be that the image is
mostly blue. This captures a broad, high-level characteristic of the entire
image.
• Local Feature:
• Example: The corner of an object in the image.
• In the same ocean photo, a local feature could be a distinct corner or edge
of a boat within the image. It represents a more specific, detailed part of
the image, limited to a smaller region.
Edge Detection
• Edge detection is an image processing technique used to identify
points in a digital image with discontinuities, that is, sharp
changes in image brightness. These points where the image
brightness varies sharply are called the edges (or boundaries) of the
image.
• It is one of the basic steps in image processing, pattern recognition
and computer vision. When we process very high-resolution digital
images, convolution techniques come to our rescue. Let us
understand the convolution operation (represented using *) with an
example.
Various methods of edge detection
• Prewitt edge detection
• Sobel edge detection
• Laplacian edge detection
• Canny edge detection
Prewitt Edge Detection
• The Prewitt operator is a commonly used edge detector, mostly used to detect
horizontal and vertical edges in images. The following are the Prewitt
edge detection filters:
Prewitt Vertical Edge detection
Prewitt Horizontal Edge detection
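Since the filter images are not reproduced here, below is a minimal sketch (not from the original slides) of applying the standard 3×3 Prewitt kernels with OpenCV; the file name img.jpg is a placeholder.

```python
import cv2
import numpy as np

# Standard 3x3 Prewitt kernels
prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=np.float32)
prewitt_y = np.array([[-1, -1, -1],
                      [ 0,  0,  0],
                      [ 1,  1,  1]], dtype=np.float32)

img = cv2.imread("img.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
edges_x = cv2.filter2D(img, cv2.CV_32F, prewitt_x)  # responds to vertical edges
edges_y = cv2.filter2D(img, cv2.CV_32F, prewitt_y)  # responds to horizontal edges
```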
Sobel Edge Detection
• This uses a filter that gives more emphasis to the center of the filter. It
is one of the most commonly used edge detectors; it reduces
noise and differentiates the image at the same time, giving an edge
response. The following are the filters used in this method:
Sobel Vertical Edge detection
Sobel Horizontal Edge detection
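A minimal sketch of Sobel filtering with OpenCV's built-in cv2.Sobel; the image path is a placeholder.

```python
import cv2

img = cv2.imread("img.jpg", cv2.IMREAD_GRAYSCALE)    # placeholder path
# 3x3 Sobel derivatives; the centre row/column is weighted more heavily
sobel_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)  # vertical edges
sobel_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)  # horizontal edges
magnitude = cv2.magnitude(sobel_x, sobel_y)          # combined edge strength
```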
Laplacian Edge Detection
• The Laplacian edge detector differs from the previously discussed edge
detectors. This method uses only one filter (also called a kernel). In a
single pass, Laplacian edge detection computes second-order
derivatives and is therefore sensitive to noise. To avoid this sensitivity
to noise, Gaussian smoothing is performed on the image before
applying this method.
Laplacian Edge detection
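A minimal sketch, assuming OpenCV, of Gaussian smoothing followed by the Laplacian; the kernel sizes are illustrative.

```python
import cv2

img = cv2.imread("img.jpg", cv2.IMREAD_GRAYSCALE)     # placeholder path
# Smooth first, because the second derivative amplifies noise
blurred = cv2.GaussianBlur(img, (3, 3), 0)
laplacian = cv2.Laplacian(blurred, cv2.CV_64F, ksize=3)
```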
Canny Edge Detection
• This is the most commonly used method; it is highly effective but more complex than many
other methods. It is a multi-stage algorithm used to detect/identify a wide range of
edges. The following are the various stages of the Canny edge detection algorithm:
1. Convert the image to grayscale
2. Reduce noise – as edge detection using derivatives is sensitive to noise,
we reduce it.
3. Calculate the gradient – helps identify the edge intensity and direction.
4. Non-maximum suppression – to thin the edges of the image.
5. Double threshold – to identify the strong, weak and irrelevant pixels in the
images.
6. Hysteresis edge tracking – helps convert the weak pixels into strong ones only if
they have a strong pixel around them.
Canny Edge detection
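A minimal sketch of the Canny pipeline using OpenCV, which performs steps 3–6 internally; the thresholds and blur parameters are illustrative.

```python
import cv2

img = cv2.imread("img.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)        # step 2: noise reduction
# steps 3-6 (gradient, non-maximum suppression, double threshold,
# hysteresis edge tracking) are carried out inside cv2.Canny
edges = cv2.Canny(blurred, 100, 200)                # low/high thresholds
```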
Image Matching
• Image matching is an important task in computer vision. We need to
know whether two different images show the same scene or not.
• It is a challenging task. Challenges arise from different geometric and
photometric transformations. Geometric transformations include
translation, rotation and scaling.
• Photometric transformations include changes in brightness or exposure.
For example, the next two figures are images of the same scene. How can we
match them?
Matching Problem
Patch Matching
• The basic idea for image matching is patch matching. Patch matching is done
by selecting a patch (a square) in one image and matching it with a patch in the other
image.
• Which patch should we select?
• As seen in the next figure, the patch in the left image matches many
patches in the right image, so selecting such a patch is ambiguous. We need a
patch that is unique within the image.
Patch Matching
Not all Patches are created Equal!
What are Corners?
Harris Corner Detector: Basic Idea
Harris Corner Detector: Mathematics
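The Harris slides here are image-only, so as a hedged illustration, here is a minimal sketch of corner detection with OpenCV's cv2.cornerHarris; the file name and parameter values are placeholders.

```python
import cv2
import numpy as np

img = cv2.imread("img.jpg")                          # placeholder path
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# arguments: neighbourhood size, Sobel aperture, Harris constant k
response = cv2.cornerHarris(gray, 2, 3, 0.04)

# mark strong corner responses in red (threshold chosen arbitrarily)
img[response > 0.01 * response.max()] = [0, 0, 255]
```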
Line detection
• In general, for each point (x0, y0), we can define the family of lines that
passes through that point as:
rθ = x0·cos θ + y0·sin θ
• Meaning that each pair (rθ, θ) represents a line that passes through (x0, y0).
• A line can be detected by finding the number of intersections between
curves.
• The more curves intersecting, the more points the line represented by that
intersection has.
• In general, we can define a threshold on the minimum number of
intersections needed to detect a line.
• The Hough transform keeps track of the intersections between the curves of every point in the
image.
• If the number of intersections is above the threshold, it declares
a line with the parameters (θ, rθ) of the intersection point.
Detecting lines using Hough transform
• Using the Hough transform, show that (1,1), (2,2) and (3,3) are collinear.
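A small sketch of how this exercise could be checked numerically: compute the sinusoid rθ = x·cos θ + y·sin θ for each point and look for a common (θ, rθ); the 1° grid is an arbitrary choice.

```python
import numpy as np

points = [(1, 1), (2, 2), (3, 3)]
thetas = np.deg2rad(np.arange(0, 180))          # 1-degree resolution

# family of lines through each point: r = x*cos(theta) + y*sin(theta)
r_curves = {p: p[0] * np.cos(thetas) + p[1] * np.sin(thetas) for p in points}

# the three sinusoids intersect where r is identical for all points
diffs = np.abs(r_curves[(1, 1)] - r_curves[(2, 2)]) + \
        np.abs(r_curves[(1, 1)] - r_curves[(3, 3)])
idx = np.argmin(diffs)
print(np.rad2deg(thetas[idx]), r_curves[(1, 1)][idx])
# prints approximately 135.0 and 0.0: all three points lie on the line y = x
```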
Can you recognize these shapes?
Sometimes edge detectors find the boundary pretty well.
Sometimes it's not enough.
Active Contour
• Image segmentation is the part of image processing concerned with
separating the information of the target region from the rest of the
image.
• There are different techniques used for segmenting the pixels of
interest from the image.
• The active contour technique is applied to separate the foreground from
the background, and the segmented region of interest undergoes
further image analysis.
• Active contour is one of the active models in segmentation techniques; it
makes use of the energy constraints and forces in the image to separate the
region of interest.
• An active contour defines a separate boundary or curvature for the regions of the target
object to be segmented.
• In medical imaging, active contours are used to segment regions from
different medical images such as brain CT images, MRI images of different organs,
cardiac images and other images of regions of the human body.
Snake model
• The snake model is a technique with the potential to solve a wide
class of segmentation cases. The model mainly works to identify and
outline the target object considered for segmentation.
• It uses a certain amount of prior knowledge about the target object's
contour, especially for complex objects.
Active contours (Snakes)
• The snake model is designed to vary its shape and position while searching
for the minimal-energy state.
• When the snake model moves around a closed curve, it moves under the influence
of both internal and external energy so as to keep the total energy minimal.
• The total energy of the active snake model is a summation of three types of energy,
namely
(i) internal energy (Ei), which depends on the degree of the spline relating to the
shape of the target image;
(ii) external energy (Ee), which includes the external forces given by the user and
also energy from various other factors;
(iii) energy of the image under consideration (EI), which conveys valuable data on
the illumination of the spline representing the target object. The total energy
defined for contour formation in the snake model is
ET = Ei + Ee + EI
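A minimal sketch of a snake using scikit-image's active_contour (an assumption — the slides do not name a library); the initial circle and the weights α, β, γ are illustrative.

```python
import numpy as np
from skimage import data
from skimage.color import rgb2gray
from skimage.filters import gaussian
from skimage.segmentation import active_contour

img = rgb2gray(data.astronaut())                 # any grayscale image

# initial contour: a circle placed roughly around the object of interest
s = np.linspace(0, 2 * np.pi, 400)
rows = 100 + 100 * np.sin(s)
cols = 220 + 100 * np.cos(s)
init = np.array([rows, cols]).T                  # (row, col) in recent scikit-image

# alpha/beta weight the internal (elasticity / bending) energy;
# the smoothed image supplies the external image energy
snake = active_contour(gaussian(img, sigma=3), init,
                       alpha=0.015, beta=10, gamma=0.001)
```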
Gradient vector flow model
• The gradient vector flow model is an extended and well-defined
technique built on the snake or active contour models. The traditional
snake model possesses two limitations: poor convergence
of the contour into concave boundaries, and poor convergence when
the snake curve is initialized far from the minimum.
• The contour of the target object in the image is defined based on
the edge mapping function and the gradient vector flow field.
• Compared to the snake model, the gradient vector flow model segments
the exact target region more accurately.
• The gradient vector flow (GVF) field is determined based on the following steps.
• The primary step is to compute the edge mapping function f(x, y) from the
image I(x, y).
• The edge mapping function for binary images is described by
f(x, y) = −Gσ(x, y) ∗ I(x, y)
where Gσ(x, y) is a 2D Gaussian function with standard deviation σ and
∗ denotes convolution.
• The energy functional has two terms, a smoothing term and a data term,
whose balance depends on the parameter μ.
• The parameter value is chosen based on the noise level in the image: if the
noise level is high, the parameter has to be increased.
• The main limitation of gradient vector flow is the smoothing
term, which rounds the edges of the contour. Decreasing
the value of μ reduces the rounding of edges but weakens the smoothing
of the contour to a certain extent.
• This model helps in motion tracking of the various regions in the human
body especially pumping action of the heart and muscular activities of
various regions.
Mammogram segmentation using the gradient vector flow (GVF) model
Balloon Model
• A snake model is not attracted to distant edges. The snake model will
shrink inward if no substantial image forces act upon it.
• A snake larger than the minimum contour will eventually shrink into it,
but a snake smaller than the minimum contour will not find the minimum and
will instead continue to shrink.
• Skin lesion segmentation from dermal images is an application of balloon models.
• These contours are used for further processing and
prediction of skin cancer.
• The main disadvantages of the balloon model are slow
processing, difficulty in handling sharp edges, and the need for
manual object placement.
• The balloon model is widely used in analysing the extraction of
specific image contours.
SIFT( Scale Invariant Feature Transform)
• SIFT, or Scale Invariant Feature Transform, is a feature detection
algorithm in Computer Vision.
• SIFT helps locate the local features in an image, commonly known as
the ‘keypoints‘ of the image.
• These keypoints are scale and rotation invariant and can be used for
various computer vision applications, like image matching, object
detection, scene detection, etc.
• For example, take an image of the Eiffel Tower along with a smaller
version of it. The keypoints of the object in the first image are matched
with the keypoints found in the second image. The same holds for two
images when the object in the other image is slightly rotated.
• Let's understand how these keypoints are identified and what
techniques are used to ensure scale and rotation invariance. Broadly
speaking, the entire process can be divided into 4 parts:
1. Constructing a Scale Space: To make sure that features are scale-
independent
2. Keypoint Localisation: Identifying the suitable features or keypoints
3. Orientation Assignment: Ensure the keypoints are rotation invariant
4. Keypoint Descriptor: Assign a unique fingerprint to each keypoint
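A minimal sketch of running these four stages end-to-end with OpenCV's SIFT implementation (cv2.SIFT_create, available in OpenCV ≥ 4.4); the image paths and the 0.75 ratio-test threshold are placeholders.

```python
import cv2

img1 = cv2.imread("eiffel_large.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder paths
img2 = cv2.imread("eiffel_small.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)   # keypoints + 128-dim descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# brute-force matching with a ratio test to keep only distinctive matches
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(len(good), "good matches")
```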
1. Constructing the Scale Space
• Scale space is a collection of images having different scales, generated from a
single image.
• We need to identify the most distinct features in a given image while ignoring any
noise. Additionally, we need to ensure that the features are not scale-dependent.
These are critical concepts so let’s talk about them one-by-one.
• We use the Gaussian Blurring technique to reduce the noise in an image.
• So, for every pixel in an image, the Gaussian Blur calculates a value based on its
neighboring pixels. Below is an example of image before and after applying the
Gaussian Blur. As you can see, the texture and minor details are removed from
the image and only the relevant information like the shape and edges remain.
• These blurred images are created for multiple scales. To create a new set of images of
different scales, we take the original image and reduce the scale by half. For
each new image, we create blurred versions as we saw above.
• We have now created images at multiple scales (often represented by σ) and used
Gaussian blur on each of them to reduce the noise in the image. Next, we will try
to enhance the features using a technique called Difference of Gaussians, or DoG.
• Difference of Gaussian is a feature enhancement algorithm that involves the
subtraction of one blurred version of an original image from another, less blurred
version of the original.
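A minimal sketch of building one octave of blurred images and their Differences of Gaussians with OpenCV; the σ values and number of scales are illustrative.

```python
import cv2

img = cv2.imread("img.jpg", cv2.IMREAD_GRAYSCALE).astype("float32")  # placeholder path

sigmas = [1.0, 1.6, 2.56, 4.1]                  # increasing blur within one octave
blurred = [cv2.GaussianBlur(img, (0, 0), s) for s in sigmas]

# Difference of Gaussians: subtract each blurred image from the next, less blurred one
dog = [blurred[i + 1] - blurred[i] for i in range(len(blurred) - 1)]

# next octave: halve the image size and repeat the same blurring
half = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_NEAREST)
```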
2. Keypoint Localisation
• Once the images have been created, the next step is to find the
important keypoints from the image that can be used for feature
matching. The idea is to find the local maxima and minima for the
images. This part is divided into two steps:
1. Find the local maxima and minima
2. Remove low contrast keypoints (keypoint selection)
• To locate the local maxima and minima, we go through every pixel in the image
and compare it with its neighboring pixels.
• We have now successfully generated scale-invariant keypoints. But some of these
keypoints may not be robust to noise. This is why we need to perform a final
check to make sure that we have the most accurate keypoints to represent the
image features.
• We also perform a check to identify poorly located keypoints. These are the
keypoints that are close to an edge and have a high edge response but may not
be robust to a small amount of noise.
3. Orientation Assignment
• At this stage, we have a set of stable keypoints for the images. We will
now assign an orientation to each of these keypoints so that they are
invariant to rotation. We can again divide this step into two smaller
steps:
1. Calculate the magnitude and orientation
2. Create a histogram for magnitude and orientation
• Let’s say we want to find the magnitude and orientation for the pixel value in red.
For this, we will calculate the gradients in x and y directions by taking the
difference between 55 & 46 and 56 & 42. This comes out to be Gx = 9 and Gy = 14
respectively.
• Once we have the gradients, we can find the magnitude and orientation using the
following formulas:
• Magnitude = √[(Gx)² + (Gy)²] = √(9² + 14²) ≈ 16.64
• Φ = atan(Gy / Gx) = atan(14/9) ≈ 57°
• Creating a Histogram for Magnitude and Orientation
• On the x-axis, we will have bins for angle values, like 0–9, 10–19, 20–29,
up to 360. Since our angle value is 57, it falls in the 6th bin. The
6th bin value will be in proportion to the magnitude of the pixel, i.e.
16.64. We will do this for all the pixels around the keypoint.
• This is how we get the below histogram:
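Since the histogram figure itself is not reproduced here, this is a minimal NumPy sketch of how such a 36-bin orientation histogram can be accumulated from gradient magnitudes and orientations; the small gradient patch is illustrative.

```python
import numpy as np

# illustrative gradients for a small patch around a keypoint
gx = np.array([[9.0, 3.0], [5.0, 7.0]])
gy = np.array([[14.0, 2.0], [1.0, 6.0]])

magnitude = np.sqrt(gx ** 2 + gy ** 2)
orientation = np.degrees(np.arctan2(gy, gx)) % 360      # angles in 0..360 degrees

# 36 bins of 10 degrees each; every pixel votes with its magnitude
hist, _ = np.histogram(orientation, bins=36, range=(0, 360), weights=magnitude)
dominant_angle = np.argmax(hist) * 10                   # keypoint orientation
```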
4. Keypoint Descriptor
• This is the final step for SIFT. So far, we have stable keypoints that are scale-
invariant and rotation invariant. In this section, we will use the neighboring pixels,
their orientations, and magnitude, to generate a unique fingerprint for this
keypoint called a ‘descriptor’.
• Additionally, since we use the surrounding pixels, the descriptors will be partially
invariant to illumination or brightness of the images.
• We will first take a 16×16 neighborhood around the keypoint. This 16×16 block is
further divided into 4×4 sub-blocks and for each of these sub-blocks, we generate
the histogram using magnitude and orientation.
• At this stage, the bin size is increased and we take only 8 bins (not 36),
so each bin covers 45°. Each of these arrows represents one of the 8 bins and the
length of the arrow defines the magnitude. With 16 sub-blocks of 8 bins each,
we get a total of 128 bin values for every keypoint.
HOG ( Histogram of Oriented Gradients)
• HOG, or Histogram of Oriented Gradients, is a feature descriptor that
is often used to extract features from image data. It is widely used in
computer vision tasks for object detection.
• The HOG descriptor focuses on the structure or the shape of an
object.
• In the case of edge features, we only identify if the pixel is an edge or
not. HOG is able to provide the edge direction as well. This is done by
extracting the gradient and orientation (or you can say magnitude
and direction) of the edges
• Additionally, these orientations are calculated in ‘localized’ portions.
This means that the complete image is broken down into smaller
regions and for each region, the gradients and orientation are
calculated.
• Finally, HOG generates a histogram for each of these regions
separately. The histograms are created using the gradients and
orientations of the pixel values, hence the name 'Histogram of
Oriented Gradients'.
Step-by-step process to calculate HOG
• Consider the below image of size (180 x 280). Let us take a detailed
look at how the HOG features will be created for this image:
Step 1: Preprocess the Data (64 x 128)
• We need to preprocess the image and bring down the width to height
ratio to 1:2.
• The image size should preferably be 64 x 128. This is because we will
be dividing the image into 8*8 and 16*16 patches to extract the
features. Having the specified size (64 x 128) will make all our
calculations pretty simple.
Step 2: Calculating Gradients (direction x and y)
• The next step is to calculate the gradient for every pixel in the
image. Gradients are the small change in the x and y
directions. Here, take a small patch from the image and calculate the
gradients on that:
We will get the pixel values for this patch. Let’s say we generate the below
pixel matrix for the given patch (the matrix shown here is merely used as
an example and these are not the original pixel values for the given patch)
• Hence the resultant gradients in the x and y direction for this pixel are:
• Change in X direction(Gx) = 89 – 78 = 11
• Change in Y direction(Gy) = 68 – 56 = 8
• This process will give us two new matrices – one storing gradients in
the x-direction and the other storing gradients in the y direction. This
is similar to using a Sobel Kernel of size 1. The magnitude would be
higher when there is a sharp change in intensity, such as around the
edges.
• We have calculated the gradients in both x and y direction separately.
The same process is repeated for all the pixels in the image. The next
step would be to find the magnitude and orientation using these
values.
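A minimal sketch of this gradient step with OpenCV; cv2.Sobel with ksize=1 applies the simple [-1, 0, 1] kernels, and person.jpg is a placeholder path.

```python
import cv2

img = cv2.imread("person.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
img = cv2.resize(img, (64, 128)).astype("float32")     # step 1: 1:2 aspect ratio

# ksize=1 uses the simple [-1, 0, 1] kernels, giving per-pixel gradients
gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=1)         # gradient in x
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=1)         # gradient in y
```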
Step 3: Calculate the Magnitude and Orientation
• Using the gradients we calculated in the last step, we will now
determine the magnitude and direction for each pixel value. For this
step, we will be using the Pythagoras theorem
• The gradients are basically the base and perpendicular here. So, for the previous
example, we had Gx and Gy as 11 and 8.
• Let’s apply the Pythagoras theorem to calculate the total gradient magnitude:
Total Gradient Magnitude = √[(Gx)² + (Gy)²]
Total Gradient Magnitude = √[(11)² + (8)²] = √185 ≈ 13.6
• Next, calculate the orientation (or direction) for the same pixel. We know that we
can write the tan for the angles:
tan(Φ) = Gy / Gx
• Hence, the value of the angle would be:
Φ = atan(Gy / Gx)
• The orientation comes out to be approximately 36° when we plug in the values. So now, for every
pixel value, we have the total gradient (magnitude) and the orientation (direction).
We need to generate the histogram using these gradients and orientations.
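A quick NumPy check of the worked numbers above:

```python
import numpy as np

gx, gy = 11.0, 8.0
magnitude = np.hypot(gx, gy)                  # sqrt(11^2 + 8^2) = sqrt(185)
orientation = np.degrees(np.arctan2(gy, gx))  # atan(8 / 11) in degrees
print(round(magnitude, 1), round(orientation, 1))   # 13.6 36.0
```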
Step 4: Calculate Histogram of Gradients in 8×8 cells
• The histograms created in the HOG feature descriptor are not
generated for the whole image. Instead, the image is divided into 8×8
cells, and the histogram of oriented gradients is computed for each
cell.
• By doing so, we get the features (or histogram) for the smaller
patches which in turn represent the whole image. We can certainly
change this value here from 8 x 8 to 16 x 16 or 32 x 32.
• If we divide the image into 8×8 cells and generate the histograms, we
will get a 9 x 1 matrix for each cell.
• Once we have generated the HOG for the 8×8 patches in the image,
the next step is to normalize the histogram.
Step 5: Normalize gradients in 16×16 cell (36×1)
• Although we already have the HOG features created for the 8×8 cells of the
image, the gradients of the image are sensitive to the overall lighting. This means
that for a particular picture, some portion of the image would be very bright as
compared to the other portions.
• We cannot completely eliminate this from the image. But we can reduce this
lighting variation by normalizing the gradients by taking 16×16 blocks. Here is an
example that can explain how 16×16 blocks are created:
• Here, we will be combining four 8×8 cells to create a 16×16 block. And we already know
that each 8×8 cell has a 9×1 matrix for a histogram. So, we would have four 9×1 matrices
or a single 36×1 matrix. To normalize this matrix, we will divide each of these values by
the square root of the sum of squares of the values. Mathematically, for a given vector V:
V = [a1, a2, a3, …, a36]
We calculate the root of the sum of squares:
k = √[(a1)² + (a2)² + (a3)² + … + (a36)²]
And divide all the values in the vector V by this value k:
• The resultant would be a normalized vector of size 36×1.
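A minimal NumPy sketch of this block normalization for one 36×1 vector; the values are random placeholders.

```python
import numpy as np

# four 9-bin cell histograms concatenated into one 36-value block vector
block = np.random.rand(36).astype("float32")     # illustrative values

k = np.sqrt(np.sum(block ** 2))                  # root of the sum of squares
normalized = block / (k + 1e-6)                  # small epsilon avoids division by zero
```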
Step 6: Features for the complete image
• We are now at the final step of generating HOG features for the image. So far, we
have created features for 16×16 blocks of the image. Now, we will combine all
these to get the features for the final image.
• We would have 105 (7×15) blocks of 16×16. Each of these 105 blocks has a vector
of 36×1 as features. Hence, the total features for the image would be 105 x 36×1
= 3780 features.
• We will now generate HOG features for a single image and verify if we get the
same number of features at the end.
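A hedged sketch of that verification using scikit-image's hog function (an assumption — the slides do not name a library); the parameters mirror the 8×8 cells, 2×2-cell (16×16-pixel) blocks and 9 bins described above, and the image path is a placeholder.

```python
import cv2
from skimage.feature import hog

img = cv2.imread("person.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
img = cv2.resize(img, (64, 128))                       # width x height = 64 x 128

features = hog(img,
               orientations=9,                 # 9-bin histograms
               pixels_per_cell=(8, 8),         # 8x8 cells
               cells_per_block=(2, 2),         # 16x16-pixel blocks
               block_norm="L2")
print(features.shape)                          # (3780,) = 105 blocks x 36 values
```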
Morphological Operation
• It is a collection of non-linear operations related to the shape, or
morphology, of features in an image.
• Morphological operations in image processing pursue the goal of
removing imperfections (such as noise and small holes) by accounting for the form and
structure of the image.
• Morphological operations: 1) Dilation 2) Erosion 3) Opening 4) Closing
5) Gradient 6) Blackhat 7) Tophat
Structuring element
• The number of pixels added or removed from the objects in an image
depends on the size and shape of the structuring element
• It’s a matrix of 1’s and 0’s
• A small shape or template called a structuring element is used to probe
the image in these morphological techniques. It is a matrix that
identifies the pixel being processed and defines the neighbourhood
used in the processing of each pixel. It is positioned at all possible
locations in the input image and compared with the
corresponding neighbourhood of pixels.
• The center pixel of the structuring element is called the origin.
Probing of an image with a structuring element
Some Examples of structuring element
Origin of a Diamond-Shaped Structuring
Element
Dilation and Erosion
• Dilation and erosion are two fundamental morphological operations.
• Dilation adds pixels to the boundaries of objects in an image
• Erosion removes pixels on object boundaries.
• The number of pixels added or removed from the objects in an image
depends on the size and shape of the structuring element used to
process the image.
Dilation
• The basic effect of dilation on binary images is to enlarge the areas of
foreground pixels (i.e. white pixels) at their borders.
• The areas of foreground pixels thus grow in size, while the
background "holes" within them shrink.
• represented by the symbol ⊕
Dilation for bridging gaps in an Image
Example: dilation
Erosion
• The basic effect of erosion operator on a binary image is to erode
away the boundaries of foreground pixels (usually the white pixels).
• Thus areas of foreground pixels shrink in size, and "holes" within
those areas become larger.
• represented by the symbol ⊖
Erosion for Remove unwanted details
Example: Erosion
• Refer to the link for the other morphological operations: opening, closing, gradient,
blackhat, tophat
• https://docs.opencv.org/4.5.2/d9/d61/tutorial_py_morphological_ops.html
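A minimal sketch of dilation and erosion with OpenCV, using a rectangular structuring element; the file name, kernel size and threshold are illustrative.

```python
import cv2

img = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# 5x5 rectangular structuring element; its centre pixel is the origin
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))

dilated = cv2.dilate(binary, kernel, iterations=1)  # grows foreground, bridges gaps
eroded = cv2.erode(binary, kernel, iterations=1)    # shrinks foreground, removes details

# the other operations are built from these two, e.g. opening = erosion then dilation
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
```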