Hough Transform: Serial and Parallel Implementations
Jan Essbach¹, Björn Lindequist¹, Claudia Nacke¹
¹ University of Applied Sciences Engineering and Economics Berlin
Treskowallee 8, 10318 Berlin, Germany
Abstract – Circle detection is widely applied in image processing. The Hough transform, the most popular method of shape detection, normally takes a long time to achieve reasonable results, especially for large images. This makes real-time image processing with sequential algorithms on commodity computers almost impossible. OpenCL provides a programming paradigm for exploiting the tremendous computational power of parallel hardware for operations on vectors, matrices and higher-dimensional arrays. In this paper, five sequential and parallelized Hough transform implementations are evaluated using CPU and GPU execution. Experimental results indicate that the Hough transform realized on a GPU achieves a speedup of more than 4000 over the serial version on the CPU. Combined with other efficient image scaling algorithms, real-time circle extraction can be achieved with GPU support.
Keywords – Hough Transform, GPU Acceleration, OpenCL,
Image Processing
I. INTRODUCTION
The Hough transform is a popular technique for feature extraction in image processing and computer vision. The concept was first proposed to detect straight lines [1] and was later generalized into a robust technique for detecting the positions and orientations of arbitrary shapes whose form is known in advance [2]. This scheme is known as the generalized Hough transform. Because of its strength in shape recognition, the Hough transform also plays an important role in image and object reconstruction. However, the classical Hough transform adopts a brute-force approach, which normally leads to long execution times for shapes with more than two parameters, such as circles and ellipses. Although many researchers have worked on optimizing the Hough transform, its execution time for shapes with multiple parameters is still prohibitive.
Circle detection is used in many applications across a wide range of academic areas, such as medical image processing [3], [4] and robot vision [5]. Since a circle in the plane has three parameters, the parameter domain is a three-dimensional cube, which requires long execution times and a large memory capacity. To address this problem, researchers have tried to reduce the dimension of the parameter space using techniques specific to certain problems [6]. However, a general solution has not yet been found.
This paper aims to accelerate Hough-transform-based circle detection in the three-dimensional parameter space using OpenCL on GPUs. It makes the following contributions:
• A sequential Hough transform algorithm [7] is evaluated using CPU execution (1).
• A sequential (2) and a parallelized (3) optimization of this algorithm are investigated.
• A CPU (4) and a GPU (5) version of the Hough transform are implemented with OpenCL.
The rest of the paper is organized as follows. Section II discusses the Hough transform in general and focuses on circle detection using the generalized Hough transform. Section III introduces possibilities of parallelization with respect to algorithms and choice of architecture. Section IV presents the current state of the art, and Section V provides the implementation details of the Hough transform in OpenCL. Section VI presents detailed experimental results and performance analyses that demonstrate the effectiveness of GPU acceleration. Finally, Section VII describes our conclusions and future work.
II. HOUGH TRANSFORM AND CIRCLE DETECTION
In this paper, we only discuss shapes in a plane. Shapes with two parameters a and b can be represented by an equation f(a, b) = 0, such as $x + ay + b = 0$ for lines and $x^2 + ax + b - y = 0$ for a special type of parabola. While the image domain resides in the X-Y coordinate system, the transformed domain, or parameter domain, is located in the A-B coordinate system. Fig. 1 outlines the workflow of the Hough transform. First, the edges of the input image are determined with an edge detection algorithm such as the Canny edge detector [8]. The resulting binary image is the input to the Hough transform, and the local maxima of the resulting Hough space represent the circle centers.
Figure 1. General workflow of the Hough transform: Input Image → Edge Detection → Binary Image → Hough Transform → Hough Space → Find Local Maxima → Model Parameters
The equation of a circle can be written as $(x - a)^2 + (y - b)^2 = c^2$, so we set our parameter domain to the 3-D cube A-B-C. Assume that an edge point $(x_0, y_0)$ in the image domain lies on the circumference of a circle centered at $(x_c, y_c)$. Treating every other point of the image as a candidate center $(x_c, y_c)$, the corresponding radius is

$$r_c = \sqrt{(x_c - x_0)^2 + (y_c - y_0)^2},$$

which yields a set of coordinates $(x_c, y_c, r_c)$, each corresponding to a point in the parameter domain (see Fig. 2).
Figure 2. Calculation of points in parameter domain (edge point $(x_0, y_0)$, candidate center $(x_c, y_c)$, radius $r_c$, angle ϕ)
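The mapping from one edge point to parameter-domain points can be sketched as a brute-force voting loop in C. This is a minimal sketch, not the implementation benchmarked later. It assumes a row-major binary edge image, candidate centers restricted to the image area, radii rounded to the nearest integer, and an accumulator stored slice by slice per radius. The names `hough_vote_bruteforce` and `isqrt_round` are hypothetical.

```c
#include <stddef.h>

/* Round-to-nearest integer square root without libm: returns the smallest
   r with d2 < (r + 0.5)^2, which equals sqrt(d2) rounded to nearest. */
static int isqrt_round(long d2)
{
    int r = 0;
    while (4L * r * r + 4L * r + 1 <= 4L * d2) /* (r + 0.5)^2 <= d2 */
        r++;
    return r;
}

/* Brute-force circular Hough voting.
   edges: w*h binary image (nonzero = edge pixel), row-major.
   acc:   (r_max - r_min + 1) * w * h counters, zero-initialised by the caller;
          cell (xc, yc, r) is at index (r - r_min) * w * h + yc * w + xc. */
void hough_vote_bruteforce(const unsigned char *edges, int w, int h,
                           int r_min, int r_max, unsigned int *acc)
{
    const long limit = (2L * r_max + 1) * (2L * r_max + 1); /* 4 * (r_max + 0.5)^2 */
    for (int y0 = 0; y0 < h; y0++)
        for (int x0 = 0; x0 < w; x0++) {
            if (!edges[(size_t)y0 * w + x0])
                continue;
            /* Treat every pixel as a candidate center (xc, yc). */
            for (int yc = 0; yc < h; yc++)
                for (int xc = 0; xc < w; xc++) {
                    long dx = xc - x0, dy = yc - y0;
                    long d2 = dx * dx + dy * dy;
                    if (4 * d2 >= limit)   /* rounded radius would exceed r_max */
                        continue;
                    int rc = isqrt_round(d2);
                    if (rc < r_min)
                        continue;
                    acc[(size_t)(rc - r_min) * w * h + (size_t)yc * w + xc]++;
                }
        }
}
```

Every edge pixel is compared against every candidate center, so the cost grows with the number of edge points times the number of pixels, which illustrates why the classical approach is slow.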
We assign a counter to every point of the 3-D cube and increment it by 1 whenever it is “visited” by a calculated $(x_c, y_c, r_c)$. For a single edge point $(x_0, y_0)$, the visited points form a conical surface with its apex at $(x_0, y_0, 0)$ and its axis along the line $a = x_0,\ b = y_0$ in the 3-D cube. These conical surfaces are the counterparts of the lines that arise in the K-B parameter domain when straight lines $y = kx + b$ are detected. Conversely, for a real circle $(x - a_0)^2 + (y - b_0)^2 = c_0^2$, the counter of the corresponding point $(a_0, b_0, c_0)$ in the parameter domain stands out among its neighbors, because the cones of all edge points on that circle pass through it. Fig. 3 shows this clearly: three circles obscured by heavy noise appear as peaks in the parameter domain. Another advantage of the circular Hough transform is that infinite slopes do not occur; the range of r must still be specified, however, to save memory and computation time.
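Whether a counter “stands out among its neighbors” can be tested per cell. The following is a minimal sketch under assumptions not stated in the paper: a fixed vote threshold, a 3×3×3 neighborhood in (x, y, r), and an accumulator with one row-major w×h slice per radius index. Ties with a neighbor are accepted as peaks. The function name `is_peak` is hypothetical.

```c
#include <stddef.h>

/* Returns 1 if cell (x, y, ri) of an accumulator with n_r radius slices of
   w*h counters has at least `threshold` votes and no neighbor in its
   3x3x3 neighborhood has more votes; returns 0 otherwise. */
int is_peak(const unsigned int *acc, int w, int h, int n_r,
            int x, int y, int ri, unsigned int threshold)
{
    const size_t slice = (size_t)w * h;
    const unsigned int v = acc[ri * slice + (size_t)y * w + x];
    if (v < threshold)
        return 0;
    for (int dr = -1; dr <= 1; dr++)
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                int r2 = ri + dr, y2 = y + dy, x2 = x + dx;
                if ((dr == 0 && dy == 0 && dx == 0) ||
                    r2 < 0 || r2 >= n_r || y2 < 0 || y2 >= h || x2 < 0 || x2 >= w)
                    continue;              /* skip the cell itself and outside cells */
                if (acc[r2 * slice + (size_t)y2 * w + x2] > v)
                    return 0;
            }
    return 1;
}
```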
In real applications, the traditional algorithm should not always be followed literally; better ways of implementing it are needed. There are several mapping strategies between the image domain and the parameter domain that make the right points stand out. Two strategies of the circular Hough transform and their relative merits will be discussed.
III. POSSIBILITIES OF PARALLELIZATION
Parallelization can be achieved through various different
approaches. They range from simple thread-programming
in popular programming languages (eg C/C++ or Java) to
the inclusion of powerful frameworks. For Hough transform
the fact that each edge point from the original image is
processed independently of one another can be utilized, there-
fore numerous ways of parallelized solutions are possible.
Some of these possibilities are presented and discussed in the
first part of this section. The second part introduces popular
computing platforms and programming models that allow the
participation of the GPU of modern graphic cards.
A. Algorithms
The basic idea of parallelization is to divide the overall
task into independent sub-tasks that can be processed si-
multaneously. Modern computers have one or more CPUs
with multiple cores each, so a real parallel processing can be
realized. The division of the original problem into appropriate
sub-tasks is of crucial importance, especially when resources
must be shared between them.
The simplest form of parallelization is to process multiple images simultaneously: each available core calculates the complete Hough transform for one image, and no data has to be exchanged between the cores. When images of different sizes are processed, however, the work is very likely to be distributed unequally among the cores. Avoiding such a waste of processing power requires a more complex approach.
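Image-level parallelism can be sketched with an OpenMP loop over a batch of images. This is an illustrative sketch, not one of the benchmarked implementations. The per-image work is represented by a hypothetical stand-in, `hough_one_image`, that only counts the (edge point, radius) votes a full transform would cast; `image_job` and `hough_batch` are also hypothetical names. Compilers without OpenMP enabled ignore the pragma and run the loop serially. `schedule(dynamic, 1)` lets a core that finishes a small image take the next one, but it cannot split a single large image across cores, which is the limitation the finer-grained decompositions address.

```c
#include <stddef.h>

/* Hypothetical work item: one binary edge image and its result. */
typedef struct {
    const unsigned char *edges; /* w*h pixels, nonzero = edge pixel */
    int w, h;
    long votes;                 /* written by the worker */
} image_job;

/* Stand-in for a complete sequential Hough transform of one image: it only
   counts how many (edge point, radius) votes would be cast. */
static long hough_one_image(const unsigned char *edges, int w, int h,
                            int r_min, int r_max)
{
    long n_edges = 0;
    for (long i = 0; i < (long)w * h; i++)
        n_edges += edges[i] != 0;
    return n_edges * (r_max - r_min + 1);
}

/* One image per iteration; no data is shared between iterations. */
void hough_batch(image_job *jobs, int n_jobs, int r_min, int r_max)
{
    #pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < n_jobs; i++)
        jobs[i].votes = hough_one_image(jobs[i].edges, jobs[i].w, jobs[i].h,
                                        r_min, r_max);
}
```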
One possibility for detecting circles with unknown radii is to process the individual radii of one image in parallel. Each CPU core computes the Hough transform for a single radius and generates a separate Hough space to which only it has access, so this space does not need to be synchronized. All CPU cores can operate completely in parallel without blocking each other, since the shared resource, the list of contour points, is read-only. The result of each sub-task contains all center points of the detected circles with this particular radius.
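The per-radius decomposition can be sketched in C with OpenMP. This is a sketch under stated assumptions, not the paper's code: contour points are given as a list, r_min ≥ 1, a pixel counts as lying on the circle of radius r when its distance rounds to r, and each radius writes only into its own w×h slice, so no locking is needed. Without OpenMP the pragma is ignored and the loop runs serially. `hough_point` and `hough_per_radius` are hypothetical names.

```c
#include <stddef.h>

typedef struct { int x, y; } hough_point;

/* Per-radius circular Hough transform.
   pts: contour points, shared and read-only.
   acc: (r_max - r_min + 1) slices of w*h counters, zero-initialised;
        radius r writes only to slice r - r_min. Requires r_min >= 1. */
void hough_per_radius(const hough_point *pts, int n_pts, int w, int h,
                      int r_min, int r_max, unsigned int *acc)
{
    const int n_r = r_max - r_min + 1;
    #pragma omp parallel for schedule(dynamic, 1)
    for (int ri = 0; ri < n_r; ri++) {
        const int r = r_min + ri;
        unsigned int *space = acc + (size_t)ri * w * h; /* exclusive to this radius */
        for (int p = 0; p < n_pts; p++) {
            const int px = pts[p].x, py = pts[p].y;
            for (int yc = py - r; yc <= py + r; yc++)
                for (int xc = px - r; xc <= px + r; xc++) {
                    if (xc < 0 || xc >= w || yc < 0 || yc >= h)
                        continue;
                    long dx = xc - px, dy = yc - py;
                    long q = 4 * (dx * dx + dy * dy);
                    /* rounded distance == r  <=>  (2r-1)^2 <= 4*d^2 < (2r+1)^2 */
                    if (q >= (2L * r - 1) * (2L * r - 1) &&
                        q <  (2L * r + 1) * (2L * r + 1))
                        space[(size_t)yc * w + xc]++;
                }
        }
    }
}
```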
Another approach is to process individual regions of an image in parallel. Each task computes the Hough transform for the contour points of its region and generates a separate section of the complete Hough space. These sections are task-exclusive, so no synchronization is necessary. A disadvantage of this approach is an additional processing step in which the Hough space for neighboring regions must be composed from the individual sections of each task. On the other hand, the entire Hough space does not have to be allocated at once: only the sections of neighboring regions must be kept in memory simultaneously, and once a region is no longer needed for the voting process, its memory can be released. This reduction in required memory can be extremely useful when processing large images (e.g., with more than 100,000 pixels in each dimension).
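The region-based decomposition can be sketched for a single radius r. The following assumptions are not taken from the paper: contour points are given as a list, regions are rectangles, and each task's section covers its region enlarged by r on every side, because votes for centers near a border fall outside the region. The composition step simply adds overlapping sections into a full-size array. A memory-saving variant would release a section as soon as all of its neighbors have been merged. `vote_region`, `merge_region` and `on_ring` are hypothetical names.

```c
#include <stdlib.h>

typedef struct { int x, y; } hough_point;

/* 1 if the distance of (dx, dy) from the origin rounds to r (r >= 1). */
static int on_ring(long dx, long dy, int r)
{
    long q = 4 * (dx * dx + dy * dy);
    return q >= (2L * r - 1) * (2L * r - 1) && q < (2L * r + 1) * (2L * r + 1);
}

/* Section for region [x0,x1) x [y0,y1): (x1-x0+2r) x (y1-y0+2r) counters
   holding the votes of the region's own contour points. Caller frees it. */
unsigned int *vote_region(const hough_point *pts, int n_pts,
                          int x0, int y0, int x1, int y1, int r)
{
    int sw = x1 - x0 + 2 * r, sh = y1 - y0 + 2 * r;
    unsigned int *sec = calloc((size_t)sw * sh, sizeof *sec);
    if (!sec)
        return NULL;
    for (int p = 0; p < n_pts; p++) {
        if (pts[p].x < x0 || pts[p].x >= x1 || pts[p].y < y0 || pts[p].y >= y1)
            continue;                          /* belongs to another region */
        for (int dy = -r; dy <= r; dy++)
            for (int dx = -r; dx <= r; dx++)
                if (on_ring(dx, dy, r))
                    sec[(size_t)(pts[p].y + dy - y0 + r) * sw
                        + (size_t)(pts[p].x + dx - x0 + r)]++;
    }
    return sec;
}

/* Composition step: add a section into the global w x h Hough space,
   dropping candidate centers that lie outside the image. */
void merge_region(unsigned int *acc, int w, int h, const unsigned int *sec,
                  int x0, int y0, int x1, int y1, int r)
{
    int sw = x1 - x0 + 2 * r, sh = y1 - y0 + 2 * r;
    for (int sy = 0; sy < sh; sy++)
        for (int sx = 0; sx < sw; sx++) {
            int x = x0 - r + sx, y = y0 - r + sy;
            if (x >= 0 && x < w && y >= 0 && y < h)
                acc[(size_t)y * w + x] += sec[(size_t)sy * sw + sx];
        }
}
```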
There are many other approaches to parallelizing the Hough transform; these examples are only meant to give a general overview of the possibilities.
B. Choice of architecture
The hardware architectures of CPUs and GPUs differ significantly because the two devices serve different purposes. A highly parallelizable task can be computed much faster on a GPU than on a CPU if the specifics of the GPU (e.g., memory usage) are taken into account. The speed gain is largest when a job consists mainly of the kind of calculations a GPU is optimized for. For a detailed description of the hardware architecture and the resulting application scenarios for GPUs, see [9].
Figure 3. Hough transform on circles with r = 30
In recent years, various computing platforms have been introduced that allow software to be executed on GPUs. Two of these platforms are CUDA¹ and OpenCL². Apart from implementation details, the most striking difference between them is the range of supported hardware: while OpenCL supports different devices (CPU, GPU, DSP or other processors) from different manufacturers, CUDA can only utilize GPUs from NVIDIA. This restriction allows CUDA to be up to 50% faster than an OpenCL program that has not been specifically adapted³ for NVIDIA hardware [10].
As already described, the Hough transform is a highly parallelizable task. Using a parallel computing platform to reduce the computation time significantly is therefore a reasonable approach.
IV. STATE OF THE ART
Digital image processing is a field with a wide range of applications and an active research and development community. Since the Hough transform is an important tool, it is included in many frameworks and applications. One of the most common is the cross-platform framework OpenCV.
Despite the existing solutions, the Hough transform remains a field of active development, especially since GPUs have become available for general-purpose calculations. Current GPU implementations of the Hough transform achieve remarkable speedups compared with a CPU: two recent implementations for the CUDA platform achieved speedup factors of up to 400 [11], [12].
¹ Compute Unified Device Architecture (NVIDIA)
² Open Computing Language (initially Apple Inc., now Khronos Group)
³ Adapted OpenCL programs can be equally fast but lose their portability.
V. PARALLELIZATION OF HOUGH TRANSFORM
In our solution, the first step is to generate all jobs for the Hough kernel. Each job is represented by a triple (x, y, r): one triple is generated for every edge point (x, y) of the original image and every radius $r \in R = \{r_{min}, \dots, r_{max}\}$. The total number of jobs, $N_{edges} \cdot |R|$ with $|R| = r_{max} - r_{min} + 1$, is the global work size of the first OpenCL kernel. The kernel is executed once for each triple (x, y, r) and uses the midpoint circle algorithm to “visit” the pixels of interest, incrementing the counter of every visited pixel by one.
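Job generation and the work of one kernel invocation can be emulated serially in C. This is a sketch, not the authors' kernel. It assumes the padded layout described with Fig. 4 (border r_max on every side, one w'×h' slice per radius), and it skips duplicate octant points so that each pixel of the digital circle receives exactly one vote; the paper does not say how duplicates are handled. On a GPU, different work-items can increment the same cell concurrently, so an OpenCL version would presumably need an atomic increment such as `atomic_inc`. `hough_job`, `make_jobs` and `vote_job` are hypothetical names.

```c
#include <stddef.h>

typedef struct { int x, y, r; } hough_job;

/* One (x, y, r) triple per edge point and radius; returns
   n_edges * (r_max - r_min + 1), the global work size. */
size_t make_jobs(const int *ex, const int *ey, size_t n_edges,
                 int r_min, int r_max, hough_job *jobs)
{
    size_t n = 0;
    for (size_t e = 0; e < n_edges; e++)
        for (int r = r_min; r <= r_max; r++)
            jobs[n++] = (hough_job){ ex[e], ey[e], r };
    return n;
}

/* One job: midpoint circle of radius job.r around the edge point, voting
   into slice (job.r - r_min) of the padded space (w' = w + 2*r_max,
   h' = h + 2*r_max). The padding makes bounds checks unnecessary. */
void vote_job(hough_job job, int w, int h, int r_min, int r_max,
              unsigned int *space)
{
    const size_t wp = (size_t)w + 2 * r_max, hp = (size_t)h + 2 * r_max;
    unsigned int *slice = space + (size_t)(job.r - r_min) * wp * hp;
    const long cx = job.x + r_max, cy = job.y + r_max;   /* padded coordinates */
#define VOTE(px, py) (slice[(size_t)(py) * wp + (size_t)(px)]++)
    int x = job.r, y = 0, d = 1 - job.r;
    while (x >= y) {
        if (y == 0) {                         /* 4 axis points */
            VOTE(cx + x, cy); VOTE(cx - x, cy);
            VOTE(cx, cy + x); VOTE(cx, cy - x);
        } else if (x == y) {                  /* 4 diagonal points */
            VOTE(cx + x, cy + y); VOTE(cx - x, cy + y);
            VOTE(cx + x, cy - y); VOTE(cx - x, cy - y);
        } else {                              /* 8 symmetric points */
            VOTE(cx + x, cy + y); VOTE(cx - x, cy + y);
            VOTE(cx + x, cy - y); VOTE(cx - x, cy - y);
            VOTE(cx + y, cy + x); VOTE(cx - y, cy + x);
            VOTE(cx + y, cy - x); VOTE(cx - y, cy - x);
        }
        y++;
        if (d < 0) {
            d += 2 * y + 1;                   /* midpoint inside: keep x */
        } else {
            x--;                              /* midpoint outside: step inward */
            d += 2 * (y - x) + 1;
        }
    }
#undef VOTE
}
```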
Figure 4. Hough space: the original image (w × h, pixel (x, y)) is padded by r_max on every side to form the resized Hough space (w' × h', cell (x', y')), which is stored as a linearly aligned array of w' · h' cells per radius.
The original image is transformed into the resized Hough space by adding a border of width $r_{max}$ to all sides of the image domain, so that $w' = w + 2r_{max}$, $h' = h + 2r_{max}$, and a pixel (x, y) maps to $(x', y') = (x + r_{max}, y + r_{max})$. Within the OpenCL kernel, the resized Hough space is interpreted as a linearly aligned data structure in which the Hough space images for the individual radii are stored one after another. Eq. (1) gives the offset $\phi(r)$ of the slice for radius r, Eq. (2) gives the array index n of the point (r, x', y'), and Eq. (3) recovers the parameters (r, x', y') from an array index n. Adding this “junk data” optimizes the midpoint algorithm: every circle of radius at most $r_{max}$ around an image pixel lies entirely inside the resized space, so no boundary check is needed within the OpenCL kernel. This matters because such conditional checks are expensive in GPU execution. The junk data is excluded from the voting process by choosing the three-dimensional global work size and offset of the voting kernel accordingly.
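$$\phi(r) = (r - r_{min}) \cdot w' \cdot h' \tag{1}$$
$$i(r, x', y') = \phi(r) + y' \cdot w' + x' \tag{2}$$
$$i^{-1}(n) = (r, x', y') \quad \text{with} \quad \begin{cases} r = r_{min} + \left\lfloor n / (w' \cdot h') \right\rfloor \\ y' = \left\lfloor (n - \phi(r)) / w' \right\rfloor \\ x' = n - \phi(r) - y' \cdot w' \end{cases} \tag{3}$$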
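Eqs. (1) to (3) can be checked with a small round-trip sketch in C. It assumes that the padded dimensions w' and h' are passed as `wp` and `hp`, that (x', y') are padded coordinates, and that the inverse adds r_min back to the slice number, consistent with the offset (r − r_min) in Eq. (1). The function names are hypothetical.

```c
#include <stddef.h>

/* Eq. (1): offset of the slice for radius r in the linear Hough space. */
static size_t phi(int r, int r_min, size_t wp, size_t hp)
{
    return (size_t)(r - r_min) * wp * hp;
}

/* Eq. (2): linear index of cell (r, x', y'). */
size_t hough_index(int r, int xp, int yp, int r_min, size_t wp, size_t hp)
{
    return phi(r, r_min, wp, hp) + (size_t)yp * wp + (size_t)xp;
}

/* Eq. (3): recover (r, x', y') from a linear index n by integer division. */
void hough_unindex(size_t n, int r_min, size_t wp, size_t hp,
                   int *r, int *xp, int *yp)
{
    *r = r_min + (int)(n / (wp * hp));
    size_t rest = n - phi(*r, r_min, wp, hp);   /* index within the slice */
    *yp = (int)(rest / wp);
    *xp = (int)(rest - (size_t)*yp * wp);
}
```

Because the slices are stored contiguously, a single integer index addresses the whole three-dimensional space, which is what allows the kernel to treat it as a flat buffer.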
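The voting kernel is started with a three-dimensional global work size and a global offset that skips all junk data, as shown in Listing 1.
// Global offset: skip the r_max-wide border ("junk data") in x' and y'
size_t offset[3] = {r_max, r_max, 0};
// Global work size counts work-items per dimension (it is not an end
// coordinate): one work-item per original pixel and per radius
size_t worksize[3] = {img.cols, img.rows, r_max - r_min + 1};
Listing 1. Setting clEnqueueNDRangeKernel Parameters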
For voting, a simple thresholding and local maxima search is used, with a threshold that depends on the radius of the circle. During voting, the coordinates (x', y') are converted back to the original image domain (x, y), and the number of circles found is stored in a __global variable. Because no dynamic data structures can be used within an OpenCL kernel, the output buffer for detected circles is allocated with the number of edge points as its maximum size. After the voting kernel has finished, the resulting instances of (x, y, r) are copied back to the host system, which completes the Hough transform.
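The voting step can be emulated serially as follows. This is a sketch under assumptions the paper does not specify: the radius-dependent threshold is taken as a fixed fraction of the circumference 2πr, a local maximum is checked over the 8-neighborhood within the same radius slice (ties are all reported), and r_max ≥ 1 so that neighbor reads stay inside the padded space. `circle`, `vote_threshold` and `vote_circles` are hypothetical names. The loops cover only cells that correspond to original image pixels, mirroring the global offset and work size of Listing 1, and `*count` plays the role of the __global counter.

```c
#include <stddef.h>

typedef struct { int x, y, r; } circle;

/* Hypothetical rule: a fixed fraction of the circumference 2*pi*r. */
static unsigned int vote_threshold(int r, double fraction)
{
    return (unsigned int)(fraction * 6.283185307 * r);
}

/* Voting over the padded Hough space (w' = w + 2*r_max, h' = h + 2*r_max
   per radius slice). Detected circles go into a fixed-capacity array. */
void vote_circles(const unsigned int *space, int w, int h, int r_min, int r_max,
                  double fraction, circle *found, size_t cap, size_t *count)
{
    const long wp = w + 2L * r_max, hp = h + 2L * r_max;
    for (int r = r_min; r <= r_max; r++) {
        const unsigned int *s = space + (r - r_min) * wp * hp;
        const unsigned int t = vote_threshold(r, fraction);
        for (long yp = r_max; yp < h + r_max; yp++)        /* skip padding */
            for (long xp = r_max; xp < w + r_max; xp++) {
                const unsigned int v = s[yp * wp + xp];
                if (v == 0 || v < t)
                    continue;
                int is_max = 1;
                for (long dy = -1; dy <= 1 && is_max; dy++)
                    for (long dx = -1; dx <= 1; dx++)
                        if ((dx != 0 || dy != 0) && s[(yp + dy) * wp + xp + dx] > v) {
                            is_max = 0;
                            break;
                        }
                if (is_max && *count < cap)   /* convert back to (x, y) */
                    found[(*count)++] = (circle){ (int)(xp - r_max),
                                                  (int)(yp - r_max), r };
            }
    }
}
```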
VI. EVALUATION
A benchmark was conducted to evaluate the Hough transform implemented with OpenCL. It is structured as follows.
A. Test images
Images with four different square resolutions (256x256 to 2048x2048 pixels) and an increasing number of edge points (up to about 100,000) were used in the benchmark. A sample image with a resolution of 2048x2048 pixels is shown in Fig. 5.
B. Implementations
To assess the presented implementation (V4, V5), it is compared with three other freely available solutions (V1 to V3):
• V1 – sequential Hough transform algorithm [7] (CPU)
• V2 – optimization of V1 (sequential, CPU)
• V3 – optimization of V1 (parallelized, 30 threads, CPU)
• V4 – presented OpenCL implementation (CPU)
• V5 – presented OpenCL implementation (GPU)
All implementations have to detect circles with radii from 10 to 40 pixels in each image.
C. Hardware
The benchmark was run on the following hardware:
• CPU: Intel(R) Core(TM) i7-3720QM
• GPU: ATI Radeon HD 5770
Figure 5. Test image with 2048x2048 pixels and 92,640 edge points
D. Results
To obtain reliable processing times, each solution processed every image ten times, and the average time was calculated.
The average times for all solutions and images are shown in Tab. I.
Resolution   Edges   V1 (s)     V2 (s)   V3 (s)   V4 (s)   V5 (s)
256x256       1503     11.321    0.813    0.210    0.022    0.031
512x512       5790    167.432    3.301    0.856    0.080    0.073
1024x1024    23160   1200.344   13.373    3.278    0.476    0.354
2048x2048    92640   3400.442   53.169   13.056    1.772    0.791
Table I. EXECUTION TIME OF HOUGH TRANSFORM IN SECONDS
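The speedups implied by Table I can be recomputed with a few lines of C. The times are copied from the table; the function names are illustrative only.

```c
#include <stdio.h>

/* Execution times in seconds from Table I (rows: 256, 512, 1024, 2048). */
static const double t_v1[4] = {11.321, 167.432, 1200.344, 3400.442};
static const double t_v4[4] = {0.022, 0.080, 0.476, 1.772};
static const double t_v5[4] = {0.031, 0.073, 0.354, 0.791};

/* Speedup of an implementation over a baseline: t_baseline / t_impl. */
double speedup(double t_baseline, double t_impl)
{
    return t_baseline / t_impl;
}

/* Prints V1/V5 (serial CPU vs. OpenCL GPU) and V4/V5 (OpenCL CPU vs. GPU). */
void print_speedups(void)
{
    static const int dim[4] = {256, 512, 1024, 2048};
    for (int i = 0; i < 4; i++)
        printf("%4dx%-4d  V1/V5 = %8.1f  V4/V5 = %4.2f\n", dim[i], dim[i],
               speedup(t_v1[i], t_v5[i]), speedup(t_v4[i], t_v5[i]));
}
```

For the 2048x2048 row this gives V1/V5 ≈ 4298.9 and V4/V5 ≈ 2.24. For the 256x256 row V4/V5 is below 1, i.e., the OpenCL CPU version is faster than the GPU version on the smallest image.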
A graphical representation of the benchmark results is shown in Fig. 6 (V1 has been omitted for clarity).
The results show that the presented OpenCL implementation (CPU and GPU) is significantly faster than all other solutions. The small difference between the CPU (V4) and GPU (V5) versions for images up to 1024x1024 can be explained by the simplicity of the calculations, which do not fully exploit the potential of the GPU. Only for the largest image does the GPU outperform the CPU by a factor of about 2 (0.791 s vs. 1.772 s). Here the resolution is almost irrelevant; the decisive factor is the number of contour points, because the number of calculations equals the number of contour points multiplied by the number of observed radii. For the 2048x2048 image there are 92,640 contour points and 31 radii (10 to 40), which results in 31 · 92,640 = 2,871,840 jobs, each calling the midpoint OpenCL kernel. With such a large number of calls, the graphics card can play to its strengths.
Figure 6. Execution time of different approaches (V2, V3, V4, V5; x-axis: image dimension n × n; y-axis: execution time in seconds)
Comparing the processing times of solutions V1 and V5 yields the following speedup:
$$\text{Speedup}_{V1/V5} = \frac{3400.442}{0.791} \approx 4298.9 \tag{4}$$
The Hough transform implemented on the OpenCL programming platform (processed on a GPU) thus achieves a speedup of more than 4000 compared with the serial CPU version. Optimizing the OpenCL kernels could possibly increase the speedup even further.
VII. CONCLUSION
This paper has presented the broad applicability of the Hough transform in digital image processing and discussed various possibilities for parallelization to reduce the processing time. The result of this work is an implementation of the Hough transform on the OpenCL programming platform. To assess it against other implementations (serial and parallel on the CPU), their processing times were compared. The introduced OpenCL solution reaches a speedup of more than 4000.
To further improve the solution, the implemented algorithms will be investigated to determine whether they can exploit the capabilities of the GPU even more.
So far, all tests have been conducted solely with synthetic images. To assess the quality of the OpenCL solution better, real-world images will be used for testing in the next step. Possible sources for such test images are free image processing databases, e.g., [13], [14].
Another way to increase the reliability of the evaluation is to use measures such as precision and recall. These values indicate how reliable the detection is, i.e., whether the Hough transform detected all circles in the image (recall) and whether objects that are not circles were identified as circles (precision).
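Both measures follow directly from counts of true positives (circles found), false positives (non-circles reported as circles) and false negatives (circles missed): precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below assumes that detections have already been matched to ground-truth circles, for example with a tolerance on center and radius that the paper does not specify. It returns 0 when a ratio is undefined, and the function names are hypothetical.

```c
#include <stddef.h>

/* Fraction of reported circles that are real circles. */
double precision(size_t tp, size_t fp)
{
    return (tp + fp) ? (double)tp / (double)(tp + fp) : 0.0;
}

/* Fraction of real circles that were detected. */
double recall(size_t tp, size_t fn)
{
    return (tp + fn) ? (double)tp / (double)(tp + fn) : 0.0;
}
```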
The long-term goal of this work is the real-time detection of circles with variable radii in a video stream (e.g., from a webcam). To achieve this, the edge detection that precedes the Hough transform must also run in real time. The next step will therefore be to evaluate suitable edge detectors; the main criterion is efficient computability on a GPU using OpenCL. In [15], an implementation of the Canny edge detector on a GPU was investigated. This method should be sufficient for the intended application.
REFERENCES
[1] P. V. C. Hough, “Method and means for recognizing complex patterns,” 1962.
[2] R. O. Duda and P. E. Hart, “Use of the Hough transformation to detect lines and curves in pictures,” Communications of the ACM, 1972.
[3] S. Eom, R. Bise, and T. Kanade, “Detection of hematopoietic stem cells in microscopy images using a bank of ring filters,” in IEEE International Symposium on Biomedical Imaging, 2010.
[4] M. Smereka and I. Duleba, “Circular object detection using a modified Hough transform,” International Journal of Applied Mathematics and Computer Science, 2008.
[5] Y. Yabuta, H. Mizumoto, and S. Arii, “Binocular robot vision system with shape recognition,” in International Conference on Control, Automation and Systems, 2007.
[6] Y. Xie and Q. Ji, “A new efficient ellipse detection method,” in International Conference on Pattern Recognition, 2002.
[7] M. Bowes. (2009) Hough circle detector. [Online]. Available: https://github.com/marcbowes/Hough-Circle-Detector
[8] J. Canny, “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6, pp. 679–698, 1986.
[9] J. Owens, UC Davis. (2007) GPU architecture overview. [Online]. Available: http://gpgpu.org/static/s2007/slides/02-gpu-architecture-overview-s07.pdf
[10] K. Karimi, N. G. Dickson, and F. Hamze. (2010) A performance comparison of CUDA and OpenCL. [Online]. Available: http://arxiv.org/ftp/arxiv/papers/1005/1005.2581.pdf
[11] S. Chen and H. Jiang, “Accelerating the Hough transform with CUDA on graphics processing units,” in Proceedings of the 2011 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), 2011.
[12] F. S. Tasel and A. Temizel, “Parallelization of Hough transform for circles using CUDA,” GPU Technology Conference, 2012.
[13] Computer vision test images. Carnegie Mellon University. [Online]. Available: http://www.cs.cmu.edu/~cil/v-images.html
[14] The USC-SIPI image database. Signal and Image Processing Institute. [Online]. Available: http://sipi.usc.edu/database/
[15] Y. Luo and R. Duraiswami, “Canny edge detection on NVIDIA CUDA,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW ’08), 2008, pp. 1–8.
[16] M. Bäuml and R. Stiefelhagen, “Evaluation of Local Features for Person Re-Identification in Image Sequences,” Research Paper, Institute of Technology, Karlsruhe, 2011.
[17] D. Wagner, “Marker-Based Tracking,” 2008. [Online]. Available: http://handheldar.icg.tugraz.at/markerbased.php
[18] D. Lowe, “Distinctive Image Features from Scale Invariant Keypoints,” Ph.D. dissertation, University of British Columbia, Canada, 2004.
[19] D. Lowe, “Object Recognition from Local Scale-Invariant Features,” Research Paper, University of British Columbia, Canada, 1999.
[20] R. E. Kalman, “A New Approach to Linear Filtering and Prediction Problems,” Research Paper, Research Institute for Advanced Study, Baltimore, 2002.

More Related Content

PPTX
Color model in computer graphics
PPTX
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
PPTX
Color models
PDF
Sistem Cerdas
PPTX
Visible surface identification
PDF
Toko online erd dan analisis sistem informasi penjualan berbasis web - mode...
PDF
Modul latihan kkpi ms. excel
PPTX
HSV color model
Color model in computer graphics
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Color models
Sistem Cerdas
Visible surface identification
Toko online erd dan analisis sistem informasi penjualan berbasis web - mode...
Modul latihan kkpi ms. excel
HSV color model

What's hot (11)

PPT
3 d viewing
PPTX
Traffic sign detection
PPT
Polygon clipping
PPTX
(管理者向け) Microsoft Edge の展開と管理の手法
PDF
Unit-IV Windowing and Clipping.pdf
PPTX
Computer graphics chapter 4
PPTX
14_grafik dan dashboard pivot table(1).pptx
DOCX
Makalah KTI (Command Prompt dan Blog)
PPTX
Unit Converter Java project
PDF
4. THREE DIMENSIONAL DISPLAY METHODS
PDF
02 struktur-sistem-operasi
3 d viewing
Traffic sign detection
Polygon clipping
(管理者向け) Microsoft Edge の展開と管理の手法
Unit-IV Windowing and Clipping.pdf
Computer graphics chapter 4
14_grafik dan dashboard pivot table(1).pptx
Makalah KTI (Command Prompt dan Blog)
Unit Converter Java project
4. THREE DIMENSIONAL DISPLAY METHODS
02 struktur-sistem-operasi
Ad

Similar to Hough Transform: Serial and Parallel Implementations (20)

PDF
F017663344
PDF
A Hough Transform Implementation for Line Detection for a Mobile Robot Self-N...
PDF
A Hough Transform Based On a Map-Reduce Algorithm
PDF
Implementation_of_Hough_Transform_for_image_processing_applications.pdf
PDF
Log polar coordinates
PDF
International Journal of Engineering Inventions (IJEI)
PDF
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
PDF
E0343034
PDF
Time Multiplexed VLSI Architecture for Real-Time Barrel Distortion Correction...
PDF
Todtree
PDF
An efficient image segmentation approach through enhanced watershed algorithm
PDF
robio-2014-falquez
PDF
Machine learning for high-speed corner detection
PDF
B018110915
PDF
Multi Wavelet for Image Retrival Based On Using Texture and Color Querys
PDF
Medial Axis Transformation based Skeletonzation of Image Patterns using Image...
PDF
Parallel implementation of geodesic distance transform with application in su...
PPT
Poster Segmentation Chain
PDF
Vf sift
PDF
Parallel Batch-Dynamic Graphs: Algorithms and Lower Bounds
F017663344
A Hough Transform Implementation for Line Detection for a Mobile Robot Self-N...
A Hough Transform Based On a Map-Reduce Algorithm
Implementation_of_Hough_Transform_for_image_processing_applications.pdf
Log polar coordinates
International Journal of Engineering Inventions (IJEI)
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
E0343034
Time Multiplexed VLSI Architecture for Real-Time Barrel Distortion Correction...
Todtree
An efficient image segmentation approach through enhanced watershed algorithm
robio-2014-falquez
Machine learning for high-speed corner detection
B018110915
Multi Wavelet for Image Retrival Based On Using Texture and Color Querys
Medial Axis Transformation based Skeletonzation of Image Patterns using Image...
Parallel implementation of geodesic distance transform with application in su...
Poster Segmentation Chain
Vf sift
Parallel Batch-Dynamic Graphs: Algorithms and Lower Bounds
Ad

Recently uploaded (20)

PPTX
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
PDF
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
PDF
.pdf is not working space design for the following data for the following dat...
PPT
ISS -ESG Data flows What is ESG and HowHow
PDF
Mega Projects Data Mega Projects Data
PPTX
Computer network topology notes for revision
PPTX
Introduction-to-Cloud-ComputingFinal.pptx
PDF
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
PPTX
1_Introduction to advance data techniques.pptx
PPTX
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
PPTX
Qualitative Qantitative and Mixed Methods.pptx
PDF
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
PPTX
Introduction to Knowledge Engineering Part 1
PPTX
STUDY DESIGN details- Lt Col Maksud (21).pptx
PDF
Foundation of Data Science unit number two notes
PPTX
Database Infoormation System (DBIS).pptx
PPTX
Acceptance and paychological effects of mandatory extra coach I classes.pptx
PPTX
oil_refinery_comprehensive_20250804084928 (1).pptx
PPTX
Introduction to machine learning and Linear Models
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
.pdf is not working space design for the following data for the following dat...
ISS -ESG Data flows What is ESG and HowHow
Mega Projects Data Mega Projects Data
Computer network topology notes for revision
Introduction-to-Cloud-ComputingFinal.pptx
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
1_Introduction to advance data techniques.pptx
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
Qualitative Qantitative and Mixed Methods.pptx
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
Introduction to Knowledge Engineering Part 1
STUDY DESIGN details- Lt Col Maksud (21).pptx
Foundation of Data Science unit number two notes
Database Infoormation System (DBIS).pptx
Acceptance and paychological effects of mandatory extra coach I classes.pptx
oil_refinery_comprehensive_20250804084928 (1).pptx
Introduction to machine learning and Linear Models

Hough Transform: Serial and Parallel Implementations

  • 1. Hough Transform: Serial and Parallel Implementations Jan Essbach1 , Bj¨orn Lindequist1 , Claudia Nacke1 1 University of Applied Sciences Engineering and Economics Berlin Treskowallee 8, 10318 Berlin, Germany Abstract – Circle detection has been widely applied in image processing applications. Hough transform, the most popular method of shape detection, normally takes a long time to achieve reasonable results, especially for large images. Such perfor- mance makes it almost impossible to conduct real-time image processing with sequential algorithms on community computers. Recently, OpenCL was developed providing a programming paradigm to explore the tremendous computational power for operations on vectors, matrices and high-dimensional matrices. In this paper, five different approaches of sequential and parallelized Hough transform algorithms are researched using CPU and GPU execution. Experimental results indicate that the realized Hough transform on GPUs can achieve up to 4000 times speedup over the serial version on CPU. With other efficient image scaling algorithms, real-time circle extraction can be achieved with GPU support. Keywords – Hough Transform, GPU Acceleration, OpenCL, Image Processing I. INTRODUCTION Hough transform is a popular technique for feature extrac- tion in image processing and computer vision. This concept was first proposed to detect straight lines [1] and was later generalized into a robust technique to detect the positions and directions for any shapes that are already known [2]. Such scheme was known as generalized Hough transform. Because of its powerful nature on shape recognition, Hough transform also plays an important role in image and object re- construction. However, the classical Hough transform adopts brute-force approach, which normally takes long execution time to detect shapes with more than two parameters, such as circles and ellipses. Many researchers have been working on optimizations of Hough transform. 
So far, the execution time of Hough transform to detect shapes with multiple parameters is still intolerable. Circle detection can be found in many applications in a wide range of academic areas, such as medical image processing [3], [4] and robot vision [5]. Since a circle in plane has three parameters, the parameter domain should be a cube, which requires long execution time and large memory capacity. To solve such kind of problem, people try to reduce the dimension of parameter space using specific techniques on certain problems [6]. However a generalized solution has not been accomplished yet. This paper intends to accelerate Hough-transform-based circle detection with parameter space using OpenCL tech- nology on GPUs. It makes the following contributions: • One sequential Hough transform algorithm [7] is re- searched using CPU execution (1) • An sequential (2) and parallelized (3) optimization of former algorithm is investigated and • A CPU (4) and GPU (5) version of Hough transform are deployed for OpenCL architecture. The rest of the paper is organized as follows: Section 2 discusses the Hough transform in general and focuses on circle detection using generalized Hough trans- form. Section 3 introduces the possibilities of parallelization regarding Algorithms and Choice of architecture. Section 4 provides the actual state of the art and Section 5 will provide the concrete implementation details of Hough transform for OpenCL. Section 6 illustrates detailed experimental results and performance analyses to demonstrate the effectiveness of GPU acceleration. Finally, in the last section our conclusion and future work are described. II. HOUGH TRANSFORM AND CIRCLE DETECTION In this paper, we only discuss shapes on a plane. Shapes with two parameters a and b can be represented as a function f(a, b) = 0, such as x + ay + b = 0 and x2 + ax + b − y = 0 for lines and a special type of parabolas, respectively. 
While the image domain resides in the X-Y coordinate system, the transformed domain, or parameter domain, is located in the A-B coordinate system. Fig. 1 gives a rough idea of the Hough transform workflow. First, the edges of the input image are determined using an edge detection algorithm such as the Canny edge detector [8]. The image is then converted to a binary image, and the Hough transform is applied. The resulting Hough space is then searched for local maxima, which represent the circle centers.

Figure 1. General workflow of the Hough transform (input image → edge detection → binary image → Hough transform → Hough space → find local maxima → model parameters)

The equation of a circle can be written as $(x - a)^2 + (y - b)^2 = c^2$, so we set our parameter domain as a 3-D
cube A-B-C. Assume we have an edge point $(x_0, y_0)$ in the image domain, which can be seen as a point on the circumference of a circle centered at $(x_c, y_c)$. We can select every other point as this center point; the radius is then $r_c = \sqrt{(x_c - x_0)^2 + (y_c - y_0)^2}$, and we obtain a set of coordinates $(x_c, y_c, r_c)$, each corresponding to a point in the parameter domain (see Fig. 2).

Figure 2. Calculation of points in the parameter domain (edge point $(x_0, y_0)$, center $(x_c, y_c)$, radius $r_c$, angle $\varphi$)

We set a counter for every point in the 3-D cube and increase it by 1 whenever it is “visited” by a calculated $(x_c, y_c, r_c)$. For one $(x_0, y_0)$, the visited points form a conical surface radiating from the point $(x_0, y_0, 0)$ along the line $\{a = x_0,\ b = y_0\}$ in the 3-D cube. These conical surfaces are the counterparts of the lines in the K-B (slope-intercept) domain of the line Hough transform. Conversely, for a real circle, say $(x - a_0)^2 + (y - b_0)^2 = c_0^2$, the counter value of the corresponding point $(a_0, b_0, c_0)$ in the parameter domain must stand out among its neighbors. This can be seen clearly in Fig. 3, where three circles obscured by heavy noise appear as peaks in the parameter domain. Another advantage of the circular Hough transform is that infinite slopes are not a concern, but the range of r should still be specified to save memory and computation time.

In real applications, we should not always follow the traditional algorithm but should look for better ways to implement it. There are several mapping strategies between the image domain and the parameter domain that make the right points stand out. Two strategies of the circular Hough transform and their relative merits will be discussed.

III. POSSIBILITIES OF PARALLELIZATION

Parallelization can be achieved through various approaches, ranging from simple thread programming in popular programming languages (e.g., C/C++ or Java) to the use of powerful frameworks.
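As a baseline for the parallel variants discussed in this section, the center-based voting of Section II can be sketched serially in C. The sketch assumes (our choices, not the paper's) that the candidate centers are all pixels of the w × h image, that $r_c$ is rounded to the nearest integer, and that the accumulator is a flat array indexed [r − r_min][y_c][x_c]; all function names are hypothetical. The iterations of the outer loop over edge points do not depend on each other, which is the property the parallel approaches below rely on.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Center-based circular Hough transform (Section II), serial sketch.
 * Assumptions: candidate centers are all pixels of the w x h image,
 * r_c is rounded to the nearest integer, and the accumulator is a flat
 * array indexed [r - r_min][yc][xc]. Names are illustrative. */
typedef struct { int x, y; } EdgePoint;

/* round(sqrt(d2)) using integers only (no libm needed). */
static int round_sqrt(long d2)
{
    long s = 0;
    while ((s + 1) * (s + 1) <= d2) ++s;      /* s = floor(sqrt(d2)) */
    return (int)(d2 - s * s > s ? s + 1 : s); /* round up iff sqrt(d2) > s + 0.5 */
}

static size_t acc_index(int r, int xc, int yc, int w, int h, int r_min)
{
    return ((size_t)(r - r_min) * h + yc) * w + xc;
}

void hough_circles_serial(const EdgePoint *edges, size_t n_edges, int w, int h,
                          int r_min, int r_max, int *acc)
{
    assert(0 < r_min && r_min <= r_max);
    memset(acc, 0, sizeof(int) * (size_t)(r_max - r_min + 1) * w * h);
    for (size_t e = 0; e < n_edges; ++e)          /* independent per edge point */
        for (int yc = 0; yc < h; ++yc)
            for (int xc = 0; xc < w; ++xc) {
                long dx = xc - edges[e].x, dy = yc - edges[e].y;
                int r = round_sqrt(dx * dx + dy * dy);   /* r_c of Section II */
                if (r >= r_min && r <= r_max)
                    acc[acc_index(r, xc, yc, w, h, r_min)]++;
            }
}

/* Strongest accumulator cell, decoded back to (xc, yc, r). */
int hough_peak(const int *acc, int w, int h, int r_min, int r_max,
               int *xc, int *yc, int *r)
{
    size_t n = (size_t)(r_max - r_min + 1) * w * h, best = 0;
    for (size_t i = 1; i < n; ++i)
        if (acc[i] > acc[best]) best = i;
    *r = r_min + (int)(best / ((size_t)w * h));
    *yc = (int)((best / (size_t)w) % (size_t)h);
    *xc = (int)(best % (size_t)w);
    return acc[best];
}
```

With the twelve integer points at distance 5 from (10, 10) in a 21 × 21 image and radii 3 to 8, hough_peak reports the center (10, 10), radius 5 and 12 votes. The cost of O(|edges| · w · h) distance evaluations illustrates why this brute-force variant becomes slow for large images.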
For the Hough transform, the fact that each edge point of the original image is processed independently of the others can be exploited; therefore, numerous parallel solutions are possible. Some of these possibilities are presented and discussed in the first part of this section. The second part introduces popular computing platforms and programming models that make the GPUs of modern graphics cards available for computation.

A. Algorithms

The basic idea of parallelization is to divide the overall task into independent sub-tasks that can be processed simultaneously. Modern computers have one or more CPUs with multiple cores each, so truly parallel processing can be realized. The division of the original problem into appropriate sub-tasks is of crucial importance, especially when resources must be shared between them.

The simplest form of parallelization is the processing of multiple images simultaneously. Each available core calculates the complete Hough transform for one image, and no data exchange between the cores is required. When images of different sizes are processed, however, the work is very likely not distributed equally among the cores. To avoid such a waste of processing power, a more complex approach is required.

One possibility for recognizing circles with unknown radii is to process the individual radii of one image in parallel. Each CPU core computes the Hough transform for a single radius and generates a separate Hough space that only it has access to, so no synchronization of this space is required. All CPU cores can operate completely in parallel without blocking each other, since the shared resource, the list of contour points, is read-only. The result of each sub-task contains all center points of the detected circles with this particular radius.

Another approach is the parallel processing of individual regions within an image.
Each task computes the Hough transform for the contour points of its region and generates a separate section of the complete Hough space. These sections are exclusive to each task, so synchronization is not necessary. A disadvantage of this approach is an additional processing step in which the Hough space of neighboring regions must be assembled from the individual spaces of the tasks. On the other hand, the entire Hough space does not have to be allocated at once: only the subspaces of neighboring regions must be kept in memory simultaneously, and once a region is no longer needed for the voting process, its memory can be released. This reduction of the required memory can be extremely useful when processing large images (e.g., with more than 100,000 pixels in each dimension).

There are many other approaches to parallelizing the Hough transform; these examples are only meant to give a general view of the possibilities.

B. Choice of architecture

The hardware architectures of CPUs and GPUs differ significantly because the two devices serve different purposes. A highly parallelizable task can be calculated much faster on a GPU than on a CPU if
the specifics of the GPU (e.g., memory usage) are taken into account. The speed gain of the GPU over the CPU is greater when a job involves many of the calculations a GPU is optimized for. For a detailed description of the hardware architecture and the resulting application scenarios for GPUs, see [9].

Figure 3. Hough transform on circles with r = 30

In recent years, various computing platforms have been introduced that allow software to be executed on GPUs. Two of these platforms are CUDA¹ and OpenCL². Apart from the details of their implementation, the most striking difference is the number of supported hardware devices. While OpenCL supports different devices (CPU, GPU, DSP or other processors) from different manufacturers, CUDA can only utilize GPUs from NVIDIA. This limitation of CUDA can lead to a speed advantage of up to 50% in comparison to an OpenCL program that has not been specifically adapted³ for NVIDIA hardware [10].

As already described, the Hough transform is a highly parallelizable task. Using a parallelization platform to significantly reduce the computation time is therefore a reasonable solution.

IV. STATE OF THE ART

Digital image processing is an area with a wide range of applications and an active research and development community. Since the Hough transform is an important tool, it is included in many frameworks and applications, one of the most common being the cross-platform framework OpenCV. Despite these existing solutions, the Hough transform remains a field of active development, especially since GPUs became available for general-purpose calculations. Current GPU implementations of the Hough transform achieve remarkable speedups compared with CPU implementations: two recent implementations for the CUDA platform achieved speedup factors of up to 400 [11] [12].
¹ Compute Unified Device Architecture – NVIDIA
² Open Computing Language – initially Apple Inc., now Khronos Group
³ Adapted OpenCL programs can be equally fast but lose their portability.

V. PARALLELIZATION OF HOUGH TRANSFORM

In our solution, the first step is to generate all jobs for the Hough kernel. Each job is represented by a triple (x, y, r), and such a triple is generated for every edge point (x, y) of the original image and every radius $r \in [r_{min}, r_{max}]$. The total number of jobs is therefore $|edges| \cdot (r_{max} - r_{min} + 1)$, which is used as the global work size of the first OpenCL kernel. The kernel is executed for each triple (x, y, r), and the midpoint circle algorithm is used to “visit” the pixels of interest and increment each visited pixel by one.

Figure 4. Hough space (original image of size w × h; resized Hough space of size w' × h', padded by $r_{max}$ on every side; linearly aligned Hough space with w' · h' entries per radius)

The original image is transformed into the resized Hough space by adding the maximum radius $r_{max}$ to all sides of the image domain, i.e., $w' = w + 2r_{max}$ and $h' = h + 2r_{max}$. Within the OpenCL kernel, the resized Hough space is interpreted as a linearly aligned data structure in which the Hough space images for the different radii are stored one after another. Eq. (1) determines the offset $\varphi(r)$ of the image for radius r within the Hough space, Eq. (2) gives the array index n of the point (r, x', y'), and Eq. (3) recovers the parameters (r, x', y') from the array index n. Adding this “junk data” optimizes the midpoint algorithm, because no boundary checks are needed within the OpenCL kernel; such checks are extremely slow in OpenCL when executed on a GPU. The junk data is ignored in the voting process by setting the corresponding three-dimensional global work size of the
voting kernel.

$$\varphi(r) = (r - r_{min}) \cdot w' \cdot h' \tag{1}$$
$$i(r, x', y') = \varphi(r) + y' \cdot w' + x' \tag{2}$$
$$i^{-1}(n) = (r, x', y'),\quad r = r_{min} + \left\lfloor \frac{n}{w' \cdot h'} \right\rfloor,\quad y' = \left\lfloor \frac{n - \varphi(r)}{w'} \right\rfloor,\quad x' = n - \varphi(r) - y' \cdot w' \tag{3}$$

The voting kernel is started with a three-dimensional global work size and a global offset that skip all junk data, as shown in Listing 1.

    // Global IDs run from offset to offset + worksize - 1 (OpenCL 1.1+),
    // so the work size is the image size; queue/voting_kernel are assumed handles.
    size_t offset[3]   = { size_t(r_max), size_t(r_max), 0 };
    size_t worksize[3] = { size_t(img.cols), size_t(img.rows), size_t(r_max - r_min + 1) };
    clEnqueueNDRangeKernel(queue, voting_kernel, 3, offset, worksize,
                           nullptr, 0, nullptr, nullptr);

Listing 1. Setting clEnqueueNDRangeKernel parameters

For voting, a simple thresholding and local maxima search is used, with a threshold that depends on the radius of the circle. Within the voting process, the coordinates (x', y') are converted back to the original image domain (x, y), and the number of circles found is stored in a __global variable. Because no dynamic data structures can be used within an OpenCL kernel, the vote space is allocated with the number of edge points as its maximum size. After the voting kernel has finished, the data structure containing the detected instances of (x, y, r) is copied back to the host system, and the Hough transform is complete.

VI. EVALUATION

To evaluate the Hough transform implemented with OpenCL, a benchmark was conducted. It is structured as follows.

A. Test images

Images with four different square resolutions (256×256 to 2048×2048 pixels) and an increasing number of edge points (up to ≈ 100,000) were used in the benchmark. One sample image with a resolution of 2048×2048 pixels is shown in Fig. 5.

B. Implementations

To assess the presented implementation, it is compared with three other freely available solutions:
• V1 – sequential Hough transform algorithm [7] (CPU)
• V2 – optimization of V1 (sequential – CPU)
• V3 – optimization of V1 (parallelized, 30 threads – CPU)
• V4 – presented OpenCL implementation (CPU)
• V5 – presented OpenCL implementation (GPU)

All implementations of the Hough transform are configured to detect circles with radii of 10 to 40 in the respective image.

C. Hardware Used

Hardware for the benchmark:
• CPU: Intel(R) Core(TM) i7-3720QM
• GPU: ATI Radeon HD 5770

Figure 5. Test image with 2048×2048 pixels and 92640 edge points

D. Results

To obtain reliable processing times, each image was processed ten times by each solution, and the average time was calculated. The average times for all solutions and images are shown in Tab. I.

Format       Edges    V1          V2        V3        V4       V5
256x256      1503     11.321      0.813     0.210     0.022    0.031
512x512      5790     167.432     3.301     0.856     0.080    0.073
1024x1024    23160    1200.344    13.373    3.278     0.476    0.354
2048x2048    92640    3400.442    53.169    13.056    1.772    0.791
Table I. EXECUTION TIME OF HOUGH TRANSFORM IN SECONDS

A graphical representation of the benchmark results is shown in Fig. 6 (V1 has been omitted for clarity). The results show that the presented OpenCL implementation (CPU and GPU) is significantly faster than all other solutions. The small difference between V4 (CPU) and V5 (GPU) for images up to a resolution of 1024×1024 can be explained by the simplicity of the calculations, which do not fully exploit the potential of the GPU. Only for the largest image is the GPU faster than the CPU, by a factor of about 2 (1.772 s vs. 0.791 s). The resolution itself is almost irrelevant; the workload is determined by the number of contour points multiplied by the number of observed radii. For the 2048×2048 image, there are 92,640 contour points and 31 radii (10 to 40), which results in 31 · 92,640 = 2,871,840 jobs, each executed by the midpoint OpenCL kernel. With this large number of jobs, the graphics card can begin to exploit its advantages.
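The job structure behind these numbers can be illustrated with a serial emulation of Section V: every (x, y, r) triple is one job that rasterizes a circle with the midpoint algorithm into the padded Hough space of Eqs. (1) to (3), without any bounds checks. This is a sketch under our own assumptions (the common integer midpoint variant, symmetric points de-duplicated on axes and diagonals, hypothetical names such as HoughSpace, vote_job and job_count); it is not the authors' kernel. In a parallel kernel where each vote_job call is a separate work-item, concurrent increments of the same counter would presumably need atomic operations such as OpenCL's atomic_inc.

```c
#include <assert.h>
#include <stdlib.h>

/* Serial emulation of the job-based voting of Section V (sketch).
 * Assumptions: padded layout of Eqs. (1)-(3) with w' = w + 2*r_max and
 * h' = h + 2*r_max; integer midpoint circle variant; names hypothetical.
 * On a GPU every vote_job() call would be one work-item. */
typedef struct {
    int w, h, r_min, r_max;  /* original image size and radius range */
    int wp, hp;              /* padded size w', h' */
    int *votes;              /* (r_max - r_min + 1) * w' * h' counters */
} HoughSpace;

static size_t hs_phi(const HoughSpace *hs, int r)                    /* Eq. (1) */
{
    return (size_t)(r - hs->r_min) * hs->wp * hs->hp;
}

static size_t hs_index(const HoughSpace *hs, int r, int xp, int yp)  /* Eq. (2) */
{
    return hs_phi(hs, r) + (size_t)yp * hs->wp + xp;
}

void hs_decode(const HoughSpace *hs, size_t n, int *r, int *xp, int *yp) /* Eq. (3) */
{
    *r = hs->r_min + (int)(n / ((size_t)hs->wp * hs->hp));
    size_t rest = n - hs_phi(hs, *r);
    *yp = (int)(rest / hs->wp);
    *xp = (int)(rest % hs->wp);
}

int hs_init(HoughSpace *hs, int w, int h, int r_min, int r_max)
{
    assert(0 < r_min && r_min <= r_max);
    hs->w = w; hs->h = h; hs->r_min = r_min; hs->r_max = r_max;
    hs->wp = w + 2 * r_max;
    hs->hp = h + 2 * r_max;
    hs->votes = calloc(hs_phi(hs, r_max + 1), sizeof(int));
    return hs->votes ? 0 : -1;
}

static void inc(HoughSpace *hs, int r, int xp, int yp) { hs->votes[hs_index(hs, r, xp, yp)]++; }

/* Vote for the 8 symmetric points of offset (a, b), a >= b >= 0, around
 * (cx, cy); points on the axes and diagonals are voted only once. */
static void plot_sym(HoughSpace *hs, int r, int cx, int cy, int a, int b)
{
    inc(hs, r, cx + a, cy + b); inc(hs, r, cx - a, cy + b);
    if (b != 0) { inc(hs, r, cx + a, cy - b); inc(hs, r, cx - a, cy - b); }
    if (a != b) {
        inc(hs, r, cx + b, cy + a); inc(hs, r, cx + b, cy - a);
        if (b != 0) { inc(hs, r, cx - b, cy + a); inc(hs, r, cx - b, cy - a); }
    }
}

/* One job (x, y, r): rasterize the circle of radius r around the edge point.
 * The r_max padding keeps every index in range, so no bounds checks. */
void vote_job(HoughSpace *hs, int x, int y, int r)
{
    int cx = x + hs->r_max, cy = y + hs->r_max;  /* image -> padded coordinates */
    int a = r, b = 0, err = 1 - r;
    while (a >= b) {
        plot_sym(hs, r, cx, cy, a, b);
        b++;
        if (err < 0) err += 2 * b + 1;
        else { a--; err += 2 * (b - a) + 1; }
    }
}

size_t job_count(size_t n_edges, int r_min, int r_max)
{
    return n_edges * (size_t)(r_max - r_min + 1);
}

/* Generate one job per (edge point, radius) pair and run it. */
size_t run_all_jobs(HoughSpace *hs, const int *ex, const int *ey, size_t n_edges)
{
    size_t jobs = 0;
    for (size_t e = 0; e < n_edges; ++e)
        for (int r = hs->r_min; r <= hs->r_max; ++r, ++jobs)
            vote_job(hs, ex[e], ey[e], r);
    return jobs;
}

/* Strongest cell outside the padding ("junk data"), in original coordinates. */
int hs_peak(const HoughSpace *hs, int *x, int *y, int *r)
{
    size_t total = hs_phi(hs, hs->r_max + 1);
    int best = -1;
    for (size_t n = 0; n < total; ++n) {
        int rr, xp, yp;
        hs_decode(hs, n, &rr, &xp, &yp);
        if (xp < hs->r_max || xp >= hs->r_max + hs->w ||
            yp < hs->r_max || yp >= hs->r_max + hs->h)
            continue;
        if (hs->votes[n] > best) {
            best = hs->votes[n];
            *r = rr; *x = xp - hs->r_max; *y = yp - hs->r_max;
        }
    }
    return best;
}
```

Feeding twelve edge points of a radius-5 circle centered at (10, 10) into a 21 × 21 image with radii 3 to 8 runs 12 · 6 = 72 jobs, and the peak search reports (10, 10, 5) with 12 votes. job_count(92640, 10, 40) reproduces the 2,871,840 jobs quoted for the largest test image.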
Figure 6. Execution time of the different approaches (execution time in seconds of V2–V5 over the image dimension n × n)

If the processing times of solutions V1 and V5 are compared, the speedup can be calculated as follows:

$$\text{Speedup}_{V1/V5} = \frac{3400.442}{0.791} \approx 4298.9 \tag{4}$$

The Hough transform implemented on the OpenCL programming platform (executed on a GPU) can thus achieve a speedup by a factor of more than 4000 compared with a serial CPU version. By optimizing the OpenCL kernels, the speedup could possibly be increased even further.

VII. CONCLUSION

In this paper, the broad applicability of the Hough transform in digital image processing has been presented, and various possibilities for parallelization to reduce the processing time were discussed. The result of this work is an implementation of the Hough transform using the programming platform OpenCL. To assess the results in comparison to other implementations (serial and parallel on the CPU), their processing times were compared. The introduced OpenCL solution reaches a speedup by a factor of more than 4000.

To further improve the solution, the implemented algorithms will be investigated to determine whether they can exploit the capabilities of the GPU even more. So far, all tests were conducted solely with artificial images. To assess the quality of the implemented OpenCL solution even better, real-world images will be used for testing in the next step. Possible sources of such test images are free image databases for image processing, e.g., [13] [14]. Another way to further increase the reliability of the evaluation is the use of measures such as precision and recall. These values indicate the reliability of the detection, i.e., whether the Hough transform detected all circles in the image and whether objects that are not circles were identified as circles.
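How precision and recall could be computed for circle detections is sketched below. The matching rule (a detection counts as correct if its center lies within tol_center pixels, and its radius within tol_r, of a ground-truth circle that has not been matched yet) and all names are our assumptions; the paper does not define them. Greedy first-match assignment keeps the sketch short; an optimal one-to-one assignment could give slightly different counts when detections lie close together.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Precision/recall for circle detection (sketch). Assumption: a detection is
 * a true positive if its center is within tol_center pixels and its radius
 * within tol_r of a ground-truth circle that has not been matched yet
 * (greedy, first match wins). Names and tolerances are hypothetical. */
typedef struct { int x, y, r; } Circle;
typedef struct { size_t tp; double precision, recall; } DetectionScore;

DetectionScore score_detections(const Circle *det, size_t n_det,
                                const Circle *gt, size_t n_gt,
                                int tol_center, int tol_r)
{
    DetectionScore s = {0, 0.0, 0.0};
    bool *matched = calloc(n_gt > 0 ? n_gt : 1, sizeof *matched);
    assert(matched != NULL);
    for (size_t i = 0; i < n_det; ++i) {
        for (size_t j = 0; j < n_gt; ++j) {
            int dx = det[i].x - gt[j].x, dy = det[i].y - gt[j].y;
            int dr = det[i].r - gt[j].r;
            if (!matched[j] && dx * dx + dy * dy <= tol_center * tol_center &&
                dr >= -tol_r && dr <= tol_r) {
                matched[j] = true;  /* each true circle counts at most once */
                s.tp++;
                break;
            }
        }
    }
    free(matched);
    /* precision: share of detections that are real circles (false positives lower it)
     * recall:    share of real circles that were detected (misses lower it) */
    s.precision = n_det > 0 ? (double)s.tp / (double)n_det : 0.0;
    s.recall = n_gt > 0 ? (double)s.tp / (double)n_gt : 0.0;
    return s;
}
```

For two ground-truth circles (10, 10, 5) and (30, 30, 8) and three detections (11, 10, 5), (50, 50, 7) and (30, 31, 9) with tol_center = 2 and tol_r = 1, two detections match, giving a precision of 2/3 and a recall of 1.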
The long-term goal of this work is the real-time detection of circles with variable radii in a video stream (e.g., from a webcam). To achieve this, the edge detection that precedes the Hough transform must also run in real time. The next step will therefore be to evaluate appropriate edge detectors, the main criterion being efficient computability on a GPU using OpenCL. In [15], the possibility of implementing the Canny edge detector on a GPU was investigated; this method should be sufficient for the intended application.

REFERENCES

[1] P. V. C. Hough, “Method and means for recognizing complex patterns,” 1962.
[2] R. O. Duda and P. E. Hart, “Use of the Hough transformation to detect lines and curves in pictures,” Communications of the ACM, 1972.
[3] S. Eom, R. Bise, and T. Kanade, “Detection of hematopoietic stem cells in microscopy images using a bank of ring filters,” in The IEEE International Symposium on Biomedical Imaging, 2010.
[4] M. Smereka and I. Duleba, “Circular object detection using a modified Hough transform,” International Journal of Applied Mathematics and Computer Science, 2008.
[5] Y. Yabuta, H. Mizumoto, and S. Arii, “Binocular robot vision system with shape recognition,” in International Conference on Control, Automation and Systems, 2007.
[6] Y. Xie and Q. Ji, “A new efficient ellipse detection method,” in International Conference on Pattern Recognition, 2002.
[7] M. Bowes. (2009) Hough circle detector. [Online]. Available: https://github.com/marcbowes/Hough-Circle-Detector
[8] J. Canny, “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6, pp. 679–698, 1986.
[9] J. Owens, UC Davis. (2007) GPU architecture overview. [Online]. Available: http://gpgpu.org/static/s2007/slides/02-gpu-architecture-overview-s07.pdf
[10] K. Karimi, N. G. Dickson, and F. Hamze. (2010) A performance comparison of CUDA and OpenCL. [Online].
Available: http://arxiv.org/ftp/arxiv/papers/1005/1005.2581.pdf
[11] S. Chen and H. Jiang, “Accelerating the Hough transform with CUDA on graphics processing units,” in Proceedings of the 2011 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), 2011.
[12] F. S. Tasel and A. Temizel, “Parallelization of Hough transform for circles using CUDA,” GPU Technology Conference, 2012.
[13] Computer vision test images. Carnegie Mellon University. [Online]. Available: http://www.cs.cmu.edu/~cil/v-images.html
[14] The USC-SIPI image database. Signal and Image Processing Institute. [Online]. Available: http://sipi.usc.edu/database/
[15] Y. Luo and R. Duraiswami, “Canny edge detection on NVIDIA CUDA,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW ’08), 2008, pp. 1–8.
[16] M. Bäuml and R. Stiefelhagen, “Evaluation of Local Features for Person Re-Identification in Image Sequences,” Research Paper, Institute of Technology, Karlsruhe, 2011.
[17] D. Wagner, “Marker-Based Tracking,” 2008. [Online]. Available: http://handheldar.icg.tugraz.at/markerbased.php
[18] D. Lowe, “Distinctive Image Features from Scale Invariant Keypoints,” Ph.D. dissertation, University of British Columbia, Canada, 2004.
[19] D. Lowe, “Object Recognition from Local Scale-Invariant Features,” Research Paper, University of British Columbia, Canada, 1999.
[20] R. E. Kalman, “A New Approach to Linear Filtering and Prediction Problems,” Research Paper, Research Institute for Advanced Study, Baltimore, 2002.