International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 2248
Automatic Traffic Sign Detection and Recognition using CNN
Jisha Elizabeth Shaji1, Hari S2
1PG Student, Dept. of Electronics and Communication Engineering, Mount Zion College of Engineering,
Kerala, India
2Asst. Professor, Dept. of Electronics and Communication Engineering, Mount Zion College of Engineering,
Kerala, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - This paper presents an overview of traffic sign detection and recognition. It describes the characteristics, requirements, and difficulties behind road sign detection and recognition, and shows how a convolutional neural network (CNN) is used for the recognition and classification of road signs. The paper introduces a traffic sign detection and recognition system that accurately estimates the location and exact boundary of traffic signs using a CNN. In navigation systems where road signs are used as 3-D landmarks, accurate estimation of the traffic sign boundary is important. Recent traffic sign detection systems based on CNNs provide only bounding boxes; the system presented here provides the precise boundary of each traffic sign, which helps the signs to be recognized properly. The method is end-to-end trainable and more robust to occlusion, blurred images, and small targets than other boundary estimation methods. The CNN-based traffic sign detection and recognition network achieves a frame rate higher than seven frames per second, and it provides highly accurate and robust traffic sign detection and boundary estimation results on a low-power mobile platform.
Key Words: Traffic sign detection, traffic sign recognition,
convolutional neural network, classifiers, object detection
1. INTRODUCTION
Traffic sign detection is a key problem for intelligent vehicles: traffic sign recognition provides critical information such as directions and alerts in autonomous driving and driver assistance systems. Another application of traffic sign detection is to complement the navigation systems of intelligent vehicles by using traffic signs as distinct landmarks for mapping and localization. In contrast to natural landmarks with arbitrary appearance, traffic signs have standard appearances such as shapes, colors, and patterns defined in regulations [1]. This makes them efficient and robust to detect and match under varying conditions, and therefore a preferable choice of landmark for road map reconstruction. To reconstruct detected traffic signs into a 3-D map, point-wise correspondences of the boundary corners of the signs across multiple frames are established, and the 3-D coordinates of the boundary corners are then computed by triangulation using the camera poses and the internal parameters of the camera. Accurate triangulation of the 3-D position requires estimation of the sign boundary with pixel-level accuracy. Existing traffic sign detection systems do not have this capability, as they only estimate bounding boxes of traffic signs [2].
Pixel-wise prediction methods such as semantic image segmentation, which have been applied successfully to road scenes, could replace boundary estimation, but they require time-consuming algorithms that can severely harm the performance of real-time systems for vehicles. By using templates of traffic signs, we effectively utilize prior information about the target shapes. This enables robust boundary estimation for traffic signs that are unclear and that are difficult for pixel-wise prediction methods such as contour estimation and segmentation. At an input resolution of 1280 × 720, our method achieves a detection rate higher than 0.88 mean average precision (mAP) and a boundary estimation error of less than 3 pixels. Projecting the boundary corners (matrix-vector products) requires little computation time, and most of the required computation comes from the CNN forward propagation, which can be accelerated by GPUs. Combined with our efforts to find a base network architecture that provides the best trade-off between accuracy and speed, our precise boundary detection system can run on mobile platforms at frame rates higher than 7 frames per second (FPS) with acceptable traffic sign detection, recognition, and boundary estimation accuracy. Once the boundary corners have been matched across frames, their 3-D positions can be recovered by standard two-view triangulation, as sketched below.
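For illustration, the triangulation of matched boundary corners into 3-D points can be sketched with OpenCV as follows. This is a generic two-view sketch under assumed 3 × 4 projection matrices (intrinsics times camera pose), not the exact implementation used in this work, and all variable names are illustrative.

```python
import numpy as np
import cv2

def triangulate_corners(P1, P2, corners1, corners2):
    """Generic two-view triangulation sketch.

    P1, P2: assumed 3x4 projection matrices for the two frames.
    corners1, corners2: (N, 2) arrays of matched boundary-corner pixels.
    Returns an (N, 3) array of 3-D corner coordinates.
    """
    pts1 = np.asarray(corners1, dtype=np.float64).T      # shape (2, N)
    pts2 = np.asarray(corners2, dtype=np.float64).T
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)      # homogeneous (4, N)
    X = (X_h[:3] / X_h[3]).T                             # normalize to (N, 3)
    return X
```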
2. RELATED WORK
Recently, great advances in object detection have been achieved with CNNs. Besides the discriminative power of CNNs for object category classification, detection networks have shown the capability of accurate localization by regression on object location. Two different architectures of detection networks are currently being developed: direct detection and region-proposal-based detection. In direct detection, predictions of the position (by regression) and class (by classification) of target objects are obtained directly from convolutional feature layers, resulting in relatively fast run times. Region-proposal-based methods, on the other hand, first generate a number of candidate regions regardless of their classes and then predict the object position and class for each candidate region. By performing regression and classification twice in different stages of the network pipeline, region-proposal-based methods pursue more accurate detection, with relatively slower run times than direct detection methods. For traffic sign detection in autonomous driving, a direct detection method
is adequate because of the detection latency under limited computational resources. Although most recent CNN object detection methods provide accurate bounding box and class label predictions, further processing is needed to obtain precise object boundaries from the predicted bounding boxes. To address this, boundaries of traffic signs have been obtained simultaneously as segmentation masks by an OverFeat-style convolutional neural network trained on multiple tasks comprising bounding box detection, mask segmentation, and sign category classification [3]. However, predicting pixel-wise segmentation masks [4] requires intensive computation, which makes the network very slow. In contrast, we propose a boundary estimation method that does not require pixel-wise segmentation and thus enables fast detection.
3. PROPOSED SYSTEM
The overall procedure of the proposed method is illustrated in Fig. 1.
Fig. 1. Block diagram
In this work, we create a CNN block in which predictions are performed directly across multiple feature levels. The main difference between our network and other detection networks is its output: instead of predicting a bounding box for each traffic sign, the network estimates the accurate boundary of the corresponding traffic sign, i.e., the locations of its boundary corners. An illustrative sketch of such a corner-regression output head is given below.
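The following sketch shows, for illustration only, one way a detection head could regress boundary-corner coordinates instead of a bounding box. It is not the authors' architecture: the backbone, the layer sizes, the number of classes, and the assumption of four corners per sign are all placeholders.

```python
from tensorflow.keras import layers, Model

# Illustrative only: a tiny backbone with two heads, one classifying the sign
# and one regressing boundary-corner coordinates (assumed 4 corners, i.e. 8
# values) instead of a 4-value bounding box. All sizes are placeholders.
NUM_CLASSES = 5                                   # assumed number of sign classes

inputs = layers.Input(shape=(64, 64, 3))
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(128, activation="relu")(x)

class_out = layers.Dense(NUM_CLASSES, activation="softmax", name="sign_class")(x)
corner_out = layers.Dense(8, name="corners")(x)   # (x, y) for each of 4 assumed corners

model = Model(inputs, [class_out, corner_out])
model.compile(optimizer="adam",
              loss={"sign_class": "sparse_categorical_crossentropy",
                    "corners": "mse"})
```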
When an input image containing a traffic sign is captured while the car is moving, the image is first enhanced [5]. Image enhancement is the process of adjusting digital images so that the results are more suitable for display; its main objective is to process the image (e.g., contrast improvement, image sharpening) so that it is better suited for identifying key features. The enhanced image then undergoes thresholding. Image thresholding is a simple yet effective way of partitioning an image into a foreground and a background; it is an image analysis technique, a type of image segmentation, that converts a grayscale image into a binary image. Thresholding works best on images with high levels of contrast. Common thresholding algorithms include histogram-based and multi-level thresholding; a minimal thresholding sketch is given below.
The binary image then undergoes region grouping and region property analysis. Grouping is a way of extracting and representing information from an image by collecting pixels into regions of similarity; image regions, also called objects, can be contiguous or discontiguous. Each region in an image can have properties such as area, center of mass, orientation, and bounding box. A minimal labeling sketch is given below.
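The region grouping and property step can be illustrated with connected-component labeling in OpenCV. This is a generic sketch rather than the exact implementation; `binary` is assumed to be the thresholded mask from the previous sketch, and the minimum-area filter is an arbitrary choice.

```python
import cv2

# Illustrative region grouping: label connected foreground regions in the
# binary mask and read off simple per-region properties.
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)

for i in range(1, num_labels):                     # label 0 is the background
    x, y, w, h, area = stats[i]                    # bounding box and area
    cx, cy = centroids[i]                          # center of mass
    if area > 100:                                 # assumed minimum-size filter
        print(f"region {i}: area={area}, bbox=({x},{y},{w},{h}), "
              f"centroid=({cx:.1f},{cy:.1f})")
```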
After region analysis, the image undergoes feature extraction. Feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative and non-redundant. It is a dimensionality reduction process in which an initial set of raw variables is reduced to a manageable group of features for processing. When the input data to an algorithm are too large to be processed and are suspected to be redundant, they can be transformed into a reduced set of features; choosing this initial subset of features is called feature selection. The selected features contain the required information from the input data, so that the desired task can be performed using this reduced representation instead of the complete initial data. Finally, the image is classified by its features after passing through the convolutional layers [6].
A convolutional neural network is a multilayer perceptron specially designed for the identification of two-dimensional image information. It always has several layers: an input layer, convolutional layers, sampling (pooling) layers, and an output layer. A 2-D convolutional layer applies sliding convolutional filters to the input: the layer convolves the input vertically and horizontally by moving the filters, computes the dot product of the weights and the input at each position, and then adds a bias term. A minimal numerical sketch of this operation is given below.
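To make the dot-product-plus-bias description concrete, here is a naive NumPy sketch of a single-channel 2-D convolution with stride 1 and no padding. It is for illustration only and is far slower than the optimized routines inside a real CNN framework.

```python
import numpy as np

def conv2d_single(image, kernel, bias=0.0):
    """Naive valid-mode 2-D convolution: slide the kernel over the image,
    take the dot product of the weights and the covered patch, add a bias."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel) + bias   # dot product + bias
    return out
```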
The convolutional layer is the most important building block of a CNN. Its parameters consist of a set of learnable filters, or kernels, which have a small receptive field but extend through the full depth of the input volume. During the forward pass, each filter is convolved across the width and height of the input volume, computing the dot product between the entries of the filter and the input and producing a 2-dimensional activation map for that filter. After convolution, the output passes through a ReLU layer, which effectively removes negative values from the activation map by setting them to zero. Another important concept in CNNs is pooling, a form of non-linear down-sampling. Several non-linear functions can implement pooling, of which max pooling is the most common: it divides the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs the maximum; the result is then fed into fully connected layers. A minimal pooling sketch is given below.
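Non-overlapping max pooling as described above can be sketched in NumPy as follows; the 2 × 2 pool size is an assumption for illustration, and real frameworks provide optimized equivalents.

```python
import numpy as np

def max_pool2d(feature_map, pool=2):
    """Non-overlapping max pooling: split the map into pool x pool tiles
    and keep the maximum of each tile."""
    H, W = feature_map.shape
    H_out, W_out = H // pool, W // pool
    trimmed = feature_map[:H_out * pool, :W_out * pool]   # drop any remainder
    tiles = trimmed.reshape(H_out, pool, W_out, pool)
    return tiles.max(axis=(1, 3))                         # max over each tile
```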
4. RESULTS
The output of the detection stage is a group of objects that could be probable traffic road signs. These are forwarded to the recognizer for further evaluation and then to the classifier, which decides whether each detected object is a rejected object or a road sign; in the latter case the classifier responds with a sign code. A good recognizer should satisfy several requirements. First, it should provide high discriminative power at low computational cost. Second, it should be robust to the orientation (vertical or horizontal), size, and position of the traffic sign in the image. Third, it should be robust to noise. Fourth, for real-time applications, recognition should be carried out quickly once a sign is detected. Furthermore, the classifier should be able to learn a large number of classes, and as much a priori knowledge about road signs as possible should be built into the classifier design. Our method is more robust to partial occlusion, cluttered backgrounds, and closely spaced signs, which cannot be handled directly by segmentation algorithms. The recognized output is displayed in Fig. 2. The proposed method provides a detection frame rate higher than seven frames per second with high accuracy. The confusion matrix of the recognition stage is shown in Fig. 3.
Fig. 2. Detection and recognition of a sign
Fig. 3. Confusion matrix
A confusion matrix is a specific table layout that summarizes the performance of a classification algorithm. Each row of the confusion matrix contains the instances of a predicted class, while each column represents the instances of an actual class. Here there are five target classes, and the probability of obtaining the correct class for each input is shown in the table as a percentage. The total accuracy achieved is 98.8%. A minimal sketch of computing such a matrix is given below.
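A confusion matrix and overall accuracy of this kind can be computed, for example, with scikit-learn. The sketch below uses toy label arrays and placeholder class names, not the data from this work; note that scikit-learn places true labels on rows and predictions on columns, the reverse of the convention stated above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Generic sketch: y_true are ground-truth sign classes, y_pred are the
# classifier's outputs. The class names and labels are placeholders.
classes = ["stop", "yield", "speed_limit", "no_entry", "pedestrian"]
y_true = np.array([0, 1, 2, 2, 3, 4, 0, 1])
y_pred = np.array([0, 1, 2, 3, 3, 4, 0, 1])

cm = confusion_matrix(y_true, y_pred)    # rows = true labels, columns = predictions
acc = accuracy_score(y_true, y_pred)
print(cm)
print(f"overall accuracy: {acc:.1%}")
```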
5. CONCLUSIONS
In this paper, we proposed an efficient traffic sign detection and recognition method in which the locations of traffic signs are estimated together with their precise boundaries. To this end, we generalized the traffic sign templates to obtain precise boundaries with high accuracy. To achieve practical detection speed, we explored the best-performing convolutional neural network for both detection and recognition, considering the characteristics of traffic signs. By using the templates of traffic signs, our method effectively exploits strong prior information about the target shapes. This enables robust boundary estimation for traffic signs that are occluded or blurry and also allows multiple signs to be detected. In addition, by optimizing the resolution of the network input for the best trade-off between speed and accuracy, our detector can run at a frame rate of 7 FPS on low-power mobile platforms.
As future work, our method can adopt the latest architectures, such as feature pyramid networks and multi-scale training, for better speed and accuracy. Finally, the proposed method can be applied not only to traffic signs but also to any other planar objects having standard shapes.
REFERENCES
1) M. Liang, M. Yuan, X. Hu, J. Li, and H. Liu, “Traffic
sign detection by ROI extraction and histogram
features-based recognition,” in Proc. IEEE Int. Joint
Conf. Neural Netw., Aug. 2013, pp. 1–8.
2) M. Mathias, R. Timofte, R. Benenson, and L. Van
Gool, “Traffic sign recognition—How far are we
from the solution?” in Proc. IEEE Int. Joint Conf.
Neural Netw., Aug. 2013, pp. 1–8.
3) P. Dollár, R. Appel, S. Belongie, and P. Perona, “Fast
feature pyramids for object detection,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 36, no. 8, pp. 1532–
1545, Aug. 2014.
4) C. Liu, F. Chang, and C. Liu, “Occlusion-robust traffic
sign detection via cascaded colour cubic feature,”
IET Intell. Transp. Syst., vol. 10, no. 5, pp. 354–360,
2015.
5) A. Møgelmose, D. Liu, and M. M. Trivedi, “Detection
of U.S. traffic signs,” IEEE Trans. Intell. Transp. Syst.,
vol. 16, no. 6, pp. 3116–3125, Dec. 2015.
6) Y. Yang, H. Luo, H. Xu, and F. Wu, “Towards real-
time traffic sign detection and classification,” IEEE
Trans. Intell. Transp. Syst., vol. 17, no. 7, pp. 2022–
2031, Jul. 2016.
7) J. Uhrig, M. Cordts, U. Franke, and T. Brox. (2016).
“Pixel-level encoding and depth layering for
instance-level semantic labeling.” [Online].
Available: https://arxiv.org/abs/1604.05096
8) G. Lin, C. Shen, A. van den Hengel, and I. Reid.
(2016). “Exploring context with deep structured
models for semantic segmentation.” [Online].
Available: https://arxiv.org/abs/1603.03183
9) O. Dabeer et al., “An end-to-end system for
crowdsourced 3D maps for autonomous vehicles:
The mapping component,” in Proc. IEEE/RSJ Int.
Conf. Intell. Robots Syst., Sep. 2017, pp. 634–641.
10) A. Gudigar, C. Shreesha, U. Raghavendra, and U. R.
Acharya, “Multiple thresholding and subspace based
approach for detection and recognition of traffic
sign,” Multimedia Tools Appl., vol. 76, no. 5, pp.
6937–6991, 2017.
11) A. Shrivastava, A. Gupta, and R. Girshick, “Training
region-based object detectors with online hard
example mining,” in Proc. Comput. Vis. Pattern
Recognit., Jun. 2016, pp. 761–769.
12) A. Møgelmose, M. M. Trivedi, and T. B. Moeslund,
“Vision-based traffic sign detection and analysis for
intelligent driver assistance systems: Perspectives
and survey,” IEEE Trans. Intell. Transp. Syst., vol. 13,
no. 4, pp. 1484–1497, Dec. 2012.
13) Y. Jia et al. (2014). “Caffe: Convolutional
architecture for fast feature embedding.” [Online].
Available: https://arxiv.org/abs/1408.5093
14) M. Mathias, R. Timofte, R. Benenson, and L. Van
Gool, “Traffic sign recognition—How far are we
from the solution?” in Proc. IEEE Int. Joint Conf.
Neural Netw., Aug. 2013, pp. 1–8.
15) California Manual on Uniform Traffic Control
Devices, California State Transp. Agency and Dept.
Transp., Sacramento, CA, USA, 2014.
16) J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-
Fei, “ImageNet: A large-scale hierarchical image
database,” in Proc. Comput. Vis. Pattern Recognit.,
Jun. 2009, pp. 248–255.
17) X. Glorot and Y. Bengio, “Understanding the
difficulty of training deep feedforward neural
networks,” in Proc. Int. Conf. Artif. Intell. Statist.,
2010, pp. 249–256.