Research on Traffic Sign Recognition Based on CNN and HSV Model
Zhan Ziqi, Liu Bing, Huo Bin (Dongfeng Nissan Technical Center, Guangzhou 510800)
Abstract: In this research, a complete engineering implementation process for Traffic Sign Recognition (TSR) was established based on the RGB (Red, Green, Blue) to HSV (Hue, Saturation, Value) model conversion and a Convolutional Neural Network (CNN). To improve operational speed, the identification of the dynamic Region Of Interest (ROI) was optimized, the RGB-to-HSV model conversion method was optimized, and the neural network structure was designed accordingly. The TSR algorithm was verified with the GTSRB database. The results show that the proposed TSR method effectively improves computation speed and recognition rate.
Key words: Traffic sign recognition, HSV model, CNN, Autonomous driving
CLC No.: U461.91    Document code: A    DOI: 10.19620/j.cnki.1000-3703.20180669
Traffic Sign Recognition (TSR) is an important perception task for autonomous driving systems. Reliability and computing speed are regarded as the two most important parameters of recognition tasks[2]. Both traditional and Neural Network (NN) methods have been widely analyzed for TSR algorithms[6]. A traditional method usually uses an expert model including edge detection, shape recognition, content matching, etc., which means that every step of the recognition algorithm is explicitly formulated. An NN method is usually regarded as an end-to-end method, which means that explaining the behavior of these algorithms takes a lot of effort, especially during the detection process[9].
Many studies have proposed methods that combine an expert model with a neural network. In this paper, a combination of an optimized expert model and a Convolutional Neural Network (CNN) is used, and a complete engineering implementation process for recognizing Chinese traffic signs is introduced.
In this paper, only traffic signs with red color are considered as recognition targets, because red signs represent prohibitions, including speed limits, no passing, no left turn, etc. In an actual driving mission, these traffic signs give direct instructions for the driving task.
The essential steps are ROI identification, traffic sign area extraction, and traffic sign recognition. All steps in Figure 1 are needed for the whole recognition process.
Two main parts are included in Figure 1:
a. Dynamic ROI detection. This part processes one frame of the video recorded by the camera and outputs a 32 pixel × 32 pixel image that contains only the traffic sign.
b. Traffic sign recognition. A CNN with 13 layers is designed for training, which gives the recognition result.
Figure 1. Flow of TSR process
In each second, only one frame of the video is taken for the detection and recognition process, considering that TSR changes relatively slowly compared with obstacle recognition or other processes related to autonomous driving.
3 Dynamic ROI Detection
Converting the image from the RGB (Red, Green, Blue) to the HSV (Hue, Saturation, Value) model makes it easier to extract red zones from the original image accurately. Traffic signs are then located in these red zones, and the image is converted to binary. In the binary image, the connected zones are obtained by erosion and dilation and the traffic sign coordinates are set up, with which the ROI zone is extracted. The results of the whole dynamic ROI detection process are shown in Figure 2.
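This color stage can be sketched in NumPy. The sketch below is an illustration only, not the authors' code: the scaling of r, g, b to [0, 1], the array layout, and the function names are assumptions; the hue formula is the optimized one derived in Section 3.1, and the threshold range is the experience-based one given in Section 3.2.2.

```python
import numpy as np

def hue_proposal(r, g, b):
    """Hue in degrees via the optimized piecewise formula of Section 3.1.
    Assumes r, g, b are scaled to [0, 1]; only valid where the pixel is
    not gray (i.e. where delta > 0)."""
    cmax = np.maximum(np.maximum(r, g), b)
    cmin = np.minimum(np.minimum(r, g), b)
    delta = cmax - cmin
    num = delta - r + g + b - cmin          # shared numerator of both branches
    return np.where(g >= b,
                    60.0 * num / delta,
                    360.0 - 60.0 * num / delta)

def red_mask(rgb):
    """Binary red mask: hue is normalized to [0, 1] (0 = 0 deg, 1 = 360 deg)
    and thresholded with the experience-based range t of Section 3.2.2."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    h = hue_proposal(r, g, b) / 360.0       # degrees -> [0, 1]
    return ((h >= 0.0277) & (h <= 0.0320)).astype(np.uint8)
```

For a pure red pixel (1, 0, 0), `hue_proposal` returns 0°, and an orange-red pixel such as (1, 0.18, 0) falls inside the threshold band, so `red_mask` marks it white.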
3.1 Optimized RGB→HSV Color Model Method
Computer vision algorithms used in image processing are usually straightforward extensions of single-channel algorithms, with each color component used separately as an input. The HSV color model has an advantage over the RGB color model for color extraction. However, the traditional method of transferring an image from the RGB to the HSV model calculates every single hue, which consumes more time during the transition. The proposed transfer method takes less calculation and therefore reduces computation time. It can be expressed as:

Cmax = max(r, g, b)　（1）
Cmin = min(r, g, b)　（2）
Δ = Cmax − Cmin　（3）

h_proposal = 60° × (Δ − r + g + b − Cmin) / Δ,　if g ≥ b
h_proposal = 360° − 60° × (Δ − r + g + b − Cmin) / Δ,　if g < b　（4）

In which r, g, b represent the red, green, and blue components in the RGB color model, separately. The result is shown in Figure 2b.
3.2 Target Area Extraction
3.2.1 Binaryzation
Image color is redistributed after binary conversion according to the degree of hue in the HSV model, which means that 0 equals 0° and 1 equals 360° in the HSV model. The result is shown in Figure 2c.
3.2.2 Color extraction
A threshold is needed for red color extraction in the binary image; here the threshold value t ∈ [0.027 7, 0.032 0]. The result is shown in Figure 2d. One thing to point out is that the threshold value t is based on experience and obtained a fine extraction result in most tests. A better threshold value needs more tests in the real environment. Colors within t are set to white and all others to black in Figure 2d.
3.2.3 Erosion and dilation
Outside the target, a small part of the pixels also satisfies the threshold; therefore, an erosion process is added after the extraction process. During the erosion process, a disk area with a radius of 10 pixels is used. The result is shown in Figure 2e.
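The erosion step (and its dual, the dilation used next) can be sketched in pure NumPy. This is a naive, illustrative implementation with assumed function names; in practice a library routine such as OpenCV's `cv2.erode`/`cv2.dilate` would be used, and the text's disk radius of 10 pixels applies to real frames.

```python
import numpy as np

def disk(radius):
    """Disk-shaped structuring element (the text uses radius 10)."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return (x * x + y * y <= radius * radius).astype(np.uint8)

def erode(mask, selem):
    """Naive binary erosion: a pixel stays foreground only if the whole
    structuring element, centered on it, lies inside the foreground."""
    r = selem.shape[0] // 2
    padded = np.pad(mask, r, constant_values=0)
    out = np.zeros_like(mask)
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            window = padded[i:i + selem.shape[0], j:j + selem.shape[1]]
            out[i, j] = int(np.all(window[selem == 1] == 1))
    return out

def dilate(mask, selem):
    """Naive binary dilation: a pixel becomes foreground if the structuring
    element centered on it touches any foreground pixel."""
    r = selem.shape[0] // 2
    padded = np.pad(mask, r, constant_values=0)
    out = np.zeros_like(mask)
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            window = padded[i:i + selem.shape[0], j:j + selem.shape[1]]
            out[i, j] = int(np.any(window[selem == 1] == 1))
    return out
```

Erosion removes isolated threshold-passing pixels smaller than the disk; the subsequent dilation restores and thickens the surviving sign contour.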
To increase the weight of the target area, a dilation process is added after erosion so that the traffic sign gets a clear red circle on its outside edge, which means that all pixels inside the circle are set to the same value as the extraction area. The result is shown in Figure 2f.
3.3 ROI Extraction and Resizing
As Figure 2f shows, many eligible areas still remain after target area extraction. In this section, these alternative areas are ranked. The ranking method is shown in Figure 3.
Figure 3. The ranking method of ROI area selection
In Figure 3, if the area ratio of an alternative area is less than 10% or more than 50%, it is regarded as an interferent, such as a red coke bottle on the road or a red building near the road. An RGB model image is extracted from the ROI area. The result of the extraction is shown in Figure 2g.
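The ranking logic of Figure 3 can be sketched as follows. The candidate representation (a label plus a pixel area) and the choice of reference area for the ratio are assumptions for illustration; in practice the label would be a bounding box from connected-component analysis.

```python
def select_roi(candidates, reference_area):
    """Rank candidate red zones as in Figure 3. Each candidate is a
    (label, pixel_area) pair. Zones whose area ratio falls outside
    [10%, 50%] of the reference area are deleted as interferents;
    the biggest remaining zone is taken as the ROI."""
    kept = [(label, area) for label, area in candidates
            if 0.10 <= area / reference_area <= 0.50]
    if not kept:
        return None  # no plausible traffic sign in this frame
    return max(kept, key=lambda c: c[1])[0]
```

For example, with candidates of areas 100, 300, and 900 against a reference area of 1 000, the 900-pixel zone is rejected (ratio 90%) and the 300-pixel zone wins the ranking.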
The result of ROI extraction is the input of the CNN recognition process. Resizing the image to 32 pixel × 32 pixel is the last step. The result is shown in Figure 2h.
4 Traffic Sign Recognition
A CNN uses a variation of multilayer perceptrons designed to require minimal preprocessing and has successfully been applied to analyzing visual imagery. CNNs use relatively little pre-processing compared with other image classification algorithms, which means that the network learns the filters that were hand-engineered in traditional algorithms[10]. In this paper, a neural network structure is used for the recognition process; however, it is not a research focus of this engineering implementation work.
4.1 Frame of CNN
A 13-layer network was designed for the recognition process.
4.1.1 Input layer
The input layer receives a 32 pixel × 32 pixel × 3 channel RGB image.
4.1.2 Middle layers
The middle layers include 8 layers, alternating convolution layers (C) and max pooling layers (S):
a. Convolution: 32 5×5 convolutions with stride [1,1] and padding [2,2].
b. Max Pooling: 3×3 max pooling with stride [2,2] and padding [0,0].
c. Convolution: 32 5×5 convolutions with stride [1,1] and padding [2,2].
d. Max Pooling: 3×3 max pooling with stride [2,2] and padding [0,0].
e. Convolution: 64 5×5 convolutions with stride [1,1] and padding [2,2].
f. Max Pooling: 3×3 max pooling with stride [2,2] and padding [0,0].
g. Convolution: 64 5×5 convolutions with stride [1,1] and padding [2,2].
h. Max Pooling: 3×3 max pooling with stride [2,2] and padding [0,0].
4.1.3 Output layers
The output layers include 4 layers:
a. Fully Connected: 128 nodes.
b. Fully Connected: 4 nodes.
c. Softmax.
d. Classification Output.
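As a quick sanity check of these layer settings, the spatial size can be traced through the four conv + pool pairs with the standard output-size formula floor((n + 2·pad − kernel) / stride) + 1 (a sketch; the helper names are ours):

```python
def conv_out(size, kernel=5, stride=1, pad=2):
    """Spatial output size of a convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=3, stride=2, pad=0):
    """Spatial output size of a max pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

size = 32                     # 32 x 32 input image
for _ in range(4):            # the four conv + pool pairs (a-h above)
    size = pool_out(conv_out(size))
print(size)                   # prints 1
```

The width shrinks as 32 → 15 → 7 → 3 → 1 across the pooling layers (convolutions with padding [2,2] preserve size), so a 1 × 1 × 64 tensor feeds the fully connected layers.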
The structure of the CNN is shown in Figure 4.
4.2 Training Data
Two kinds of datasets are used for the training process. One is GTSRB (German Traffic Sign Recognition Benchmark), in which each traffic sign class contains 2 000 samples, 80% for training and 20% for testing. These data are used during the test stage in the lab to verify the basic performance and reliability of the network. The other dataset is recorded from real road tests to verify the performance of the whole algorithm in a real driving environment (including Chinese traffic signs, the camera hardware, weather conditions, etc.).
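The per-class 80/20 split can be sketched as follows; the function name and the fixed shuffle seed are assumptions for illustration:

```python
import random

def split_class(samples, train_frac=0.8, seed=0):
    """Shuffle one traffic sign class (2 000 samples per class in GTSRB
    here) and split it into training and testing subsets."""
    rng = random.Random(seed)        # fixed seed for reproducibility
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

Applied per class, 2 000 samples yield 1 600 training and 400 testing samples with no overlap.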
Figure 4. Structure of network
4.3 Results of Recognition
4.3.1 Test with GTSRB dataset
The test with the GTSRB dataset is done in static mode; the parameters set in the training process and the test results are shown in Table 1. The comprehensive recognition rate reaches 99.6% in static mode.
4.3.2 Test with real road dataset
The test with the real road dataset obtains a good recognition rate in good weather conditions, but two things remain to be improved:
a. Under bad lighting conditions, due to limitations of the camera hardware, the original images (for example, when passing under a gantry) can be very hard to use for target detection.
b. With a red background, the background is selected as part of the ROI together with the traffic sign when the red background overlaps with the sign.
5 Conclusions
This paper shows a complete calculation process of how to detect traffic signs and input them into a CNN. In the next step of this work, it is worth exploring how to determine the size of the image input into the CNN, because this may help to further improve the computation speed.
Color-based target detection alone is limited by color itself, camera performance, and the lighting environment. In future studies, methods combining color, shape, and other features will be analyzed.

References
[1] Islam K T, Raj R G. Real-Time (Vision-Based) Road Sign Recognition Using an Artificial Neural Network[J]. Sensors, 2017, 17(4): 853.
[2] Kuehni R G, Woolfe G. Color Space and Its Divisions: Color Order from Antiquity to the Present[J]. Physics Today, 2004, 29(2): 61-62.
[3] Gu M Q. Research on Traffic Sign Recognition and State Tracking Estimation Algorithms in Complex Environments[D]. Changsha: Central South University, 2013.
[4] Cai Z, Gu M. Traffic Sign Recognition Algorithm Based on Shape Signature and Dual-Tree Complex Wavelet Transform[J]. Journal of Central South University, 2013, 20(2): 433-439.
[5] Zhang Z, He C, Li Y, et al. Traffic Sign Recognition Based on Subspace[J]. Journal of Chongqing University (English Edition), 2016, 15(2): 52-60.
[6] Le G, Yuan X, Zhang J, et al. Traffic Sign Recognition Based on PCANet[C]// Advanced Information Management, Communicates, Electronic and Automation Control Conference. IEEE, 2017: 807-811.
[7] Qi L Y, Zhang C Y, He C D. Traffic Sign Recognition Based on Multi-Feature Combination[J]. Computer Engineering and Science, 2015, 37(4): 776-782.
[8] Liu Z. Traffic Sign Recognition Based on Feature Fusion and Deep Convolutional Neural Networks[D]. Guangzhou: Guangdong University of Technology, 2017.
[9] Yin S, Deng J, Zhang D, et al. Traffic Sign Recognition Based on Deep Convolutional Neural Network[J]. Optoelectronics Letters, 2017, 13(6): 476-480.
[10] Arun P V, Katiyar S K. A CNN Based Hybrid Approach Towards Automatic Image Registration[J]. Geodesy and Cartography, 2013, 62(1): 33-49.
Figure 2. Dynamic ROI detection process: (a) Original image; (b) RGB→HSV; (c) HSV→binarization; (d) Red color extraction; (e) Erode; (f) Dilate; (g) ROI extract; (h) Resize