YANG Ning, JIN Sheng
(College of Hydraulics Engineering, Dalian University of Technology, Dalian 116000, Liaoning, China)
Abstract: Road extraction is an important and active research topic with applications in many fields. In the past, roads were marked manually, which was laborious and could not guarantee accuracy. With the growth of computing power, morphological algorithms, computer vision, machine learning and other methods have come to be used to extract and mark roads. Common issues and feasible solutions in the road extraction process are discussed. We also conducted a road extraction experiment with U-Net on the Massachusetts Roads Dataset. With few training samples and a short training time, combining two loss functions, binary cross-entropy loss and IoU loss, is a feasible way to complete the task.
Key words: deep learning; computer vision; road extraction; neural network
The extraction of road centerlines and the mapping of urban green space and urban buildings have important applications in autonomous driving, earthquake relief, urban planning, transportation, hydraulic engineering, etc., and are active research topics. In hydraulic engineering, the study of urban floods requires urban road network information as a model parameter, but in most cases this information is very difficult to obtain, and we often face situations in which road network information is missing or incomplete. At first, roads were extracted mainly by manual annotation, which is tedious and time-consuming work. With the development of computer technology, software that can directly provide road network information, such as OpenStreetMap and GIS, has gradually appeared, but most of this software is developed by foreign companies or foreign teams. In recent years, with the growth of computing power and the development of artificial intelligence, machine learning and computer vision, automatic extraction of road information from high-definition satellite images or remote sensing images has become an efficient and practical choice. At present, most artificial intelligence technology in the water conservancy industry is applied to project management and later maintenance, and few applications combine artificial intelligence with numerical simulation. This paper therefore studies a method that uses artificial intelligence to provide the road network information required by numerical simulations. The method can obtain the required information accurately and quickly without relying on any such software.
The methods of road extraction can be divided into semi-automatic road extraction and fully automatic road extraction. Semi-automatic road extraction[1] is very sensitive to seed points, so it requires a lot of man-machine interaction and manually placed seed points, which greatly reduces overall work efficiency. Moreover, against a complex background, the semi-automatic method often takes a long time to compute and cannot obtain satisfactory results. The automatic road extraction method relies on a designed recognition framework and a large number of training samples. However, because road characteristics vary and roads are occluded by trees, buildings and other objects[2], it is difficult to design a recognition framework applicable to all situations. Moreover, automatic road extraction requires a large number of training samples to ensure its recognition accuracy, and in most cases the number of available training samples cannot meet this requirement. Therefore, the results of automatic road recognition across a wide range of recognition tasks are not always satisfactory.
Based on the above, common issues and feasible solutions in the road extraction process are discussed. We also conducted a road extraction experiment with U-Net on the Massachusetts Roads Dataset. The rest of this article is arranged as follows: the second section summarizes previous work. The third and fourth sections discuss common issues and solutions in road extraction tasks. The fifth section presents the experimental process and results. The conclusion and outlook are given in the final section.
According to the degree of manual participation in the extraction process, the methods of road extraction can be divided into semi-automatic road extraction and fully automatic road extraction.
Semi-automatic road extraction methods include Mathematical, Snakes, Classification, etc. The semi-automatic road extraction method requires parameters and seed points to be set manually during extraction. One of the main problems is that it is difficult to choose the best parameters for a given image. For a new image, the operator needs to try many different combinations to achieve the best effect. Moreover, as image resolution improves, the time required to extract road information using semi-automatic methods is also increasing.
The automatic road extraction method needs no human intervention during extraction, which makes road extraction far more convenient, and with the improvement of computing power a large number of artificial neural networks have been applied to the road extraction problem. In 2007, Mokhtarzade M, et al.[3] discussed the possibility of using artificial neural networks to extract roads quickly. Yuan J, et al.[4] proposed the automatic road identification network LEGION in 2009 and verified its effectiveness. Krizhevsky A, et al.[5] proposed AlexNet, a recognition network based on the Deep Convolutional Neural Network (DCNN), which won first place in the 2012 ILSVRC competition. Zhang Z, et al.[6] proposed Deep Residual U-Net, which combines the advantages of residual learning and U-Net; it simplifies training and reduces training parameters by 1/4 compared to U-Net. Similarly, Iglovikov V, et al.[7] proposed TernausNet by combining U-Net with a VGG11 encoder, which greatly improved the recognition ability of U-Net. Cheng G, et al.[2] proposed the Cascaded Convolutional Network (CasNet), based on the Convolutional Neural Network (CNN), which can extract the center line of the road and the building information at the same time.
There are many issues in the process of road recognition, which are divided into three categories: image acquisition, road characteristics and limitations of recognition algorithm.
For the road recognition task, we need High Definition (HD) satellite images in which the roads are clear and their features are easy to extract. However, because of problems in shooting and communication, noise is inevitable in satellite images, so it is difficult to obtain satellite images with such distinctive road features.
The noise in satellite images used for road recognition is mainly electronic noise generated during image shooting, storage and transmission. Electronic noise has many causes. The first is camera noise. Cameras are equipped with photosensitive components such as Charge Coupled Devices (CCD) and Complementary Metal Oxide Semiconductor (CMOS) sensors. Because the thermal stability of these components is limited, noise spots form in the image when the camera temperature is too high and the noise signal is too strong. The size of the sensor is another main source of noise. The photodiode conversion efficiency of a pixel is proportional to the area of the sensor and decreases sharply as that area decreases. As a result, signals obtained on the CCD or CMOS need to be amplified before they can be used properly, and the noise signal is amplified along with them. The second is the noise generated during image transmission. The transmission of image information involves two key steps: image compression and file transmission. Because HD satellite images are large, transmitting the original image directly takes a lot of time, so the image is compressed before transmission. Different compression formats adopt different compression algorithms. For example, JPEG compression can produce a very natural-looking image, but the noise generated by compression becomes more obvious as the compression rate increases. Besides, during file transmission and storage, information can be lost because of problems in the electronic equipment, resulting in noise in the image.
The issues of road characteristics can be roughly divided into three categories: road morphology issue, occlusion issue and feature similarity issue.
Roads are essentially band-shaped. The road morphology issue is that the road width is very small compared with the length, and that satellite images are taken from earth orbit. Because of the high shooting altitude and the limited resolution of satellite cameras, the road targets in satellite images are too small and their features are very limited. This creates great difficulty for target recognition algorithms based on target features. The occlusion issue[2, 8] refers to the interaction between the road and the surrounding environment. Since roads are rarely empty and do not exist in isolation, the roads in satellite images are mostly blocked by tall buildings, trees and vehicles, which forms a noise-like interference. Many road recognition algorithms identify roads by their spectral characteristics; the shadows of tall buildings change the spectral characteristics of the road, which can cause such algorithms to fail.
As mentioned above, many recognition algorithms identify roads according to their spectral characteristics. However, the spectral characteristics of roads in satellite images are not unique, and many other kinds of terrain have the same spectral characteristics as roads, so the road networks identified by many recognition algorithms are not accurate or reliable. This is the feature similarity issue.
The limitations of the algorithm refer to issues caused by shortcomings of the road extraction algorithm itself. For example, the morphological thinning algorithm[8] is used to extract the center line of the road. Although this method is simple and fast, it usually fails to obtain smooth and consistent results. The regression-based method[9] and the Non-Maximum Suppression (NMS) method[10] can solve the problem of smoothness and consistency in the road centerline extraction task, but they cannot obtain good results near road intersections.
The Convolutional Neural Network (CNN) has achieved great success in image detection, scene understanding and target detection, and it can also be applied to the road extraction problem. However, the traditional CNN suffers from excessive memory consumption and low computational efficiency when performing the road extraction task. Moreover, because of the input structure of the CNN, it can only accept images of a fixed size, which limits the size of the receptive field. Therefore, in most tasks it can only distinguish local features, which results in limited performance.
In view of the above problems, predecessors have done a lot of work. To address the noise caused by camera hardware, Ref. [11] provided hardware solutions from the aspects of circuit, temperature, structure and size, which greatly alleviated this problem. Hardware issues are not the focus of this paper and are not discussed in detail.
The most common issue we encounter is image noise, which reduces the accuracy of road recognition or even makes recognition fail. A very common remedy, called noise reduction, is to process the image into a low-noise or noiseless image before recognition. Common noise reduction algorithms include the median filter, mean filter, Gaussian filter, bilateral filter, etc.[12]. At present, most methods cannot extract roads from noisy images. Therefore, Sujatha C, et al.[13] proposed a road extraction method based on connected component extraction and morphological operators. The method includes adaptive global thresholding, connected component analysis, morphological closing, dilation and thinning. It can complete the extraction task on noisy images, which greatly lowers the quality required of the input image, and it can also remove some non-road elements through the thinning step, which reduces the probability of false recognition.
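As an illustration of the simplest of these filters, the following is a minimal sketch of a median filter in plain Python. The function name `median_filter`, the window radius and the toy image are assumptions made for this example; they are not taken from [12] or [13].

```python
from statistics import median


def median_filter(image, radius=1):
    """Return a median-filtered copy of a 2-D grayscale image (list of rows).

    Hypothetical helper for illustration; the window is clipped at the border.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Collect the neighbourhood of (r, c), staying inside the image.
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            out[r][c] = median(window)
    return out


# Toy image: a uniform background with one isolated bright ("salt") pixel.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
]
clean = median_filter(noisy)
```

Because an isolated outlier never reaches the middle of the sorted window, the bright pixel is replaced by the background value, whereas a mean filter would only smear it over its neighbours.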
The characteristics of the road itself have always been a central issue in road identification tasks. To solve the noise and occlusion problems caused by cars and trees in high-resolution images, Cheng G, et al.[2] proposed a cascaded end-to-end Convolutional Neural Network. The method combines two convolutional neural networks into one framework and divides the road extraction task into two steps. The first step extracts the road area through the first convolutional neural network. Thanks to its encoder and decoder layers, this network can obtain more consistent detection results under complex backgrounds and occlusion. On the basis of the first network, the second network is combined with the thinning algorithm to obtain a smooth, complete road centerline network of single-pixel width. Most previous road extraction methods are based only on the spectral characteristics of the road. To address the problem that different kinds of terrain have similar spectral characteristics, Zhang Q, et al.[14] proposed a new comprehensive road recognition method for multi-spectral images, which can be roughly divided into three steps: first, the k-means algorithm is used to segment the image; then the road is identified according to the spectral characteristics of the road surface material; finally, the misclassification of non-road terrain is reduced by identifying the road according to the angular texture feature.
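The k-means segmentation step can be sketched as follows. This is a simplified illustration under stated assumptions: it clusters single-band pixel intensities rather than multi-spectral vectors, and the function name `kmeans_1d`, the initial centers and the sample values are hypothetical rather than taken from [14].

```python
def kmeans_1d(values, centers, iterations=20):
    """Cluster scalar pixel values around the given initial centers.

    Hypothetical single-band simplification of k-means segmentation.
    Returns the final centers and one cluster label per input value.
    """
    centers = list(centers)

    def nearest(value):
        # Index of the center closest to this value.
        return min(range(len(centers)), key=lambda i: abs(value - centers[i]))

    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            clusters[nearest(value)].append(value)
        # Move each center to the mean of its members; keep empty ones in place.
        new_centers = [
            sum(members) / len(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers

    labels = [nearest(value) for value in values]
    return centers, labels


# Toy pixel intensities: three dark pixels and three bright pixels.
pixels = [12, 15, 11, 200, 210, 205]
centers, labels = kmeans_1d(pixels, centers=[0, 255])
```

In the method described above, the resulting segments would then be passed to the spectral and texture tests; this sketch covers only the clustering step.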
Extracting road information through neural networks has been popular recently, and the most commonly used neural network for road extraction is the CNN. However, because of the structure of the CNN, each input image must have a fixed size, so the extracted features are mostly local features. For this reason, Long J, et al.[15] improved the CNN by changing its last fully connected layer into a convolution layer, giving the fully convolutional network (FCN). A CNN uses a fully connected layer to obtain a fixed-length feature vector and then classifies it through a classifier. In contrast, the last step of the FCN structure uses a deconvolution layer to up-sample the final feature map, which generates a prediction for each pixel. Thus, in addition to local features, the FCN can also extract global features, and the classification is finally performed on the up-sampled feature map.
Currently, most neural networks require a large number of training samples before they can perform an identification or classification task. However, in many cases training samples are insufficient, or, even if there are enough samples, their quality is too low to reach the accuracy required for training. To solve the issue of insufficient samples, Ronneberger O, et al.[16] designed a new neural network and training strategy: U-Net. The network is modified from the fully convolutional network, and a large number of feature channels are set in the upsampling part, which allows the network to propagate context information to higher-resolution layers. In other words, the detailed information of the image is acquired in the high layers of the network, the low-frequency information of the image is acquired in the low layers, and the information between layers is retained through skip connections, making it easier for the network to “remember”.
No algorithm can extract the road network with 100% accuracy, and missing links in the identified road network are not unusual. To repair such missing links, Mattyus G, et al.[17] designed a structure called DeepRoadMapper. The structure is divided into two steps. In the first step, a traditional CNN structure is used to obtain the road network, and in the second step the missing parts of the road network are completed by the A* algorithm proposed by Hart P E, et al.[18].
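To make the role of A* concrete, the sketch below runs a generic A* search on a small occupancy grid. It does not reproduce the DeepRoadMapper pipeline; the grid encoding (0 for free cells, 1 for blocked cells), the unit step cost, the Manhattan heuristic and the function name `astar` are all assumptions chosen for illustration.

```python
import heapq


def astar(grid, start, goal):
    """Shortest 4-connected path on a grid; cells equal to 1 are blocked.

    Hypothetical illustration of A* [18]. Returns a list of (row, col)
    cells from start to goal, or None if no path exists.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates the cost with unit steps.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry, a cheaper route was found already
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(
                        open_heap, (new_cost + heuristic(nxt), new_cost, nxt)
                    )
    return None


# Two road fragments at (0, 0) and (2, 0), separated by a blocked row.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
```

In a road completion setting the step cost would normally reflect how road-like each cell is, so that the recovered link follows plausible road pixels rather than the geometrically shortest route.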
We conducted a road extraction experiment with U-Net on the Massachusetts Roads Dataset.
The dataset used in this project is the Massachusetts Roads Dataset, which can be accessed at https://www.kaggle.com/insaff/massachusetts-roads-dataset. The dataset consists of 1 171 aerial images of Massachusetts, each 1 500×1 500 pixels in size. Each image covers an area of about 2.25 km², and the entire dataset covers an area of more than 2 600 km². The dataset covers a wide range of urban and rural areas, contains a large amount of data, and its samples are highly representative. The 1 171 images in the dataset are randomly divided into a training set of 1 108 images, a test set of 49 images and a validation set of 14 images.
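The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch assumes that each image covers a square ground area, which the text implies but does not state; the variable names are illustrative.

```python
import math

num_images = 1171
side_px = 1500                 # each image is 1 500 x 1 500 pixels
area_per_image_km2 = 2.25      # stated ground coverage per image

# Assumption: the covered area is square, so the side is sqrt(area).
side_km = math.sqrt(area_per_image_km2)
metres_per_pixel = side_km * 1000 / side_px

total_area_km2 = num_images * area_per_image_km2

split = {"train": 1108, "test": 49, "validation": 14}
split_total = sum(split.values())

print(f"ground resolution: {metres_per_pixel} m per pixel")
print(f"total coverage: {total_area_km2} km^2")
print(f"images in split: {split_total}")
```

Under that assumption each pixel corresponds to about 1 m on the ground, the total coverage is consistent with "more than 2 600 km²", and the three subsets add up to the 1 171 images of the dataset.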
In this experiment, 804 images from the training set are used to train the network, and all 49 images of the test set are used for testing.
The neural network adopted in this work is U-Net. The network structure was proposed in 2015[16], originally for medical cell segmentation, as shown in Fig.1.
Fig.1 Examples in U-Net
U-Net is named after its network structure, as shown in Fig.2. The main structure consists of convolutional layer, maximum pooling layer, deconvolution layer and nonlinear activation function (ReLU).
Fig.2 Structure of U-Net
The left side of the network structure is called the contracting path. It follows the conventional convolutional network architecture: two 3×3 convolutions, each followed by a ReLU, and then a max pooling layer for downsampling. Each time the image data passes through a max pooling layer, the number of feature channels doubles. Each step of the subsequent expanding path consists of an up-sampling of the feature map followed by a 2×2 up-convolution. In this step, the number of feature channels is halved and the correspondingly cropped feature map from the contracting path is concatenated. This is followed by two 3×3 convolutions, each followed by a ReLU. In the last layer, a 1×1 convolution maps each 64-component feature vector to the required number of classes. The network has a total of 23 convolutional layers.
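The channel counts and the total of 23 convolutional layers can be reproduced by simple bookkeeping. The sketch below assumes four downsampling steps and 64 channels at the first level, as in the original U-Net description[16]; the function name `unet_layout` is hypothetical, and no actual network is built.

```python
def unet_layout(base_channels=64, depth=4):
    """Count channels and convolutional layers of a U-Net-style layout.

    Hypothetical bookkeeping helper; `depth` is the number of
    downsampling steps (assumed to be 4, as in the original U-Net).
    """
    channels = base_channels
    conv_layers = 0

    contracting = []
    for _ in range(depth):
        conv_layers += 2           # two 3x3 convolutions per level
        contracting.append(channels)
        channels *= 2              # channels double after each downsampling

    conv_layers += 2               # two 3x3 convolutions at the bottom level
    bottleneck = channels

    expanding = []
    for _ in range(depth):
        channels //= 2             # the 2x2 up-convolution halves the channels
        conv_layers += 1 + 2       # one up-convolution plus two 3x3 convolutions
        expanding.append(channels)

    conv_layers += 1               # final 1x1 convolution to class scores
    return contracting, bottleneck, expanding, conv_layers


contracting, bottleneck, expanding, conv_layers = unet_layout()
print(contracting, bottleneck, expanding, conv_layers)
```

With these assumptions the count is 8 convolutions on the contracting path, 2 at the bottom, 12 on the expanding path and 1 final 1×1 convolution, which matches the total of 23 stated above.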
The other major feature of U-Net is data augmentation, which teaches the network which properties are invariant and which variations need to be learned, thus greatly improving the efficiency of data use. Random displacement vectors on a coarse 3×3 grid are used to generate smooth deformations[16]. The displacement vectors are sampled from a Gaussian distribution with a standard deviation of 10 pixels. The displacement of each pixel is then calculated using bicubic interpolation. The dropout layers at the end of the contracting path perform further implicit data augmentation.
The problem studied here is road extraction, which is essentially an edge extraction problem. U-Net performs well in various biomedical segmentation problems, most of which are edge extraction problems. Therefore, we tried using U-Net for road extraction.
According to the amount of available data, in this experiment we set batch_size = 2 and steps per epoch = 400. The loss function we use is the combination of Binary Cross Entropy and IoU[19]. We use Keras's built-in “accuracy” metric to evaluate the performance of the model. During training, the prediction accuracy learned by the model can be adjusted by changing the number of iterations and the learning rate of each iteration cycle.
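A minimal sketch of such a combined loss is given below in plain Python. The text does not state how the two terms are weighted or which IoU formulation is used, so the unweighted sum, the soft (differentiable) IoU and the names `bce_loss`, `soft_iou_loss`, `combined_loss` and `iou_weight` are assumptions for illustration only, not the implementation used in the experiment.

```python
import math


def bce_loss(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over flattened per-pixel labels."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def soft_iou_loss(y_true, y_pred, eps=1e-7):
    """1 - soft IoU, using products and sums instead of hard set operations."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    union = sum(t + p - t * p for t, p in zip(y_true, y_pred))
    return 1 - (intersection + eps) / (union + eps)


def combined_loss(y_true, y_pred, iou_weight=1.0):
    """Assumed combination: BCE plus a weighted IoU loss."""
    return bce_loss(y_true, y_pred) + iou_weight * soft_iou_loss(y_true, y_pred)


# Flattened toy mask (1 = road) and two candidate predictions.
mask = [1, 0, 1, 1]
good_prediction = [0.9, 0.1, 0.8, 0.7]
poor_prediction = [0.4, 0.6, 0.3, 0.5]
good_loss = combined_loss(mask, good_prediction)
poor_loss = combined_loss(mask, poor_prediction)
```

The cross-entropy term scores every pixel independently, while the IoU term scores the overlap of the predicted road region as a whole, which matters when road pixels are a small fraction of the image.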
After multiple epochs, the networks tend to stabilize. The loss of the U-Net neural network dropped from 1.599 8 to 0.340 99 after 270 iteration cycles, and the final accuracy was 97.63%; the loss of the FCN neural network dropped from 1.220 15 to 0.855 32 after 200 iteration cycles, and the final accuracy was 93.84%; the loss of the FCN+VGG16 neural network dropped from 1.405 39 to 0.979 67 after 130 iteration cycles, and the final accuracy was 93.07%; the loss of the SegNet+VGG16 neural network dropped from 1.154 73 to 0.666 42 after 220 iteration cycles, and the final accuracy was 95.29%. The variation curve of the loss function is shown in Fig.3. The trained networks were then used to predict the images, and the results are shown in Fig.4.
Fig.3 Loss rate
We compare the road layout identified by each neural network with the labeled road layout to obtain the prediction accuracy. Fig.4 shows that, with a small amount of data and a short training time, all four networks can recognize the road layout. However, compared with the other three network structures, the road layout recognized by the FCN+VGG16 network is very vague and has low accuracy. U-Net, FCN and SegNet+VGG16 can accurately identify the road layout (accuracy above 93%), but SegNet+VGG16 is inferior to U-Net and FCN in extracting details. The extraction results of U-Net and FCN show that, although both networks can extract the details of the road layout, the road layout produced by FCN has more discontinuities, blurred details and jagged textures in the road structure.
Fig.4 Prediction results
Common problems of image acquisition in road extraction, the influence of road characteristics on the extraction problem, and the advantages, disadvantages and applicable scenarios of various extraction algorithms are discussed, followed by solutions to these problems. In addition, we conducted a road extraction experiment with U-Net on the Massachusetts Roads Dataset. The combination of two loss functions, binary cross-entropy loss and IoU loss, proved to be a feasible way to cope with our limited training data. Our results show that, compared to the other network structures, U-Net has a powerful road extraction capability and can accurately extract road details, while its execution time is acceptable.
However, the result is not perfect: there are discontinuities in the identified road layout. In future work, the results can be further processed with image inpainting methods, or the neural network can be improved to raise the recognition accuracy.
The road extraction problem has broad application prospects and can intersect with most disciplines. In the future, road extraction technology can be combined with fluid mechanics and hydraulics to build a flood prediction model that predicts the urban flood process and gives targeted suggestions based on the prediction results, in order to reduce flood losses in the city.
Journal of Engineering of Heilongjiang University, 2020, Issue 4