Yingxiao LI, Ju HUO, Ping MA, Ruiye JIANG
Control and Simulation Center, Harbin Institute of Technology, Harbin 150001, China
KEYWORDS Spacecraft service; High resolution image; Object detection; Complex environment; Lightweight model
Abstract With the rapid growth in the amount of meteoroids and orbital debris in near-Earth space in recent years, the detection environment of spacecraft has become more complex. As a result, most current machine learning-based detection methods struggle with two difficulties: handling the scale variation of targets in the image and accelerating the detection of high-resolution images. To overcome these two challenges, we propose a novel non-cooperative target detection method using the framework of a deep convolutional neural network. Firstly, a dedicated spacecraft simulation dataset of over one thousand images is built to train and test our detection model. The depthwise separable convolution structure is applied and combined with the residual network module to improve the network's backbone. To account for the different shapes of the spacecraft in the dataset, a prior-box generation method based on the K-means clustering algorithm is designed for each detection head with different scales. Finally, a comprehensive loss function is presented that considers category confidence, box parameters, and box confidence. The experimental results verify that the proposed method is robust against varying degrees of luminance change and can suppress the interference caused by Gaussian noise and background complexity. The mean average precision (mAP) of the proposed method reaches 93.28%, and the global loss value is 13.252. The comparative experiments show that, under the same number of epochs and batch size, the detection time of our method is reduced by about 20% compared with YOLOv3, the detection accuracy is increased by about 12%, and the size of the model is reduced by nearly 50%.
In recent years, with the rapid development of space technology, capturing and acquiring spacecraft in On-Orbit Service (OOS) has become an essential research field of aerospace technology [1–3]. The main tasks of OOS fall into two categories. The first is on-orbit maintenance, including spacecraft component upgrade, refueling, and arresting derailment. The second is resource recovery, such as recovering rare materials from abandoned spacecraft or collecting key operational information [4]. The premise of capturing the target spacecraft is accurate localization. Most spacecraft can be located directly through their communication module. However, non-cooperative spacecraft, such as failed satellites and space debris, have no functioning response sensor and are extremely difficult to locate by radio signals [5]. As a direct means of detection, computer vision-based target detection plays an increasingly important role in the acquisition missions of OOS. For the detection of non-cooperative spacecraft, vision-based detectors offer the significant advantages of non-contact operation, low power consumption and low cost. In general, a higher image resolution yields a more accurate detection result. Under ideal experimental conditions, only the target spacecraft appears in the field of view of the space image, without any noise interference.
In this situation, the target can be detected by basic image processing algorithms such as Otsu threshold segmentation [6] and the Hough transform [7]. Many studies address target detection of spacecraft in OOS under such ideal circumstances. For example, Shtark [8] and Bay [9] et al. combine the Speeded-Up Robust Feature (SURF) operator and the Kalman filter to develop a real-time feature detector based on a stereo vision system, which achieves feature detection and matching for a floating robot. Zhang et al. present an improved Hough transform method that considers multiple geometric constraints to detect the bracket of a satellite [10]. Liu et al. propose a multi-peak optimization algorithm to detect the non-cooperative target, which is limited to a strict shooting angle and is suitable for only a few types of spacecraft [11]. Guo et al. employ a perspective-n-point and bundle adjustment algorithm to optimize the point cloud of the target spacecraft, which achieves detection and pose estimation at the same time [12].
However, diverse signal interferences arise in the practical process of visual localization, for instance communication signal noise, uneven intensity, and non-target background. These interferences exist widely in the transmission from camera equipment and in complex imaging environments with a large amount of space junk in near-Earth space [13,14]. They bring severe challenges to target detection in engineering applications. To address this issue, several machine learning-based detection methods have been proposed to counter these interferences, such as the Histogram of Oriented Gradient (HOG) [15] and Deformable Parts Models (DPMs) [16,17]. By taking advantage of template features, HOG can effectively locate target objects in a complex background with little sensitivity to lighting conditions and color. As an improvement on HOG, DPM performs bounding box regression on the deformable object region by using combined feature templates. Chen et al. modify HOG with a simplified descriptor of local features to accurately detect satellite brackets [18]. Nevertheless, the methods mentioned above face high computational complexity, which stems from two aspects. On the one hand, as the camera approaches the target spacecraft, the scale of the target in the field of view changes. This requires different sliding windows to scan local features at different scales, resulting in redundant computation and low efficiency. On the other hand, the amount of computation grows rapidly as the image resolution increases. Thus, this type of method generally cannot satisfy the fast-calculation requirement of online detection.
With the growth of computing power, deep learning has shown its superiority in the field of computer vision and has opened up a new class of methods for target detection. As a hierarchical deep learning structure, the Convolutional Neural Network (CNN) first extracts high-level semantic information from the input image or video data through a feed-forward operation and maps it to the corresponding objective function; then, according to the loss between the ground truth and the predicted data, the parameters of each layer of the network are updated through the backpropagation algorithm [19]. These operations are repeated iteratively until the model converges and achieves the purpose of detection. Fig. 1 shows Alex-Net, a classical early CNN structure applied in image processing. As an emerging approach, this category of methods has been widely applied in the aerospace field. Chen et al. design a spacecraft component detector based on the Region Proposal Network (RPN), which is shown to be faster than the traditional sliding window method but still fails to meet the detection speed requirements [20]. Hu et al. propose a satellite bracket detection method using transfer learning with a small dataset, but it considers only the ideal environment and lacks robustness verification [21].
In the non-cooperative spacecraft localization task of OOS, the target is searched for and located by the imaging system of the working platform, and the platform then approaches the target gradually from a far-range distance under the control of its own thrusters. During this process, three difficulties need to be resolved:
(1) The environment in OOS during the spacecraft recycling mission is relatively complicated, which requires the detection method to be robust against the image background and the interference introduced during data transmission.
(2) The detector needs to be deployed on the serving spacecraft, which means the model should be lightweight, with fewer parameters, to facilitate subsequent compression and optimization.
(3) The target needs to be located continuously, which requires a fast detection speed to support online real-time operation.
However, none of the above-mentioned methods can satisfy these three key requirements of spacecraft localization at the same time. To address this issue, this paper proposes a new spacecraft detection method using the framework of the Deep Convolutional Neural Network (DCNN) to realize online detection of spacecraft in OOS, named the lite model of Spacecraft Convolutional Neural Network (SCNN-lite). To meet the high-speed requirement of online detection, the position and width of the bounding box are obtained iteratively through the backbone and bottleneck of the network instead of using an RPN as in the above methods. Finally, it is shown that our spacecraft detection method reaches state-of-the-art accuracy and detection rate.
Fig. 1 Structure of Alex-Net.
Fig. 2 Diverse shape of OOS spacecraft.
Fig. 2 presents the shapes of common spacecraft at present. The appearance of some spacecraft in service, or of those already recycled, is generally rectangular-like, as shown in Figs. 2(a)–(d), or cylindrical-like, as in Figs. 2(e)–(h). Two representative spacecraft, Sloshsat-FLEVO and Dragon C2, are therefore selected to establish the simulation spacecraft image dataset. Then, Darknet is used as the backbone network of our model, and its modules are improved to reduce computation and speed up detection while preserving detection accuracy. Finally, prior knowledge obtained by unsupervised learning is used to constrain the regression of the bounding box, which further improves the detection efficiency.
The rest of the paper is organized as follows: Section 2 explains the proposed spacecraft detection algorithm; Section 3 presents and discusses the experiments; and Section 4 summarizes the main contributions of this paper and indicates the direction of future work.
In this section, the spacecraft detection method is presented in the context of engineering applications. First, an improved convolution module is employed to build the backbone network structure and enhance detection efficiency. In addition, anchor boxes are introduced into the detection heads of the network at three different scales to achieve multi-class detection and rapid regression. Finally, the loss function of the model is built from the position and dimensions of the detection region, the confidence of the spacecraft type, and the confidence of the bounding box, so that the model converges smoothly.
High-resolution image data is essential for precise localization tasks. Standard convolution obtains local information of the image through convolution kernels of different scales. However, the number of pixels grows quadratically with the linear resolution of the image. Therefore, the amount of convolution computation increases sharply for such images, which consumes excessive training and detection time.
To address this issue, Depthwise Separable Convolution (DSC) is chosen as the basic convolution module in this paper [22,23]; it can greatly compress the model parameters while achieving an effect similar to that of the standard convolution (conv). Specifically, DSC is composed of two parts, Depthwise Conv (DC) and Pointwise Conv (PC), as shown in Fig. 3. First, DC applies a single conv kernel to each channel of the input feature map and then concatenates the outputs of all conv kernels to form its final output. Subsequently, this output feature map is passed to the PC, a 1 × 1 conv located after DC, which not only allows the DSC to freely change the number of output channels but also integrates the information between the channels of the feature map output by DC.
Fig. 3 Calculation process of DSC.
For the standard conv layer, assume that the size of the input feature map is $H_i \times W_i \times M$ and the output size is $H_o \times W_o \times N$, so that $N$ conv kernels of size $k \times k \times M$ are required. Moreover, assume that a convolution operation is applied at each spatial position of the input feature map. Then, the total amount of calculation of the standard conv layer is shown as:
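Under the usual multiply-accumulate counting, and writing the cost of the standard layer as $C_{\mathrm{conv}}$ and that of DSC as $C_{\mathrm{DSC}}$ (both symbols, and the equation numbers, are assumptions chosen to match the references in the text), the comparison can be sketched as:

```latex
\begin{align}
C_{\mathrm{conv}} &= H_o W_o \, k^2 M N \tag{1}\\
C_{\mathrm{DSC}} &= \underbrace{H_o W_o \, k^2 M}_{\text{DC}}
                  + \underbrace{H_o W_o \, M N}_{\text{PC}} \tag{2}\\
\frac{C_{\mathrm{DSC}}}{C_{\mathrm{conv}}} &= \frac{1}{N} + \frac{1}{k^2} \tag{3}
\end{align}
```

The ratio depends only on the number of output channels $N$ and the kernel size $k$, so the saving grows as either of them increases.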
It can be seen from Eq. (3) that as the convolutional network deepens and the size of the conv kernel increases, the parameter advantage of DSC becomes more significant. In particular, for high-resolution images, the semantic information of the feature maps in the deep network is complicated. Hence, DSC can greatly increase the calculation speed of the model.
In terms of structural design, some studies, such as Ref. [23], add a normalization layer and an activation function between DC and PC, which helps to improve the nonlinear expressiveness of the network. However, it has been demonstrated that this structure has no significant effect on public datasets [24]. On the basis of this previous research, and considering the requirement for lightweight models in on-orbit services, the DSC structure in this paper connects DC and PC directly without any other operation, as shown in Fig. 4.
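As a rough numerical illustration of this saving, the following sketch counts multiply-accumulate operations for one layer of each type. The layer dimensions in the example are hypothetical and are not taken from Table 1.

```python
def standard_conv_cost(h_out, w_out, k, m, n):
    """Multiply-accumulate count of a standard k x k convolution layer."""
    return h_out * w_out * k * k * m * n


def dsc_cost(h_out, w_out, k, m, n):
    """Multiply-accumulate count of a depthwise (k x k) plus pointwise (1 x 1) pair."""
    depthwise = h_out * w_out * k * k * m   # one k x k kernel per input channel
    pointwise = h_out * w_out * m * n       # 1 x 1 conv mixing M channels into N
    return depthwise + pointwise


def cost_ratio(k, n):
    """Closed-form ratio DSC / standard = 1/N + 1/k^2."""
    return 1.0 / n + 1.0 / (k * k)


if __name__ == "__main__":
    # Hypothetical layer: 52 x 52 output, 3 x 3 kernels, 128 -> 256 channels.
    std = standard_conv_cost(52, 52, 3, 128, 256)
    dsc = dsc_cost(52, 52, 3, 128, 256)
    print(f"standard: {std}, DSC: {dsc}, ratio: {dsc / std:.4f}")
```

For a 3 × 3 kernel the ratio is bounded below by 1/9, so most of the remaining cost sits in the depthwise stage once the channel count is large.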
2.2.1. Multi-scale detector
At the approaching stage of OOS, the scale of the target spacecraft in the image changes as the service spacecraft gets closer to it. To continuously locate the target spacecraft, it is necessary to detect targets at multiple scales. At the back end of the backbone network, multi-scale detection is achieved by establishing detectors with different receptive fields.
The image data of the lower receptive field is extracted from the last two down-sampling stages in the backbone network, and the information of the higher receptive field is concatenated with it through the up-sampling layer. With this bottleneck structure, image detectors of different scales are constructed. Subsequently, the feature map of the image is divided into grids of 13×13, 26×26 and 52×52 cells, which are the inputs of the three detectors of different scales. Each grid cell is used to detect whether the center of a target region lies inside it. Compared with sliding a window over the feature map, dividing the image into grids lets the network obtain the approximate position of the target center directly from the feature map and supplement other detailed information from the channels of the feature map. Therefore, the multi-scale detector speeds up detection and distinctly reduces memory usage. In this way, a spacecraft target farther away from the lens is detected by the small-scale detector, a target close to the lens is detected by the large-scale detector, and the medium-sized receptive field detector handles targets of intermediate size.
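The grid assignment described above can be sketched as follows. The 416-pixel network input size is an assumption (it is the size for which strides of 32, 16 and 8 give 13, 26 and 52 cells) and is not stated in the text; `responsible_cell` is a hypothetical helper name.

```python
def responsible_cell(cx, cy, image_size, grid_size):
    """Return (column, row) of the grid cell that contains the target center."""
    stride = image_size / grid_size              # pixels covered by one cell
    col = min(int(cx // stride), grid_size - 1)  # clamp centers on the right edge
    row = min(int(cy // stride), grid_size - 1)  # clamp centers on the bottom edge
    return col, row


IMAGE_SIZE = 416  # assumed network input size, not given in the text

for grid in (13, 26, 52):
    print(grid, responsible_cell(200.0, 120.0, IMAGE_SIZE, grid))
```

Only the cell returned here is asked to predict the target at that scale, which is what replaces the exhaustive sliding-window scan.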
2.2.2. Bounding box regression
In the field of computer vision, the bounding box is a universal way to mark detected objects in object detection. The parameters of the bounding box, including its center coordinates, width and height (in pixels), are obtained from the feature map by the detection heads. In essence, computing the bounding box is a regression of the predicted parameters. Starting from the initialization parameters, the model calculates the loss between the initial prediction and the ground truth and then updates the parameters of every layer through back propagation. In this way, all the parameters are updated iteratively, and the next-generation results move toward the minimum loss until the model converges.
Fig. 4 Basic structure of DSC layer in network.
To further accelerate the model iteration toward accurate bounding boxes, prior boxes, that is, a number of rectangular areas with specific widths and heights, are set for each grid cell. The prior boxes are obtained from prior knowledge of the targets in the spacecraft dataset. In general, a spacecraft appears in one of two states. In the first, the aspect ratio of its bounding box is close to 1; this occurs for some old spacecraft (whose solar panels are attached to the main body) and for spacecraft with malfunctioning solar panels. In the second, the spacecraft works under normal conditions, and the aspect ratio of its bounding box is usually 2:1, 3:1 or even higher. Therefore, the prior boxes of the spacecraft are designed according to the prior information in the application scenario.
Prior boxes can be obtained by manual labeling or by statistical clustering. Considering that our data volume is large, we use the K-means clustering algorithm [25] to obtain the prior boxes. In each generation, every candidate box is assigned to the prior box with which it has the minimum error, which establishes a box cluster for every prior box. The box whose width and height are the middle values of the cluster is selected as the next-generation prior box, and the algorithm iterates until the candidate boxes of each box cluster no longer change.
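A minimal sketch of this clustering loop is given below. The error measure is assumed to be 1 − IoU between boxes aligned at a common corner, which is the convention commonly used for anchor clustering; the text does not specify it. The function names and the optional `init` argument are hypothetical.

```python
import random
from statistics import median


def wh_iou(box, prior):
    """IoU of two (width, height) boxes aligned at the same top-left corner."""
    inter = min(box[0], prior[0]) * min(box[1], prior[1])
    union = box[0] * box[1] + prior[0] * prior[1] - inter
    return inter / union


def kmeans_priors(boxes, k, init=None, seed=0, max_iter=100):
    """Cluster (width, height) pairs into k prior boxes.

    Each box joins the prior with the smallest error 1 - IoU; each prior is
    then moved to the per-dimension median of its cluster. The loop stops
    when the cluster assignment no longer changes.
    """
    priors = list(init) if init is not None else random.Random(seed).sample(boxes, k)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda j: 1.0 - wh_iou(b, priors[j])) for b in boxes
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(k):
            members = [b for b, a in zip(boxes, assignment) if a == j]
            if members:  # keep the old prior if a cluster becomes empty
                priors[j] = (median(w for w, _ in members),
                             median(h for _, h in members))
    return sorted(priors, key=lambda p: p[0] * p[1])  # smallest area first
```

Like any K-means variant, the result depends on the initial priors, so in practice several seeds would be tried and the clustering with the highest mean IoU kept.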
The relationship between the prior box and the bounding box generated by the network is shown as:
where $p_w, p_h$ and $b_w, b_h$ denote the width and height of the prior box and the bounding box, respectively; $t_w$ and $t_h$ are the intermediate parameters predicted by the neural network; $(c_x, c_y)$ is the grid position, measured as an offset from the top-left corner of the feature map; and $(t_x, t_y)$ is the offset of the bounding box's center relative to the grid. $\sigma$ is the normalization function that compresses $t_x$ and $t_y$ to $[0, 1]$. The relationship between the prior box and the bounding box, together with their position, width and height offsets, is shown in Fig. 5.
The prior box is used in both the training and the testing process. At the training stage, the prior box that has the largest intersection over union with the ground truth is used for the target prediction. The relation between the bounding box's dimensions, the prior box's dimensions and the width and height output by the model is determined by Eq. (5), so that each bounding box can predict regions of different aspect ratios after repeated training with penalties. In the testing stage, multiple prior boxes are first generated, the categories and offsets of these prior boxes are then predicted with the trained model parameters, and the predicted bounding boxes are finally obtained through Eqs. (4) and (5). In addition, after multiple bounding boxes are predicted, the optimal bounding box is selected by non-maximum suppression.
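One way to read these definitions is the YOLOv3-style decoding $b_x = \sigma(t_x) + c_x$, $b_y = \sigma(t_y) + c_y$, $b_w = p_w e^{t_w}$, $b_h = p_h e^{t_h}$. The sketch below assumes this form, with $\sigma$ taken to be the logistic sigmoid; the division by the grid size, which expresses the center in normalized image units, is an added convenience and not part of the text.

```python
import math


def sigmoid(x):
    """Logistic function, used to keep the center offset inside its grid cell."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell, prior, grid_size):
    """Map raw predictions (tx, ty, tw, th) to a box in normalized image units."""
    tx, ty, tw, th = t
    cx, cy = cell    # grid offsets from the top-left corner of the feature map
    pw, ph = prior   # prior box width and height (normalized)
    bx = (sigmoid(tx) + cx) / grid_size   # center x
    by = (sigmoid(ty) + cy) / grid_size   # center y
    bw = pw * math.exp(tw)                # width scales the prior exponentially
    bh = ph * math.exp(th)                # height scales the prior exponentially
    return bx, by, bw, bh


# Zero predictions place the center in the middle of cell (6, 3) on a 13 x 13
# grid and return the prior box dimensions unchanged.
print(decode_box((0.0, 0.0, 0.0, 0.0), (6, 3), (0.2, 0.1), 13))
```

Because the network predicts only a bounded offset and a scale factor relative to the prior, a well-chosen prior leaves little for the regression to correct.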
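The final selection step can be sketched with a greedy non-maximum suppression. The corner-coordinate box format and the 0.5 overlap threshold are assumptions made for the illustration, not values given in the text.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns kept indices, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first and is dropped
```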
Fig. 5 Prior box and bounding box.
2.2.3. Loss function construction
The loss function of SCNN-lite contains four parts, all obtained from the output of the convolution network: the loss of the center position $(x, y)$ between the prediction result and the ground truth, $L_{xy}$; the loss of their width and height, $L_{wh}$; the confidence loss, $L_{\mathrm{confidence}}$; and the conditional class probability loss, $L_{\mathrm{class}}$. The comprehensive loss function is formulated as follows:
Similarly, $L_{\mathrm{class}}$ is defined by the last term of Eq. (6), where $\hat{P}_i$ represents the ground truth of the probability of belonging to the target class $c$ in grid $i$, and $P_i$ is the corresponding value predicted by the algorithm. In addition, TensorFlow is used as the backend framework of the program in this paper; it defines the cross-entropy function with the natural logarithm (base $e$), so the unit of entropy is the nat.
In terms of model building, considering the application requirements of OOS, Darknet structure is used as the backbone in the proposed model. The Darknet in SCNN-lite is built by combining the residual network block with the DSC structure. Table 1 shows the specific structure, filter shape and output size of the backbone, where Conv represents standard convolution layer, and DSC means DSC structure presented in Fig. 4. To express the backbone clearly, other layers of the residual network (Zero-padding layers and add layers) are omitted in Table 1.
To deal with the constantly changing scale of the spacecraft during the approaching stage, three feature maps with different dimensions are employed as the inputs of the detection heads in SCNN-lite. To make full use of the semantic information of both higher and lower layers, a concatenate layer is used to merge the high-level feature map downward and form an enhanced output feature map. Each detection scale is then equipped with prior boxes suited to the target object, which effectively addresses the multi-scale problem of spacecraft detection. This design completes the detection task quickly without losing too much accuracy. Besides, as a lightweight model, it is friendly to subsequent deployment and optimization on mobile devices. The overall architecture of the detection network is pictured in Fig. 6.
Table 1 Size of backbone.
The experiment section starts by introducing the composition and distribution of the dataset. Afterwards, the detection performance of SCNN-lite on high-resolution images and its accuracy under different interference are discussed. Finally, SCNN-lite is compared with the original YOLOv3 detection method. The experiments show that SCNN-lite performs well in a variety of situations.
Generally, an accurately annotated image dataset is essential for training a robust detection model. To be specific, three key points should be considered to build the satellite detection dataset, which are elaborated in the following.
(1) Most public datasets, such as ILSVRC and MS COCO, use common images from everyday life. Although these existing datasets contain diverse information, the specific details about satellites are highly limited. Therefore, it is necessary to establish an independent dataset for satellites. According to publicly available spacecraft data [26], the main body of most spacecraft is cuboid-like or cylindrical-like, as shown in Fig. 2. For the comprehensiveness of the experiment, two representative scaled satellite models are selected and assembled to simulate the on-orbit target detection experiment, as shown in Fig. 7. Note that the two satellites in Fig. 7 belong to different geometric shape groups mentioned above.
(2) To simulate the interference caused by space debris, the satellite dataset is established under three different complex backgrounds. These backgrounds contain objects whose appearance is similar to the satellite models or to debris, aiming to enhance the accuracy of our detection model.
(3) The space photography environment is another factor that should be considered. To reflect the limitations of space cameras and telecommunication equipment, spacecraft images are collected under various conditions, covering the position and distance of the satellites, scene luminance and signal noise. Besides, part of the data is generated by data augmentation.
In summary, considering the pixel resolution of satellite cameras in recent years [27–29], our simulation satellite dataset contains a total of 1200 images with a resolution of 3456 × 3456, of which 840 form the training set for model training, 240 are used as the validation set to observe the model trend, and 120 serve as the test set to evaluate the final model. All images are annotated with ground truth, and some of them are shown in Fig. 8. For real on-orbit service localization tasks, the dataset can be constructed from a three-dimensional scanned model of the target spacecraft or from spacecraft experimental images collected during ground testing, and the spacecraft detection model can be trained by simulating the real on-orbit environment.
Fig. 6 Overall architecture of network.
Fig. 7 Satellite models in experiment.
All experiments are performed on an Intel Core i7 9700 processor at 3.0 GHz and an NVIDIA 2070s GPU. The operating system is Windows 10, and the programming environment is the PyCharm IDE with Python 3.6, CUDA 9.0, cuDNN 7.3.0 and TensorFlow 1.8.0.
In the construction of SCNN-lite's network structure, the batch normalization layer and the activation function contribute to the propagation of network parameters. The batch normalization layer helps to prevent overfitting and speeds up convergence in the training process, and the activation function adds nonlinearity to the model to enhance its expressive capacity. Spacecraft detection is generally a binary or multi-class classification task with few categories, which means that the imbalance between positive and negative samples is greatly magnified as the image resolution increases. Considering that the negative samples need to be appropriately suppressed, leaky-ReLU is selected as the activation function in SCNN-lite:
where α is introduced to prevent the dead ReLU problem.
Moreover, due to the particularity of the detection targets and the working environment, a general-purpose dataset does little to improve the detection performance of the model. Thus, we train the model from scratch instead of using transfer learning. At the start of training, an L2 regularization term is applied to the convolution kernels to prevent overfitting of the network, which is given by:
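In its usual form, leaky-ReLU passes positive inputs unchanged and scales negative inputs by the small slope α. The sketch below assumes that form; the default α = 0.1 is an illustrative value, since the text does not state the one used.

```python
def leaky_relu(x, alpha=0.1):
    """Leaky ReLU: f(x) = x for x > 0, and alpha * x otherwise."""
    return x if x > 0 else alpha * x


print([leaky_relu(v) for v in (-2.0, 0.0, 3.0)])
```

Because the negative branch keeps a nonzero gradient α, a unit that receives only negative inputs can still be updated during training.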
Fig. 8 Examples of experimental images.
where ω denotes the parameters of the model and λ is the L2-norm coefficient, which is set to $5\times10^{-4}$ in the following experiments.
Subsequently, prior boxes need to be inserted at the bottleneck of the network structure. Based on the ground truth of our dataset, the K-means clustering method is used to generate 9 prior boxes, which are assigned to three detectors with different scales. The distribution map of K-means clustering is illustrated in Fig. 9, where the diamond marker represents the cluster center, and the prior box scales for the three detection heads are listed in Table 2.
For the overall training of SCNN-lite, the initial learning rate is set to $10^{-3}$ and, after each training epoch, is reduced to $10^{-1}$ of its value in the previous epoch. The Adam optimization algorithm [30] is employed to update the network parameters, with a batch size of 9 in each iteration.
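A common form of this penalty is λ times the sum of squared kernel weights, added to the training loss. The sketch below assumes that form; some implementations include an extra factor of 1/2, and the text does not say which variant is used.

```python
def l2_penalty(weights, lam=5e-4):
    """L2 regularization term: lam * sum of squared kernel weights."""
    return lam * sum(w * w for w in weights)


# Three hypothetical kernel weights: 5e-4 * (1 + 4 + 9) = 0.007
print(l2_penalty([1.0, -2.0, 3.0]))
```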
In vision-based target detection, Detection Rate (DR) and mean Average Precision (mAP) are the general indexes used to evaluate model performance. DR is defined as the ratio of correctly detected images to all images in the test set. For the spacecraft detection task, the Intersection Over Union (IOU) is used to calculate the DR of the model, as follows:
where $t$ represents the threshold, $b_i$ is the area of the bounding box, and $g_i$ is the corresponding ground truth area of the target. $A_o$ represents the area of overlap of $b_i$ and $g_i$, and $A_u$ is the area of their union. Specifically, if the IOU of the bounding box and the labeled region in the image is not less than $t$, the spacecraft is considered correctly detected. The relationship between the labeled ground truth and the prediction result is illustrated in Fig. 10. The red solid frame represents the ground truth region, the yellow solid frame is the detection result of SCNN-lite, and the yellow dashed frame represents a false positive result.
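The criterion can be sketched as follows, taking IOU as the overlap area divided by the union area and DR as the fraction of test images whose IOU is at least the threshold. The corner-coordinate box format, the one-box-per-image simplification and the threshold of 0.5 are assumptions made for the illustration.

```python
def iou(b, g):
    """IOU = A_o / A_u for two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(b[2], g[2]) - max(b[0], g[0]))
    ih = max(0.0, min(b[3], g[3]) - max(b[1], g[1]))
    a_o = iw * ih                                                  # overlap area
    a_u = (b[2] - b[0]) * (b[3] - b[1]) + (g[2] - g[0]) * (g[3] - g[1]) - a_o
    return a_o / a_u if a_u > 0 else 0.0


def detection_rate(predictions, ground_truths, t=0.5):
    """Fraction of images whose predicted box reaches IOU >= t with its label."""
    hits = sum(1 for b, g in zip(predictions, ground_truths) if iou(b, g) >= t)
    return hits / len(ground_truths)


preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
labels = [(0, 0, 10, 10), (8, 8, 18, 18)]
print(detection_rate(preds, labels))  # one of the two images counts as detected
```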
Table 2 Prior boxes scale for three detection heads.
Fig. 10 Schematic diagram of image IOU and detection rate.
In Section 3.1, a manually labeled spacecraft dataset is introduced, and the proposed method is trained on its training set. Then, the test set is used to verify the detection ability of our model. In detail, we analyze the proposed SCNN-lite in four aspects: background complexity, communication noise interference, image brightness changes and grayscale image detection.
(1) Cluttered background.
To enhance the robustness of SCNN-lite, several different shooting scenes are chosen as the background. These backgrounds contain areas that have color and texture similar to the spacecraft, which affects how the convolutional layers analyze the low-level pixel information of the image. Further, objects with shapes similar to the spacecraft or to space debris are placed randomly in the backgrounds to create challenging cluttered backgrounds. Fig. 11 shows the spacecraft detection results under static cluttered backgrounds. The numbers above the boxes indicate the prediction confidence. SCNN-lite locates the target spacecraft with high confidence in these cluttered environments.
In addition, Figs. 11(a)–(d) show that SCNN-lite is capable of localizing spacecraft from different views or distances, and Figs. 11(e)–(h) show that SCNN-lite is able to identify different types of spacecraft in the same image. Even if the scale difference between two spacecraft is large (as in Figs. 11(f) and 11(g)) or the distance between them is very small (as in Fig. 11(h)), SCNN-lite can still locate them accurately. This broad applicability of SCNN-lite is, to some extent, owing to the multi-view and multi-target images covered in the dataset.
(2) Noise case.
Fig. 11 Detection results of SCNN-lite under cluttered backgrounds.
During image transmission, the communication module of the working platform is susceptible to signals from other equipment and different channels, resulting in unclear received images. These types of interference appear as Gaussian noise in the image, which makes the target hard to detect, especially when the target size in the image is relatively small. The detection results under different levels of Gaussian noise are shown in Fig. 12, in which the mean (μ) of all the noise is 0 and the variance (σ) ranges from 0.01 to 2. The results indicate that SCNN-lite is capable of detecting and localizing the spacecraft region under varying degrees of noise. This is attributed to the advantage of the convolution and up-sampling operations in semantic understanding, which makes the detection robust to pixel-level interference. It should be noted that, in the extreme situation with significantly strong interference shown in Fig. 12(f), the proposed method reaches its limit, that is, the detector fails to extract the spacecraft region accurately.
(3) Image luminance.
Fig. 12 Partial detection results of SCNN-lite under different noise effects.
In the space environment, image luminance is an important factor in spacecraft detection. When the camera is far away from the target, the image is usually dark. As the camera approaches the target, the image becomes brighter due to the reflection of sunlight or light-emitting parts. To examine the robustness of SCNN-lite, we evaluate it under various brightness ratio conditions, and the results are illustrated in Fig. 13. SCNN-lite shows excellent stability when the brightness ratio of the image increases or decreases, and it identifies spacecraft accurately over a wide range of brightness ratios at the cost of a slight reduction in confidence. However, false detections may occur if the image is extremely dim (as in Fig. 13(e)). In a real experiment, the image data for the relevant conditions should be augmented according to the application needs, and the model needs to be further strengthened.
(4) Grayscale image detection.
Although color cameras have begun to be deployed and applied in current on-orbit services, considering transmission efficiency and computational complexity, grayscale camera is still one of the commonly used devices in spaceborne camera.Therefore, it is necessary to verify the detection effect of our method on grayscale images. We generate grayscale images on the basis of the original dataset and constructed the grayscale image dataset. SCNN-lite is trained under this dataset,and some of the detection results are shown in Fig. 14.
Through the experimental results, it can be seen that SCNN-lite still has a remarkable detection effect on gray figures.Since the amount of information in grayscale image is less than that of RGB image,the confidence of the bounding box is lower than that of the original dataset. However, SCNN-lite can still accurately locate the target region when detecting different types of spacecrafts at different scales. In addition, the detection confidence under high-brightness images is better than that under low-brightness images, and this is caused by the difference in contrast between grayscale pixels. Besides,Fig. 14(h) shows that excessive noise has more serious impact on grayscale images, making it almost impossible to be distinguished by human eyes.
To further demonstrate the advantages of SCNN-lite, a comparative experiment is conducted with YOLOv3, a commonly used object detection method in engineering projects. Both methods are evaluated on spacecraft detection. Specifically, SCNN-lite is compared with YOLOv3 in terms of computing consumption, testing speed and detection rate.
Both methods are trained under the same experimental conditions with the same dataset for 50 epochs, and the results are illustrated in Fig. 15. The blue box represents the detection result of YOLOv3, and the red box represents that of SCNN-lite. Both methods localize the target correctly, but the region estimated by SCNN-lite is much closer to the ground truth than that of YOLOv3. In particular, in Fig. 15(d), SCNN-lite accurately detects the position of the spacecraft, whereas some of YOLOv3's detection results at the same epoch are still under-fitted.
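One common way to quantify how close a predicted box is to the ground truth is the intersection over union (IoU). The sketch below is a generic IoU computation, not a statement of the metric used to produce Fig. 15; the box coordinates are hypothetical values chosen only to illustrate a tight and a loose prediction.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes for illustration only.
ground_truth = (0, 0, 10, 10)
tight_prediction = (1, 1, 10, 10)   # slightly smaller than the truth
loose_prediction = (0, 0, 20, 20)   # much larger than the truth

tight_iou = iou(ground_truth, tight_prediction)
loose_iou = iou(ground_truth, loose_prediction)
```

A box that hugs the target scores close to 1, while an oversized box that still contains the target scores much lower, even though both "localize" it.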
Fig. 13 Partial detection results of SCNN-lite under different brightness conditions.
Fig. 14 Partial detection results of SCNN-lite under grayscale dataset.
Fig. 15 Comparison of detection result of SCNN-lite and YOLOv3.
Owing to the depthwise separable convolution (DSC) introduced in Section 3.1, SCNN-lite has far fewer parameters than YOLOv3, which is built from standard convolutions, and therefore takes less time to train. The DSC structure also affects the running speed of the model: more parameters mean more time is needed to detect the spacecraft, and the difference in running speed widens as the network deepens, as shown in Table 3. Specifically, the size of SCNN-lite is reduced by half compared with YOLOv3, and the mAP is higher by 12.18 percentage points.
Besides, Fig. 16 shows the trend of the loss function of the two methods during training. SCNN-lite consistently achieves a lower loss than YOLOv3. In detail, the advantage of SCNN-lite over YOLOv3 widens after 10 epochs, and the final loss value reaches 13.252, compared with 23.861 for YOLOv3.
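The parameter saving of a depthwise separable convolution can be made concrete with the standard per-layer counts. The sketch below ignores bias terms and uses a hypothetical layer size (3x3 kernel, 256 input channels, 512 output channels) that is not taken from the SCNN-lite architecture.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a standard convolution: one k x k x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise separable convolution (biases ignored)."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer size, for illustration only.
k, c_in, c_out = 3, 256, 512
n_standard = standard_conv_params(k, c_in, c_out)
n_separable = separable_conv_params(k, c_in, c_out)
ratio = n_separable / n_standard  # equals 1/c_out + 1/k**2
```

For this layer the separable form needs 133,376 weights instead of 1,179,648, roughly 11% of the original. The reduction for a whole model is smaller than the per-layer figure when only part of the network uses separable convolutions.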
Table 3 Comparison of detection result of SCNN-lite and YOLOv3.
Fig. 16 Curves of training loss of SCNN-lite and YOLOv3.
This paper has proposed a one-stage spacecraft detection model based on a convolutional neural network, named SCNN-lite, which effectively detects the target spacecraft in the complex space environment at low computational cost. It can also achieve rapid detection when processing high-resolution images.
A hybrid model composed of the residual network and the depthwise separable convolution module is constructed as the backbone network, which substantially reduces the number of model parameters without decreasing detection accuracy. Then, to accelerate the regression of the bounding box, we build the detection head with multi-layer feature maps and insert prior boxes of different sizes into the detection head. Finally, a comprehensive loss function considering the center position, width and height, confidence and category confidence of the bounding box is established to optimize the entire network.
The experimental results show that the proposed method can accurately detect multiple targets at different viewing angles and scales against complex backgrounds. Besides, SCNN-lite handles the detection task under Gaussian noise with a coefficient less than 2, and accurately detects images with brightness in the range of (-80, +80). In the comparison experiment, the mean average precision (mAP) of SCNN-lite reaches 93.28%, compared with 81.10% for YOLOv3. After fifty epochs, the loss value of SCNN-lite is 13.252, compared with 23.861 for YOLOv3. Moreover, the size of SCNN-lite is reduced by about 50% compared with YOLOv3, which suggests that SCNN-lite is better suited for deployment on mobile devices in the future.
In future work, we will carry out further research on data expansion and reinforcement learning for specific scenarios to increase the types of spacecraft that can be detected. In addition, we will also carry out distributed learning optimization to achieve localization applications on mobile devices.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This study was supported by the National Natural Science Foundation of China (No. 61473100).
Conflict of interest statement
The authors declare that there is no conflict of interest regarding the publication of this article.
Chinese Journal of Aeronautics, 2022, Issue 11