
        Loop Closure Detection With Reweighting NetVLAD and Local Motion and Structure Consensus

        2022-06-25 01:18:12  Kaining Zhang, Jiayi Ma and Junjun Jiang
        IEEE/CAA Journal of Automatica Sinica, June 2022

        Kaining Zhang, Jiayi Ma, and Junjun Jiang

        Dear Editor,

        Loop closure detection (LCD) is an important module in simultaneous localization and mapping (SLAM). In this letter, we address the LCD task from the semantic aspect to the geometric one. To this end, a network termed AttentionNetVLAD, which can simultaneously extract global and local features, is proposed. It leverages attentive selection for local features, coupled with reweighting the soft assignment in NetVLAD via the attention map for global features. Given a query image, candidate frames are first identified coarsely by retrieving similar global features in the database via hierarchical navigable small world (HNSW). As global features mainly summarize the semantic information of images and lead to a compact representation, information about the spatial arrangement of visual elements is lost. To refine the results, we further propose a feature matching method termed local motion and structure consensus (LMSC) to conduct geometric verification between candidate pairs. It constructs local neighborhood structures of local features through motion consistency and manifold representation, and formulates the matching problem as an optimization model whose closed-form solution enables linearithmic time complexity. Experiments on several public datasets demonstrate that LMSC performs well in feature matching, and that the proposed LCD system yields satisfying results.

        Related work: LCD aims to reduce the cumulative error of pose estimation in the SLAM system by identifying reobservations during navigation [1]. It is achieved by 1) first searching for a connection between the current and the historic observations, and 2) then regarding the recovered SE(3)/Sim(3) pose as a constraint to optimize the pose graph. In this letter, we focus on the former step in the visual SLAM system. Namely, we mainly solve how to find reliable image pairs to constitute loop-closing pairs.

        The first step of LCD is commonly studied as an image retrieval task. Differently, however, the reference database in LCD is incremental, while that in image retrieval is generally fixed for a short period of time. In this step, it is important to determine how to generate a global descriptor for image representation. Notable early global image descriptors are dominated by keypoint detection coupled with aggregation of associated local descriptors, such as BoVW [2], VLAD [3] or ASMK [4]. These approaches rely on a visual dictionary, which can be trained off-line [5], [6], or on-line [7], [8]. Compared with the off-line manner, on-line ones are more scene-agnostic and have become increasingly popular in recent years. In these aggregation-based approaches, the inverted index [6] or voting [8], [9] technique is commonly exploited to accelerate the searching process. Recently, deep approaches have gained increasing popularity in the context of image representation. These CNN models are trained with ranking triplet [10], [11] or classification [12] losses, which can acquire deep semantic information of images and perform well even under large viewpoint changes. It has been demonstrated that LCD approaches built on deep global representations can yield good performance [13]-[16]. In these approaches, HNSW [17] is usually selected as a technique for search acceleration.
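To make the coarse retrieval step concrete, the sketch below scores a query descriptor against a database of global descriptors by cosine similarity. This is a brute-force stand-in for illustration only: in the pipeline described above, an approximate HNSW index (e.g., the hnswlib library) replaces the linear scan so that insertion and query stay cheap as the incremental database grows. The 4096-dimensional descriptors and the frame index 7 are synthetic examples, not data from the letter.

```python
import numpy as np

def knn_cosine(query, database, k=5):
    """Brute-force cosine k-NN over global descriptors.
    In practice this linear scan is replaced by an approximate
    HNSW index, which supports incremental insertion."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity to every frame
    order = np.argsort(-sims)[:k]      # top-k most similar frames
    return order, sims[order]

rng = np.random.default_rng(0)
database = rng.standard_normal((100, 4096)).astype(np.float32)
# a query that is a slightly perturbed copy of frame 7
query = database[7] + 0.01 * rng.standard_normal(4096)
labels, sims = knn_cosine(query, database, k=5)
```

The returned `labels` are then treated as loop-closure candidates and passed to geometric verification.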

        After candidate pairs are identified in the first step, a geometric verification process follows to guarantee precision. It is roughly achieved by first building the putative set based on the similarity between local descriptors and then rejecting false matches (i.e., outliers). The preserved true matches (i.e., inliers) are then exploited to recover the fundamental or essential matrix. Little LCD literature has discussed this step, and most approaches use RANSAC [18] to achieve it. It assumes the transformation between image pairs is rigid, and the parametric model can be acquired by alternating between sampling and verification. However, RANSAC is vulnerable to dominant outliers and non-rigid deformation. Numerous non-parametric approaches have been studied to address this issue, ranging from graph matching [19] and generalized geometric constraints [20] to locality consistency assumptions [21], [22].
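The sample-and-verify loop that RANSAC alternates between can be sketched as follows. For brevity this toy version fits a 2D affine model from minimal 3-point samples rather than the fundamental/essential matrix used in real pipelines; the synthetic rotation, translation and 40% outlier rate are illustrative assumptions, not values from the letter.

```python
import numpy as np

def ransac_affine(src, dst, iters=500, thresh=1.0, seed=0):
    """Toy RANSAC: hypothesize a 2D affine model from minimal samples,
    verify it by counting matches with small residuals, keep the best."""
    rng = np.random.default_rng(seed)
    n = len(src)
    src_h = np.hstack([src, np.ones((n, 1))])        # homogeneous coords
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)   # minimal sample
        A, *_ = np.linalg.lstsq(src_h[idx], dst[idx], rcond=None)
        residuals = np.linalg.norm(src_h @ A - dst, axis=1)
        inliers = residuals < thresh                 # verification
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    A, *_ = np.linalg.lstsq(src_h[best_inliers], dst[best_inliers], rcond=None)
    return A, best_inliers

# synthetic putative set: a rotation+translation plus 40% gross outliers
rng = np.random.default_rng(1)
src = rng.uniform(0, 100, (50, 2))
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
dst = src @ R.T + np.array([5.0, -3.0])
dst[:20] = rng.uniform(0, 100, (20, 2))              # corrupt first 20 matches
A, inliers = ransac_affine(src, dst)
```

When the inlier ratio drops, the chance of drawing an all-inlier minimal sample shrinks geometrically, which is exactly the failure mode noted above.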

        The weights of the common parts between our AttentionNetVLAD and DELG are initialized by the official pre-trained model (R50-DELG), and the whole AttentionNetVLAD is trained via knowledge distillation [14]. Specifically, the teacher network is VGG16+NetVLAD, as released in [10]. We train the student by minimizing the mean square error loss between its predictions and the target global descriptors exported by the teacher on GLDv2 [23]. The number of cluster centers J is set to 64, and an FC layer is additionally introduced to make the dimension of the output of the student network equal to that of the teacher network (4096). Training details involving the input image size, batch size, learning rate, etc. all follow [12]. In this way, the new model can inherit the property of the teacher, i.e., being able to capture time-invariant visual clues at a high structural level, while the training time is reduced drastically.
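The distillation objective above reduces to regressing the student's 4096-d output onto the teacher's descriptors. The numpy sketch below trains only the added FC layer with plain gradient descent to show the mechanics; the 512-d backbone features, batch size, step count and learning rate are illustrative assumptions, not the TensorFlow setup used in the letter.

```python
import numpy as np

def distillation_mse(student_out, teacher_out):
    """Mean squared error between student and teacher global descriptors."""
    return np.mean((student_out - teacher_out) ** 2)

rng = np.random.default_rng(0)
teacher = rng.standard_normal((8, 4096))     # target descriptors from the teacher
features = rng.standard_normal((8, 512))     # student backbone output before the extra FC
W = 0.01 * rng.standard_normal((512, 4096))  # the additional FC layer (512 -> 4096)

init_loss = distillation_mse(features @ W, teacher)
for _ in range(200):                         # gradient descent on the FC weights
    pred = features @ W
    # dL/dW for the mean-squared-error loss above
    W -= 5.0 * 2.0 * features.T @ (pred - teacher) / pred.size
final_loss = distillation_mse(features @ W, teacher)
```

The loss drops essentially to zero here because the toy problem is linear; the real student is a full CNN, but the target signal it regresses onto is the same.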

        Fig. 1. AttentionNetVLAD and its training process.

        Fig. 2. Illustration of local inter-distance and intra-distance in LMSC. The example is shown in an ideal situation, i.e., no outliers exist in the local neighborhood (K = 4). In the source image, x_inlier, x_outlier, N_{x_inlier} and N_{x_outlier} are presented. In the reference image, y_inlier, y_outlier, C_{x_inlier} and C_{x_outlier} are presented.

        Table 1. Dataset Information. “# Images” Means the Number of Images

        Fig. 3. The choice of the optimal K.

        Fig. 4. Quantitative results of LMSC. From left to right: Precision, Recall, F-score and Runtime (ms) with respect to the cumulative distribution. A coordinate (x, y) on the curves means that 100x percent of image pairs have precisions, recalls or runtimes no greater than y.

        Experimental setup: We implement AttentionNetVLAD with TensorFlow, and run on an Intel(R) Core(TM) i9-9920X CPU @ 3.50 GHz machine with three TITAN RTX GPUs. The information of the six sequences selected for evaluation is presented in Table 1. The frame rates of St1410 and St1545 are downsampled to 3 Hz; meanwhile, the right measurements of NC with a frame rate of 1 Hz are adopted. Ground truth (GT), which is provided in the form of binary matrices and preserves the image-wise correspondence of the datasets, is provided by An et al. [13].

        Results on feature matching: We select 33 loop-closing pairs according to GT from the datasets shown in Table 1 to evaluate LMSC. The putative set of each image pair is established based on SIFT [27], followed by GT generation through a manual check of each putative match, resulting in an average of 345 putative matches per pair and an inlier ratio of 51.28%. As Fig. 3 shows, the runtime of LMSC scales up with the increase of K due to the linearithmic time complexity, and F-score = (2·Precision·Recall)/(Precision + Recall) in the left plot is applied for comprehensive evaluation in feature matching. On the whole, K = 12 outperforms other cases with a relatively low time cost, thus we choose it for subsequent experiments.
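The precision, recall and F-score used in this evaluation can be computed from the predicted and ground-truth inlier labels over one putative set, as in this minimal sketch (the five-match example is made up for illustration):

```python
def match_scores(predicted_inliers, gt_inliers):
    """Precision, recall and F-score for a putative match set.
    Both inputs are boolean sequences over the same putative matches."""
    tp = sum(p and g for p, g in zip(predicted_inliers, gt_inliers))
    fp = sum(p and not g for p, g in zip(predicted_inliers, gt_inliers))
    fn = sum(g and not p for p, g in zip(predicted_inliers, gt_inliers))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# 5 putative matches: method keeps the first 3, ground truth marks 1, 2 and 4
p, r, f = match_scores([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

With two true positives, one false positive and one false negative, all three scores come out to 2/3, which is the harmonic-mean behavior the F-score is chosen for.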

        We report the quantitative performance of LMSC in Fig. 4, involving six feature matching methods (i.e., RANSAC [18], GS [19], GMS [22], ICF [28], RFM-SCAN [29] and LPM [21]) for comparison. It can be seen that, compared with Precision, LMSC is better at Recall, and it outperforms the other methods in F-score with relatively high efficiency.

        Results on loop closure detection: We adopt the maximum recall rate at 100% precision (MR) to evaluate the performance of LCD. Firstly, we perform it based on DELG, NetVLAD (teacher) and AttentionNetVLAD (student) respectively, along with LMSC for the geometric check. Results are shown in Table 2. DELG is tailored for the instance recognition task, thus its performance degenerates when similar structures occur in LCD scenes. NetVLAD is of low efficiency due to the requirement of additional local feature extraction. That its coupling with AL outperforms its coupling with SURF may be caused by the fact that SURF cannot ignore redundant information in scenes [26], which reduces the accuracy of feature matching. Through training with knowledge distillation, our AttentionNetVLAD can simultaneously learn the prior knowledge of perceptual changes in NetVLAD and be efficient enough for real-time requirements.
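One way to compute the MR metric is to sort detections by similarity score and count how many true loop closures are found before the first false positive, as sketched below (the score/label toy data is invented for illustration; the letter does not specify its exact implementation):

```python
import numpy as np

def max_recall_at_full_precision(scores, labels):
    """Maximum recall at 100% precision: sweep the detection
    threshold from strict to loose and keep the highest recall
    achieved while precision is still exactly 1.0."""
    order = np.argsort(-np.asarray(scores))          # descending score
    sorted_labels = np.asarray(labels, dtype=bool)[order]
    total = sorted_labels.sum()                      # ground-truth loops
    best, tp = 0.0, 0
    for is_loop in sorted_labels:
        if not is_loop:
            break                # precision drops below 1 from here on
        tp += 1
        best = tp / total
    return best

scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1,   1,   0,   1,   0]
mr = max_recall_at_full_precision(scores, labels)
```

Here the two highest-scoring detections are both true loops, but the third is not, so only 2 of the 3 ground-truth loops can be recalled without sacrificing precision: MR = 2/3.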

        Table 2. LCD Results on Different Feature Extraction Approaches. “AL” Means the Local Part in AttentionNetVLAD

        Secondly, on the basis of AttentionNetVLAD, different feature matching methods involving RANSAC, LPM and LMSC are embedded into our LCD pipeline, and the results are shown in Table 3. Records of runtime in the table exclude the process of putative set construction. Obviously, the pipeline with LMSC yields better performance on MR than RANSAC and LPM. This is because RANSAC performs poorly when resampling in putative sets with low inlier ratios, while LPM cannot separate inliers from relatively low-precision noisy matches due to its weak local geometric constraint. When the process of geometric verification is considered, the average runtime of our system on the dataset with the maximum image resolution (K00) is about 116.14 ms/frame.

        Table 3. LCD Results on Different Feature Matching Approaches

        Finally, we report the comparative results of our method against state-of-the-art LCD approaches in Table 4. The results of ESA-VLAD [14] and Zhang et al. are cited from [26], while the others are cited from [13]. Overall, our pipeline has satisfying performance on all datasets. Although our results are marginally worse than those of Zhang et al. [26], our pipeline runs about twice as fast as theirs. Meanwhile, they need a pretrained visual dictionary to conduct candidate frame selection, while ours operates completely on-line.

        Table 4. Comparative Results

        Conclusions: In this work, we conduct LCD in a semantic-to-geometric, coarse-to-fine manner. We first propose AttentionNetVLAD to achieve global and local feature extraction simultaneously. The global feature is used to perform candidate frame selection via HNSW, while the local one is exploited for geometric verification via LMSC. LMSC is the proposed feature matching method, which is able to identify reliable matches efficiently by imposing from-weak-to-strong local geometric constraints. Based on the above two components, the whole LCD system is simultaneously high-performing and efficient compared with state-of-the-art approaches.

        Acknowledgments: This work was supported by Key Research and Development Program of Hubei Province (2020BAB113), and the Natural Science Fund of Hubei Province (2019CFA037).
