Xiaoyuan YANG,Ridong ZHU,Jingkai WANG,Zhengze LI
School of Mathematics and Systems Science,Beihang University,Beijing 100083,China
KEYWORDS Correlation filter;Discrete Fourier transform;Least squares;Object tracking;Unmanned aerial vehicle
Abstract This paper addresses the problem of real-time object tracking for unmanned aerial vehicles.We consider the task of object tracking as a classification problem.Training a good classifier always needs a huge number of samples,which is always time-consuming and not suitable for realtime applications.In this paper,we transform the large-scale least-squares problem in the spatial domain to a series of small-scale least-squares problems with constraints in the Fourier domain using the correlation filter technique.Then,this problem is efficiently solved by two stages.In the first stage,a fast method based on recursive least squares is used to solve the correlation filter problem without constraints in the Fourier domain.In the second stage,a weight matrix is constructed to prune the solution attained in the first stage to approach the constraints in the spatial domain.Then,the pruned classifier is used for tracking.To evaluate proposed tracker's performance, comprehensive experiments are conducted on challenging aerial sequences in the UAV123 dataset.Experimental results demonstrate that proposed approach achieves a state-ofthe-art tracking performance in aerial sequences and operates at a mean speed of beyond 40 frames/s.For further analysis of proposed tracker's robustness,extensive experiments are also performed on recent benchmarks OTB50,OTB100,and VOT2016.
The intelligence of Unmanned Aerial Vehicles (UAVs) is becoming an important research area due to wide applications in commercial,industrial,and military fields.Equipping UAVs with capabilities of automated detection,tracking,and recognition are the three key steps for UAV intelligence.Recently,some works have been done to study computer vision-based UAV applications,such as moving object detection in aerial videos,1road detection and tracking in UAV videos,2benchmark and simulator for UAV tracking3, and motion models for UAV tracking.4As an important problem in computer vision,object tracking has attracted great attention over the last two decades,and a number of tracking algorithms have been proposed.5-6Despite demonstrated success,it remains a challenge to design a real-time tracker that is robust to various realistic scenarios such as camera motion, nonrigid deformation,and partial and full occlusions.
Numerous trackers have been proposed for real-time tracking in recent years.Comaniciu et al.7employed the mean shift iteration which has low computational complexity to track a target in real-time.Grabner et al.8proposed a real-time online AdaBoost feature selection algorithm to choose the most discriminative features for tracking. Sevilla-Lara and Learned-Miller9introduced a method based on distribution fields to address the problem that blurring an image destroys image information,and then applied this method into tracking.Zhang et al.10proposed a compressive tracking algorithm using the compressive sensing theory to reduce the dimension of a feature space.In a compressed low-dimensional feature space,their tracker ran in real-time,and obtained robust tracking performance.
Recently,correlation filter-based tracking algorithms have achieved excellent performance in terms of tracking accuracy and speed,which arouse great interest in the tracking community.Bolme et al.11proposed a Minimum Output Sum of Squared Error(MOSSE)filter,which is a single-channel correlation filter model that uses the single-channel feature to train the filter.It has a closed-form solution which can be computed efficiently in the Fourier domain.However,the application of the single-channel feature limits the capability of tracking.In order to solve this problem,Henriques et al.12introduced a variant of the correlation filter model called Kernelized Correlation Filter(KCF)to incorporate multiple feature channels.Danelljan et al.13presented a Discriminative Scale Space Tracker(DSST)to handle both multiple feature channels and scale estimation.Although these methods achieve outstanding tracking performance, there still exist some limitations.Firstly,in the training process,they just use samples in the current frame.Secondly,they both apply circular samples to train the filter which can lead to a boundary effect. To handle these limitations, Danelljan et al.14introduced a spatial regularization approach to penalize correlation filter coefficients and solve this problem through the Gauss-Seidel iteration method.Despite that this method achieves a state-of-the-art tracking performance,its computational complexity and space complexity are high,which limits the tracking speed.
In order to improve the performance without losing the efficiency of the correlation filter,several attempts have been made.Mueller et al.15proposed a framework that takes global context into account when training the filter.This framework provides a strategy to boost the performances of most correlation filter-based trackers while maintaining their high speeds.Lukezic et al.16introduced a formulation with a spatial reliability map and channel reliabilities to reduce the boundary effect of the correlation filter, and significant improvement was achieved while running in real-time.After revisiting the issues of over-fitting and computational complexity in correlation filter formulation, Danelljan et al.17proposed a novel formulation to improve the robustness and efficiency by reducing the number of parameters,the number of samples in learning,and the frequency of model update.
Recently,several methods have been proposed to exploit the power of convolutional neural networks for real-time tracking.Held et al.18proposed a method to train a simple feed-forward network offline,and used it to track generic objects without online fine-tuning.Bertinetto et al.19trained a novel fully-convolutional Siamese network offline on a large video dataset for tracking. Although without any model update during tracking,this method achieved a state-of-theart tracking performance at a real-time speed in a Graphics Processing Unit(GPU)device.
In UAV tracking,the effectiveness and efficiency of a tracking algorithm must both be considered. Inspired by the correlation filter,we consider object tracking as a classification problem which can be formulated as a large-scale least-squares problem, and pose it as a correlation filter problem with constraints.We present a fast method to solve the new problem by two stages.In the first stage,the correlation filter is posed as a series of least-squares problems in the Fourier domain.According to the samples update strategy,we apply the Recursive Least-Squares(RLS)algorithm to efficiently obtain the solution of the correlation filter without constraints, which just needs one iteration when updating samples.In order to handle data saturation accumulated during recursive least-squares iteration,two classifiers are trained to modify the errors.In the second stage,a matrix is constructed to prune the solution obtained in the first stage to approach the constraints in the spatial domain.Benefiting from the two fast stages,the large-scale least-squares problem can be approximately solved with low computation complexity.Applying this method,we realize a real-time tracking algorithm which achieves a state-ofthe-art tracking performance in a UAV tracking dataset and several standard tracking benchmark datasets, including OTB50,6OTB100,20and VOT2016.21
The main contributions of this paper can be summarized as follows:
(1)We present a method applying the correlation filter to solve a specific large-scale least-squares problem.Applying the RLS algorithm and a pruning method,the leastsquares problem is approximately solved efficiently.
(2)A method is proposed to overcome data saturation in the RLS algorithm,which is achieved through training two classifiers of which one is used for tracking and the other for modifying.
(3)We present a real-time tracking algorithm that achieves a state-of-the-art tracking performance on the UAV123 dataset and several challenging benchmark datasets.
In order to achieve robust UAV tracking,we draw a large number of samples to train a classifier. Samples can be expressed as {xi,k,yi,k,αk}(i=1,2,···,I;k=1,2,···,K),where xi,k∈Rd×1is the sample feature,yi,k∈R is the sample label,and αk∈R is the sample weight.K is the number of sampling and I is the number of samples in each sampling.The training process is formulated as
where
and w ∈Rd×1is the classifier.
In this work,the dimension of the feature space and the number of samples are high.It is difficult to solve this kind of large-scale least-squares problem in real-time.However,when we take an exhaustive sliding window approach to extract samples,the training process in Eq.(1)can be reformulated as
As the correlation operation can be efficiently computed in the Fourier domain,the unconstrained model is equivalently posed as the following expression:
According to the definitions of the Frobenius norm and the Hadamard product,Eq.(3)can be identically expressed as M×N subproblems.We denote all problems as a set Φ={(m,n)|1 ≤m ≤M,1 ≤n ≤N},where each element(m,n)corresponds to a subproblem.Each subproblem can be formulated as the following matrix-form expression:
As can be observed,the least-squares problem in the spatial domain is posed as a series of least-squares subproblems in the Fourier domain.
We solve the problem in Eq. (1) through two steps.In the first step, we solve the unconstrained problem in Eq.(4).
Samples for training need to be updated because the appearance of the target and the background always change during tracking. However, samples arise sequentially so that we cannot access them all at once. When samples in a new frame come,we should choose a learning rate to learn the new information so that the classifier can adapt to the appearance change.Then,the new subproblem in the Fourier domain for the training process can be formalized as
According to the recursive least-squares formula,22the optimal solution of Eq.(5)can be efficiently computed when we have obtained the optimal solution of Eq.(4).
Fig.1 Correlation filter model.
Similarly,after we obtain PK-1,PKis computed using the same formula as Eq.(7)instead of usingwhich needs to find the inverse matrix of B*B, because the computational complexity of the inverse is quite high.As can be seen,the training process during tracking can be achieved by Eqs. (6) and (7) once the values of P0andare given.The parameters P0andare set based on a priori knowledge.22One requirement is that P0must be positive-definite,which can ensure that PKis positivedefinite.This property is used to ensure that the iterative process is rational and practicable. In practice, we setandwhere I is an identity matrix and λ is a regularized parameter.
In the following,the detail of the solution of Eq.(3)is presented.The solution procedure accompanies with the tracking process.In the initial process(the first update),where K=1,the formulation of Eq.(3)is
According to the proposed recursive algorithm,the solution of each subproblem can be written as
Then we can obtain the optimal solution of Eq.(8)through the relationship betweenandFor next update,we should compute P1for each subproblem as
In the tracking process,the number and the weights of samples in Eq.(3)change over time due to model updates.In the second update,the formulation of Eq.(3)becomes
The solution of this new problem can be solved based on the solution of Eq.(8).According to Eqs.(5)and(6),the solution of the corresponding subproblem is given by
Then the optimal solution of Eq.(11)can be obtained through the relationship betweenandis computed for next update by
In the n-th update,where K=n,formulation of Eq.(3)is
Similar to the previous procedure,the solution of the corresponding subproblem is computed by
Hence,the optimal solution of Eq.(14)can be obtained through the relationship betweenand.Meanwhile,Pnis computed for next update by
As described above,the sub-problem of Eq.(3)can be efficiently solved by one iteration after every model update.
Although the RLS method is efficient to update the previous model on the basis of new data rather than to restart from scratch, it suffers a problem called data saturation which is caused by the accumulated error produced in the iteration process and leads to losses of precision and accuracy.Therefore,the solution will be unreliable after a certain amount of iterations.In order to handle this problem,two classifiers are trained in proposed tracking algorithm.One is used for tracking which is called Tracking(Tr)classifier,and the other is used for modifying which is called Modifying (Mo) classifier. When Tr classifier is updated t1times via RLS iteration,Mo classifier will be activated and trained via the RLS method. When Mo classifier is updated t2times,the value of Mo classifier will be assigned to Tr classifier, and then Mo classifier will be initialized and inactivated. This process will be repeated during tracking.
In the second step,we deal with the constraints in Eq.(2).In the above solving process,we do not consider the constraints,therefore the solution does not satisfy the constraints.In order to handle this limitation,we construct a matrix W to prune the solution obtained in the previous step.W is defined as
Let the solution of Eq.(3)beThen,the pruning process is
In a new frame,after extracting the feature of the searching region Tl,the detection process can be fast obtained by
where R is a matrix about the classification scores.The location of a target can be searched according to the location of the maximum value in R.
In this section,extensive experiments are conducted to evaluate the performance of proposed tracking algorithm on aerial video dataset UAV123.3UAV123 is an aerial video dataset for UAV target tracking,which consists of 123 fully annotated High-Definition(HD)video sequences captured from aerial perspective. Evaluation methodologies are precision plots and success plots.6A precision plot is drawn according to the Center Location Error(CLE),and a success plot is drawn according to the Intersection Over Union(IOU).The curves of the two plots show the rates of successfully tracked frames at a series of CLE and IOU thresholds.The two plots show a comprehensive evaluation of the tracking algorithm.Two measures for ranking trackers are the Distance Precision(DP)which is the precision score at the threshold equaling to 20 pixels of each tracker's precision plot and the Area Under the Curve(AUC)of each tracker's success plot.6,20For further analysis of the tracking robustness and accuracy of the proposed method, experiments are also performed on datasets OTB50,6OTB100,20and VOT2016.21The evaluation measures on OTB50 and OTB100 are the same as that on UAV123.For the VOT2016 data set,three primary measures are used to evaluate the tracking performance:accuracy,robustness,and expected average overlap.21,23The accuracy is the average overlap between the predicted and ground truth bounding boxes.The robustness is computed by the number of tracking failures.The expected average overlap is used to estimate the average overlap that a tracker is expected to attain on shortterm sequences.
Fig.2 Quantitative evaluation based on precision plots and success plots of OPE and SRE on UAV123.
In UAV tracking experiments,parameters are set to be the same in all video sequences for fair comparisons.The search area is set as 5×5 times a target's size.In scale evaluation,3 scaled candidate targets are set to find the target's size.The scale parameters are 0.98,1.00,and 1.02.The learn rate α=0.02.The parameter λ is set as 0.01.The values of t1and t2are set to be 150 and 20,respectively,which are determined by experiments.The features used in the experiments are the Histogram of Oriented Gradients(HOG)24and Color Names(CNs).25The channels of the HOG are compressed to 15,and the channels of CNs are compressed to 6 by Principal Component Analysis(PCA).The orthonormal basis of PCA is computed in the first frame and then maintained throughout the sequence.
3.1.1.Comparison on overall sequences
We provide a comprehensive comparison between proposed tracker and 13 recent state-of-the-art trackers, including SRDCF,14MEEM,26MUSTER,27DSST,13Struck,28ASLA,29IVT,30TLD,31MOSSE,11CSK,32OAB,8KCF,12and SAMF.33Results on overall 123 video sequences of UAV123 are shown in Fig.2.In precision plots,the ranking score of proposed tracking algorithm is top among these trackers.Proposed algorithm obtains significant improvements by 1.9%and 3.4%than the second as depicted in the precision plots of OPE and SRE in Fig.2.In success plots,the ranking is based on the AUC.Proposed tracker obtains the second best performance in OPE plots which just has a very small gap with the first,while the top performance in SRE plots.
3.1.2.Qualitative comparison
In order to vividly illustrate the tracking results,we select several challenging sequences to present the performance of proposed tracker on the UAV123 data set.
Fig.3 illustrates the tracking results of proposed tracker and the ground truth.The green bounding box is the ground truth,and the red bounding box is the predicted result of the proposed tracker.In each sequence,the first row is the ground truth of the object.The number in the top left corner of each picture represents the frame index.In the Car4 sequence,the target is a car which is traveling along the road.At about frame 400,the car is occluded by the traffic sign.In the Group1_1 sequence,during tracking the person,the main challenges are camera motion and background clutter.As can be observed,proposed tracker exhibits excellent tracking performance in the two sequences.
Fig.4 depicts the results of the proposed method(red rectangle),SAMF(blue rectangle),and SRDCF(yellow rectangle),where the first row is the ground truth of the object in each sequence.In the UAV1_1 sequence,three trackers can well track the UAV at the initial stage.At about frame 300,SAMF and SRDCF lose the target due to its fast motion,while proposed tracker is still able to locate the UAV.After frame 500,proposed tracker can still track the target while cannot estimate the target's size.In the UAV5 sequence,the target is a UAV which has a low resolution that leads to the tracking failures of SAMF and SRDCF at the beginning.Attributed to the proposed method and the application of HOG and CN features,proposed approach achieves accurate tracking.In the Wakeboard2 sequence,the task is to track a person who is playing wakeboarding.As illustrated,the tracking process encounters challenges of viewpoint change and fast motion.Despite these challenges,proposed tracker is able to correctly follow the target.
3.1.3.Attributes-based comparison
Fig.3 Tracking results on Car4 and Group1_1 sequences.
Fig.4 Tracking results comparison.
It is very necessary to assess the performance of a tracker in different scenarios,because the tracking task always encounters one or several specific challenging factors in practice.The sequences in the UAV123 dataset are labeled with various attributes,which provides a convenience for an analysis based on attributes.Comparison results with respect to 8 attributes are reported in Fig.5.The value appears in parentheses of each subgraph title is the number of sequences in that attribute.As illustrated,proposed tracker outperforms the other 13 trackers in 7 out of 8 attributes.Moreover,the proposed method performs quite well on camera motion and low-resolution videos,which are two important aerial tracking scenarios.The precision scores of proposed tracker outperform the second best scores by over 4.4%and 3.8%in regards to camera motion and lowresolution attributes,respectively.The results demonstrate the robustness of the proposed method in different challenges.
3.1.4.Comparison on speed
Proposed MATLAB implementation is carried out on a laptop computer with an Intel Core i7-6700hq 2.6-GHz CPU and 4-GB RAM.Table 1 reports comparisons between proposed tracker and the other trackers with respect to the mean speed,the DP score,and the AUC score.The mean speeds of the other trackers are reported in the UAV benchmark.3As can be observed,proposed tracker operates at an average speed of 41 frames/s,which achieves real-time tracking.Compared with the other trackers,proposed method achieves a state-ofthe-art performance in terms of accuracy and efficiency.These results reveal that the proposed tracker is suitable for UAV tracking.
Fig.5 Quantitative evaluation based on precision plots of different trackers in 8 attributes on UAV123.
Table 1 Evaluation on UAV123 by means of DP,AUC,and mean FPS.
In this section,we compare proposed tracker against several state-of-the-art trackers on OTB50,6OTB100,20and VOT201621datasets.Fig.6 illustrates comparison results on OTB50 and OTB100 datasets between proposed tracker and the other trackers,including Staple_CA,15CSR-DCF,16Staple,34SRDCF,14SAMF,33KCF,12DSST,13LCT,35MEEM,26SiamFC,19TGPR,36Struck,28SCM,37and TLD.31
Among these trackers,proposed tracker achieves top performance with respect to precision plots on OTB50,precision plots on OTB100,and success plots on OTB100.SRDCF,Staple_CA,and CSR-DCF are recent state-of-the-art trackers designed to reduce the boundary effect of the correlation filter.As shown in Fig.6,the proposed tracker achieves better or comparable results than those of these trackers.In terms of efficiency,the proposed tracker operates at over 40 frames/s on a CPU,while the reported speeds of SRDCF,CSR-DCF,and Staple_CA are 5, 13, and 35 frames/s, respectively.SiamFC is a convolutional neural network-based tracker,which can run over 80 frames/s.It is efficient because the tracker does not perform any model update and runs on a high-performance GPU device.However,no model update also leads to some loss of accuracy.
Results on VOT2016 are reported in Table 2,by comparing proposed tracker with Staple,34SRDCF,14SAMF,33KCF,12DSST,13TGPR,36Struck,28MIL,38HCF,39IVT,30DAT,40and STC.41As can be observed in this table,the proposed method obtains high accuracy and robustness scores.In terms of expected average overlap,the score of proposed tracker exceeds the VOT2016 state-of-the-art bound.
Fig.6 Precision plots and success plots of OPE on OTB50 and OTB100.
Table 2 Results on the VOT2016 dataset.
All of these results have demonstrated the effectiveness and efficiency of the proposed algorithm.The proposed algorithm exploits the advantage of the correlation filter,which can draw dense samples to learn,to train a robust classifier.However,most correlation filter-based trackers suffer from the boundary effect,which reduces the tracking performance.The proposed method uses the process of pruning to alleviate the boundary effect of the correlation filter.The proposed update strategy ensures that multiple samples in previous frames are used to train the current classifier.Therefore,the classifier is robust to appearance changes.The efficiency of the algorithm benefits from the proposed recursive solution process and the modifying strategy which is used to avoid data saturation due to multi-step recursion.
In this paper,we propose a novel tracking algorithm based on recursive least squares.Through a transformation from the spatial domain to the Fourier domain,we simplify the largescale least-squares problem and fast solve its approximate solution via two steps.In Proposed tracker,two classifiers are trained to deal with data saturation of RLS,which makes the tracker robust for tracking.In UAV tracking experiments,results show that Proposed tracking algorithm can achieve a state-of-the-art tracking performance in real-time.The results and analysis imply that the proposed method is suitable for UAV tracking.Experiments on other datasets also demonstrate the effectiveness of the proposed approach.
In this work,we only use the hand-craft feature.In future work,we will investigate the deep CNNs feature.This feature is expected to achieve a higher tracking performance.
Acknowledgement
This work was supported by the National Natural Science Foundation of China(No.61671002).
CHINESE JOURNAL OF AERONAUTICS2019年7期