XU Gongguo, SHAN Ganlin, and DUAN Xiusheng
1. Department of Electronic and Optical Engineering, Shijiazhuang Campus of Army Engineering University, Shijiazhuang 050003, China; 2. School of Mechanical Engineering, Shijiazhuang Tiedao University, Shijiazhuang 050013, China
Abstract: Continuous and stable tracking of a ground maneuvering target is challenging because of complex terrain and high clutter. A collaborative tracking method for a multi-sensor network is presented for the ground maneuvering target in the presence of the detection blind zone (DBZ). First, the sensor scheduling process is modeled within the partially observable Markov decision process (POMDP) framework. To evaluate the target tracking accuracy of a sensor, the Fisher information is used to construct the reward function. The key to the proposed scheduling method is forecasting and early decision making. Thus, an approximate method based on unscented sampling is presented to estimate the target state and the multi-step scheduling reward over the prediction time horizon. Moreover, the problem is converted into a nonlinear optimization problem, and a fast search algorithm (FSA) is given to solve for the sensor scheduling scheme quickly. Simulation results demonstrate that the proposed non-myopic scheduling method (Non-MSM) achieves better target tracking accuracy than traditional methods.
Key words: sensor scheduling, ground maneuvering target, detection blind zone (DBZ), decision tree optimization.
Tracking a ground target is much more difficult than tracking an aerial target because of complex terrain, high clutter, and strong target maneuvering. Therefore, ground target detection sensors, such as the ground moving target indicator (GMTI) and ground-based radar, mostly employ pulse Doppler technology to eliminate the influence of ground clutter on moving target indication [1–3], which gives good detection performance for low-altitude targets [4]. However, this technology always sets a detection threshold on the minimum radial velocity of the target. It creates a detection blind zone (DBZ) for the sensor when the target radial velocity is smaller than the detection threshold, which is called Doppler blindness [5,6]. Unlike an aerial target, a ground target moves slowly, which makes it prone to Doppler blindness. Besides, the ground battlefield environment is more complex. The detection line of sight of the sensor may be occluded by mountains or tall buildings [7]. In this case, the target also cannot be detected by the sensor, which is called vision blindness. These two kinds of DBZs can lead to the loss of the target state, and even to track breakage and new track initiation, which seriously degrades the quality of battlefield intelligence information.
Besides, unlike traditional single-sensor tracking scenarios, multi-sensor collaboration has become the development direction of military research [8]. As its main application, collaborative tracking by a multi-sensor network has received considerable attention, since it can effectively improve the tracking performance of the reconnaissance system [9,10]. However, multi-sensor scheduling is too complex for manual decision-making to satisfy the task requirements. It is therefore necessary to study sensor management methods.
Up to now, the common sensor management methods include the covariance matrix based method [11,12], the information gain based method [13–15], and the operation risk based method [16–20]. For instance, Kalandros [11] proposed a multi-sensor selection method to improve the capability of the tracking platform, which selects sensor combinations based on the difference between the desired covariance matrix and the predicted covariance matrix. Zhou et al. [12] proposed two control methods for the mobile sensor based on the trace and the determinant of the covariance matrix. For the wireless sensor network, Keshavarz et al. [13] proposed a sensor selection method based on the Fisher information. Salvagnini et al. [14] studied the scheduling problem of camera sensors based on information theory.
Recently, to improve the battlefield survivability of the radar network, Shi et al. [15,16] proposed two sensor scheduling methods with a low probability of interception. Moreover, a novel intercept probability factor was presented for multi-target tracking tasks in [17]. Zhang et al. [18,19] proposed a non-myopic sensor scheduling method to select and assign active sensors, trading off the tracking accuracy against the radiation risk. Unlike Zhang et al., Marcos et al. [20] introduced a statistical risk based sensor management method for target classification and tracking scenarios to reduce the potential operational risk.
However, most of the above sensor scheduling methods focus on the aerial target, while there is little research on the ground target that considers the DBZ. In addition, the sensor model in the above research is simple and always ignores the sensor dwell time. To maximize the tracking benefit, these methods switch the detection sensor too frequently, which is not in line with the actual situation and consumes more energy.
Based on the above analysis, to improve the tracking stability and continuity for the ground maneuvering target in the complex environment, this paper presents a non-myopic scheduling method (Non-MSM) of the multi-sensor system for the ground maneuvering target in the presence of the DBZ. Compared with the existing sensor scheduling methods, this method can be aware of the DBZ in advance and maintain better tracking performance for the ground maneuvering target all the time.
The rest of this paper is organized as follows. The scheduling problem is formulated within the partially observable Markov decision process (POMDP) framework in Section 2. The detailed scheduling process of the Non-MSM is presented in Section 3. The modified prediction method of the target state is given in Section 4. The scheduling problem is converted into an optimization problem in Section 5. Finally, simulation results are presented in Section 6, and conclusions are drawn in Section 7.
POMDP is a mathematical model for decision analysis of an agent in partially observable and random environments. It has been successfully applied in many fields, such as path planning [21,22], system management [23–25], and text recognition [26,27]. Here, the POMDP model is suitable for the scheduling problem because the target state cannot be observed directly. The specific mathematical descriptions within the POMDP framework can be expressed as follows.
Assuming that there are $M$ sensors in the ground reconnaissance system, the scheduling action at time step $k$ is denoted as $A_k=[a_{1,k}, a_{2,k}, \ldots, a_{M,k}]$, where $a_{m,k}$ ($1 \le m \le M$) represents the action of sensor $m$. Specifically, $a_{m,k}=1$ means that sensor $m$ works at time step $k$, and $a_{m,k}=0$ means that it does not. Because the energy and communication ability of the multi-sensor system are limited, the number of sensors working at the same time is limited. We require that the number of working sensors not exceed $m_a$, that is,
$$\sum_{m=1}^{M} a_{m,k} \le m_a.$$
The target motion process is described in the two-dimensional Cartesian coordinate system. Specifically, the system state at time step $k$ is denoted as $S_k=X_k$, where $X_k$ consists of the target position coordinates $x_k$ and $y_k$ and the target velocities $\dot{x}_k$ and $\dot{y}_k$. Because the real motion model of the maneuvering target is unknown, the target state transition law is given by
$$X_{k+1}=F_k X_k+w_k,$$
where $F_k$ is the state transition matrix, which is selected from a mixed motion model set containing a finite number of motion models, and $w_k$ denotes the zero-mean white Gaussian process noise with the covariance matrix $Q_k$ of the selected motion model. In general, two common models are employed: the nearly constant velocity (NCV) model and the nearly constant turn (NCT) model. In the two-dimensional Cartesian coordinate system, the state transition matrices of the NCV model and the NCT model can be expressed as
where $\delta t$ is the sampling interval, and $\omega$ is the turn rate.
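For reference, the textbook forms of the two transition matrices can be sketched in code. This is a minimal sketch, assuming the state is ordered as $[x, \dot{x}, y, \dot{y}]$ and that a positive $\omega$ (in rad/s) means a counterclockwise turn; neither convention is stated in the text, and the function names `ncv_matrix` and `nct_matrix` are illustrative only.

```python
import numpy as np

def ncv_matrix(dt):
    """NCV transition matrix for the assumed state order [x, vx, y, vy]."""
    return np.array([
        [1.0, dt, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, dt],
        [0.0, 0.0, 0.0, 1.0],
    ])

def nct_matrix(dt, omega):
    """Constant-turn transition matrix with a known, nonzero turn rate omega (rad/s)."""
    s, c = np.sin(omega * dt), np.cos(omega * dt)
    return np.array([
        [1.0, s / omega, 0.0, -(1.0 - c) / omega],
        [0.0, c, 0.0, -s],
        [0.0, (1.0 - c) / omega, 1.0, s / omega],
        [0.0, s, 0.0, c],
    ])
```

The turn model rotates the velocity vector without changing the speed, and it approaches the NCV matrix as $\omega$ tends to zero.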
The multi-sensor system measurement at time step $k$ is denoted as $Z_k$, where $z_k^m$ denotes the measurement of sensor $m$. In general, the sensor measurement is obtained in the polar coordinate system with the sensor as the origin. Specifically, the measurement is composed of the target distance, azimuth, and radial velocity, which can be written as
$$z_k^m=h_k^m(X_k)+v_k^m,$$
where $h_k^m(\cdot)$ is the measurement function of sensor $m$, $v_k^m$ denotes the zero-mean white Gaussian measurement noise of sensor $m$ with the covariance matrix $R_k^m$, and $r_k^m$, $\theta_k^m$, and $\dot{r}_k^m$ are the distance, azimuth, and radial velocity of the target, respectively, which can be calculated by
where $x^m$ and $y^m$ are the sensor position coordinates. Further, the system observation law can be expressed as
$$Z_k=h_k(X_k)+v_k,$$
where $h_k(\cdot)$ denotes the comprehensive measurement function of the multi-sensor system and $v_k$ collects the measurement noise of the working sensors.
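The three measurement components can be illustrated with a short noise-free sketch. It assumes a stationary sensor, the state order $[x, \dot{x}, y, \dot{y}]$, and an azimuth measured counterclockwise from the $x$ axis; the azimuth convention in particular is an assumption, since the text does not fix it. The name `measure` is hypothetical.

```python
import numpy as np

def measure(state, sensor_xy):
    """Noise-free [range, azimuth, radial velocity] of the target seen from one sensor.

    state     : [x, vx, y, vy]  (assumed ordering)
    sensor_xy : (x_m, y_m) position of a stationary sensor
    """
    x, vx, y, vy = state
    dx, dy = x - sensor_xy[0], y - sensor_xy[1]
    r = np.hypot(dx, dy)                # distance; must be nonzero
    theta = np.arctan2(dy, dx)          # azimuth from the x axis (assumed convention)
    r_dot = (dx * vx + dy * vy) / r     # velocity projected on the line of sight
    return np.array([r, theta, r_dot])
```

The radial velocity is the projection of the target velocity onto the line of sight, which is the quantity compared with the detection threshold in the Doppler blindness test.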
To complete the POMDP model, we need a reward function to assess which sensor scheduling scheme can obtain a higher target tracking accuracy. The posterior Cramér-Rao lower bound (PCRLB) reflects the sensor tracking performance, since it provides the lower bound of the future target state estimation error. According to [17], the PCRLB is the inverse of the Fisher information matrix. Moreover, the Fisher information represents the amount of information obtained from the sensor measurement.
Thus, to avoid complex inversion calculations, we apply the Fisher information to assess the reward of the sensor scheduling action. Let $J(\cdot)$ denote the Fisher information matrix of the target, which is the prior information at time step $k$ and the posterior information at time step $k+1$. When the target is detected by the sensor, the sensor measurement will be generated, and there is
with
where the symbol $\Delta$ represents the second-order derivative, the symbol E represents expectation, $\hat{X}_k$ is the estimated state at time step $k$, and the measurement term is the Fisher information gain from the sensor measurement $Z_{k+1}$. When the target is occluded or moves too slowly to be detected, the sensor measurement will not be generated, and the posterior Fisher information is only transferred from the prior information, and there is
$J$ is a conditional entropy [28]. We can split the domain of integration for $p(Z_{k+1}|A_{k+1})$ into $\Omega^-$ and $\Omega^+$, where $\Omega^-$ is the set in which the target cannot be detected and $\Omega^+$ is the set in which the target can be detected. Further, (10) can be rewritten as
where $p(Z_{k+1}|A_{k+1})$ is the detection probability of the target; $J^-$ and $J^+$ are the Fisher information of $\Omega^-$ and $\Omega^+$, respectively. Then, at time step $k$, we can use the following equation to calculate the optimal sensor scheduling action
with
where $\alpha_{k+1}(A_{k+1})$ is the weight of the Fisher information $J^+$. Next, we study how to calculate the detection probability $p(Z_{k+1}|A_{k+1})$ for different situations.
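The idea of weighting the detected and undetected cases can be illustrated with a small sketch. It is not the paper's exact recursion: it assumes the common linear-Gaussian form of the information update, in which `F` and `Q` describe the motion model, `H` is the Jacobian of the measurement function at the predicted state, and `R` is the measurement noise covariance. The detection probability `p_detect` plays the role of the weight on $J^+$. All function and argument names are illustrative.

```python
import numpy as np

def predicted_information(J_prior, F, Q):
    """Information after time propagation only (the no-measurement case, J^-)."""
    return np.linalg.inv(Q + F @ np.linalg.inv(J_prior) @ F.T)

def expected_information(J_prior, F, Q, H, R, p_detect):
    """Detection-weighted Fisher information under a linear-Gaussian assumption."""
    J_minus = predicted_information(J_prior, F, Q)
    J_plus = J_minus + H.T @ np.linalg.inv(R) @ H   # adds the measurement gain
    return (1.0 - p_detect) * J_minus + p_detect * J_plus
```

A scalar reward can then be taken from the matrix, for example its trace; that choice is also an assumption here. When `p_detect` is zero the result collapses to the propagated prior, which mirrors the no-detection case described above.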
To make the sensor scheduling model closer to the actual situation, a detection model is introduced to describe the complex environment. When the Doppler effect is considered, the detection probability is related to the target radial velocity [29]. The detection probability $p(Z_{k+1}|A_{k+1})$ can be expressed as
where $V_{\min}$ denotes the absolute value of the minimum radial velocity that can be detected by the sensor; $\dot{r}_{k+1}$ is the radial velocity of the target relative to the sensor; $h(\cdot)$ is the Doppler notch function. Assuming the target type is Swerling-I, the factor $P_0$ can be given by $P_0=P_F^{1/(1+\mathrm{SNR})}$, where SNR is the signal-to-noise ratio, and $P_F$ is the false alarm probability.
Moreover, the detection line of sight of the sensor may sometimes be occluded by obstacles. Let $\beta_k$ be a binary variable that indicates whether the target is occluded. Here, we adopt a simple model, and the detection probability $p(Z_{k+1}|\beta_{k+1},A_{k+1})$ when considering both the Doppler effect and obstacles is expressed as
where $p(Z_{k+1}|A_{k+1})$ is calculated by (14).
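A detection model of this kind can be sketched as follows. The Swerling-I relation is the standard one; the notch shape is an assumption (a Gaussian-shaped notch often used in the GMTI literature), as is the rule that an occluded target has zero detection probability. The paper's own expressions may differ, and the function names are hypothetical.

```python
import math

def swerling1_p0(snr, p_fa):
    """Swerling-I detection probability; snr is a linear ratio, not dB."""
    return p_fa ** (1.0 / (1.0 + snr))

def detection_probability(r_dot, v_min, p0, occluded=False):
    """Detection probability with a Doppler blind zone and a binary occlusion flag.

    Assumed model: zero inside the blind zone or behind an obstacle,
    otherwise p0 scaled by a Gaussian-shaped notch.
    """
    if occluded or abs(r_dot) < v_min:
        return 0.0
    notch = 1.0 - math.exp(-math.log(2.0) * (r_dot / v_min) ** 2)
    return p0 * notch
```

With this notch shape the detection probability rises smoothly from half of `p0` at the threshold toward `p0` for fast radial motion, so a scheduler sees the reward of a sensor degrade before the target actually enters its blind zone.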
Traditional sensor scheduling methods mostly depend on the one-step prediction of the covariance matrix or the information gain to determine the sensor scheduling action. However, one-step prediction cannot sense changes of the target and the environment in time. To find the DBZ in advance and improve the target tracking accuracy, the Non-MSM is built on a multi-step prediction strategy. In this method, the sensor scheduling action is determined not only by the immediate reward but also by the expected reward over a few future time steps. Compared with the sensor scheduling methods in [18], the goal of this paper is to improve the target tracking stability in the complex ground environment. The experiment conditions are more realistic, and the sensor cannot be switched at will.
Based on the above analysis, the reward function is denoted as $Q$. Evaluated from the starting estimated state $\hat{X}_k$ of the target, it represents the total reward over $h$ time steps. Then, we have
where $A_{k+1+\tau}$ is the scheduling action specified by the policy $\pi$ at time step $k+1+\tau$, $\tau$ is the time variable, and $\gamma$ is the discount factor [19].
Further, let $\pi_h=[A_{k+1}, A_{k+2}, \ldots, A_{k+1+h}]$ denote the sensor scheduling actions over $h$ time steps in the future. Then the sensor scheduling problem in this paper can be converted into the following nonlinear optimization problem.
where $\pi_h^*$ represents the optimal sensor scheduling sequence over $h$ time steps.
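The structure of this optimization can be sketched as a discounted sum that is maximized over all candidate sequences. The sketch assumes one working sensor per step and a user-supplied `step_reward(prefix)` callback that returns the reward of the last action in `prefix`; both the callback and the function names are illustrative, and the exhaustive search is only a baseline, not the paper's solver.

```python
from itertools import product

def discounted_total(step_rewards, gamma):
    """Return the sum over tau of gamma**tau * step_rewards[tau]."""
    return sum(gamma ** tau * r for tau, r in enumerate(step_rewards))

def best_sequence(num_sensors, horizon, step_reward, gamma):
    """Score every length-`horizon` sensor sequence and return the best one."""
    def total(seq):
        rewards = [step_reward(seq[: tau + 1]) for tau in range(horizon)]
        return discounted_total(rewards, gamma)
    return max(product(range(num_sensors), repeat=horizon), key=total)
```

The number of candidates grows as the number of sensors raised to the power of the horizon, which is why a faster search is needed for online scheduling.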
According to (17), we must know the future expected reward before calculating the non-myopic reward. However, the future target state cannot be known exactly. The common approach is the Monte Carlo method, which calculates the non-myopic reward through many repeated experiments. However, it requires a large number of random samples regardless of the prediction time and the target number.
In view of the above problems, a non-myopic reward approximation method based on unscented sampling is proposed. This method generates only a fixed, small number of particles according to the specific tracking scenario, which reduces the computational complexity. The prediction method based on unscented sampling is shown in Fig. 1, and the specific steps can be briefly described as follows.
Fig. 1 Prediction method based on unscented sampling
Step 1 Assuming that the target estimation state and its covariance matrix are $\hat{X}_k$ and $P_k$, then $2\sigma+1$ sigma points and their weights can be given according to (18) and (19).
where $\sigma$ is the dimension of $\hat{X}_k$, $\lambda$ is the scale factor, and $[\cdot]_n$ represents the $n$th row of the matrix obtained from the Cholesky decomposition of $P_k$.
Step 2 On the basis of the sigma points, transform the sampling points according to (20) and (21), and then obtain the measurements.
To estimate the future target state, the target motion model must be known. However, the real motion model of the maneuvering target is unknown. Considering the motion continuity of the target, the motion model with the maximum distribution probability at the current time is adopted as the target motion model, that is
$$i^{*}=\arg\max_{i}\mu_k^{i},$$
where $\mu_k^{i}$ represents the distribution probability of motion model $i$.
Step 3 On the basis of the transformed sampling points and the predicted measurements, update the target state estimate by the cubature Kalman filter (CKF) algorithm [30,31], and obtain the filter prediction states. The future Fisher information by one-step prediction can be given by
Then, the future expected reward can be obtained by multi-step iteration through (23).
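Step 1 above can be illustrated with the standard unscented-transform construction. This sketch assumes that (18) and (19) follow the usual symmetric sigma-point set with scale factor $\lambda$; it uses the columns of the lower Cholesky factor, which are the rows of the upper factor. The function name `sigma_points` is illustrative.

```python
import numpy as np

def sigma_points(x_hat, P, lam):
    """Return the 2n+1 sigma points and weights of the standard unscented transform."""
    n = x_hat.size                              # state dimension (sigma in the text)
    L = np.linalg.cholesky((n + lam) * P)       # L @ L.T == (n + lam) * P
    points = [x_hat]
    points += [x_hat + L[:, i] for i in range(n)]
    points += [x_hat - L[:, i] for i in range(n)]
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    weights[0] = lam / (n + lam)                # weight of the central point
    return np.array(points), weights
```

The weighted mean and covariance of the returned points reproduce `x_hat` and `P`, so each point can be propagated through the motion and measurement functions in Step 2 in place of a large set of random Monte Carlo samples.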
In some special cases, the target becomes Doppler-blind to all sensors. It is then impossible to update the target state by the CKF algorithm because the necessary measurements are missing. The target state can only be obtained by prediction, and the prediction method can be expressed as
However, if the target is not detected because of Doppler blindness, the motion state of the target is constrained: the absolute value of the radial velocity of the target cannot be larger than the minimum radial velocity $V_{\min}$. The Doppler blindness itself is therefore a kind of prior information. Based on this prior information, a modified prediction method of the target state is given.
Assuming that the target shown in Fig. 2 becomes Doppler-blind to the sensor at time step $k+1$, it will not be observed at time step $k+1$. The target state at time step $k+1$ can only be obtained by the prediction method. If the absolute value of the radial velocity of the predicted state relative to the sensor is still larger than the detection threshold $V_{\min}$, the predicted state needs to be modified. When $|\hat{\dot{r}}_{k+1}|>V_{\min}$, the modified method is expressed as
where $\theta_{k+1}$ is the angle between the detection line of sight and the $y$ direction, and $\hat{\dot{r}}_{k+1}$ is the predicted radial velocity.
Fig. 2 Diagram of modified prediction method
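One way to impose the blind-zone constraint on a predicted state is sketched below. It is an assumption, not the paper's formula: the radial component of the predicted velocity is reduced to the threshold while the tangential component is kept. The state order $[x, \dot{x}, y, \dot{y}]$ and the name `clamp_radial_velocity` are also assumptions.

```python
import numpy as np

def clamp_radial_velocity(state, sensor_xy, v_min):
    """Limit the predicted radial speed to v_min, keeping the tangential velocity."""
    x, vx, y, vy = state
    los = np.array([x - sensor_xy[0], y - sensor_xy[1]])
    u = los / np.linalg.norm(los)               # unit vector along the line of sight
    v = np.array([vx, vy])
    r_dot = float(v @ u)                        # predicted radial velocity
    if abs(r_dot) <= v_min:
        return np.array(state, dtype=float)     # already consistent with blindness
    v_new = v - (r_dot - np.sign(r_dot) * v_min) * u
    return np.array([x, v_new[0], y, v_new[1]])
```

Working with the line-of-sight unit vector avoids committing to a particular angle convention for $\theta_{k+1}$.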
It can be seen from (17) that the computational complexity of the optimization problem is very high, especially when the sensor number $M$ and the prediction horizon $h$ are large. In addition, because the future target state is unknown, it may not be possible to calculate the optimal solution analytically, and it is also difficult to meet the real-time requirements of online scheduling. In the proposed Non-MSM, the prediction horizon $h$ is finite. Thus, we can convert the sensor scheduling problem into a decision tree optimization problem.
With the decision tree optimization algorithm, the filtering results of the previous nodes can be reused. The repeated complex filtering operations are thereby avoided, which effectively accelerates the solution, especially for a large-scale sensor network. In this paper, we assume that one target needs only one sensor to track it at each time. The depth and the branch factor of the decision tree are equal to the length of the prediction time horizon and the number of sensors, respectively. To further speed up the decision tree search algorithm, two pruning techniques are adopted to cut branches of the decision tree during the search process.
(i) Pruning based on the sensor dwell time. Because a sensor has to work for a while before switching, many branches of the decision tree are pruned when the sensor dwell time is considered. When the sensor number is 2, the prediction time horizon $h$ is 3, and the dwell time is 2, the decision trees with and without considering the sensor dwell time are shown in Fig. 3(a) and Fig. 3(b), respectively.
Fig. 3 Decision tree of sensor scheduling problem
(ii) Pruning based on the reward threshold. At first, the greedy search algorithm is used to find an initial solution quickly. Its total reward value is defined as $R_{\max}$ and the cut-off factor is defined as $\eta$ ($0<\eta<1$). Then, if the cumulative reward value $R_c$ of the current node satisfies (27) during the search, this node can be deleted, and the nodes below it no longer need to be opened.
where $h_c$ is the depth of the current node, and $h$ is the prediction time horizon, that is, the depth of the decision tree. The intuition behind this pruning strategy is that a node whose cumulative reward is too small at any depth cannot be part of the optimal scheduling sequence with the maximum total reward. In summary, the proposed fast search algorithm (FSA) for the sensor scheduling action, based on the decision tree with pruning techniques, can be described as Algorithm 1.
Algorithm 1 FSA
Step 1 Let $R_{\max}=0$, and add the root node to the list.
Step 2 Use the greedy search algorithm to find an initial solution $R_{\max}$, and execute Step 3.
Step 3 Remove the first node from the list, and open the child nodes of this node that satisfy the sensor dwell time.
Step 4 If the depth $h_c$ of the child node satisfies $h_c<h$ and $R_c$ satisfies (27), delete the child node. If $h_c<h$ and $R_c$ does not satisfy (27), add the child node to the list. If $h_c=h$ and $R_c>R_{\max}$, choose $R_c$ as $R_{\max}$.
Step 5 If the list is not empty, go to Step 2. Otherwise, terminate. Output the sensor scheduling sequence corresponding to the optimal solution $R_{\max}$.
Assuming that the computational complexity of each node in the decision tree is 1, when the sensor number is $m$ and the prediction horizon is $h$ ($h \ge 2$), the computational complexity is given by $O(m^h)$. Moreover, when the sensor dwell time is $t$ ($2 \le t \le h$), the computational complexity is given by
Obviously, compared with the traditional search algorithm, the computational complexity of the proposed search algorithm gradually decreases as the sensor dwell time and the prediction time horizon increase. The pruning technique based on the reward threshold reduces the computational complexity further.
In this simulation, the effectiveness of the proposed Non-MSM for ground maneuvering target tracking is investigated. As shown in Fig. 4, four ground target detection sensors are used to track a maneuvering target. The deployment positions of the four sensors are (0.7, 3.5) km, (0.7, 1.0) km, (3.0, 0.5) km, and (3.0, 3.0) km. The sensor dwell time is 3 s and the detection threshold is $V_{\min}=5$ m/s. The standard deviations of the range noise, azimuth noise, and radial velocity noise are 1.5 m, 0.01 rad, and 0.5 m/s, respectively. The power and energy of the sensor system are always limited. To avoid wasting resources, it is assumed that one target needs only one sensor to track it at each time.
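Algorithm 1 and its two pruning rules can be sketched in code as follows. This is a minimal sketch under stated assumptions: rewards are non-negative and undiscounted, `step_reward(seq)` is a hypothetical callback returning the reward of the last action in `seq`, the first sensor in a sequence starts a fresh dwell period, and the threshold test `r_c < eta * (depth / horizon) * best_total` is only one plausible reading of the pruning condition, not a confirmed formula.

```python
def fast_search(num_sensors, horizon, dwell, step_reward, eta=0.5):
    """Depth-first search over sensor sequences with dwell-time and threshold pruning."""
    def allowed(seq, m):
        # Dwell rule: switching is allowed only after the current run reached `dwell`.
        if not seq or seq[-1] == m:
            return True
        run = 1
        while run < len(seq) and seq[-1 - run] == seq[-1]:
            run += 1
        return run >= dwell

    # Greedy pass: pick the best feasible sensor step by step to initialise R_max.
    best_seq, best_total = [], 0.0
    for _ in range(horizon):
        cands = [m for m in range(num_sensors) if allowed(best_seq, m)]
        m = max(cands, key=lambda c: step_reward(best_seq + [c]))
        best_total += step_reward(best_seq + [m])
        best_seq = best_seq + [m]

    stack = [([], 0.0)]                      # (partial sequence, cumulative reward)
    while stack:
        seq, total = stack.pop()
        for m in range(num_sensors):
            if not allowed(seq, m):
                continue                     # pruned by the dwell time
            child = seq + [m]
            r_c = total + step_reward(child)
            depth = len(child)
            if depth < horizon:
                if r_c < eta * (depth / horizon) * best_total:
                    continue                 # pruned by the (assumed) reward threshold
                stack.append((child, r_c))
            elif r_c > best_total:
                best_seq, best_total = child, r_c
    return best_seq, best_total
```

The threshold rule is a heuristic: a large `eta` prunes more aggressively and can discard a sequence that starts with a small reward but ends with a large one, so `eta` trades search time against the risk of missing the best sequence.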
Besides, there are four obstacles in the battlefield, and their locations are shown in Fig. 4. The sampling interval is $\delta t=1$ s and the simulation time is 100 s. The number of Monte Carlo experiments is 100. The initial position and velocity of the target are (0.5, 2.0) km and (30, 15) m/s, respectively. The target turns right during 25–40 s, turns left during 55–70 s, and maintains uniform motion at other times. The turn rate is $\omega=6$ rad/s. Three motion models are applied to the filter: the NCV model, the nearly left constant turn (NLCT) model, and the nearly right constant turn (NRCT) model. The initial distribution probability of the motion models is $\mu=[0.34, 0.33, 0.33]$ and the transition probability matrix is
Fig. 4 Diagram of battlefield situation
The variation of the target radial velocity relative to the different sensors is shown in Fig. 5. For every sensor, there are times when the radial velocity is less than the detection threshold. In other words, each of the four sensors is unable to detect the target at some time, and a single sensor cannot track the target steadily all the time. Thus, it is necessary to study the multi-sensor collaborative tracking method.
Fig. 5 Radial velocities of the target relative to different sensors
To clearly analyze the performance of the proposed Non-MSM, the fixed scheduling method (FSM), the nearest scheduling method (NSM), and the myopic scheduling method (MSM) [32,33] are used in comparative experiments.
In the FSM, the sensor does not switch during the target tracking process. Sensor 2 is used to track the target all the time because its detection line of sight is occluded less. In the NSM, the sensor closest to the target is selected to track the target. In the MSM, sensor selection depends on the one-step prediction reward. Unlike the MSM, the proposed Non-MSM selects the sensor according to the multi-step prediction reward, and the prediction time horizon $h$ is 3.
Fig. 6 shows the root mean square error (RMSE) of the target estimation position by different scheduling methods. The RMSE of the FSM increases greatly at the beginning and at around 80 s, and such a large tracking error will result in the loss of the target track. Because of the obstacles and the Doppler blindness, sensor 2 cannot detect the target at these times. Similarly, the RMSE of the NSM is still large at some times. The RMSE of the MSM is smaller than that of the FSM and the NSM. However, the myopic method selects the working sensor according to the one-step prediction reward, which cannot predict the DBZ in time. Therefore, the target tracking error of the MSM is larger than that of the Non-MSM at around 30 s and 75 s.
Fig. 6 RMSE of target estimation position
Compared with the other methods, the RMSE of the target estimation position by the Non-MSM is the smallest. The target tracking error is also stable and stays at a low level all the time. This is because the proposed method can be aware of the DBZ in time. In this way, the most suitable sensor is selected to track the target at each decision-making time.
The sensor scheduling schemes optimized by different methods are shown in Fig. 7. The sensor switching frequency of the Non-MSM is lower than that of the MSM. Over the 100 Monte Carlo experiments, the average number of sensor switches is 21.52 for the Non-MSM and 24.37 for the MSM. This results from the farsightedness of the proposed method, which can save energy and extend the working life of the multi-sensor system.
Fig. 7 Sensor selection schemes
It should be noted that $h$ is the key parameter in the proposed sensor scheduling method. To analyze the influence of $h$ on the target tracking accuracy, experiments with different values of $h$ are compared.
Fig. 8 shows the mean of RMSE versus the prediction time horizon $h$. It can be seen that when $h \le 4$, the target tracking error gradually decreases as $h$ increases.
Fig. 8 The mean of RMSE
However, when $h>5$, the tracking error increases with increasing $h$. This is because the sensor scheduling scheme is determined by multi-step prediction. For the maneuvering target, the prediction error of the system grows as $h$ increases: the longer $h$ is, the greater the prediction error is. Thus, when using the Non-MSM, the prediction time horizon should not be set too large. For the problem in this experiment, the best prediction time horizon is 3.
Meanwhile, to examine the effectiveness of the proposed FSA based on the decision tree optimization algorithm with pruning techniques, the depth-first search algorithm (DSA) is used for comparison, and the average search time of a single decision is shown in Fig. 9. The search time of the FSA is less than that of the DSA, and more running time is saved as the prediction time horizon increases.
Fig. 9 Average search time of single decision
To show the advantage of the proposed modified prediction method of the target state in the absence of measurements, a comparative experiment is carried out against the general prediction method. Under the fixed sensor scheduling method, when the target becomes Doppler-blind to the detection sensor, the target state is predicted by both the modified prediction method and the general prediction method.
Fig. 10 shows the RMSE of the target prediction position obtained by the modified prediction method and the general prediction method. Compared with the general prediction method, the modified prediction method reduces the prediction error of the target state at around 80 s. Combined with Fig. 6, the target becomes Doppler-blind to sensor 2 at around 80 s, which also corresponds to the variation of the RMSE of the target estimation position. It can be concluded that the prediction error can be reduced by the modified prediction method. In this way, more accurate information can be provided for the subsequent track association operation, and the risk of losing the target track is reduced.
Fig. 10 RMSE of target prediction position by FSM
This paper presents a non-myopic multi-sensor collaborative scheduling method for tracking the ground maneuvering target in the presence of the DBZ. The method considers not only the immediate reward of the sensor scheduling action but also the future expected reward over the prediction time horizon. Besides, an improved decision tree search algorithm based on branch pruning techniques is proposed to solve for the sensor scheduling action. Simulation results indicate that the method in this paper can select the most suitable sensor to track the target, and that its target tracking accuracy is better than that of existing methods. The improved decision tree search algorithm can solve the scheduling scheme quickly and can also be used to deal with other similar optimization problems. However, only one target is considered in this paper, and the proposed method should be extended to the multi-target tracking case in future research.
6. Simulations
6.1 Comparison of scheduling methods
6.2 Comparison of target state prediction methods
7. Conclusion and discussion
Journal of Systems Engineering and Electronics, 2020, Issue 4