1. College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, P.R. China;
2. Aviation Key Laboratory of Science and Technology on Fault Diagnosis and Health Management, Shanghai 201601, P.R. China
Abstract: The timely and accurate detection of abnormal aircraft trajectories is critical to improving flight safety. However, existing anomaly detection methods based on machine learning cannot characterize the features of aircraft trajectories well. Detection accuracy remains low because of the high dimensionality, heterogeneity and temporality of flight trajectory data. To this end, this paper proposes an abnormal trajectory detection method based on the deep mixture density network (DMDN) to detect flights with unusual data patterns and evaluate flight trajectory safety. The technique consists of two components: a deep long short-term memory (LSTM) network that encodes the features of flight trajectories effectively, and a Gaussian mixture model (GMM) that parameterizes the statistical properties of flight trajectories. Experimental results on the terminal airspace of Guangzhou Baiyun International Airport show that the proposed method can effectively capture the statistical patterns of aircraft trajectories. The model can detect abnormal flights with elevated risks, and its performance is superior to that of two mainstream methods. The proposed model can be used as a decision-support tool for air traffic controllers.
Key words: aircraft trajectory; anomaly detection; mixture density network; long short-term memory (LSTM); Gaussian mixture model (GMM)
The contradiction between limited airspace resources and the rapidly increasing demand for air traffic has become increasingly prominent, leading to more flight safety issues, especially in congested airspace. Timely detection of abnormal flight trajectories is vital to a prompt response by air traffic controllers (ATCs) and to the control of safety risk. According to statistics released by Boeing[1], 78% of fatal air-related accidents worldwide occurred in the terminal airspace, which covers the departure and arrival flight phases of the aircraft. Therefore, developing new anomaly detection technologies matters to the analysis of abnormal aircraft behavior in terminal airspace[2]. The main aim of this paper is to construct a trajectory anomaly detection method for terminal airspace.
Regarding trajectory anomaly detection, many methods based on movement trajectories have been proposed in the aviation domain[3-7]. In recent years, researchers have gained access to vast amounts of data via open or public sources such as ADS-B[8]. Advances in machine learning have accelerated research on automatically detecting unsafe flight patterns and other operational anomalies.
Clustering-based approaches have been extensively utilized in anomaly detection. Barratt et al.[9] used K-means to cluster the trajectories and constructed a probabilistic model based on a Gaussian mixture model. The clusters with low probability were regarded as anomalous trajectories. Guo et al.[10] presented an unsupervised clustering algorithm based on the information bottleneck (IB) principle and mutual information (MI). Kernel density estimation was used to model the shape of the motion trajectory, and the Shannon entropy was adopted to identify whether a test trajectory was anomalous. Gariel et al.[11] identified each trajectory’s turning points, clustered all turning points using density-based spatial clustering of applications with noise (DBSCAN), and then identified anomalous trajectories. These algorithms need to measure trajectory distances[12], such as the Euclidean distance, the dynamic time warping distance and the Hausdorff distance, and they demand more money, time and high-performance equipment when applied to large-scale datasets. In practice, it is challenging to obtain a good measure of trajectory distance. Besides, clustering-based approaches do not scale well to vast amounts of high-dimensional data. Therefore, they are not suitable for real-time anomaly detection.
Domain-based methods are one category of state-of-the-art methods for detecting significant events in aviation. Das et al.[13] applied multiple kernel anomaly detection to the approach phase. They correctly identified two anomalous situations, namely high-energy approaches and turbulent approaches. Puranik et al.[14] proposed a framework to identify abnormal trajectories based on a one-class support vector machine (OC-SVM) model that computes an anomaly score for each trajectory. The method was assessed on simulated flight data with abnormalities and on real flight data from the departure and arrival phases. The model showed excellent results in anomaly detection, but its computational cost was high, which made it unsuitable for large-scale datasets. To reduce the learning time on large datasets, extreme learning machines (ELM)[15] or the local outlier probability (LOP) method[16] may be a better choice.
Advances in neural networks have led to more advanced anomaly detection methods. In principle, deep-learning methods should outperform traditional machine learning methods[17], especially when finding anomalies in large-scale data. To better identify the operation mode, many scholars have used neural networks to learn trajectory characteristics. Many of these methods assume that anomalies lose information when projected onto a lower-dimensional space, resulting in a relatively poor reconstruction. Autoencoders (AE) have an excellent capability for nonlinear dimension reduction[18]. Olive et al.[19] combined a trajectory-clustering method, which obtains the main flows in the airspace, with autoencoding neural networks to perform anomaly detection on flown trajectories. The authors used a clustering method to recognize a group of clusters corresponding to sector flows, and then applied an AE within each cluster to detect abnormal trajectories. Aircraft trajectory information is time series data with high spatial-temporal correlation. The recurrent neural network (RNN) has been used to capture temporal and nonlinear dependencies in multivariate time series such as those in aircraft trajectory datasets[20]. The long short-term memory (LSTM) network is a better tool than the plain RNN because it overcomes the RNN’s inability to learn long-term patterns in sequential data[21]. Malhotra et al.[22] proposed an LSTM encoder-decoder model that was trained only on normal data. Trajectories with high reconstruction errors were classified as abnormal. Nevertheless, the network output gave only a limited description of the target variables’ properties rather than a sufficient analysis of abnormal trajectories. All of these methods aim to learn features from very large datasets. The main research direction is to improve both the models’ ability to learn trajectory features and the ability of their outputs to describe the target variables.
The key limitations of the studies mentioned above are: (1) their computational costs are high, and they are difficult to apply to large-scale datasets; (2) aircraft flight data have high spatial-temporal correlation, which makes learning their characteristic attributes challenging; (3) the limited description of the target variables’ properties does not provide sufficient analysis of abnormal trajectories. To address these problems, this paper builds a novel detection model based on a deep mixture density network (DMDN) with the following contributions: (1) a regularization model is utilized to handle the difficulties in trajectory reconstruction and obtain a smooth trajectory with less noise; (2) a deep LSTM is trained to capture the features of a large number of trajectories, taking into account attributes closely related to aircraft trajectory status, such as aircraft type, speed, and angle, thereby obtaining a richer representation of trajectories; (3) the model uses the Gaussian mixture model (GMM) to parameterize the statistical properties of the flight trajectory. The output of the model characterizes the trajectory data and can therefore be used to evaluate the risk level of a trajectory.
Our model first uses a regularization method to obtain high-quality trajectory sequences by time-aligning trajectories and extrapolating the short ones. The DMDN is then introduced to characterize the trajectories. Finally, the conditional probability density function of the trajectories is used to assess trajectory risk. Fig.1 shows the workflow of the trajectory anomaly detection method.
Fig.1 Flowchart of the trajectory anomaly detection method
A quality analysis of the flight data reveals two main problems in the original data. First, some trajectories have outliers (such as noise or null values) at a given time step. Second, the time gaps within a trajectory differ from point to point. The regularization method is a widely used framework in image reconstruction, denoising, data compression, etc. Trajectory reconstruction employs the regularization method, taking a low-quality trajectory sequence and producing a higher-quality trajectory. Here, we tackle both problems of the original trajectories in a unified model.
Define $T_{\mathrm{com}}$, in seconds, as the common length. The method extrapolates trajectories that are shorter than $T_{\mathrm{com}}$ and truncates trajectories that are longer than $T_{\mathrm{com}}$. An optimal value of $T_{\mathrm{com}}$ can be obtained through experiments on historical flight data.
The regularization method for aircraft trajectory reconstruction can be described as the following optimization problem
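A form of Eq.(1) consistent with the fit term, regularization term and trade-off parameter defined in the text is sketched here. It is reconstructed from those definitions, so the exact layout of the original formula is an assumption.

```latex
\[
\bar{P} = \arg\min_{P}\; \|DP - P_0\|_F^2 + \lambda\,\rho(P) \tag{1}
\]
```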
where $\bar{P}$ is the reconstructed trajectory; $P\in\mathbb{R}^{N\times E}$ the high-quality trajectory with $N$ equal time-gap points, whose $i$th row is the trajectory point with $E$-dimensional attributes at time $i$; the length $N$ the maximum of the original trajectory length and $T_{\mathrm{com}}$; $P_0\in\mathbb{R}^{H\times E}$ the corresponding observed trajectory with $H$ unequal time-gap points; $D\in\mathbb{R}^{N\times N}$ a sampling matrix; $\|DP-P_0\|_F^2$ represents a measure of fit; $\rho(\cdot)$ the regularization cost function, used to adjust the smoothness of the trajectory; $\lambda$ the regularization parameter used to control the trade-off between the fit term and the regularization term; and $\|\cdot\|_F$ the Frobenius norm of a matrix. Eq.(1) smooths the measured points and is especially useful for suppressing noise.
A useful regularization function should strike a good balance between removing noise and preserving the trajectory’s local changing tendency. There are many formulations of the regularization term[23]. For simplicity, this paper adopts the Tikhonov cost function, one of the most widely referenced regularization cost functions in signal processing, as the regularization term, defined as
If the discrete form of the Laplacian operator is approximated by
where $\Gamma$ is the Laplacian matrix. Then the Laplacian matrix looks like
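One common discretization that matches this description is sketched here. The squared-norm form of the Tikhonov term, the second-difference stencil $[1,-2,1]$, and the omission of special boundary rows in $\Gamma$ are assumptions, since the original formulas are not recoverable from the text.

```latex
\[
\rho(P) = \|\nabla^2 P\|_F^2 \tag{2}
\]
\[
\nabla^2 P \approx \Gamma P, \qquad (\Gamma P)_i = P_{i-1} - 2P_i + P_{i+1} \tag{3}
\]
\[
\Gamma =
\begin{bmatrix}
1 & -2 & 1 & & & \\
 & 1 & -2 & 1 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & 1 & -2 & 1
\end{bmatrix} \tag{4}
\]
```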
The intuition behind this Tikhonov regularization term is to enforce the local spatial smoothness of the trajectory sequence. Because noisy points contain high-frequency energy, they are suppressed in the regularization process, and the reconstructed trajectory will not include outliers.
Substituting the regularization term in Eq.(2) into Eq.(1), the trajectory reconstruction problem can be transformed into minimizing the following objective function
The objective function Eq.(5) is a convex quadratic function and can be solved analytically. A solution $P$ minimizes $f$ if and only if
The solution of the convex quadratic problem is
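Under the assumption that the Tikhonov term takes the form $\|\Gamma P\|_F^2$, the objective, its first-order optimality condition and the resulting closed-form solution can be sketched as follows. The derivation is the standard one for regularized least squares and is offered as a reconstruction rather than a verbatim copy of the original equations.

```latex
\[
f(P) = \|DP - P_0\|_F^2 + \lambda\,\|\Gamma P\|_F^2 \tag{5}
\]
\[
\frac{\partial f}{\partial P} = 2D^{\mathsf{T}}(DP - P_0) + 2\lambda\,\Gamma^{\mathsf{T}}\Gamma P = 0 \tag{6}
\]
\[
\bar{P} = \left(D^{\mathsf{T}}D + \lambda\,\Gamma^{\mathsf{T}}\Gamma\right)^{-1} D^{\mathsf{T}} P_0 \tag{7}
\]
```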
The reconstructed trajectory data have the following properties: (1) the time intervals between any two consecutive points are the same; (2) the time resolution is higher than that of the raw trajectory; (3) the reconstructed trajectory is smoother than the original one. Fig.2 shows the results of trajectory reconstruction.
Fig.2 Trajectory reconstruction effect
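The reconstruction step can be illustrated with a minimal NumPy sketch of Tikhonov-regularized least squares. Several choices here are assumptions and not taken from the paper: the function names `second_difference_matrix` and `reconstruct`, a one-second grid, a sampling matrix of shape H×N that simply selects the observed time steps (the text states N×N; the H×N form is used so that the product matches the observed trajectory), a second-difference matrix without boundary rows, and the default value of `lam`.

```python
import numpy as np


def second_difference_matrix(n):
    """Return the (n-2) x n matrix whose rows apply the stencil [1, -2, 1]."""
    gamma = np.zeros((n - 2, n))
    for i in range(n - 2):
        gamma[i, i:i + 3] = [1.0, -2.0, 1.0]
    return gamma


def reconstruct(obs_steps, obs_points, n_steps, lam=10.0):
    """Rebuild an evenly sampled trajectory from unevenly sampled points.

    obs_steps  : integer grid indices (seconds) of the H observations
    obs_points : (H, E) array of observed attributes
    n_steps    : number N of equally spaced output points
    lam        : regularization weight (hypothetical default)
    """
    obs_steps = np.asarray(obs_steps)
    obs_points = np.asarray(obs_points, dtype=float)
    h = len(obs_steps)

    # Sampling matrix: row k picks the grid point observed by sample k.
    d = np.zeros((h, n_steps))
    d[np.arange(h), obs_steps] = 1.0

    gamma = second_difference_matrix(n_steps)

    # Normal equations of ||D P - P0||_F^2 + lam * ||Gamma P||_F^2.
    lhs = d.T @ d + lam * gamma.T @ gamma
    rhs = d.T @ obs_points
    return np.linalg.solve(lhs, rhs)
```

With at least two distinct observed time steps the system matrix is positive definite, so the solve is well defined. Observations that lie exactly on a straight line are reproduced without distortion, because a linear trajectory has zero second difference.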
The mixture density network (MDN) proposed by Bishop[24] is a neural network that maps an input feature to a set of mixture density model parameters. MDN performs well in speech synthesis and handwriting synthesis because it fits time-varying features well. Unlike conventional networks, MDN does not directly output a point estimate of the target data, but the parameters of a multimodal mixture of distributions. It can therefore better express the trajectory’s statistical properties.
MDN is a combination of a neural network and a mixture density model, where the outputs of the neural network are the parameters of the mixture density model. If we learn the trajectory features through a neural network with strong nonlinear mapping capability and then parameterize the data through the mixture density model, we can fit the trajectory data statistics well and determine whether a flight trajectory has abnormal characteristics.
In previous studies, shallow neural networks, such as the back propagation (BP) network, were utilized to learn features, but they could not fit complex time series data. Since aircraft trajectory information is time series data with high spatial-temporal correlation, this paper chooses the LSTM network to learn the multidimensional, coupled, nonlinear characteristics of aircraft trajectories. LSTM has proved useful for learning features of time-sequential data. Among the various probability density distributions, GMM is chosen to parameterize the network output because of the Gaussian distribution’s good computational properties.
The output of the LSTM network determines the parameters of GMM. Therefore, DMDN represents the conditional probability density function of the trajectories conditioned on the input vector of the LSTM network. DMDN can model the underlying generator of flight trajectory data and identify latent risks in daily operations. Thus, we can use the results to identify abnormal trajectories.
(1) LSTM network
LSTM is a variant of RNN composed of a cell and three gates, namely an input gate, a forget gate, and an output gate. LSTM can maintain a longer memory because it decides which information to retain and which to forget. It considers the relationship between input information at past moments and at the current moment, so it is suitable for modeling long time series. Fig.3 shows an LSTM network with $J$ hidden layers that is trained to capture the characteristics of the input data. The internal parameters of the LSTM network are updated mainly through the transfer of the internal cell state and the hidden state across time steps. The calculation formulas are as follows
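In the standard LSTM formulation, and using the gate, state and weight symbols listed in the text, the updates can be written as below. This is a sketch that assumes the conventional gate equations; the original layout is not recoverable from the text.

```latex
\[
\begin{aligned}
f_t &= \sigma\!\left(W_{xf}\,x_t + W_{sf}\,S_{t-1} + b_f\right)\\
i_t &= \sigma\!\left(W_{xi}\,x_t + W_{si}\,S_{t-1} + b_i\right)\\
o_t &= \sigma\!\left(W_{xo}\,x_t + W_{so}\,S_{t-1} + b_o\right)\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tanh\!\left(W_{xc}\,x_t + W_{sc}\,S_{t-1} + b_c\right)\\
S_t &= o_t \odot \tanh\!\left(C_t\right)
\end{aligned} \tag{8}
\]
```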
Fig.3 Deep mixture density network model
where $x_t$ is the input data at time $t$; $\sigma(\cdot)$ the sigmoid activation function; $\tanh(\cdot)$ the hyperbolic tangent activation function; $\odot$ the Hadamard product; $f_t$ the control unit of the “forget gate”; $i_t$ the control unit of the “input gate”; $o_t$ the control unit of the “output gate”; $C_t$ the cell state at time $t$; $S_t$ the hidden state at time $t$; and $W_{xf}$, $W_{sf}$, $b_f$, $W_{xi}$, $W_{si}$, $b_i$, $W_{xo}$, $W_{so}$, $b_o$, $W_{sc}$, $W_{xc}$, $b_c$ the weight parameters of the network.
(2) GMM
GMM can be used for clustering and density estimation. We can identify frequent flight operations and perform statistical inference on the underlying distributions, which represents the degree of abnormality of flight trajectories. The output of GMM is a parametric probability density function, represented as a weighted sum of Gaussian component densities. The outputs of the LSTM network are used as the inputs of GMM. Let $Z$ be the output tensor of the MDN; it is partitioned into three subsets that correspond to the inputs of the GMM means, variances and weights, respectively
where $K$ is the number of Gaussians. These parameters must satisfy the following restrictions
(1) The mixing coefficients $\alpha_i$ should satisfy $\sum_{i}\alpha_i(x)=1$. The mixing coefficients are estimated using the softmax activation function
where the value of $\alpha_i$ lies in the range (0,1).
(2) The scale parameters $\sigma_i$ should satisfy the constraint $\sigma_i(x)>0$. They are defined as
(3) The mean parameters $\mu_i$ represent location parameters. They should depend directly on the network outputs and are defined as
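The partition of the network output and the three parameter mappings described above can be written compactly as follows. The sub-vector names $z^{\mu}$, $z^{\sigma}$ and $z^{\alpha}$ are hypothetical labels introduced here, and the exponential mapping for the scale follows Bishop’s original MDN formulation, which is assumed to be the one used.

```latex
\[
Z = \left[\,z^{\mu}_1,\dots,z^{\mu}_K,\; z^{\sigma}_1,\dots,z^{\sigma}_K,\; z^{\alpha}_1,\dots,z^{\alpha}_K\,\right] \tag{9}
\]
\[
\alpha_i = \frac{\exp\!\left(z^{\alpha}_i\right)}{\sum_{j=1}^{K}\exp\!\left(z^{\alpha}_j\right)} \tag{10}
\]
\[
\sigma_i = \exp\!\left(z^{\sigma}_i\right) \tag{11}
\]
\[
\mu_i = z^{\mu}_i \tag{12}
\]
```

The softmax guarantees that the weights are positive and sum to one, and the exponential guarantees a strictly positive scale, so both restrictions hold for any real-valued network output.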
The output of the MDN model is the conditional density function of the target variable given the input variable, in the form
where $x$ is the input variable; $y$ the target variable; $M$ the number of components in the model; $\alpha_i(x)$ the mixing coefficient; and $N_i(\cdot)$ the kernel function.
The Gaussian function is chosen as the kernel function
where $\mu_i(x)$ and $\sigma_i^2(x)$ are the mean and the variance of the $i$th Gaussian, respectively; and $c$ is the dimension of the target variable. We assume that the components of the output vector are mutually independent.
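With these symbols, the mixture density and an isotropic Gaussian kernel take the following form in Bishop’s MDN formulation. The isotropic kernel (one variance shared by all $c$ output dimensions of a component) is an assumption that is consistent with the independence statement above.

```latex
\[
p(y \mid x) = \sum_{i=1}^{M} \alpha_i(x)\, N_i(y \mid x) \tag{13}
\]
\[
N_i(y \mid x) = \frac{1}{(2\pi)^{c/2}\,\sigma_i(x)^{c}}
\exp\!\left(-\frac{\|y - \mu_i(x)\|^2}{2\,\sigma_i(x)^2}\right) \tag{14}
\]
```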
To make the output as close to the target variable as possible, we define the loss function as the negative logarithm of the likelihood, which is to be minimized
The parameters of the MDN are optimized by minimizing the negative log-likelihood in Eq.(15). We choose the Adam optimizer, an extension of the gradient descent method that improves on traditional gradient descent by introducing first-moment and second-moment estimates of the gradient. If the mixing coefficients and the Gaussian parameters (means and variances) are chosen correctly, the MDN can approximate any given density function to arbitrary accuracy. Thus, we estimate the distribution of aircraft trajectories using the optimized MDN.
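The loss for a single sample can be sketched in NumPy as follows. The function names `mdn_params` and `mdn_nll`, the ordering of the raw output vector (means, then scales, then weights), and the isotropic kernel are assumptions made for illustration. The log-sum-exp form is a standard numerical safeguard and is not described in the paper.

```python
import numpy as np


def mdn_params(z, n_components, target_dim):
    """Map a raw output vector z to mixture means, scales and weights."""
    m, c = n_components, target_dim
    z = np.asarray(z, dtype=float)
    z_mu, z_sigma, z_alpha = z[:m * c], z[m * c:m * c + m], z[m * c + m:]

    mu = z_mu.reshape(m, c)                  # means used directly
    sigma = np.exp(z_sigma)                  # strictly positive scales
    shifted = z_alpha - z_alpha.max()        # stabilise the softmax
    alpha = np.exp(shifted) / np.exp(shifted).sum()
    return mu, sigma, alpha


def mdn_nll(z, y, n_components):
    """Negative log-likelihood of target y under the mixture defined by z."""
    y = np.asarray(y, dtype=float)
    c = y.shape[0]
    mu, sigma, alpha = mdn_params(z, n_components, c)

    sq_dist = ((y - mu) ** 2).sum(axis=1)
    log_kernel = (-0.5 * c * np.log(2.0 * np.pi)
                  - c * np.log(sigma)
                  - sq_dist / (2.0 * sigma ** 2))
    log_terms = np.log(alpha) + log_kernel

    # log-sum-exp over components avoids underflow for unlikely targets.
    top = log_terms.max()
    return float(-(top + np.log(np.exp(log_terms - top).sum())))
```

In training, this quantity would be summed over all samples and time steps and minimized with Adam in an automatic-differentiation framework; the NumPy version only shows the arithmetic of the loss.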
“Anomaly” means unusual: an anomalous trajectory reflects flight operations that differ from those of normal trajectories, and its probability of occurrence is much lower than that of normal ones. The detection of anomalous trajectories is of interest to the aviation domain and can serve as a valuable reference for airspace safety analysis. For instance, some flight operations that may lead to safety incidents, such as go-around operations, conflict resolution activities, and air traffic management operations, require increased safety precautions. Fig.4 shows that the abnormal trajectories involve flight operations different from those of normal trajectories, leading to more uncertainty and increased safety risk. This paper aims to determine a risk threshold that distinguishes abnormal trajectories from normal ones. Since the boundary between normal and abnormal is vague, we adopt a data-driven approach to find the threshold.
The threshold is obtained by training on the normal flight dataset. A trajectory is detected as abnormal if its probability of being normal is lower than the threshold. Mathematically, this probability is the sum, over the Gaussian components, of the probability density of the trajectory under each component, weighted by the probability of that component being the appropriate one, in the form
where $x_f$ is the vector form of trajectory $f$, and $i$ indexes the Gaussian components.
In practice, air traffic controllers (ATCs) can adjust the threshold according to their preferences and needs. This value determines the sensitivity of the detection system and can be set based on the distribution of all trajectories in the dataset. Therefore, the data-driven model is suitable for anomaly detection in different terminal areas.
If the probability of a trajectory is lower than the threshold, an ATC checks additional information about the trajectory, such as procedure standards and weather information, to further determine its risk. The detection threshold can be adjusted according to the actual situation. Therefore, this model can serve well as a decision-support tool for air traffic control.
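The thresholding logic can be sketched as follows. The weighted-sum density follows the description above; the names `gmm_density`, `fit_threshold` and `flag_anomalies`, the isotropic Gaussian components, and the use of a low quantile of the densities of normal trajectories as the threshold are assumptions, since the text does not specify how the threshold is derived from the training data.

```python
import numpy as np


def gmm_density(x, alpha, mu, sigma):
    """Weighted sum of isotropic Gaussian densities evaluated at vector x."""
    x = np.asarray(x, dtype=float)
    c = x.shape[0]
    sq_dist = ((x - mu) ** 2).sum(axis=1)
    norm = (2.0 * np.pi) ** (c / 2.0) * sigma ** c
    return float((alpha * np.exp(-sq_dist / (2.0 * sigma ** 2)) / norm).sum())


def fit_threshold(normal_scores, quantile=0.02):
    """Pick the threshold as a low quantile of scores of normal trajectories.

    The quantile is a hypothetical sensitivity knob that a controller
    could raise or lower.
    """
    return float(np.quantile(normal_scores, quantile))


def flag_anomalies(scores, threshold):
    """Return True for every trajectory whose density is below the threshold."""
    return np.asarray(scores) < threshold
```

Raising `quantile` flags more trajectories for review, and lowering it reduces the number of alerts, which mirrors the adjustable sensitivity described above.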
Our method is evaluated on departure and arrival flight datasets at Guangzhou Baiyun International Airport (GBIA).
The dataset used in this paper contains aircraft monitoring data of flights at GBIA in China from November 2018 to December 2018, including 60 000 trajectories with 32 000 departure trajectories and 28 000 arrival ones. Each data item consists of ten attributes: flight ID, latitude, longitude, altitude, airspeed, course, departure location, arrival location, aircraft type, and monitoring time. The center of the runways at GBIA is expressed in earth-centered, earth-fixed coordinates and serves as the origin, and the original measurements are transformed into east-north-up coordinates. We only keep the position measurements that are less than 50 km in the “east” and “north” dimensions and less than 2 000 m in the “up” dimension, which is the critical area for takeoff and final approach.
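The coordinate transformation and spatial filter can be sketched with the standard WGS-84 conversion from geodetic coordinates to earth-centered, earth-fixed (ECEF) coordinates and then to a local east-north-up (ENU) frame. The function names and the choice of the WGS-84 ellipsoid are assumptions; the paper does not state which datum or software was used.

```python
import numpy as np

A = 6378137.0              # WGS-84 semi-major axis in metres
F = 1.0 / 298.257223563    # WGS-84 flattening
E2 = F * (2.0 - F)         # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert latitude, longitude (degrees) and height (m) to ECEF metres."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    n = A / np.sqrt(1.0 - E2 * np.sin(lat) ** 2)  # prime vertical radius
    x = (n + alt_m) * np.cos(lat) * np.cos(lon)
    y = (n + alt_m) * np.cos(lat) * np.sin(lon)
    z = (n * (1.0 - E2) + alt_m) * np.sin(lat)
    return np.stack([x, y, z], axis=-1)


def ecef_to_enu(ecef, ref_lat_deg, ref_lon_deg, ref_alt_m=0.0):
    """Rotate ECEF points into the ENU frame centred on the reference point."""
    lat, lon = np.radians(ref_lat_deg), np.radians(ref_lon_deg)
    ref = geodetic_to_ecef(ref_lat_deg, ref_lon_deg, ref_alt_m)
    rot = np.array([
        [-np.sin(lon), np.cos(lon), 0.0],
        [-np.sin(lat) * np.cos(lon), -np.sin(lat) * np.sin(lon), np.cos(lat)],
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)],
    ])
    return (ecef - ref) @ rot.T


def keep_terminal_points(enu, horiz_m=50_000.0, up_m=2_000.0):
    """Keep points within 50 km east/north and below 2 000 m up."""
    mask = ((np.abs(enu[:, 0]) < horiz_m)
            & (np.abs(enu[:, 1]) < horiz_m)
            & (enu[:, 2] < up_m))
    return enu[mask]
```

The reference latitude, longitude and height passed to `ecef_to_enu` would be those of the runway center; any concrete values used to try the sketch are illustrative only.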
To train the anomaly detection model, this study transforms the raw flight data into vectors in the form of multivariate time series. Each trajectory is represented by an ordered series of time, position measurements, airspeed, course, and flight information (departure location, arrival location, aircraft type). Each trajectory $\tau$ is a sequence of such records, where the time $t_i\in\mathbb{Z}^+$, the position measurements are expressed in the east-north-up coordinates described above, the airspeed $v\in\mathbb{R}$, the course $\theta\in\mathbb{R}$, and the flight information $D$ is a three-dimensional discrete random variable. The dataset is divided into a training set, a validation set and a test set in the ratio 8∶1∶1. The test dataset, which contains 56 abnormal departure trajectories and 72 abnormal arrival trajectories, is used to check the model’s quality.
The MDN model is run on a high-performance computer with 16 i7 CPUs and 128 GB RAM.
We carry out the experiment on the validation dataset, which contains 66 abnormal trajectories in the departure subset and 53 abnormal trajectories in the arrival subset. The abnormal rate of the dataset is about 2%, so the two classes (normal and abnormal) are disproportionately represented, a typical imbalanced dataset in binary classification. Thus, the area under the curve (AUC) is a better index for assessing the model’s performance, since it is not sensitive to class balance.
The value of $T_{\mathrm{com}}$ has considerable influence on the model. For instance, a large $T_{\mathrm{com}}$ results in less accurate trajectories, because more trajectories are too short and must be extrapolated. Fig.5 shows the histogram of the original arrival trajectory lengths. We select $T_{\mathrm{com}}$ as the median of the trajectory lengths at GBIA: 190 s for departures and 550 s for arrivals.
Fig.5 Histogram of raw arrival trajectory
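Choosing the common length and aligning trajectories to it can be sketched as follows. The function names `common_length` and `align_length`, a one-second sampling grid, and linear extrapolation from the last two points are assumptions; the paper does not specify the extrapolation rule.

```python
import numpy as np


def common_length(lengths_s):
    """Median trajectory length in seconds, used as the common length."""
    return int(np.median(lengths_s))


def align_length(traj, t_com):
    """Truncate or extrapolate a (n, e) trajectory to exactly t_com points."""
    traj = np.asarray(traj, dtype=float)
    n = traj.shape[0]
    if n >= t_com:
        return traj[:t_com]

    # Continue the last observed step linearly (hypothetical rule).
    step = traj[-1] - traj[-2] if n > 1 else np.zeros(traj.shape[1])
    extra = traj[-1] + step * np.arange(1, t_com - n + 1)[:, None]
    return np.vstack([traj, extra])
```

With the median as the common length, roughly half of the trajectories are truncated and half are extended, which limits the amount of extrapolated data.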
The DMDN model has an input layer, the LSTM layer, the GMM layer, and an output layer. The input data contain seven-dimensional features. The LSTM layer has three hidden layers with 128 nodes each. Fig.6 shows the sensitivity analysis of the number of mixture components $M$ based on AUC. $M$ is set to 15 for departures and 20 for arrivals. The output dimension of the MDN is set to three times $M$. The model is trained using the Adam optimizer with an initial learning rate of 0.006.
Fig.6 Sensitivity analysis of the number of mixture components based on AUC
The threshold of anomaly detection is associated with the sensitivity of the detection model. We use the receiver operating characteristic (ROC) curve to obtain the optimal threshold, since it makes it easy to examine the effect of any threshold on the classifier’s generalization performance and is not affected by the distribution of positive and negative samples. Fig.7 presents the ROC curve of our model on the arrival validation dataset. On the ROC curve, the point closest to the top-left corner of the plot gives the best threshold, with a high true positive rate and a low false positive rate.
Fig.7 ROC curve of the proposed model in the arrival validation dataset
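Selecting the threshold from the ROC curve can be sketched as follows. The function names `roc_points` and `best_threshold` are hypothetical, and the sketch assumes an anomaly score for which larger values mean more abnormal (for example, the negative log-density of a trajectory) and labels in which 1 marks an abnormal trajectory.

```python
import numpy as np


def roc_points(scores, labels):
    """False and true positive rates for every distinct score threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    thresholds = np.unique(scores)[::-1]      # from strictest to loosest
    n_pos = (labels == 1).sum()
    n_neg = (labels == 0).sum()
    tpr = np.array([((scores >= t) & (labels == 1)).sum() / n_pos
                    for t in thresholds])
    fpr = np.array([((scores >= t) & (labels == 0)).sum() / n_neg
                    for t in thresholds])
    return fpr, tpr, thresholds


def best_threshold(scores, labels):
    """Threshold whose ROC point is closest to the corner (FPR=0, TPR=1)."""
    fpr, tpr, thresholds = roc_points(scores, labels)
    dist = np.hypot(fpr, 1.0 - tpr)
    return float(thresholds[np.argmin(dist)])
```

Distance to the top-left corner is one common selection rule. A controller who values recall more could instead pick the loosest threshold that keeps the false positive rate under an acceptable bound.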
To assess the performance of the detection method, we use four classical criteria: precision, recall, F-score, and AUC. The definitions are as follows
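In their standard form, with TP, FP and FN denoting the numbers of true positives, false positives and false negatives (abnormal trajectories being the positive class), the first three criteria read as follows. These are the textbook definitions and are assumed to match the ones used in the paper.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}
\]
\[
F_{\beta} = \frac{(1+\beta^2)\,\text{Precision}\cdot\text{Recall}}
{\beta^2\,\text{Precision} + \text{Recall}}
\]
```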
“Precision” represents the probability that a detected anomaly is indeed an anomaly. “Recall” represents the percentage of all real anomalies that are detected. Since abnormal aircraft behavior could endanger human safety, as many anomalies as possible should be detected. Therefore, recall matters more than precision in our task, and $\beta$ is set to 2, which gives recall more weight than precision. The AUC concept is described in Section 2.2. The closer the AUC is to 1, the better the performance of the model.
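The four criteria can be computed with a short NumPy sketch. The function names are hypothetical, the F-score uses the standard weighted form with β = 2 as the default, and the AUC is computed through its rank interpretation: the probability that a randomly chosen abnormal trajectory receives a higher anomaly score than a randomly chosen normal one, with ties counted as one half.

```python
import numpy as np


def precision_recall_fbeta(y_true, y_pred, beta=2.0):
    """Precision, recall and F-beta with the abnormal class as positive."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    fbeta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return float(precision), float(recall), float(fbeta)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))
```

The pairwise AUC costs time proportional to the product of the two class sizes, which is acceptable for a validation set with about 2% anomalies but would be replaced by a sort-based computation for much larger sets.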
Fig.8 shows the results of 20 flight modes of arrival trajectories. The arrival aircraft follow the instrument landing system (ILS), which supports the validity of our data. It can be seen intuitively in Fig.9 that abnormal trajectories differ markedly from most normal trajectories. Some of the anomalous trajectories do not deviate much from the normal ones overall, whereas in the lateral profile they deviate significantly from the other trajectories in the same operation.
Fig.8 Flight modes of arrival trajectories
Fig.9 Visualization of abnormal arrival trajectories in a flight mode
The proposed DMDN detection method is compared with a simple MDN that employs a multi-layer perceptron as the neural network structure[9]. Fig.10 shows the experimental results. Our method achieves the best results on all four assessment criteria for both departure and arrival trajectories. The difference between DMDN and MDN lies in the structure of the neural network layer. DMDN performs better than MDN because of the excellent trajectory feature coding ability of LSTM. In addition, the learning probabilistic (LP) method uses a classical clustering algorithm, such as K-means, and is limited in its ability to characterize the input features.
Comparing Fig.10(a) and Fig.10(b), we find that the departure trajectory detection model performs better than the arrival one. A departure flight typically climbs out of the airport terminal area following a prescribed procedure, so its flight patterns are relatively simple and easier to mine from the data. An arrival flight, however, stays in the terminal area longer, and its trajectory can change more randomly. Arrival flights may also be subject to air traffic control measures such as queuing for approach. It is therefore more challenging for the model to capture their trajectory characteristics.
Fig.10 Anomaly detection results of the three methods
In a nutshell, our trajectory anomaly detection method performs best among the compared methods and captures the statistical features of aircraft trajectories well. In addition, our model trains quickly, is compact, and is suitable for large-scale datasets.
This paper aims to build a data-driven method that improves the detection of abnormal aircraft behavior in order to guarantee flight safety. Unlike other data-driven methods, the proposed method considers not only the general position and velocity information of aircraft but also multidimensional features such as angle and aircraft type, which indirectly represent the flight intention and performance of the aircraft. This helps the model extract trajectory characteristics more effectively. Before this study, it was difficult to reconstruct trajectories accurately from flight data, whereas our model handles the difficulties in trajectory reconstruction and obtains a smooth trajectory with less noise. Our model is also applicable to trajectory anomaly detection for other vehicles. It trains quickly, is compact, and is suitable for large-scale datasets.
Several improvements to the proposed method will be pursued in future work. More aircraft attributes will be considered, such as special positions and ATC factors. Also, because terminal areas differ from one another, the threshold training method should be adapted so that the threshold can change in real time. We will use a powerful visual analytics technique to visualize the interface of the anomaly detection system and strengthen all the steps of trajectory analysis.
Transactions of Nanjing University of Aeronautics and Astronautics, 2021, Issue 5