SUN Jianbin,LI Jichao,YOU Yaqian,JIANG Jiang,and Ge Bingfeng
College of Systems Engineering,National University of Defense Technology,Changsha 410073,China
Abstract: Link prediction of combat networks is of significant military value for precisely identifying the vital infrastructure of the enemy target and optimizing the operational plan of our side.Due to the profound uncertainty in the battleground circumstances,the acquired topological information of the opponent combat network always presents sparse characteristics.To solve this problem,a novel approach named network embedding based combat network link prediction (NECLP) is put forward to predict missing links of sparse combat networks.First,node embedding techniques are presented to preserve as much information of the combat network as possible using a low-dimensional space.Then,we put forward a solution algorithm to predict links between combat networks based on node embedding similarity.Last,massive experiments are carried out on a real-world combat network case to verify the validity and practicality of the proposed NECLP.This paper compares six baseline methods,and experimental results show that the NECLP has outstanding performance and substantially outperforms the baseline methods.
Keywords:link prediction,node embedding,combat networks,sparse feature.
The rapid advancement in information technology and intelligence has stimulated the warfare transmutation from platform-based countermeasures to the network-centric warfare (NCW) [1?3].As one of the most superior operational concepts,NCW integrates all the well-informed geographically scattered forces together through robust computer networking,transforming the information advantage into the competitive advantage [4,5].Hence,all knowledgeable entities in the battleground consist of a combat network and cooperate with each other via information flow [6,7].
Information superiority empowered by sensors accelerates the rhythm of decision-making [8].The two-sides at war are always disguising their own network structure by performing electromagnetic interference or hiding essential infrastructure to prevent hostile forces from scouting,thus promoting its own survivability [9].In the battleground,it is almost impossible to acquire perfect intelligence of the opponent combat network.Among the acquired intelligence,therefore,it is common to have missing information,or in other words,missing links or nodes of the combat network owing to the uncertainty and complexity of the battleground.It would be beneficial in both identifying and destroying the vital infrastructure of the opponent side and optimizing the operational plan of our side if we could predict and recover the missing network topology in advance [10].Hence,this would surely advance the effectiveness and efficiency of joint operations and improve the chance of victory.Missing nodes identification and missing links prediction are two significant yet different academic problems.In this regard,this paper mainly focuses on resolving the link prediction problem for combat networks,i.e.,detecting the missing links according to the observed network including all the nodes information and making the improved combat network more in line with the real one.
Link prediction has been a hot issue of network science and various measures have been put forward [11?15].The simplest and the most classic framework for link prediction is based on the structural similarity of the network [16].According to the retrieved information from the network,the similarity-based measures could be further categorized into three divisions:the local indices,global indices,and quasi-local indices [11,17].The local indices,such as Salton index,Sorensen index,common neighbors,Adamic-Adar index,Jaccard’s coefficient,hub promoted index,and resource allocation index,are usually based on the local common neighbor information.The global indices take global topological information and consider the ensemble of all paths.These link prediction measures including Katz index,average commute time,and random walk with restart [18].The third quasilocal index is a tradeoff of local and global structural network information,such as local random walk,superposed random walk,and local path index [19].Except for the similarity index,probabilistic models and the maximum likelihood algorithms are also proven to be powerful measures for link prediction [20?22].Based on the observed information of the network,the probabilistic models try to learn a model and then predict the non-observed links according to the learned probabilistic model.Similarly,the maximum likelihood algorithms work through learning some specific parameters to maximize the probability of the observed network structure and then predict the existing probability of the missing link.
However,almost all literature focused on networks in biological or social fields [23],and little has been involved in the military area.Due to the increasingly complex battleground environment,it is almost impossible to collect adequate information about the opponent combat network.Consequently,the acquired topological information of the opponent combat network is extremely imperfect and it maintains sparse characteristics [24?27].Yet,a presumption of most conventional link prediction algorithms relies on the availability of adequate observed information.Thus,the classical link prediction methods may be weak and deal with combat networks ineffectively with extremely sparse features [28].
Being one of the most powerful representation learning methods,node embedding,aiming to map the network structure into a low-dimensional space,is confirmed to be a very practical way of solving link prediction problems of a complex network with the sparse feature [29?31].Numerous node embedding methods such as DeepWalk [32],node2vec [33],and large-scale information network embedding (LINE) [34]have been proposed.Taking advantage of the deep neural network frame,these methods can discover the latent knowledge hidden in the network and preserve as much linkage information as possible by non-linearly transforming the observed network structure into a lower dimension one [35,36].Therefore,this paper spreads the advantages of node embedding techniques and presents an integrated framework called network embedding based combat network link prediction (NECLP) to solve the link prediction problems of a combat network.The contributions of the study are summarized as follows.
(i) We present a procedure of learning node embedding to map the structural information of a combat network into a low-dimensional space.The learned node embeddings can precisely capture the latent information underneath the network structural topology for link prediction.
(ii) Based on the node embedding,a solution algorithm is put forward to resolve the link prediction model for combat networks,which can better address the problem of sparsity for the combat networks.
(iii) Extensive experiments on a case study of real combat networks are carried out to verify the effectiveness and practicability of NECLP.Compared with six classic baseline measures,the proposed NECLP can achieve better link prediction results for combat networks.
The remainder of the paper is designed as follows:The problem formulation of combat network link prediction is described in Section 2.In Section 3,a detailed introduction to the unified methodology framework,NWCLP,is demonstrated.To test the feasibility of NWCLP for combat networks link prediction,experiments on an empirical case are conducted and the results are analyzed in Section 4.Finally,the conclusions and future work are discussed in Section 5.
According to the difference in prediction tasks,link prediction can be classified into three categories:missing link detection,future link prediction,and spurious link identification.Missing link detection attempts to predict the already existent yet unknown links;future link prediction solves the problem of predicting the likelihood of two nodes connecting in the future;the spurious link identification works to identify the noisy links,i.e.,nonexistent yet observed links.In this study,we concentrate on the first circumstance,i.e.,predicting the missing links of a combat network.
The combat network link prediction problem is formulated as follows:for a combat networkG=(V,E),whereVandEdenote the sets of nodes and edges respectively.We assume that all node information of the combat network can be observed,whereas some links are missing.The observed combat network is denoted asGO=(V,EO),whereEOis the set of observed links.LetEMdenote the missing links to be predicted.For a pair of nodes〈u,v〉∈EM,the goal of the link prediction task is to predict the likelihood of the link between nodeuand nodev.The easiest framework for link prediction is based on the similarity score calculated with an algorithm.Each nonobserved link pair 〈u,v〉 are assigned to a similarity score Similarity(u,v),which is calculated according to a predefined similarity function.Then the node pairs are sorted in accordance with the similarity values from the largest to the smallest,the bigger the value of the similarity score,the higher the possibility of the link in the nodes.
A unified framework is introduced in this section to resolve the combat network link prediction problem.Node embedding techniques are first introduced.Then,the node embedding similarity is calculated.Finally,a solution algorithm is put forward to predict missing links in combat networks taking advantage of node embedding.
Representation learning can be regarded as a way of using machine learning methods to infer data representation.Different from one-hot representation,distributed representation uses dense vectors to represent data points.As one of the distributed representation learning methods,node embedding tries to map information entities into a low-dimensional space.For a networkG=(V,E),the goal of node embedding is to encode each nodev∈Vwith a vectorZv∈Rdso that similarity in the embedding space (e.g.,dot product) approximates similarity in the original network.
The procedure of learning node embedding is shown in Fig.1,which consists of three steps.First,define an encoder,which represents a mapping from nodes to embedding.Next,define a node similarity function,which is a measure of similarity in the original network;a loss function is also established based on the node similarity.Then,optimize the parameters of the encoder to minimize the value of the loss function.The details of node embedding are introduced as follows.The first step is to define the encoder.For each nodev∈V,an encoder ENC(v)=Zvis defined to mapvto a low-dimensional vectorZv,whereZ∈Rd×|V|is a matrix with each column as a node embedding andZvis thed-dimensional embedding of nodev.
Fig.1 Procedure of learning node embedding
The next step is to define a node similarity.A lot of similarity approaches have been put forward,such as adjacency-based similarity and multi-hop similarity.However,these similarity measures are time-consuming since we need to iterate over all pairs of nodes.Recently,random walk similarity gets much attention for its efficiency and effectiveness,and one does not need to consider all node pairs when training.Only pairs that cooccur on random walks are needed to be considered.Hence,we take the random walk approaches to calculate node similarity.The random walks are acquired starting from each node on the combat network according to strategyR.The simplest strategy is to just run fixedlength,unbiased random walks starting from each node(DeepWalk) [32].Grover et al.[33]also put a flexible,biased random walks strategy trading off between local and global views of the network (node2vec).In this study,we take these two random walk strategies to obtain random walks.
For nodeu,letC(u) be the multiset of nodes visited on random walks starting fromu.PR(v|u) estimates the conditional probability of visiting nodevon a random walk starting from nodeuusing some random walk strategyR.PR(v|Zu)refers to the DeepWalk conditional probability of random walk co-occurrences of nodevstarting from nodeubased on node embedding,which can be parameterized using a softmax function [32,33]:
The goal of training node embedding is to makePR(v|Zu)get closer toPR(v|u).Thus,the loss function can be obtained:
One flaw of calculating the normalization term from the softmax is time-consuming,and we can fix it with a negative sampling technique [37],and the loss function can finally be expressed as follows:
So far,we have described how to optimize embedding given random walk statistics.We use the stochastic gradient descent (SGD) algorithm to update the node embeddingZ∈Rd×|V|to minimize loss functionL.For detailed information about SGD algorithm,please refer to [38].
Given a pair of nodes 〈u,v〉∈EM,several similarity measures can be defined according to the learned node embedding,such as Euclidean distance,Manhattan distance,Minkowski distance,Jaccard similarity,and cosine similarity.In this paper,the cosine similarity is used to represent the node embedding similarity.
Denoted as sim〈u,v〉,the cosine similarity betweenuandvbased on the node embedding is expressed as follows:
whereZuandZvrepresent the node embedding of nodeuandv.
After calculating the node embedding similarity,we then sort node pairs in accordance with the similarity values from the largest to smallest for link prediction.
The solution algorithm for link prediction is shown in Algorithm 1.
The practicability and effectiveness of the proposed NECLP are verified based on a real-world case of combat networks,and substantial experiments are conducted.The section can be divided into three parts:dataset description and experimental settings,results analysis,and parameter sensitivity analysis.
In order to examine the practicability and effectiveness of the proposed approach,NECLP,a case of combat network with 297 nodes and 2 359 links is studied in this section.
4.1.1 Comparison of methods
Numerous classical baseline measures are taken for comparison,which is summerized as follows.
Proposed method:The NECLP learns the node embedding of combat networks according to different random walk strategies,preserving as much information of the combat network as possible using a low-dimensional space.Two different random walk strategies are used in this paper:DeepWalk and node2vec.The number of node embedding dimensions is set as 10;the walk length is set as 80;the window size for optimization is set as 10.For node2vec,the return hyper-parameter is set as 1,and the input hyper-parameter is set as 0.5.
Baseline methods:six classical link prediction algorithms are introduced for comparison because of their persuasive link prediction performance confirmed in prior literature,including common neighbor (CN),Jaccard’s coefficient (JAC),Adamic/Adar index (AA),preferential attachment (PA),resource allocation (RA) and within inter-cluster (WIC).
4.1.2 Evaluation metrics
The precision,recall,and area under the receiver operating characteristic (ROC) curve (AUC) are the three most classic metrics for link prediction performance evaluation [18,39].Thus,we choose three metrics to evaluate the link prediction performance of combat networks for NECLP and the baseline measures.The precision and recall metrics assess completeness and accuracy,while AUC estimates the ranking performances of link prediction.
4.1.3 Experimental setups
Substantial experiments are conducted to verify the link prediction performance of the proposed methods.All the observed links of the combat network are separated into ten equal portions with nine portions as training set and the rest one portion for testing.For each link prediction algorithm,link pairs among the test set are sorted by order of the similarity scores,and the topfpercent ranked links are selected as the predicted ones.In this study,parameterfis set as 20%,we also carry out the sensitivity analysis of parameterfon link prediction performance in the following subsection.The ten-fold cross-validation is employed when calculating the precision,recall,and AUC.To ensure the robustness of the evaluation results,each experiment is conducted 20 times and the median value is taken as the final results.The 95%confidence interval is also calculated to show the validation of the results.
The link prediction performance of the proposed NECLP and the baseline methods evaluated by precision,recall,and AUC are reported in this section.
As shown in Table 1,our proposed node embedding method NECLP is more effective than the baselines for predicting missing links of combat networks.Specifically,in terms of AUC,NECLP-node2vec achieves the best link prediction performance,which is about 11.8%higher than that achieved by the best baseline method RA.When calculating the recall and precision rate of the link prediction results,similar results are obtained,and the proposed NECLP-node2vec and NECLP-DeepWalk earned more than 10% points than all the baselines.Besides,the standard error of the proposed node embedding methods,NECLP-node2vec and NECLP-DeepWalk,are much smaller than that of the baseline methods,which shows that the performance of NECLP-node2vec and NECLP-DeepWalk is more reliable and stable.
Table 1 Link prediction performance for different methods
We also draw the ROC curve of the link prediction results for all the methods.Each point of the ROC curve expresses a specificity (true negative rate) and sensitivity(true positive rate) pair regarding a special decision threshold.The AUC curve displays the sensitivity and specificity at all possible thresholds.Hence,the closer the ROC curve is to the upper left corner,the higher the overall accuracy of the prediction performance.It can be seen that the proposed NECLP obtains a far better link prediction performance of combat networks than all the baseline methods,as in Fig.2.One advantage of our proposed NECLP is that the NECLP could discover the latent semantics hidden in the network structure and leverage a mixture of related components for embedding.When comparing the two random walk strategies,node2vec is a little better than DeepWalk.That is because node2vec has a more feasible way of choosing random walks by adjusting parameters.
Fig.2 ROC curve of the prediction results for different methods
4.3.1 Topfranked links choosing analysis In the study,link pairs among the test set are sorted by order of the similarity scores,the topf∈[0,1]ranked links are taken as the predicted ones when evaluating the link prediction performance for different methods.How is the parameterfaffecting the link prediction performance for different methods? We compare the link prediction performance of the proposed NECLP and the baseline methods measured by precision and recall under different parameterf.The link prediction performance is evaluated with the parameterfchanging from 0 to 1 while fixing the value of other parameters so that the effect of the parameterfcan be determined.The results are displayed in Fig.3.
Fig.3(a) illustrates the link prediction results of different algorithms measured by precision.Because of the sparse characteristics of the heterogeneous combat networks,it is worth noting that the precision of all the mentioned methods depicted in Fig.3 are all relatively low.The precision performance of link prediction declines with the growth of parameterf,i.e.,increasing the number of missing links as predicted ones,which is adhered to our intuition.When the value of parameterfis in a small number,say below 0.4,the precision outcomes of NECLP are far better than the baselines.With the advantages of NECLP narrow down with the increase of parameterf,and even get reversed by some baseline methods,such as CN,JAC,and AA,when parameterf>0.5.However,in the real-world case,the parameterfis usually set with a small number,and the prediction is meaningless if we select a large number of missing links as predicted ones.In contrast with the precision performance,the recall rate goes up with the increase of parameterf.Fig.3(b) presents a similar tendency of recall performance when comparing the proposed NECLP with the baseline methods.
Fig.3 Link prediction results of different algorithms measured by precision and recall
In short,the proposed NECLP can achieve a good link prediction performance,especially with a small parameterk.The combat networks analyzed in this study exhibit sparse features,and the accessible linkage information is extremely inadequate.Nevertheless,NECLP can descry a way of overcoming the shortcoming by mining the valuable latent information underneath the observed structural information for link prediction.
4.3.2 Node embedding dimension analysis
The node embedding dimension is a significant parameter influencing the link prediction performance.How to design the value of node embedding dimension reasonably and accurately to optimize the performance of the NECLP algorithm remains an interesting question.In this subsection,we carry on sensitivity analysis to design parameter of the node embedding dimension.The AUC prediction performance of NECLP-node2vec and NECLPDeepWalk with changing of the node embedding dimension are shown in Fig.4.It can be seen that the AUC score of the link prediction performance is getting better with the increase of the node embedding dimensions,which is in accordance with our intuition.Because the node embedding dimension measures the information capacity to some degree,the higher the node embedding dimension,the more information that can be captured.However,the growth rate of AUC gradually decreases with node embedding dimension increasing.When the node embedding dimension is over 10,the AUC performance of link prediction remains almost unchanged,especially for the NECLP-node2vec method,and even experiences a slight fluctuation.This means 10-dimension node embedding is great enough to capture the latent information underneath the network structural topology for link prediction.It is also known the computational complexity increases with the node embedding dimension,thus the node embedding dimension is set as 10 in the paper.
Fig.4 AUC prediction performance of NECLP-node2vec and NECLP-DeepWalk with changing of the node embedding dimension
Under the condition of informationization,modern warfare tends to be complex and volatile,and the electromagnetic battleground environment has become more and more complicated.It is extremely hard to obtain complete and reliable intelligence on the opponent network.The observed topology of the opponent combat network is usually imperfect and presents sparse characteristics.If we could predict and recover the missing links of the enemy combat network in advance,it would be beneficial in both identifying and destroying the vital infrastructure of the opponent side and optimizing the operational plan of our side.In this study,we propose a link prediction framework called NECLP to take advantage of node embedding techniques for combat networks link prediction.The proposed NECLP has several merits:(i) the selforganization and self-learning ability of NECLP can help preserve as much valuable information as possible by mapping the network into a low-dimensional space;(ii) NECLP can better address the problem of sparsity for the combat networks by exploring and exploiting the latent information hidden in the network structure;(iii) NECLP performs great achievement especially with lack of training observations and produces good expansibility which can also be practiced in other complex networks.However,this paper only takes advantages of the structural information of the combat network,other information,e.g.,node and edge attributes,could also be employed to predict missing links in future research.Besides,combat networks always contain various types of nodes,such as sensor entities,decision-making entities,and striking entities.Information flows on different types of entities bear some valuable semantics,and how to combine this information with node embedding techniques for link prediction is also an interesting research topic.
Journal of Systems Engineering and Electronics2022年2期