Ding Yao, Zhang Zhi-li *, Zhao Xiao-feng, Cai Wei, He Fang, Cai Yao-ming, Wei-Wei Cai
a Xi'an Research Institute of High Technology, Xi'an, 710025, China
b The School of Computer Science, China University of Geosciences, Wuhan, 430074, China
c The School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214122, China
Keywords: Graph neural network; Hyperspectral image classification; Deep hybrid network
ABSTRACT With a limited number of labeled samples, hyperspectral image (HSI) classification is a difficult problem in current research. The graph neural network (GNN) has emerged as an approach to semi-supervised classification, and the application of GNNs to hyperspectral images has attracted much attention. However, existing GNN-based methods mainly use a single graph neural network or graph filter to extract HSI features, which does not take full advantage of the variety of graph neural networks (graph filters). Moreover, traditional GNNs suffer from oversmoothing. To alleviate these shortcomings, we introduce a deep hybrid multi-graph neural network (DHMG), where two different graph filters, i.e., the spectral filter and the autoregressive moving average (ARMA) filter, are utilized in two branches. The former extracts the spectral features of the nodes well, and the latter suppresses graph noise effectively. The network realizes information interaction between the two branches and takes good advantage of the different graph filters. In addition, to address the problem of oversmoothing, a dense network is proposed, in which the local graph features are preserved. The dense structure meets the needs of different classification targets that present different features. Finally, we introduce a GraphSAGE-based network to refine the graph features produced by the deep hybrid network. Extensive experiments on three public HSI datasets demonstrate that DHMG outperforms the state-of-the-art models.
Hyperspectral images (HSIs) are captured by hyperspectral imaging spectrometers. Each pixel in an HSI contains hundreds of reflectance values at different frequencies, which makes HSIs suitable for many practical applications such as military target detection, mineral exploration, and agricultural production [1-4]. With the progress of remote sensing and the unique properties of HSI data, hyperspectral imaging is playing an important role in remote sensing, and HSI classification (assigning a class label to every pixel) has become one of the main topics of HSI research [5,6]. However, complex noise effects and spectral variability [7], high dimensionality [8], a shortage of labeled training samples [9], and high spectral mixing between materials [10] make it difficult to extract discriminative information from HSIs for classification.
To deal with these issues, many classification algorithms based on machine learning and handcrafted features have been designed for HSI classification, such as the gray-level co-occurrence matrix [11] and the support vector machine (SVM) [12]. However, these methods only extract features from the perspective of spectral information, while ignoring the spatial features in the image [13-17]. Incorporating spatial information into the classification process reduces noise and ensures the spatial continuity of the classification results; examples include manifold learning methods [18], Markov random fields [19], patch-based feature extraction [20], and morphological profiles [21]. However, handcrafted features are quite empirical and rely heavily on professional expertise, so they may yield poorer classification results than deep learning methods [22].
Motivated by the successful application of deep learning in many fields, researchers have shown great interest in the use of deep learning for HSI classification. Chen et al. [23] adopted an unsupervised deep Stacked Autoencoder (SAE) to learn spatial and spectral features separately, which was the first time that the deep learning concept had been introduced into an HSI classification task. Subsequently, deep Convolutional Neural Networks (CNNs) saw rapid development. CNNs of different dimensionality, such as 1DCNN [24], 2DCNN [25], and 3DCNN [26], have been applied to HSI classification. Inspired by their successful application in other fields, including Natural Language Processing (NLP) and Computer Vision (CV), the Recurrent Neural Network (RNN) [27] and the Deep Belief Network (DBN) [28] have also been applied to HSI classification. In Ref. [29], the Fully Convolutional Network (FCN) was utilized for HSI classification. Recently, Generative Adversarial Networks (GANs) [30] have also been widely investigated for HSI classification. Different from the traditional methods based on handcrafted features, the CNN-based methods can extract high-level features from the HSI automatically, demonstrating better classification performance for different classification targets. However, the deep CNN methods also face limitations. In particular, convolution kernels of fixed shape may fail to perceive the different geometric edges of objects, which leads to poor adaptation to targets with diverse shapes or sizes. In addition, the CNN-based methods are generally applied to Euclidean data and cannot deal with non-Euclidean data [31].
To break the bottleneck of using the CNN-based methods in HSI classification, the graph convolutional network (GCN) approach emerged. Recently, GCN has attracted research interest in HSI classification. Different from the CNN-based methods, GCN can perform convolution on graph-structured data, such as social network data, and is able to learn the feature information between graph nodes [32]. Qin et al. [34] introduced GCN to learn spectral-spatial information of the HSI, which showed the potential of GCN to alleviate the shortage of labeled samples in HSI. Then, Hong et al. [33] investigated different fusion strategies for combining GCN and CNN, and miniGCN was first proposed to reduce the computational complexity. To explore the different aggregation mechanisms of neighboring nodes in the graph, some spatial methods have been investigated. For example, the Graph Attention Network (GAT) [35] and GraphSAGE [3] were adopted for HSI classification. This led to the rapid development of the spectral filter and other spectral methods. In addition, more graph filters, such as the autoregressive moving average (ARMA) filter [36], were designed to extract graph features. To obtain more spatial and global features of HSI, different structures of graph networks have been proposed. Subsequently, multi-scale and context-aware learning was introduced [37]. In the past few years, remarkable successes have been achieved in HSI classification by this approach. However, GCNs still have notable shortcomings.
(1) In the available methods, a single graph neural network or graph filter is mainly used to extract HSI features, which does not take full advantage of the variety of graph neural networks (graph filters). In practice, a single graph filter is prone to feature degradation. Specifically, there is no information interaction between different graph filters in a graph neural network, which is not conducive to diversified information acquisition during feature extraction.
(2) Traditional GCNs face the problem of oversmoothing, which limits the application of graph convolutional neural networks.
To address these problems, a novel deep hybrid network is proposed, in which multi-graph neural network mechanisms, such as GCN, ARMA, and GraphSAGE, are adopted to refine graph features. In particular, we introduce a deep dual graph interaction neural network, where two different graph filters, i.e., the spectral filter and the ARMA filter, are utilized in two separate branches. The spectral filter extracts the spectral features of the graph well, and the ARMA filter suppresses graph noise. This network realizes information interaction between the different graph filters, which combines their feature extraction advantages and prevents feature degradation during feature extraction. In addition, to deal with oversmoothing, a dense network is further proposed, in which the local graph features are preserved. In summary, the main contributions of the paper are threefold.
(1) A novel graph filter hybrid mechanism is presented, which realizes information interaction between different graph filters to diversify feature extraction and prevent feature degradation.
(2) The ARMA filter is introduced into the network for HSI classification, and a novel graph DenseNet structure is proposed to realize the ARMA filter.
(3) Multi-graph neural network collaboration mechanisms are explored to extract HSI features.
This part reviews typical related works, definitions, and notations for HSI classification.
One objective of this work is to achieve state-of-the-art classification accuracy with a small number of samples by learning the relationship between labeled nodes and unlabeled nodes.
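Given a graph $G$ with $m$ nodes, an adjacency matrix $A \in \{0,1\}^{m \times m}$ and a feature matrix $X \in \mathbb{R}^{m \times d}$ can be used, where each node carries a $d$-dimensional feature vector. The mathematical expression of the aggregation operation in GCN is

$$X^{i+1} = \sigma\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2} X^{i} W^{i}\right) \tag{1}$$

where $X^{i}$ represents the output feature matrix of the $i$-th layer and $X^{0}$ is set as $X^{0} = X$. The node features are transformed from $X^{i} \in \mathbb{R}^{m \times c_i}$ to $X^{i+1} \in \mathbb{R}^{m \times c_{i+1}}$. $\tilde{A} = A + I$ means that a self-loop is added to the adjacency matrix. $\tilde{D}$ is the degree matrix of $\tilde{A}$, which is used to normalize $\tilde{A}$. $W^{i} \in \mathbb{R}^{c_i \times c_{i+1}}$ denotes a learnable weight matrix that realizes the linear transformation of the feature matrix, and $\sigma$ is a nonlinear activation function.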
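The aggregation step described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the symmetric normalization $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ and a ReLU activation; the function name `gcn_layer` and the toy graph are hypothetical and not taken from the paper.

```python
import numpy as np

def gcn_layer(A, X, W, activation=lambda z: np.maximum(z, 0.0)):
    """One GCN propagation step: sigma(D~^-1/2 (A + I) D~^-1/2 X W)."""
    A_tilde = A + np.eye(A.shape[0])           # add a self-loop to every node
    deg = A_tilde.sum(axis=1)                  # degrees of A~ (always >= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return activation(A_hat @ X @ W)

# Toy example (assumed data): a 3-node path graph with one-hot node features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.eye(3)
W = np.ones((3, 2))
out = gcn_layer(A, X, W)
```

With one-hot features and an all-ones weight matrix, each output row is simply the row sum of the normalized adjacency matrix, which makes the effect of the normalization easy to inspect.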
Given a graph filter $g$, the graph convolution of $X$ can be expressed as

$$X *_{G} g = U\left(U^{\top} X \odot U^{\top} g\right) \tag{2}$$

where $U$ is the matrix of eigenvectors of the graph Laplacian and $\odot$ is the element-wise product. The graph filter is central to spectral-based graph convolution algorithms. In other words, the performance of spectral-based graph convolution algorithms is determined by the design of the graph filter.
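Given the graph $G$, GraphSAGE adopts $K$ aggregation functions, i.e., $\mathrm{AGGREGATE}_k$, to aggregate node neighbors and extract their information. The propagation rule is expressed as

$$h_{N(v)}^{k} = \mathrm{AGGREGATE}_k\left(\{h_u^{k-1}, \forall u \in N(v)\}\right), \qquad h_v^{k} = \sigma\left(W^{k} \cdot \mathrm{CONCAT}\left(h_v^{k-1}, h_{N(v)}^{k}\right)\right) \tag{3}$$

where $h_v^{k}$ is the feature vector of node $v$ after the $k$-th aggregation step, $N(v)$ is the set of neighbors of $v$, and $W^{k}$ is a learnable weight matrix.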
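As an illustration of the GraphSAGE propagation rule, the following sketch implements one aggregation step with a mean aggregator. The choice of the mean aggregator, the ReLU activation, and the name `sage_mean_layer` are assumptions made for this example; the text does not specify which aggregator is used.

```python
import numpy as np

def sage_mean_layer(A, H, W):
    """One GraphSAGE step with a mean aggregator (assumed choice).

    A: (n, n) binary adjacency matrix without self-loops
    H: (n, c) node features from the previous step
    W: (2c, c_out) weight applied to CONCAT(h_v, mean of neighbor features)
    """
    deg = A.sum(axis=1, keepdims=True)
    neigh_mean = (A @ H) / np.maximum(deg, 1.0)    # isolated nodes receive zeros
    Z = np.concatenate([H, neigh_mean], axis=1)    # CONCAT(h_v, h_N(v))
    return np.maximum(Z @ W, 0.0)                  # ReLU activation

# Toy example (assumed data): 3-node path graph, identity weights.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
out = sage_mean_layer(A, H, np.eye(4))
```

With identity weights the output row of each node is its own feature vector followed by the mean of its neighbors' feature vectors, so the aggregation can be read off directly.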
In this section, a brief overview of the proposed DHMG is presented (Section 3.1), and the deep dense convolutional network and the ARMA convolution implementation are illustrated (Section 3.2 and Section 3.3). Then the operation of the deep hybrid network is explained (Section 3.4). Subsequently, HSI preprocessing and graph construction are introduced (Section 3.5). Finally, HSI classification using DHMG is detailed (Section 3.6).
Fig. 1 gives an overview of DHMG. Firstly, the original HSI is partitioned into adaptive regions called superpixels. With this strategy, the number of graph nodes that need to be processed in the subsequent graph network is reduced significantly, which improves computational efficiency. The spectral features of each superpixel are calculated using a mean operation. Then the graph is constructed. Subsequently, a deep hybrid network (using the spectral filter and the ARMA filter) and a GraphSAGE-based network are proposed to acquire graph features by information propagation and interaction. Finally, the graph features are interpreted using the cross-entropy loss, and the label of each pixel can be obtained.
Different from the spectral filter of GCN, which decomposes the signal into base waves, ARMA predicts the future signal value using past values and past errors. However, the matrix computation in Eq. (4) is complex, which is not conducive to practical use. More importantly, the ARMA filter in this form cannot be implemented sparsely in a GNN. To overcome this problem, we propose to approximate the ARMA filter in a recursive way.
Proof. Referring to Ref. [39], the first-order recursive ARMA$_1$ filter can be presented as follows:
where $\lambda_m$ and $\mu_m$ are the $m$-th eigenvalues of $L$ and $M$, and $\mu_m = (\lambda_{\max} - \lambda_{\min})/2 - \lambda_m$. Then, Eq. (6) can be rewritten as
By summing $K$ ARMA$_1$ filters, we can restore the analytical form of the ARMA$_K$ filter in Eq. (4), and the final filtering operation is defined as
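It has been proven that ARMA can be implemented iteratively, and the graph convolutional operation of ARMA$_1$ is calculated by

$$\bar{X}^{(t+1)} = \sigma\left(\tilde{L}\bar{X}^{(t)} W + X V\right)$$

where $\bar{X}^{(t)}$ is the output of the $t$-th recursion step, $W$ and $V$ are the learnable weight matrices, $X$ represents the node feature matrix, $\tilde{L}$ denotes the modified Laplacian, and $\sigma$ is the ReLU activation function.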
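The recursive view of the ARMA$_1$ filter can be checked numerically. The sketch below assumes the standard first-order recursion $\bar{X}^{(t+1)} = aM\bar{X}^{(t)} + bX$ with $M = \frac{1}{2}(\lambda_{\max} - \lambda_{\min})I - L$, whose fixed point is $b(I - aM)^{-1}X$ when $|a| < 1$, and then shows a learnable counterpart with a ReLU activation. The coefficients `a` and `b`, the function names, and the extra input weight `W_in` are assumptions introduced for this example only.

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^-1/2 A D^-1/2 for an undirected graph without isolated nodes."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def arma1_linear(L, X, a, b, n_iter=200):
    """Iterate X_bar <- a * M @ X_bar + b * X, with M = (lmax - lmin)/2 * I - L."""
    lam = np.linalg.eigvalsh(L)                    # eigenvalues in ascending order
    M = 0.5 * (lam[-1] - lam[0]) * np.eye(len(L)) - L
    X_bar = np.zeros_like(X)
    for _ in range(n_iter):
        X_bar = a * (M @ X_bar) + b * X
    return X_bar, M

def arma1_gcs(L_mod, X, W_in, W, V, n_steps=3):
    """Learnable recursion: X_bar <- ReLU(L_mod @ X_bar @ W + X @ V)."""
    relu = lambda z: np.maximum(z, 0.0)
    X_bar = relu(L_mod @ X @ W_in + X @ V)         # first step starts from X
    for _ in range(n_steps - 1):
        X_bar = relu(L_mod @ X_bar @ W + X @ V)    # skip connection from X at every step
    return X_bar

# Toy check on a 4-node path graph (assumed data).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = normalized_laplacian(A)
X = np.arange(8, dtype=float).reshape(4, 2)
X_bar, M = arma1_linear(L, X, a=0.5, b=1.0)
closed_form = np.linalg.solve(np.eye(4) - 0.5 * M, X)   # b * (I - a M)^-1 X

rng = np.random.default_rng(0)
hidden = arma1_gcs(np.eye(4) - L, X,
                   rng.normal(size=(2, 3)), rng.normal(size=(3, 3)),
                   rng.normal(size=(2, 3)))
```

The iteration only needs products with the sparse matrix $M$, whereas the closed form requires a matrix inverse, which is dense in general; this is the practical motivation for the recursive formulation.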
The implementation of the ARMA structure is complex, and thus many calculation parameters are added. In addition, different classification targets have varying requirements for features. For example, some classification targets need shallow features, while others need deep features. Inspired by ResNet and DenseNet in CNNs, we connect all local features to subsequent convolutional layers (Fig. 1), and the concatenation operation is adopted to fuse information. This structure, like DenseNet, is widely used in computer vision, namely,
From Eq. (12), it is concluded that the dense connection realizes structural recursion, and thereby the ARMA filter. Meanwhile, it should be noted that the dense connection retains the local features of the convolution layers and avoids the disappearance of features as the number of convolution layers increases, which is why DenseNet is often used in deep convolutional networks.
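The dense connection pattern can be sketched as follows. This is a simplified illustration under stated assumptions: each layer is a plain normalized-adjacency graph convolution with ReLU, and every layer receives the concatenation of the input and all earlier layer outputs. The function name `dense_graph_block` and the layer sizes are hypothetical.

```python
import numpy as np

def dense_graph_block(A_hat, X, weights):
    """DenseNet-style graph block: layer k sees CONCAT(X, out_1, ..., out_{k-1})."""
    features = [X]
    for W in weights:
        inp = np.concatenate(features, axis=1)           # fuse all earlier features
        features.append(np.maximum(A_hat @ inp @ W, 0.0))
    return np.concatenate(features, axis=1)              # shallow and deep features together

# Toy example (assumed sizes): 4 nodes, 3 input features, 3 layers of width 2.
rng = np.random.default_rng(0)
A_hat = np.full((4, 4), 0.25)                            # placeholder propagation matrix
X = rng.normal(size=(4, 3))
weights = [rng.normal(size=(3 + 2 * k, 2)) for k in range(3)]
out = dense_graph_block(A_hat, X, weights)
```

Because the input features appear unchanged in the first columns of the output, shallow information remains available to the classifier no matter how many layers are stacked.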
The available graph convolution networks for hyperspectral classification adopt a single graph kernel, which does not take full advantage of different convolution kernels in hyperspectral image feature extraction. In practice, we find that graph filters differ in how well they adapt to various types of HSI classification. Therefore, a graph convolution network achieves different classification performance for different classification targets.
Inspired by hybridization in molecular biology, we develop a deep hybrid graph convolution framework, where information interaction occurs between all convolutional layers in the two branches. Fig. 1 shows the deep hybrid network with its two branches, where different branches adopt different convolution kernels. Concretely, branch 1 uses the GCN spectral filter, and branch 2 uses the ARMA filter. Here, we consider the outputs of the $l$-th graph convolutional layer in the two branches. The concatenation operation is used for information interaction between the two convolutions. For example, the feature interaction at the output of the $l$-th layer in branch 1 can be expressed as
As shown in Fig. 1, the graph features in branch 1 and branch 2 are integrated by the interaction operation. The output of the deep hybrid network after $l$ layers can be expressed as
Through information interaction between the branches, the feature degradation of a single graph convolution kernel during feature extraction is avoided. Meanwhile, the graph DenseNet structure within each branch keeps the local information of each convolutional layer, thus avoiding oversmoothing. In short, the deep hybrid network diversifies the feature sources, makes comprehensive use of the advantages of each convolution kernel, and improves the adaptability of the model to different classification targets.
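The interaction between the two branches can be illustrated with a short sketch. It assumes that, before every layer, the outputs of both branches are concatenated and fed to both branches, that branch 1 applies a GCN-style propagation, and that branch 2 applies an ARMA-style step with a skip connection from the input features. The exact interaction equations are not reproduced here, so the function `hybrid_layers` and its parameter layout are hypothetical.

```python
import numpy as np

def hybrid_layers(A_hat, L_mod, X, params):
    """Two-branch network with feature exchange before every layer (assumed scheme)."""
    relu = lambda z: np.maximum(z, 0.0)
    h1, h2 = X, X
    for W1, W2, V2 in params:
        z = np.concatenate([h1, h2], axis=1)       # information interaction
        h1 = relu(A_hat @ z @ W1)                  # branch 1: spectral (GCN-style) filter
        h2 = relu(L_mod @ z @ W2 + X @ V2)         # branch 2: ARMA-style step, skip from X
    return np.concatenate([h1, h2], axis=1)

# Toy example (assumed sizes): 4 nodes, 3 input features, 2 layers of width 5.
rng = np.random.default_rng(0)
n, d, h = 4, 3, 5
A_hat = np.full((n, n), 1.0 / n)                   # placeholder propagation matrices
L_mod = np.eye(n)
X = rng.normal(size=(n, d))
params = [
    (rng.normal(size=(2 * d, h)), rng.normal(size=(2 * d, h)), rng.normal(size=(d, h))),
    (rng.normal(size=(2 * h, h)), rng.normal(size=(2 * h, h)), rng.normal(size=(d, h))),
]
out = hybrid_layers(A_hat, L_mod, X, params)
```

In this scheme each branch always sees what the other filter extracted at the previous depth, which is one simple way to realize the diversification of feature sources described above.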
3.5.1.HSI preprocessing
For an HSI $I_B = \{x_1, x_2, \dots, x_m\}$ with $m$ pixels and $B$ spectral bands, the superpixel is defined as
where $S_i$ is a superpixel with $n_i$ pixels, and $N$ denotes the total number of superpixels.
The overview of the HSI preprocessing is shown in Fig. 3. Firstly, principal component analysis (PCA) [40], an unsupervised method, is adopted to reduce the dimension of the HSI for computational efficiency. The first $b$ principal components are retained to form the reduced image with $m$ pixels and $b$ bands, where $b \ll B$ (the subscript $r$ denotes reduction). Then the reduced image is partitioned into superpixels. In our proposed method, simple linear iterative clustering (SLIC) [41] is utilized to assign the pixels to superpixels, namely,
Fig.3.HSI preprocessing and graph construction diagram.
where $S_i$ denotes a superpixel in $I_b$ and $N$ is the number of segmented superpixels (graph nodes).
Finally, a mean filter is applied to each superpixel to produce a mean feature vector. Concretely, the average of the spectral features of the pixels contained in the superpixel is taken as the superpixel feature. The feature vector $h_i$ of the superpixel is defined as

$$h_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_j^{i} \tag{14}$$

where $n_i$ and $x_j^{i}$ are the number and the spectral features of the pixels contained in superpixel $S_i$.
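The mean-feature step can be written compactly with NumPy. The sketch assumes that a segmentation map with integer superpixel labels `0..N-1` is already available (for example from SLIC) and that every label occurs at least once; the function name `superpixel_mean_features` is hypothetical.

```python
import numpy as np

def superpixel_mean_features(image, segments):
    """Average the spectra of all pixels inside each superpixel.

    image:    (rows, cols, bands) array, e.g. the PCA-reduced HSI
    segments: (rows, cols) integer labels in 0..N-1, each label used at least once
    returns:  (N, bands) array whose i-th row is the mean spectrum h_i
    """
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    labels = segments.ravel()
    n_superpixels = int(labels.max()) + 1
    sums = np.zeros((n_superpixels, pixels.shape[1]))
    np.add.at(sums, labels, pixels)                      # accumulate spectra per superpixel
    counts = np.bincount(labels, minlength=n_superpixels)
    return sums / counts[:, None]

# Toy example (assumed data): a 2 x 2 image with 2 bands and 2 superpixels.
image = np.arange(8).reshape(2, 2, 2)
segments = np.array([[0, 0],
                     [1, 1]])
features = superpixel_mean_features(image, segments)
```

Each row of `features` then serves as the feature vector of one graph node.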
3.5.2.Graph construction
In this paper, superpixels in the HSI are treated as graph nodes. Given an HSI superpixel graph $G = (V, E, A)$, the entries $A_{ij}$ of the adjacency matrix $A \in \mathbb{R}^{N \times N}$ can be expressed as
where $h_i$ and $h_j$ denote the spectral features of nodes $i$ and $j$ (calculated by Eq. (14)), $\|h_i - h_j\|_2$ is the Euclidean distance between the two nodes, $\gamma$ is an empirical value set to 0.2, and $N_t(h_j)$ represents the set of $t$-hop neighbors of $h_j$.
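A possible construction of such an adjacency matrix is sketched below. It assumes a Gaussian kernel on the squared Euclidean distance, $A_{ij} = \exp(-\gamma \|h_i - h_j\|_2^2)$ for nodes within $t$ hops of each other and $A_{ij} = 0$ otherwise; this form is an assumption made for illustration, as is the input `neighbors`, a symmetric boolean matrix marking spatially adjacent superpixels.

```python
import numpy as np

def build_adjacency(H, neighbors, gamma=0.2, t=1):
    """Weighted adjacency restricted to t-hop neighbors (assumed Gaussian kernel)."""
    n = H.shape[0]
    nb = neighbors.astype(int)
    walk = np.eye(n, dtype=int)                    # nodes reachable in exactly k steps
    reach = np.zeros((n, n), dtype=bool)           # nodes reachable in 1..t steps
    for _ in range(t):
        walk = ((walk @ nb) > 0).astype(int)
        reach |= walk.astype(bool)
    np.fill_diagonal(reach, False)                 # no self-edges at this stage
    sq_dist = ((H[:, None, :] - H[None, :, :]) ** 2).sum(axis=-1)
    return np.where(reach, np.exp(-gamma * sq_dist), 0.0)

# Toy example (assumed data): three superpixels on a line, 0 - 1 - 2.
H = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [3.0, 0.0]])
neighbors = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]], dtype=bool)
A1 = build_adjacency(H, neighbors, gamma=0.2, t=1)
A2 = build_adjacency(H, neighbors, gamma=0.2, t=2)
```

Increasing `t` connects each superpixel to a wider spatial neighborhood, while `gamma` controls how quickly the edge weight decays with spectral dissimilarity.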
In Section 3.5, we introduced the mechanism of HSI preprocessing and graph construction. A deep hybrid network is then used to extract graph features. Finally, we adopt a SAGE algorithm to refine the features produced by the deep hybrid network.
Referring to Eq. (3), the output feature of node $v$ in DHMG can be formed as
where $h_v$ is the feature vector of node $v$, $\{h_u, \forall u \in N(v)\}$ is the feature matrix of the neighbor nodes of node $v$, and $W$ and $B$ are trainable parameters. In our proposed network, the learnable parameters are penalized by the loss function
where $Y_{zf}$ denotes the label matrix, $y_G$ is the set of labeled examples, $C$ is the number of land-cover categories, and the prediction term is the output of DHMG. Meanwhile, Adam [42], a gradient-based optimizer, is utilized to update the weights of DHMG. The implementation details of DHMG are given in Algorithm 1.
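The semi-supervised loss can be sketched as a cross-entropy that is evaluated on the labeled nodes only. The sketch assumes the common form $\mathcal{L} = -\sum_{z \in y_G} \sum_{f=1}^{C} Y_{zf} \ln O_{zf}$, where $O$ (a symbol introduced here for illustration) is the softmax output of the network; the function name `masked_cross_entropy` is hypothetical.

```python
import numpy as np

def masked_cross_entropy(logits, labels, labeled_mask):
    """Cross-entropy summed over labeled nodes only.

    logits:       (N, C) raw network outputs
    labels:       (N,) integer class indices (ignored where the mask is False)
    labeled_mask: (N,) boolean, True for nodes in the labeled set
    """
    z = logits - logits.max(axis=1, keepdims=True)            # numerically stable softmax
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    idx = np.flatnonzero(labeled_mask)
    return -log_prob[idx, labels[idx]].sum()

# Toy example (assumed data): uniform predictions over 4 classes, 2 labeled nodes.
logits = np.zeros((3, 4))
labels = np.array([0, 2, 3])
mask = np.array([True, False, True])
loss = masked_cross_entropy(logits, labels, mask)
```

Unlabeled nodes still influence training through graph propagation, but they contribute nothing to the loss itself.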
To evaluate the performance of DHMG, extensive experimental comparison and analysis are presented in this paper. We briefly introduce the three HSI datasets used in our experiments, compare the proposed DHMG with six state-of-the-art methods, analyze the impact of the hyperparameters on DHMG performance, and discuss the computational complexity of the proposed method compared to previous GCNs. Finally, the ablative effects of information interaction and dense structure are analyzed.
Algorithm 1.DHMG for HSI Classification.
Three benchmark HSI datasets are employed to evaluate the proposed approach,which are described as follows:
(1) Pavia University: The first dataset, University of Pavia (UP), obtained by the reflective optics system imaging spectrometer (ROSIS), is often used for HSI classification. The original UP dataset is composed of 610 × 340 pixels with 115 spectral bands, which are usually reduced to 103 bands with wavelengths in the range of 430-860 nm. The land covers in the HSI are grouped into 9 land-cover categories. The false-color image, ground-truth image, and reference map are shown in Fig. 4.
Fig. 4. (a) False-color image, (b) ground-truth image, and (c) reference map of Pavia University.
(2) Salinas: The second dataset, Salinas, was collected by the airborne visible/infrared imaging spectrometer (AVIRIS) over Salinas Valley, California. The spatial resolution of the data is 3.7 m, and the size is 512 × 217. There are 224 spectral bands in the original data, and 204 bands are left after the water vapor absorption bands are removed. The data contains 16 crop categories. The false-color image, ground-truth image, and reference map are shown in Fig. 5.
Fig. 5. (a) False-color image, (b) ground-truth image, and (c) reference map of Salinas.
(3) Houston 2013: The third dataset was obtained by the ITRES CASI-1500 sensor and provided for the 2013 IEEE GRSS data fusion contest. The data size is 349 × 1905, including 144 bands ranging from 364 to 1046 nm. The ground cover is labeled into 15 categories, and the false-color image, corresponding ground-truth map, and reference map are shown in Fig. 6.
Fig. 6. (a) False-color image, (b) ground-truth image, and (c) reference map of Houston 2013.
Fig. 7. Classification maps on the Pavia University dataset: (a) Ground-truth map; (b) MSDN (90.90%); (c) CRNN (85.46%); (d) S2GCN (89.74%); (e) MSAGE-CAL (96.14%); (f) MBCTU (89.43%); (g) JSDF (90.82%); (h) DHMG (97.81%).
To compare the proposed approach with other methods, we randomly select 30 labeled pixels from each class to train the model, and the remaining pixels are used for testing, which is a common practice in the literature.
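The sampling protocol can be sketched as follows. The sketch assumes that the ground-truth map stores 0 for unlabeled background pixels and positive integers for classes, which is a common convention for these datasets but is not stated in the text; the function name `sample_train_indices` and the fixed seed are illustrative.

```python
import numpy as np

def sample_train_indices(gt, per_class=30, seed=0):
    """Pick `per_class` random labeled pixels per class; the rest are test pixels.

    gt: integer ground-truth array, 0 = unlabeled background (assumed convention)
    returns: (train_idx, test_idx) as flat indices into gt.ravel()
    """
    rng = np.random.default_rng(seed)
    labels = gt.ravel()
    train = []
    for c in np.unique(labels):
        if c == 0:
            continue                                   # skip unlabeled pixels
        idx = np.flatnonzero(labels == c)
        k = min(per_class, len(idx))                   # small classes: take all pixels
        train.extend(rng.choice(idx, size=k, replace=False).tolist())
    train_idx = np.sort(np.array(train, dtype=int))
    test_idx = np.setdiff1d(np.flatnonzero(labels > 0), train_idx)
    return train_idx, test_idx

# Toy example (assumed data): 5 background pixels, 50 of class 1, 10 of class 2.
gt = np.array([0] * 5 + [1] * 50 + [2] * 10)
train_idx, test_idx = sample_train_indices(gt, per_class=30)
```

Repeating the call with different seeds gives the independent random splits needed for averaging results over several runs.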
4.2.1.Parameter selection
Five hyperparameters are selected in the proposed DHMG, i.e., the superpixel number N, the number of epochs T, the learning rate lr, the number of deep hybrid layers L, and the number of SAGE layers S, as listed in Table 1. The impacts of these parameters on classification are discussed in detail in Section 4.4.
Table 1 Hyperparameter settings for different datasets.
4.2.2.Evaluation index
To evaluate the performance of the compared algorithms, four widely accepted indexes are adopted, including overall accuracy (OA), per-class accuracy (PA), average accuracy (AA), and the Kappa coefficient (Kappa).
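For reference, the four indexes can be computed from a confusion matrix as sketched below, using their standard definitions: OA is the fraction of correctly classified samples, PA is the per-class recall, AA is the mean of PA, and Kappa corrects OA for chance agreement. The function name `classification_metrics` is hypothetical, and every class is assumed to occur at least once in `y_true`.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Return (OA, PA, AA, Kappa) from integer label arrays."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)                 # rows: true class, cols: predicted
    total = cm.sum()
    oa = np.trace(cm) / total
    pa = np.diag(cm) / cm.sum(axis=1)                  # per-class accuracy (recall)
    aa = pa.mean()
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2   # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, pa, aa, kappa

# Toy example (assumed data): 4 test samples, 2 classes.
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
oa, pa, aa, kappa = classification_metrics(y_true, y_pred, n_classes=2)
```

In this toy case three of four samples are correct, so OA is 0.75, while Kappa is lower because half of the agreement would be expected by chance.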
4.2.3.Compared methods
To validate the performance of our proposed DHMG, two CNN classifiers, i.e., convolutional RNN (CRNN, 2018) [43] and multiscale dense network (MSDN, 2019) [44], two GNN methods, i.e., multi-scale graph sample and aggregate network with context-aware learning (MSAGE-CAL, 2021) [37] and spectral-spatial GCN (S2GCN, 2021) [34], and two machine learning classifiers, i.e., multiband compact texture units (MBCTU, 2019) [21] and joint collaborative SVM (JSDF, 2017) [45], are utilized as comparison models. We list the architecture of DHMG in Table 2.
Table 2 Architectural details of proposed network.
The experiments are run on a machine with a 3.70 GHz Intel i9-10900K processor and DDR4 RAM. Each experiment is run ten times, and the mean and standard deviation of the measurement indexes mentioned above are reported.
4.3.1.Quantitative results
In this part, the proposed DHMG is compared with the above six algorithms to validate its performance. The quantitative results of all experiments (averaged over ten runs) on the three HSI datasets are reported in Tables 3-5. As shown in the results, our proposed DHMG outperforms the comparative classifiers.
Concretely, in Table 3, the CNN-based methods, i.e., MSDN and CRNN, show no advantage over the other methods due to the lack of labeled training data. Meanwhile, we can observe that the GNN-based methods, i.e., S2GCN, MSAGE-CAL, and DHMG, show their superiority in hyperspectral classification. However, S2GCN only achieves an OA of 89.74, which is significantly lower than those of MSAGE-CAL and DHMG, namely, 96.14 and 97.81. This is mainly because MSAGE-CAL and DHMG improve the GNN network and can learn the multi-scale information of the graph. It is also notable that DHMG has superior classification accuracy in most classes. The results demonstrate the adaptability of DHMG to the classification of small targets.
Table 3 Quantitative experimental performance on Pavia University; best results in bold.
The comparison results on Salinas are presented in Table 4. Different from the results on Pavia University, JSDF (a traditional machine learning method) achieves an outstanding result, with OA, AA, and Kappa of 94.67, 97.69, and 94.06, respectively. This illustrates that traditional machine learning methods can also achieve results comparable to deep learning with proper design. For the Salinas dataset, the proposed DHMG improves OA, AA, and Kappa by 1.46, 1.34, and 1.01, respectively. DHMG also achieves good classification accuracy for the classes with similar spectral features, i.e., C8 and C15. This is mainly due to the adoption of ARMA filters and information interaction in each convolutional layer, which allow DHMG to suppress graph noise and learn diverse graph network information.
Table 4 Quantitative experimental performance on Salinas; best results in bold.
As for Houston 2013, the classification results are given in Table 5. Compared with the other two datasets, Houston 2013 covers a much larger scene (349 × 1905 pixels). Classifying such a large HSI is a challenge to classifiers. From the results, all investigated methods achieve lower classification accuracy as measured by OA, AA, and Kappa. This is because Houston 2013 demands more computational power and requires more details to be identified. As shown in Table 5, the proposed DHMG achieves the best classification results, improving OA, AA, and Kappa by 1.18, 0.60, and 0.40, respectively. The results validate the function of the deep dense structure in hyperspectral classification, which preserves the local convolutional features in the convolution process.
Table 5 Quantitative experimental performance on Houston 2013; best results in bold.
4.3.2.Visual results
To intuitively observe the classification results of each method, we compare the visual maps produced by the investigated classifiers, and the results are shown in Figs. 7-9. The different colors in the classification maps represent different kinds of land cover. According to the figures, we can observe that the visualization results of our DHMG on the three HSI datasets are the most similar to the ground truth. Specifically, the map on Pavia University contains fewer misclassifications than the classification maps of the state-of-the-art classifiers. In terms of Salinas, as shown in Fig. 8, the investigated methods classify the land covers well except for C8 and C15, as they are heavily mixed. However, the proposed DHMG still achieves an outstanding result (Fig. 8(h)), which verifies its good ability to classify targets with similar spectra. As for Houston 2013 (Fig. 9), we obtain a conclusion consistent with the previous analysis. Generally speaking, the results verify the effectiveness and superiority of DHMG for HSI classification.
Fig. 8. Classification maps on the Salinas dataset: (a) Ground-truth map; (b) MSDN (91.64%); (c) CRNN (87.64%); (d) S2GCN (88.39%); (e) MSAGE-CAL (96.87%); (f) MBCTU (92.14%); (g) JSDF (90.82%); (h) DHMG (98.33%).
Fig. 9. Classification maps on the Houston dataset: (a) Ground-truth map; (b) MSDN (88.49%); (c) CRNN (82.10%); (d) S2GCN (89.31%); (e) MSAGE-CAL (92.13%); (f) MBCTU (87.07%); (g) JSDF (90.51%); (h) DHMG (93.31%).
Table 1 shows the hyperparameter selection in our method, i.e., the superpixel number N, the number of epochs T, the learning rate lr, and the number of deep hybrid layers L. In the experiment, a grid search strategy is employed to find the optimal parameter setting. We divide the four hyperparameters into two groups, and OA is used to measure the performance.
4.4.1.Impact of L and N
In this part, we investigate the sensitivity to L and N in detail. We set L in the range of 1-7, and N is varied from 5000 to 20,000 with an interval of 2500 for the Pavia University dataset, 4000 to 16,000 with an interval of 4000 for the Salinas dataset, and 5000 to 20,000 with an interval of 2500 for the Houston dataset; the other parameters are the same as in Table 1. The results are shown in Fig. 10. From the results, we can see that increasing the number of network layers yields only limited accuracy gains once the number of layers reaches 2. At the same time, increasing the number of network layers also increases the amount of calculation. Therefore, we set the number of network layers to 3 in our network. For N, we observe that OA improves as N increases. This is mainly because a bigger N leads to smaller superpixels, so more local spatial features of the HSI can be preserved. However, a bigger N means a larger graph, which not only requires stronger feature extraction abilities of the algorithm, but also increases the computational burden. It should be emphasized that, due to experimental conditions, we can only set the maximum N to 22,000, so N for Houston 2013 is 22,000.
Fig. 10. Sensitivity to L and N: (a) Pavia University dataset; (b) Salinas dataset; (c) Houston 2013 dataset.
4.4.2.Impact of lr and T
In the experiment, lr is set to 0.1, 0.01, 0.001, and 0.0001, respectively, and T is varied from 200 to 1000 with an interval of 200. The impacts of lr and T on the three datasets are presented in Fig. 11. From the results, we can observe that a large learning rate speeds up the training of the model parameters, but cannot reach the optimal solution. A small learning rate allows the model to achieve good classification accuracy, but the training time is long. Considering both accuracy and training time, we set the learning rate to 0.001 in the proposed framework. In addition, an appropriate T is critical to achieving satisfactory performance. To prevent both insufficient model training and over-fitting, we conclude from the results that the model classification accuracy is the best when T = 600.
Fig. 11. Sensitivity to lr and T: (a) Pavia University dataset; (b) Salinas dataset; (c) Houston 2013 dataset.
In this section, the effects of different numbers of training samples on the above algorithms are analyzed on the three HSI datasets. The number of training pixels per class is varied from 5 to 30 with an interval of 5. Except for the number of training samples, the other parameter settings are the same as in Section 4.2, and Fig. 12 presents the OA performance of each classifier on the three HSI datasets. From Fig. 12, we observe that the performance of the compared classifiers generally improves as the number of training pixels increases, because more training labels enable the model to learn more prior knowledge. However, the performance of the CNN-based methods is unstable, especially when the number of labeled samples is small. The results show that the CNN-based methods are not well suited to classification with a small amount of labeled data. Different from the CNN-based methods, the GNN-based methods can learn the relationships between labeled nodes and unlabeled nodes, which greatly reduces the need for labeled data. The experimental results also support that the performance of the GNN-based methods is generally outstanding. It is also worth mentioning that the proposed DHMG outperforms the comparative methods in OA, especially when the number of labeled samples is limited, because DHMG preserves the local convolutional information and performs information interaction between the two graph filters.
Fig.12.Effects of different number of labeled samples on the three HSI datasets: (a) Pavia University;(b) Salinas;(c) Houston 2013.
To evaluate the classification efficiency, we investigate the running times of different GNN-based methods and list them in Table 6, including S2GCN [34], S2GAT [35], MDGCN [46], and the proposed DHMG on the three datasets. The results are reported on a server with a 3.70 GHz Intel i9-10900K CPU and a GeForce GTX 1080Ti 11G GPU. From the results, we can conclude that our proposed model is more efficient than the comparison methods. This is mainly because the use of the segmentation operation and the ARMA filter effectively reduces the computational cost.
Table 6 Running time (in seconds); best in bold.
Table 7 Ablation study results on the Pavia University dataset. Best results are marked in bold.
Table 8 Ablation study results on the Salinas dataset.
Table 9 Ablation study results on the Houston 2013 dataset.
As illustrated above, information interaction and dense structure play a significant role in improving hyperspectral classification with the proposed algorithm. To shed light on their contributions, the reduced model obtained by removing the information interaction is denoted "DHMG-x1", and "DHMG-x2" denotes the reduced model without the dense structure. The experimental setup is the same as described in Section 4.2. As shown in Tables 7-9, DHMG achieves the best performance, which validates the importance of information interaction and dense structure. Meanwhile, we find that the results of DHMG-x2 are significantly lower than those of DHMG-x1 and DHMG. This is because DHMG-x2 suffers from the oversmoothing problem (the best effect comes from 2 layers and this experimental network has 4 layers). The results of DHMG-x2 show that the dense structure can alleviate the oversmoothing problem, increase the depth of the network, and extract the high-level features of the graph. More importantly, the results of DHMG are also improved compared to DHMG-x1. This is due to the information interaction design in DHMG, which makes full use of the advantages of different graph filters and diversifies the information sources.
In this paper, we presented a novel deep hybrid multi-graph neural network for feature extraction of HSI. We designed a novel ARMA filter implemented in a recursive way to suppress the graph noise. By segmenting the original HSI into many superpixels (nodes) based on similar reflectance properties, the spatial dimension was reduced and the subsequent calculations were simplified. To take full advantage of different graph filters, refine the deep features, and address the problem of oversmoothing, a deep hybrid network (using the spectral filter and the ARMA filter) was proposed. The network retains the local information of the convolution layers and suppresses noise, which improves classification accuracy. Subsequently, a GraphSAGE-based mechanism was adopted to acquire graph features. We explored multi-graph neural network collaboration for hyperspectral classification. Compared with the state-of-the-art methods, the experimental results on the three public HSI datasets verified the superiority of the proposed method.
Recently, the application of semi-supervised graph convolutional networks in HSI classification has achieved good results. However, network training still needs labeled data, so the problem of labeled sample deficiency remains unsolved. In future work, we will focus on unsupervised models for HSI clustering, such as contrastive learning and reinforcement learning.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.