ZHOU Xingyue, YANG Kunde, YAN Yonghong, LI Zipeng, and DUAN Shunli
Underwater Noise Target Recognition Based on Sparse Adversarial Co-Training Model with Vertical Line Array
ZHOU Xingyue1), 2), *, YANG Kunde2), YAN Yonghong1), LI Zipeng2), and DUAN Shunli2)
1) Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2) Key Laboratory of Ocean Acoustics and Sensing, Ministry of Industry and Information Technology, School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China
The automatic identification of underwater noncooperative targets without label records remains an arduous task considering the marine noise interference and the shortage of labeled samples. In particular, the data-driven mechanism of deep learning cannot identify false samples, aggravating the difficulty in noncooperative underwater target recognition. A semi-supervised ensemble framework based on vertical line array fusion and the sparse adversarial co-training algorithm is proposed to identify noncooperative targets effectively. The sound field cross-correlation compression (SCC) feature is developed to reduce noise and computational redundancy. Starting from an incomplete dataset, a joint adversarial autoencoder is constructed to extract the sparse features with source depth sensitivity, aiming to discover the unknown underwater targets. The adversarial prediction label is converted to initialize the joint co-forest, whose evaluation function is optimized by introducing adaptive confidence. The experiments prove the strong denoising performance, low mean square error, and high separability of SCC features. Compared with several state-of-the-art approaches, the numerical results illustrate the superiorities of the proposed method due to feature compression, secondary recognition, and decision fusion.
underwater acoustic target recognition; marine acoustic signal processing; sound field feature extraction; sparse adversarial network
Current passive acoustic recognition methods for underwater targets based on signal analysis have gradually reached a bottleneck, with common problems such as low efficiency, manual parameter adjustment, a single application scenario, and poor generalization. By contrast, various deep learning (DL) algorithms (Wang et al., 2019; Khishe and Mosavi, 2020; Mishachandar and Vairamuthu, 2021) have achieved excellent performance on complete databases owing to the rapidly growing observation data of underwater acoustic targets. However, in addition to the unbalanced dataset and high labeling cost, the inability of DL to discriminate false or invalid samples complicates underwater noncooperative target (UNT) recognition. The so-called UNTs (such as unlabeled whales and submarines) are underwater targets detected without any auxiliary information other than the signal directly measured by sonar. If the category of a UNT is not recorded in the database, then DL models will generally force the target to be assigned an existing label. This problem is easily overlooked in autonomous underwater target recognition but leads to classification errors for unrecorded targets.
In addition, because uncertainty accumulates during feature extraction, a reliable feature fusion algorithm is crucial for UNT recognition. For instance, Li and Yang (2021) summarized the interpretable feature set for underwater target identification, and Wang et al. (2019) designed the multiscale spectral fuse network. Similarly, principal component analysis can be used to fuse features through dimension reduction and thereby improve underwater target detection precision (Zhang et al., 2021).
Thus, advanced studies on unsupervised and semi-supervised learning (SSL) models indicate the research direction of UNT data recognition. For example, Preston (2009) discussed the performances of several unsupervised clustering methods on seabed classification, Huang et al. (2020) proposed an unsupervised transient pulse signal peak detection method based on spectral matrix decomposition, and Liu et al. (2021a) applied data augmentation to 3D Mel-spectrogram feature recognition of underwater acoustic targets. Nevertheless, these methods focus on data expansion of underwater acoustic images and are only applicable to complete datasets with existing labels. Moreover, because uncertainty accumulates during feature extraction, reliable feature extraction is also crucial for UNT recognition. The commonly used features include time-domain (Huang et al., 2020), frequency-domain (Yang and Zhou, 2019), auditory perception (Wang and Zeng, 2014; Li and Yang, 2021), and sonar image (Wang et al., 2019) features. Owing to the development of DL theory, numerous data fusion studies have been applied to underwater acoustic signals: Zhang et al. (2021) designed the integrated neural network to fuse underwater spectral features, Ke et al. (2020) utilized wavelet subbands to extract the fused vessel feature, and Li et al. (2020) introduced the bidirectional long short-term memory (Bi-LSTM) network to the vector hydrophone to classify ships with real-time data fusion. The mentioned algorithms contribute remarkably to UNT feature optimization; however, compared with the inherent sound field characteristics, they are easily disturbed by tidal changes, time-varying ocean channels, or other factors.
Considering the weak fluctuation of ocean acoustic channel propagation, UNT can be identified by using stable sound field interference characteristics (Zhou et al., 2019), thereby greatly reducing external interference. Inspired by the intrinsic sensitivity of the interference striation feature to source depth (Duan et al., 2017), underwater sources can be located via vertical arrays (Yang et al., 2018; Wei et al., 2020). Similarly, underwater interference features can be applied when distinguishing noncooperative targets with different depths. The interference features of various surface targets are similar, and their labels are easy to obtain. Thus, once all surface targets are excluded, the unidentified UNTs naturally become underwater targets. Based on this idea, the design of the SSL ensemble framework includes two aspects: on the one hand, the adversarial network (Mirza and Osindero, 2014) is constructed to screen out false samples that do not match the database containing surface target samples and classify them as underwater targets. On the other hand, the recognition precision of UNT is improved by exploiting the labeled samples.
Considering the above analysis, this paper proposes the sparse adversarial co-training (SAC) ensemble algorithm to solve these issues. This framework comprises two parts: the joint adversarial autoencoder (JAE) and joint co-forest (JCF) modules. Specifically, the three main contributions are as follows.
1) Based on the application of the vertical array, the SCC feature is extracted in two steps. First, cross-correlation computation is introduced to realize broadband spectrum denoising and splicing of each hydrophone. Second, the sparse coding module of the proposed JAE is applied to compress the feature and reduce its redundancy.
2) As for the JAE structure, two substacked autoencoders are trained in parallel to reconstruct the interference fringe features while securing the distinction between labeled and unlabeled data. The compressed features are connected with labels and alternately trained by the discriminator through the sigmoid or softmax layer. The discriminator purposefully determines unmatched UNT samples as false targets and marks them as underwater targets for co-training classification.
3) The weight of the proposed JCF block is initialized via the prediction output from JAE to utilize its classification prediction effectively. In addition, the adaptive confidence evaluation function is optimized by introducing the average similarity to reduce the influence of noisy unlabeled samples.
The structure of this paper is arranged as follows. Section 2 describes related works, marine dataset preparation, and feature preprocessing. Section 3 presents the overall framework of the SAC model, with a detailed description of the JAE network architecture and the JCF training algorithm. Section 4 provides the results of the proposed SAC on the experimental dataset and three public datasets, as well as the comparison between this method and several SSL-based algorithms. Section 5 finally concludes this paper.
The data collection and feature preprocessing algorithm are introduced in this section. The target features with the same shape should be extracted uniformly to suppress the nonstationary noise interference to facilitate model training.
Regarding the task of UNT autonomous recognition, SSL is generally superior to unsupervised methods (Bianco et al., 2019). One of the typical algorithms is co-training, which can rapidly process high-dimensional data and generalize favorably. For example, Yaslan and Cataltepe (2010) measured the relevance between features and probability to optimize the random subspace, and Settouti et al. (2017) used embedded sample feature permutation for feature selection to improve classification accuracy. However, under excessively noisy data, co-training models amplify the noise during alternate training, leading to the continuous deterioration of classification performance. Therefore, denoising preprocessing is necessary to ensure that CRF is constantly updated in accordance with the anticipated direction. As for noisy samples, autoencoders hold significant advantages in feature denoising and compression: for example, the convolution denoising autoencoder is used to reduce the ambient noise of the underwater acoustic target spectrum adaptively (Zhou and Yang, 2020), the correlation distance skip connection denoising autoencoder is proposed to enhance the robustness of speech (Badi et al., 2020), and the ensemble stacked autoencoder (SAE) is exploited for red hind call classification (Ibrahim et al., 2019).
Another rapidly developing SSL approach is the generative adversarial network (GAN) (Liu et al., 2021b), which can be combined with autoencoders to obtain the denoising advantages for target recognition, that is, the so-called adversarial autoencoder (AAE) (Makhzani et al., 2016). GAN can be extended to a variety of derivative schemes according to different application requirements. For instance, Principi et al. (2017) applied AAE to acoustic novelty detection, Xu et al. (2019) proposed the adversarially approximated autoencoder to acquire the latent feature with adversarial approximation, and Zamorski et al. (2020) achieved the 3D point cloud representation by using 3D AAE. These AAEs manifest weak performances on high-dimensional feature learning despite their outstanding denoising performance compared with co-training algorithms. Therefore, optimizing the initial parameters and confidence constraints via ensemble AAE modules is a feasible way to improve the co-training algorithms.
The experimental marine survey was conducted in the South China Sea. The vertical line array with 16 hydrophones at a receiving depth range of [206, 548] m was adopted, and underwater acoustic data at a sampling frequency of 20000 Hz were collected. A total of 2000 samples constitute the target dataset for model training and testing. The data were obtained from five categories with different signal-to-noise ratios (SNRs): two bomb sources with depths of 50 and 300 m that are viewed as ‘underwater’ targets in this paper, and three ‘surface’ targets, including catamaran, freight, and a ship analog source (depth 11 m).
To illustrate the dataset, Table 1 describes its details, and the visualization of the vertical line array and related objects is given in Fig.1. Fig.2 displays the time-domain signals and the short-time Fourier transform (STFT) spectra of different targets. Notably, the ship propeller rotates intermittently; thus, the time length of the ship sample is set to 10 s to record the acoustic characteristics of the propeller. By contrast, the other target signals are intercepted within 1 s to acquire concentrated energy with minimal ambient noise interference. This type of data requires further denoising processing after the sample interception.
For the acquired original array signal, preprocessing, including denoising and feature extraction, is executed first. Moreover, individual hydrophones are spliced according to the depth sequence to promote feature fusion. This paper proposes the bandwidth SCC method for pretreatment, which is adaptable for JAE feature compression.
First, the Hilbert transform is performed on the signal received by each hydrophone of the vertical line array to calculate the envelope signal:
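As an illustration of the envelope step, the following minimal sketch computes the envelope as the magnitude of the analytic signal obtained from a discrete Hilbert transform. The function name `hilbert_envelope` and the test tone are hypothetical, and a naive DFT is used only to keep the sketch dependency-free; it is not the authors' implementation.

```python
import cmath
import math


def _dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short sketches)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out


def hilbert_envelope(x):
    """Envelope |x + j*H[x]| via the frequency-domain analytic signal."""
    n = len(x)
    spectrum = _dft(x)
    # Keep DC (and Nyquist for even n), double the positive frequencies,
    # and zero the negative frequencies.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        for k in range(1, n // 2):
            gain[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            gain[k] = 2.0
    analytic = _dft([s * g for s, g in zip(spectrum, gain)], inverse=True)
    return [abs(a) for a in analytic]


# Hypothetical test tone: 4 full cycles of a unit-amplitude cosine in 64 samples.
tone = [math.cos(2 * math.pi * 4 * m / 64) for m in range(64)]
envelope = hilbert_envelope(tone)
```

For a pure tone with an integer number of cycles, the envelope is constant and equal to the tone amplitude, which is a convenient sanity check before applying the same step to each hydrophone channel.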
Table 1 Details of the five targets
Fig.1 Visualization of (a) experimental array and related targets, including (b) catamaran, (c) ship analog source, (d) freight ship, and (e) bomb.
Fig.2 Time-domain and STFT spectra of (a) catamaran, (b) ship analog source, (c) bomb, and (d) freight ship.
where() is the time-frequency domain spectral,2is the snapshot length,fis the sampling frequency, andfdenotes the dual-cyclic frequency.
All(f,) are spliced on theaxis according to the corresponding depth and normalized to form the vertical feature matrix with the dimension of×:
Fig.3 (a) Spectral response of improved Hanning window w(n); (b) Spectral of marine ambient noise.
Fig.4 Uncompressed cross-correlation features include (a) catamaran at a distance of 12 km, (b) ship analog source at a distance of 15 km, (c) 50 m bomb at a distance of 30 km, and (d) 300 m bomb at a distance of 30 km.
On the premise that the UNT dataset contains surface target samples, the unrecorded targets of the dataset should be identified in this section as the category of false underwater targets regardless of their true classes because previous label information is lacking. The details of the proposed SAC model based on AAE and co-training are illustrated below.
Fig.5 presents the structure of JAE, which comprises only two sparse autoencoders and one discriminator. JAE can effectively compress the feature redundancy and further retain the separability of labeled and unlabeled datasets by introducing greedy layer-wise pre-training (GLWPT) into the sparse hidden layers.
Fig.5 Co-training structure of the joint adversarial autoencoder and the joint co-forest.
where KL(·) represents the Kullback-Leibler (KL) divergence. The cost function can be expressed as follows to update the mapping matrices and biases by iteration:
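The role of the KL term can be sketched with the sparsity penalty commonly used in sparse autoencoders, which compares a target activation level with the mean activation of each hidden unit. The function name `kl_sparsity_penalty` and the target value 0.05 are assumptions chosen for illustration; the exact cost function of the JAE may weight or combine this term differently.

```python
import math


def kl_sparsity_penalty(rho, rho_hat):
    """Sum over hidden units of KL(rho || rho_hat_j) for Bernoulli variables."""
    total = 0.0
    for r in rho_hat:
        total += rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
    return total


target = 0.05  # assumed target sparsity (hypothetical value)
matched = kl_sparsity_penalty(target, [0.05, 0.05, 0.05])    # units at the target
too_active = kl_sparsity_penalty(target, [0.5, 0.5, 0.5])    # units far too active
```

The penalty vanishes when every hidden unit fires at the target rate and grows as the mean activations drift away from it, which is what drives the hidden layers toward a sparse code.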
Fig.6 depicts the compressed cross-correlation features of SAE output. The SNRs of the target decrease with increasing distance. Compared with Fig.4, sparse compression can effectively reduce feature redundancy and extract distinct interference stripes in the vertical sound field even at low SNR. Owing to the longer sampling time, shorter distance, and more stable signal form compared with bombs, Figs.6a and 6b show that the two surface targets merely contain a single stripe with strong energy, whereas their positions are separate at different depths. The interference fringe features of underwater targets can be used for depth estimation and classification in different experimental scenes (Duan et al., 2017; Yang et al., 2018; Zhou et al., 2019). Moreover, the experimental analysis conducted by Zhou et al. (2021) proves that recognition accuracy can reach up to 90% when the interference fringe features based on compression are used as the input of the information fusion model. The above studies indicate the effectiveness and feasibility of the proposed SCC features to identify UNT depth.
Fig.6 Compressed cross-correlation features include (a) catamaran at a distance of 12 km, (b) ship analog source at a distance of 15 km, (c) 50 m bomb at a distance of 42 km, and (d) 300 m bomb at a distance of 29 km.
Step 2 Pre-training. The modified discriminative method is introduced herein. JAE eliminates the decoder and only applies the discriminator to identify the compressed features. All labels l should be reshaped into a one-hot code, such as l = [0, 0, ···, 1, ···, 0], for the convenience of training. According to Mirza and Osindero (2014), the original loss function J of the discriminator identifies whether the input feature is a true or generated sample:
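For reference, the conditional GAN objective of Mirza and Osindero (2014), on which this loss is based, is commonly written in the form below. The notation is an assumption made for illustration: D is the discriminator, G the generator, x a true sample, z a noise sample, and l the conditioning label; the loss J used in this paper may differ in notation and sign convention.

```latex
\min_{G}\max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x \mid l)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D\left(G(z \mid l)\right)\right)\right]
```

The first term rewards the discriminator for accepting true samples, and the second rewards it for rejecting generated ones.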
where D(·) denotes the probability distribution of the discriminator considering the input samples. Achieving both a preferable adversarial training effect and feature compression in the two SAEs is difficult; moreover, inputting the two compressed features into an additional generation network would lead to extra information flow loss. Therefore, the discriminator modules of JAE separately adopt sigmoid and softmax as the output layers for alternate learning, so that the unrecorded noncooperative targets replace the ‘noise sample’ in CGAN (Mirza and Osindero, 2014) for adversarial training.
Step 3 Sigmoid training. Fig.5 shows that the sigmoid layer with+ 2 units is exploited to replace the softmax layer for updating the discriminator and fed with the samples either from the labeled or the UNT dataset. The binary cross-entropy loss between the network and the sigmoid is expressed as follows:
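A minimal sketch of a binary cross-entropy loss over independent sigmoid units is given here to make the quantity concrete. The helper names, the four-unit target, and the logit values are hypothetical, and the averaging over units is an assumption rather than the paper's exact normalization.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(targets, logits):
    """Mean binary cross-entropy over independent sigmoid output units."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for t, z in zip(targets, logits):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(targets)


labels = [0, 1, 0, 1]  # hypothetical multi-hot target
confident = binary_cross_entropy(labels, [-6.0, 6.0, -6.0, 6.0])
uncertain = binary_cross_entropy(labels, [0.0, 0.0, 0.0, 0.0])
```

Each unit is scored on its own, so several units can be driven toward 1 at the same time, unlike a softmax output.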
The algorithm runs in the following environment: Intel Core i7-10700K, NVIDIA RTX 3060Ti, and Anaconda 4.7.12. The stochastic gradient descent method is used for JAE model optimization, and the GridSearchCV function is utilized to optimize other initial parameters of JCF. The related underwater targets include five categories, as shown in Fig.1. Two vertical array datasets measured in the South China Sea are used to demonstrate the performance of SAC and other methods. As shown in Fig.7(a), the hydrophone depth range of position A is [198, 545] m, and the hydrophone depth range of position B corresponds to the description in Section 2.2. Apart from the slightly different depth ranges, the two arrays share the same settings: the working frequency band is [50, 10000] Hz, the sampling frequency is 20000 Hz, the background noise level is approximately 42 dB, and the amplification factor is 3.5. Similar to dataset B, the number of samples in dataset A is 2000, and the specific division is as follows:
Dataset A: The incomplete marked dataset contains 400 samples each of ship analog source, catamaran, and freight but only 200 samples of bomb 1. The unmarked set (i.e., the UNT set) only includes 200 bomb 1 samples and 400 bomb 2 samples.
Dataset B: The incomplete marked dataset includes the following: 400 samples of ship simulator, catamaran, and freight but only 200 samples of bomb 2. The unmarked UNT set merely comprises 400 samples of bomb 1 and 200 samples of bomb 2.
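The two divisions can be checked with a short tally. The sketch assumes that the figure of 400 applies to each of the three surface classes, which is consistent with the stated total of 2000 samples per dataset; the dictionary layout and variable names are illustrative only.

```python
# Sample counts as stated in the text for the two South China Sea datasets.
datasets = {
    "A": {
        "labeled": {"ship analog source": 400, "catamaran": 400,
                    "freight": 400, "bomb 1": 200},
        "unlabeled": {"bomb 1": 200, "bomb 2": 400},
    },
    "B": {
        "labeled": {"ship analog source": 400, "catamaran": 400,
                    "freight": 400, "bomb 2": 200},
        "unlabeled": {"bomb 1": 400, "bomb 2": 200},
    },
}

summary = {}
for name, parts in datasets.items():
    n_labeled = sum(parts["labeled"].values())
    n_unlabeled = sum(parts["unlabeled"].values())
    # Classes that appear only in the unlabeled (UNT) set have no label record.
    unrecorded = sorted(set(parts["unlabeled"]) - set(parts["labeled"]))
    summary[name] = (n_labeled, n_unlabeled, n_labeled + n_unlabeled, unrecorded)
```

Under this reading, each dataset has 1400 marked and 600 unmarked samples, and the class without any label record is bomb 2 in dataset A and bomb 1 in dataset B.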
Fig.7 (a) Approximate positions of two datasets with ETOPO1 bathymetry, (b) the shipborne radar scan with freight target, and (c) the display interface of array receiving signal.
The k-fold cross-validation (k = 4) method is used in this study for dataset division to avoid overfitting and underfitting. Specifically, the two original datasets are randomly and independently divided into four parts (each part contains samples of each class) without repeated sampling. In each round, one fold is selected as the test set, and the three remaining folds are used to train the classifiers. The average of the four test results is taken as the classification accuracy.
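The division described above can be sketched as a stratified four-fold split. The function names, the round-robin dealing rule, and the toy labels are assumptions made for illustration; any procedure that keeps every class in every fold without repeated sampling would serve the same purpose.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=4, seed=0):
    """Split sample indices into k disjoint folds, each containing every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal indices round-robin
    return folds


def cross_validate(labels, evaluate, k=4):
    """Average the score of evaluate(train_idx, test_idx) over the k folds."""
    folds = stratified_kfold(labels, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k


# Hypothetical labels: 5 classes with 8 samples each.
toy_labels = [c for c in range(5) for _ in range(8)]
folds = stratified_kfold(toy_labels)
```

Passing a function that trains on `train_idx` and returns the accuracy on `test_idx` to `cross_validate` yields the averaged accuracy over the four rounds.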
The similarity is measured among the simulation results, the uncompressed feature, and the compression feature to illustrate the necessity of feature compression. The mean square error (MSE) between the sample features and the simulation results is analyzed considering the five targets. The receiving distance between the array and target is limited to < 60 km (not exceeding the first convergence zone) to alleviate the submerged interference of the shadow zone. The uncompressed feature matrix processed by Section 2 during this experiment is 10001 × 3200 dimensions and comprises 16 hydrophone features of 10001 × 200 dimensions. Through 16 cycles of sparse learning, each subfeature is compressed by three 20 × 200 sparse hidden layers of the same size and then spliced again to obtain the final 320 × 200 features.
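The stated dimensions can be verified with a few lines of arithmetic. The splicing axes in the comments are inferred from the sizes given in the text and should be read as an assumption.

```python
n_hydrophones = 16
raw_sub = (10001, 200)      # per-hydrophone feature size stated in the text
compressed_sub = (20, 200)  # sparse hidden-layer size stated in the text

# Inferred splicing: raw subfeatures side by side, compressed ones stacked.
raw_full = (raw_sub[0], raw_sub[1] * n_hydrophones)
compressed_full = (compressed_sub[0] * n_hydrophones, compressed_sub[1])

raw_elems = raw_full[0] * raw_full[1]
compressed_elems = compressed_full[0] * compressed_full[1]
ratio = raw_elems / compressed_elems
```

The compressed feature therefore holds roughly 1/500 of the elements of the uncompressed matrix.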
The BELLHOP model (Porter, 2011) is exploited to simulate the vertical interference fringe features at various depths according to the measured environmental parameters (Zhou., 2019). Notably, the size of the simulated matrix is also set to 320 × 200 dimensions, similar to the compression features. All uncompressed features are transformed into normalized images of equal size due to the inconsistent matrix size of simulated and uncompressed features; thus, the MSE calculation can be realized.
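The comparison can be sketched as normalize, resize, then MSE. The nearest-neighbour resizing rule, the function names, and the toy matrices are assumptions made for illustration; the text specifies only that the features are converted to normalized images of equal size.

```python
def normalize(img):
    """Min-max scale a 2D list to [0, 1]."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for constant images
    return [[(v - lo) / span for v in row] for row in img]


def resize_nearest(img, rows, cols):
    """Nearest-neighbour resampling to rows x cols (assumed resizing rule)."""
    src_r, src_c = len(img), len(img[0])
    return [[img[r * src_r // rows][c * src_c // cols] for c in range(cols)]
            for r in range(rows)]


def mse(a, b):
    """Mean square error between two equally sized 2D lists."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


# Hypothetical toy matrices standing in for simulated and measured features.
simulated = [[0.0, 1.0], [1.0, 0.0]]
measured = [[0.0, 0.0, 4.0, 4.0],
            [0.0, 0.0, 4.0, 4.0],
            [4.0, 4.0, 0.0, 0.0],
            [4.0, 4.0, 0.0, 0.0]]
shrunk = resize_nearest(normalize(measured), 2, 2)
error = mse(normalize(simulated), shrunk)
```

Because both inputs are scaled to [0, 1] first, the resulting MSE is bounded by 1 and comparable across targets with different absolute energy levels.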
Fig.8 shows that the depth domain of simulation results refers to the distribution depth range of the hydrophone array, which is set at approximately 200 – 560 m. A brief glance at Fig.9 shows that the MSE of SCC features compressed by JAE is substantially smaller than that of uncompressed features. The MSE of compressed features is always < 0.1 despite the drastic fluctuation of SNR, reflecting the contribution of the JAE model to feature compression and extraction with high robustness.
Fig.8 Simulated interference fringes under the following conditions: (a) catamaran at a distance of 12 km; (b) ship analog source at a distance of 15 km; (c) 50 m bomb at a distance of 30 km; and (d) 300 m bomb at a distance of 30 km.
Fig.9 MSE statistical range of five targets considering the simulations and (a) the compression features and (b) the uncompressed features.
To illustrate the recognition performance of JAE for existing labeled samples, this section sets the labeling rates of the incomplete sets of datasets A and B to 40%, 60%, 80%, 90%, and 100% and assigns unlabeled samples into UNT datasets for recognition. The training termination condition is set to maximum epoch = 150, and the loss curves of datasets A and B at diverse labeling rates are depicted in Fig.10. This figure shows that the proposed JAE model can realize loss convergence. Overall, with the increase in labeling rate, the gradient descent of the two datasets tends to become faster, and the converged loss values decrease accordingly. Moreover, compared with the softmax layer, the convergence trends of the sigmoid layer are stable and concentrated under different labeling rates. This stability arises because softmax suits tasks with mutually exclusive categories, whereas sigmoid holds more advantages in multi-label classification. Since each sample is assigned two labels, the convergence of sigmoid is better than that of softmax in alternate training. Overall, this experiment indicates the high generalization of the JAE module in different marine scenes based on the proposed SCC feature of vertical arrays.
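The difference between the two output layers can be seen on a single set of logits. The values below are hypothetical and chosen so that two labels are strongly indicated at once.

```python
import math


def softmax(logits):
    """Normalized exponentials: the outputs compete and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sigmoid(logits):
    """Independent per-unit probabilities: the outputs do not compete."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Hypothetical logits where two labels are both strongly indicated.
logits = [4.0, -3.0, 4.0]
p_softmax = softmax(logits)
p_sigmoid = sigmoid(logits)
```

With softmax, the two indicated labels must share the probability mass and neither can exceed one half, whereas the sigmoid units can both approach 1, which matches the case in which each sample carries two labels.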
Fig.10 JAE loss curves of dataset A with (a) a sigmoid layer or (b) a softmax layer. JAE loss curves of dataset B with (c) a sigmoid layer or (d) a softmax layer.
The performances of the proposed SAC framework are compared with those of each module to evaluate the contributions of each block, especially the improvement of secondary recognition of the JCF module. Notably, the parameters and confidence of JCF are still optimized by JAE outputs, and the joint result of SAC is given by Eq. (21).
Fig.11 displays the comparison of two datasets under various label rates. With the increase in the labeling rate, SAC consistently possesses a higher precision than the two individual blocks, reaching the highest accuracies of 86.73% and 87.42%. Additionally, Tables 2 and 3 illustrate the comparison indexes of the two datasets. The JCF module shows better performance on dataset B under almost all proportions but holds lower metrics than JAE on dataset A in some cases. For example, at a rate of 60% on dataset A, the JAE block (AUC = 0.65) is superior to the JCF classifier (AUC = 0.62), which only exhibits a 64.94% F-score, a difference of 3.22% from JAE. Although the secondary classifier JCF may be weaker than the discriminator of JAE, the decision fusion operation of SAC can compensate for the inconsistent errors of the two classifiers, thus improving the precision. Overall, regardless of datasets A or B, the unrecorded bombs 1 and 2 can be successfully identified via SAC as ‘false’ and ‘underwater’ targets, implying that the SAC model gains remarkable improvements from the co-training contributions of JAE and JCF.
Moreover, this experiment reflects the promising scene adaptability of the SAE scheme as well as the high separability of compressed features considering various depth targets. Furthermore, in addition to the bombs with depth differences, the classification results of two datasets with different surface targets are similar. The recognition capabilities of the SAC model for varying vessel targets are similar despite the approach source depths.
Fig.11 Accuracy comparison of SAC and its blocks on (a) dataset A and (b) dataset B.
Table 2 Evaluation results of two databases of dataset A
Table 3 Evaluation results of two databases of dataset B
To demonstrate the generalization of compression features and the superiority of JCF, this section compares the performance of SAC with that of other SSL-based models: Rel-RASCO (Yaslan and Cataltepe, 2010), co-forest for feature selection (OFFS) (Settouti et al., 2017), SSRF (Amini et al., 2014) (the deterministic annealing method is utilized to determine unlabeled data), and FWCRF (Liu et al., 2020) (individual tree forecasts and forest average prediction fuzziness are combined to improve confidence).
In the case that other SSL models cannot identify the UNT targets without records in the labeled set, the UNT samples of datasets A and B are labeled and integrated with their incomplete sets, and the labeling scale range is set to 40% – 100% for training. The labels of two datasets are set to 5 + 1 categories, and the last category indicates ‘surface’ or ‘underwater’. Additionally, the compressed features are used as the inputs of other SSL-based classifiers with randomly initialized parameters.
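One possible reading of the 5 + 1 label layout is a one-hot class code followed by a surface/underwater flag. The class ordering, the flag polarity (1 for underwater), and the function name `encode` are assumptions made for illustration.

```python
CLASSES = ["catamaran", "freight", "ship analog source", "bomb 1", "bomb 2"]
UNDERWATER = {"bomb 1", "bomb 2"}  # the two bomb sources are the underwater targets


def encode(name):
    """Return a 5 + 1 label: one-hot class followed by an underwater flag."""
    one_hot = [1 if c == name else 0 for c in CLASSES]
    return one_hot + [1 if name in UNDERWATER else 0]


freight_label = encode("freight")
bomb_label = encode("bomb 2")
```

Under this layout every sample carries two labels, its class and its depth category.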
The average accuracy of the two datasets is displayed in Fig.12. These results reveal an overall upward trend as the label ratio rises, which tends to stabilize at > 78% when the ratio is ≥ 90%. Among the five co-training classifiers, SAC consistently exceeds the other approaches by a remarkable margin and reaches > 70% even at a low ratio (merely 60%), followed by FWCRF, which is almost equal to Rel-RASCO when the rate of dataset B is 60% – 80%. The weak classifiers, namely OFFS and SSRF, output approximately the same accuracy at low ratios (60% – 80%) on the two databases. These results reinforce the notion that random initialization of co-training approaches is unsuitable for this kind of UNT recognition task. Overall, the experiments validate that SAC outperforms other SSL-based schemes on SCC feature recognition mainly owing to the weight initialization, adaptive confidence, and decision fusion.
Aiming at the arduous task of existing DL models in identifying unrecorded noncooperative underwater targets via a vertical line array, a feature compression and co-training scheme based on incomplete datasets is proposed to solve this problem. This paper develops the SCC feature to compress and integrate the broadband energy spectrum of multihydrophone signals effectively. By extending adversarial learning to labeled and noncooperative samples, JAE takes advantage of additional label categories for preliminary identification of UNTs, which further optimizes the weight initialization and adaptive confidence of the secondary classifier JCF. Finally, the recognition reliability of SAC is effectively improved by the cross fusion of the outputs of the two classifiers in the decision fusion stage.
Fig.12 Accuracy comparison of the proposed method and other SSL classifiers on (a) dataset A and (b) dataset B.
The experiments are applied to two datasets to quantify the contribution of each block. The results validate the generalization of SAC and its potential in the field of UNT recognition. Meanwhile, the performance of SAC is compared with that of several state-of-the-art models. Overall, applying SAC to a vertical line array can not only distinguish the noncooperative underwater targets without label records but also achieve higher accuracy than other methods, even at low label ratios.
The study is supported by the National Natural Science Foundation of China (No. 6210011631), and in part by the China Postdoctoral Science Foundation (No. 2021M692628).
Algorithm I Proposed algorithm of JAE
Input:1: Number of two SAEs training iterations;2: Number of discriminator iterations with softmax layer;3: Number of discriminator iterations with sigmoid layer; datasetsX,X, and thresholdsigmoid,softmax.
Step 1: for epochs = 1 to1do
end for
Return compression featuresy,y;
Step 3: for epochs = 1 to2do
end for
Replace sigmoid with softmax layer;
for epochs = 1 to3do
end for
Algorithm II Training mechanism of JCF
Output: The tree modelhand the predicted labelL.
Step 1: for= 1 todo
Buildhtree with onlyZ;
end for
Repeat until the trees in forest converge
for= 1 todo
for= 1 todo
SubsamplezfromZ;
end if
end for
end if
end if
end for
end for
Step 4: Return the joint predictionLby Eq. (21)
Amini, S., Homayouni, S., and Safari, A., 2014. Semi-supervised classification of hyperspectral image using random forest algorithm.. Quebec City, QC, 2866-2869.
Badi, A., Park, S., Han, D. K., and Ko, H., 2020. Correlation distance skip connection denoising autoencoder (CDSK-DAE) for speech feature enhancement., 163: 107213.
Bianco, M. J., Gerstoft, P., Traer, J., Ozanich, E., Roch, M. A., et al., 2019. Machine learning in acoustics: Theory and applications., 146 (5): 3590-3628.
Dong, Y., Shen, X., Jiang, Z., and Wang, H., 2021. Recognition of imbalanced underwater acoustic datasets with exponentially weighted cross-entropy loss., 174: 107740.
Duan, R., Yang, K., Li, H., and Ma, Y., 2017. Acoustic-intensity striations below the critical depth: Interpretation and modeling., 142 (3): EL245-EL250.
Huang, S., Yu, L., and Jiang, W., 2020. Water entry sound detection in strong noise by using the spectrogram matrix decomposition method., 161: 107171.
Ibrahim, A. K., Zhuang, H., Ali, A. M., Erdol, N., Chérubin, L. M., Schärer Umpierre, M. T., et al., 2019. Classification of red hind grouper call types using random ensemble of stacked autoencoders., 146 (4): 2155-2162.
Ke, X., Yuan, F., and Cheng, E., 2020. Integrated optimization of underwater acoustic ship-radiated noise recognition based on two-dimensional feature fusion., 159: 107057.
Khishe, M., and Mosavi, M. R., 2020. Classification of underwater acoustical dataset using neural network trained by Chimp Optimization Algorithm., 157: 107005.
Li, J., and Yang, H., 2021. The underwater acoustic target timbre perception and recognition based on the auditory inspired deep convolutional neural network., 182: 108210.
Li, S., Yang, S., and Liang, J., 2020. Recognition of ships based on vector sensor and bidirectional long short-term memory networks., 164: 107248.
Liu, F., Shen, T., Luo, Z., Zhao, D., and Guo, S., 2021a. Underwater target recognition using convolutional recurrent neural networks with 3-D Mel-spectrogram and data augmentation., 178: 107989.
Liu, J., Zhu, G., and Yin, J., 2021b. Joint color spectrum and conditional generative adversarial network processing for underwater acoustic source ranging., 182: 108244.
Liu, Z., Wen, T., Sun, W., and Zhang, Q., 2020. Semi-supervised self-training feature weighted clustering decision tree and random forest., 2020 (8): 128337-128348.
Makhzani, A., Shlens, J., Jaitly, N., and Goodfellow, I., 2016. Adversarial Autoencoders. CoRR, arXiv: 1511.05644, http://arxiv.org/abs/1511.05644.
Mirza, M., and Osindero, S., 2014. Conditional Generative Adversarial Nets. CoRR, arXiv: 1411.1784, http://arxiv.org/abs/1411.1784.
Mishachandar, B., and Vairamuthu, V., 2021. Diverse ocean noise classification using deep learning., 181: 108141.
Porter, M. B., 2011. The BELLHOP manual and user’s guide: Preliminary and draft. Heat, Light and Sound Research Inc., La Jolla, CA, USA, 5-24.
Preston, J., 2009. Automated acoustic seabed classification of multibeam images of Stanton Banks., 70 (10): 1277-1287.
Principi, E., Vesperini, F., Squartini, S., and Piazza, F., 2017. Acoustic novelty detection with adversarial autoencoders.. Anchorage, IJCNN7966273, 3324-3330.
Settouti, N., Chikh, M. A., and Barra, V., 2017. A new feature selection approach based on ensemble methods in semi-supervised classification., 20 (3): 673-686.
Wang, S., and Zeng, X., 2014. Robust underwater noise targets classification using auditory inspired time-frequency analysis., 78: 68-76.
Wang, X., Jiao, J., Sun, B., Yin, J., Han, X., and Zhao, W., 2019. Underwater sonar image classification using adaptive weights convolutional neural network., 146: 145-154.
Wei, R., Ma, X., and Li, X., 2020. Depth estimation of deep water moving source based on ray separation., 174: 107739.
Xu, W., Keshmiri, S., and Wang, G. R., 2019. Adversarially approximated autoencoder for image generation and manipulation., 21 (9): 2387-2396.
Yang, K., and Zhou, X., 2019. Deep learning classification for improved bicoherence feature based on cyclic modulation and cross-correlation., 146 (4): 2201-2211.
Yang, K., Xu, L., Yang, Q., and Duan, R., 2018. Striation-based source depth estimation with a vertical line array in the deep ocean., 143 (1): EL8-EL12.
Yaslan, Y., and Cataltepe, Z., 2010. Co-training with relevant random subspaces., 73 (10): 1652-1661.
Zamorski, M., Zięba, M., Klukowski, P., Nowak, R., Stokowiec, W., Trzciński, T., et al., 2020. Adversarial autoencoders for compact representations of 3D point clouds. CoRR, arXiv: 1811.07605, http://arxiv.org/abs/1811.07605.
Zhang, Q., Da, L., Zhang, Y., and Hu, Y., 2021. Integrated neural networks based on feature fusion for underwater target recognition., 182: 108261.
Zhou, X., and Yang, K., 2020. A denoising representation framework for underwater acoustic signal recognition., 147 (4): EL377-EL383.
Zhou, X., Yan, Y., and Yang, K., 2021. A multi-feature compression and fusion strategy of vertical self-contained hydrophone array., 21 (21): 24349-24358.
Zhou, X., Yang, K., and Duan, R., 2019. Deep learning based on striation images for underwater and surface target classification., 26 (9): 1378-1382.
(December 25, 2021; May 9, 2022; May 16, 2022)
© Ocean University of China, Science Press and Springer-Verlag GmbH Germany 2023
E-mail: zhouxy111@mail.nwpu.edu.cn
(Edited by Xie Jun)
Journal of Ocean University of China, 2023, Issue 5