
3DMKDR: 3D Multiscale Kernels CNN Model for Depression Recognition Based on EEG

2023-05-13 09:24:54

Journal of Beijing Institute of Technology, 2023, Issue 2

Yun Su, Zhixuan Zhang, Qi Cai, Bingtao Zhang, Xiaohong Li

Abstract: Depression has become a major health threat around the world, especially for older people, so effective detection of depression is a great public health challenge. Electroencephalogram (EEG) can be used as a biomarker for depression recognition. Motivated by studies showing that multiple smaller-scale kernels can increase nonlinear expression compared to a single larger kernel, this article proposes the three-dimensional multiscale kernels convolutional neural network model for depression disorder recognition (3DMKDR), a three-dimensional convolutional neural network with multiscale convolutional kernels for depression recognition based on EEG signals. A three-dimensional structure of the EEG is built by extending one-dimensional feature sequences into a two-dimensional electrode matrix to excavate the related spatiotemporal information among electrodes and the collected electrode matrix. On the major depressive disorder (MDD) dataset and the multi-modal open dataset for mental-disorder analysis (MODMA), the accuracies of depression recognition reach 99.86% and 98.01% in the subject-dependent experiment, and 95.80% and 82.27% in the subject-independent experiment, which are higher than those of alternative competitive methods. The experimental results demonstrate that the proposed 3DMKDR is potentially useful for depression recognition in older persons in the future.

Keywords: major depression disorder (MDD); electroencephalogram (EEG); three-dimensional convolutional neural network (3D-CNN); spatiotemporal features

1 Introduction

Depression is a common mental disease characterized by persistent low mood and cognitive impairment, which severely affects people’s quality of life [1].

The world’s population is aging at a fast pace [2]. Episodes of depressive disorders might be expected to be more prevalent in older age due to the increased risk of adverse life events. Compared with younger adults, older people more often have substantial depressive symptomatology without meeting the diagnostic criteria for a depressive disorder. This condition affects nearly 1 in 10 older adults [3].

Therefore, we should pay more attention to the diagnosis of elderly people with depressive symptomatology. With the high incidence and low recognition rate of depression, exploring simple, objective, and accurate evaluation methods for depression detection for older people is a major public health challenge [4]. Electroencephalogram (EEG) has the ability to objectively reflect the user’s true inner emotional experience [5]. Therefore, it is urgent and necessary to explore an effective EEG-based method for depression recognition.

Currently, deep learning (DL) methods have been used in depression and emotion detection based on EEG and can achieve superior performance compared to traditional machine learning methods [6]. For example, Sandheep et al. [7] proposed that an extensive data learning approach combined with a convolutional neural network (CNN) model can better distinguish depression patients in resting state EEG data. Shao et al. [8] proposed a CNN model for classifying the EEG data of depressed and normal subjects. Deng et al. [9] proposed a model composed of five parallel convolutional filters that learned the space-frequency features of EEG, and its accuracy was up to 94.37% on the MODMA dataset [10]. The recognition of mental states requires consideration of not only the temporal dependence between data points but also the spatial relevance between different EEG electrodes, and the three-dimensional convolutional neural network (3D-CNN) has the ability to automatically extract such spatiotemporal features [11].

The existing recognition models have achieved high accuracy, while some researchers believe that multiple smaller-scale kernels can increase nonlinear expression more than a single larger kernel [12]. Defining multiscale convolutional kernels based on temporal and spatial features is still an interesting topic in depression recognition research with the 3D-CNN model. Therefore, we propose the 3D multiscale kernels convolutional neural network model for depression disorder recognition (3DMKDR) in this paper. Specifically, we design the spatial position of EEG electrodes according to the 10-20 electrode system [13], which retains the spatial information of all EEG electrodes. Different kinds of smaller-scale convolutional kernels and double linear convolutional structures are incorporated into our model to obtain efficient recognition ability. In summary, the main contributions of this paper are as follows.

1) The raw EEG signals are transformed into a three-dimensional (3D) dataset according to the topology of EEG electrodes. This method can completely capture the prior knowledge among electrodes, where the spatial and temporal information of electrodes are mined.

2) The 3D-CNN model with multiscale convolutional kernels is constructed to achieve more efficient depression recognition performance. Multiscale convolutional kernels have powerful potential for performing recognition based on EEG.

3) This study conducts extensive experiments to verify the effective recognition ability of 3DMKDR based on EEG from the MDD [14] and MODMA depression datasets. The results are higher than those of alternative competitive methods.

The paper is structured as follows. Sections 2 and 3 describe the main methodology adopted in the work, followed by the experiments. The results of the experiments and their discussion are presented in Section 4. Section 5 provides the main conclusions and some future work.

2 Methods

In this study, we present an electrode relocation method and the 3DMKDR model. The procedure of the method is shown in Fig.1. First, the original EEG signals from the EEG datasets are preprocessed and converted from the 2D to the 3D format according to the electrode locations. Then, the 3DMKDR model is constructed and evaluated on the recognition of major depressive disorders (MDDs) relative to healthy controls (HCs).

2.1 Preprocessing of EEG

It is necessary to preprocess the raw EEG signals to improve the recognition accuracy [15]. Some researchers relocated the EEG electrodes to a 2D topology based on the topological relation of the 10-20 electrode system to preserve the spatial information of EEG electrodes, which can be conducive to improving the recognition ability of the model [16].

Fig.1 The procedure of the method

To improve the accuracy of depression detection, the topology of electrodes is further improved based on our previous work [17]. Starting from the 10-20 electrode system shown in Fig.2(a), a blank row is added between rows of electrodes during design processing based on a new dataset to ensure the topological sparseness among electrodes. This method can maintain reasonable spacing among all electrodes and ensure the exact electrode position. According to the farthest distance between two electrodes, the size of the two-dimensional matrix is set to 9×9.

Then, the selected EEG signals are mapped into a 9×9 matrix, as shown in Fig.2(b). Each one-dimensional EEG signal data vector is inserted into a two-dimensional topological matrix, and the time sampling data vector of EEG acts as the third dimension of the three-dimensional matrix. A blank position represents the topological position of an unselected physiological signal, which is set to zero in the 9×9 matrix, and the matrix is normalized.
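
The mapping described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the grid coordinates in `GRID_POS` are hypothetical (the exact cells are those of Fig.2(b), which may differ), the channel labels follow the common 10-20 naming, and the normalization is assumed to be a per-channel z-score because the text does not name the method.

```python
import numpy as np

# Hypothetical (row, col) cells on the 9x9 grid for 19 channels of the
# 10-20 system. Electrode rows sit on even grid rows so that a blank row
# separates them; the exact cells in the paper's Fig.2(b) may differ.
GRID_POS = {
    "Fp1": (0, 3), "Fp2": (0, 5),
    "F7": (2, 0), "F3": (2, 2), "Fz": (2, 4), "F4": (2, 6), "F8": (2, 8),
    "T3": (4, 0), "C3": (4, 2), "Cz": (4, 4), "C4": (4, 6), "T4": (4, 8),
    "T5": (6, 0), "P3": (6, 2), "Pz": (6, 4), "P4": (6, 6), "T6": (6, 8),
    "O1": (8, 3), "O2": (8, 5),
}


def to_3d(fragment, channel_names):
    """Map a (channels, samples) fragment to a (9, 9, samples) array."""
    fragment = np.asarray(fragment, dtype=np.float64)
    # Assumed normalization: z-score each channel over the fragment.
    mean = fragment.mean(axis=1, keepdims=True)
    std = fragment.std(axis=1, keepdims=True)
    fragment = (fragment - mean) / np.where(std == 0, 1.0, std)
    # Cells without a selected electrode stay at zero.
    cube = np.zeros((9, 9, fragment.shape[1]))
    for idx, name in enumerate(channel_names):
        row, col = GRID_POS[name]
        cube[row, col, :] = fragment[idx]
    return cube


names = list(GRID_POS)
rng = np.random.default_rng(0)
cube = to_3d(rng.standard_normal((19, 128)), names)
print(cube.shape)  # (9, 9, 128)
```

Normalizing before placement keeps the blank cells at exactly zero; normalizing the filled matrix instead would shift them, so the order chosen here is a design assumption rather than something the text specifies.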

2.2 Recognition of 3DMKDR

Based on the generated 3D EEG dataset, the 3D-CNN model with multiscale convolutional kernels for depression recognition based on EEG signals (3DMKDR) is constructed to extract multiple feature matrices that fuse temporal and spatial features.

Fig.2 Brain electrode distribution diagram: (a) brain electrode distribution diagram [13]; (b) corresponding matrix

3 Experiments

3.1 Datasets

To evaluate the effectiveness of our method, we conducted experiments on the MDD and MODMA datasets to analyze depressive states in different subjects.

The MDD dataset records resting state and task state EEG data from 34 major depression patients and 30 healthy control people. The task state data are used in our experiment. In the task state paradigm, the subjects were asked to watch three kinds of patterns on the computer screen and press the space bar when the 5 cm circle in the patterns appeared. The task state data were collected for 10 min, and the lengths of different trials were different in all subjects.

The MODMA dataset provided by Lanzhou University is the second open dataset used to test our method in this paper. The EEG data in the MODMA consist of task state and resting state data from 24 major depression patients and 29 healthy control people. The task state data are also chosen in our experiment. In the task state, all subjects were asked to identify the spatial position of the “dot” in one kind of picture and press the “1” or “4” button as soon as possible, while four graphs were randomly played on a computer screen. The whole task state experiment was completed in approximately 25 min, and the lengths of different trials were different in all subjects.

3.2 Preprocessing

In the experiments, the task state EEG data in the MDD and MODMA datasets are chosen because they include more data from the subjects, which helps to obtain better recognition ability.

3.2.1 MDD Dataset

The task state data of the MDD dataset were recorded with 22 EEG electrodes from 64 subjects. Because 3 of the 22 electrodes are ear-related electrodes and the data of three subjects (the 20th and the 21st healthy subjects, the 9th major depression subject) are invalid, this research selects 19 electrodes shown in the red circle of Fig.2(b) and 2 types of labels, i.e., major depressive disorders and healthy controls, from 61 subjects in the MDD dataset.

The processing step is as follows. First, artifacts are removed from the data by a bandpass frequency filter of 0.1–50.0 Hz and the data are downsampled from 256 Hz to 128 Hz. The data lengths of trials in all subjects are different, so the dataset (61 × 19 × 77 440) is obtained by comparing and choosing the minimum data length (77 440) of all trials, where 61 is the number of subjects, 19 is the number of EEG electrodes and 77 440 is the number of sampled data points. Second, the data of each subject are changed to a new dataset with a size of 1 209 × 19 × 128, where 1 209 is the number of fragments by time window, 19 is the number of EEG electrodes, and 128 is the number of sampled data points. Finally, 73 749 3D matrices (9×9×128) are obtained based on the 10–20 electrode system, where 73 749 is the product of 61 × 1 209 (subject number × fragment number), 9 × 9 is the devised 2D matrix, and 128 is the number of sampled data points.

3.2.2 MODMA Dataset

The task state EEG data in the MODMA dataset were recorded with 128 electrodes from 53 subjects. To ensure channel sparseness and reduce the computation needed, 19 EEG electrodes that are the same as those of the MDD dataset and 2 labels (MDDs and HCs) are selected from this dataset for the analysis of the depressive disorder state.

The processing step is as follows. First, artifacts are removed from the data by a bandpass frequency filter of 0.1–50.0 Hz and the data are downsampled from 250 Hz to 128 Hz. The dataset (53×19×99 836) is obtained by comparing and choosing the minimum data length (99 836) of each trial, where 53 is the number of subjects and 19 is the number of EEG electrodes. Second, 82 521 3D matrices (9×9×128) are obtained for MODMA by the same steps as the MDD dataset, where 82 521 is the product of 53×1 557 (subject number × fragment number), 1 557 is the number of fragments by the processing of time windows, 9×9 is the 2D matrix, and 128 is the number of sampled data points.

3.2.3 Time Window

To further improve the recognition accuracy of the model, a suitable time window is chosen by comparing two kinds of time windows. The study in [18] confirmed that the average classification accuracy of a 1 s time window based on EEG was superior to that of other lengths. However, other researchers consider the 2 s time window to be the most commonly applied [19].

Fig.3 Accuracy comparison of different time windows: (a) in the MDD dataset; (b) in the MODMA dataset

To address this problem, we compare EEG classification results at two kinds of time window lengths in Fig.3(a) and Fig.3(b). It is obvious that the classification accuracy of the 1 s time window is higher than that of the 2 s time window with the same number of iterations. Compared with the 2 s time window, the data fragments by the 1 s time window can make the model obtain more detailed emotional characteristics and gain a suitable amount of data for the recognition of the model. Therefore, we select the 1 s time window length as suitable for the model, and a 0.5 s overlapping window is set to ensure high-quality time continuity of data and achieve high-efficiency recognition accuracy.
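
As a quick check of the segmentation, the sketch below counts the full 1 s windows with 0.5 s overlap in a recording of 77 440 samples at 128 Hz, the length reported for the MDD dataset. The helper name `window_starts` is ours, and the assumption that incomplete windows at the end are discarded is not stated in the text.

```python
def window_starts(n_samples, win, step):
    """Start indices of all full windows of length `win` taken every `step` samples."""
    if n_samples < win:
        return []
    return list(range(0, n_samples - win + 1, step))


FS = 128          # sampling rate after downsampling (Hz)
WIN = 1 * FS      # 1 s window -> 128 samples
STEP = WIN // 2   # 0.5 s overlap -> hop of 64 samples

starts = window_starts(77_440, WIN, STEP)
print(len(starts))       # 1209 fragments per subject
print(len(starts) * 61)  # 73749 fragments for 61 subjects
```

Under these assumptions the count matches the 1 209 fragments per subject and the 73 749 fragments in total given for the MDD dataset.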

3.3 3DMKDR Model

The 3DMKDR model is constructed to cover the related information among the brain electrodes and helps to learn the advanced features of EEG signals from 3D data. Multiple convolutional kernels can increase the nonlinear expression and make the judgment function more efficient [20]. Moreover, the multiscale kernel can improve the robustness of the network to extract features, which advances the generalization ability and classification accuracy [21].

The detailed architecture of the 3DMKDR model is shown in Fig.4 and Tab.1. This model consists of three layers, in which layer 1 is implemented in parallel with layer 2. Layer 1 includes convolutional layer 1-1 and convolutional layer 1-2. The kernel sizes of the convolution layers in layer 1 and layer 3 are both 3×3×4, where spatiotemporal features are extracted by the local spatial topology of 3×3 and temporal fragments of 4 sampled points. The kernel size of layer 2 is 3×3×5.

Fig.4 The overall architecture of the model

Tab.1 Detailed architectures of the model

Layer 1 consists of layer 1-1 and layer 1-2, which perform the coarse-grained calculation of spatiotemporal information based on the small convolutional kernel. Layer 2 accomplishes the same coarse-grained calculation based on another type of small convolutional kernel. After the processing in layer 1 and layer 2, layer 3 implements fine-grained screening of spatiotemporal information.

The activation function ReLU(·) is used after each convolutional layer to perform nonlinear feature transformation and to prevent the loss of information from the input data. A 3D maximum pooling layer is set behind each convolutional layer to extract the features more efficiently, which reduces the quantity of data on the temporal dimension. The last pooling layer is followed by the fully connected layer, and the Softmax layer is deployed as the output.
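
To make the layer arrangement concrete, here is a minimal PyTorch sketch of a two-branch 3D CNN that follows the description above. It is not the authors' implementation: the class name `Sketch3DMK`, the channel width of 16, the pooling sizes, the zero padding that keeps the 9×9×T shape, and the element-wise addition used to fuse the two branches are all assumptions, since Tab.1 is not reproduced here. Only the kernel sizes (3×3×4, 3×3×5), ReLU, temporal max pooling, the fully connected output and the dropout probability of 0.6 come from the text.

```python
import torch
import torch.nn as nn


class Sketch3DMK(nn.Module):
    """Illustrative two-branch 3D CNN; widths, padding and pooling are assumed."""

    def __init__(self, channels=16, n_classes=2, dropout=0.6):
        super().__init__()

        def block(c_in, c_out, k_t, pool_t):
            # Zero-pad so the convolution keeps the (9, 9, T) shape, then apply
            # a 3x3xk_t kernel, ReLU, and max pooling along time only.
            left = (k_t - 1) // 2
            return nn.Sequential(
                nn.ConstantPad3d((left, k_t - 1 - left, 1, 1, 1, 1), 0.0),
                nn.Conv3d(c_in, c_out, kernel_size=(3, 3, k_t)),
                nn.ReLU(),
                nn.MaxPool3d(kernel_size=(1, 1, pool_t)),
            )

        # Layer 1: two stacked 3x3x4 convolutions (layers 1-1 and 1-2).
        self.layer1 = nn.Sequential(
            block(1, channels, 4, 2), block(channels, channels, 4, 2)
        )
        # Layer 2: one 3x3x5 convolution, run in parallel with layer 1.
        self.layer2 = block(1, channels, 5, 4)
        # Layer 3: 3x3x4 convolution applied to the fused branches.
        self.layer3 = block(channels, channels, 4, 2)
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),
            nn.Linear(channels * 9 * 9 * 16, n_classes),
        )

    def forward(self, x):
        # x: (batch, 1, 9, 9, 128); both branches end at 32 time steps.
        fused = self.layer1(x) + self.layer2(x)  # assumed fusion by addition
        return self.head(self.layer3(fused))     # class logits


model = Sketch3DMK().eval()
with torch.no_grad():
    logits = model(torch.zeros(4, 1, 9, 9, 128))
    probs = torch.softmax(logits, dim=1)  # Softmax output over MDD / HC
print(probs.shape)  # torch.Size([4, 2])
```

The pooling factors (2 and 2 in layer 1, 4 in layer 2) are chosen only so that both branches produce tensors of the same shape before they are added; any other combination with matching output sizes would serve the same purpose.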

4 Results and Discussion

This model has been implemented in the PyTorch framework and deployed on the GeForce RTX 3060. The learning rate was set to 0.001 with the Adam Adadelta Optimizer, and the probability of the dropout operation was set to 0.6. Then, 10-fold cross-validation [22] and leave-one-subject-out cross-validation (LOSO) methods [23] were used to evaluate the performance of the 3DMKDR model.

To estimate the accuracy of depression recognition for the 3DMKDR model, the model was analyzed through the results of two experiments. We conducted both the subject-dependent and the subject-independent depression recognition experiments. Then, the average accuracy, F1 value, AUC value, UAR value and confusion matrix were obtained. Finally, the discussion covers three topics: comparisons with other related studies, evaluation indicators and ablation experiments.

4.1 Results

4.1.1 Subject-Dependent Experiment

For the subject-dependent depression recognition experiment, EEG data from all subjects were used to train one model. This experiment was conducted under 10-fold cross-validation; that is, the data fragments of all subjects were mixed together, 90% of the EEG data were used as training data to construct a model, and the remaining 10% were used for testing the model. Therefore, the test and training datasets have the same data distribution.
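
The pooled 10-fold split can be sketched in plain Python as follows. The function name `k_fold_indices`, the fixed seed and the round-robin assignment of shuffled indices to folds are illustrative choices, not details from the paper; the pool size of 73 749 is the fragment count reported for the MDD dataset.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs over a shuffled pool of fragment indices."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k disjoint, nearly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(73_749, k=10))
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx) + len(test_idx))
```

Because fragments are shuffled across subjects before splitting, fragments from the same subject can appear in both the training and the test fold, which is what makes this setting subject-dependent.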

The recognition results on the 4 bands, that is, theta (4–7 Hz), alpha (8–13 Hz), beta (14–30 Hz), and gamma (31–45 Hz), are shown in Tab.2. According to the results, the recognition accuracy of depression in the theta band is lower than that in the other three bands. We speculate that the theta band is generated in the state of sleep and a less responsive depressive state. Furthermore, the excitement state of the alpha, beta, and gamma bands increased successively, so the recognition accuracy of depression gradually increased [24]. The iteration of the overall EEG, taken as the efficiency standard of our model, is also shown in Tab.2.

Tab.2 Comparison of the 3DMKDR model

The F1 value, AUC value, UAR value and confusion matrix were calculated for two classes, MDDs and HCs, to analyze the performance of the 3DMKDR. Higher values for the two classes indicate that the model constructed in this paper has better performance. In the confusion matrices, shown in Fig.5(a) for the MDD dataset and in Fig.5(b) for the MODMA dataset, cells (0, 0) and (1, 1) give the recognition rates of HCs and MDDs, respectively. The matrix verifies that our model is stronger than others in predicting depression. Unweighted average recall (UAR) is used as the evaluation criterion for the performance of this model, which is calculated by the following formula:

$$\mathrm{Recall}_i = \frac{TP_i}{TP_i + FN_i}, \qquad \mathrm{UAR} = \frac{1}{2}\sum_{i=1}^{2}\mathrm{Recall}_i$$

where $TP_i$ and $FN_i$ are the numbers of true positives and false negatives of the $i$-th ($i=1,2$) class, that is, the numbers of samples of that class that are predicted correctly and incorrectly, respectively. $\mathrm{Recall}_i$ is the proportion of correctly predicted samples among all samples of this class. UAR is the arithmetic average of the recall rates over all classes. In the MDD dataset, $\mathrm{UAR}_{\mathrm{MDD}}$ is 99.89% ($\mathrm{Recall}_1$ = 99.87%, $\mathrm{Recall}_2$ = 99.91%). In the MODMA dataset, $\mathrm{UAR}_{\mathrm{MODMA}}$ is 97.90% ($\mathrm{Recall}_1$ = 97.87%, $\mathrm{Recall}_2$ = 97.93%). UAR is insensitive to class imbalance and therefore gives an objective evaluation of the model. The high recalls of both classes on the same dataset indicate that our model has good overall performance.
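
The unweighted average recall described above is simple to compute from a confusion matrix. The sketch below is a plain-Python illustration; the function name and the example counts are hypothetical, and each class is assumed to contain at least one sample. The last two lines only average the per-class recalls quoted in the text.

```python
def unweighted_average_recall(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)                # TP_i + FN_i, assumed to be non-zero
        recalls.append(row[i] / total)  # Recall_i
    return sum(recalls) / len(recalls), recalls


# Hypothetical counts for illustration only (not taken from the paper).
uar, recalls = unweighted_average_recall([[90, 10], [5, 95]])
print(recalls, uar)

# Averaging the per-class recalls reported in the text gives its UAR values.
print(round((99.87 + 99.91) / 2, 2))  # 99.89
print(round((97.87 + 97.93) / 2, 2))  # 97.9
```

Each class contributes equally to the average regardless of how many samples it has, which is why UAR is not dominated by the larger class.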

Fig.5 The confusion matrices of datasets: (a) in the MDD dataset; (b) in the MODMA dataset

4.1.2 Subject-Independent Experiment

For the subject-independent recognition experiment, the model is trained using data acquired from a limited number of subjects and is then applied to a subject who has never used the system prior to the experiment. This experiment was conducted under leave-one-subject-out cross-validation. For example, in the MDD dataset, a model is trained on EEG data acquired from 60 participants and is tested on the data acquired from the remaining one subject. This is repeated until each subject has been the test subject once. Due to individual differences between participants, the test and training sets have different data distributions. Then, the average accuracy is obtained. Tab.3 and Tab.4 show the recognition results of LOSO cross-validation in the MDD and MODMA datasets, respectively.
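
A leave-one-subject-out split can be sketched in plain Python as below. The helper `loso_splits` and the toy list of subject labels (61 subjects with 3 fragments each) are illustrative stand-ins; the real data have far more fragments per subject.

```python
def loso_splits(subject_ids):
    """Yield (held_out, train_idx, test_idx); each subject is held out exactly once."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Toy stand-in: 61 subjects with 3 fragments each.
subject_ids = [s for s in range(61) for _ in range(3)]
folds = list(loso_splits(subject_ids))
print(len(folds))  # 61
```

Splitting by subject label rather than by fragment guarantees that no fragment of the test subject is seen during training, which is the property that distinguishes this setting from the pooled 10-fold experiment.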

Tab.3 Accuracy in the MDD dataset

Tab.4 Accuracy in the MODMA dataset

The experimental results of comparing the two datasets can be summarized as follows. Each subject in the two datasets has a different recognition accuracy because there are differences in the psychological state of different individuals. In addition, during the data collection process of the MDD dataset, the subjects’ attention was more concentrated. As a result, the recognition accuracies of subjects in the MDD dataset fluctuate relatively little, and the average accuracy is higher than that with the MODMA dataset. Some lower recognition results appear in Tab.4, such as 50.39%, which may be due to subject distractions in the EEG acquisition experiment of the MODMA dataset. The model structure can obtain information among the EEG electrodes and promote better recognition ability.

4.2 Discussion

4.2.1 Comparison

Compared with the models reported in the previous literature, the 3DMKDR model in this article improves the accuracy of EEG-based depression recognition. In Tab.5, the average accuracy of our model is up to 99.86% on the MDD dataset, which is higher than the highest accuracy of previously researched models, 99.58%. The average accuracy for MDDs and HCs is up to 98.01% on the MODMA dataset, which is higher than the 94.00% achieved previously.

Tab.5 Comparison with previous studies

It is known that the DL-based models proposed in [25,26] currently have the best performance on the MDD and MODMA datasets. However, the results of the 3DMKDR model are approximately 0.28% and 4.01% higher than those of these models. Compared with the models in [27–29], the multikernel gives the 3DMKDR higher speed efficiency and better recognition performance on the MDD dataset. Our model adopts the method of repositioning the EEG electrodes to the 2D topology, so it has higher speed efficiency and better identification performance than other models [30, 31] on the MODMA dataset. The parallel structure and the dot addition part of this model can improve its timeliness and recognition ability. From our experimental results shown in Tab.5, it can be seen that our results are superior to those of previous studies reported in the literature.

4.2.2 Ablation Experiment

Commonly used kernel sizes are 2×2, 3×3, 5×5, and 7×7. Conv2×2 (the size of the convolutional kernel is 2×2) has no central point of the convolution, which causes the features to shift constantly during the padding process [32]. Several stacked conv3×3 layers, approximately equivalent to one conv5×5 or conv7×7, can further extract the features while reducing the number of parameters, and the recognition capability of the model can be controlled by the kernel volume [33]. Thus, this paper applies conv3×3 multiscale convolutional kernels to increase the feature amount of the model calculation and improve the model recognition.
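
The parameter saving from stacking conv3×3 layers can be checked with simple arithmetic. The sketch below assumes stride 1, no dilation, a single input and output channel, and ignores biases; the function name is ours.

```python
def stacked_3x3(n_layers):
    """Receptive field and weight count for n stacked 3x3 convolutions (stride 1)."""
    receptive = 1 + 2 * n_layers  # each 3x3 layer widens the field by 2
    weights = n_layers * 3 * 3
    return receptive, weights


for n in (2, 3):
    rf, w = stacked_3x3(n)
    print(f"{n} x conv3x3: receptive field {rf}x{rf}, "
          f"{w} weights vs {rf * rf} for one conv{rf}x{rf}")
```

Two stacked conv3×3 layers cover the same 5×5 area with 18 weights instead of 25, and three cover 7×7 with 27 instead of 49, while also inserting an extra nonlinearity between the layers.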

To confirm the effectiveness of multiscale kernels, this paper takes four types of convolutional kernels, i.e., conv3×3×k, conv3×3×k*, conv5×5×k and conv7×7×k. The different types of kernels are evaluated on the same dataset, and the classification results are compared in Tab.6. Conv3×3×k* represents the multiscale kernel of our model, conv3×3×k represents the same-scale kernel, and k represents the length of the data that was originally set in the third dimension. Based on the comparison of the results of the ablation experiment, this work finds that conv3×3×k is 4.18% and 1.75% higher than conv5×5×k and conv7×7×k, respectively, and conv3×3×k* is 6.76% and 4.33% higher than conv5×5×k and conv7×7×k, respectively, in the MDD dataset. Similarly, the results of conv3×3×k and conv3×3×k* are better than those of conv5×5×k and conv7×7×k in the MODMA dataset.

Tab.6 Comparison of different kernels

It is obvious that the smaller convolutional kernels pay more attention to the spatiotemporal information between two electrodes and have the ability to extract effective spatiotemporal features. The larger convolutional kernels cover a larger area in the electrode matrices and extract spatial information among multiple electrodes, but the resulting features contain considerable noise.

Conv3×3×k* performs even better than conv3×3×k, by 2.58% and 1.12% on the two datasets. The recognition results of the multiscale convolutional kernel are higher than those of the same-scale convolutional kernel. The reason is that the multiscale convolutional kernel can control the extraction of features based on different lengths of temporal sampling points. This method can increase the network capacity in diverse ways so that the decision function discriminates better between different categories.

5 Conclusion

In this study, the multiscale kernel 3D-CNN model (3DMKDR) is proposed to effectively recognize MDDs relative to HCs based on EEG signals. Subject-dependent and subject-independent experiments are conducted on the task state EEG data of the MDD and MODMA datasets, which are open depression datasets, and the accuracy, F1, AUC and UAR values are analyzed and compared. The results show that our model can quickly and effectively detect whether the subject is in a major depressive disorder state. The method has been confirmed to have remarkable potential to recognize depression disorders in older people.

While we have achieved superior accuracy compared to the alternative methods discussed in this paper using EEG data, we consider that there are further potential improvements. For older persons with limited mobility, our future work will explore a better model that uses EEG data from fewer electrodes to efficiently recognize depression based on the resting state EEG data collected by portable EEG acquisition devices. It is anticipated that these methods can facilitate real-time depressive disorder recognition for seniors, which will improve quality of life for older people with depressive symptomatology.

