

        An Interpretable Depression Prediction Model for the Elderly Based on ISSA Optimized LightGBM

2023-05-13 09:24:54

        Jie Wang, Zitong Wang, Jinze Li, Yan Peng

Abstract: Depression is one of the most severe mental health illnesses among senior citizens. Aiming at the low accuracy and poor interpretability of traditional prediction models, a novel interpretable depression prediction model for the elderly is proposed, based on the improved sparrow search algorithm (ISSA), the light gradient boosting machine (LightGBM), and SHapley Additive exPlanations (SHAP). First, to achieve better optimization ability and convergence speed, several strategies are used to improve SSA, including population initialization by the Halton sequence, elite population generation by reverse learning, and a multi-sample learning strategy with linear control of the step size. Then, ISSA is applied to optimize the hyper-parameters of LightGBM to improve prediction accuracy on massive high-dimensional data. Finally, SHAP is used to provide global and local interpretations of the prediction model. The effectiveness of the proposed method is validated by a series of comparative experiments on a real-world dataset.

Keywords: the elderly; depression prediction; improved sparrow search algorithm (ISSA); light gradient boosting machine (LightGBM); SHapley Additive exPlanations (SHAP)

        1 Introduction

With the trend of global aging, depression among the elderly constitutes a major public health concern that attracts worldwide attention [1]. According to the World Health Organization (WHO), in 2021 an estimated 3.8% of the world's population suffered from depression, and the rate among the elderly over the age of 60 was 5.7%, 1.5 times the overall level [2]. As a typical mental disorder, depression seriously affects the quality of life and health of the elderly. At present, biochemical indicators and doctor interviews are mainly used to detect elderly depression in clinical practice. However, the causes of elderly depression can be complex and obscure, and a variety of potential risk factors, including biological, psychological and social factors, may be associated with the occurrence of depression [3]. Fully exploiting these factors to make accurate depression predictions for older adults at an early stage is meaningful and challenging work that can support doctors' diagnoses and early risk assessment.

In conjunction with advances in artificial intelligence (AI), there is increasing research on the application of machine learning (ML) in the domain of depression prediction. For instance, Qasrawi et al. compared five well-known ML algorithms and found that the support vector machine (SVM) was the most effective at predicting children's depression risk factors [4]. Based on collected data, Priya et al. applied five ML algorithms to predict anxiety, depression and stress in modern life and found that the Naive Bayes classifier achieved the best accuracy [5]. Camp et al. found that SVM performed best among six popular classifiers in the task of predicting reduced depression severity in people with epilepsy [6]. However, most of these studies use a single ML model, whose performance is greatly affected by the dataset and which often suffers from insufficient generalization ability.

Beyond that, ensemble models, which integrate multiple base learners into a stronger learner by a certain strategy, outperform various single ML models in binary classification problems [7]. Zhang et al. used the ensemble model CatBoost to establish a depression risk assessment model and achieved higher prediction accuracy than logistic regression (LR) and SVM [8]. Sharma et al. applied eXtreme Gradient Boosting (XGBoost) to identify depression on a large imbalanced dataset and obtained satisfactory prediction accuracy [9]. Compared with single classifiers, Mara et al. found that the light gradient boosting machine (LightGBM) achieves the best performance for the detection of stress-related states at a granular level on physiological and behavioral data [10]. As a powerful boosting ensemble model and an efficient implementation of the gradient boosting decision tree (GBDT) framework, LightGBM has the advantages of high prediction accuracy, fast training speed and low memory consumption, and is well suited to large, high-dimensional data [11]. LightGBM has gained increasing attention and achieved good performance in the field of disease prediction [12, 13].

However, the determination of hyper-parameters is a primary concern for ensemble models, as it has a significant influence on predictive performance [14]. Random search (RS), grid search (GS) and swarm intelligence optimization algorithms are frequently used to tune the hyper-parameters of predictive models. Nevertheless, GS has high computational complexity as the number of hyper-parameters grows, and RS may miss the optimal values by trying random combinations of hyper-parameters [15]. In addition, particle swarm optimization (PSO) and the grey wolf optimizer (GWO) easily fall into local optima because they fail to grasp the global trend of model prediction performance [16]. In contrast, the sparrow search algorithm (SSA) [17] is a preferable choice due to its simple structure, flexibility, and few parameters [18], but its optimization ability and convergence speed still need to be improved [19].

On the other hand, the lack of interpretability becomes a challenging issue when ML models are entrusted with the power to make clinical decisions that affect people's well-being [20]. Potential depression patients may have psychological or mental disorders [21], and prediction results without explanations may be more difficult for patients to accept.

In this paper, a new interpretable prediction model, ISSA-LightGBM, is proposed for elderly depression, integrating the multi-strategy improved SSA (ISSA), LightGBM and SHapley Additive exPlanations (SHAP). To address the problems of easily falling into local optima and slow convergence, SSA is improved with multiple strategies in the proposed model. ISSA is then used to optimize the hyper-parameters of LightGBM to train a classifier with better prediction accuracy. Furthermore, SHAP is applied to analyze the feature importance of the proposed model, providing explanations that improve the clinical understanding of depression risk factors for the elderly. A public large-scale behavioral risk survey dataset, BRFSS [22], is used to demonstrate the effectiveness of this method.

The rest of the paper is organized as follows: Section 2 introduces the methods involved in the study. Section 3 presents the entire proposed model and algorithm. In Section 4, comparative experiments are carried out and the proposed model is verified on a real-world dataset. The explainable analyses are presented in Section 5. Section 6 concludes the paper.

        2 Preliminaries

        2.1 LightGBM

LightGBM is a novel variant of GBDT developed to tackle the challenges of learning decision trees when the feature dimension is high and the data size is large [23]. LightGBM uses the histogram algorithm to process feature data: the main idea is to discretize continuous feature values into k integers and construct a histogram of width k. Then, the data are traversed to accumulate the discretized values in the histogram and find the optimal split point. The histogram algorithm greatly reduces the time and memory consumption of split point selection. In addition, unlike most GBDT algorithms, which use the level-wise strategy, LightGBM adopts an efficient leaf-wise strategy with depth constraints to find the leaf with the greatest split gain among all present leaves and split it.
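As a rough illustration of the histogram idea (not LightGBM's actual implementation, which bins by feature quantiles and caches bin indices), the following sketch discretizes a continuous feature into k integer bins and accumulates per-bin statistics; all names here are illustrative:

```python
def build_histogram(values, gradients, k=8):
    """Discretize continuous feature values into k integer bins and
    accumulate per-bin statistics, as in histogram-based split finding.
    Simplified sketch: equal-width bins over the observed value range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0
    # bin index in [0, k-1] for each value
    bins = [min(int((v - lo) / width), k - 1) for v in values]
    hist = [[0, 0.0] for _ in range(k)]   # [count, gradient sum] per bin
    for b, g in zip(bins, gradients):
        hist[b][0] += 1
        hist[b][1] += g
    return hist

values = [0.1, 0.4, 0.45, 0.9, 0.95]
grads = [1.0, -0.5, 0.5, 2.0, -1.0]
hist = build_histogram(values, grads, k=4)
print(hist)
```

Split finding then scans the k bins instead of all n sorted values, which is where the time and memory savings come from.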

LightGBM starts with a constant tree model and minimizes the loss function by training new tree models as

$$\hat{y}_i^{(m)} = \hat{y}_i^{(m-1)} + h_m(x_i), \qquad h_m = \mathop{\arg\min}_{h} \sum_{i=1}^{n} L\left(y_i,\; \hat{y}_i^{(m-1)} + h(x_i)\right)$$

where $L$ is the loss function of the algorithm, expressed in terms of the predicted value $\hat{y}_i$ and the true value $y_i$ of the $i$-th sample, and $h_m$ denotes the tree added at the $m$-th boosting round.

Conventional GBDT models need to scan all the data instances to estimate the information gain of all possible split points for every feature, which is very time-consuming when handling big data. LightGBM develops two advanced techniques, gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB), to resolve this issue by rationally reducing the number of data instances and features. In addition, LightGBM adopts data and feature parallelism and other optimization strategies, which ensure high efficiency while preventing overfitting.
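The GOSS idea can be sketched as follows; this is a simplified illustration of the published description (keep the top a-fraction of instances by gradient magnitude, sample a b-fraction of the rest, and re-weight the sampled small-gradient instances by (1−a)/b), not LightGBM's internal code:

```python
import random

def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Gradient-based One-Side Sampling (sketch).
    Keeps the top a*n instances by |gradient|, samples b*n of the rest,
    and up-weights the sampled small-gradient instances by (1-a)/b so the
    information-gain estimate stays approximately unbiased."""
    rng = random.Random(seed)
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_k = int(a * n)
    kept = order[:top_k]
    sampled = rng.sample(order[top_k:], int(b * n))
    weights = {i: 1.0 for i in kept}
    weights.update({i: (1 - a) / b for i in sampled})
    return weights   # instance index -> training weight

grads = [5.0, 0.1, -4.0, 0.2, 3.0, -0.05, 0.3, 2.0, -0.15, 0.08]
w = goss_sample(grads, a=0.2, b=0.2)
print(sorted(w.items()))
```

Large-gradient instances (indices 0 and 2 here) are always kept at full weight, while the sampled remainder carries weight (1−0.2)/0.2 = 4.0.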

Although LightGBM achieves a good balance between computational accuracy and efficiency, it has a rather large hyper-parameter space that has a significant effect on the final classification performance. Thus, it is important to select appropriate hyper-parameters for the classifier [19].

        2.2 SSA

SSA is a recent swarm intelligence optimization algorithm that imitates the predation and anti-predation behavior of sparrows to find optimal solutions in many engineering application areas, and it has gained increasing attention due to its search precision and stability.

The sparrow population in SSA is generally separated into two groups: producers and scroungers. Individuals with better fitness values in the population act as producers, who are responsible for providing foraging areas or directions for the scroungers, while the location of the producer during each iteration is updated as

$$X_{i,j}^{t+1}=\begin{cases} X_{i,j}^{t}\cdot \exp\left(\dfrac{-i}{\alpha\cdot T_{\max}}\right), & R_{2}<ST \\ X_{i,j}^{t}+Q\cdot L, & R_{2}\geqslant ST \end{cases}$$

where $t$ is the current iteration, $T_{\max}$ is the maximum number of iterations, $\alpha\in(0,1]$ and $Q$ are random numbers, $L$ is a $1\times d$ matrix of ones, and $R_{2}$ and $ST$ denote the alarm value and the safety threshold, respectively.

Other individuals in the population are scroungers, who may compete for food from the producers or fly to other areas to feed, and their position update formula is described as

$$X_{i,j}^{t+1}=\begin{cases} Q\cdot \exp\left(\dfrac{X_{w}^{t}-X_{i,j}^{t}}{i^{2}}\right), & i>n/2 \\ X_{b}^{t+1}+\left|X_{i,j}^{t}-X_{b}^{t+1}\right|\cdot A^{+}\cdot L, & \text{otherwise} \end{cases}$$

where $X_{b}$ and $X_{w}$ are the optimal position occupied by the producer and the current global worst location, respectively, and $A$ represents a $1\times d$ matrix with each element randomly assigned 1 or $-1$, with $A^{+}=A^{\mathrm{T}}(AA^{\mathrm{T}})^{-1}$.

In addition, a random subset of individuals in the population becomes aware of the danger and chooses to move towards or near other individuals in the center of the population. The initial positions of these sparrows are randomly generated in the population, and the location of the sparrows aware of the danger is updated as

$$X_{i,j}^{t+1}=\begin{cases} X_{b}^{t}+\beta\cdot\left|X_{i,j}^{t}-X_{b}^{t}\right|, & f_{i}>f_{g} \\ X_{i,j}^{t}+K\cdot\left(\dfrac{\left|X_{i,j}^{t}-X_{w}^{t}\right|}{(f_{i}-f_{w})+\varepsilon}\right), & f_{i}=f_{g} \end{cases}$$

where $X_{b}$ is the current global optimal location, $\beta$ is the step size control parameter, and $K$ is a random number between $-1$ and 1. Here, $f_{i}$ is the fitness value of the present sparrow, $f_{g}$ and $f_{w}$ denote the current global best and worst fitness values, respectively, and $\varepsilon$ is a small constant that avoids division by zero.
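The anti-predation rule can be sketched in Python as follows; this is an illustrative re-implementation of the standard SSA update (with the step β drawn per dimension for simplicity), not code from the paper:

```python
import random

def danger_update(X, i, Xb, Xw, f, fg, fw, beta=1.0, eps=1e-8, seed=0):
    """Sketch of the SSA anti-predation update. A sparrow already at the
    global best (f[i] == fg) takes a random step K in [-1, 1] scaled by its
    distance to the worst position; other sparrows move toward the global
    best position Xb with step-size control beta."""
    rng = random.Random(seed)
    xi = X[i]
    if f[i] > fg:
        # edge sparrow: move toward the global best position
        return [xb + beta * abs(x - xb) * rng.gauss(0, 1)
                for x, xb in zip(xi, Xb)]
    K = rng.uniform(-1, 1)   # center sparrow: random walk away from the worst
    return [x + K * abs(x - xw) / ((f[i] - fw) + eps)
            for x, xw in zip(xi, Xw)]

X = [[1.0, 2.0], [0.0, 0.0]]
out = danger_update(X, 1, Xb=[0.0, 0.0], Xw=[5.0, 5.0],
                    f=[2.0, 1.0], fg=1.0, fw=2.0)
print(out)
```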

2.3 SHapley Additive exPlanations (SHAP)

Complex models based on ensemble or deep learning algorithms often achieve high accuracy on large modern datasets. However, such models become difficult to explain as their complexity grows, which makes them almost black boxes. SHAP [24] is an effective interpretation method based on cooperative game theory that calculates the Shapley value of each feature of an input sample to measure that feature's contribution to the final prediction output.

In SHAP, for a single input $x$, the original prediction model $f(x)$ is explained by a linear model $g(x')$, where $x'$ is the simplified input, mapped to $x$ through a mapping function $x=h_{x}(x')$, and $g(x')$ is given by

$$g(x')=\phi_{0}+\sum_{i=1}^{M}\phi_{i}x'_{i}$$

where $M$ is the number of simplified input features and $\phi_{i}$ is the Shapley value of the $i$-th feature.

The Shapley value $\phi_{i}$ can be calculated as

$$\phi_{i}=\sum_{K\subseteq N\setminus\{i\}}\frac{|K|!\,(|N|-|K|-1)!}{|N|!}\left[g_{x}(K\cup\{i\})-g_{x}(K)\right]$$

where $N$ is the set of all features, $K$ is a subset of the features, and $g_{x}(K)$ represents the expected value of the function on subset $K$. More detailed explanations of SHAP can be found in [24].
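The Shapley formula can be evaluated exactly by enumerating feature subsets, as the following pure-Python sketch does for a toy value function; practical SHAP relies on model-specific approximations (e.g. a tree-path algorithm for tree ensembles) because this enumeration is exponential in the number of features:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, n):
    """Exact Shapley values by subset enumeration (sketch).
    f(S) is the model's expected output given a subset S of feature
    indices 0..n-1; feasible only for small n."""
    N = list(range(n))
    phi = [0.0] * n
    for i in N:
        others = [j for j in N if j != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # weighting factor |K|! (|N|-|K|-1)! / |N|!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (f(set(S) | {i}) - f(set(S)))
    return phi

# Toy additive "model": contribution 1.0 from feature 0, 2.0 from feature 1
f = lambda S: 1.0 * (0 in S) + 2.0 * (1 in S)
phi = shapley_values(f, 2)
print(phi)   # additive model -> each value equals the feature's own contribution
```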

In addition to its ability to interpret the feature importance of ML models, the unique advantage of SHAP is that it can reflect whether the influence of each feature on the final prediction is positive or negative.

        3 The Proposed Model

        3.1 Overview of the Model Design

This study was conducted in five main steps: 1) Preprocess the original BRFSS dataset and divide the preprocessed dataset into training and testing sets with cross-validation; 2) Improve the SSA algorithm with multiple strategies; 3) Apply ISSA to optimize the hyper-parameters of LightGBM and train the classifier; 4) Evaluate the ISSA-optimized LightGBM model on the testing data; 5) Perform model interpretation through feature importance and case analysis on the complete preprocessed dataset. The overall study design is shown in Fig. 1.

        3.2 Multi-Strategy Optimized SSA

Although SSA has the advantages of simplicity and flexibility, it still has some drawbacks, such as the poor quality of the randomly generated initial population, inefficient information exchange between individuals, and uncontrolled step sizes when individuals move. These problems make the algorithm prone to getting stuck in local optima and slow to converge. Hence, a multi-strategy optimized SSA, named ISSA, is proposed, which is effective in obtaining the global optimal hyper-parameter solution for LightGBM.

        3.2.1 Population Initialization Based on Halton Sequence

In swarm intelligence optimization algorithms, initial individuals uniformly distributed within the population boundary help to maximize the initial search range of the algorithm [25]. However, the initial population in SSA is randomly generated, limiting the value range of the initial individuals. Here, the Halton sequence is applied to generate a uniformly distributed initial population, which increases the diversity of the sparrow population.

As a powerful method for generating low-discrepancy point sets in any dimension over $(0,1)$, the Halton sequence is used to initialize the population as follows. For a $d$-dimensional population, take the first $d$ numbers of the prime sequence as the base $d_{n}$ of each dimension, divide the interval $(0,1)$ into $d_{n}$ subintervals, and continue to perform $d_{n}$-fold division on each subinterval. The set consisting of all division points is the representation of the Halton sequence in that dimension. Each division point is calculated as follows.

The $t$-th point of any dimension, represented in the base $d_{n}$, is

$$t=\sum_{j=0}^{m}a_{j}(t)\,d_{n}^{\,j},\qquad a_{j}(t)\in\{0,1,\ldots,d_{n}-1\}$$

This base-$d_{n}$ representation is mirrored to the right of the decimal point and denoted as $\varphi_{d_{n}}(t)$:

$$\varphi_{d_{n}}(t)=\sum_{j=0}^{m}a_{j}(t)\,d_{n}^{-j-1}$$

Then the corresponding decimal value $\varphi_{d_{n}}(t)$ is the $t$-th data point of the dimension with base $d_{n}$.

In addition, using the Halton sequence to initialize a population also requires mapping the corresponding Halton points of each dimension to the population boundary of that dimension, which is described as

$$X_{t,j}=lb_{j}+\varphi_{d_{n}}(t)\cdot\left(ub_{j}-lb_{j}\right)$$

where $ub_{j}$ and $lb_{j}$ are the upper and lower bounds of the $j$-th dimension.

Fig. 2 shows the initial populations generated by the random method and the Halton sequence in two dimensions. The comparison shows that the population initialized by the Halton sequence is more uniformly distributed, which increases the diversity of population samples and better meets the initial search range requirement of SSA.
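The steps above can be sketched in a few lines of Python (digit reversal per prime base, then mapping onto per-dimension bounds); the function names are illustrative:

```python
def halton_point(t, base):
    """Radical inverse of index t in the given base: write t in that base
    and mirror its digits across the decimal point."""
    h, f = 0.0, 1.0 / base
    while t > 0:
        h += (t % base) * f
        t //= base
        f /= base
    return h

def halton_population(n, bounds, primes=(2, 3, 5, 7, 11)):
    """Initial population of n individuals; dimension j uses the j-th prime
    as its base and maps the (0,1) points onto [lb_j, ub_j]."""
    pop = []
    for t in range(1, n + 1):   # skip t=0, whose point is exactly 0
        ind = [lb + halton_point(t, primes[j]) * (ub - lb)
               for j, (lb, ub) in enumerate(bounds)]
        pop.append(ind)
    return pop

pop = halton_population(4, [(0.0, 1.0), (-5.0, 5.0)])
print(pop)
```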

        3.2.2 Reverse Learning to Generate Elite Population

The quality of the initial population has an important impact on the accuracy of SSA, and an elite population can provide high-quality solutions at the beginning of the iteration, which helps to increase convergence speed and improve optimization performance. On the basis of the initial uniform population created by the Halton sequence, reverse learning is applied to generate a new population $\tilde{X}$ from the original population $X$. The fitness of the individuals in both populations is then calculated, and after ranking all individuals, the top $P$ individuals are selected to form the elite initial population.

        Fig.2 Comparison of population initialization: (a) random initialization; (b) Halton sequence initialization

$$\tilde{X}_{i}=X_{DMax}+X_{DMin}-X_{i} \qquad (11)$$

where $X_{DMax}$ and $X_{DMin}$ represent the maximum and minimum values for each dimension in the initial population, respectively.
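The elite-selection step can be sketched as follows, assuming a minimization fitness; this is an illustrative reading of the reverse-learning strategy, not the paper's exact code:

```python
def elite_population(X, fitness, P):
    """Opposition-based (reverse) learning sketch: for each individual x,
    form the reverse individual x' = X_DMax + X_DMin - x per dimension,
    then keep the P fittest individuals from the union of X and its
    reverse population."""
    dims = len(X[0])
    dmax = [max(ind[j] for ind in X) for j in range(dims)]
    dmin = [min(ind[j] for ind in X) for j in range(dims)]
    reverse = [[dmax[j] + dmin[j] - ind[j] for j in range(dims)] for ind in X]
    merged = X + reverse
    merged.sort(key=fitness)   # ascending: minimization problem
    return merged[:P]

# Toy fitness: squared distance to the origin
fit = lambda ind: sum(v * v for v in ind)
X = [[1.0, 1.0], [4.0, 4.0], [2.0, 3.0]]
elite = elite_population(X, fit, 3)
print(elite)
```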

        3.2.3 Multi-Sample Learning Strategy with Linear Control of Step Size

According to the position update formula of SSA, scroungers learn only from the producer with the highest fitness in the population, which is not conducive to maintaining population diversity. Meanwhile, the other producers are only responsible for searching for food and do not participate in the exchange of information, which results in poor information exchange between individuals, so solutions easily become stuck in local optima.

Multi-sample learning allows scroungers to learn from both the fittest producer and two random producers, thereby expanding the search range of the population and increasing population diversity. Moreover, the linear control resets the step control parameter as $\alpha = 1 - t/\varepsilon$ (where $t$ is the current iteration and $\varepsilon$ the maximum number of iterations, as in line 8 of Algorithm 1), so that the moving step size of the population decreases linearly as the iterations increase; this expands the search range in the early stage of the iteration and confines the search to a small range in the later stage, thus improving optimization accuracy.

The position of the scroungers is updated by combining multi-sample learning with the linearly controlled step size, as given in formula (12).

The producers' position is updated with the linearly controlled step size according to formula (13).

Subsequently, the position of the sparrows aware of danger is updated with the linearly controlled step size according to formula (14).

        3.3 ISSA-LightGBM Algorithm

According to the proposed optimization strategies, the mechanism of ISSA-LightGBM is shown in Algorithm 1. First, the parameters of ISSA are initialized. Second, the Halton sequence is used to initialize the population of SSA, and an elite population is generated by the reverse learning strategy. Third, the positions of the sparrows and the global fitness are updated. Finally, Algorithm 1 returns the position of the optimal sparrow and its fitness value, which correspond to the best hyper-parameters of LightGBM. The inputs of Algorithm 1 are the parameters of SSA, which are described in Tab. 1.

Algorithm 1 ISSA-LightGBM
Input: P, D, ub, lb, ε, S, PD
Output: X_Gb, f_Gb
1. pdNum = P × PD
2. X = [], X̃ = [], F = []
3. X ← Halton(D)
4. Get X_DMax, X_DMin
5. X̃ ← apply formula (11) to X
6. X = sort(X + X̃)[:P]
7. for t = 1:ε
8.   α = 1 − (t/ε)
9.   F = LightGBM.fit(X)
10.  Get X_b, X_w, f_b, f_w
11.  for i = 1:pdNum
12.    X[i] ← apply formula (13) to X[i]
13.  end
14.  for i = (pdNum+1):P
15.    X[i] ← apply formula (12) to X[i]
16.  end
17.  I = random(P)
18.  for i = 1:I
19.    X[i] ← apply formula (14) to X[i]
20.  end
21.  if f_b < f_Gb then
22.    replace f_Gb by f_b
23.  end
24.  if X_b is better than X_Gb then
25.    replace X_Gb by X_b
26.  end
27. end
28. return the global best sparrow's position X_Gb and its fitness value f_Gb
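The overall loop of Algorithm 1 can be sketched in Python. In this sketch the LightGBM cross-validation fitness is replaced by a toy sphere function, and the position updates are simplified stand-ins for formulas (12)-(14), which are specific to the paper; only the structure (Halton initialization, reverse-learning elite selection, linear step control α = 1 − t/ε) follows the algorithm:

```python
import random

def issa_minimize(fitness, dim, lb, ub, pop=20, iters=100, pd_ratio=0.2, seed=1):
    """Skeleton of the ISSA loop with simplified position updates
    standing in for the paper's formulas (12)-(14)."""
    rng = random.Random(seed)
    primes = [2, 3, 5, 7, 11, 13][:dim]

    def halton(t, b):   # radical inverse of t in base b
        h, f = 0.0, 1.0 / b
        while t:
            h, t, f = h + (t % b) * f, t // b, f / b
        return h

    # Halton initialization + opposition-based elite selection
    X = [[lb + halton(t, primes[j]) * (ub - lb) for j in range(dim)]
         for t in range(1, 2 * pop + 1)]
    rev = [[ub + lb - x for x in ind] for ind in X]
    X = sorted(X + rev, key=fitness)[:pop]

    best = min(X, key=fitness)
    for t in range(1, iters + 1):
        alpha = 1.0 - t / iters            # linear step-size control
        X.sort(key=fitness)
        n_prod = max(1, int(pd_ratio * pop))
        for i in range(pop):
            if i < n_prod:                 # producers: shrinking local search
                X[i] = [x + alpha * rng.gauss(0, 1) for x in X[i]]
            else:                          # scroungers: multi-sample pull toward
                s1, s2 = rng.sample(range(n_prod), 2) if n_prod > 1 else (0, 0)
                X[i] = [x + alpha * rng.random() *
                        ((X[0][j] - x) + 0.5 * (X[s1][j] - x) + 0.5 * (X[s2][j] - x))
                        for j, x in enumerate(X[i])]
            X[i] = [min(max(x, lb), ub) for x in X[i]]   # clamp to bounds
        cand = min(X, key=fitness)
        if fitness(cand) < fitness(best):
            best = cand
    return best, fitness(best)

sphere = lambda ind: sum(v * v for v in ind)
best, fbest = issa_minimize(sphere, dim=2, lb=-5.0, ub=5.0)
print(best, fbest)
```

In the paper the fitness call on line 9 of Algorithm 1 is a LightGBM training/validation run on the candidate hyper-parameter vector, which is far more expensive than the toy function used here.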

        4 Experiments and Results

        4.1 Dataset and Preprocessing

For this study, the dataset is sourced from the 2021 Behavioral Risk Factor Surveillance System (BRFSS) of the US Centers for Disease Control and Prevention (CDC); the raw dataset has 438 693 samples and 304 features. Since 1984, BRFSS has collected health-related behavioral risk factors related to disease occurrence, development, or mortality, including smoking, alcohol consumption, and poor dietary habits, from more than 400 000 American adults every year, and it is now the largest continuously conducted health survey system in the world.

In the preprocessing stage, we first handle the missing values in the dataset. Numerical features with few missing values are imputed with the mean. Features with more than 15% missing values are deleted outright, because in the case of serious missingness, filling the blanks with the median or mean would distort the data and decrease the amount of information in the original sample. Regarding feature processing, undesirable features and features unrelated to depression are removed from the dataset, including identifier codes, whether the older adults had been asked to measure blood pressure at home, etc. Additionally, to avoid the influence of extreme samples on model training, min-max normalization is applied to scale the numerical data into the range [0, 1]. After filtering for age over 60, the ratio of positive to negative samples is 1:5, so the synthetic minority oversampling technique (SMOTE) is applied to balance the data. As a result, the dataset contains 134 730 samples and 63 features, and the binary variable 'ADDEPEV3', indicating whether the respondent has depression, is the target to be predicted.

        4.2 Parameters Setting of the Prediction Model

The multi-strategy optimized SSA is used to tune the hyper-parameters of LightGBM.

For SSA itself, GS is used to tune the most important parameters P, S and PD. Regarding the number of iterations, early stopping is adopted, which halts the iterations once the algorithm has converged. The specific parameter settings of SSA are listed in Tab. 1. LightGBM has several types of hyper-parameters, including core, learning-control and I/O parameters; this paper selects those that have a significant impact on the model to tune. The description, tuning range and tuning result of the selected hyper-parameters are shown in Tab. 2.

        Tab.1 Parameter setting of sparrow search algorithm

        Tab.2 The hyper-parameters tuning results of ISSA-optimized LightGBM

        4.3 Comparison Experiments

In this section, three groups of comparative experiments are conducted on the massive high-dimensional dataset (2021 BRFSS) to verify the competence of the proposed ISSA-LightGBM. First, we compare ISSA-LightGBM with five state-of-the-art classifiers, namely K-nearest neighbor (KNN), LR, random forest (RF), LightGBM and XGBoost, with the hyper-parameters of each model optimized by random search. Furthermore, KNN, RF and LR optimized by GS, and LightGBM and XGBoost optimized by SSA, are compared with ISSA-LightGBM. In addition, iterative 5-fold cross-validation with the shuffle strategy is conducted to evaluate the generalizability of ISSA-LightGBM and avoid overfitting.

Several standard evaluation metrics are applied to measure the performance of each model, including accuracy, recall, F1 score and AUC. Accuracy refers to the percentage of samples predicted correctly by the classifier among all samples; recall is the proportion of positive samples predicted correctly; the F1 score is the harmonic mean of precision and recall, which balances the two indicators and reflects the robustness of the model; and AUC is the area under the ROC curve, with a higher AUC indicating a better classification effect.
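For reference, these metrics can be computed from binary predictions as follows (a plain-Python sketch; AUC is omitted because it requires predicted scores rather than hard labels):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, recall, and F1 from binary labels via the confusion
    matrix counts (tp, fp, fn, tn)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
acc, rec, f1 = classification_metrics(y_true, y_pred)
print(acc, rec, f1)
```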

The performance achieved by ISSA-LightGBM and the other ML models with different optimization strategies is exhibited in Tab. 3. Cross-validation with multiple iterations can effectively reflect the robustness of the model, so all evaluation metrics in the table are averaged over 50 runs of 5-fold cross-validation. It can be noted that ensemble models generally perform better than single ML models. Among the other methods, LightGBM and XGBoost optimized with SSA yielded the better accuracies of 80.7% and 80.6% respectively, whereas KNN performed worst, with an accuracy below 70%. Under iterative 5-fold cross-validation, the proposed ISSA-LightGBM outperforms all other methods, with values of 81.4%, 80.7%, 78.9% and 90.8% for the above metrics. In addition, as shown in Fig. 3, the area under the ROC curve of ISSA-LightGBM is the largest, which illustrates that its overall classification effect is the best. From the obtained results, the proposed ISSA-LightGBM constitutes an important technique for the classification of health-related data and, in particular, for depression prediction in old people.

        Tab.3 Evaluation of different methods

Fig. 3 ROC of ISSA-LightGBM and the other ML models: (a) ROC curve 1; (b) ROC curve 2

        5 SHAP Interpretations

To better understand the underlying reasons behind the predictions, SHAP was employed to provide post-hoc global and local explanations for the proposed model, which helped to reveal the importance of various risk factors and the interaction of each variable with the outcome, and to explain given instances.

        5.1 Global Interpretation

By summing the Shapley values of all features, Fig. 4(a)-(b) reflect the importance ranking of depression risk features for the elderly and how feature values affect the change in risk. As Fig. 4(a) shows, mental status (within one month), sex, concentration or memory, physical condition and age contribute greatly to the prediction results, and, as a whole, all features contribute almost equally to positive and negative predictions. In Fig. 4(b), the color of each point denotes whether the actual feature value is high (red) or low (blue), and the x-axis represents the SHAP values, which illustrate the contribution of the feature to the positive or negative prediction.

According to Fig. 4(b), mental status has an important influence on elderly depression: the more days under poor mental status, the higher the risk of depression. Higher age values decrease depression risk, so relatively young elderly people are at greater risk of depression. In terms of gender, women are more likely to be depressed than men.

Moreover, Fig. 5(a)-(d) illustrate in detail the contributions of different features to the prediction results. In the experimental dataset, age is recorded as a range, i.e. an age group. As shown in Fig. 5(a), the risk of geriatric depression gradually decreases with increasing age. The effect of the respondents' mental status is described in Fig. 5(b); here, mental status represents the number of days of poor mental status within a month. When the number of such days is less than 5, respondents are more likely to be judged as having a low risk of depression, whereas the depression risk increases gradually as the number of such days grows. Fig. 5(c) and Fig. 5(d) reveal that both education level and BMI have considerable positive impacts on depression risk: as education level or BMI increases, depression risk subsequently increases among old people.

        5.2 Individual Interpretation

Besides global interpretation, Fig. 6 and Fig. 7 illustrate two selected instances for local explanation, obtained from the SHAP force plot. Fig. 6 and Fig. 7 show the explanation of ISSA-LightGBM for sample No. 1 and sample No. 2. For sample No. 1, the model predicts that the interviewee has no risk of depression. Among the features of the interviewee, having no symptoms of poor mental status, being male, having a low education level and having no memory disorder all contribute to the absence of depression risk. In contrast, being aged between 60 and 65 and having two days of poor physical condition within a month contribute to the risk of depression. In fact, the interviewee does not suffer from depression, which matches the prediction given by the proposed model.

For sample No. 2, the model judges that the interviewee is at risk of depression. Having no symptoms of poor physical condition and being male contribute to the absence of depression risk. Seven days in a month under poor mental status, having had HIV testing, and having difficulty dressing or bathing contribute to the risk of depression. The interviewee actually suffers from depression, which is consistent with the prediction of the model.

        Fig.4 Global interpretation: (a) the order of feature importance; (b) the contribution of features

Fig. 5 Contribution of different eigenvalues to prediction results: (a) age; (b) mental status; (c) education level, from high school (1) to bachelor (4); (d) BMI index

        Fig.6 Explanation of the model for sample No.1

        Fig.7 Explanation of the model for sample No.2

        6 Conclusion

Depression is a major source of suffering and disability among elderly individuals. In this study, an interpretable depression prediction model, ISSA-LightGBM, has been explored, which can capture the significant depression risk features from massive high-dimensional data and accurately identify the risk of depression in the elderly. Several ML models were compared with ISSA-LightGBM, varying the hyper-parameter optimization method, and evaluated according to their average accuracy under iterative cross-validation. Moreover, SHAP was used to interpret the model and obtain useful knowledge for identifying potential depressive patients among old people. The experimental results show that ISSA-LightGBM outperformed the other compared approaches for screening geriatric depression in all selected metrics; hence it is suitable for elderly depression prediction on massive high-dimensional data. A limitation of this work is that the proposed model was verified on only one dataset. In future work, more types of geriatric depression datasets, such as electronic medical records, will be incorporated to further verify the effectiveness and generalization ability of the proposed integrated model.
