Jian Guo, Yu Han, Fan Xu, Jiru Deng, Zhe Li
Abstract: Interdisciplinary applications between information technology and geriatrics have been accelerated in recent years by advances in artificial intelligence, cloud computing, and 5G technology, among others. Applications built on these technologies make it possible to predict the risk of age-related diseases early, which gives caregivers time to intervene and reduce the risk, potentially improving the health span of the elderly. However, the popularity of these applications is still limited for several reasons. For example, many older people are unable or unwilling to use mobile applications or devices (e.g., smartphones) because operating them is relatively complex or time-consuming. In this work, we design and implement an end-to-end framework and integrate it with the WeChat platform to make it easily accessible to elders. The framework collects multifactorial geriatric assessment data, and stacked machine learning models are then trained to assess and predict the incidence of common diseases in the elderly. Experimental results show that, compared with some existing similar applications, our framework not only provides more accurate prediction (precision: 0.8713, recall: 0.8212) for several common elderly diseases but also takes very little time (28.6 s) to complete a workflow.
Keywords: predicting geriatric diseases; machine learning; end-to-end framework
Age-related diseases are medical conditions that are more commonly found in individuals at an advanced age, where aging is a significant contributing factor. The Chinese Centre for Disease Control and Prevention reports that the prevalence of hypertension, diabetes, and dyslipidemia among Chinese adults aged 60 years and above is 58.3%, 19.4%, and 37.2%, respectively, with the likelihood of developing these conditions increasing gradually with age [1]. Most age-related diseases are identified and managed only at the intermediate and advanced stages, which can be attributed to several factors. Firstly, age-related diseases often lack apparent symptoms in the initial stages, making it challenging to identify abnormalities during routine health examinations. Secondly, a large proportion of elderly adults lack awareness of disease prevention and pay little attention to the potential risks of minor illnesses, so the condition is diagnosed and treated only in the middle and late stages. This not only greatly increases the difficulty and cost of treatment but also hinders recovery.
The State Council of China’s 2017 “New Generation of Artificial Intelligence Development Plan” recognizes that artificial intelligence (AI) presents novel opportunities for societal advancement [2]. The plan encourages the development of AI healthcare and other convenient intelligent services for elderly adults. It is essential to note that the intelligent elderly healthcare mentioned here differs from the comprehensive geriatric assessment (CGA) conducted by a professional geriatrician in a hospital. Instead, it refers to AI-enabled wearable devices that monitor the health status of elderly individuals during their daily routines and warn of potential geriatric diseases when risks arise. Furthermore, when AI predicts the risk of a geriatric disease, elderly users should consult a professional doctor promptly to double-check the result, which can reduce the probability of sudden illness in daily life, greatly improve the cure rate, and reduce the difficulty and cost of treatment. On the other hand, a survey by Tencent reports that relatively complex operation is the biggest issue older people face when using smart devices. In particular, 46.7% of the issues were related to operating the functions of applications (Apps), 41.2% were caused by setting up and maintaining smart operating systems, and 32.7% were related to downloading, registering for, and logging in to Apps, as well as making online payments [3]. To this end, age-friendly Apps should simplify, as much as possible, the complex operations that most older people cannot understand, while making complex functional processes transparent to older users.
The main contributions of this study are summarized as follows.
1) We propose an end-to-end framework that can be embedded into the WeChat platform and predict the incidence of common diseases, making it easy to use and friendly enough for elderly users.
2) Unlike previous work, our end-to-end framework integrates multiple disease prediction models and can be extended with additional disease prediction models in the future.
The subsequent sections of this manuscript are organized as follows. Section 2 provides the research context, while Section 3 outlines the design concept and workflow of our end-to-end framework, especially how multiple machine learning models are integrated for risk prediction of common geriatric diseases. In Section 4, we describe our experiments and compare our end-to-end framework with other similar works in both prediction accuracy and runtime; we then discuss the results before giving our conclusion.
Currently, machine learning models based on artificial intelligence (AI) techniques are commonly applied in elderly healthcare and in the timely identification of prevalent geriatric illnesses. To name a few examples, Boyd et al. conducted a comprehensive review of 47 publications that applied machine learning to clinical vascular analysis, such as stroke and coronary artery disease [4]. Dami et al. proposed a deep learning methodology that uses a 5-minute electrocardiogram (ECG) recording and extracts time-frequency features of the ECG signals to forecast arterial incidents several weeks or months before the event [5]. Yu et al. devised a stroke prediction system that identifies stroke incidents from electromyography (EMG) bio-signal feature data using machine learning algorithms (random forest and long short-term memory) [6]. Qian et al. introduced an innovative machine learning framework for analyzing the daily behavior of elderly individuals living alone, which uses feature data acquired from household appliances such as televisions and refrigerators [7]. In addition, a machine learning model was developed to predict the possibility of a lung cancer diagnosis using multifactorial geriatric assessment (MGA) data, such as demographic features, BMI data, smoking habits, comorbidities, spirometry outcomes, laboratory test results, hospitalization records, and vital status data [8]. The authors in [9] prioritized the significant risk factors by ranking the corresponding features and then proposed an approach, based on both feature ranking analysis and traditional biostatistics tests, for building a machine learning survival prediction model. Mujumdar et al. proposed a diabetes prediction model that incorporates external factors associated with diabetes, in addition to conventional factors such as Glucose, BMI, Age, and Insulin, in order to enhance classification accuracy [10]. To forecast cerebral stroke with incomplete and imbalanced physiological feature data, Liu et al. created a two-step hybrid machine learning technique. First, random forest regression was used to fill in the missing values prior to classification; then, automated hyperparameter optimization based on a deep neural network was employed for stroke prediction on an imbalanced dataset [11].
The aforementioned methods aim to facilitate early risk prediction of age-related diseases through modeling and feature learning. In feature selection, conventional clinical and laboratory data are often used, and results can be improved by adding or selecting relevant features. Hybrid modeling and deep neural networks are frequently employed in model construction. However, we found the following limitations in the above studies. Firstly, each of the above studies is limited to predicting a single disease, which leads to a lack of expandability. Secondly, specific feature data used in existing studies, such as EEG and EMG data, are difficult to obtain and use for disease diagnosis without specialist equipment and personnel, which limits the accessibility of those research achievements. Inspired by previous work [12], we note that an end-to-end machine learning framework can cover the intermediate processes typically present in the classical machine learning pipeline, making the workflow easier and more operationally efficient for users. We therefore design and implement an end-to-end framework and integrate it with the WeChat platform to make it easily accessible to elders. In addition, stacked machine learning models are trained to assess and predict the incidence of common diseases in the elderly.
According to related survey reports [13, 14], simplified operations, larger fonts, and larger icons are the basic requirements for applications aimed at elderly people in China. The main purpose of our work is to provide the elderly with easy-to-use health status monitoring and early detection of common geriatric diseases, anytime and anywhere. An end-to-end machine learning framework covers the intermediate processes usually present in the classic machine learning pipeline, which makes the workflow easier and more operationally efficient for both aged users and healthcare staff. On the other hand, the WeChat platform (including the App, Mini Programs, Tencent cloud computing, etc.) has become the most used application in China, and its features complement each other and cooperate seamlessly, which makes it an ideal tool that covers all our needs. In particular, WeChat Mini Programs can be accessed directly within the WeChat App without any installation. Deploying a framework through a WeChat Mini Program helps developers reach users quickly and gives users a convenient way to use the service without leaving the App. Thus, we consider the WeChat platform the best choice for deploying our end-to-end machine learning framework for the early detection of common geriatric diseases.
3.2.1 Input Phase
As can be seen in Fig. 1, we divide the workflow of the end-to-end framework into 4 phases. In the input phase, in addition to basic personal information such as age, gender, height, and weight, MGA features (e.g., blood pressure, BMI, and skin thickness) are needed to train the machine learning models. The framework currently supports prediction models for 4 common geriatric diseases, all of which are based on previous studies [15–18]. Some raw MGA data can be obtained through wearable devices, but other, more specialized MGA data (such as anemia, albumin, and HbA1c) may require specialized medical equipment or a medical examination [19]. Each disease prediction model requires its own specific MGA features for training; the list can be found in Tab. 1. Since this work focuses mainly on the design of the end-to-end framework, we do not explain the medical details of those MGA features, which can be found in the previous studies.
Fig. 1 The overall workflow of our end-to-end framework for early detection of common geriatric diseases
Tab. 1 The list of MGA features for predicting common geriatric diseases in this work
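The idea that each disease model declares its own required MGA features, and that further models can be added later, can be sketched as a small registry. This is only an illustration: the names `REQUIRED_FEATURES`, `missing_features`, and `register_disease` are hypothetical, and the feature lists below are placeholders, not the actual contents of Tab. 1.

```python
from typing import Dict, List

# Hypothetical registry: disease name -> MGA features its model expects.
# The feature names are illustrative placeholders, not the lists in Tab. 1.
REQUIRED_FEATURES: Dict[str, List[str]] = {
    "diabetes": ["age", "bmi", "blood_pressure", "skin_thickness", "glucose", "insulin"],
    "heart_failure": ["age", "anemia", "blood_pressure"],
}


def missing_features(disease: str, record: Dict[str, float]) -> List[str]:
    """Return the required features that are absent from a user's record."""
    if disease not in REQUIRED_FEATURES:
        raise KeyError(f"no prediction model registered for {disease!r}")
    return [name for name in REQUIRED_FEATURES[disease] if name not in record]


def register_disease(disease: str, features: List[str]) -> None:
    """Register the feature list of an additional disease prediction model."""
    REQUIRED_FEATURES[disease] = list(features)


# Example: a partially filled record and a newly registered model.
record = {"age": 70, "bmi": 24.1, "glucose": 110}
still_needed = missing_features("diabetes", record)
register_disease("stroke", ["age", "bmi"])
```

With a structure of this kind, a missing-feature check can prompt the user for the remaining inputs before inference, and adding a disease only requires registering one more entry.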
3.2.2 Pre-processing (Feature Extraction) Phase
The framework can not only collect MGA data automatically via wearable devices but also supports actively importing MGA data through WeChat messages. More specifically, a Tencent cloud-based service account is required to receive and pre-process all the MGA features. The most direct way is to send the MGA features to the service account as a WeChat text message. For more convenient input, users can send their medical examination results to our WeChat service account as a photograph of the results (processed by OCR) or as a voice message (processed by NLP) for MGA feature extraction [20, 21]. Meanwhile, some special MGA features have to be extracted using other technologies such as audio signal processing [22, 23].
The above-mentioned pre-processing methods are integrated and implemented through Tencent Cloud, which can greatly improve data security and the convenience of the end-to-end implementation. In addition, the MGA feature data are used to train the machine learning algorithms that predict prevalent geriatric illnesses.
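As a minimal sketch of the text-message path, the snippet below extracts numeric MGA features from a free-form message. The message format ("name: value" pairs separated by commas, semicolons, or newlines) and the function name `parse_mga_message` are assumptions made for illustration; the source does not specify the format the service account accepts.

```python
import re
from typing import Dict

# Assumed message format: "name: value" or "name = value" pairs, e.g.
# "Age: 72, BMI: 24.5; blood pressure = 130".
_PAIR = re.compile(r"([A-Za-z][A-Za-z0-9_ ]*?)\s*[:=]\s*(-?\d+(?:\.\d+)?)")


def parse_mga_message(text: str) -> Dict[str, float]:
    """Extract numeric MGA features from a free-form text message."""
    features: Dict[str, float] = {}
    for name, value in _PAIR.findall(text):
        # Normalize "Blood Pressure" -> "blood_pressure".
        key = "_".join(name.strip().lower().split())
        features[key] = float(value)
    return features


message = "Age: 72, BMI: 24.5; blood pressure = 130\nglucose:6.1"
parsed = parse_mga_message(message)
```

The OCR and voice paths would presumably reduce to the same step once the image or audio has been converted to text, which is why a single text parser is a convenient common denominator.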
3.2.3 Machine Learning Phase
Fig. 2 How to train machine learning models in our framework and use the trained models for inference of predicting common geriatric diseases
A complete machine learning process comprises two stages: training and inference. During the training stage, the pre-processed MGA feature data are stored in a database. These feature data are used to train machine learning models for the early prediction of prevalent geriatric ailments, and the prediction results are saved back to the database as well. The machine learning modeling process is shown in Fig. 2. Numerous methods exist for ensembling models in machine learning; stacking is a commonly employed technique in which the predictions of multiple models are combined to construct a new model with enhanced performance [24]. In our end-to-end framework, we ensemble classic machine learning models by combining their prediction outputs, which builds a new model with improved performance. As shown in Fig. 2, we first train three base models for each disease with classic algorithms, namely Random Forest, XGBoost, and LightGBM [25–27]. For fairness, we split the MGA feature data required for modeling into a training set (80%) and a test set (20%), and obtain the prediction output as the average of the 5 individual scores from 5-fold cross-validation [28]. The RandomizedSearchCV function is used to tune the hyper-parameters of the Random Forest, XGBoost, and LightGBM based models [29]. After that, the prediction outputs of the base models are combined by stack ensembling to build a final model with improved performance for each disease. As shown on the right side of Fig. 2, machine learning inference is the deployment of previously trained models in a real production environment, where they are used to make predictions on real data.
3.2.4 Output Phase
After the machine learning phase, all the prediction results for common geriatric diseases and all MGA data have been saved to a cloud database, which users (including elderly users and their family members) can access actively or passively through our WeChat service account. Users can actively query whether they have a high chance of having a geriatric disease. In addition, as shown in Fig. 3, users can receive regular health profile messages from the service account, including the risk of having geriatric diseases and time-series MGA data such as BMI values for the last 30 days. This assumes that the data have already been collected and stored in the database.
Fig. 3 Users can actively or passively access the database to get their latest health profile through the WeChat platform in our end-to-end framework
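How a regular health profile might be assembled from stored data can be sketched as below. Everything here is hypothetical: the function `build_health_profile`, the 0.5 risk threshold, and the record layout are illustrative assumptions, since the source does not describe the database schema or the message contents in detail.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


def build_health_profile(
    risks: Dict[str, float],
    bmi_history: List[Tuple[date, float]],
    today: date,
    window_days: int = 30,
    risk_threshold: float = 0.5,  # assumed cut-off for "high chance"
) -> Dict[str, object]:
    """Assemble the data for one health-profile message."""
    cutoff = today - timedelta(days=window_days)
    # Keep only measurements from the last `window_days` days, oldest first.
    recent = sorted((d, v) for d, v in bmi_history if cutoff < d <= today)
    latest: Optional[float] = recent[-1][1] if recent else None
    return {
        "high_risk": sorted(n for n, p in risks.items() if p >= risk_threshold),
        "bmi_series": recent,
        "latest_bmi": latest,
    }


profile = build_health_profile(
    risks={"diabetes": 0.72, "heart_failure": 0.10},
    bmi_history=[
        (date(2023, 5, 1), 25.0),
        (date(2023, 6, 10), 24.6),
        (date(2023, 6, 29), 24.2),
    ],
    today=date(2023, 6, 30),
)
```

The same function could serve both access modes: called on demand for an active query, or on a schedule for the pushed profile.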
As mentioned previously, our end-to-end machine learning framework can detect 4 common geriatric diseases early: lung cancer, diabetes, cerebral stroke, and heart failure. In this section, we train and evaluate the machine learning models with public datasets [30–33].
We build classification models to predict whether there is a high chance of having a geriatric disease (binary classification). It is important to note that accuracy is not a reliable metric for evaluating a classifier on an imbalanced dataset, and most of the datasets used in this work are imbalanced. Instead, precision and recall allow a more detailed analysis of binary classification problems with imbalanced data [34]. Precision and recall can be calculated with

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN},$$

where $TP$, $FP$, and $FN$ denote the numbers of true positives, false positives, and false negatives, respectively.
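A small worked example shows why accuracy can mislead on imbalanced data. It uses the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN); the 95-to-5 class split is invented for illustration and is not taken from the datasets used in this work.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute precision and recall for binary labels (1 = diseased)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative data: 95 healthy and 5 diseased users, and a degenerate
# classifier that always predicts "healthy".
y_true = [0] * 95 + [1] * 5
always_healthy = [0] * 100

acc = accuracy(y_true, always_healthy)               # 0.95
prec, rec = precision_recall(y_true, always_healthy)  # (0.0, 0.0)
```

The degenerate classifier reaches 95% accuracy while detecting no diseased user at all, which precision and recall expose immediately.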
The predicted results for different common geriatric diseases are shown in Tab. 2. We can see that our stacked machine-learning models.
Tab. 2 Precision and recall for predicting 4 common geriatric diseases
The previous experiments evaluated the predictive performance of our framework for common geriatric diseases. In this section, we evaluate the efficiency and user experience of our end-to-end framework by measuring the runtime of a complete workload of predicting common geriatric diseases. The configuration of our experimental environment is given in Tab. 3. We select a free instance of the WeChat cloud host as our experimental environment. The configuration of the free instance is much lower than that of a paid instance, so the measurements reflect the lower bound of our framework's runtime performance.
Tab. 3 WeChat cloud hosting configuration details for runtime evaluation
In the runtime evaluation, we do not take the time spent training models into account, since training time does not affect user experience; instead, the runtime of machine learning inference is counted in the output phase. The measurement starts when the user enters the MGA feature data into our WeChat service account and ends when the user receives the predicted results from the service account. It is divided into 3 phases: the input phase, the pre-processing phase (including feature extraction), and the output phase (including model inference). We evaluate the runtime of the 3 different input methods (i.e., workflows) supported by our framework for predicting diabetes. A mobile application named the Diabetic Nurse, which can be downloaded from the App Store, is used as a control group. We evaluate the runtime of the Diabetic Nurse in text mode because it only supports text input. Additionally, we count the pre-processing runtime of the Diabetic Nurse in its output phase, since we can only access this mobile application as a general user rather than as a developer.
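The per-phase measurement described above can be sketched with a small timing helper. The helper `time_workflow` and the placeholder phase functions are hypothetical; in the real experiment each phase would be the corresponding user action or cloud call rather than a `sleep`.

```python
import time
from typing import Callable, Dict, List, Tuple


def time_workflow(phases: List[Tuple[str, Callable[[], object]]]) -> Dict[str, float]:
    """Run each phase in order and record its wall-clock duration in seconds."""
    timings: Dict[str, float] = {}
    for name, run in phases:
        start = time.perf_counter()
        run()
        timings[name] = time.perf_counter() - start
    # The total is computed from the phase timings before being stored.
    timings["total"] = sum(timings.values())
    return timings


# Placeholder phases standing in for the real input, pre-processing,
# and output (including model inference) steps.
def enter_features() -> None:
    time.sleep(0.01)


def extract_features() -> None:
    time.sleep(0.01)


def infer_and_reply() -> None:
    time.sleep(0.01)


timings = time_workflow([
    ("input", enter_features),
    ("pre-processing", extract_features),
    ("output", infer_and_reply),
])
```

Recording the phases separately is what allows a comparison such as the one in Fig. 4, where workflows differ mainly in the input and pre-processing phases.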
As can be seen in Fig. 4, our end-to-end framework considerably reduces the overall runtime for predicting diabetes compared with the control group in both the image (OCR) and voice (ASR) workflows. More specifically, the voice-based workflow takes the least time overall, followed by the image-based workflow. The 3 workflows of our framework have similar runtimes in the output phase, because the data passed to machine learning inference are MGA feature data that have already been pre-processed. In contrast, there is no significant difference between the text-based workflow of our framework and the Diabetic Nurse, simply because they use the same input method (text only). Note that, within our framework, the image-based workflow has a longer overall runtime than the voice-based workflow (through ASR technology) for the same task, because their pre-processing runtimes differ. Although the voice-based workflow needs more time in the input phase than the image-based workflow, the OCR pre-processing of the image-based workflow consumes a large amount of time (around 18 s). If the efficiency of OCR can be improved significantly, the image-based workflow may become the most efficient workflow in our end-to-end framework.
Fig. 4 Comparison of the runtime of predicting diabetes with various workflows
In this work, we design an end-to-end machine learning framework and integrate it into the WeChat platform for predicting common geriatric diseases. Compared with existing similar works, our framework provides not only more accurate prediction for several common elderly diseases but also a less time-consuming workflow. Benefiting from the end-to-end design concept, our framework can collect the MGA features needed to predict diseases from authorized users in a variety of ways through the WeChat platform; in addition, users can easily access their health profile, actively or passively, in the same way. To summarize, this work is a typical and efficient solution for AI healthcare and intelligent services for elderly adults.
In the future, we would like to expand the types of common geriatric diseases covered by training additional prediction models with more datasets, algorithms, and machine learning techniques. We will further extend the boundaries of our framework and improve its functionality by collaborating with geriatricians to introduce the framework into CGA in hospitals.
Journal of Beijing Institute of Technology, 2023, Issue 2