亚洲免费av电影一区二区三区,日韩爱爱视频,51精品视频一区二区三区,91视频爱爱,日韩欧美在线播放视频,中文字幕少妇AV,亚洲电影中文字幕,久久久久亚洲av成人网址,久久综合视频网站,国产在线不卡免费播放

        ?

        Application of machine learning algorithms to screen potential biomarkers under cadmium exposure based on human urine metabolic profiles

        2023-01-30 06:49:10TingZengYnshnLingQingyunDiJinglinTinJinyoChenBoLeiZhuYngZongweiCi
        Chinese Chemical Letters 2022年12期

        Ting Zeng ,Ynshn Ling ,Qingyun Di ,Jinglin Tin,Jinyo Chen,Bo Lei ,Zhu Yng ,Zongwei Ci ,*

        a Food Science and Technology Program,Beijing Normal University-Hong Kong Baptist University United International College,Zhuhai 519087,China

        b State Key Laboratory of Environmental and Biological Analysis,Department of Chemistry,Hong Kong Baptist University,Hong Kong,China

        c Department of Nutrition,Food Safety and Toxicology,West China School of Public Health,Sichuan University,Chengdu 6100 41,China

        Keywords:Human urine High-resolution mass spectrometry Cadmium exposure Metabolic profiles Machine learning

        ABSTRACT Exposure to environmental cadmium increases the health risk of residents.Early urine metabolic detection using high-resolution mass spectrometry and machine learning algorithms would be advantageous to predict the adverse health effects.Here,we conducted machine learning approaches to screen potential biomarkers under cadmium exposure in 403 urine samples.In positive and negative ionization mode,4207 and 3558 features were extracted,respectively.We compared seven machine learning algorithms and found that the extreme gradient boosting(XGBoost)and random forest(RF)classifiers showed better accuracy and predictive performance than others.Following 5-fold cross-validation,the value of area under curve(AUC)was both 0.93 for positive and negative ionization modes in XGBoost classifier.In the RF classifier,AUC were 0.80 and 0.84 for positive and negative ionization modes,respectively.We then identified a biomarker panel based on XGBoost and RF classifiers.The incorporation of machine learning models into urine analysis using high-resolution mass spectrometry could allow a convenient assessment of cadmium exposure.

        Despite the fact that the official regulations of environmental cadmium(Cd)pollution have been in place for over one hundred years,the screening of potential biomarkers related to Cd toxicity is still in need to safeguard human health[1].The tolerable Cd exposure level is 0.83μg/kg body weight/day and urinary Cd is 5.24μg/g creatinine according to the Food and Agriculture Organization/World Health Organization(FAO/WHO)guideline[2].Appropriately,the safety standard of Cd in rice is 0.2 mg/kg according to the China Food Safety National Standard for Maximum Levels of Contaminants in Foods(GB2762-2017).In many cases,urine is routinely and robustly used to diagnose and monitor health conditions because it is easy-obtained and non-invasive sampling[3,4].The interpretation of the urinary profile under Cd exposure would be beneficial to the determination of Cd exposure grades based on the screening markers.Mass spectrometry(MS)is able to provide data information including the retention time,ion mass spectral data and peak intensities for thousands of molecules concurrently,which could deepen our knowledge in the urinary samples.Considering the growing capacities of high-resolution MS(HRMS)to reveal and characterize the pollutant-relevant health status,as well as the accessible nature of urine samples,a bunch of relevant studies has been conducted over the past years[5–7].

        Machine learning(ML)algorithms,in parallel,have been successfully applied in clinical laboratories for years,demonstrating their ability to deal with complex data sets[8].The machine learning approach is a type of artificial intelligence,which contains supervised and unsupervised learning,and has been used to improve the diagnostic capabilities of HRMS on urinary samples for more challenging problems[9].Briefly,a typical application of ML works as follows.First,it is the infrastructure data collection and pre-processing.The most appropriate machine learning algorithms for the type of data would be selected after loading the data and adjusting the parameters.After training and testing the model on data,cross-validation would be properly used for evaluation through the best-performing machine learning algorithms and the optimized signatures[10,11].The ratio of the training to testing set was usually set at 70:30 or 80:20,and several compounds were pinpointed to distinguish the exposure status[12].Altogether,machine learning algorithms have great potential for automating the interpretation and analysis of the urinary sample under Cd exposure[13,14].

        The purpose of this study was to develop ML-based models that used urinary metabolic characteristics to screen potential biomarkers under Cd exposure.We collected 403 urine samples and analyzed them using liquid chromatography-HRMS(LC–HRMS).We then compared seven machine learning algorithms on the LC–HRMS data set,including extreme gradient boosting classifier(XGBoost classifier),logistic regression classifier,support vector classifier(SVC),multi-layer perceptron classifier(MLP classifier),K-nearest neighbor classifier(KNN classifier),Gaussian naive Bayes classifier(GaussianNB classifier)and random forest classifier(RF classifier).Via5-fold cross-validation,the area under curve(AUC),accuracy,sensitivity,specificity,positive predictive value,and negative predictive value were calculated.Finally,we identified a biomarker panel based on the selected machine learning models.The incorporation of machine learning models into urine analysis by HRMS allows a convenient assessment for Cd exposure.

        As for the urinary characteristics of Cd-exposed humans,the present study included 403 participants,among which 280 and 123 belong to the exposure and control groups,respectively.The discriminating criterion was whether the villagers lived in Cd exposure or control areas.The main exposure route of Cd to the villagers was dietary Cd intake.The Cd exposure area was miningaffected and the villagers mainly subsisted on home-grown rice.The Cd concentration of home-grown rice greater than 0.2 mg/kg in the polluted area was reported in the previous study[15].The villagers in the control area with a Cd concentration of homegrown rice lower than 0.05 mg/kg have a similar lifestyle to the villagers in the polluted area.Table 1 showed the demographic characteristics of the Cd exposure and the control group.Of the 403 participants included in this study,the characteristics such as gender,frequency of smoking,age,residence duration and body mass index between Cd exposure and the control group showed no significant differences(P>0.05).Nevertheless,there were significant differences(P<0.05)in the concentrations of urinary cadmium(UCd),blood cadmium(BCd)andβ2-microglobulin between the Cd exposure and the control groups.Among them,β2-microglobulin is one of the nephrotoxicity signal molecules.The renal function tests,hepatic function tests and cardiac function tests were conducted.The results of cardiac function tests revealed no significant differences(P>0.05)between the two groups.While some of the clinical traits in the renal function tests(creatinine and blood urea nitrogen)and the hepatic function tests(total bile acid,alkaline phosphatase,gamma-glutamyl transpeptidase,indirect bilirubin,direct bilirubin and total bilirubin)showed a significant difference(P<0.05)between Cd exposure and the control groups.After the acquisition of urine metabolic profiles,4207 and 3558 features were extracted in positive and negative ionization mode,respectively.Then the partial least squares-discriminant analysis(PLS-DA)and the following metabolic pathway analysis were conducted using MetaboAnalyst(https://www.metaboanalyst.ca).As shown in Fig.S1(Supporting information),the exposure group clustered away from the control group and the peak intensities of the quality control sample were stable during the whole analysis guaranteeing the system stability(RSD<30%).

        Then,these samples were randomly split into two parts at a ratio of 7:3.The forest plot analysis of different machine learning algorithms based on positive and negative ionization mode mass spectra was conducted in terms of classifying the Cd exposure using data from sufficient metabolic features(Fig.1).In this step,the random subsampling method was applied primarily due to its convenient comparison among multiple algorithms.The XGBoost and RF classifiers outperformed other algorithms when applied to the data from both positive and negative ionization modes.Following 5-fold cross-validation,accuracy,sensitivity,specificity,positive and negative predictive values of both models were calculated.The AUC for positive and negative ionization modes were both 0.93 in the XGBoost classifier.In the RF classifier,the AUC were 0.80 and 0.84 for positive and negative ionization modes,respectively(Table 2).

        As XGBoost and RF classifiers were preferred,the top thirty features calculated from their feature importance were selected from positive and negative ionization modes,as shown in Fig.2.Among the top thirty features selected from the XGBoost classifier and RF classifier,biomarkers were identified by matching their mass fragmentation patterns with those in the Human Metabolome Database(HMDB,https://hmdb.ca).In total,seven biomarkers were found,includingS-adenosylmethionine,dehydroepiandrosterone sulfate,propionic acid,L-asparagine,homovanillic acid,deoxyinosine and deoxycytidine(Table 3).These compounds are involved in amino acid metabolism,purine metabolism,steroid hormone biosynthesis,etc.(Fig.S2 in Supporting information)[16–20].The Spearman correlation matrix of the identified compounds from human urine under Cd exposure was shown in Fig.3.Correlation were obtained between UCd and BCd(r=0.26),UCd and L-asparagine(r=?0.22),deoxycytidine and dehydroepiandrosterone sulfate(r=?0.32),deoxycytidine and homovanillic acid (r= ?0.34),deoxycytidine and S-adenosylmethionine(r=?0.31),S-adenosylmethionine and L-asparagine(r=0.36),S-adenosylmethionine and dehydroepiandrosterone sulfate(r=0.53),S-adenosylmethionine and homovanillic acid(r=0.73),homovanillic acid and L-asparagine(r=0.37),homovanillic acid and dehydroepiandrosterone sulfate(r=0.51)as well as dehydroepiandrosterone sulfate and Lasparagine(r=0.23).It is worth noting that additional work is required to identify more endogenous metabolites in urine and to expand the depth and breadth of this altered profile.The current models had scientific merits but were not comprehensive.

        Table 1 Characteristics of the populations between Cd exposure and control group at baseline(n=403).

        Table 2 Summary of 5-fold cross-validation result of different machine learning models on the dataset.

        Table 3 Biomarkers identified from human urine under Cd exposure based on different machine learning models.

        Although popular machine learning algorithms have been applied to predict Cd pollution in water springs,agricultural soil and lettuce leaves,to name a few[21–24],our research was the first study using seven different machine learning algorithms to screen potential biomarkers under Cd exposure based on human urine metabolic profiles.After the acquisition of MS data,fitting,coding,debugging,and fine-tuning of the specific machine learning algorithms,the XGBoost and RF classifiers showed better accuracy and predictive performance than others.These two algorithms are both decision tree-based supervised ensemble machine learning methods[25,26].To avoid overfitting,k-fold cross-validation is commonly used in these two algorithms[27].The most suitable one for our dataset is 5-fold cross-validation.The principles of various algorithms differ[28,29].The superior performances of XGBoost and RF classifiers in the present study may attribute to the skewed feature distributions and the unbalanced class distributions(123 control samples and 280 exposure samples)according to previous literature[30].XGBoost and RF classifiers are both promising in constructing predictive models and,as a result,providing variables of relatively high importance[31].To gain insights into these variables and several clinicopathologic characteristics,we may find more useful information hidden inside the high-dimensional mass spectrometry dataset.We trained and tested the models with a binary classified dataset(polluted or non-polluted)in this study.In reality,the polluted situations could be classified as non-polluted,low-polluted and high-polluted areas which would be more complex than binary classification.Thus,more human biological samples and optimized algorithms should be used to improve this multi-classification point in future studies.In short,the present study has added a new reference for selecting data-driven machinelearning algorithms for a metabolic analysis of urine upon Cd exposure.

        Fig.1.Forest plot of different machine learning algorithms based on positive(a)and negative(b)ionization mode mass spectra.The red,blue,orange,green,purple,light blue and light yellow represent the results from XGBoost,RF,Logistic regression,GaussianNB,SVM,MLP and KNN classifiers,respectively.

        Fig.2.Top thirty features calculated from their feature importance selected from positive(a)and negative(b)ionization mode mass spectra based on the random forest classifier and positive(c)and negative(d)ionization mode mass spectra based on the extreme gradient boosting classifier.The x-axis represents the results of mean decrease Gini and weight-based feature importance to RF and XGBoost classifiers,respectively.

        Due to the public health concerns about Cd,many metabolomics studies have been conducted.These reports have evidenced that amino acid metabolism,carbohydrate metabolism,purine metabolism,steroid hormone biosynthesis,creatine pathway,fatty acid oxidation,lipid metabolism,etc.would be influenced by Cd exposure[32–35].The biomarkers identified in this study were related to the changes in amino acid metabolism(Sadenosylmethionine,L-asparagine and homovanillic acid),purine metabolism (deoxyinosine)and steroid hormone biosynthesis(dehydroepiandrosterone sulfate).By convention,the aberration of amino acid metabolism indicated its energy expenditure under Cd exposure.As an illustration,S-adenosylmethionine,metabolized from the essential amino acid,methionine,is related to liver disease[16].Concomitantly,purine metabolism is associated with nephrotoxicity[36,37].The alteration of purine metabolism in this study also indicated the influence of Cd on the kidney.Dehydroepiandrosterone sulfate is a rich circulating steroid in humans and the steroid hormone biosynthesis dedicates to the normal development of the individual.These findings were generally consistent with the previous reports.As shown in the Spearman correlation analysis,the positive correlation coefficient between the concentration of UCd and BCd,two typical indexes for evaluating Cd exposure,was relatively higher than others.It was worth mentioning that mean decrease Gini and weight-based feature importance ranking were applied to RF and XGBoost classifiers,respectively.Other feature selection methodologies in these two algorithms could also be good alternatives for obtaining metabolic information to distinguish Cd exposure from control.Inferred from the information about hepatic and renal damage collected in the present study(Table 1)and supported by similar findings in the previous study[38–41],we concluded that the liver and kidney were susceptible to Cd exposure regardless of acute and chronic toxicity.As more and more omics data are accumulated[42,43],artificial intelligence-based techniques in combination with omics might be effective in recognizing unknown metabolic biomarkers and pathways,allowing for personalized risk assessments.

        In conclusion,we acquired high-resolution mass spectrometry datasets from 403 human urine samples,trained XGBoost and RF classifiers and identified a biomarker panel under Cd exposure.Taking into account the vast potential of big data,prospective risk stratification of machine learning algorithms,more sophisticated models are needed based on the two algorithms.

        Fig.3.Spearman correlation matrix of the identified compounds from human urine under Cd exposure.

        Declaration of competing interest

        The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

        Acknowledgment

        This work was supported by the National Key Research and Development Program of China(No.2017YFC1600500).

        Supplementary materials

        Supplementary material associated with this article can be found,in the online version,at doi:10.1016/j.cclet.2022.03.020.

        久久综合九色综合久久久| 国产精品99久久精品爆乳| 亚洲 欧美 激情 小说 另类| 极品粉嫩嫩模大尺度无码| 在线观看精品视频一区二区三区| 伊人狼人大香线蕉手机视频| 蜜桃网站免费在线观看视频| 麻豆资源在线观看视频| 国产成人精品一区二区三区视频| 久久亚洲黄色| 蜜桃视频在线免费观看完整版| 日韩在线视频专区九区| 免费一级淫片日本高清| 中文字幕一精品亚洲无线一区| 激情偷乱人伦小说视频在线| 无码精品色午夜| av永远在线免费观看| 偷拍美女上厕所一区二区三区| 精品无码国产自产拍在线观看蜜| 亚洲自拍另类制服在线| 久久综合激激的五月天| 森中文字幕一区二区三区免费| 亚洲人交乣女bbw| 1000部精品久久久久久久久| 久久精品成人免费观看97| 午夜视频一区二区在线观看| 日韩亚洲欧美久久久www综合| 国产第一页屁屁影院| 亚洲专区路线一路线二天美| 国产特黄a三级三级三中国| 国产av天堂亚洲国产av天堂| 波多野吉衣av无码| 精品人妻av一区二区三区不卡| 人妻少妇久久中中文字幕| 午夜男女很黄的视频| 麻豆精产国品| 99久久免费中文字幕精品| 亚洲日本中文字幕高清在线| 无码精品人妻一区二区三区av| 爱我久久国产精品| 超高清丝袜美腿视频在线|