

        Two Paradoxes in Linear Regression Analysis

Shanghai Archives of Psychiatry, 2016, Issue 6

Ge FENG1, Jing PENG2, Dongke TU4, Julia Z. ZHENG5, Changyong FENG2,3*

Biostatistics in psychiatry (36)


Keywords: forward selection; backward elimination; univariate regression; multiple regression

        1. Introduction

Linear regression is the most widely used statistical model in data analysis.[1] The wide availability and ease of use of statistical software packages such as SAS, SPSS and R make linear regression accessible to people without any formal statistical training. Although wise use of statistical methods such as linear regression helps everyone, even novices, develop a better understanding of data and guides decisions, it can also cause confusion in the interpretation of results and lead to paradoxical findings. For example, our biomedical collaborators often ask questions like: “When I run the univariate regression of Y on the predictor X, the p-value is very small. However, if I add some other predictors to the model, X is not significant anymore. Why?” The same problem also occurs in logistic regression for binary outcomes[2], log-linear regression for count data[2], and Cox proportional hazards regression for survival data.[3]

A simple answer to this question is that univariate and multiple regression models make different assumptions. However, this answer is not very meaningful to non-statisticians. We discuss it in Section 2.

In many medical studies, regression analysis involves a large number of independent variables, or predictors. Model selection is required to find the predictors that are significantly associated with an outcome, or dependent variable, of interest. Here is how model selection was done in a recent paper published in JAMA Surgery[4]:

“The administrative database was then evaluated by means of univariate and multivariate logistic regression. First we identified variables that were associated (P < .20) with readmission, the dependent variable. These potential confounders were then entered in multivariate stepwise (backward elimination) logistic regression, with readmission as the dependent variable. A logistic regression model was constructed to identify patient factors associated with readmission.”

Using this forward selection procedure as the first step to weed out “non-significant” predictors has become almost the gold standard for variable selection, and it has been used in many papers published in top medical journals.[5-24] The key idea of this method is first to run a univariate regression on each predictor. If the p-value is less than some prespecified level, for example 0.1, then the predictor is included in the multiple regression. Otherwise, the predictor is assumed to have no significant effect on the outcome and is discarded. This method seems quite logical and intuitively meaningful. Indeed, it has been used, and is still being used, by the biomedical and other research communities. Is this a valid procedure?
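A minimal sketch of the univariate screening step described above, in Python with NumPy. The function names `slope_p_value` and `univariate_screen` are hypothetical, the 0.1 threshold follows the example in the text, and the p-value uses a large-sample normal approximation to the slope's t statistic rather than the exact t distribution. It illustrates the procedure under discussion, not a recommended one.

```python
import math
import numpy as np

def slope_p_value(x, y):
    """Two-sided p-value for the slope in the simple regression of y on x.

    Assumption of this sketch: the t statistic is compared with a standard
    normal distribution, which is only adequate for large samples.
    """
    n = len(x)
    xc = x - x.mean()
    yc = y - y.mean()
    sxx = xc @ xc
    slope = (xc @ yc) / sxx
    resid = yc - slope * xc
    sigma2 = (resid @ resid) / (n - 2)          # residual variance estimate
    se = math.sqrt(sigma2 / sxx)                # standard error of the slope
    t = slope / se
    return math.erfc(abs(t) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|t|))

def univariate_screen(X, y, alpha=0.1):
    """Indices of the columns of X whose univariate p-value is below alpha."""
    return [j for j in range(X.shape[1]) if slope_p_value(X[:, j], y) < alpha]

# Usage: only column 0 truly affects y.
rng = np.random.default_rng(0)
n = 5000
X = rng.standard_normal((n, 3))
y = 2.0 * X[:, 0] + rng.standard_normal(n)
kept = univariate_screen(X, y)
print(kept)
```

Columns whose univariate p-value falls below the threshold are the only ones passed on to the multiple regression; Section 3 shows why this filter can discard a necessary covariate.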

In this paper we use linear regression analysis to show two paradoxes in regression analysis. In Section 2 we use some very basic theory to show that univariate regression and multiple regression make different assumptions about the model. In Section 3 we use examples and simulation studies to illustrate the two paradoxes. Section 4 briefly discusses the transitivity of correlation. Our results clearly invalidate the model selection procedure widely used in biomedical research.

        2. Basic theory

Let $(Y, X_1, \ldots, X_p)$ be a random vector, where $X_1, \ldots, X_p$ are called the covariates (independent variables) and $Y$ is called the outcome (dependent variable). The regression of $Y$ on $(X_1, \ldots, X_p)$ is the conditional expectation of $Y$ given $(X_1, \ldots, X_p)$, denoted by $E[Y \mid X_1, \ldots, X_p]$, which is a measurable function of $(X_1, \ldots, X_p)$. Denote this function by $g(X_1, \ldots, X_p)$. Without knowing the joint distribution of $(X_1, \ldots, X_p, Y)$, the form of $g$ is in general unknown. In statistical analysis we therefore usually assume some mathematically tractable form for $g$. For example, linear regression analysis[1] assumes that

$$E[Y \mid X_1, \ldots, X_p] = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p.$$

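In logistic regression analysis with a 0-1 outcome[2], we assume that

$$\Pr(Y = 1 \mid X_1, \ldots, X_p) = \frac{\exp(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p)}{1 + \exp(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p)}.$$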

In this paper we assume that the outcome $Y$ is continuous. Let

$$\varepsilon = Y - E[Y \mid X_1, \ldots, X_p].$$

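It is obvious that $E[\varepsilon \mid X_1, \ldots, X_p] = 0$. We consider a stronger form of the linear regression model,

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon, \tag{1}$$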

and assume that, given $X_1, \ldots, X_p$, the variance of $\varepsilon$ is

$$\operatorname{Var}(\varepsilon \mid X_1, \ldots, X_p) = \sigma^2,$$

which does not depend on $(X_1, \ldots, X_p)$. This assumption is also used in most of the statistical literature on linear models.[1] We further assume that $X_k$, $k = 1, \ldots, p$, have finite second moments.

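From (1), and because $E[\varepsilon \mid X_1] = E\{E[\varepsilon \mid X_1, \ldots, X_p] \mid X_1\} = 0$, we have

$$E[Y \mid X_1] = \beta_0 + \sum_{k=1}^{p} \beta_k E[X_k \mid X_1]. \tag{2}$$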

Let $Z_k = E[X_k \mid X_1]$, $k = 1, \ldots, p$. (Clearly $Z_1 = X_1$.) Then the regression of $Y$ on $X_1$ is

$$E[Y \mid X_1] = \beta_0 + \beta_1 X_1 + \sum_{k=2}^{p} \beta_k Z_k,$$

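which still has a linear form. Let $\eta = Y - E[Y \mid X_1]$. Then

$$Y = \beta_0 + \beta_1 X_1 + \sum_{k=2}^{p} \beta_k Z_k + \eta. \tag{3}$$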

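Although (3) has the same form as (1), the two models are fundamentally different in their error terms. Note that $E[\eta \mid X_1] = 0$ and $\operatorname{Cov}(Z_k, \eta) = 0$, $k = 1, \ldots, p$. However, since $\eta = \sum_{k=2}^{p} \beta_k (X_k - Z_k) + \varepsilon$ and $E[\varepsilon \mid X_1, \ldots, X_p] = 0$, the conditional variance of $\eta$ given $X_1$ is

$$\operatorname{Var}(\eta \mid X_1) = \sigma^2 + \operatorname{Var}\Big(\sum_{k=2}^{p} \beta_k X_k \,\Big|\, X_1\Big).$$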

Therefore, the conditional variance of $\eta$ given $X_1$ is in general no longer constant. This violates a fundamental assumption of the linear regression model.[1]
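To make the non-constant conditional variance concrete, here is a simulation sketch under a hypothetical setup of our own (not from the paper): $p = 2$ and $X_2 = X_1 U$, with $U$ standard normal and independent of $X_1$ and $\varepsilon$. Then $Z_2 = E[X_2 \mid X_1] = 0$, so $\eta = \beta_2 X_1 U + \varepsilon$ and $\operatorname{Var}(\eta \mid X_1) = \sigma^2 + \beta_2^2 X_1^2$, which grows with $|X_1|$. The coefficient values and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta0, beta1, beta2, sigma = 1.0, 2.0, 3.0, 1.0   # hypothetical coefficients

x1 = rng.standard_normal(n)
u = rng.standard_normal(n)
x2 = x1 * u                                  # E[X2 | X1] = 0, Var(X2 | X1) = X1**2
eps = sigma * rng.standard_normal(n)
y = beta0 + beta1 * x1 + beta2 * x2 + eps    # model (1) with p = 2

# eta = Y - E[Y | X1]; here E[Y | X1] = beta0 + beta1 * X1 because Z2 = 0
eta = y - (beta0 + beta1 * x1)

# Compare the spread of eta for small and large |X1|
var_small = eta[np.abs(x1) < 0.5].var()
var_large = eta[np.abs(x1) > 1.5].var()
print(f"Var(eta | |X1| < 0.5) ~ {var_small:.2f}")
print(f"Var(eta | |X1| > 1.5) ~ {var_large:.2f}")
```

The error of the univariate model is markedly larger where $|X_1|$ is large, so the constant-variance assumption of a regression of $Y$ on $X_1$ alone fails even though model (1) itself is homoscedastic.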

The univariate linear regression of $Y$ on $X_1$ assumes the following form of the model:

$$Y = \beta_0 + \beta_1 X_1 + \eta. \tag{4}$$

From (3) we know that, in general,

$$\sum_{k=2}^{p} \beta_k Z_k \neq 0.$$

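Suppose $(Y_i, X_{i1}, \ldots, X_{ip})$, $i = 1, \ldots, n$, is a random sample from (1). Let $\hat{\beta}_1^{u}$ be the least squares estimate of the coefficient of $X_{i1}$ in the univariate regression of $Y_i$ on $X_{i1}$ in (4). Then

$$\hat{\beta}_1^{u} = \frac{\sum_{i=1}^{n} (X_{i1} - \bar{X}_1)(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_{i1} - \bar{X}_1)^2}$$

and, as $n \to \infty$,

$$\hat{\beta}_1^{u} \xrightarrow{p} \frac{\operatorname{Cov}(X_1, Y)}{\operatorname{Var}(X_1)} = \beta_1 + \sum_{k=2}^{p} \beta_k \frac{\operatorname{Cov}(X_1, X_k)}{\operatorname{Var}(X_1)}. \tag{5}$$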
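The probability limit in (5), $\beta_1 + \sum_{k \ge 2} \beta_k \operatorname{Cov}(X_1, X_k)/\operatorname{Var}(X_1)$, can be checked by simulation. The sketch below assumes a hypothetical model with $p = 2$, $\beta_0 = 1$, $\beta_1 = 2$, $\beta_2 = 3$, and $X_2 = 0.5 X_1 + V$ with $V$ standard normal and independent of $X_1$ and $\varepsilon$. Then $\operatorname{Cov}(X_1, X_2)/\operatorname{Var}(X_1) = 0.5$, so the univariate slope should be near $2 + 3 \times 0.5 = 3.5$, whereas the multiple regression should recover $\beta_1 = 2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x1 = rng.standard_normal(n)
x2 = 0.5 * x1 + rng.standard_normal(n)        # Cov(X1, X2) / Var(X1) = 0.5
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.standard_normal(n)

# Univariate least squares slope of Y on X1
xc = x1 - x1.mean()
slope_uni = (xc @ (y - y.mean())) / (xc @ xc)

# Multiple regression of Y on (1, X1, X2)
design = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print(f"univariate slope   ~ {slope_uni:.3f}  (limit from (5): 3.5)")
print(f"multiple-reg beta1 ~ {coef[1]:.3f}  (true value: 2)")
```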

        3. Two paradoxes in linear regression analysis

In this section we show why the estimates of the coefficients of some covariates in the univariate regression and in the multiple regression do not match. More specifically, we show that in some cases the estimate from the univariate regression is significant but the estimate from the multiple regression is not. On the other hand, in other cases the estimate is significant in the multiple regression but not in the univariate regression.

Suppose (1) is the true multiple regression model. The univariate regression model (4) is obtained from (3) by assuming that $\sum_{k=2}^{p} \beta_k Z_k = 0$. This assumption is generally wrong unless $E[X_k \mid X_1]$ is a constant for $k = 2, \ldots, p$ (in which case the sum only shifts the intercept). Hence, even with a correct multiple regression model, the univariate estimate is based on a misspecified model. This is why the results from univariate and multiple regression do not match. Furthermore, result (5) shows that the univariate estimate has no clear interpretation: its limit mixes the coefficient of $X_1$ with the coefficients of all covariates correlated with $X_1$.

        We discuss two paradoxes related to univariate and multiple regressions through both theoretical derivations and simulation studies.

3.1 Significant covariate effect in multiple regression but not in univariate regression

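Let $X_2$, $X_3$, $X_4$ and $\varepsilon$ be independent random variables with standard normal distributions. Consider the following model:

$$Y = \alpha_1 X_1 + \alpha_2 X_2 + \alpha_3 X_3 + \varepsilon, \qquad X_1 = \beta_1 X_2 + \beta_2 X_4. \tag{6}$$

Then

$$\operatorname{Cov}(X_1, Y) = \alpha_1 \operatorname{Var}(X_1) + \alpha_2 \operatorname{Cov}(X_1, X_2) = \alpha_1(\beta_1^2 + \beta_2^2) + \alpha_2 \beta_1,$$

which is 0 if and only if

$$\alpha_1(\beta_1^2 + \beta_2^2) + \alpha_2 \beta_1 = 0. \tag{7}$$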

From (5) we know that if (7) holds, the least squares estimator of the coefficient in the univariate regression of $Y$ on $X_1$ converges to 0 and will generally not be significant, even though $X_1$ is necessary in specifying model (6).

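Example 1. Let $\alpha_1 = -3/5$, $\alpha_2 = 3$, $\alpha_3 = 4$, $\beta_1 = 1$, $\beta_2 = 2$ in (6). The true model is

$$Y = -\tfrac{3}{5} X_1 + 3 X_2 + 4 X_3 + \varepsilon, \qquad X_1 = X_2 + 2 X_4. \tag{8}$$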
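The arithmetic behind Example 1 can be checked exactly with fractions. This sketch assumes model (6) has the form $Y = \alpha_1 X_1 + \alpha_2 X_2 + \alpha_3 X_3 + \varepsilon$ with $X_1 = \beta_1 X_2 + \beta_2 X_4$ and $X_2, X_3, X_4, \varepsilon$ independent standard normal; under that assumption $\operatorname{Cov}(X_1, Y) = \alpha_1(\beta_1^2 + \beta_2^2) + \alpha_2\beta_1$, and the univariate slope converges to that covariance divided by $\operatorname{Var}(X_1)$.

```python
from fractions import Fraction

# Example 1 parameters; alpha3 does not enter Cov(X1, Y) because X3 is independent of X1
a1, a2 = Fraction(-3, 5), Fraction(3)
b1, b2 = Fraction(1), Fraction(2)

var_x1 = b1**2 + b2**2                    # Var(X1) = beta1^2 + beta2^2
cov_x1_x2 = b1                            # Cov(X1, X2) = beta1
cov_x1_y = a1 * var_x1 + a2 * cov_x1_x2   # left-hand side of condition (7)
limit = cov_x1_y / var_x1                 # probability limit of the univariate slope
print(cov_x1_y, limit)                    # 0 0
```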

Table 1 shows the simulation results (estimates and standard deviations) for the coefficient of $X_1$ in both the univariate and the multiple regression over 10,000 replications. For a wide range of sample sizes, the least squares estimate of the coefficient of $X_1$ in the multiple regression is very close to the true value, and its standard deviation decreases markedly as the sample size increases. However, the estimate of the coefficient in the univariate analysis is very close to 0 in all cases.
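A one-replication sketch of the comparison summarized in Table 1, assuming model (8) has the form $Y = -\tfrac{3}{5} X_1 + 3 X_2 + 4 X_3 + \varepsilon$ with $X_1 = X_2 + 2 X_4$. It does not reproduce the 10,000-replication table; the sample size and seed are arbitrary, and `ols` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x2, x3, x4, eps = rng.standard_normal((4, n))   # independent N(0, 1) draws
x1 = x2 + 2.0 * x4
y = -0.6 * x1 + 3.0 * x2 + 4.0 * x3 + eps

def ols(y, *cols):
    """Least squares coefficients of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

uni = ols(y, x1)[1]             # coefficient of X1 in the univariate regression
multi = ols(y, x1, x2, x3)[1]   # coefficient of X1 in the multiple regression
print(f"univariate: {uni:.3f}, multiple: {multi:.3f}, true: -0.6")
```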

According to the practice in medical publications[4-24], $X_1$ would therefore not enter the multiple regression. Table 2 shows the least squares estimates of the coefficients of $X_2$ and $X_3$ after $X_1$ is removed from (8). The estimate of the coefficient of $X_2$ is dramatically biased once $X_1$ is removed on the basis of the univariate analysis.
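Under the same assumed form of (8), substituting $X_1 = X_2 + 2X_4$ gives $Y = 2.4 X_2 + 4 X_3 - 1.2 X_4 + \varepsilon$. Because $X_4$ is independent of $X_2$ and $X_3$, a regression on $X_2$ and $X_3$ alone should estimate about 2.4 instead of 3 for $X_2$, while $X_3$ stays near 4. A sketch of this bias (seed and sample size arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x2, x3, x4, eps = rng.standard_normal((4, n))
x1 = x2 + 2.0 * x4
y = -0.6 * x1 + 3.0 * x2 + 4.0 * x3 + eps

# X1 has been "screened out", so only X2 and X3 enter the model
design = np.column_stack([np.ones(n), x2, x3])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b2, b3 = coef[1], coef[2]
print(f"X2: {b2:.3f} (true 3), X3: {b3:.3f} (true 4)")
```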

3.2 Significant covariate effect in univariate regression but not in multiple regression

Suppose $X_1$, $X_2$, $X_3$ and $\varepsilon$ are independent standard normal random variables, and $X_4 = \beta_1 X_1 + \beta_2 X_3$, where $\beta_1$ and $\beta_2$ are nonzero constants.

Table 1. Estimate of the regression coefficient of $X_1$

Table 2. Estimates of the regression coefficients of $X_2$ and $X_3$ with $X_1$ removed

Consider the following true model:

$$Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \varepsilon. \tag{9}$$

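If (9) is expanded to include $X_4$ and the expanded model still satisfies the conditions of linear regression, then the regression equation becomes

$$Y = \gamma_0 + \gamma_1 X_1 + \gamma_2 X_2 + \gamma_4 X_4 + \varepsilon^{*}, \qquad E[\varepsilon^{*} \mid X_1, X_2, X_4] = 0. \tag{10}$$

Since $\varepsilon$ is independent of the covariates, from (9) and (10) we have

$$E[Y \mid X_1, X_2, X_4] = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 = \gamma_0 + \gamma_1 X_1 + \gamma_2 X_2 + \gamma_4 X_4,$$

or

$$(\gamma_0 - \alpha_0) + (\gamma_1 - \alpha_1) X_1 + (\gamma_2 - \alpha_2) X_2 + \gamma_4 X_4 = 0.$$

Because $X_4$ is not a linear combination of $1$, $X_1$ and $X_2$, this identity holds only if $\gamma_0 = \alpha_0$, $\gamma_1 = \alpha_1$, $\gamma_2 = \alpha_2$ and $\gamma_4 = 0$. That is, $X_4$ has no role in the multiple regression.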

Example 2. Let $\alpha_0 = 0$, $\alpha_1 = 1$, $\alpha_2 = 2$ in (9) and $\beta_1 = \beta_2 = 1$. Table 3 shows the least squares estimates of the coefficient of $X_4$ in both univariate and multiple linear regressions over 10,000 replications. For all sample sizes, the univariate regression shows that $X_4$ has a highly significant effect on $Y$. However, in the multiple regression the effect is not significant.
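A one-replication sketch of Example 2. It assumes $X_4 = X_1 + X_3$, that is, $\beta_1 = \beta_2 = 1$ with $X_3$ as the second component, because with $X_4 = X_1 + X_2$ the multiple regression on $X_1$, $X_2$, $X_4$ would be exactly collinear. Under this assumption the univariate slope of $Y$ on $X_4$ should be near $\operatorname{Cov}(X_4, Y)/\operatorname{Var}(X_4) = 1/2$, while the coefficient of $X_4$ given $X_1$ and $X_2$ should be near 0. The helper `ols`, the seed, and the sample size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
x1, x2, x3, eps = rng.standard_normal((4, n))
x4 = x1 + x3                          # assumed form of X4 (beta1 = beta2 = 1)
y = 0.0 + 1.0 * x1 + 2.0 * x2 + eps   # model (9) with alpha0 = 0, alpha1 = 1, alpha2 = 2

def ols(y, *cols):
    """Least squares coefficients of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

uni = ols(y, x4)[1]                   # univariate coefficient of X4
multi = ols(y, x1, x2, x4)[3]         # coefficient of X4 given X1 and X2
print(f"univariate: {uni:.3f}, multiple: {multi:.3f}")
```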

        4. Transitivity of correlation

Another issue in regression analysis is the assumed transitivity of correlation when interpreting results. For example, some people may argue: “Since factor A is highly correlated with outcome Y, and factors A and B are highly correlated, B should also be correlated with Y.” It seems intuitive and reasonable that correlation is transitive. Unfortunately, this is not true. Here is a theoretical example. Suppose $X$ and $Z$ are independent standard normal random variables and $Y = X + Z$. Then $\operatorname{Cov}(X, Y) = \operatorname{Var}(X) = 1$ and $\operatorname{Var}(Y) = 2$, so the correlation between $X$ and $Y$, and likewise between $Y$ and $Z$, is $1/\sqrt{2} \approx 0.707$. However, the correlation between $X$ and $Z$ is 0.
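The counterexample above is easy to check numerically. A short sketch (seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
x = rng.standard_normal(n)
z = rng.standard_normal(n)
y = x + z

r = np.corrcoef([x, y, z])        # 3x3 sample correlation matrix, order (X, Y, Z)
r_xy, r_yz, r_xz = r[0, 1], r[1, 2], r[0, 2]
print(f"corr(X,Y)={r_xy:.3f}, corr(Y,Z)={r_yz:.3f}, corr(X,Z)={r_xz:.3f}")
```

The first two sample correlations land near 0.707 and the third near 0, so strong correlation of $X$ with $Y$ and of $Y$ with $Z$ says nothing about $X$ and $Z$.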

Table 3. Estimate of the regression coefficient of $X_4$

In our Example 2, the correlation between $X_4$ and $X_1$ is 0.707 and the correlation between $X_1$ and $Y$ is 0.408. However, as shown in Section 3.2, $X_4$ has no role in the multiple regression when $X_1$ and $X_2$ are in the model, even though $X_4$ is not a linear combination of $X_1$ and $X_2$.
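The correlations quoted for Example 2 can be computed directly, assuming $Y = X_1 + 2X_2 + \varepsilon$ and $X_4 = X_1 + X_3$ with $X_1, X_2, X_3, \varepsilon$ independent standard normal. Under these assumptions the two quoted figures match $\operatorname{corr}(X_4, X_1)$ and $\operatorname{corr}(X_1, Y)$. The sketch also reports $\operatorname{corr}(X_4, Y)$, which is nonzero and is why $X_4$ looks significant on its own.

```python
import math

alpha1, alpha2 = 1.0, 2.0
beta1, beta2 = 1.0, 1.0

var_y = alpha1**2 + alpha2**2 + 1.0        # Var(Y): X1, X2, eps independent N(0, 1)
var_x4 = beta1**2 + beta2**2               # Var(X4)
corr_x4_x1 = beta1 / math.sqrt(var_x4)     # Cov(X4, X1) = beta1, Var(X1) = 1
corr_x1_y = alpha1 / math.sqrt(var_y)      # Cov(X1, Y) = alpha1
corr_x4_y = alpha1 * beta1 / math.sqrt(var_x4 * var_y)   # Cov(X4, Y) = alpha1 * beta1
print(round(corr_x4_x1, 3), round(corr_x1_y, 3), round(corr_x4_y, 3))   # 0.707 0.408 0.289
```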

        5. Discussion

Regression analysis in medical research usually involves many predictors (independent variables). Model selection is needed to pick the covariates that have a significant effect on the outcome. A widely used method in medical publications[4-24] is first to screen the covariates through univariate analysis: if a covariate is not significant in the univariate regression analysis, it does not enter the multiple regression analysis. The underlying assumption of this method is that a covariate can be significant in the multiple regression only if it is significant in the univariate regression. Our results indicate that this assumption is wrong. A covariate may be highly significant in the univariate regression but have no role in the multiple regression (see Example 2 in Section 3). Conversely, a covariate may be a necessary part of a multiple regression and yet be uncorrelated with the outcome (see Example 1 in Section 3). The initial univariate screening method completely ignores the correlation among covariates, and there is no theoretical work to support it. Our simulation results clearly show that the multiple regression results obtained after univariate screening may be dramatically biased and misleading. The biomedical community should stop using this procedure in research and publications.

        Funding

        None

        Conflict of interest statement

        The authors report no conflict of interest related to this manuscript.

Authors’ contributions

        Ge Feng and Changyong Feng: theoretical derivation and revision

        Jing Peng, Dongke Tu, and Julia Z. Zheng: Simulation and manuscript drafting

References

1. Seber GAF, Lee AJ. Linear regression analysis (2nd ed). Hoboken, NJ: Wiley; 2003

2. Agresti A. Categorical data analysis (2nd ed). Hoboken, NJ: Wiley; 2002

3. Cox DR. Regression models and life-tables (with discussion). J R Stat Soc Series B. 1972; 34: 187-220. doi: http://dx.doi.org/10.2307/2985181

        4. McIntyre LK, Arbabi S, Robinson EF, Maier RV. Analysis of Risk Factors for Patient Readmission 30 Days Following Discharge From General Surgery. JAMA Surgery. 2016; (Epub ahead of print). doi: http://dx.doi.org/10.1001/jamasurg.2016.1258

5. Bardia A, Sood A, Mahmood F, Orhurhu V, Mueller A, Montealegre-Gallegos M, et al. Combined epidural-general anesthesia vs general anesthesia alone for elective abdominal aortic aneurysm repair. JAMA Surgery. 2016; (Epub ahead of print). doi: http://dx.doi.org/10.1001/jamasurg.2016.2733

6. Barlesi F, Mazieres J, Merlio JP, Debieuvre D, Mosser J, Lena H, et al. Routine molecular profiling of patients with advanced non-small-cell lung cancer: results of a 1-year nationwide programme of the French Cooperative Thoracic Intergroup (IFCT). Lancet. 2016; 387: 1415-1426. doi: http://dx.doi.org/10.1016/S0140-6736(16)00004-0

7. Brooks GA, Kansagra AJ, Rao SR, Weitzman JI, Linden EA, Jacobson JO. A clinical prediction model to assess risk for chemotherapy-related hospitalization in patients initiating palliative chemotherapy. JAMA Oncology. 2015; 1(4): 441-447. doi: http://dx.doi.org/10.1001/jamaoncol.2015.0828

        8. Cronin PR, DeCoste L, Kimball AB. A multivariate analysis of dermatology missed appointment predictors. JAMA Dermatology. 2013; 149(12): 1435-1437. doi: http://dx.doi.org/10.1001/jamadermatol.2013.5771

9. Fivez T, Kerklaan D, Mesotten D, Verbruggen S, Wouters PJ, Vanhorebeek I, et al. Early versus late parenteral nutrition in critically ill children. N Engl J Med. 2016; 374(12): 1111-1122. doi: http://dx.doi.org/10.1056/NEJMoa1514762

10. Geng E, Kreiswirth B, Burzynski J, Schluger NW. Clinical and radiographic correlates of primary and reactivation tuberculosis: a molecular epidemiology study. JAMA. 2005; 293(22): 2740-2745. doi: http://dx.doi.org/10.1001/jama.293.22.2740

        11. Hole J, Hirsch M, Ball E, Meads C. Music as an aid for postoperative recovery in adults: a systematic review and meta-analysis. Lancet. 2015; 386: 1659-1671. doi: http://dx.doi.org/10.1016/S0140-6736(15)60169-6

        12. International CLL-IPI working group. An international prognostic index for patients with chronic lymphocytic leukaemia (CLL-IPI): A meta-analysis of individual patient data. Lancet Oncology. 2016; 17(6): 779-790. doi: http://dx.doi.org/10.1016/S1470-2045(16)30029-8

13. Leon MB, Smith CR, Mack MJ, Makkar RR, Svensson LG, Kodali SK, et al. Transcatheter or surgical aortic-valve replacement in intermediate-risk patients. N Engl J Med. 2016; 374(17): 1609-1620. doi: http://dx.doi.org/10.1056/NEJMoa1514616

14. Li Y, Stocchi L, Cherla D, Liu X, Remzi FH. Association of preoperative narcotic use with postoperative complications and prolonged length of hospital stay in patients with Crohn disease. JAMA Surgery. 2016; 151(8): 726-734. doi: http://dx.doi.org/10.1001/jamasurg.2015.5558

15. Lorant V, Deliège D, Eaton W, Robert A, Philippot P, Ansseau M. Socioeconomic inequalities in depression: a meta-analysis. Am J Epidemiol. 2003; 157(2): 98-112. doi: http://dx.doi.org/10.1093/aje/kwf182

16. van der Meer AJ, Veldt BJ, Feld JJ, Wedemeyer H, Dufour JF, Lammert F, et al. Association between sustained virological response and all-cause mortality among patients with chronic hepatitis C and advanced hepatic fibrosis. JAMA. 2012; 308(24): 2584-2593. doi: http://dx.doi.org/10.1001/jama.2012.144878

17. Mingrone G, Panunzi S, De Gaetano A, Guidone C, Iaconelli A, Nanni G, et al. Bariatric-metabolic surgery versus conventional medical treatment in obese patients with type 2 diabetes: 5 year follow-up of an open-label, single-centre, randomized controlled trial. Lancet. 2015; 386: 964-973. doi: http://dx.doi.org/10.1016/S0140-6736(15)00075-6

18. Nelson KB, Ellenberg JH. Antecedents of cerebral palsy: I. Univariate analysis of risks. Am J Dis Child. 1985; 139(10): 1031-1038. doi: http://dx.doi.org/10.1001/archpedi.1985.02140120077032

19. Nelson KB, Ellenberg JH. Antecedents of cerebral palsy: multivariate analysis of risk. N Engl J Med. 1986; 315(2): 81-86. doi: http://dx.doi.org/10.1056/NEJM198607103150202

20. NICE-SUGAR Study Investigators. Hypoglycemia and risk of death in critically ill patients. N Engl J Med. 2012; 367(12): 1108-1118. doi: http://dx.doi.org/10.1056/NEJMoa1204942

21. Pagès F, Berger A, Camus M, Sanchez-Cabo F, Costes A, Molidor R, et al. Effector memory T cells, early metastasis, and survival in colorectal cancer. N Engl J Med. 2005; 353(25): 2654-2666. doi: http://dx.doi.org/10.1056/NEJMoa051424

22. Schwed AC, Boggs MM, Pham XD, Watanabe DM, Bermudez MC, Kaji AH, et al. Association of admission laboratory values and the timing of endoscopic retrograde cholangiopancreatography with clinical outcomes in acute cholangitis. JAMA Surgery. 2016; (Epub ahead of print). doi: http://dx.doi.org/10.1001/jamasurg.2016.2329

23. Templin C, Ghadri JR, Diekmann J, Napp LC, Bataiosu DR, Jaguszewski M, et al. Clinical features and outcomes of takotsubo (stress) cardiomyopathy. N Engl J Med. 2015; 373(10): 929-938. doi: http://dx.doi.org/10.1056/NEJMoa1406761

24. Wood GC, Benotti PN, Lee CJ, Mirshahi T, Still CD, Gerhard GS, Lent MR. Evaluation of the association between preoperative clinical factors and long-term weight loss after roux-en-y gastric bypass. JAMA Surgery. 2016; (Epub ahead of print). doi: http://dx.doi.org/10.1001/jamasurg.2016.2334

Ge Feng is a graduate student in the School of Geophysics and Oil Resources at Yangtze University, Wuhan, Hubei, China. His research interests include statistical analysis in rock physics.


Summary: Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.

[Shanghai Arch Psychiatry. 2016; 28(6): 355-360. http://dx.doi.org/10.11919/j.issn.1002-0829.216084]

1 School of Geophysics and Oil Resource, Yangtze University, Wuhan, China

2 Department of Biostatistics & Computational Biology, University of Rochester, Rochester, NY, USA

3 Department of Anesthesiology, University of Rochester, Rochester, NY, USA

4 School of Philosophy, Wuhan University, Wuhan, China

5 Department of Microbiology and Immunology, McGill University, Montreal, QC, Canada

*Correspondence: Dr. Changyong Feng. Mailing address: Department of Biostatistics and Computational Biology, University of Rochester, 601 Elmwood Ave., Box 630, Rochester, NY, USA. Postcode: NY 14642. E-mail: Changyong_feng@urmc.rochester.edu


        猜你喜歡
        醫(yī)學期刊生物醫(yī)學悖論
        芻議“生物醫(yī)學作為文化”的研究進路——兼論《作為文化的生物醫(yī)學》
        科學與社會(2022年4期)2023-01-17 01:20:04
        視神經(jīng)炎的悖論
        山西醫(yī)學期刊社簡介
        全科護理(2022年19期)2022-07-09 05:42:08
        山西醫(yī)學期刊社簡介
        全科護理(2022年16期)2022-06-09 07:24:38
        山西醫(yī)學期刊社簡介
        全科護理(2022年10期)2022-04-07 11:14:00
        山西醫(yī)學期刊社簡介
        全科護理(2022年8期)2022-03-23 01:00:22
        靈長類生物醫(yī)學前沿探索中的倫理思考
        科學與社會(2021年4期)2022-01-19 03:29:50
        海島悖論
        “帽子悖論”
        當代陜西(2019年9期)2019-05-20 09:47:10
        國外生物醫(yī)學文獻獲取的技術(shù)工具:述評與啟示
        欧洲美女黑人粗性暴交视频| 国产高清黄色在线观看91| 国产精品美女自在线观看| 国产亚洲一区二区三区| 国产人妻久久精品二区三区老狼| 亚洲人成影院在线无码观看| 亚洲伊人久久综合精品| 少妇人妻字幕精品毛片专区| 免费看av在线网站网址| 亚洲一区日韩无码| 亚洲一区二区三区99区| 久久精品人搡人妻人少妇| 国产乱国产乱老熟300部视频| 亚洲综合网在线观看首页| 中文字幕一区二区va| 日本一二三区在线观看视频 | 六月婷婷久香在线视频| 国产偷国产偷高清精品| 亚洲视频在线免费观看一区二区| 最近免费中文字幕中文高清6| 怡红院a∨人人爰人人爽| 久久中文字幕日韩无码视频| 亚洲中文字幕一区av| 国产夫妇肉麻对白| 亚洲男人第一av网站| 国产后入内射在线观看| 亚洲高清三区二区一区| 欧美极品色午夜在线视频| 任你躁欧美一级在线精品免费| 国产午夜激情视频在线看| 成人麻豆日韩在无码视频| 车上震动a级作爱视频| 亚洲av熟女天堂系列| 久久婷婷综合缴情亚洲狠狠| 夜先锋av资源网站| 亚洲精品亚洲人成在线播放| 国产av剧情精品麻豆| 中文字幕乱码熟妇五十中出| 日韩欧美第一页| 亚洲激情视频在线观看a五月| 欧美激情综合色综合啪啪五月|