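Application of random survival forests in the prognostic analysis of small cell lung cancer
XIE Ruifei1 WU Bo2▲
1. Department of Information, Hangzhou Tumor Hospital, Hangzhou 310002, Zhejiang, China; 2. Department of Radiotherapy, Taizhou Central Hospital, Taizhou 318000, Zhejiang, China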
[Abstract] Objective To identify gene variables that are intrinsically associated with small cell lung cancer (SCLC), so as to help clinicians design individualized treatment plans, prolong patient survival, and improve quality of life after treatment. Methods A total of 117 patients with SCLC were enrolled, with 41000 gene variables and 8 general characteristics. Random survival forests (RSF) were applied to the gene expression profiles together with the prognostic data to search the gene variables for those closely related to SCLC. Results The general characteristics and the expression of EGFR, K-ras and p53 showed no significant difference in prognosis. Among the top 12 selected genes, FTCD, BTC, PSMC4 and SLC43A1 were closely related to SCLC, while UCHL5 and PSMC4 with PSMD7, and PCSK4 and VPS13D with VPS13A, showed regulatory dependence. Conclusion Random survival forests can efficiently identify the key genes closely related to prognosis.
[Key words] Small cell lung cancer; Random survival forests; Gene expression profile; Survival analysis; Gene regulation
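Worldwide, lung cancer is one of the most common malignant tumors, with high mortality and poor prognosis[1-3]. Small cell lung cancer (SCLC) has a worse prognosis than non-small cell lung cancer (NSCLC). In China, the 5-year survival rate for more than 80% of SCLC patients does not exceed 10%[4,5]. Identifying the genes and molecules involved in the onset and progression of SCLC is therefore particularly important for tumor diagnosis and treatment[2,6,7].
In recent years translational medicine has received growing attention, and more and more researchers are working on genomics. Combining high-dimensional genomic data with survival information lets researchers view individual biological processes, and the onset, progression and prognosis of disease, from a new angle. The random survival forest (RSF)[8-13] can combine survival information with high-dimensional genomic data effectively, extract the gene variables related to prognosis, and guide clinicians toward individualized treatment[14].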
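1.1 Clinical data
The data were extracted from 117 patients with SCLC and cover 41000 genes; the general characteristics are shown in Table 1. EGFR was strongly correlated with sex, and K-ras with sex and T stage.
1.2 Random survival forest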
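A random survival forest extends the random forest to survival analysis. Bootstrap sampling draws N samples with replacement from the original data to build each survival tree, and the roughly 37% of samples left out of the bag (out-of-bag samples) are used to test that tree.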
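The 37% out-of-bag figure follows from sampling with replacement: a given sample is missed by all n draws with probability (1 - 1/n)^n, which tends to 1/e ≈ 0.368 as n grows. The sketch below illustrates this for n = 117, the cohort size reported here. The function names and the fixed seed are illustrative assumptions, not part of the original analysis.

```python
import random


def expected_oob_fraction(n):
    """Probability that a given sample is never drawn in n draws with replacement."""
    return (1 - 1 / n) ** n


def bootstrap_split(n, rng):
    """Return (in_bag, out_of_bag) index lists for one bootstrap sample of size n."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed only so the illustration is reproducible
n_patients = 117        # cohort size taken from the text
in_bag, out_of_bag = bootstrap_split(n_patients, rng)

print(f"expected OOB fraction: {expected_oob_fraction(n_patients):.3f}")
print(f"OOB fraction in this draw: {len(out_of_bag) / n_patients:.3f}")
```

Each tree is grown on its own in-bag sample, so every patient serves as test data for the trees that did not draw that patient.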
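Suppose node h of a tree contains n(h) samples, and let (T1,δ1),…,(Tn,δn) denote their survival times and censoring indicators, where δi=0 means that individual i is right-censored at time Ti and δi=1 means that individual i died at time Ti. For a given variable Xj (j=1,2,…,m), the survival data at node h can be divided into two groups according to Xj≤c and Xj>c. At each node of each tree, RSF randomly selects M variables as candidates for splitting and chooses the split that maximizes the survival difference between the child nodes. Node splitting uses the log-rank criterion, and the survival function is estimated by the Kaplan-Meier method. To retain only a small number of the most important gene variables, the variables can be screened by variable importance (VIMP); the larger the VIMP, the stronger the predictive ability. The procedure is as follows. Step 1: remove missing data. Step 2: fit a Cox model for every gene. Step 3: select the gene variables with P<0.005. Step 4: apply RSF to the general clinical characteristics and the finally selected gene variables, and rank all variables by importance.
Table 1 General characteristics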
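The log-rank split can be made concrete with a small sketch. For one candidate variable it scans the cut points c, divides the node into Xj≤c and Xj>c, and keeps the cut with the largest two-sample log-rank statistic. This is a minimal pure-Python illustration under stated assumptions: the function names and the toy data are invented, and a real RSF implementation would repeat the scan over the M randomly chosen variables at every node.

```python
def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic (1 degree of freedom).

    times   : observed times T_i
    events  : 1 = death, 0 = right-censored (the delta_i of the text)
    in_left : True if sample i falls in the left child (X_j <= c)
    """
    obs_minus_exp = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_left = sum(1 for i in at_risk if in_left[i])
        dying = [i for i in at_risk if times[i] == t and events[i] == 1]
        d = len(dying)
        d_left = sum(1 for i in dying if in_left[i])
        # observed minus expected deaths in the left child at time t
        obs_minus_exp += d_left - d * n_left / n
        if n > 1:
            # hypergeometric variance of the left-child death count
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    return 0.0 if variance == 0 else obs_minus_exp ** 2 / variance


def best_split(times, events, x):
    """Return (c, statistic) for the cut X <= c with the largest log-rank statistic.

    Returns (None, -1.0) if x has a single distinct value and cannot be split.
    """
    best_c, best_stat = None, -1.0
    for c in sorted(set(x))[:-1]:  # the largest value would leave the right child empty
        in_left = [xi <= c for xi in x]
        stat = logrank_statistic(times, events, in_left)
        if stat > best_stat:
            best_c, best_stat = c, stat
    return best_c, best_stat


# Toy node (invented values): low expression goes with long survival.
times = [1, 2, 3, 10, 11, 12]
events = [1, 1, 1, 1, 1, 1]
expression = [5, 6, 7, 1, 2, 3]
cut, stat = best_split(times, events, expression)
print(cut, round(stat, 3))
```

On the toy node the chosen cut separates the three long survivors from the three early deaths, which is the split with the largest survival difference between the child nodes.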
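The general clinical characteristics and the EGFR, K-ras and p53 mutations were analyzed with the Kaplan-Meier method and the log-rank test; see Figure 1 (inside back cover). Figure 1 shows that sex, age, EGFR, K-ras and p53 made no significant difference to prognosis in SCLC patients. For T stage, N stage and clinical stage, only the comparisons T1 vs T4, N0 vs N2, and clinical stage Ⅰ vs clinical stage Ⅲ gave P<0.001, indicating statistically significant differences.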
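As a reminder of what the Kaplan-Meier curves in Figure 1 represent, the product-limit estimate multiplies, at each observed death time, the fraction of at-risk patients who survive that time. The sketch below is a minimal illustration; the function name and the four-patient toy data are invented and do not come from the study.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time (product-limit estimate).

    times  : observed follow-up times
    events : 1 = death observed, 0 = right-censored
    """
    curve = []
    survival = 1.0
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in death_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy data (invented): four patients, one censored at month 2.
months = [1, 2, 2, 3]
died = [1, 1, 0, 1]
curve = kaplan_meier(months, died)
for t, s in curve:
    print(f"month {t}: S = {s:.2f}")
```

The censored patient still counts as at risk at month 2 but contributes no death, which is how censoring enters the estimate.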
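Figure 2 (inside back cover) and Table 2 show that, during model building, the error rate stabilized as the number of survival trees increased. The variables were ranked by their effect on prognosis, and the top 12 were FTCD, UCHL5, RANBP9, YWHAQ, LOC151878, PPP2R5C, C20orf96, NFKBIB, BTC, SUMO3, PSMC4 and C6orf64. Analysis with the GeneCards database showed that FTCD, BTC, PSMC4 and SLC43A1 are closely related to tumors, while UCHL5 and PSMC4 with PSMD7, and PCSK4 and VPS13D with VPS13A, show regulatory dependence, as listed in Table 3. The genes with regulatory relationships to PCSK4 and PSMC are shown in Figure 3 (inside back cover); these genes influence and control one another and jointly affect tumor formation and evolution.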
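The text does not state how the error rate in Figure 2 is computed. In the RSF literature it is commonly reported as 1 minus Harrell's concordance index on out-of-bag data; assuming that convention applies here, the sketch below shows the calculation. The function name, the handling of tied times (such pairs are simply skipped) and the toy values are assumptions made for illustration.

```python
def concordance_index(times, events, risk):
    """Harrell's C: share of comparable pairs in which the patient who
    died earlier was assigned the higher predicted risk.

    A pair (i, j) is comparable when patient i died (events[i] == 1)
    strictly before time j. Tied risks count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (invented): higher risk scores go with earlier deaths.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]
c_index = concordance_index(times, events, risk)
error_rate = 1 - c_index
print(c_index, error_rate)
```

Under this convention an error rate of 0.5 corresponds to random ranking, so a curve that levels off below 0.5 as trees are added indicates that the forest has settled on a useful ranking.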
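To further verify whether the sensitive genes and clinical characteristics obtained affect prognosis, univariate and multivariate Cox regression analyses were performed; the results are shown in Table 4. In the univariate analysis only T, N, FTCD, UCHL5, BTC, PSMC4, PCSK4 and SLC43A1 were statistically significant. When these were entered into the multivariate analysis, T, N, FTCD, UCHL5 and PSMC4 had P values below 0.05 and were statistically significant, that is, they jointly affect patient prognosis.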
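For reference, the Cox proportional hazards model behind the univariate and multivariate analyses can be written as below. The notation (baseline hazard h_0, coefficients β_k, covariates x_k) is the standard textbook form and is not taken from the original text; the univariate analysis corresponds to p = 1.

```latex
h(t \mid x_1,\dots,x_p) = h_0(t)\,\exp\!\left(\sum_{k=1}^{p}\beta_k x_k\right),
\qquad
\mathrm{HR}_k = e^{\beta_k}
```

A hazard ratio HR_k above 1 means that the hazard of death rises as covariate x_k increases, with the other covariates held fixed.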
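Using gene expression profiles together with prognostic data, the RSF method can effectively screen out genes closely related to lung cancer, guide clinicians in designing individualized treatment, improve patients' quality of life, and prolong survival.
Random survival forests combine machine learning with clinical survival data effectively and can quickly identify the key genes closely related to prognosis. Because a random forest considers the joint effect of multiple genes when selecting features, the selected gene set tends to be strongly correlated or mutually regulated, which lays a foundation for later analysis of regulatory relationships between genes and for building gene regulatory networks.
In many studies, whether the EGFR[27], K-ras[28] and p53[29] sites are mutated influences the choice of treatment for non-small cell lung cancer. For example, when EGFR is mutated, the EGFR tyrosine kinase inhibitors (EGFR-TKIs) gefitinib and erlotinib significantly improve the survival benefit of NSCLC patients and have been approved by the FDA for advanced NSCLC. However, the random survival forest showed that EGFR, K-ras and p53 were not among the sensitive genes affecting SCLC prognosis, and the Kaplan-Meier and univariate analyses confirmed that these three mutation sites did not affect patient prognosis. Choosing treatment for SCLC patients on the basis of EGFR, K-ras and p53 is therefore probably of little value.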
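Table 2 Variable importance ranking
Table 3 Bioinformatics of the sensitive genes
Table 4 Univariate and multivariate analysis of clinical characteristics and sensitive genes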
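For the other prognosis-sensitive genes selected by the random survival forest, univariate and multivariate Cox regression showed that T stage, N stage, FTCD, UCHL5 and PSMC4 were statistically significant and strongly associated with tumor prognosis. T stage is determined mainly by tumor size, and N stage by the location and extent of lymph node metastasis; the smaller the tumor and the more limited the lymph node metastasis (that is, the lower the T and N stage), the better the prognosis. For FTCD, UCHL5 and PSMC4, the prognostic hazard ratios at the mutated sites were 1.569, 2.194 and 2.314 respectively, indicating a marked association with SCLC. It has been reported that knocking out FTCD reduces the effect of HIF-1α under hypoxia and enhances the chemosensitivity of HepG2 cells, and that FTCD and HIF regulate each other; FTCD has also been proposed as a target gene for treating liver cancer[30]. Randles et al[31] showed that UCHL5 affects the cell cycle: loss of the UCHL5 protein arrests the cell cycle at the G0/G1 phase so that it cannot proceed normally. PSMC4 is an ATPase subunit; this subunit has been shown to interact with a member of the nuclear hormone receptor superfamily in liver, and two transcript variants have been identified[32]. Clinicians could use FTCD, UCHL5 and PSMC4 to build a prognostic prediction model, and could give patients with mutations in FTCD, UCHL5 or PSMC4 specific targeted drugs to inhibit tumor growth and progression and improve prognosis.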
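Random survival forests can quickly and efficiently identify genes strongly associated with prognosis. This supports precision medicine for SCLC patients: locating the causes of SCLC and its therapeutic targets accurately, classifying different disease states and processes precisely, and ultimately delivering individualized precision treatment that improves the benefit of diagnosis, treatment and prevention.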
[1]Menachery A,Burt J,Chappell S,et al.Dielectrophoretic characterization and separation of metastatic variants of small cell lung cancer cells[J].Une,2016,(3):386-389.
[2]Mitsudomi T.Molecular epidemiology of lung cancer and geographic variations with special reference to EGFR mutations[J].Transl Lung Cancer Res,2014,3(4):205-211.
[3]Jung KW,Won YJ,Kong HJ,et al.Cancer statistics in Korea:Incidence,mortality,survival,and prevalence in 2012[J].Cancer Res Treat,2015,47(2):127-141.
[4]Chen W,Zheng R,Zeng H,et al.Epidemiology of lung cancer in China[J].Thorac Cancer,2015,6(2):209-215.
[5]Zhou C.Lung cancer molecular epidemiology in China:Recent trends[J].Transl Lung Cancer Res,2014,3(5):270-279.
[6]Chen W,Zheng R,Zeng H,et al.Geographic distribution and epidemiology of lung cancer during 2011 in Zhejiang province of China[J].Asian Pac J Cancer Prev,2014,15(13):5299-5303.
[7]Blakely CM,Pazarentzos E,Olivas V,et al.NF-kappaB-activating complex engaged in response to EGFR oncogene inhibition drives tumor cell survival and residual disease in lung cancer[J].Cell Rep,2015,11(1):98-110.
[8]Miao F,Cai YP,Zhang YX,et al.Risk prediction of one-year mortality in patients with cardiac arrhythmias using random survival forest[J].Comput Math Methods Med,2015,2015:303250.
[9]Marino SR,Lin S,Maiers,et al.Identification by random forest method of HLA class I amino acid substitutions associated with lower survival at day 100 in unrelated donor hematopoietic cell transplantation[J].Bone Marrow Transplant,2012,47(2):217-226.
[10]Buhnemann C,Li S,Yu H,et al.Quantification of the heterogeneity of prognostic cellular biomarkers in ewing sarcoma using automated image and random survival forest analysis[J].PLoS One,2014,9(9):e107105.
[11]Choi JY,Kim SK,Lee WH,et al.A survival prediction model of rats in hemorrhagic shock using the random forest classifier[J].Conf Proc IEEE Eng Med Biol Soc,2012,2012:5570-5573.
[12]Shim JH,Jun MJ,Han S,et al.Prognostic nomograms for prediction of recurrence and survival after curative liver resection for hepatocellular carcinoma[J].Ann Surg,2015,261(5):939-946.
[13]Biesbroek S,van der A DL,Brosens MC,et al.Identifying cardiovascular risk factor-related dietary patterns with reduced rank regression and random forest in the EPIC-NL cohort[J].Am J Clin Nutr,2015,102(1):146-154.
[14]Kasinski AL,Kelnar K,Stahlhut C,et al.A combinatorial microRNA therapeutics approach to suppressing non-small cell lung cancer[J].Oncogene,2015,34(27):3547-3555.
[15]Seimiya Masanori,Tomonaga Takeshi,Matsushita Kazuyuki,et al.Identification of novel immunohistochemical tumor markers for primary hepatocellular carcinoma;clathrin heavy chain and formiminotransferase cyclodeaminase[J].Hepatology,2008,48(2):519-530.
[16]Kawaguchi M,Hosotani R,Kogire,et al.Auto-induction and growth stimulatory effect of betacellulin in human pancreatic cancer cells[J].Int J Oncol,2000,16(1):37-41.
[17]Yamamoto T,Akisue T,Marui T,et al.Expression of betacellulin,heparin-binding epidermal growth factor and epiregulin in human malignant fibrous histiocytoma[J].Anticancer Res,2004,24(3b):2007-2010.
[18]Moon WS,Park HS,Yu KH,et al.Expression of betacellulin and epidermal growth factor receptor in hepatocellular carcinoma:Implications for angiogenesis[J].Hum Pathol,2006,37(10):1324-1332.
[19]Watanabe T,Shintani A,Nakata M,et al.Recombinant human betacellulin:Molecular structure,biological activities,and receptor interaction[J].J Biol Chem,1994,269(13):9966-9973.
[20]O-charoenrat P,Modjtahedi H,Rhys-Evans P,et al.Epidermal growth factor-like ligands differentially up-regulate matrix metalloproteinase 9 in head and neck squamous carcinoma cells[J].Cancer Res,2000,60(4):1121-1128.
[21]Sakon M,Kishimoto S,Aoki T,et al.A patient with HCC successfully treated by ethanol injection therapy with etoposide[J].Gan To Kagaku Ryoho,1996,23(11):1585-1587.
[22]Lu Z,Hu X,Li Y,et al.Human papillomavirus 16 E6 oncoprotein interferences with insulin signaling pathway by binding to tuberin[J].J Biol Chem,2004,279(34):35664-35670.
[23]Szabo A,Perou CM,Karaca M,et al.Statistical modeling for selecting housekeeper genes[J].Genome Biol,2004,5(8):R59.
[24]Kokkinakis DM,Liu X,Chada S,et al.Modulation of gene expression in human central nervous system tumors under methionine deprivation-induced stress[J].Cancer Res,2004,64(20):7513-7525.
[25]Bassi DE,Mahloogi H,Klein-Szanto AJ.The proprotein convertases furin and PACE4 play a significant role in tumor progression[J].Mol Carcinog,2000,28(2):63-69.
[26]Cole KA,Chuaqui RF,Katz K,et al.cDNA sequencing and analysis of POV1(PB39):A novel gene up-regulated in prostate cancer[J].Genomics,1998,51(2):282-287.
[27]Paez J Guillermo,Jänne Pasi A,Lee Jeffrey C,et al.EGFR mutations in lung cancer:Correlation with clinical response to gefitinib therapy[J].Science,2004,304(5676):1497-1500.
[28]Johnson Leisa,Mercer Kim,Greenbaum Doron,et al.Somatic activation of the K-ras oncogene causes early onset lung cancer in mice[J].Nature,2001,410(6832):1111-1116.
[29]Denissenko Mikhail F,Pao Annie,Tang Moon-shong,et al.Preferential formation of benzo[a]pyrene adducts at lung cancer mutational hotspots in P53[J].Science,1996,274(5286):430-432.
[30]Yu Zhenhai,Ge Yingying,Xie Lei,et al.Using a yeast two-hybrid system to identify FTCD as a new regulator for HIF-1α in HepG2 cells[J].Cellular signalling,2014,26(7):1560-1566.
[31]Randles L,Anchoori RK,Roden RB,et al.Proteasome Ubiquitin Receptor hRpn13 and its Interacting Deubiquitinating Enzyme Uch37 are Required for Proper Cell Cycle Progression[J].J Biol Chem,2016,M115:694588.
[32]Choi HS,Seol W,Moore DD.A component of the 26S proteasome binds on orphan member of the nuclear hormone receptor superfamily[J].J Steroid Biochem Mol Biol,1996,56(6):23-30.
Application of random survival forests in the analysis of small cell lung cancer prognosis
XIE Ruifei1 WU Bo2
1. Department of Information, Hangzhou Tumor Hospital, Hangzhou 310002, China; 2. Department of Radiotherapy, Taizhou Central Hospital in Zhejiang Province, Taizhou 318000, China
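CLC number: R734
Document code: A
Article ID: 1673-9701(2016)17-0004-05
Funding: Public Welfare Technology Research Social Development Project of the Zhejiang Provincial Department of Science and Technology (2015C33268); Zhejiang Provincial Medical and Health Science and Technology Project (2014KYA181); Hangzhou Health Science and Technology Plan (General) Project, Zhejiang Province (2014A33)
▲ Corresponding author
(Received 2016-04-29)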