[CLC number] R734.2    [Document code] A    [Article ID] 1674-7887(2024)04-0307-06
Development and preliminary verification of lung cancer diagnostic marker genes based on the joint analysis of multiple chips
[Abstract] Objective: To screen, through joint analysis of multiple GEO chips, a group of genes closely related to the occurrence of lung cancer to serve as key marker genes for predicting lung cancer, and to verify them preliminarily. Methods: The lung cancer expression datasets GSE89047, GSE108055 and GSE116959 were downloaded from the GEO database and merged. Batch effects were corrected with ComBat from the R package sva, and differential expression analysis was performed with the R package limma to screen differentially expressed genes in lung cancer. The String database and Cytoscape 3.8.2 were used to construct a protein-protein interaction network of the differentially expressed genes and to identify core genes. ROC analysis was used to verify the predictive value of the lung cancer differentially expressed genes and core genes for the diagnosis of lung cancer. The TIMER database was used to analyze the relationship of GPM6A expression and copy number variation with immune cell infiltration. Results: The multi-chip joint analysis of the GEO datasets GSE89047, GSE108055 and GSE116959 identified 938 genes differentially expressed between lung cancer tissues and normal lung tissues. Ranked by adjusted P value, the TOP10 differentially expressed genes were GPM6A, WNT3A, SLC6A4, TMEM100, TCF21, BTNL9, HSPA12B, LIMS2, VGLL3 and ITLN2. The 10 core genes identified with the String database and Cytoscape 3.8.2 were CCNA2, CCNB1, CENPE, FOXM1, ITGAM, KIF11, KIF20A, KIF23, KIF2C and MMP9. ROC analysis showed an AUC (95% CI) of 0.948 (0.874-0.986) for GPM6A, 0.961 (0.886-0.992) for the TOP10 differentially expressed genes and 0.830 (0.722-0.895) for the 10 core genes, indicating that the marker genes selected in this study have good predictive ability for lung cancer. TIMER analysis showed that in both lung adenocarcinoma and lung squamous cell carcinoma, GPM6A expression correlated most strongly with macrophage infiltration (lung adenocarcinoma: r=0.347, P<0.001; lung squamous cell carcinoma: r=0.425, P<0.001). GPM6A copy number variation was associated with the infiltration of B cells, CD4+ T cells, macrophages, neutrophils and dendritic cells in lung adenocarcinoma (all P<0.05), and with the infiltration of B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils and dendritic cells in lung squamous cell carcinoma (all P<0.05). Conclusion: Through multi-chip joint analysis, this study developed and preliminarily verified marker genes with good predictive ability for lung cancer diagnosis, and found that GPM6A, the marker gene with the most significant differential expression, is closely related to immune cell infiltration.
[Key words] lung cancer; multi-chip combination; prediction; development; verification; immune infiltration
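Lung cancer is a malignant tumor arising from the bronchial mucosa or the glands of the respiratory tract. Worldwide, lung cancer is a major public health problem: it is the second most common cancer and the leading cause of cancer-related death[1]. By histopathological type, lung cancer is divided into non-small cell lung cancer and small cell lung cancer, the latter of which has been found to be closely associated with smoking[2]. Non-small cell lung cancer accounts for about 85% of all lung cancer cases and mainly presents with respiratory symptoms and symptoms of local compression. Because effective diagnostic methods are lacking, most lung cancers are not detected until the middle or late stage. Early detection and treatment therefore play an important role in controlling lung cancer mortality.
Among current early screening methods, chest radiography and sputum cytology are economical and practical, but neither is very sensitive or specific. Low-dose computed tomography can detect lesions only a few millimeters in size in the lung and is highly sensitive, but its specificity is poor; it can also impose psychological and financial burdens on patients with benign nodules and may even cause unnecessary physical harm[3]. Targeted therapy acts on key therapeutic targets defined by genomic mutations in lung cancer, yet still benefits only a minority of patients[4]. Existing studies[5] indicate that genetic factors such as genetic polymorphisms and high-penetrance genes play an important role in individual susceptibility to lung cancer. New cancer biomarkers are therefore needed to diagnose, predict and treat lung cancer as early as possible.
This study aimed to develop and preliminarily verify, through multi-chip joint analysis, marker genes with good predictive ability for lung cancer diagnosis, and to analyze the relationship between these marker genes and immune cell infiltration, in order to provide reliable biomarkers for the early diagnosis of lung cancer.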
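1 Materials and methods
1.1 Analysis and screening of differentially expressed genes between lung cancer and normal lung tissue  The gene expression files and annotation files of the lung cancer expression datasets GSE89047, GSE108055 and GSE116959 were selected and downloaded from the GEO database of the US National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/geo/). The three datasets were merged, and ComBat from the R package sva was used to correct batch effects, that is, systematic differences between data generated on different platforms, on the same platform at different times, from the same sample with different reagents, and from the same sample at different time points. Differential expression analysis was performed with the R package limma, and genes with FDR (adj.P.value) < 0.05 and |logFC| > 1 were selected as differentially expressed between lung cancer and normal lung tissue. Volcano plots and heat maps of gene expression were drawn in R.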
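As an illustration of the screening thresholds above, the following minimal Python sketch applies a Benjamini-Hochberg adjustment (the default adjustment behind limma's adjusted P values) and then the FDR < 0.05 and |logFC| > 1 filters. It assumes per-gene log2 fold changes and raw P values have already been computed; in the study these come from limma in R, whose moderated t statistics are not reproduced here. The function names and toy values are hypothetical.

```python
# Minimal sketch (hypothetical helper names): Benjamini-Hochberg FDR
# adjustment followed by the |logFC| > 1 and FDR < 0.05 filters.
# Assumes log2 fold changes and raw P values were computed beforehand
# (limma's moderated t-test is NOT reimplemented here).
from typing import Dict, List, Tuple


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes: Dict[str, Tuple[float, float]],
                fdr_cutoff: float = 0.05,
                lfc_cutoff: float = 1.0) -> Dict[str, str]:
    """Map gene -> "up"/"down" for genes passing both thresholds.

    `genes` maps a gene name to (log2 fold change tumor vs normal, raw P value).
    """
    names = list(genes)
    fdr = bh_adjust([genes[name][1] for name in names])
    selected = {}
    for name, q in zip(names, fdr):
        lfc = genes[name][0]
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff:
            selected[name] = "up" if lfc > 0 else "down"
    return selected


# Toy input with hypothetical values, not data from the study.
toy = {
    "GENE_A": (2.5, 0.0001),
    "GENE_B": (-1.8, 0.002),
    "GENE_C": (0.4, 0.0005),   # significant but |logFC| too small
    "GENE_D": (3.0, 0.2),      # large change but not significant
}
print(select_degs(toy))  # {'GENE_A': 'up', 'GENE_B': 'down'}
```

Counting the "up" and "down" labels in such an output is how an up/down split of the differentially expressed genes would be tallied.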
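The differentially expressed genes obtained from the multi-chip joint analysis were further validated in samples from the TCGA database. Box plots were used to show the distribution of gene expression levels, and the Wilcoxon test in R was used to assess the statistical significance of differential expression between tumor and adjacent normal lung tissue.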
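The tumor-versus-normal comparison can be sketched as a two-sided Wilcoxon rank-sum (Mann-Whitney U) test. This minimal Python version uses average ranks for ties and the large-sample normal approximation with a tie-corrected variance and no continuity correction; R's `wilcox.test` instead computes an exact P value for small tie-free samples and applies a continuity correction otherwise, so results may differ slightly. The function names and toy values are hypothetical.

```python
# Minimal sketch: two-sided Wilcoxon rank-sum (Mann-Whitney U) test using the
# normal approximation with a tie-corrected variance (no continuity correction).
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(tumor: Sequence[float], normal: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic of the tumor group, two-sided P value)."""
    n1, n2 = len(tumor), len(normal)
    pooled = list(tumor) + list(normal)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u <= 0:  # all values identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical expression values for one gene (not study data).
tumor = [5.1, 6.3, 7.2, 8.0, 6.9]
normal = [2.0, 3.1, 2.8, 3.5, 2.2]
u, p = rank_sum_test(tumor, normal)
print(u, p)  # u is 25.0 because every tumor value exceeds every normal value
```

The same routine applies to comparing one copy number variation category against the normal category.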
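1.2 Protein-protein interaction (PPI) network analysis and hub gene screening  The PPI network of the differentially expressed genes was analyzed with the String database (https://www.string-db.org/) and visualized in Cytoscape 3.8.2, and the core genes (hub genes) among the differentially expressed genes were identified with the software's cytoHubba module using Degree as the criterion.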
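Ranking by Degree counts, for each gene, how many distinct interaction partners it has in the PPI network, and keeps the highest-ranked genes. The minimal Python sketch below does this for an undirected edge list such as one exported from String. It illustrates the Degree criterion only and is not the cytoHubba code; the alphabetical tie-breaking rule, function name and toy edges are assumptions.

```python
# Minimal sketch of Degree-based hub selection on an undirected PPI edge list.
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def top_hubs_by_degree(edges: Iterable[Tuple[str, str]], k: int = 10) -> List[Tuple[str, int]]:
    """Return the k genes with the most distinct partners as (gene, degree)."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # self-interactions do not add partners
        partners[a].add(b)  # sets also collapse duplicate or reversed edges
        partners[b].add(a)
    ranked = sorted(((gene, len(nbrs)) for gene, nbrs in partners.items()),
                    key=lambda item: (-item[1], item[0]))  # degree desc, then name
    return ranked[:k]


# Hypothetical toy network; ("B", "A") duplicates ("A", "B").
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("D", "E"), ("F", "G"), ("B", "A")]
print(top_hubs_by_degree(edges, k=3))  # [('A', 3), ('B', 2), ('C', 2)]
```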
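1.3 Validation of lung cancer diagnostic marker genes  Samples from the lung cancer patients included in the GEO multi-chip joint analysis datasets (GSE89047, GSE108055, GSE116959) served as the study material. ROC curve analysis was performed on the expression data of the TOP1, TOP10 and Hub10 genes in normal lung tissue and lung cancer tissue to verify the reliability of the genes developed in this study for lung cancer diagnosis.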
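For a single score per sample (one gene's expression, or a combined score for a gene set), an empirical ROC curve and its AUC can be computed as in the minimal Python sketch below. The paper does not state how the TOP10 and Hub10 genes were combined into one score (for example, through a logistic regression model), so that step is omitted; the function names and toy scores are hypothetical. For a gene that is down-regulated in tumor tissue, the negated expression would be passed so that higher scores indicate cancer.

```python
# Minimal sketch: empirical ROC curve and AUC for one diagnostic score.
from typing import List, Sequence, Tuple


def roc_points(tumor: Sequence[float], normal: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by calling 'tumor' when score >= threshold."""
    points = [(0.0, 0.0)]
    for thr in sorted(set(tumor) | set(normal), reverse=True):
        tpr = sum(s >= thr for s in tumor) / len(tumor)
        fpr = sum(s >= thr for s in normal) / len(normal)
        points.append((fpr, tpr))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def pairwise_auc(tumor: Sequence[float], normal: Sequence[float]) -> float:
    """Equivalent AUC: P(tumor score > normal score), ties counted as 1/2."""
    wins = sum(1.0 if t > n else 0.5 if t == n else 0.0
               for t in tumor for n in normal)
    return wins / (len(tumor) * len(normal))


# Hypothetical scores (not study data).
tumor = [0.9, 0.8, 0.7, 0.4]
normal = [0.5, 0.3, 0.2]
print(round(trapezoid_auc(roc_points(tumor, normal)), 4))  # 0.9167
print(round(pairwise_auc(tumor, normal), 4))               # 0.9167
```

The two functions agree because the area under the empirical ROC curve equals the Mann-Whitney probability that a randomly chosen tumor sample scores higher than a randomly chosen normal sample.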
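1.4 Relationship between lung cancer diagnostic marker genes and immune infiltration  The TIMER database was used to evaluate the relationship of GPM6A expression and copy number variation with immune cell infiltration in lung cancer. The correlation between GPM6A expression and the level of immune infiltration in lung cancer was analyzed with Spearman's method. The relationship between copy number variation and immune cell infiltration was displayed with box plots, and each copy number variation category was compared with the normal level using a two-sided Wilcoxon rank-sum test.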
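Spearman's coefficient is Pearson's correlation computed on ranks, which makes it less sensitive to the skewed distributions typical of expression and infiltration estimates. A minimal Python sketch follows. The function names and toy values are hypothetical, and because TIMER may report a purity-adjusted (partial) correlation, this plain coefficient is not expected to reproduce its output exactly.

```python
# Minimal sketch: Spearman's rank correlation (Pearson correlation of ranks).
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho between two equally long, non-constant samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical GPM6A expression vs. macrophage score for five samples.
expr = [1.0, 2.0, 3.0, 4.0, 5.0]
macro = [0.10, 0.30, 0.20, 0.50, 0.40]
print(round(spearman_rho(expr, macro), 3))  # 0.8
```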
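2 Results
2.1 Differentially expressed genes between lung cancer and normal lung tissue  The multi-chip joint analysis based on the GEO database included 119 lung cancer tissues and 29 normal lung tissues, with lung adenocarcinoma and small cell carcinoma among the lung cancer subtypes. A total of 938 differentially expressed genes were obtained: 674 upregulated and 264 downregulated in lung cancer tissue (Figure 1). Ranked by FDR (adj.P.value), the TOP1 differentially expressed gene was GPM6A, and the TOP10 were GPM6A, WNT3A, SLC6A4, TMEM100, TCF21, BTNL9, HSPA12B, LIMS2, VGLL3 and ITLN2. The TOP100 differentially expressed genes are shown in Figure 2. Analysis of the TCGA database showed that GPM6A was significantly differentially expressed in multiple cancers, including both lung adenocarcinoma and lung squamous cell carcinoma (P<0.001) (Figure 3).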
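2.2 Construction of the PPI network and screening of hub genes  A PPI network of the 938 differentially expressed genes was constructed with the String database and Cytoscape. Ranking by Degree with the cytoHubba module in Cytoscape identified 10 hub genes: CCNA2, CCNB1, CENPE, FOXM1, ITGAM, KIF11, KIF20A, KIF23, KIF2C and MMP9 (Figure 4).
2.3 Validation of lung cancer diagnostic marker genes  ROC curves were drawn for predicting lung cancer from the expression of GPM6A, the TOP10 genes and the Hub10 genes (Figure 5). The AUC (95% CI) was 0.948 (0.874-0.986) for GPM6A, 0.961 (0.886-0.992) for the TOP10 genes and 0.830 (0.722-0.895) for the Hub10 genes, indicating that the marker genes selected in this study have good predictive ability for lung cancer.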
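The 95% CIs above are reported without naming the method used to obtain them (common choices include the DeLong method and bootstrap resampling). As one possible approach, the minimal Python sketch below computes a stratified percentile bootstrap interval, resampling tumor and normal samples separately so that group sizes stay fixed. It is not necessarily how the reported intervals were derived, and the function names, seed, number of resamples and toy scores are assumptions.

```python
# Minimal sketch: stratified percentile-bootstrap 95% CI for an AUC.
import random
from typing import Sequence, Tuple


def pairwise_auc(tumor: Sequence[float], normal: Sequence[float]) -> float:
    """P(tumor score > normal score), ties counted as 1/2."""
    wins = sum(1.0 if t > n else 0.5 if t == n else 0.0
               for t in tumor for n in normal)
    return wins / (len(tumor) * len(normal))


def bootstrap_auc_ci(tumor: Sequence[float], normal: Sequence[float],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float, float]:
    """Return (AUC, lower, upper) using the percentile bootstrap."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    stats = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its size.
        tumor_star = [rng.choice(tumor) for _ in tumor]
        normal_star = [rng.choice(normal) for _ in normal]
        stats.append(pairwise_auc(tumor_star, normal_star))
    stats.sort()
    lower = stats[round(n_boot * alpha / 2)]
    upper = stats[round(n_boot * (1 - alpha / 2)) - 1]
    return pairwise_auc(tumor, normal), lower, upper


# Hypothetical scores (not study data).
tumor = [0.9, 0.8, 0.7, 0.4, 0.85, 0.6]
normal = [0.5, 0.3, 0.2, 0.65]
auc, lo, hi = bootstrap_auc_ci(tumor, normal)
print(f"AUC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

With only 29 normal samples, the resampled AUCs can pile up near 1.0, which is one reason interval methods may disagree for high AUCs.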
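2.4 Relationship between GPM6A and immune cell infiltration  TIMER analysis showed that in both lung adenocarcinoma and lung squamous cell carcinoma, GPM6A expression correlated most strongly with macrophage infiltration (lung adenocarcinoma: r=0.347, P<0.001; lung squamous cell carcinoma: r=0.425, P<0.001) (Figure 6). In lung adenocarcinoma, GPM6A copy number variation was associated with the infiltration of B cells, CD4+ T cells, macrophages, neutrophils and dendritic cells (P<0.05); in lung squamous cell carcinoma, it was associated with the infiltration of B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils and dendritic cells (P<0.05) (Figure 7).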
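3 Discussion
Advanced lung cancer often responds poorly to treatment and is associated with short postoperative survival because of liver metastasis and other factors[6-8]. Early diagnosis of lung cancer is therefore particularly important. Through multi-chip joint analysis, this study identified lung cancer-related genes with marked differential expression, such as GPM6A, WNT3A and SLC6A4. GPM6A plays a role in the development of human B-cell malignancies and may act as a potential oncogene[9]. WNT3A, a major component of the Wnt signaling pathway, can shift cancer cells toward an invasive and metastatic phenotype, and it promotes metastatic progression by promoting epithelial-mesenchymal transition and regulating the expression of MMPs and other factors involved in extracellular matrix regulation[10-11]. Single nucleotide polymorphisms in SLC6A4 are closely associated with cancer susceptibility and severity[12-14].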
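Hub gene analysis of the genes differentially expressed between lung cancer and normal lung tissue showed that ITGAM, KIF11, MMP9 and other genes had a high degree, that is, many interaction partners. ITGAM is one of the genes associated with the development of inflammatory responses; its protein product is responsible for type II interferon receptor function and the regulation of inflammatory mediator secretion[15]. Variation in the ITGAM regulatory region is a novel predictor of malnutrition in head and neck cancer patients receiving intensity-modulated radiotherapy[16]. As a mitosis-associated gene, KIF11 affects the prognosis of patients with non-small cell lung cancer[17]. Genetic variation in MMP9 and changes in its activity can affect lung cancer susceptibility and prognosis[18-19].
ROC curves for predicting lung cancer from the expression of GPM6A, the TOP10 differentially expressed genes and the Hub10 genes showed that GPM6A and the TOP10 genes performed relatively better as diagnostic marker genes, with AUC values > 0.900 for both, indicating high diagnostic accuracy. The AUC of the Hub10 genes lay between 0.800 and 0.900, indicating moderate diagnostic accuracy.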
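To further explore why the lung cancer differentially expressed genes achieve high diagnostic accuracy, this study analyzed the relationship between GPM6A, the most significant differentially expressed gene, and immune cell infiltration in lung cancer. The results showed that both GPM6A expression and GPM6A copy number variation are associated with the level of immune cell infiltration in lung cancer. Previous studies[20] have shown that the development and spread of lung cancer depend not only on the properties of tumor cells but also on their interactions with the immune system. Recently, immunotherapy for lung cancer has also shown significant survival benefits[21].
In summary, through multi-chip joint analysis, this study developed and preliminarily verified marker genes with good predictive ability for lung cancer diagnosis, and found that GPM6A, the marker gene with the most significant differential expression, is closely related to immune cell infiltration.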
[References]
[1] SIEGEL R L, MILLER K D, FUCHS H E, et al. Cancer statistics, 2021[J]. CA Cancer J Clin, 2021, 71(1):7-33.
[2] WALTER J E, HEUVELMANS M A, DE BOCK G H, et al. Relationship between the number of new nodules and lung cancer probability in incidence screening rounds of CT lung cancer screening: The NELSON study[J]. Lung Cancer, 2018, 125:103-108.
[3] KLUTSTEIN M, NEJMAN D, GREENFIELD R, et al. DNA methylation in cancer and aging[J]. Cancer Res, 2016, 76(12):3446-3450.
[4] SPENCER D H, LEY T J. Sequencing of tumor DNA to guide cancer risk assessment and therapy[J]. JAMA, 2018, 319(14):1497.
[5] MALHOTRA J, MALVEZZI M, NEGRI E, et al. Risk factors for lung cancer worldwide[J]. Eur Respir J, 2016, 48(3):889-902.
[6] ZHU R F, LIU Z H, JIAO R, et al. Updates on the pathogenesis of advanced lung cancer-induced cachexia[J]. Thorac Cancer, 2019, 10(1):8-16.
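[7] WU H S, ZOU D P, LI J C. Factors affecting the prognosis of brain metastasis in non-small cell lung cancer[J]. Journal of Nantong University (Medical Sciences), 2018, 38(1):58-62. (in Chinese)
[8] ZHANG G W, CHENG R R, ZHANG G J, et al. Difference in the efficacy of nivolumab in advanced non-small cell lung cancer with or without liver metastasis: a retrospective cohort study[J]. Journal of Modern Oncology, 2021, 29(15):2615-2619. (in Chinese)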
[9] YOSHIMURA K, HANAOKA T, OHNAMI S, et al. Allele frequencies of single nucleotide polymorphisms (SNPs) in 40 candidate genes for gene-environment studies on cancer: data from population-based Japanese random samples[J]. J Hum Genet, 2003, 48(12):654-658.
[10] SIMMONS C P, KOINIS F, FALLON M T, et al. Prognosis in advanced lung cancer – a prospective study examining key clinicopathological factors[J]. Lung Cancer, 2015, 88(3):304-309.
[11] CHARFI C, EDOUARD E, RASSART E. Identification of GPM6A and GPM6B as potential new human lymphoid leukemia-associated oncogenes[J]. Cell Oncol, 2014, 37(3):179-191.
[12] ZHANG Q, BAI X L, CHEN W, et al. Wnt/β-catenin signaling enhances hypoxia-induced epithelial-mesenchymal transition in hepatocellular carcinoma via crosstalk with hif-1α signaling[J]. Carcinogenesis, 2013, 34(5):962-973.
[13] SAVAS S, HYDE A, STUCKLESS S N, et al. Serotonin transporter gene (SLC6A4) variations are associated with poor survival in colorectal cancer patients[J]. PLoS One, 2012, 7(7):e38953.
[14] LI C Y, SONG G R, ZHANG S Y, et al. Wnt3a increases the metastatic potential of non-small cell lung cancer cells in vitro in part via its upregulation of Notch3[J]. Oncol Rep, 2015, 33(3):1207-1214.
[15] CRISPÍN J C, HEDRICH C M, TSOKOS G C. Gene-function studies in systemic lupus erythematosus[J]. Nat Rev Rheumatol, 2013, 9:476-484.
[16] MAZUREK M, MLAK R, HOMA-MLAK I, et al. Polymorphism of the regulatory region of the ITGAM gene (-323G>A) as a novel predictor of a poor nutritional status in head and neck cancer patients subjected to intensity-modulated radiation therapy[J]. J Clin Med, 2020, 9(12):4041.
[17] SCHNEIDER M A, CHRISTOPOULOS P, MULEY T, et al. AURKA, DLGAP5, TPX2, KIF11 and CKAP5: five specific mitosis-associated genes correlate with poor prognosis for non-small cell lung cancer patients[J]. Int J Oncol, 2017, 50(2):365-372.
[18] LI W, JIA M X, WANG J H, et al. Association of MMP9-1562C/T and MMP13-77A/G polymorphisms with non-small cell lung cancer in southern Chinese population[J]. Biomolecules, 2019, 9(3):E107.
[19] DONG D D, ZHOU H, LI G. ADAM15 targets MMP9 activity to promote lung cancer cell invasion[J]. Oncol Rep, 2015, 34(5):2451-2460.
[20] ROSENTHAL R, CADIEUX E L, SALGADO R, et al. Neoantigen-directed immune escape in lung cancer evolution[J]. Nature, 2019, 567(7749):479-485.
[21] BRAHMER J R. Harnessing the immune system for the treatment of non-small-cell lung cancer[J]. J Clin Oncol, 2013, 31(8):1021-1028.