ZANG Weidong
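ABSTRACT Objective: To analyze breast cancer mammosphere samples, which have self-renewal capacity, by bioinformatics and to identify key genes related to self-renewal, providing a theoretical basis for breast cancer treatment. Methods: mRNA microarray expression data from primary breast cancer samples (breast cancer, BC) and breast cancer mammosphere samples (MS) were first compared to obtain differentially expressed genes (DEGs). A protein-protein interaction (PPI) network of the DEGs was then constructed, a highly connected subnetwork was screened from it, and functional enrichment analysis was finally performed on the subnetwork. Results: A total of 1 083 DEGs were found between the MS and BC groups. From the PPI network built on these DEGs, a highly connected subnetwork containing 49 DEGs was obtained, in which TSPO, IGF1, FN1 and CDK1 were the core genes. Conclusion: These core genes may be the genes associated with self-renewal in breast cancer cells.
KEY WORDS breast cancer; mammosphere; self-renewal; differentially expressed genes; protein-protein interaction network
CLC number: R737.9 Document code: A Article ID: 1006-1533(2018)01-0076-05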
Analysis of critical genes related to self-renewal in the mammosphere model of breast cancer by bioinformatics
ZANG Weidong*
(Shanghai Fengheng Biotechnology Co., Ltd., Shanghai 200240, China)
ABSTRACT Objective: To identify key genes related to self-renewal in breast cancer by bioinformatics analysis, which may provide a theoretical basis for the treatment of breast cancer. Methods: mRNA microarray data from breast cancer (BC) samples and mammosphere samples (MS) were compared to identify differentially expressed genes (DEGs). A protein-protein interaction (PPI) network of the DEGs was constructed, a highly connected subnetwork was screened out of it, and functional enrichment analysis was then performed on the subnetwork. Results: There were 1 083 DEGs between the MS and BC samples. A highly connected subnetwork containing 49 DEGs was obtained from the PPI network built on these DEGs. Notably, TSPO, IGF1, FN1 and CDK1 were the core genes of the subnetwork. Conclusion: These core genes may be associated with self-renewal in breast cancer cells.
KEY WORDS breast cancer; mammosphere; self-renewal; differentially expressed genes; protein-protein interaction network
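Breast cancer (BC) is a malignant tumor arising in the glandular epithelium of the breast. It occurs mostly in women, with men accounting for only 1% of cases, and there are about 1 million new cases and 400 000 deaths worldwide each year[1]. Because the breast is not an organ essential to sustaining life, cancer confined to the breast is not in itself fatal; once the cancer cells metastasize, however, the disease becomes life-threatening. Some subpopulations of breast cancer cells (such as CD44+/CD24-/low cells) can resist treatment and lead to recurrence[2]. CD44+/CD24-/low cells can be isolated from breast cancer tissue and cultured in vitro as mammosphere samples (MS), which have self-renewal capacity[3]. In addition, MS culture provides a highly suitable model for further characterizing the tumor-initiating subpopulation of BC cells[4]. Creighton et al.[5] analyzed microarray expression profiles of primary breast cancer samples and breast cancer mammosphere samples and found that the CD44+/CD24-/low cells remaining after conventional therapy showed a high-expression signature in the MS samples. Creighton et al.[5] suggested that target proteins associated with epithelial-mesenchymal transition (EMT) might be used to treat these cancer cells and suppress BC recurrence, but their study said little about which target genes or proteins could suppress recurrence. In this paper, we analyze the microarray data of Creighton et al. by bioinformatics in an attempt to identify key genes associated with the treatment resistance of cancer cells and with recurrence, providing a theoretical basis for further breast cancer research.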
1 Materials and methods
1.1 Acquisition of expression profile data
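The microarray expression dataset GSE7515[5] was downloaded from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/). The dataset contains 26 samples: 11 primary breast cancer samples and 15 breast cancer mammosphere samples. The arrays were run on the Affymetrix Human Genome U133 Plus 2.0 Array platform. The mRNA expression data of all samples were preprocessed with the GCRMA method[6] in the Affy software package. Probe IDs were then converted to gene symbols and processed to give an expression matrix indexed by gene symbol, covering 19 851 gene symbols in total.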
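The probe-to-gene conversion step can be illustrated with a minimal Python sketch. The text does not say how several probes mapping to the same gene symbol were combined, so averaging them is an assumption made here for illustration only. The function name `collapse_probes_to_genes`, the probe IDs, and all expression values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes_to_genes(probe_expr, probe_to_symbol):
    """Collapse a probe-level expression table to one row per gene symbol.

    probe_expr: dict of probe ID -> list of expression values, one per sample.
    probe_to_symbol: dict of probe ID -> gene symbol (hypothetical annotation).
    Probes without a gene symbol are dropped. Probes sharing a symbol are
    averaged sample by sample (an assumption; the paper does not state the rule).
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_expr.items():
        symbol = probe_to_symbol.get(probe_id)
        if symbol:  # skip probes that have no annotation
            grouped[symbol].append(values)
    # zip(*rows) walks the rows column by column, i.e. one sample at a time.
    return {
        symbol: [mean(column) for column in zip(*rows)]
        for symbol, rows in grouped.items()
    }


# Toy input: four made-up probes measured in two samples.
probe_expr = {
    "p1": [2.0, 4.0],
    "p2": [4.0, 6.0],
    "p3": [1.0, 1.5],
    "p4": [9.0, 9.0],  # no annotation, so it is dropped
}
probe_to_symbol = {"p1": "CDK1", "p2": "CDK1", "p3": "FN1"}

gene_expr = collapse_probes_to_genes(probe_expr, probe_to_symbol)
```

Other collapsing rules, such as keeping the probe with the highest mean or the highest variance, are also common and would only change the body of the final dictionary comprehension.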