        Whole-genome sequencing and its application in the research and diagnosis of genetic diseases

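        2014-05-10 01:25:04 Shao Qianzhi, Jiang Yi, Wu Jinyu
        Yi Chuan (Hereditas, Beijing), 2014, No. 11
        Keywords: variation; genome; cancer

        Shao Qianzhi, Jiang Yi, Wu Jinyu

        1. Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou 325000;

        2. Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101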

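        With the continued development of high-throughput sequencing (next-generation sequencing, NGS), and in particular as sequencing costs fall year by year and data analysis pipelines mature, whole-genome sequencing (WGS) has become an important tool in disease research and clinical diagnosis[1,2]. Researchers have used WGS to detect causal mutations and causal genes in cancer, Mendelian disorders and complex diseases, with unprecedented results[3]. This article reviews WGS data analysis and its application in disease research and clinical diagnosis.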

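        1 Background of whole-genome sequencing

        In recent years, as high-throughput sequencing has developed and matured, WGS has been applied in many fields, and its use in research on genetic diseases has drawn particular attention[1,2,4~6]. Of the human diseases currently known, more than 4000 are associated with genetic abnormalities[7]. WGS can detect, at the whole-genome level, the full range of variation associated with human disease, including single-nucleotide variants (SNVs), insertions and deletions (InDels), copy number variants (CNVs) and structural variants (SVs). This makes it possible to find causal mutations, develop effective drugs and guide clinical medication.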

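        High cost has long been a major obstacle to WGS. With the arrival of the HiSeq X Ten, however, the cost has dropped sharply, to about 1000 US dollars per genome. The HiSeq X Ten, developed by Illumina, is the most powerful sequencing platform to date and is designed for large-scale human genome sequencing. It consists of 10 ultra-high-throughput sequencers, each with 12 times the output of a HiSeq 2000 and a yield of up to 600 GB of data per day; over a full year the system can complete about 18000 human whole genomes. Slow data analysis is the other difficulty: limited by data volume and by the analysis software, analyzing one whole genome takes more than 1 d. In July 2014, however, a Dutch bioinformatics company announced that its Genalice Map software could align a human whole genome in 1 min, and that the software would be tested on a further 10000 human whole genomes in a future collaboration. In addition, DRAGEN (Dynamic Read Analysis for Genomics), a bio-IT processor developed by Edico Genome and described as the world's first application-specific integrated circuit for NGS bioinformatics, can cut the time needed to analyze a whole human genome from 24 h to 18 min while preserving accuracy. The remaining analysis steps can be expected to run within minutes before long.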

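        Although WGS still faces high cost and slow data analysis, its ability to detect structural variants as well as SNVs and InDels in non-coding regions has already led to its use in China in the study of a range of genetic diseases. As early as 2003, Zhao Guoping's group used WGS to study leptospirosis[8]. Since then, WGS has been applied to the pathogenic mechanisms of liver cancer[9], bladder cancer[10], pancreatic cancer[11], peritoneal mesothelioma[12], autism[13] and other diseases. At the end of 2012, Xie Xiaoliang's group used its newly invented MALBAC amplification technique to perform single-cell whole-genome DNA amplification on 99 sperm cells from an Asian man, achieving high-coverage WGS of single sperm cells for the first time[14]. The same group was also the first to combine MALBAC genome amplification with high-throughput sequencing to screen an in vitro fertilization embryo for a monogenic disease; the baby was born at Peking University Third Hospital on 19 September 2014, which the authors take as evidence that preimplantation genetic diagnosis in China has reached a world-leading level. WGS has thus become the focus of current sequencing work, and the era of WGS has arrived.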

        2 Data analysis workflow for whole-genome sequencing

        The WGS data analysis workflow comprises quality control, mapping, variant calling and annotation. Many tools have been developed for different data requirements (Table 1); the most widely used pipeline at present is "BWA + GATK + ANNOVAR" (Supplementary Figure 1).

        2.1 Quality control

        Quality control is the process of removing adapters and filtering low-quality reads from the raw sequencing data to obtain clean data. It removes poorly sequenced reads and so improves the accuracy of downstream analysis. This step typically filters out 5%~15% of reads as low quality.
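The quality control step can be illustrated with a minimal sketch. It assumes Phred+33 quality encoding, exact adapter matching and a mean-quality cutoff of 20; these thresholds, the function names and the toy reads are illustrative assumptions, not the behavior of any particular QC tool listed in Table 1.

```python
from statistics import mean

def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]

def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if present."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]

def clean_reads(reads, adapter, min_mean_q=20, min_len=5):
    """Yield (name, seq, qual) for reads that survive trimming and filtering."""
    for name, seq, qual in reads:
        seq, qual = trim_adapter(seq, qual, adapter)
        if len(seq) < min_len:
            continue  # too short after adapter removal
        if mean(phred_scores(qual)) < min_mean_q:
            continue  # low average base quality
        yield name, seq, qual

# Toy reads (hypothetical): r2 is low quality, r3 carries an adapter.
raw = [
    ("r1", "ACGTACGTAC", "IIIIIIIIII"),
    ("r2", "ACGTACGTAC", "##########"),
    ("r3", "ACGTACAGATCGG", "IIIIIIIIIIIII"),
]
clean = list(clean_reads(raw, adapter="AGATCGG"))
for name, seq, qual in clean:
    print(name, seq)
```

Real QC tools additionally handle partial adapter matches, per-base trimming and paired-end consistency, which this sketch leaves out.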

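        2.2 Mapping to the reference genome

        The clean data are aligned to the reference genome to obtain the mapping position, mapping quality and related information for each read. The most widely used aligner is BWA (Burrows-Wheeler Aligner)[18], which aligns short reads to the reference genome quickly and accurately and writes its output in the standard SAM format. In 2013 BWA gained a new algorithm, BWA-MEM, which aligns sequences from 70 bp to 1 Mb and is both more accurate and faster than the earlier algorithms[54].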
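As a small illustration of the information carried by each alignment, the sketch below parses the mandatory leading fields of one SAM record (read name, flag, reference name, 1-based position, mapping quality, CIGAR). The example record is invented for illustration, and the helper name `parse_sam_line` is hypothetical.

```python
def parse_sam_line(line):
    """Extract position and quality information from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    return {
        "read": fields[0],
        "chrom": fields[2],
        "pos": int(fields[3]),          # 1-based leftmost mapping position
        "mapq": int(fields[4]),         # mapping quality
        "cigar": fields[5],
        "unmapped": bool(flag & 0x4),   # flag bit 0x4: read is unmapped
        "reverse": bool(flag & 0x10),   # flag bit 0x10: reverse strand
    }

# Invented example record: a 10 bp read on the reverse strand of chr1.
example = "read1\t16\tchr1\t10468\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"
record = parse_sam_line(example)
print(record)
```

In practice a library such as pysam or the SAMtools command line would be used instead of hand parsing; the sketch only shows which fields hold the mapping position and mapping quality mentioned above.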

        Table 1 Software commonly used for whole-genome data analysis

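        2.3 Variant calling

        The aligned SAM file is usually converted to BAM format and de-duplicated (remove duplication) before variants are called. The mainstream tool for detecting SNVs and InDels is the Genome Analysis Toolkit (GATK, http://www.broadinstitute.org/gatk/). GATK is highly accurate: it applies two correction steps to the BAM file (commonly local realignment around InDels and base quality score recalibration) to improve calling accuracy, but it is relatively slow. In March 2014 the Broad Institute announced that the latest GATK (version 3.1) would call variants 3~5 times faster than before, cutting whole-genome analysis time from 3 d to 1 d.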
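The idea behind de-duplication can be sketched as follows: reads that map to the same chromosome, position and strand are treated as copies of one original fragment, and only one is kept. Keeping the read with the highest mapping quality is a simplifying assumption made here; production tools use more elaborate criteria, and the record layout is hypothetical.

```python
def remove_duplicates(alignments):
    """Keep one alignment per (chrom, pos, strand), preferring higher MAPQ."""
    best = {}
    for aln in alignments:
        key = (aln["chrom"], aln["pos"], aln["strand"])
        if key not in best or aln["mapq"] > best[key]["mapq"]:
            best[key] = aln
    return sorted(best.values(),
                  key=lambda a: (a["chrom"], a["pos"], a["strand"]))

# Toy alignments: "a" and "b" share position and strand, so one is a duplicate.
alignments = [
    {"read": "a", "chrom": "chr1", "pos": 100, "strand": "+", "mapq": 60},
    {"read": "b", "chrom": "chr1", "pos": 100, "strand": "+", "mapq": 30},
    {"read": "c", "chrom": "chr1", "pos": 100, "strand": "-", "mapq": 50},
    {"read": "d", "chrom": "chr1", "pos": 250, "strand": "+", "mapq": 60},
]
unique = remove_duplicates(alignments)
print([a["read"] for a in unique])
```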

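        Because WGS offers good uniformity and coverage, it has many advantages for CNV detection. Numerous CNV detection methods and tools have been published, and they fall into two broad categories: (1) methods based on differences in read depth, which are affected by local non-uniformity of sequencing and tend to have a high false-positive rate; (2) methods based on the distance between paired reads, which can locate breakpoints relatively accurately. If the distance between the two reads of a pair clearly exceeds the normal size, a CNV can be inferred between them. In addition, some reads that fail to align can be split into two segments that map to different positions on the chromosome, and a CNV may also lie between those two positions. SVs in the broad sense include CNVs as well as inversions, translocations and other classes, so SV detection is more complex than CNV detection and often requires several tools used together to find candidate SVs accurately. Both CNVs and SVs must be confirmed by Sanger sequencing of the breakpoints; where the breakpoints cannot be determined, validation by qPCR is required.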
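The two categories of CNV detection described above can be sketched in a few lines. The expected insert size (350 bp), its standard deviation (30 bp), the log2 ratio cutoff (0.5) and all input values are assumptions chosen for illustration; published tools use more careful statistical models.

```python
import math

def discordant_pairs(pairs, expected=350, sd=30, n_sd=3.0):
    """Read-pair approach: flag pairs whose span far exceeds the library size."""
    limit = expected + n_sd * sd
    return [p for p in pairs if (p[2] - p[1]) > limit]

def depth_outlier_windows(sample_depth, control_depth, min_abs_log2=0.5):
    """Read-depth approach: flag windows whose log2 depth ratio deviates from 0."""
    calls = []
    for i, (s, c) in enumerate(zip(sample_depth, control_depth)):
        if s <= 0 or c <= 0:
            continue  # ratio undefined for empty windows
        ratio = math.log2(s / c)
        if ratio >= min_abs_log2:
            calls.append((i, "gain"))
        elif ratio <= -min_abs_log2:
            calls.append((i, "loss"))
    return calls

# Toy read pairs as (chrom, left_pos, right_pos); the third spans 5200 bp.
pairs = [("chr1", 1000, 1340), ("chr1", 5000, 5360), ("chr1", 9000, 14200)]
flagged_pairs = discordant_pairs(pairs)

# Toy mean depths per window for a sample and a matched control.
sample = [30, 31, 15, 14, 30, 46]
control = [30, 30, 30, 30, 30, 30]
flagged_windows = depth_outlier_windows(sample, control)
print(flagged_pairs, flagged_windows)
```

The depth-based function shows why local coverage fluctuations can produce false positives: any window that happens to deviate past the cutoff is reported, whereas the read-pair signal points to a specific pair of coordinates near the breakpoints.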

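        Recently, a growing body of work has shown that de novo mutations play an important role in sporadic disease[55], and a series of causal genes have been identified in this way, particularly in neuropsychiatric disorders[56,57]. WGS of nuclear families (for example, the patient together with the patient's father and mother) has therefore come into wide use. A number of tools have been developed that call variants across several samples at once and select those that appear only in the patient. De novo mutations are usually extremely rare.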
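The trio-based selection described above can be sketched as a simple genotype filter: a site is a de novo candidate when the child carries the alternate allele and both parents are confidently homozygous reference. The data layout, the minimum depth of 10 and the toy sites are assumptions for illustration; dedicated tools model genotype likelihoods rather than hard calls.

```python
HOM_REF = {"0/0", "0|0"}
CARRIER = {"0/1", "1/0", "1/1", "0|1", "1|0", "1|1"}

def de_novo_candidates(child, father, mother, min_depth=10):
    """Return sites where the child carries a variant absent from both parents.

    Each argument maps (chrom, pos, ref, alt) -> {"gt": str, "dp": int}.
    """
    candidates = []
    for site, c in child.items():
        f, m = father.get(site), mother.get(site)
        if f is None or m is None:
            continue  # site not callable in a parent
        if min(c["dp"], f["dp"], m["dp"]) < min_depth:
            continue  # insufficient coverage in at least one family member
        if c["gt"] in CARRIER and f["gt"] in HOM_REF and m["gt"] in HOM_REF:
            candidates.append(site)
    return candidates

s1 = ("chr2", 166000000, "G", "A")
s2 = ("chr7", 55000000, "C", "T")
s3 = ("chr9", 1000000, "T", "C")
child = {s1: {"gt": "0/1", "dp": 35}, s2: {"gt": "0/1", "dp": 30},
         s3: {"gt": "0/1", "dp": 6}}
father = {s1: {"gt": "0/0", "dp": 32}, s2: {"gt": "0/1", "dp": 28},
          s3: {"gt": "0/0", "dp": 30}}
mother = {s1: {"gt": "0/0", "dp": 30}, s2: {"gt": "0/0", "dp": 31},
          s3: {"gt": "0/0", "dp": 29}}
found = de_novo_candidates(child, father, mother)
print(found)
```

Here the second site is inherited from the father and the third is rejected for low depth in the child, so only the first remains.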

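        2.4 Variant annotation and prediction of causal genes

        On average, about 3000000 variants are detected in each whole-genome sample. To select candidate causal variants for later functional validation, the variants must be annotated with tools such as ANNOVAR[37]. On the one hand, databases of known variants (e.g. dbSNP139[58], ESP6500[59] and 1000 Genomes[60]) are used to remove variants that occur at high frequency in those databases, and the remaining variants are annotated by genomic region (e.g. exon, intron, 5′-UTR or 3′-UTR) and by their effect on protein coding (e.g. missense, nonsense or frameshift). On the other hand, disease databases (OMIM[48], MGI[49], COSMIC[50], ClinVar[51], HGMD[52] and others) link some known variants to disease phenotypes, and prediction tools (e.g. SIFT[61], PolyPhen[62], GERP++[63] and LRT[64]) are used to predict the deleteriousness and conservation of these variants, so that the genes and variants responsible for disease can finally be identified.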
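The frequency-and-effect filtering described above can be sketched as a post-annotation step. The sketch assumes variants have already been annotated (for example by ANNOVAR) into simple records; the field names, the 1% frequency cutoff and the toy variants are illustrative assumptions and do not reproduce any tool's actual output format.

```python
DAMAGING_EFFECTS = {"missense", "nonsense", "frameshift"}

def max_population_af(variant):
    """Highest allele frequency across databases; absent entries count as 0."""
    freqs = [f for f in variant["pop_af"].values() if f is not None]
    return max(freqs, default=0.0)

def candidate_variants(variants, max_af=0.01):
    """Keep rare exonic variants predicted to change the protein."""
    return [
        v for v in variants
        if max_population_af(v) < max_af
        and v["region"] == "exonic"
        and v["effect"] in DAMAGING_EFFECTS
    ]

# Toy annotated variants (hypothetical identifiers and frequencies).
variants = [
    {"id": "v1", "region": "exonic", "effect": "missense",
     "pop_af": {"1000g": None, "esp6500": 0.0002}},
    {"id": "v2", "region": "exonic", "effect": "missense",
     "pop_af": {"1000g": 0.25, "esp6500": 0.22}},
    {"id": "v3", "region": "intronic", "effect": None,
     "pop_af": {"1000g": None, "esp6500": None}},
    {"id": "v4", "region": "exonic", "effect": "synonymous",
     "pop_af": {"1000g": 0.001, "esp6500": None}},
    {"id": "v5", "region": "exonic", "effect": "frameshift",
     "pop_af": {"1000g": None, "esp6500": None}},
]
kept = candidate_variants(variants)
print([v["id"] for v in kept])
```

This coding-only filter deliberately discards non-coding variants; a whole-genome analysis would route those to a separate non-coding scoring step instead of dropping them.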

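        As research on genetic disease has advanced, variants in non-coding regions, especially those in highly conserved regions, promoters and important regulatory regions, have also been found to play an important role in disease[65,66]. Functional analysis of non-coding regions is commonly carried out with FunSeq[53]. After filtering out variants present in the 1000 Genomes data, FunSeq scores the remaining variants according to whether they fall in functional elements, whether they lie in sensitive regions, whether they disrupt transcription factor motifs, whether the target gene is known and whether the target gene is a network hub, and selects the variants that are likely to be deleterious. When several samples are analyzed together, FunSeq can also determine whether a variant is recurrent (recurrent mutation). Full use should also be made of the ENCODE database (http://genome.ucsc.edu/ENCODE/), which contains annotations of functional elements (e.g. promoters, enhancers and transcription factor binding sites) across many cell lines and provides a reference for non-coding studies.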

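        3 Applications of whole-genome sequencing in disease research and clinical diagnosis

        WGS has brought unprecedented opportunities for disease research and the screening of causal genes. In recent years WGS has identified a series of causal mutations and genes in Mendelian disorders, cancer and other diseases, and it has become one of the important tools for causal gene identification and clinical diagnosis[4,7].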

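        3.1 Applications in cancer

        Cancer is a class of diseases in which cell growth and proliferation escape the body's control and impair its function. The genome of a cancer cell is unstable and prone to mutations that alter cell function and give rise to clinical symptoms. High-throughput sequencing, and WGS in particular, offers one of the most direct and effective ways to identify somatic mutations in cancer and to support diagnosis and treatment, and it is widely used. Many cancers have been studied extensively by WGS. In 2010 Pleasance et al.[67] used WGS to obtain the first genome-wide mutation spectrum of melanoma. They found that somatic mutations in melanoma are unevenly distributed across the genome, that the great majority are of the C>T/G>A type, and that most of these occur at CpC/GpG dinucleotides. The likely cause of this specific mutation spectrum is long-term exposure to ultraviolet light. In small-cell lung cancer, by contrast, Pleasance et al.[68] found by WGS that G>T/C>A transversions account for the largest share of mutations and occur preferentially at CpG sites, suggesting that this spectrum is related to long-term smoking. Lee et al.[69] sequenced a lung cancer genome and instead found that C>T/G>A transitions were the most frequent and were enriched at CpG sites, which points to deamination of methylated cytosine. Because different cancers arise by different mechanisms, they may show different mutation spectra. WGS provides the most direct, effective and unbiased way to analyze cancer mutation spectra systematically and to guide a deeper understanding of pathogenesis.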
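A mutation spectrum of the kind discussed above can be tallied by collapsing each substitution and its reverse-strand equivalent into one class, so that C>T and G>A are counted together. The sketch below uses the common convention of reporting the pyrimidine (C or T) as the reference base; the input list is a made-up example, not data from any cited study.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def substitution_class(ref, alt):
    """Collapse strand-equivalent substitutions, e.g. G>A is reported as C>T."""
    if ref in ("A", "G"):  # purine reference: switch to the opposite strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"

def mutation_spectrum(substitutions):
    """Count single-base substitutions in the six strand-collapsed classes."""
    return Counter(substitution_class(ref, alt) for ref, alt in substitutions)

# Made-up somatic substitutions as (ref, alt) pairs.
somatic = [("C", "T"), ("G", "A"), ("G", "T"), ("C", "A"), ("A", "G"), ("C", "T")]
spectrum = mutation_spectrum(somatic)
print(spectrum.most_common())
```

Examining dinucleotide context (for example CpG enrichment) would additionally require the flanking reference bases, which this sketch does not model.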

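        Cancers differ not only in mutation spectrum but also in mutation frequency, by a factor that can exceed 1000[70]. Rhabdoid tumor has the lowest mutation frequency, about 0.1 mutations per Mb, whereas melanoma has the highest, reaching 100/Mb. Studies suggest that tissue differences may be the most direct cause of this variation, and that cancers exposed to strong external pressures (such as smoking or ultraviolet light) usually have higher mutation frequencies. The number of mutations also varies greatly between patients with the same cancer. In melanoma and lung cancer, for example, the samples with the fewest mutations have only 0.1/Mb, while those with the most exceed 100/Mb. Even so, researchers using WGS have searched for causal mutations among SNVs, InDels, CNVs and SVs and have found a series of reproducible causal genes. Puente et al.[71] sequenced the whole genomes of 4 paired chronic lymphocytic leukaemia samples and identified 46 mutations deleterious to protein function. Validation in a large sample set showed that 4 genes (NOTCH1, XPO1, MYD88 and KLHL6) carry recurrent mutations. Roberts et al.[72] sequenced the whole genomes of 15 acute lymphoblastic leukemia samples, found structural variants in several genes (ABL1, JAK2, PDGFRB, CRLF2 and EPOR) and identified multiple deleterious mutations in IL7R, FLT3 and SH2B3. In-depth functional analysis of these genes showed that the somatic mutations weakened the binding of the corresponding proteins to tyrosine kinase inhibitors, so drugs related to tyrosine kinase inhibitors are of considerable clinical relevance for targeted therapy in these patients. More recently, Wang et al.[73] used WGS to analyze 100 paired gastric cancer samples comprehensively, covering point mutations, InDels, CNVs and SVs in coding and non-coding regions as well as gene expression and methylation profiles, and identified both known gastric cancer genes (TP53, ARID1A and CDH1) and new ones (MUC6, CTNNA2, GLI3, RNF43 and others). Through WGS, a series of causal mutations and genes have been identified in leukemia[71,72,74], melanoma[75], meningioma[76], breast cancer[77], medulloblastoma[78], kidney cancer[79], small-cell lung cancer[80], colon cancer[81], thyroid cancer[82] and other cancers.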
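The mutation frequencies quoted above are simply mutation counts normalized by the number of megabases examined. A short worked example, assuming for illustration that about 3000 Mb of the genome is callable (an approximation, not a figure from the cited study) and using invented mutation counts:

```python
def mutations_per_mb(n_mutations, callable_bases):
    """Mutation frequency expressed as mutations per megabase."""
    return n_mutations / (callable_bases / 1_000_000)

CALLABLE_BASES = 3_000_000_000  # assumed callable genome size (about 3000 Mb)

low = mutations_per_mb(300, CALLABLE_BASES)        # a quiet tumor genome
high = mutations_per_mb(300_000, CALLABLE_BASES)   # a heavily mutated genome
fold_difference = high / low
print(low, high, fold_difference)
```

Under these assumptions a genome with 300 somatic mutations sits at 0.1/Mb and one with 300000 sits at 100/Mb, which is the roughly 1000-fold range described in the text.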

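        Because WGS has an unmatched advantage in detecting structural variants and non-coding variants, it is now used throughout cancer research and has given researchers a deeper understanding of how cancer arises and progresses. As sequencing costs fall and analysis methods develop, more cancers and samples will be sequenced and further reproducible causal genes will be identified. To advance cancer research, researchers have set up the International Cancer Genome Consortium (ICGC), which has so far released more than 10000 cancer genomes. WGS has become a focus of cancer research; it supports systematic analysis of the molecular pathways in which causal genes take part and will provide a sound basis for clinical medication, bringing cures for cancer closer.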

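        3.2 Applications in neurological and psychiatric disorders

        WGS is being applied not only to cancer but increasingly to other common genetic diseases, especially neurological and psychiatric disorders. WGS has an unmatched advantage in identifying structural variants: it can locate breakpoints accurately and pinpoint causal genes. Talkowski et al.[83] sequenced the whole genomes of patients with neurodevelopmental disorders and identified 33 loci. The causal genes at these loci fall into 4 categories: (1) known causal genes (AUTS2, FOXP1 and CDKL5); (2) single-gene regions (SATB2, EHMT1); (3) new candidate genes and regions (CHD8, KIRREL3 and ZNF507); (4) genes associated with other neuropsychiatric disorders (TCF4, ZNF804A, PDE10A, GRIN2B and ANK3). Their study indicates that multiple genes may act together to produce a wide variety of phenotypes. In 2012 Michaelson et al.[33] sequenced the whole genomes of 10 autism spectrum disorder (ASD) nuclear families (the patient and the unaffected parents). Their analysis showed that de novo mutations are not randomly distributed across the genome but concentrate in hot spots, and that these hot spots are closely related to disease. They also found that the mutation rate in different genomic regions is related to several genomic factors (such as GC content, replication timing, transcription level and sensitive sites). Finally, they proposed a regression model that incorporates these factors and can accurately estimate the mutation rate of different genomic regions in autism patients, providing a reference for identifying hot spots. They further found that the disease genes in public databases, whether dominant or recessive, have relatively high mutation rates. In addition, Kong et al.[84] showed at the whole-genome level that the number of de novo mutations is significantly associated with the father's age rather than the mother's. For each additional year of paternal age, the average number of de novo mutations carried by the child increases by two, which raises the risk of neuropsychiatric disease.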

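        Through WGS, researchers have not only revealed some fundamental patterns of mutation but also effectively identified a series of causal genes. Jiang et al.[13] sequenced the whole genomes of 32 ASD nuclear families, linking clinical phenotypes to genetic variation as fully as possible and further explaining the pathogenesis of ASD in terms of de novo mutations, rare inherited variants and other factors. Their study identified a series of ASD-related causal genes, including CAPRIN1, AFF2, VIP, SCN2A, KCNQ2 and CHD7. For highly heterogeneous genetic disorders such as ASD, WGS can identify causal mutations and genes more effectively. Recently, Nature published a WGS study of 50 nuclear families with intellectual disability, which identified 84 de novo mutations in coding regions and 8 de novo CNVs[5]. Thanks to the high coverage and uniformity of WGS, a clinical diagnosis with a clear causal gene could be made for 62% of patients, which strongly confirms the value of WGS.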

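        The application of WGS to neuropsychiatric disease has only just begun, and many more genomes will be sequenced. For example, research institutions in China and the United States will collaborate on a project to sequence the genomes of ten thousand individuals with autism. This project will help to establish the causes of disease in the great majority of autistic children, can be applied to early clinical diagnosis of autistic children and to prenatal screening for families, and should ultimately clarify the pathogenesis of autism and lead to effective treatments. In short, WGS will be used ever more widely in neuropsychiatric disease.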

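        3.3 Applications in clinical diagnosis

        WGS plays an important part not only in research on causal genes but also in the clinical diagnosis and screening of some diseases, providing a basis for prevention and treatment. At present, most prenatal diagnosis relies on invasive procedures such as amniocentesis and fetal umbilical cord blood sampling. These invasive techniques carry some risk of harm to both the pregnant woman and the fetus and may even cause miscarriage[85,86]. Non-invasive prenatal diagnosis analyzes fetal DNA in a maternal blood sample; it avoids the risks of puncture injury, infection and miscarriage, reduces the mental stress on the pregnant woman, and is readily accepted by pregnant women and their families. WGS has begun to take shape in non-invasive prenatal testing (NIPT). On the one hand, WGS can be used to test non-invasively for chromosomal aneuploidy, offering an effective solution for the accurate diagnosis of trisomy 21, trisomy 18 and similar conditions. Lau et al.[87] applied WGS-based non-invasive prenatal diagnosis to 5 pregnant women, found chromosomal abnormalities in 4 of them, and accurately diagnosed trisomy 21, trisomy 18 and others. Lau et al.[2] analyzed cell-free DNA from 1982 samples by WGS and showed that even at low depth (0.1x) WGS can detect chromosomal abnormalities and diagnose the common trisomies accurately. On the other hand, WGS can also be used for the non-invasive diagnosis of diseases caused by genetic abnormalities, such as cancer. Leary et al.[88] sequenced the whole genomes of 10 colorectal and breast cancer samples and 10 healthy individuals, found copy number changes and rearrangements in the ERBB2 and CDK6 genes, and showed that non-invasive diagnosis is possible without a tissue biopsy.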
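One common way to call a trisomy from low-coverage maternal plasma sequencing is to compare the fraction of reads mapping to a chromosome with the same fraction in a set of euploid reference pregnancies and express the difference as a z-score. The sketch below illustrates that general idea only; it is not claimed to be the method used in the studies cited above, and the read fractions and the cutoff of 3 are invented for illustration.

```python
from statistics import mean, stdev

def chromosome_z_score(test_fraction, reference_fractions):
    """Z-score of a chromosome's read fraction against euploid references."""
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)
    return (test_fraction - mu) / sigma

def call_trisomy(test_fraction, reference_fractions, cutoff=3.0):
    """Flag a sample when the read fraction is significantly elevated."""
    return chromosome_z_score(test_fraction, reference_fractions) > cutoff

# Invented chr21 read fractions from euploid reference samples.
reference_chr21 = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]

z_elevated = chromosome_z_score(0.0137, reference_chr21)
z_typical = chromosome_z_score(0.01305, reference_chr21)
print(round(z_elevated, 2), round(z_typical, 2))
print(call_trisomy(0.0137, reference_chr21), call_trisomy(0.01305, reference_chr21))
```

Because the test relies on counting reads per chromosome rather than on covering every base, it remains informative at very low sequencing depth, provided enough reads are counted and the reference set is processed in the same way.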

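        With the development of high-throughput sequencing, second-generation sequencing (NGS), characterized by high throughput, automation and high accuracy, is now in mature use for the diagnosis and screening of some diseases. Gene diagnosis based on whole-exome sequencing has successfully provided molecular diagnoses for congenital chloride diarrhea[89], neonatal diabetes[90], intractable inflammatory bowel disease[91], Charcot-Marie-Tooth disease[92] and other conditions. However, because whole-exome sequencing is limited in its ability to study structural variants and non-coding variants, it falls short for complex diseases caused by structural or non-coding variation (such as trisomy 21).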

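        4 Challenges in whole-genome sequencing data analysis

        WGS can effectively uncover many kinds of variation across the whole genome and greatly facilitates the study and clinical diagnosis of genetic diseases, but the huge increase in the volume of data coming off the sequencer poses major challenges for data analysis. (1) Data storage: the raw data for a standard whole genome are usually about 100 GB; with the clean data, SAM files, BAM files and variant result files produced during analysis, one whole genome often needs a further 300 GB of storage. For 100 samples, for example, completing all the analysis requires at least 30 TB of storage beyond the raw data. (2) Analysis efficiency: data on this scale put great pressure on analysis efficiency and on server computing performance. Analysis usually has to be multithreaded, and the data have to be split into parts that are processed at the same time, in order to speed up the analysis. (3) Screening for causal variants: WGS typically yields about 3000000 SNVs and InDels, and finding the causal mutations among so many variants, especially in important regulatory elements in non-coding regions, is a pressing problem. Multiple CNVs/SVs may be found as well, and determining their contribution to disease is also a major challenge. (4) Accuracy of CNV/SV identification: although several WGS-based methods and tools for identifying CNVs/SVs have been published, none is highly accurate, and false negatives also occur. Nevertheless, for elucidating disease mechanisms, WGS remains irreplaceable in genetic diagnosis and in research on causal genes.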
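The storage estimate in item (1) can be made explicit with a short calculation. It uses the two per-sample figures given in the text (about 100 GB of raw data and about 300 GB of additional analysis files) and assumes decimal units (1 TB = 1000 GB); the function name is hypothetical.

```python
def storage_estimate_tb(n_samples, raw_gb=100, extra_gb=300):
    """Return (raw, analysis, total) storage in TB for a WGS cohort."""
    raw_tb = n_samples * raw_gb / 1000
    analysis_tb = n_samples * extra_gb / 1000
    return raw_tb, analysis_tb, raw_tb + analysis_tb

raw_tb, analysis_tb, total_tb = storage_estimate_tb(100)
print(raw_tb, analysis_tb, total_tb)
```

For 100 samples this gives 10 TB of raw data and 30 TB of analysis files, so the 30 TB figure corresponds to the additional analysis space and the combined footprint is about 40 TB if the raw data are kept.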

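        5 Outlook

        WGS is now used increasingly widely in disease research and clinical diagnosis, notably for non-invasive prenatal diagnosis by sequencing the cell-free fetal DNA (fetal DNA) present in maternal plasma during pregnancy[93]. At the same time, with the arrival of the big data era, WGS will inevitably move towards cloud storage and cloud computing so that large datasets can be analyzed faster and used more effectively. (1) Cloud storage: storing high-throughput data on distributed principles greatly reduces the volume of input, output and intermediate data handled during analysis, which speeds up the analysis and allows more sequencing data to be processed in the same time; (2) Cloud computing: bioinformatics software built on parallel principles processes high-throughput data in parallel, improves the efficiency of every analysis step and makes full use of computing resources, so that fewer resources are consumed and analysis is faster; (3) Integration of sequencing and analysis: high-throughput sequencing is combined with downstream data analysis so that the output includes not only the sequencing results but also the detected variants (SNVs, InDels, CNVs or SVs), linked to disease databases in cloud storage in order to estimate a patient's risk of various genetic diseases. Tools of this kind, such as MegaSeq[94], have already been developed, and using WGS for genetic diagnosis and research on causal genes is a very promising field. As another example, biobambam[95] is reported to compress BAM files more than 100-fold without losing important alignment information, greatly reducing the storage required. Recently SAMtools, an important genomics tool, also received a major upgrade in response to the rapid growth of genomic data; the latest version supports compression and global data sharing. In short, the era of WGS has arrived, and WGS will play an ever more important role in the study and clinical diagnosis of genetic diseases.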
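The parallel processing idea in item (2) can be sketched as splitting the genome into fixed-size regions and handing them to a pool of workers. The per-region function below is only a placeholder that returns the region length; in a real pipeline it would run a step such as variant calling on that region. The chromosome names, lengths, chunk size and worker count are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def split_regions(chrom_lengths, chunk_size):
    """Split each chromosome into consecutive (chrom, start, end) regions."""
    regions = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, chunk_size):
            regions.append((chrom, start, min(start + chunk_size, length)))
    return regions

def analyse_region(region):
    """Placeholder for a per-region analysis step; returns the region length."""
    chrom, start, end = region
    return end - start

def run_parallel(regions, workers=4):
    """Process regions concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyse_region, regions))

# Toy chromosome lengths, not real human chromosomes.
chrom_lengths = {"chrA": 2_500_000, "chrB": 1_000_000}
regions = split_regions(chrom_lengths, chunk_size=1_000_000)
results = run_parallel(regions)
print(len(regions), sum(results))
```

For CPU-bound work in Python a process pool or a cluster scheduler would usually replace the thread pool; the structure of splitting, mapping and collecting results stays the same.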

        Appendix

        Supplementary Figure 1 is available in the online version of the article (www.Chinagene.cn).

        [1]Dewey FE,Grove ME,Pan CP,Goldstein BA,Bernstein JA,Chaib H,Merker JD,Goldfeder RL,Enns GM,David SP,Pakdaman N,Ormond KE,Caleshu C,Kingham K,Klein TE,Whirl-Carrillo M,Sakamoto K,Wheeler MT,Butte AJ,Ford JM,Boxer L,Ioannidis JP,Yeung AC,Altman RB,Assimes TL,Snyder M,Ashley EA,Quertermous T. Clinical interpretation and implications of whole-genome sequencing. JAMA,2014,311(10): 1035–1045.

        [2]Lau TK,Cheung SW,Lo PSS,Pursley AN,Chan MK,Jiang F,Zhang H,Wang W,Jong LFJ,Yuen OKC,Chan HYC,Chan WSK,Choy KW. Non-invasive prenatal testing for fetal chromosomal abnormalities by low-coverage whole-genome sequencing of maternal plasma DNA: review of 1982 consecutive cases in a single center. Ultrasound Obst Gyn,2014,43(3): 254–264.

        [3]Cirulli ET,Goldstein DB. Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat Rev Genet,2010,11(6): 415–425.

        [4]Rabbani B,Tekin M,Mahdieh N. The promise of whole-exome sequencing in medical genetics. J Hum Genet,2014,59(1): 5–15.

        [5]Gilissen C,Hehir-Kwa JY,Thung DT,van de Vorst M,van Bon BWM,Willemsen MH,Kwint M,Janssen IM,Hoischen A,Schenck A,Leach R,Klein R,Tearle R,Bo T,Pfundt R,Yntema HG,de Vries BBA,Kleefstra T,Brunner HG,Vissers LELM,Veltman JA. Genome sequencing identifies major causes of severe intellectual disability. Nature,2014,511(7509): 344–347.

        [6]Egan JB,Shi CX,Tembe W,Christoforides A,Kurdoglu A,Sinari S,Middha S,Asmann Y,Schmidt J,Braggio E,Keats JJ,Fonseca R,Bergsagel PL,Craig DW,Carpten JD,Stewart AK. Whole-genome sequencing of multiple myeloma from diagnosis to plasma cell leukemia reveals genomic initiating events,evolution,and clonal tides.Blood,2012,120(5): 1060–1066.

        [7]Boycott KM,Vanstone MR,Bulman DE,MacKenzie AE.Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat Rev Genet,2013,14(10): 681–691.

        [8]Ren SX,Fu G,Jiang XG,Zeng R,Miao YG,Xu H,Zhang YX,Xiong H,Lu G,Lu LF,Jiang HQ,Jia J,Tu YF,Jiang JX,Gu WY,Zhang YQ,Cai Z,Sheng HH,Yin HF,Zhang Y,Zhu GF,Wan M,Huang HL,Qian Z,Wang SY,Ma W,Yao ZJ,Shen Y,Qiang BQ,Xia QC,Guo XK,Danchin A,Girons IS,Somerville RL,Wen YM,Shi MH,Chen Z,Xu JG,Zhao GP. Unique physiological and pathogenic features of Leptospira interrogans revealed by whole-genome sequencing. Nature,2003,422(6934): 888–893.

        [9]Kan ZY,Zheng HC,Liu X,Li SY,Barber TD,Gong ZL,Gao H,Hao K,Willard MD,Xu JC,Hauptschein R,Rejto PA,Fernandez J,Wang G,Zhang QH,Wang B,Chen RH,Wang J,Lee NP,Zhou W,Lin Z,Peng ZY,Yi K,Chen SP,Li L,Fan XM,Yang J,Ye R,Ju J,Wang K,Estrella H,Deng SB,Wei P,Qiu M,Wulur IH,Liu JG,Ehsani ME,Zhang CS,Loboda A,Sung WK,Aggarwal A,Poon RT,Fan ST,Wang J,Hardwick J,Reinhard C,Dai H,Li YR,Luk JM,Mao M. Whole-genome sequencing identifies recurrent mutations in hepatocellular carcinoma. Genome Res,2013,23(9): 1422–1433.

        [10]Guo GW,Sun XJ,Chen C,Wu S,Huang PD,Li ZS,Dean M,Huang Y,Jia WL,Zhou Q,Tang AF,Yang ZQ,Li XX,Song PF,Zhao XK,Ye R,Zhang SQ,Lin Z,Qi MF,Wan SQ,Xie LF,Fan F,Nickerson ML,Zou XJ,Hu XD,Xing L,Lv ZJ,Mei HB,Gao SJ,Liang CZ,Gao ZB,Lu JX,Yu Y,Liu CX,Li L,Fang XD,Jiang ZM,Yang J,Li CL,Zhao X,Chen J,Zhang F,Lai YQ,Lin ZG,Zhou FJ,Chen H,Chan HC,Tsang S,Theodorescu D,Li YR,Zhang XQ,Wang J,Yang HM,Gui YT,Wang J,Cai ZM. Whole-genome and whole-exome sequencing of bladder cancer identifies frequent alterations in genes involved in sister chromatid cohesion and segregation. Nat Genet,2013,45(12):1459–1463.

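        [11]Zhang L,Chen J. Genome-wide association studies and next-generation sequencing in pancreatic cancer research. Chinese Journal of Pathology,2014,43(2):132–135. (in Chinese)

        [12]Chen B,Ma JT,Chen LL,Hong XT,Tang XJ,Chen SQ. Analysis of somatic mutations in peritoneal mesothelioma tissue. Journal of Zhejiang University (Medical Sciences),2013,42(4): 426–430. (in Chinese)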

        [13]Jiang YH,Yuen RK,Jin X,Wang MB,Chen N,Wu XL,Ju J,Mei JP,Shi YJ,He MZ,Wang GB,Liang JQ,Wang Z,Cao DD,Carter MT,Chrysler C,Drmic IE,Howe JL,Lau L,Marshall CR,Merico D,Nalpathamkalam T,Thiruvahindrapuram B,Thompson A,Uddin M,Walker S,Luo J,Anagnostou E,Zwaigenbaum L,Ring RH,Wang J,Lajonchere C,Wang J,Shih A,Szatmari P,Yang HM,Dawson G,Li YR,Scherer SW. Detection of clinically relevant genetic variants in autism spectrum disorder by whole-genome sequencing. Am J Hum Genet,2013,93(2):249–263.

        [14]Lu SJ,Zong CH,Fan W,Yang MY,Li JS,Chapman AR,Zhu P,Hu XS,Xu LY,Yan LY,Bai F,Qiao J,Tang FC,Li RQ,Xie XS. Probing meiotic recombination and aneuploidy of single sperm cells by whole-genome sequencing. Science,2012,338(6114): 1627–1630.

        [15]Patel RK,Jain M. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS ONE,2012,7(2): e30619.

        [16]Yang X,Liu D,Liu F,Wu J,Zou J,Xiao X,Zhao FQ,Zhu BL. HTQC: a fast quality control toolkit for Illumina sequencing data. BMC Bioinformatics,2013,14: 33.

        [17]Dai MH,Thompson RC,Maher C,Contreras-Galindo R,Kaplan MH,Markovitz DM,Omenn G,Meng F.NGSQC: cross-platform quality analysis pipeline for deep sequencing data. BMC Genomics,2010,11(Suppl. 4): S7.

        [18]Li H,Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics,2010,26(5): 589–595.

        [19]Langmead B,Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods,2012,9(4): 357–359.

        [20]Li RQ,Li YR,Kristiansen K,Wang J. SOAP: short oligonucleotide alignment program. Bioinformatics,2008,24(5): 713–714.

        [21]Li H,Handsaker B,Wysoker A,Fennell T,Ruan J,Homer N,Marth G,Abecasis G,Durbin R,1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics,2009,25(16): 2078–2079.

        [22]Koboldt DC,Zhang QY,Larson DE,Shen D,McLellan MD,Lin L,Miller CA,Mardis ER,Ding L,Wilson RK.VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res,2012,22(3): 568–576.

        [23]Li RQ,Li YR,Fang XD,Yang HM,Wang J,Kristiansen K,Wang J. SNP detection for massively parallel whole-genome resequencing. Genome Res,2009,19(6):1124–1132.

        [24]Chiang DY,Getz G,Jaffe DB,O'Kelly MJT,Zhao XJ,Carter SL,Russ C,Nusbaum C,Meyerson M,Lander ES.High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods,2009,6(1):99–103.

        [25]Abyzov A,Urban AE,Snyder M,Gerstein M. CNVnator:an approach to discover,genotype,and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res,2011,21(6): 974–984.

        [26]New methods for detecting Salmonella. Anal Chem,2000,72(11): 387A.

        [27]Ivakhno S,Royce T,Cox AJ,Evers DJ,Cheetham RK,Tavare S. CNAseg--a novel framework for identification of copy number changes in cancer from second-generation sequencing data. Bioinformatics,2010,26(24):3051–3058.

        [28]Fan X,Abbott TE,Larson D,Chen K. BreakDancer:Identification of Genomic Structural Variation from Paired-End Read Mapping. Curr Protoc Bioinformatics,2014,DOI:10.1002/0471250953.bi1506s45.

        [29]Layer RM,Chiang C,Quinlan AR,Hall IM. LUMPY: A probabilistic framework for structural variant discovery.Genome Biol,2014,15(6): R84.

        [30]Wang JM,Mullighan CG,Easton J,Roberts S,Heatley SL,Ma J,Rusch MC,Chen K,Harris CC,Ding L,Holmfeldt L,Payne-Turner D,Fan X,Wei L,Zhao D,Obenauer JC,Naeve C,Mardis ER,Wilson RK,Downing JR,Zhang JH.CREST maps somatic structural variation in cancer genomes with base-pair resolution. Nat Methods,2011,8(8): 652–654.

        [31]Sindi S,Helman E,Bashir A,Raphael BJ. A geometric approach for classification and comparison of structural variants. Bioinformatics,2009,25(12): i222–i230.

        [32]Zeitouni B,Boeva V,Janoueix-Lerosey I,Loeillet S,Legoix-né P,Nicolas A,Delattre O,Barillot E. SVDetect:a tool to identify genomic structural variations from paired-end and mate-pair sequencing data. Bioinformatics,2010,26(15): 1895–1896.

        [33]Michaelson JJ,Shi YJ,Gujral M,Zheng HC,Malhotra D,Jin X,Jian MH,Liu GM,Greer D,Bhandari A,Wu WT,Corominas R,Peoples A,Koren A,Gore A,Kang SL,Lin GN,Estabillo J,Gadomski T,Singh B,Zhang K,Akshoomoff N,Corsello C,McCarroll S,Iakoucheva LM,Li YR,Wang J,Sebat J. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation.Cell,2012,151(7): 1431–1442.

        [34]Liu YZ,Li BS,Tan RJ,Zhu XL,Wang YD. A gradient-boosting approach for filtering de novo mutations in parent-offspring trios. Bioinformatics,2014,30(13):1830–1836.

        [35]Li BS,Chen W,Zhan XW,Busonero F,Sanna S,Sidore C,Cucca F,Kang HM,Abecasis GR. A likelihood-based framework for variant calling and de novo mutation detection in families. PLoS Genet,2012,8(10): e1002944.

        [36]Ramu A,Noordam MJ,Schwartz RS,Wuster A,Hurles ME,Cartwright RA,Conrad DF. DeNovoGear: de novo indel and point mutation discovery and phasing. Nat Methods,2013,10(10): 985–987.

        [37]Wang K,Li MY,Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res,2010,38(16): e164.

        [38]Sana ME,Iascone M,Marchetti D,Palatini J,Galasso M,Volinia S. GAMES identifies and annotates mutations in next-generation sequencing projects. Bioinformatics,2011,27(1): 9–13.

        [39]Bi C,Wu Jy,Jiang T,Liu Q,Cai Ws,Yu P,Cai T,Zhao M,Jiang YH,Sun ZS. Mutations of ANK3 identified by exome sequencing are associated with Autism susceptibility. Hum Mutat,2012,33(12): 1635–1638.

        [40]Kessler RC,Berglund P,Demler O,Jin R,Merikangas KR,Walters EE. Lifetime prevalence and age-of-onset distributions of DSM-IV disorders in the National Comorbidity Survey Replication. Arch Gen Psychiatry,2005,62(6): 593–602.

        [41]Kessler RC,Amminger GP,Aguilar-Gaxiola S,Alonso J,Lee S,Ustün TB. Age of onset of mental disorders: a review of recent literature. Curr Opin Psychiatr,2007,20(4): 359–364.

        [42]Collins PY,Patel V,Joestl SS,March D,Insel TR,Daar AS,Bordin IA,Costello EJ,Durkin M,Fairburn C,Glass RI,Hall W,Huang YQ,Hyman SE,Jamison K,Kaaya S,Kapur S,Kleinman A,Ogunniyi A,Otero-Ojeda A,Poo MM,Ravindranath V,Sahakian BJ,Saxena S,Singer PA,Stein DJ,Anderson W,Dhansay MA,Ewart W,Phillips A,Shurin S,Walport M. Grand challenges in global mental health. Nature,2011,475(7354): 27–30.

        [43]Crow JF. The origins,patterns and implications of human spontaneous mutation. Nat Rev Genet,2000,1(1): 40–47.

        [44]Eyre-Walker A,Keightley PD. The distribution of fitness effects of new mutations. Nat Rev Genet,2007,8(8):610–618.

        [45]Veeramah KR,O'Brien JE,Meisler MH,Cheng XY,Dib-Hajj SD,Waxman SG,Talwar D,Girirajan S,Eichler EE,Restifo LL,Erickson RP,Hammer MF. De novo pathogenic SCN8A mutation identified by whole-genome sequencing of a family quartet affected by infantile epileptic encephalopathy and SUDEP. Am J Hum Genet,2012,90(3): 502–510.

        [46]Schuurs-Hoeijmakers JHM,Geraghty MT,Kamsteeg EJ,Ben-Salem S,de Bot ST,Nijhof B,van de Vondervoort IIGM,van der Graaf M,Nobau AC,Otte-Höller I,Vermeer S,Smith AC,Humphreys P,Schwartzentruber J,FORGE Canada Consortium,Ali BR,Al-Yahyaee SA,Tariq S,Pramathan T,Bayoumi R,Kremer HPH,van de Warrenburg BP,van den Akker WM,Gilissen C,Veltman JA,Janssen IM,Vulto-van Silfhout AT,van der Velde-Visser S,Lefeber DJ,Diekstra A,Erasmus CE,Willemsen MA,Vissers LE,Lammens M,van Bokhoven H,Brunner HG,Wevers RA,Schenck A,Al-Gazali L,de Vries BB,de Brouwer AP. Mutations in DDHD2,Encoding an Intracellular Phospholipase A(1),Cause a Recessive Form of Complex Hereditary Spastic Paraplegia. Am J Hum Genet,2012,91(6): 1073–1081.

        [47]Barcia G,Fleming MR,Deligniere A,Gazula VR,Brown MR,Langouet M,Chen HJ,Kronengold J,Abhyankar A,Cilio R,Nitschke P,Kaminska A,Boddaert N,Casanova JL,Desguerre I,Munnich A,Dulac O,Kaczmarek LK,Colleaux L,Nabbout R. De novo gain-of-function KCNT1 channel mutations cause malignant migrating partial seizures of infancy. Nat Genet,2012,44(11):1255–1259.

        [48]Hamosh A,Scott AF,Amberger JS,Bocchini CA,McKusick VA. Online Mendelian Inheritance in Man(OMIM),a knowledgebase of human genes and genetic disorders. Nucleic Acids Res,2005,33(Database issue):D514–D517.

        [49]Blake JA,Bult CJ,Eppig JT,Kadin JA,Richardson JE,Mouse Genome Database Group. The Mouse Genome Database: integration of and access to knowledge about the laboratory mouse. Nucleic Acids Res,2014,42(Database issue): D810–D817.

        [50]Forbes SA,Bindal N,Bamford S,Cole C,Kok CY,Beare D,Jia MM,Shepherd R,Leung K,Menzies A,Teague JW,Campbell PJ,Stratton MR,Futreal PA. COSMIC:mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res,2011,39(Suppl 1): D945–D950.

        [51]Landrum MJ,Lee JM,Riley GR,Jang W,Rubinstein WS,Church DM,Maglott DR. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res,2014,42(Database issue):D980–D985.

        [52]Stenson PD,Mort M,Ball EV,Shaw K,Phillips AD,Cooper DN. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics,diagnostic testing and personalized genomic medicine. Hum Genet,2014,133(1): 1–9.

        [53]Khurana E,Fu Y,Colonna V,Mu XJ,Kang HM,Lappalainen T,Sboner A,Lochovsky L,Chen JM,Harmanci A,Das J,Abyzov A,Balasubramanian S,Beal K,Chakravarty D,Challis D,Chen Y,Clarke D,Clarke L,Cunningham F,Evani US,Flicek P,Fragoza R,Garrison E,Gibbs R,Gümüs ZH,Herrero J,Kitabayashi N,Kong Y,Lage K,Liluashvili V,Lipkin SM,MacArthur DG,Marth G,Muzny D,Pers TH,Ritchie GR,Rosenfeld JA,Sisu C,Wei XM,Wilson M,Xue YL,Yu FL,1000 Genomes Project Consortium,Dermitzakis ET,Yu HY,Rubin MA,Tyler-Smith C,Gerstein M. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science,2013,342(6154),DOI:10.1126/science.1235587.

        [54]Li H. Aligning sequence reads,clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv:1303.3997,2013.

        [55]O'Roak BJ,Deriziotis P,Lee C,Vives L,Schwartz JJ,Girirajan S,Karakoc E,Mackenzie AP,Ng SB,Baker C,Rieder MJ,Nickerson DA,Bernier R,Fisher SE,Shendure J,Eichler EE. Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations.Nat Genet,2011,43(6): 585–589.

        [56]Lee H,Lin MCA,Kornblum HI,Papazian DM,Nelson SF.Exome sequencing identifies de novo gain of function missense mutation in KCND2 in identical twins with autism and seizures that slows potassium channel inactivation. Hum Mol Genet,2014,23(13): 3481–3489.

        [57]Hamdan FF,Daoud H,Patry L,Dionne-Laporte A,Spiegelman D,Dobrzeniecka S,Rouleau GA,Michaud JL. Parent-child exome sequencing identifies a de novo truncating mutation in TCF4 in non-syndromic intellectual disability. Clin Genet,2013,83(2): 198–200.

        [58]NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res,2014,42(Database issue): D7–D17.

        [59]Fu W,O'Connor TD,Jun G,Kang HM,Abecasis G,Leal SM,Gabriel S,Rieder MJ,Altshuler D,Shendure J,Nickerson DA,Bamshad MJ,NHLBI Exome Sequencing Project,Akey JM. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature,2013,493(7431): 216–220.

        [60]1000 Genomes Project Consortium,Abecasis GR,Auton A,Brooks LD,DePristo MA,Durbin RM,Handsaker RE,Kang HM,Marth GT,McVean GA. An integrated map of genetic variation from 1,092 human genomes. Nature,2012,491(7422): 56–65.

        [61]Kumar P,Henikoff S,Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc,2009,4(7): 1073–1081.

        [62]Adzhubei IA,Schmidt S,Peshkin L,Ramensky VE,Gerasimova A,Bork P,Kondrashov AS,Sunyaev SR. A method and server for predicting damaging missense mutations. Nat Methods,2010,7(4): 248–249.

        [63]Davydov EV,Goode DL,Sirota M,Cooper GM,Sidow A,Batzoglou S. Identifying a high fraction of the human genome to be under selective constraint using GERP++.PLoS Comput Biol,2010,6(12): e1001025.

        [64]Chun S,Fay JC. Identification of deleterious mutations within three human genomes. Genome Res,2009,19(9):1553–1561.

        [65]Maurano MT,Humbert R,Rynes E,Thurman RE,Haugen E,Wang H,Reynolds AP,Sandstrom R,Qu HZ,Brody J,Shafer A,Neri F,Lee K,Kutyavin T,Stehling-Sun S,Johnson AK,Canfield TK,Giste E,Diegel M,Bates D,Hansen RS,Neph S,Sabo PJ,Heimfeld S,Raubitschek A,Ziegler S,Cotsapas C,Sotoodehnia N,Glass I,Sunyaev SR,Kau R,Stamatoyannopoulos JA. Systematic localization of common disease-associated variation in regulatory DNA. Science,2012,337(6099): 1190–1195.

        [66]Khurana E,Fu Y,Colonna V,Mu XJ,Kang HM,Lappalainen T,Sboner A,Lochovsky L,Chen JM,Harmanci A,Das J,Abyzov A,Balasubramanian S,Beal K,Chakravarty D,Challis D,Chen Y,Clarke D,Clarke L,Cunningham F,Evani US,Flicek P,Fragoza R,Garrison E,Gibbs R,Gümüş ZH,Herrero J,Kitabayashi N,Kong Y,Lage K,Liluashvili V,Lipkin SM,MacArthur DG,Marth G,Muzny D,Pers TH,Ritchie GRS,Rosenfeld JA,Sisu C,Wei XM,Wilson M,Xue YL,Yu FL,1000 Genomes Project Consortium,Dermitzakis ET,Yu HY,Rubin MA,Tyler-Smith C,Gerstein M. Integrative annotation of variants from 1092 humans: Application to cancer genomics. Science,2013,342(6154),DOI:10.1126/science.1235587.

        [67]Pleasance ED,Cheetham RK,Stephens PJ,McBride DJ,Humphray SJ,Greenman CD,Varela I,Lin ML,Ordóñez GR,Bignell GR,Ye K,Alipaz J,Bauer MJ,Beare D,Butler A,Carter RJ,Chen LN,Cox AJ,Edkins S,Kokko-Gonzales PI,Gormley NA,Grocock RJ,Haudenschild CD,Hims MM,James T,Jia MM,Kingsbury Z,Leroy C,Marshall J,Menzies A,Mudie LJ,Ning ZM,Royce T,Schulz-Trieglaff OB,Spiridou A,Stebbings LA,Szajkowski L,Teague J,Williamson D,Chin L,Ross MT,Campbell PJ,Bentley DR,Futreal PA,Stratton MR. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature,2010,463(7278): 191–196.

        [68]Pleasance ED,Stephens PJ,O’Meara S,McBride DJ,Meynert A,Jones D,Lin ML,Beare D,Lau KW,Greenman C,Varela I,Nik-Zainal S,Davies HR,Ordoñez GR,Mudie LJ,Latimer C,Edkins S,Stebbings L,Chen L,Jia M,Leroy C,Marshall J,Menzies A,Butler A,Teague JW,Mangion J,Sun YA,McLaughlin SF,Peckham HE,Tsung EF,Costa GL,Lee CC,Minna JD,Gazdar A,Birney E,Rhodes MD,McKernan KJ,Stratton MR,Futreal PA,Campbell PJ. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature,2010,463(7278): 184–190.

        [69]Lee W,Jiang ZS,Liu JF,Haverty PM,Guan YH,Stinson J,Yue P,Zhang Y,Pant KP,Bhatt D,Ha C,Johnson S,Kennemer MI,Mohan S,Nazarenko I,Watanabe C,Sparks AB,Shames DS,Gentleman R,de Sauvage FJ,Stern H,Pandita A,Ballinger DG,Drmanac R,Modrusan Z,Seshagiri S,Zhang ZM. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature,2010,465(7297): 473–477.

        [70]Lawrence MS,Stojanov P,Polak P,Kryukov GV,Cibulskis K,Sivachenko A,Carter SL,Stewart C,Mermel CH,Roberts SA,Kiezun A,Hammerman PS,McKenna A,Drier Y,Zou LH,Ramos AH,Pugh TJ,Stransky N,Helman E,Kim J,Sougnez C,Ambrogio L,Nickerson E,Shefler E,Cortés ML,Auclair D,Saksena G,Voet D,Noble M,DiCara D,Lin P,Lichtenstein L,Heiman DI,Fennell T,Imielinski M,Hernandez B,Hodis E,Baca S,Dulak AM,Lohr J,Landau DA,Wu CJ,Melendez-Zajgla J,Hidalgo-Miranda A,Koren A,McCarroll SA,Mora J,Lee RS,Crompton B,Onofrio R,Parkin M,Winckler W,Ardlie K,Gabriel SB,Roberts CW,Biegel JA,Stegmaier K,Bass AJ,Garraway LA,Meyerson M,Golub TR,Gordenin DA,Sunyaev S,Lander ES,Getz G. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature,2013,499(7457): 214–218.

        [71]Puente XS,Pinyol M,Quesada V,Conde L,Ordóñez GR,Villamor N,Escaramis G,Jares P,Beà S,González-Díaz M,Bassaganyas L,Baumann T,Juan M,López-Guerra M,Colomer D,Tubio JM,López C,Navarro A,Tornador C,Aymerich M,Rozman M,Hernández JM,Puente DA,Freije JMP,Velasco G,Gutiérrez-Fernández A,Costa D,Carrió A,Guijarro S,Enjuanes A,Hernández L,Yagüe J,Nicolás P,Romeo-Casabona CM,Himmelbauer H,Castillo E,Dohm JC,de Sanjosé S,Piris MA,de Alava E,San Miguel J,Royo R,Gelpí JL,Torrents D,Orozco M,Pisano DG,Valencia A,Guigó R,Bayés M,Heath S,Gut M,Klatt P,Marshall J,Raine K,Stebbings LA,Futreal PA,Stratton MR,Campbell PJ,Gut I,López-Guillermo A,Estivill X,Montserrat E,López-Otín C,Campo E. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature,2011,475(7354): 101–105.

        [72]Roberts KG,Morin RD,Zhang JH,Hirst M,Zhao YJ,Su XP,Chen SC,Payne-Turner D,Churchman ML,Harvey RC,Chen X,Kasap C,Yan CH,Becksfort J,Finney RP,Teachey DT,Maude SL,Tse K,Moore R,Jones S,Mungall K,Birol I,Edmonson MN,Hu Y,Buetow KE,Chen IM,Carroll WL,Wei L,Ma J,Kleppe M,Levine RL,Garcia-Manero G,Larsen E,Shah NP,Devidas M,Reaman G,Smith M,Paugh SW,Evans WE,Grupp SA,Jeha S,Pui CH,Gerhard DS,Downing JR,Willman CL,Loh M,Hunger SP,Marra MA,Mullighan CG. Genetic alterations activating kinase and cytokine receptor signaling in high-risk acute lymphoblastic leukemia. Cancer Cell,2012,22(2): 153–166.

        [73]Wang K,Yuen ST,Xu J,Lee SP,Yan HH,Shi ST,Siu HC,Deng S,Chu KM,Law S,Chan KH,Chan AS,Tsui WY,Ho SL,Chan AK,Man JL,Foglizzo V,Ng MK,Chan AS,Ching YP,Cheng GH,Xie T,Fernandez J,Li VS,Clevers H,Rejto PA,Mao M,Leung SY. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer. Nat Genet,2014,46(6): 573–582.

        [74]Holmfeldt L,Wei L,Diaz-Flores E,Walsh M,Zhang JH,Ding L,Payne-Turner D,Churchman M,Andersson A,Chen SC,McCastlain K,Becksfort J,Ma J,Wu G,Patel SN,Heatley SL,Phillips LA,Song G,Easton J,Parker M,Chen X,Rusch M,Boggs K,Vadodaria B,Hedlund E,Drenberg C,Baker S,Pei D,Cheng C,Huether R,Lu C,Fulton RS,Fulton LL,Tabib Y,Dooling DJ,Ochoa K,Minden M,Lewis ID,To LB,Marlton P,Roberts AW,Raca G,Stock W,Neale G,Drexler HG,Dickins RA,Ellison DW,Shurtleff SA,Pui CH,Ribeiro RC,Devidas M,Carroll AJ,Heerema NA,Wood B,Borowitz MJ,Gastier-Foster JM,Raimondi SC,Mardis ER,Wilson RK,Downing JR,Hunger SP,Loh ML,Mullighan CG. The genomic landscape of hypodiploid acute lymphoblastic leukemia. Nat Genet,2013,45(3): 242–252.

        [75]Turajlic S,Furney SJ,Lambros MB,Mitsopoulos C,Kozarewa I,Geyer FC,MacKay A,Hakas J,Zvelebil M,Lord CJ,Ashworth A,Thomas M,Stamp G,Larkin J,Reis-Filho JS,Marais R. Whole genome sequencing of matched primary and metastatic acral melanomas. Genome Res,2012,22(2): 196–207.

        [76]Brastianos PK,Horowitz PM,Santagata S,Jones RT,McKenna A,Getz G,Ligon KL,Palescandolo E,Van Hummelen P,Ducar MD,Raza A,Sunkavalli A,Macconaill LE,Stemmer-Rachamimov AO,Louis DN,Hahn WC,Dunn IF,Beroukhim R. Genomic sequencing of meningiomas identifies oncogenic SMO and AKT1 mutations. Nat Genet,2013,45(3): 285–289.

        [77]Nik-Zainal S,Alexandrov LB,Wedge DC,Van Loo P,Greenman CD,Raine K,Jones D,Hinton J,Marshall J,Stebbings LA,Menzies A,Martin S,Leung K,Chen L,Leroy C,Ramakrishna M,Rance R,Lau KW,Mudie LJ,Varela I,McBride DJ,Bignell GR,Cooke SL,Shlien A,Gamble J,Whitmore I,Maddison M,Tarpey PS,Davies HR,Papaemmanuil E,Stephens PJ,McLaren S,Butler AP,Teague JW,Jönsson G,Garber JE,Silver D,Miron P,Fatima A,Boyault S,Langerød A,Tutt A,Martens JW,Aparicio SA,Borg Å,Salomon AV,Thomas G,Børresen-Dale AL,Richardson AL,Neuberger MS,Futreal PA,Campbell PJ,Stratton MR; Breast Cancer Working Group of the International Cancer Genome Consortium. Mutational processes molding the genomes of 21 breast cancers. Cell,2012,149(5): 979–993.

        [78]Robinson G,Parker M,Kranenburg TA,Lu C,Chen X,Ding L,Phoenix TN,Hedlund E,Wei L,Zhu XY,Chalhoub N,Baker SJ,Huether R,Kriwacki R,Curley N,Thiruvenkatam R,Wang J,Wu G,Rusch M,Hong X,Becksfort J,Gupta P,Ma J,Easton J,Vadodaria B,Onar-Thomas A,Lin T,Li S,Pounds S,Paugh S,Zhao D,Kawauchi D,Roussel MF,Finkelstein D,Ellison DW,Lau CC,Bouffet E,Hassall T,Gururangan S,Cohn R,Fulton RS,Fulton LL,Dooling DJ,Ochoa K,Gajjar A,Mardis ER,Wilson RK,Downing JR,Zhang J,Gilbertson RJ. Novel mutations target distinct subgroups of medulloblastoma. Nature,2012,488(7409): 43–48.

        [79]Sato Y,Yoshizato T,Shiraishi Y,Maekawa S,Okuno Y,Kamura T,Shimamura T,Sato-Otsubo A,Nagae G,Suzuki H,Nagata Y,Yoshida K,Kon A,Suzuki Y,Chiba K,Tanaka H,Niida A,Fujimoto A,Tsunoda T,Morikawa T,Maeda D,Kume H,Sugano S,Fukayama M,Aburatani H,Sanada M,Miyano S,Homma Y,Ogawa S. Integrated molecular analysis of clear-cell renal cell carcinoma. Nat Genet,2013,45(8): 860–867.

        [80]Han JY,Lee YS,Kim BC,Lee GK,Lee S,Kim EH,Kim HM,Bhak J. Whole-genome analysis of a patient with early-stage small-cell lung cancer. Pharmacogenomics J,2014,DOI:10.1038/tpj.2014.17.

        [81]Mohan S,Heitzer E,Ulz P,Lafer I,Lax S,Auer M,Pichler M,Gerger A,Eisner F,Hoefler G. Changes in colorectal carcinoma genomes under anti-EGFR therapy identified by whole-genome plasma DNA sequencing. PLoS Genet,2014,10(3): e1004271.

        [82]Demeure MJ,Aziz M,Rosenberg R,Gurley SD,Bussey KJ,Carpten JD. Whole-genome Sequencing of an Aggressive BRAF Wild-type Papillary Thyroid Cancer Identified EML4–ALK Translocation as a Therapeutic Target. World J Surg,2014,38(6): 1296–1305.

        [83]Talkowski ME,Rosenfeld JA,Blumenthal I,Pillalamarri V,Chiang C,Heilbut A,Ernst C,Hanscom C,Rossin E,Lindgren AM,Pereira S,Ruderfer D,Kirby A,Ripke S,Harris DJ,Lee JH,Ha K,Kim HG,Solomon BD,Gropman AL,Lucente D,Sims K,Ohsumi TK,Borowsky ML,Loranger S,Quade B,Lage K,Miles J,Wu BL,Shen Y,Neale B,Shaffer LG,Daly MJ,Morton CC,Gusella JF. Sequencing chromosomal abnormalities reveals neurodevelopmental loci that confer risk across diagnostic boundaries. Cell,2012,149(3): 525–537.

        [84]Kong A,Frigge ML,Masson G,Besenbacher S,Sulem P,Magnusson G,Gudjonsson SA,Sigurdsson A,Jonasdottir A,Jonasdottir A,Wong WS,Sigurdsson G,Walters GB,Steinberg S,Helgason H,Thorleifsson G,Gudbjartsson DF,Helgason A,Magnusson OT,Thorsteinsdottir U,Stefansson K. Rate of de novo mutations and the importance of father's age to disease risk. Nature,2012,488(7412): 471–475.

        [85]Chiu RW. Noninvasive prenatal testing by maternal plasma DNA analysis: Current practice and future applications. Scand J Clin Lab Invest Suppl,2014,244:48–53.

        [86]Chiu RWK,Chan KCA,Gao Y,Lau VYM,Zheng WL,Leung TY,Foo CHF,Xie B,Tsui NBY,Lun FMF,Zee BCY,Lau TK,Cantor CR,Lo YMD. Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma. Proc Natl Acad Sci USA,2008,105(51): 20458–20463.

        [87]Lau TK,Jiang FM,Stevenson RJ,Lo TK,Chan LW,Chan MK,Lo PSS,Wang W,Zhang HY,Chen F,Choy KW. Secondary findings from non-invasive prenatal testing for common fetal aneuploidies by whole genome sequencing as a clinical service. Prenat Diagn,2013,33(6): 602–608.

        [88]Leary RJ,Sausen M,Kinde I,Papadopoulos N,Carpten JD,Craig D,O'Shaughnessy J,Kinzler KW,Parmigiani G,Vogelstein B,Diaz LA Jr,Velculescu VE. Detection of chromosomal alterations in the circulation of cancer patients with whole-genome sequencing. Sci Transl Med,2012,4(162): 162ra154.

        [89]Choi M,Scholl UI,Ji WZ,Liu TW,Tikhonova IR,Zumbo P,Nayir A,Bakkaloğlu A,Özen S,Sanjad S,Nelson-Williams C,Farhi A,Mane S,Lifton RP. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci USA,2009,106(45): 19096–19101.

        [90]Bonnefond A,Durand E,Sand O,De Graeve F,Gallina S,Busiah K,Lobbens S,Simon A,Bellanné-Chantelot C,Létourneau L,Scharfmann R,Delplanque J,Sladek R,Polak M,Vaxillaire M,Froguel P. Molecular diagnosis of neonatal diabetes mellitus using next-generation sequencing of the whole exome. PLoS ONE,2010,5(10): e13630.

        [91]Worthey EA,Mayer AN,Syverson GD,Helbling D,Bonacci BB,Decker B,Serpe JM,Dasu T,Tschannen MR,Veith RL,Basehore MJ,Broeckel U,Tomita-Mitchell A,Arca MJ,Casper JT,Margolis DA,Bick DP,Hessner MJ,Routes JM,Verbsky JW,Jacob HJ,Dimmock DP. Making a definitive diagnosis: successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genet Med,2011,13(3):255–262.

        [92]Montenegro G,Powell E,Huang J,Speziani F,Edwards YJK,Beecham G,Hulme W,Siskind C,Vance J,Shy M,Züchner S. Exome sequencing allows for rapid gene identification in a Charcot-Marie-Tooth family. Ann Neurol,2011,69(3): 464–470.

        [93]Lo YM. Fetal DNA in maternal plasma: biology and diagnostic applications. Clin Chem,2000,46(12): 1903–1906.

        [94]Puckelwartz MJ,Pesce LL,Nelakuditi V,Dellefave-Castillo L,Golbus JR,Day SM,Cappola TP,Dorn II GW,Foster IT,McNally EM. Supercomputing for the parallelization of whole genome analysis. Bioinformatics,2014,30(11): 1508–1513.

        [95]Tischler G,Leonard S. Biobambam: tools for read pair collation based algorithms on BAM files. Source Code Biol Med,2014,9: 13.

        Supplementary Figure 1
