李軒 朱艷
Abstract: Ultra-large-scale sequencing data challenge both the methods and the capacity available for processing them. This project starts from the sources of the data, namely the next-generation sequencing platforms Illumina/Solexa, Roche/454, AB/SOLiD and the domestically developed AG-100/200 systems, and studies the characteristics of the data, experimental design strategies and data processing methods. It develops coding models and the theory and methods of high-throughput experimental design for next-generation sequencing, studies mathematical models and quality control methods for the data from each platform, and develops efficient processing methods and workflows for high-throughput sequencing data together with methods for the integrated analysis of cross-platform data. Building on these mathematical models and quality control methods, the project will establish a theory of coding and experimental design for next-generation sequencing. These results will give important guidance for sequencing data processing and will also be used to improve the AG system, the next-generation sequencer developed independently in China. The project will establish data processing methods, algorithms, reconfigurable software workflows and cross-platform integrated analysis methods that suit multiple platforms and multiple applications, and will develop hardware acceleration techniques for processing large volumes of sequence data. Its progress is expected to help bring China's research in bioinformatics and high-throughput sequencing technology to the international forefront.
In the little more than one year since work began, the participating teams and partner institutions have worked toward the project's main objectives and made notable progress, laying a good foundation for the next stage. The main results are: a new decoding sequencing-by-synthesis technology system was developed; sequencing error models and algorithms for processing raw sequencing data were established, the framework of the data processing software for the AG sequencing system was built, and the main modules of that system were completed; for the coding theory and methods of multi-sample sequencing experiments, an optimized design method for sample codes was established and a dual-tag coding scheme for preparing high-throughput sequencing libraries was proposed; a coding method for pooled DNA samples based on group testing theory was studied, and a balanced coding design algorithm for Pool-Seq experiments and a grouping design algorithm based on hypergeometric distribution calculations were proposed; the data characteristics of different high-throughput sequencing platforms and different omics applications (genome, transcriptome) were analyzed and several application case studies were completed; a data processing and assembly optimization workflow for high-throughput transcriptome sequencing data (RNA-seq) was completed; and several patent applications were filed, forming a core technology system for next-generation sequencing with independent Chinese intellectual property.
Key Words: Next-generation sequencing; Technology platform; Mathematical model; Error analysis; Experimental design; Optimization
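The abstract mentions a grouping design based on hypergeometric distribution calculations but does not describe the algorithm. The following is only a minimal sketch of how a hypergeometric calculation can enter pool-size selection, not the project's method. It assumes a pool is drawn without replacement from a known population containing a known number of variant carriers. The function names and the "closest to 0.5" selection criterion are hypothetical choices made for illustration.

```python
from math import comb


def prob_pool_has_carrier(population: int, carriers: int, pool_size: int) -> float:
    """Probability that a pool drawn without replacement contains >= 1 carrier.

    Uses the hypergeometric probability of drawing zero carriers:
    P(X = 0) = C(N - K, n) / C(N, n).
    """
    if not 0 <= carriers <= population:
        raise ValueError("carriers must be between 0 and population")
    if not 0 <= pool_size <= population:
        raise ValueError("pool_size must be between 0 and population")
    p_zero = comb(population - carriers, pool_size) / comb(population, pool_size)
    return 1.0 - p_zero


def most_informative_pool_size(population: int, carriers: int) -> int:
    """Pool size whose probability of being positive is closest to 0.5.

    Assumed criterion (illustrative only): a pool that is positive about half
    of the time carries the most information per test.
    """
    return min(
        range(1, population + 1),
        key=lambda n: abs(prob_pool_has_carrier(population, carriers, n) - 0.5),
    )


if __name__ == "__main__":
    # 10 samples, 1 carrier, pools of 5: 1 - C(9,5)/C(10,5) = 1 - 126/252
    print(prob_pool_has_carrier(10, 1, 5))    # 0.5
    print(most_informative_pool_size(10, 1))  # 5
```

In practice the number of carriers is unknown and would be replaced by an estimate or a prior. The sketch only shows the counting step that such a design would rest on.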
Full-text link (real-name registration required): http://www.nstrs.cn/xiangxiBG.aspx?id=49573&flag=1