
        Analysis on Threshold Used in Gene Prediction Algorithm Based on Fourier Spectrum

        2014-07-02 19:06:35 LIU Ping et al.
        Hubei Agricultural Sciences, 2014, Issue 6

        LIU Ping1, MA Yu-tao1, SUN Xue-hong1, ZHANG Cheng1, DU Yong2

        (1. School of Physics & Electrical Information Engineering / Ningxia Key Laboratory of Intelligent Sensing for Desert Information, Ningxia University, Yinchuan 750021, China; 2. Department of Pediatric Surgery, General Hospital of Ningxia Medical University, Yinchuan 750004, China)

        Abstract: The choice of threshold in protein coding region prediction has a non-negligible effect on prediction accuracy. This study proposes the normalized power spectral density as the threshold for discriminating coding from non-coding regions of a DNA sequence. With the FIR (finite impulse response) narrow pass-band filter (NPBF) as the core of the prediction algorithm, the DNA sequence sets HMR195 and ALLSEQ as test sets, and the base-level approximate correlation (AC) as the measure of prediction accuracy, the predictions obtained with the proposed threshold were compared with those of the same algorithm using two existing thresholds. The proposed threshold gave the highest prediction accuracy, and the resulting algorithm is simple, intuitive and requires less computation.

        Key words: protein coding regions prediction; narrow pass-band filter; normalized value of power spectrum density; ratio of signal to noise; approximate correlation

        CLC number: TP391.9; TN713    Document code: A    Article ID: 0439-8114(2014)06-1432-04

        Protein coding region prediction provides important guidance for the annotation and labelling of DNA sequences [1-3]. Among existing prediction algorithms, the SDFT (sliding discrete Fourier transform) algorithm proposed by Tiwari et al. [4] uses the ratio of signal to noise (RSN) as the threshold separating coding from non-coding regions; Mena-Chalco et al. [5] use the predicted non-coding ratio (PNCR) as the threshold; and Ambikairajah et al. [6] and Akhtar et al. [7] normalize the amplitude of the curve when plotting the power spectral density (PSD) of a DNA sequence. Given these two different threshold choices, it remains to be determined which gives the better result in gene prediction and whether a better threshold exists.

        This study proposes the power spectral density normalized by its maximum value (PSDN) as the threshold separating coding from non-coding regions. The FIR (finite impulse response) NPBF (narrow pass-band filter) coding region prediction algorithm serves as the platform [8,9], the DNA sequence sets HMR195 [10] and ALLSEQ [11] as test sets, and Sn (sensitivity), Sp (specificity), FPR (false positive rate), AC (approximate correlation) and CC (correlation coefficient) as measures of the prediction results [11]. The predictions obtained with RSN, PNCR and PSDN as thresholds are compared, to provide a reference for threshold selection in stand-alone gene prediction algorithms.

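        1 Materials and methods

        1.1 Materials

        The gene sequence AB003730 (a DNA sequence in the HMR195 set) is used as the benchmark sequence for comparing coding region predictions under the three thresholds above. The ALLSEQ and HMR195 DNA sequence sets are used to verify that the effect of threshold selection on the prediction results holds broadly.

        1.2 NPBF gene prediction algorithm

        The coding region prediction algorithm based on the FIR narrow pass-band filter consists of the following main steps: ① map the DNA sequence to numerical sequences (signals) with the Voss method; ② filter the numerical signals from the previous step with the FIR narrow pass-band filter, removing components whose period is not 3; ③ compute the power spectral density (PSD) of the signal; ④ apply moving-average filtering and amplitude normalization to the PSD curve; ⑤ classify the DNA sequence using the non-coding ratio as the threshold, determining its coding and non-coding regions, and report the prediction result with one or more accuracy measures.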
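
        Step ① can be illustrated with a minimal sketch of the Voss mapping, which turns a DNA string into four binary indicator sequences, one per base. The function name `voss_map` and the example string are hypothetical and chosen only for illustration.

```python
def voss_map(seq):
    """Voss mapping: for each base, a 0/1 sequence marking where it occurs."""
    seq = seq.upper()
    return {base: [1 if ch == base else 0 for ch in seq] for base in "ATCG"}

# Hypothetical toy sequence, not taken from HMR195 or ALLSEQ.
indicators = voss_map("ATGCCA")
for base, x in indicators.items():
    print(base, x)
```

        At every position exactly one of the four indicator sequences equals 1, so the four signals together carry the same information as the original string.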

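        The Voss method maps a DNA sequence composed of the bases adenine (A), thymine (T), cytosine (C) and guanine (G) to the numerical sequences $x_l[n]$, $l \in \{A,T,C,G\}$ [1-9]. Passing these through the FIR narrow pass-band filter yields the period-3 signals $y_l[n]$, $l \in \{A,T,C,G\}$. The power spectral density of the DNA coding signal is

        $$\mathrm{PSD}[n]=\frac{1}{N}\sum_{l}\left|y_l[n]\right|^{2},\quad l=A,T,C,G;\ n=1,\dots,L \tag{1}$$

        where N is the length of the FIR filter and L is the length of the DNA sequence.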
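
        The filtering and PSD steps can be sketched as follows. This is only an illustration under stated assumptions: the filter is a Hamming window modulated to the period-3 frequency 2π/3, which is a generic narrow-band design and not the authors' APNPBF; the filter length of 31, the function names and the toy sequence are all hypothetical. Constant scale factors are dropped because they cancel once the curve is normalized by its maximum.

```python
import math
import random

def voss_indicators(seq):
    """Voss mapping: one 0/1 indicator sequence per base."""
    return {b: [1.0 if ch == b else 0.0 for ch in seq] for b in "ATCG"}

def narrowband_fir(n_taps, center=2 * math.pi / 3):
    """Hamming window modulated to `center` rad/sample (assumed design, not APNPBF)."""
    mid = (n_taps - 1) / 2
    window = [0.54 - 0.46 * math.cos(2 * math.pi * k / (n_taps - 1))
              for k in range(n_taps)]
    gain = sum(window) / 2  # approximate gain at the centre frequency
    return [w * math.cos(center * (k - mid)) / gain for k, w in enumerate(window)]

def filter_same(x, taps):
    """Delay-compensated FIR filtering, treating samples outside x as zero."""
    half = (len(taps) - 1) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = n + half - k
            if 0 <= j < len(x):
                acc += h * x[j]
        out.append(acc)
    return out

def period3_psd(seq, n_taps=31):
    """Sum over the four bases of the squared filter outputs (scale factor omitted)."""
    taps = narrowband_fir(n_taps)
    filtered = [filter_same(x, taps) for x in voss_indicators(seq).values()]
    return [sum(y[n] ** 2 for y in filtered) for n in range(len(seq))]

# Toy input: a strictly period-3 stretch followed by pseudo-random bases.
rng = random.Random(0)
toy = "ATG" * 30 + "".join(rng.choice("ATCG") for _ in range(90))
psd = period3_psd(toy)
coding_mean = sum(psd[15:75]) / 60       # inside the periodic stretch
noncoding_mean = sum(psd[105:165]) / 60  # inside the pseudo-random stretch
print(coding_mean, noncoding_mean)
```

        On this toy input the PSD is clearly larger inside the periodic stretch than in the pseudo-random stretch, which is the contrast that a threshold later exploits.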

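        In simulations of the coding region prediction algorithm the filter output is not smooth enough, so before the prediction results are tallied, the output is smoothed with a moving-average filter of order 110. Equation (2) is the difference equation of a moving-average filter of order $N_{ma}$.

        $$\mathrm{PSD}_{ma}[n]=\frac{1}{N_{ma}}\sum_{i=0}^{N_{ma}-1}\mathrm{PSD}[n-i] \tag{2}$$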
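
        A minimal sketch of the smoothing step, together with the normalization by the maximum that the text applies afterwards. It assumes a causal moving average in which samples before the start of the sequence count as zero; that boundary handling and the function names are assumptions, not details given in the source.

```python
def moving_average(psd, order):
    """Causal moving average of the given order; samples before n=0 count as zero."""
    out, running = [], 0.0
    for n, value in enumerate(psd):
        running += value
        if n >= order:
            running -= psd[n - order]  # drop the sample leaving the window
        out.append(running / order)
    return out

def normalise_by_max(values):
    """Scale a non-negative curve so that its largest value is 1."""
    peak = max(values)
    return [v / peak for v in values] if peak > 0 else list(values)

smoothed = moving_average([3.0, 3.0, 3.0, 3.0], order=2)
psdn = normalise_by_max([1.0, 2.0, 4.0])
print(smoothed, psdn)
```

        The running-sum form keeps the cost at one addition and one subtraction per sample regardless of the order, which matters little for order 110 but keeps the sketch linear in the sequence length.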

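        After the moving-average power spectrum of the sequence is computed, it is normalized by its maximum value so that the results of different algorithms can be compared. The predicted non-coding ratio is then used as the threshold and varied over the range 1 to 99, and the filter length is also varied, in order to find the threshold that gives the algorithm its best prediction accuracy. Sensitivity (Sn), specificity (Sp), approximate correlation (AC) and correlation coefficient (CC) are used to evaluate how well the algorithm identifies coding regions [11]. AC serves as the measure of overall prediction accuracy, which eases comparison with results in the literature; Sn and Sp serve as reference measures for the study of the benchmark sequence.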
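
        The base-level measures can be computed from the four confusion counts, as in the sketch below. It assumes the gene-finding convention of [11], in which Sp is TP/(TP+FP), and it averages AC over those terms whose denominators are non-zero; both points are assumptions about the intended definitions, and the function names are hypothetical.

```python
import math

def confusion_counts(predicted, actual):
    """Count TP, TN, FP, FN over per-base 0/1 labels (1 = coding)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    return tp, tn, fp, fn

def ratio(num, den):
    """Return num/den, or None when the ratio is undefined."""
    return num / den if den else None

def base_level_metrics(predicted, actual):
    tp, tn, fp, fn = confusion_counts(predicted, actual)
    sn = ratio(tp, tp + fn)   # sensitivity
    sp = ratio(tp, tp + fp)   # specificity in the gene-finding sense (assumed)
    fpr = ratio(fp, fp + tn)  # false positive rate
    terms = [sn, sp, ratio(tn, tn + fp), ratio(tn, tn + fn)]
    defined = [t for t in terms if t is not None]
    # AC = 2 * (ACP - 0.5), with ACP averaged over the defined terms (assumed).
    ac = 2 * (sum(defined) / len(defined) - 0.5) if defined else None
    den = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / den if den else None
    return {"Sn": sn, "Sp": sp, "FPR": fpr, "AC": ac, "CC": cc}

metrics = base_level_metrics([1, 1, 0, 0, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0])
print(metrics)
```

        Unlike CC, AC stays defined when a sequence has no predicted or no actual coding bases, which is one reason it is convenient as an overall accuracy measure.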

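        1.3 Comparison of the computational cost of the three thresholds

        Prediction with RSN as the threshold requires computing the mean PSD of each sequence and then deriving from RSN the corresponding PSDN to use as the threshold. Prediction with PNCR as the threshold in effect requires sorting the PSD of the DNA sequence and then deriving the PSDN that corresponds to the specified PNCR. With PSDN as the threshold, one only needs to choose a PSDN value. Prediction with PSDN as the threshold therefore has the lowest computational cost.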
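
        The difference in cost can be made concrete with a sketch that converts each threshold into a PSDN cutoff. The conversions are one plausible reading of the description above, not the authors' code: RSN is taken as a multiple of the mean PSD, and PNCR as the percentage of positions that fall below the cutoff. All function names and the toy values are hypothetical.

```python
def cutoff_from_rsn(psd, rsn):
    """PSDN cutoff equivalent to PSD >= rsn * mean(PSD); needs a pass for the mean."""
    mean = sum(psd) / len(psd)
    return rsn * mean / max(psd)

def cutoff_from_pncr(psd, pncr_percent):
    """PSDN cutoff leaving about pncr_percent % of positions below it; needs a sort."""
    ordered = sorted(psd)
    index = min(len(ordered) - 1, int(len(ordered) * pncr_percent / 100))
    return ordered[index] / max(psd)

def classify(psd, cutoff_psdn):
    """Label a position as coding (1) when its normalized PSD reaches the cutoff."""
    peak = max(psd)
    return [1 if v / peak >= cutoff_psdn else 0 for v in psd]

toy_psd = [1.0, 2.0, 3.0, 4.0, 10.0]
by_rsn = classify(toy_psd, cutoff_from_rsn(toy_psd, 1.0))
by_pncr = classify(toy_psd, cutoff_from_pncr(toy_psd, 60))
by_psdn = classify(toy_psd, 0.4)  # the PSDN threshold is used directly
print(by_rsn, by_pncr, by_psdn)
```

        In this reading the RSN route adds a mean over the sequence and the PNCR route adds a sort, while the PSDN route needs only the normalization that every variant performs anyway.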

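        2 Results and analysis

        2.1 Implementation of the narrow pass-band filter

        Two APNPBF (all-phase NPBF) narrow pass-band filters, with window lengths 119 and 599, were used in the coding region prediction experiments; Figure 1 shows the frequency response of the APNPBF with window length 599. For DNA sequences in the test sets shorter than 600 bp, the filter with window length 119 was used, to reduce the distortion of the prediction caused by zero-padding or other extension of the input sequence.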

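        2.2 Coding region prediction results

        Predictions were made on sequence AB003730 with the three thresholds. The prediction curves obtained with RSN, PNCR and PSDN as thresholds are shown in Figures 2a, 2b and 2c respectively; the ROC curves for the three thresholds are shown in Figure 2d, with the upper-left corner enlarged in Figure 2e. For RSN, the ROC curve is obtained by letting RSN take 100 values from 0.08 to 8.00 in steps of 0.08 and plotting the resulting (FPR, TPR) pairs in the plane. The ROC curves for PNCR and PSDN are obtained similarly, with ranges 1≤PNCR≤100 and 0.01≤PSDN≤1.00 and a step of 0.01 in both cases. The larger the area under the ROC curve (AUC), the better the algorithm separates coding from non-coding regions.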
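
        The threshold sweep behind such an ROC curve can be sketched as follows for the PSDN case, using the stated range of 0.01 to 1.00 in steps of 0.01. The AUC is estimated with the trapezoidal rule after adding the end points (0, 0) and (1, 1); that estimator, the function names and the toy data are assumptions for illustration.

```python
def roc_points(psdn, actual, cutoffs):
    """One (FPR, TPR) pair per cutoff; `actual` must contain both classes."""
    positives = sum(1 for a in actual if a)
    negatives = len(actual) - positives
    points = []
    for c in cutoffs:
        tp = sum(1 for v, a in zip(psdn, actual) if v >= c and a)
        fp = sum(1 for v, a in zip(psdn, actual) if v >= c and not a)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve, with (0, 0) and (1, 1) added."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

psdn_cutoffs = [i / 100 for i in range(1, 101)]  # 0.01, 0.02, ..., 1.00

# Toy data: the two coding positions have the highest normalized PSD.
separable = auc(roc_points([0.1, 0.2, 0.9, 1.0], [0, 0, 1, 1], psdn_cutoffs))
# Toy data: the PSD carries no information about the label.
uninformative = auc(roc_points([0.5, 0.5], [0, 1], psdn_cutoffs))
print(separable, uninformative)
```

        A perfectly separable input gives an AUC of 1 and an uninformative one gives 0.5, which bounds the range in which the three thresholds are compared.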

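        Table 1 lists the best prediction results for sequence AB003730 under the three thresholds. The best prediction result is the one with the highest prediction accuracy AC, obtained at a specific value within the range over which the threshold is varied. For the threshold RSN, the recommended selection range is 1

        The three thresholds were then applied to the ALLSEQ and HMR195 DNA sequence sets; the results are given in Table 2. As Table 2 shows, PSDN as the threshold achieved the highest prediction accuracy on both ALLSEQ and HMR195, and RSN gave better results than PNCR.

        RSN as the threshold predicts regions with a strong coding signal as coding, emphasizing the difference in PSD magnitude between the coding and non-coding regions of a DNA sequence. However, it fails to identify coding regions correctly in two cases: when the coding signal is weak, and when the coding signal is strong but the coding regions make up a high proportion of the full sequence length. PNCR as the threshold forces every sequence to have the same fixed percentage of coding region, which does not match reality. PSDN as the threshold emphasizes only the strength of the periodicity of the coding regions in a DNA sequence and disregards the non-coding regions and noise; of the three thresholds, it maximizes the chance that coding regions are identified.

        3 Summary

        The threshold problem in stand-alone gene prediction algorithms was studied and a new threshold, PSDN, was proposed. The results show that PSDN as the threshold gives the best prediction accuracy and simplifies the NPBF prediction algorithm. Compared with prediction using RSN or PNCR as the threshold, it markedly improves the results when coding regions make up a high proportion of the DNA sequence length.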

        References:

        [1] CHEN B, JI P. Visualization of the protein-coding regions with a self adaptive spectral rotation approach[J]. Nucleic Acids Research, 2011, 39(1): e3.

        [2] MEHER J, MEHER P K, DASH G. Improved comb filter based approach for effective prediction of protein coding regions in DNA sequences[J]. Journal of Signal and Information Processing, 2011, 2(2): 88-99.

        [3] MA Y T, CHE J, LU X G, et al. A new algorithm for predicting protein coding regions based on the hybrid threshold[A]. The 2012 5th International Conference on Biomedical Engineering and Informatics[C]. Chongqing: IEEE Engineering in Medicine and Biology Society, 2012. 846-849.

        [4] TIWARI S, RAMACHANDRAN S, BHATTACHARYA A, et al. Prediction of probable genes by Fourier analysis of genomic sequences[J]. Computer Applications in the Biosciences, 1997, 13(3): 263-270.

        [5] MENA-CHALCO J P, CARRER H, ZANA Y, et al. Identification of protein coding regions using the modified Gabor-Wavelet transform[J]. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2008, 5(2): 198-206.

        [6] AMBIKAIRAJAH E, EPPS J, AKHTAR M. Gene and exon prediction using time domain algorithms[A]. Proceedings of the Eighth International Symposium on Signal Processing and its Applications[C]. Sydney: Signal Processing and its Applications, 2005. 199-202.

        [7] AKHTAR M, EPPS J, AMBIKAIRAJAH E. Signal processing in sequence analysis: Advances in eukaryotic gene prediction[J]. IEEE Journal of Selected Topics in Signal Processing, 2008, 2(3): 310-321.

        [8] MA Y T, CHE J, GUAN X, et al. Protein coding region prediction algorithm based on a windowed narrow pass-band filter[J]. Journal of Data Acquisition and Processing, 2013, 28(2): 129-135. (in Chinese)

        [9] MA Y T, XUAN X W, CHE J, et al. Gene prediction based on all-phase filtering theory[J]. Journal of Shanghai Jiao Tong University, 2013, 47(7): 1149-1154. (in Chinese)

        [10] ROGIC S, MACKWORTH A K, OUELLETTE B F. Evaluation of gene-finding programs on mammalian sequences[J]. Genome Research, 2001, 11(5): 817-832.

        [11] BURSET M, GUIGO R. Evaluation of gene structure prediction programs[J]. Genomics, 1996, 34(3): 353-367.
