

        FasParser: a package for manipulating sequence data

        2017-05-06 06:17:13 Yan-Bo Sun
        Zoological Research, 2017, Issue 2


        Yan-Bo Sun1,*

        1State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China

        A computer software package called ‘FasParser’ was developed for manipulating sequence data. It can be used on personal computers to perform a series of analyses, including counting and viewing differences between two sequences at both the DNA and codon levels, identifying overlapping regions between two alignments, sorting sequences according to their IDs or lengths, concatenating sequences of multiple loci for a particular set of samples, translating nucleotide sequences to amino acids, and constructing alignments in several different formats, as well as extracting and filtering data from a particular FASTA file. Most of these functions can be run in batch mode, which is very useful for analyzing large data sets. This package can be used by a broad audience and is designed for researchers who do not have programming experience in sequence analysis. The GUI version of FasParser can be downloaded free of charge from https://github.com/Sun-Yanbo/FasParser.

        Keywords: FasParser; Batch processing; Sequence comparison; Extraction and filtration

        INTRODUCTION

        Recent developments in sequencing technology have generated vast amounts of DNA and RNA sequence data. Analyses based on these sequences are among the most important means of drawing biological inferences. The sheer volume of available sequence data has made manipulating it difficult, especially for researchers without programming experience. Hence, user-friendly software that supports batch-mode sequence extraction, filtering, translation, and file-format conversion can greatly facilitate research.

        The program package MEGA (Kumar et al., 1994), which was developed decades ago, is used worldwide. Although it has manipulation functions, such as sequence viewing and format conversion, it focuses mainly on statistical analyses of molecular evolution. Many sequence manipulations still require manual work or the use of other tools (e.g., Microsoft Excel). Examples include concatenating loci from multiple sequence files, extracting particular gene sequences from a whole genome, and filtering very short sequences out of an alignment. Another package, BioEdit (Hall, 1999), can handle most simple sequence editing and manipulation functions. However, it handles batch processing inefficiently and can only deal with one alignment file at a time.

        Herein, I present a new program package, ‘FasParser’, for manipulating sequence files. It has a user-friendly GUI and batch processing modes, which allow users to handle multiple sequence files in a simple way. Presently, the package has seven main programs/functions (Figure 1): (1) counting and viewing the differences between two sequences at the DNA and codon levels; (2) identifying overlapping columns of two alignments of the same gene; (3) sorting sequences according to ID, sequence length, or an ID list provided by the user; (4) concatenating sequences for a particular set of samples from multiple sequence files; (5) batch-translating protein-coding nucleic acid sequences into amino acids; (6) constructing alignments in different formats; and (7) extracting and filtering sequences according to ID or sequence length. FasParser is a standalone application that has been compiled and tested on the Windows 7 and Windows 10 operating systems. The size of the data that can be analyzed is limited only by available computer memory.

        BATCH PROCESSING

        This new package can batch process several commonly used procedures, including merging sequences, translating, aligning, and converting formats. For merging, it can obtain a “super sequence” by concatenating the sequences of all loci for a particular set of samples. This is useful for phylogenetic inference. The translation program can obtain amino acid sequences according to multiple genetic codes. In addition to batch processing, it can also read a single FASTA file or a single DNA sequence (manual mode), thus providing a simple way to get the amino acid sequences. Alignment construction is one of the most important manipulations of sequences, and the program can make use of three popular aligners for it: MUSCLE (Edgar, 2004), MAFFT (Katoh et al., 2002), and PRANK (Löytynoja & Goldman, 2005). The first two programs can generate final alignments quickly and automatically recognize the type of sequence (DNA or amino acid). Although PRANK is slower than the others, it produces more accurate results (Jordan & Goldman, 2012) and can directly obtain final alignments at the codon level. In addition, FasParser can convert alignments to different formats, for example from FASTA to PHYLIP, PAML, or NEXUS. Batch processing of these functions only requires a directory containing all the sequence files to be analyzed.
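
        The merging step can be illustrated with a minimal Python sketch. This is not FasParser's own code: the function name `concatenate_loci`, the dictionary-based input, and the choice to pad a sample that is missing from a locus with gap characters are all assumptions made for illustration.

```python
def concatenate_loci(loci, samples, missing="-"):
    """Join per-locus alignments into one super sequence per sample.

    loci    : list of dicts mapping sample ID -> aligned sequence for one locus
    samples : sample IDs to include, in output order
    missing : character used to pad a sample absent from a locus (assumption)
    """
    parts = {sample: [] for sample in samples}
    for index, locus in enumerate(loci):
        # Every sequence within one locus must have the same aligned length.
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {index} is empty or not aligned")
        locus_length = lengths.pop()
        for sample in samples:
            # Hypothetical policy: fill absent samples with gap characters.
            parts[sample].append(locus.get(sample, missing * locus_length))
    return {sample: "".join(chunks) for sample, chunks in parts.items()}


loci = [
    {"A": "ATG", "B": "ATA"},  # locus 1 lacks sample C
    {"A": "CC", "C": "CT"},    # locus 2 lacks sample B
]
for sample, seq in concatenate_loci(loci, ["A", "B", "C"]).items():
    print(f">{sample}\n{seq}")
```

        Padding with gaps keeps every super sequence the same length, which most phylogenetic programs expect; whether FasParser pads, skips, or reports such samples is not stated in the text.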

        Figure 1 Overview of the functions provided by FasParser

        SEQUENCE COMPARISON AND MUTATION IDENTIFICATION

        After constructing an alignment, it is often desirable to visualize the mutations or substitutions between two sequences, and/or to identify the regions on which different aligners agree for the same gene. The programs “Cmp-2Seq” and “Cmp-2Align” address these issues. Cmp-2Seq counts and displays differences between two sequences at the nucleotide and codon levels. At the codon level, the program estimates the total number of synonymous (S) and non-synonymous (N) sites for the first sequence and then calculates the number of synonymous and non-synonymous substitutions between the two sequences according to the NG86 method of Nei & Gojobori (1986). This function is useful in analyses such as cancer genomic studies that focus on understanding the selective pressures following cell proliferation (Liu et al., 2012).
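
        A simplified Python sketch of a two-sequence comparison follows. It is an illustration only and deliberately does not implement NG86: it neither counts synonymous and non-synonymous sites nor averages over mutational pathways. It simply lists nucleotide differences and labels each differing codon by comparing the translated amino acids under the standard genetic code. The function name `compare_two_sequences` and the output layout are assumptions.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def compare_two_sequences(seq1, seq2):
    """Return (nucleotide differences, codon differences) for two aligned,
    in-frame sequences. Positions and codon numbers are 1-based."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same aligned length")
    seq1, seq2 = seq1.upper(), seq2.upper()

    nucleotide_diffs = [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq1, seq2), start=1)
        if a != b
    ]

    codon_diffs = []
    usable = len(seq1) - len(seq1) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        c1, c2 = seq1[start:start + 3], seq2[start:start + 3]
        if c1 == c2:
            continue
        aa1, aa2 = CODON_TABLE.get(c1), CODON_TABLE.get(c2)
        if aa1 is None or aa2 is None:
            kind = "undetermined"  # gaps or ambiguous bases in a codon
        elif aa1 == aa2:
            kind = "synonymous"
        else:
            kind = "nonsynonymous"
        codon_diffs.append((start // 3 + 1, c1, c2, kind))
    return nucleotide_diffs, codon_diffs


nuc, cod = compare_two_sequences("ATGGCTAAA", "ATGGCCAGA")
print(nuc)
print(cod)
```

        For codons that differ at more than one position, a direct comparison like this can disagree with NG86, which weighs all shortest pathways between the two codons.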

        Cmp-2Align identifies overlapping regions between two alignments using a simple but rigorous algorithm (Figure 2). Briefly, for each base of an alignment column, the program calculates its gap-free position in the raw sequence. Next, it transforms these positions to a string vector, like “1-2-2”, meaning there are 3 sequences, and this column contains the first base of the first sequence, the second base of the second sequence and also the second base in the third sequence. Finally, the program extracts all columns with the same position-vectors between two alignments (Figure 2). This manipulation is useful for analyses such as the identification of regions informative for phylogenetic inference.
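
        The position-vector idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FasParser's implementation: both alignments are given as lists of aligned strings with the sequences in the same order, a gap is encoded as position 0, and all-gap columns are ignored. The function names are hypothetical.

```python
def position_vectors(alignment):
    """For each column, record the gap-free position of every base.

    A gap is encoded as 0 (an assumption; the text does not say how gaps
    are represented). Joining a vector with "-" gives strings like "1-2-2".
    """
    counters = [0] * len(alignment)
    vectors = []
    for column in zip(*alignment):
        vector = []
        for row, char in enumerate(column):
            if char == "-":
                vector.append(0)
            else:
                counters[row] += 1
                vector.append(counters[row])
        vectors.append(tuple(vector))
    return vectors


def overlapping_columns(aln_a, aln_b):
    """Return (column in A, column in B) pairs, 0-based, whose position
    vectors are identical in both alignments."""
    index_b = {vec: j for j, vec in enumerate(position_vectors(aln_b))}
    return [
        (i, index_b[vec])
        for i, vec in enumerate(position_vectors(aln_a))
        if vec in index_b and any(vec)  # skip all-gap columns
    ]


# Two hypothetical alignments of the same three raw sequences.
aln_a = ["AC-GT", "ACCGT", "A-CGT"]
aln_b = ["ACG-T", "ACCGT", "A-CGT"]
print(overlapping_columns(aln_a, aln_b))
```

        Here columns 0, 1, and 4 carry the same position vectors in both alignments, so they would be reported as the overlapping region, while columns 2 and 3 differ because the two alignments place the gap in the first sequence differently.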

        Figure 2 Algorithm used to compare different alignments

        The red-colored columns in the two alignments share the same position vectors and are identified by FasParser as the final overlapping regions.

        EXTRACTION AND FILTRATION

        FasParser can also extract and filter a set of sequences from a raw FASTA file (“Fas-Filter”) based on query IDs, and can remove sequences according to a length cutoff. Fas-Filter can trim a raw alignment by removing gapped columns based on a cutoff value of gap frequency. Moreover, the program can provide summary statistics for a raw alignment, such as flagging sequences that are too short and calculating the length of gap-free blocks.
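
        The two filtering operations described above can be sketched in Python as follows. This is an illustrative sketch, not Fas-Filter itself: the function names, the use of ungapped length for the length cutoff, and the rule that a column is kept when its gap frequency is less than or equal to the cutoff are all assumptions.

```python
def filter_by_length(records, min_length):
    """Keep sequences whose ungapped length is at least min_length.

    records : dict mapping sequence ID -> sequence (gaps written as "-")
    """
    return {
        seq_id: seq
        for seq_id, seq in records.items()
        if len(seq.replace("-", "")) >= min_length
    }


def drop_gappy_columns(alignment, max_gap_frequency):
    """Remove alignment columns whose gap frequency exceeds the cutoff.

    alignment         : dict mapping sequence ID -> aligned sequence
    max_gap_frequency : value in [0, 1]; 0 keeps only gap-free columns
    """
    sequences = list(alignment.values())
    if not sequences:
        return {}
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    total = len(sequences)
    keep = [
        index
        for index, column in enumerate(zip(*sequences))
        if column.count("-") / total <= max_gap_frequency
    ]
    return {
        seq_id: "".join(seq[index] for index in keep)
        for seq_id, seq in alignment.items()
    }


alignment = {"s1": "AC-GT", "s2": "A--GT", "s3": "ACTGT"}
print(drop_gappy_columns(alignment, 0.5))
print(filter_by_length(alignment, 4))
```

        Applying the column filter before the length filter, or the reverse, can give different results, so a real workflow would need to fix the order explicitly.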

        COMPARISON OF FASPARSER WITH OTHER PROGRAMS

        The FasParser package provides a graphical user interface (GUI) with several commonly used functions for sequence manipulation. The package remains limited in that it cannot perform phylogenetic inference, edit alignments, or identify open reading frames (ORFs) (Table 1). Therefore, FasParser is not a replacement for other packages, such as MEGA. Nonetheless, new functions for FasParser are under development.

        Table 1 Comparison of FasParser with other programs

        Acknowledgements

        Special thanks to Prof. Robert W. Murphy, Dr. Adeniyi Charles Adeola and Lotanna Micah Nneji for revising this manuscript, and to our colleagues for their suggestions on improving FasParser.

        Edgar RC. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5: 113.

        Hall TA. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series, 41: 95-98.

        Jordan G, Goldman N. 2012. The effects of alignment error and alignment filtering on the sitewise detection of positive selection. Molecular Biology and Evolution, 29(4): 1125-1139.

        Katoh K, Misawa K, Kuma KI, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30(14): 3059-3066.

        Kumar S, Tamura K, Nei M. 1994. MEGA: molecular evolutionary genetics analysis software for microcomputers. Computer Applications in the Biosciences, 10(2): 189-191.

        Liu J, Wang LD, Sun YB, Li EM, Xu LY, Zhang YP, Yao YG, Kong QP. 2012. Deciphering the signature of selective constraints on cancerous mitochondrial genome. Molecular Biology and Evolution, 29(4): 1255-1261.

        Löytynoja A, Goldman N. 2005. An algorithm for progressive multiple alignment of sequences with insertions. Proceedings of the National Academy of Sciences of the United States of America, 102(30): 10557-10562.

        Nei M, Gojobori T. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Molecular Biology and Evolution, 3(5): 418-426.

        DOI: 10.24272/j.issn.2095-8137.2017.017

        Received: 01 February 2017; Accepted: 01 March 2017

        The development of this package was promoted by the data accumulated through the programs of the National Natural Science Foundation of China (31671326, 81602346, and 31401958).

        E-mail: sunyanbo@mail.kiz.ac.cn
