Bioutils: Difference between revisions

Revision as of 04:35, 29 September 2014

BioPerl-based Sequence Utilities

Figure 1. Design and Methods of bioutils

What is bioutils?

bioutils is a suite of Perl scripts that provides convenient command-line access to popular BioPerl methods. Designed as UNIX-like utilities, these tools remove the need to compose one-off BioPerl scripts for routine manipulations of sequences, alignments, and trees.

The initial release of bioutils consists of four utilities (Fig 1):

  • bioseq: a wrapper of BioPerl class Bio::Seq, with additional methods
  • bioaln: a wrapper of Bio::SimpleAlign, with additional methods
  • biopop: a wrapper of Bio::PopGen, with additional methods
  • biotree: a wrapper of Bio::Tree, with additional methods

These utilities have been in development since 2002 in the lab of Dr Weigang Qiu at Hunter College of the City University of New York. They are the main code base of the Qiu Lab, which specializes in microbial evolutionary genomics, and they have proved convenient, efficient, and popular among students and researchers. By releasing bioutils as an open-source tool, we hope to (1) share our experience and (2) invite other developers to join the effort of making BioPerl more accessible.


Demos

Basic Usage

# bioseq
bioseq -l foo.fasta # print seq names and lengths from FASTA (default format) file
bioseq -s '100,200'  foo.fasta # extract a subsequence
bioseq -r foo.fasta # reverse complement
bioseq -t1 foo.fasta # translate in the +1 frame
bioseq -t3 foo.fasta # translate in +1, +2, and +3 frames
bioseq -t6 foo.fasta # translate in all 6 frames
bioseq -p'id:seq_1' foo.fasta # pick a sequence by ID
bioseq -p'order:3' foo.fasta # pick the 3rd sequence
bioseq -p're:seq_' foo.fasta # pick a sequence by regular expression
bioseq -g foo.fasta # remove all gaps
bioseq -z'CP003201' -o'genbank' # retrieve a GenBank file with accession
bioseq -z'CP003201' -o'fasta' # same file in FASTA

# bioaln
bioaln -i'fasta' -o'phylip' foo.fasta # convert a FASTA alignment to PHYLIP
bioaln -l foo.aln # print alignment length of a CLUSTALW (default format) file
bioaln -s'100,200' foo.aln # obtain an alignment slice
bioaln -m foo.aln # show only variable sites
bioaln -r'seq_2' foo.aln # use "seq_2" as reference (first) sequence
bioaln -g foo.aln # remove gapped sites
bioaln -p'seq_1,seq_3,seq_6' foo.aln # pick a subset of sequences
bioaln -d'seq_1,seq_3,seq_6' foo.aln # delete a subset of sequences

# biotree

# biopop
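
For readers who want to sanity-check output without BioPerl installed, the name-and-length listing can be approximated with POSIX awk. This is a minimal sketch, not how bioseq is implemented: `fasta_lengths` is a hypothetical helper, and the tab-separated layout is an assumption, so the actual output of `bioseq -l` may be formatted differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of bioutils): print "<id><TAB><length>"
# for every record of a FASTA file read from stdin.
fasta_lengths() {
    awk '
        # Print the record collected so far, if any.
        function emit() { if (id != "") printf "%s\t%d\n", id, len }
        # Header line: finish the previous record, keep the first word as ID.
        /^>/ { emit(); id = substr($1, 2); len = 0; next }
        # Sequence line: ignore whitespace and add to the running length.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { emit() }
    '
}

# Usage (foo.fasta as in the demos above):
#   fasta_lengths < foo.fasta
```

Sequences wrapped over several lines are summed per record, so the helper works on both single-line and wrapped FASTA.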

Power usage

# Pipe with the same utility
bioseq -s'100,200' foo.fasta | bioseq -r # subseq and then revcom
# Pipe among utilities
bioaln -o'fasta' foo.aln | bioseq -g # remove gaps within individual sequences
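
The "subseq and then revcom" pipe works because each utility reads FASTA on standard input and writes FASTA on standard output. The sketch below illustrates that filter pattern with two stand-in functions written in POSIX awk. Both `fasta_subseq` and `fasta_revcom` are hypothetical names, not part of bioutils, and they only complement A, C, G, and T (other symbols pass through unchanged), which may differ from how bioseq treats ambiguity codes.

```shell
#!/bin/sh
# Hypothetical stand-ins (not part of bioutils) showing how two FASTA
# filters compose through a pipe.

# fasta_subseq START END: keep residues START..END (1-based, inclusive).
fasta_subseq() {
    awk -v s="$1" -v e="$2" '
        function emit() {
            if (id != "") { print id; print substr(seq, s, e - s + 1) }
        }
        /^>/ { emit(); id = $0; seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    '
}

# fasta_revcom: reverse-complement every record.
fasta_revcom() {
    awk '
        function emit(   i, c, k, out) {
            if (id == "") return
            out = ""
            # Walk the sequence backwards, complementing each base.
            for (i = length(seq); i >= 1; i--) {
                c = substr(seq, i, 1)
                k = index(from, c)
                out = out (k ? substr(to, k, 1) : c)
            }
            print id
            print out
        }
        BEGIN { from = "ACGTacgt"; to = "TGCAtgca" }
        /^>/ { emit(); id = $0; seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    '
}

# Usage, mirroring: bioseq -s'100,200' foo.fasta | bioseq -r
#   fasta_subseq 100 200 < foo.fasta | fasta_revcom
```

Because each stage consumes and produces the same format, stages can be chained in any order and mixed with standard tools such as grep or head.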

Creative usage

echo -ne ">lookup\nATG\n" | bioseq -t1 # look up the amino acid encoded by a codon
len=$(bioaln -l foo.aln); len_degap=$(bioaln -g foo.aln | bioaln -l); echo "$len-$len_degap" | bc -l # count gapped sites: full alignment length minus degapped length
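
The gap count above is the difference between two alignment lengths. As an independent cross-check, the same number can be computed directly from an aligned FASTA file (for example, one written by `bioaln -o'fasta'`). The sketch below assumes that "gapped sites" means columns in which at least one sequence has a `-`; `count_gapped_columns` is a hypothetical helper, not part of bioutils.

```shell
#!/bin/sh
# Hypothetical helper (not part of bioutils): read an aligned FASTA from
# stdin and print "<alignment length> <number of gapped columns>".
count_gapped_columns() {
    awk '
        # Record which columns of the finished sequence hold a gap.
        function mark(   i) {
            if (seq == "") return
            if (length(seq) > len) len = length(seq)
            for (i = 1; i <= length(seq); i++)
                if (substr(seq, i, 1) == "-") gap[i] = 1
            seq = ""
        }
        /^>/ { mark(); next }
        { seq = seq $0 }
        END {
            mark()
            n = 0
            for (col in gap) n++
            print len + 0, n
        }
    '
}

# Usage (assuming bioaln is installed):
#   bioaln -o'fasta' foo.aln | count_gapped_columns
# The subtraction in the one-liner above can also use shell arithmetic
# instead of bc:  echo $((len - len_degap))
```

If the second number disagrees with `len - len_degap`, the likely cause is a different definition of a gapped site rather than an arithmetic error.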

Full documentation


Release notes


Main contributors

  • Yozen Hernandez
  • Weigang Qiu
  • Pedro Pagan
  • Levy Vargas