Annotate-a-genome: Difference between revisions

From QiuLab
Jump to navigation Jump to search
imported>Weigang
imported>Weigang
Line 118: Line 118:
==Predict orthologs with reciprocal BLAST==
==Predict orthologs with reciprocal BLAST==
* Commands:
* Commands:
<syntaxhighlight lang="bash" line start="7" enclose="div">
<syntaxhighlight lang="bash" line start="6" enclose="div">
makeblastdb -in b31.pep -parse_seqids # Prepare the reference DB
makeblastdb -in accession.pep -parse_seqids # Prepare the new genome DB
makeblastdb -in new.pep -parse_seqids # Prepare the new genome DB
blastp -query accession.pep -db ref.pep -outfmt 6 -evalue 1e-3 -out accession.fwd # Forward BLAST
blastp -query new.pep -db b31.pep -outfmt 6 -evalue 1e-3 -out forward_blast.out # Forward BLAST
blastp -query ref.pep -db accession.pep -outfmt 6 -evalue 1e-3 -out accession.rev # Reverse BLAST
blastp -query b31.pep -db new.pep -outfmt 6 -evalue 1e-3 -out reverse_blast.out # Reverse BLAST
./check-reciprocal.pl <accession.fwd> <accession.rev> > accession.orthlogs 2> accession.not-orthologs # Identify orthologs
./check-reciprocal.pl forward_blast.out reverse_blast.out > new.orthlogs 2> new.not-orthologs # Identify orthologs
</syntaxhighlight>
* Code for "check-reciprocal.pl":
<syntaxhighlight lang="perl" line enclose="div">
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
 
die "$0 <forward_blast_output> <reverse_blast_output>\n" unless @ARGV == 2;
my ($fwd, $rev) = @ARGV;
my (@fwd_top_hits, @rev_top_hits);
open FWD, "<" . $fwd;
my (%fwd_top_hits, %rev_top_hits, @query);
my $query_ct=0;
while (<FWD>) {
    chomp;
    my @data = split;
    next if $fwd_top_hits{$data[0]};
    $fwd_top_hits{$data[0]} = $data[1];
    push @query, $data[0];
    $query_ct++;
}
close FWD;
warn "Total query having hits:" . $query_ct . "\n";
 
open REV, "<" . $rev;
while (<REV>) {
    chomp;
    my @data = split;
    next if $rev_top_hits{$data[0]};
    $rev_top_hits{$data[0]} = $data[1];
}
close REV;
 
foreach my $q (@query) { # e.g., BafPKo_0002
    my $top = $fwd_top_hits{$q}; # e.g. BB_0002
    if ( $q eq $rev_top_hits{$top}) {
print "Found reciprocol top hits:\t", $q, "\t", $top, "\n";
    } else {
warn "Not reciprocol top hits:\t", $q, "\t", $top, "\t", $rev_top_hits{$top}, "\n";
    }
}
 
exit;
</syntaxhighlight>
</syntaxhighlight>



Revision as of 03:51, 9 March 2014

Project Goals

A Borrelia Phylogeny
  • Annotate and add newly sequenced Borrelia genomes to BorreliaBase
  • Build an informatics pipeline for gene prediction, ortholog calls, databasing, and synteny analysis

Download genome sequences from GenBank

Genome_id Strain Species Group Genome Sequences Notes
100 B31 B. burgdorferi (reference genome) Lyme Disease Reference. Already downloaded as "ref.pep"
114 CA382 B. burgdorferi (California) Lyme Disease
  • Accession: CP005925; Assigned to: HA
115 CA8 B. burgdorferi (California) Lyme Disease
  • Accession: ADMY01000001; Assigned to: AA
  • Accession: ADMY01000002; Assigned to: TAA
  • Accession: ADMY01000003; Assigned to: KD
  • Accession: ADMY01000004; Assigned to: JG
  • Accession: ADMY01000005; Assigned to: KPG
  • Accession: ADMY01000006; Assigned to: GG
  • Accession: ADMY01000007; Assigned to: TDH
304 BgVir B. garinii (Russia) Lyme Disease
  • Accession: CP003151; Assigned to: LH
  • Accession: CP003201; Assigned to: SK
  • Accession: CP003202; Assigned to: BK
305 NMJW1 B. garinii (China) Lyme Disease
  • Accession: CP003866; Assigned to: AL
402 HLJ01 B. afzelii (China) Lyme Disease
  • Accession: CP003882; Assigned to: RL
1003 Ly B. duttonii (Tanzania) Relapsing Fever
  • Accession: CP000976; Assigned to: HL
  • Accession: CP000980; Assigned to: NM
1001 A1 B. recurrentis (Ethiopia) Relapsing Fever
  • Accession: CP000993; Assigned to: JP
  • Accession: CP000994; Assigned to: DP
  • Accession: CP000995; Assigned to: GAR
1100 DAH B. hermsii (Washington State) Relapsing Fever
  • Accession: CP000048; Assigned to: KR
1200 91E135 B. turicatae (Texas) Relapsing Fever
  • Accession: CP000049; Assigned to: MDR
1002 Achema B. crocidurae (Mauritania) Relapsing Fever
  • Accession: CP003426; Assigned to: VS
1400 HR1 B. parkeri (??) Relapsing Fever
  • Accession: CP0007022; Assigned to: AV
1300 LB-2001 B. miyamotoi (Northeast US) Relapsing Fever
  • Accession: CP006647; Assigned to: LLW
107 94a B. burgdorferi (Northeast US) Lyme Disease
  • Accession: ABGK02000008; Assigned to: QZ

Protocol

Dependencies

Fetch genome sequences and extract protein sequences

  • Commands:
# These scripts are in "../../bio425/annotate-a-genome-pipeline". 
# You may either make a copy to your home directory (recommended) or run directly from that directory
./fetch-genome.pl <your_assigned_accession> # Expected output: "accession.gb"
./gb2fas.pl <accession.gb> # Expected output: "accession.pep"

Predict orthologs with reciprocal BLAST

  • Commands:
makeblastdb -in accession.pep -parse_seqids # Prepare the new genome DB
blastp -query accession.pep -db ref.pep -outfmt 6 -evalue 1e-3 -out accession.fwd # Forward BLAST
blastp -query ref.pep -db accession.pep -outfmt 6 -evalue 1e-3 -out accession.rev # Reverse BLAST
./check-reciprocal.pl <accession.fwd> <accession.rev> > accession.orthlogs 2> accession.not-orthologs # Identify orthologs

Verify with synteny broswer

./gb2fas -t new.gb > new-to-orf-table.txt
# load into database with SQL
# Visualize synteny