Annotate-a-genome: Difference between revisions
Jump to navigation
Jump to search
imported>Weigang |
imported>Weigang |
||
Line 107: | Line 107: | ||
* [http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download NCBI Standalone BLAST+] | * [http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download NCBI Standalone BLAST+] | ||
==Fetch genome sequences== | ==Fetch genome sequences and extract protein sequences== | ||
* Commands: | * Commands: | ||
<syntaxhighlight lang="bash" line enclose="div"> | <syntaxhighlight lang="bash" line enclose="div"> | ||
# These scripts are in "../../bio425/annotate-a-genome-pipeline". | |||
./ | # You may either make a copy to your home directory (recommended) or run directly from that directory | ||
./ | ./fetch-genome.pl <your_assigned_accession> # Expected output: "accession.gb" | ||
./gb2fas.pl <accession.gb> # Expected output: "accession.pep" | |||
./ | |||
< | |||
</syntaxhighlight> | </syntaxhighlight> | ||
Revision as of 03:47, 9 March 2014
Project Goals
- Annotate and add newly sequenced Borrelia genomes to BorreliaBase
- Build an informatics pipeline for gene prediction, ortholog calls, databasing, and synteny analysis
Download genome sequences from GenBank
Genome_id | Strain | Species | Group | Genome Sequences | Notes |
---|---|---|---|---|---|
100 | B31 | B. burgdorferi (reference genome) | Lyme Disease | Reference. Already downloaded as "ref.pep" | |
114 | CA382 | B. burgdorferi (California) | Lyme Disease |
| |
115 | CA8 | B. burgdorferi (California) | Lyme Disease |
| |
304 | BgVir | B. garinii (Russia) | Lyme Disease |
| |
305 | NMJW1 | B. garinii (China) | Lyme Disease |
| |
402 | HLJ01 | B. afzelii (China) | Lyme Disease |
| |
1003 | Ly | B. duttonii (Tanzania) | Relapsing Fever |
|
|
1001 | A1 | B. recurrentis (Ethiopia) | Relapsing Fever |
| |
1100 | DAH | B. hermsii (Washington State) | Relapsing Fever |
| |
1200 | 91E135 | B. turicatae (Texas) | Relapsing Fever |
| |
1002 | Achema | B. crocidurae (Mauritania) | Relapsing Fever |
|
|
1400 | HR1 | B. parkeri (??) | Relapsing Fever |
| |
1300 | LB-2001 | B. miyamotoi (Northeast US) | Relapsing Fever |
| |
107 | 94a | B. burgdorferi (Northeast US) | Lyme Disease |
|
Protocol
Dependencies
- BASH (default shell of Linux OS and Apple OS X)
- Perl and BioPerl
- DNATweezer
- NCBI Standalone BLAST+
Fetch genome sequences and extract protein sequences
- Commands:
# These scripts are in "../../bio425/annotate-a-genome-pipeline".
# You may either make a copy to your home directory (recommended) or run directly from that directory
./fetch-genome.pl <your_assigned_accession> # Expected output: "accession.gb"
./gb2fas.pl <accession.gb> # Expected output: "accession.pep"
Predict orthologs with reciprocal BLAST
- Commands:
makeblastdb -in b31.pep -parse_seqids # Prepare the reference DB
makeblastdb -in new.pep -parse_seqids # Prepare the new genome DB
blastp -query new.pep -db b31.pep -outfmt 6 -evalue 1e-3 -out forward_blast.out # Forward BLAST
blastp -query b31.pep -db new.pep -outfmt 6 -evalue 1e-3 -out reverse_blast.out # Reverse BLAST
./check-reciprocal.pl forward_blast.out reverse_blast.out > new.orthlogs 2> new.not-orthologs # Identify orthologs
- Code for "check-reciprocal.pl":
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
die "$0 <forward_blast_output> <reverse_blast_output>\n" unless @ARGV == 2;
my ($fwd, $rev) = @ARGV;
my (@fwd_top_hits, @rev_top_hits);
open FWD, "<" . $fwd;
my (%fwd_top_hits, %rev_top_hits, @query);
my $query_ct=0;
while (<FWD>) {
chomp;
my @data = split;
next if $fwd_top_hits{$data[0]};
$fwd_top_hits{$data[0]} = $data[1];
push @query, $data[0];
$query_ct++;
}
close FWD;
warn "Total query having hits:" . $query_ct . "\n";
open REV, "<" . $rev;
while (<REV>) {
chomp;
my @data = split;
next if $rev_top_hits{$data[0]};
$rev_top_hits{$data[0]} = $data[1];
}
close REV;
foreach my $q (@query) { # e.g., BafPKo_0002
my $top = $fwd_top_hits{$q}; # e.g. BB_0002
if ( $q eq $rev_top_hits{$top}) {
print "Found reciprocol top hits:\t", $q, "\t", $top, "\n";
} else {
warn "Not reciprocol top hits:\t", $q, "\t", $top, "\t", $rev_top_hits{$top}, "\n";
}
}
exit;
Verify with synteny broswer
./gb2fas -t new.gb > new-to-orf-table.txt
# load into database with SQL
# Visualize synteny