Annotate-a-genome: Difference between revisions
imported>Weigang m (→Protocol) |
imported>Weigang |
(48 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
=Project Goals= | =Project Goals= | ||
* Annotate and add newly sequenced Borrelia genomes to [ | [[File:Borrelia-tree.png|thumbnail|A Borrelia Phylogeny]] | ||
* Annotate and add newly sequenced Borrelia genomes to [http://borreliabase.org BorreliaBase] | |||
* Build an informatics pipeline for gene prediction, ortholog calls, databasing, and synteny analysis | * Build an informatics pipeline for gene prediction, ortholog calls, databasing, and synteny analysis | ||
= | |||
=Claim your assigned genome= | |||
{| class="wikitable sortable" | {| class="wikitable sortable" | ||
|- | |- | ||
! Strain !! Species !! Group !! Genome Sequences !! Notes | ! Genome_id !! Strain !! Species !! Group !! Genome Sequences !! Notes | ||
|- | |- | ||
| B31 || B. burgdorferi (reference genome) || Lyme Disease || | |-style="background-color:powderblue;" | ||
| 100 || B31 || B. burgdorferi (reference genome) || Lyme Disease || | |||
* [http://www.ncbi.nlm.nih.gov/nuccore/AE000783.1 main chromosome] | * [http://www.ncbi.nlm.nih.gov/nuccore/AE000783.1 main chromosome] | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/AE000792.1 cp26 plasmid] | * [http://www.ncbi.nlm.nih.gov/nuccore/AE000792.1 cp26 plasmid] | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/AE000790.2 lp54 plasmid] | * [http://www.ncbi.nlm.nih.gov/nuccore/AE000790.2 lp54 plasmid] | ||
|| | || Reference. Already downloaded as "ref.pep" | ||
|- | |- | ||
| CA382 || B. burgdorferi (California) || Lyme Disease || | | 114 || CA382 || B. burgdorferi (California) || Lyme Disease || | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP005925.1 main chromosome] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP005925.1 main chromosome] | ||
|| | || | ||
* Accession: CP005925; Assigned to: HA | |||
|- | |- | ||
| CA8 || B. burgdorferi (California) || Lyme Disease || | | 115 || CA8 || B. burgdorferi (California) || Lyme Disease || | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/?term=ADMY01000001:ADMY01000007%5Baccn%5D | * [http://www.ncbi.nlm.nih.gov/nuccore/?term=ADMY01000001:ADMY01000007%5Baccn%5D 7 unassembled contigs] | ||
|| | || | ||
* Accession: ADMY01000001; Assigned to: AA | |||
* Accession: ADMY01000002; Assigned to: TAA | |||
* Accession: ADMY01000003; Assigned to: KD | |||
* Accession: ADMY01000004; Assigned to: JG | |||
* Accession: ADMY01000005; Assigned to: KPG | |||
* Accession: ADMY01000006; Assigned to: GG | |||
* Accession: ADMY01000007; Assigned to: TDH | |||
|- | |- | ||
| BgVir || B. garinii (Russia) || Lyme Disease || | | 304 || BgVir || B. garinii (Russia) || Lyme Disease || | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP003151.1 main chromosome] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP003151.1 main chromosome] | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP003201.1 cp26] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP003201.1 cp26] | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP003202.1 lp54] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP003202.1 lp54] | ||
|| | || | ||
* Accession: CP003151; Assigned to: LH | |||
* Accession: CP003201; Assigned to: SK | |||
* Accession: CP003202; Assigned to: BK | |||
|- | |- | ||
| NMJW1 || B. garinii (China) || Lyme Disease || | | 305 || NMJW1 || B. garinii (China) || Lyme Disease || | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP003866.1 main chromosome] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP003866.1 main chromosome] | ||
|| | || | ||
* Accession: CP003866; Assigned to: AL | |||
|- | |- | ||
| HLJ01 || B. afzelii (China) || Lyme Disease || | | 402 || HLJ01 || B. afzelii (China) || Lyme Disease || | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP003882.1 main chromosome] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP003882.1 main chromosome] | ||
|| | || | ||
* Accession: CP003882; Assigned to: RL | |||
|- | |- | ||
| Ly || B. duttonii (Tanzania) || Relapsing Fever || | | 1003|| Ly || B. duttonii (Tanzania) || Relapsing Fever || | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP000976.1 main chromosome] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP000976.1 main chromosome] | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP000980.1 lp23 (homolog of cp26 in LD genomes)] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP000980.1 lp23 (homolog of cp26 in LD genomes)] | ||
* Many other plasmids (not to include) | * Many other plasmids (not to include) | ||
|| | || | ||
* Accession: CP000976; Assigned to: HL | |||
* Accession: CP000980; Assigned to: NM | |||
|- | |- | ||
| A1 ||B. recurrentis (Ethiopia) || Relapsing Fever || | | 1001 || A1 ||B. recurrentis (Ethiopia) || Relapsing Fever || | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP000993.1 main chromosome] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP000993.1 main chromosome] | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP000994.1 lp124 (homologous to lp54 in LD genomes)] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP000994.1 lp124 (homologous to lp54 in LD genomes)] | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP000995.1 lp23] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP000995.1 lp23] | ||
|| | || | ||
* Accession: CP000993; Assigned to: JP | |||
* Accession: CP000994; Assigned to: DP | |||
* Accession: CP000995; Assigned to: GAR | |||
|- | |- | ||
| DAH || B. hermsii (Washington State) || Relapsing Fever || | | 1100 || DAH || B. hermsii (Washington State) || Relapsing Fever || | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP000048.1 main chromosome] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP000048.1 main chromosome] | ||
|| | || | ||
* Accession: CP000048; Assigned to: KR | |||
|- | |- | ||
| 91E135 || B. turicatae (Texas) || Relapsing Fever || | | 1101 || MTW || B. hermsii (?) || Relapsing Fever || | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP005680.1 main chromosome] | |||
|| | |||
* Accession: CP005680; Assigned to: ? | |||
|- | |||
| 1102 || YBT || B. hermsii (?) || Relapsing Fever || | |||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP005706.1 main chromosome] | |||
|| | |||
* Accession: CP005706; Assigned to: ? | |||
|- | |||
| ?? || SLO || B. parkeri (?) || Relapsing Fever || | |||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP005851.1 main chromosome] | |||
|| | |||
* Accession: CP005851; Assigned to: ? | |||
|- | |||
| 1103 || YOR || B. hermsii (?) || Relapsing Fever || | |||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP004146.1 main chromosome] | |||
|| | |||
* Accession: CP004146; Assigned to: ? | |||
|- | |||
| 1200 || 91E135 || B. turicatae (Texas) || Relapsing Fever || | |||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP000049.1 main chromosome] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP000049.1 main chromosome] | ||
|| | || | ||
* Accession: CP000049; Assigned to: MDR | |||
|- | |- | ||
| Achema || B. crocidurae (Mauritania) || Relapsing Fever || | | 1002 || Achema || B. crocidurae (Mauritania) || Relapsing Fever || | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP003426.1 main chromosome] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP003426.1 main chromosome] | ||
* Many unassembled plasmids (not to include) | * Many unassembled plasmids (not to include) | ||
|| | || | ||
* Accession: CP003426; Assigned to: VS | |||
|- | |- | ||
| HR1 || B. parkeri (??) || Relapsing Fever || | |1400 || HR1 || B. parkeri (??) || Relapsing Fever || | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP007022.1 main chromosome] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP007022.1 main chromosome] | ||
|| | || | ||
* Accession: CP007022; Assigned to: AV | |||
|- | |- | ||
| LB-2001 || B. miyamotoi (Northeast US) || Relapsing Fever || | | 1300 || LB-2001 || B. miyamotoi (Northeast US) || Relapsing Fever || | ||
* [http://www.ncbi.nlm.nih.gov/nuccore/CP006647.1 main chromosome] | * [http://www.ncbi.nlm.nih.gov/nuccore/CP006647.1 main chromosome] | ||
|| | || | ||
* Accession: CP006647; Assigned to: LLW | |||
|- | |||
| 107 || 94a || B. burgdorferi (Northeast US) || Lyme Disease || | |||
* [http://www.ncbi.nlm.nih.gov/nuccore/ABGK02000008.1 main chromosome] | |||
|| | |||
* Accession: ABGK02000008; Assigned to: QZ | |||
|} | |} | ||
=Protocol= | =Protocol= | ||
==Fetch genome sequences== | ==Dependencies== | ||
* BASH (default shell of Linux OS and Apple OS X) | |||
* Perl and [http://bioperl.org BioPerl] | |||
* [http://sourceforge.net/projects/dnatwizzer/ DNATweezer] | |||
* [http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download NCBI Standalone BLAST+] | |||
==Part 1. Fetch genome sequences and extract protein sequences== | |||
* '''Note''': These scripts are in "../../bio425/annotate-a-genome-pipeline". You may either make a copy to your home directory (recommended) or run directly from that directory by including the path | |||
* Commands: | |||
<syntaxhighlight lang="bash" line enclose="div"> | <syntaxhighlight lang="bash" line enclose="div"> | ||
./ | ./fetch-genome.pl <your_assigned_accession> # Expected output: "accession.gb" | ||
./gb2pep.pl <accession.gb> # Expected output: "accession.pep" | |||
./ | |||
</syntaxhighlight> | </syntaxhighlight> | ||
==Predict orthologs with reciprocal BLAST== | |||
<syntaxhighlight lang="bash" line start=" | ==Part 2. Predict orthologs with reciprocal BLAST== | ||
makeblastdb -in b31.pep -parse_seqids # Prepare the | * '''Note 1: Replace the "accession" in the following commands with your assigned accession number.''' | ||
makeblastdb -in | * '''Note 2: You may have to lower the identity (-d, default 90) and length coverage (-l, default 90) cutoffs of find-reciprocal.pl for relapsing fever genomes.''' | ||
blastp -query | * Commands: | ||
blastp -query b31.pep -db | <syntaxhighlight lang="bash" line start="3" enclose="div"> | ||
./ | cp ../../bio425/annotate-genome-pipeline/b31.pep . # get reference genome | ||
makeblastdb -in b31.pep -dbtype 'prot' -parse_seqids -out ref # Prepare the ref genome DB | |||
makeblastdb -in accession.pep -dbtype 'prot' -parse_seqids -out mygenome # Prepare the new genome DB | |||
blastp -query accession.pep -db ref -outfmt '6 qseqid sseqid pident length qlen evalue' -evalue 1e-3 -out accession.fwd # Forward BLAST # customized outfmt 6 | |||
blastp -query b31.pep -db mygenome -outfmt '6 qseqid sseqid pident length qlen evalue' -evalue 1e-3 -out accession.rev # Reverse BLAST | |||
./find-reciprocal.pl accession.fwd accession.rev > accession.orthologs 2> accession.not-orthologs # Identify orthologs. | |||
# Check results. If the orthologs are fewer than 90% of the total ORFs in the GenBank file, rerun the previous command with relaxed stringency (e.g., "-d 80" for an 80% identity cutoff) | |||
wc accession.orthologs | |||
wc accession.not-orthologs | |||
</syntaxhighlight> | </syntaxhighlight> | ||
== | |||
==Part 3. Generate database tables & deposit your results== | |||
* '''Note: Use your assigned genome_id in the table above as the argument for the "-g" option''' | |||
<syntaxhighlight lang="bash" line start="12" enclose="div"> | <syntaxhighlight lang="bash" line start="12" enclose="div"> | ||
./ | ./gb2table.pl -g <genome_id> -c <accession.gb> # Expected output: "accession.contig.txt" | ||
./gb2table.pl -g <genome_id> -f <accession.gb> # Expected output: "accession.orf.txt" | |||
# Check your outputs | |||
wc accession.contig.txt | |||
head accession.contig.txt | |||
tail accession.contig.txt | |||
wc accession.orf.txt | |||
head accession.orf.txt | |||
tail accession.orf.txt | |||
cp accession.contig.txt ../../bio425/annotate-a-genome-results/your_initial.accession.contig.txt | |||
cp accession.orf.txt ../../bio425/annotate-a-genome-results/your_initial.accession.orf.txt | |||
cp accession.orthologs ../../bio425/annotate-a-genome-results/your_initial.accession.orth | |||
cp accession.not-orthologs ../../bio425/annotate-a-genome-results/your_initial.accession.not-orth | |||
</syntaxhighlight> | </syntaxhighlight> |
Latest revision as of 05:24, 23 May 2014
Project Goals
- Annotate and add newly sequenced Borrelia genomes to BorreliaBase
- Build an informatics pipeline for gene prediction, ortholog calls, databasing, and synteny analysis
Claim your assigned genome
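Genome_id | Strain | Species | Group | Genome Sequences | Notes |
---|---|---|---|---|---|
100 | B31 | B. burgdorferi (reference genome) | Lyme Disease | main chromosome (AE000783.1); cp26 plasmid (AE000792.1); lp54 plasmid (AE000790.2) | Reference. Already downloaded as "ref.pep" |
114 | CA382 | B. burgdorferi (California) | Lyme Disease | main chromosome (CP005925.1) | CP005925 assigned to HA |
115 | CA8 | B. burgdorferi (California) | Lyme Disease | 7 unassembled contigs (ADMY01000001 to ADMY01000007) | ADMY01000001 assigned to AA; ADMY01000002 assigned to TAA; ADMY01000003 assigned to KD; ADMY01000004 assigned to JG; ADMY01000005 assigned to KPG; ADMY01000006 assigned to GG; ADMY01000007 assigned to TDH |
304 | BgVir | B. garinii (Russia) | Lyme Disease | main chromosome (CP003151.1); cp26 (CP003201.1); lp54 (CP003202.1) | CP003151 assigned to LH; CP003201 assigned to SK; CP003202 assigned to BK |
305 | NMJW1 | B. garinii (China) | Lyme Disease | main chromosome (CP003866.1) | CP003866 assigned to AL |
402 | HLJ01 | B. afzelii (China) | Lyme Disease | main chromosome (CP003882.1) | CP003882 assigned to RL |
1003 | Ly | B. duttonii (Tanzania) | Relapsing Fever | main chromosome (CP000976.1); lp23, homolog of cp26 in LD genomes (CP000980.1); many other plasmids (not to include) | CP000976 assigned to HL; CP000980 assigned to NM |
1001 | A1 | B. recurrentis (Ethiopia) | Relapsing Fever | main chromosome (CP000993.1); lp124, homologous to lp54 in LD genomes (CP000994.1); lp23 (CP000995.1) | CP000993 assigned to JP; CP000994 assigned to DP; CP000995 assigned to GAR |
1100 | DAH | B. hermsii (Washington State) | Relapsing Fever | main chromosome (CP000048.1) | CP000048 assigned to KR |
1101 | MTW | B. hermsii (?) | Relapsing Fever | main chromosome (CP005680.1) | CP005680 assigned to ? |
1102 | YBT | B. hermsii (?) | Relapsing Fever | main chromosome (CP005706.1) | CP005706 assigned to ? |
?? | SLO | B. parkeri (?) | Relapsing Fever | main chromosome (CP005851.1) | CP005851 assigned to ? |
1103 | YOR | B. hermsii (?) | Relapsing Fever | main chromosome (CP004146.1) | CP004146 assigned to ? |
1200 | 91E135 | B. turicatae (Texas) | Relapsing Fever | main chromosome (CP000049.1) | CP000049 assigned to MDR |
1002 | Achema | B. crocidurae (Mauritania) | Relapsing Fever | main chromosome (CP003426.1); many unassembled plasmids (not to include) | CP003426 assigned to VS |
1400 | HR1 | B. parkeri (??) | Relapsing Fever | main chromosome (CP007022.1) | CP007022 assigned to AV |
1300 | LB-2001 | B. miyamotoi (Northeast US) | Relapsing Fever | main chromosome (CP006647.1) | CP006647 assigned to LLW |
107 | 94a | B. burgdorferi (Northeast US) | Lyme Disease | main chromosome (ABGK02000008.1) | ABGK02000008 assigned to QZ |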
Protocol
Dependencies
- BASH (default shell of Linux OS and Apple OS X)
- Perl and BioPerl
- DNATweezer
- NCBI Standalone BLAST+
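Before starting, it can help to confirm that the command-line tools listed above are on your PATH. The following is a minimal sketch; the function name check_deps is hypothetical and not part of the course pipeline.

```shell
# check_deps: report which of the given commands are missing from PATH.
# Hypothetical helper for illustration; not part of the course pipeline.
check_deps() {
    missing=0
    for cmd in "$@"; do
        # "command -v" succeeds only if the shell can resolve the command
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    # Exit status 0 if everything was found, 1 otherwise
    return "$missing"
}
```

For example, "check_deps perl makeblastdb blastp" prints nothing when all three tools are installed. BioPerl can be checked separately with "perl -MBio::SeqIO -e 1", which exits silently when the module loads.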
Part 1. Fetch genome sequences and extract protein sequences
- Note: These scripts are in "../../bio425/annotate-a-genome-pipeline". You may either make a copy to your home directory (recommended) or run directly from that directory by including the path
- Commands:
./fetch-genome.pl <your_assigned_accession> # Expected output: "accession.gb"
./gb2pep.pl <accession.gb> # Expected output: "accession.pep"
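One way to sanity-check the output of Part 1 is to count the protein records. This sketch assumes that accession.pep is a FASTA file with one ">" header line per protein; the page does not state the format explicitly, although makeblastdb in Part 2 takes the file as input. The helper name count_fasta_records is hypothetical.

```shell
# count_fasta_records FILE
# Print the number of FASTA records, i.e. lines that start with ">".
# Assumes FILE is FASTA; hypothetical helper, not part of the pipeline.
count_fasta_records() {
    grep -c '^>' "$1"
}
```

Running "count_fasta_records accession.pep" should print a number close to the count of protein-coding genes annotated in accession.gb; a count of zero suggests the extraction step failed.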
Part 2. Predict orthologs with reciprocal BLAST
- Note 1: Replace the "accession" in the following commands with your assigned accession number.
- Note 2: You may have to lower the identity (-d, default 90) and length coverage (-l, default 90) cutoffs of find-reciprocal.pl for relapsing fever genomes.
- Commands:
cp ../../bio425/annotate-genome-pipeline/b31.pep . # get reference genome
makeblastdb -in b31.pep -dbtype 'prot' -parse_seqids -out ref # Prepare the ref genome DB
makeblastdb -in accession.pep -dbtype 'prot' -parse_seqids -out mygenome # Prepare the new genome DB
blastp -query accession.pep -db ref -outfmt '6 qseqid sseqid pident length qlen evalue' -evalue 1e-3 -out accession.fwd # Forward BLAST # customized outfmt 6
blastp -query b31.pep -db mygenome -outfmt '6 qseqid sseqid pident length qlen evalue' -evalue 1e-3 -out accession.rev # Reverse BLAST
./find-reciprocal.pl accession.fwd accession.rev > accession.orthologs 2> accession.not-orthologs # Identify orthologs.
# Check results. If the orthologs are fewer than 90% of the total ORFs in the GenBank file, rerun the previous command with relaxed stringency (e.g., "-d 80" for an 80% identity cutoff)
wc accession.orthologs
wc accession.not-orthologs
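The internals of find-reciprocal.pl are not shown on this page, so the following is only a sketch of one common reading of "reciprocal best hit", not the script's actual logic. It assumes the six-column format requested above (qseqid sseqid pident length qlen evalue), computes coverage as 100 * length / qlen, and relies on BLAST listing the hits of each query best first. The function names and the one-pair-per-line output layout are hypothetical.

```shell
# reciprocal_best_hits FWD REV [MIN_IDENTITY] [MIN_COVERAGE]
# Print "query<TAB>reference" for pairs that are each other's first
# passing hit in both BLAST directions. Cutoffs default to 90, mirroring
# the -d and -l defaults mentioned in Note 2 (assumed to mean the same thing).
reciprocal_best_hits() {
    awk -v d="${3:-90}" -v l="${4:-90}" '
        # A hit passes if percent identity and query coverage meet the cutoffs.
        function passes() { return $3 >= d && 100 * $4 / $5 >= l }
        # First file (forward BLAST): keep the first passing hit per query.
        NR == FNR { if (passes() && !($1 in fwd)) fwd[$1] = $2; next }
        # Second file (reverse BLAST): keep the first passing hit per reference protein.
        passes() && !($1 in rev) { rev[$1] = $2 }
        END {
            for (q in fwd) {
                s = fwd[q]
                # Reciprocal: the reference protein points back to the same query.
                if ((s in rev) && rev[s] == q) print q "\t" s
            }
        }
    ' "$1" "$2" | sort
}

# ortholog_percent ORTHOLOG_FILE PEP_FILE
# Print the integer percentage of proteins in PEP_FILE (FASTA, assumed)
# that have an ortholog, assuming one ortholog pair per non-empty line.
ortholog_percent() {
    n_orth=$(grep -c . "$1" || true)
    n_orf=$(grep -c '^>' "$2" || true)
    [ "$n_orf" -gt 0 ] || { echo "no protein records in $2" >&2; return 1; }
    echo $(( 100 * n_orth / n_orf ))
}
```

With these helpers, "ortholog_percent accession.orthologs accession.pep" gives a quick number to compare against the 90% guideline, and passing 80 as the third argument of reciprocal_best_hits mimics relaxing the identity cutoff.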
Part 3. Generate database tables & deposit your results
- Note: Use your assigned genome_id in the table above as the argument for the "-g" option
./gb2table.pl -g <genome_id> -c <accession.gb> # Expected output: "accession.contig.txt"
./gb2table.pl -g <genome_id> -f <accession.gb> # Expected output: "accession.orf.txt"
# Check your outputs
wc accession.contig.txt
head accession.contig.txt
tail accession.contig.txt
wc accession.orf.txt
head accession.orf.txt
tail accession.orf.txt
cp accession.contig.txt ../../bio425/annotate-a-genome-results/your_initial.accession.contig.txt
cp accession.orf.txt ../../bio425/annotate-a-genome-results/your_initial.accession.orf.txt
cp accession.orthologs ../../bio425/annotate-a-genome-results/your_initial.accession.orth
cp accession.not-orthologs ../../bio425/annotate-a-genome-results/your_initial.accession.not-orth
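The four copy commands above can be wrapped in a small function so that the renaming stays consistent. This is a sketch under stated assumptions: the name deposit_results is hypothetical, and the refusal to copy missing or empty files is an added safeguard that the original commands do not have.

```shell
# deposit_results INITIALS ACCESSION DEST_DIR
# Copy the four result files for ACCESSION from the current directory into
# DEST_DIR, prefixed with INITIALS and renamed as in the commands above.
# Hypothetical helper; stops at the first missing or empty file.
deposit_results() {
    initials=$1
    acc=$2
    dest=$3
    # Each item is source_suffix:destination_suffix
    for pair in contig.txt:contig.txt orf.txt:orf.txt orthologs:orth not-orthologs:not-orth; do
        src="$acc.${pair%%:*}"
        dst="$dest/$initials.$acc.${pair##*:}"
        if [ ! -s "$src" ]; then
            echo "missing or empty: $src" >&2
            return 1
        fi
        cp "$src" "$dst"
    done
}
```

For example, "deposit_results your_initial accession ../../bio425/annotate-a-genome-results" performs the same four copies as the commands above, with your own initials and accession substituted.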