Annotate-a-genome

From QiuLab

Project Goals

A Borrelia Phylogeny
  • Annotate and add newly sequenced Borrelia genomes to BorreliaBase
  • Build an informatics pipeline for gene prediction, ortholog calls, databasing, and synteny analysis

Claim your assigned genome

Genome_id Strain Species Group Genome Sequences Notes
100 B31 B. burgdorferi (reference genome) Lyme Disease Reference. Already downloaded as "ref.pep"
114 CA382 B. burgdorferi (California) Lyme Disease
  • Accession: CP005925; Assigned to: HA
115 CA8 B. burgdorferi (California) Lyme Disease
  • Accession: ADMY01000001; Assigned to: AA
  • Accession: ADMY01000002; Assigned to: TAA
  • Accession: ADMY01000003; Assigned to: KD
  • Accession: ADMY01000004; Assigned to: JG
  • Accession: ADMY01000005; Assigned to: KPG
  • Accession: ADMY01000006; Assigned to: GG
  • Accession: ADMY01000007; Assigned to: TDH
304 BgVir B. garinii (Russia) Lyme Disease
  • Accession: CP003151; Assigned to: LH
  • Accession: CP003201; Assigned to: SK
  • Accession: CP003202; Assigned to: BK
305 NMJW1 B. garinii (China) Lyme Disease
  • Accession: CP003866; Assigned to: AL
402 HLJ01 B. afzelii (China) Lyme Disease
  • Accession: CP003882; Assigned to: RL
1003 Ly B. duttonii (Tanzania) Relapsing Fever
  • Accession: CP000976; Assigned to: HL
  • Accession: CP000980; Assigned to: NM
1001 A1 B. recurrentis (Ethiopia) Relapsing Fever
  • Accession: CP000993; Assigned to: JP
  • Accession: CP000994; Assigned to: DP
  • Accession: CP000995; Assigned to: GAR
1100 DAH B. hermsii (Washington State) Relapsing Fever
  • Accession: CP000048; Assigned to: KR
1101 MTW B. hermsii (?) Relapsing Fever
  • Accession: CP005680; Assigned to: ?
1102 YBT B. hermsii (?) Relapsing Fever
  • Accession: CP005706; Assigned to: ?
?? SLO B. parkeri (?) Relapsing Fever
  • Accession: CP005706; Assigned to: ?
1103 YOR B. hermsii (?) Relapsing Fever
  • Accession: CP004146; Assigned to: ?
1200 91E135 B. turicatae (Texas) Relapsing Fever
  • Accession: CP000049; Assigned to: MDR
1002 Achema B. crocidurae (Mauritania) Relapsing Fever
  • Accession: CP003426; Assigned to: VS
1400 HR1 B. parkeri (??) Relapsing Fever
  • Accession: CP0007022; Assigned to: AV
1300 LB-2001 B. miyamotoi (Northeast US) Relapsing Fever
  • Accession: CP006647; Assigned to: LLW
107 94a B. burgdorferi (Northeast US) Lyme Disease
  • Accession: ABGK02000008; Assigned to: QZ

Protocol

Dependencies

Part 1. Fetch genome sequences and extract protein sequences

  • Note: The scripts are in "../../bio425/annotate-a-genome-pipeline". Either copy them to your home directory (recommended) or run them in place by prefixing each script name with that path.
  • Commands:
./fetch-genome.pl <your_assigned_accession> # Expected output: "accession.gb"
./gb2pep.pl <accession.gb> # Expected output: "accession.pep"
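
A quick sanity check on the extracted proteins is to count the FASTA records. The sketch below assumes that gb2pep.pl writes standard FASTA with one ">" header line per protein (not confirmed here); count_fasta and the demo file are hypothetical and only illustrate the idea.

```shell
# count_fasta FILE: count FASTA records (one ">" header line per sequence).
# Assumption: the .pep file is standard FASTA; count_fasta is a hypothetical helper.
count_fasta() {
    grep -c '^>' "$1"
}

# Demo on a tiny made-up protein file with two records
demo_pep=$(mktemp)
printf '>orf1\nMKKL\n>orf2\nMSTA\nLLK\n' > "$demo_pep"
count_fasta "$demo_pep"   # for real data, pass your own accession.pep instead
```

The count should be close to the number of CDS features in the GenBank file; a count of zero suggests the download or the extraction failed.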

Part 2. Predict orthologs with reciprocal BLAST

  • Note 1: Replace the "accession" in the following commands with your assigned accession number.
  • Note 2: For relapsing fever genomes, you may have to lower the find-reciprocal.pl cutoffs for percent identity (-d, default 90) and length coverage (-l, default 90)
  • Commands:
cp ../../bio425/annotate-a-genome-pipeline/b31.pep . # Get the reference genome
makeblastdb -in b31.pep -dbtype 'prot' -parse_seqids -out ref # Prepare the ref genome DB
makeblastdb -in accession.pep -dbtype 'prot' -parse_seqids -out mygenome # Prepare the new genome DB
blastp -query accession.pep -db ref -outfmt '6 qseqid sseqid pident length qlen evalue' -evalue 1e-3 -out accession.fwd # Forward BLAST # customized outfmt 6
blastp -query b31.pep -db mygenome -outfmt '6 qseqid sseqid pident length qlen evalue' -evalue 1e-3 -out accession.rev # Reverse BLAST
./find-reciprocal.pl <accession.fwd> <accession.rev> > accession.orthologs 2> accession.not-orthologs # Identify orthologs.
# Check the results. If the orthologs make up less than 90% of the total ORFs in the GenBank file, rerun the previous command with relaxed stringency (e.g., "-d 80" for an 80% identity cutoff)
wc accession.orthologs
wc accession.not-orthologs
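
For intuition, the reciprocal-best-hit idea can be sketched in a few lines of awk. This is not the code of find-reciprocal.pl, whose internals and output format are not shown on this page; it is a minimal illustration that assumes the six-column tabular layout requested in the blastp commands above, and that BLAST lists the hits of each query best first. The names rbh and percent_of and the demo tables are hypothetical.

```shell
# rbh FWD REV MIN_ID MIN_COV: print "new_id<TAB>ref_id" for reciprocal best hits.
# Assumed columns (as in the blastp calls above): qseqid sseqid pident length qlen evalue
rbh() {
    awk -F'\t' -v min_id="$3" -v min_cov="$4" '
        # Drop hits below the identity or query-coverage cutoff
        $3 < min_id || 100 * $4 / $5 < min_cov { next }
        # Forward table: keep the first (best) passing hit of each new-genome ORF
        FNR == NR { if (!($1 in fwd)) fwd[$1] = $2; next }
        # Reverse table: look only at the first passing hit of each reference ORF
        !($1 in seen) { seen[$1] = 1; if (($2 in fwd) && fwd[$2] == $1) print $2 "\t" $1 }
    ' "$1" "$2"
}

# percent_of PART TOTAL: print 100 * PART / TOTAL with one decimal place
percent_of() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

# Made-up demo tables: three new-genome ORFs searched against three reference ORFs
fwd_demo=$(mktemp); rev_demo=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    new1 BB_0001 95.0 100 100 1e-50 \
    new1 BB_0002 40.0 50 100 1e-5 \
    new2 BB_0002 92.0 95 100 1e-40 \
    new3 BB_0001 91.0 98 100 1e-30 > "$fwd_demo"
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    BB_0001 new1 95.0 100 100 1e-50 \
    BB_0001 new3 91.0 98 100 1e-30 \
    BB_0002 new2 92.0 95 100 1e-40 \
    BB_0003 new2 50.0 60 100 1e-4 > "$rev_demo"

rbh "$fwd_demo" "$rev_demo" 90 90            # new1/BB_0001 and new2/BB_0002 pair up
n_orth=$(rbh "$fwd_demo" "$rev_demo" 90 90 | grep -c .)
percent_of "$n_orth" 3                       # share of the 3 demo ORFs with an ortholog
```

In the demo, new3 also hits BB_0001 but is not its best hit in the reverse direction, so it is not reported. The percentage is the figure to compare against the 90% guideline before relaxing the cutoffs.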

Part 3. Generate database tables & deposit your results

  • Note: Use your assigned genome_id in the table above as the argument for the "-g" option
./gb2table.pl -g <genome_id> -c <accession.gb> # Expected output: "accession.contig.txt"
./gb2table.pl -g <genome_id> -f <accession.gb> # Expected output: "accession.orf.txt"
# Check your outputs
wc <accession.contig.txt>
head <accession.contig.txt>
tail <accession.contig.txt>
wc <accession.orf.txt>
head <accession.orf.txt>
tail <accession.orf.txt>
cp accession.contig.txt ../../bio425/annotate-a-genome-results/your_initial.accession.contig.txt
cp accession.orf.txt ../../bio425/annotate-a-genome-results/your_initial.accession.orf.txt
cp accession.orthologs ../../bio425/annotate-a-genome-results/your_initial.accession.orth
cp accession.not-orthologs ../../bio425/annotate-a-genome-results/your_initial.accession.not-orth
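
Beyond wc, head, and tail, one more check worth running before depositing a table is whether every row has the same number of fields. The sketch below assumes the gb2table.pl outputs are tab-delimited, which is not stated on this page; field_counts and the demo table are hypothetical.

```shell
# field_counts FILE: print "<fields> <rows>" for each distinct field count.
# Assumption: the table is tab-delimited; a clean table yields a single line.
field_counts() {
    awk -F'\t' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}

# Made-up demo table: two complete rows and one row that is missing a field
demo_table=$(mktemp)
printf '100\tcp001\t1\t900\n100\tcp001\t950\t1800\n100\tcp001\t2000\n' > "$demo_table"
field_counts "$demo_table"   # for real data, pass accession.contig.txt or accession.orf.txt
```

More than one output line means some rows are short or long and should be inspected before the file is copied to the results directory.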