Pseudomonas population genomics

From QiuLab

Latest revision as of 23:41, 1 December 2013

Projects

  1. Build a local genome database
    1. Database schema:
      1. "genome": genome_id, strain_name, ncbi_taxid
      2. "orf": genome_id, locus_tag, start, stop, strand, genome_name, product_name
      3. "orth_orf": orth_orf_id, locus_name, genome_id, orth_class
    2. Parsing scripts
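      1. Rayees' parsing code (parser.pl): https://www.dropbox.com/s/lpxxbkxeyw7frrn/parser.pl. Before running it, remove columns 9-27 from the input with the bash command: cut -c 1-8 (note that -c selects character positions; for tab-delimited columns the equivalent is cut -f 1-8). A bash wrapper script that does this and then runs the parser is planned.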
    3. Database loading scripts
  2. Molecular Evolution of flagellum genes
    1. Download orthologs
    2. Reconstruct phylogenetic tree
    3. Run PAML tests
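
The schema in item 1.1 can be sketched as a small SQLite database. This is only an illustration under stated assumptions: the page does not say which database engine is used, and the column types, keys, and the helper names build_db and core_orth_classes are hypothetical. The query shows one way to list ortholog classes that occur in every genome.

```python
import sqlite3

# Column names follow the schema listed above; types and keys are assumptions.
SCHEMA = """
CREATE TABLE genome (
    genome_id   INTEGER PRIMARY KEY,
    strain_name TEXT NOT NULL,
    ncbi_taxid  INTEGER
);
CREATE TABLE orf (
    genome_id    INTEGER NOT NULL REFERENCES genome(genome_id),
    locus_tag    TEXT NOT NULL,
    start        INTEGER,
    stop         INTEGER,
    strand       TEXT,
    genome_name  TEXT,
    product_name TEXT,
    PRIMARY KEY (genome_id, locus_tag)
);
CREATE TABLE orth_orf (
    orth_orf_id INTEGER PRIMARY KEY,
    locus_name  TEXT NOT NULL,
    genome_id   INTEGER NOT NULL REFERENCES genome(genome_id),
    orth_class  INTEGER
);
"""


def build_db(path=":memory:"):
    """Create the three tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def core_orth_classes(conn):
    """Return orth_class values found in every genome in the genome table."""
    rows = conn.execute(
        """
        SELECT orth_class
        FROM orth_orf
        GROUP BY orth_class
        HAVING COUNT(DISTINCT genome_id) = (SELECT COUNT(*) FROM genome)
        ORDER BY orth_class
        """
    ).fetchall()
    return [row[0] for row in rows]
```

Calling build_db() with no argument creates an in-memory database, which is convenient for trying the loading scripts on a few rows before touching a real database file.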

Update: August 5, 2013

Pseudomonas Pipeline Steps:

USAGE: ./pseudo_pipline.sh [options] [FASTQ file]

  • Step 0: Creation of simulated reads via GemSIM: http://www.biomedcentral.com/1471-2164/13/74 (outputs a FASTQ file with quality scores).
  • Step 1: Assembly of reads via ALLPATHS
  • Step 2: ORF finding via Glimmer
  • Step 3: Gene finding via BLAST
  • Step 4: Top Gene matches and Database orth_id matching via top_orth_match.pl
  • Step 5: Predicted Ortholog Database import
  • Step 6: Core genome extraction
  • Step 7: Choose random number of orthologs via random_cds.sh
  • Step 8: Translation of Orthologs via BioSeq
  • Step 9: Alignment of peptide files via MUSCLE
  • Step 10: Reverse translation of aligned sequences via align_coding_seq.pl
  • Step 11: Creation of Aligned Fasta file via AlignConcat_Bioperl.final.pl
  • Step 12: Creation of treefile via FastTree
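
Steps 8 to 10 (translate, align peptides, reverse-translate) can be sketched in plain Python. This is not the code in align_coding_seq.pl, which is not shown on this page; the function names translate and back_translate are hypothetical, and the sketch assumes the standard codon assignments and coding sequences that start in frame.

```python
# Standard codon assignments, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(cds):
    """Split an in-frame coding sequence into complete codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def translate(cds):
    """Translate a coding sequence; unknown codons become X, trailing stops are dropped."""
    protein = "".join(CODON_TABLE.get(codon, "X") for codon in split_codons(cds))
    return protein.rstrip("*")


def back_translate(aligned_protein, cds):
    """Thread the codons of cds onto a gapped protein alignment row."""
    codons = split_codons(cds)
    out = []
    pos = 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")  # one protein gap becomes three nucleotide gaps
        else:
            if pos >= len(codons):
                raise ValueError("aligned protein is longer than the coding sequence")
            out.append(codons[pos])
            pos += 1
    return "".join(out)
```

Threading the original codons onto the peptide alignment keeps the reading frame intact, which is what codon-based tests such as those in PAML require.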

Pseudomonas Pipeline options:

  1. -i: keep all intermediate files (default: remove all intermediate files)
  2. -f: keep only intermediate fasta files
  3. -b: [integer] blast score threshold
  4. -n: [integer] number of genes to create treefile from
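
For prototyping the same command-line interface outside bash, the four options can be mirrored with Python's argparse. The real pipeline is a shell script, so this is only a stand-in: the destination names (keep_all, keep_fasta, blast_threshold, num_genes) are hypothetical, and the defaults of None are placeholders because the page does not state the actual default values.

```python
import argparse


def build_parser():
    """Mirror the documented pipeline options; names and defaults are assumptions."""
    parser = argparse.ArgumentParser(
        prog="pseudo_pipline.sh",
        description="Sketch of the Pseudomonas pipeline options.",
    )
    parser.add_argument("-i", dest="keep_all", action="store_true",
                        help="keep all intermediate files")
    parser.add_argument("-f", dest="keep_fasta", action="store_true",
                        help="keep only intermediate FASTA files")
    parser.add_argument("-b", dest="blast_threshold", type=int, default=None,
                        help="BLAST score threshold")
    parser.add_argument("-n", dest="num_genes", type=int, default=None,
                        help="number of genes used to build the tree file")
    parser.add_argument("fastq", help="input FASTQ file")
    return parser
```

For example, build_parser().parse_args(["-f", "-n", "100", "reads.fastq"]) yields a namespace with keep_fasta set to True and num_genes set to 100.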

Update: July 22, 2013

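Estimation of recombination hotspots via LDhat

LDhat results:

{| class="wikitable"
! Data !! Number of genes !! colspan="4" | LDhat result plots
|-
| No PA7 || 10 || [[File:No_PA7_1.png|200px]] || [[File:No_PA7_2.png|200px]] || [[File:No_PA7_3.png|200px]] || [[File:No_PA7_4.png|200px]]
|-
| PA7 || 10 || [[File:PA7_1.png|200px]] || [[File:PA7_2.png|200px]] || [[File:PA7_3.png|200px]] || [[File:PA7_4.png|200px]]
|-
| No PA7 || 50 || [[File:No_PA7_51.png|200px]] || [[File:No_PA7_6.png|200px]] || [[File:No_PA7_7.png|200px]] || [[File:No_PA7_8.png|200px]]
|}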

Update: July 17, 2013

PHYLIP dollop parsimony tree based on a matrix of ortholog changes

parsimony output
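
A dollop run needs a discrete-character input file. The sketch below writes one, assuming the "matrix of ortholog changes" is a presence/absence (1/0) matrix of ortholog classes per strain; that interpretation, the function name phylip_binary_matrix, and the example strain sets are assumptions, not taken from the analysis itself. The layout follows the PHYLIP convention of a header line with the number of taxa and characters, then one row per taxon with the name padded to 10 characters.

```python
def phylip_binary_matrix(presence, classes=None):
    """Format a presence/absence matrix as PHYLIP discrete-character input.

    presence maps a strain name to the set of ortholog classes it carries.
    """
    if classes is None:
        # Use every class seen in any strain, in sorted order.
        classes = sorted(set().union(*presence.values()))
    lines = [f"{len(presence)} {len(classes)}"]
    for strain in sorted(presence):
        # Names are truncated or padded to 10 characters; long names that
        # share their first 10 characters would collide and need renaming.
        name = strain[:10].ljust(10)
        row = "".join("1" if c in presence[strain] else "0" for c in classes)
        lines.append(name + row)
    return "\n".join(lines) + "\n"
```

With presence = {"PAO1": {1, 2}, "PA7": {2, 3}}, the header line is "2 3" and each strain gets a three-character row of 0s and 1s.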

Update: July 16, 2013

Treefiles based on aligned orthologs

{| class="wikitable"
! Number of genes used !! Method !! Treefile
|-
| 100 || FastTree || [[treefile]]
|-
| 100 || FastTree || [[Treefile 2]]
|-
| all orthologs || PHYLIP || [[treefile_phylip]]
|}

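fleN, fleQ, flhF unique strains

{| class="wikitable"
! Gene !! Alignment !! Tree
|-
| fleN || [[fleN unique align]] || [[File:flen_unique.png|200px]]
|-
| fleQ || [[fleQ unique align]] || [[File:fleq_unique.png|200px]]
|-
| flhF || [[flhF unique align]] || [[File:flhf_unique.png|200px]]
|}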

To do:

  1. Create pipeline
  2. Run LDhat with 10 genes, both excluding and including PA7

Update: June 28, 2013

Phylogenetic Analysis by Maximum Likelihood (PAML) tests performed on the fleN, fleQ, and flhF orthologs and on the aeruginosa-only orthologs

{| class="wikitable"
! Gene !! PAML outfile for orthologs
|-
| fleN || [[fleN PAML]]
|-
| fleQ || [[fleQ PAML]]
|-
| flhF || [[flhF-corrected PAML]]
|}

To do:

  1. Estimate genome tree using MrBayes and BEST (http://www.stat.osu.edu/~dkp/BEST/introduction/).
  2. Analysis of positively selected genes using PAML and homozygosity analysis

Update: June 18, 2013

Ortholog alignment and phylogenetic trees: materials and methods

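Protein ortholog data: [http://pseudomonas.com/alignPolymorphicGeneSequencesStep1.do?feature_id_parent=105676&start=1583956&stop=1584798&replicon_id_reference=136&alphabet=protein&limit_to_species=false fleN], [http://pseudomonas.com/alignPolymorphicGeneSequencesStep1.do?feature_id_parent=104960&start=1187587&stop=1189059&replicon_id_reference=136&alphabet=protein&limit_to_species=false fleQ], [http://pseudomonas.com/alignPolymorphicGeneSequencesStep1.do?feature_id_parent=105674&start=1582528&stop=1583817&replicon_id_reference=136&alphabet=protein&limit_to_species=false flhF]

  1. Protocol:
    1. FASTA headers are too long for tree labels; run [https://www.dropbox.com/s/x2p4joeqg7omfub/rename.pl rename.pl] to shorten the names. Usage: rename.pl <FASTA_file> > <OUTPUTfilename.fas> (the script will be rewritten to create the output file automatically)
    2. To align, use [http://www.drive5.com/muscle/Alignment muscle]. Usage: muscle -in <FASTA_File> -out <OUTPUTfilename> -clwstrict
    3. To create the tree file, run [http://www.clustal.org/clustal2/ clustalW2]. Usage: clustalw2 -infile=<Aligned_file> -output=PHYLIP
    4. To draw the tree, run [http://www.r-project.org/ R] with the following commands:
      1. setwd("<directory containing phylip files>")
      2. library("ape")
      3. library("phangorn")
      4. <gene_name> = read.tree("<file_gene_name.phy>")
      5. <gene_name> = midpoint(<gene_name>)
      6. plot(<gene_name>)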
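
The header-shortening step in the protocol can be illustrated as follows. This is not the linked rename.pl, whose logic is not shown on this page; it is a hypothetical sketch that keeps the first whitespace-delimited token of each header, truncates it to max_len characters (10 is an assumed limit, chosen to fit PHYLIP-style names), and adds a numeric suffix when a truncated name repeats.

```python
def shorten_fasta_headers(lines, max_len=10):
    """Return FASTA lines with each header cut down to a short label."""
    seen = {}  # truncated name -> number of times it has appeared so far
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            base = (fields[0] if fields else "seq")[:max_len]
            count = seen.get(base, 0)
            seen[base] = count + 1
            if count == 0:
                name = base
            else:
                # Make room for "_<count>" so the label stays within max_len.
                suffix = f"_{count}"
                name = base[:max_len - len(suffix)] + suffix
            out.append(">" + name)
        else:
            out.append(line)  # sequence lines pass through unchanged
    return out
```

The suffix scheme is deliberately simple and does not guarantee uniqueness in every case, so the shortened names are worth checking before building a tree.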
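
{| class="wikitable"
! Gene !! Alignment !! Tree !! Notes
|-
| fleN || [[fleN pep alignment]] || [[File:flen.png|200px]] || [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=AAN03366.1 Conserved Domain]
|-
| fleQ || [[fleQ pep alignment]] || [[File:fleq.png|200px]] || [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=AAC37124.1 Conserved Domain]
|-
| flhF || [[flhF pep alignment]] || [[File:flhf.png|200px]] || [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=AEV61607.1 Conserved Domain]
|}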

Benchmark: June 11, 2013

  1. Finish parsing the genome files to upload the "orf" table (Raymond & Rayees)
    1. Rayees' parsed genome files: https://www.dropbox.com/sh/k0zktvvmv39op9i/1zBercEky8
  2. Parse the [http://pseudomonas.com/downloadOrthologs.do?strain_id=107 ortholog file] to upload the "orth_orf" table (Raymond)
  3. Identify and download fleN, fleQ, and flhF orthologs & align them (Rayees)