Pseudomonas population genomics: Difference between revisions
imported>Rayrah |
imported>Rli0019 mNo edit summary |
Latest revision as of 23:41, 1 December 2013
Projects
- Build a local genome database
  - Database schema:
    - "genome": genome_id, strain_name, ncbi_taxid
    - "orf": genome_id, locus_tag, start, stop, strand, genome_name, product_name
    - "orth_orf": orth_orf_id, locus_name, genome_id, orth_class
  - Parsing scripts
    - Rayees's parsing code (https://www.dropbox.com/s/lpxxbkxeyw7frrn/parser.pl) requires that you first remove columns 9-27 using the bash command:
      cut -c 1-8
      (I will write a bash script that does this and runs the program)
  - Database loading scripts
- Molecular evolution of flagellum genes
  - Download orthologs
  - Reconstruct phylogenetic tree
  - Run PAML tests
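The three tables in the schema above could be created as in the following minimal sketch. Only the table and column names come from the schema; the use of SQLite, the column types, the keys, and the `create_database` helper are assumptions made for illustration, and the inserted row is a hypothetical placeholder rather than real data.

```python
import sqlite3

# Table and column names follow the schema listed above.
# SQL types, primary keys, and foreign keys are assumptions of this sketch.
SCHEMA = """
CREATE TABLE genome (
    genome_id   INTEGER PRIMARY KEY,
    strain_name TEXT,
    ncbi_taxid  INTEGER
);
CREATE TABLE orf (
    genome_id    INTEGER REFERENCES genome(genome_id),
    locus_tag    TEXT,
    start        INTEGER,
    stop         INTEGER,
    strand       TEXT,
    genome_name  TEXT,
    product_name TEXT
);
CREATE TABLE orth_orf (
    orth_orf_id INTEGER PRIMARY KEY,
    locus_name  TEXT,
    genome_id   INTEGER REFERENCES genome(genome_id),
    orth_class  TEXT
);
"""


def create_database(path=":memory:"):
    """Create the three tables and return the open connection (hypothetical helper)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
# Placeholder row, not a real strain or taxonomy ID.
conn.execute("INSERT INTO genome VALUES (?, ?, ?)", (1, "example_strain", 0))
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

Keeping genome_id in both "orf" and "orth_orf" lets either table be joined back to "genome" by strain.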
Update: August 5, 2013
Pseudomonas Pipeline Steps:
USAGE: ./pseudo_pipline.sh [options] [FASTQ file]
- Step 0: Creation of simulated reads via GemSim: http://www.biomedcentral.com/1471-2164/13/74 (outputs a FASTQ file with quality scores).
- Step 1: Assembly of reads via ALLPATHS
- Step 2: ORF finding via Glimmer
- Step 3: Gene finding via BLAST
- Step 4: Top Gene matches and Database orth_id matching via top_orth_match.pl
- Step 5: Predicted Ortholog Database import
- Step 6: Core genome extraction
- Step 7: Choose random number of orthologs via random_cds.sh
- Step 8: Translation of Orthologs via BioSeq
- Step 9: Alignment of peptide files via MUSCLE
- Step 10: Reverse translation of aligned sequences via align_coding_seq.pl
- Step 11: Creation of Aligned Fasta file via AlignConcat_Bioperl.final.pl
- Step 12: Creation of treefile via FastTree
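Step 10 maps each unaligned coding sequence back onto its aligned peptide, so that every gap in the protein alignment becomes a three-nucleotide gap. The contents of align_coding_seq.pl are not shown on this page, so the following is only a sketch of the general technique, not that script; the function name `thread_codons` and the handling of a trailing stop codon are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def thread_codons(aligned_peptide, cds, gap="-"):
    """Return the codon alignment implied by an aligned peptide and its CDS."""
    cds = cds.upper()
    # Split the coding sequence into complete codons, ignoring a partial tail.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    n_residues = sum(1 for aa in aligned_peptide if aa != gap)
    # Drop one trailing stop codon if the peptide does not include it.
    if len(codons) == n_residues + 1 and codons[-1] in STOP_CODONS:
        codons.pop()
    if len(codons) != n_residues:
        raise ValueError(f"{len(codons)} codons but {n_residues} residues")
    remaining = iter(codons)
    # Each residue consumes one codon; each gap column becomes three gap characters.
    return "".join(gap * 3 if aa == gap else next(remaining)
                   for aa in aligned_peptide)


print(thread_codons("M-K", "ATGAAATAA"))  # prints ATG---AAA
```

Aligning at the peptide level and then threading codons keeps the reading frame intact, which matters for downstream codon-based analyses.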
Pseudomonas Pipeline options:
- -i: keep all intermediate files (default: remove all intermediate files)
- -f: keep only intermediate fasta files
- -b: [integer] blast score threshold
- -n: [integer] number of genes to create treefile from
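The options above can be mirrored in a small argument parser, for example to prototype or test the command line outside the shell script. This is a sketch only: the flags and their meanings come from the list above, while the destination names (`keep_all`, `keep_fasta`, `blast_threshold`, `n_genes`) and the absence of default values are assumptions, since the page does not state defaults for -b or -n.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented pipeline options."""
    parser = argparse.ArgumentParser(prog="pseudo_pipline.sh")
    parser.add_argument("-i", dest="keep_all", action="store_true",
                        help="keep all intermediate files")
    parser.add_argument("-f", dest="keep_fasta", action="store_true",
                        help="keep only intermediate fasta files")
    parser.add_argument("-b", dest="blast_threshold", type=int,
                        help="blast score threshold")
    parser.add_argument("-n", dest="n_genes", type=int,
                        help="number of genes to create treefile from")
    parser.add_argument("fastq", help="input FASTQ file")
    return parser


# Hypothetical invocation: keep fasta files and build the tree from 100 genes.
args = build_parser().parse_args(["-f", "-n", "100", "reads.fastq"])
print(args.keep_fasta, args.n_genes, args.fastq)
```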
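Update: July 22, 2013
Estimation of recombination hotspots via LDhat.
LDhat results:
Data | Number of Genes | Plot 1 | Plot 2 | Plot 3 | Plot 4 |
No PA7 | 10 | No_PA7_1.png | No_PA7_2.png | No_PA7_3.png | No_PA7_4.png |
PA7 | 10 | PA7_1.png | PA7_2.png | PA7_3.png | PA7_4.png |
No PA7 | 50 | No_PA7_51.png | No_PA7_6.png | No_PA7_7.png | No_PA7_8.png |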
Update: July 17, 2013
Phylip dollop parsimony tree based on a matrix of ortholog changes (see parsimony output)
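Dollop reads a matrix of binary characters in PHYLIP format. The page does not say how the matrix of ortholog changes was encoded, so the sketch below assumes the simplest reading: one 0/1 character per ortholog class, marking absence or presence in each strain. The function name `dollop_matrix` and the example strains and ortholog classes are hypothetical.

```python
def dollop_matrix(presence):
    """Format presence/absence of ortholog classes as a PHYLIP discrete-character matrix.

    presence maps a strain name to the set of ortholog classes found in that strain.
    """
    classes = sorted(set().union(*presence.values()))
    # Header line: number of taxa, then number of characters.
    lines = [f"{len(presence):5d} {len(classes):5d}"]
    for strain in sorted(presence):
        states = "".join("1" if c in presence[strain] else "0" for c in classes)
        # PHYLIP names occupy exactly ten characters, truncated or space-padded.
        lines.append(f"{strain[:10]:<10}{states}")
    return "\n".join(lines) + "\n"


# Hypothetical input: two strains and three ortholog classes.
example = {"PA7": {"orth_b", "orth_c"}, "strainB": {"orth_a", "orth_b"}}
matrix_text = dollop_matrix(example)
print(matrix_text, end="")
```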
Update: July 16, 2013
Treefiles based on aligned orthologs
Number of genes used | Method | Treefile |
100 | FastTree | treefile |
100 | FastTree | Treefile 2 |
All orthologs | Phylip | treefile_phylip |
FleN, FleQ, FlhF unique strains
Gene | Alignment | Tree |
fleN | fleN unique align | flen_unique.png |
fleQ | fleQ unique align | fleq_unique.png |
flhF | flhF unique align | flhf_unique.png |
To do:
- Create pipeline
- Run LDhat with 10 genes, excluding and including PA7
Update: June 28, 2013
Phylogenetic Analysis by Maximum Likelihood (PAML) tests performed on FleN, FleQ, and FlhF orthologs and on aeruginosa-only orthologs
Gene | PAML outfile for orthologs |
---|---|
fleN | fleN PAML |
fleQ | fleQ PAML |
flhF | flhF-corrected PAML |
To do:
- Estimate genome tree using MrBayes and BEST (http://www.stat.osu.edu/~dkp/BEST/introduction/).
- Analysis of positively selected genes using PAML and homozygosity analysis
Update: June 18, 2013
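Ortholog alignment and phylogenetic tree: materials and methods
Protein ortholog data:
- fleN: http://pseudomonas.com/alignPolymorphicGeneSequencesStep1.do?feature_id_parent=105676&start=1583956&stop=1584798&replicon_id_reference=136&alphabet=protein&limit_to_species=false
- fleQ: http://pseudomonas.com/alignPolymorphicGeneSequencesStep1.do?feature_id_parent=104960&start=1187587&stop=1189059&replicon_id_reference=136&alphabet=protein&limit_to_species=false
- flhF: http://pseudomonas.com/alignPolymorphicGeneSequencesStep1.do?feature_id_parent=105674&start=1582528&stop=1583817&replicon_id_reference=136&alphabet=protein&limit_to_species=false
Protocol:
- The FASTA headers are too long for the tree; run rename.pl (https://www.dropbox.com/s/x2p4joeqg7omfub/rename.pl) to shorten the names. Usage:
  rename.pl <FASTA_file> > <OUTPUTfilename.fas>
  (will rewrite script to create automatic output file)
- To align, use muscle (http://www.drive5.com/muscle/Alignment). Usage:
  muscle -in <FASTA_File> -out <OUTPUTfilename> -clwstrict
- To create the tree file, run clustalW2 (http://www.clustal.org/clustal2/). Usage:
  clustalw2 -infile=<Aligned_file> -output=Phylip
- To create the tree, run R (http://www.r-project.org/) using the following commands:
  setwd("<directory containing phylip files>")
  library("ape")
  library("phangorn")
  <gene_name> = read.tree("<file_gene_name.phy>")
  <gene_name> = midpoint(<gene_name>)
  plot(<gene_name>)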
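The header-shortening step of the protocol can be illustrated as follows. The contents of rename.pl are not shown on this page, so this is not that script; it is a sketch under stated assumptions: names are cut to 10 characters (the limit of strict PHYLIP names), only the first whitespace-delimited word of each header is kept, and duplicates are made unique with a numeric suffix. The function name `shorten_headers` and the example headers are hypothetical.

```python
def shorten_headers(lines, max_len=10):
    """Shorten FASTA header names to max_len characters, keeping them unique."""
    used = set()
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            # Keep only the first word of the header; fall back to "seq" if empty.
            base = (fields[0] if fields else "seq")[:max_len]
            name, n = base, 1
            # Append _1, _2, ... until the shortened name is unused.
            while name in used:
                suffix = f"_{n}"
                name = base[:max_len - len(suffix)] + suffix
                n += 1
            used.add(name)
            out.append(">" + name)
        else:
            out.append(line)
    return out


# Hypothetical headers that collide after truncation.
example = [">Pseudomonas_aeruginosa_strainA fleN", "ATGC",
           ">Pseudomonas_aeruginosa_strainB fleN", "ATGA"]
renamed = shorten_headers(example)
print("\n".join(renamed))
```

Truncation alone would map both example headers to the same name, which is why the sketch adds the uniqueness check; tree-building tools generally require distinct taxon labels.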
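Gene | Alignment | Tree | Notes |
---|---|---|---|
fleN | fleN pep alignment | flen.png | Conserved Domain: http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=AAN03366.1 |
fleQ | fleQ pep alignment | fleq.png | Conserved Domain: http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=AAC37124.1 |
flhF | flhF pep alignment | flhf.png | Conserved Domain: http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=AEV61607.1 |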
Benchmark: June 11, 2013
- Finish parsing the genome files to upload the "orf" table (Raymond & Rayees)
  - Rayees's parsed genome files: https://www.dropbox.com/sh/k0zktvvmv39op9i/1zBercEky8
- Parse the ortholog file to upload the "orth_orf" table (Raymond)
- Identify and download fleN, fleQ, and flhF orthologs and align them (Rayees)