Pseudomonas population genomics: Difference between revisions

Latest revision as of 23:41, 1 December 2013

Projects

Build a local genome database
1. Database schema:
  1. "genome": genome_id, strain_name, ncbi_taxid
  2. "orf": genome_id, locus_tag, start, stop, strand, genome_name, product_name
  3. "orth_orf": orth_orf_id, locus_name, genome_id, orth_class
2. Parsing scripts
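  1. Rayees's parsing code. It requires that columns 9-27 be removed first with the bash command: cut -c 1-8 (note that cut -c selects character positions; for tab-delimited columns the field form is cut -f 1-8). I will write a bash script that does this and runs the program. https://www.dropbox.com/s/lpxxbkxeyw7frrn/parser.pl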
3. Database loading scripts
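
The schema above could be prototyped as follows. This is only a minimal sketch: the table and column names are taken from the list above, but the choice of SQLite, the column types, the keys, and the helper names `create_database` and `load_orfs` are assumptions made for illustration, not the project's actual loading scripts.

```python
import sqlite3

# Table and column names follow the schema listed above. Column types,
# primary/foreign keys, and the use of SQLite are assumptions.
SCHEMA = """
CREATE TABLE genome (
    genome_id   INTEGER PRIMARY KEY,
    strain_name TEXT NOT NULL,
    ncbi_taxid  INTEGER
);
CREATE TABLE orf (
    genome_id    INTEGER NOT NULL REFERENCES genome(genome_id),
    locus_tag    TEXT NOT NULL,
    "start"      INTEGER,
    "stop"       INTEGER,
    strand       TEXT,
    genome_name  TEXT,
    product_name TEXT,
    PRIMARY KEY (genome_id, locus_tag)
);
CREATE TABLE orth_orf (
    orth_orf_id INTEGER PRIMARY KEY,
    locus_name  TEXT NOT NULL,
    genome_id   INTEGER NOT NULL REFERENCES genome(genome_id),
    orth_class  TEXT
);
"""


def create_database(path=":memory:"):
    """Create the three tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load_orfs(conn, rows):
    """Insert parsed ORF rows, one 7-tuple per row in schema column order."""
    conn.executemany("INSERT INTO orf VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
```

A parsing script would then only need to emit one tuple per ORF in the column order of the "orf" table and pass the list to `load_orfs`.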
Molecular Evolution of flagellum genes
1. Download orthologs
2. Reconstruct phylogenetic tree
3. Run PAML tests

Update: August 5, 2013

Pseudomonas Pipeline Steps:

USAGE: ./pseudo_pipline.sh [options] [FASTQ file]

Step 0: Creation of Simulated reads via GemSim: http://www.biomedcentral.com/1471-2164/13/74 (outputs FastQ file with quality scores).
Step 1: Assembly of reads via ALLPATHS
Step 2: ORF finding via Glimmer
Step 3: Gene finding via BLAST
Step 4: Top Gene matches and Database orth_id matching via top_orth_match.pl
Step 5: Predicted Ortholog Database import
Step 6: Core genome extraction
Step 7: Choose random number of orthologs via random_cds.sh
Step 8: Translation of Orthologs via BioSeq
Step 9: Alignment of peptide files via Muscle
Step 10: Reverse translation of aligned sequences via align_coding_seq.pl
Step 11: Creation of Aligned Fasta file via AlignConcat_Bioperl.final.pl
Step 12: Creation of treefile via FASTTREE
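
Step 10 (reverse translation) is the least obvious step in the list, so here is a small illustration of the idea. The script align_coding_seq.pl itself is not shown on this page; the function below is a hypothetical sketch of the general technique (threading each unaligned coding sequence onto its aligned peptide, three nucleotides per residue and three gap characters per peptide gap), not a port of that script.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def back_translate(aligned_peptide, cds, gap="-"):
    """Return the codon alignment implied by an aligned peptide.

    aligned_peptide: peptide sequence with gap characters from the aligner.
    cds: the unaligned coding sequence that was translated to that peptide.
    """
    # Split the coding sequence into complete codons.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    residues = sum(1 for aa in aligned_peptide if aa != gap)

    # Tolerate a trailing stop codon that has no residue in the peptide.
    if len(codons) == residues + 1 and codons[-1].upper() in STOP_CODONS:
        codons = codons[:-1]
    if len(codons) != residues:
        raise ValueError(
            f"{residues} residues in peptide but {len(codons)} codons in CDS"
        )

    # Emit one codon per residue and a triple gap per peptide gap.
    remaining = iter(codons)
    return "".join(gap * 3 if aa == gap else next(remaining)
                   for aa in aligned_peptide)
```

For example, `back_translate("M-K", "ATGAAA")` gives `"ATG---AAA"`, which keeps the nucleotide alignment in frame for the later concatenation and tree-building steps.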

Pseudomonas Pipeline options:

-i: keep all intermediate files (default: remove all intermediate files)
-f: keep only intermediate fasta files
-b: [integer] blast score threshold
-n: [integer] number of genes to create treefile from
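
For illustration, the same command-line interface could be expressed as below. This is a sketch only: the real pipeline is a shell script, the destination names (`keep_all`, `keep_fasta`, `blast_threshold`, `num_genes`) are made up here, and because the page gives no default values for -b and -n they are left as `None`.

```python
import argparse


def build_parser():
    """Mirror the documented options of the pipeline script."""
    parser = argparse.ArgumentParser(
        prog="pseudo_pipline.sh",
        description="Pseudomonas pipeline (option-parsing sketch)",
    )
    parser.add_argument("-i", dest="keep_all", action="store_true",
                        help="keep all intermediate files")
    parser.add_argument("-f", dest="keep_fasta", action="store_true",
                        help="keep only intermediate FASTA files")
    parser.add_argument("-b", dest="blast_threshold", type=int, default=None,
                        help="BLAST score threshold")
    parser.add_argument("-n", dest="num_genes", type=int, default=None,
                        help="number of genes to build the treefile from")
    parser.add_argument("fastq", help="input FASTQ file")
    return parser
```

A call such as `build_parser().parse_args(["-f", "-n", "100", "reads.fastq"])` yields a namespace with `keep_fasta` set and `num_genes` equal to 100.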

Update: July 22, 2013

Estimation of recombination hotspots via LD_hat

LD_hat results:

Data	Number of Genes
No PA7	10
PA7	10
No PA7	50

Update: July 17, 2013

Phylip dollop parsimony tree based on a matrix of ortholog changes

parsimony output

Update: July 16, 2013

Treefiles based on aligned orthologs

number of genes used	Method	treefile
100	FASTTREE	treefile
100	FASTTREE	Treefile 2
all orthologs	Phylip	treefile_phylip

FleN, FleQ, FlhF unique strains

Gene	alignment	tree
fleN	fleN unique align
fleQ	fleQ unique align
flhF	flhF unique align

To do:

Create pipeline
LD_Hat w/ 10 genes excluding and including PA7

Update: June 28, 2013

Phylogenetic Analysis by Maximum Likelihood (PAML) tests performed on the fleN, fleQ, and flhF orthologs and on the aeruginosa-only orthologs

Gene	PAML outfile for orthologs
fleN	fleN PAML
fleQ	fleQ PAML
flhF	flhF-corrected PAML

To do:

Estimate genome tree using MrBayes and BEST (http://www.stat.osu.edu/~dkp/BEST/introduction/).
Analysis of positively selected genes using PAML and homozygosity analysis

Update: June 18, 2013

Materials and methods for ortholog alignment and phylogenetic tree construction

Protein ortholog data: fleN, fleQ, flhF

Protocol:
1. FASTA headers are too long for the tree; run rename.pl to shorten the names. Usage: rename.pl <FASTA_file> > <OUTPUTfilename.fas> (the script will be rewritten to create the output file automatically)
2. To align, use muscle. Usage: muscle -in <FASTA_File> -out <OUTPUTfilename> -clwstrict
3. To create the tree file, run clustalW2. Usage: clustalw2 -infile=<Aligned_file> -output=Phylip
4. To draw the tree, run R with the following commands:
  1. setwd("<directory containing phylip files>")
  2. library("ape")
  3. library("phangorn")
  4. <gene_name> = read.tree("<file_gene_name.phy>")
  5. <gene_name> = midpoint(<gene_name>)
  6. plot(<gene_name>)
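
The header-shortening step in the protocol above can be illustrated with a short sketch. The actual rule used by rename.pl is not described on this page, so the rule here is an assumption: keep the first whitespace-delimited token of each header, truncate it to 10 characters (the name width of the strict PHYLIP format), and replace trailing characters with a counter when two names would otherwise collide. The function name `shorten_headers` is hypothetical.

```python
def shorten_headers(lines, max_len=10):
    """Return FASTA lines with headers shortened to unique short names."""
    used = set()
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(">"):
            out.append(line)  # sequence lines pass through unchanged
            continue
        tokens = line[1:].split()
        base = tokens[0] if tokens else "seq"
        name = base[:max_len]
        counter = 1
        # On a collision, overwrite the tail of the name with a counter.
        while name in used:
            suffix = str(counter)
            name = base[:max_len - len(suffix)] + suffix
            counter += 1
        used.add(name)
        out.append(">" + name)
    return out
```

Reading a file with `open(path).readlines()`, passing the result through `shorten_headers`, and writing the lines back out would give a FASTA file whose names survive the alignment and tree steps without truncation clashes.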

Gene	Alignment	Notes
fleN	fleN pep alignment	Conserved Domain
fleQ	fleQ pep alignment	Conserved Domain
flhF	flhF pep alignment	Conserved Domain

Benchmark: June 11, 2013

Finish parsing the genome files to upload the "orf" table (Raymond & Rayees)
1. Rayees's parsed genome files: https://www.dropbox.com/sh/k0zktvvmv39op9i/1zBercEky8
Parsing the ortholog file to upload the "orth_orf" table (Raymond)
Identify and download fleN, fleQ, and flhF orthologs & align them (Rayees)

@@ Line 12: / Line 12: @@
 ## Reconstruct phylogenetic tree
 ## Run PAML tests
+==Update:August 5, 2013==
+Pseudomonas Pipeline Steps:
+USAGE: ./pseudo_pipline.sh [options] [FASTQ file]
+*'''Step 0''': Creation of Simulated reads via GemSim: http://www.biomedcentral.com/1471-2164/13/74 (outputs FastQ file with quality scores).
+*'''Step 1''': Assembly of reads via ALL Paths
+*'''Step 2''': ORF finding via Glimmer
+*'''Step 3''': Gene finding via BLAST
+*'''Step 4''': Top Gene matches and Database orth_id matching via top_orth_match.pl
+*'''Step 5''': Predicted Ortholog Database import
+*'''Step 6''': Core genome extraction
+*'''Step 7''': Choose random number of orthologs via random_cds.sh
+*'''Step 8''': Translation of Orthologs via BioSeq
+*'''Step 9''': Alignment of peptide files via Muscle
+*'''Step 10''': Reverse translation of aligned sequences via align_coding_seq.pl
+*'''Step 11''': Creation of Aligned Fasta file via AlignConcat_Bioperl.final.pl
+*'''Step 12''': Creation of treefile via FASTTREE
+Pseudomonas Pipeline options:
+#-i: keep all intermediary files (default remove all intermediate files)
+#-f: keep only intermediate fasta files
+#-b: [integer] blast score threshold
+#-n: [integer] number of genes to create treefile from
+==Update:July 22, 2013==
+Estimation of recombination hotspots via LD_hat
+Ld_hat results:
+{| class="wikitable"
+| Data || Number of Genes ||  ||  || ||
+|-
+| No PA7 || 10 || [[File:No_PA7_1.png|200px]] || [[File:No_PA7_2.png|200px]] || [[File:No_PA7_3.png|200px]] || [[File:No_PA7_4.png|200px]]
+|-
+| PA7 || 10 || [[File:PA7_1.png|200px]] || [[File:PA7_2.png|200px]]  || [[File:PA7_3.png|200px]]  || [[File:PA7_4.png|200px]]
+|-
+| No PA7 || 50 || [[File:No_PA7_51.png|200px]] || [[File:No_PA7_6.png|200px]] || [[File:No_PA7_7.png|200px]] || [[File:No_PA7_8.png|200px]]
+|}
+==Update:July 17, 2013==
+Phylip dollop parsimony tree based on an matrix of ortholog changes
+[[parsimony output]]
+==Update: July 16, 2013==
+Treefiles based on aligned orthologs
+{| class="wikitable"
+| number of genes used|| Method || treefile
+|-
+| 100|| FASTTREE || [[treefile]]
+|-
+| 100 || FASTTREE || [[Treefile 2]]
+|-
+| all orthologs ||Phylip || [[treefile_phylip]]
+|-
+|}
+FleN, FleQ, Flhf unique strains
+{| class="wikitable"
+| Gene || alignment || tree
+|-
+| fleN || [[fleN unique align]] ||
+[[File:flen_unique.png|200px]]
+|-
+| fleQ || [[fleQ unique align]] || [[File:fleq_unique.png|200px]]
+|-
+| flhf ||[[flhF unique align]] || [[File:flhf_unique.png|200px]]
+|-
+|}
+To do:
+# Create pipeline
+# LD_Hat w/ 10 genes excluding and including PA7
+==Update: June 28, 2013==
+Phylogenetic Analysis by Maximum Likelihood (PAML) Test performed on FleN, FleQ, FlhF orthologs and aeruginosa only orthologs
+{| class="wikitable"
+! Gene !! PAML outfile for orthologs !!
+|-
+| fleN || [[fleN PAML]]
+|-
+| fleQ || [[fleQ PAML]]
+|-
+| flhf ||[[flhF-corrected PAML]]
+|-
+|}
+To do:
+# Estimate genome tree using MrBayes and BEST (http://www.stat.osu.edu/~dkp/BEST/introduction/).
+# Analysis of positively selected genes using PAML and homozygosity analysis
 ==Update: June 18, 2013==
-# Material and Methods
+Ortholog aligning and phylogenetic tree material and methods
-## Protein ortholog data: [http://pseudomonas.com/alignPolymorphicGeneSequencesStep1.do?feature_id_parent=105676&start=1583956&stop=1584798&replicon_id_reference=136&alphabet=protein&limit_to_species=false fleN], [http://pseudomonas.com/alignPolymorphicGeneSequencesStep1.do?feature_id_parent=104960&start=1187587&stop=1189059&replicon_id_reference=136&alphabet=protein&limit_to_species=false fleQ], [http://pseudomonas.com/alignPolymorphicGeneSequencesStep1.do?feature_id_parent=105674&start=1582528&stop=1583817&replicon_id_reference=136&alphabet=protein&limit_to_species=false flhF]
+ Protein ortholog data: [http://pseudomonas.com/alignPolymorphicGeneSequencesStep1.do?feature_id_parent=105676&start=1583956&stop=1584798&replicon_id_reference=136&alphabet=protein&limit_to_species=false fleN], [http://pseudomonas.com/alignPolymorphicGeneSequencesStep1.do?feature_id_parent=104960&start=1187587&stop=1189059&replicon_id_reference=136&alphabet=protein&limit_to_species=false fleQ], [http://pseudomonas.com/alignPolymorphicGeneSequencesStep1.do?feature_id_parent=105674&start=1582528&stop=1583817&replicon_id_reference=136&alphabet=protein&limit_to_species=false flhF]
-## Protocol:
+# Protocol:
-### Fasta headers are too long for tree, run: [[File:renamer.pl renamer]] editing/rule (script name?); Alignment (tool and command); Phylogeny; Tree display (R commands)
+## Fasta headers are too long for tree, run: [https://www.dropbox.com/s/x2p4joeqg7omfub/rename.pl rename.pl] to shorten names. Usage:<code> >rename.pl <FASTA_file> > <OUTPUTfilename.fas> </code> (will rewrite script to create automatic output file)
+## To align use [http://www.drive5.com/muscle/Alignment muscle] Usage: <code> >muscle -in <FASTA_File> -out <OUTPUTfilename> -clwstrict </code>
+## To create tree file run [http://www.clustal.org/clustal2/ clustalW2] Usage <code>>clustalw2 -infile= <Aligned_file> -output= Phylip </code>
+## To create tree run [http://www.r-project.org/ R] using following commands:
+###<code> setwd("<directory containing phylip files>")</code>
+###<code> library ("ape")</code>
+###<code> library ("phangorn")</code>
+###<code> <gene_name> = read.tree("<file_gene_name.phy>")</code>
+###<code> <gene_name> = midpoint(gene_name)</code>
+###<code> plot(<gene_name>)</code>
 {| class="wikitable"
 ! Gene !! Alignment !! Tree !! Notes
 |-
-| fleN || [[fleN pep alignment]] ||[[File:flen.png|200px]] || [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=AEV61607.1 Conserved Domain]
+| fleN || [[fleN pep alignment]] ||[[File:flen.png|200px]] || [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=AAN03366.1 Conserved Domain]
 |-
-| fleQ || [[fleQ pep alignment]] ||[[File:fleq.png|200px]]  || [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=AEV61607.1 Conserved Domain]
+| fleQ || [[fleQ pep alignment]] ||[[File:fleq.png|200px]]  || [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=AAC37124.1 Conserved Domain]
 |-
 | flhf ||[[flhF pep alignment]]||[[File:flhf.png|200px]] || [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=AEV61607.1 Conserved Domain]
