Pseudomonas population genomics: Difference between revisions

Revision as of 21:21, 18 June 2013

Build a local genome database
1. Database schema:
  1. "genome": genome_id, strain_name, ncbi_taxid
  2. "orf": genome_id, locus_tag, start, stop, strand, genome_name, product_name
  3. "orth_orf": orth_orf_id, locus_name, genome_id, orth_class
2. Parsing scripts
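  1. Rayees's parsing code. It requires that columns 9-27 be removed from the input first. For tab-delimited input, the bash command cut -f 1-8 keeps fields 1-8; note that cut -c 1-8 selects characters 1-8, not fields. (I will write a bash script that does this and runs the program.) https://www.dropbox.com/s/lpxxbkxeyw7frrn/parser.pl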
3. Database loading scripts
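A minimal sketch of the three-table schema above, assuming SQLite as the local database engine (the notes do not say which engine is used). The column names come from the schema list. The column types, the primary keys, and the foreign-key references to "genome" are assumptions, and create_database and table_columns are hypothetical helper names.

```python
import sqlite3

# Column names follow the schema list in these notes.
# Types, primary keys, and foreign keys are assumptions, not part of the notes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS genome (
    genome_id   INTEGER PRIMARY KEY,
    strain_name TEXT,
    ncbi_taxid  INTEGER
);
CREATE TABLE IF NOT EXISTS orf (
    genome_id    INTEGER REFERENCES genome(genome_id),
    locus_tag    TEXT,
    "start"      INTEGER,
    "stop"       INTEGER,
    strand       TEXT,
    genome_name  TEXT,
    product_name TEXT
);
CREATE TABLE IF NOT EXISTS orth_orf (
    orth_orf_id INTEGER PRIMARY KEY,
    locus_name  TEXT,
    genome_id   INTEGER REFERENCES genome(genome_id),
    orth_class  TEXT
);
"""


def create_database(path=":memory:"):
    """Open (or create) the database at `path` and create the three tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def table_columns(conn, table):
    """Return the column names of `table` in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


if __name__ == "__main__":
    conn = create_database()
    for name in ("genome", "orf", "orth_orf"):
        print(name, table_columns(conn, name))
```

Passing a file path instead of the default ":memory:" would persist the database on disk, so the parsing and loading scripts could share it.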
Molecular Evolution of flagellum genes
1. Download orthologs
2. Reconstruct phylogenetic tree
3. Run PAML tests

Gene	Alignment	Notes
fleN	fleN pep alignment	Conserved Domain
fleQ	fleQ pep alignment	Conserved Domain
flhF	flhF pep alignment	Conserved Domain

Finish parsing the genome files to upload the "orf" table (Raymond & Rayees)
1. Rayees's parsed genome files: https://www.dropbox.com/sh/k0zktvvmv39op9i/1zBercEky8
Parsing the ortholog file to upload the "orth_orf" table (Raymond)
Identify and download fleN, fleQ, and flhF orthologs & align them (Rayees)

@@ Line 18: / Line 18: @@
 ## Protocol:
 ### Fasta headers are too long for tree, run: [https://www.dropbox.com/s/x2p4joeqg7omfub/rename.pl rename.pl] to shorten names. Usage:<code> >rename.pl <FASTA_file> > <OUTPUTfilename.fas> </code> (will rewrite script to create automatic output file)
-### To align use [http://www.drive5.com/muscle/Alignment muscle] Usage: <code> >muscle -in <FASTA_File> -out <OUTPUTfilename.aln> -clwstrict </code>
+### To align use [http://www.drive5.com/muscle/Alignment muscle] Usage: <code> >muscle -in <FASTA_File> -out <OUTPUTfilename> -clwstrict </code>
 ### To create tree file run [http://www.clustal.org/clustal2/ clustalW2] Usage <code>>clustalw2 -infile= <Aligned_file.aln> -output= Phylip </code>
 ### To create tree run [http://www.r-project.org/ R] using following commands: