Pseudomonas population genomics

Projects

Build a local genome database
1. Database schema:
  1. "genome": genome_id, strain_name, ncbi_taxid
  2. "orf": genome_id, locus_tag, start, stop, strand, genome_name, product_name
  3. "orth_orf": orth_orf_id, locus_name, genome_id, orth_class
2. Parsing scripts
  1. Rayees's parsing code (parser.pl): https://www.dropbox.com/s/lpxxbkxeyw7frrn/parser.pl. The input must first have columns 9-27 removed with the bash command: cut -c 1-8 (I will write a bash script that does this and then runs the parser).
3. Database loading scripts
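
The schema and loading step above could be sketched as follows. This is only a minimal illustration: the column names come from the schema listed above, but the choice of SQLite, the column types, the keys, and the helper names create_database and load_orfs are assumptions, not the project's actual loading scripts.

```python
import sqlite3

# Column names follow the schema above. The types, primary/foreign keys,
# and the use of SQLite are assumptions made for illustration.
SCHEMA = """
CREATE TABLE genome (
    genome_id   INTEGER PRIMARY KEY,
    strain_name TEXT,
    ncbi_taxid  INTEGER
);
CREATE TABLE orf (
    genome_id    INTEGER REFERENCES genome(genome_id),
    locus_tag    TEXT,
    "start"      INTEGER,
    "stop"       INTEGER,
    strand       TEXT,
    genome_name  TEXT,
    product_name TEXT
);
CREATE TABLE orth_orf (
    orth_orf_id INTEGER PRIMARY KEY,
    locus_name  TEXT,
    genome_id   INTEGER REFERENCES genome(genome_id),
    orth_class  TEXT
);
"""


def create_database(path=":memory:"):
    """Create the three tables and return an open connection (hypothetical helper)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load_orfs(conn, rows):
    """Insert 7-field tuples, in the column order of the "orf" table."""
    conn.executemany("INSERT INTO orf VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
```

Each row passed to load_orfs would be one parsed line from the genome files, already split into the seven "orf" fields.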
Molecular Evolution of flagellum genes
1. Download orthologs
2. Reconstruct phylogenetic tree
3. Run PAML tests
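
The notes do not say which PAML models will be compared. Nested PAML models are usually compared with a likelihood-ratio test on the log-likelihoods that PAML reports, so a rough sketch of that arithmetic is given here. The function name lrt_pvalue is hypothetical, and only 1 or 2 degrees of freedom are handled, so that the chi-square tail can be computed with the standard library alone.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt, df):
    """Likelihood-ratio test for two nested models.

    lnl_null and lnl_alt are the log-likelihoods of the simpler and the
    richer model; df is the difference in their number of free parameters.
    Returns (test statistic, chi-square p-value).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        # Chi-square survival function with 1 degree of freedom
        p = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        # Chi-square survival function with 2 degrees of freedom
        p = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only handles df = 1 or df = 2")
    return stat, p
```

For other degrees of freedom, a statistics library would be needed in place of the two closed-form cases.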

Material and Methods
1. Protein ortholog data: fleN, fleQ, flhF
2. Protocol:
  1. FASTA headers are too long for tree display; run rename.pl to shorten the names. Usage: > rename.pl <FASTA_file> > <OUTPUTfilename.fas> (I will rewrite the script to create the output file automatically)
  2. To align, run muscle. Usage: > muscle -in <FASTA_File> -out <OUTPUTfilename.aln> -clwstrict
  3. Phylogeny; tree display (R commands)
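
The renaming and alignment steps above could be chained roughly as in the sketch below. It is not a port of rename.pl, whose exact renaming rule is not described here: the rule used (keep the first whitespace-delimited token, truncate it, and number any duplicates) and the names shorten_fasta_headers and muscle_command are assumptions. The muscle flags are the ones given in the protocol.

```python
import subprocess
import sys


def shorten_fasta_headers(lines, max_len=10):
    """Yield FASTA lines with each header cut down to a short name.

    Assumed rule: keep the first whitespace-delimited token of the header,
    truncate it to max_len characters, and append _2, _3, ... to repeats.
    """
    seen = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            tokens = line[1:].split()
            short = (tokens[0] if tokens else "seq")[:max_len]
            seen[short] = seen.get(short, 0) + 1
            if seen[short] > 1:
                short = f"{short}_{seen[short]}"
            yield ">" + short
        else:
            yield line


def muscle_command(fasta_path, aln_path):
    """Build the muscle call from the protocol as an argument list."""
    return ["muscle", "-in", fasta_path, "-out", aln_path, "-clwstrict"]


if __name__ == "__main__":
    # Usage (hypothetical): python rename_align.py input.fas
    source = sys.argv[1]
    renamed = source + ".renamed.fas"
    with open(source) as fin, open(renamed, "w") as fout:
        for out_line in shorten_fasta_headers(fin):
            fout.write(out_line + "\n")
    subprocess.run(muscle_command(renamed, renamed + ".aln"), check=True)
```

Writing the renamed file under a derived name removes the need for the manual output redirect.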

Gene	Alignment	Notes
fleN	fleN pep alignment	Conserved Domain
fleQ	fleQ pep alignment	Conserved Domain
flhF	flhF pep alignment	Conserved Domain

Finish parsing the genome files to upload the "orf" table (Raymond & Rayees)
1. Rayees's parsed genome files: https://www.dropbox.com/sh/k0zktvvmv39op9i/1zBercEky8
Parsing the ortholog file to upload the "orth_orf" table (Raymond)
Identify and download fleN, fleQ, and flhF orthologs & align them (Rayees)