Qiu Lab Meetings: Difference between revisions
imported>Weigang |
imported>Weigang m (→Monday, June 6) |
||
(5 intermediate revisions by the same user not shown) | |||
Line 8: | Line 8: | ||
# The latest tree of life: http://www.nature.com/articles/nmicrobiol201648 | # The latest tree of life: http://www.nature.com/articles/nmicrobiol201648 | ||
# Microbiome Initiative: http://mbio.asm.org/content/7/3/e00714-16.full?sid=a47e19d3-10c1-408d-9d56-2cecaa73d585 | # Microbiome Initiative: http://mbio.asm.org/content/7/3/e00714-16.full?sid=a47e19d3-10c1-408d-9d56-2cecaa73d585 | ||
# Recombination facilitates adaptation of polio virus: [http://www.sciencedirect.com/science/article/pii/S1931312816301019 Xiao et al (2016)] | # Evolutionary mechanisms in polio viruses | ||
##Fitness landscape at single-nucleotide levels: [http://www.ncbi.nlm.nih.gov/pubmed/24284629 Acevedo et al (2014)] | |||
##Recombination facilitates adaptation of polio virus: [http://www.sciencedirect.com/science/article/pii/S1931312816301019 Xiao et al (2016)] | |||
# Cancer evolution: | # Cancer evolution: | ||
## http://sysbio.oxfordjournals.org/content/64/1/e1.long | ## http://sysbio.oxfordjournals.org/content/64/1/e1.long | ||
Line 88: | Line 90: | ||
## Extract third sites from the alignment and show the alignment in match view. (Hint: look for an option in the bioaln help page that could be used to extract third sites. Use a pipe.) | ## Extract third sites from the alignment and show the alignment in match view. (Hint: look for an option in the bioaln help page that could be used to extract third sites. Use a pipe.) | ||
## Remove the gaps from the alignment and show the final alignment in codon view. (Hint: look for an option in the bioaln help page that could be used to remove gaps.) | ## Remove the gaps from the alignment and show the final alignment in codon view. (Hint: look for an option in the bioaln help page that could be used to remove gaps.) | ||
# SQL exercises: | |||
## Log in to the borreliabase.org database by typing: <code>psql -h borreliabase.org -U lab -d genome_var</code> | |||
## Please write down your command to retrieve what is listed as below (don’t forget that each command should end with a “;”): | |||
## Select all columns in the table “varlist” and show the first 10 rows | |||
## From table “varlist”, select values stored in the columns “acc”, “refcodon”, “altcodon”, “protein_accession” | |||
## In the “varlist” table, select all rows where the “proj_id” value is “1” and count the selection | |||
## Select those whose “conf” value is greater than “90” and arrange your selection in ascending order | |||
## For the values in table “var”, write an expression to output the sum of the values in the “coverage” column, grouped by the values in column “genome_id” and limited to rows where “status” is ‘f’; arrange your selection in ascending order | |||
## From table “genome”, select values in column “genus”; from table “var”, select values in column “var_id”, “status”, “conf”; from table “varlist”, select values in column “acc”, “refaa”, “altaa”, then join your selection together. What columns are the keys when you join the table? | |||
# Tree Quizzes [[File:Pretest.pdf|Print & hand in]] | # Tree Quizzes [[File:Pretest.pdf|Print & hand in]] | ||
# A scripting exercise: Write a Perl or Python script to export SNPs | # A scripting exercise: Write a Perl or Python script to export SNPs | ||
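The SQL exercises above practice a handful of query patterns: LIMIT, a filtered COUNT, ORDER BY, SUM with GROUP BY, and a multi-table JOIN. The following is a minimal sketch of those patterns in SQL-embedded Python, not a solution against the lab database. It uses Python's built-in sqlite3 module with an in-memory toy database as a stand-in for PostgreSQL. Table and column names follow the exercise text, but every row, the column layout, and in particular the join keys (genome_id and var_id) are assumptions made for illustration; the real keys are for you to find in the genome_var schema.

```python
import sqlite3

# Toy stand-in for the lab database. Rows and join keys are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genome  (genome_id INTEGER PRIMARY KEY, genus TEXT);
CREATE TABLE var     (var_id INTEGER PRIMARY KEY, genome_id INTEGER,
                      status TEXT, conf INTEGER, coverage INTEGER);
CREATE TABLE varlist (var_id INTEGER, acc TEXT, refaa TEXT, altaa TEXT,
                      proj_id INTEGER);
INSERT INTO genome  VALUES (1, 'Borrelia'), (2, 'Pseudomonas');
INSERT INTO var     VALUES (10, 1, 'f', 95, 30), (11, 1, 'f', 80, 20),
                           (12, 2, 'f', 99, 60), (13, 2, 't', 91, 40);
INSERT INTO varlist VALUES (10, 'A1', 'K', 'R', 1), (11, 'A2', 'L', 'L', 1),
                           (12, 'B1', 'G', 'D', 2), (13, 'B2', 'S', 'T', 1);
""")


def run(sql):
    """Execute one statement and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# All columns, first rows only.
first_rows = run("SELECT * FROM varlist LIMIT 2;")

# Count the rows that match a condition.
n_proj1 = run("SELECT COUNT(*) FROM varlist WHERE proj_id = 1;")[0][0]

# Filter on a numeric column and sort in ascending order.
high_conf = run("SELECT var_id, conf FROM var WHERE conf > 90 ORDER BY conf ASC;")

# Aggregate per group, restricted to rows with status 'f'.
cov_by_genome = run("""
    SELECT genome_id, SUM(coverage) AS total
    FROM var
    WHERE status = 'f'
    GROUP BY genome_id
    ORDER BY total ASC;
""")

# Join three tables on the (hypothetical) shared key columns.
joined = run("""
    SELECT g.genus, v.var_id, v.status, v.conf, l.acc, l.refaa, l.altaa
    FROM var v
    JOIN genome  g ON v.genome_id = g.genome_id
    JOIN varlist l ON v.var_id = l.var_id
    ORDER BY v.var_id;
""")

for row in joined:
    print(row)
```

With psql the same statements are typed directly at the prompt, each ending in a semicolon; the Python wrapper only matters once query results need further processing in a script.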
Line 95: | Line 106: | ||
===Thursday, June 2 to Sunday, June 5. Traveling to EEID meeting (Saymon, Amanda, Rayees, Roy, Weigang)=== | ===Thursday, June 2 to Sunday, June 5. Traveling to EEID meeting (Saymon, Amanda, Rayees, Roy, Weigang)=== | ||
===June 6-10, 2016=== | ===June 6-10, 2016=== | ||
====Monday, June 6==== | |||
* project meeting: Pathogen genome pipeline | |||
** Team: John (leader), Zawar | |||
* project meeting: Treponema operon algorithm | |||
** Team: Amanda (leader), Roy, Fatima | |||
** Schedule: Monday, Wed, & Friday 12-5 | |||
* project meeting: simulation of evolution of traits | |||
** Team: Rayees (leader), Ishemael, Jesam | |||
** Schedule: Monday, Tuesday, & Friday 12-5 | |||
* Project: bp-utils development | |||
** Team: Rocky; Khalikuz | |||
====Tuesday, June 7==== | |||
* project meeting: Borrelia genomics | |||
** Team: Saymon (leader), Sharon | |||
** Schedule: Tuesday, Thursday, and Friday 12-5 | |||
===June 13-17, 2016=== | ===June 13-17, 2016=== | ||
===June 20-24, 2016=== | ===June 20-24, 2016=== |
Latest revision as of 22:01, 6 June 2016
Summer 2016
Rules of Conduct
- No eating, drinking, or loud talking in the lab. Socialize in the lobby only.
- Be respectful to each other, regardless of level of study
- Be on time & responsible. Communicate with the PI if late or absent
Readings & Journal Club
- A short introduction to molecular phylogenetics: http://www.ncbi.nlm.nih.gov/pubmed/12801728
- The latest tree of life: http://www.nature.com/articles/nmicrobiol201648
- Microbiome Initiative: http://mbio.asm.org/content/7/3/e00714-16.full?sid=a47e19d3-10c1-408d-9d56-2cecaa73d585
- Evolutionary mechanisms in polio viruses
- Fitness landscape at single-nucleotide levels: Acevedo et al (2014)
- Recombination facilitates adaptation of polio virus: Xiao et al (2016)
- Cancer evolution: http://sysbio.oxfordjournals.org/content/64/1/e1.long
Projects
Tick work (Lia [leader], Amanda, Saymon [after first-level])
- Goal 1. Protocol optimization for DNA prep & PCR. Status: completed
- Goal 2. Protocol development: DNA prep & library construction for MiSeq. Status: to be initiated
- Goal 3. Tick microbiome project: design of primers for 16S rRNA and for pf32. Status: to be initiated
Borrelia plasmid evolution (Saymon [leader], Sharon, Alanna)
- Goal 1. Reconcile pf32 tree within Bbss SNP groups
- Goal 2. Reconcile pf32 tree within Bbss
- Goal 3. Reconcile pf32 tree with Bbsl
Pseudomonas GWAS (Rayees [leader], Roy, Ishmael; with Dr Xavier of MSKCC)
- Goal 1. Simulate bacterial genome evolution (ms; SimPop; SimBac; SFS_CODE, http://sfscode.sourceforge.net/SFS_CODE/SFS_CODE_home/SFS_CODE_home.html; AnA-FiTS, http://www.ncbi.nlm.nih.gov/pubmed/23834340)
- Goal 2. Simulate phenotype (SimPheno)
- Goal 3. Simulate GWAS (e.g., Hapview with phylogenetic correction)
Pathogen genomics pipeline (John [leader], Zawar)
- Goal 1. Variant call pipeline (e.g., cortex_var)
- Goal 2. Variant database
- Goal 3. Website
Existing projects
- Treponema genome evolution (Amanda & Roy)
- PVT1 evolution & function (Jeff [after first-level])
- PhyloHMM algorithm (Weigang)
- Adaptive dynamics & effect of diversity on Borrelia virulence (Jiangtao & Sipa)
Weekly Schedule
Friday, May 27, 2016. Lab meeting
- End-of-semester celebration
- Finalize EEID posters
- Summer planning
Tuesday, May 31, 2016. Orientation Session 1
- Time: 1-5 pm; Room: (to be reserved & posted)
- Pre-orientation: Obtain lab accounts (Yozen); Obtain cluster accounts (Carlos)
- Day 1. 1:00 - 1:30. Lab overview
- Day 1. 1:30 - 2:00. Unix Part 1 (Weigang);
- Day 1. 2:00 - 2:30. Lunch break
- Day 1. 2:45 - 3:20. BorreliaBase.org (Lia): Slides
- Day 1. 3:30 - 4:00. bp-utils (Saymon): tutorials
- Day 1. 4:00 - 4:30. Servers & cluster usage (Rayees): Tutorial
Wed, June 1, 2016. Orientation Session 2
- Day 2. 1:00 - 2:00. Phylogenetics/Tree Quizzes (Weigang)
- Day 2. 2:00 - 2:45. Lunch break
- Day 2. 2:45 - 3:15. R (Amanda). Download data set from http://diverge.hunter.cuny.edu/~weigang/data-sets-for-biostat/intern_data.csv2 & save as "rna_seq.csv"
- Day 2. 3:30 - 4:00. SQL & SQL-embedded Perl or Python (John)
- Day 2. 4:00 - 4:30. Unix Part 2 (Roy)
- Day 2. 4:30 - 5:00. Lab Databases: bb3-dev, pa2, genome_var (Weigang)
Assignments. (Q1 & Q2 due 1 pm, Wed, June 1st, 2016; the rest due noon, Monday, June 7th, 2016)
Thursday, June 2 to Sunday, June 5. Traveling to EEID meeting (Saymon, Amanda, Rayees, Roy, Weigang)
June 6-10, 2016
Monday, June 6
- project meeting: Pathogen genome pipeline
- Team: John (leader), Zawar
- project meeting: Treponema operon algorithm
- Team: Amanda (leader), Roy, Fatima
- Schedule: Monday, Wed, & Friday 12-5
- project meeting: simulation of evolution of traits
- Team: Rayees (leader), Ishemael, Jesam
- Schedule: Monday, Tuesday, & Friday 12-5
- Project: bp-utils development
- Team: Rocky; Khalikuz
Tuesday, June 7
- project meeting: Borrelia genomics
- Team: Saymon (leader), Sharon
- Schedule: Tuesday, Thursday, and Friday 12-5
June 13-17, 2016
June 20-24, 2016
June 27-July 1, 2016
July 6 - July 10
July 13 - July 17, 2016. Project conclusion
July 17 - August 20, 2016. PI vacation
School Year 2015
Nov 19, 2015
- Amanda: Summary of Pseudomonas genome variant finding with cortex_var; drafting a manuscript (starting with Material & Methods)
- Roy: Briefing on his Poster presentation at ABRCMS
- Rayees: PA SNP call done. (meeting with MSKCC at 11am)
- Weigang: ABRCMS briefing / Tools to check out
- PRICE: a de novo genome assembler of short reads. Document Page
- QuickGO: a web browser of GO terms.
- Pathway Tools: for qualitative prediction of pathogenicity, operons, and pathways
- PICRUSt: predicting functions of microbial communities based on gene contents
- Saymon, John & Weigang: PopGenome package of R to explore selective sweeps, linkage, and drift
- Sipa: Presentation on mathematical models of cancer development
Sept 18, 2015
- Journal Club: latest statistics for detecting population admixture and genome introgression (d3, f4, h4, ChromosomePainter). [1]. Presenter: Saymon
Sept 11, 2015
- Journal Club: an in-depth analysis of Staphylococcus aureus genomes. [2] Presenter: John
- Key terms: SNP, mutation, recombination, linkage disequilibrium (LD), synonymous polymorphism (Pi[s])
- Key methods: identify recombination (from mutation) using shape-shape changes; four-gamete test to identify breakage point; LD decay (based on r2 and probability of tree compatibility) to quantify r/m ratio
- Key results: extensive recombination among clones; rates and tract length quantified by LD decay
- My rating: 4/5. Rigorous analysis of recombination in bacteria, innovative methods, informative and attractive figures; the paper is too long and many statements repetitive, effect of selection hinted but not explored.
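Two of the methods named in the notes above, the four-gamete test and r2-based LD, are easy to try on a toy alignment. The following is a minimal Python sketch, not the paper's implementation. It assumes biallelic sites, each given as a string with one allele per sampled genome, and it assumes the infinite-sites reading of the four-gamete test, under which seeing all four two-site haplotypes implies recombination or recurrent mutation between the sites. Function names and example data are hypothetical.

```python
def four_gamete_violation(site_a, site_b):
    """Return True if all four two-site haplotypes are observed.

    site_a and site_b are equal-length strings, one allele per genome,
    and each site is assumed to be biallelic.
    """
    return len(set(zip(site_a, site_b))) == 4


def r_squared(site_a, site_b):
    """Return the LD statistic r^2 between two biallelic sites."""
    n = len(site_a)
    ref_a, ref_b = site_a[0], site_b[0]  # arbitrary reference alleles
    p_a = site_a.count(ref_a) / n
    p_b = site_b.count(ref_b) / n
    if p_a == 1.0 or p_b == 1.0:
        raise ValueError("both sites must be polymorphic")
    # Frequency of the haplotype carrying both reference alleles.
    p_ab = sum(1 for x, y in zip(site_a, site_b)
               if x == ref_a and y == ref_b) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Four genomes, three SNP sites (invented data).
site1 = "AAGG"
site2 = "CCTT"  # perfectly linked to site1
site3 = "ACAC"  # all four haplotypes occur together with site1

print(four_gamete_violation(site1, site2), r_squared(site1, site2))
print(four_gamete_violation(site1, site3), r_squared(site1, site3))
```

An LD-decay curve would average r_squared over site pairs binned by physical distance; that step is left out here because it needs site coordinates.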
Sept 4, 2015
- Journal Club: a nice review of bacterial population genetics (E.coli model), from protein polymorphisms to whole-genome variations. [3]. Presenter: Amanda
- Technological history of bacterial population genetics: MLEE -> MLST -> Whole-genome
- Key terms & concepts: clonality, linkage disequilibrium, recombination, homoplasy, r/m ratio
- Methods for recombination detection: clustered polymorphism, homoplasy (phylogenetic inconsistency) (a Borrelia data set to understand how to identify homoplasy and recombination)
- Tools to try: recHMM (detecting homoplastic sites, fine-grained), PHI (per-gene detection, coarse), USEARCH (alternative to BLAST)/UCLUST (alternative to CD-HIT), Distance method (? no reference given; can't understand the algorithm either)
- My rating: 4.8/5 (concise, thoughtful & solid review, covering a vast range of history, species, and theory; no apparent theoretical or visual flaws; ending a little pessimistic; implications for the greater biomedical audience are not explored)
Aug 28, 2015
- Journal Club (12:30-1:30): a recent paper claiming widespread gene loss & pseudogenization in bacterial pathogens. [4]. Presenter: Roy
- Key terms/concepts: pan-genome, pan-genes (core/"near core"/rare), normalized identity (NI), genomic fluidity, pseudogene conservation percent (PCP), AAI (aa identity), effective population size (Ne), Muller's Ratchet
- Key methods: FASTA for ortholog/paralog identification, PHI (pairwise homoplasy index) for detecting recombination, TFASTA for HGT (gene gain), RAST for gene calls and genome annotation
- Key findings: bi-modal distribution of pangenes; two clonal species have high genomic fluidity, despite being closely related; little HGT ("rare") but lots of losses ("near core") in clonal species; maintenance of pseudogenes (small Ne)
- Pluses: large number of genomes; results broadly convincing; rigorous interpretations and discussion
- Flaws: No phylogenetic reconstruction; no synteny verification; no gene function analysis; no statistical evaluation of the conclusion; bad presentation (figures should be tables and tables should be figures)
- My overall rating: 3.5/5.0
- Project updates & plans (1:30-2)
- Weigang: design statistical tests for 2 hypotheses: (1) any co-occurrence of oc types? (2) lineage-stabilizing genes
- Saymon: tick-bacteria gene transfer positive; pcr is working for positive controls; need to start testing for nymphs
- John & Rayees: pa2 database cleaning nearly done; start polymorphism-by-genome-location analysis
- Amanda & Roy: Treponema project has a working database, pipeline, and preliminary validated results; start documenting protocols, tabulating results, and preparing functional analysis
Summer 2014
Projects & Goals
Name | Goal/Description | Team |
---|---|---|
Pseudomonas | | Example |
Borrelia intergenics | Clean up start-codon positions | Example |
SNP pipeline | Example | Example |
Gain/Loss pipeline | Example | Example |
- Frequency distribution of ospC types in wild tick populations (Fall 2013) Project page
- Mutual information
Summer 2013
Projects & Goals
- Borrelia population genomics: Recombination & Natural Selection (Published)
- Borrelia pan-genomics (Submitted as of 5/25/2013)
- Positive and negative selection in Borrelia ORFs and IGS (Submitted as of 6/15/2013)
- Dr Bargonetti's project (Summer 2013)
- A population genomics pipeline using MUGSY-FastTree (Summer 2013): Project page
- Borrelia Genome Database & Browser (Summer 2013) Version 2 screen shot
- Pseudomonas population genomics (Summer 2013) Project page
- Hypothesis Testing: Do host-interacting genes show adaptive codon usage? (Summer 2013): Project page
- Phylogenomics browsing with JavaScript/JQuery, Ajax, and jsPhylosvg
- Frequency distribution of ospC types in wild tick populations (Fall 2013) Project page
Lab meeting: June 13, 2013
- Weigang: IGS paper submission should be done by Thursday.
- Che/Slav: Workshop update (Meeting at 3:30pm?)
- Che: SILAC project (Meeting at 4pm?)
- Zhenmao: Tick processing & paired-end Illumina sequencing
- Pedro: Updates on "ncbi-orf" table
- Girish: phyloSVG extension; QuBi video
- Saymon and Deidre: consensus start-codons
- Reeyes and Raymond: Pseudomonas DB; fleN alignment and phylogeny
- Valentyna: BLASTn results (4:30pm?)
Lab meeting: May 23, 2013
- May 24, Friday: End of School Year Party in the Park (we leave from Hunter @ 1:30pm)
- Recommended reading of the week: Detecting Neanderthal genes using the D' homoplasy statistic
- Weigang: IGS paper submission
- Che: Thesis update/SILAC project/Summer teaching
- Zhenmao: Manuscript update: Material & Methods; Results (Tables and Figures)
- Pedro: Catalyst web framework
- Girish: cp26 phylogenomic analysis
- Saymon and Deidre: consensus start-codons
Lab meeting: May 16, 2013
- Weigang: IGS paper submitted yet?
- Che: Thesis update. Chapter 3. Evolution of ospA/ospB gene family
- Pedro/Zhenmao: Can we wrap up the BLAST identification of ospC types?
- Girish: Fetch cp26 sequences from DB; Run MUGSY & FastTree
- Saymon/Deidre: Identification of consensus start-codon positions
- Pedro/Girish: orth_get/orth_igs website development. Catalyst. Implement graphics (genome map & phylogeny) query interface
- Raymond: start the Pseudomonas summer project
Foundational papers for working in Qiu Lab
- A recent review by Qiu lab: Evolutionary genomics of Lyme bacteria
- Phylogeography of Borrelia burgdorferi sensu lato. A review by Margos et al. 2011
- A short tutorial on molecular phylogenetics: Phylogeny for the faint of heart: a tutorial
- The Ka/Ks test of natural selection: The Ka/Ks ratio: diagnosing the form of sequence evolution
Informatics Architecture
- Operating Systems: Linux OS/Ubuntu, Mac OS
- Programming languages: BASH, Perl/BioPerl, R
- Relational Databases: PostgreSQL
- Software architecture
- bb3: Borrelia Genome Database. To access:
psql -h borreliabase.org -U lab bb3
- Pseudomonas Genome Database. To access:
psql -h ortholog -U lab paerug
- DNATweezer: Perl wrappers of most frequently used BioPerl modules, including Bio::Seq, Bio::SimpleAlign, and Bio::Tree [5]
- SimBac: A Perl/Moose package for simulating bacterial genome evolution [6]
- BorreliaBase
Perl Challenges
Problem | Input | Output |
---|---|---|
DNA transcription | A DNA sequence, in 5'-3' direction (e.g., aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat) | An RNA sequence, in 5'-3' direction |
Genetic code | None | 64 codons, one per line (using loops) |
Count amino acids | A protein sequence | Frequency counts of individual amino acids |
Count codons | A protein-coding DNA sequence | Frequency counts of individual codons |
Random sequence 1 | None | Generate a random DNA sequence (e.g., 1000 bases) with equal base frequencies |
Random sequence 2 | None | Generate a random DNA sequence with biased base frequencies, e.g., 10% G, 10% C, 40% T, and 40% A. |
Graphics I | A categorical dataset, e.g., Biology | A bar graph & a pie chart, using GD::Simple or PostScript::Simple |
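The challenges above are posed for Perl. As a point of comparison only, here is a minimal Python sketch of the text-based ones (transcription, codon enumeration, amino-acid and codon counts, random sequences). All function names are hypothetical. The transcription function assumes the input is the coding (sense) strand, so the RNA is the same sequence with T replaced by U; a template-strand input would need a reverse complement instead.

```python
import random
from collections import Counter
from itertools import product


def transcribe(dna):
    """Coding-strand DNA (5'-3') to RNA (5'-3'), preserving letter case."""
    return dna.replace("t", "u").replace("T", "U")


def all_codons(bases="TCAG"):
    """Enumerate all 64 codons by looping over three base positions."""
    return ["".join(triplet) for triplet in product(bases, repeat=3)]


def count_residues(protein):
    """Frequency counts of individual amino acids in a protein sequence."""
    return Counter(protein.upper())


def count_codons(cds):
    """Frequency counts of codons; a trailing partial codon is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def random_dna(length, weights=None, seed=None):
    """Random DNA; weights maps each base to its relative frequency."""
    rng = random.Random(seed)  # a seed makes the sequence reproducible
    bases = "ACGT"
    w = [0.25] * 4 if weights is None else [weights[b] for b in bases]
    return "".join(rng.choices(bases, weights=w, k=length))


print(transcribe("aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat"))
for codon in all_codons():
    print(codon)
print(count_residues("MKKLLA"))
print(count_codons("atgaaaatgc"))
uniform = random_dna(1000, seed=1)
biased = random_dna(1000, {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1}, seed=1)
print(Counter(uniform), Counter(biased))
```

The Perl versions follow the same outline: tr/// or s///g for transcription, three nested loops for the codon table, and a hash for the counts.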