Qiu Lab Meetings: Difference between revisions
imported>Weigang |
imported>Weigang m (→Monday, June 6) |
=Summer 2016= | |||
==Rules of Conduct== | |||
# No eating, drinking, or loud talking in the lab. Socialize in the lobby only. | |||
# Be respectful to each other, regardless of level of study | |||
# Be on time & responsible. Communicate with the PI if late or absent | |||
==Readings & Journal Club== | |||
# A short introduction to molecular phylogenetics: http://www.ncbi.nlm.nih.gov/pubmed/12801728 | |||
# The latest tree of life: http://www.nature.com/articles/nmicrobiol201648 | |||
# Microbiome Initiative: http://mbio.asm.org/content/7/3/e00714-16.full?sid=a47e19d3-10c1-408d-9d56-2cecaa73d585 | |||
# Evolutionary mechanisms in polio viruses | |||
##Fitness landscape at single-nucleotide levels: [http://www.ncbi.nlm.nih.gov/pubmed/24284629 Acevedo et al (2014)] | |||
##Recombination facilitates adaptation of polio virus: [http://www.sciencedirect.com/science/article/pii/S1931312816301019 Xiao et al (2016)] | |||
# Cancer evolution: | |||
## http://sysbio.oxfordjournals.org/content/64/1/e1.long | |||
## http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001789 | |||
## http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0929-9 | |||
==Projects== | |||
===Tick work (Lia [leader], Amanda, Saymon [after first-level])=== | |||
# Goal 1. Protocol optimization for DNA prep & PCR. Status: completed | |||
# Goal 2. Protocol development: DNA prep & library construction for MiSeq. Status: to be initiated | |||
# Goal 3. Tick microbiome project: design of primers for 16S rRNA and for pf32. Status: to be initiated
===Borrelia plasmid evolution (Saymon [leader], Sharon, Alanna)===
# Goal 1. Reconcile pf32 tree within Bbss SNP groups | |||
# Goal 2. Reconcile pf32 tree within Bbss | |||
# Goal 3. Reconcile pf32 tree with Bbsl | |||
===Pseudomonas GWAS (Rayees [leader], Roy, Ishmael; with Dr Xavier of MSKCC)=== | |||
# Goal 1. Simulate bacterial genome evolution (ms, SimPop, SimBac; SFS_CODE (http://sfscode.sourceforge.net/SFS_CODE/SFS_CODE_home/SFS_CODE_home.html); AnA-FiTS (http://www.ncbi.nlm.nih.gov/pubmed/23834340))
# Goal 2. Simulate phenotype (SimPheno) | |||
# Goal 3. Simulate GWAS (e.g., Hapview with phylogenetic correction) | |||
===Pathogen genomics pipeline (John [leader], Zawar)=== | |||
# Goal 1. Variant call pipeline (e.g., cortex_var) | |||
# Goal 2. Variant database | |||
# Goal 3. Website | |||
===Existing projects=== | |||
# Treponema genome evolution (Amanda & Roy) | |||
# PVT1 evolution & function (Jeff [after first-level])
# PhyloHMM algorithm (Weigang)
# Adaptive dynamics & effect of diversity to Borrelia virulence (Jiangtao & Sipa) | |||
==Weekly Schedule== | |||
===Friday, May 27, 2016. Lab meeting=== | |||
* End-of-semester celebration
* Finalize EEID posters | |||
* Summer planning | |||
===Tuesday, May 31, 2016. Orientation Session 1=== | |||
# Time: 1-5 pm; Room: (to be reserved & posted) | |||
# Pre-orientation: Obtain lab accounts (Yozen); Obtain cluster accounts (Carlos) | |||
# Day 1. 1:00 - 1:30. Lab overview | |||
# Day 1. 1:30 - 2:00. [http://korflab.ucdavis.edu/Unix_and_Perl/current.html#part1 Unix Part 1] (Weigang); | |||
# Day 1. 2:00 - 2:30. Lunch break | |||
# Day 1. 2:45 - 3:20. BorreliaBase.org (Lia): [[File:BorreliaBase-intro.pptx|Slides]]
# Day 1. 3:30 - 4:00. bp-utils (Saymon): [[Mini-Tutorals#Bp-utils:_sequence.2C_alignment_.26_tree_utilities_by_Qiu_Lab|tutorials]] | |||
# Day 1. 4:00 - 4:30. Servers & cluster usage (Rayees, [[A_Primer_on_the_Cluster_System_at_Hunter|Tutorial]] ) | |||
===Wed, June 1, 2016. Orientation Session 2=== | |||
# Day 2. 1:00 - 2:00. Phylogenetics/Tree Quizzes (Weigang) | |||
# Day 2. 2:00 - 2:45. Lunch break | |||
# Day 2. 2:45 - 3:15. R (Amanda). Download data set from http://diverge.hunter.cuny.edu/~weigang/data-sets-for-biostat/intern_data.csv2 & save as "rna_seq.csv" | |||
# Day 2. 3:30 - 4:00. SQL & SQL-embedded Perl or Python (John)
# Day 2. 4:00 - 4:30. [http://korflab.ucdavis.edu/Unix_and_Perl/current.html#part2 Unix Part 2] (Roy)
# Day 2. 4:30 - 5:00. Lab Databases: bb3-dev, pa2, genome_var (Weigang)
{| class="wikitable sortable mw-collapsible" | |||
|- style="background-color:lightsteelblue;" | |||
! Assignments. (Q1 & Q2 Due 1pm, Wed, June 1st, 2016; The rest Due Noon, Monday, June 7th, 2016)
|- style="background-color:powderblue;" | |||
| | |||
# Log in lab account (first to "darwin.hunter.cuny.edu", then to "wallace") and change password (email me [weigang@genectr.hunter.cuny.edu] if you have trouble logging in) | |||
# Unix exercises: U10.1, U14.1, U16.1, U18.1, U27.1 (with emacs or vi), U29.1 & U29.2 (with emacs or vi) | |||
# Borreliabase exercises: | |||
## Download B31 genome, ORF, and protein sequences | |||
## Download ospA ortholog alignments (nucleotide & protein) | |||
## Download pf32 paralog alignments | |||
## Use BLAST to identify which gene(s) in the B31 genome contain this DNA sequence: "caagattaatattattgcaatgatattaactttaatttgcacctcatgcgcaccttttagcaaaatcgatcctaaagcaaatgcaaacactaagccaaaaaaaatcaccaatccgggggaaaacacccaaaattttgaagataaatctggagaccttagcacttctgatgaaaaaattatggaaactatcgcttcaga" | |||
## Use BLAST to identify which gene(s) in the B31 proteome contain this amino-acid sequence: "MGINSTSFYSLNMKVKPLDNVKVRKALSFAIDRKTLTESVLN"
# Use bioseq to answer all the questions below. Submit only the command that you used to find the answers. | |||
## Use accession # CP002316.1 to retrieve the genbank file from NCBI. Save the output to CP002316.1.gb file. | |||
## Extract the sequences in FASTA format from file CP002316.1.gb and save the output to CP002316.1.fas file. Use the file CP002316.1.fas to answer the following question. | |||
## Count the number of sequences in the file.
## Using a single command, pick the first 10 sequences from the file and find their lengths. (Hint: use a pipe)
## Using a single command, pick the third and seventh sequences from the file and then do the 3-frame translation for both sequences. Which reading frame is correct? Specify.
## Using a single command, get the first 100 nucleotides of all the sequences present in the file and then do a 1-frame translation for all the sub-sequences. (Hint: look for options in the bioseq help page that could be used to get the subsequence and the 1-frame translation. Use a pipe)
# Use bioaln for the following exercises. Go to /home/shared/lab_tutorial and find the sequence alignment file named “ospC.aln”. Name the format of the alignment file. Use it to answer all the questions below. Submit only the command that you used to find the answers. | |||
## Find the length of the alignment. | |||
## Count the number of the sequences present in the alignment. | |||
## How do you convert this alignment to PHYLIP format? Save your output.
## Pick “B31, N40, BOL26, JD1” from the alignment and calculate their average percent identity. (Hint: look for options in the bioaln help page that could be used to pick specific sequences and calculate average percent identity. Use a pipe)
## Extract third sites from the alignment and show the alignment in match view. (Hint: look for an option in the bioaln help page that could be used to extract third sites. Use a pipe)
## Remove the gaps from the alignment and show the final alignment in codon view. (Hint: look for an option in the bioaln help page that could be used to remove gaps)
# SQL exercises: | |||
## Log in to the borreliabase.org database by typing: <code>psql -h borreliabase.org -U lab -d genome_var</code>
## Write down the command you used to retrieve each item listed below (don’t forget that each command should end with a “;”):
## Select all columns in the table “varlist” and show the first 10 rows
## From table “varlist”, select values stored in the columns “acc”, “refcodon”, “altcodon”, “protein_accession”
## In the “varlist” table, select all rows where the “proj_id” value is “1” and count the selection
## Select those whose “conf” value is greater than “90” and arrange your selection in ascending order
## For the values in table “var”, write an expression to output the sum of the values in the “coverage” column, grouped by the values in column “genome_id”, limited to rows where “status” is ‘f’; arrange your selection in ascending order
## From table “genome”, select values in column “genus”; from table “var”, select values in columns “var_id”, “status”, “conf”; from table “varlist”, select values in columns “acc”, “refaa”, “altaa”; then join your selections together. Which columns are the keys when you join the tables?
# Tree Quizzes [[File:Pretest.pdf|Print & hand in]] | |||
# A scripting exercise: Write a Perl or Python script to export SNPs | |||
# An R exercise in statistical analysis: Gene expression analysis using the cancer data | |||
|} | |||
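The grouped-sum SQL exercise above (sum of “coverage” per “genome_id”, restricted to status ‘f’, sorted ascending) can be tried offline before connecting to the lab database. The following is only a sketch: it uses Python's built-in <code>sqlite3</code> module instead of PostgreSQL, the column names are taken from the exercise text, and the table contents are invented toy rows. The real “var” table may have different column types (for example, a boolean “status”).

```python
import sqlite3

# Toy stand-in for the lab's "var" table. Column names follow the exercise
# text; the rows below are made up purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE var (var_id INTEGER, genome_id INTEGER, coverage INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO var VALUES (?, ?, ?, ?)",
    [
        (1, 10, 30, "f"),
        (2, 10, 20, "f"),
        (3, 10, 99, "t"),  # excluded by the WHERE clause
        (4, 20, 15, "f"),
        (5, 30, 40, "t"),  # genome 30 has no 'f' rows, so it drops out
    ],
)

# Filter first (WHERE), then aggregate per genome (GROUP BY), then sort.
query = """
    SELECT genome_id, SUM(coverage) AS total_coverage
    FROM var
    WHERE status = 'f'
    GROUP BY genome_id
    ORDER BY total_coverage ASC;
"""
rows = conn.execute(query).fetchall()
for genome_id, total_coverage in rows:
    print(genome_id, total_coverage)
```

The same SELECT statement should carry over to <code>psql</code> largely unchanged, since WHERE, GROUP BY, and ORDER BY are standard SQL; only the connection step differs.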
===Thursday, June 2 to Sunday, June 5. Traveling to EEID meeting (Saymon, Amanda, Rayees, Roy, Weigang)=== | |||
===June 6-10, 2016=== | |||
====Monday, June 6==== | |||
* project meeting: Pathogen genome pipeline | |||
** Team: John (leader), Zawar | |||
* project meeting: Treponema operon algorithm
** Team: Amanda (leader), Roy, Fatima | |||
** Schedule: Monday, Wed, & Friday 12-5 | |||
* project meeting: simulation of evolution of traits | |||
** Team: Rayees (leader), Ishemael, Jesam | |||
** Schedule: Monday, Tuesday, & Friday 12-5 | |||
* Project: bp-utils development | |||
** Team: Rocky; Khalikuz | |||
====Tuesday, June 7==== | |||
* project meeting: Borrelia genomics | |||
** Team: Saymon (leader), Sharon | |||
** Schedule: Tuesday, Thursday, and Friday 12-5 | |||
===June 13-17, 2016=== | |||
===June 20-24, 2016=== | |||
===June 27-July 1, 2016=== | |||
===July 6 - July 10=== | |||
===July 13 - July 17, 2016. Project conclusion=== | |||
===July 17 - August 20, 2016. PI vacation=== | |||
=School Year 2015= | |||
==Nov 19, 2015== | |||
* Amanda: Summary of Pseudomonas genome variant finding with cortex_var; Drafting a manuscript (starting with Materials & Methods)
* Roy: Briefing on his Poster presentation at ABRCMS | |||
* Rayees: PA SNP call done. (meeting with MSKCC at 11am) | |||
* Weigang: ABRCMS briefing / Tools to check out | |||
** PRICE: a de novo genome assembler of short reads. [http://derisilab.ucsf.edu/software/price/PriceDocumentation130506/sampleJob.html Document Page] | |||
** [http://www.ebi.ac.uk/QuickGO/ QuickGO]: a web browser of GO terms. | |||
** [http://bioinformatics.ai.sri.com/ptools/ Pathway Tools]: for qualitative prediction of pathogenicity, operons, and pathways
** [http://www.nature.com/nbt/journal/v31/n9/full/nbt.2676.html PICRUSt]: predicting functions of microbial community based on gene contents
* Saymon, John & Weigang: [http://mbe.oxfordjournals.org/content/31/7/1929.full PopGenome package of R] to explore selective sweeps, linkage, and drift | |||
* Sipa: Presentation on mathematical models of cancer development
==Sept 18, 2015== | |||
* Journal Club: latest statistics in detecting population admixture and genome introgression (d3, f4, h4, ChromosomePainter). [http://www.nature.com/nature/journal/vnfv/ncurrent/full/nature14895.html]. Presenter: Saymon
==Sept 11, 2015== | |||
* Journal Club: an in-depth analysis of Staphylococcus aureus genomes. [http://mbe.oxfordjournals.org/content/29/2/797.long] Presenter: John | |||
** Key terms: SNP, mutation, recombination, linkage disequilibrium (LD), synonymous polymorphism (Pi[s]) | |||
** Key methods: identify recombination (from mutation) using shape-shape changes; four-gamete test to identify breakage point; LD decay (based on r2 and probability of tree compatibility) to quantify r/m ratio | |||
** Key results: extensive recombination among clones; rates and tract length quantified by LD decay | |||
** My rating: 4/5. Rigorous analysis of recombination in bacteria, innovative methods, informative and attractive figures; the paper is too long and many statements repetitive, effect of selection hinted but not explored. | |||
==Sept 4, 2015== | |||
* Journal Club: a nice review of bacterial population genetics (E.coli model), from protein polymorphisms to whole-genome variations. [http://www.pnas.org/content/112/29/8893.full]. Presenter: Amanda | |||
** Technological history of bacterial population genetics: MLEE -> MLST -> Whole-genome | |||
** Key terms & concepts: clonality, linkage disequilibrium, recombination, homoplasy, r/m ratio | |||
** Methods for recombination detection: clustered polymorphism, homoplasy (phylogenetic inconsistency) ([http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4094996/figure/F5/ a Borrelia data set to understand how to identify homoplasy and recombination]) | |||
** Tools to try: recHMM (detecting homoplastic sites, fine-grained), PHI (per gene detection, coarse), USEARCH (alternative to BLAST)/UCLUST (alternative to CD-HIT), Distance method (? no reference given; can't understand algorithm either)
** My rating: 4.8/5 (concise, thoughtful & solid review, covering a vast range of history, species, and theory; no apparent theoretical or visual flaws; ending a little pessimistic; implications for the greater biomedical audience are not explored)
==Aug 28, 2015== | |||
* Journal Club (12:30-1:30): a recent paper claiming widespread gene loss & pseudogenization in bacterial pathogens. [http://gbe.oxfordjournals.org/content/7/8/2173.full%20]. Presenter: Roy
** Key terms/concepts: pan-genome, pan-genes (core/"near core"/rare), normalized identity (NI), genomic fluidity, pseudogene conservation percent (PCP), AAI (aa identity), effective population size (Ne), Muller's Ratchet | |||
** Key methods: FASTA for ortholog/paralog identification, PHI (pairwise homoplasy index) for detecting recombination, TFASTA for HGT (gene gain), RAST for gene calls and genome annotation | |||
** Key findings: bi-modal distribution of pangenes; two clonal species have high genomic fluidity, despite being closely related; little HGT ("rare") but lots of losses ("near core") in clonal species; maintenance of pseudogenes (small Ne)
** Pluses: large number of genomes; results broadly convincing; rigorous interpretations and discussion | |||
** Flaws: No phylogenetic reconstruction; no synteny verification; no gene function analysis; no statistical evaluation of the conclusion; bad presentation (figures should be tables and tables should be figures) | |||
** My overall rating: 3.5/5.0 | |||
* Project updates & plans (1:30-2) | |||
** Weigang: design statistical tests for 2 hypotheses: (1) any co-occurrence of oc types? (2) lineage-stabilizing genes | |||
** Saymon: tick-bacteria gene transfer positive; pcr is working for positive controls; need to start testing for nymphs | |||
** John & Rayyes: pa2 database cleaning nearly done; start polymorphism-by-genome-location analysis | |||
** Amanda & Roy: Treponema project has a working database, pipeline, and preliminary validated results; start documenting protocols, tabulating results, and preparing functional analysis
=Summer 2014= | |||
==Projects & Goals== | |||
{| class="wikitable" | |||
|- | |||
! Name !! Goal/Description !! Team | |||
|- | |||
| Pseudomonas || | |||
* Gene gain/loss | |||
* SNP analysis | |||
|| Example | |||
|- | |||
| Borrelia intergenics || Clean up start-codon positions || Example | |||
|- | |||
| SNP pipeline || Example || Example | |||
|- | |||
| Gain/Loss pipeline || Example || Example | |||
|} | |||
* Frequency distribution of ospC types in wild tick populations (Fall 2013) [[strain_natural_frequency|Project page]] | |||
* Mutual information | |||
=Summer 2013= | |||
==Projects & Goals==
* Borrelia population genomics: Recombination & Natural Selection (Published)
* Borrelia pan-genomics (Submitted as of 5/25/2013)
* Positive and negative selection in Borrelia ORFs and IGS (Submitted as of 6/15/2013)
* Dr Bargonetti's project (Summer 2013)
* A population genomics pipeline using MUGSY-FastTree (Summer 2013): [[Population_Genomics_Course|Project page]]
* Frequency distribution of ospC types in wild tick populations (Fall 2013) [[strain_natural_frequency|Project page]]
----
==Lab meeting: June 13, 2013== | |||
* Weigang: IGS paper submission should be done by Thursday. | |||
* Che/Slav: Workshop update (Meeting at 3:30pm?) | |||
* Che: SILAC project (Meeting at 4pm?) | |||
* Zhenmao: Tick processing & paired-end Illumina sequencing | |||
* Pedro: Updates on "ncbi-orf" table | |||
* Girish: phyloSVG extension; QuBi video | |||
* Saymon and Deidre: consensus start-codons | |||
* Reeyes and Raymond: Pseudomonas DB; fleN alignment and phylogeny | |||
* Valentyna: BLASTn results (4:30pm?) | |||
=Foundational papers for working in Qiu Lab=
* A recent review by Qiu lab: [http://www.ncbi.nlm.nih.gov/pubmed/24704760 Evolutionary genomics of Lyme bacteria]
* Phylogeography of ''Borrelia burgdorferi sensu lato''. [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3214628/ A review by Margos et al. 2011]
* A short tutorial on molecular phylogenetics: [http://www.ncbi.nlm.nih.gov/pubmed/12801728 Phylogeny for the faint of heart: a tutorial]
* The Ka/Ks test of natural selection: [http://www.ncbi.nlm.nih.gov/pubmed/12175810 The Ka/Ks ratio: diagnosing the form of sequence evolution]
----
=Informatics Architecture=
* Operating Systems: Linux OS/Ubuntu, Mac OS
* Programming languages: BASH, Perl/BioPerl, R
* Relational Databases: PostgreSQL
* Software architecture
** bb3: Borrelia Genome Database. To access: <code>psql -h borreliabase.org -U lab bb3</code>
** Pseudomonas Genome Database. To access: <code>psql -h ortholog -U lab paerug</code>
** DNATweezer: Perl wrappers of most frequently used BioPerl modules, including Bio::Seq, Bio::SimpleAlign, and Bio::Tree [https://sourceforge.net/p/dnatwizzer/home/Home/]
** SimBac: A Perl/Moose package for simulating bacterial genome evolution [http://sourceforge.net/projects/bacsim/files/]
** [http://borreliabase.org BorreliaBase]
----
Latest revision as of 22:01, 6 June 2016
Summer 2016
Rules of Conduct
- No eating, drinking, or loud talking in the lab. Socialize in the lobby only.
- Be respectful to each other, regardless of level of study
- Be on time & responsible. Communicate with the PI if late or absent
Readings & Journal Club
- A short introduction to molecular phylogenetics: http://www.ncbi.nlm.nih.gov/pubmed/12801728
- The latest tree of life: http://www.nature.com/articles/nmicrobiol201648
- Microbiome Initiative: http://mbio.asm.org/content/7/3/e00714-16.full?sid=a47e19d3-10c1-408d-9d56-2cecaa73d585
- Evolutionary mechanisms in polio viruses
- Fitness landscape at single-nucleotide levels: Acevedo et al (2014)
- Recombination facilitates adaptation of polio virus: Xiao et al (2016)
- Cancer evolution:
Projects
Tick work (Lia [leader], Amanda, Saymon [after first-level])
- Goal 1. Protocol optimization for DNA prep & PCR. Status: completed
- Goal 2. Protocol development: DNA prep & library construction for MiSeq. Status: to be initiated
- Goal 3. Tick microbiome project: design of primers for 16S rRNA and for pf32. Status: to be initiated
Borrelia plasmid evolution (Saymon [leader], Sharon, Alanna)
- Goal 1. Reconcile pf32 tree within Bbss SNP groups
- Goal 2. Reconcile pf32 tree within Bbss
- Goal 3. Reconcile pf32 tree with Bbsl
Pseudomonas GWAS (Rayees [leader], Roy, Ishmael; with Dr Xavier of MSKCC)
- Goal 1. Simulate bacterial genome evolution (ms, SimPop, SimBac; SFS_CODE (http://sfscode.sourceforge.net/SFS_CODE/SFS_CODE_home/SFS_CODE_home.html); AnA-FiTS (http://www.ncbi.nlm.nih.gov/pubmed/23834340))
- Goal 2. Simulate phenotype (SimPheno)
- Goal 3. Simulate GWAS (e.g., Hapview with phylogenetic correction)
Pathogen genomics pipeline (John [leader], Zawar)
- Goal 1. Variant call pipeline (e.g., cortex_var)
- Goal 2. Variant database
- Goal 3. Website
Existing projects
- Treponema genome evolution (Amanda & Roy)
- PVT1 evolution & function (Jeff [after first-level])
- PhyloHMM algorithm (Weigang)
- Adaptive dynamics & effect of diversity to Borrelia virulence (Jiangtao & Sipa)
Weekly Schedule
Friday, May 27, 2016. Lab meeting
- End-of-semester celebration
- Finalize EEID posters
- Summer planning
Tuesday, May 31, 2016. Orientation Session 1
- Time: 1-5 pm; Room: (to be reserved & posted)
- Pre-orientation: Obtain lab accounts (Yozen); Obtain cluster accounts (Carlos)
- Day 1. 1:00 - 1:30. Lab overview
- Day 1. 1:30 - 2:00. Unix Part 1 (Weigang);
- Day 1. 2:00 - 2:30. Lunch break
- Day 1. 2:45 - 3:20. BorreliaBase.org (Lia): Slides
- Day 1. 3:30 - 4:00. bp-utils (Saymon): tutorials
- Day 1. 4:00 - 4:30. Servers & cluster usage (Rayees, Tutorial )
Wed, June 1, 2016. Orientation Session 2
- Day 2. 1:00 - 2:00. Phylogenetics/Tree Quizzes (Weigang)
- Day 2. 2:00 - 2:45. Lunch break
- Day 2. 2:45 - 3:15. R (Amanda). Download data set from http://diverge.hunter.cuny.edu/~weigang/data-sets-for-biostat/intern_data.csv2 & save as "rna_seq.csv"
- Day 2. 3:30 - 4:00. SQL & SQL-embedded Perl or Python (John)
- Day 2. 4:00 - 4:30. Unix Part 2 (Roy)
- Day 2. 4:30 - 5:00. Lab Databases: bb3-dev, pa2, genome_var (Weigang)
Assignments. (Q1 & Q2 Due 1pm, Wed, June 1st, 2016; The rest Due Noon, Monday, June 7th, 2016)
Thursday, June 2 to Sunday, June 5. Traveling to EEID meeting (Saymon, Amanda, Rayees, Roy, Weigang)
June 6-10, 2016
Monday, June 6
- project meeting: Pathogen genome pipeline
- Team: John (leader), Zawar
- project meeting: Treponema operon algorithm
- Team: Amanda (leader), Roy, Fatima
- Schedule: Monday, Wed, & Friday 12-5
- project meeting: simulation of evolution of traits
- Team: Rayees (leader), Ishemael, Jesam
- Schedule: Monday, Tuesday, & Friday 12-5
- Project: bp-utils development
- Team: Rocky; Khalikuz
Tuesday, June 7
- project meeting: Borrelia genomics
- Team: Saymon (leader), Sharon
- Schedule: Tuesday, Thursday, and Friday 12-5
June 13-17, 2016
June 20-24, 2016
June 27-July 1, 2016
July 6 - July 10
July 13 - July 17, 2016. Project conclusion
July 17 - August 20, 2016. PI vacation
School Year 2015
Nov 19, 2015
- Amanda: Summary of Pseudomonas genome variant finding with cortex_var; Drafting a manuscript (starting with Materials & Methods)
- Roy: Briefing on his Poster presentation at ABRCMS
- Rayees: PA SNP call done. (meeting with MSKCC at 11am)
- Weigang: ABRCMS briefing / Tools to check out
- PRICE: a de novo genome assembler of short reads. Document Page
- QuickGO: a web browser of GO terms.
- Pathway Tools: for qualitative prediction of pathogenicity, operons, and pathways
- PICRUSt: predicting functions of microbial community based on gene contents
- Saymon, John & Weigang: PopGenome package of R to explore selective sweeps, linkage, and drift
- Sipa: Presentation on mathematical models of cancer development
Sept 18, 2015
- Journal Club: latest statistics in detecting population admixture and genome introgression (d3, f4, h4, ChromosomePainter). [1]. Presenter: Saymon
Sept 11, 2015
- Journal Club: an in-depth analysis of Staphylococcus aureus genomes. [2] Presenter: John
- Key terms: SNP, mutation, recombination, linkage disequilibrium (LD), synonymous polymorphism (Pi[s])
- Key methods: identify recombination (from mutation) using shape-shape changes; four-gamete test to identify breakage point; LD decay (based on r2 and probability of tree compatibility) to quantify r/m ratio
- Key results: extensive recombination among clones; rates and tract length quantified by LD decay
- My rating: 4/5. Rigorous analysis of recombination in bacteria, innovative methods, informative and attractive figures; the paper is too long and many statements repetitive, effect of selection hinted but not explored.
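Two of the methods named above, the four-gamete test and pairwise LD measured as r2, can be sketched in a few lines of Python. This is a generic textbook version, not the paper's actual procedure: sites are assumed biallelic and coded 0/1, and the haplotype matrix is a made-up toy example. Under a model without recombination or recurrent mutation, two biallelic sites can show at most three of the four possible haplotypes, so observing all four is taken as evidence of recombination (or homoplasy) between them.

```python
from itertools import combinations


def four_gamete_violation(site_a, site_b):
    """Return True if two biallelic sites show all four haplotype combinations."""
    return len(set(zip(site_a, site_b))) == 4


def r_squared(site_a, site_b):
    """Squared allelic correlation (r^2) between two sites coded as 0/1."""
    n = len(site_a)
    p_a = sum(site_a) / n  # frequency of allele 1 at site A
    p_b = sum(site_b) / n  # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")


# Hypothetical data: rows are genomes, columns are SNP sites (0 = ref, 1 = alt).
haplotypes = [
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
]
sites = list(zip(*haplotypes))  # transpose so each entry is one site (column)

for i, j in combinations(range(len(sites)), 2):
    print(i, j, four_gamete_violation(sites[i], sites[j]), r_squared(sites[i], sites[j]))
```

In this toy matrix, sites 0 and 1 are perfectly linked (two haplotypes, r2 of 1), while site 2 shows all four gametes with either of them (r2 of 0). Plotting r2 against the physical distance between site pairs would give an LD-decay curve of the kind the paper uses.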
Sept 4, 2015
- Journal Club: a nice review of bacterial population genetics (E.coli model), from protein polymorphisms to whole-genome variations. [3]. Presenter: Amanda
- Technological history of bacterial population genetics: MLEE -> MLST -> Whole-genome
- Key terms & concepts: clonality, linkage disequilibrium, recombination, homoplasy, r/m ratio
- Methods for recombination detection: clustered polymorphism, homoplasy (phylogenetic inconsistency) (a Borrelia data set to understand how to identify homoplasy and recombination)
- Tools to try: recHMM (detecting homoplastic sites, fine-grained), PHI (per gene detection, coarse), USEARCH (alternative to BLAST)/UCLUST (alternative to CD-HIT), Distance method (? no reference given; can't understand algorithm either)
- My rating: 4.8/5 (concise, thoughtful & solid review, covering a vast range of history, species, and theory; no apparent theoretical or visual flaws; ending a little pessimistic; implications for the greater biomedical audience are not explored)
Aug 28, 2015
- Journal Club (12:30-1:30): a recent paper claiming widespread gene loss & pseudogenization in bacterial pathogens. [4]. Presenter: Roy
- Key terms/concepts: pan-genome, pan-genes (core/"near core"/rare), normalized identity (NI), genomic fluidity, pseudogene conservation percent (PCP), AAI (aa identity), effective population size (Ne), Muller's Ratchet
- Key methods: FASTA for ortholog/paralog identification, PHI (pairwise homoplasy index) for detecting recombination, TFASTA for HGT (gene gain), RAST for gene calls and genome annotation
- Key findings: bi-modal distribution of pangenes; two clonal species have high genomic fluidity, despite being closely related; little HGT ("rare") but lots of losses ("near core") in clonal species; maintenance of pseudogenes (small Ne)
- Pluses: large number of genomes; results broadly convincing; rigorous interpretations and discussion
- Flaws: No phylogenetic reconstruction; no synteny verification; no gene function analysis; no statistical evaluation of the conclusion; bad presentation (figures should be tables and tables should be figures)
- My overall rating: 3.5/5.0
- Project updates & plans (1:30-2)
- Weigang: design statistical tests for 2 hypotheses: (1) any co-occurrence of oc types? (2) lineage-stabilizing genes
- Saymon: tick-bacteria gene transfer positive; pcr is working for positive controls; need to start testing for nymphs
- John & Rayyes: pa2 database cleaning nearly done; start polymorphism-by-genome-location analysis
- Amanda & Roy: Treponema project has a working database, pipeline, and preliminary validated results; start documenting protocols, tabulating results, and preparing functional analysis
Summer 2014
Projects & Goals
Name | Goal/Description | Team |
---|---|---|
Pseudomonas | Gene gain/loss; SNP analysis | Example |
Borrelia intergenics | Clean up start-codon positions | Example |
SNP pipeline | Example | Example |
Gain/Loss pipeline | Example | Example |
- Frequency distribution of ospC types in wild tick populations (Fall 2013) Project page
- Mutual information
Summer 2013
Projects & Goals
- Borrelia population genomics: Recombination & Natural Selection (Published)
- Borrelia pan-genomics (Submitted as of 5/25/2013)
- Positive and negative selection in Borrelia ORFs and IGS (Submitted as of 6/15/2013)
- Dr Bargonetti's project (Summer 2013)
- A population genomics pipeline using MUGSY-FastTree (Summer 2013): Project page
- Borrelia Genome Database & Browser (Summer 2013) Version 2 screen shot
- Pseudomonas population genomics (Summer 2013) Project page
- Hypothesis Testing: Do host-interacting genes show adaptive codon usage? (Summer 2013): Project page
- Phylogenomics browsing with JavaScript/JQuery, Ajax, and jsPhylosvg
- Frequency distribution of ospC types in wild tick populations (Fall 2013) Project page
Lab meeting: June 13, 2013
- Weigang: IGS paper submission should be done by Thursday.
- Che/Slav: Workshop update (Meeting at 3:30pm?)
- Che: SILAC project (Meeting at 4pm?)
- Zhenmao: Tick processing & paired-end Illumina sequencing
- Pedro: Updates on "ncbi-orf" table
- Girish: phyloSVG extension; QuBi video
- Saymon and Deidre: consensus start-codons
- Reeyes and Raymond: Pseudomonas DB; fleN alignment and phylogeny
- Valentyna: BLASTn results (4:30pm?)
Lab meeting: May 23, 2013
- May 24, Friday: End of School Year Party in the Park (we leave from Hunter @ 1:30pm)
- Recommended reading of the week: Detecting Neanderthal genes using the D' homoplasy statistic
- Weigang: IGS paper submission
- Che: Thesis update/SILAC project/Summer teaching
- Zhenmao: Manuscript update: Material & Methods; Results (Tables and Figures)
- Pedro: Catalyst web framework
- Girish: cp26 phylogenomic analysis
- Saymon and Deidre: consensus start-codons
Lab meeting: May 16, 2013
- Weigang: IGS paper submitted yet?
- Che: Thesis update. Chapter 3. Evolution of ospA/ospB gene family
- Pedro/Zhenmao: Can we wrap up the BLAST identification of ospC types?
- Girish: Fetch cp26 sequences from DB; Run MUGSY & FastTree
- Saymon/Deidre: Identification of consensus start-codon positions
- Pedro/Girish: orth_get/orth_igs website development. Catalyst. Implement graphics (genome map & phylogeny) query interface
- Raymond: start the Pseudomonas summer project
Foundational papers for working in Qiu Lab
- A recent review by Qiu lab: Evolutionary genomics of Lyme bacteria
- Phylogeography of Borrelia burgdorferi sensu lato. A review by Margos et al. 2011
- A short tutorial on molecular phylogenetics: Phylogeny for the faint of heart: a tutorial
- The Ka/Ks test of natural selection: The Ka/Ks ratio: diagnosing the form of sequence evolution
Informatics Architecture
- Operating Systems: Linux OS/Ubuntu, Mac OS
- Programming languages: BASH, Perl/BioPerl, R
- Relational Databases: PostgreSQL
- Software architecture
- bb3: Borrelia Genome Database. To access: psql -h borreliabase.org -U lab bb3
- Pseudomonas Genome Database. To access: psql -h ortholog -U lab paerug
- DNATweezer: Perl wrappers of most frequently used BioPerl modules, including Bio::Seq, Bio::SimpleAlign, and Bio::Tree [5]
- SimBac: A Perl/Moose package for simulating bacterial genome evolution [6]
- BorreliaBase
Perl Challenges
Problem | Input | Output |
---|---|---|
DNA transcription | A DNA sequence, in 5'-3' direction (e.g., aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat) | An RNA sequence, in 5'-3' direction |
Genetic code | None | 64 codons, one per line (using loops) |
Count amino acids | A protein sequence | Frequency counts of individual amino acids |
Count codons | A protein-coding DNA sequence | Frequency counts of individual codons |
Random sequence 1 | None | Generate a random DNA sequence (e.g., 1000 bases) with equal base frequencies |
Random sequence 2 | None | Generate a random DNA sequence with biased base frequencies, e.g., 10% G, 10% C, 40% T, and 40% A. |
Graphics I | A categorical dataset, e.g., Biology | A bar graph & a pie chart, using GD::Simple or Postscript::Simple |