BigData 2020: Difference between revisions

From QiuLab
imported>Weigang
(Created page with "<center>[http://bigdata.citytech.cuny.edu/ City Tech/Cornell BioMedical Big Data Week 2019]: '''Pathogen Evolutionary Genomics'''</center> <center>Wed, July 22, 2019, 9-12</ce...")
 
imported>Weigang
mNo edit summary
Line 1: Line 1:
<center>[http://bigdata.citytech.cuny.edu/ City Tech/Cornell BioMedical Big Data Week 2019]: '''Pathogen Evolutionary Genomics'''</center>
<center>[http://bigdata.citytech.cuny.edu/ City Tech/Cornell BioMedical Big Data Week 2020]: '''Pathogen Evolutionary Genomics'''</center>
<center>Wed, July 22, 2019, 9-12</center>
<center>Wed, July 22, 2020, 9 am - 12 noon</center>
<center>'''Instructor:''' Dr Weigang Qiu, Professor, Department of Biological Sciences </center>
<center>'''Instructor:''' Dr Weigang Qiu, Professor, Department of Biological Sciences </center>
<center>'''Office:''' B402 Belfer Research Building</center>
<center>'''Office:''' B402 Belfer Research Building</center>
Line 8: Line 8:
{| class="wikitable"
{| class="wikitable"
|-
|-
! Lyme Disease (Borreliella) !! Volcano plot !! Heat map
! Lyme Disease (Borreliella) !! Coronaviruses !! SARS-CoV-2
|-
|-
| [[File:Lp54-gain-loss.png|300px|thumbnail| Gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)]] ||  
| [[File:Lp54-gain-loss.png|300px|thumbnail| Gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)]] ||  
Line 19: Line 19:


==What is evolutionary genomics?==
==What is evolutionary genomics?==
Genomes differ among individuals and species. Evolutionary genomics studies genome variability and genome changes using evolutionary principles. Typical applications include identification of human genome variations associated with diseases and identification of pathogen virulence genes.
Genomes differ among individuals and species. Evolutionary genomics studies genome variability and genome changes using evolutionary principles. Typical applications in pathogen research include molecular epidemiology (e.g., wildlife origin of SARS-CoV-2 & tracking Covid-19 spread), molecular evolution (e.g., identify key genes and protein sequences contributing to virulence and immune escape), and vaccine design (e.g., influenza vaccine based on latest circulating strains).


Genome changes are studied at two distinct levels: (1) within-species/within-population variations (e.g., human genetic variation), and (2) between-species divergence (e.g., human-mouse comparisons).  
Genome changes are studied at two distinct levels: (1) within-species/within-population variations (e.g., genomic changes during Covid-19 pandemic), and (2) between-species divergence (e.g., difference between SARS-CoV-1 and SARS-CoV-2).  


The key for analyzing genome variations within species is "population-thinking", the idea that there is no one individual genome that is standard, normal, or disease-free.
The key for analyzing genome variations within species is "population-thinking", the idea that there is no one individual genome that is standard, normal, or "wildtype".


The key for comparing genomes across species is "tree-thinking", the idea that evolution happens by diversification (like a branching tree), not by climbing a ladder. There is no such thing as "advanced" or "primitive" species. All living species have the exact same evolutionary distances/time of divergence since the origin of life.
The key for comparing genomes across species is "tree-thinking", the idea that evolution happens by diversification (like a branching tree), not by climbing a ladder. There is no such thing as "advanced" or "primitive" species. All living species have the exact same evolutionary distances/time of divergence since the origin of life.


==Case studies from Qiu Lab==
==Case studies from Qiu Lab==
* Between-sepcies genome comparisons: Comparative genomics of worldwide Lyme disease pathogens. [http://borreliabase.org/ BorreliaBase] (Figure 1)
* [http://borreliabase.org Comparative genomics of worldwide Lyme disease pathogens]
* Within-population genome comparison: Genomic epidemiology of Group B Streptococcus: [http://diverge.hunter.cuny.edu/~weigang/gbs-browser/%20 Gene gains & losses associated with Group B Streptococcus virulence]
* Evolutionary origin of coronaviruses
* Within-host genome evolution: Evolution of multi-drug antibiotic-resistance Pseudomonas in cancer patients (Figure 2)
* [http://cov.genometracker.org Covid-19 Genome Tracker]  


==Bioinformatics workflow for comparative analysis of bacterial pathogen genomes==
==Bioinformatics workflow for comparative analysis of bacterial pathogen genomes==

Revision as of 20:46, 14 July 2020

City Tech/Cornell BioMedical Big Data Week 2020: Pathogen Evolutionary Genomics
Wed, July 22, 2020, 9 am - 12 noon
Instructor: Dr Weigang Qiu, Professor, Department of Biological Sciences
Office: B402 Belfer Research Building
Email: weigang@genectr.hunter.cuny.edu
Lab Website: http://diverge.hunter.cuny.edu/labwiki/
Figure panels (images not shown):
  • Lyme Disease (Borreliella): Gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)
  • Coronaviruses: p-value (y-axis) vs. fold change (x-axis)
  • SARS-CoV-2: genes significantly down or up-regulated (at p<1e-4)

What is evolutionary genomics?

Genomes differ among individuals and species. Evolutionary genomics studies genome variability and genome changes using evolutionary principles. Typical applications in pathogen research include molecular epidemiology (e.g., tracing the wildlife origin of SARS-CoV-2 and tracking the spread of Covid-19), molecular evolution (e.g., identifying key genes and protein sequences contributing to virulence and immune escape), and vaccine design (e.g., an influenza vaccine based on the latest circulating strains).

Genome changes are studied at two distinct levels: (1) within-species/within-population variation (e.g., genomic changes during the Covid-19 pandemic), and (2) between-species divergence (e.g., differences between SARS-CoV-1 and SARS-CoV-2).

The key to analyzing genome variation within species is "population-thinking", the idea that no single individual genome is standard, normal, or "wildtype".

The key to comparing genomes across species is "tree-thinking", the idea that evolution happens by diversification (like a branching tree), not by climbing a ladder. There is no such thing as an "advanced" or a "primitive" species: all living species have had exactly the same amount of time to diverge since the origin of life.
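
As a toy illustration of tree-thinking (not part of the original course material), the shell sketch below stores a hypothetical time-scaled tree as a child/parent/branch-length table and sums the branch lengths from each tip back to the root. The tip names (A, B, C), the node names, the function name root_to_tip, and the branch lengths (in arbitrary time units) are all made-up assumptions.

```shell
#!/usr/bin/env bash
# Toy time-scaled tree (hypothetical names and lengths, arbitrary time units):
#   root -> n1 (2);  n1 -> A (3);  n1 -> B (3);  root -> C (5)
# Columns: child, parent, branch length (tab-separated).
tree=$(mktemp)
printf 'n1\troot\t2\nA\tn1\t3\nB\tn1\t3\nC\troot\t5\n' > "$tree"

# root_to_tip FILE: print each tip and its summed branch length back to the root.
root_to_tip() {
  awk -F'\t' '
    { parent[$1] = $2; len[$1] = $3; is_parent[$2] = 1 }
    END {
      for (node in parent) {
        if (node in is_parent) continue   # internal node, not a tip
        total = 0
        n = node
        while (n in parent) {             # walk up until the root is reached
          total += len[n]
          n = parent[n]
        }
        print node "\t" total
      }
    }' "$1" | sort
}

root_to_tip "$tree"
```

In this toy tree every tip is 5 time units from the root, even though C sits on a long, unbranched lineage while A and B share a more recent common ancestor. No tip is "older" or "more advanced" than another.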

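Case studies from Qiu Lab

  • Comparative genomics of worldwide Lyme disease pathogens (http://borreliabase.org)
  • Evolutionary origin of coronaviruses
  • Covid-19 Genome Tracker (http://cov.genometracker.org)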

Bioinformatics workflow for comparative analysis of bacterial pathogen genomes

  • Pathogen isolation -> DNA extraction -> Library preparation -> High-throughput sequencing
  • De novo genome assembly (canu; velvet; etc.)
  • Identify a reference genome from the NCBI database (kraken)
  • Variant calling (bwa; cortex_var; samtools mpileup)
  • Infer genome phylogeny (muscle; RAxML)
  • Annotation (PATRIC)
  • Custom genome browser (JavaScript; D3 library for interactive graphics)
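
The read-mapping and pileup part of the workflow above can be sketched in shell as follows. This is a minimal sketch, not the lab's actual pipeline. It assumes paired-end reads and that bwa and samtools are installed. The function name map_and_pileup and the file names (ref.fasta, reads_1.fastq, reads_2.fastq, sample) are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# map_and_pileup REF READS1 READS2 PREFIX
# Maps paired-end reads to a reference genome with bwa, then summarizes the
# alignment position by position with samtools mpileup.
# All four arguments are placeholders supplied by the caller.
map_and_pileup() {
  local ref=$1 r1=$2 r2=$3 prefix=$4
  # Build the bwa index for the reference genome.
  bwa index "$ref" || return 1
  # Align the read pairs; bwa mem writes SAM to standard output.
  bwa mem "$ref" "$r1" "$r2" > "$prefix.sam" || return 1
  # Coordinate-sort the alignments and write them as BAM.
  samtools sort -o "$prefix.bam" "$prefix.sam" || return 1
  # Index the sorted BAM file.
  samtools index "$prefix.bam" || return 1
  # Write a per-position pileup of read bases against the reference.
  samtools mpileup -f "$ref" "$prefix.bam" > "$prefix.pileup"
}

# Run the demo only when both tools and the (hypothetical) input files exist.
if command -v bwa >/dev/null 2>&1 && command -v samtools >/dev/null 2>&1 \
   && [ -f ref.fasta ] && [ -f reads_1.fastq ] && [ -f reads_2.fastq ]; then
  map_and_pileup ref.fasta reads_1.fastq reads_2.fastq sample
else
  echo "bwa/samtools or input files not found; skipping the demo run"
fi
```

The pileup file only lists the read bases observed at each reference position. Turning it into a list of variants requires a further calling and filtering step, which is not shown here.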

Essential bioinformatics skills

  • Linux command-line interface (e.g., BASH shell)
  • Familiarity with a programming language (e.g., Python or Perl)
  • Data visualization & statistical analysis (e.g., JavaScript; the R statistical computing environment)

Textbooks for genome evolution

  • Graur, 2016. Molecular and Genome Evolution, First Edition. Sinauer Associates, Inc. ISBN: 978-1-60535-469-9.
  • Baum & Smith, 2013. Tree Thinking: An Introduction to Phylogenetic Biology. Roberts & Company Publishers, Inc.

Learning Goals

  • Be able to compare evolutionary relationships using phylogenetic trees
  • Be able to use command-line tools for batch-processing of genome files
  • Be able to perform genome-wide association analysis on the R platform

Schedule

Exercises & Challenges

  • Finish Tree Thinking Quizzes
  • Unix exercises:
    • count the number of sequences using "grep -c" (or "grep" piped to "wc -l")
    • display the first 5 lines of a file
    • display the last 5 lines of a file
    • convert upper-case letters to lower-case
    • change "|" to "_"
    • replace strings
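
One possible set of solutions to the Unix exercises above is sketched below. It assumes the sequences are stored in FASTA format, where each record starts with a header line beginning with ">". The file contents, the strain labels, and the replacement string are made up for illustration, and other tools would work just as well.

```shell
#!/usr/bin/env bash
# Create a small hypothetical FASTA file to practice on.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>strain1|USA
ACGTACGT
>strain2|China
TTGACCAA
>strain3|Italy
GGCATCGA
EOF

# Count sequences: each FASTA record has exactly one header line starting with ">".
grep -c '^>' "$fasta"
# The same count, using grep piped to wc.
grep '^>' "$fasta" | wc -l

# Display the first 5 lines, then the last 5 lines, of the file.
head -n 5 "$fasta"
tail -n 5 "$fasta"

# Convert upper-case letters to lower-case.
tr '[:upper:]' '[:lower:]' < "$fasta"

# Change "|" to "_" (single-character translation).
tr '|' '_' < "$fasta"

# Replace a string on every line (here "strain" becomes "isolate").
sed 's/strain/isolate/g' "$fasta"
```

Each command writes to standard output and leaves the input file untouched. To keep a result, redirect it to a new file, for example by appending "> renamed.fas" to the command.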