BigData 2020
Latest revision as of 21:06, 15 July 2020
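Lyme Disease (Borreliella) | CoV Genome Tracker | Coronavirus evolution |
---|---|---|
[Figure: Gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)] | [Figure: Haplotype network (http://cov.genometracker.org/)] | [Figure: Spike protein alignment] |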
What is evolutionary genomics?
Genomes differ among individuals and species. Evolutionary genomics studies genome variability and genome change using evolutionary principles. Typical applications in pathogen research include molecular epidemiology (e.g., tracing the wildlife origin of SARS-CoV-2 and tracking the spread of Covid-19), molecular evolution (e.g., identifying key genes and protein sequences that contribute to virulence and immune escape), and vaccine design (e.g., influenza vaccines based on the latest circulating strains).
Genome changes are studied at two distinct levels: (1) within-species/within-population variation (e.g., genomic changes during the Covid-19 pandemic), and (2) between-species divergence (e.g., differences between SARS-CoV-1 and SARS-CoV-2).
The key to analyzing genome variation within species is "population-thinking", the idea that no single individual genome is standard, normal, or "wildtype".
The key to comparing genomes across species is "tree-thinking", the idea that evolution happens by diversification (like a branching tree), not by climbing a ladder. There is no such thing as an "advanced" or "primitive" species: all living species are separated from the origin of life by exactly the same amount of evolutionary time.
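Case studies from Qiu Lab
- Comparative genomics of worldwide Lyme disease pathogens (http://borreliabase.org)
- Covid-19 Genome Tracker (http://cov.genometracker.org)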
Essential bioinformatics skills
- Linux command-line interface (e.g., BASH shell)
- Familiarity with a programming language (e.g., Python or Perl)
- Data visualization & statistical analysis (e.g., JavaScript; the R statistical computing environment)
Learning Goals
- Be able to compare evolutionary relationships using phylogenetic trees
- Be able to use command-line tools for batch-processing of genome files
- Be able to perform genome-wide association analysis on the R platform
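As a rough illustration of the batch-processing goal above, the sketch below loops over several FASTA files and reports the number of sequences in each. The directory, file names, and sequences are made up for the example; only standard tools (`grep`, `printf`, `basename`) are used.

```shell
# Hypothetical layout: a temporary directory holding several FASTA files.
# File names and sequences are invented for illustration.
genome_dir=$(mktemp -d)
printf '>a1\nACGT\n>a2\nACGA\n' > "$genome_dir/strainA.fasta"
printf '>b1\nACGT\n' > "$genome_dir/strainB.fasta"

# Loop over every FASTA file and record how many sequences it contains.
# Each sequence record starts with one header line beginning with ">".
summary=$(
  for f in "$genome_dir"/*.fasta; do
    n=$(grep -c '^>' "$f")
    printf '%s\t%s\n' "$(basename "$f")" "$n"
  done
)

# Print one tab-separated line per file: file name, sequence count.
printf '%s\n' "$summary"
```

The same loop works on a folder of real genome files by pointing `genome_dir` at that folder.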
Schedule
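- 9:00 - 9:25: Introduction; Install R & R Studio (http://rstudio.org); Download the FASTA file (Spike2.txt) & save it as "spike.fasta"
- 9:30 - 10:00: Unix Tutorial (Part I. Unix Basics: http://korflab.ucdavis.edu/Unix_and_Perl/current.html#part1)
- 10:05 - 10:30: Unix Tutorial (Part II. Advanced Unix: http://korflab.ucdavis.edu/Unix_and_Perl/current.html#part2)
- 10:35 - 11:00: Tree-thinking Quizzes: Slides (Big-data-phylogeny.pptx) & Handouts (Pretest.pdf)
- 11:05 - 12:00: Build molecular phylogeny online (http://www.phylogeny.fr)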
Exercises & Challenges
- Finish Tree Thinking Quizzes
- Unix exercises:
  - count the number of sequences using "grep -v" or "wc"
  - display the first 5 lines of a file
  - display the last 5 lines of a file
  - change upper-case letters to lower-case
  - change "|" to "_"
  - replace strings
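
One possible way to approach the Unix exercises above is sketched below. It is only a sketch: the small FASTA file it writes is made up (the names and residues are not real spike data), and the string being replaced is a hypothetical example. Note that `grep -v '^>'` selects the non-header lines, so the sketch counts sequences with `grep -c '^>'` and uses `grep -v` with `wc` to count the remaining lines.

```shell
# Build a tiny, made-up FASTA file so the commands can be run anywhere.
# Sequence names and residues are illustrative only.
workdir=$(mktemp -d)
fasta="$workdir/spike.fasta"
cat > "$fasta" <<'EOF'
>sample1|hostA
MFVFLVLLPL
VSSQCVNLTT
>sample2|hostB
MFVFLVLLPL
VSSQCVNFTT
>sample3|hostC
MFIFLLFLTL
EOF

# Count sequences: each record has exactly one header line starting with ">".
n_seqs=$(grep -c '^>' "$fasta")

# Count the lines that are NOT headers (grep -v inverts the match), via wc.
n_residue_lines=$(grep -v '^>' "$fasta" | wc -l | tr -d ' ')

# Display the first and the last 5 lines of the file.
first5=$(head -n 5 "$fasta")
last5=$(tail -n 5 "$fasta")

# Change upper case to lower case (header lines are lowercased too).
lower=$(tr '[:upper:]' '[:lower:]' < "$fasta")

# Change every "|" to "_".
renamed=$(tr '|' '_' < "$fasta")

# Replace a string: the made-up prefix "sample" becomes "isolate".
replaced=$(sed 's/sample/isolate/' "$fasta")

echo "sequences: $n_seqs"
echo "non-header lines: $n_residue_lines"
printf '%s\n' "$renamed" | grep '^>'
```

Each result is stored in a variable so the steps can be inspected one at a time; running the commands directly on the file (without `$(...)`) prints the output to the terminal instead.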