Computational Genomics (KIZ, Fall 2024)

From QiuLab
[[File:Banner-comp-genomics.png|800px|center]]
<center>Thursdays 8:30-11:30am, Oct 10 - Dec 10, 2024</center>
<center>'''Guest Instructor:''' Weigang Qiu, Ph.D.<br>Professor, Department of Biological Sciences, City University of New York, Hunter College & Graduate Center<br>Adjunct Faculty, Department of Physiology and Biophysics,
Institute for Computational Biomedicine, Weill Cornell Medical College</center>
<center>'''Lab Website:''' https://wiki.genometracker.org</center>
<br>
<center>'''Assistants''': Mr Bei Liu & Dr Charles Adeola</center>
<center>'''Host''': Dr Yun Gao, Ph.D.<br>[http://english.kiz.cas.cn/ Kunming Institute of Zoology (KIZ)]</center>
----
==Course Overview==
Welcome to Computational Genomics, a 9-week computer workshop for graduate students. A genome is the total genetic content of an organism. Driven by breakthroughs such as the decoding of the first human genome and next-generation DNA-sequencing technologies, the biomedical sciences are undergoing a rapid and irreversible transformation into a highly data-intensive field.

Genome information is revolutionizing virtually all aspects of the life sciences, including basic research, medicine, and agriculture. At the same time, using genomic data requires life scientists to be familiar with concepts and skills in biology, computer science, and data analysis.

==Learning goals==
By the end of this course successful students will be able to:  
* Use Linux commands & compose simple shell scripts to automate a bioinformatics pipeline
* Program in Python for parsing texts and simulating evolution
* Visualize data and perform statistical analysis using R/RStudio
* Compose a bioinformatics research report

==Web Links==
* Install R Studio (Desktop version): http://www.rstudio.com/download
* Download: [http://www.r4all.org/books/datasets R datasets]
* A reference book: [https://r4ds.hadley.nz/ R for Data Science (Wickham et al.)]
* GitHub repository: [https://github.com/weigangq/CSB-BIOL425/tree/master/lecture-materials Computing Skills for Biologists (Allesina & Wilmes)]

==Assignments, Quizzes, and Final Report==
Student performance will be evaluated by attendance, three (3) quizzes, six (6) assignments, and a final report:
* Attendance & participation: 30 pts
* Assignments: 6 x 10 = 60 pts  
* Open-Book Quizzes: 3 x 20 pts = 60 pts
* Final presentation: 50 pts
Total: 200 pts

==Course Schedule==
===Week 1, Thursday, Oct 10, 2024===
* Introduction. Lecture slides: [[File:QiuLab-CUNY-Hunter.pdf|thumb]]
* Computer setup: Gitee accounts (for course management). [https://edu.gitee.com/huntercollege/courses/3030/noticeboard Course link on Gitee]
* <span style="color: blue">Survey 1: Genomics & Data Science</span> PDF file: [[File:KIZ-survey-1.pdf|thumb]]
* <span style="color: blue">Survey 2: Tree-thinking Skills</span> PDF file: [[File:Pretest-1.pdf|thumb]]
* Computer setup: Linux accounts (on the "phylonet.net" server)
* Lecture: Tree-thinking Skills. Lecture slides: [[File:Phylogeny-lecture-slides-2024.pdf]]
 
===Week 2, Thursday, Oct 17, 2024===
* Git usage (by Mr Liu)
** Demo: homework submission using Gitee
** Download the course repository: <code>git clone https://gitee.com/huntercollege/comp-genomics-kiz.git</code>
* Linux Tutorial I. [[File:Intro-unix-KIZ.pdf|thumb]]
* Phylogenetics lecture: [[File:Part-1-tree-thinking.pdf|thumb]]
** Tree manipulations: reroot & tree distances
** Gene tree vs Species tree; orthologous and paralogous genes; tree pruning & collapsing
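
One common way to put a number on "tree distances" is the Robinson-Foulds (RF) distance: the count of bipartitions (splits) found in one unrooted tree but not in the other. The Python sketch below is illustrative only; the helper names (<code>parse_newick</code>, <code>split_set</code>, <code>robinson_foulds</code>) are hypothetical, the parser assumes simple Newick strings without quoted labels or comments, and the course exercises may rely on a different distance or tool.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested tuples of leaf names.

    Assumption: no quoted labels or comments. Branch lengths (":0.1")
    and internal-node labels (e.g. support values) are discarded."""
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos].split(":")[0]  # drop any branch length

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            pos += 1  # consume ")"
            read_label()  # discard internal label / support / length
            return tuple(children)
        return read_label()

    return parse_node()


def leaf_set(node):
    """All leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def split_set(tree):
    """Non-trivial bipartitions of the tree, treated as unrooted.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the key does not depend on where the tree is rooted."""
    taxa = leaf_set(tree)
    ref = min(taxa)
    splits = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = below if ref not in below else taxa - below
        if 1 < len(side) < len(taxa) - 1:  # skip trivial splits
            splits.add(side)
        return below

    walk(tree)
    return splits


def robinson_foulds(newick_a, newick_b):
    """Number of splits present in exactly one of the two trees."""
    tree_a, tree_b = parse_newick(newick_a), parse_newick(newick_b)
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have the same leaf set")
    return len(split_set(tree_a) ^ split_set(tree_b))
```

Because the splits are computed on the unrooted tree, rerooting does not change the distance: <code>robinson_foulds("((A,B),(C,D));", "(A,(B,(C,D)));")</code> is 0, while swapping B and C in the first tree gives 2.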
 
===Week 3, Thursday, Oct 24, 2024===
* Review: Tree manipulations
* <span style="color: red">Quiz #1. Tree terms & tree manipulations</span> (20 pts); open-book; 9-10am, in-class
* <span style="color: green">Linux Tutorial II: BpWrapper Toolkit (https://github.com/bioperl/p5-bpwrapper)</span> Updated slides: [[File:Intro-unix-KIZ.pdf|thumb]]
 
===Week 4, Thursday, Oct 31, 2024===
* <span style="color: red">Quiz #2. Linux commands & BpWrapper toolkit</span>: 20 pts, 9-10:30am
* <span style="color: orange">ASFV Genomics I</span>: Download and align genomes.
** Fork a copy of [https://edu.gitee.com/huntercollege/projects/696311/repos/huntercollege/afsv-genomics/sources the ASFV project repository].
** Instructions to fork a repository (prepared by Mr Liu): https://zwmqn249t3y.feishu.cn/wiki/OVdEwB00ciNx9Xkxvxsc7lJEncc?from=from_copylink
** Project overview. Lecture slides: [[File:Afsv-project-kiz.pdf|thumb]]
 
===Week 5, Thursday, Nov 7, 2024===
* ASFV project:
** <code>git clone https://gitee.com/huntercollege/asfv-genomics.git</code>
** Align genomes: Protocol 1
** Call SNPs and check their quality with the Ts/Tv ratio: Protocol 1
** Annotate SNPs: Protocol 2
** Build a genome tree with IQ-TREE: Protocol 3
* Trait evolution and comparative analysis [[File:Trait-evolution-KIZ.pdf|thumb]]
* Assignment: Pre-test 2
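
The Ts/Tv quality check rests on a simple count. Transitions are A&lt;-&gt;G and C&lt;-&gt;T substitutions; all other single-base changes are transversions. Each base has one possible transition but two possible transversions, so purely random errors give Ts/Tv of about 0.5, whereas real SNPs usually show an excess of transitions; a ratio close to 0.5 therefore hints at many false-positive calls. The Python sketch below is a hypothetical helper, not the course protocol: it assumes SNPs arrive either as (REF, ALT) pairs or as plain VCF text lines, and it ignores indels and multi-base alleles.

```python
VALID_BASES = {"A", "C", "G", "T"}
TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine-purine, pyrimidine-pyrimidine


def snps_from_vcf(lines):
    """Yield (ref, alt) pairs from VCF text lines, one pair per ALT allele.

    Header lines start with '#'; REF and ALT are the 4th and 5th
    tab-separated columns of each data line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        for alt in fields[4].split(","):  # multiple ALT alleles are comma-separated
            yield ref, alt


def ts_tv_ratio(snps):
    """Return (transitions, transversions, Ts/Tv) for single-base SNPs.

    Indels, multi-base alleles and non-ACGT symbols are skipped.
    The ratio is NaN when no transversions were counted."""
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in VALID_BASES or alt not in VALID_BASES or ref == alt:
            continue  # not a single-nucleotide substitution
        if frozenset((ref, alt)) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    ratio = ts / tv if tv else float("nan")
    return ts, tv, ratio
```

For example, <code>ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")])</code> returns <code>(2, 2, 1.0)</code>.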
 
===Week 6, Thursday, Nov 14, 2024===
* Computer setup: R & RStudio; <span style="color: orange">R Tutorial Part 1: Basic R & Data manipulations</span>
* Lecture slides: [[File:R-tutorials-KIZ-part-1.pdf|thumb]]
* Assignment: Practice-1 & -2
 
===Week 7, Thursday, Nov 21, 2024===
* R tutorials:
** part 2 (Visualization & Statistics). Slides: [[File:R-tutorials-KIZ-part-2.pdf|thumb]]
** Assignment: Practice-3 & -4
* Final project:
** Introduction to Ka/Ks analysis
** File distribution: each student is assigned 10 random genes. Run <code>git pull</code> or, if you have not cloned the repository yet, run <code>git clone https://gitee.com/huntercollege/asfv-genomics.git</code>
** Follow the protocol in "doc/protocol-4-paml.txt"
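
The Ka/Ks (dN/dS) ratio compares nonsynonymous substitutions per nonsynonymous site (Ka, dN) with synonymous substitutions per synonymous site (Ks, dS); values below 1 suggest purifying selection, values near 1 neutral evolution, and values above 1 positive selection. The protocol file name suggests the course uses PAML, which estimates these rates by maximum likelihood. The Python sketch below instead uses the simpler Nei-Gojobori (1986) counting method with a Jukes-Cantor correction, so its numbers will not match codeml exactly. Assumptions: the standard genetic code, two codon-aligned coding sequences, and mutations to stop codons counted as nonsynonymous when counting sites (implementations differ on this point). All function names are hypothetical.

```python
import itertools
import math

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (assumption).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa
        for c, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Synonymous sites in one codon: at each position, the fraction of the
    3 possible point mutations that keep the same amino acid."""
    aa = CODE[codon]
    sites = 0.0
    for i in range(3):
        for base in BASES:
            if base != codon[i] and CODE[codon[:i] + base + codon[i + 1:]] == aa:
                sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """Average (synonymous, nonsynonymous) differences between two codons,
    averaged over all mutational pathways that avoid stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    syn_total = non_total = 0.0
    n_paths = 0
    for order in itertools.permutations(positions):
        current, syn, non = c1, 0, 0
        for i in order:
            step = current[:i] + c2[i] + current[i + 1:]
            if CODE[step] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = step
        else:  # no break: the pathway is valid
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        return None
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple hits: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    """Estimate pN, pS, dN (Ka), dS (Ks) and dN/dS for two aligned CDSs."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned codon by codon")
    S = N = Sd = Nd = 0.0
    for k in range(0, len(seq1), 3):
        c1, c2 = seq1[k:k + 3].upper(), seq2[k:k + 3].upper()
        # Skip codons with gaps or ambiguous bases, and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        diffs = codon_differences(c1, c2)
        if diffs is None:
            continue
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        Sd += diffs[0]
        Nd += diffs[1]
    if S == 0 or N == 0:
        raise ValueError("no usable codons")
    pS, pN = Sd / S, Nd / N
    dS, dN = jukes_cantor(pS), jukes_cantor(pN)
    omega = dN / dS if dS > 0 else float("nan")
    return {"pN": pN, "pS": pS, "dN": dN, "dS": dS, "dN/dS": omega}
```

With two differences, the method averages over both orders of change: for CTT (Leu) to TTG (Leu), one pathway is two synonymous steps and the other two nonsynonymous steps, so <code>codon_differences("CTT", "TTG")</code> gives <code>(1.0, 1.0)</code>.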
 
===Week 8, Thursday, Nov 28, 2024===
* <span style="color: red">Quiz #3. R exercises</span>
* Final project:
** Visualize tree with <code>ggtree</code>
** Plot Ka/Ks for genes
** Run IQ-TREE to obtain site-specific rates; Plot site-specific rates
 
===Week 9, Thursday, Dec 5, 2024===
* Final presentations
 
===Week 10, Thursday, Dec 10, 2024===
* Final presentations

Revision as of 14:46, 19 November 2024

Thursdays 8:30-11:30am, Oct 10 - Dec 10, 2024
Guest Instructor: Weigang Qiu, Ph.D.
Professor, Department of Biological Sciences, City University of New York, Hunter College & Graduate Center
Adjunct Faculty, Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medical College
Office: B402 Belfer Research Building, 413 East 69th Street, New York, NY 10021, USA
Email: wqiu@hunter.cuny.edu
Lab Website: https://wiki.genometracker.org


Assistants: Mr Bei Liu & Dr Charles Adeola
Host: Dr Yun Gao, Ph.D.
Kunming Institute of Zoology (KIZ)

Course Overview

Welcome to Computational Genomics, a 9-week computer workshop for graduate students. A genome is the total genetic content of an organism. Driven by breakthroughs such as the decoding of the first human genome and next-generation DNA-sequencing technologies, the biomedical sciences are undergoing a rapid and irreversible transformation into a highly data-intensive field.

Genome information is revolutionizing virtually all aspects of the life sciences, including basic research, medicine, and agriculture. At the same time, using genomic data requires life scientists to be familiar with concepts and skills in biology, computer science, and data analysis.

This workshop introduces the computational analysis of genomic data through hands-on exercises based on published studies.

The prerequisites for the course are college-level courses in molecular biology, cell biology, and genetics. Introductory courses in computer programming and statistics are recommended but not strictly required.

Learning goals

By the end of this course successful students will be able to:

  • Use Linux commands & compose simple shell scripts to automate a bioinformatics pipeline
  • Program in Python for parsing texts and simulating evolution
  • Visualize data and perform statistical analysis using R/RStudio
  • Compose a bioinformatics research report

Web Links

Assignments, Quizzes, and Final Report

Student performance will be evaluated by attendance, three (3) quizzes, six (6) assignments, and a final report:

  • Attendance & participation: 30 pts
  • Assignments: 6 x 10 = 60 pts
  • Open-Book Quizzes: 3 x 20 pts = 60 pts
  • Final presentation: 50 pts

Total: 200 pts

Course Schedule

Week 1, Thursday, Oct 10, 2024

Week 2, Thursday, Oct 17, 2024

Week 3, Thursday, Oct 24, 2024

Week 4, Thursday, Oct 31, 2024

Week 5, Thursday, Nov 7, 2024

Week 6, Thursday, Nov 14, 2024

Week 7, Thursday, Nov 21, 2024

Week 8, Thursday, Nov 28, 2024

  • Quiz #3. R exercises
  • Final project:
    • Visualize tree with ggtree
    • Plot Ka/Ks for genes
    • Run IQ-TREE to obtain site-specific rates; Plot site-specific rates

Week 9, Thursday, Dec 5, 2024

  • Final presentations

Week 10, Thursday, Dec 10, 2024

  • Final presentations