Computational Genomics (KIZ, Fall 2024)
Latest revision as of 07:39, 5 December 2024
Professor, Department of Biological Sciences, City University of New York, Hunter College & Graduate Center
Adjunct Faculty, Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medical College
Kunming Institute of Zoology (KIZ)
Course Overview
Welcome to Computational Genomics, a 9-week computer workshop for graduate students. A genome is the total genetic content of an organism. Driven by breakthroughs such as the decoding of the first human genome and next-generation DNA-sequencing technologies, biomedical sciences are undergoing a rapid and irreversible transformation into a highly data-intensive field.
Genome information is revolutionizing virtually all aspects of the life sciences, including basic research, medicine, and agriculture. Meanwhile, the use of genomic data requires life scientists to be familiar with concepts and skills in biology, computer science, and data analysis.
This workshop is designed to introduce computational analysis of genomic data through hands-on computational exercises, using published studies.
The prerequisites for the course are college-level courses in molecular biology, cell biology, and genetics. Introductory courses in computer programming and statistics are preferred but not strictly required.
Learning goals
By the end of this course successful students will be able to:
- Use Linux commands & compose simple shell scripts to automate a bioinformatics pipeline
- Program in Python for parsing texts and simulating evolution
- Visualize data and perform statistical analysis using R/RStudio
- Compose a bioinformatics research report
Web Links
- Install R base: https://cloud.r-project.org
- Install RStudio (Desktop version): http://www.rstudio.com/download
- Download: R datasets
- A reference book: R for Data Science (Wickham et al.)
- Github repository: Computational Skills for Biologists (Allesina & Wilmes)
Assignments, Quizzes, and Final Report
Student performance will be evaluated by attendance, three (3) quizzes, six (6) assignments, and a final report:
- Attendance & participation: 30 pts
- Assignments: 6 x 10 = 60 pts
- Open-Book Quizzes: 3 x 20 pts = 60 pts
- Final presentation: 50 pts
Total: 200 pts
Course Schedule
Week 1, Thursday, Oct 10, 2024
- Introduction. Lecture slides: File:QiuLab-CUNY-Hunter.pdf
- Computer setup: gitee accounts (for course management). Course link on gitee
- Survey 1: Genomics & Data Science PDF file: File:KIZ-survey-1.pdf
- Survey 2: Tree-thinking Skills PDF file: File:Pretest-1.pdf
- Computer setup: Linux accounts (on the "phylonet.net" server)
- Lecture: Tree-thinking Skills. Lecture slides: File:Phylogeny-lecture-slides-2024.pdf
Week 2, Thursday, Oct 17, 2024
- Git usage (by Mr Liu)
- Demo: homework submission using Gitee
- Download the course repository:
git clone https://gitee.com/huntercollege/comp-genomics-kiz.git
- Linux Tutorial I. File:Intro-unix-KIZ.pdf
- Phylogenetics lecture: File:Part-1-tree-thinking.pdf
- Tree manipulations: reroot & tree distances
- Gene tree vs Species tree; orthologous and paralogous genes; tree pruning & collapsing
Week 3, Thursday, Oct 24, 2024
- Review: Tree manipulations
- Quiz #1. tree terms & tree manipulations (20 pts); open-book; 9-10am, in-class
- Linux Tutorial II: BpWrapper Toolkit (https://github.com/bioperl/p5-bpwrapper). Updated slides: File:Intro-unix-KIZ.pdf
Week 4, Thursday, Oct 31, 2024 (Halloween)
- Quiz #2. Linux commands & BpWrapper toolkit: 20 pts, 9-10:30am
- ASFV Genomics I: Download and align genomes.
- Fork a copy of the ASFV project repository.
- Instructions to fork a repository (prepared by Mr Liu): https://zwmqn249t3y.feishu.cn/wiki/OVdEwB00ciNx9Xkxvxsc7lJEncc?from=from_copylink
- Project overview: Lecture slides: File:Afsv-project-kiz.pdf
Week 5, Thursday, Nov 7, 2024
- ASFV project:
git clone https://gitee.com/huntercollege/asfv-genomics.git
- Align genomes: protocol-1
- SNP calls; quality check by Ts/Tv ratio (Protocol 1)
- Annotate SNPs: Protocol 2
- IQ-TREE to get genome tree: Protocol-3
- Trait evolution and comparative analysis File:Trait-evolution-KIZ.pdf
- Assignment: Pre-test 2
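The Ts/Tv quality check on SNP calls listed above can be sketched in a few lines of Python. This is a minimal illustration, not the course protocol: the function names and the input format (a list of reference/alternate base pairs) are assumptions made for this example. A transition swaps a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T); every other single-base change is a transversion.

```python
# Minimal sketch of a Ts/Tv calculation (illustrative; not the course protocol).
# Input format and function names are assumptions for this example.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Return True if ref -> alt is a transition (A<->G or C<->T)."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return False
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Compute transitions/transversions from (ref, alt) base pairs.

    Pairs with identical or non-ACGT bases are skipped.
    Returns None when there are no transversions (ratio undefined).
    """
    valid = PURINES | PYRIMIDINES
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in valid or alt not in valid:
            continue
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        return None
    return ts / tv


# Made-up SNPs for demonstration: three transitions, one transversion.
demo_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]
print(ts_tv_ratio(demo_snps))
```

The function returns None rather than raising an error when no transversions are present, so that a small or filtered SNP set does not stop a pipeline.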
Week 6, Thursday, Nov 14, 2024
- Computer setup: R & RStudio; R Tutorial Part 1: Basic R & Data manipulations. Lecture slides File:R-tutorials-KIZ-part-1.pdf
- Assignment: Practice-1 & -2
Week 7, Thursday, Nov 21, 2024
- R Tutorial 2. Data visualization & statistics. Slides: File:R-tutorials-KIZ-part-2.pdf
- Assignment: Practice-3 & -4
- Final project:
- Introduction to Ka/Ks analysis
- File distribution: each student is assigned 10 random genes. Run git pull, or, if you haven't cloned the repository, run:
git clone https://gitee.com/huntercollege/asfv-genomics.git
- Follow the protocol in "doc/protocol-4-paml.txt"
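As a companion to the Ka/Ks introduction, the sketch below shows how a Ka/Ks ratio (often written ω) is commonly read once Ka and Ks have been estimated. It does not reproduce the PAML protocol: the function names and the tolerance around 1 are assumptions chosen for illustration, and a ratio alone is not a statistical test.

```python
# Minimal sketch for reading a Ka/Ks ratio (illustrative; not the PAML protocol).
# Function names and the tolerance value are assumptions for this example.


def ka_ks(ka, ks):
    """Return Ka/Ks, or None when Ks is zero (ratio undefined)."""
    if ks == 0:
        return None
    return ka / ks


def interpret(omega, tolerance=0.1):
    """Give the conventional reading of a Ka/Ks ratio.

    The tolerance around 1 is an arbitrary illustrative choice.
    """
    if omega is None:
        return "undefined (Ks = 0)"
    if omega > 1 + tolerance:
        return "consistent with positive selection"
    if omega < 1 - tolerance:
        return "consistent with purifying selection"
    return "consistent with neutral evolution"


# Made-up values for demonstration only.
for ka, ks in [(0.5, 0.25), (0.125, 0.5), (0.25, 0.25), (0.1, 0.0)]:
    print(ka, ks, interpret(ka_ks(ka, ks)))
```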
Week 8, Thursday, Nov 28, 2024 (Thanksgiving)
- R Tutorial 3. Cluster analysis:
- Part 4. Heatmap (hierarchical clustering) & principal component analysis (PCA)
- Part 5. Gene expression analysis.
- Assignment: reproduce the cluster analysis
- Final project: R Markdown Demo
- Visualize tree with ggtree
- Plot Ka/Ks for genes
- Run IQ-TREE to obtain site-specific rates; plot site-specific rates
Week 9, Thursday, Dec 5, 2024
- Final presentations: student demos & troubleshooting
Week 10, Thursday, Dec 10, 2024
- ASFV project overview slides: File:Asfv-project-kiz-Dec-5-2024.pdf
- Final presentations (30 pts):
- (10 pts) 3 slides & 5 min
- (15 pts) Show & interpret results for a single gene (no need to show all genes on PPT, although you will upload all MLC and tree files). Pick the gene that is most interesting to you (e.g., significant positive selection, apparent inconsistency with the genome tree, important gene function, etc.).
- (5 pts) Conclusions & future directions. Find the gene name and function in this paper: https://www.mdpi.com/2076-2615/14/15/2187 (first find the gene name in the "BA71.gff3" file with grep).
- Course evaluation
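For the gene-name lookup in "BA71.gff3" mentioned above, a Python equivalent of the grep step might look like the sketch below. It assumes only the standard GFF3 layout (nine tab-separated columns, with key=value pairs separated by semicolons in the ninth). The example lines, identifiers, and attribute keys are made up for illustration and are not taken from the actual file.

```python
# Minimal sketch of a grep-like lookup in a GFF3 file (illustrative only).
# The example records and attribute keys below are hypothetical.


def parse_attributes(field):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_features(gff_lines, query):
    """Return features whose attribute column contains the query string.

    Each hit is (feature_type, start, end, strand, attributes).
    Comment lines, blank lines, and malformed rows are skipped.
    """
    hits = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        if query in cols[8]:
            hits.append((cols[2], int(cols[3]), int(cols[4]), cols[6],
                         parse_attributes(cols[8])))
    return hits


# Made-up GFF3 lines for demonstration; not from BA71.gff3.
example = [
    "##gff-version 3",
    "chrDemo\tdemo\tgene\t100\t400\t.\t+\t.\tID=gene0001;Name=DemoA",
    "chrDemo\tdemo\tgene\t500\t900\t.\t-\t.\tID=gene0002;Name=DemoB",
]
for hit in find_features(example, "gene0002"):
    print(hit)
```

To search a real file, pass an open file handle instead of the list, for example `find_features(open("BA71.gff3"), "your-gene-id")`, where the query string is whatever identifier your assigned gene files use.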