Biol375 2016
|}
* <font color="gray">10/10 (M). No Class (Columbus Day)</font>
* 10/13 (Th). Genome and gene evolution. Lecture Slides: [[File:Part-2-trait-evolution-2016.pdf|thumbnail]]
* 10/17 (M). Review & Practices.
* 10/20 (Th). '''Midterm Exam 2'''
===Part 3. Tree Algorithms===
* 10/24 (M). BLAST & Alignments (Chapter 3, pages 93-100)
{| class="wikitable sortable mw-collapsible"
|- style="background-color:lightsteelblue;"
! Assignment #6 (5 pts; Due 10/31)
|- style="background-color:white;"
| Based on the [http://www.ncbi.nlm.nih.gov/gene/54205 NCBI Gene Page for cytochrome C (CYCS)], answer the following questions:
* What is the molecular function of CYCS?
* Describe its chromosomal location and gene structure (number of introns and exons, length of protein).
* Click the link "HomoloGene" and then, in the section "Pairwise alignments generated using BLAST", run BLAST between the human and mouse protein sequences. Show the BLAST report.
* Pick another species and generate a BLAST alignment between human and this species. Show the BLAST report. <font color="red">Explain the meaning of "Expect" by paraphrasing the explanation on [http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=FAQ#expect this page]</font>
|}
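When rephrasing "Expect", it may help to see the Karlin-Altschul relation on which BLAST's E-value is based. In its simplest textbook form, this relation gives the expected number of alignments scoring at least S that would arise by chance. In it, m and n are the (effective) lengths of the query and the database, and K and λ are constants that depend on the scoring matrix and gap penalties. BLAST applies further corrections (for example, to the effective lengths), so treat this as an approximation rather than the exact computation BLAST performs.

```latex
E = K \, m \, n \, e^{-\lambda S}
```

Read this way, an E-value of 0.01 means that roughly one alignment at least this good would be expected by chance in 100 searches of a database of this size. Lower E-values therefore point to matches that are less likely to be chance hits.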
* 10/27 (TH). Genetic distances & Sequence-evolutionary models (Chapter 3, pages 79-88)
* 10/31 (M). Maximum parsimony (Chapter 5, pages 191-194). In-class exercise #6
{| class="wikitable sortable mw-collapsible"
|- style="background-color:lightsteelblue;"
! Assignment #7 (10 pts; <font color="red">Due 11/7, Monday (Part-3 lecture slides posted below)</font>)
|- style="background-color:white;"
|
# [Do NOT use a computer for this part] Compare [[Datafile|these two Ebola VP30 sequences]], one from the 2014 outbreak and the other from the 1994 outbreak.
## Calculate the proportion of differences (''p'') between the two sequences
## Calculate the Jukes-Cantor distance (''d'') between the two sequences (specify the unit)
## Count the number of transitions and transversions (arrange them in a table, as we did in class)
## Identify the number of synonymous and nonsynonymous substitutions
## Assuming that the total number of synonymous sites is S=174 and the total number of nonsynonymous sites is N=690, calculate <i>K<sub>S</sub> and K<sub>A</sub></i> (with Jukes-Cantor correction)
# [Computer Exercise] Calculate & compare genetic distances among the primate mitochondrial sequences using R-Studio
## Make sure you have a file "Mt_primate.txt" in your working directory (e.g., "/Users/john/Documents") [Note: Refer back to Assignment #3 if you can't locate the file.]
## Load library: library(ape)
## Read alignment: mt = read.FASTA("Mt_primate.txt")
## Calculate raw distance: mt.raw = dist.dna(mt, model = "raw")
## Apply Jukes-Cantor (one-parameter model) correction: mt.jc = dist.dna(mt, model = "JC")
## Apply Kimura (two-parameter model, separating Ts and Tv) correction: mt.k80 = dist.dna(mt, model = "K80")
## Plot JC distance vs the raw distance: plot(mt.raw, mt.jc, xlab = "uncorrected distance (diff/site)", ylab = "corrected distance (sub/site)", xlim = c(0,0.4), ylim = c(0,0.5), las =1)
## Add a 1:1 line: abline(0,1, col = "red")
## Add K80 distances: points(mt.raw, mt.k80, pch = 3, col = "blue")
## Add a legend: legend(0.05, 0.45, legend = c("JC (1-parameter)", "K80 (2-parameter)"), pch = c(1,3), col = c("black","blue"), bty = "n")
## Export a PDF and print a copy
## Use the graph to explain (1) why raw distances need to be corrected when comparing sequences from distantly related species; (2) what the key difference between the K80 and JC models is
|}
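The computer exercise compares the "raw", "JC", and "K80" options of dist.dna. Below is a minimal sketch of the formulas behind those options. It assumes two aligned, gap-free sequences made only of A, C, G, and T. The helper names (jc_correct, pair_distances) and the toy sequences are hypothetical. dist.dna itself also handles gaps, ambiguity codes, and whole alignments, so treat this as an illustration rather than ape's implementation.

```r
# Sketch only: textbook formulas for raw, Jukes-Cantor (JC69) and Kimura (K80)
# distances. Assumes two aligned, gap-free sequences of A/C/G/T.

# Jukes-Cantor correction: d = -(3/4) * ln(1 - (4/3) * p)
jc_correct <- function(p) -3/4 * log(1 - 4/3 * p)

pair_distances <- function(s1, s2) {
  a <- strsplit(toupper(s1), "")[[1]]
  b <- strsplit(toupper(s2), "")[[1]]
  stopifnot(length(a) == length(b))
  purine_a <- a %in% c("A", "G")
  purine_b <- b %in% c("A", "G")
  differ <- a != b
  ts <- sum(differ & (purine_a == purine_b))  # transitions: A<->G or C<->T
  tv <- sum(differ & (purine_a != purine_b))  # transversions: purine<->pyrimidine
  n <- length(a)
  p <- (ts + tv) / n   # raw (uncorrected) proportion of differing sites
  P <- ts / n          # proportion of transition differences
  Q <- tv / n          # proportion of transversion differences
  # Kimura two-parameter: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q)
  k80 <- -1/2 * log(1 - 2 * P - Q) - 1/4 * log(1 - 2 * Q)
  c(p = p, ts = ts, tv = tv, JC = jc_correct(p), K80 = k80)
}

pair_distances("ACGTACGTAC", "ACGTGCGTTC")
# p = 0.2, ts = 1, tv = 1, JC ~ 0.233, K80 ~ 0.234
```

For the K<sub>S</sub>/K<sub>A</sub> question, the common textbook approach applies the same Jukes-Cantor correction separately to the proportion of synonymous differences per synonymous site and to the proportion of nonsynonymous differences per nonsynonymous site.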
* 11/3 (TH). Distance methods (Chapter 5, pages 184-187). Lecture Slides: [[File:Part-3-tree-construction.pdf|thumbnail]]
* 11/7 (M). Likelihood & Bayesian methods (Chapter 5, pages 194-198)
{| class="wikitable sortable mw-collapsible"
|- style="background-color:lightsteelblue;"
! Assignment #8 (10 pts; Due 11/14, Monday)
|- style="background-color:white;"
|
# Comparison of distance and parsimony trees (review previous assignments for detailed R-Studio instructions)
## In R-Studio, load the "ape" and "phangorn" libraries
## Read the "Mt_primate.txt" file and save it as "aln"
## Obtain a distance tree:
### Calculate a K80 distance matrix and save it as "mt.dist"
### Obtain a neighbor-joining tree: tree.nj = NJ(mt.dist)
### Plot a midpoint-rooted tree: plot(midpoint(tree.nj))
### Add a scale bar: add.scale.bar()
### Print the tree and answer these questions: What does the distance represent? What is the unit?
## Obtain a maximum parsimony tree:
### Convert the alignment to a phyDat object: aln.phy = as.phyDat(aln)
### Search for the maximum parsimony tree: tree.mp = optim.parsimony(tree.nj, aln.phy)
### Assign branch lengths: tree.mp = acctran(tree.mp, aln.phy)
### Plot the tree: plot(midpoint(tree.mp))
### Add a scale bar: add.scale.bar()
### Print the tree and answer these questions: What does the distance represent? What is the unit?
## Compare the two trees and explain the differences between the two methods: Which one uses the full sequence information, and why?
# Bootstrap analysis
## Read the alignment: aln.fas = read.dna("Mt_primate.txt", format = "fasta")
## Create a function that returns a rooted neighbor-joining tree: f = function(x) root(nj(dist.dna(x)), outgroup = c("lemur", "tarsier"), resolve.root = TRUE)
## Calculate a tree: tr = f(aln.fas)
## Perform a bootstrap with 100 pseudo-replicates: boot.trees = boot.phylo(tr, aln.fas, f, B = 100, rooted = TRUE)
## Plot the tree: plot(tr, no.margin = TRUE)
## Add bootstrap values as node labels: nodelabels(boot.trees, bg = "white")
## Explain: (1) Does the bootstrap test for tree precision or tree accuracy? (2) What does a bootstrap value of 80% mean?
|}
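boot.phylo hides the resampling step. Below is a minimal base-R sketch of what bootstrap pseudo-replicates look like. The four-taxon toy alignment and the function name bootstrap_replicate are hypothetical. The sketch only builds the resampled alignments; it does not reproduce boot.phylo.

```r
# Hypothetical toy alignment: rows = taxa, columns = aligned sites
toy_aln <- rbind(
  human   = strsplit("ACGTACGTAC", "")[[1]],
  chimp   = strsplit("ACGTACGTTC", "")[[1]],
  gorilla = strsplit("ACGAACGTTC", "")[[1]],
  lemur   = strsplit("TCGAACCTTG", "")[[1]]
)

# One pseudo-replicate: draw as many sites as the alignment has, WITH
# replacement, so some sites appear several times and others not at all
bootstrap_replicate <- function(m) {
  cols <- sample(ncol(m), ncol(m), replace = TRUE)
  m[, cols, drop = FALSE]
}

set.seed(42)
replicates <- lapply(1:100, function(i) bootstrap_replicate(toy_aln))
replicates[[1]]   # same taxa and length as toy_aln, resampled columns
```

In ape, boot.phylo then applies the tree-building function (f in the steps above) to each replicate. For every node of the original tree, it counts how many replicate trees contain the same clade. With B = 100, those counts can be read directly as percentages.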
* 11/10 (TH). Tree-testing & Review (Chapter 5, pages 207-209). Lecture slides: to be posted
* 11/14 (M). '''Midterm Exam 3'''
===Part 4. Population Genetics (Chapter 2) ===
* 11/17 (Th). Mechanism of molecular evolution: Overview (pages 35-38)
* 11/21 (Mon). SNP statistics & Genetic Drift (pages 47-49)
{| class="wikitable sortable mw-collapsible"
|- style="background-color:lightsteelblue;"
! Assignment #9 (10 pts; Due 12/1, Thursday)
|- style="background-color:lightblue;"
|[[File:Snp-pa1.png|thumbnail]]
The figure on the left shows a codon alignment of 38 strains of a bacterium together with an outgroup sequence (the one that starts with a string of SNPs: "....g...c..ca..", etc.). Answer the following questions <font color="red">with the outgroup sequence excluded</font>. <font color="green">Do not print the figure directly. Hand-copy the sequences onto graph paper, including only the sequences at the two variable codon positions</font>:
# There are two SNP sites. For each SNP, determine whether it is a synonymous or nonsynonymous change (it could be both if there are more than two states). You may simply list the codons and their corresponding amino acids at each aligned codon site.
# Calculate allele frequencies at each SNP site (for a SNP with three states, calculate the frequency of each state separately)
# List all haplotypes defined by the two SNP sites
# Calculate the frequencies of all haplotypes
# Using the outgroup sequence, determine the ancestral and derived SNP, codon, and amino-acid states at each codon site. Explain with a tree that includes the outgroup sequence.
|}
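Questions 2-4 reduce to counting. Below is a minimal R sketch of that bookkeeping. It uses made-up states for eight hypothetical strains, not the 38 strains in the figure, and the vector names site1 and site2 are illustrative only.

```r
# Hypothetical states at two SNP sites for 8 made-up strains
# (NOT the data in the figure)
site1 <- c("A", "A", "G", "G", "A", "A", "G", "A")
site2 <- c("C", "T", "C", "C", "C", "T", "C", "C")

# Allele frequencies at each SNP site (works for two or three states)
freq1 <- table(site1) / length(site1)
freq2 <- table(site2) / length(site2)

# Haplotypes: join the two states of each strain, then tabulate
haplotypes <- paste0(site1, site2)
hap_freq <- table(haplotypes) / length(haplotypes)

print(freq1)     # A 0.625, G 0.375
print(freq2)     # C 0.75,  T 0.25
print(hap_freq)  # AC 0.375, AT 0.25, GC 0.375
```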
* 11/28 (M). Neutral Theory & Molecular Clock (pages 58-59; 72-74)
* 12/1 (TH). Tests of Natural Selection
{| class="wikitable sortable mw-collapsible"
|- style="background-color:lightsteelblue;"
! Assignment # (10 pts; Due 12/8, Thursday)
|- style="background-color:lightblue;"
|Statistical experiments to explore gene-frequency change due to genetic drift:
# With R-Studio, make two populations of N=1000 haploid individuals carrying alleles "A" and "G" at a SNP site: <code>pop1 = c(rep("A",500), rep("G",500)); pop2 = c(rep("A",100), rep("G",900))</code>
# Count alleles in each population: <code>table(pop1); table(pop2)</code>. Which population is more diverse? Why?
# Define a function to calculate heterozygosity: <code>hg = function(x) {cts = table(x); total=sum(cts); if (length(cts)==1) {return(0) } else { freq1=cts[1]/total; freq2=cts[2]/total; return(1-(freq1^2+freq2^2)) } }</code>
# Calculate the heterozygosity of each population: <code>hg(pop1); hg(pop2)</code>. The results should match your answer to the 2nd question.
# Permute population 1 and take a random sample of 100 individuals: <code>pop1=sample(pop1); s = sample(pop1, 100); counts=table(s); heterozygosity = hg(s)</code>. Is the sample more or less diverse than the original population? Repeat 10 times and report all counts and heterozygosity values (e.g., in a table)
# Repeat the above with a smaller sample of 10 individuals
# Repeat with population 2 and a sample of 100 individuals
# Repeat the above with a smaller sample of 10 individuals
# Define "genetic diversity" verbally (+2 pts for giving and using the formula for heterozygosity). <font color="blue">Answer: Heterozygosity is the probability that two randomly picked alleles are different</font>
# Define "genetic drift". Using results from the above four statistical experiments, discuss the effect of genetic drift on genetic diversity within a population. What is the general trend (increase or decrease) in genetic diversity as a result of random sampling of gametes? Is the gain or loss of genetic diversity due to genetic drift more rapid in a small or a large population (contrast the results with different sample sizes)? <font color="blue">Genetic drift is the random, non-selective fluctuation of allele frequencies due to random gamete sampling in a finite population. Genetic drift reduces genetic diversity within a population and increases genetic divergence between populations. The effect of drift is more pronounced in small populations, as shown by the larger frequency variation in a sample of 10 individuals than in a sample of 100 individuals.</font>
|}
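The experiments above look at a single round of sampling. To see the cumulative effect that the last question asks about, below is a minimal Wright-Fisher sketch that repeats the sampling over generations. The model is an assumption, not part of the assignment: a haploid population of constant size N, two alleles, and no mutation or selection. Under these assumptions the expected heterozygosity decays as H<sub>t</sub> = H<sub>0</sub>(1 - 1/N)<sup>t</sup>, so diversity should be lost much faster at N = 10 than at N = 100. The function names simulate_drift and het are hypothetical.

```r
# Minimal Wright-Fisher sketch: haploid, constant size N, two alleles,
# no mutation or selection; drift is the only force acting
simulate_drift <- function(N, p0, generations) {
  p <- numeric(generations + 1)
  p[1] <- p0
  for (t in seq_len(generations)) {
    # next generation = N gametes drawn at random from the current pool
    p[t + 1] <- rbinom(1, N, p[t]) / N
  }
  p
}

# Two-allele heterozygosity, same quantity as hg(): 1 - (p^2 + q^2) = 2pq
het <- function(p) 2 * p * (1 - p)

set.seed(1)
gens <- 20
small <- replicate(200, simulate_drift(N = 10,  p0 = 0.5, generations = gens))
large <- replicate(200, simulate_drift(N = 100, p0 = 0.5, generations = gens))

# Mean heterozygosity after 20 generations next to H0 * (1 - 1/N)^t
c(small = mean(het(small[gens + 1, ])), expected_small = 0.5 * (1 - 1/10)^gens,
  large = mean(het(large[gens + 1, ])), expected_large = 0.5 * (1 - 1/100)^gens)
```

Each column of small and large is one simulated population history. The spread of allele frequencies across columns is the between-population divergence that drift creates.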
* 12/5 (M) Rates of nucleotide substitutions (pages 111-125). Part 4 slides: [[File:Part-4-evol-mechanisms-v2.pdf|thumbnail]].
* 12/8 (Thursday). (Last Lecture) Review & Course evaluations. Review slides: [[File:Final-review.pdf|thumbnail]]. '''Submit your Teacher's Evaluation''', using either:
** Personal computer at [http://www.hunter.cuny.edu/te www.hunter.cuny.edu/te]; or,
** Smartphone at [http://www.hunter.cuny.edu/mobilete www.hunter.cuny.edu/mobilete]
* 12/19 (Monday, 4-6pm) '''Comprehensive Final Exam'''
* 12/31 (Sat). Grades Submitted to Registrar Offices (Hunter and Graduate Center)
Latest revision as of 16:25, 13 December 2016
Course Description
Molecular evolution is the study of the change of DNA and protein sequences through time. Theories and techniques of molecular evolution are widely used in species classification, biodiversity studies, comparative genomics, and molecular epidemiology. Contents of the course include:
- Population genetics, which is a theoretical framework for understanding mechanisms of sequence evolution through mutation, recombination, gene duplication, genetic drift, and natural selection.
- Molecular systematics, which introduces statistical models of sequence evolution and methods for reconstructing species phylogeny.
- Bioinformatics, which provides hands-on training on data acquisition and the use of software tools for phylogenetic analyses.
This 3-credit course is designed for upper-level undergraduate biology majors. Hunter prerequisites are BIOL203 and either MATH150 or STAT113.
Please note that, starting in fall 2015, completing this course no longer counts toward research credits for biology majors.
Textbooks
- (Required) Graur, 2016, Molecular and Genome Evolution, First Edition, Sinauer Associates, Inc. ISBN: 978-1-60535-469-9. Publisher's website: http://www.sinauer.com/molecular-and-genome-evolution.html (student discount: 15% off and free UPS standard shipping)
- (Recommended) Baum & Smith, 2013. Tree Thinking: an Introduction to Phylogenetic Biology, Roberts & Company Publishers, Inc.
Learning Goals
- Be able to describe evolutionary relationships using phylogenetic trees
- Be able to use web-based as well as stand-alone software to infer phylogenetic trees
- Understand mechanisms of DNA sequence evolution
- Understand algorithms for building phylogenetic trees
Links for phylogenetic tools
- NCBI sequence databases
- R Tools
- R source: download & install from a mirror site
- R Studio: download & install
- APE package
- A Molecular Phylogeny Web Server
- EvolView: an online tree viewer
Exams & Grading
- Attendance (or a note in case of absence) is required. Bonus for active participation in classroom discussions.
- Assignments. All assignments should be handed in as hard copies only; email submissions will not be accepted. Late submissions will receive a deduction of 10% (of the total grade) per day.
- Three Mid-term Exams (30 pts each)
- Comprehensive Final Exam (50 pts)
Academic Honesty
While students may work in groups and help each other with assignments, duplicated answers in assignments will be flagged and investigated as possible acts of academic dishonesty. To avoid being investigated, do NOT copy anyone else's work or let others copy your work. At the very least, rephrase in your own words. The same rule applies to the use of the textbook and online resources: copied sentences are not acceptable and will be considered plagiarism.
Hunter College regards acts of academic dishonesty (e.g., plagiarism, cheating on examinations, obtaining unfair advantage, and falsification of records and official documents) as serious offenses against the values of intellectual honesty. The College is committed to enforcing the CUNY Policy on Academic Integrity and will pursue cases of academic dishonesty according to the Hunter College Academic Integrity Procedures.
Course Schedule
Part 1. Tree Thinking
- 8/25 (TH). Overview & Introduction. Textbook Chapter: "Introduction" (pages 1-3)
Assignment 1 (10 pts; Due: 8/29, Monday) |
---|
Pre-test: Full credit will be given as long as each question is answered with some reasoning. In other words, it will NOT be graded on being right or wrong. It is an assessment tool whose results will be compared with later test outcomes to show teaching and learning results. |
- 8/29 (M). Introduction (Continued). Tutorial: R & R-Studio (Bring your own computer). Lecture slides:
- 9/1 (TH). Intro to trees. In-class exercise 1.
Assignment 2 (10 pts; Due: 9/8, Thursday) |
---|
Watch Origin of Species: Lizards in an Evolutionary Tree. Provide a short answer (1-3 sentences) to each of the following three questions. |
R exercises |
- 9/5 (M). Labor Day. No class
- 9/8 (TH). Intro to trees. In-class exercise 2. Textbook Chapter 5: "Molecular Phylogenetics" (pages 170-175; 201-202)
- 9/12 (M). Species Tree & Lineage Sorting. Textbook Chapter 5: "Molecular Phylogenetics" (pages 177-180)
Assignment 3 (10 pts; Due: 9/19, Monday) |
---|
R exercises |
- 9/15 (TH). Consensus Tree & Review. Chapter 5. pages 199-200 (Figure 5.31) Lecture Slides:
- 9/19 (M). 4:10-5:10pm. Midterm Exam I. Bring pencils, erasers, and a calculator.
Part 2. Trait & Sequence Evolution
- 9/22 (Th). Traits & trait matrix
- Textbook Chapter 5, pages 180-183
Assignment #4 (10 pts; Due Thursday, 9/29) |
---|
|
- 9/26 (M). Homoplasy & consistency
- 9/29 (Th). Parsimony reconstruction (Chapter 5).
- Textbook Chapter 5, pages 188-191
- In-Class Exercise 4
- 10/3 (No class)
- 10/6 (TH). Genome & gene structure (Chapter 3)
- In-Class Exercise 5. Pretest Part 2 (molecular phylogenetics in forensics)
Assignment #5 (10 pts; Due 10/13) |
---|
|
- 10/10 (M). No Class (Columbus Day)
- 10/13 (Th). Genome and gene evolution. Lecture Slides: Part-2-trait-evolution-2016.pdf
- 10/17 (M). Review & Practices.
- 10/20 (Th). Midterm Exam 2
Part 3. Tree Algorithms
- 10/24 (M). BLAST & Alignments (Chapter 3, pages 93-100)
Assignment #6 (5 pts; Due 10/31) |
---|
Based on the NCBI Gene Page for cytochrome C (CYCS), answer the following questions: |
- 10/27 (TH). Genetic distances & Sequence-evolutionary models (Chapter 3, pages 79-88)
- 10/31 (M). Maximum parsimony (Chapter 5, pages 191-194). In-class exercise #6
Assignment #7 (10 pts; Due 11/7, Monday (Part-3 lecture slides posted below)) |
---|
|
- 11/3 (TH). Distance methods (Chapter 5, pages 184-187). Lecture Slides: Part-3-tree-construction.pdf
- 11/7 (M). Likelihood & Bayesian methods (Chapter 5, pages 194-198)
Assignment #8 (10 pts; Due 11/14, Monday) |
---|
|
- 11/10 (TH). Tree-testing & Review (Chapter 5, pages 207-209). Lecture slides: to be posted
- 11/14 (M). Midterm Exam 3
Part 4. Population Genetics (Chapter 2)
- 11/17 (Th). Mechanism of molecular evolution: Overview (pages 35-38)
- 11/21 (Mon). SNP statistics & Genetic Drift (pages 47-49)
Assignment #9 (10 pts; Due 12/1, Thursday) |
---|
The figure on the left shows a codon alignment of 38 strains of a bacterium together with an outgroup sequence (the one that starts with a string of SNPs: "....g...c..ca..", etc.). Answer the following questions with the outgroup sequence excluded. Do not print the figure directly. Hand-copy the sequences onto graph paper, including only the sequences at the two variable codon positions: |
- 11/28 (M). Neutral Theory & Molecular Clock (pages 58-59; 72-74)
- 12/1 (TH). Tests of Natural Selection
Assignment # (10 pts; Due 12/8, Thursday) |
---|
Statistical experiments to explore gene-frequency change due to genetic drift: |
- 12/5 (M) Rates of nucleotide substitutions (pages 111-125). Part 4 slides: Part-4-evol-mechanisms-v2.pdf.
- 12/8 (Thursday). (Last Lecture) Review & Course evaluations. Review slides: Final-review.pdf. Submit your Teacher's Evaluation, using either:
- Personal computer at www.hunter.cuny.edu/te; or,
- Smartphone at www.hunter.cuny.edu/mobilete
- 12/19 (Monday, 4-6pm) Comprehensive Final Exam
- 12/31 (Sat). Grades Submitted to Registrar Offices (Hunter and Graduate Center)