Biol375 2018

From QiuLab
Revision as of 02:03, 20 October 2018 by imported>Weigang (→‎Part 3. Tree Algorithms)
Molecular Evolution (BIOL 375.00/790.64/793.03, Fall 2018)
Instructor: Dr Weigang Qiu, Professor, Department of Biological Sciences
Room: 926 HN (Seminar Room, North Building)
Hours: Mon. & Thur. 4:10-5:25 pm
Office Hours: Belfer Research Building (Google Map) BB-402; Tuesdays 5-7 pm or by appointment
Course Website: http://diverge.hunter.cuny.edu/labwiki/Biol375_2018

Course Description

Molecular evolution is the study of the change of DNA and protein sequences through time. Theories and techniques of molecular evolution are widely used in species classification, biodiversity studies, comparative genomics, and molecular epidemiology. Contents of the course include:

  • Population genetics, which is a theoretical framework for understanding mechanisms of sequence evolution through mutation, recombination, gene duplication, genetic drift, and natural selection.
  • Molecular systematics, which introduces statistical models of sequence evolution and methods for reconstructing species phylogeny.
  • Bioinformatics, which provides hands-on training on data acquisition and the use of software tools for phylogenetic analyses.

This 3-credit course is designed for upper-level biology-major undergraduates. Hunter pre-requisites are BIOL203, and MATH150 or STAT113.

Please note that starting from fall 2015, completing this course no longer counts towards research credits for biology majors.

Textbooks

  • (Required) Graur, 2016, Molecular and Genome Evolution, First Edition, Sinauer Associates, Inc. ISBN: 978-1-60535-469-9. Publisher's website: http://www.sinauer.com/molecular-and-genome-evolution.html (student discount: 15% off and free UPS standard shipping)

  • (Recommended) Baum & Smith, 2013. Tree Thinking: an Introduction to Phylogenetic Biology, Roberts & Company Publishers, Inc.

Learning Goals

  • Be able to describe evolutionary relationships using phylogenetic trees
  • Be able to use web-based as well as stand-alone software to infer phylogenetic trees
  • Understand mechanisms of DNA sequence evolution
  • Understand algorithms for building phylogenetic trees

Links for phylogenetic tools

Exams & Grading

  • Bonus for full attendance & active participation in classroom discussions.
  • Assignments. All assignments should be handed in as hard copies only. Email submissions will not be accepted. Late submissions will receive a 10% deduction (of the total grade) per day.
  • Three Mid-term Exams (30 pts each)
  • Comprehensive Final Exam (50 pts)

Academic Honesty

While students may work in groups and help each other with assignments, duplicated answers in assignments will be flagged and investigated as possible acts of academic dishonesty. To avoid being investigated as such, do NOT copy anyone else's work or let others copy your work. At the least, rephrase using your own words. Note that the same rule applies to the use of textbook and online resources: copied sentences are not acceptable and will be considered plagiarism.

Hunter College regards acts of academic dishonesty (e.g., plagiarism, cheating on examinations, obtaining unfair advantage, and falsification of records and official documents) as serious offenses against the values of intellectual honesty. The College is committed to enforcing the CUNY Policy on Academic Integrity and will pursue cases of academic dishonesty according to the Hunter College Academic Integrity Procedures.

Course Schedule

Part 1. Tree Thinking

  • 8/27 (M). Overview & Introduction. Textbook Chapter: "Introduction" (pages 1-3)
Assignment 1 (10 pts; Due next class 8/30)
  • (10 pts) Pre-test: Full credit will be given as long as each question is answered with some reasoning. In other words, it will NOT be graded on being right or wrong. It is an assessment tool, to be compared with later test outcomes to show teaching/learning results.
  • 8/30 (TH). Introduction (Continued).
    • Go over pre-test questions
    • Tutorial: R & R-Studio (Bring your own computer).
Assignment 2 (5 pts; Due: next session)
R exercises
  1. Install R & R-studio (see "Links for phylogenetic tools" above)
  2. Open R-studio and install the "ape" package using the "Packages"->"Install" menu, located within the lower right window
  3. Type the following commands in the console window (lower left), one at a time. Wait for the prompt ">" to appear before proceeding to the next command; quit & restart R-studio if stuck:
    1. library(ape)
    2. tr <- read.tree(text = "(monkey:0.09672,((tarsier:0.18996,lemur:0.14790)0.999:0.09005,(macaque:0.18524,(gibbon:0.10388,(orang-utan:0.09481,(human:0.03391,(gorilla:0.06135,chimpanzee:0.05141):0.01580)0.316:0.05381)1.000:0.03019)0.978:0.05616)0.997:0.05042)0.965:0.09672);")
    3. plot(tr)
  4. Export the tree graph using the "Export"->"Save as PDF" or "Save as Image" menu in the lower right window
  5. Exit R-studio by typing the command "q()", then type "y" when asked whether to save the R session
  6. Copy & paste the tree image into your document to be handed in
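The console steps above can also be collected into a short script. This is only a sketch: it assumes the "ape" package is already installed, and the output file name "primate-tree.pdf" is a made-up example. It writes the plot to a PDF with R's pdf() device rather than the "Export" menu.

```r
library(ape)  # assumes "ape" has already been installed

# Newick string from step 3 of the exercise
newick <- "(monkey:0.09672,((tarsier:0.18996,lemur:0.14790)0.999:0.09005,(macaque:0.18524,(gibbon:0.10388,(orang-utan:0.09481,(human:0.03391,(gorilla:0.06135,chimpanzee:0.05141):0.01580)0.316:0.05381)1.000:0.03019)0.978:0.05616)0.997:0.05042)0.965:0.09672);"

# Parse the text into a tree object
tr <- read.tree(text = newick)

# Quick checks: number of tips and their names
print(Ntip(tr))
print(tr$tip.label)

# Save the plot to a PDF file (hypothetical file name)
pdf("primate-tree.pdf")
plot(tr)
dev.off()
```

Running the script should leave a PDF in the current working directory (see getwd()), which can then be pasted into the document to be handed in.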
  • 9/5 (Wed; Monday Schedule). Intro to trees.
  • 9/6 (TH). Intro to trees.
    • In-class exercise 2. (5 pts; Due next session)
    • Textbook Chapter 5: "Molecular Phylogenetics" (pages 170-175; 201-202)
  • 9/13 (TH). Species Tree & Lineage Sorting.
    • Textbook Chapter 5: "Molecular Phylogenetics" (pages 177-180).
  • 9/17 (M). Consensus Tree & Review.
  • 9/20 (TH). Midterm Exam I, 4:10-5:10 pm. Bring pencils, erasers, and a calculator.

Part 2. Analysis of Trait Evolution

  • 9/24 (M). Traits & trait matrix
    • Textbook Chapter 5, pages 180-183
Assignment #3 (5 pts; Due next session)
Watch Origin of Species: Lizards in an Evolutionary Tree. Provide a short answer (1-3 sentences) to each of the following three questions.
  1. What are the two hypotheses explaining the origin of different ecomorphs of lizards on Caribbean Islands?
  2. What is the expected phylogeny under each hypothesis?
  3. Which hypothesis is supported by the phylogeny of actual DNA sequences?
  • 9/27 (TH). Homoplasy & consistency
Assignment #4 (5 pts; Due next session)
  1. Download or Copy/Paste the lizard DNA sequences to your own computer and save the file as "lizard.txt"
  2. Align the DNA sequences using this website and save the aligned DNA file ("Output->Alignment in Fasta format") as "lizard-aligned.txt". Use the "one-click" option to make a tree.
  3. Based on the lizard card, construct a character-state matrix for all lizard species. For each species, list its character state for each of the following two characters (as columns): (1) Geographic origin, and (2) Habitat.
  4. Determine which hypothesis ("Multiple origin" or "Single origin" of ecomorphs) is more supported by the mtDNA tree. Explain.
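One possible way to lay out the character-state matrix from step 3 in R is sketched below. The species names and character states here are hypothetical placeholders, not the actual lizard data; replace them with the entries from the lizard card.

```r
# Hypothetical placeholder data, for illustration only:
# one row per species, one column per character
traits <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp4"),
  origin  = c("IslandA", "IslandA", "IslandB", "IslandB"),
  habitat = c("twig", "trunk", "twig", "trunk"),
  stringsAsFactors = FALSE
)

# Show the character-state matrix
print(traits)

# Cross-tabulate the two characters: number of species
# for each combination of geographic origin and habitat
tab <- table(traits$origin, traits$habitat)
print(tab)
```

The rows of the matrix can then be matched to the tips of the mtDNA tree to see whether species group by geographic origin or by habitat.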

Part 3. Tree Algorithms

  • 10/22 (M). BLAST & Alignments (Chapter 3, pages 93-100)
  • 10/25 (TH). Genetic distances & Sequence-evolutionary models (Chapter 3, pages 79-88)
  • 10/29 (M). Maximum parsimony (Chapter 5, pages 191-194).
  • 11/1 (TH). Distance methods (Chapter 5, pages 184-187). In-class exercise #6; In-class computer exercise:
  • 11/5 (M). Likelihood & Bayesian methods (Chapter 5, pages 194-198). Lecture slides posted:
  • 11/8 (TH). Tree-testing & Review (Chapter 5, pages 207-209). Lecture slides posted in the last session
  • 11/12 (M). Midterm Exam 3

Part 4. Population Genetics (Chapter 2)

  • 11/15 (TH). Mechanism of molecular evolution: Overview (pages 35-38); SNP statistics
  • 11/19 (M).
Assignment #9 (10 pts; Due 11/30, Thursday)
Snp-pa1.png

The figure at left shows a codon alignment of 38 strains of a bacterium, with an outgroup sequence (which starts with a string of SNPs: "....g...c..ca..", etc.). Answer the following questions with the outgroup sequence excluded. Do not print the figure directly; hand-copy the sequences to a graph sheet, including only the sequences at the two variable codon positions:

  1. There are two SNP sites. For each SNP, determine whether it is a synonymous or nonsynonymous change (could be both if more than 2 states). You may simply list the codons and their corresponding amino acids, at each aligned codon site.
  2. Calculate allele frequencies at each SNP site (for 3 SNP states, calculate frequencies of all three separately)
  3. List all haplotypes using the 2 SNP sites
  4. Calculate frequencies of all haplotypes
  5. Using the outgroup sequence, determine the ancestral and derived SNP, codon, and amino-acid states at each codon site. Explain with a tree including the outgroup sequence.
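The frequency calculations in questions 2-4 can be checked in R along the following lines. This is a minimal sketch on made-up toy data (8 strains, two SNP sites); the vectors `snp1` and `snp2` are hypothetical and do NOT come from the alignment in the figure.

```r
# Hypothetical toy data: SNP states of 8 strains at two sites
snp1 <- c("a", "a", "g", "g", "g", "a", "g", "g")
snp2 <- c("c", "c", "c", "t", "t", "c", "t", "c")

# Allele frequency = count of each state / number of sequences
freq1 <- table(snp1) / length(snp1)
freq2 <- table(snp2) / length(snp2)

# A haplotype is the combination of states at the two SNP sites
hap <- paste0(snp1, snp2)
hap_freq <- table(hap) / length(hap)

print(freq1)
print(freq2)
print(hap_freq)
```

Allele frequencies at each site, and haplotype frequencies across sites, should each sum to 1; this is a useful check for hand calculations as well.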
  • 11/26 (M).
  • 11/29 (TH). Neutral Theory & Molecular Clock (pages 58-59; 72-74)
  • 12/3 (M). Genetic Drift (pages 47-49). Computer exercises of Assignment #10 (below) will be done in class.
Assignment #10 (10 pts; Due 12/11, Monday)
Statistical experiments to explore gene-frequency change due to genetic drift:
  1. With R-studio, make two populations of N=1000 haploid individuals consisting of alleles "A" and "G" at a SNP site: pop1 = c(rep("A",500), rep("G",500)); pop2 = c(rep("A",100), rep("G",900))
  2. Count alleles in each population: table(pop1); table(pop2). Which population is more diverse? Why?
  3. Define a function to calculate heterozygosity: hg = function(x) {cts = table(x); total=sum(cts); if (length(cts)==1) {return(0) } else { freq1=cts[1]/total; freq2=cts[2]/total; return(1-(freq1^2+freq2^2)) } }
  4. Calculate heterozygosity of each population: hg(pop1); hg(pop2). The results should match your answer to the 2nd question.
  5. Permute population 1 and take a random sample of 100 individuals: pop1=sample(pop1); s = sample(pop1, 100); counts=table(s); heterozygosity = hg(s). Is the sample more or less diverse than the original population? Repeat 10 times and report all counts and diversity (e.g., with a table)
  6. Repeat the above with a smaller sample of 10 individuals
  7. Repeat with population 2 and a sample of 100 individuals
  8. Repeat the above with a smaller sample of 10 individuals
  9. Define "genetic diversity" verbally (+2 pts for giving and using formula for calculating heterozygosity).
  10. Define "genetic drift". Using results from the above four statistical experiments, discuss the effect of genetic drift on genetic diversity within a population. What is the general trend (increase or decrease) of genetic diversity as a result of random sampling of gametes? Is the gain or loss of genetic diversity due to genetic drift more rapid in a small or a large population (contrast results with different sample sizes)?
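The repeated sampling in steps 5-8 can be automated with replicate(). The sketch below is one possible approach, not the required solution. The helper names `het` and `drift_trial` are made up for illustration, the fixed seed is an assumption added so that a run can be repeated, and heterozygosity is written as 1 - sum(p_i^2), which gives the same value as the two-allele function in step 3.

```r
set.seed(1)  # assumption: fixed seed so the same samples are drawn each run

# Heterozygosity as 1 - sum of squared allele frequencies;
# works for one, two, or more alleles
het <- function(x) {
  p <- table(x) / length(x)
  1 - sum(p^2)
}

# Populations as defined in step 1
pop1 <- c(rep("A", 500), rep("G", 500))
pop2 <- c(rep("A", 100), rep("G", 900))

# Draw n_rep random samples of size n (without replacement);
# record the count of allele "A" and the heterozygosity of each sample
drift_trial <- function(pop, n, n_rep = 10) {
  t(replicate(n_rep, {
    s <- sample(pop, n)
    c(n_A = sum(s == "A"), het = het(s))
  }))
}

res100 <- drift_trial(pop1, 100)  # step 5
res10  <- drift_trial(pop1, 10)   # step 6
print(res100)
print(res10)
```

Calling drift_trial(pop2, 100) and drift_trial(pop2, 10) covers steps 7 and 8. Each result is a table with one row per sample, which can be copied into the report.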