BIOL 200 Cell Biology II LAB, Spring 2013
Hunter College of the City University of New York
Course information
Instructors: TBD
Class Hours: Room TBD HN; TBD
Office Hours: Room 830 HN; Thursdays 2-4pm or by appointment
Contact information:
- Dr. Weigang Qiu: weigang@genectr.hunter.cuny.edu, 1-212-772-5296
Experiment #4
The Tree of Life and Molecular Identification of Microorganisms
Objective
To classify microorganisms and determine their relatedness using molecular sequences.
LAB REPORT GRADING GUIDE
CELL BIO II Experiment #4:
- INTRODUCTION 1 point:
Statement of the objectives or aims of the experiment in the student’s own words (not copied from the Lab Manual).
- MATERIALS AND METHODS 0 points:
This should be a brief synopsis and must include any changes or deviations from the procedures outlined in the Lab Manual. Specify which organisms were used to create the phylogram.
- RESULTS 4 points:
A printout of the phylogram will suffice.
- DISCUSSION 4 points:
Responses to the discussion questions.
- SUMMARY / CONCLUSION 1 point:
A two-sentence summary of your findings.
- REFERENCES 1 point:
Credit is given for pertinent references obtained from sources other than the Lab Manual. This point is in addition to the 10 points for the lab report.
INTRODUCTION
Evolution can be defined as descent with modification. In other words, changes in the nucleotide sequence of an organism’s genomic DNA are inherited by the next generation. It follows that all organisms are related through descent from a common ancestor that lived in the distant past. Since that time, about 4 billion years ago, life has undergone an extensive process of change as new kinds of organisms arose from earlier ones. The evolutionary history of a group is called a phylogeny and can be represented by a phylogram (Figure 1). A major goal of evolutionary analysis is to understand this history. We do not have direct knowledge of the path of evolution because, by definition, extinct organisms no longer exist. Therefore, phylogeny must be inferred indirectly.
Originally, evolutionary analysis was based on the organisms’ morphology and metabolism. This is the basis for traditional classification schemes such as the Linnaean system and the later “Five Kingdoms” scheme. However, this method can suggest mistaken relationships. Different species living in the same environment may evolve similar morphologies in response to the same environmental factors (convergent evolution). Such similarities say nothing about how closely related the organisms are; they are a direct result of shared surroundings. With the advent of genomics, organisms can instead be grouped by their sequence relatedness. Since evolution is a process of inherited nucleotide change, analyzing DNA sequence differences allows a more reliable phylogenetic history to be reconstructed.
Figure 1. Tree of life based on 16S ribosomal RNA (image credit: NR Pace, Science 1997)
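To make the idea of sequence relatedness concrete, the sketch below computes the fraction of differing positions (often called the p-distance) between pairs of aligned sequences. It is written in Python purely for illustration. The function name `p_distance` and the three short sequences are hypothetical and are not data from this lab; real analyses use full alignments and corrected distance measures.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Return the fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    return mismatches / len(seq_a)


# Hypothetical aligned fragments, made up for illustration only.
aligned = {
    "species_A": "ATGCGTGATG",
    "species_B": "ATGCGTGACG",
    "species_C": "TTGCATGACA",
}

# Print the distance for every pair of sequences.
for (name_a, seq_a), (name_b, seq_b) in combinations(aligned.items(), 2):
    print(f"{name_a} vs {name_b}: {p_distance(seq_a, seq_b):.2f}")
```

In this made-up example, species_A and species_B differ at the fewest positions, so a distance-based tree would join them first.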
Read Chapter 1 |
February 5
- Chapter 1. Central Dogma & Wet Lab Tools Lecture Slides Ch.1-Che
- Beginning Perl (Beginning Perl, Part 1 Slides)
- Homework: (this assignment *will* be graded.)
Assignment #2 |
---|
Before you begin... Do this ONLY ONCE:
echo "source /data/yoda/b/student.accounts/bio425_2011/bio425.profile" >> ~/.bash_profile
Alternatively, you can open ~/.bash_profile in a text editor (ask me if you don't know how) and paste this line at the end:
source /data/yoda/b/student.accounts/bio425_2011/bio425.profile |
Beginning Perl For the homework, read up to page 221 in Appendix 1. For February 26, read all of Appendix 1. There are two choices for the homework. The first is recommended for novices. The second is for those who are either comfortable with Perl, or feel the need for a challenge this early. Only complete ONE of these assignments, as I will only accept one. Please follow the guidelines listed above.
|
Problems (pg. 31-32): 1.2, 1.3, 1.5, 1.9, 1.10, 1.11 |
February 12
NO CLASS
(Read Chapter 6 for next class)
February 19
Yozen will not be lecturing
- Chapter 6. Gene and Genome Structures Lecture Slides Ch.6-Che
- Tutorial: ORF Prediction using GLIMMER
- Homework: This homework will be graded.
Assignment #3 |
---|
Bacterial gene identification using Glimmer
Remember to first log in to mysql by running: ssh mysql
|
Read All of Appendix 1. |
February 26
- Appendix 1. More PERL (Lecture Slides)
- Homework
Assignment #4 |
---|
Beginning Perl
This time, both novices and experienced programmers do the same homework, with one small difference in the use of the program. Recall the FASTA format that I introduced in the first class. In this format, sequence data is recorded as follows:
>SequenceID_info1_info2
atgcgtgatg...
Of course, the ID portion is itself not standardized, and the sequence can also be an amino acid sequence. For simplicity, let's assume that the ID field holds a "Strain" name followed by a "protein" name, separated by an underscore (_). You will write a program to read a FASTA file containing nucleotide sequences with the ID format described above. For both novice-level and experienced-level programmers, your program will:
Novice-level task: Your program will just print the above information for all sequences, in a readable form. Sample output could be:
Strain: B31
Protein: ospA
Seq Length: 819
GC content: 33.58%
Base composition: A 42.98 %, T 23.44 %, C 14.77 %, G 18.80 %
If your percentages have more than 2 decimal places, that's OK.
Experienced-level task: The only difference from the novice task is that your program will ask the user for the name of a strain and protein, separated by an underscore (e.g., B31_ospA). Once given that input, it will print the exact same output as above, but only for the sequence described by that input. If the input doesn't exist, it will say so and exit. Your program will continue to ask the user for a sequence ID until the user types 'quit' or gives an invalid sequence ID. You can do this by using a while loop.
Notes
Calculating the GC content and the base composition is easy if you make use of the tr (transliterate) function as described at the bottom of page 232, and divide the result by the sequence length. GC content is just the total number of G and C nucleotides divided by the sequence length. I do want percents, so remember to multiply the results by 100 and to append a '%' at the end.
Getting the strain name and the protein name separately can be accomplished with the split() function (check the new slides or search on the internet).
You will test your program with the file /data/yoda/b/student.accounts/bio425_2011/data/Borrelia_osp.dna.fasta as input. You don't have to include the file itself with your homework, but I do still want you to copy the program output and submit it with your assignment.
Again, the program cannot use any outside dependencies/modules such as BioPerl (supposing you know how to use it). Besides that, you can implement it however you like. If you know about references, note that it is possible to do this assignment without using them. |
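The arithmetic described in the notes can be sketched as follows. This is an illustration in Python, not the Perl program the assignment asks for, and it is not the instructor's solution. The function names `read_fasta` and `summarize`, and the two-record input, are hypothetical; the only things taken from the assignment are the `Strain_protein` ID convention and the definitions of GC content and base composition.

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted file handle."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def summarize(seq_id, seq):
    """Return strain, protein, length, GC % and per-base % for one record."""
    strain, protein = seq_id.split("_", 1)  # ID is assumed to be Strain_protein
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        raise ValueError(f"record {seq_id} has no sequence")
    composition = {base: 100 * seq.count(base) / length for base in "ATCG"}
    return {
        "strain": strain,
        "protein": protein,
        "length": length,
        "gc": 100 * (seq.count("G") + seq.count("C")) / length,
        "composition": composition,
    }


# Hypothetical two-record input, made up for illustration only.
example = io.StringIO(">B31_ospA\natgcgtgatg\nccta\n>N40_ospB\nggccaatt\n")
for seq_id, seq in read_fasta(example):
    info = summarize(seq_id, seq)
    bases = ", ".join(f"{b} {p:.2f} %" for b, p in info["composition"].items())
    print(f"Strain: {info['strain']}")
    print(f"Protein: {info['protein']}")
    print(f"Seq Length: {info['length']}")
    print(f"GC content: {info['gc']:.2f}%")
    print(f"Base composition: {bases}")
```

Counting each base and dividing by the length plays the role that `tr` plays in Perl; splitting the ID on the first underscore mirrors the `split()` hint.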
March 5
- Chapter 2. Data Search and Alignments Lecture Slides Ch.2-Che
- Object-Oriented PERL & BioPerl (Link to Bioperl site and HOWTOs)
- Homework:
Assignment #5 |
---|
BioPerl Assignment
For this assignment, you will use the .predict file you made with glimmer in assignment 3. If connecting from home: open gedit before logging on to mysql. For BioPerl to work, you must log on to mysql. Complete the assignment by following these steps. Make sure each part works before trying to solve the next part:
See these sample scripts for how to use revcom and subseq:
../bio425_2011/sample-perl-scripts/revcom_translate_seq.pl
../bio425_2011/sample-perl-scripts/subseq.pl
The HOWTO is linked above in case you forgot. Output should be informative:
ORF: orf00002
DNA: ...
Protein: ... |
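For readers who want to see what the two operations do to a sequence, here is a plain-Python sketch. It is not BioPerl and it is not the sample scripts named above. The helpers mimic 1-based, inclusive coordinates; the function `orf_dna`, the toy genome string, and the convention that a start coordinate larger than the end coordinate marks a reverse-strand ORF are all assumptions made for illustration.

```python
# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcom(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def subseq(seq, start, end):
    """Return positions start..end of seq, 1-based and inclusive."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates out of range")
    return seq[start - 1:end]


def orf_dna(genome, start, end):
    """Extract an ORF; start > end is ASSUMED to mean the reverse strand."""
    if start <= end:
        return subseq(genome, start, end)
    return revcom(subseq(genome, end, start))


# Hypothetical toy genome, made up for illustration only.
genome = "AAATGCCCTAGGG"
print("forward:", orf_dna(genome, 3, 11))
print("reverse:", orf_dna(genome, 11, 3))
```

Keeping extraction and strand handling in separate small functions makes it easier to check each part before moving on to the next step.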
Read
For next class, read CH 3 |
March 12
- Chapter 3. Molecular Evolution Lecture Slides Ch.3-Che
- Homework: (TBA)
March 19
- REVIEW Session for MID-TERM EXAMS
March 26
- MID-TERM
April 2
- Chapter 4. Phylogenetics I. Distance Methods Lecture Slides Ch.4-Che
- "Tree Thinking" Puzzles - (Download)
- Tutorial: PROTDIST and NEIGHBOR using Mobyle Pasteur
Assignment #6 |
---|
Chapter 4 Questions & Problems (pg.95-96): 4.1, 4.3, 4.4, 4.7, 4.8 |
April 9
- Chapter 5. Phylogenetics II. Character-Based Methods Lecture Slides Ch.5-Che
- Tutorial: DNAML and bootstrap analysis using Mobyle Pasteur
April 16
- Topic: Relational Database and SQL
- Tutorial: the Borrelia Genome Database
- Homework: SQL-embedded PERL
Assignment #7 |
---|
SQL-embedded PERL Continue work on the assignment we began in class. It is reproduced below, with some added functionality. Your script will:
called "Pko_orfs.fasta". The sequence ID should be of the form: Pko_orfname Note that the above will require the use of BioPerl.
of obtaining their respective IDs from just their names. Make sure the sequence IDs are informative. They should look like this: strainname_contigname_orfname
|
Questions from Text (pg.115-116): 5.1, 5.3 |
April 23
NO CLASSES (Spring recess)
April 30
- Topic: Statistics
- In-class exercise: R basics and short demonstration of a simple boxplot
- Tutorial: Statistical Visualization using R Lecture Slides-Che
May 7
- Chapter 6 (Gene Expression) & Chapter 8 (Proteomics)
- Tutorial: Array Data Visualization and Analysis (Micro-Array Analysis Slides)
- Homework: Data Analysis using R
Assignment #8 |
---|
Part 1 Data Analysis:
For this assignment, you will use sample data to answer the question: Do men and women have different body temperatures? The file temps.txt, located in ../bio425_2011/data on eniac, contains body temperature data for a sample of adults. Use a hypothesis test with α = 0.05 to answer the question. NOTE: For this part of the assignment, you will need to turn in your answer to the question, with p-values, in addition to the R syntax used. Indicate your null hypothesis.
Using the files GSM129276_cy3.txt and GSM129276_cy5.txt, located in ../bio425_2011/data on eniac, conduct an analysis to produce a histogram of fold changes. In addition to the histogram, you will need to turn in the R syntax used in every step of the analysis, along with an explanation of why each step was necessary. |
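The fold-change calculation can be sketched as below. This is a Python illustration of the arithmetic only, not the R analysis the assignment requires. It assumes that the fold change for a spot is the log2 ratio of its Cy5 intensity to its Cy3 intensity and that the two lists are already paired spot by spot; the function names and the intensity values are hypothetical and do not come from the GSM129276 files.

```python
import math
from collections import Counter


def log2_fold_changes(cy5, cy3):
    """Return log2(Cy5 / Cy3) per spot, skipping non-positive intensities."""
    if len(cy5) != len(cy3):
        raise ValueError("both channels must have one value per spot")
    return [math.log2(r / g) for r, g in zip(cy5, cy3) if r > 0 and g > 0]


def text_histogram(values, bin_width=1.0):
    """Count values per bin; each bin is labelled by its left edge."""
    bins = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return sorted(bins.items())


# Hypothetical paired intensities, made up for illustration only.
cy5 = [1200.0, 300.0, 800.0, 150.0, 4000.0, 500.0]
cy3 = [600.0, 300.0, 200.0, 600.0, 1000.0, 520.0]

for left_edge, count in text_histogram(log2_fold_changes(cy5, cy3)):
    print(f"[{left_edge:+.1f}, {left_edge + 1.0:+.1f}) {'#' * count}")
```

Working on the log2 scale makes a doubling and a halving symmetric around zero, which is why fold-change histograms are usually drawn this way.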
Read
For next class, read CH 7 |
May 14
- Chapter 7. Protein Structure Prediction
May 21
- Final Project Due (TBA)
Useful Links
Unix Tutorials
- A very nice UNIX tutorial (you will only need up to, and including, tutorial 4).
- FOSSWire's Unix/Linux command reference (PDF). Of use to you: "File commands", "SSH", "Searching" and "Shortcuts".
Perl Help
- Professor Stewart Weiss has taught CSCI132, a UNIX and Perl class. His slides go into much greater detail and are an invaluable resource. They can be found on his course page here.
- Perl documentation at perldoc.perl.org. You can also get similar results without leaving the terminal by running the perldoc command with a function name (using the -f option, e.g., perldoc -f substr) or a Perl module name (e.g., perldoc Bio::Seq).
Bioperl
- BioPerl's HOWTOs page.
- BioPerl-live developer documentation. (We use bioperl-live in class.)
- Yozen's tutorial on installing bioperl-live on your own Mac OS X machine. (Let me know if there are any issues!).
- A small table showing some methods for BioPerl modules with usage and return values.
SQL
- SQL Primer, written by Yozen.
R Project
- Install location and instructions for Windows
- Install location and instructions for Mac OS X
- For users of Ubuntu/Debian:
sudo apt-get install r-base-core
- For users of Fedora/Red Hat:
su -c 'yum install R'
Utilities
- An RSS button extension for Chrome. Can add feeds to Google Reader and others.
- A similar extension which adds a "Live bookmarks"-like feature to Chrome (like Firefox's RSS bookmarks).
Other Resources
- Information Theory Primer by Thomas D. Schneider. Useful in understanding sequence logo maps.
© Weigang Qiu, Hunter College, Last Update Jan 2013