BIOL200 2013: Difference between revisions

From QiuLab
imported>Cmartin
m (moved User talk:Cmartin to BIOL200 2013: rename)
 
(53 intermediate revisions by the same user not shown)


===Objective===
<span style="color: Crimson;font-weight:bold;">To classify microorganisms and determine their relatedness using molecular sequences.</span>
*Evaluate and interpret computational results statistically ("Statistical-thinking")
*Formulate informatics questions quantitatively and precisely ("Abstraction")
*Design efficient procedures to solve problems ("Algorithm-thinking")
*Manipulate high-volume textual data using UNIX tools, Perl/BioPerl, R, and Relational Database ("Data Visualization")


===LAB REPORT GRADING GUIDE===
CELL BIO II Experiment #4:
*'''Introduction'''<span style="font-weight:bold;color:OrangeRed;"> 1 point</span> ''':'''
<pre>Statement of objectives or aims of the experiment in the student’s own words
  (not to be copied from the Lab Manual).</pre>
*'''MATERIALS AND METHODS'''<span style="font-weight:bold;color:OrangeRed;"> 0 points</span> ''':'''
<pre>This should be a brief synopsis and must include any changes or deviations from the procedures
outlined in the Lab Manual. Specify which organisms were used to create the phylogram.</pre>
*'''RESULTS'''<span style="font-weight:bold;color:OrangeRed;"> 4 points</span> ''':'''
<pre>A printout of the phylogram will suffice.</pre>
*'''DISCUSSION'''<span style="font-weight:bold;color:OrangeRed;"> 4 points</span> ''':'''
<pre>Responses to discussion questions.</pre>
*'''SUMMARY/CONCLUSION'''<span style="font-weight:bold;color:OrangeRed;"> 1 point</span> ''':'''
<pre>Two-sentence summary of your findings.</pre>
*'''REFERENCES'''<span style="font-weight:bold;color:OrangeRed;"> 1 point</span> ''':'''
<pre>Credit is given for pertinent references obtained from sources other than the Lab Manual.
  This point is in addition to the 10 for the lab report.</pre>


===Pre-requisites===
This 3-credit course is designed for upper-level undergraduates and graduate students. Prior experience with the UNIX operating system and at least one programming language is required. Hunter prerequisites are CSCI132 (Practical Unix and Perl Programming) and BIOL300 (Biochemistry) or BIOL302 (Molecular Genetics), or permission of the instructor.

===Textbook===

Krane & Raymer (2003). ''Fundamental Concepts of Bioinformatics''. Pearson Education, Inc. (ISBN 0-8053-4633-3)

This book should be available in the Hunter Bookstore, as well as through several popular retailers and resellers online.

===Grading & Academic Honesty===

Hunter College regards acts of academic dishonesty (e.g., plagiarism, cheating on examinations, obtaining unfair advantage, and falsification of records and official documents) as serious offenses against the values of intellectual honesty. The College is committed to enforcing the CUNY Policy on Academic Integrity and will pursue cases of academic dishonesty according to the Hunter College Academic Integrity Procedures.

Student performance will be evaluated by weekly assignments and projects. Although these are take-home projects, students are allowed to work in groups, and answers to some of the questions are provided in the back of the textbook, students are expected to compose the final short answers, computer commands, and code independently. There are virtually unlimited ways to solve a computational problem, just as there are many ways and personal styles to implement an algorithm. Writing and blocks of code that are virtually exact copies between individual students will be investigated as possible cases of plagiarism (e.g., copies from the Internet, the textbook, or each other). In such a case, the instructor will hold closed-door exams for the individuals involved. Zero credit will be given to ALL involved individuals if the instructor considers there to be enough evidence of plagiarism. To avoid being investigated for plagiarism, '''Do Not Copy from Others & Do Not Let Others Copy Your Work.'''

'''Submit assignments in Printed Hard Copies.''' Email attachments will NOT be accepted. Each assignment will be graded on timeliness (10%), completeness (30%), whether it executes without major errors (20%), correctness of the final output (20%), algorithm efficiency (10%), and cleanness and readability of programming style (10%).

The grading scheme for the course is as follows ('''Subject to some change. You will be notified with sufficient time'''):
*Assignments (50%): 10 exercises.
*Mid-term (20%): In-class assignment + take-home to be collected on the same day.
*Final exam (20%)
*Classroom Q & A (5%): Read the chapters before lecture.
*Attendance (5%): 1-2 absences = -2.5%. More than 2 = -5%.

===Programming Assignment Expectations===
All code must begin with the lines in the Perl slides, without exception. For each assignment, unless otherwise stated, I would like the '''full text''' of the source code. Since you cannot print from the text editor in the lab (even if you are connected from home), you must copy and paste the code into a word processor or a local text editor. If you are using a word processor, '''change the font to a fixed-width/monospace font'''. On Windows, this is usually '''Courier'''.

Code indentation is '''your personal taste''', so long as it is consistent and readable. Use comments whenever you think the code is unclear, or simply as a guideline for yourself. Well-commented code improves readability, but be careful not to overdo it.

Also, unless otherwise stated, both the '''input and the output of the program must be submitted as well'''. This should '''also be in fixed-width font''', and you should '''label''' it so that I know it is the program's input/output. This is so that I know that you've run the program, what data you have used, and what the program produced.

If you are working from the lab, one option is to email the code to yourself, change the font, and then print it somewhere else, as there is no printer in the lab.

==Course Schedule (All Saturdays)==
<span style="color:red;font-weight:bold;font-size:large;">Dates and assignments below are subject to some change</span>

'''"Lecture slides" links will be available either during or before each lecture, in PDF.'''

'''Homework assignments are due the week *after* the date under which they appear.''' I.e., an assignment posted under Jan 29 is due the following lecture, on Feb 5.

===January 29===
*Course Overview
*'''Tutorial:''' UNIX Account, Tools, & Emacs [Lecture Slides]
*'''UNIX Tutorial:''' Please check the new '''[[#Useful Links]]''' section below
*'''How to connect remotely:''' ([http://qiu.bioweb.hunter.cuny.edu/index.php?option=com_content&view=article&id=110 Windows]) ([http://qiu.bioweb.hunter.cuny.edu/index.php?option=com_content&view=article&id=111 Mac])
* '''Homework: This homework will *not* be graded. It is for practice purposes ONLY.'''
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #1
|- style="background-color:powderblue;"
| '''Linux Proficiency'''<br />
#Display the absolute path of your home directory
#List files in your home directory in long format & ordered by their time stamps
#List files in the "/data/yoda/b/student.accounts/bio425_2011/" directory from your home directory
#Copy the file "/data/yoda/b/student.accounts/bio425_2011/data/GBB.seq" into your home directory
#Count the number of lines in the file "GBB.seq"
#Show the first five lines of the file "GBB.seq" & save them to a file with an arbitrary name
#Show your last ten commands using "history"
|-style="background-color:powderblue;"
| '''Read''' Chapter 1
|}


===INTRODUCTION===
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Introduction
|- style="background-color:powderblue;"
| Evolution can be defined as descent with modification. In other words, changes in the nucleotide sequence of an organism’s genomic DNA are inherited by the next generation. According to this, all organisms are related through descent from an ancestor that lived in the distant past. Since that time, about 4 billion years ago, life has undergone an extensive process of change as new kinds of organisms arose from other kinds existing in the past.<br /> The evolutionary history of a group is called a phylogeny, and it can be represented by a phylogram (Figure 1). A major goal of evolutionary analysis is to understand this history. We do not have direct knowledge of the path of evolution because, by definition, extinct organisms no longer exist. Therefore, phylogeny must be inferred indirectly. Originally, evolutionary analysis was based upon the organisms’ morphology and metabolism. This is the basis for the Linnaean classification scheme (the “Five Kingdoms” scheme). However, this method can lead to mistaken relationships. Different species living in the same environment may have similar morphologies in order to deal with specific environmental factors. These similarities then have nothing to do with how related the organisms are, but are a direct result of shared surroundings. With the advent of genomics, however, organisms can be grouped based upon their sequence relatedness. Since evolution is a process of inherited nucleotide change, analyzing DNA sequence differences allows for the reconstruction of a better phylogenetic history.<br/>
|-
|[[File:TreeLife.PNG|center|alt=The Tree of Life.|Tree of life based on 16S ribosomal RNA (image credit: NR Pace, Science 1997)]]
|-style="background-color:powderblue;"
|Of course, when comparing DNA sequences, the question of which genes to use arises. The most widely used are the 16S rRNA gene in prokaryotes and the 18S rRNA gene in eukaryotes. These genes code for small subunit ribosomal RNA and are used for evolutionary analysis because they 1) are found in all organisms, 2) are functionally conserved, 3) vary only slightly between organisms (their nucleotide sequences have changed slowly over evolutionary time), and 4) are of adequate length. In this lab, you will perform an evolutionary analysis by constructing a phylogram of 15 microbes spanning bacteria, archaea and eukarya. You will find and download rRNA sequences, align them, and use that alignment to create a phylogram.
|}
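The introduction's central claim, that differences between DNA sequences reflect how closely organisms are related, can be made concrete with the simplest distance measure: the fraction of aligned positions at which two sequences differ (the uncorrected p-distance). The Python sketch below is illustrative only. The sequences are made-up fragments rather than real 16S rRNA data, skipping gapped columns is an assumption (other gap treatments exist), and the CLUSTALW tool used in this lab computes its own distances and tree, which this sketch does not reproduce.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of compared sites at which two aligned sequences differ.

    Columns where either sequence has a gap ('-') are skipped; this gap
    treatment is an assumption made for the sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


# Made-up aligned fragments, for illustration only (not real 16S rRNA data).
aligned = {
    "taxon_A": "AGAGTTTGATCCTGGCTCAG",
    "taxon_B": "AGAGTTTGATCATGGCTCAG",
    "taxon_C": "AGAGTTCGAT-ATGGCTTAG",
}

names = list(aligned)
for i, x in enumerate(names):
    for y in names[i + 1:]:
        print(f"{x} vs {y}: p = {p_distance(aligned[x], aligned[y]):.3f}")
```

In this made-up example, taxon_A and taxon_B differ at 1 of 20 compared positions (p = 0.050), so they are the most similar pair; taxon_C differs from them at 3 and 2 of 19 ungapped positions, respectively.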


===February 5===
*'''Chapter 1.''' Central Dogma & Wet Lab Tools [[Media:Molecular_Biology_and_Genomics.pdf|Lecture Slides Ch.1-Che]]
*'''Beginning Perl''' ([[Media:Bio425_beginning_perl.pdf|Beginning Perl, Part 1 Slides]])
*'''Homework:''' (this assignment *will* be graded.)
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #2
|- style="background-color:powderblue;"
| '''Before you begin...'''<br />
Do this ONLY ONCE: <pre>echo "source /data/yoda/b/student.accounts/bio425_2011/bio425.profile" >> ~/.bash_profile</pre>
Alternatively, you can open ~/.bash_profile in a text editor (ask me if you don't know how) and paste the line <pre>source /data/yoda/b/student.accounts/bio425_2011/bio425.profile</pre> at the end.
|-style="background-color:powderblue;"
| '''Beginning Perl'''<br />
For the homework, read up to page 221 in Appendix 1. For February 26, read all of Appendix 1.

There are '''two choices''' for the homework. The first is recommended for novices. The second is for those who are either comfortable with Perl or feel the need for a challenge this early. Only complete ONE of these assignments, as I will only accept one. Please follow the guidelines listed [[#Programming Assignment Expectations|above]].

# Copy the code from page 221 into a new file. (Remember to put the code from the slides at the beginning of the file and to declare all variables on first use!) You must alter the code so that the resulting program accomplishes the following four tasks:
##Instead of taking the average of 10 numbers, ask the user how many numbers to average and use that number instead. (Hint: see how the code asks for each number.) This must be stored in a new variable.
##If the number the user gave was 0 or negative, print a message telling the user so, and exit immediately. You can exit using <pre>exit;</pre>
##The code always prints 'Enter another number:'. Change it so that on the '''first time only''' it instead prints 'Enter a number:'.
##Just before printing the average, print a message saying 'The numbers to average are: '. Then print out all the numbers the user entered.
#More advanced programmers can try this assignment (you may wish to read all of Appendix 1 now): create a script that takes as input one or more DNA sequences from a file and translates them directly into the correct amino acid sequence (single-letter format). You may implement this program in Perl however you wish, with as much complexity as you wish, as long as it meets the guidelines above and satisfies the following four criteria:
##The input file must contain one DNA sequence per line, so that the sequences are separated by newline characters. '''Also assume you are given the coding strand.'''
##The name of the input file cannot be hard-coded. You may either ask the user for the file location/name or take it as a command-line argument.
##It must tolerate all-upper-case, all-lower-case, or mixed-case sequences in the input.
##For every input DNA sequence, output the DNA sequence, the equivalent RNA, and the peptide sequence. The output '''must''' be informative, i.e.:
##:Input: atgcgtcgataa
##:Output: augcgucgauaa
##:Peptide: MRR*
#:Additionally, the program cannot use any outside dependencies/modules such as BioPerl (supposing you know how to use it). Also note that STOP codons are denoted by a '<nowiki>*</nowiki>'.
|-style="background-color:powderblue;"
| '''Problems'''<br />
(pg. 31-32): 1.2, 1.3, 1.5, 1.9, 1.10, 1.11
|}


===MATERIALS===
*'''Required hardware:''' Computer
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Procedure
|- style="background-color:powderblue;"
|
#Examine Table 1 and select representative species from Bergey’s Manual. Select 2 prokaryotic species from each group, giving 14 prokaryotic species in total. Also select the eukaryotic representative, ''Saccharomyces cerevisiae''.
#Access the NCBI website: http://www.ncbi.nlm.nih.gov/
#Under the “Search” category, select “Nucleotide”.
#Under the “for” category, type the accession number for your first organism, and click the “Go” button. This takes you to the GenBank entry for the 16S rRNA of your organism.
#Download the 16S rRNA sequence for your first organism by choosing “FASTA” under the “Display” category.
#Copy and paste the entire output into a Microsoft Word file.
#Edit the sequence ID to match the format “Genus_species_GenBank#” (e.g., >Escherichia_coli_174375).
#Repeat the process for all of your organisms, pasting the sequences into the same Microsoft Word file. (Note: be sure to place a blank line between each sequence entry.)
#Access the EMBL CLUSTALW alignment website: http://www.ebi.ac.uk/Tools/clustalw/, and copy and paste your entire Microsoft Word file into the area that asks you to “Enter or paste a set of sequences in any supported format”. Click “Run”. This program will make an alignment of all of your sequences.
#Click “Show as Phylogram Tree” to create a tree showing the relatedness of your organisms based on their 16S rRNA sequences.
#To print your phylogram tree:
#*a. Press the “Print Screen” key on your keyboard.
#*b. Open the Paint program from the “Accessories” menu on your computer.
#*c. Click Paste to paste your screen.
#*d. “Select” your phylogram tree.
#*e. Copy and paste it into a new Paint file.
#*f. Print your tree and email it to yourself.
|}
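Renaming each downloaded header to the “Genus_species_GenBank#” form and collecting the records with a blank line between them can also be scripted instead of edited by hand in Word. The Python sketch below assumes each record is a string whose first line is the NCBI header; `rename_fasta` and `combine_records` are hypothetical helpers (not part of any tool used in this lab), and the header text and sequences in the example are placeholders, not real GenBank data.

```python
def rename_fasta(record: str, genus: str, species: str, genbank_id: str) -> str:
    """Return one FASTA record with its header replaced by >Genus_species_GenBankID."""
    lines = record.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("record must start with a '>' header line")
    # Normalize case: capitalized genus, lower-case species epithet.
    new_header = f">{genus.capitalize()}_{species.lower()}_{genbank_id}"
    return "\n".join([new_header] + lines[1:])


def combine_records(records: list[str]) -> str:
    """Join records into one text block with a blank line between entries."""
    return "\n\n".join(r.strip() for r in records) + "\n"


# Placeholder records standing in for NCBI's FASTA output (not real data).
records = [
    rename_fasta(">placeholder NCBI header 1\nACGTTGCAACGT", "Escherichia", "coli", "174375"),
    rename_fasta(">placeholder NCBI header 2\nTTGACCAGTCAA", "Saccharomyces", "cerevisiae", "172403"),
]
print(combine_records(records))
```

Running it prints the two renamed records separated by a blank line, ready to paste into the CLUSTALW input box. Capitalizing the genus and lower-casing the species only normalizes typing; the sketch does not check that the names or numbers are valid.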


===February 12===
===Table 1===
'''NO CLASS'''
{| class="wikitable"
 
|-
(Read Chapter 6 for next class)
| colspan="2" |
 
'''Volume 1A (Gram-negative bacteria)'''
===February 19===
'''Yozen will not be lecturing'''


*Chapter 6. Gene and Genome Structures [Lecture Slides [[Media:Chapter_6.pdf|Lecture Slides Ch.6-Che]]
|-
*'''Tutorial:''' ORF Prediction using GLIMMER
|
* '''Homework:''' This homework will be graded.
''Escherichia coli''
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #3
|- style="background-color:powderblue;"
| '''Bacterial gene identification using Glimmer'''<br />
Remember to first log in to mysql by doing: <pre>ssh mysql</pre>
#Copy the Lyme disease bacterium lp17 plasmid file "/data/yoda/b/student.accounts/bio425_2011/data/lp17.fas" into your home directory.
#Run long-orf, extract, build-icm, and glimmer3.
#Show your commands and "cat" the final output.
#Describe key elements of a prokaryotic gene in addition to the open reading frame.
#Textbook Questions (pg152-153): 6.6, 6.9, 6.15
|-style="background-color:powderblue;"
| '''Read''' All of Appendix 1.
|}


===February 26===
|
*Appendix 1. More PERL ([[Media:Bio425_more_perl.pdf|Lecture Slides]])
ACCESSION #174375
*'''Homework'''
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #4
|-style="background-color:powderblue;"
| '''Beginning Perl'''<br />
This time, both novices and experienced programmers do the same homework, with one small difference in the use of the program.


Recall from the first class where I introduced the FASTA-format. In this format, sequence data is recorded as follows:
|-
<pre>>SequenceID_info1_info2
|
atgcgtgatg...</pre>
''Helicobacter pylori''


Of course, the ID portion is itself not standardized, and the sequence can also be an amino acid sequence. For simplicity, let's assume that in the ID field, you have a "Strain" name followed by a "protein" name, separated by an underscore (_). You will write a program to read a FASTA file with the ID format described above, and a nucleotide sequence. For both novice-level and experienced level programmers, your program will:
|
ACCESSION #402670


# Pick out the strain name, the protein name, and the nucleotide sequence.
|-
# Calculate he length of each sequence.
|
# Calculate the GC content (in percent) of each sequence.
''Salmonella typhi''
# Calculate the percent composition of each nucleotide (base composition).


'''Novice-level task:'''
|
ACCESSION #2826789


Your program will just print the above information '''for all sequences''', in a readable form. Sample output could be:
|-
<pre>Strain: B31
|
Protein: ospA
''Serratia marcescens''
Seq Length: 819
GC content: 33.58%
Base composition: A 42.98 %, T 23.44 %, C 14.77 %, G 18.80 %</pre>
If your percentages have more than 2 decimal places, '''that's OK.'''


'''Experienced-level task:'''
|
ACCESSION #4582213


The only difference from novices is that your program will '''ask the user for the name of a strain and protein, separated by an underscore''' (ie, B31_opsA). Once given that input, it will print the exact same output as above, but only for the sequence described by that input. If the input doesn't exist, it will say so and exit. Your program will '''continue to ask the user for the sequence ID''' until the user types 'quit' or they give an invalid sequence ID. You can do this by using a while loop.
|-
|
''Treponema pallidum''


'''Notes'''
|
ACCESSION #176249


Calculating the GC content and the base composition is easy if you make use of the tr (transliterate) function as described at the bottom of page 232, and divide the result by the sequence length. GC content is just the sum of total G and C nucleotides, divided by the sequence length. I do want '''percents''', so remember to multiply the results by 100 and to append a '%' at the end.
|-
| colspan="2" |
Additional species: ''Agrobacterium tumefaciens, Boredetella pertussis, Thermus aquaticus, Yersinia pestis, Borrelia burgdorferi. '''''(Note: To search for unlisted 16S sequences, type key words such as “yersinia<nowiki> AND 16S [gene]” in the NCBI </nowiki>GenBank search box.)'''


Getting the strain name and the protein name separately can be accomplished with the split() function (check new slides or search on the internet).
|-
| colspan="2" |
'''Volume 1B (Rikettsias and endosymbionts)'''


You will test your program the with the file /data/yoda/b/student.accounts/bio425_2011/data/Borrelia_osp.dna.fasta as input. You don't have to include the file itself with your homework, but I do still want you to copy the program output and submit it with your assignment.
|-
|
''Baronella bacilliformis''


Again, the program cannot use any outside dependencies/modules such as BioPerl (supposing you know how to use it.) Besides that, you can implement it however you like. If you know about references, '''it is possible to do this assignment without using them.'''
|
|}
ACCESSION #173825


===March 5===
|-
|
''Chlamydia trachomatis''


*Chapter 2. Data Search and Alignments [[Media:Chapter2.pdf|Lecture Slides Ch.2-Che]]
|
*Object-Oriented PERL & BioPerl (Link to [http://www.bioperl.org/wiki/Main_Page Bioperl] site and [http://www.bioperl.org/wiki/HOWTOs HOWTOs])
ACCESSION #2576240
*'''Homework:'''
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #5
|-style="background-color:powderblue;"
| '''BioPerl Assignment'''


For this assignment, you will use the .predict file you made with glimmer in [[#February_19 | assignment 3]].
|-
|
''Rickettsia rickettsii''


If connecting from home: open gedit '''before''' logging on to mysql.
|
ACCESSION #538436


For BioPerl to work, you '''must''' log on to mysql.
|-
| colspan="2" |
Additional species: ''Coxiella burnetii, Thermoplasma acidophilum''


'''Complete the assignment by following these steps.''' Make sure each part works '''before''' trying to solve the next part:
|-
# Make a perl script that reads each line from the .predict file that describes a gene (skip the heading line).
| colspan="2" |
# Save each line ('''hint:''' array, anyone?)
'''Volume 2A (Gram-positive bacteria)'''
# Now, in the same script, use '''Bio::SeqIO''' to read the lp17.fas file '''and get a Bio::Seq object.'''
# Go through each line saved from the .predict file. Remember: these are predicted orfs:
## For each of these, '''extract the start and stop positions and "strand" values''' (the three values following the orf name).
## If the strand starts with a '-', it means the orf is on the reverse complement, so you need to use the Bio::Seq method "revcom".
## Now, extract the orf sequence using the start & stop values using the Bio::Seq method "subseq", paying special attention to sequences on on the '-' strand.
## Print both the DNA sequence AND the protein sequence.


See these sample scripts for how to use revcom and subseq:
|-
<pre>../bio425_2011/sample-perl-scripts/revcom_translate_seq.pl
|
../bio425_2011/sample-perl-scripts/subseq.pl
''Bacillus subtilis''
</pre>


And I linked to the HOWTO above in case you forgot.
|
ACCESSION #8980302


'''Output should be informative:'''
|-
<pre>
|
ORF: orf00002
''Dinococcus radiodurans''
DNA: ...
Protein: ...
</pre>
|-style="background-color:powderblue;"
| '''Read'''
'''For next class, read CH 3'''
|}


===March 12===
|
*Chapter 3. Molecular Evolution [[Media:CH3.pdf|Lecture Slides Ch.3-Che]]
ACCESSION #145033
* '''Homework:''' (TBA)


===March 19===
|-
*REVIEW Session for MID-TERM EXAMS
|
<!--*Assignment #7. '''(To be posted)'''
''Staphylococcus aureus''
Questions & Problems (pg.54-55): 2.1, 2.2, 2.3, 2.4-->


===March 26===
|
*MID-TERM
ACCESSION #576603
<!--*Assignment #8. '''(To be posted)'''
Questions & Problems (pg.75-76): 3.1, 3.2, 3.3 (use first ten codons), 3.4, 3.5, 3.7-->


===April 2===
|-
*'''Chapter 4.''' Phylogenetics I. Distance Methods  [[Media:CH4.pdf|Lecture Slides Ch.4-Che]]
| colspan="2" |
*"Tree Thinking" Puzzles - ([http://diverge.hunter.cuny.edu/~weigang/lab-website/SummerWorkshop/Baum_etal05_sup_part1.pdf Download])
Additional species: ''Bacillus anthracis, Clostridium botulinum, Lactobacillus acidophilus, Streptococcus pyogenes''
*'''Tutorial:''' PROTDIST and NEIGHBOR using [http://mobyle.pasteur.fr/cgi-bin/portal.py#welcome Mobyle Pasteur]
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #6
|-style="background-color:powderblue;"
| '''Chapter 4 ''' Questions & Problems (pg.95-96): 4.1, 4.3, 4.4, 4.7, 4.8
|}


===April 9===
|-
*'''Chapter 5.''' Phylogenetics II. Character-Based Methods  [[Media:CH4.pdf|Lecture Slides Ch.5-Che]]
| colspan="2" |
*'''Tutorial:''' DNAML and bootstrap analysis using [http://mobyle.pasteur.fr/cgi-bin/portal.py#welcome Mobyle Pasteur]
'''Volume 2B (Mycobacteria and nocardia)'''
<!--*Assignment #10. '''(To be posted)'''
Questions & Problems (pg.115-116): 5.1, 5.2, 5.3, 5.4-->


===April 16===
|-
*'''Topic:''' Relational Database and SQL
|
*'''Tutorial:''' the Borrelia Genome Database
''Mycobacterium haemophilum''
*'''Homework:''' SQL-embedded PERL
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #7
|- style="background-color:powderblue;"
| '''SQL-embedded PERL'''<br />


Continue work on the assignment we began in class. It is reproduced below, with some added functionality.
|
ACCESSION #406086


Your script will:
|-
|
''Mycobacterium tuberculosis''


# Retrieve TEN orfs from the orf table that belong to the strain Pko.
|
# Find and store the sequences described by those orfs and their lengths.
ACCESSION #3929878
# Determine if the orf is on the reference or reverse complement strand, and use that information to print the correct sequence.
# Print the orf name, sequence, and the length for each orf.
# '''In addition to printing the above information to the screen,''' write out the sequence information '''(in FASTA format)''' to a file
called "Pko_orfs.fasta". The sequence ID should be of the form:
Pko_orfname


Note that the above will require the use of BioPerl.
|-
| colspan="2" |
Additional species: ''Mycobacterium bovis, Nocardia orientalis''


|-
| colspan="2" |
'''Volume 3A (Phototrophs, chemolithotrophs, sheathed bacteria, gliding bacteria)'''


For those looking for extra challenges, you can try adding the following:
|-
|
''Anabaena sp.''


* Ask the user for the strain and contig *names* that they want orfs from, and only retrieve those rows. This means you must find a way
|
of obtaining their respective IDs from just their names. Make sure the sequence IDs are informative. They should look like this:
ACCESSION #39010
strainname_contigname_orfname
* If asking users for input, fail if they gave a strain or contig name which does not exist in the database.
* Also if asking users for input, the output file's name should be changed to reflect the chosen strain.
* Ask the user the minimum length the orf is allowed to be, and only print orfs as long, or longer, than what the user specifies.


|-
|
''Cytophaga latercula''


Sample scripts will go up slowly, over time, including example SQL statements.
|
|-style="background-color:powderblue;"
ACCESSION #37222646
| '''Questions from Text''' <br /> (pg.115-116): 5.1, 5.3
|}


===April 23===
|-
'''NO CLASSES''' (Spring recess)
|
''Nitrobacter wiogradskyi''


===April 30===
|
*'''Topic:''' Statistics
ACCESSION #402722
*'''In-class exercise:''' [https://docs.google.com/document/d/1wq-s8WpqyURVeGiLUxhEyBvHRDrK__Cr7XjkuLicP-c/edit?hl=en&authkey=CJ2g4qsI R basics and short demonstration of a simple boxplot]
*'''Tutorial:''' Statistical Visualization using R  [[Media:R-implementations.pdf|Lecture Slides-Che]]
<!--*Assignment #12. '''(To be posted)'''
R Exercises-->


===May 7===
|-
*'''Chapter 6''' (Gene Expression) & '''Chapter 8''' (Proteomics)
| colspan="2" |
*'''Tutorial:''' Array Data Visualization and Analysis ([[Media:Array_Data_Visualization_and_Analysis.pdf| Micro-Array Analysis Slides]])
Additional species: ''Heliothrix oregonensis, Myxococcus fulvus, Thiobacillus ferrooxidans''
*'''Homework:'''Data Analysis using R
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #8
|-style="background-color:powderblue;"
| '''Part 1 Data Analysis:'''
For this assignment, you will use sample data to answer the question: '''Do men and women have different body temperatures?'''


The file '''temps.txt''' located in ../bio425_2011/data on eniac, contains body temperature data for a sample of adults.
|-
| colspan="2" |
'''Volume 3B (Archeobaceria)'''


Use a hypotheses test with α = .05 to answer the above question of interest.
|-
|
''''Methanococcus jannaschii''


NOTE: For this part of the assignment you will need to turn in your answer to the question with p-values in addition to the R syntax used. '''Indicate your null hypothesis'''.
|
ACCESSION #175446


|-
|
''Thermotoga subterranean''


'''Part 2 Gene Expression Data Analysis:'''
|
ACCESSION #915213


Using the files '''GSM129276_cy3.txt''' & '''GSM129276_cy5.txt''' located in ./bio425_2011/data on eniac, conduct an analysis to produce a histogram of fold changes.
|-
| colspan="2" |
Additional species: ''Desulfurococcus mucosus, Halobacterium salinarium, Pyrococcus woesei''


In addition to the histogram, you will need to turn in the R syntax used in every step of the analysis in R, along with an explanation as to why the step was necessary.
|-
| colspan="2" |
'''Volume 4 (Actinomycetes)'''


|-style="background-color:powderblue;"
|-
| '''Read'''
|
'''For next class, read CH 7'''
''Actinomyces bowdenii''
|}


===May 14===
|
*'''Chapter 7.''' Protein Structure Prediction
ACCESSION #6456800
<!--*Assignment #14 (Final Comprehensive Project). '''(To be posted)'''-->


===May 21===
|-
*Final Project Due (TBA)
|
''Actinomyces neuii''


==Useful Links==
|
ACCESSION #433527


===Unix Tutorials===
|-
*A very nice [http://www.ee.surrey.ac.uk/Teaching/Unix/ UNIX tutorial] (you will only need up to, and including, tutorial 4).
|
*FOSSWire's [http://files.fosswire.com/2007/08/fwunixref.pdf Unix/Linux command reference] (PDF). Of use to you: "File commands", "SSH", "Searching" and "Shortcuts".
''Actinomyces turicensis''


===Perl Help===
|
* Professor Stewart Weiss has taught CSCI132, a UNIX and Perl class. His slides go into much greater detail and are an invaluable resource. They can be found on his course page [http://compsci.hunter.cuny.edu/~sweiss/course_materials/csci132/csci132_f10.php here].
ACCESSION #642970
* Perl documentation at [http://perldoc.perl.org perldoc.perl.org]. Besides that, running the perldoc command before either a function (with the -f option ie, perldoc -f substr) or a perl module (ie, perldoc Bio::Seq) can get you similar results without having to leave the terminal.


===Bioperl===
|-
* BioPerl's [http://www.bioperl.org/wiki/HOWTOs HOWTOs page].
| colspan="2" |
* BioPerl-live [http://doc.bioperl.org/bioperl-live developer documentation]. (We use bioperl-live in class.)
Eukaryotic representative (used as outgroup for rooting the phylogenetic tree)
* Yozen's tutorial on [http://diverge.hunter.cuny.edu/wiki/HOWTO:Bioperl-live_on_Mac_OS_X installing bioperl-live on your own Mac OS X machine]. (Let me know if there are any issues!).
* [https://spreadsheets.google.com/pub?key=0AjfPzjrqY7BndHpyRHlDZUlGcktINm1IbXVzX1QzMXc&single=true&gid=0&output=html A small table] showing some methods for BioPerl modules with usage and return values.


===SQL===
|-
* [https://docs.google.com/document/d/1zYLPeenwsqPYchkpXnndzphBbTKqX2GjjLHDxlBnt78/edit?hl=en&authkey=CLnh_88K SQL Primer], written by Yozen.
|
''Saccharomyces cerevisiae''


===R Project===
|
* Install location and instructions for [http://lib.stat.cmu.edu/R/CRAN/bin/windows/base/ Windows]
ACCESSION #172403
* Install location and instructions for [http://lib.stat.cmu.edu/R/CRAN/ Mac OS X]
* For users of Ubuntu/Debian:
sudo apt-get install r-base-core
* For users of Fedora/Red Hat:
su -
yum install R


===Utilities===
|}
*An [https://chrome.google.com/webstore/detail/nlbjncdgjeocebhnmkbbbdekmmmcbfjd RSS button extension] for Chrome. It can add feeds to Google Reader and other feed readers.
*A [https://chrome.google.com/webstore/detail/hcamnijgggppihioleoenjmlnakejdph similar extension] which adds a "Live bookmarks"-like feature to Chrome (like Firefox's RSS bookmarks).


===Other Resources===
* [http://www.ccrnp.ncifcrf.gov/~toms/papers/primer/primer.pdf Information Theory Primer] by Thomas D. Schneider. Useful for understanding sequence logos.
===ANALYSIS===
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Analyzing your phylogram
|- style="background-color:powderblue;"
| A phylogram is composed of nodes and branches (Figure 2). The internal nodes represent extinct ancestors; the tips of the branches, called terminal nodes, are individual strains of microorganisms that exist today and from which the sequence data were obtained. Each internal node marks a point in evolution where an extinct ancestor diverged into two new lineages, each of which then accumulated differences during its independent evolution.<br/>
The branches define the order of descent and the ancestry of the nodes. The length of a branch represents the number of changes that occurred along it. Two organisms are more closely related the more recently they share a common ancestor, that is, the farther from the root their shared internal node lies. Trees can be either “unrooted” or “rooted”. Unrooted trees show the relationships among the microorganisms under study, but not the evolutionary path leading from an ancestor to each strain.<br/>
|-
|
[[ File:Phylo.PNG|center|Phylogram with internal nodes (a, b, c, d) and tips (1, 2, 3, 4, 5).  Nodes at the tips are species that exist today, and internal nodes are extinct ancestors.]]
|-style="background-color:powderblue;"
|A rooted tree shows the unique path from an ancestor (internal node) to each strain.  Trees are rooted by inclusion of an outgroup in the analysis. An outgroup is an organism that is less closely related to the other organisms under study than the organisms are to each other.
|}
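The ideas above (branch length as number of changes, relatedness as recency of the shared ancestor, rooting by an outgroup) can be made concrete with a small computation. The sketch below is a minimal illustration, assuming a rooted tree stored as a dictionary that maps each node to its parent and to the length of the branch above it, counted in number of changes. The node names (`root`, `x`, `y`, `A`, `B`, `C`, `outgroup`) and all branch lengths are hypothetical and are not the tree in Figure 2. `mrca` finds the most recent common ancestor of two tips, and `patristic` adds up the branch lengths on the path between them.

```python
# Hypothetical rooted phylogram (NOT Figure 2): each node maps to
# (parent, length of the branch leading to it, in number of changes).
TREE = {
    "root": (None, 0),
    "outgroup": ("root", 50),
    "x": ("root", 10),
    "A": ("x", 5),
    "y": ("x", 8),
    "B": ("y", 2),
    "C": ("y", 4),
}


def path_to_root(node):
    """Return the node followed by each of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = TREE[node][0]
    return path


def mrca(a, b):
    """Most recent common ancestor: first ancestor of b that is also an ancestor of a."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError(f"{a} and {b} are not in the same tree")


def changes_up_to(node, ancestor):
    """Sum the branch lengths from node up to one of its ancestors."""
    total = 0
    while node != ancestor:
        parent, length = TREE[node]
        total += length
        node = parent
    return total


def patristic(a, b):
    """Total branch length on the path a -> MRCA -> b."""
    ancestor = mrca(a, b)
    return changes_up_to(a, ancestor) + changes_up_to(b, ancestor)


if __name__ == "__main__":
    print(mrca("B", "C"), patristic("B", "C"))  # y 6
    print(mrca("A", "B"), patristic("A", "B"))  # x 15
    print(mrca("A", "outgroup"), patristic("A", "outgroup"))  # root 65
```

In this made-up tree, B and C share the more recent ancestor `y`, so they are more closely related to each other than either is to A, whose common ancestor with them is `x`. Every ingroup tip meets the outgroup only at `root`, which is how including an outgroup fixes where the root of the tree lies.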


===DISCUSSION===
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Discussion Questions
|-style="background-color:powderblue;"
|
#Answer the following questions based on the Tree of Life shown in Figure 1.
#*a. What do internal and terminal nodes represent?
#*b. What do branch lengths represent? What are the unit and meaning of the scale bar?
#*c. Identify the positions of humans (''Homo''), corn (''Zea''), ''E. coli'', and ''Bacillus'' on the tree. Use the scale bar to estimate which pair is evolutionarily more distant: human/corn or ''E. coli''/''Bacillus''?
#In Figure 2, which two species are most closely related: 1 and 2, 2 and 3, or 1 and 4? Which pair is most distantly related? How did you determine this?
#In Figure 2, is 1 more closely related to 4, more closely related to 5, or equally related to both? Explain your rationale.
#List and describe the key steps of constructing a phylogenetic tree.
#Why do we use 18S rRNA information for yeast and 16S for prokaryotes?  Could we use other molecules as phylogenetic markers?  What constitutes a “good” phylogenetic marker for building a tree of life?
#'''Bonus Question'''
#*Define 16S “phylo-species” and “metagenomics”.  Describe how PCR amplification and sequencing of 16S rRNA molecules from environmental microbial samples (e.g., sea water, soil, human gut, hot springs) can be used to define species composition of an environment.
|}


===References===
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Reference & Resource
|-style="background-color:powderblue;"
|
#Jungck, J. R.; Fass, M. F.; Stanley, E. D. (eds.). 2003 (revised 2006). ''Microbes Count! Problem Posing, Problem Solving, and Peer Persuasion in Microbiology''. BioQUEST Curriculum Consortium. (Chapter 6, p. 191)
#Holt, J. G. (Editor-in-Chief). 1984. ''Bergey’s Manual of Systematic Bacteriology'', Volumes 1-4. Williams & Wilkins: Baltimore. http://www.cme.msu.edu/bergeys/pubinfo.html
|}


© Weigang Qiu, Hunter College, Last Update Jan 2013

Latest revision as of 20:38, 4 March 2013

EXPERIMENT # 4

BIOL 200 Cell Biology II LAB, Spring 2013

Hunter College of the City University of New York

Course information

Instructors: TBD

Class Hours: Room TBD HN; TBD

Office Hours: Room 830 HN; Thursdays 2-4pm or by appointment

Contact information:

  • Dr. Weigang Qiu: weigang@genectr.hunter.cuny.edu, 1-212-772-5296


Experiment #4

The Tree of Life and Molecular Identification of Microorganisms

Objective

To classify microorganisms and determine their relatedness using molecular sequences.

LAB REPORT GRADING GUIDE

CELL BIO II Experiment #4:

  • Introduction 1 point :
Statement of objectives or aims of the experiment in the student’s own words.
  (not to be copied from the Lab Manual)
  • MATERIALS AND METHODS 0 points :
This should be a brief synopsis and must include any changes or deviations from the procedures 
outlined in the Lab Manual. Specify which organisms were used to create the phylogram.
  • RESULTS 4 points :
A printout of the phylogram will suffice.
  • DISCUSSION 4 points :
Responses to discussion questions.
  • SUMMARY/CONCLUSION 1 point :
Two-sentence summary of your findings.
  • REFERENCES 1 point :
Credit is given for pertinent references obtained from sources other than the Lab Manual.
  This point is in addition to the 10 points (1 + 0 + 4 + 4 + 1) for the lab report.

INTRODUCTION

MATERIALS

  • Required hardware: Computer

Table 1

Volume 1A (Gram-negative bacteria)

Escherichia coli

ACCESSION #174375

Helicobacter pylori

ACCESSION #402670

Salmonella typhi

ACCESSION #2826789

Serratia marcescens

ACCESSION #4582213

Treponema pallidum

ACCESSION #176249

Additional species: Agrobacterium tumefaciens, Bordetella pertussis, Thermus aquaticus, Yersinia pestis, Borrelia burgdorferi. (Note: To search for unlisted 16S sequences, type keywords such as “yersinia AND 16S [gene]” into the NCBI GenBank search box.)

Volume 1B (Rickettsias and endosymbionts)

Bartonella bacilliformis

ACCESSION #173825

Chlamydia trachomatis

ACCESSION #2576240

Rickettsia rickettsii

ACCESSION #538436

Additional species: Coxiella burnetii, Thermoplasma acidophilum

Volume 2A (Gram-positive bacteria)

Bacillus subtilis

ACCESSION #8980302

Deinococcus radiodurans

ACCESSION #145033

Staphylococcus aureus

ACCESSION #576603

Additional species: Bacillus anthracis, Clostridium botulinum, Lactobacillus acidophilus, Streptococcus pyogenes

Volume 2B (Mycobacteria and nocardia)

Mycobacterium haemophilum

ACCESSION #406086

Mycobacterium tuberculosis

ACCESSION #3929878

Additional species: Mycobacterium bovis, Nocardia orientalis

Volume 3A (Phototrophs, chemolithotrophs, sheathed bacteria, gliding bacteria)

Anabaena sp.

ACCESSION #39010

Cytophaga latercula

ACCESSION #37222646

Nitrobacter winogradskyi

ACCESSION #402722

Additional species: Heliothrix oregonensis, Myxococcus fulvus, Thiobacillus ferrooxidans

Volume 3B (Archaebacteria)

Methanococcus jannaschii

ACCESSION #175446

Thermotoga subterranea

ACCESSION #915213

Additional species: Desulfurococcus mucosus, Halobacterium salinarum, Pyrococcus woesei

Volume 4 (Actinomycetes)

Actinomyces bowdenii

ACCESSION #6456800

Actinomyces neuii

ACCESSION #433527

Actinomyces turicensis

ACCESSION #642970

Eukaryotic representative (used as outgroup for rooting the phylogenetic tree)

Saccharomyces cerevisiae

ACCESSION #172403
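The sequences in Table 1 can also be retrieved from a script instead of the GenBank web page. The sketch below is an illustration under assumptions: it builds URLs for NCBI's E-utilities `esearch` and `efetch` services, and it assumes the numbers listed after "ACCESSION #" are numeric identifiers that `efetch` still resolves (they look like GI numbers rather than accession strings, so check the returned records). The `TABLE1_IDS` dictionary and the helper names are hypothetical, and only two table entries are included.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Hypothetical lookup with two numbers from Table 1 (assumed to be valid nucleotide IDs).
TABLE1_IDS = {
    "Escherichia coli": "174375",
    "Saccharomyces cerevisiae": "172403",
}


def esearch_url(term, retmax=20):
    """Build a URL that searches GenBank (db=nucleotide) and lists matching IDs."""
    params = {"db": "nucleotide", "term": term, "retmax": retmax}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def efetch_fasta_url(ids):
    """Build a URL that downloads the given records as plain-text FASTA."""
    params = {
        "db": "nucleotide",
        "id": ",".join(ids),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def fetch_text(url):
    """Download a URL and return its body as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URLs; nothing is downloaded here.
    print(esearch_url("yersinia AND 16S [gene]"))
    print(efetch_fasta_url(TABLE1_IDS.values()))
```

Passing one of these URLs to `fetch_text` returns the search result or the FASTA text, which can be saved to a file for alignment. NCBI asks that scripts without an API key send no more than three requests per second.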

ANALYSIS

DISCUSSION

References
