Bioinformatics Workshop 2013: Difference between revisions

From QiuLab
imported>Cmartin
imported>Cmartin
 
(72 intermediate revisions by 3 users not shown)
Line 42: Line 42:


The grading scheme for the course, is as follows ('''Subject to some change. You will be notified with sufficient time'''):
The grading scheme for the course, is as follows ('''Subject to some change. You will be notified with sufficient time'''):
*Assignments (50%): 6 exercises (10 points each).
*Assignments (60%): 7 exercises (10 points each).
*Final exam (40%)
*Final exam (30%)
**Bioinformatics terminology and concepts (10 pts)
**Bioinformatics terminology and concepts (Bonus pts)
**Use of web-based Bioinformatics databases (e.g., NCBI) and tools (e.g., BLAST, CLUSTALW, PHYLIP, ORF-Finder) (15 pts)
**Use of web-based Bioinformatics databases (e.g., NCBI) and tools (e.g., BLAST, CLUSTALW, PHYLIP, ORF-Finder) (Bonus pts)
**Ability to interpret an algorithm and its Perl implementations (15 pts)
**Ability to interpret an algorithm and its Perl implementations (Bonus pts)
*Classroom Q & A (5%):  Read the chapters before lecture.
*Classroom Q & A (5%):  Read the chapters before lecture.
*Attendance (5%): 1-2 absences = -2.5%. More than 2 = -5%.
*Attendance (5%): 1-2 absences = -2.5%. More than 2 = -5%.
Line 65: Line 65:
'''"Lecture slides" links will be available either during or before each lecture, in PDF.'''
'''"Lecture slides" links will be available either during or before each lecture, in PDF.'''


'''Homework assignments are due the week *after* the date under which they appear.''' ie, an assignment posted under June 4 is due the following lecture, on June 6.
'''Homework assignments are due the class *after* the date under which they appear.''' ie, an assignment posted under June 4 is due the following lecture, on June 6.


===June 4===
===June 4===
*Course Overview
*'''Course Overview'''
*'''Scope of Bioinformatics''' (Chapter 1)[Lecture Slides]
*'''Scope of Bioinformatics''' (Chapter 1)([[Media:Scope.pdf|Lecture Slides-Che]])
*'''Tutorial:''' UNIX Account, Tools, & Emacs
*'''WORKSHOP SLIDES''':([[Media:BioTeach1.pdf|Lecture Slides-Slav]])
*'''Workshop 1''':
*'''Workshop 1''': NCBI/OMIM Database
**Intro to UNIX
*'''Workshop 2''': UNIX Operating System
**Program editors
**Terminal & Home Directory
**The vi Editor
**first basic program
**first basic program
{| class="collapsible collapsed wikitable"
{| class="collapsible collapsed wikitable"
Line 80: Line 81:
|- style="background-color:powderblue;"
|- style="background-color:powderblue;"
| '''Linux Proficiency'''<br />
| '''Linux Proficiency'''<br />
#Display the absolute path of your home directory
#Install ActivePerl (if you use Windows; Not necessary if you have Mac OS X)
#List files in your home directory in long format & ordered by their time stamps
#Install vim (if you use Windows; Not necessary if you have Mac OS X)
#TBD
# (5 pts) OMIM Question
#TBD
# (5 pts) Analyze each line of your "Opposite strand code" written in class and describe what it does. (please write your answer)
#TBD
#TBD
#TBD
|-style="background-color:powderblue;"
|-style="background-color:powderblue;"
| '''Read''' Chapter 2
| '''Read''' Chapter 2
Line 92: Line 90:


===June 6===
===June 6===
*'''Chapter 2.''' Central Dogma & Molecular Biology terms [Lecture Slides Ch.2]
*'''Chapter 2.''' Central Dogma & Molecular Biology terms (Chapter 2)([[Media:chap2.pdf|Lecture Slides-Che]])
*'''Workshop 2''':
*'''Workshop 2''': ([[Media:Bioteach2.pdf|Lecture Slides-Slav]])
**Linux tutorial
**Linux tutorial
**Basic Perl (Appendix B1 & B2, pg.310-318)
**Basic Perl (Appendix B1 & B2, pg.310-318)
Line 104: Line 102:
#"Web Exploration" (pg. 25-27, 7 questions)
#"Web Exploration" (pg. 25-27, 7 questions)
#"Running the Program" (pg.33). Show source code, input, and output
#"Running the Program" (pg.33). Show source code, input, and output
#TBD
#Using the code we have written in class and your newfound understanding of Perl, write a program that prompts the user to enter a DNA sequence and then prints the translation. Assume that the user will provide a sequence that consists of only upper-case A, T, G and C, and that the sequence will have a length that is a multiple of three. In addition to using the hash of amino acids and their one-letter codes, your program should incorporate some or all of the following:
#TBD
*length($string): (return a number equal to the length of the variable specified inside the parentheses).
#TBD
*while (CONDITION) { LINES OF CODE } : (repeatedly execute the instructions within the curly brackets as long as the conditions inside the parentheses are met).
#TBD
*if (CONDITION) { LINES OF CODE } : (instructions within the curly brackets are executed only if the condition in the parentheses is met).
#TBD
#
|-style="background-color:powderblue;"
|-style="background-color:powderblue;"
| '''Review''' Chapter 2
| '''Review''' Chapter 2
Line 115: Line 113:
===June 11===
===June 11===
*'''Chapter 2.''' Central Dogma & Molecular Biology (continued) [Lecture Slides Ch.2]
*'''Chapter 2.''' Central Dogma & Molecular Biology (continued) [Lecture Slides Ch.2]
*'''Workshop 3''':
*'''Workshop 3''': ([[Media:BioTeach3.pdf|Lecture Slides-Slav]])
**Perl (Appendix B3 & B4, pg. 318-322)
**'''Perl''' (Appendix B3 & B4, pg. 318-322)
**'''Algorithm 3''': Translation
**'''Algorithm 3''': Translation
{| class="collapsible collapsed wikitable"
{| class="collapsible collapsed wikitable"
Line 125: Line 123:
#"Running the Program" (pg.37). Input your own sequences. Show input and output, but do NOT print the source code.  
#"Running the Program" (pg.37). Input your own sequences. Show input and output, but do NOT print the source code.  
#"Putting Your Skills into Practice" Q6 & Q7 (pg.37-38). Show source code (Q7 only), input (Q6 & Q7), and outputs (Q6 & Q7).
#"Putting Your Skills into Practice" Q6 & Q7 (pg.37-38). Show source code (Q7 only), input (Q6 & Q7), and outputs (Q6 & Q7).
#TBD
#Explain when you would use each of the following UNIX commands. Your answer should indicate whether the command requires any arguments: cd, pwd, man, cp, cat, mkdir, rm, grep, wc.
#TBD
#Choose three commands from the list above and describe two options/arguments which modify the way in which the command functions.
#TBD
#Describe what the following commands do in your own words:
  cat Sickle_Protein_FASTA | wc
  cat Sickle_Protein_FASTA > wc
  cat Sickle_Protein_FASTA >> wc
  ls -lh /User/Desktop/FASTA_FILES
|-style="background-color:powderblue;"
|-style="background-color:powderblue;"
| '''Read''' Chapter 3
| '''Read''' Chapter 3
Line 133: Line 135:


===June 13===
===June 13===
*'''Chapter 3.''' NCBI Databases/Tools; Gene alignments [Lecture Slides Ch.3]
*'''Chapter 3.''' NCBI Databases/Tools; Gene alignments: ([[Media:chap3.pdf|Lecture Slides-Ch3]])
*'''Workshop 4''':
*'''Workshop 4''':
**'''Web Exploration''' (pg.60-66)
**'''Web Exploration''' (pg.60-66)
Line 142: Line 144:
|- style="background-color:powderblue;"
|- style="background-color:powderblue;"
| '''Linux Proficiency'''<br />
| '''Linux Proficiency'''<br />
#Print the GenBank file with the accession AY356351. In the DNA sequence section, mark the following gene elements: start and end of exon(s), translation start and stop, 5' untranslated regions, intron start and end, and any SNPs associated with human diseases
#Create a new vi file with the provided code, grant appropriate file permissions and run the script using a FASTA file as an argument. In a single sentence, describe what the code does. 
#Compare wild-type and mutant HBB protein sequences (NP_000509 and AAQ63175) using BL2SEQ. Print your result and explain the following terms: Expect, Score, Identities, Positives, and Gaps
#Add comments to every line of code explaining what it does.
#TBD
<pre>
#TBD
#!/usr/bin/perl
#TBD
 
use strict;
use warnings;
 
die "Usage: $0 <Fasta_File>\n" unless @ARGV > 0;
my $filename = shift(@ARGV);
 
my $dna_string = '';
 
# Stop with a clear message if the file cannot be opened
open(my $fh, '<', $filename) or die "Cannot open $filename: $!\n";
 
while ( my $line = <$fh> ) {
        chomp $line;
        if ($line =~ /^>/) {
                print $line, " COMPLEMENT\n";
                next;
        }
        $dna_string .= $line;
}
 
close $fh;
 
for (my $i=0; $i<length($dna_string); $i++) {
        my $nucleo = substr($dna_string,$i,1);
        if ( $nucleo eq "A" ) { print "T"; }
        elsif ( $nucleo eq "C" ) { print "G"; }
        elsif ( $nucleo eq "G" ) { print "C"; }
        elsif ( $nucleo eq "T" ) { print "A"; }
        else { print "N"; }     # anything other than A, C, G, T
}
print "\n";
</pre>
|-style="background-color:powderblue;"
|-style="background-color:powderblue;"
| '''Read''' Chapter 6
| '''Read''' Chapter 6
Line 152: Line 187:


===June 18===
===June 18===
*'''Chapter 6.''' Gene Prediction [Lecture Slides Ch.6]
*'''Chapter 6.''' Gene Prediction ([[Media:Chap6.pdf|Lecture Slides-Ch6]])
*'''Workshop 5''':
*'''Workshop 5''':
**'''Web Exploration''' (pg.168-174)
**'''Web Exploration''' (pg.168-174)
Line 163: Line 198:
#Manually translate the following sequence in all 6 reading frames (use one-letter amino acid code): 5'-GTTCCCTCTCGGGT-3'. '''Show your work'''
#Manually translate the following sequence in all 6 reading frames (use one-letter amino acid code): 5'-GTTCCCTCTCGGGT-3'. '''Show your work'''
#Modify your translation script (one-letter code version) so that it translates a DNA sequence in all six reading frames.  Use your script to find the correct reading frame of the given sequence. '''Show your code, input and output''' (partial credits will be considered).
#Modify your translation script (one-letter code version) so that it translates a DNA sequence in all six reading frames.  Use your script to find the correct reading frame of the given sequence. '''Show your code, input and output''' (partial credits will be considered).
#TBD
#TBD
#TBD
|-style="background-color:powderblue;"
|-style="background-color:powderblue;"
| '''Read''' Chapter 8
| '''Read''' Chapter 8
Line 171: Line 203:


===June 20===
===June 20===
*'''Chapter 8.''' Molecular Phylogenetics [Lecture Slides Ch.8]
*'''Chapter 6.''' Gene Prediction [continued]
*'''Workshop 6''':
*'''Workshop 6''':
**'''Web Exploration''' (pg.244-248)
**'''Web Exploration''' (pg.168-169)
**'''Algorithm 4''': TBD
**'''Algorithm 4''': TBD
{| class="collapsible collapsed wikitable"
{| class="collapsible collapsed wikitable"
Line 179: Line 211:
! Assignment #6
! Assignment #6
|- style="background-color:powderblue;"
|- style="background-color:powderblue;"
| '''Tree Thinking'''<br />
| '''Using ONLINE tools '''<br />
#"Tree-Thinking" Puzzles [Download]: Briefly explain your choices. (Partial credits if you simply mark the choices).
# Using the supplied accession number [YP_063283]:  
#Phylogenetics worksheet [Download]
# Find the top 6 orthologs using an online tool we covered in class.
#TBD
# Align the 7 sequences (6 identified orthologs plus given sequence) using another online tool covered in class.
#TBD
# Your results should include: Printed alignment and tabulated results showing name, scores, and e-values of the significant orthologs.
#TBD
|-style="background-color:powderblue;"
|-style="background-color:powderblue;"
| '''Review''' Chapter 8
| '''Review''' Chapter 8
Line 190: Line 221:


===June 25===
===June 25===
*'''Chapter 8.''' Molecular Phylogenetics (continued) [Lecture Slides Ch.8]
*'''Chapter 8.''' Molecular Phylogenetics([[Media:chap8.pdf|Lecture Slides-Ch8]])
**'''Web Exploration''' (pg.244-248)
*'''Begin Review'''
*'''Begin Review'''
**''' Begin Algorithm Review'''
**''' Begin Algorithm Review'''
Line 198: Line 230:
|- style="background-color:powderblue;"
|- style="background-color:powderblue;"
| '''Tree Thinking'''<br />
| '''Tree Thinking'''<br />
#Prepare Review questions for final
#"Tree-Thinking" Puzzles ([[Media:TreeV1.pdf|Download]]): Briefly explain your choices. (Partial credits if you simply mark the choices).
|-style="background-color:powderblue;"
|-style="background-color:powderblue;"
|
| '''Review''' Chapter 8
|}
|}
===June 27===
===June 27===
*'''Review for final'''
*'''Review Web Exploration, Databases, and Gene Prediction'''
 
===July 2===
===July 2===
*'''Final Exam'''
*'''Review Code Structure and syntax, as well as common coding errors. Also begin review of phylogeny.'''
 
===July 9===
*'''Student Q&A session '''
===July 11===
*'''Final EXAM Due''' ([[Media:FINAL.pdf|Final Take Home Download]])
**Asad Filza      : '''Gene Assigned'''
**Bachu Saheed    : '''Gene Assigned'''
**Basdeo Sharon    : '''Gene Assigned'''
**Deopaul Randy    : '''Gene Assigned'''
**Grinman Eddie    : '''Gene Assigned'''
**Horowitz Marc    : Gene Unassigned
**Liang Raymond    : '''Gene Assigned'''
**Munday Gagandeep : '''Gene Assigned'''
**Quijano, April  : '''Gene Assigned'''
**Uddin, Azad      : '''Gene Assigned'''
*Please request your assigned gene name via email. Once you have received your gene name, your status on the above list will be changed to '''"Assigned"'''.
*'''Note the Final is Due at the beginning of the class'''.

Latest revision as of 17:56, 22 July 2013

Summer Bioinformatics Biology (BIOL 470.83/790.86, Spring 2013)
Instructors: Che Martin & Slav Kendal
Room: 1000G HN (10th Floor, North Building)
Hours: Tues & Thur 11:30 am-3:00 pm
Office Hours: Room 830 HN; Tuesday 3-5pm or by appointment
Contacts: Mr Martin: cmartin@gc.cuny.edu; Mr Kendal: skendall@hunter.cuny.edu

Course Description

Background

Biomedical research is becoming a high-throughput science. As a result, information technology plays an increasingly important role in biomedical discovery. Bioinformatics is a new interdisciplinary field formed by the merging of molecular biology and computer science techniques. Today’s biology students must therefore learn not only in vivo and in vitro research skills, but also in silico ones. Quantitative/computational biologists are expected to be in increasing demand in the 21st century.

However, the technical barrier to entering the field and performing basic research projects in a bioinformatics lab is daunting for most undergraduate students. This is mainly due to the multidisciplinary nature of quantitative biology, which requires understanding of and skills in chemistry, biology, computer programming, and statistics. The Hunter Summer Bioinformatics Workshop aims to introduce bioinformatics to motivated undergraduate and high school students by lowering the barrier and dispensing with the usual prerequisites in advanced biology/chemistry courses as well as entry-level programming/statistics courses. The Workshop does not assume prior programming experience.

The workshop DOES NOT

  • Replace existing advanced bioinformatics courses such as BIOL425 and STAT 319
  • Teach advanced bioinformatics programming skills (e.g., advanced data structure, object-oriented Perl, BioPerl, or relational database with SQL), which are the contents of BIOL425
  • Teach in-depth statistics or the popular R statistical package, although probabilistic thinking (e.g., distributions of a random variable, stochastic processes, likelihood, clustering analysis) is at the core of all bioinformatics analysis (STAT 319 teaches these topics)

To learn these advanced bioinformatics topics and skills, motivated students are encouraged to enroll in one of the five bioinformatics concentrations at Hunter. The QuBi program prepares students for bioinformatics positions in a research lab or a biotechnology company.

Contents

This course will introduce both bioinformatics theories and practices. Topics include: database searching, sequence alignment, and basic molecular phylogenetics. The course is held in a UNIX-based instructional lab specifically configured for bioinformatics applications. Each session consists of a first-half instruction on bioinformatics theories and a second-half session of hands-on exercises.

Learning Goals

Students are expected to be able to:

  • Approach biological questions evolutionarily ("Tree-thinking")
  • Design efficient procedures to solve problems ("Algorithm-thinking")
  • Manipulate high-volume textual data using UNIX tools, Perl and Relational Database ("Data Visualization")

Textbook

St. Clair & Visick (2010). Exploring Bioinformatics: A Project-Based Approach. Jones and Bartlett Publishers, Inc., Sudbury, Massachusetts. (ISBN 978-0-7637-5829-5)

This book should be available through several popular retailers and resellers online.

Grading & Academic Honesty

Hunter College regards acts of academic dishonesty (e.g., plagiarism, cheating on examinations, obtaining unfair advantage, and falsification of records and official documents) as serious offenses against the values of intellectual honesty. The College is committed to enforcing the CUNY Policy on Academic Integrity and will pursue cases of academic dishonesty according to the Hunter College Academic Integrity Procedures.

Student performance will be evaluated by weekly assignments and projects. While these are take-home projects and students are allowed to work in groups, students are expected to compose the final short answers, computer commands, and code independently. There is a virtually unlimited number of ways to solve a computational problem, and just as many personal styles for implementing an algorithm. Writings and blocks of code that are virtually exact copies between individual students (or copies from the Internet or the textbook) will be investigated as possible cases of plagiarism. In such a case, the instructor will hold closed-door exams for the individuals involved. Zero credit will be given to ALL involved individuals if the instructor considers that there is enough evidence of plagiarism. To avoid being investigated for plagiarism, Do Not Copy from Others & Do Not Let Others Copy Your Work.

The grading scheme for the course is as follows (subject to some change; you will be notified with sufficient time):

  • Assignments (60%): 7 exercises (10 points each).
  • Final exam (30%)
    • Bioinformatics terminology and concepts (Bonus pts)
    • Use of web-based Bioinformatics databases (e.g., NCBI) and tools (e.g., BLAST, CLUSTALW, PHYLIP, ORF-Finder) (Bonus pts)
    • Ability to interpret an algorithm and its Perl implementations (Bonus pts)
  • Classroom Q & A (5%): Read the chapters before lecture.
  • Attendance (5%): 1-2 absences = -2.5%. More than 2 = -5%.
  • Email help: Include course code ("BIOL470", or "BIOL790") in the subject line

Programming Assignment Expectations

All code must begin with the lines in the Perl slides, without exception. For each assignment, unless otherwise stated, I would like the full text of the source code. Since you cannot print using the text editor in the lab (even if you are connected from home), you must copy and paste the code into a word processor or a local text editor. If you are using a word processor, change the font to a fixed-width/monospace font. On Windows, this is usually Courier.

Code indentation is a matter of personal taste, so long as it is consistent and readable. Use comments wherever you think the code is unclear, or simply as a guideline for yourself. Well-commented code improves readability, but be careful not to overdo it.

Also, unless otherwise stated, both the input and the output of the program must be submitted as well. This should also be in fixed-width font, and you should label it in such a way so that I know it is the program's input/output. This is so that I know that you've run the program, what data you have used, and what the program produced.

If you are working from the lab, one option is to email the code to yourself, change the font, and then print it somewhere else as there is no printer in the lab.

Course Schedule (Tuesdays and Thursdays)

Dates and assignments below are subject to some change

"Lecture slides" links will be available either during or before each lecture, in PDF.

Homework assignments are due the class *after* the date under which they appear. For example, an assignment posted under June 4 is due at the following lecture, on June 6.

June 4

  • Course Overview
  • Scope of Bioinformatics (Chapter 1)(Lecture Slides-Che)
  • WORKSHOP SLIDES:(Lecture Slides-Slav)
  • Workshop 1: NCBI/OMIM Database
  • Workshop 2: UNIX Operating System
    • Terminal & Home Directory
    • The vi Editor
    • first basic program

June 6

  • Chapter 2. Central Dogma & Molecular Biology terms (Chapter 2)(Lecture Slides-Che)
  • Workshop 2: (Lecture Slides-Slav)
    • Linux tutorial
    • Basic Perl (Appendix B1 & B2, pg.310-318)
    • Algorithm 2: Transcription
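
As an illustration of the transcription step, here is a minimal Perl sketch. It assumes the input is the coding (sense) strand of the DNA, so that the mRNA is obtained by replacing every T with U. The helper name transcribe and the example sequence are made up for illustration and are not taken from the lecture slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: transcribe a coding-strand DNA sequence into mRNA
# by replacing every T with U. Input may be upper or lower case.
sub transcribe {
    my ($dna) = @_;
    my $rna = uc $dna;      # normalize to upper case
    $rna =~ tr/T/U/;        # T -> U; all other letters are left unchanged
    return $rna;
}

# Example sequence chosen for illustration only.
my $dna = "ATGGTGCACCTGACT";
print "DNA: $dna\n";
print "RNA: ", transcribe($dna), "\n";
```

The tr operator rewrites the string one character at a time, which avoids an explicit loop over each base.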

June 11

  • Chapter 2. Central Dogma & Molecular Biology (continued) [Lecture Slides Ch.2]
  • Workshop 3: (Lecture Slides-Slav)
    • Perl (Appendix B3 & B4, pg. 318-322)
    • Algorithm 3: Translation
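
One possible shape for the translation step is sketched below in Perl. It assumes the standard genetic code, an upper-case DNA sequence, and translation starting at the first base. The helper name translate, the use of X for unrecognized codons, and the example sequence are illustrative choices, not the version written in class.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the standard genetic code as a hash (codon => one-letter amino acid).
# The 64-letter string follows the usual T, C, A, G ordering of the three
# codon positions; '*' marks a stop codon.
my @bases  = qw(T C A G);
my $aminos = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $index = 0;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $codon_table{"$first$second$third"} = substr($aminos, $index++, 1);
        }
    }
}

# Hypothetical helper: translate an upper-case DNA string codon by codon,
# starting at the first base. Trailing bases that do not fill a codon
# are ignored.
sub translate {
    my ($dna) = @_;
    my $protein = '';
    my $pos = 0;
    while ($pos + 3 <= length($dna)) {
        my $codon = substr($dna, $pos, 3);
        if (exists $codon_table{$codon}) {
            $protein .= $codon_table{$codon};
        }
        else {
            $protein .= 'X';    # unrecognized codon (e.g., contains N)
        }
        $pos += 3;
    }
    return $protein;
}

# Example sequence chosen for illustration only.
print translate("ATGGTGCACCTGACTCCTGAGTAA"), "\n";
```

Generating the hash from a 64-letter string keeps the table short. Writing out all 64 codon => amino acid pairs by hand is equivalent and may be easier to read.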

June 13

  • Chapter 3. NCBI Databases/Tools; Gene alignments: (Lecture Slides-Ch3)
  • Workshop 4:
    • Web Exploration (pg.60-66)
    • Algorithm 3: Translation

June 18

  • Chapter 6. Gene Prediction (Lecture Slides-Ch6)
  • Workshop 5:
    • Web Exploration (pg.168-174)
    • Algorithm 3: TBD

June 20

  • Chapter 6. Gene Prediction [continued]
  • Workshop 6:
    • Web Exploration (pg.168-169)
    • Algorithm 4: TBD
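
Gene prediction relies on examining all six reading frames of a sequence, three on the given strand and three on its reverse complement. The Perl sketch below shows one way to enumerate them under simple assumptions: upper-case A, C, G, T input, and frames trimmed to whole codons. The helper names reverse_complement and six_frames and the example sequence are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reverse-complement an upper-case DNA string.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = scalar reverse($dna);   # reverse the string
    $rc =~ tr/ACGT/TGCA/;            # complement each base
    return $rc;
}

# Hypothetical helper: return the six reading frames as a hash of strings
# trimmed to whole codons. Frames +1..+3 come from the given strand,
# frames -1..-3 from its reverse complement.
sub six_frames {
    my ($dna) = @_;
    my %frames;
    my %strands = ('+' => $dna, '-' => reverse_complement($dna));
    for my $sign ('+', '-') {
        for my $offset (0 .. 2) {
            my $seq    = substr($strands{$sign}, $offset);
            my $usable = length($seq) - (length($seq) % 3);
            $frames{$sign . ($offset + 1)} = substr($seq, 0, $usable);
        }
    }
    return %frames;
}

# Example sequence chosen for illustration only.
my %frames = six_frames("ATGGTGCACCTGACTC");
for my $name (qw(+1 +2 +3 -1 -2 -3)) {
    print "$name  $frames{$name}\n";
}
```

Each frame string can then be passed to a translation routine and checked for start and stop codons.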

June 25

  • Chapter 8. Molecular Phylogenetics (Lecture Slides-Ch8)
    • Web Exploration (pg.244-248)
  • Begin Review
    • Begin Algorithm Review

June 27

  • Review Web Exploration, Databases, and Gene Prediction

July 2

  • Review Code Structure and syntax, as well as common coding errors. Also begin review of phylogeny.

July 9

  • Student Q&A session

July 11

  • Final EXAM Due (Final Take Home Download)
    • Asad Filza  : Gene Assigned
    • Bachu Saheed  : Gene Assigned
    • Basdeo Sharon  : Gene Assigned
    • Deopaul Randy  : Gene Assigned
    • Grinman Eddie  : Gene Assigned
    • Horowitz Marc  : Gene Unassigned
    • Liang Raymond  : Gene Assigned
    • Munday Gagandeep : Gene Assigned
    • Quijano, April  : Gene Assigned
    • Uddin, Azad  : Gene Assigned
  • Please request your assigned gene name via email. Once you have received your gene name, your status on the above list will be changed to "Assigned".
  • Note the Final is Due at the beginning of the class.