Biol20N02 2016: Difference between revisions

From QiuLab
imported>Weigang



Revision as of 04:56, 2 February 2016

Analysis of Biological Data (BIOL 20N02, Spring 2016)
Instructor: Dr Weigang Qiu, Associate Professor, Department of Biological Sciences
Room: 1001B HN (North Building, 10th Floor, Mac Computer Lab)
Hours: Tuesdays 10-1
Office Hours: Belfer Research Building, BB-402; Wed 5-7 pm or by appointment
Course Website: http://diverge.hunter.cuny.edu/labwiki/Biol20N2_2016

Course Description

With the rapid accumulation of genome sequences and digitized health data, biomedicine is becoming a data-intensive science. This course is a hands-on, computer-based workshop on how to visualize and analyze large quantities of biological data. The course introduces R, a modern statistical computing language and platform. Students will learn to use R to make scatter plots, bar plots, box plots, and other commonly used data visualizations. The course will review statistical methods including hypothesis testing, analysis of frequencies, and correlation analysis. Students will apply these methods to the analysis of genomic and health data such as whole-genome gene expression and SNP (single-nucleotide polymorphism) frequencies.

This 3-credit experimental course fulfills elective requirements for Biology Major I. Hunter pre-requisites are BIOL100, BIOL102 and STAT113.

Learning Goals

  • Be able to use R as a plotting tool to visualize large-scale biological data sets
  • Be able to use R as a statistical tool to summarize data and make biological inferences
  • Be able to use R as a programming language to automate data analysis

Textbooks

Exams & Grading

  • Attendance (or a note in case of absence) is required
  • In-Class Exercises (50 pts).
  • Assignments. All assignments should be handed in as hard copies only. Email submissions will not be accepted. Late submissions will receive a 10% deduction (of the total grade) per day.
  • Three Mid-term Exams (3 × 30 pts = 90 pts)
  • Comprehensive Final Exam (50 pts)
  • Bonus for active participation in classroom discussions

Course Outline

Feb 2. Introduction & tutorials for R/RStudio

  1. Course overview
  2. Install R & RStudio on your home computers (Chapter 1, pg. 9)
  3. Tutorial 1: First R Session (pg. 12)
  4. Tutorial 2: Writing R Scripts (Chapter 2, pg. 21)
Assignment #1
Unix Text Filters (10 pts) Show both commands and outputs for the following questions:
  1. Without changing directory (i.e., remain in your home directory), locate and long-list the GenBank file named "GBB.gb" in the course data directory
  2. Count the total number of lines and show the first and last 10 lines of the file. Using a combination of head and tail commands, show only the lines containing the translated protein sequence of the first gene
  3. Count the total number of replicons by extracting lines containing "LOCUS" (case sensitive); sort them by the total number of bases ("bp")
  4. Remove the string "(plasmid" from the above output
  5. Extract the second column (replicon names) from the above output. [Hint: these fields are delimited by an unequal number of spaces, not by tabs. Use tr -s to first squeeze to single space]
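
The hint about tr -s can be tried out on a small made-up sample before working on the real file. The sketch below is only an illustration: the three LOCUS lines, the record names, and the variable names are all invented for this example and are not taken from GBB.gb, and it assumes the fields are separated by runs of spaces.

```shell
# Hypothetical sample imitating LOCUS lines (names and sizes are invented,
# NOT taken from GBB.gb); fields are separated by unequal runs of spaces.
sample='LOCUS       chr_demo        910724 bp    DNA     linear   BCT 01-JAN-2016
LOCUS       plasmid_demo_A   26498 bp    DNA     circular BCT 01-JAN-2016
LOCUS       plasmid_demo_B     9386 bp    DNA     circular BCT 01-JAN-2016'

# grep -c prints the number of lines that contain the pattern (case sensitive).
n_records=$(printf '%s\n' "$sample" | grep -c 'LOCUS')

# tr -s ' ' squeezes each run of spaces to a single space, so that
# cut can treat one space as the field delimiter and pick field 2.
names=$(printf '%s\n' "$sample" | tr -s ' ' | cut -d ' ' -f 2)

# After squeezing, sort numerically on field 3 (the base count),
# then keep only the name and the size.
sorted=$(printf '%s\n' "$sample" | tr -s ' ' | sort -t ' ' -k 3,3n | cut -d ' ' -f 2,3)

echo "$n_records"
echo "$names"
echo "$sorted"
```

Without the tr -s step, cut -d ' ' would count every single space as a delimiter and field 2 would come out empty, which is why the squeeze comes first.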

Feb 9. No class (Friday Schedule)

Feb 16. Introduction & tutorials for R/RStudio

Feb 23. Statistics & samples

March 1. Displaying data

March 8. Describing data; Exam 1.

March 15. Probability and hypothesis testing

March 22. Analysis of proportions

March 29. Analysis of frequencies

April 5. Contingency tests; Exam 2

April 12. Normal distribution and controls

April 19. Comparing two means

April 26. No Class (Spring break)

May 3. Designing experiments

May 10. Comparing more than two groups; Exam 3

May 17. Correlation analysis

May 24. Final Exam (Comprehensive)

May 31. Grades submitted to Registrar's Office