Biol20N02 2016: Difference between revisions

Revision as of 15:08, 17 February 2016

Analysis of Biological Data (BIOL 20N02, Spring 2016)
Instructor: Dr Weigang Qiu, Associate Professor, Department of Biological Sciences
Room: 1001B HN (North Building, 10th Floor, Mac Computer Lab)
Hours: Tuesdays 10-1
Office Hours: Belfer Research Building (Google Map) BB-402; Wed 5-7 pm or by appointment
Course Website: http://diverge.hunter.cuny.edu/labwiki/Biol20N2_2016

Course Description

With the rapid accumulation of genome sequences and digitized health data, biomedicine is becoming a data-intensive science. This course is a hands-on, computer-based workshop on how to visualize and analyze large quantities of biological data. The course introduces R, a modern statistical computing language and platform. Students will learn to use R to make scatter plots, bar plots, box plots, and other commonly used data visualizations. The course will review statistical methods including hypothesis testing, analysis of frequencies, and correlation analysis. Students will apply these methods to the analysis of genomic and health data such as whole-genome gene expression and SNP (single-nucleotide polymorphism) frequencies.

This 3-credit experimental course fulfills elective requirements for Biology Major I. Hunter prerequisites are BIOL100, BIOL102 and STAT113.

Learning Goals

Be able to use R as a plotting tool to visualize large-scale biological data sets
Be able to use R as a statistical tool to summarize data and make biological inferences
Be able to use R as a programming language to automate data analysis

Textbooks

R Studio (Required): Learning RStudio for R Statistical Computing
Digital textbook (Required): Data Analysis for the Life Sciences

Exams & Grading

Attendance (or a note in case of absence) is required
In-Class Exercises (50 pts).
Assignments. All assignments should be handed in as hard copies only. Email submissions will not be accepted. Late submissions will receive a 10% deduction (of the total grade) per day.
Three Mid-term Exams (3 X 30 pts each = 90 pts)
Comprehensive Final Exam (50 pts)
Bonus for active participation in classroom discussions

Course Outline

Feb 2. Introduction & tutorials for R/RStudio

Course overview
Install R & RStudio on your home computers (Chapter 1. pg. 9)
Tutorial 1: First R Session (pg. 12)
1. Create a new project by navigating: File | New Project | New Directory. Name the project "Abalone"
2. Import abalone data set: Tools | Import DataSet | From Web URL, copy & paste this address: http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data
3. Assign column names: colnames(abalone) <- c("Sex", "Length", "Diameter", "Height", "Whole_Weight", "Shucked_weight", "Viscera_weight", "Shell_weight", "Rings")
4. Save data into a file: write.csv(abalone, "abalone.csv", row.names = FALSE)
5. Create a new R script: File | New | R script. Type the following commands: abalone <- read.csv("abalone.csv"); boxplot(Length ~ Sex, data = abalone)
6. Save as "abalone.R" using File | Save
7. Execute R script: source("abalone.R")
8. Install the notebook package: install.packages("knitr")
9. Compile a Notebook: File | Compile Notebook | HTML | Open in Browser
Tutorial 2. Writing R Scripts (Chapter 2. pg. 21)
Tutorial 3. Vector
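
Steps 3 through 5 of Tutorial 1 can be sketched as a single script. This is a minimal sketch under stated assumptions, not the course's reference script: the six toy rows are made-up placeholder values (not real abalone measurements) so that the example runs without a network connection, and the hypothetical names `csv.path` and `pdf.path` point to a temporary directory rather than to the project directory.

```r
# Toy stand-in for the imported data set: made-up values, NOT real abalone records.
# A headerless file imported with default settings gets column names V1..V9.
abalone <- data.frame(
  V1 = rep(c("M", "F", "I"), times = 2),
  V2 = c(0.45, 0.53, 0.33, 0.47, 0.55, 0.30),
  V3 = c(0.36, 0.42, 0.25, 0.37, 0.44, 0.23),
  V4 = c(0.10, 0.14, 0.08, 0.12, 0.15, 0.07),
  V5 = c(0.51, 0.68, 0.20, 0.52, 0.77, 0.15),
  V6 = c(0.22, 0.26, 0.09, 0.21, 0.29, 0.06),
  V7 = c(0.10, 0.14, 0.04, 0.11, 0.15, 0.03),
  V8 = c(0.15, 0.21, 0.06, 0.16, 0.25, 0.05),
  V9 = c(15, 9, 7, 10, 20, 6)
)

# Step 3: assign column names
colnames(abalone) <- c("Sex", "Length", "Diameter", "Height", "Whole_Weight",
                       "Shucked_weight", "Viscera_weight", "Shell_weight", "Rings")

# Step 4: save the data to a CSV file (temporary location in this sketch)
csv.path <- file.path(tempdir(), "abalone.csv")
write.csv(abalone, csv.path, row.names = FALSE)

# Step 5: read the file back and draw one box of Length per Sex category
abalone.2 <- read.csv(csv.path)
pdf.path <- file.path(tempdir(), "abalone-boxplot.pdf")
pdf(file = pdf.path, width = 6, height = 4)   # send the plot to a PDF file
boxplot(Length ~ Sex, data = abalone.2)
invisible(dev.off())                          # close the PDF device
```

In RStudio, omitting the `pdf()` and `dev.off()` lines shows the boxplot in the Plots pane instead of writing it to a file.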

Assignment #1. Due 2/16, Tuesday (Finalized)
1. (2 pts) Install R & RStudio on your own computer.
2. (4 pts) Reproduce the "abalone" project (follow the steps in Tutorial 1). Save & print your notebook (consisting of commands and a boxplot). If "Compile Notebook" doesn't work, print a copy of your commands and export the boxplot as a 4X6 PDF (print & submit).
3. (4 pts) Vector operations. Create a new vector object of abalone height: `ht <- abalone$Height`
   a. Show commands for extracting the first item, the first 10 items, items 20 through 30, and the 1st, 2nd, and 5th items.
   b. First, obtain the indices of items less than 0.5 using the which() function and save them as a new vector called "ht.idx". Then, obtain the actual items by combining the "ht" and "ht.idx" vectors.
   c. Apply the following functions: range(), min(), max(), mean(), var(). [Hint: use help(var), help(min) for help]

Feb 9. No class (Friday Schedule)

Feb 16. Introduction & tutorials for R/RStudio

Start a new project called "Session-02-individual"
Tutorial 3: Vector (Continued)

x <- c(1,2,3,4,5) # construct a vector using the c() function
x # show x
2 * x + 1 # arithmetic operations, applied to each element
exp(x) # exponential function (base e)
x <- 1:5 # alternative way to construct a vector, if consecutive
x <- seq(from = -1, to = 14, by = 2) # use the seq() function to create a numeric series
x <- rep(5, times = 10) # use the rep() function to create a vector of same element
x <- rep(NA, times = 10) # pre-define a vector with unknown elements; Use no quotes
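# Apply vector functions (x now holds only NA, so sum(), mean(), and range() return NA; rerun x <- 1:5 first to see numeric results)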
length(x)
sum(x)
mean(x)
range(x)
# Access vector elements
x[1]
x[1:3]
x[-2]
# Character vectors
gender <- c("male", "female", "female", "male", "female")
gender[3]
# Logical vectors
is.healthy <- c(TRUE, TRUE, FALSE, TRUE, FALSE) # Use no quotes
is.male <- (gender == "male") # obtain a logical vector by a test
age <- c(60, 43, 72, 35, 47)
is.60 <- (age == 60)
less.60 <- (age < 60) # TRUE for individuals younger than 60
is.female <- !is.male # use the logical negate operator (!)
# The which() function returns the indices of TRUE elements
ind.male <- which(is.male)
ind.young <- which(age < 45)
age[ind.young] # obtain ages of young individuals
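
The logical vectors above can also be used as indices directly, without going through which(). The following is a minimal sketch that reuses the tutorial's `age` and `gender` vectors; the name `young.female` is introduced here only for illustration.

```r
age    <- c(60, 43, 72, 35, 47)
gender <- c("male", "female", "female", "male", "female")
is.male <- (gender == "male")

# A logical vector selects the elements where it is TRUE
age[is.male]                 # ages of the male subjects: 60 35
# which() returns the positions of the TRUE elements instead
which(is.male)               # 1 4

# Combine tests element by element with & (and) or | (or)
young.female <- (!is.male) & (age < 45)
age[young.female]            # 43

# TRUE counts as 1 and FALSE as 0 in arithmetic
sum(is.male)                 # number of male subjects: 2
mean(age[!is.male])          # mean age of the female subjects: 54
```

Indexing with a logical vector and indexing with the result of which() return the same elements here; which() is the better fit when the positions themselves are needed.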

Tutorial 4: Matrix

BMI <- c(28, 32, 21, 27, 35) # a vector of body-mass index
bp <- c(124, 145, 127, 133, 140) # a vector of blood pressure
data.1 <- cbind(age, BMI, bp) # create a matrix using column bind function cbind(), individuals in rows
data.1
data.2 <- rbind(age, BMI, bp) # create a matrix using row bind function rbind()
t(data.1) # transpose a matrix: columns to rows & rows to columns
dim(data.1) # dimension of the matrix
colnames(data.1)
rownames(data.1) <- c("subject1", "subject2", "subject3", "subject4", "subject5")
data.1
data.1[3,1] # access the element in row 3, column 1
data.1[2,] # access all elements in row 2
data.1[,2] # access all elements in column 2
matrix(data = 1:12, nrow = 3, ncol = 4) # create a matrix with three rows and four columns; filled by column
matrix(data = 1:12, nrow = 3, ncol = 4, byrow = TRUE) # filled by row
mat <- matrix(data = NA, nrow = 2, ncol = 3) # create an empty matrix
mat[1,3] <- 5 # assign a value to a matrix element
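
Once a matrix has row and column names, its elements can be selected by name or by a logical test as well as by position. The sketch below rebuilds `data.1` from the tutorial's vectors; the names `older`, `by.col`, and `by.row` are introduced here only for illustration.

```r
age <- c(60, 43, 72, 35, 47)
BMI <- c(28, 32, 21, 27, 35)
bp  <- c(124, 145, 127, 133, 140)
data.1 <- cbind(age, BMI, bp)
rownames(data.1) <- paste0("subject", 1:5)   # builds "subject1" ... "subject5"

# Index by row name and column name instead of by position
data.1["subject3", "bp"]                     # 127

# Keep only the rows that pass a test on one column
older <- data.1[data.1[, "age"] > 45, ]      # subjects 1, 3, and 5
older

# Fill order: by column (the default) versus by row
by.col <- matrix(1:6, nrow = 2)                 # rows are 1 3 5 and 2 4 6
by.row <- matrix(1:6, nrow = 2, byrow = TRUE)   # rows are 1 2 3 and 4 5 6
by.col
by.row
```

The comma inside the square brackets matters: `data.1[test, ]` selects whole rows, whereas leaving the comma out would treat the matrix as one long vector.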

Assignment #2. Due 2/23, Tuesday
1. (2 pts) Construct a numeric vector of 10 random numbers sampled from the uniform distribution between 0 and 1 (Hint: use the function `runif()`). Name the resulting vector as "rand.1". Show length, range, mean, and variance.
2. (2 pts) Construct a numeric vector of 10 random numbers sampled from a normal distribution with mean of 0 and variance of 1 (Hint: use the function `rnorm()`). Name the resulting vector as "rand.2". Show length, range, mean, and variance.
3. (2 pts) Construct a matrix of 10 rows by combining the previous two vectors using the `cbind` function. Name the matrix as "mat". Assign row names as "ind1" .. "ind10". Show row values for ind1, column values for rand.2; transpose the matrix and save it as "mat.t".
4. (2 pts) Construct a logical vector of 10 from the first number by using the test of evenness
5. (2 pts) Construct a character vector of 10 US States. Use full names and "_" in place of spaces
6. (2 pts) Matrix
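
The functions named in the hints can be tried out as follows. This is a sketch of how they behave, using parameters that deliberately differ from the assignment; the names `u`, `z`, `k`, and `is.even` and the seed value are illustrative assumptions.

```r
set.seed(42)   # fix the random seed so the same numbers are drawn on every run

u <- runif(n = 5, min = 2, max = 3)    # five draws from the uniform distribution on [2, 3]
z <- rnorm(n = 5, mean = 10, sd = 2)   # five draws from a normal distribution

# rnorm() takes the standard deviation (sd), not the variance;
# sd = 2 corresponds to a variance of 4
length(u)
range(u)
mean(u)
var(u)

# Test of evenness: the modulo operator %% gives the remainder after division
k <- 1:6
is.even <- (k %% 2 == 0)
is.even                                # FALSE TRUE FALSE TRUE FALSE TRUE
```

With only five draws, the sample mean and variance will usually differ noticeably from the values set in the function call.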

Feb 23. Statistics & samples

March 1. Displaying data

March 8. Describing data; Exam 1.

March 15. Probability and hypothesis testing

March 22. Analysis of proportions

March 29. Analysis of frequencies

April 5. Contingency tests; Exam 2

April 12. Normal distribution and controls

April 19. Comparing two means

April 26. No Class (Spring break)

May 3. Designing experiments

May 10. Comparing more than two groups; Exam 3

May 17. Correlation analysis

May 24. Final Exam (Comprehensive)

May 31. Grades submitted to Registrar's Office

