R-tutorial: Difference between revisions

From QiuLab
Edited by Weigang

Revision as of 18:50, 2 May 2016

  • Install R & RStudio on your home computer
  • Create a new project: File | New Project | New Directory, and name the project "genes"
  • Import the human-gene data set: Tools | Import Dataset | From Web URL, then copy & paste this address: http://diverge.hunter.cuny.edu/~weigang/data-sets-for-biostat/hg.tsv2
  • Select "Yes" for column heading. Name the data set hg, since the commands below use that name. In general, choose short but informative names (e.g., human.genes). Do not use spaces; use a dot or underscore as the delimiter (e.g., "human.genes" or "human_genes", but never "human genes"). The same rule applies to column and row names
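The same import can also be done with a single command, so that the hg.R script can rebuild the data from scratch. A minimal sketch, assuming hg.tsv2 is tab-separated text with a header row (the file format is not shown here). To stay self-contained, it writes a tiny invented table to a temporary file instead of downloading; hg_demo is a hypothetical name chosen so the real hg is not overwritten.

```r
# Invented toy rows standing in for hg.tsv2 (assumed tab-separated, with a header).
toy <- data.frame(
  Chromosome = c("1", "1", "2", "X"),
  Gene.Start = c(1000L, 50000L, 2000L, 700L),
  Gene.End   = c(2999L, 51999L, 202000L, 1699L)
)
path <- tempfile(fileext = ".tsv")
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

# read.delim() expects tab-separated text and uses header = TRUE by default.
# For the real file, pass the URL instead of a local path:
# hg <- read.delim("http://diverge.hunter.cuny.edu/~weigang/data-sets-for-biostat/hg.tsv2")
hg_demo <- read.delim(path)
str(hg_demo)  # 4 rows, 3 columns
```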
dim(hg)   # number of rows and columns
head(hg)  # first six rows
tail(hg)  # last six rows
hg$Length <- hg$Gene.End - hg$Gene.Start + 1 # add a column for gene length (+ 1 because both the start and end positions are counted); press "Tab" after "$" to auto-complete column names
hist(hg$Length, breaks = 200) # gene-length distribution. Not normal: most genes are short, a few are very long (right-skewed)
hist(log10(hg$Length), breaks = 200) # log10 scale spreads out the short genes so the shape is easier to see
mean(hg$Length) # not representative: a few very long genes pull the average up
median(hg$Length) # more representative; prefer the median for skewed (non-normal) variables
summary(hg$Length) # min, quartiles, median, mean, and max
boxplot(log10(Length) ~ Chromosome, data = hg) # show gene length (log10) by chromosome
write.csv(hg, "hg.csv", row.names = FALSE) # save into a file
hg <- read.csv("hg.csv") # read back into R
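The comments above on mean versus median can be checked with numbers. A minimal sketch on invented coordinates, not the real hg data (hg_demo is a hypothetical name): a single very long gene drags the mean far above the median. tapply() then gives the median length per chromosome, a numeric companion to the boxplot.

```r
# Hypothetical toy gene table (invented coordinates) with hg's column names.
hg_demo <- data.frame(
  Chromosome = c("1", "1", "1", "2", "2", "X"),
  Gene.Start = c(100L, 5000L, 9000L, 300L, 4000L, 10L),
  Gene.End   = c(899L, 6199L, 1008999L, 1799L, 6499L, 909L)
)
hg_demo$Length <- hg_demo$Gene.End - hg_demo$Gene.Start + 1
# lengths: 800, 1200, 1000000, 1500, 2500, 900

mean(hg_demo$Length)    # about 167817: pulled up by the one gene of length 1,000,000
median(hg_demo$Length)  # 1350: much closer to a typical gene
summary(hg_demo$Length)

# Median length per chromosome (1: 1200, 2: 2000, X: 900)
tapply(hg_demo$Length, hg_demo$Chromosome, median)
```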
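Why row.names = FALSE matters when saving: by default write.csv() also writes the row names as an unnamed first column, and read.csv() brings it back as an extra column named X. A minimal sketch using temporary files and a small invented data frame (hg_demo is hypothetical; the real hg is not touched):

```r
# Hypothetical toy data frame (invented values).
hg_demo <- data.frame(Chromosome = c("1", "2", "X"), Length = c(800, 1500, 900))

with_rn    <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

write.csv(hg_demo, with_rn)                        # row names become a first column
write.csv(hg_demo, without_rn, row.names = FALSE)  # data columns only

names(read.csv(with_rn))     # "X" "Chromosome" "Length": extra column from the row names
names(read.csv(without_rn))  # "Chromosome" "Length"
```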
  • Export a plot as a PDF or image (Plots pane | Export)
  • Open a new R script (File | New File | R Script) and save it as "hg.R"
  • Select the commands you want to keep and save them to the script
  • In the console, recall a previous command with the Up or Down arrow key, then edit it
  • Search past commands with the search box in the "History" tab
  • Type q() to quit
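A plot can also be written to a file from code, which keeps the figure reproducible from hg.R. A minimal sketch using base R's pdf() graphics device; the file name and the toy lengths in len_demo are invented for illustration. png() works the same way for an image, though on machines without a display it may need a suitable graphics backend.

```r
# Hypothetical toy lengths standing in for hg$Length (invented values).
len_demo <- c(800, 1200, 1000000, 1500, 2500, 900)

out <- file.path(tempdir(), "gene-length-hist.pdf")  # hypothetical file name
pdf(out, width = 6, height = 4)  # open a PDF device; size in inches
hist(log10(len_demo), breaks = 20,
     main = "Gene length", xlab = "log10(length)")
invisible(dev.off())             # close the device so the file is written

file.exists(out)  # TRUE
```

Once the commands are saved in hg.R, source("hg.R") re-runs the whole script in a fresh session.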