R-tutorial: Difference between revisions

Latest revision as of 19:25, 17 March 2019

Install R & RStudio on your home computer
- R from this website: https://mirrors.nics.utk.edu/cran/
- RStudio from this website: https://www.rstudio.com/
Create a new project by navigating: File | New Project | New Directory. Name the project "genes"
Import the human genes data set: Tools | Import DataSet | From Web URL, copy & paste this address: http://diverge.hunter.cuny.edu/~weigang/data-sets-for-biostat/hg.tsv2
Select "Yes" for column heading. Rename the data set if you wish, using a short but informative name (e.g., human.genes). Do not use spaces; use a dot or underscore as the name delimiter (e.g., "human.genes" or "human_genes", but never "human genes"). The same rule applies to column and row names
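The menu-driven import above can also be done from the console. The following is a minimal sketch, assuming the file is tab-delimited with a header row. The three-gene table is made up for illustration, and the column Gene.Name is hypothetical; only Chromosome, Gene.Start, and Gene.End are taken from the commands in this tutorial.

```r
# Hypothetical stand-in for the downloaded file: tab-delimited, with a header row.
toy.tsv <- "Gene.Name\tChromosome\tGene.Start\tGene.End\ngeneA\t1\t100\t599\ngeneB\t1\t2000\t2999\ngeneC\t2\t50\t10049\n"

# read.delim() assumes tab separators and header = TRUE by default.
toy.genes <- read.delim(text = toy.tsv)

dim(toy.genes)   # number of rows and columns
str(toy.genes)   # column names and types

# With network access, the same call should work on the course URL
# (assumption: the file there is tab-delimited with a header row):
# hg <- read.delim("http://diverge.hunter.cuny.edu/~weigang/data-sets-for-biostat/hg.tsv2")
```

Reading the data with a command rather than through the menu means the import step can be saved in a script along with the rest of the analysis.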

dim(hg) 
head(hg)
tail(hg)
hg$Length <- hg$Gene.End - hg$Gene.Start + 1 # add a column for gene length; access variables by pressing "Tab" after "$" (auto-completion)
hist(hg$Length, breaks = 200) # plot gene-length distribution. Not normal: most genes are short, a few are very long
hist(log10(hg$Length), breaks = 200) # log of gene length is closer to a normal distribution
mean(hg$Length) # not representative: a few very long genes pull the average upward
median(hg$Length) # More representative. Use median for a variable not normally distributed
summary(hg$Length) # Show all quartiles
boxplot(log10(Length) ~ Chromosome, data = hg) # show gene length by chromosome
write.csv(hg, "hg.csv", row.names = FALSE) # save into a file
hg <- read.csv("hg.csv") # read back into R
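The comments above on mean versus median can be checked on a small example. This is a sketch with made-up lengths (the vector toy.len is hypothetical, not taken from the hg data set): one very long gene is enough to pull the mean far from the typical value, while the median and the interquartile range barely notice it.

```r
# Made-up gene lengths: four short genes and one very long one.
toy.len <- c(1000, 2000, 3000, 4000, 1000000)

mean(toy.len)    # pulled far upward by the single long gene
median(toy.len)  # stays with the typical gene
IQR(toy.len)     # 3rd quartile minus 1st quartile, robust to the outlier

# After a log10 transform the long gene carries much less weight,
# so the mean and the median move much closer together.
mean(log10(toy.len))
median(log10(toy.len))
```

On the real data, the same comparison can be made with hg$Length in place of toy.len.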

Export a PDF or image
Open a new R script and name it "hg.R"
Select commands and save to script
Retrieve and edit a command by pressing "up" or "down" arrows
Retrieve commands by using the search box on the "History" tab
Type q() to quit. Answer "y" to save workspace
To reload and restore workspace, go to "C:/Users/instructor/Documents/human.genes" and double click on the file "human.gene"
R Markdown
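The "Export a PDF or image" step in the list above can also be scripted instead of using the Export button in the plot pane. A minimal sketch, assuming base R graphics; the values and the file name toy-hist.pdf are made up, and the file is written to a temporary directory so nothing in the project folder is touched.

```r
# Made-up values standing in for log10 gene lengths.
toy.log.len <- c(3.0, 3.3, 3.5, 3.6, 6.0)

# Hypothetical output file in R's temporary directory.
pdf.file <- file.path(tempdir(), "toy-hist.pdf")

pdf(pdf.file, width = 6, height = 4)  # open a PDF graphics device
hist(toy.log.len, breaks = 5, main = "Toy log10 gene length")
dev.off()                             # close the device so the file is written

file.exists(pdf.file)                 # confirm the PDF was created
```

Replacing pdf() with png() would produce an image file in the same way; in both cases the plot is only written once dev.off() is called.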

@@ Line 1: / Line 1: @@
-# Install R & RStudio on your home computer
+* Install R & RStudio on your home computer
-## R from this website: https://mirrors.nics.utk.edu/cran/
+** R from this website: https://mirrors.nics.utk.edu/cran/
-## R studio from this website: https://www.rstudio.com/
+** R studio from this website: https://www.rstudio.com/
-# Create a new project by navigating: File | New Project | New Directory. Name it project file "genes"
+* Create a new project by navigating: File | New Project | New Directory. Name it project file "genes"
-# Import abalone data set: Tools | Import DataSet | From Web URL, copy & paste this address: http://diverge.hunter.cuny.edu/~weigang/data-sets-for-biostat/hg.tsv2
+* Import abalone data set: Tools | Import DataSet | From Web URL, copy & paste this address: http://diverge.hunter.cuny.edu/~weigang/data-sets-for-biostat/hg.tsv2
-# Select "Yes" for column heading. Rename the data set if you wish (short but informative names, e.g., human.genes). Do not use spaces, use dot or underscore as name delimiters (e.g., "human.genes" or "human_genes", but never "human genes") Same rule for column or row names
+* Select "Yes" for column heading. Rename the data set if you wish (short but informative names, e.g., human.genes). Do not use spaces, use dot or underscore as name delimiters (e.g., "human.genes" or "human_genes", but never "human genes") Same rule for column or row names
-<syntaxhighlight lang=R">
+<syntaxhighlight lang="bash">
-hg.len <- hg$Gene.End - hg$Gene.Start + 1 # calculate gene length; access variables by pressing "Tab" (auto-completion)
+dim(hg)
-hist(hg.len, br = 200) # plot gene-length distribution. Not normal: mostly genes are short, few very long
+head(hg)
-mean(hg.len) # not representative, super-long genes carry too much weight to the average length
+tail(hg)
-median(hg.len) # More representative. Use median for a variable not normally distributed
+hg$Length <- hg$Gene.End - hg$Gene.Start + 1 # add a column for gene length; access variables by pressing "Tab" after "$" (auto-completion)
-summary(hg.len) # Show all quartiles
+hist(hg$Length, br = 200) # plot gene-length distribution. Not normal: mostly genes are short, few very long
-IQR(hg.len) # 3rd Quartile - 1st Quartile, the range of majority data points, even for skewed distribution
+hist(log10(hg$Length), br = 200)
-log.len <- log10(hg.len); hist(log.len, br=200) # Log of gene length is more normally distributed
+mean(hg$Length) # not representative, super-long genes carry too much weight to the average length
-mean(log.len); median(log.len) # They should be similar, since log.len is normal
+median(hg$Length) # More representative. Use median for a variable not normally distributed
-boxplot(hg$Gene.End-hg$Gene.Start+1 ~ hg$Chromosome)
+summary(hg$Length) # Show all quartiles
+boxplot(log10(Length) ~ Chromosome, data = hg) # show gene length by chromosome
 write.csv(hg, "hg.csv", row.names = FALSE) # save into a file
 hg <- read.csv("hg.csv") # read back into R
-boxplot(Gene.End - Gene.Start + 1 ~ Chromosome, data = hg) # show gene length by chromosomes
 </syntaxhighlight>
-# Export a PDF or image
+* Export a PDF or image
-# Open a new R script, name it as "hg.R"
+* Open a new R script, name it as "hg.R"
-# Select commands and save to script
+* Select commands and save to script
-# Retrieve and edit a command by pressing "up" or "down" arrows
+* Retrieve and edit a command by pressing "up" or "down" arrows
-# Retrieve commands by using the search box on the "History" table
+* Retrieve commands by using the search box on the "History" table
+* Type q() to quit. Answer "y" to save workspace
+* To reload and restore workspace, go to "C:/Users/instructor/Documents/human.genes" and double click on the file "human.gene"
+* [http://diverge.hunter.cuny.edu/~weigang/hg.html R Markdown]
