BioMed-R-2020: Difference between revisions

From QiuLab
imported>Weigang
 
(107 intermediate revisions by 2 users not shown)
* Syllabus Policy
Except for changes that substantially affect implementation of the evaluation (grading) statement, this syllabus is a guide for the course and is subject to change with advance notice, announced in class or posted on Blackboard.
==Final project assignment==
* Recommended study: Rabinowitz et al. “Transcriptomic, proteomic, and metabolomic landscape of positional memory in the caudal fin of zebrafish.” [[doi:10.1073/pnas.1620755114|PNAS 114,5 (2017)]]
* Email me for permission if you prefer to work on your own data
* Data set description & assignment
{| class="wikitable"
|-
! Data sets !! Category/Description !! Visualization/Reference !! Statistical analysis !! Assigned to
|-
| S01, S02, S03 || RNA-Seq, expression levels of all genes || Histogram, boxplot, volcano plot (Fig 1E) || t-tests || Erica
|-
| S04, S05 || RNA-Seq, transcript gradients || heatmap (Fig 1B, "RNA") || cluster analysis || Illan
|-
| S06 || RNA-Seq, middle-enriched & middle-depleted || heatmap (Fig S2, "RNA") || cluster analysis || Arkadily
|-
| S7-S12|| RNA-Seq, pathway genes (transcription factors, ion channels, membrane receptors, RA/Wnt/Fgf pathway genes) || barplots (Fig 2B-2G) || t-tests || QianFan, Andrew, Adam, Junho
|-
| S13|| Proteome, a 77-column table with technical replicates || volcano plot (Fig 1F), PCA (Fig 1D) || t-tests, PCA analysis || Brian
|-
| S14-S17|| Proteome, 4 biological replicates (with technical replicates for each) || volcano plot (Fig 1F), PCA (Fig 1D) || t-tests, PCA analysis|| Hector
|-
| S18 || Proteome, combined biological & technical replicates || volcano plot (Fig 1F), PCA (Fig 1D) || t-tests, PCA analysis || Marvin
|-
| S19, S20|| Proteome, top hits || heatmap (Fig 1B, "Protein") || cluster analysis || Brittany
|-
| S21 || Proteome, gradient || heatmap (Fig S2, "Protein") || cluster analysis|| Stephanie
|-
| S22, S23|| Proteome, two regions || Volcano plot || t-tests|| Vhy-Shelta
|-
| S24, S25|| Proteome, dorsal-ventral-center axis|| PCA plot (Fig 5B) || PCA analysis|| Kiseok
|-
| S26, S27 || Metabolomics, raw & normalized counts || PCA plot (Fig 4B) || PCA analysis || Hao
|-
| S28 || Metabolomics, differential expression || heatmap (Fig 4C), volcano plot || cluster analysis || Jennifer
|-
| S29 || Metabolomics, gradient || heatmap (Fig 4C) || cluster analysis || Ann
|-
| combined analysis || RNA-seq vs Proteome || Venn diagram (Fig S3) || binomial test; chi-square analysis || Tahir
|}
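Most assignments in the table above pair a statistical analysis with a figure. Per-gene t-tests feed a volcano plot, PCA feeds a PCA plot, and cluster analysis orders the rows of a heatmap. The following is a minimal base-R sketch of those three steps. It runs on a small simulated matrix, not on the S01-S29 data sets. The function name <code>volcano_table</code>, the simulated values, and the proximal/distal labels are hypothetical. The sketch assumes log2-scale values with genes in rows and samples in columns. It uses Welch's t-test (the R default) with Benjamini-Hochberg correction, which may differ from the exact methods used by Rabinowitz et al.

```r
# Minimal sketch on hypothetical simulated data: per-gene t-tests for a volcano
# plot, PCA of samples, and hierarchical clustering of genes for a heatmap.
# Assumes a numeric matrix of log2-scale values: rows = genes, columns = samples.

volcano_table <- function(expr, group, levels = unique(group)) {
  stopifnot(ncol(expr) == length(group), length(levels) == 2)
  a <- expr[, group == levels[1], drop = FALSE]
  b <- expr[, group == levels[2], drop = FALSE]
  # Welch two-sample t-test for each gene (row)
  p <- vapply(seq_len(nrow(expr)),
              function(i) t.test(b[i, ], a[i, ])$p.value,
              numeric(1))
  data.frame(
    gene      = rownames(expr),
    log2FC    = rowMeans(b) - rowMeans(a),   # levels[2] minus levels[1], log2 scale
    p         = p,
    padj      = p.adjust(p, method = "BH"),  # Benjamini-Hochberg FDR
    neglog10p = -log10(p),                   # y-axis of a volcano plot
    stringsAsFactors = FALSE
  )
}

# Simulated example: 100 genes, 3 proximal and 3 distal samples
set.seed(1)
expr <- matrix(rnorm(100 * 6, mean = 8), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
group <- rep(c("proximal", "distal"), each = 3)
expr[1:5, group == "distal"] <- expr[1:5, group == "distal"] + 3  # 5 "up" genes

res <- volcano_table(expr, group, levels = c("proximal", "distal"))
head(res[order(res$p), ])
# Volcano plot: plot(res$log2FC, res$neglog10p, xlab = "log2 fold change", ylab = "-log10(p)")

# PCA of samples (prcomp expects samples as rows, hence t())
pca <- prcomp(t(expr), scale. = TRUE)
summary(pca)$importance["Proportion of Variance", 1:2]
# PCA plot: plot(pca$x[, 1:2], col = factor(group))

# Cluster analysis for a heatmap: z-score each gene, then cluster the genes
z <- t(scale(t(expr)))
hc <- hclust(dist(z), method = "average")
clusters <- cutree(hc, k = 2)
table(clusters)
# Heatmap with the same gene order: heatmap(z, Rowv = as.dendrogram(hc), scale = "none")
```

The plotting calls are left as comments so the script runs without a graphics device. Uncomment them to draw the volcano plot, PCA plot, and heatmap. For real RNA-seq read counts, count-based tools such as DESeq2 or edgeR are often preferred over per-gene t-tests.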
==Student projects & submissions==
{| class="wikitable"
|-
! # !! Student & project type !! Citation & PubMed link !! Research question !! Study Design: samples, sample size & controls !! Omics tech & NGS platform !! Computational tools !! Data visualization !! Statistical tests !! Data description & links
|-
| 1|| Tahir - cancer microbiome|| Kostic, A. D., et al. (2012). Genomic analysis identifies association of Fusobacterium with colorectal carcinoma. Genome Research, 22(2), 292–298. [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3266036/ PubMed] || How does the microbiome of colorectal carcinoma tumor tissue differ from that of adjacent non-tumorous tissue? || Colorectal carcinoma tumor tissue (Tumor) and adjacent non-tumorous, nonneoplastic tissue (Normal); 95 tumor/normal pairs (190 samples in total); the adjacent nonneoplastic tissue serves as the control || 16S rDNA amplicon sequencing; 454 GS FLX sequencing || Mothur || Bar plots, boxplots, scatterplots, cladogram || Linear discriminant analysis (LDA) and Wilcoxon rank-sum test (a non-parametric alternative to the t-test) || NCBI Sequence Read Archive accession no. SRP000383. The pre-processed dataset can also be retrieved with the R package phyloseq: <code>filepath = system.file("extdata", "study_1457_split_library_seqs_and_mapping.zip", package = "phyloseq"); kostic = microbio_me_qiime(filepath)</code>. The Kostic dataset is a phyloseq (S4) object consisting of otu_table, sam_data (the sample table), phy_tree, and tax_table. The sample table holds sample metadata such as diagnosis, race, and gender.
|-
| 2|| Junho - yeast transcriptome ||  Gierliński M, Cole C, Schofield P, Schurch NJ, Sherstnev A, Singh V, Wrobel N, Gharbi K, Simpson G, Owen-Hughes T, Blaxter M, and Barton GJ. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment. [[doi:10.1093/bioinformatics/btv425|Bioinformatics, 31(22):1–15, 2015]].
|| Do the statistical models commonly used to detect differential expression, such as the negative binomial distribution employed by edgeR, DESeq and Cuffdiff, fit real RNA-seq data? Until this study, these models had mostly been tested on low-replicate RNA-seq data or on simulations. || RNA-seq of mRNA from 48 replicates of each of two S. cerevisiae populations: wild type vs. snf2 knock-out mutant || Illumina HiSeq 2000 || RStudio|| scatterplot, boxplot, heatmap || t-test, Wald test (2 factors); likelihood-ratio test (LRT) for multiple factors || [https://www.ebi.ac.uk/ena/data/view/PRJEB5348 European Nucleotide Archive (ENA) accession PRJEB5348]
|-
| 3 || Brian - mouse platelet transcriptome  || Rowley, Jesse W et al. “Genome-wide RNA-seq analysis of human and mouse platelet transcriptomes.” [[doi:10.1182/blood-2011-03-339705|Blood vol. 118,14 (2011)]] || How can RNA-seq analysis identify key gene expression differences between human and mouse platelets? || 8-16 mouse samples (male & female); 2 human samples (male & female) || Illumina GAIIx || Aligned with Novoalign; downstream analysis in Perl; RPKM calculation || scatter plots / pie charts / RefSeq gene annotations with RPKM expression levels / histograms || Spearman rank correlation analysis || Mouse & human BAM files: https://bioserver.hci.utah.edu/gnomex/analysis/(analysisPanel:2430) and https://bioserver.hci.utah.edu/gnomex/analysis/(analysisPanel:2431) (links currently broken)
|-
| 4 || Qinfan - wildlife microbiome || [https://doi.org/10.3389/fmicb.2018.00803 Comparing Microbiome Sampling Methods in a Wild Mammal: Fecal and Intestinal Samples Record Different Signals of Host Ecology, Evolution] || Are there differences between the microbial communities of fecal samples and intestinal mucosa? || Fecal and intestinal tissue samples from 37 bats in Lamanai, Belize; 55 DNA samples, 29 intestinal and 24 guano || 16S rRNA sequencing; Illumina || RStudio || boxplot (alpha diversity: Shannon and Faith’s index), barplot (abundance), heatmap (test bacterial family abundance), scatterplot (beta diversity) || t-test, Wilcoxon signed-rank test, and permutational multivariate analysis of variance || Raw, demultiplexed 16S sequence data, available from the NCBI Sequence Read Archive under BioProject PRJNA428973; [[doi:10.6084/m9.figshare.5975365|QIIME2 mapping file and annotated feature table are available on Figshare]].
|-
| 4 || Brittany - human genome variation || [https://doi.org/10.1186/s12864-019-5957-x Belsare, S. et al. Evaluating the quality of the 1000 genomes project data. BMC Genomics 20, 620 (2019).] || Do variants of pain receptor genes differ significantly between ethnic groups in the 1000 Genomes Project data? || 2,504 individuals from 26 ethnic groups from Africa, Asia, Europe, and the Americas
Genetic variants: rs4633, rs4680, rs4818, rs6269, rs740603, rs1051660, rs1799971, rs7958311, rs40434, rs2066713
|| whole-genome sequencing, deep exome sequencing, dense microarray genotyping; Illumina, 10X Genomics || RStudio || bar plot, pie chart, heatmap, Manhattan plot || chi-square, ANOVA ||[http://useast.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=22:19963248-19964248;v=rs4680;vdb=variation;vf=88058863 To search by variant and download table of allele frequencies and genotype frequencies]; [https://www.internationalgenome.org/data-portal/sample To download whole genomes]
|-
| 5 || Hao - microRNA biomarker || Vila-Navarro, Elena et al. “MicroRNAs for Detection of Pancreatic Neoplasia: Biomarker Discovery by Next-generation Sequencing and Validation in 2 Independent Cohorts.” [[doi:10.1097/SLA.0000000000001809|Annals of Surgery vol. 265,6 (2017): 1226-1234.]]  || Can new microRNA-based biomarkers for early detection of pancreatic neoplasia be found by analyzing the miRNome of pancreatic ductal adenocarcinoma (PDAC) and intraductal papillary mucinous neoplasm (IPMN)? || A. Pancreatic tissues (n = 165). B. Biomarker discovery in a set of 18 surgical samples (11 PDAC, 4 IPMN, 3 C). C. miRNA validation in 2 different sets of samples: set 1, 52 surgical samples (24 PDAC, 7 IPMN, 6 chronic pancreatitis, 15 C); set 2, 95 endoscopic ultrasound-guided fine-needle aspirations (60 PDAC, 9 IPMN, 26 C).
|| Illumina Genome Analyzer IIx || Illumina GA pipeline software; R/Bioconductor||Volcano plot, Heatmap, pie chart, ROC curve || t-test || [https://www.ebi.ac.uk/biostudies/studies/S-EPMC5434964?xr=true Discriminatory power of immature granulocyte count (IG), cell-free DNA (cfDNA), mitochondrial DNA (mtDNA), nuclear DNA (ncDNA), phagocytic index (PI) and revised BAUX (rBAUX) for sepsis at different time]
|-
| 6 || Marvin - microbiome  || [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0227985 Marangoni, A. et al. (2020). Pharyngeal microbiome alterations during Neisseria gonorrhoeae infection. PLoS One, 15(1).]  || Identify the bacterial community profiles of the pharyngeal microbiome associated with Neisseria gonorrhoeae infection: does gonorrhea infection change the oropharyngeal microbial community? || Pharyngeal swabs from men who have sex with men and who reported unsafe orogenital intercourse; 70 samples: 45 without infection (n = 45) and 25 diagnosed with Neisseria gonorrhoeae infection (n = 25) || ? || R package vegan, PICRUSt software, Prism, MATLAB || Alpha-diversity box plots of Chao1 and Shannon index, principal coordinate analysis plot (scatterplot), pie charts, data tables || Non-parametric Monte Carlo-based test, Mann-Whitney U test || (Requesting), NCBI Sequence Read Archive accession number PRJNA556341
|-
| 7 || Ann - disease transcriptome || Mastrokolias A, Ariyurek Y, Goeman JJ, et al. Huntington's disease biomarker progression profile identified by transcriptome sequencing in peripheral blood. [[doi:10.1038/ejhg.2014.281|Eur J Hum Genet. 2015;23(10):1349–1356.]] || Can accessible biomarkers for Huntington’s disease be identified to monitor disease progression and therapy response? || Blood RNA samples from 91 mutation carriers (27 presymptomatic and 64 symptomatic) and 33 controls || transcriptome sequencing; Illumina GA Pipeline || R  || boxplots; scatterplot  || Student’s t-tests, linear modeling functions in R, linear regression model || Gene Expression Omnibus (GEO) accession number GSE51799.
|-
| 8 || Vhy-Shelta - single cell transcriptome  || E. A. Stadtmauer et al., [https://pubmed.ncbi.nlm.nih.gov/32029687-crispr-engineered-t-cells-in-patients-with-refractory-cancer/ Science 10.1126/science.aba7365 (2020)] || First-in-human phase I clinical trial testing the safety and feasibility of multiplex CRISPR-Cas9 editing to engineer T cells in three patients with refractory cancer. Can CRISPR-Cas9 be used for synthetic-biology cancer immunotherapy in refractory cancer? || Experimental group: n = 3, two patients with refractory advanced myeloma and one patient with refractory metastatic sarcoma. Controls: the patients' untransduced T cells and cells transduced with NY-ESO-1 TCR without CRISPR editing; a positive reference sample containing 1x10^3 copies of synthetic template plasmid; healthy donors (ex vivo CD4+ and CD8+ T cells from patients or healthy donor controls) || Illumina sequencing: HiSeq400, MiSeq || iGuide, Cell Ranger v3.0.2, Seurat v3.1.0, Python  ||  Box plots, bar graphs, scatter plots, Venn diagrams, heat maps, cytometry plots, UMAP plots, swimmer’s plot, piecewise linear model, computed tomography scan || Paired and unpaired Student’s t-tests (GraphPad Prism), parameter estimates, random estimates ||
Table 4S (Excel file) explains how to interpret the scRNA-seq metadata tables (4a-c)
See supplementary materials
Table 5S: Differentially expressed genes in NY-ESO-1 TCR+ T cells at various days before and after infusion.
[https://science.sciencemag.org/content/suppl/2020/02/05/science.aba7365.DC1 Data Link (See Table 4S and Supplementary materials Table 5S)]
|-
| 8 || Andrew - wildlife transcriptome || [https://www.pnas.org/content/116/4/1331.long Young, Ferkin, et al. (2019)] Conserved transcriptomic profiles underpin monogamy across vertebrates  ||  Is there a universal transcriptomic code underlying monogamy in vertebrates? ||Sequenced and compared neural transcriptomic profiles from reproductive males of closely related monogamous and nonmonogamous species from four major classes of vertebrates (n = 3 pooled individuals per species): Mammalia (Microtus ochrogaster versus Microtus pennsylvanicus and Peromyscus californicus versus Peromyscus maniculatus); Reptilia–Aves (A. spinoletta versus P. modularis); Amphibia (Ranitomeya imitator versus Oophaga pumilio); and Actinopterygii (Xenotilapia spilotera versus Xenotilapia ornatipinnis) || ? || Bioconductor DESeq2 || Volcano plot, Rank-Rank Hypergeometric Overlap (RRHO), heatmap || Spearman's rank correlation coefficient, linear regression, principal component analysis  || Data Appendix, Dataset_S01, [https://www.pnas.org/content/suppl/2019/01/02/1813775116.DCSupplemental Dataset_S02]
|-
| 9|| Adam - Single cell transcriptome || Hong et al. (2019). Single-cell transcriptomics reveals multi-step adaptations to endocrine therapy. [https://www.nature.com/articles/s41467-019-11721-9 Nature Communications, 10(1), 3840.] || How do clonal genetic diversity and transcriptional plasticity contribute to early and late adaptation to endocrine therapy in luminal breast cancers? || MCF7 cells and long-term oestrogen-deprived (LTED) cells; primary metastatic breast cancer cells derived from pleural effusions of patients with metastatic breast cancer; and cells from 20 patients with luminal tumors treated with aromatase inhibitors (10 responders and 10 non-responders). || Illumina HiSeq 4000 || R || Box plot, volcano plot, heatmap, gene network, Venn diagrams, bar graphs, line plot || Kruskal-Wallis test, two-tailed paired t-test, Wilcoxon signed-rank test, hypergeometric test, permutation test, regression || GSE122743. Supplementary data are also provided. Description of [https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-019-11721-9/MediaObjects/%2041467_2019_11721_MOESM2_ESM.pdf supplementary data]
|-
| 10 || Stephanie - transcriptome ||  Vijay et al. Critical role of phospholipase A2 group IID in age-related susceptibility to severe acute respiratory syndrome-CoV infection. [https://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS5879 J Exp Med 2015 Oct 19;212(11):1851-68.] ||  What is the effect of the age-dependent anti-inflammatory factor phospholipase A2 group IID (PLA2G2D) in the lungs of older patients with severe respiratory disease? ||  Specific pathogen–free C57BL/6 mice aged 6 wk to 22 mo were purchased, and Pla2g2d−/− and Pla2g2d+/+ mice (H-2b) were generated via ES cell transfection and embryo injection, with the Pla2g2d-targeting vector inserted between exons 1 and 2 of the Pla2g2d gene. Pla2g2d+/− mice were bred with C57BL/6NCrSLc (Japan SLC) mice. After initial development, the mice were backcrossed more than 12 times to Japan SLC mice. Pla2g2d+/+ C57BL/6NCrSLc mice were used as controls for all experiments. || Illumina MouseRef-8 v2.0 Expression BeadChip (microarray) ||  PartekGS software and Ingenuity Pathway Analysis software ||heat map, histogram, flow cytometry plot || Data were normalized and median-polished using Robust Multichip Average background correction with log2-transformed values. Significance testing on the log2 expression values compared the two groups (CD11c+ cells from the lungs of young and middle-aged mice). P-values were corrected for multiple testing using the false discovery rate (FDR), and expression differences were considered significant at an FDR cutoff of 0.05 and a twofold change. || [https://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS5879 Analysis of pulmonary CD11c+ cells from 6-8 week and 10-13 month old C57BL/6 animals. CD11c+ cells are key modulators of the immune response in the lung.]
|-
| 11 || Hector - transcriptome||Kesherwani, V., Shahshahan, H. R., & Mishra, P. K. (2017). Cardiac transcriptome profiling of diabetic Akita mice using microarray and next generation sequencing. [https://doi.org/10.1371/journal.pone.0182828 PloS one, 12(8), e0182828.]
|| 1. Identify molecular changes in the cardiac transcriptome between insulin-negative (diabetic) and wild-type mice. 2. Determine the implications of these changes for several heart failure signaling pathways. || Cardiac tissue RNA (exons and introns) from insulin-negative Ins2+/- Akita mice and normoglycemic wild-type (WT) mice; WT n = 3, Ins2+/- n = 3, N = 6 || microarray, IPA analysis, and transcriptome sequencing
|| RStudio, Ingenuity Pathway Analysis (IPA) || Bar plots, IPA network graph || t-test || [https://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE66576 Data are available from the Gene Expression Omnibus (GEO), and an R script is provided]
|-
| 12 || Kiseok Yang - cancer methylome || Dai W et al., Comparative methylome analysis in solid tumors reveals aberrant methylation at chromosome 6p in nasopharyngeal carcinoma, Cancer Medicine 2015, 4(7):1079–1090. [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4529346/ PubMed] || ? || 25 NPC tumors in the discovery set and 35 NPC tumors in the validation set || Methylome (epigenome); Illumina HumanMethylation450 BeadChip || R (limma package) || Boxplots (with jitter), pie charts, bar charts (normal and stacked position) || Mann–Whitney U-test || Probe-level microarray data, giving each probe's chromosomal location, strand (forward or reverse), and a brief functional annotation.
[https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE62336&format=file GSE62336 download]
|-
| 13 || Jennifer - regeneration transcriptome/proteome || Rabinowitz et al. “Transcriptomic, proteomic, and metabolomic landscape of positional memory in the caudal fin of zebrafish.” [[doi:10.1073/pnas.1620755114|PNAS 114,5 (2017)]] ||  What molecules regulate the positional memory responsible for appendage regeneration in zebrafish? || Proximal and distal RNA-seq and LFQ proteomics datasets (transcripts and proteins present in similar patterns in both datasets), 32 molecules; Actb2 was used as a loading control || Transcriptome, proteome, metabolome; Illumina || g:Profiler || bar graphs, heat maps, volcano plots || t-test ||  [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92760 GSE92760]
|}


==Course Schedule==
|}
|}


===Feb 8, 2020===
* Introduction to NGS: [[File:Intro-NGS.pdf|thumbnail]]
* 1-slide presentations on Next-Generation Sequencing Technologies (Group I)
|}
|}


===Feb 15, 2020===
* NGS presentations (Group II)
* R Tutorial: Part 3. Data visualization with ggplot2. Slides: [[File:R-tutorials-3.pdf|thumbnail]]
* No assignment (review the slides and the 3 tutorial scripts to prepare for next week's quiz)


===Feb 22, 2020===
* Quiz 1 (Open Book)  
* R Tutorial: Part 4. BioStat (chi-square & t-test). Lecture slides: [[File:R-tutorial-4.pdf|thumbnail]]
|}
|}
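The combined analysis assignment (RNA-seq vs. proteome overlap, tested with a binomial test and chi-square analysis) uses the tests from R Tutorial Part 4. The following is a minimal base-R sketch built on a hypothetical 2x2 table of gene counts. The numbers are invented for illustration and are not taken from the course data. <code>chisq.test</code> tests whether RNA-level and protein-level changes are independent. A one-sided <code>binom.test</code> asks whether RNA-changed genes are over-represented among protein-changed genes, compared with their overall frequency.

```r
# Hypothetical 2x2 overlap table (invented counts, 1,000 genes in total):
# rows = changed in RNA-seq?, columns = changed in the proteome?
overlap <- matrix(c(40, 110,    # RNA changed:   protein changed / unchanged
                    60, 790),   # RNA unchanged: protein changed / unchanged
                  nrow = 2, byrow = TRUE,
                  dimnames = list(RNA = c("changed", "unchanged"),
                                  Protein = c("changed", "unchanged")))

# Chi-square test of independence (R applies Yates' continuity correction to 2x2 tables by default)
chi <- chisq.test(overlap)
chi$expected   # counts expected if RNA and protein changes were independent
chi$p.value

# Binomial test: how many of the protein-changed genes are also RNA-changed?
# Under independence, the chance of being RNA-changed is the overall RNA-changed fraction.
n_protein <- sum(overlap[, "changed"])                # 100 protein-changed genes
k_both    <- overlap["changed", "changed"]            # 40 changed at both levels
p_rna     <- sum(overlap["changed", ]) / sum(overlap) # 150 / 1000 = 0.15
bt <- binom.test(k_both, n_protein, p = p_rna, alternative = "greater")
bt$p.value
```

For 2x2 tables with small expected counts (below about 5), <code>fisher.test(overlap)</code> is a common alternative to the chi-square test.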


===Feb 29, 2020===
* Student submissions
* Paper evaluation & selection
{| class="wikitable"
* R Tutorial: Part 5. BioStat (regression & ANOVA). Lecture slides: [[File:R-tutorial-5.pdf|thumbnail]]
===March 7, 2020===
* Self Study 1 (no class): RNA-Seq analysis. '''Assignment 4 (10 pts; due 3/14/2020)''': [http://diverge.hunter.cuny.edu/~weigang/self-study-1.html Self Study 1]
* Review for mid-term exam: 6 PDF presentations (intro to NGS & 5 R tutorials)
|-
|Brittany - human genome variation || [https://doi.org/10.1186/s12864-019-5957-x Belsare, S. et al. Evaluating the quality of the 1000 genomes project data. BMC Genomics 20, 620 (2019).] || Use data from the 1000 Genomes Project to determine whether there are significant differences in variants of pain receptor genes between ethnic groups. || 2,504 individuals from 26 different ethnic groups from Africa, Asia, Europe, America
Genetic variants: rs4633, rs4680, rs4818, rs6269, rs740603, rs1051660, rs1799971, rs7958311, rs40434, rs2066713
|| whole genome sequencing, deep exome sequencing, dense microarray genotypiny; Illumina, 10X Genomics || RStudio || bar plot, pie chart, heatmap, Manhattan plot || chi-square, ANOVA ||[http://useast.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=22:19963248-19964248;v=rs4680;vdb=variation;vf=88058863 To search by variant and download table of allele frequencies and genotype frequencies]; [https://www.internationalgenome.org/data-portal/sample To download whole genomes]
|-
| Hao - microRMA biomarker || Vila-Navarro, Elena et al. “MicroRNAs for Detection of Pancreatic Neoplasia: Biomarker Discovery by Next-generation Sequencing and Validation in 2 Independent Cohorts.” [[doi:10.1097/SLA.0000000000001809|Annals of surgery vol. 265,6 (2017): 1226-1234.]]  || How to find the new microRNA-based biomarkers for early detection of pancreatic neoplasia by anakyzing the miRNome of PDAC and the IMPN.  ||A. Pancreatic tissues (n = 165);  B. Biomarker discovery in a set of 18 surgical samples (11 PDAC, 4 IPMN, 3 C).  C. MiRNA validation in 2 different set of samples. Set 1—52 surgical samples (24 PDAC, 7 IPMN, 6 chronic pancreatitis, 15 C), and set 2—95 endoscopic ultrasound-guided fine-needle aspirations (60 PDAC, 9 IPMN, 26 C).
|| Illumina Genome Analyzer IIx || Illumina GA pipeline software; R/Bioconductor||Volcano plot, Heatmap, pie chart, ROC curve || t-test || [https://www.ebi.ac.uk/biostudies/studies/S-EPMC5434964?xr=true Discriminatory power of immature granulocyte count (IG), cell-free DNA (cfDNA), mitochondrial DNA (mtDNA), nuclear DNA (ncDNA), phagocytic index (PI) and revised BAUX (rBAUX) for sepsis at different time]
 
|-
| Marvin - microbiome  || [[doi.org/10.1371/journal. pone.0227985|Marangoni, A. et al. (2020). Pharyngeal microbiome alterations during Neisseria gonorrhoeae infection. PLoS One, 15(1).]]  || Identify the bacterial community profiles of the pharyngeal microbiome associated with Neisseria gonorrhoeae infection. Does gonorrhea infection change the microbiome community in oropharynx? ||: Pharyngeal swap taken from men who have sex with men and reported having unsafe orogenital intercourse. 70 samples were taken with 45 no infection (n= 45, no infection), and 25 diagnosed with Neisseria gonorrhoeae infection (n= 25, infected). || Pharyngeal swap taken from men who have sex with men and reported having unsafe orogenital intercourse. 70 samples were taken with 45 no infection (n= 45, no infection), and 25 diagnosed with Neisseria gonorrhoeae infection (n= 25, infected).||  R package vegan, PICRUST software, Prism, MATLAB || Alpha-diversity box plot of Chao1 and Shannon index, principal coordinate analysis plot (scatterplot), pie charts, data tables ||  Non-parametric Monte Carlo-based test, Whitney t-test.|| (Requesting), NCBI Short Read Archive accession number PRJNA556341
 
|-
| Ann- disease transcriptome || Mastrokolias A, Ariyurek Y, Goeman JJ, et al. Huntington's disease biomarker progression profile identified by transcriptome sequencing in peripheral blood. [[doi:10.1038/ejhg.2014.281|Eur J Hum Genet. 2015;23(10):1349–1356.]] || Can the researchers identify accessible biomarkers for Huntington’s Disease to monitor disease progression and therapy response? || RNA blood samples - 91 mutation carriers (27 presymptomatic and, 64 symptomatic) and 33 controls || transcriptome sequencing; Illumina GA Pipeline || R  || boxplots; scatterplot  || Student’s T-tests, linear modeling function in R, linear regression model || Gene Expression Omnibus (GEO) under accession number GSE51799.
 
|-
| Vhy-Shelta - single cell transcriptome  || E. A. Stadtmauer et al., [https://pubmed.ncbi.nlm.nih.gov/32029687-crispr-engineered-t-cells-in-patients-with-refractory-cancer/ Science 10.1126/science.aba7365 (2020)] || First in human phase I clinical trial to test the safety and feasibility of multiplex CRISPR-Cas9 editing to engineer T cells in three patients with refractory cancer. Can CRISPR-Cas9 can be used as a synthetic biology cancer immunotherapy application in refractory cancer? || Experimental group: n=3, 2 patients with refractory advanced myeloma and one patient with refractory metastatic sarcoma. Types of controls: Patient Untransduced T cells and cells transduced with NY-ESO TCR without CRISPR, positive reference sample containing 1x10^3 copies of synthetic template plasmid, healthy donor (ex vivo CD4+and CD8+ T cells from patients or healthy donor controls) || Illumina seq- HiSeq400, MiSeq || iGuide, Cellranger v3.0.2, Seurat v3.1.0, Python  ||  Box plots, bar graphs, scatter plots, Venn diagrams, heat maps, cytometry plots, UMAP plots, Swimmer’s plot, piecewise linear model, computed tomography scan || Paired student t-tests, unpaired student’s t-tests, GraphPad, prism, parameter estimates, random estimates ||
Table 4S (excel file) presents the information for interpreting scRNAseq Metadata tables (4a-c)
See supplementary materials
Table 5S s Differentially expressed genes in NY-ESO-1 TCR+ T cells during various days pre and post infusion.
[https://science.sciencemag.org/content/suppl/2020/02/05/science.aba7365.DC1 Data Link (See Table 4S and Supplementary materials Table 5S)]
|-
| Andrew - wildlife transcriptome || [[ https://doi.org/10.1073/pnas.1813775116|Young, Ferkin, et al. (2019)]] Conserved transcriptomic profiles underpin monogamy across vertebrates  ||  Is there a universal transcriptomic code underlying monogamy in vertebrates? ||Sequenced and compared neural transcriptomic profiles from reproductive males of closely related monogamous and nonmonogamous species from four major classes of vertebrates (n = 3 pooled individuals per species): Mammalia (Microtus ochrogaster versus Microtus pennsylvanicus and Peromyscus californicus versus Peromyscus maniculatus); Reptilia–Aves (A. spinoletta versus P. modularis); Amphibia (Ranitomeya imitator versus Oophaga pumilio); and Actinopterygii (Xenotilapia spilotera versus Xenotilapia ornatipinnis) || ? || Bioconductor DESeq2 ||Volcano Plot, (RRHO) Rank Rank Hypergeometric Overlap, HeatMap || Spearman's Rank Correlation Coefficient, Linear Regression, Principle Component ANalysis  || Data Appendix, Dataset_S01, [https://www.pnas.org/content/suppl/2019/01/02/1813775116.DCSupplemental Dataset_S02]
 
|-
| Adam - Single cell transcriptome || Hong et al. (2019). Single-cell transcriptomics reveals multi-step adaptations to endocrine therapy. [https://www.nature.com/articles/s41467-019-11721-9 Nature Communications, 10(1), 3840.] || How do clonal genetic diversity and transcriptional plasticity contribute to early and late adaptation to endocrine therapy in luminal breast cancers? || MCF7 and long-term oestrogen-deprived (LTED) cells; primary metastatic breast cancer cells derived from pleural effusions of patients with metastatic breast cancer; and cells from 20 patients with luminal tumors treated with aromatase inhibitors (10 responders and 10 non-responders). || Illumina HiSeq 4000 || R || Box plot, volcano plot, heatmap, gene network, venn diagrams, bar graphs, line plot Statistical Analysis || Kruskal-Wallis test, two-tailed paired t-test, Wilcoxon signed-rank test, hypergeometric test, permutation test, regression || GSE122743. Supplementary Data is also provided. Description of [https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-019-11721-9/MediaObjects/%2041467_2019_11721_MOESM2_ESM.pdf supplementary data]


|-
===March 14, 2020===
| Stephanie - transcriptome ||  Vijay et al. Critical role of phospholipase A2 group IID in age-related susceptibility to severe acute respiratory syndrome-CoV infection. [https://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS5879 J Exp Med 2015 Oct 19;212(11):1851-68.] ||  What is the effect of the age-dependent anti-inflammatory factor phospholipase A2 group IID (PLA2G2D) in the lungs of older patients with severe respiratory disease? ||  6 wk to 22 mo specific pathogen–free C57BL/6 mice were purchased, and Pla2g2d−/− and Pla2g2d+/+ mice (H-2b) were generated via ES cell transfection and embryo injections with the Pla2g2d-targeting vector inserted between exons 1 and 2 of the Pla2g2d gene. Pla2g2d+/− mice were bred with C57BL/6NCrSLc (Japan SLC) mice. After initial development, mice were backcrossed more than 12 times to Japan SLC mice. Pla2g2d+/+ C57BL/6NCrSLc mice were used as controls for all experiments. Illumina MouseRef-8 v2.0 Expression BeadChip || PartekGS software and Ingenuity Pathway Analysis software ||heat map, histogram, flow cytometry plot || Data were normalized and median polished using Robust Multichip Average background correction with log2-adjusted values. After obtaining the log2 expression values for genes, significance testing was performed comparing the two groups (CD11c+ cells from lungs of young and middle-aged mice). A false discovery rate correction was applied to all p-values to adjust for multiple testing. Significance of expression differences was assessed using a false discovery rate (FDR) cutoff of 0.05 and a twofold change. || [https://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS5879 Analysis of pulmonary CD11c+ cells from 6-8 week and 10-13 month old C57BL/6 animals. CD11c+ cells are key modulators of the immune response in the lung.]
* Mid-term exam (50 pts). Open Book


|-
===March 21, 2020===
| || || || || || || || ||
* Live Session using Blackboard Collaborator
|}
* [http://cov.genometracker.org Covid-19 Genome Tracker] (developed by the Qiu Lab)
* [https://wwwnc.cdc.gov/eid/article/26/6/20-0357_article Analysis of a Covid-19 symptom onset timing]
* R Markdown Tutorial: [http://diverge.hunter.cuny.edu/~weigang/Rmarkdown-template.Rmd R markdown template (by Hector)]
* [http://diverge.hunter.cuny.edu/~weigang/self-study-2.html In-class Exercises]
* Assignment 5 (10 pts; due next session): see above link


* Paper evaluation & selection
===March 28, 2020===
* R Tutorial: Part 4. BioStat (regression & ANOVA)
* No class;  (CUNY Recalibration Period)
* No assignment due (Assignment #5 due next session on April 4)


===March 7, 2019===
===April 4, 2020===  
* Self study & prepare for mid-term (no class)
* In-class workshop: [http://diverge.hunter.cuny.edu/~weigang/self-study-3.html Self-study-3: Covid-19 cases]


===March 14, 2019===
===April 11, 2020===
* Mid-term exam (50 pts). Open Book
* Quiz II (25 pts; Open Book; R markdown-generated WORD/PDF file as submission)
* In-class workshop on identifying genes/proteins/metabolites associated with tissue regeneration
** [https://www.pnas.org/content/114/5/E717 Article link] (submission by Jenifer)
** Tutorial: [http://diverge.hunter.cuny.edu/~weigang/Case-study-for-final.html Tutorial for case study]
** Will be used for final presentation & R markdown report


===March 22, 2019===
===April 18, 2020===
* R tutorial: Section 5.3. t-test
* Reference & data sets for final project have been posted & assignments have been made: [[BioMed-R-2020#Final_project_assignment|Final_project_assignment]]
* Group presentations (Data visualization)
* Before class: read paper, download assigned Excel workbook, save data set as TSV (tab-separated file); Read into R-studio.
* During class: present the data set, including:
===March 28, 2019===
** Biological question
* (Self study; No live class)
** Experimental design: samples, sample sizes, controls
* Abstract (200 words; individualized; due 3/30)
** Experimental techniques/measurements
* Review contingency test & two-sample t-test
** Data set description, column by column
* Generate preliminary graphs
** Visualization to be made
** Statistical tests to be performed


===March 30, 2019===  
===April 25, 2020===
* 20 pts Quiz on contingency test & two-sample t-test
* [[BioMed-R-2020#Final_project_assignment|Final_project_assignment]]
* Group presentations (Show preliminary graphs)
* Tutorial: [http://diverge.hunter.cuny.edu/~weigang/Case-study-for-final.html Tutorial for case study (updated)]
* Material & Methods (due 4/6)
* Presentations of draft figures


===April 4, 2019===  
===May 2, 2020===
* 20 pts Quiz
* Self study (no live session)
* R tutorial: Section 5.4. Regression analysis
* Tutorial: [http://diverge.hunter.cuny.edu/~weigang/Case-study-for-final.html Tutorial for case study]
* Results (due 4/13)
* For final report, you are required to:
** Tables to show the dataset you work on (not all, but a sample)
** Read the paper and identify a dataset to replicate
** Figures with legend (R methods, x & y-axis, conclusion)
** Create an R markdown file to record your work
** 1-paragraph summary of your results
** Produce a final WORD or PDF file as final report
* '''Your final report (100 pts) should include the following required components''':
** (10 pts) Section 1. Background & Objectives. Describe (a) the overall goal of the study; (b) the specific question to be addressed by your dataset
** (20 pts) Section 2. Material & Methods. Describe experimental design, i.e., how your assigned data set was generated, including the nature of the biological samples, sample size, number of replicates (biological & technical), controls (if any), sequencing technologies. Hint: Fig S1
** (40 pts) Section 3. R codes & graphs. Show R codes with comments for individual commands. Graphics should be as close to the published figure as possible (e.g., with proper axis labels)
** (10 pts) Section 4. Statistical analysis. State the null hypothesis and report the p-value. Draw a statistical conclusion
** (10 pts) Section 5. Conclusion. Draw biological conclusions of your analysis
** (10 pts) Section 6. Citations/source/URL to paper, your dataset, and methods


===April 18, 2019===
===May 9, 2020===
* 20 pts Quiz. Regression analysis
* Consultation (no mandatory participation)
* Background & Introduction (due 5/4)
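Section 5.4 of the R tutorial covers regression analysis. A minimal sketch of a simple (one-predictor) least-squares fit follows, written in standard-library Python rather than R; in R the same fit is normally obtained with `lm(y ~ x)`. The function name is hypothetical, and the sketch assumes neither x nor y is constant (otherwise the slope or R² is undefined).

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2: share of the variance in y explained by the fitted line
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

print(simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))
```

For the perfectly linear example above, the fit recovers slope 2, intercept 1, and R² = 1.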


===April 25, 2019===
===May 16, 2020===
* Final presentation I. Graded on:
* Consultation (no mandatory participation)
** Objective (original & your own)
** Material & methods (original & your own)
** Results (your own)
** Conclusion (your own)
** Conclusion (due 5/11)


===May 2, 2019===
===May 22, 2020===
* Self study: Prepare your 10-slide presentation
* Friday, 5pm: Final report Due (Blackboard submission)
* No class (instructor travels)
===May 16, 2019, 9-1pm===
* Final presentation
* May 22, 2018 (Wed, 5pm) Final Report Due (hard copy; in my office or in my mailbox)

Latest revision as of 02:36, 2 May 2020

BIOL47120 Biomedical Genomics II
Spring 2020, Saturdays 9-12 noon, Hunter North Building 1001G
Instructor: Weigang Qiu, Ph.D., Professor, Department of Biological Sciences, Hunter College, CUNY; Email: weigang@genectr.hunter.cuny.edu
T.A.: Christopher Panlasigui; Hunter College; Email: christopher.panlasigui47@myhunter.cuny.edu
Office: B402 Belfer Research Building, 413 East 69th Street, New York, NY 10021, USA; Office hour: Wed 3-5pm
{| class="wikitable"
|-
! MA plot !! Volcano plot !! Heat map
|-
| Fold change (y-axis) vs. total expression levels (x-axis) || p-value (y-axis) vs. fold change (x-axis) || Genes significantly down- or up-regulated (at p < 1e-4)
|}
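The table above contrasts the axes of an MA plot and a volcano plot. As a minimal sketch (in standard-library Python rather than the course's R), the coordinates each plot needs for a single gene can be computed as below. Volcano plots conventionally put -log10(p) rather than the raw p-value on the y-axis, so the most significant genes sit at the top. The p < 1e-4 cutoff comes from the table; the function names and the pseudocount of 1 are illustrative assumptions.

```python
import math

def ma_point(expr_a, expr_b, pseudocount=1.0):
    """(A, M) coordinates of one gene for an MA plot.

    M (y-axis) is the log2 fold change of condition a over condition b;
    A (x-axis) is the average log2 expression across both conditions.
    The pseudocount avoids log2(0) for genes with zero counts."""
    log_a = math.log2(expr_a + pseudocount)
    log_b = math.log2(expr_b + pseudocount)
    return (log_a + log_b) / 2, log_a - log_b

def volcano_point(log2_fold_change, p_value, threshold=1e-4):
    """(x, y, significant) for a volcano plot: x = log2 fold change, y = -log10(p)."""
    return log2_fold_change, -math.log10(p_value), p_value < threshold
```

For example, `ma_point(8, 2, pseudocount=0)` returns `(2.0, 2.0)`: a fourfold difference corresponds to M = 2.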

Course Overview

Welcome to Introductory BioMedical Genomics, a seminar course for advanced undergraduates and graduate students. A genome is the total genetic content of an organism. Driven by breakthroughs such as the decoding of the first human genome and rapid DNA and RNA sequencing technologies, biomedical sciences are undergoing a rapid and irreversible transformation into a highly data-intensive field that requires familiarity with concepts in biology, computer science, and data science.

Genome information is revolutionizing virtually all aspects of life sciences including basic research, medicine, and agriculture. Meanwhile, use of genomic data requires life scientists to be familiar with concepts and skills in biology, computer science, as well as statistics.

This workshop is designed to introduce computational analysis of genomic data through hands-on exercises. Students are expected to be able to replicate key results of data analysis from published studies.

The pre-requisites of the course are college-level courses in molecular biology, cell biology, and genetics. Introductory courses in computer programming and statistics are preferred but not strictly required.

Learning goals

By the end of this course successful students will be able to:

  • Describe next-generation sequencing (NGS) technologies & contrast them with traditional Sanger sequencing
  • Explain applications of NGS technology, including pathogen genomics, cancer genomics, human genomic variation, transcriptomics, metagenomics, epigenomics, and microbiome studies.
  • Visualize and explore genomics data using R & RStudio
  • Replicate key results using a raw data set produced by a primary research paper

Web Links

Quizzes and Exams

Student performance will be evaluated by attendance and in-class participation, assignments, quizzes, a mid-term exam, and a final presentation & report:

  • Attendance & In-class participation: 50 pts
  • Assignments: 5 x 10 pts = 50 pts
  • Quizzes: 2 x 25 pts = 50 pts
  • Mid-term: 50 pts
  • Final presentation & report: 100 pts

Total: 300 pts

Tips for Success

To maximize your experience, we strongly recommend the following strategies:

  • Follow the directions for efficiently finding high-impact papers, reading scientific research papers, and preparing presentations.
  • Read the papers, watch required videos and do the exercises regularly, long before you attend class.
  • Attend all classes, as required. Late arrival results in loss of points.
  • Keep up with online exercises. Don’t wait until the due date to start tasks.
  • Take notes or annotate slides while attending the lectures.
  • Listen actively and participate in class and in online discussions.
  • Review and summarize material within 24 hrs after class.
  • Observe the deadlines for submitting your work. Late submissions incur penalties.
  • Put away cell phones; do not text, email, or play computer games in class.

Hunter/CUNY Policies

  • Policy on Academic Integrity

Hunter College regards acts of academic dishonesty (e.g., plagiarism, cheating on homework, online exercises or examinations, obtaining unfair advantage, and falsification of records and official documents) as serious offenses against the values of intellectual honesty. The College is committed to enforcing the CUNY Policy on Academic Integrity, and we will pursue cases of academic dishonesty according to the Hunter College Academic Integrity Procedures. Students will be asked to read this statement before exams.

  • ADA Policy

In compliance with the Americans with Disabilities Act of 1990 (ADA) and with Section 504 of the Rehabilitation Act of 1973, Hunter College is committed to ensuring educational parity and accommodations for all students with documented disabilities and/or medical conditions. It is recommended that all students with documented disabilities (Emotional, Medical, Physical, and/or Learning) consult the Office of AccessABILITY, located in Room E1214B, to secure necessary academic accommodations. For further information and assistance, please call: (212) 772-4857 or (212) 650-3230.

  • Syllabus Policy

Except for changes that substantially affect implementation of the evaluation (grading) statement, this syllabus is a guide for the course and is subject to change with advance notice, announced in class or posted on Blackboard.

Final project assignment

  • Recommended study: Rabinowitz et al. “Transcriptomic, proteomic, and metabolomic landscape of positional memory in the caudal fin of zebrafish.” PNAS 114,5 (2017)
  • Email me for permission if you prefer to work on your own data
  • Data set description & assignment
{| class="wikitable"
|-
! Data sets !! Category/Description !! Visualization/Reference !! Statistical analysis !! Assigned to
|-
| S01, S02, S03 || RNA-Seq, expression levels of all genes || Histogram, boxplot, volcano plot (Fig 1E) || t-tests || Erica
|-
| S04, S05 || RNA-Seq, transcript gradients || heatmap (Fig 1B, "RNA") || cluster analysis || Illan
|-
| S06 || RNA-Seq, middle-enriched & middle-depleted || heatmap (Fig S2, "RNA") || cluster analysis || Arkadily
|-
| S7-S12 || RNA-Seq, pathway genes (transcription factors, ion channels, membrane receptors, RA/Wnt/Fgf pathway genes) || barplots (Fig 2B-2G) || t-tests || QianFan, Andrew, Adam, Junho
|-
| S13 || Proteome, a 77-column table with technical replicates || volcano plot (Fig 1F), PCA (Fig 1D) || t-tests, PCA analysis || Brian
|-
| S14-S17 || Proteome, 4 biological replicates (with technical replicates for each) || volcano plot (Fig 1F), PCA (Fig 1D) || t-tests, PCA analysis || Hector
|-
| S18 || Proteome, combined biological & technical replicates || volcano plot (Fig 1F), PCA (Fig 1D) || t-tests, PCA analysis || Marvin
|-
| S19, S20 || Proteome, top hits || heatmap (Fig 1B, "Protein") || cluster analysis || Brittany
|-
| S21 || Proteome, gradient || heatmap (Fig S2, "Protein") || cluster analysis || Stephanie
|-
| S22, S23 || Proteome, two regions || Volcano plot || t-tests || Vhy-Shelta
|-
| S24, S25 || Proteome, dorsal-ventral-center axis || PCA plot (Fig 5B) || PCA analysis || Kiseok
|-
| S26, S27 || Metabolomics, raw & normalized counts || PCA plot (Fig 4B) || PCA analysis || Hao
|-
| S28 || Metabolomics, differential expression || heatmap (Fig 4C), volcano plot || cluster analysis || Jennifer
|-
| S29 || Metabolomics, gradient || heatmap (Fig 4C) || cluster analysis || Ann
|-
| combined analysis || RNA-seq vs. Proteome || venn diagram (Fig S3) || binomial test; chi-square analysis || Tahir
|}
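The combined RNA-seq vs. proteome analysis in the table above calls for a binomial test. Below is a minimal sketch of an exact two-sided binomial test in standard-library Python. It uses the rule R's `binom.test()` applies for two-sided alternatives: add up the probabilities of every outcome that is no more likely than the observed count. How successes and trials are defined for the Venn-diagram comparison (for instance, molecules changing in the same direction at both levels) is left to the project; that framing and the function name are assumptions.

```python
from math import comb

def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test p-value for k successes in n trials."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    # Probability of every possible count 0..n under the null hypothesis
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Sum outcomes no more likely than the observed one; the small relative
    # tolerance keeps floating-point ties from being dropped
    total = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, total)

print(binom_test_two_sided(7, 10))
```

For 7 successes in 10 trials this gives 0.34375, which should agree with the p-value of about 0.3438 reported by `binom.test(7, 10)` in R.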

Student projects & submissions

# Student & project type Citation & PubMed link Research question Study Design: samples, sample size & controls Omics tech & NGS platform Computational tools Data visualization Statistical tests Data description & links
1 Tahir - cancer microbiome Kostic, A. D., et al. (2012). Genomic analysis identifies association of Fusobacterium with colorectal carcinoma. Genome Research, 22(2), 292–298. PubMed How does the microbiome of colorectal carcinoma tissue differ from that of adjacent non-tumorous tissue? Colorectal carcinoma (Tumor) tissue and adjacent non-tumorous, nonneoplastic (Normal) tissue; 95 tumor/normal paired samples (190 total samples); adjacent nonneoplastic tissue as controls 16S rDNA amplicon sequencing; 454 GS FLX sequencing Mothur Bar plots, boxplots, scatterplots, cladogram Linear Discriminant Analysis (LDA) and Wilcoxon rank-sum test (non-parametric alternative to the t-test) NCBI Sequence Read Archive accession no. SRP000383. The pre-processed dataset can also be retrieved from the R package phyloseq: filepath = system.file("extdata", "study_1457_split_library_seqs_and_mapping.zip", package = "phyloseq"); kostic = microbio_me_qiime(filepath). The Kostic dataset is a phyloseq object (S4) consisting of sam_table, otu_table, table, phy_tree, and tax_table. The sample table includes metadata for the collected samples, including Diagnosis, Race, Gender, etc.
2 Junho - yeast transcriptome Gierliński M, Cole C, Schofield P, Schurch NJ, Sherstnev A, Singh V, Wrobel N, Gharbi K, Simpson G, Owen-Hughes T, Blaxter M, and Barton GJ. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment. Bioinformatics, 31(22):1–15, 2015. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools, edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. RNA-seq dataset to date that contains mRNA from 48 replicates of two S. cerevisiae populations: wildtype vs snf2 knock-out mutants Illumina HiSeq 2000 RStudio scatterplot, boxplot, heatmap t-test, Wald test (2 factors); LRT for multiple factor Europe SRA database accession
3 Brian - mouse platelet transcriptome Rowley, Jesse W et al. “Genome-wide RNA-seq analysis of human and mouse platelet transcriptomes.” Blood vol. 118,14 (2011) How can we use RNA-seq analysis to identify key genetic expression differences in human/mouse platelet cells? 8 - 16 mouse samples (male & female); 2 human samples (male & female) Illumina GAIIx Aligned via Novoalignment / downstream analysis done in Perl / RPKM calculation scatter plots / pie charts / RefSeq gene annotations with RPKM expression levels / histograms Spearman rank correlation analysis Mouse & Human BAM files: https://bioserver.hci.utah.edu/gnomex/analysis/(analysisPanel:2430) https://bioserver.hci.utah.edu/gnomex/analysis/(analysisPanel:2431) (LINKS don't work)
4 Qinfan - wildlife microbiome Comparing Microbiome Sampling Methods in a Wild Mammal: Fecal and Intestinal Samples Record Different Signals of Host Ecology, Evolution. Are there any differences between the microbial communities of fecal samples and intestinal mucosa? Fecal and intestinal tissue samples from 37 bats in Lamanai, Belize; 55 DNA samples, 29 intestinal and 24 guano Illumina 16S rRNA Rstudio Boxplot (alpha diversity: Shannon and Faith’s index), barplot (abundance), heatmap (test bacterial family abundance), scatterplot (beta diversity) t-test, Wilcoxon signed-rank test, and permutational multivariate analysis of variance Raw, demultiplexed 16S sequence data, available on the NCBI Sequence Read Archive under BioProject PRJNA428973; the QIIME2 mapping file and annotated feature table are available on Figshare.
4 Brittany - human genome variation Belsare, S. et al. Evaluating the quality of the 1000 genomes project data. BMC Genomics 20, 620 (2019). Use data from the 1000 Genomes Project to determine whether there are significant differences in variants of pain receptor genes between ethnic groups. 2,504 individuals from 26 different ethnic groups from Africa, Asia, Europe, America

Genetic variants: rs4633, rs4680, rs4818, rs6269, rs740603, rs1051660, rs1799971, rs7958311, rs40434, rs2066713

Whole-genome sequencing, deep exome sequencing, dense microarray genotyping; Illumina, 10X Genomics RStudio bar plot, pie chart, heatmap, Manhattan plot chi-square, ANOVA To search by variant and download a table of allele and genotype frequencies; to download whole genomes
5 Hao - microRNA biomarker Vila-Navarro, Elena et al. “MicroRNAs for Detection of Pancreatic Neoplasia: Biomarker Discovery by Next-generation Sequencing and Validation in 2 Independent Cohorts.” Annals of Surgery vol. 265,6 (2017): 1226-1234. How can new microRNA-based biomarkers for early detection of pancreatic neoplasia be found by analyzing the miRNome of PDAC and IPMN? A. Pancreatic tissues (n = 165); B. Biomarker discovery in a set of 18 surgical samples (11 PDAC, 4 IPMN, 3 C). C. MiRNA validation in 2 different sets of samples. Set 1: 52 surgical samples (24 PDAC, 7 IPMN, 6 chronic pancreatitis, 15 C), and set 2: 95 endoscopic ultrasound-guided fine-needle aspirations (60 PDAC, 9 IPMN, 26 C). Illumina Genome Analyzer IIx Illumina GA pipeline software; R/Bioconductor Volcano plot, Heatmap, pie chart, ROC curve t-test Discriminatory power of immature granulocyte count (IG), cell-free DNA (cfDNA), mitochondrial DNA (mtDNA), nuclear DNA (ncDNA), phagocytic index (PI) and revised BAUX (rBAUX) for sepsis at different time
6 Marvin - microbiome Marangoni, A. et al. (2020). Pharyngeal microbiome alterations during Neisseria gonorrhoeae infection. Identify the bacterial community profiles of the pharyngeal microbiome associated with Neisseria gonorrhoeae infection. Does gonorrhea infection change the microbiome community in the oropharynx? Pharyngeal swabs taken from men who have sex with men and who reported unsafe orogenital intercourse; 70 samples: 45 without infection (n = 45, no infection) and 25 diagnosed with Neisseria gonorrhoeae infection (n = 25, infected). R package vegan, PICRUSt software, Prism, MATLAB Alpha-diversity box plot of Chao1 and Shannon index, principal coordinate analysis plot (scatterplot), pie charts, data tables Non-parametric Monte Carlo-based test, Whitney t-test. (Requesting), NCBI Short Read Archive accession number PRJNA556341
7 Ann - disease transcriptome Mastrokolias A, Ariyurek Y, Goeman JJ, et al. Huntington's disease biomarker progression profile identified by transcriptome sequencing in peripheral blood. Eur J Hum Genet. 2015;23(10):1349–1356. Can the researchers identify accessible biomarkers for Huntington’s Disease to monitor disease progression and therapy response? RNA from blood samples: 91 mutation carriers (27 presymptomatic and 64 symptomatic) and 33 controls Transcriptome sequencing; Illumina GA Pipeline R boxplots; scatterplot Student’s t-tests, linear modeling function in R, linear regression model Gene Expression Omnibus (GEO), accession number GSE51799.
8 Vhy-Shelta - single cell transcriptome E. A. Stadtmauer et al., Science 10.1126/science.aba7365 (2020) First-in-human phase I clinical trial to test the safety and feasibility of multiplex CRISPR-Cas9 editing to engineer T cells in three patients with refractory cancer. Can CRISPR-Cas9 be used as a synthetic biology cancer immunotherapy application in refractory cancer? Experimental group: n=3, 2 patients with refractory advanced myeloma and one patient with refractory metastatic sarcoma. Types of controls: patient untransduced T cells and cells transduced with NY-ESO TCR without CRISPR, positive reference sample containing 1x10^3 copies of synthetic template plasmid, healthy donor (ex vivo CD4+ and CD8+ T cells from patients or healthy donor controls) Illumina HiSeq 4000, MiSeq iGuide, Cellranger v3.0.2, Seurat v3.1.0, Python Box plots, bar graphs, scatter plots, Venn diagrams, heat maps, cytometry plots, UMAP plots, Swimmer’s plot, piecewise linear model, computed tomography scan Paired Student’s t-tests, unpaired Student’s t-tests, GraphPad Prism, parameter estimates, random estimates

Table 4S (Excel file) presents the information for interpreting the scRNA-seq metadata tables (4a-c). See supplementary materials. Table 5S: Differentially expressed genes in NY-ESO-1 TCR+ T cells on various days pre- and post-infusion. Data Link (see Table 4S and Supplementary Materials Table 5S)

8 Andrew - wildlife transcriptome Young, Ferkin, et al. (2019). Conserved transcriptomic profiles underpin monogamy across vertebrates. PNAS 116(4):1331 (https://www.pnas.org/content/116/4/1331.long) Is there a universal transcriptomic code underlying monogamy in vertebrates? Sequenced and compared neural transcriptomic profiles from reproductive males of closely related monogamous and nonmonogamous species from four major classes of vertebrates (n = 3 pooled individuals per species): Mammalia (Microtus ochrogaster versus Microtus pennsylvanicus and Peromyscus californicus versus Peromyscus maniculatus); Reptilia–Aves (A. spinoletta versus P. modularis); Amphibia (Ranitomeya imitator versus Oophaga pumilio); and Actinopterygii (Xenotilapia spilotera versus Xenotilapia ornatipinnis) ? Bioconductor DESeq2 Volcano Plot, (RRHO) Rank Rank Hypergeometric Overlap, HeatMap Spearman's Rank Correlation Coefficient, Linear Regression, Principal Component Analysis Data Appendix, Dataset_S01, Dataset_S02
9 Adam - Single cell transcriptome Hong et al. (2019). Single-cell transcriptomics reveals multi-step adaptations to endocrine therapy. Nature Communications, 10(1), 3840. How do clonal genetic diversity and transcriptional plasticity contribute to early and late adaptation to endocrine therapy in luminal breast cancers? MCF7 and long-term oestrogen-deprived (LTED) cells; primary metastatic breast cancer cells derived from pleural effusions of patients with metastatic breast cancer; and cells from 20 patients with luminal tumors treated with aromatase inhibitors (10 responders and 10 non-responders). Illumina HiSeq 4000 R Box plot, volcano plot, heatmap, gene network, venn diagrams, bar graphs, line plot Statistical Analysis Kruskal-Wallis test, two-tailed paired t-test, Wilcoxon signed-rank test, hypergeometric test, permutation test, regression GSE122743. Supplementary Data is also provided. Description of supplementary data
10 Stephanie - transcriptome Vijay et al. Critical role of phospholipase A2 group IID in age-related susceptibility to severe acute respiratory syndrome-CoV infection. J Exp Med 2015 Oct 19;212(11):1851-68. What is the effect of the age-dependent anti-inflammatory factor phospholipase A2 group IID (PLA2G2D) in the lungs of older patients with severe respiratory disease? 6 wk to 22 mo specific pathogen–free C57BL/6 mice were purchased, and Pla2g2d−/− and Pla2g2d+/+ mice (H-2b) were generated via ES cell transfection and embryo injections with the Pla2g2d-targeting vector inserted between exons 1 and 2 of the Pla2g2d gene. Pla2g2d+/− mice were bred with C57BL/6NCrSLc (Japan SLC) mice. After initial development, mice were backcrossed more than 12 times to Japan SLC mice. Pla2g2d+/+ C57BL/6NCrSLc mice were used as controls for all experiments. Illumina MouseRef-8 v2.0 Expression BeadChip ? PartekGS software and Ingenuity Pathway Analysis software heat map, histogram, flow cytometry plot Data were normalized and median polished using Robust Multichip Average background correction with log2-adjusted values. After obtaining the log2 expression values for genes, significance testing was performed comparing the two groups (CD11c+ cells from lungs of young and middle-aged mice). A false discovery rate correction was applied to all p-values to adjust for multiple testing. Significance of expression differences was assessed using a false discovery rate (FDR) cutoff of 0.05 and a twofold change. Analysis of pulmonary CD11c+ cells from 6-8 week and 10-13 month old C57BL/6 animals. CD11c+ cells are key modulators of the immune response in the lung.
11 Hector - transcriptome Kesherwani, V., Shahshahan, H. R., & Mishra, P. K. (2017). Cardiac transcriptome profiling of diabetic Akita mice using microarray and next generation sequencing. PLoS ONE, 12(8), e0182828. 1. Identify molecular changes in cardiac transcriptomes that differ between insulin-negative (diabetic) and wild-type mice. 2. Determine the implications of these molecular changes for several heart failure signaling pathways. Cardiac tissue RNA exons and introns of insulin-negative Ins2+/- Akita mice and normoglycemic (WT) mice; WT n = 3, Ins2+/- n = 3, N = 6 Microarray, IPA analysis, and transcriptome sequencing Rstudio, Ingenuity Pathway Analyses (IPA) Bar plots, IPA network graph t-test Data are available from the Gene Expression Omnibus (GEO), and an R script is provided
12 Kiseok Yang - cancer methylome Dai W et al., Comparative methylome analysis in solid tumors reveals aberrant methylation at chromosome 6p in nasopharyngeal carcinoma, Cancer Medicine 2015, 4(7):1079–1090, PMC: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4529346/ 25 NPC tumors were included in the discovery set and 35 NPC tumors in the validation set. Methylome (epigenome) Illumina (HumanMethylation450 BeadArray) R, using the limma analysis package Boxplots (with jitter), pie charts, bar charts (normal and stacked position) Mann–Whitney U-test Information on the microarray probes, including their chromosomal locations, strand type (forward or reverse), and a brief description of function.

https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE62336&format=file

13 Jennifer - regeneration transcriptome/proteome Rabinowitz et al. “Transcriptomic, proteomic, and metabolomic landscape of positional memory in the caudal fin of zebrafish.” PNAS 114,5 (2017) What molecules regulate the positional memory responsible for appendage regeneration in zebrafish? Proximal and distal RNA-seq and LFQ proteomics datasets (transcripts and proteins present in similar patterns in both datasets), 32 molecules, Actb2 was used as a loading control transcriptome, proteome, metabolome Illumina g:Profiler bar graphs, heat maps, volcano plots t-test GSE92760
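Stephanie's entry above describes filtering differentially expressed genes by a false discovery rate cutoff of 0.05 together with a twofold change. The entry does not name the FDR procedure; assuming Benjamini-Hochberg (what R computes with `p.adjust(p, method = "BH")`), the filter could be sketched in standard-library Python as follows. The `(name, log2_fold_change, p_value)` input layout and the gene names in the example are hypothetical.

```python
import math

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(p_values)
    # Walk from the largest p-value down, keeping a running minimum
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = m - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_genes(genes, fdr=0.05, min_fold=2.0):
    """Names of genes with adjusted p < fdr and |fold change| >= min_fold.

    genes: list of (name, log2_fold_change, p_value) tuples."""
    adjusted = benjamini_hochberg([p for _, _, p in genes])
    min_log2 = math.log2(min_fold)  # twofold change means |log2 FC| >= 1
    return [name for (name, lfc, _), q in zip(genes, adjusted)
            if q < fdr and abs(lfc) >= min_log2]
```

For example, `significant_genes([("geneA", 1.5, 0.001), ("geneB", 0.5, 0.0001), ("geneC", -2.0, 0.5)])` keeps only `"geneA"`: geneB passes the FDR cutoff but changes less than twofold, and geneC fails the FDR cutoff.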

Course Schedule

Feb 1, 2020

  • Introduction
  • R Tutorial 1: User interface, basic operations, loading data. Slides:
Assignment 1 (10 pts; Due next class 2/8, in hard copy)
  • (3 pts) Print a copy of your first R script, with proper annotations
  • (3 pts) Transform the following "untidy/wide" table into a "tidy/tall" table (print a hard copy)
PropertyName,Density_250m,Density_500m,Density_1000m
HighbridgePark,0.006561319,0.009462031,0.010578611
BronxRiverParkway,0.001318749,0.001978858,0.002652118
CrotonaPark,0.009412087,0.01164712,0.01202321
ClaremontPark,0.016391948,0.019972485,0.020350481
VanCortlandtPark,0.000550151,0.000979312,0.001372675
  • (4 pts) Make a single slide of a primary research paper using next-generation sequencing (NGS) technologies, show the following
    • proper citation (authors, title, year, journal, URL)
    • NGS method (Illumina, PacBio, or NanoPore)
    • NGS application (genomics, cancer, transcriptome, microbiome, proteome, metagenomics, human variation, etc)
    • a key figure, with a caption explaining x-axis, y-axis, samples, experiments
    • raw data table (show first few columns and first few rows)
    • For example, for a student working on tissue regeneration, a PubMed search with the keywords "regeneration zebra fish transcriptome" found the following primary paper to be the best choice because of the high quality of the journal and the availability of raw data: https://www.ncbi.nlm.nih.gov/pubmed/28096348
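The wide-to-tall transformation asked for in Assignment 1 can be sketched as follows. The course expects this in R (for example with tidyr's `pivot_longer()`); this hypothetical standard-library Python helper, `wide_to_tall`, works on the five-park table given in the assignment and produces one (PropertyName, variable, value) row per density measurement.

```python
import csv
import io

# The wide table from Assignment 1
WIDE = """PropertyName,Density_250m,Density_500m,Density_1000m
HighbridgePark,0.006561319,0.009462031,0.010578611
BronxRiverParkway,0.001318749,0.001978858,0.002652118
CrotonaPark,0.009412087,0.01164712,0.01202321
ClaremontPark,0.016391948,0.019972485,0.020350481
VanCortlandtPark,0.000550151,0.000979312,0.001372675
"""

def wide_to_tall(csv_text, id_column):
    """Melt a wide CSV table into (id, variable, value) tuples, one per measurement."""
    reader = csv.DictReader(io.StringIO(csv_text))
    tall = []
    for row in reader:
        for column, value in row.items():
            if column != id_column:
                tall.append((row[id_column], column, float(value)))
    return tall

tall = wide_to_tall(WIDE, "PropertyName")
for record in tall[:3]:
    print(record)
```

The 5 x 3 wide table becomes 15 tall rows, with the three former column names moved into the "variable" field.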

Feb 8, 2020

Assignment 2 (10 pts; Due next class 2/15, in hard copy)
  • (3 pts) Print a copy of your 2nd R script, with proper annotations
  • (4 pts) Show the following commands with the chaining operator ("%>%") for the "iris" data set (4 individual commands, not a single one)
    • Select columns "Sepal.Length" & "Species"
    • Filter rows 2 through 10
    • Add a column "logSepalLength" by taking the logarithm of the said column
    • Calculate mean and standard deviation of Petal.Length in each species
  • (3 pts) Transform the "iris" data table into a "tidy/tall" table (manually, show first 10 rows, print a hard copy)
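The four chaining steps in Assignment 2 correspond to dplyr's `select()`, `slice()` (for "rows 2 through 10"; `filter()` is meant for logical conditions), `mutate()`, and `group_by()` + `summarise()`. A rough standard-library Python analogue is sketched below. The helper names are hypothetical, and the rows are a few illustrative iris-shaped records rather than the real data set. R's `log()` is the natural logarithm and R's `sd()` uses the n - 1 denominator; `math.log` and `statistics.stdev` match both.

```python
import math
from statistics import mean, stdev

# Illustrative rows shaped like R's built-in iris data (values for demonstration only)
rows = [
    {"Sepal.Length": 5.1, "Petal.Length": 1.4, "Species": "setosa"},
    {"Sepal.Length": 4.9, "Petal.Length": 1.6, "Species": "setosa"},
    {"Sepal.Length": 7.0, "Petal.Length": 4.7, "Species": "versicolor"},
    {"Sepal.Length": 6.4, "Petal.Length": 4.5, "Species": "versicolor"},
]

def select(rows, columns):
    """Keep only the named columns (like dplyr::select)."""
    return [{c: r[c] for c in columns} for r in rows]

def slice_rows(rows, first, last):
    """Keep rows first..last, 1-based and inclusive (like dplyr::slice(first:last))."""
    return rows[first - 1:last]

def add_log_column(rows, source, target):
    """Add a natural-log column (like dplyr::mutate(target = log(source)))."""
    return [{**r, target: math.log(r[source])} for r in rows]

def summarize_by(rows, group, column):
    """Mean and sample SD of `column` within each group (like group_by + summarise).

    Each group needs at least two rows for the SD."""
    groups = {}
    for r in rows:
        groups.setdefault(r[group], []).append(r[column])
    return {g: (mean(v), stdev(v)) for g, v in groups.items()}
```

For example, `summarize_by(rows, "Species", "Petal.Length")` returns a mean of about 1.5 and an SD of about 0.14 for the two illustrative setosa rows.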

Feb 15, 2020

  • NGS presentations (Group II)
  • R Tutorial. Part 3. Data visualization with ggplot2. Slides:
  • No assignment (go over slides and 3 tutorial scripts to prepare for Quiz next week)

Feb 22, 2020

  • Quiz 1 (Open Book)
  • R Tutorial: Part 4. BioStat (chi-square & t-test) Lecture slides:
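The two tests from Part 4 of the R tutorial can be worked by hand, which makes the formulas behind R's `chisq.test()` and `t.test()` explicit. This is a standard-library Python sketch under stated assumptions. The chi-square function omits Yates's continuity correction (R applies it to 2x2 tables by default, so compare with `chisq.test(x, correct = FALSE)`). The t-test function returns only the Welch t statistic and degrees of freedom (Welch is R's default, `var.equal = FALSE`), because the Python standard library has no t distribution for the p-value. Function names are hypothetical.

```python
import math
from statistics import mean, variance

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value) with 1 degree of freedom."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

For the table [[10, 20], [20, 10]] every expected count is 15, giving a chi-square statistic of 20/3 (about 6.67) and p of about 0.0098. For x = [1, 2, 3] and y = [4, 5, 6], t is about -3.67 with 4 degrees of freedom.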
Assignment 3 (10 pts). In-class workshop. Evaluation of papers according to the following rubrics (submit by email)
  • Citation & PubMed Link
  • Main research question
  • Samples, sample sizes, & controls
  • Omics technologies (e.g., genomics, metagenomics, microbiome, transcriptome, proteome, methylome, RNA-seq, 16S amplicon sequencing)
  • Sequencing platform (e.g., Illumina, PacBio, Nanopore)
  • Main computational tools (e.g., R, RStudio, QIIME)
  • Main graphics (e.g., scatterplot, boxplot, heatmap, volcano plot)
  • Main statistical analysis (e.g., t-test, chi-square, regression analysis)
  • Data set: a short description & links
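The chi-square test from Part 4 can be run in base R with chisq.test() on a contingency table. The sketch below uses the built-in iris data to ask whether species is associated with having a sepal longer than 5.8 cm. The 5.8 cm cut-off is an arbitrary choice for illustration only.

```r
# Chi-square test of independence on a 3 x 2 contingency table (base R only)
long_sepal <- iris$Sepal.Length > 5.8            # TRUE/FALSE for each flower
counts <- table(Species = iris$Species, LongSepal = long_sepal)
print(counts)                                    # observed counts

# H0: species and sepal-length class are independent
chi <- chisq.test(counts)
print(chi$expected)                              # expected counts under H0
print(chi)                                       # X-squared statistic, df, p-value
```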

Feb 29, 2020

March 7, 2020

  • Self Study 1 (no class): RNA-Seq analysis. Assignment 4 (10 pts; due 3/14/2020): Self Study 1
  • Review for mid-term exam: 6 PDF presentations (intro to NGS & 5 R-tutorials)

March 14, 2020

  • Mid-term exam (50 pts). Open Book

March 21, 2020

March 28, 2020

  • No class (CUNY Recalibration Period)
  • No assignment due (Assignment #5 due next session on April 4)

April 4, 2020

April 11, 2020

  • Quiz II (25 pts; Open Book; R markdown-generated WORD/PDF file as submission)
  • In-class workshop on identifying genes/proteins/metabolites associated with tissue regeneration

April 18, 2020

  • The reference paper and data sets for the final project have been posted, and assignments have been made: Final_project_assignment
  • Before class: read the paper, download your assigned Excel workbook, save the data set as a TSV (tab-separated values) file, and read it into RStudio.
  • During class: present the data set, including:
    • Biological question
    • Experimental design: samples, sample sizes, controls
    • Experimental techniques/measurements
    • Data set description, column by column
    • Visualization to be made
    • Statistical tests to be performed
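Base R's read.delim() can read a tab-separated file into R. Its defaults are sep = "\t" and header = TRUE. So that the sketch runs on its own, it first writes a small TSV to a temporary file. With your assigned data, you would instead pass the path of the TSV you saved from Excel. The file name in the comment is hypothetical.

```r
# Write a small example TSV (first 6 rows of iris) to a temporary file
tsv_path <- tempfile(fileext = ".tsv")
write.table(head(iris), tsv_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back; for the project, e.g. read.delim("my_dataset.tsv") (hypothetical file name)
dat <- read.delim(tsv_path)

str(dat)   # column names and types
head(dat)  # first rows
```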

April 25, 2020

May 2, 2020

  • Self study (no live session)
  • Tutorial: Tutorial for case study
  • For the final report, you are required to:
    • Read the paper and identify a data set to replicate
    • Create an R Markdown file to record your work
    • Produce a WORD or PDF file as the final report
  • Your final report (100 pts) should include the following required components:
    • (10 pts) Section 1. Background & Objectives. Describe (a) the overall goal of the study and (b) the specific question addressed by your data set
    • (20 pts) Section 2. Material & Methods. Describe experimental design, i.e., how your assigned data set was generated, including the nature of the biological samples, sample size, number of replicates (biological & technical), controls (if any), sequencing technologies. Hint: Fig S1
    • (40 pts) Section 3. R code & graphs. Show your R code, with comments for individual commands. Graphics should be as close to the published figure as possible (e.g., with proper axis labels)
    • (10 pts) Section 4. Statistical analysis. State the null hypothesis and report the p-value. Draw a statistical conclusion
    • (10 pts) Section 5. Conclusion. Draw biological conclusions from your analysis
    • (10 pts) Section 6. Citations/source/URL to paper, your dataset, and methods
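Section 4 asks for a null hypothesis, a p-value, and a statistical conclusion. Below is a minimal base-R sketch of that pattern. It uses a two-sample t-test on the built-in iris data rather than any assigned data set. The 0.05 significance level is a conventional choice and is not specified in the rubric.

```r
# Compare mean petal length between two iris species
setosa     <- iris$Petal.Length[iris$Species == "setosa"]
versicolor <- iris$Petal.Length[iris$Species == "versicolor"]

# H0: the two species have equal mean petal length
# H1: the means differ (two-sided); t.test() uses Welch's test by default
res <- t.test(setosa, versicolor)

alpha <- 0.05
cat("p-value:", format.pval(res$p.value), "\n")
if (res$p.value < alpha) {
  cat("Reject H0 at alpha =", alpha, "\n")
} else {
  cat("Fail to reject H0 at alpha =", alpha, "\n")
}
```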

May 9, 2020

  • Consultation (no mandatory participation)

May 16, 2020

  • Consultation (no mandatory participation)

May 22, 2020

  • Friday, 5pm: Final report Due (Blackboard submission)