Summer 2018

From QiuLab

Revision as of 17:09, 26 June 2018

Rules of Conduct

  1. No eating, drinking, or loud talking in the lab. Socialize in the lobby only.
  2. Be respectful to each other, regardless of level of study.
  3. Be on time and responsible. Tell the PI in advance if you will be late or absent.

Participants

  1. Dr Oliver Attie, Research Associate
  2. Brian Sulkow, Research Associate
  3. Saymon Akther, CUNY Graduate Center, EEB Program
  4. Lily Li, CUNY Graduate Center, EEB Program
  5. Mei Wu, Bioinformatics Research Assistant
  6. Yinheng Li, Informatics Research Assistant
  7. Christopher Panlasigui, Hunter Biology
  8. Dr Lia Di, Senior Scientist
  9. Dr Weigang Qiu, Principal Investigator
  10. Summer Interns: Muhammad, Pavan, Roman, Benjamin, Andrew, Michelle, Hannah

Journal Club

  1. A Unix & Perl tutorial
  2. A short introduction to molecular phylogenetics: http://www.ncbi.nlm.nih.gov/pubmed/12801728
  3. A review on Borrelia genomics: https://www.ncbi.nlm.nih.gov/pubmed/24704760
  4. ospC epitope mapping: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0067445

Projects

Borrelia genome evolution (Led by Saymon)

  1. Goal 1. Estimate time of cross-Atlantic dispersal using core-genome sequences
  2. Goal 2. Investigate codon biases with respect to levels of gene expression. Data file:

Identification of host species from ticks (Led by Lily [after first-level])

  1. Goal 1. Protocol optimization for PCR amplification of host DNA from ticks
  2. Goal 2. Protocol development: library construction for MiSeq
  3. Goal 3. Development of bioinformatics protocols and sequence database

Pseudomonas Genome-wide Association Studies (GWAS) (Led by Mei & Yinheng, in collaboration with Dr Xavier of MSKCC)

  1. Goal 1. Association of genes/SNPs with biofilm formation and c-di-GMP levels: Manuscript preparation
  2. Goal 2. Association of genome diversity with metabolic diversity
  • (Christopher) This script parses a tab-delimited peak-area file exported from Excel into database input (option -d) or R input (option -r)
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Getopt::Std;

my %opts;
my $line_ct = 0;
my (@colnames, @areas, %seen_cmps, %seen_gids);

getopts('dr', \%opts);
while(<>) {
    chomp;
    $line_ct++;
    next unless $line_ct >=4;
    if ($line_ct == 4) {
        @colnames = split "\t", $_;
        for (my $i=5; $i<=$#colnames; $i++) { $seen_gids{$colnames[$i]}++ } # get uniq gids
        next;
    }
    my @data = split "\t", $_;
    $seen_cmps{$data[1]}++; # get unique compound formula
    for (my $i=5; $i<=$#data; $i++) {
        my $area = { 'compound' => $data[1], 'gid' =>$colnames[$i],  'peak_area' => $data[$i]};
        push @areas, $area;
    }
}

if ($opts{d}) { # for database output
    foreach my $cmp (sort keys %seen_cmps) {
        foreach my $gid (sort keys %seen_gids) {
            # gids are column names (strings), so compare with eq rather than ==
            my @peaks = grep { $_->{compound} eq $cmp && $_->{gid} eq $gid } @areas;
            my $peak_str = join ",", map {$_->{peak_area} || "NULL"} @peaks;
            print join "\t", ($gid, $cmp, "{" . $peak_str . "}");
            print "\n";
        }
    }
}

if ($opts{r}) { # for R output
    foreach my $cmp (sort keys %seen_cmps) {
        foreach my $gid (sort keys %seen_gids) {
            # gids are column names (strings), so compare with eq rather than ==
            my @peaks = grep { $_->{compound} eq $cmp && $_->{gid} eq $gid } @areas;
            foreach my $peak (@peaks) {
                next unless $peak->{peak_area};
                print join "\t", ($peak->{peak_area}, $gid, $cmp);          
                print "\n";
            }
        }
    }
}
exit;
Compound amount in each genome
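
The three-column output printed by the -r option above (peak area, genome id, compound) can be read straight into R. The lines below are a minimal sketch, not the lab's actual analysis: the rows in `txt`, the column names, and the choice of a per-genome mean are all assumptions made for illustration.

```r
# Made-up rows in the layout printed by the -r option:
# peak_area <TAB> genome id <TAB> compound formula
txt <- "1200\tg1\tC6H12O6
1500\tg1\tC6H12O6
800\tg2\tC6H12O6
300\tg1\tC3H6O3"

# Column names are hypothetical; the script itself prints no header line
peaks <- read.table(text = txt, sep = "\t", header = FALSE,
                    col.names = c("peak_area", "gid", "compound"),
                    stringsAsFactors = FALSE)

# Mean peak area for each genome/compound combination
avg <- aggregate(peak_area ~ gid + compound, data = peaks, FUN = mean)
print(avg)
```

With a real file, `read.table(text = txt, ...)` would be replaced by `read.table("some-file.txt", ...)`, where the file name is whatever the script's output was redirected to.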

Machine learning approaches to evolution (Led by Oliver & Brian)

OspC structural alignment, converted from S2 from Baum et al (2013)
  1. Goal 1. Implement Hopfield network for optimization of protein structure
  2. Goal 2. Neural-net models of OspC. Structural alignment (S2 from Baum et al 2013):
  3. Goal 3. K-mer-based pipeline for genome classification

Weekly Schedule

  • Summer kickoff (June 1, 2018, Friday): Introduction & schedules
  • Week 1 (June 4-8):
    • Monday: the Unix & Perl Tutorial, Part 1
    • Tuesday: Unix Part 2. Explore the "iris" data set using R, by following the Monte Carlo Club Week 1 (1 & 2) and Week 2. Read MacKay (2003), Chapters 38 & 39
    • Thursday: 1st field day (Caumsett State Park); Participants: John, Muhammad, Pavan, Andrew, Dr Sun, Weigang, with 3 members of Moses team from Suffolk County Vector Control. Got ~110 deer tick nymphs
    • Friday: meeting with MSKCC group at 11am; BBQ afterwards
  • Week 2 (June 11-15):
    • Monday: Lab meeting, projects assigned
    • Tuesday: neural net tutorial (by Brian)
    • Thursday: 2nd field day (Fire Island National Seashore). Participants: John, Brian, Mei, Muhammad, Pavan, Benjamin, and Weigang. Got ~100 lone-star ticks and 4 deer tick nymphs
  • Week 3 (June 18-22):
    • Monday: Lab meeting, 1st project reports
      • Codon Bias: Theory, Coding, and Data (Andrew, Pavan, Saymon)
      • OspC epitope identification: Serum correlation, sequence correlation, immunity-sequence correction (Muhammad, Roman, Brian)
      • Pseudomonas metabolomics: parsing intensity file; theory & parsing SBML file (Chris & Benjamin)
    • Tuesday: working groups
    • Wednesday: working groups
    • Thursday: Big Data Workshop
    • Friday: working groups
  • Week 4 (June 25-29):
    • Monday: Lab meeting

Lab notes for Summer HS Interns

Notes & Scripts

# preliminaries: save as TSV; substitute "\r" if necessary; 
# substitute "N/A" to "NA"; remove extra columns
setwd("Downloads/")
x <- read.table("table-s2.txt4", sep="\t", header=T)
View(x)
colnames(x)
which(x[,8]=="A")
x[which(x[,8]=="A"),12]
x[which(x[,8]=="A3"),12]
# save the test result so the estimate can be read back from it
x.cor <- cor.test(x[which(x[,8]=="A3"),12], x[which(x[,8]=="A"),12], method = "pearson")
x.cor$estimate
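levels(factor(x[,8])) # obtain ospC allele types; to be looped through in pairwise (factor() is needed when read.table returns strings, as in R >= 4.0)
# template for the pairwise loop; fill in the bounds and the cor.test() arguments:
# for (i in 1:?) { for (j in ?:?) {cor.test(....)}}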
  • (Muhammad) Generates a data frame of pairwise correlations and p-values for the 23 OspC allele types
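setwd("C:/R_OspC")
x <- read.table("Table-S2.txt", sep="\t", header=T)
# allele types are in column 8; factor() keeps this working when read.table returns strings (R >= 4.0)
a <- levels(factor(x[,8]))
n <- length(a)  # 23 allele types in this data set
output <- data.frame(i=character(), j=character(), cor = numeric(), p = numeric())
for (i in 1:(n-1)) {
  allele.i <- a[i]
  vect.i <- x[which(x[,8]==allele.i),12]

  for (j in (i+1):n) {
    allele.j <- a[j]
    vect.j <- x[which(x[,8]==allele.j),12]
    # Pearson correlation between the column-12 values of the two alleles;
    # the result is named ct so that it does not mask the cor() function
    ct <- cor.test(vect.i, vect.j, method = "pearson")
    output <- rbind(output, data.frame(i=allele.i, j=allele.j, cor=ct$estimate, p=ct$p.value))
  }
}
write.table(output, "immune-output.txt", quote = F, sep = "\t")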
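
The nested loop above visits every unordered pair of allele types once. As a rough alternative sketch, `combn()` can enumerate the same pairs in the same order. The data frame `toy` and its columns `allele` and `value` are made-up stand-ins for columns 8 and 12 of Table S2, and the sketch assumes every allele has the same number of observations, as `cor.test()` requires.

```r
# Toy stand-in for Table S2: 3 allele types x 5 observations (random values)
set.seed(1)
toy <- data.frame(allele = rep(c("A", "B", "C"), each = 5),
                  value  = rnorm(15),
                  stringsAsFactors = FALSE)

alleles <- sort(unique(toy$allele))
pairs <- combn(alleles, 2)   # 2 x choose(n, 2) matrix, one allele pair per column

# One row of results per pair, bound together at the end
res <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(k) {
  vi <- toy$value[toy$allele == pairs[1, k]]
  vj <- toy$value[toy$allele == pairs[2, k]]
  ct <- cor.test(vi, vj, method = "pearson")
  data.frame(i = pairs[1, k], j = pairs[2, k],
             cor = unname(ct$estimate), p = ct$p.value)
}))
print(res)
```

Building the rows in a list and binding them once avoids growing `output` with `rbind()` on every iteration, which matters little for 23 alleles but scales better.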
  • (Muhammad) Plots the lab's correlation values against the authors' values
# read in the authors' cross-reactivity correlation matrix
cr <- read.csv("C:/ospc/matricesospc.csv", header=F, sep = ",")
# collect the upper-triangle values of cr, row by row, into corvect
# (assumes the rows and columns of cr follow the same allele order as output)
corvect <- c()
for (i in 1:(nrow(cr)-1)) {
  for (j in (i+1):ncol(cr)) {
    corvect[length(corvect)+1] <- cr[i,j]
  }
}
# combine our pairwise correlations with the authors' values
df <- data.frame(output, corvect)
# scatter plot of the lab's correlations against the authors'
plot(output[,3], corvect, main="Cross Reactivity Correlation Comparison", ylab = "Authors' Output", xlab="Lab Output")
# fit a linear model relating the authors' values to ours
b <- lm(corvect~output[,3])
# add the fitted regression line to the plot
abline(b, col=2)
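
The loop that fills `corvect` reads the upper triangle of the matrix row by row. A possible vectorized equivalent is sketched below on a toy 4 x 4 matrix `m` (not the authors' data). Note that `m[upper.tri(m)]` would return the same values in column order, which would no longer line up with the rows of `output`.

```r
# Toy matrix standing in for the authors' correlation matrix
m <- matrix(1:16, nrow = 4, byrow = TRUE)

# Loop version, as in the script above
v.loop <- c()
for (i in 1:(nrow(m) - 1)) {
  for (j in (i + 1):ncol(m)) {
    v.loop[length(v.loop) + 1] <- m[i, j]
  }
}

# Vectorized version: the lower triangle of the transpose, read column by
# column, is the upper triangle of m read row by row
v.vec <- t(m)[lower.tri(m)]

stopifnot(identical(v.loop, v.vec))
print(v.vec)
```

For the real data, `cr` is a data frame, so it would first need `as.matrix(cr)` before `t()` and `lower.tri()` are applied.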