Summer 2019

Latest revision as of 22:28, 5 July 2019

Rules of Conduct

No eating, drinking, or loud talking in the lab. Socialize in the lobby only.
Be respectful to one another, regardless of level of study.
Be on time and responsible. Let the PI know in advance if you will be late or absent.
No use of phones or laptops during lab meetings.

Schedule

June 19 (Wed). Summer research kickoff; papers assigned. To prepare for the Python tutorial, install Jupyter Notebook in one of the following two ways (instructions by Edgar):
- Installing the Anaconda Distribution (https://www.anaconda.com/distribution/#download-section): This is the easiest way to install Python on your machine, and it comes with many packages for data science. However, it is large (~3 GB), so if disk space is an issue, install Miniconda instead. If you install Anaconda, you don't need to install any additional packages, because they are installed automatically. Make sure you download the Python 3 version.
- Installing Miniconda3 (https://docs.conda.io/en/latest/miniconda.html): This is a minimal version of Anaconda that comes with only the conda package manager and Python. It doesn't include the data-science packages, so it requires less space.
  - Installing on macOS: https://docs.conda.io/projects/conda/en/latest/user-guide/install/macos.html
  - Installing on Windows: https://docs.conda.io/projects/conda/en/latest/user-guide/install/windows.html
  - Once Miniconda is installed, use the conda command in your terminal to install the other packages:

conda install numpy
conda install pandas
conda install matplotlib
conda install jupyter
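
After installation, a quick check that Python 3 is active and that the packages can be found may save time at the first tutorial. This is a minimal sketch using only the standard library; it assumes the Jupyter Notebook package is importable under the module name `notebook`, and it only checks whether each module can be located, not which version is installed.

```python
import importlib.util
import sys

# Packages installed with conda in the steps above.
# Assumption: Jupyter Notebook is importable under the module name "notebook".
PACKAGES = ["numpy", "pandas", "matplotlib", "notebook"]


def check_environment(packages=PACKAGES):
    """Map each package name to True if Python can locate it, without importing it."""
    # find_spec returns None when a top-level module cannot be found
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])  # should start with 3
    for name, found in check_environment().items():
        print(f"{name:<12}{'OK' if found else 'missing'}")
```

The script can be pasted into a notebook cell or saved under any file name and run from the terminal; a package reported as missing can be installed with the conda commands above.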

June 21 (Fri). Python Tutorial I. Jupyter Notebook, strings, lists, conditionals, loops (by Edgar)
June 24 (Mon). Python Tutorial II. String functions, functions, dictionaries, modules (by Edgar)
June 26 (Wed). Python Tutorial III. Biopython (by Edgar & Muhammad)
June 27 (Thur). Paper presentations

Participants

Dr Oliver Attie, Research Associate
Brian Sulkow, Research Associate
Saymon Akther, CUNY Graduate Center, EEB Program
Lily Li, CUNY Graduate Center, EEB Program
Christopher Panlasigui, Hunter Biology
Summer Interns: Muhammad, Radhika Mohan, Oscar Eng, Oliver Cai

Journal Club

A Unix & Perl tutorial
A short introduction to molecular phylogenetics: http://www.ncbi.nlm.nih.gov/pubmed/12801728
A review of Borrelia genomics: https://www.ncbi.nlm.nih.gov/pubmed/24704760
A model of immune selection: He et al. (2018). https://www.nature.com/articles/s41467-018-04219-3
A model of flu evolution: Neher et al. (2016). https://www.pnas.org/content/113/12/E1701?ijkey=72c6025e999dd043d32f6822dc06c7356d8494b2&keytype2=tf_ipsecsha
Readings on Dengue virus:
1. Bäck & Lundkvist 2013
2. Overview (from Scitable)

Projects

Project	Description/Goal	Participants	Leader	Status/Notes/Weekly report (7/12, 7/19, 7/26)
Lyme genomics	phylogeography & genome introgression	Saymon, Chris	Saymon
Lyme ecology & population genetics	host identification; SIR model; coalescence	Lily, Chris	Lily
Borrelia peptide library	Compile ORFs from the genome database and send them to the Mt Sinai team. N = 17 North American strains selected. Query script: get_pepseq_anno_for_each_genome.pl -p "B31" (or a strain ID); -a to get PATRIC annotation. Expected outputs: 17 FASTA files; an Excel workbook with 17 sheets.	Chris, Saymon	Chris
Origin of genetic code	Manuscript revision. Response letter done; track changes in progress. To do: reference update; Figure 6 (and legend/text).	Oliver & Brian	Oliver
Pseudomonas metabolomics	Shiny Web portal development	Chris, Edgar	Chris
Dengue antigenic variations	(1) Parse Dengue sequences (E & PrM proteins) into a data-friendly format: vid (e.g., DENV1_E_0001), strain_name, gene_name; (2) align with MUSCLE to produce 2 alignments, one for E and the other for PrM; (3) infer a tree for each alignment; (4) run a GA (DEAP package in Python; ask Edgar) to generate a centroid.	Oliver Cai, Muhammad/Benjamin, Che	Muhammad
OspC design	Optimization by GA. Done: identified a sequence with d <= 43 to all 16 alleles (using DEAP); wrote a fitness function to minimize the maximum distance to any of the 16 alleles and ran it ~10 times; output the results in a FASTA file and built a tree. To do: minimize max(d); generate ~10 evolved sequences.	Edgar, Lia	Brian
OspC antigenicity model	Develop a model using the mouse data of Ivanova et al.	Nevila, Brian	Brian
OspC per-site model	Quantify per-site importance with likelihood, i.e., Prob{fit -> 0 | CW50, site = i}. (1) For allele A, a file has been generated at CW50 = 50; (2) Radhika has turned the strings into {0,1}; (3) to do: plot fitness ~ pos; (4) CW50 values: given by Mohammad, estimated as CW50 = -(intercept/slope); (5) estimate importance using a GA: log(fit) = sum{log(1 - p[i])}, with the fitness/error function error = |log(fit[obs]) - log(fit[simulated])|; use the GA to minimize the error and output p[i] as the result (importance).	Radhika, Muhammad, Brian	Brian
flu	Implement the paper's algorithm: (1) plot HI vs. Seq.diff, one plot for each of the 15 "refv"; (2) plot HI vs. SNP, colored by 0 or 1, with boxplot + jitter; (3) to do: implement Neher et al.; alternatively, generate 0/1 strings and run the "importance" model (see OspC per-site model above).	Oscar, Brian	Brian
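
The OspC design row describes a minimax objective: find a sequence whose largest distance to any of the 16 alleles is as small as possible. The sketch below shows that objective inside a small, self-contained genetic algorithm in plain Python. The project itself used DEAP; here the distance measure (Hamming distance on aligned, equal-length sequences), the 20-letter amino-acid alphabet, and the GA settings (tournament size 3, one-point crossover, per-site mutation, elitism) are assumptions chosen for illustration, not the project's actual settings.

```python
import random

# Assumption: alleles are aligned protein sequences of equal length, and the
# distance d is the Hamming distance (number of differing positions).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def max_distance(candidate, alleles):
    """Minimax objective: the largest distance from the candidate to any allele (lower is better)."""
    return max(hamming(candidate, allele) for allele in alleles)


def evolve(alleles, pop_size=50, generations=200, mut_rate=0.01, seed=None):
    """Minimal GA that minimizes max_distance; returns (best_sequence, its max distance)."""
    rng = random.Random(seed)
    length = len(alleles[0])

    def score(seq):
        return max_distance(seq, alleles)

    def tournament(pop):
        # Tournament selection: best of 3 randomly chosen individuals
        return min(rng.sample(pop, 3), key=score)

    # Start from copies of the known alleles
    pop = [rng.choice(alleles) for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = [min(pop, key=score)]  # elitism: carry the current best forward
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, length)  # one-point crossover position
            # Per-site mutation: replace a residue with a random amino acid at rate mut_rate
            child = "".join(
                rng.choice(AMINO_ACIDS) if rng.random() < mut_rate else aa
                for aa in p1[:cut] + p2[cut:]
            )
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=score)
    return best, score(best)
```

A call such as `best, d = evolve(ospc_alleles, seed=0)`, where `ospc_alleles` is a hypothetical list holding the 16 aligned allele sequences, returns one evolved sequence; repeating it with different seeds gives several candidates. Elitism keeps the best max(d) from getting worse between generations.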

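The per-site model in the table can be written down as a few small functions: encoding aligned sequences as {0,1} strings, computing CW50 = -(intercept/slope) from a straight-line fit, predicting log(fit) as a sum of log(1 - p[i]), and scoring a candidate vector p by the absolute log error. This is a sketch under assumptions the notes do not state: sequences are aligned and of equal length; a site is coded 1 when it differs from a reference allele; the regression is an ordinary least-squares line of some response y on x; the sum in log(fit) runs over the sites coded 1; and the errors of individual alleles are added together. All function names are hypothetical.

```python
import math

# Hypothetical helpers for the OspC per-site importance model (see assumptions above).


def to_binary(seq, ref):
    """Encode an aligned sequence as a 0/1 list: 1 where it differs from ref."""
    return [int(a != b) for a, b in zip(seq, ref)]


def cw50(x, y):
    """Fit y = intercept + slope * x by ordinary least squares; return -(intercept / slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx
    return -(intercept / slope)  # x-value where the fitted line crosses y = 0


def simulated_log_fitness(bits, p):
    """log(fit) = sum of log(1 - p[i]) over the sites i coded 1; each p[i] must be in [0, 1)."""
    return sum(math.log(1.0 - p_i) for b, p_i in zip(bits, p) if b)


def total_error(p, bit_strings, observed_fitness):
    """Sum over alleles of |log(fit_obs) - log(fit_sim)|; the objective a GA would minimize over p."""
    return sum(
        abs(math.log(f_obs) - simulated_log_fitness(bits, p))
        for bits, f_obs in zip(bit_strings, observed_fitness)
    )
```

In a GA (for example, one built with DEAP), each individual would be a vector p with one value per site, and `total_error` would serve as its fitness to be minimized; the best p is then read as per-site importance.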
