Summer 2021: Difference between revisions

From QiuLab
* Reading list
** Strodthoff et al (2020). Bioinformatics. [https://academic.oup.com/bioinformatics/article/36/8/2401/5698270 UDSMProt: universal deep sequence models for protein classification]. [https://github.com/nstrodt/UDSMProt Source code on Github]
** [https://www.pnas.org/content/118/15/e2016239118 Rives et al (2021). PNAS. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences.] [https://github.com/facebookresearch/esm Github repository]

Revision as of 03:23, 2 June 2021

Project 1. Borrelia genomics

Project 2. HIV compartmentalized evolution

  • Participants
    • Lily
  • Questions and goals
    • Does HIV evolve cell-type tropism within the host, specifically neural (N) tropism versus T-cell (T) tropism?
    • Build a classifier for N-tropic HIV subtypes
    • Prepare a presentation for an HIV conference in October
  • Reading list
  • Data sets
    • ~500 env gene sequences from 15 patients
    • Single-cell genome sequences from a second time point for some of the patients
    • Experimentally verified N-tropic subtypes
  • Approach
    • Evolutionary mechanisms: mutation, recombination, and tests of adaptive selection
    • Evolutionary rates and signatures (BEAST)

Project 3. Natural Language models of proteins

  • Participants
  • Questions & Goals
  1. Learn, implement, and compare the existing tools
  2. Fine-tuning for OspC, to be integrated with the centroid algorithm
  3. 2nd-generation centroid design: k-means algorithm (with applications to vls, Dengue, flu B)
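
As a starting point for goal 3, the k-means step could be sketched roughly as below. This is only an illustration under stated assumptions, not the lab's centroid algorithm: the 2-mer frequency encoding, the farthest-point initialization, the function names, and the toy sequences are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = 2  # k-mer length (assumed for this sketch)
KMERS = ["".join(p) for p in product(AMINO_ACIDS, repeat=K)]


def kmer_vector(seq):
    """Encode a protein sequence as a vector of 2-mer frequencies."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = max(sum(counts[m] for m in KMERS), 1)
    return [counts[m] / total for m in KMERS]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids


# Toy sequences (made up for illustration; not real OspC, vls, Dengue, or flu B data).
seqs = [
    "ACACACACAC", "ACACACACAA", "CACACACACA",
    "WYWYWYWYWY", "WYWYWYWYWW", "YWYWYWYWYW",
]
points = [kmer_vector(s) for s in seqs]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The centroids here are points in 2-mer frequency space, not sequences. Turning a cluster centroid into a designed sequence is a separate step that this sketch does not attempt.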