A Primer on the Cluster System at Hunter

From QiuLab
imported>Rayrah
Revision as of 08:14, 2 October 2015

What is a Cluster System?

Figure 1. The general idea of a cluster system: a login node is directly connected to the other nodes of the cluster.

A Cluster System is a set of connected computers that work together to perform tasks. Many of the portals at Hunter require you to log in first to a head node and then to a working node; for example, to enter the Qiu lab servers you must first log in to Darwin.hunter.cuny.edu and then to a compute node such as Wallace.hunter.cuny.edu. On a cluster system, by contrast, all of the connected nodes can be viewed as a single system: you only need to log in to the head node, and from there you can run your programs. However, unlike on the non-cluster servers at Hunter, to fully utilize the cluster you must submit your job to a job scheduler, which decides which node the job runs on.
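To make the role of the job scheduler concrete, here is a minimal Python sketch of one simple policy: each submitted job goes to whichever node currently has the least work assigned. This is only a toy model. The function name `assign_jobs`, the job names, the node names, and the runtimes are all hypothetical, and the scheduler actually used on the cluster may follow a quite different policy.

```python
import heapq


def assign_jobs(jobs, nodes):
    """Assign each job to the currently least-loaded node (toy policy).

    jobs  -- list of (job_name, estimated_runtime) tuples (hypothetical values)
    nodes -- list of compute node names (hypothetical names)
    Returns a dict mapping job_name -> node name.
    """
    # Min-heap of (total runtime assigned so far, node name).
    # Ties are broken alphabetically by node name.
    heap = [(0, node) for node in nodes]
    heapq.heapify(heap)

    placement = {}
    for name, runtime in jobs:
        load, node = heapq.heappop(heap)      # least-loaded node right now
        placement[name] = node
        heapq.heappush(heap, (load + runtime, node))
    return placement


# Hypothetical jobs submitted from the head node, in submission order.
jobs = [("blast_run", 5), ("tree_build", 3), ("align", 2), ("snp_call", 1)]
nodes = ["node1", "node2"]
placement = assign_jobs(jobs, nodes)
print(placement)
```

The point of the sketch is that the user never picks a node: jobs are handed to the scheduler, and the scheduler decides where each one runs.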

Sept 18, 2015

  • Journal Club: latest statistics for detecting population admixture and genome introgression (d3, f4, h4, ChromosomePainter) [1]. Presenter: Saymon

Sept 11, 2015

  • Journal Club: an in-depth analysis of Staphylococcus aureus genomes. [2] Presenter: John
    • Key terms: SNP, mutation, recombination, linkage disequilibrium (LD), synonymous polymorphism (Pi[s])
    • Key methods: identify recombination (from mutation) using shape-shape changes; four-gamete test to identify breakage point; LD decay (based on r2 and probability of tree compatibility) to quantify r/m ratio
    • Key results: extensive recombination among clones; rates and tract length quantified by LD decay
    • My rating: 4/5. Rigorous analysis of recombination in bacteria, innovative methods, informative and attractive figures; however, the paper is too long, many statements are repetitive, and the effect of selection is hinted at but not explored.
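Two of the methods listed above, the four-gamete test and r2 as a measure of LD, can be sketched in a few lines of Python. This is a textbook-style illustration, not the paper's pipeline. It assumes biallelic sites coded as 0/1 across the same set of haplotypes, and the function names and example data are made up for illustration.

```python
def four_gamete_violation(site_a, site_b):
    """Return True if all four two-site haplotypes (00, 01, 10, 11) occur.

    Under the infinite-sites model (no repeated mutation), seeing all four
    combinations implies recombination between the two sites.
    Assumes both sites are biallelic and coded 0/1.
    """
    return len(set(zip(site_a, site_b))) == 4


def r_squared(site_a, site_b):
    """Standard LD measure r^2 = D^2 / (pA(1-pA) pB(1-pB)) for 0/1-coded sites.

    Returns None when either site is monomorphic (r^2 is undefined).
    """
    n = len(site_a)
    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom


# Hypothetical alleles at three sites across four haplotypes.
site1 = [0, 0, 1, 1]
site2 = [0, 0, 1, 1]   # perfectly linked to site1
site3 = [0, 1, 0, 1]   # all four gametes appear with site1

print(four_gamete_violation(site1, site2), r_squared(site1, site2))
print(four_gamete_violation(site1, site3), r_squared(site1, site3))
```

Scanning adjacent site pairs with `four_gamete_violation` is one simple way to bracket candidate breakpoints, and plotting `r_squared` against the distance between sites gives an LD decay curve.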

Sept 4, 2015

  • Journal Club: a nice review of bacterial population genetics (E. coli model), from protein polymorphisms to whole-genome variations [3]. Presenter: Amanda
    • Technological history of bacterial population genetics: MLEE -> MLST -> Whole-genome
    • Key terms & concepts: clonality, linkage disequilibrium, recombination, homoplasy, r/m ratio
    • Methods for recombination detection: clustered polymorphism, homoplasy (phylogenetic inconsistency) (a Borrelia data set to understand how to identify homoplasy and recombination)
    • Tools to try: recHMM (detecting homoplastic sites, fine-grained), PHI (per-gene detection, coarse), USEARCH (alternative to BLAST)/UCLUST (alternative to CD-HIT), Distance method (? no reference given; can't understand algorithm either)
    • My rating: 4.8/5 (concise, thoughtful & solid review, covering a vast range of history, species, and theory; no apparent theoretical or visual flaws; ending a little pessimistic; implications for the broader biomedical audience are not explored)

Aug 28, 2015

  • Journal Club (12:30-1:30): a recent paper claiming widespread gene loss & pseudogenization in bacterial pathogens [4]. Presenter: Roy
    • Key terms/concepts: pan-genome, pan-genes (core/"near core"/rare), normalized identity (NI), genomic fluidity, pseudogene conservation percent (PCP), AAI (aa identity), effective population size (Ne), Muller's Ratchet
    • Key methods: FASTA for ortholog/paralog identification, PHI (pairwise homoplasy index) for detecting recombination, TFASTA for HGT (gene gain), RAST for gene calls and genome annotation
    • Key findings: bimodal distribution of pan-genes; two clonal species have high genomic fluidity, despite being closely related; little HGT ("rare") but lots of losses ("near core") in clonal species; maintenance of pseudogenes (small Ne)
    • Pluses: large number of genomes; results broadly convincing; rigorous interpretations and discussion
    • Flaws: No phylogenetic reconstruction; no synteny verification; no gene function analysis; no statistical evaluation of the conclusion; bad presentation (figures should be tables and tables should be figures)
    • My overall rating: 3.5/5.0
  • Project updates & plans (1:30-2)
    • Weigang: design statistical tests for 2 hypotheses: (1) any co-occurrence of oc types? (2) lineage-stabilizing genes
    • Saymon: tick-bacteria gene transfer positive; PCR is working for positive controls; need to start testing for nymphs
    • John & Rayyes: pa2 database cleaning nearly done; start polymorphism-by-genome-location analysis
    • Amanda & Roy: Treponema project has a working database, pipeline, and preliminary validated results; start documenting protocols, tabulating results, and preparing functional analysis