BIOL 203 Bioinformatics Exercises for Lab 13
Test phenotype-genotype association
Introduction: Contingency Test
A genome-wide association study (GWAS) is a method for mapping phenotypes to genotypes. In a typical GWAS, allele frequencies (e.g., C or T at position 785) are determined in a sample of affected individuals (the "cases") and in a sample of unaffected individuals (the "controls"). For example, the following table shows the results of a hypothetical case-control study at a locus segregating two alleles (C and T):
Table 1. Sample Genotype Counts
Group | T/T | T/C | C/C | Total |
---|---|---|---|---|
Case | 0 | 24 | 127 | ? |
Control | 9 | 68 | 114 | ? |
Total | ? | ? | ? | ? |
Association between genotype and phenotype can be assessed with a contingency table analysis (using the chi-square test, as in the preceding exercise). Here, χ² = 26.4 and p = 0.0005, indicating a significant association between genotype and disease. (Specifically, the C/C genotype is over-represented among disease cases.)
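The test statistic quoted above can be checked by hand. The following is a minimal sketch in plain Python, assuming a standard Pearson chi-square test with (rows - 1) × (columns - 1) degrees of freedom; the function names are illustrative and not part of the course materials. For the Table 1 counts it gives a chi-square of about 26.4 with 2 degrees of freedom. The asymptotic p-value it computes is considerably smaller than the 0.0005 quoted above, which presumably came from a different procedure (for example, a simulation-based p-value); both lead to the same conclusion.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value(stat, df):
    """Upper-tail chi-square probability, using closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")

# Table 1 genotype counts (columns: T/T, T/C, C/C)
genotypes = [
    [0, 24, 127],   # Case
    [9, 68, 114],   # Control
]
stat, df = chi_square(genotypes)
print(f"chi-square = {stat:.1f}, df = {df}, p = {p_value(stat, df):.2g}")
```

The same `chi_square` function works for a 2-by-2 allele table, where the degrees of freedom drop to 1.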
- Perform an online contingency table analysis using the hypothetical data in Table 1.
- Using Table 1, fill in the following table with allele counts. Then perform a 2-by-2 contingency table analysis with the same online tool. Is there a statistically significant association between alleles and disease phenotype? Which allele (C or T) is over-represented in (i.e., statistically associated with) disease cases?
Table 2. Sample Allele Counts
Group | T | C | Total |
---|---|---|---|
Case | ? | ? | ? |
Control | ? | ? | ? |
Total | ? | ? | ? |
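Converting genotype counts to allele counts relies on the fact that each individual carries two alleles: a homozygote contributes two copies of one allele, and a heterozygote contributes one copy of each. A minimal sketch follows; the function name and the counts are made up for illustration and are deliberately not the Table 1 data.

```python
def allele_counts(hom1, het, hom2):
    """Convert genotype counts at a two-allele locus into allele counts.

    hom1 and hom2 are the counts of the two homozygous genotypes,
    het is the count of heterozygotes.
    """
    first_allele = 2 * hom1 + het    # two copies per homozygote, one per heterozygote
    second_allele = het + 2 * hom2
    return first_allele, second_allele

# Made-up counts for illustration only: 10 T/T, 20 T/C, 30 C/C individuals
t_count, c_count = allele_counts(10, 20, 30)
print(t_count, c_count)
```

A quick sanity check is that the two allele counts in a row sum to twice the number of individuals in that row.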
Test genotype/allele association at locus A
Following the above two examples, perform both the genotype and allele association tests using the class data.
Table 3a. Genotype counts at Locus A
Phenotype | A1/A1 | A1/A2 | A2/A2 | Row Sum |
---|---|---|---|---|
Taster | ? | ? | ? | ? |
Non-Taster | ? | ? | ? | ? |
Column Sum | ? | ? | ? | ? |
Calculate allele counts & then test for association
Table 3b. Allele counts at Locus A
Phenotype | A1 | A2 | Row Sum |
---|---|---|---|
Taster | ? | ? | ? |
Non-Taster | ? | ? | ? |
Column Sum | ? | ? | ? |
Test association at Locus B
Table 4a. Genotype counts at Locus B for each phenotype
Phenotype | B1/B1 | B1/B2 | B1/B3 | B2/B2 | B2/B3 | B3/B3 | Row Sum |
---|---|---|---|---|---|---|---|
Taster | ? | ? | ? | ? | ? | ? | ? |
Non-Taster | ? | ? | ? | ? | ? | ? | ? |
Column Sum | ? | ? | ? | ? | ? | ? | ? |
Calculate allele counts & then test for association
Table 4b. Allele counts at Locus B
Phenotype | B1 | B2 | B3 | Row Sum |
---|---|---|---|---|
Taster | ? | ? | ? | ? |
Non-Taster | ? | ? | ? | ? |
Column Sum | ? | ? | ? | ? |
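With three alleles there are six possible genotypes, so it is easier to tally alleles directly from the genotype labels than to write out a formula for each allele. The sketch below assumes genotype labels written as in Table 4a (two allele names separated by "/"); the function name and the counts are made up for illustration and are not class data.

```python
from collections import Counter

def allele_counts_from_genotypes(genotype_counts):
    """Tally alleles from a mapping such as {"B1/B2": 7}; each genotype holds two alleles."""
    alleles = Counter()
    for genotype, n in genotype_counts.items():
        # "B1/B1" splits into two B1 entries, so homozygotes are counted twice
        for allele in genotype.split("/"):
            alleles[allele] += n
    return alleles

# Made-up Taster row for illustration only
taster = {"B1/B1": 4, "B1/B2": 6, "B1/B3": 2, "B2/B2": 5, "B2/B3": 3, "B3/B3": 1}
counts = allele_counts_from_genotypes(taster)
print([counts[a] for a in ("B1", "B2", "B3")])
```

Running the same function on the Non-Taster row gives the second row of the allele table.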
Exit Questions
- Which of the two genes shows significant genotype association with the PTC Taster/Non-Taster phenotype?
- Is there a statistically significant association between the alleles and the Taster phenotype?
- Which genotype is over-represented in the Non-Tasters?
- Which allele is over-represented in the Non-Tasters?
- Are there exceptions? What are possible causes for exceptions?
Web Exercise 1. Search for gene information using NCBI online databases
- Point your browser to the NCBI Human Genome Resource page
- Type in the "Find A Gene" search box "TAS2R38" and select "Homo sapiens" from the pull-down menu. Click "Go"
- Select the first link, which leads to an NCBI Gene Card page. Use the Gene Card to identify the following information about the TAS2R38 gene:
- NCBI GeneID
- Chromosome location
- Click on "GenBank" and identify its gene structure, including the length of primary transcript, coding sequences, 5'-UTR and 3'-UTR. Does it have any introns?
- Zoom out the Sequence View to find its neighboring genes. Zoom in to read DNA sequences.
- Click the link to OMIM (under Phenotype) and find the phenotypes associated with the TAS2R38 gene
- What does OMIM stand for?
- What are the expected "taster" and "nontaster" frequencies within human populations?
- If the ability to taste bitterness is evolutionarily advantageous, how are alleles contributing to the "nontaster" phenotype maintained in the population?
- Is the correlation between TAS2R38 gene variations and the PTC phenotype variations 100%? If not, what could be the other causes?
Web Exercise 2. Cross-species comparisons with HomoloGene
- From the NCBI "TAS2R38" Gene page, click the "HomoloGene" link under "Related Information" (right-side navigation panel)
- You should see a page listing orthologs of TAS2R38 (i.e., the same gene in different species) from 7 mammalian species: human (Homo sapiens), chimpanzee (Pan troglodytes), macaque (Macaca mulatta), dog (Canis lupus familiaris), cow (Bos taurus), rat (Rattus norvegicus), and mouse (Mus musculus).
- Write down your expectations for the following species relationships:
- Is chimpanzee more closely related to macaque or to human?
- Is dog more closely related to mouse or to cow?
- Are rat and mouse more closely related to each other than human and chimpanzee are?
- Click on the link "Show Pairwise Alignment Scores" under "Protein Alignments" and fill in the following table when the page loads. Do these sequence-comparison results change your expectations above? Explain.
Species pair | % Protein Sequence Identity | % DNA Seq Identity |
---|---|---|
Chimp-Human | ? | ? |
Chimp-Macaque | ? | ? |
Dog-Cow | ? | ? |
Dog-Mouse | ? | ? |
Rat-Mouse | ? | ? |
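Percent identity is the share of aligned positions at which two sequences carry the same residue. The sketch below assumes one common convention, in which columns with a gap in either sequence are excluded from the denominator; HomoloGene may use a different convention, so small discrepancies are possible. The function name and the two sequences are made up for illustration and are not TAS2R38 sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which the two sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Toy aligned protein fragments (made up): one mismatch and one gap
print(round(percent_identity("MLTLTRIRTV", "MLTLTHIR-V"), 1))
```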
You can find the exact differences by clicking on "Blast" for each pairwise comparison. Lastly, obtain a phylogenetic tree of TAS2R38 protein sequences from these 7 species using the phylogeny.fr website:
- Click "Show Multiple Alignment"
- Click "Download" and, when the page loads, click "download" again
- Go to the phylogeny.fr website and select "Phylogenetic Analysis" and then "One Click" analysis
- Copy and paste your downloaded sequences into the text box and click on "Submit"
- When analysis is finished, you should see a phylogenetic tree. Answer the following questions:
- Define "orthologous genes"
- What do tree nodes represent?
- What do tree branches and branch length represent?
- How do you determine species relatedness based on a phylogenetic tree?
(This short tutorial on phylogenetic trees may help.)
Web Exercise 3. Predict results of PCR and restriction analysis
On a printout of the DNA sequence of the TAS2R38 gene (from the GenBank link, see above),
- Identify 5'-UTR, 3'-UTR, start codon, and stop codon.
- Identify the regions your PCR primers should bind using the Primer3 web server
- Point your browser to Primer3 Web Server
- Select "check_primer" in the top box, and "HUMAN" in the 2nd box
- Paste the raw gene sequence from the GenBank page into the 3rd box
- Paste the two primer sequences (use only the sequences within {}) into the 4th and 6th boxes:
(p2283) ttttggatccAACTGGCAGAa{TAAAGATCTCAATTTAT}; (p2285) ttttggatcc{AACACAAACCATCACCCCTATTTT}
- Click "Pick Primers"
- Identify the base location that contains the 785 C/T SNP
- Copy and paste the expected 303-bp section and locate the Fnu4HI site using the NEBcutter website
- What are the expected fragment lengths for the C/C, C/T, and T/T genotypes?
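The logic behind the restriction analysis can be sketched in a few lines of Python. The sketch assumes the Fnu4HI recognition sequence GCNGC with the cut after the first GC (GC^NGC), which should be confirmed against the enzyme supplier's documentation. The two sequences are made-up 31-bp toys that differ at a single base, not the real TAS2R38 amplicon, and they do not indicate which real allele carries the site.

```python
import re

# Fnu4HI site GCNGC (assumed); the lookahead lets overlapping sites be found.
# The site is palindromic, so scanning one strand is sufficient.
SITE = re.compile(r"(?=GC[ACGT]GC)")
CUT_OFFSET = 2  # assumed cut position: after the first GC (GC^NGC)

def fragment_lengths(seq):
    """Fragment lengths after complete digestion of a linear sequence."""
    cuts = [m.start() + CUT_OFFSET for m in SITE.finditer(seq.upper())]
    edges = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(edges, edges[1:])]

# Toy amplicons (made up): identical except for one base inside the site
left, right = "AT" * 5, "TA" * 8
with_site = left + "GCAGC" + right      # contains one GCNGC site
without_site = left + "GTAGC" + right   # single-base change destroys the site

cut = fragment_lengths(with_site)
uncut = fragment_lengths(without_site)
heterozygote = sorted(set(cut) | set(uncut))  # both alleles present
print(cut, uncut, heterozygote)
```

A homozygote for the cut allele shows only the smaller fragments, a homozygote for the uncut allele shows only the full-length product, and a heterozygote shows all of them. The same reasoning applies to the 303-bp product once the real site position is known.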