QuBi/modules/biol203-geno-pheno-association
Latest revision as of 02:16, 24 July 2020
BIOL 203 Summer 2020 - Bioinformatics Exercises for Lab 11
Test phenotype-genotype association
Introduction: GWAS & Contingency Test
A genome-wide association study (GWAS) is a method for mapping phenotypes to genotypes. In a typical GWAS, frequencies of alleles (e.g., C or T at position 785) are determined in a sample of affected individuals (the "cases", e.g., individuals with a disease) and in a sample of unaffected individuals (the "controls"). For example, the following table shows the results of a hypothetical case-control study at a locus segregating two alleles (C and T):
Table 1. Sample Genotype Frequencies
| | T/T | T/C | C/C | Total |
|---|---|---|---|---|
| Case | 0 | 24 | 127 | ? |
| Control | 9 | 68 | 114 | ? |
| Total | ? | ? | ? | ? |
Association between genotype and phenotype can be assessed with a contingency table analysis (http://en.wikipedia.org/wiki/Contingency_table). In this case, Χ² = 26.4 with 2 degrees of freedom, p < 0.0005, suggesting a significant association between genotype and disease. (By comparing the expected and observed counts, one could conclude that the C/C genotype is over-represented in disease cases.)
1. Perform an online contingency table analysis (http://www.physics.csbsju.edu/stats/contingency.html) using the hypothetical data in Table 1. Click on "other contingency tables" and run a test with 2 rows and 3 columns using the data above. Your Χ² should be 26.4.
2. Using Table 1, fill in the following table with allele counts, then perform a 2-by-2 contingency table analysis with the same online tool. For example, in the controls, the number of T alleles is 18 + 68 = 86, because each T/T homozygote carries two T alleles (9 × 2 = 18) and each T/C heterozygote carries one (68).
Is there a statistically significant association between alleles and disease phenotype? Which allele (C or T) is over-represented in (i.e., statistically associated with) disease cases?
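The contingency test described above can also be sketched in a few lines of Python. This is an illustrative sketch, not the online tool used in the exercise: the function names `chi_square` and `p_value` are made up for this example, and the p-value uses closed-form expressions that are valid only for 1 or 2 degrees of freedom.

```python
import math

def chi_square(observed):
    """Return (statistic, degrees of freedom, expected counts) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

def p_value(stat, df):
    """Upper-tail chi-square probability; closed forms exist for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")

# Table 1 genotype counts: rows are Case and Control; columns are T/T, T/C, C/C
table1 = [[0, 24, 127], [9, 68, 114]]
stat, df, expected = chi_square(table1)
p = p_value(stat, df)
print(f"chi-square = {stat:.1f}, df = {df}, p = {p:.2g}")
```

For Table 1 the statistic rounds to 26.4 with 2 degrees of freedom, in agreement with the value quoted above. The `expected` list can be compared cell by cell with the observed counts to see which genotypes are over- or under-represented.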
Table 2. Sample Allele Frequencies
| | T | C | Total |
|---|---|---|---|
| Case | ? | ? | ? |
| Control | ? | ? | ? |
| Total | ? | ? | ? |
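The conversion from genotype counts to allele counts follows the rule stated above: homozygotes contribute two copies of their allele and heterozygotes contribute one copy of each. A minimal sketch, using the control row of Table 1 that the text already works through (the function name `biallelic_counts` is made up for illustration):

```python
def biallelic_counts(n_tt, n_tc, n_cc):
    """Convert genotype counts at a two-allele locus into (T, C) allele counts."""
    t_alleles = 2 * n_tt + n_tc  # each T/T carries two T alleles, each T/C carries one
    c_alleles = 2 * n_cc + n_tc  # each C/C carries two C alleles, each T/C carries one
    return t_alleles, c_alleles

# Control row of Table 1: 9 T/T, 68 T/C, 114 C/C
control_t, control_c = biallelic_counts(9, 68, 114)
print(control_t, control_c)
```

A useful sanity check is that the allele counts in a row sum to twice the number of individuals in that row, since each individual carries two alleles.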
Test association with locus A
Following the above two examples, perform both the genotype and allele association tests using the class data.
Table 3a. Genotype counts at Locus A
| | A1/A1 | A1/A2 | A2/A2 | Row Sum |
|---|---|---|---|---|
| Taster | ? | ? | ? | ? |
| Non-Taster | ? | ? | ? | ? |
| Column Sum | ? | ? | ? | ? |
Calculate allele counts, then test for association.
Table 3b. Allele counts at Locus A
| | A1 | A2 | Row Sum |
|---|---|---|---|
| Taster | ? | ? | ? |
| Non-Taster | ? | ? | ? |
| Column Sum | ? | ? | ? |
Test association with Locus B
Table 4a. Genotype counts at Locus B for each phenotype
| | B1/B1 | B1/B2 | B1/B3 | B2/B2 | B2/B3 | B3/B3 | Row Sum |
|---|---|---|---|---|---|---|---|
| Taster | ? | ? | ? | ? | ? | ? | ? |
| Non-Taster | ? | ? | ? | ? | ? | ? | ? |
| Column Sum | ? | ? | ? | ? | ? | ? | ? |
Calculate allele counts, then test for association.
Table 4b. Allele counts at Locus B
| | B1 | B2 | B3 | Row Sum |
|---|---|---|---|---|
| Taster | ? | ? | ? | ? |
| Non-Taster | ? | ? | ? | ? |
| Column Sum | ? | ? | ? | ? |
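With three alleles and six genotypes, tallying allele counts by hand is error-prone. The sketch below shows one way to do the bookkeeping. The genotype counts are made-up placeholders, not class data, and the name `count_alleles` is hypothetical.

```python
from collections import Counter

def count_alleles(genotype_counts):
    """Tally alleles from a mapping of 'X/Y' genotype labels to counts of individuals."""
    alleles = Counter()
    for genotype, n_individuals in genotype_counts.items():
        # Each individual contributes both alleles in its genotype label,
        # so a homozygote such as 'B1/B1' adds two copies of B1.
        for allele in genotype.split("/"):
            alleles[allele] += n_individuals
    return alleles

# Made-up counts for illustration only; replace with the class data from Table 4a
made_up_tasters = {"B1/B1": 3, "B1/B2": 4, "B1/B3": 2,
                   "B2/B2": 1, "B2/B3": 5, "B3/B3": 0}
taster_alleles = count_alleles(made_up_tasters)
print(dict(taster_alleles))
```

As with the two-allele case, the allele total for a row should equal twice the number of individuals in that row.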
Web Exercise. Search for gene information using NCBI online databases
1. Point your browser to the NCBI Human Genome Resource page (http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&BLAST_SPEC=OGP__9606__9558&LINK_LOC=blasthome).
2. Copy and paste the sequence provided on Blackboard. This is the sequence of the gene associated with the taster phenotype.
3. Expand the "Algorithm parameters" tab and change "Expect threshold" to 0.00001 (1e-5). Define "expect value" in your own words after watching the linked YouTube video.
4. Press "BLAST". Copy and paste the top hit into your final lab report.
5. Briefly describe the function of the gene based on information gathered from the locus page.
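To make the "Expect threshold" setting more concrete, the standard relationship between a BLAST bit score S' and the expect value is E = m × n × 2^(−S'), where m is the query length and n is the database length. The sketch below applies that formula to made-up numbers. The lengths and scores are hypothetical, and real BLAST reports use effective (edge-corrected) lengths, so values will not match a report exactly.

```python
def expect_value(bit_score, query_length, database_length):
    """Number of chance hits expected with at least this bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)

# Hypothetical numbers for illustration: a 1,000-base query against a 3-billion-base genome
threshold = 1e-5
for score in (50, 60):
    e = expect_value(score, 1_000, 3_000_000_000)
    print(f"bit score {score}: E = {e:.2g}, passes threshold: {e < threshold}")
```

Under this formula, each additional bit of score halves the expected number of chance hits, and a larger database raises E for the same score.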
Lab Report IV
Your report should include the following results:
- A printout of the contingency test for Locus A, including expected counts, observed counts, chi-square statistic, degrees of freedom, and p-value
- The same for Locus B
- A printout of the alignment for the top BLAST hits for the sequence provided
Additional questions to include in your report:
- State the null hypothesis and the alternative hypothesis in a chi-square test.
- Explain what probability is represented by the p-value.
- What can you conclude when the p-value is below the threshold of significance (e.g., 0.05)?
- What would you conclude when the p-value is above the threshold of significance?
- Is there a statistically significant association between one of the alleles tested and the Taster phenotype?
- Which genotype is over-represented in the Non-Tasters?
- Which allele is over-represented in the Non-Tasters?
- Are there exceptions? What are possible causes for exceptions?
- Define the e-value in a BLAST search.