QuBi/modules/biol203-geno-pheno-association: Difference between revisions

From QiuLab
imported>Weigang
imported>Lab
No edit summary
 
(18 intermediate revisions by 2 users not shown)
Line 1: Line 1:
; <div style="font-size:180%">BIOL 203 Bioinformatics Exercises for Lab 13</div>
; <div style="font-size:180%">BIOL 203 Summer 2020 - Bioinformatics Exercises for Lab 11</div>
----
----
==Test phenotype-genotype association==
==Test phenotype-genotype association==
===Introduction: Contingency Test===
===Introduction: GWAS & Contingency Test===
Genome-Wide Association Study (GWAS) is a method for mapping phenotypes to genotypes. In a typical GWAS study, frequencies of alleles (e.g., C or T at position 785) are determined in a sample of affected individuals (the "cases") as well as in a sample of unaffected individuals (the "controls"). For example, the following table shows results of a hypothetical case-control study at a locus segregating with two alleles (C and T):
Genome-Wide Association Study (GWAS) is a method for mapping phenotypes to genotypes. In a typical GWAS study, frequencies of alleles (e.g., C or T at position 785) are determined in a sample of affected individuals (the "cases" e.g. disease) as well as in a sample of unaffected individuals (the "controls"). For example, the following table shows results of a hypothetical case-control study at a locus segregating with two alleles (C and T):


<center>
<center>
Line 19: Line 19:
</center>
</center>


Association between the genotype and the phenotype could be assessed with a [http://en.wikipedia.org/wiki/Contingency_table contingency table analysis] (also using chi-square, as in the preceding exercise). In this case, &Chi;<sup>2</sup> = 26.4, p=0.0005, suggesting a significant association between genotypes and diseases. (In this case, the result suggests that C/C genotypes are over-represented in disease cases.)
Association between the genotype and the phenotype could be assessed with a [http://en.wikipedia.org/wiki/Contingency_table contingency table analysis]. In this case, &Chi;<sup>2</sup> = 26.4, p<0.0005, suggesting a significant association between genotypes and diseases. (By comparing the expected and observed counts, one could conclude that the C/C genotypes are over-represented in disease cases.)


# Perform an [http://www.physics.csbsju.edu/stats/contingency.html online contingency table analysis] using the hypothetical data in Table 1.
1. Perform an [http://www.physics.csbsju.edu/stats/contingency.html online contingency table analysis] using the hypothetical data in Table 1. Click on "other contingency tables" and do a 2-rows and 3-columns test with the data above. Your &Chi;<sup>2</sup> should be 26.4.


# Deriving from Table 1, fill the following table with allele counts. Then perform a 2-by-2 contingency table analysis using the link above. Is there a statistically significant association between alleles and disease phenotype? Which allele (C or T) is over-represented in (i.e., statistically associated with) disease cases?
2. Deriving from Table 1, fill the following table with allele counts. Then perform a 2-by-2 contingency table analysis using the link above.
For example, in the controls, the number of T alleles is: 18 + 68 = 86 , because homozygotes have two alleles and heterozygotes have one.
 
Is there a statistically significant association between alleles and disease phenotype? Which allele (C or T) is over-represented in (i.e., statistically associated with) disease cases?
<center>
<center>
Table 2. Sample Allele Frequencies
Table 2. Sample Allele Frequencies
Line 38: Line 41:
</center>
</center>


===Test association at Locus A===
===Test association with locus A===
Following the above two examples, perform both the genotype and allele association tests using the class data.
Following the above two examples, perform both the genotype and allele association tests using the class data.
<center>
<center>
Line 69: Line 72:
</center>
</center>


===Test association at Locus B===
===Test association with Locus B===
Table 4a. Genotype counts at Locus B for each phenotype
Table 4a. Genotype counts at Locus B for each phenotype
<center>
<center>
Line 98: Line 101:
</center>
</center>


===Exit Questions===
==Web Exercise. Search for gene information using NCBI online databases==
# Which of the two genes shows significant ''genotype'' association with the PTC Taster/Non-Taster phenotype?
# Point your browser to the [http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&BLAST_SPEC=OGP__9606__9558&LINK_LOC=blasthome NCBI Human Genome Resource] page
# Is there a statistically significant association between the ''alleles'' and the Taster phenotype?
# Copy and paste sequence provided on Blackboard- this is the sequence of the gene associated with the taster phenotype
# Which genotype is over-represented in the Non-Tasters?
# Expand the "Algorithm parameters" tab and change "Expect threshold" to 0.00001 (10e-5). Define "expect value" in your owns words after watching the linked Youtube video.
# Which allele is over-represented in the Non-Tasters?
# Press "BLAST". Copy & Paste the top hit in your final lab report.
# Are there exceptions? What are possible causes for exceptions?
# Briefly describe the function of the gene based on information gathered on the locus page
 
==Web Exercise 1. Search for gene information using NCBI online databases==
# Point your browser to the [http://www.ncbi.nlm.nih.gov/genome/guide/human/ NCBI Human Genome Resource] page
# Type in the "Find A Gene" search box "TAS2R38" and select "Homo sapiens" from the pull-down menu. Click "Go"
# Select the first link, which leads to an NCBI Gene Card page. Use the Gene Card to identify the following information on TAS2R38 gene:
## NCBI GeneID
## Chromosome location
## Click on "GenBank" and identify its gene structure, including the length of primary transcript, coding sequences, 5'-UTR and 3'-UTR. Does it have any introns?
## Zoom out the Sequence View to find its neighboring genes. Zoom in to read DNA sequences.
# Click the link to OMIM (under '''Phenotype''') and find phenotypes associated with TAS2R38 gene
## What does OMIM stand for?
## What are the expected "taster" and "nontaster" frequencies within human populations?
## If the ability to taste bitterness is evolutionary advantageous, how are alleles contributing to "nontaster" maintained in population?
## Is the correlation between TAS2R38 gene variations and the PTC phenotype variations 100%? If not, what could be the other causes?
 
==Web Exercise 2. Cross-species comparisons with HomoloGene==
# From the NCBI "TAS2R38" Gene page, click "HomoloGene" link under the "Related Information" (right-side navigation panel)
# You should see a page listing TAS2R38 orthologous (i.e., same gene in different species) genes from 7 mammalian species, including human (''Homo sapiens''), chimpanzee (''Pan troglodytes''), macaque (''Macaca mulatta''), dog (''Canis lupus familiaris''), cow (''Bos taurus''), rat (''Rattus norvegicus''), and mouse (''Mus musculus'').
# Write down your expectations for the following species relationships:
## Is chimpanzee more closely related to macaque or to human?
## Is dog more related to mouse or to cow?
## Is rat and mouse more closely related than human and chimpanzee?
# Click on the link "Show Pairwise Alignment Scores" under "Protein Alignments" and fill in the following table when the page loads. Do these sequence-comparison results change your expectations in the above? Explain.
<center>
{| class="wikitable"
|-
! Species pair !! % Protein Sequence Identity !! % DNA Seq Identity
|-
| Chimp-Human || ? || ?
|-
| Chimp-Macaque || ? || ?
|-
| Dog-Cow || ? || ?
|-
| Dog-Mouse || ? || ?
|-
| Rat-Mouse || ? || ?
|}
</center>
You can find exact differences by clicking on "Blast" for each pairwise comparisons. Lastly, obtain a phylogenetic tree of TAS2R38 protein sequences from these 7 species using [http://www.phylogeny.fr the phylogeny.fr web]
# Click "Show Multiple Alignment"
# Click "Download" and, when the page uploads, click "download" again
# Go to the [http://www.phylogeny.fr the phylogeny.fr web] and select "Phylogenetic Analysis" and then "One Click" analysis
# Copy and paste your downloaded sequences into the text box and click on "Submit"
# When analysis is finished, you should see a phylogenetic tree. Answer the following questions:
## Define "orthologous genes"
## What do tree nodes represent?
## What do tree branches and branch length represent?
## How do you determine species relatedness based on a phylogenetic tree?
([http://evolution.berkeley.edu/evolibrary/article/evo_05 This short tutorial] on phylogenetic tree may help).


==Web Exercise 3. Predict results of PCR and restriction analysis==
==Lab Report IV==
On a printout of the DNA sequence of TAS2R38 gene (from the GenBank link, see above),
# Your report should include the following results:
# Identify 5'-UTR, 3'-UTR, start codon, and stop codon.
## A printout of contingency test for Locus A, including expected counts, observed counts, chi-square statistic, degree of freedom, and p values
# Identify the regions your PCR primers should bind using the Primer3 web server
## Same as above for Locus B
## Point your browser to [http://primer3.ut.ee/ Primer3 Web Server]
## A printout of alignment for the top BLAST hits for the sequence provided
## Select "check_primer" in the top box, and "HUMAN" in the 2nd box
Additional questions to include in your report:
## Paste the raw gene sequence into the 3rd box from the [http://www.ncbi.nlm.nih.gov/nuccore/NC_000007.13?report=fasta&from=141672259&to=141673743&strand=true GenBank page]
## State what is the ''null hypothesis'' in a chi-square test & what is the ''alternative hypothesis''
## Paste the two primer sequences (use only the sequences within {}) into the 4th and 6th boxes: <pre>(p2283) ttttggatccAACTGGCAGAa{TAAAGATCTCAATTTAT}; (p2285) ttttggatcc{AACACAAACCATCACCCCTATTTT}</pre>
## Explain what probability is represented by the p-value.
## Click "Pick Primers"
## What can you conclude when p-value is '''below''' the threshold of significance (e.g., p = 0.05)?
# Identify the base location that contains 785 C/T SNP
## What would you conclude when p-value is '''above''' the critical value?
# Copy and paste the expected 303-bp section and locate the Fnu4H1 site using the [http://tools.neb.com/NEBcutter2/ NEBcutter website]
## Is there a statistically significant association between one of the alleles tested and the Taster phenotype?
# What are the expected lengths for the C/C, C/T, and T/T genotypes?
## Which genotype is over-represented in the Non-Tasters?
## Which allele is over-represented in the Non-Tasters?
## Are there exceptions? What are possible causes for exceptions?
## Define e-value in a BLAST search

Latest revision as of 02:16, 24 July 2020

BIOL 203 Summer 2020 - Bioinformatics Exercises for Lab 11

Test phenotype-genotype association

Introduction: GWAS & Contingency Test

A genome-wide association study (GWAS) is a method for mapping phenotypes to genotypes. In a typical GWAS, frequencies of alleles (e.g., C or T at position 785) are determined in a sample of affected individuals (the "cases", e.g., individuals with a disease) as well as in a sample of unaffected individuals (the "controls"). For example, the following table shows the results of a hypothetical case-control study at a locus segregating two alleles (C and T):

Table 1. Sample Genotype Frequencies

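{| class="wikitable"
|-
! Phenotype !! T/T !! T/C !! C/C !! Total
|-
| Case || 0 || 24 || 127 || ?
|-
| Control || 9 || 68 || 114 || ?
|-
| Total || ? || ? || ? || ?
|}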

Association between the genotype and the phenotype can be assessed with a contingency table analysis. In this case, χ² = 26.4 and p < 0.0005, suggesting a significant association between genotype and disease status. (By comparing the expected and observed counts, one can conclude that the C/C genotype is over-represented in disease cases.)

1. Perform an online contingency table analysis using the hypothetical data in Table 1. Click on "other contingency tables" and run a test with 2 rows and 3 columns using the data above. Your χ² should be 26.4.
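The online calculator's result can be cross-checked by hand. The following is a minimal sketch in plain Python (no statistics library assumed); the function name `chi_square` is chosen here for illustration and is not part of the exercise. It computes the expected counts under independence and sums (observed − expected)² / expected over all cells of Table 1.

```python
# Observed genotype counts from Table 1 (columns: T/T, T/C, C/C).
observed = [
    [0, 24, 127],   # cases
    [9, 68, 114],   # controls
]

def chi_square(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    # Expected count under independence: row sum * column sum / grand total.
    expected = [[r * c / total for c in col_sums] for r in row_sums]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Degrees of freedom: (rows - 1) * (columns - 1).
    df = (len(row_sums) - 1) * (len(col_sums) - 1)
    return stat, df, expected

stat, df, expected = chi_square(observed)
print(f"chi-square = {stat:.1f}, df = {df}")
```

The returned `expected` table can be compared cell by cell with the observed counts to see which genotype is over-represented in each row.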

2. Deriving from Table 1, fill in the following table with allele counts. Then perform a 2-by-2 contingency table analysis using the link above. For example, in the controls, the number of T alleles is 18 + 68 = 86, because each T/T homozygote carries two T alleles (2 × 9 = 18) and each T/C heterozygote carries one (68).

Is there a statistically significant association between alleles and disease phenotype? Which allele (C or T) is over-represented in (i.e., statistically associated with) disease cases?

Table 2. Sample Allele Frequencies

{| class="wikitable"
|-
! Phenotype !! T !! C !! Total
|-
| Case || ? || ? || ?
|-
| Control || ? || ? || ?
|-
| Total || ? || ? || ?
|}
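The conversion from genotype counts to allele counts can be sketched as follows. This is an illustrative helper, not part of the original exercise: the name `allele_counts` and the dictionary layout are assumptions. It is shown only on the control row of Table 1, where the text already gives the T count.

```python
from collections import Counter

def allele_counts(genotype_counts):
    """Tally alleles from a {genotype: number of individuals} mapping."""
    alleles = Counter()
    for genotype, n_individuals in genotype_counts.items():
        # "T/C" -> ["T", "C"]; a homozygote such as "T/T" adds its allele twice.
        for allele in genotype.split("/"):
            alleles[allele] += n_individuals
    return alleles

# Control row of Table 1.
controls = {"T/T": 9, "T/C": 68, "C/C": 114}
counts = allele_counts(controls)
print(counts["T"])  # 2 * 9 + 68, the worked example in the text
```

Because the helper splits each genotype label on "/", the same approach should carry over to loci with more than two alleles, such as genotypes written B1/B3.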

Test association with Locus A

Following the above two examples, perform both the genotype and allele association tests using the class data.

Table 3a. Genotype counts at Locus A

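{| class="wikitable"
|-
! Phenotype !! A1/A1 !! A1/A2 !! A2/A2 !! Row Sum
|-
| Taster || ? || ? || ? || ?
|-
| Non-Taster || ? || ? || ? || ?
|-
| Column Sum || ? || ? || ? || ?
|}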

Calculate allele counts & then test for association

Table 3b. Allele counts at Locus A

{| class="wikitable"
|-
! Phenotype !! A1 !! A2 !! Row Sum
|-
| Taster || ? || ? || ?
|-
| Non-Taster || ? || ? || ?
|-
| Column Sum || ? || ? || ?
|}

Test association with Locus B

Table 4a. Genotype counts at Locus B for each phenotype

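{| class="wikitable"
|-
! Phenotype !! B1/B1 !! B1/B2 !! B1/B3 !! B2/B2 !! B2/B3 !! B3/B3 !! Row Sum
|-
| Taster || ? || ? || ? || ? || ? || ? || ?
|-
| Non-Taster || ? || ? || ? || ? || ? || ? || ?
|-
| Column Sum || ? || ? || ? || ? || ? || ? || ?
|}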

Calculate allele counts & then test for association

Table 4b. Allele counts at Locus B

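{| class="wikitable"
|-
! Phenotype !! B1 !! B2 !! B3 !! Row Sum
|-
| Taster || ? || ? || ? || ?
|-
| Non-Taster || ? || ? || ? || ?
|-
| Column Sum || ? || ? || ? || ?
|}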

Web Exercise. Search for gene information using NCBI online databases

  1. Point your browser to the NCBI Human Genome Resource page
  2. Copy and paste the sequence provided on Blackboard; this is the sequence of the gene associated with the taster phenotype.
  3. Expand the "Algorithm parameters" tab and change "Expect threshold" to 0.00001 (1e-5). Define "expect value" in your own words after watching the linked YouTube video.
  4. Press "BLAST". Copy and paste the top hit into your final lab report.
  5. Briefly describe the function of the gene based on information gathered on the locus page.
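As background for the "Expect threshold" setting in step 3, the sketch below uses the standard relation between a BLAST bit score S′ and the expect value, E = m × n × 2^(−S′), where m and n are the query and database lengths. The function name and all numbers are hypothetical and chosen only for illustration. BLAST itself uses effective lengths, so its reported values will differ somewhat.

```python
def expect_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least `bit_score`.

    Uses E = m * n * 2**(-S'), ignoring BLAST's effective-length corrections.
    """
    return query_length * database_length * 2.0 ** (-bit_score)

# Hypothetical numbers for illustration only (not taken from the exercise).
e = expect_value(bit_score=60.0, query_length=1_000, database_length=3_000_000_000)
threshold = 1e-5  # the "Expect threshold" entered in step 3
print(f"E = {e:.2e}; passes threshold: {e < threshold}")
```

Each additional bit of score halves the expect value, which is why a small threshold keeps only high-scoring alignments.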

Lab Report IV

  1. Your report should include the following results:
    1. A printout of the contingency test for Locus A, including expected counts, observed counts, chi-square statistic, degrees of freedom, and p-value
    2. Same as above for Locus B
    3. A printout of the alignment for the top BLAST hits for the sequence provided

Additional questions to include in your report:

    1. State the null hypothesis and the alternative hypothesis in a chi-square test.
    2. Explain what probability is represented by the p-value.
    3. What can you conclude when the p-value is below the threshold of significance (e.g., p = 0.05)?
    4. What would you conclude when the p-value is above that threshold?
    5. Is there a statistically significant association between one of the alleles tested and the Taster phenotype?
    6. Which genotype is over-represented in the Non-Tasters?
    7. Which allele is over-represented in the Non-Tasters?
    8. Are there exceptions? What are possible causes for exceptions?
    9. Define e-value in a BLAST search
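For the p-value questions above, the following minimal sketch converts a chi-square statistic into a p-value using the closed forms that exist for 1 and 2 degrees of freedom. The function name `chi_square_p_value` is an assumption made for this example. Tables with other degrees of freedom, such as the 2-by-6 genotype table at Locus B, would need a statistics library or the online calculator.

```python
import math

def chi_square_p_value(stat, df):
    """Upper-tail probability of a chi-square statistic for df = 1 or 2."""
    if df == 1:
        # Chi-square with 1 df is the square of a standard normal variable.
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # Chi-square with 2 df is exponential with mean 2.
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only covers df = 1 and df = 2")

alpha = 0.05  # significance threshold used in the questions above
p = chi_square_p_value(26.4, df=2)  # statistic quoted for Table 1
print(f"p = {p:.2g}; reject null hypothesis: {p < alpha}")
```

A p-value below `alpha` means that counts this far from the expected values would rarely arise if genotype and phenotype were independent.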