QuBi/modules/biol302: Difference between revisions

From QiuLab
imported>Weigang
imported>Weigang
Line 89: Line 89:
| AF527840 || || Genomic DNA || || 34,088
| AF527840 || || Genomic DNA || || 34,088
|-
|-
| EU076746 || P2-MDM2-C1 || cDNA missing exons 5-9 & 11 || MANCA || 427
| EU076746 || P2-MDM2-C1 || cDNA  || MANCA || 427
|-
|-
| EU076747 || P2-MDM2-10 || cDNA missing exon 10 || ML-1 || 842
| EU076747 || P2-MDM2-10 || cDNA || ML-1 || 842
|-
|-
| EU076748 || P2-MDM2-C  || cDNA missing exons 5-9 || A876 || 505
| EU076748 || P2-MDM2-C  || cDNA  || A876 || 505
|-
|-
| EU076749 || P2-MDM2-FL || Full-length cDNA || SJSA-1 || 845
| EU076749 || P2-MDM2-FL || Full-length cDNA || SJSA-1 || 845
|-
|-
|}
|}
*[[NOTE: Maybe we should remove the "Description" section of this table and leave it up to the students to figure out which exons are missing]]


----
----
Line 104: Line 103:
Sequences on [[genbank]] have both basic reference information (such as what the sequence is, what organism it came from, and bibliographical information) and sequence [[annotations]]. Some sequences are more richly annotated than others - it is up to researchers to annotate the sequences they generate, which requires extra work. For this exercise you will be working will a well-annotated sequence: [[accession number]] AF527840. Explore its annotation and use it to complete the following set of tasks.
Sequences on [[genbank]] have both basic reference information (such as what the sequence is, what organism it came from, and bibliographical information) and sequence [[annotations]]. Some sequences are more richly annotated than others - it is up to researchers to annotate the sequences they generate, which requires extra work. For this exercise you will be working will a well-annotated sequence: [[accession number]] AF527840. Explore its annotation and use it to complete the following set of tasks.


# DRAW a diagram of this gene using the information and coordinates listed in the annotation. (Note: this is the bulk of the assignment and this diagram is needed for the last set of questions. Don't get lazy on this.)
# DRAW a diagram of this gene using the information and coordinates listed in the annotation.  
##Label the top of the diagram with basic information, such as the gene's name, organism, etc.. Someone should be able to pick up your diagram and know exactly what they're looking at.   
##Label the top of the diagram with basic information, such as the gene's name, organism, etc.. Someone should be able to pick up your diagram and know exactly what they're looking at.   
##Including introns, exons, 3'/5' UTRs, +1, and exact coordinates. (The mRNA annotation states which segments are used to create mRNA, and the CDS annotation states which parts code amino acids (CDS = coding sequence)).
##Including introns, exons, 3'/5' UTRs, +1, and exact coordinates. (The mRNA annotation states which segments are used to create mRNA, and the CDS annotation states which parts code amino acids (CDS = coding sequence)).
##Draw the diagram mostly to scale. It does NOT have to be perfect, but make a reasonable effort. Put a scale bar and length markers on your drawing.
##Draw the diagram mostly to scale. It does NOT have to be perfect, but make a reasonable effort. Put a scale bar and length markers on your drawing.
# How does the sequence vary at positions X, X, and X for this gene? Do these change the AA for the resulting peptide?
# What kinds of repeat regions can be found in this gene?


----
----
; Explore the graphical presentation of the gene to answer the following question.
; Explore the graphical presentation of the gene to answer the following question.
Genbank provides graphical representations of the sequences on its database: click the "Graphics" link below the sequence title, OR click "Display Settings" above the title, and choose "Graphics". Take a few minutes to explore this graphical browser and answer the following question:
Genbank provides graphical representations of the sequences on its database: click the "Graphics" link below the sequence title, OR click "Display Settings" above the title, and choose "Graphics". Take a few minutes to explore this graphical browser and answer the following question:
# A question that can only be answered from looking at this graph.
# A question that can only be answered from looking at this graph.
# Another question that can only be answered from looking at this graph.
# Another question that can only be answered from looking at this graph.
Line 120: Line 116:
----
----
; Use BLAST to determine which exons are used in the mRNA transcripts.
; Use BLAST to determine which exons are used in the mRNA transcripts.
This is the most "bioinformatic" part of the assignment. Blast ALL FOUR of the mRNA sequences (EU076746, EU076747, EU076748, EU076749) against the main sequence (AF527840) and use the results to answer the following questions. If you labeled your diagram well (with coordinates!), this task should go by quickly.
This is the most "bioinformatic" part of the assignment. Blast one of the mRNA sequences (EU076746, EU076747, EU076748, EU076749) against the main sequence (AF527840) and use the results to answer the following questions. Suggested procedures:
 
# Go to the [[http://www.ncbi.nlm.nih.gov/BLAST/ NCBI BLAST]] website
# Which exons are used to create EU076746, EU076747, EU076748, EU076749?
# Click the link “Align two sequences using BLAST (bl2seq)” under “Specialized BLAST” (near the page bottom)
# In the “Sequence 1” text box, type “AF527840” (the accession for the genomics).  Fill in “from 1” and “to 34088”. In the “Sequence 2” text box, type in "EU076748" (or other cDNA accession in the table). Fill in the “from” box with 1 and the “to” box with 505.
# Click “Align”.  You should get a “Blast Result” output page.
; Interpret your results:
# Which exons are present and which ones are absent in EU076746, EU076747, EU076748, EU076749? (Hint: Refer to the mRNA join statement).
# Do the BLAST search results corresponding to exons ''exactly'' match the start/end positions of the exons as labeled in your diagram? If not, what is the most likely reason for this?
# Do the BLAST search results corresponding to exons ''exactly'' match the start/end positions of the exons as labeled in your diagram? If not, what is the most likely reason for this?
# Do any of the BLAST results match regions outside of exons? If so, what regions?
# Do any of the BLAST results match regions outside of exons? If so, what regions?
 
----
<!-- ===[[QuBi/modules/biol302#Exit_Questions|Exit Questions]]=== -->
===[[QuBi/modules/biol302#Exit_Questions|Exit Questions]]===
#Describe essential experimental steps of obtaining cDNA clones of a gene from a cell culture.
#Based on the NCBI BLAST documentation, define BLAST. Describe applications of different BLAST tools.
#Describe informatics steps of identifying a splice variant using BL2SEQ.
# Explain why the following statement is FALSE: The first exon always starts with ATG and the last exon always ends with a stop codon. What are the terms for the untranslated regions of exons?

Revision as of 17:36, 22 March 2013

BIOL 302 Lab (Bioinformatics Exercises)

Research in molecular genetics requires effective use of bioinformatic tools to analyze and understand the genetic material under study. The following exercises present real-world scenarios and introduce the methods and tools you can use to work through them.

Part 1: Cloning of murine mdm2 gene sequence to study cis-acting DNA elements

Key Concepts

  • Homology searching using BLAST
  • Mammalian gene regulation

So far, you have excised this XbaI fragment:

TCTAGATGCATTTACGAAGGAGACAGAAAACGTCTTTCGGCAATAGCTCTCAAATGCAAAACGACGTCGG CGAGCTGTCCCTTACCTGGAGGCCCGCAGGAGAAGCGCGGTGATCCGAGAGGGTCCCCCAGGGGTGTCCG GTCGGTCTCCCGCTCGCCCAGCAGACGGCTGCGGAAACGGGGCAGCGTTTAAATAACCCCAGCTGGAGAC ATGTCAGGACTTAGCTCCTCCGACAGCCGACGCCGGACGTGTCCCAACTTGACCAGCCCCACAGGAAGAG CTGAGTCAACTCGGCCCAGCCCAGTCCCACCCGTCCCGGAAGCCGCATCCCGGCGAGTCCGGGACCAGGC ACCTGTCACCTCCTGGACCCCAGCAACGAGCCCAGCGCGACCCCGGAGCGGGCCCGAATTCTCTAGA


Gene identification using genome BLAST

  1. Go to the NCBI-BLAST website at NCBI/BLAST Home Page
  2. What is BLAST? Find the expanded answer by clicking on "more"
  3. Since BLAST finds matches (homologous sequences) between biological sequences, it needs a "query" sequence as input as well as a "database" to search against. To find the matches to the above sequence, what would be your "query" sequence and what would be your "database"?
  4. Start BLASTing against the mouse genome by clicking "Mouse" under "BLAST Assembled RefSeq Genomes"
  5. Copy and paste the above sequence into the "Enter Query Sequence" box
  6. Scroll down to the bottom of the page and click "BLAST"
  7. Wait for 10-30 seconds for the results to return (Be Patient). Once the result page is loaded, locate and copy the following information:
    1. Species and strain
    2. Chromosome
    3. Length of your query sequence
    4. Sequence identity, number of matched bases, and number of gaps between the matched sequences
    5. Full name of the gene that matches your query
  8. Clicking the link for the "5' side" gene will bring up a standard GenBank file for this gene. Locate and copy the following structural information about this gene:
    1. Gene accession (ID number)
    2. Total length of the gene
    3. Number of introns
    4. Which is the actual coding strand (mRNA analog): the above sequence itself or its reverse complement?
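
For the query length and the coding-strand question, it can help to tidy the pasted fragment and generate its reverse complement before comparing it with the GenBank record. The following is a minimal sketch in plain Python; the function names `clean_sequence` and `reverse_complement` are illustrative choices, and the short sequence in the example is a toy stand-in for the XbaI fragment.

```python
# Complement table for unambiguous DNA bases (N is kept as N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def clean_sequence(raw: str) -> str:
    """Remove spaces and line breaks from a pasted sequence and upper-case it."""
    return "".join(raw.split()).upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Toy fragment for illustration only; paste the XbaI fragment here instead.
    fragment = clean_sequence("TCTAGATGCA TTTACG")
    print(len(fragment))                 # length of the query
    print(reverse_complement(fragment))  # the opposite strand
```

Whichever of the two strings matches the mRNA in the GenBank record (with T in place of U) is the coding strand.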

Explore pGL2 vector

This sequence has been cloned into the NheI site in the pGL2-Basic vector:

PGL2vector-map.png Source: Promega

  1. Identify the location where you have cloned your fragment
  2. What are the two possible directions (with respect to the luciferase gene) you may have cloned your fragment?
  3. From the PDF file, find the location of the EcoRI site

Identify functional element through literature search

Zauberman fig2.png

Taken from Zauberman et al., "A functional p53-responsive intronic promoter is contained within the human mdm2 gene" (Pubmed, PDF). [PROPERLY CITE THIS]


  1. Identify the TATA box in the above sequence by looking for AT-rich regions
  2. Once the TATA box is located, use it as the anchor point to locate the other elements, including the two p53 response element (p53 RE) sites and exon 2
  3. Locate the EcoRI restriction site using the NEBcutter website
  4. What are the expected EcoRI fragment sizes for the two possible orientations of your cloned DNA?
  5. Which orientation would you expect to give higher luciferase expression?
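
The reasoning behind the orientation question can be sketched in code: because the EcoRI site (GAATTC) is palindromic, flipping the insert moves the site to a mirrored position, so its distance from either end of the insert changes with orientation. The sketch below is an illustration only. The function names are hypothetical, the example insert is a toy sequence, and the distance from the cloning site to the vector's own EcoRI site is not given here and must be read from the pGL2 map.

```python
ECORI = "GAATTC"  # EcoRI recognition sequence (palindromic)


def find_sites(seq: str, site: str = ECORI) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def sites_in_both_orientations(insert: str, site: str = ECORI) -> dict[str, list[int]]:
    """Site positions measured from the left end of the insert,
    for the insert as given ("forward") and flipped ("reverse").

    Only valid for palindromic sites such as EcoRI.
    """
    length = len(insert)
    forward = find_sites(insert, site)
    # A palindromic site starting at p starts at length - p - len(site)
    # once the insert is reverse-complemented.
    reverse = sorted(length - p - len(site) for p in forward)
    return {"forward": forward, "reverse": reverse}


if __name__ == "__main__":
    toy_insert = "AAAAGAATTCTT"  # toy sequence, not the real fragment
    print(sites_in_both_orientations(toy_insert))
```

Adding each offset to the vector-side distance taken from the map gives the expected fragment size for that orientation.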

Part 2 (Extra Credit Assignment): Identification of mdm2 Splice Variants Using BLAST

4seq.png

A diagram of the MDM2 gene used in this exercise, along with its splice variants. By the end of this module you will create a similar diagram.



Key Concepts

  • Use BLAST (not Google) for finding matches of DNA and protein sequences
  • Alternative splicing and isoforms of a single gene

Exercise

{|
! Genbank Accession # !! cDNA Clone !! Description !! Cell Line !! Length (bp)
|-
| AF527840 || || Genomic DNA || || 34,088
|-
| EU076746 || P2-MDM2-C1 || cDNA || MANCA || 427
|-
| EU076747 || P2-MDM2-10 || cDNA || ML-1 || 842
|-
| EU076748 || P2-MDM2-C || cDNA || A876 || 505
|-
| EU076749 || P2-MDM2-FL || Full-length cDNA || SJSA-1 || 845
|}

Explore the gene annotation for AF527840.

Sequences on GenBank have both basic reference information (such as what the sequence is, what organism it came from, and bibliographical information) and sequence annotations. Some sequences are more richly annotated than others - it is up to researchers to annotate the sequences they generate, which requires extra work. For this exercise you will be working with a well-annotated sequence: accession number AF527840. Explore its annotation and use it to complete the following set of tasks.

  1. DRAW a diagram of this gene using the information and coordinates listed in the annotation.
    1. Label the top of the diagram with basic information, such as the gene's name, organism, etc. Someone should be able to pick up your diagram and know exactly what they're looking at.
    2. Include introns, exons, 3'/5' UTRs, +1 (the transcription start site), and exact coordinates. (The mRNA annotation states which segments are used to create the mRNA, and the CDS annotation states which parts code for amino acids; CDS = coding sequence.)
    3. Draw the diagram mostly to scale. It does NOT have to be perfect, but make a reasonable effort. Put a scale bar and length markers on your drawing.
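
When turning the annotation into a diagram, the exon coordinates come from the mRNA `join(...)` statement and the introns are the gaps between consecutive exons. The sketch below shows that arithmetic under some assumptions: the function names are hypothetical, the coordinates in the example are made up (they are not the AF527840 values), and only simple forward-strand `start..end` segments are handled, not `complement(...)` or single-base positions.

```python
import re

Interval = tuple[int, int]  # 1-based, inclusive (start, end)


def parse_join(location: str) -> list[Interval]:
    """Parse a GenBank location such as 'join(10..50,120..200)' into (start, end) pairs."""
    return [(int(a), int(b)) for a, b in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]


def introns_between(exons: list[Interval]) -> list[Interval]:
    """Introns are the gaps between consecutive exons."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def segment_lengths(segments: list[Interval]) -> list[int]:
    """Length in bp of each segment (inclusive coordinates)."""
    return [end - start + 1 for start, end in segments]


if __name__ == "__main__":
    # Made-up join statement for illustration only.
    exons = parse_join("join(101..250,1001..1080,5000..5400)")
    introns = introns_between(exons)
    print(exons, segment_lengths(exons))
    print(introns, segment_lengths(introns))
```

The segment lengths are what you need for drawing the exons and introns roughly to scale.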

Explore the graphical presentation of the gene to answer the following questions.

GenBank provides graphical representations of the sequences in its database: click the "Graphics" link below the sequence title, OR click "Display Settings" above the title, and choose "Graphics". Take a few minutes to explore this graphical browser and answer the following questions:

  1. A question that can only be answered from looking at this graph.
  2. Another question that can only be answered from looking at this graph.

Use BLAST to determine which exons are used in the mRNA transcripts.

This is the most "bioinformatic" part of the assignment. BLAST one of the mRNA sequences (EU076746, EU076747, EU076748, EU076749) against the genomic sequence (AF527840) and use the results to answer the following questions. Suggested procedure:

  1. Go to the NCBI BLAST website (http://www.ncbi.nlm.nih.gov/BLAST/)
  2. Click the link “Align two sequences using BLAST (bl2seq)” under “Specialized BLAST” (near the page bottom)
  3. In the “Sequence 1” text box, type “AF527840” (the accession for the genomic sequence). Fill in “from 1” and “to 34088”. In the “Sequence 2” text box, type “EU076748” (or another cDNA accession from the table). Fill in the “from” box with 1 and the “to” box with 505.
  4. Click “Align”. You should get a “Blast Result” output page.
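
If you would rather work with the alignment coordinates as data than read them off the result page, the hits can be loaded into Python. This is a minimal sketch, assuming the result has been saved as a tab-separated hit table with the twelve standard BLAST tabular columns (the layout that command-line BLAST+ calls `-outfmt 6`); the `Hit` class and `parse_tabular` function are hypothetical names, and the sample line contains made-up numbers.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    percent_identity: float


def parse_tabular(text: str) -> list[Hit]:
    """Parse tab-separated BLAST hits, skipping blank and '#' comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        # Standard columns: qseqid sseqid pident length mismatch gapopen
        #                   qstart qend sstart send evalue bitscore
        hits.append(Hit(int(f[6]), int(f[7]), int(f[8]), int(f[9]), float(f[2])))
    return sorted(hits, key=lambda h: h.query_start)


if __name__ == "__main__":
    # Made-up hit line for illustration only.
    sample = "AF527840\tEU076748\t99.50\t200\t1\t0\t1001\t1200\t1\t200\t1e-100\t360"
    for hit in parse_tabular(sample):
        print(hit)
```

With AF527840 entered as Sequence 1, the query columns hold the genomic coordinates and the subject columns hold the cDNA coordinates.
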
Interpret your results
  1. Which exons are present and which ones are absent in EU076746, EU076747, EU076748, EU076749? (Hint: Refer to the mRNA join statement).
  2. Do the BLAST search results corresponding to exons exactly match the start/end positions of the exons as labeled in your diagram? If not, what is the most likely reason for this?
  3. Do any of the BLAST results match regions outside of exons? If so, what regions?
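
The comparison these questions ask for amounts to checking each exon interval from your diagram against the genomic intervals reported by BLAST. The sketch below illustrates one way to do that. The function names, the 90% coverage threshold, and all coordinates are assumptions made for the example, and it assumes the BLAST hits do not overlap one another on the genomic sequence.

```python
Interval = tuple[int, int]  # 1-based, inclusive genomic coordinates


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def classify_exons(exons: list[Interval], hits: list[Interval],
                   min_fraction: float = 0.9) -> dict[int, str]:
    """Label each exon 'present' if hits cover at least min_fraction of it."""
    hits = [(min(h), max(h)) for h in hits]  # tolerate reversed coordinates
    result = {}
    for number, exon in enumerate(exons, start=1):
        covered = sum(overlap(exon, hit) for hit in hits)
        length = exon[1] - exon[0] + 1
        result[number] = "present" if covered / length >= min_fraction else "absent"
    return result


def bases_outside_exons(hit: Interval, exons: list[Interval]) -> int:
    """Number of bases in a hit that fall outside every annotated exon."""
    hit = (min(hit), max(hit))
    inside = sum(overlap(hit, exon) for exon in exons)
    return (hit[1] - hit[0] + 1) - inside


if __name__ == "__main__":
    # Made-up exons and hits for illustration only.
    exons = [(101, 250), (1001, 1080), (5000, 5400)]
    hits = [(101, 250), (5003, 5400)]
    print(classify_exons(exons, hits))
    print(bases_outside_exons((90, 250), exons))
```

A threshold below 100% is used because BLAST alignments often stop a few bases short of, or run a few bases past, the annotated exon boundaries.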

Exit Questions

  1. Describe the essential experimental steps for obtaining cDNA clones of a gene from a cell culture.
  2. Based on the NCBI BLAST documentation, define BLAST. Describe applications of different BLAST tools.
  3. Describe the informatics steps for identifying a splice variant using BL2SEQ.
  4. Explain why the following statement is FALSE: The first exon always starts with ATG and the last exon always ends with a stop codon. What are the terms for the untranslated regions of exons?