Lab 4. Bioinformatics Exercises: BLAST & Gene Structure
Expected Learning Outcomes
- Be able to perform NCBI BLAST search for homologous sequences in GenBank.
- Be able to identify individual gene elements based on an NCBI GenBank file.
- Be able to identify alternative splice forms of a single gene using NCBI web tools.
Lab Report Grading Policy
- Introduction (5 pts). Define these terms: bioinformatics, homology, BLAST, e-value, alternative splicing (Your statements are not to be copied from the Lab Manual.)
- Materials and Methods (5 pts). List and describe steps of a BLAST search & steps of identifying alternative splicing variants with BLAST.
- Results (20 pts). Copy your results in a document during the exercises, and hand in an organized copy. Include answers to all queries and questions.
- Discussion and Conclusion (15 pts). Answer the five discussion questions. Summary/Conclusion: a sentence or two will suffice.
- References (5 pts). Credit is given for pertinent references obtained from sources other than the Lab Manual.
(Total: 50 pts)
Introduction
Research in molecular genetics requires effective use of online bioinformatic tools to analyze and understand the genetic materials being worked with. The following exercises will expose you to real-world scenarios and introduce you to the methods and tools you can use to solve these problems.
In biology, homology is defined as a common or shared evolutionary origin. Homologous sequences are therefore sequences that diverged from a common ancestor. Note that "homology" is not the same as "similarity": homologous structures or sequences may not be similar (e.g., forearms in mammals and birds) and, conversely, similar structures or sequences may not be homologous (e.g., wings in birds and bats).
BLAST is a computer algorithm for efficiently searching a large database for sequences similar to a query. Although BLAST performs a function similar to a Google search, you should not use Google to look for similar sequences in the human genome or any other genome. When the similarity between sequences is statistically significant (as measured by the e-value, see below), we consider the sequences homologous to each other.
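To make "similar" concrete, the sketch below counts matched bases and gaps in a pairwise alignment and converts the match count to a percent identity, the same kind of numbers a BLAST hit reports. It is only an illustration: the function name `alignment_stats` and the two toy aligned strings are made up for this example, and it does not compute an e-value.

```python
def alignment_stats(aligned_query: str, aligned_subject: str) -> dict:
    """Summarize a gapped pairwise alignment given as two equal-length strings.

    Gaps are written as '-' in either sequence.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")

    pairs = list(zip(aligned_query.upper(), aligned_subject.upper()))
    # A match is an identical base in both rows (a gap never counts as a match).
    matches = sum(1 for q, s in pairs if q == s and q != "-")
    # A gap column has '-' in at least one of the two rows.
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    length = len(pairs)

    return {
        "matches": matches,
        "gaps": gaps,
        "length": length,
        "percent_identity": 100.0 * matches / length if length else 0.0,
    }


# Toy alignment (hypothetical, not from a real BLAST result).
stats = alignment_stats("ACGT-ACGTA", "ACGTTACCTA")
print(stats)
```

Here 8 of the 10 alignment columns match and one column contains a gap, so the identity is 8/10 (80%).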
Exercise 1. Homology searching using BLAST
- Go to the NCBI BLAST home page (https://blast.ncbi.nlm.nih.gov/).
- What is BLAST? Read and copy the expanded answer by clicking on "more"
- Since BLAST finds matches between biological sequences, it needs a "query" sequence as input as well as a "database" to search against. To find matches of a sequence in the human genome, what would be your "query" sequence and what would be your "database"?
- Start BLASTing against the mouse genome by clicking "Mouse" under "BLAST Assembled RefSeq Genomes"
- Copy and paste the following sequence into the "Enter Query Sequence" box:
CTAGATGCATTTACGAAGGAGACAGAAAACGTCTTTCGGCAATAGCTCTCAAATGCAAAACGACGTCGG CGAGCTGTCCCTTACCTGGAGGCCCGCAGGAGAAGCGCGGTGATCCGAGAGGGTCCCCCAGGGGTGTCCG GTCGGTCTCCCGCTCGCCCAGCAGACGGCTGCGGAAACGGGGCAGCGTTTAAATAACCCCAGCTGGAGAC ATGTCAGGACTTAGCTCCTCCGACAGCCGACGCCGGACGTGTCCCAACTTGACCAGCCCCACAGGAAGAG CTGAGTCAACTCGGCCCAGCCCAGTCCCACCCGTCCCGGAAGCCGCATCCCGGCGAGTCCGGGACCAGGC ACCTGTCACCTCCTGGACCCCAGCAACGAGCCCAGCGCGACCCCGGAGCGGGCCCGAATTCT
- Scroll down to the bottom of the page and click "BLAST"
- Wait 10-30 seconds for the results to return (be patient). Once the result page has loaded, locate and copy or write down the following information for the first hit:
- Species and strain
- Chromosome
- Length of your query sequence
- Sequence identity, number of matched bases, and number of gaps between the matched sequences
- Click the link for "5' side" (next to Features). This brings up a standard GenBank file for the gene. Locate and copy the following structural information about this gene:
- Gene accession (ID number)
- Total length of the gene
- Number of introns
- Which is the non-template (mRNA analog) strand: the above sequence itself or its reverse complement? (Hint: note the word "complement" in the mRNA and CDS lines.)
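The last question asks you to compare a sequence with its reverse complement. As a minimal sketch of what "reverse complement" means, the snippet below complements each base (A↔T, C↔G) and then reverses the result. The helper name `reverse_complement` is chosen for this example, and it handles only the four unambiguous DNA bases.

```python
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# First 10 bases of the query sequence used in this exercise.
query_start = "CTAGATGCAT"
print(reverse_complement(query_start))  # ATGCATCTAG
```

Applying the function twice returns the original sequence, which is a quick way to check that it behaves as expected.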
Exercise 2. Explore the structure of human mdm2 gene
- Search GenBank using the accession AF527840. Read the GenBank file and find out from the feature table how many introns and exons this sequence has according to the "mRNA" and "CDS" features.
- Click on "mRNA" and notice that exon sequences are now highlighted
- Fill in Table 1 for each EXON you could identify:
- Fill in Table 2 for each INTRON you could identify:
- Click on "CDS" and notice that coding sequences are now highlighted
- Fill in Table 3 for each coding sequence you could identify:
- DRAW a diagram of this gene based on the above exon and intron coordinates.
- Label the top of the diagram with basic information, such as the gene's name and species information.
- Label the coordinates of the introns, exons, 5'/3' UTRs, start codon, and stop codon.
- Draw the diagram mostly to scale. It does NOT have to be perfect, but make a reasonable effort. Put a scale bar and length markers on your drawing.
- Answer the following questions:
- What is the total length of exons, introns, and coding sequences of this gene?
- Do all exon sequences code for proteins? Which exons are non-coding in mdm2?
- Align the first 5 bases of all introns. Which bases are conserved among intron starts ("donor sites")?
- Align the last 5 bases of all introns. Which bases are conserved among intron ends ("acceptor sites")?
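For the last two questions, "aligning" the 5-base strings simply means stacking them and comparing them column by column. The sketch below shows one way to do this: it prints the shared base where every string agrees and a dot where they differ. The function name `conserved_positions` and the toy strings are arbitrary placeholders, not real mdm2 splice sites; substitute the bases you recorded in Table 2.

```python
def conserved_positions(sites: list[str]) -> str:
    """Compare equal-length strings column by column.

    Returns a string with the shared base where all inputs agree
    and '.' where at least one input differs.
    """
    if not sites:
        return ""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    columns = zip(*(site.upper() for site in sites))
    return "".join(col[0] if len(set(col)) == 1 else "." for col in columns)


# Arbitrary toy strings (NOT real intron boundaries).
toy_sites = ["AAGCT", "AATCG", "AACCA"]
print(conserved_positions(toy_sites))  # AA.C.
```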
Table 1. mdm2 Exons
Exon # | Start Position | End Position | Length |
---|---|---|---|
#1 | 1971 | 2271 | 301 |
#2 | ? | ? | ? |
Table 2. mdm2 Introns
Intron # | Start Position | End Position | Length | First 5 bases | Last 5 bases |
---|---|---|---|---|---|
#1 | 2272 | 2987 | 616 | GTACT | TGTAG |
#2 | ? | ? | ? | ? | ? |
Table 3. mdm2 Coding Sequences (CDS)
CDS # | Start Position | End Position | Length |
---|---|---|---|
#1 | 1971 | 2271 | 301 |
#2 | ? | ? | ? |
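GenBank feature coordinates are 1-based and inclusive, so the length of a feature is end − start + 1, and each intron occupies the gap between two consecutive exons. The sketch below illustrates both calculations and shows how to pull the first and last five bases out of a feature. All function names are chosen for this example, and the toy sequence and toy exon coordinates are invented; only the exon #1 coordinates (1971 to 2271) come from Table 1.

```python
def span_length(start: int, end: int) -> int:
    """Length of a feature given 1-based, inclusive coordinates."""
    return end - start + 1


def introns_from_exons(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Derive intron (start, end) pairs as the gaps between consecutive exons."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def slice_feature(seq: str, start: int, end: int) -> str:
    """Extract a feature from a sequence using 1-based, inclusive coordinates."""
    return seq[start - 1:end]


# Exon #1 from Table 1: 1971..2271.
print(span_length(1971, 2271))  # 301

# Invented toy data (NOT the mdm2 sequence or its real exon coordinates).
toy_seq = "ATGGCCTTAAGGCCGATTGA"
toy_exons = [(1, 6), (13, 20)]

for start, end in introns_from_exons(toy_exons):
    intron = slice_feature(toy_seq, start, end)
    print(start, end, span_length(start, end), intron[:5], intron[-5:])
```

Checking each table row with `span_length` is a simple way to catch arithmetic slips before drawing the gene diagram.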