Foundations of Biology CMB Laboratory Lab 1, page 3


Practice Laboratory N° 1

Bioinformatics: Self-Guided Internet-based Exercise on Databases for the Storage and Data Mining


This exercise aims to introduce you to some of the relevant databases and bioinformatics tools for examining and comparing different pieces of biological information. Biological databases are an important resource (Maloney et al., 2010) for the study of biochemistry, molecular genetics, transmission genetics, cell biology, evolution and many other branches of the biological sciences.


Biological databases contain enormous amounts of information about the sequences and structures of nucleic acids (DNA and RNA) and proteins; gene structures and chromosomes; metabolic pathways and enzymes; signaling mechanisms, etc. Some of them include software tools that can be used to analyze such data. Often, the software can be used directly through a web browser (web apps). Freestanding applications must be downloaded and installed on your computer or a local network.

The analysis of biological macromolecules (especially DNA, RNA and proteins) is based on the fundamental principle of gene expression, also known as the Central Dogma of Molecular Genetics, represented in this oversimplified diagram:


Do not use a printout of the Manual PDF. Instead, use this Microsoft Word version, in which the Internet hyperlinks are active. Enter your answers by double-clicking the phrase STARTTTYPINGGHERE and typing. Once you have completed the exercise, provide your instructor with a hard copy, submit it via SafeAssign, or send it via e-mail, as she indicates.

Important: Always give your document a title that includes your name and other pertinent information. “Untitled01.docx” is not a good name, and neither is “Graph.xlsx” or “ExtraCredit.txt.” You can imagine how many papers we get from students curiously named “Untitled01.” So, here’s a suggestion (assuming that you are using Microsoft Word):

LastName_FirstName_202_Section_NN_Bioinformatics.docx. For example, Ms. Janet Kovacz sends a paper to Mr. Sergio Capellutti, instructor for section B3. So Ms. Kovacz gives her paper the unmistakable name Kovacz_Janet_202_B3_Bioinformatics.docx.

  1. Finding Databases in the World Wide Web

Let’s start by finding databases (Honts, 2003). You may click on the URLs in this document. Describe, in one short sentence, the purpose of each website. The home page usually has a brief description of what the website’s creators intended. Some titles are obvious (e.g., OMIM = Online Mendelian Inheritance in Man); others are not. For example, if you read the top of BLAST’s first page, you’ll find: “BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance.” BLAST stands for Basic Local Alignment Search Tool, and any biology student should become familiar with it. Click “Learn more” to find an expanded description.

1.1. General databases and tools for bioinformatics studies

National Center for Biotechnology Information

Brief description: STARTTTYPINGGHERE


Brief description: STARTTTYPINGGHERE


Brief description: STARTTTYPINGGHERE

Online Mendelian Inheritance in Man (OMIM)

Brief description: STARTTTYPINGGHERE

NCBI Conserved Domain Search

Brief description: STARTTTYPINGGHERE

CDART: Conserved Domain Architecture Retrieval Tool

Brief description: STARTTTYPINGGHERE

European Bioinformatics Institute

Brief description: STARTTTYPINGGHERE

Protein Data Bank

Brief description: STARTTTYPINGGHERE

GenomeNet Database Resources

Brief description: STARTTTYPINGGHERE

1.2. Access points for integrated suites of sequence analysis tools

Multiple sequence alignment (protein)

Brief description: STARTTTYPINGGHERE


Brief description: STARTTTYPINGGHERE

Multiple sequence alignments

Brief description: STARTTTYPINGGHERE

PRABI (Rhone-Alpes Bioinformatics Center)

Brief description: STARTTTYPINGGHERE

1.3. Some resources for human genomics

The Human Genome (NCBI)

Brief description: STARTTTYPINGGHERE

Human Genome Browser Gateway (UCSC)


Brief description: STARTTTYPINGGHERE


Brief description: STARTTTYPINGGHERE

1.4. Databases with entire genomic sequences

National Center for Genome Resources

Brief description: STARTTTYPINGGHERE

J. Craig Venter Institute

Brief description: STARTTTYPINGGHERE

Gramene: A Resource for Comparative Grass Genomics


Maize GDB (Maize Genetics and Genomics Database)


1.5. Example of a specialized structure prediction tool

COILS Server

Brief description: STARTTTYPINGGHERE

1.6. Metabolic and signaling pathways

BioCyc (several organisms)

Brief description: STARTTTYPINGGHERE

EcoCyc (Escherichia coli)

Brief description: STARTTTYPINGGHERE

Saccharomyces cerevisiae (brewer’s yeast)

Brief description: STARTTTYPINGGHERE

Arabidopsis thaliana (thale cress)

Brief description: STARTTTYPINGGHERE

Danio rerio (zebrafish)

Brief description: STARTTTYPINGGHERE

Mus musculus (mouse)

Brief description: STARTTTYPINGGHERE

Homo sapiens (human)

Brief description: STARTTTYPINGGHERE

1.7. Additional learning resources (notice the absence of Wikipedia on this list)

Biology Online Textbook

Brief description: STARTTTYPINGGHERE

Metabolic Pathways

Brief description: STARTTTYPINGGHERE


Brief description: STARTTTYPINGGHERE

Phylogenetic trees: tree

Brief description: STARTTTYPINGGHERE

Brief description: STARTTTYPINGGHERE

Brief description: STARTTTYPINGGHERE

Google Scholar

Brief description: STARTTTYPINGGHERE


Brief description: STARTTTYPINGGHERE

1.8. The National Center for Biotechnology Information

NCBI is a comprehensive network of databases that includes information on nucleotidyl sequences (e.g., chromosomal DNA, mRNA, non-protein–coding RNAs), amino acyl sequences (proteins), taxonomy, and genetically based diseases (some of which are known as “inborn errors of metabolism”). Here’s a diagram that illustrates the relationships among these different databases:

You may want to continue exploring NCBI. The link will take you to a comprehensive list of all NCBI databases (NCBI 2005).

  2. Case Study: A Human Nucleotidyl Sequence

Specific Learning Objectives

· Describe what GenBank files are and be able to read them.

· Describe what FASTA format is and learn how to identify sequences in FASTA format.

· Become familiar with the BLAST program (check NCBI websites) and learn how to use it.

NOTE: Your instructor may assign you a sequence different from the one in this section. If so, modify this document as necessary.


The nucleotidyl-residue (or “nucleotide,” for short) sequence on the following page comes from a human DNA sequencing project. You are given the task of identifying the location of this sequence within the human genome (Alaie et al., 2012). The problem is that the human genome is made up of about 3 billion base pairs (bp). Checking even 1,000 bp by eye in search of this sequence is quite time-consuming (as you will find out shortly). Imagine if you had to check a billion nucleotides in a sequence!

Notice that the sequence provided below is in FASTA format, i.e., it does not start directly with nucleotide abbreviations (A, G, T, C), nor does it include numbers, spaces or symbols within the sequence itself. Instead, a name or designation for the sequence is written in the first line, preceded by the “>” symbol.
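To make the FASTA layout concrete, here is a minimal Python sketch that separates the “>” description line from the sequence lines. The function name `parse_fasta` and the toy record are illustrative assumptions; the record is not the lab sequence.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines are joined into one continuous string
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical toy record, for illustration only
example = """>toy_sequence hypothetical example, not the lab sequence
ACGTACGTAC
GGCCTTAAGG
"""

for name, seq in parse_fasta(example):
    print(name, len(seq))
```

The description line is kept apart from the residues, which is why a sequence pasted with its “>” line is still read correctly by tools that accept FASTA input.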

Start by scanning (by eye) the given sequence (3,360 bp) in search of the location of the following short nucleotide stretches. Devise your own method.


Mark the sequences on your printout of this document (underline or use a highlighter) or on the electronic document, as requested by your instructor.


Please note the time at the beginning of your search and answer the following questions once you have located your sequence.

  2.1. Describe the method you used to find the sequence stretches (visual comparison? computer-aided?).


  2.2. How long did it take for you to find your sequence?
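For comparison with a search by eye, one possible computer-aided approach is a plain substring search. The sketch below is an assumption about how such a search could be done, not a required method; the sequence and stretch are toy values, and the helper names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def find_all(sequence, stretch):
    """Return the 1-based start positions of every match, overlaps included."""
    positions = []
    start = sequence.find(stretch)
    while start != -1:
        positions.append(start + 1)  # report positions the way biologists count
        start = sequence.find(stretch, start + 1)
    return positions


# Toy values for illustration; substitute the sequence and stretches you were given
sequence = "ATGCGGCGGCGGTTAACC"
stretch = "CGG"

print("forward strand:", find_all(sequence, stretch))
print("reverse strand:", find_all(sequence, reverse_complement(stretch)))
```

Searching for the reverse complement as well covers the case in which the stretch was read from the opposite DNA strand.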




2.3. BLAST

Let us explore the efficiency of using vast online databases and online search tools to locate and identify unknown nucleotide sequences. One such search tool is called BLAST (Basic Local Alignment Search Tool). This program compares a nucleotidyl (DNA, RNA) or amino acyl sequence (protein) of interest to online databases looking for regions of local similarity and calculates the statistical significance of matches. One such online database is NCBI’s GenBank, which contains the sequences of at least three full-length human genomes and, being hosted by the National Library of Medicine (a branch of the National Institutes of Health), is free to the public.
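To illustrate what “regions of local similarity” means, the sketch below scores the best local alignment between two short strings with the Smith-Waterman algorithm. This is not how BLAST works internally: BLAST uses faster heuristics that approximate this exhaustive search. The scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for illustration, not BLAST defaults.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score between a and b."""
    previous = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                      # start a fresh local alignment here
                previous[j - 1] + pair, # align a[i-1] with b[j-1]
                previous[j] + gap,      # gap in b
                current[j - 1] + gap,   # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


# Two toy sequences that share the stretch ACGT
print(local_alignment_score("TTACGTTT", "GGACGTGG"))
```

Because a score can never drop below zero, the flanking mismatches do not penalize the shared stretch, which is the sense in which the alignment is “local.”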

Finding sequences of known (or putative) function in a database that are similar to your sequence of interest may allow you to identify the gene family to which your sequence belongs or the functional significance of your sequence, if any. You will use a BLAST search to uncover information about an unknown sequence. Copy and paste the unknown sequence (either the one from the previous page or the one provided by your section’s instructor) into a new Word document and save it on your computer’s hard drive. Give it a title in the format 202_Test_Sequence_LastName_FirstName.docx (example: 202_Test_Sequence_McKinnell_James.docx).

· Go to NCBI BLAST website at

· In the resulting page, scroll down to Basic Blast and click on the link nucleotide blast. Copy the first line of the nucleotide sequence in the Word document and paste it in the “Enter Query Sequence” box. (The top line, preceded by the “>” sign, is a description of what the sequence is.)

· Leave the settings as they are, but make sure that Human genomic + transcript is selected in the Choose Search Set options. Scroll to the bottom of the page and click the BLAST button in the left-hand corner. Wait for results. Did your sequence find any matches in the human genome database?


What could be the reason for this result? STARTTTYPINGGHERE

· Now try a longer sequence. Copy the first three lines and paste this sequence into the “Enter Query Sequence” box and click BLAST again. Did your query match any sequence in the human genome database?


If so, what match did it locate? STARTTTYPINGGHERE

· Next copy one line that is roughly in the middle of the provided sequence and paste it into the “Query Sequence” box and run the BLAST search again. Did you get a result this time?


· Propose a reason for why this one line yielded a different result than the one line at the beginning of the sequence.


· Click on the first of the matches that your search yielded. This match should be with a sequence within GenBank. What is the name of this gene? What is the Sequence ID?


· On what chromosome is this gene located?
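The three queries used in the steps above (the first sequence line, the first three lines, and one line near the middle) can also be pulled out of a FASTA file programmatically. The following is a minimal sketch assuming a non-empty, single-record FASTA text; the function name and the toy record are hypothetical.

```python
def query_lines(fasta_text):
    """Pick the three query variants used in the BLAST steps from FASTA text."""
    lines = [ln.strip() for ln in fasta_text.splitlines() if ln.strip()]
    # Drop the description line(s); only residues are sent as the query
    sequence_lines = [ln for ln in lines if not ln.startswith(">")]
    middle = len(sequence_lines) // 2
    return {
        "first_line": sequence_lines[0],
        "first_three_lines": "".join(sequence_lines[:3]),
        "middle_line": sequence_lines[middle],
    }


# Toy record for illustration only
toy = """>toy_sequence hypothetical example
AAAA
CCCC
GGGG
TTTT
ACGT
"""

for label, query in query_lines(toy).items():
    print(label, query)
```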


  3. Conclusion

A fully processed messenger RNA (mRNA) contains nucleotide triplets (codons) that are read in sequence from an initiation codon (AUG) up to a termination codon (one of three: UAG, UAA, or UGA). The expression of a eukaryotic gene is controlled by DNA sequences called regulatory regions. The regulatory regions include the gene’s promoter, which binds RNA polymerase once the transcription factors have bound the DNA and made that site accessible, and one or more enhancers that also bind transcription factors and contribute to the control of gene expression.
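The reading of codons from AUG to a termination codon can be sketched in a few lines of Python. This is a simplified illustration using the standard genetic code; it assumes the first AUG found is the initiation codon and that the input contains only A, C, G, and U (or T), which is not always true of real transcripts.

```python
BASES = "UCAG"
# Standard genetic code, listed in UCAG order for each codon position; * = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {}
for i, first in enumerate(BASES):
    for j, second in enumerate(BASES):
        for k, third in enumerate(BASES):
            CODON_TABLE[first + second + third] = AMINO_ACIDS[16 * i + 4 * j + k]


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon found
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":
            break  # termination codon: UAA, UAG, or UGA
        protein.append(residue)
    return "".join(protein)


# Toy transcript: two leading bases, then AUG GCC UGG UAA
print(translate("GGAUGGCCUGGUAAGG"))
```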

Usually, the expression of a gene can be modified if one of its regulatory regions undergoes a mutation. This mutation may be of immense significance, even if the change involves a single base substitution, since a transcription factor’s recognition of the site is sequence-specific. Mutations may involve more substantial changes to the gene’s regulatory regions, such as multiple nucleotide deletions, or, as in the case of the gene under study in this lab, multiple nucleotide additions which may eventually result in the silencing of this gene.

The gene you searched codes for the so-called fragile-X mental retardation protein (FMRP). The promoter of this gene contains a variable number of the trinucleotide repeat CGG. Individuals with no disease (normal phenotype, or wild type) have promoters containing <60 CGG repeats. Individuals whose promoters contain 60–200 trinucleotide repeats are said to possess a “premutation” that renders them susceptible to movement problems (ataxia) later in life. Individuals whose promoters have >200 CGG trinucleotide repeats are afflicted with fragile-X syndrome and display a wide range of symptoms that include intellectual disability, large testes, etc. FMRP itself is involved in the transport of RNA transcripts to polyribosomes located at sites of protein synthesis. In neurons these sites include the terminals of axons. Loss of expression of FMRP therefore has far-reaching consequences for an affected individual.
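The repeat thresholds given above can be applied to a sequence with a short script. The sketch below counts the longest uninterrupted run of CGG and classifies it using the cut-offs stated in this section (<60, 60–200, >200). It assumes uninterrupted repeats, so any interruption within a repeat tract would split the count; the function names and the toy sequence are hypothetical.

```python
import re


def longest_cgg_run(dna):
    """Return the number of repeats in the longest uninterrupted CGG run."""
    runs = re.findall(r"(?:CGG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify(repeats):
    """Classify a repeat count using the thresholds given in this section."""
    if repeats < 60:
        return "normal"
    if repeats <= 200:
        return "premutation"
    return "full mutation"


# Toy sequence for illustration: runs of 5 and 2 CGG repeats
toy = "AT" + "CGG" * 5 + "A" + "CGG" * 2
count = longest_cgg_run(toy)
print(count, classify(count))
```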

  4. Questionnaire

· Consider the sequence you searched using the BLAST program. Would you predict that this gene comes from a healthy person, a person with a premutation, or a person afflicted with fragile-X syndrome, just by looking at the sequence?


Explain your reasoning.


· We used the default database when conducting our BLAST search. This database contains only human genome sequences. Imagine that the sequence you subjected to the BLAST search yielded no matches (regardless of the length of the sequence you entered into the Query box). What would you infer about that sequence?


· What result would you predict if we searched that sequence against all known sequences?


A database containing all known nucleotide sequences exists and is called “nucleotide collection (nr/nt).” This database can be found on the BLAST site under “Choose Search Set.” At “Database” you will see that “Human genomic + transcript” is selected. Select “Others” instead and you will find that the “nucleotide collection (nr/nt)” database is automatically selected. Run your search against this vast database.

· How do your results differ from the original search?


· Describe the capabilities of a BLAST search.


· What could be the possible limitations of a BLAST search?


· BLAST is often nicknamed “the Google of DNA search tools.” Compare a BLAST search to a Google search and list one possible similarity and one possible difference.


  5. Discussion

You are given a sequence of DNA and told that it is human. You are asked to find out its identity and whether it has similarity to sequences in other organisms. Please describe the bioinformatics tool, the database, and the procedure you would use to find such information. Give two possible outcomes of your search.


Once you have completed the exercise, provide your instructor with a hard copy, or submit via SafeAssign, or send it via e-mail, as she indicates.


Alaie A, Teller V, Qiu W-g (2012) A bioinformatics module for use in an introductory biology laboratory. Am Biol Teach 74:318-332.

Honts JE (2003) Evolving strategies for the incorporation of bioinformatics within the undergraduate cell biology curriculum. CBE Life Sci Educ 2:233-247.

Maloney M, Parker J, LeBlanc M, Woodard CT, Glackin M, Hanrahan M (2010) Bioinformatics and the undergraduate curriculum. CBE Life Sci Educ 9:172-174.

National Center for Biotechnology Information (2005) NCBI Help Manual. URL:

Accessed: 15Jan20
