CMSC 838T, Project 1

Project Description

The goal of this assignment is to familiarize yourself with a number of web-based bioinformatics tools by using them to discover useful information about several protein sequences from the CASP-5 competition. Present what you discover in a short research-style paper.

Protein Sequences

MLISHSDLNQ QLKSAGIGFN ATELHGFLSG LLCGGLKDQS WLPLLYQFSN DNHAYPTGLV QPVTELYEQI SQTLSDVEGF TFELGLTEDE NVFTQADSLS DWANQFLLGI GLAQPELAKE KGEIGEAVDD LQDICQLGYD EDDNEEELAE ALEEIIEYVR TIAMLFYSHF NEGEIESKPV LH
TGISRETSSDVALASHILTALREKQAPELSLSSQDLELVTKEDPKALAVALNWDIKKTETVQEACERELALRLQQTQSLHSLR
QPAKKTYTWNTKEEAKQAFKELLKEKRVPSNASWEQAMKMIINDPRYSALANLSEKKQAFNAYKVQTEK
MSTVTKYFYKGENTDLIVFAASEELVDEYLKNPSIGKLSEVVELFEVFTPQDGRGAEGELGAASKAQVENEFGKGKKIEEVIDLILRNGKPNSTTSSLKTKGGNAGTKAYN
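Most of the web forms below accept sequences in FASTA format. The Python sketch below is one possible way (not a course requirement) to strip the spacing from a pasted sequence and write FASTA records. The record name `target3` is a hypothetical label, not a CASP target ID, and the 60-residue line width is only a common convention.

```python
import textwrap


def clean_sequence(raw: str) -> str:
    """Remove all whitespace from a pasted protein sequence and upper-case it."""
    return "".join(raw.split()).upper()


def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format {name: sequence} pairs as FASTA text, wrapping every `width` residues."""
    lines = []
    for name, raw in records.items():
        seq = clean_sequence(raw)
        lines.append(f">{name}")
        # textwrap breaks long space-free strings into fixed-width chunks.
        lines.extend(textwrap.wrap(seq, width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "target3" is a hypothetical label for the third sequence listed above.
    print(to_fasta({"target3": "QPAKKTYTWNTKEEAKQAFKELLKEKRVPSNASWEQAMKMIINDPRYSALANLSEKKQAFNAYKVQTEK"}))
```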

Perform the following analyses for each protein sequence:

Find similar protein sequences
- Use BLAST at http://www.ncbi.nlm.nih.gov/BLAST/
  Select "Standard protein-protein BLAST [blastp]"
  Do similar sequences provide clues about the target protein's function?
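BLAST hits are scored local alignments. As a rough illustration of what that score measures, here is a minimal Smith-Waterman local alignment score in Python. It assumes a toy scoring scheme (match +2, mismatch -1, linear gap -2) in place of the BLOSUM62 matrix and affine gap penalties that blastp uses by default, and it computes the exact dynamic-programming score rather than BLAST's faster heuristic search.

```python
def local_alignment_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between a and b (linear gap penalty).

    The match/mismatch/gap values are illustrative, not BLAST's defaults.
    """
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at 0 lets an alignment start anywhere (local, not global).
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # A fragment of the first sequence vs. a hypothetical one-substitution variant.
    print(local_alignment_score("HGFLSGLLCGG", "HGFLAGLLCGG"))  # 19
```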
Find the protein family and conserved regions using automated tools
- Use RPS-BLAST at http://www.ncbi.nlm.nih.gov/BLAST/
  Select "RPS-BLAST"
- Use HMMER at http://pir.georgetown.edu/
  Select "HMM Domain Search" from "Search and Retrieval" menu
  Does the target protein belong to a family with known function?
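RPS-BLAST searches the query against a library of position-specific scoring matrices (PSSMs) built from conserved-domain alignments, while HMMER uses profile hidden Markov models that also model insertions and deletions. The Python sketch below shows only the simpler PSSM idea: it builds base-2 log-odds scores from a short ungapped alignment block, assuming a uniform 1/20 background frequency and a pseudocount of 1, then slides the matrix along a query to find the best-scoring window. The block and query in the usage lines are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PSSM (bits) from an ungapped alignment block.

    Assumes a uniform background frequency of 1/20 for every amino acid.
    """
    length = len(block[0])
    if any(len(s) != length for s in block):
        raise ValueError("all sequences in the block must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in block)
        total = len(block) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def best_window(pssm: list[dict[str, float]], query: str) -> tuple[int, float]:
    """Slide the PSSM along `query`; return (0-based start, score) of the best window.

    Residues not in AMINO_ACIDS (e.g. X) score 0. Returns (-1, -inf) if the query is too short.
    """
    width = len(pssm)
    best = (-1, float("-inf"))
    for start in range(len(query) - width + 1):
        score = sum(pssm[i].get(query[start + i], 0.0) for i in range(width))
        if score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    block = ["GLLCG", "GLLCG", "GILCG"]  # hypothetical aligned motif
    print(best_window(build_pssm(block), "MMMGLLCGMMM"))
```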
Perform a multiple sequence alignment to identify conserved regions and infer a phylogenetic tree
- Use CLUSTALW at http://pir.georgetown.edu/
  Select "CLUSTALW Alignment" from "Search and Retrieval" menu
- Use T-Coffee at http://us.expasy.org/
  Select T-COFFEE under "Tools and software packages"
  This should lead to http://www.ch.embnet.org/software/TCoffee.html
  Does the multiple sequence alignment uncover strongly conserved region(s)?
  Does the inferred phylogenetic tree provide hints as to function?
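One simple way to make "strongly conserved" concrete is per-column Shannon entropy: a column where every sequence has the same residue scores 0 bits, and more variable columns score higher. The Python sketch below assumes the alignment has been copied out as equal-length gapped strings with `-` as the gap character, and reports runs of low-entropy columns. The 0.5-bit threshold and minimum run length are arbitrary illustrative choices. It ignores gaps entirely, so a mostly gapped column can look conserved; a fuller analysis would also require a minimum number of non-gap residues.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        return float("inf")  # an all-gap column carries no conservation signal
    n = len(residues)
    return sum((k / n) * math.log2(n / k) for k in Counter(residues).values())


def conserved_columns(alignment: list[str], max_entropy: float = 0.5) -> list[int]:
    """0-based indices of columns whose entropy is at most `max_entropy` bits."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [i for i in range(length)
            if column_entropy("".join(row[i] for row in alignment)) <= max_entropy]


def conserved_regions(alignment: list[str], max_entropy: float = 0.5,
                      min_length: int = 3) -> list[tuple[int, int]]:
    """Runs of consecutive conserved columns as (start, end), 0-based inclusive."""
    regions = []
    start = prev = None
    for c in conserved_columns(alignment, max_entropy):
        if start is None:
            start = prev = c
        elif c == prev + 1:
            prev = c
        else:
            if prev - start + 1 >= min_length:
                regions.append((start, prev))
            start = prev = c
    if start is not None and prev - start + 1 >= min_length:
        regions.append((start, prev))
    return regions


if __name__ == "__main__":
    aln = ["MKVLAG-T", "MKVLSGAT", "MRVLTG-S"]  # hypothetical alignment
    print(conserved_regions(aln, min_length=2))
```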
Predict secondary and 3D structure
- Use Swiss-Model at http://us.expasy.org/
  Select Swiss-Model under "Tools and software packages"
  This should lead to http://us.expasy.org/swissmod/SWISS-MODEL.html
  Select "First Approach Mode"
  Select "Forward to PHD secondary structure prediction"
- Use 3D-PSSM at http://www.sbg.bio.ic.ac.uk/~3dpssm/
  Select "Recognise a fold"
  Do the structure predictions suggest any interesting functions for the protein?
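PHD reports one predicted state per residue. Assuming that prediction is copied out as a plain string with H for helix, E for strand and L for loop (check which letters your output actually uses), the Python sketch below lists each predicted element with its 1-based residue range, which is convenient when summarizing results in the paper. The prediction string in the usage line is made up.

```python
from itertools import groupby

# Assumed one-letter state codes; adjust to match the actual prediction output.
LABELS = {"H": "helix", "E": "strand", "L": "loop"}


def ss_segments(prediction: str, min_length: int = 1) -> list[tuple[str, int, int]]:
    """Split a per-residue H/E/L string into (type, start, end) runs, 1-based inclusive.

    Runs shorter than `min_length` and unrecognized letters are skipped.
    """
    segments = []
    pos = 1
    for state, run in groupby(prediction):
        n = len(list(run))
        if state in LABELS and n >= min_length:
            segments.append((LABELS[state], pos, pos + n - 1))
        pos += n
    return segments


if __name__ == "__main__":
    # Made-up prediction string; keep only elements of at least 3 residues.
    print(ss_segments("LLHHHHLLEEEL", min_length=3))
```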
Look for the source of the protein in genomic DNA or cDNA
- Use translated BLAST at http://www.ncbi.nlm.nih.gov/BLAST/
  Select "Protein query - Translated db [tblastn]"
  Under "Choose database", try both the default (nr) and genomic DNA only (GSS).
  What are possible sources of the target protein?
  From what species? Genomic DNA or another source?
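tblastn compares a protein query against a nucleotide database by translating each database sequence in all six reading frames (three on each strand). The Python sketch below shows only that translation step, using the standard genetic code, followed by an exact peptide lookup. Real tblastn performs scored local alignment rather than exact matching, and the DNA string in the usage line is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODE_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate DNA codon by codon; '*' marks stop codons, 'X' any codon with non-ACGT bases."""
    protein = []
    for i in range(0, len(dna) - 2, 3):  # incomplete trailing codons are dropped
        codon = dna[i:i + 3]
        if all(b in BASES for b in codon):
            idx = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
            protein.append(CODE_TABLE[idx])
        else:
            protein.append("X")
    return "".join(protein)


def six_frames(dna: str) -> dict[str, str]:
    """Translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    dna = "".join(dna.split()).upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


def find_peptide(dna: str, peptide: str) -> list[tuple[str, int]]:
    """(frame, 0-based amino-acid offset) for every exact occurrence of `peptide`."""
    hits = []
    for frame, protein in six_frames(dna).items():
        start = protein.find(peptide)
        while start != -1:
            hits.append((frame, start))
            start = protein.find(peptide, start + 1)
    return hits


if __name__ == "__main__":
    print(six_frames("ATGGCCTAA"))          # hypothetical DNA fragment
    print(find_peptide("ATGGCCTAA", "MA"))
```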
