CMSC 838T Project 1

The goal of this assignment is to familiarize yourself with
a number of web-based bioinformatics tools by attempting to
discover useful information about some protein
sequences from the CASP-5 competition. Present what
you discover in a short research-style paper.
Protein Sequences
- MLISHSDLNQ QLKSAGIGFN ATELHGFLSG LLCGGLKDQS WLPLLYQFSN
  DNHAYPTGLV QPVTELYEQI SQTLSDVEGF TFELGLTEDE NVFTQADSLS
  DWANQFLLGI GLAQPELAKE KGEIGEAVDD LQDICQLGYD EDDNEEELAE
  ALEEIIEYVR TIAMLFYSHF NEGEIESKPV LH
- TGISRETSSDVALASHILTALREKQAPELSLSSQDLELVTKEDPKALAVALNW
  DIKKTETVQEACERELALRLQQTQSLHSLR
- QPAKKTYTWNTKEEAKQAFKELLKEKRVPSNASWEQAMKMIINDPRYSALANLSE
  KKQAFNAYKVQTEK
- MSTVTKYFYKGENTDLIVFAASEELVDEYLKNPSIGKLSEVVELFEVFTPQDGRGA
  EGELGAASKAQVENEFGKGKKIEEVIDLILRNGKPNSTTSSLKTKGGNAGTKAYN
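Most web tools expect each query as a single unbroken sequence, often in FASTA format, so the spaces and line breaks used in the listing above may need to be removed before pasting. The following is a minimal sketch of one way to do that; the function names and the label `target_1` are hypothetical and not part of the assignment.

```python
def clean_sequence(raw: str) -> str:
    """Remove all whitespace (spaces, newlines) and upper-case the residues."""
    return "".join(raw.split()).upper()


def to_fasta(label: str, raw: str, width: int = 60) -> str:
    """Return one FASTA record: a '>' header line, then wrapped sequence lines."""
    seq = clean_sequence(raw)
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + label + "\n" + "\n".join(lines) + "\n"


# First target sequence, copied from the listing in its 10-residue blocks.
TARGET_1 = """
MLISHSDLNQ QLKSAGIGFN ATELHGFLSG LLCGGLKDQS WLPLLYQFSN
DNHAYPTGLV QPVTELYEQI SQTLSDVEGF TFELGLTEDE NVFTQADSLS
DWANQFLLGI GLAQPELAKE KGEIGEAVDD LQDICQLGYD EDDNEEELAE
ALEEIIEYVR TIAMLFYSHF NEGEIESKPV LH
"""

if __name__ == "__main__":
    # "target_1" is an arbitrary label chosen for this example.
    print(to_fasta("target_1", TARGET_1), end="")
```

The same two functions can be applied to the other three sequences by passing their text and a different label.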
Perform the following analyses for each protein sequence
- Find similar protein sequences
- Find protein family / conserved regions using automated tools
  - Use HMMER at http://pir.georgetown.edu/
    - Select "HMM Domain Search" from "Search and Retrieval" menu
  - Does target protein belong to family with known function?
- Perform multiple sequence alignment to identify conserved regions
  and infer phylogenetic trees
- Predict secondary structure, 3D structure
- Look for source of protein in genomic DNA, cDNA
  - Use translated BLAST at http://www.ncbi.nlm.nih.gov/BLAST/
    - Select "Protein query - Translated db [tblastn]"
    - Select "Choose database", try both the default (nr)
      and genomic DNA (GSS) only.
  - What are possible sources of target protein?
    - What species? Genomic DNA or other source?
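A translated search such as tblastn compares a protein query against nucleotide sequences translated in all six reading frames. The sketch below is only a toy illustration of that idea: it assumes the standard genetic code and looks for an exact substring match, whereas BLAST itself performs scored local alignment. All function names here are hypothetical.

```python
# Standard genetic code, listed with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna: str) -> dict:
    """Map frame names (+1..+3, -1..-3) to their translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames["+" + str(offset + 1)] = translate(dna[offset:])
        frames["-" + str(offset + 1)] = translate(rc[offset:])
    return frames


def find_protein(protein: str, dna: str) -> list:
    """List the frames whose translation contains the protein exactly."""
    return [name for name, aa in six_frames(dna).items() if protein in aa]


if __name__ == "__main__":
    # A made-up nine-base example, not taken from any database.
    print(six_frames("ATGAAATAG"))
    print(find_protein("MK", "ATGAAATAG"))
```

A hit in a minus frame means the coding sequence lies on the reverse strand, which is one of the details to note when reporting where a target protein appears in genomic DNA or cDNA.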