
CMSC 838T Project 2 

Basic Information

The goal of this project is to find a bioinformatics research topic related to high-performance computing and obtain some preliminary results.

Your project should be motivated by the question of how to use increasing computational power to improve the quality of bioinformatic applications, while accounting for the near-exponential growth in the size of sequence databases. Earlier algorithms may have been oversimplified because processing power was limited. Look for opportunities to spend computing power in order to save user effort.

You may work in 2-person groups (I will coordinate multiple groups on larger projects).

Here is a first pass at some project suggestions.

Possible project topics

  1. Parallel sequence alignment / search algorithms
  2. Experimental evaluation of bioinformatic algorithms
  3. Preprocessing sequence databases
  4. Any other good ideas you can think of...

More detailed project descriptions

1) Improving performance of sequence search algorithms

    Source code for parallel versions of BLAST is available. Evaluate parallel BLAST performance using
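
As an illustration of how timing measurements for such an evaluation might be summarized, here is a minimal Python sketch that turns wall-clock times into speedup and parallel efficiency. The function name `speedup_table` and the timing values are hypothetical placeholders, not measurements of any BLAST implementation.

```python
def speedup_table(timings):
    """Summarize parallel runs.

    timings maps processor count -> wall-clock seconds and must
    include an entry for 1 processor, which serves as the baseline.
    Returns a list of (processors, seconds, speedup, efficiency).
    """
    base = timings[1]
    rows = []
    for procs in sorted(timings):
        speedup = base / timings[procs]   # T(1) / T(p)
        efficiency = speedup / procs      # fraction of ideal linear speedup
        rows.append((procs, timings[procs], speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Placeholder numbers for illustration only; substitute measured times.
    example = {1: 120.0, 2: 60.0, 4: 40.0}
    for procs, secs, speedup, eff in speedup_table(example):
        print(f"{procs:>3} procs  {secs:>7.1f} s  speedup {speedup:.2f}  efficiency {eff:.2f}")
```

Efficiency well below 1.0 at higher processor counts usually points to communication, I/O, or load-imbalance overheads worth investigating.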


2) Evaluating sensitivity / specificity of sequence search algorithms

    A number of papers have compared BLAST / FASTA / Smith-Waterman algorithms for discovering distant members of protein families. Repeat using
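
For a comparison of this kind, sensitivity and specificity can be computed from the set of sequences a search reports and the set of known family members. The sketch below assumes sequence identifiers are available as plain strings; the function name `sensitivity_specificity` and the identifiers in the example are hypothetical.

```python
def sensitivity_specificity(reported, true_members, database):
    """Score one search against a known protein family.

    reported      ids the search tool returned as hits
    true_members  ids known to belong to the family
    database      all ids that were searched
    Returns (sensitivity, specificity).
    """
    reported = set(reported)
    true_members = set(true_members)
    database = set(database)

    tp = len(reported & true_members)             # family members found
    fp = len(reported - true_members)             # non-members reported
    fn = len(true_members - reported)             # family members missed
    tn = len(database - reported - true_members)  # non-members correctly ignored

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy identifiers for illustration only.
    db = {"a", "b", "c", "d", "e", "f", "g", "h"}
    family = {"a", "b", "c", "d"}
    hits = {"a", "b", "c", "e"}
    print(sensitivity_specificity(hits, family, db))
```

Repeating the calculation over a range of score or E-value cutoffs gives the trade-off curve between the two measures for each tool.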


3) Preprocessing sequence databases

    A number of researchers have suggested compressed sequence database formats. Investigate issues:
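
To make the idea of a compressed sequence format concrete, here is a minimal Python sketch of one simple scheme: packing DNA at 2 bits per base, four bases per byte. It is an illustrative assumption, not a format proposed in any particular paper, and it handles only the letters A, C, G, and T (ambiguity codes such as N would need an extra mechanism).

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into bytes, 2 bits per base.

    Returns (packed_bytes, length); the length is needed to
    discard padding in the final byte when unpacking.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out), len(seq)


def unpack(data, length):
    """Recover the DNA string from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


if __name__ == "__main__":
    packed, n = pack("ACGTAC")
    print(len(packed), unpack(packed, n))
```

A scheme like this cuts storage to a quarter of one byte per base, but a search tool must then either unpack on the fly or compare directly on the packed form, which is one of the trade-offs worth measuring.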



  1. Conduct survey of related work (read related research papers)
  2. Write up a short description of proposed research before proceeding
  3. Initially concentrate on setting up tools / procedures
  4. Later focus on collecting experimental information
  5. Present preliminary results on last day of class
  6. Turn in short research paper describing project