
CMSC 838T: Advanced Topics in Programming Languages

Systems Software for High Performance Computing, Emphasis on Bioinformatic Applications

Basic Information

Course News

Course Description

This course covers bioinformatics applications, high-performance computing, and the use of high-performance computing to solve bioinformatics problems.

Bioinformatics can be broadly defined as the creation and development of advanced information and computational techniques for problems in biology. More narrowly, bioinformatics is the set of computing techniques used to manage and extract useful information from the DNA/RNA/protein sequence data being generated (at high volumes) by automated techniques (e.g., DNA sequencers, DNA microarrays) and stored in large public databases (e.g., GenBank, Protein DataBank). Certain methods for analyzing genetic/protein data have been found to be extremely computationally intensive, which motivates the use of powerful computers.

High-performance computing describes a set of hardware and software techniques developed for building computer systems capable of quickly performing large amounts of computation. These techniques have generally relied on harnessing the computing power of large numbers of processors working in parallel, either in tightly-coupled shared-memory multiprocessors or loosely-coupled clusters of PCs. Experience has shown that a great deal of software support is needed to develop and tune applications on parallel architectures.

The goals of this course are to:

  1. learn about characteristics of bioinformatic applications
  2. examine software techniques used in high-performance computing
  3. study how to apply high-performance computing to bioinformatic applications

This course will have exams and projects. Students will be required to present papers in class. For computer science graduate students, this course will count for comp credit (both MS and PhD) in SE/PL.

Course Syllabus

Bioinformatics topics: pairwise sequence alignment (dynamic programming, heuristic methods), multiple sequence alignment, genome assembly, gene identification and annotation, DNA microarrays, protein folding analysis and prediction, phylogenetic analysis, design and implementation of biological databases.

High-performance computing topics: parallel architectures, parallel programming languages & paradigms, compiler techniques, program analysis, program transformations, data locality optimizations, parallelization techniques, run-time systems, software environments.
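As a rough illustration of the first bioinformatics topic above, pairwise sequence alignment by dynamic programming, here is a minimal Python sketch of global alignment scoring in the style of Needleman-Wunsch. It is not taken from the course materials. The function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    The scoring values are illustrative assumptions, not course-specified.
    """
    # prev[j] is the best score for aligning the previous prefix of a
    # (initially the empty prefix) with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] against an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap.
            up = prev[j] + gap
            # Align b[j-1] with a gap.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
    print(needleman_wunsch_score("ACGT", "AGT"))
```

The double loop fills one table cell per pair of positions, so the running time grows with the product of the two sequence lengths. That cost, repeated across large databases, is one reason such analyses are candidates for parallel execution.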

Lectures

Date Topic Slides
Bioinformatics lectures
1/29 Bioinformatics overview PDF
2/3, 2/5 Molecular biology review PDF
2/10, 2/12 Pairwise sequence alignment PDF
2/24, 2/26 Multiple sequence alignment PDF
3/3 Phylogenetic analysis PDF
3/5, 3/10, 3/12 Protein structure prediction & alignment PDF
4/2, 4/7 Sequence assembly & gene prediction PDF
4/9 Biological networks and DNA microarrays PDF
4/14 Experimental proteomics PDF
4/21 Genetics & comparative genomics PDF
3/31 Bioinformatics databases PDF
High performance computing lectures
3/17 Parallel architectures PDF
3/19 Parallel programming paradigms PDF
4/16 Program analysis & parallelization PDF
Paper presentations
4/23 Topic: Pairwise sequence comparison
Parallel Computation in Biological Sequence Analysis
Yap, Frieder, Martino
Presented by Wu, Xue
PPT
TurboBLAST: A Parallel Implementation of BLAST Based on the TurboHub Architecture...
R.D. Bjornson, A.H. Sherman, S.B. Weston, N. Willard, and J. Wing
Presented by Gan, Bin
PPT
Massively Parallel Solutions for Molecular Sequence Analysis
B. Schmidt, H. Schröder, and M. Schimmler
Presented by Gudla, Prabhakar Reddy (Bio Res)
PPT
4/28 Topic: Genome / multiple sequence alignment
Whole Genome Alignment Using a Multithreaded Parallel Implementation
W Martins, J del Cuvillo, W Cui, and G Gao
Presented by Murthy, Hyma (Engr)
PPT
Performance Optimization of Clustal W
D Mikhailov, H Cofer, R Gomperts
Presented by Mishra, Arunesh
PPT
Improving Performance of Multiple Sequence Alignment Analysis in Multi-client Environments
U. Catalyurek, R. Ferreira, T. Kurc, and J. Saltz
Presented by Zollman, Aaron
PPT
4/30 Topic: Gene prediction & phylogenetics
A Study of GeneWise with the Drosophila Adh Region
Y Mo, M Regelson, and M Sievers
Presented by Gindulyte, Asta (Chem)
PPT
Parallel EST Clustering
A. Kalyanaraman, S. Aluru, and S. Kothari, Iowa State University
Presented by Memarsadeghi, Nargess
PPT
High-Performance Algorithm Engineering for Computational Phylogenetics
B Moret, D Bader, and T Warnow
Presented by Liu, Kexue (Appl Math)
PPT
5/7 Topic: Protein folding
Solving the Protein Threading Problem in Parallel
Nicola Yanev and Rumen Andonov
Presented by Bhattacharya, Indrajit
PPT
Using Metacomputing Tools to Facilitate Large-Scale Analyses of Biological Databases
A Waugh, G Williams, L Wei, and R Altman
Presented by Shet, Vinay
PPT
Atomistic Protein Folding Simulations on the Submillisecond Timescale Using Worldwide Distributed Computing
V Pande et al.
Presented by Lu, Qing
PPT
5/9 Topic: Gene network & linkage analysis
Gene Clustering using Self-Organizing Maps and Particle Swarm Optimization
X. Xiao, E. Dow, R. Eberhart, Z. Ben Miled, and R. J. Oppelt
Presented by Lewis, Robert Earl (Busi MGT)
PPT
Parallel Detection of Regulatory Elements with gMP
Bertil Schmidt, Lin Feng, Amey Laud, and Yusdi Santoso
Presented by Gupta, Damayanti
PPT
Parallel Genehunter: Implementation of a linkage analysis package for distributed memory architectures
G. Conant, A. Wagner, S. Plimpton, W. Old, and P. Fain
Presented by Moran, Michael
PPT
5/12 Topic: Linkage analysis & microarray design
Parallelisation of IBD computation for determining genetic disease map
Nouhad J. Rizk
Presented by Yuan, Yuan
PPT
Realtime Primer Design for DNA chips
Harald Simmler, H. Singpiel, and R. Männer
Presented by Hui, Annie
PPT
Accurate method for fast design of diagnostic oligonucleotide probe sets for DNA microarrays
Andreas Krause, Markus Kräutner, and Harald Meier
Presented by Tas, Nazif Cihan
PPT
5/14 Project presentations

Class Resources

Acknowledgments

Thanks to the many people whose work contributed to this course. Lectures and class materials were borrowed from many sources, including an SC'02 tutorial by Prof. David Bader and Prof. Srinivas Aluru.