Benchmarking a Network of PCs Running Parallel Applications
(Presented at IPCCC'98, Phoenix, AZ, USA, Feb. 1998).
We present a benchmarking study that compares
the performance of a network of four PCs connected by 100 Mb/s
Fast Ethernet running three different system software
configurations: TCP/IP on Windows NT, TCP/IP on Linux, and a
lightweight message-passing protocol (U-Net Active Messages) on Linux.
For each configuration, we report results for communication
micro-benchmarks and the NAS parallel benchmarks. For the NAS
benchmarks, the overall running time under Windows NT TCP/IP was 12
to 500 percent longer than under Linux TCP/IP. We
show that longer code paths and poor memory hierarchy utilization
are responsible for this difference. Likewise, the Linux U-Net-based
message-passing protocol outperformed the Linux TCP/IP version by 5
to almost 200 percent. We also show that using Linux U-Net we are
able to achieve 125-microsecond latency between two processes using
PVM. In addition, we show that poor memory system
performance in the form of increased kernel and user mode cache
misses contributes to NT's poor performance on the NAS
kernels. Finally, we report that the default math libraries
supplied under NT (for both gcc and Visual C++) are substantially
slower than the math library supplied with Linux.
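To make the micro-benchmark measurements concrete, the following is a
minimal sketch (not the paper's actual harness) of a TCP ping-pong
latency test in C; the iteration count is an arbitrary choice, and in
practice TCP_NODELAY would be set on both sockets to avoid
Nagle-induced delays.

    /* Minimal ping-pong latency sketch; not the harness used in the
     * paper.  Assumes fd is an already-connected TCP socket whose peer
     * echoes each byte back, with TCP_NODELAY set on both ends. */
    #include <sys/time.h>
    #include <unistd.h>

    #define ROUND_TRIPS 10000          /* arbitrary iteration count */

    double pingpong_latency_us(int fd)
    {
        char buf = 'x';
        struct timeval t0, t1;
        int i;

        gettimeofday(&t0, NULL);
        for (i = 0; i < ROUND_TRIPS; i++) {
            if (write(fd, &buf, 1) != 1 || read(fd, &buf, 1) != 1)
                return -1.0;           /* treat any short I/O as failure */
        }
        gettimeofday(&t1, NULL);

        return ((t1.tv_sec - t0.tv_sec) * 1e6 +
                (t1.tv_usec - t0.tv_usec)) / (2.0 * ROUND_TRIPS);
    }

Halving the round-trip time gives the one-way latency figure of the
kind reported above.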
The slides and
paper for this presentation are available.
MDL: A Language and Compiler for Dynamic Program Instrumentation
(Presented at PACT'97, San Francisco, CA, USA, November 1997).
We use a form of dynamic code generation, called dynamic
instrumentation, to collect data about the execution of an
application program. This approach allows us to instrument running
programs to collect performance and other kinds of information.
The instrumentation code is generated incrementally and can be inserted
and removed at any time.
Our system currently runs on the SPARC, PA-RISC, Power2, Alpha,
and x86 architectures.
Specifications of what data to collect are written in a
specialized, platform-independent language called the Metric
Description Language (MDL).
MDL provides a concise way to specify how to constrain performance
data to particular resources such as modules, procedures, nodes,
files, or message channels (or combinations of these resources).
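As a rough illustration of what dynamically inserted instrumentation
computes (this is not Paradyn's actual trampoline code, and all names
below are invented for the sketch), here is the kind of counting and
timing code that might be spliced in at a procedure's entry and exit:

    /* Illustration only: counter/timer updates of the sort dynamic
     * instrumentation splices in at a procedure's entry and exit.
     * Names are invented, not Paradyn/MDL identifiers; a real
     * implementation must also handle recursion and threads. */
    #include <sys/time.h>

    static long   call_count;    /* e.g., number of calls to foo() */
    static double cum_time;      /* cumulative seconds spent in foo() */
    static double entry_stamp;   /* timestamp of current activation */

    static double now_seconds(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec * 1e-6;
    }

    void on_entry(void)   /* conceptually spliced before foo()'s body */
    {
        call_count++;
        entry_stamp = now_seconds();
    }

    void on_exit(void)    /* conceptually spliced at foo()'s returns */
    {
        cum_time += now_seconds() - entry_stamp;
    }

Because such snippets are generated and patched in at run time, they
can be added or removed without restarting the program.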
The slides and
paper for this presentation are available.
Using Content-Derived Names for Configuration Management
Presented at the 1997 ACM Symposium on Software Reuse (SSR)
Configuration management of compiled software artifacts (programs,
libraries, icons, etc.) is a growing problem as software reuse becomes
more prevalent. For an application composed from reused libraries and
modules to function correctly, all of the required files must be
available and be the correct version. In this paper, we present a
simple scheme to address this problem: content-derived names (CDNs).
Computing an object's name automatically using digital signatures
greatly eases the problem of disambiguating multiple versions of an
object. By using content-derived names, developers can ensure that
only those software components that have been tested together are
permitted to run together.
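The core idea can be sketched in a few lines: derive an object's name
from a hash of its contents. The paper computes names with digital
signatures; the 64-bit FNV-1a hash below is a stand-in used only to
keep the example self-contained, and a real CDN scheme needs a
collision-resistant digest such as MD5 or SHA.

    /* Sketch of deriving a content-derived name for a file.  FNV-1a
     * stands in for the digital signature used in the paper. */
    #include <stdio.h>
    #include <stdint.h>

    static uint64_t fnv1a_file(FILE *f)
    {
        uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a offset basis */
        int c;
        while ((c = fgetc(f)) != EOF) {
            h ^= (uint64_t)(unsigned char)c;
            h *= 0x100000001b3ULL;            /* FNV-1a prime */
        }
        return h;
    }

    int main(int argc, char **argv)
    {
        FILE *f;
        if (argc != 2 || (f = fopen(argv[1], "rb")) == NULL)
            return 1;
        /* the hash of the contents becomes the object's name */
        printf("%016llx\n", (unsigned long long)fnv1a_file(f));
        fclose(f);
        return 0;
    }

Two builds of an object that differ in even one byte thus get distinct
names, so version mismatches are detected by construction.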
The slides and
paper for this presentation are available.
Internet: The Technology Behind the Hype,
Presented as part of Math Awareness Week 1997 at the University of Maryland
Starting with a small project called ARPANET in 1969, the Internet has
evolved into a network of networks connecting tens of millions of hosts.
In this talk, I will review the history and evolution of the technology
that is used in the Internet.
In addition, I will explain the ideas behind many of the buzzwords and
acronyms associated with computer networking.
I will also discuss issues and technologies that will
drive Internet evolution for the next decade.
The slides for this presentation are available.
Internet: The Technology Behind the Hype,
Originally Presented to the University of Maryland ACM Chapter
In the past couple of years, the Internet has gone from an obscure
research project to a ubiquitous icon in popular culture. Behind the
hype is a sophisticated collection of technologies that have been in
continuous use for almost thirty years. Starting with a small project
called ARPANET in 1969, the Internet has evolved into a network of
networks so large that it is impossible to count the number of users or
computers connected. In this talk, I will review the history and
evolution of the technology that is used in the Internet. In addition,
I will explain the ideas behind many of the buzzwords and acronyms
associated with computer networking, such as router, bridge, fuzzball,
ARP storm, IPNG, SNMP, and HTTP. I will also discuss issues and
technologies that will drive Internet evolution for the next decade.
The slides for this presentation are available.
Online "what-if" Metrics,
Originally Presented at the University of Wisconsin
In this talk I will describe a new technique to assist programmers with
making informed decisions about how to tune their parallel programs.
The technique permits programmers, while their programs are running,
to ask "what-if questions about potential tuning alter-natives. Two
examples of "what-if" questions will be presented. First, I will
describe a non-trace based algorithm to compute the critical path
profile of the execution of a mes-sage passing parallel program. The
algorithm permits starting or stopping the critical path computation
during program execution and reporting intermediate values. Second, I
will describe a metric called Load Balancing Factor (LBF) that
assesses the impact of moving where a computation is performed rather
than reducing its execution time. LBF can be computed at either
process or procedure granularity. Finally, I will present a brief
case study that quantifies the runtime overhead of our algorithms and
demonstrates their ability to accurately predict the performance of
the tuning options.
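One way an online, non-trace-based critical path computation can work
is to piggyback each process's accumulated critical-path length on its
outgoing messages, with the receiver keeping the larger of its own and
the sender's value. The sketch below illustrates that general idea
only; it is not the paper's algorithm, and msg_send, msg_recv, and
local_work_since_last_event are assumed placeholders.

    /* Piggybacking critical-path lengths on messages so the critical
     * path can be maintained online, without traces.  A sketch of the
     * general idea; the msg_* calls and the work-accounting helper
     * are assumed placeholders, not a real message-passing API. */
    extern void   msg_send(int dest, const void *buf, int len); /* assumed */
    extern void   msg_recv(int src, void *buf, int len);        /* assumed */
    extern double local_work_since_last_event(void);            /* assumed */

    static double cp_length;   /* this process's CP length so far */

    void cp_send(int dest, const void *buf, int len)
    {
        cp_length += local_work_since_last_event();
        msg_send(dest, &cp_length, sizeof cp_length); /* piggyback */
        msg_send(dest, buf, len);
    }

    void cp_recv(int src, void *buf, int len)
    {
        double sender_cp;
        msg_recv(src, &sender_cp, sizeof sender_cp);
        msg_recv(src, buf, len);
        cp_length += local_work_since_last_event();
        if (sender_cp > cp_length)    /* the longer path dominates */
            cp_length = sender_cp;
    }

Because the running maximum is available at every process at all
times, the computation can be started, stopped, or sampled for
intermediate values during execution.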
The slides and
two papers (Critical Path and
Load Balancing Factor)
for this presentation are available.
An Online Computation of Critical Path Profiling,
Originally Presented at SPDT'96 (Philadelphia, PA)
The slides and
paper for this presentation are available.
Tuning the Performance of I/O-Intensive Parallel Applications,
Originally Presented at USC Information Sciences Institute
Getting good I/O performance from parallel programs is a critical
problem for many application domains. In this paper, we report
our experience tuning the I/O performance of four application
programs from the areas of sensor data processing and linear algebra.
After tuning, three of the four applications achieve effective
I/O rates of over 100 MB/s on 16 processors. The total volume
of I/O required by the programs ranged from about 75 MB to over
200 GB. We report the lessons learned in achieving high I/O performance
from these applications, including the need for code restructuring,
local disks on every node, and overlapping I/O with computation
(sketched below).
We also report our experience achieving high performance on
peer-to-peer configurations. Finally, we comment on the necessity
of complex I/O interfaces like collective I/O and strided requests
to achieve high performance.
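Of the lessons above, overlapping I/O with computation is the most
visible in code. The double-buffering sketch below shows one way to do
it with POSIX AIO; the block size and the compute() callback are
placeholders, error handling is minimal, and on Linux the program
links with -lrt.

    /* Double-buffered overlap of I/O and computation with POSIX AIO:
     * while one block is processed, the next is read in the
     * background.  A sketch only, not the paper's code. */
    #include <aio.h>
    #include <string.h>
    #include <unistd.h>

    #define BLOCK (1 << 20)            /* 1 MB per request (arbitrary) */
    static char buf[2][BLOCK];

    extern void compute(const char *data, ssize_t n);   /* placeholder */

    void process_file(int fd)
    {
        struct aiocb cb;
        const struct aiocb *const list[1] = { &cb };
        off_t off = 0;
        int cur = 0, prev;
        ssize_t n;

        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;

        /* prime the pipeline with the first read */
        cb.aio_buf = buf[cur]; cb.aio_nbytes = BLOCK; cb.aio_offset = off;
        aio_read(&cb);

        for (;;) {
            aio_suspend(list, 1, NULL);   /* wait for current read */
            if (aio_error(&cb) != 0)
                break;
            n = aio_return(&cb);
            if (n <= 0)                   /* end of file */
                break;
            off += n;

            prev = cur; cur ^= 1;
            /* start the next read before computing on this block */
            cb.aio_buf = buf[cur]; cb.aio_nbytes = BLOCK; cb.aio_offset = off;
            aio_read(&cb);

            compute(buf[prev], n);        /* overlaps the read above */
        }
    }

With two buffers in flight, the disk stays busy while the CPU works,
which is the essence of the restructuring the abstract describes.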
The slides and
paper for this presentation are available.