MTAAP'09 Agenda
Workshop on Multithreaded Architectures and Applications

Held in Conjunction With
International Parallel and Distributed Processing Symposium (IPDPS 2009)

Rome, May 29, 2009


Friday, May 29

08:00 - 08:15 Welcome to MTAAP

08:15 - 10:15 Libraries

08:15 Implementing OpenMP on a High Performance Embedded Multicore MPSoC
Barbara Chapman (University of Houston, USA); Eric Stotzer (University of Houston, USA); Lei Huang (University of Houston, USA); Eric Biscondi (Texas Instruments, France); Ashish Shrivastava (Texas Instruments, USA); Alan Gatherer (Texas Instruments - DSPS R&D Center, USA)

08:45 Multi-Threaded Library for Many-Core Systems
Allan Porterfield (Renaissance Computing Institute, USA); Rob Fowler (University of North Carolina, USA); Nassib Nassar (Renaissance Computing Institute, USA)

09:15 Implementing a Portable Multi-threaded Graph Library: the MTGL on Qthreads
Brian Barrett (Sandia National Laboratories, USA); Jonathan Berry (Sandia National Laboratories, USA); Richard Murphy (Sandia National Labs, USA); Kyle Wheeler (University of Notre Dame, USA)

09:45 A Super-Fast Adaptable Bit-Reversal on Multithreaded Architectures
Jan Meyer (Norwegian University of Science and Technology, Norway); Anne Elster (Norwegian University of Science and Technology, Norway)

10:15 - 10:45 Break

10:45 - 11:45 Keynote: "QoS for SMT and CMP Processors?" Mateo Valero, BSC-UPC

11:45 - 13:00 Lunch

13:00 - 15:00 Performance Analysis and Parallel Algorithms

13:00 Implementing and Evaluating Multithreaded Triadic Census Algorithms on the Cray XMT
George Chin (Pacific Northwest National Laboratory, USA); Andres Marquez (Pacific Northwest National Laboratory, USA); Sutanay Choudhury (Pacific Northwest National Laboratory, USA); Kristyn Maschhoff (Cray, Inc., USA)

13:30 A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets
Kamesh Madduri (Lawrence Berkeley National Laboratory, USA); David Ediger (Georgia Institute of Technology, USA); Karl Jiang (Georgia Institute of Technology, USA); David Bader (Georgia Institute of Technology, USA); Daniel Chavarria (Pacific Northwest National Laboratory, USA)

14:00 Accelerating Numerical Calculation on the Cray XMT
Chad Scherrer (Pacific Northwest National Laboratory, USA); Tim Shippert (Pacific Northwest National Laboratory, USA); Andres Marquez (Pacific Northwest National Laboratory, USA)

14:30 Early Experiences on Accelerating Dijkstra's Algorithm Using Transactional Memory
Nikos Anastopoulos (National Technical University of Athens, Greece); Konstantinos Nikas (National Technical University of Athens, Greece); Georgios Goumas (National Technical University of Athens, Greece); Nectarios Koziris (National Technical University of Athens, Greece)

15:00 - 15:30 Break

15:30 - 17:30 Systems

15:30 Early Experiences with Large-Scale XMT Systems
David Mizell (Cray, USA); Kristyn Maschhoff (Cray, Inc., USA)

16:00 Simplex-Based Linear Optimization on Multithreaded Architectures
Daniele Spampinato (Norwegian University of Science and Technology, Norway); Anne Elster (Norwegian University of Science and Technology, Norway)

16:30 Enabling High-Performance Memory Migration for Multithreaded Applications on Linux
Brice Goglin (INRIA Bordeaux - Sud Ouest, France); Nathalie Furmento (CNRS, France)

17:00 Exploiting DMA mechanisms to enable non-blocking execution in Decoupled Threaded Architecture
Roberto Giorgi (University of Siena, Italy); Zdravko Popovic (University of Siena, Italy); Nikola Puzovic (University of Siena, Italy)


Title: "QoS for SMT and CMP Processors?"
Mateo Valero - Professor - BSC-UPC


The limits of Instruction-Level Parallelism (ILP) have motivated the use of Thread-Level Parallelism (TLP) as a common strategy to improve processor performance. Common TLP paradigms are Simultaneous Multi-Threading (SMT), Chip-MultiProcessor (CMP), Fine-Grain Multithreading (FGMT), Coarse-Grain Multithreading (CGMT), and combinations of them.

Multithreaded processors (SMT, CMP, FGMT, CGMT, or any combination of them) are used in a range of computing systems, such as real-time systems, High-Performance Computing (HPC) systems, and high-performance desktop systems:
- High-performance systems: processors like the Intel QuadCore and the IBM POWER6 are multithreaded.
- Network systems: multithreaded network processors include the Intel IXP network processor family and the IBM PowerNP.
- Real-time systems: the Imagination Technologies Meta processor and the Infineon TriCore 2 are two examples.

Each of these computing systems has different targets. For example, in an embedded real-time environment the target of the system could be to execute a given thread before a deadline while maximizing the overall system performance and reducing power consumption. On the other hand, in a high-performance system the target could be just to improve overall performance. Thus, the question is how a multithreaded processor could be designed to potentially meet all these requirements at the same time.

We call these requirements Quality of Service (QoS) requirements.

In this talk we present the concept of Explicit Resource Allocation for multithreaded processors, as well as strategies that use this concept to enable flexible collaboration between the Operating System and the processor architecture. This approach is inspired by QoS in networks, in which processes are given guarantees about bandwidth, throughput, or other services. Analogously, in a multithreaded processor, resources can be reserved for threads in order to guarantee a required performance. Our view is that this can be achieved by having the MT processor provide 'levers' through which the OS can fine-tune the internal operation of the processor as needed. Such levers can include prioritizing instruction fetch for particular threads, reserving a share of resources such as IQ entries, adjusting cache and memory bandwidth allocation, etc.

We show that by directly controlling the shared hardware resources of a multithreaded processor we achieve our objective of providing QoS in each of these computing systems. In particular we focus on:
- High-performance desktop systems: the main QoS requirement is to improve a given metric such as IPC throughput and/or fairness.
- High-performance computing systems: the QoS requirement is to finish all the threads composing an HPC application as soon as possible.
- Soft real-time systems: the objective is twofold: to obtain a high success rate for the soft real-time task while obtaining high performance for the non-real-time tasks.
- Hard real-time systems: the objective is to ensure that hard real-time tasks complete within their Worst-Case Execution Time (WCET) estimates, and to provide the means to compute those estimates.