MTAAP'09
Agenda
Workshop on Multithreaded
Architectures and Applications
Held in Conjunction With
International Parallel and
Distributed Processing Symposium (IPDPS 2009)
Rome, May 29, 2009
Friday, May 29
08:00 - 08:15 Welcome to MTAAP
08:15 - 10:15 Libraries
8:15 Implementing OpenMP on a High Performance Embedded Multicore MPSoC
Barbara Chapman (University of Houston, USA); Eric Stotzer (University of
Houston, USA); Lei Huang (University of Houston, USA); Eric Biscondi (Texas
Instruments, France); Ashish Shrivastava (Texas Instruments, USA); Alan Gatherer
(Texas Instruments - DSPS R&D Center, USA)
8:45 Multi-Threaded Library for Many-Core Systems
Allan Porterfield (Renaissance Computing Institute, USA); Rob Fowler (University
of North Carolina, USA); Nassib Nassar (Renaissance Computing Institute, USA)
9:15 Implementing a Portable Multi-threaded Graph Library: the MTGL on
Qthreads
Brian Barrett (Sandia National Laboratories, USA); Jonathan Berry (Sandia
National Laboratories, USA); Richard Murphy (Sandia National Labs, USA); Kyle
Wheeler (University of Notre Dame, USA)
9:45 A Super-Fast Adaptable Bit-Reversal on Multithreaded Architectures
Jan Meyer (Norwegian University of Science and Technology, Norway); Anne Elster
(Norwegian University of Science and Technology, Norway)
10:15 - 10:45 Break
10:45 - 11:45 Keynote:
"QoS for SMT and CMP Processors?" Mateo Valero, BSC-UPC
11:45 - 13:00 Lunch
13:00 - 15:00 Performance Analysis and Parallel Algorithms
13:00 Implementing and Evaluating Multithreaded Triadic Census Algorithms
on the Cray XMT
George Chin (Pacific Northwest National Laboratory, USA); Andres Marquez
(Pacific Northwest National Laboratory, USA); Sutanay Choudhury (Pacific
Northwest National Laboratory, USA); Kristyn Maschhoff (Cray, Inc., USA)
13:30 A Faster Parallel Algorithm and Efficient Multithreaded
Implementations for Evaluating Betweenness Centrality on Massive Datasets
Kamesh Madduri (Lawrence Berkeley National Laboratory, USA); David Ediger
(Georgia Institute of Technology, USA); Karl Jiang (Georgia
Institute of Technology, USA); David Bader (Georgia Institute of Technology,
USA); Daniel Chavarria (Pacific Northwest National Laboratory, USA)
14:00 Accelerating Numerical Calculation on the Cray XMT
Chad Scherrer (Pacific Northwest National Laboratory, USA); Tim Shippert
(Pacific Northwest National Laboratory, USA); Andres Marquez (Pacific Northwest
National Laboratory, USA)
14:30 Early Experiences on Accelerating Dijkstra's Algorithm Using
Transactional Memory
Nikos Anastopoulos (National Technical University of Athens, Greece);
Konstantinos Nikas (National Technical University of Athens, Greece); Georgios
Goumas (National Technical University of Athens, Greece); Nectarios Koziris
(National Technical University of Athens, Greece)
15:00 - 15:30 Break
15:30 - 17:30 Systems
15:30 Early Experiences with Large-Scale XMT Systems
David Mizell (Cray, USA); Kristyn Maschhoff (Cray, Inc., USA)
16:00 Simplex-Based Linear Optimization on Multithreaded Architectures
Daniele Spampinato (Norwegian University of Science and Technology, Norway);
Anne Elster (Norwegian University of Science and Technology, Norway)
16:30 Enabling High-Performance Memory Migration for Multithreaded
Applications on Linux
Brice Goglin (INRIA Bordeaux - Sud Ouest, France); Nathalie Furmento (CNRS,
France)
17:00 Exploiting DMA mechanisms to enable non-blocking execution in
Decoupled Threaded Architecture
Roberto Giorgi (University of Siena, Italy); Zdravko Popovic (University of
Siena, Italy); Nikola Puzovic (University of Siena, Italy)
Keynote
Title: "QoS for SMT and CMP Processors?"
Mateo Valero - Professor - BSC-UPC
Abstract:
The limitations imposed by Instruction-Level Parallelism (ILP) have
motivated the use of Thread-Level Parallelism (TLP) as a common strategy to
improve processor performance. Common TLP paradigms are Simultaneous
Multi-Threading (SMT), Chip-MultiProcessor (CMP), Fine-Grain Multithreading (FGMT),
Coarse-Grain Multithreading (CGMT), or combinations of them.
Multithreading processors (SMT, CMP, FGMT, CGMT or any combination of them) are
used in different computing systems such as real-time systems, High-Performance
Computing (HPC) systems or High-Performance desktop systems:
- High-performance systems: processors like the Intel QuadCore or the IBM POWER6
are multithreaded.
- Network systems: examples of multithreaded network processors are the Intel IXP
network processor family and the IBM PowerNP.
- Real-Time systems: the Imagination Technologies Meta processor and the
Infineon TriCore 2 are two examples.
Each of these computing systems has different targets. For example, in an
embedded real-time environment the target of the system could be to execute a
given thread before a deadline while maximizing the overall system performance
and reducing power consumption. On the other hand, in a high-performance system
the target could be just to improve overall performance. Thus, the question is
how a multithreaded processor could be designed to potentially meet all these
requirements at the same time.
We call these requirements Quality of Service (QoS) requirements.
In this talk we present the concept of Explicit Resource Allocation for
multithreaded processors as well as strategies that make use of this concept to
provide flexible Operating System/processor architecture collaboration. This
approach is inspired by QoS in networks in which processes are given guarantees
about bandwidth, throughput, or other services. Analogously, in a multithreaded
processor, resources can be reserved for threads in order to guarantee a required
performance. Our view is that this can be achieved by having the MT processor
provide 'levers' through which the OS can fine-tune the internal operation of
the processor as needed. Such levers can include prioritizing instruction fetch
for particular threads, reserving parts of resources such as IQ entries,
adjusting cache and memory bandwidth allocation, etc.
We show that by directly controlling the shared hardware resources of a
multithreaded processor, we achieve our objective of providing QoS in each of
these computing systems. In particular, we focus on:
- High-Performance desktop systems: in which the main QoS requirement is to
improve a given metric like IPC throughput and/or fairness.
- High-Performance computing systems: in this case the QoS requirement is to
finish all the threads composing an HPC application as soon as possible.
- Soft Real-Time systems: in soft real-time systems the objective is twofold:
to obtain a high success rate for the soft real-time task while also
obtaining high performance for the non-real-time tasks.
- Hard Real-Time systems: in this case, the objective is to ensure that the
hard real-time tasks complete within their Worst-Case Execution Time (WCET)
estimates, as well as to provide the means to compute such WCET estimates.