Portable High-Performance Programs,
by Matteo Frigo.
Ph. D. Thesis.
Available from links at http://supertech.lcs.mit.edu/cilk/papers/
Background: The thesis is mainly about implementing Cilk, a portable
C-based parallel programming framework targeted mainly for SMPs. I
have a similar Java version (class FJTask) in my util.concurrent
package available from my home page. It's the fastest (and also among
conceptually nicest) portable approach to SMP parallel processing I
know, in any language.
The first chapter (especially around p39) includes comparisons of
performance of different versions of the main work-stealing
algorithm. One version relies entirely on locking. Another rarely
needs locking but requires memory barriers. On a pentium pro,
performance of the barrier version was only 5% faster than locking
version. On an ultrasparc, the barrier version was 25% faster than the
full locking version. It further claims that on the ultra1, there was
no detectable performance difference even when these barriers (which
apparently aren't actually required on most ultra1-based SMPs) were
The chapter also shows overheads on other processors (around p42), but
does not explicity compare barrier vs lock costs for them.
Digression: My version of the algorithm is a little bit different
than theirs. It is also mainly lock-free, but relies on java
volatiles to provide the right barriers (at least those needed on
some platforms). I haven't benchmarked extensively on anything
except ultras. (Translation: I once checked that it worked on a
Win95 JVM :-) If anyone is interested in doing so, please feel
free! My util.concurrent pages include a result summary of one test
suite you could try -- see http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/taskDemo/numbers.html
There are still a couple of minor tuning experiments I haven't
yet carried out for FJTask. It's likely that performance can be
improved by a few percent.
Does anyone know of other data on barrier vs lock costs? It would be
useful to compare Cilk results with those for applications that aren't
specifically geared to high-performance parallelism.
Subsequent chapters of the thesis deal with caching and memory models,
and reveal some of the motivation for the location consistency memory
model paper that Bill Pugh posted a link to last week. (It is also
avialable from above URL).
-- Doug Lea, Computer Science Department, SUNY Oswego, Oswego, NY 13126 USA firstname.lastname@example.org 315-341-2688 FAX:315-341-5424 http://gee.cs.oswego.edu/ ------------------------------- JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel
This archive was generated by hypermail 2b29 : Thu Oct 13 2005 - 07:00:18 EDT