> // From Doug Lea's: FJTaskRunner.java
> //  * (This relies on the JVM properly dealing with read-after-write
> //  * of two volatiles.)
> 
> // In my tests, I haven't found any JVM that correctly implements
> // read-after-write on volatiles. I tested HotSpot and ExactVM on
> // Sparc Solaris, and HotSpot, IBM's JVM, and Microsoft's JVM on
> // Windows NT.
I worried about (and asked various people about) this issue when
originally putting together this fork/join framework last year, but
wasn't clever enough to think of your test. 
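(The kind of test referred to here is, presumably, the classic
store-buffering check on two volatile fields: each thread writes one
volatile and then reads the other, and the question is whether both
threads can still observe the old values. Below is a minimal sketch in
modern Java, not part of the original exchange; the class name and
iteration count are arbitrary. Under the later JSR-133 memory model the
(0, 0) outcome is forbidden for volatiles, but the VMs discussed in this
thread predate that guarantee.)

  // Minimal store-buffering litmus sketch (assumed names; illustrative only).
  // Each thread does a volatile write followed by a volatile read of the
  // other field. Seeing r1 == 0 and r2 == 0 means neither write was made
  // visible before the subsequent read.
  public class VolatileRAWTest {
      static volatile int x, y;
      static int r1, r2;

      public static void main(String[] args) throws InterruptedException {
          int observed = 0;
          for (int i = 0; i < 100000; i++) {
              x = 0; y = 0;
              Thread a = new Thread(() -> { x = 1; r1 = y; });
              Thread b = new Thread(() -> { y = 1; r2 = x; });
              a.start(); b.start();
              a.join(); b.join();
              if (r1 == 0 && r2 == 0) observed++;   // the "bad" outcome
          }
          // Starting fresh threads each iteration makes the race window tiny;
          // a serious harness would reuse threads and align the racing accesses.
          System.out.println("store-buffering outcomes seen: " + observed);
      }
  }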
However, the FJTaskRunner class is put together in a way that makes it
easy to check whether these read-after-write problems affect behavior.
In this code, the only observable consequences are that a task is run
twice or is never run at all. If any task is never run, then programs
using the framework will not terminate. If any task is run more than
once, then the total number of task runs reported when statistics are
enabled will differ across executions of a test program. I have looked
for such problems many times and never found them, at least on ExactVM
and HotSpot running on sparc MPs.
On re-checking this code just now, I think I see why -- it seems that
as a fragile consequence of some other implementation details,
potential problems stemming from bad read-after-write are
self-correcting, at least on sparc.  But considering that people out
there actually run this code, I need to either somehow prove that bad
read-after-write here is always self-correcting, regardless of
architecture, or release a (slower) version that only relies on
locking. (The reason for not using a lock here is that the deque
emptiness check is done on each task start, which occurs many millions
of times per second in some applications of this framework.)
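(A much-simplified sketch of the shape of that check, with assumed field
and method names rather than the actual FJTaskRunner source: the owner
thread pushes and pops at one volatile index while stealers consume from
the other, and the emptiness test reads one volatile index shortly after
the thread has written the other. The steal path, array growth, and the
synchronization that arbitrates pop/steal collisions on the last task
are all omitted.)

  // Owner-side sketch only; names are assumed, not from util.concurrent.
  class SketchDeque {
      private final Runnable[] slots = new Runnable[4096]; // power of two
      private volatile int base = 0;   // index stealers consume from
      private volatile int top  = 0;   // index the owner pushes and pops at

      // Owner only: publish a task.
      void push(Runnable task) {
          slots[top & (slots.length - 1)] = task;
          top = top + 1;               // volatile write of the owner's index
      }

      // Owner only: the emptiness check done on each task start.  The worry
      // in this thread: the owner has recently written 'top' and now reads
      // 'base', while a stealer writes 'base' and then reads 'top'.  If a
      // volatile write is not ordered before the following volatile read,
      // the two sides can disagree about whether a task is present, which
      // is how a task can end up run twice or never run.
      Runnable pop() {
          int t = top - 1;
          if (t < base) return null;   // looks empty
          // (When t == base only one task remains and pop can collide with
          // a steal; the real class arbitrates that case, omitted here.)
          Runnable task = slots[t & (slots.length - 1)];
          top = t;
          return task;
      }

      // Single-threaded sanity check.
      public static void main(String[] args) {
          SketchDeque q = new SketchDeque();
          q.push(() -> System.out.println("task ran"));
          Runnable r = q.pop();
          if (r != null) r.run();
      }
  }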
Out of curiosity though, could other people on this list with
different MPs running different VMs check this out?  (It is part of
util.concurrent, which you can get from my home page
http://gee.cs.oswego.edu/) Probably the best test is Fib (in the
taskDemo directory) since it has the highest task generation
rate. The Total Runs reported should always be the same. To run it,
do:
  java Fib n 40 0
where n is the number of threads, which should equal the number of CPUs.
This computes fib(40) -- 40 was just picked arbitrarily.
The trailing "0" says to use the finest possible task granularity,
which gives speedups far from optimal here but is the best setting
for exposing this problem.
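(To show why granularity 0 generates tasks at such a high rate, here is
a minimal sketch of the decomposition the Fib demo performs: one task
per recursive call, with no sequential cutoff. It is written against the
modern java.util.concurrent fork/join API rather than the
util.concurrent FJTask classes this post uses, and the class and method
names are assumed.)

  import java.util.concurrent.ForkJoinPool;
  import java.util.concurrent.RecursiveTask;

  // A sketch of the decomposition only, not the actual Fib demo source.
  class FibSketch extends RecursiveTask<Integer> {
      private final int n;
      FibSketch(int n) { this.n = n; }

      @Override protected Integer compute() {
          if (n <= 1) return n;                 // granularity 0: no cutoff
          FibSketch f1 = new FibSketch(n - 1);
          f1.fork();                            // push onto this worker's deque
          FibSketch f2 = new FibSketch(n - 2);
          return f2.compute() + f1.join();      // run one half, join the other
      }

      public static void main(String[] args) {
          int threads = Integer.parseInt(args[0]);  // should equal number of CPUs
          System.out.println(new ForkJoinPool(threads).invoke(new FibSketch(40)));
      }
  }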
For example, using one CPU -- Fib 1 40 0 
Thread  Q Cap   Scans   New     Runs
T0      4096    16      1       331160281
Total           16      1       331160281
Execute: 1      Time: 158.717   Rate: 2086483
Fib: Size: 40 Answer: 102334155
And 4 CPUs --  Fib 4 40 0
Thread  Q Cap   Scans   New     Runs
T0*     4096    66      17      82570105
T1      4096    191     17      83075667
T2      4096    146     19      83065751
T3      4096    130     18      82448758
Total           533     71      331160281
Execute: 1      Time: 43.587    Rate: 7597685
... the Total Runs figure (331160281) is the same in both runs (and both terminated :-)
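(A note on why that total is reproducible: 331160281 is exactly
2*fib(41) - 1, the number of calls the naive recursion makes to compute
fib(40), which is consistent with the demo spawning one task per call at
finest granularity. The small check below, with an assumed class name,
just verifies the arithmetic.)

  // Verifies that 2*fib(41) - 1 == 331160281.
  public class FibCallCount {
      public static void main(String[] args) {
          long a = 1, b = 1;                  // fib(1), fib(2)
          for (int i = 3; i <= 41; i++) {     // advance to fib(41) = 165580141
              long next = a + b; a = b; b = next;
          }
          System.out.println(2 * b - 1);      // prints 331160281
      }
  }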
-- Doug Lea, Computer Science Department, SUNY Oswego, Oswego, NY 13126 USA dl@cs.oswego.edu 315-341-2688 FAX:315-341-5424 http://gee.cs.oswego.edu/