Re: JavaMemoryModel: Most (all?) JVM's incorrectly handle volatile reads-after-writes

From: Doug Lea (
Date: Sun Nov 21 1999 - 10:25:51 EST

> // From Doug Lea's:
> // * (This relies on the JVM properly dealing with read-after-write
> // * of two volatiles.)
> // In my tests, I haven't found any JVM that correctly implements
> // read-after-write on volatiles. I tested HotSpot and ExactVM on
> // Sparc Solaric, HotSpot, IBM's JVM and Microsoft's JVM on Windows
> // NT.

I worried about (and asked various people about) this issue when
originally putting together this fork/join framework last year, but
wasn't clever enough to think of your test.

However, the FJTaskRunner class is put together in a way that it is
easy to check if these read-after-write problems impact behavior. In
this code, the only observable consequences are that a task is run
twice or is never run at all. If any task isn't run, then programs
using the framework will not terminate. If they are run more than
once, then the reported total number of tasks run when statistics are
enabled will differ across program executions of test programs. I have
looked for such problems many times, and never found them, at least on
ExactVM and HotSpot running on sparc MPs.

On re-checking this code just now, I think I see why -- it seems that
as a fragile consequence of some other implementation details,
potential problems stemming from bad read-after-write are
self-correcting, at least on sparc. But considering that people out
there actually run this code, I need to either somehow prove that bad
read-after-write here is always self-correcting, regardless of
architecture, or release a (slower) version that only relies on
locking. (The reason for not using a lock here is that the deque
emptiness check is done on each task start, which occurs many millions
of times per second in some applications of this framework.)

Out of curiosity though, could other people on this list with
different MPs running different VMs check this out? (It is part of
util.concurrent, that you can get off my home page Probably the best test is Fib (in the
taskDemo dirtectory) since it has the highest task generation
rate. The Total Runs reported should always be the same. To run it,
  java Fib n 40 0
where n is number of threads, which should equal number of CPUs.
This computes fib(40) -- 40 just picked arbitrarily.
The trailing "0" says to use finest possible task granularity,
which gives very non-optimal speedups here but is the best
for testing this problem.

For example, using one CPU -- Fib 1 40 0
Thread Q Cap Scans New Runs
T0 4096 16 1 331160281
Total 16 1 331160281
Execute: 1 Time: 158.717 Rate: 2086483
Fib: Size: 40 Answer: 102334155

And 4 CPUs -- Fib 4 40 0
Thread Q Cap Scans New Runs
T0* 4096 66 17 82570105
T1 4096 191 17 83075667
T2 4096 146 19 83065751
T3 4096 130 18 82448758
Total 533 71 331160281
Execute: 1 Time: 43.587 Rate: 7597685

... the total runs (331160281) is the same here (and both terminated :-)

Doug Lea, Computer Science Department, SUNY Oswego, Oswego, NY 13126 USA 315-341-2688 FAX:315-341-5424  

This archive was generated by hypermail 2b29 : Thu Oct 13 2005 - 07:00:23 EDT