Re: JavaMemoryModel: Most (all?) JVM's incorrectly handle volatile reads-after-writes

From: Doug Lea
Date: Fri Nov 26 1999 - 20:56:48 EST

I wrote...

> In this and other frameworks, classes and utilities I write, I could
> do a better job, without having to tread in dark corners of the
> language, if I just had direct access to memory barriers. Rather than
> trying to twist the overall memory model to somehow get the right
> effects in the right contexts, how about just doing the obvious, and
> creating:
> class java.lang.MemoryBarrier {
>   public static native void loadStoreBarrier();
>   public static native void loadLoadBarrier();
>   public static native void storeLoadBarrier();
>   public static native void storeStoreBarrier();
> }

When writing this, I had forgotten that all of these effects could,
under a sufficiently powerful JVM, be obtained via
  synchronized(new Object()) { ... }
(This holds, in particular, under Bill's current proposal.)

For the main examples, to get a standard read barrier, do:
  synchronized(new Object()) {
    localVar = field;
  }

and a standard write barrier via

  synchronized(new Object()) {
    field = localVar;
  }
The basic ideas here are that the compiler/JVM would have to notice:

   (1) That actually acquiring and releasing the lock are not necessary.
   (2) That the object is thus never used and need not actually be constructed.
   (3) That in the first case, a write barrier is not needed
        since there are no writes in the synchronized block,
        and similarly for read barrier in the second.
   (4) That on machines that do not ordinarily need read barriers
       (SPARCs, Pentiums), the first case thus normally amounts
       to a simple read without a barrier. (Although it would still
       carry any consequential effects wrt code re-ordering, register
       usage, etc.)

But is any existing JVM this smart about such things? Is it realistic
to assume enough of them will be this smart soon enough for people to
write code using such constructions?

Here's why I've been a little obsessed about this issue lately:

Given that I knew that concurrently readable hash tables are not so
hard to implement, and given Bill's postings showing that
Hashtable.get accounts for a significant proportion of unnecessary
synchronizations, I tried building a Hashtable replacement that can
normally perform reads without locking. Such a class should be a
straight win -- it should be possible in principle to make this class
about as fast as the unsynchronized HashMap class in single-threaded
applications, yet massively faster than the synchronized Hashtable
class in typical multithreaded applications. What could be better?
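
To make the idea concrete, here is a hedged sketch (not the actual class)
of a hash table whose get() runs without locking: writes serialize through
a synchronized put(), and each bucket chain grows only by prepending
immutable entries, so a racing reader sees either the old chain or the new
one. The sketch deliberately glosses over exactly the memory-ordering
subtleties the rest of this message is about -- the occasional read
barrier a reader may still need on machines with relaxed memory models.

```java
// Hypothetical sketch of a concurrently readable hash table.
// SimpleReadMostlyMap and Entry are illustrative names, not the
// actual class described above.
public class SimpleReadMostlyMap {
    static final class Entry {
        final Object key;
        final Object value;
        final Entry next;
        Entry(Object key, Object value, Entry next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    private Entry[] table = new Entry[16];

    // Unlocked read path: entries are immutable once linked into a
    // chain, and the newest entry for a key is always nearest the head.
    public Object get(Object key) {
        Entry[] tab = table;  // single read of the table reference
        int i = (key.hashCode() & 0x7FFFFFFF) % tab.length;
        for (Entry e = tab[i]; e != null; e = e.next)
            if (key.equals(e.key))
                return e.value;
        return null;
    }

    // Writes still lock, and only prepend fully constructed entries,
    // so readers never traverse a half-built chain.
    public synchronized void put(Object key, Object value) {
        int i = (key.hashCode() & 0x7FFFFFFF) % table.length;
        table[i] = new Entry(key, value, table[i]);
    }

    public static void main(String[] args) {
        SimpleReadMostlyMap m = new SimpleReadMostlyMap();
        m.put("one", "1");
        m.put("two", "2");
        System.out.println(m.get("one"));
        System.out.println(m.get("three"));
    }
}
```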

Except... this code still occasionally requires a read barrier in some
cases of some methods on some machines. At least on sparcs (ExactVM
and Hotspot), if I use synchronized(new Object()) to get this effect,
then the result is generally slower than the unsynchronized HashMap
class for single-threaded applications. If I manually optimize it
away, it is on average as fast or slightly faster than HashMap in
tests I've run. (And it is much, much faster than Hashtable in typical
multithreaded tests.) I strongly suspect that this would also hold
(at least approximately) on machines with more relaxed memory models
where this barrier is actually needed.

So right now, I'm stuck holding a class that I don't quite dare
distribute since it doesn't make good on some of its basic performance
claims on any JVM I could run it on. I'm unhappy.

(If you are interested in testing this code on other JVMs, just ask me
for a copy.)

I think that there are a number of other cases where allowing people
to write better library code is a better solution to
concurrency-support performance issues than other approaches. But only
if the base language is expressive enough to write such code.

JavaMemoryModel mailing list -

This archive was generated by hypermail 2b29 : Thu Oct 13 2005 - 07:00:23 EDT