Re: JavaMemoryModel: Most (all?) JVM's incorrectly handle volatile reads-after-writes

From: Hong Zhang (Hong.Zhang@eng.sun.com)
Date: Mon Nov 29 1999 - 13:22:12 EST


I think we should generally discourage this idiom; it is
likely to cause huge performance disparities. This kind of code
will run terribly slowly under an interpreter (most embedded
systems will not have a compiler for a long time.)

I recommend adding a single method to the System class to do the
memory barrier job, something like:
public class System {
  ...
  public static native void membar();
  ...
}

Supporting different kinds of membar is neither portable nor necessary.
NOTE: future versions of Sparc will have only one consolidated
membar operation.
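
As a rough sketch of what a single barrier entry point could look
like on a much later JDK: this assumes Java 9 or newer, where
java.lang.invoke.VarHandle.fullFence() exists. The class name Membar
is hypothetical and is not part of any proposal in this thread.

```java
import java.lang.invoke.VarHandle;

// Hypothetical wrapper; "Membar" is an illustrative name, not a JDK class.
final class Membar {
    private Membar() {}

    // One consolidated barrier: loads and stores before the call are not
    // reordered with loads and stores after it.
    static void membar() {
        VarHandle.fullFence(); // real API, available since Java 9
    }
}
```

A caller would simply invoke Membar.membar() between the accesses it
needs ordered; no choice among barrier kinds is exposed.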

Hong

P.S.: (Doug) could you send me a copy of your new code? Thanks.

>HotSpot could be made to recognize this idiom,
>'synchronized(new Object()) {...}', without too much effort;
>a week max. If there is a large enough hue and cry it might
>bubble up to the top of my priority list. (An unsynchronized
>Hashtable has been on my to-do list for a while; I have a
>prototype floating about but it's just a proof of concept).
>In short, I'm interested in this but swamped at the moment.
>
>Cliff
>
>Doug Lea wrote:
>
>> I wrote...
>>
>> > In this and other frameworks, classes and utilities I write, I could
>> > do a better job, without having to tread in dark corners of the
>> > language, if I just had direct access to memory barriers. Rather than
>> > trying to twist the overall memory model to somehow get the right
>> > effects in the right contexts, how about just doing the obvious, and
>> > creating:
>> >
>> > class java.lang.MemoryBarrier {
>> > public static native void loadStoreBarrier();
>> > public static native void loadLoadBarrier();
>> > public static native void storeLoadBarrier();
>> > public static native void storeStoreBarrier();
>> > }
>>
>> When writing this, I had forgotten that all of these effects could,
>> under a sufficiently powerful JVM, be obtained via
>> synchronized(new Object()) { ... }
>> (This holds, in particular, under Bill's current proposal.)
>>
>> For the main examples, to get a standard read barrier, do:
>>
>> synchronized(new Object()) {
>> localVar = field;
>> }
>>
>> and a standard write barrier via
>>
>> synchronized(new Object()) {
>> field = localVar;
>> }
>>
>> The basic ideas here are that the compiler/JVM would have to notice:
>>
>> (1) That actually acquiring and releasing the lock are not necessary.
>> (2) That the object is thus never used and need not actually be
>> constructed
>> (3) That in the first case, a write barrier is not needed
>> since there are no writes in the synchronized block,
>> and similarly for read barrier in the second.
>> (4) That on machines that do not ordinarily need read barriers
>> (sparcs, pentiums), the first case thus normally amounts
>> to a simple read without a barrier. (Although it would still
>> carry any consequential effects wrt code re-ordering, register
>> usage, etc.)
>>
>> But is any existing JVM this smart about such things? Is it realistic
>> to assume enough of them will be this smart soon enough for people to
>> write code using such constructions?
>>
>> Here's why I've been a little obsessed about this issue lately:
>>
>> Given that I knew that concurrently readable hash tables are not so
>> hard to implement, and given Bill's postings showing that
>> Hashtable.get accounts for a significant proportion of unnecessary
>> synchronizations, I tried building a Hashtable replacement that can
>> normally perform reads without locking. Such a class should be a
>> straight win -- it should be possible in principle to make this class
>> about as fast as the unsynchronized HashMap class in single-threaded
>> applications, yet massively faster than the synchronized Hashtable
>> class in typical multithreaded applications. What could be better?
>>
>> Except... this code still occasionally requires a read barrier in some
>> cases of some methods on some machines. At least on sparcs (ExactVM
>> and Hotspot), if I use synchronized(new Object()) to get this effect,
>> then the result is generally slower than the unsynchronized HashMap
>> class for single-threaded applications. If I manually optimize it
>> away, it is on average as fast or slightly faster than HashMap in
>> tests I've run. (And is much much faster than Hashtable in typical
>> multithreaded tests.) I strongly suspect that this would also hold
>> (at least approximately) on machines with more relaxed memory models
>> where this barrier is actually needed.
>>
>> So right now, I'm stuck holding a class that I don't quite dare
>> distribute since it doesn't make good on some of its basic performance
>> claims on any JVM I could run it on. I'm unhappy.
>>
>> (If you are interested in testing this code on other JVMs, just ask me
>> for a copy.)
>>
>> I think that there are a number of other cases where allowing people
>> to write better library code is a better solution to
>> concurrency-support performance issues than other approaches. But only
>> if the base language is expressive enough to write such code.
>>
>> -Doug
>
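
The read-barrier and write-barrier idiom quoted above can be written
out as a small compilable class. This is only a sketch of the shape of
the idiom: the class name BarrierIdiom and its method names are
hypothetical, and the int field stands in for whatever field is being
read or written.

```java
// Hypothetical holder class, used only to show the idiom in context.
final class BarrierIdiom {
    // Deliberately neither volatile nor guarded by a shared lock.
    private int field;

    // "Standard read barrier" form: read the field inside a block
    // synchronized on a fresh object that no other thread can see.
    int readWithBarrier() {
        int localVar;
        synchronized (new Object()) {
            localVar = field;
        }
        return localVar;
    }

    // "Standard write barrier" form: write the field inside such a block.
    void writeWithBarrier(int localVar) {
        synchronized (new Object()) {
            field = localVar;
        }
    }
}
```

Note that under the memory model later adopted in Java 5 (JSR-133), a
JVM may remove synchronization on an object that no other thread can
reach, so the barrier effect discussed in this thread is not something
to rely on there.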
