JavaMemoryModel: A performance argument against acquire/release ordering, a call for a quantitative approach, and a naive proposal

From: Assaf Schuster (assaf@cs.technion.ac.il)
Date: Tue Jul 10 2001 - 19:13:42 EDT


> I think the acquire/release ordering (i.e., "roach motel" ordering)
> provides an intuitive model that can be easily encapsulated into a
> number of design patterns for concurrent software and is appropriate
> for the new JMM. I haven't seen any arguments to convince me
> otherwise.
>
> Bill

OK, how about this one:

Let us consider the knowledge level of the bulk of Java programmers.
In fact, let us talk about the "average" Java programmer, who knows
threading. Java is meant to be a popular language, one to be taught in
high school. Thus, our average programmer knows nothing about ordering,
acquire/release semantics, or data races. Even worse, he does not want
to know!

So, what happens to our average programmer when we decide to force him
(say) to avoid data races? Our experience with novices doing
acquire/release on DSMs shows they will do anything to make their
program "correct", hysterically adding barriers on every entry to and
exit from a method. The result is a program with most of its methods
synchronized.
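To make the pattern concrete, here is a hypothetical sketch of what such
an over-synchronized class ends up looking like (the class and method
names are my own, purely for illustration):

```java
// Hypothetical example of the over-synchronization pattern described
// above: a novice, unsure where the data races are, marks every method
// synchronized "just in case".
public class OverSynchronizedPoint {
    private int x;
    private int y;

    // Each of these one-line methods now acquires and releases a lock,
    // i.e., pays for memory-barrier semantics on every single call.
    public synchronized int getX() { return x; }
    public synchronized int getY() { return y; }
    public synchronized void setX(int x) { this.x = x; }
    public synchronized void setY(int y) { this.y = y; }

    public static void main(String[] args) {
        OverSynchronizedPoint p = new OverSynchronizedPoint();
        p.setX(3);
        p.setY(4);
        System.out.println(p.getX() + "," + p.getY()); // prints 3,4
    }
}
```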

What does this mean from a performance point of view? We are talking
about an OO language, in which most methods are only a few lines of
code long. Thus, the impact is like paying for an expensive memory
barrier every few instructions!
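A crude sketch of how one might measure this claim (hypothetical code,
not a rigorous benchmark; note that an uncontended lock on an optimizing
JVM may turn out cheaper than a full barrier, which only strengthens the
call for quantitative data below):

```java
// Crude timing sketch: the same one-line getter, with and without
// synchronized, called many times. The synchronized version pays lock
// acquire/release (memory-barrier semantics) on every call.
public class BarrierCost {
    private int value = 1;

    public int plainGet() { return value; }
    public synchronized int syncGet() { return value; }

    public static void main(String[] args) {
        BarrierCost b = new BarrierCost();
        int n = 10000000;
        long sum = 0;
        long t0 = System.currentTimeMillis();
        for (int i = 0; i < n; i++) sum += b.plainGet();
        long t1 = System.currentTimeMillis();
        for (int i = 0; i < n; i++) sum += b.syncGet();
        long t2 = System.currentTimeMillis();
        // sum is printed so the JIT cannot discard the loops entirely
        System.out.println("plain: " + (t1 - t0) + " ms, synchronized: "
                + (t2 - t1) + " ms (sum=" + sum + ")");
    }
}
```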

Of course, the programmer can now invest time in cleaning up some of
these extra synchronized calls. However, given that finding a redundant
synchronization is as hard as finding a data race (co-NP-complete, I
believe), here are a few related questions:
A: How many average programmers will actually bother?
B: Consider those programmers who do try to clean up their code. How
    many redundant memory barriers will be left in the program when
    they finally give up?
C: How long must an average programmer invest in cleaning an average
    program of redundant memory barriers before the program performs
    as well as it would on an optimized SC memory model?
D: Same as C, but before the program achieves its maximal performance
    gain relative to an SC memory model?
E: In D, what is the average maximal performance gain mentioned?
F: ... more of the same ...

Why should the average programmer bother with our acquire/release
semantics, and our "properties for in/correctly synchronized programs",
when we are unable even to tell him the maximal performance gain for
his program? He will surely synchronize in all cases of doubt, and, as
I mentioned, removing such doubt is as hard as finding a data race.

----------------------------------------------------------------------------
    CONCLUSION1: Anything more relaxed than SC for the JMM will result
    in degradation of performance for the "average" multithreaded Java
    program.
----------------------------------------------------------------------------

----------------------------------------------------------------------------
    CONCLUSION2: We should find a way to answer questions of the type
    mentioned above (performance gain, time spent, etc.) in a
    quantitative way.
----------------------------------------------------------------------------

Now, let's forget for the moment about the average Java programmer, and
consider the audience of this interest group: the compiler optimization
community, the hardware community, etc. Surely all the people reading
this are experts who know how to handle release/acquire, how to prevent
data races, and how to avoid redundant synchronization (do they?).
Clearly, they cannot accept the performance of an SC system.

Yet another example is the community of experts doing, say,
high-performance real-time systems. These typically specialize in
avoiding expensive memory barriers by actually using data races. They
rely on the liveness property of SC-like systems that guarantees the
eventual transfer of a written value even in the absence of
release/acquire. They know that this liveness for a single value (a
flag) is much less expensive than a full-fledged memory barrier.
However, they try to avoid even making this flag volatile, because of
the overhead involved.
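For concreteness, here is a sketch of the single-flag handoff pattern
being described, in the *safe* variant where the flag is declared
volatile under the proposed JMM semantics (names are illustrative; the
experts in question would try to drop even the volatile, relying on the
liveness of an SC-like system instead):

```java
// Single-flag handoff: the writer publishes data, then raises a flag;
// the reader spins on the flag. No lock is ever taken.
public class FlagHandoff {
    private int payload;              // ordinary (non-volatile) data
    private volatile boolean ready;   // the single synchronization point

    public void publish(int value) {
        payload = value;   // ordinary write...
        ready = true;      // ...published by the volatile flag write
    }

    public int await() throws InterruptedException {
        while (!ready) {   // volatile read: no lock, no barrier pair
            Thread.sleep(1);
        }
        return payload;    // sees the value written before the flag
    }

    public static void main(String[] args) throws Exception {
        final FlagHandoff h = new FlagHandoff();
        Thread writer = new Thread(new Runnable() {
            public void run() { h.publish(42); }
        });
        writer.start();
        System.out.println(h.await()); // prints 42
        writer.join();
    }
}
```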

Now, since we cannot resolve this hard tradeoff, and since we want to
satisfy both the average programmer and the expert communities, how
about enjoying the best of both worlds:

----------------------------------------------------------------------------
    A NAIVE PROPOSAL: Let the default memory model be SC. Let the
    compiler have an optimizing flag to compile programs using (some)
    relaxed release/acquire semantics.
----------------------------------------------------------------------------

Of course, there is the question of partial compilation and so on, but
perhaps everything can be fixed by defaulting to SC and/or forcing RC.

Also, there is the question of how to make the bytecodes run
everywhere. I think bytecodes can be executed SC. Some indication
should be added so that when bytecodes are recompiled to native code,
the result may be optimized towards RC.

I'm not sure adding flags to the bytecodes is an option at this stage,
but wouldn't it be nice if it were? We could allow the programmer to
choose from a range of behaviors, and even change them at run time.

--- Assaf

-------------------------------
JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel



This archive was generated by hypermail 2b29 : Thu Oct 13 2005 - 07:00:33 EDT