> I think the acquire/release ordering (i.e., "roach motel" ordering)
> provides an intuitive model that can be easily encapsulated into a
> number of design patterns for concurrent software and is appropriate
> for the new JMM. I haven't seen any arguments to convince me
> otherwise.
>
> Bill
OK, how about this one:
Let us consider the knowledge level of the bulk of Java programmers.
In fact, let us talk about the "average" Java programmer, who knows
threading.
Java is meant to be a popular language, to be tought in high school.
Thus, our average programmer knows nothing about ordering and
acquire/release stuff, and data races. Even worse, he does not want to
know!
So, what happens to our average programmer when we decide to force
him (say) to avoid data races? Our experience with novices doing
acquire/release on DSMs shows they will do anything to make their program
"correct", histerically adding barriers on every entering/exiting of a
method.
The result is a program having most of its methods synchronized.
What does this mean from performance point of view?
We are talking here an OO language, for which most of the methods
are only a few lines of code long. Thus, the impact is like having
an expensive memory barrier every few instructions!
Of course, time can now be invested by the programmer in cleaning up some
of these extra synchronized calls. However, given that finding a redundant
synchronization is as hard as finding a data-race (co-NP complete, I
believe),
here are a few related questions:
A: How many average programmers will actually bother?
B: Consider those programmers who will try to clean up their code. How many
redundant memory barriers will be left in the program when they finally
give up?
C: How long should an average programmer invest in cleaning an average
program from redundant memory barriers before the program performs
as well as it would on an optimized SC memory model?
D: Same as C, ..... before the program achieves maximal performance gain
relative to a SC memory model?
E: In D, what is the average maximal performance gain mentioned?
F: ... more of the same ....
Why should the average programmer bother with our acquire/release semantics,
and our "properties for in/correctly synchronized programs",
while we are unable to even tell him his maximal performance gain for his
program?
He will surely synchronize in all cases of doubt, and as I mentioned, it is
as hard
as finding a data race to remove such doubt.
----------------------------------------------------------------------------
----
CONCLUSION1: Anything more relaxed than SC for JMM will result in
degredation of performance for the "average" multithreaded Java program.
----------------------------------------------------------------------------
----
----------------------------------------------------------------------------
----
CNCLUSION2: We should find a way to answer the questions of the type
mentioned above (performance gain, time spent, etc.) in a quantitative
way.
----------------------------------------------------------------------------
----
Now, let's forget for the moment about the average Java programmer, and
let's
consider the audience of this interest group: the compiler optimization
community,
the hardware community, etc. Surely all the people reading this are experts
and
know how to handle release/acquire stuff, how to prevent data races and how
to avoid redundant synchronizations (?do they?). Clearly, they cannot accept
the
performance of a SC system.
Yet another example is the community of experts, say, doing high-performance
real time
systems. These typically specialize in avoiding expensive memory barriers
by actually using
data races. They rely on the liveness property of SC-like systems that
guarantees
eventual transfer of a written value even in the absense of release/acquire.
They know
that this liveness for a single value (flag) is much less expensive than a
full-fledged
memory barrier. However, they try to avoid even making this flag volatile
for the
overhead involved.
Now, since we cannot solve this hard tradeoff, and since we want to satisfy
both
the average programmer and the expert communities, how about enjoying both
worlds:
----------------------------------------------------------------------------
----
A NAIVE PROPOSAL:
Let the default memory model be SC. Let the compiler have an optimizing
flag to compile the programs using (some) relaxed release/acquire
semantics.
----------------------------------------------------------------------------
----
Of course, there is the question of partial compilation and all, but perhaps
everything can be fixed with defaulting to SC and/or forcing to RC.
Also, there is the question how to make the bytecodes run everywhere.
I think bytecodes can be executed SC. Some indication should be added
so that when bytecodes are recompiled to native code the result may be
optimized towards RC.
I'm not sure adding flags to the bytecodes is an option at this stage,
but wouldn't it be nice if it was? We could allow the programmer to
choose out of a range of behaviors, and even change them during
running time.
--- Assaf
-------------------------------
JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel
This archive was generated by hypermail 2b29 : Thu Oct 13 2005 - 07:00:33 EDT