Re: JavaMemoryModel: Idiom for safe, unsynchronized reads

From: Joshua Bloch (jbloch@eng.sun.com)
Date: Tue Jun 29 1999 - 03:25:47 EDT


Raymie says:

>> Josh Bloch writes:
>> People (including Bill Joy and Guy Steele, who wrote the memory
>> model) are shocked when they find out that according to the model,
>> this code returns a potentially corrupt String:
>>
>> static String foo = null;
>>
>> String getFoo() {
>>     if (foo == null)
>>         foo = new String(..whatever..);
>>     return foo;
>> }
>
>This too makes it sound like I'm advocating something dangerous and
>counter-intuitive. There is nothing necessarily wrong with the above
>code fragment, and it does not necessarily return a corrupt String.
>However, if the programmer's intent is to allow multiple threads to
>call "getFoo" without external synchronization, then it _can_ lead to
>corruption.

  It can lead to corruption of a final, immutable system-provided object,
and that's violently counterintuitive! Joe Programmer (and Guy Steele and
Bill Joy and Doug Lea (http://gee.cs.oswego.edu/dl/cpj/immut.html) and Bill
Venners (http://www.artima.com/flexiblejava/threadsafety.html)) reason
thusly:

    (a) The constructor establishes an invariant.

    (b) The object is immutable, so no operations modify the invariant.

==> Any thread observing the object will observe the invariant,
        without the need for synchronization.

If a language violates programmers' strongly held intuitions, they'll
write buggy code.
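
To make the intuition concrete, here is a minimal sketch in Java. The
names Range and RangeHolder are hypothetical, invented for illustration;
they appear nowhere in this thread. The constructor establishes the
invariant lo <= hi, nothing modifies it afterwards, and the instance is
published through an unsynchronized static field, which is the pattern
that (a) and (b) make look safe. Whether a second thread is actually
guaranteed to observe the invariant depends on the memory model in force,
and that is the point under dispute.

```java
// Hypothetical immutable class: the invariant lo <= hi is established
// in the constructor and never modified afterwards.
final class Range {
    private final int lo;
    private final int hi;

    Range(int lo, int hi) {
        if (lo > hi)
            throw new IllegalArgumentException("lo > hi");
        this.lo = lo;
        this.hi = hi;
    }

    int lo() { return lo; }
    int hi() { return hi; }
    int length() { return hi - lo; }   // relies on the invariant
}

// Hypothetical holder that publishes a Range with no synchronization,
// in the same unsynchronized style as getFoo.
final class RangeHolder {
    private static Range cached = null;

    static Range get() {
        Range r = cached;              // read the shared field once
        if (r == null) {
            r = new Range(3, 10);      // arbitrary example values
            cached = r;
        }
        return r;
    }
}
```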

   Raymie's next point was (roughly speaking) that Java massively
oversynchronizes its libraries (e.g. StringBuffer). This is absolutely
true, but I believe that it's tangential to the discussion at hand. His
next point is this:

>If we are really interested in helping unsophisticated programmers
>write correct multi-threaded programs, then we must teach them a
>simple religion: _all_ accesses to shared variables must be protected.
>This is a very easy religion to learn.

   As I said previously, I think that it's not an easy religion to learn.
Intuitively, synchronization is for insulating one thread from changes made
by another. People just don't think that it's necessary for immutable
objects.

    Raymie's next point is that the lazy initialization idiom in my previous
letter does not guarantee that only a single instance of foo is created. I
know this well. In fact, I may be one of the earliest discoverers of the
"doublecheck" idiom that Raymie describes (I was using it at CMU in the
mid-'80s). The only reason that I didn't mention it in my previous letter is
that I thought it was an unnecessary complication.
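
For readers who have not seen it, the doublecheck idiom can be sketched
roughly as follows. The class name LazyFoo, the placeholder string and
the creations counter are illustrative assumptions. So is the volatile
modifier: it is what the revised memory model of Java 5 and later
requires for the idiom to be correct, and it says nothing about what the
model under discussion here guarantees.

```java
// Hypothetical sketch of the doublecheck idiom: lazy initialization
// that creates at most one instance.
final class LazyFoo {
    // volatile is an assumption of this sketch (required under the
    // Java 5+ memory model); it is not taken from the thread.
    private static volatile String foo = null;

    // Instrumentation for the sketch only; touched only under the lock.
    private static int creations = 0;

    static String getFoo() {
        String result = foo;                  // first check, no lock
        if (result == null) {
            synchronized (LazyFoo.class) {
                result = foo;                 // second check, under the lock
                if (result == null) {
                    result = new String("whatever");  // placeholder argument
                    creations++;
                    foo = result;
                }
            }
        }
        return result;
    }

    // Locks on LazyFoo.class, the same lock used in getFoo.
    static synchronized int creations() { return creations; }
}
```

The first check keeps the common case lock-free; the second check, made
while holding the lock, is what prevents a second instance from being
created.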

    I believe that Raymie is wrong in stating that the doublecheck idiom is
preferable to the totally unsynchronized read. They do different things;
sometimes you need one and sometimes you need the other. For example,
consider BigInteger: it has a bunch of methods that calculate expensive int
functions of the (immutable) BigInteger (e.g., the number of one-bits in the
BigInteger). The results of these calculations are cached in int fields the
first time they're needed. It doesn't matter if such a field is
occasionally recalculated, so we don't need the doublecheck idiom. (Note,
by the way, that this code is safe even under the current threading model,
as each unsynchronized shared datum is a single integer, so it's impossible
to observe it in a bogus state. At worst, each thread will calculate the
value the first time it needs it.)
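
The caching pattern just described might look like the sketch below. Bits
is a hypothetical stand-in and not the actual BigInteger source; storing
the count plus one, so that the default field value 0 means "not yet
computed", is likewise an assumption of this sketch.

```java
// Hypothetical immutable value with a lazily cached int function.
final class Bits {
    private final long value;

    // Cached result plus one; 0 means "not yet computed". A single int
    // is read and written atomically, so a thread sees either 0 or a
    // complete value.
    private int bitCountPlusOne;

    Bits(long value) { this.value = value; }

    int bitCount() {
        int cached = bitCountPlusOne;         // read the shared field once
        if (cached == 0) {
            // No lock: two threads may both compute this, but they
            // compute the same answer, so the duplicate work is harmless.
            cached = Long.bitCount(value) + 1;
            bitCountPlusOne = cached;
        }
        return cached - 1;
    }
}
```

No attempt is made to ensure the value is computed only once, which is
why the doublecheck idiom is not needed here.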

>And once you have this religion, then the need for synchronization in
>"getFoo" is not at all surprising.

  As stated above, I disagree, given that getFoo returns a *String*.

>On the other hand, if you don't have this religion, you will try to
>get fancy, and you will definitely introduce a race.

    People aren't trying to be "fancy" when they don't synchronize access to
immutable objects. They just don't think that it's necessary. It doesn't
occur to them to synchronize.

   Raymie's final point:

>It strikes me as completely implausible that people would prefer (A)
>[doublecheck] over (B) [fully synchronized] because (A) is simpler to
>understand or in some sense less error prone.

    Agreed; the doublecheck idiom is known only to moderately sophisticated
programmers. But the totally unsynchronized idiom for lazy initialization
with the possibility of reinitialization is much more widely known and used.

>In general, it seems hard to believe that people find the religion of
>"synchronize all accesses to shared variables" as difficult.

   Believe it. People think that synchronization is only for mutable
variables, i.e., those that can be modified after they're "first published."
Anything else is counterintuitive.

>It seems to me that this conversation would be more efficient if we
>could agree to the following:
>
> Version (B) is simpler to understand than Version (A). Version (A)
> is preferable because it's faster.

   Yes, although we both agree that (A) isn't necessarily preferable.
Premature optimization is the root of all evil.
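
For comparison, version (B), the fully synchronized form, might look
like the sketch below. The class name SyncFoo and the placeholder string
are illustrative assumptions, not code from this thread.

```java
// Hypothetical sketch of version (B): fully synchronized lazy
// initialization.
final class SyncFoo {
    private static String foo = null;

    // Every read and write of foo happens while holding the class lock.
    static synchronized String getFoo() {
        if (foo == null)
            foo = new String("whatever");     // placeholder argument
        return foo;
    }
}
```

Every call pays for the lock, including calls made long after foo has
been initialized; that is the cost (A) tries to avoid.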

> More generally, "synchronized
> access to all shared variables" is a simpler principle by which to
> program than trying to selectively apply idioms for unsynchronized
> access.

   No. The masses don't "try to selectively apply idioms for unsynchronized
access." They just don't bother synchronizing where they don't believe that
it's necessary. The JLS encourages this by going out of its way to tell you
what data access is atomic and what isn't.

> The attraction of a tighter memory specification is _not_
> to prevent the "unwashed masses" from introducing race errors but
> rather to allow programs to run faster by removing unneeded
> synchronizations.

   No, it does both. It allows sophisticated programmers to use techniques
like the doublecheck idiom, *and* it prevents unsophisticated programmers
from tripping up by failing to synchronize access to lazily initialized
immutables.

           Sorry for the overly long letter,

           Josh

-------------------------------
This is the JavaMemoryModel mailing list, managed by Majordomo 1.94.4.

To send a message to the list, email JavaMemoryModel@cs.umd.edu
To send a request to the list, email majordomo@cs.umd.edu and put
your request in the body of the message (use the request "help" for help).
For more information, visit http://www.cs.umd.edu/~pugh/java/memoryModel


