Re: JavaMemoryModel: Idiom for safe, unsynchronized reads

From: Raymie Stata (stata@pa.dec.com)
Date: Mon Jun 28 1999 - 23:35:54 EDT


My previous messages dealt with the programmer's and implementor's
perspectives in the same message. I'm going to split these into different
threads. This message deals with the programmer's perspective. A separate
message under a different topic will deal with the implementor's
perspective.

> Doug Lea writes:
> Consider the use of simple classes like String and Integer....
> Should there be two versions, one of which uses synch in its
> constructor and one not? Or should people put synch blocks only when
> constructing those that they believe might be accessible across
> threads? Or what? No answers along these lines strike me as
> tolerable.

This posting makes it sound like I'm advocating something really
dangerous and counter-intuitive. In fact, what I'm saying does _not_
imply that there need to be two versions of String or Integer nor
anything else similarly intolerable. It's not that bad.

> Josh Bloch writes:
> People (including Bill Joy and Guy Steele, who wrote the memory
> model) are shocked when they find out that according to the model,
> this code returns a potentially corrupt String:
>
>     static String foo = null;
>
>     String getFoo() {
>         if (foo == null)
>             foo = new String(..whatever..);
>         return foo;
>     }

This too makes it sound like I'm advocating something dangerous and
counter-intuitive. There is nothing necessarily wrong with the above
code fragment, and it does not necessarily return a corrupt String.
However, if the programmer's intent is to allow multiple threads to
call "getFoo" without external synchronization, then it _can_ lead to
corruption.

If "foo" is a variable shared by multiple threads, it needs to be
protected against races. What's so shocking about that? What's
complicated or intolerable about making methods like "getFoo"
synchronized methods? One might argue that there is a slight
performance penalty for adding synchronization here. However, if one
wants to reduce synchronization overhead in their Java programs,
rather than removing it from methods like "getFoo" that need it, one
should start by looking at uses of unshared JDK objects (like
StringBuffer), many of which cause _lots_ of unnecessary
synchronization.
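
To make the StringBuffer point concrete, here is a minimal sketch. The
class and method names are hypothetical, chosen only for illustration.
The buffer is local to one method call and is never visible to another
thread, yet every append() is a synchronized method on StringBuffer.

```java
// Hypothetical illustration; LocalBufferDemo and join are not JDK names.
final class LocalBufferDemo {
    // The StringBuffer never escapes this method, so no other thread can
    // ever observe it. Each append() is still a synchronized call on the
    // buffer, so the locking here protects nothing.
    static String join(String[] parts) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"})); // prints "a, b, c"
    }
}
```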

If we are really interested in helping unsophisticated programmers
write correct multi-threaded programs, then we must teach them a
simple religion: _all_ accesses to shared variables must be protected.
This is a very easy religion to learn. And once you have this
religion, then the need for synchronization in "getFoo" is not at all
surprising. On the other hand, if you don't have this religion, you
will try to get fancy, and you will definitely introduce a race. (My
understanding is that the JDK is full of what by anybody's definition
are bad races -- the authors don't have the religion!)
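
As a rough sketch of what the religion looks like in practice (the
class SharedCounter and its runDemo helper are made up for this
example), every access to the shared field, reads included, goes
through a synchronized method:

```java
// Hypothetical illustration of "all accesses to shared variables must be protected".
final class SharedCounter {
    private int count = 0; // shared; touched only while holding this object's lock

    synchronized void increment() {
        count++;
    }

    // The read is synchronized as well: the rule covers every access, not just writes.
    synchronized int get() {
        return count;
    }

    // Starts `threads` threads that each increment `perThread` times,
    // waits for all of them, and returns the final count.
    static int runDemo(int threads, final int perThread) {
        final SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < perThread; j++) {
                        counter.increment();
                    }
                }
            });
            workers[i].start();
        }
        for (int i = 0; i < threads; i++) {
            try {
                workers[i].join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000)); // prints 4000
    }
}
```

Because both methods lock the same object, no increment can be lost,
and the programmer never has to reason about which unsynchronized
reads might be safe.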

Raymie

P.S. In a multi-threaded environment, the above code can also lead to
multiple executions of the "new" expression, probably not what the
programmer intended. Thus, the recommended pattern for such a method
is:

     String getFoo() { // Version A
         if (foo == null) {
             synchronized (this) {
                 if (foo == null)
                     foo = new String(..whatever..);
             }
         }
         return foo;
     }

So we expect unsophisticated programmers to realize that they need to
write the above, rather than simply adding the keyword "synchronized"
to Josh's original code:

     synchronized String getFoo() { // Version B
         if (foo == null)
             foo = new String(..whatever..);
         return foo;
     }

It strikes me as completely implausible that people would prefer (A)
over (B) because (A) is simpler to understand or in some sense less
error-prone. In general, it seems hard to believe that people find
the religion of "synchronize all accesses to shared variables"
difficult.
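
For concreteness, here is Version (B) fleshed out into a complete class.
This is only a sketch under stated assumptions: "foo" is an instance
field here so that it is guarded by the same lock the method acquires,
and the LazyFoo class, the constructions counter, and the runDemo helper
are invented for the example.

```java
// Hypothetical illustration of Version (B); none of these names come from the thread.
final class LazyFoo {
    private String foo = null;     // assumption: instance field, guarded by this object's lock
    private int constructions = 0; // counts executions of "new"; guarded by the same lock

    synchronized String getFoo() { // Version B
        if (foo == null) {
            constructions++;
            foo = new String("whatever");
        }
        return foo;
    }

    synchronized int constructions() {
        return constructions;
    }

    // Calls getFoo() from `threads` threads on one shared LazyFoo and
    // returns how many times the "new" expression was executed.
    static int runDemo(int threads) {
        final LazyFoo lazy = new LazyFoo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    lazy.getFoo();
                }
            });
            workers[i].start();
        }
        for (int i = 0; i < threads; i++) {
            try {
                workers[i].join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return lazy.constructions();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(8)); // prints 1
    }
}
```

The null check and the assignment happen under a single lock, so "new"
runs exactly once no matter how the threads interleave, and no caller
can see a partially constructed String.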

It seems to me that this conversation would be more efficient if we
could agree to the following:

  Version (B) is simpler to understand than Version (A). Version (A)
  is preferable because it's faster. More generally, "synchronized
  access to all shared variables" is a simpler principle by which to
  program than trying to selectively apply idioms for unsynchronized
  access. The attraction of a tighter memory specification is _not_
  to prevent the "unwashed masses" from introducing race errors but
  rather to allow programs to run faster by removing unneeded
  synchronizations.

In what way do you disagree with this statement?