Re: JavaMemoryModel: Roach motel ordering: Do we have consensus?

From: Bill Pugh (pugh@cs.umd.edu)
Date: Thu Jul 12 2001 - 12:03:43 EDT


Majordomo bounced this posting, so I'm reposting it.

Please keep in mind: if you post from a different email address than
you are subscribed by, your posting will bounce. Alex is subscribed
under alex.garthwaite@sun.com, so the posting from
alex@enkidu.East.Sun.COM bounced. Let me know if you want help in
getting the email address you are registered under changed.

        Bill

At 11:05 AM -0400 7/12/01, Alexander Garthwaite - Sun Labs
<alex@enkidu.East.Sun.COM> wrote:
>
>Hans Boehm wrote:
>
>> 1) Can a volatile write be reordered with a following volatile read as in
>> TSO. I think this is basically the same question as whether monitor exit
>> and a following volatile read can be reordered. I currently think this is
>> observable (in spite of earlier messages to the contrary), but it's probably
>> not critical either way. Allowing the reordering is cheaper on Itanium, at
>> least for volatiles.
>>
>> 2) If two threads write two different volatile locations, and if there are
>> several observer threads, do all the observer threads see the writes in the
>> same order? To what extent is there a global total order among all the
>> volatile accesses?
>>
>> Question 1:
>>
>> I think you can observe if a volatile read is moved before an unlock
>> operation. Let y be volatile. Initially x and y are zero.
>>
>> Thread 1:
>> x = 1;
>> unlock;
>> z = y; //volatile read
>>
>> Thread 2:
>> y = 1;
>> <barrier, e.g. assignment of another volatile to itself>
>> x = 2;
>>
>> Can this end with x = 1 and z = 0?
>>
>> If the reordering of the read is not allowed, then all actions in each
>> thread are ordered. x = 1 precedes the read of y. Since x = 1 at the end,
>> it must have followed the x = 2 assignment. Thus the read of y must have
>> followed the assignment to y.
>>
>> If we allow the reordering, then we can rewrite thread 1 as x = 1; z = y;
>> unlock . Sinze z = y is only an acquire barrier, this can be reordered as z
>> = y; x = 1; unlock . Thus x = 1 and z = 0 becomes a possible outcome.
>>
>> Did I miss something?
>>
>> This seems to be eseentially equivalent to reordering a volatile write
>> followed by a volatile read.
>>
>> That having been said, I'm still leaning towards allowing the reordering.
>> Disallowing the ordering seems to require an extra memory barrier on at
>> least X86 and Itanium. Was there a natural example for which it actually
>> matters? It sounds like the specification would be simpler if we allowed
>> it.
>
>I agree that such reorderings are observable and I can think of several
>techniques that rely on the ability to prevent (volatile) reads
>being reordered before (volatile) writes.
>
>The first example would be a concurrent deque such as Arora and Blumofe
>use. Their use of this structure in work-stealing associates a deque
>with each thread and allows the owner of a given deque to largely
>interact with its own deque (at the bottom) without synchronization
>overhead. Only when the deque becomes sufficiently small does the owner
>have to synchronize access with possible stealers. Stealers, on the
>other hand, must synchronize on the top index when accessing another
>thread's deque.
>
>When the deque-owner pops an item off the bottom, it performs
>the following sequence of reads and writes (assume that "bottom"
>and "top" are volatile fields in the deque object:
>
>Object pop() {
> int localBot = bottom - 1;
> int localTop;
> bottom = localBot;
> /* need membar here to enforce SC; this would mean a
> "membar Store#Load" for a SPARC TSO system. */
> localTop = top;
> if (localTop < localBot) {
> return array[localBot];
> }
>
> ... do expensive interaction ...
>}
>
>The ordering ensures that the owner has published a new "bottom"
>to any stealers and that the "top" it reads is informed by that
>act. It allows the owner to know that at worst top is equal to
>or one more than bottom. Allowing the read of "top" to be rescheduled
>above the write of "bottom" would invalidate the comparison that
>ensures they are sufficiently far apart.
>
>The second example is a case of an optimistic write whose effectiveness
>is validated with a subsequent read. One example can be found in
>Dave Dice's relaxed locks synchronization technique. His technique
>(when written in C) requires only one atomic operation for lock
>acquisition and none for lock-release. At the same time, it also
>has a straight-forward mechanism for lock-deflation without
>requiring extra space. One of his key insights was that where
>other similar techniques had two ways to represent an object
>as locked and one way to represent objects as unlocked (thin-locks,
>for example), his technique had two ways to represent an unlocked object
>and one way to represent one as locked. This introduced a benign
>race where one thread was acquiring the lock while another thread
>was deflating the lock-record. His solution was a read by the
>thread successfully acquiring the lock to validate that the lock-record
>was still associated with the right object.
>
>Since Java doesn't have a CAS, (some of) his code, were it written in Java,
>would look something like:
>
> class LockRecord {
> private Thread owner;
> ...
>
> public Thread acquire(Thread locker) {
> synchronized(this) { /* it would be nice to have CAS */
> if (owner == null) owner = locker; /* B */
> }
> return owner;
> }
> }
>
> class Item {
> private volatile LockRecord lockrec;
>
> public void lock(Thread locker) {
> LockRecord lr = lockrec;
> if (lr != null) {
> if (lr.acquire(locker) == locker)
> if (lr == lockrec) return; /* A */
> else {
> ...
> }
> } else {
> ...
> }
> }
> }
>
>Suppose that the call to lr.acquire is inlined in lock().
>If the read of "lr" (at A) could be pulled into the synchronized block
>and reordered before the store of owner (at B), this would
>undermine the usefulness of the check.
>
>I suspect there are many examples of such optimistic writes followed
>by validating reads. I also expect there are many examples where the
>reordering is benign. The motivation to not overly constrain the spec
>at the expense of performance is a good one but if we go with a model
>like TSO, we will definitely need some way to insert barriers to
>prevent such reorderings where it does matter.
>
> -- Alex
-------------------------------
JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel



This archive was generated by hypermail 2b29 : Thu Oct 13 2005 - 07:00:33 EDT