Majordomo bounced this posting, so I'm reposting it.
Please keep in mind: if you post from a different email address than 
you are subscribed by, your posting will bounce. Alex is subscribed 
under alex.garthwaite@sun.com, so the posting from 
alex@enkidu.East.Sun.COM bounced. Let me know if you want help in 
getting the email address you are registered under changed.
        Bill
At 11:05 AM -0400 7/12/01, Alexander Garthwaite - Sun Labs 
<alex@enkidu.East.Sun.COM> wrote:
>
>Hans Boehm wrote:
>
>>  1) Can a volatile write be reordered with a following volatile read as in
>>  TSO.  I think this is basically the same question as whether monitor exit
>>  and a following volatile read can be reordered.  I currently think this is
>>  observable (in spite of earlier messages to the contrary), but it's probably
>>  not critical either way.  Allowing the reordering is cheaper on Itanium, at
>>  least for volatiles.
>>
>>  2) If two threads write two different volatile locations, and if there are
>>  several observer threads, do all the observer threads see the writes in the
>>  same order?  To what extent is there a global total order among all the
>>  volatile accesses?
>>
>>  Question 1:
>>
>>  I think you can observe if a volatile read is moved before an unlock
>>  operation.  Let y be volatile.  Initially x and y are zero.
>>
>>  Thread 1:
>>  x = 1;
>>  unlock;
>>  z = y; //volatile read
>>
>>  Thread 2:
>>  y = 1;
>>  <barrier, e.g. assignment of another volatile to itself>
>>  x = 2;
>>
>>  Can this end with x = 1 and z = 0?
>>
>>  If the reordering of the read is not allowed, then all actions in each
>>  thread are ordered.  x = 1 precedes the read of y.  Since x = 1 at the end,
>>  it must have followed the x = 2 assignment.  Thus the read of y must have
>>  followed the assignment to y.
>>
>>  If we allow the reordering, then we can rewrite thread 1 as x = 1; z = y;
>>  unlock .  Sinze z = y is only an acquire barrier, this can be reordered as z
>>  = y; x = 1; unlock . Thus x = 1 and z = 0 becomes a possible outcome.
>>
>>  Did I miss something?
>>
>>  This seems to be eseentially equivalent to reordering a volatile write
>>  followed by a volatile read.
>>
>>  That having been said, I'm still leaning towards allowing the reordering.
>>  Disallowing the ordering seems to require an extra memory barrier on at
>>  least X86 and Itanium.  Was there a natural example for which it actually
>>  matters?  It sounds like the specification would be simpler if we allowed
>>  it.
>
>I agree that such reorderings are observable and I can think of several
>techniques that rely on the ability to prevent (volatile) reads
>being reordered before (volatile) writes.
>
>The first example would be a concurrent deque such as Arora and Blumofe
>use. Their use of this structure in work-stealing associates a deque
>with each thread and allows the owner of a given deque to largely
>interact with its own deque (at the bottom) without synchronization
>overhead. Only when the deque becomes sufficiently small does the owner
>have to synchronize access with possible stealers. Stealers, on the
>other hand, must synchronize on the top index when accessing another
>thread's deque.
>
>When the deque-owner pops an item off the bottom, it performs
>the following sequence of reads and writes (assume that "bottom"
>and "top" are volatile fields in the deque object:
>
>Object pop() {
>     int localBot = bottom - 1;
>     int localTop;
>     bottom = localBot;
>     /* need membar here to enforce SC; this would mean a
>        "membar Store#Load" for a SPARC TSO system. */
>     localTop = top;
>     if (localTop < localBot) {
>         return array[localBot];
>     }
>
>     ... do expensive interaction ...
>}
>
>The ordering ensures that the owner has published a new "bottom"
>to any stealers and that the "top" it reads is informed by that
>act. It allows the owner to know that at worst top is equal to
>or one more than bottom.  Allowing the read of "top" to be rescheduled
>above the write of "bottom" would invalidate the comparison that
>ensures they are sufficiently far apart.
>
>The second example is a case of an optimistic write whose effectiveness
>is validated with a subsequent read. One example can be found in
>Dave Dice's relaxed locks synchronization technique. His technique
>(when written in C) requires only one atomic operation for lock
>acquisition and none for lock-release. At the same time, it also
>has a straight-forward mechanism for lock-deflation without
>requiring extra space. One of his key insights was that where
>other similar techniques had two ways to represent an object
>as locked and one way to represent objects as unlocked (thin-locks,
>for example), his technique had two ways to represent an unlocked object
>and one way to represent one as locked. This introduced a benign
>race where one thread was acquiring the lock while another thread
>was deflating the lock-record. His solution was a read by the
>thread successfully acquiring the lock to validate that the lock-record
>was still associated with the right object.
>
>Since Java doesn't have a CAS, (some of) his code, were it written in Java,
>would look something like:
>
>     class LockRecord {
>         private Thread owner;
>         ...
>
>         public Thread acquire(Thread locker) {
>             synchronized(this) {        /* it would be nice to have CAS */
>                 if (owner == null) owner = locker;    /* B */
>             }
>             return owner;
>         }
>     }
>
>     class Item {
>         private volatile LockRecord lockrec;
>
>         public void lock(Thread locker) {
>             LockRecord lr = lockrec;
>             if (lr != null) {
>                 if (lr.acquire(locker) == locker)
>                     if (lr == lockrec) return;        /* A */
>                 else {
>                     ...
>                 }
>             } else {
>                 ...
>             }
>         }
>     }
>
>Suppose that the call to lr.acquire is inlined in lock().
>If the read of "lr" (at A) could be pulled into the synchronized block
>and reordered before the store of owner (at B), this would
>undermine the usefulness of the check.
>
>I suspect there are many examples of such optimistic writes followed
>by validating reads. I also expect there are many examples where the
>reordering is benign. The motivation to not overly constrain the spec
>at the expense of performance is a good one but if we go with a model
>like TSO, we will definitely need some way to insert barriers to
>prevent such reorderings where it does matter.
>
>                              -- Alex
-------------------------------
JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel
This archive was generated by hypermail 2b29 : Thu Oct 13 2005 - 07:00:33 EDT