Re: JavaMemoryModel: A memory model for the masses

From: Bill Pugh (
Date: Tue Nov 09 1999 - 23:12:31 EST

At 8:13 AM -0500 11/9/99, Doug Lea wrote:
>As an example, here's a popular design for an unbounded linked queue,
>published in Maged Michael and Michael Scott's "Simple, Fast, and
>Practical Non-Blocking and Blocking Concurrent Queue Algorithms",
>Proceedings, 15th ACM Symposium on Principles of Distributed
>Computing, ACM, 1996 -- see
>This is the fastest lock-based queue design that I know of. It is used
>in a lot of systems. A Java version that conforms to both current and
>proposd revised models looks like this. (This is from p130-131 of 2nd
>edition of CPJ, and is the basis of my util.concurrent.LinkedQueue class.)

With my proposed change to the meaning of volatile, a programmer can
declare the
next field to be volatile and eliminate the inner lock. Doug
hinted/suggested this:
>Note that volatiles could come to the rescue here if dereferencing a
>volatile reference did guarantee visibility of contents.

At 8:13 AM -0500 11/9/99, Doug Lea wrote:
>I suppose that is worth mentioning here that the non-blocking version
>of this queue presented in the Michael & Scott paper cannot be
>efficiently implemented in Java because it requires a CAS.


I've been thinking about how a compiler could recognize that a synchronization
lock could be replaced with a CAS instruction. I think the following
code will do
work for the Queue example.

The compiler has to be able to determine that the only places where a lock will
be obtained on the object referenced by this.lock is in the 5
synchronized block
within this code. Further, it needs to verify that each of these
synchronized blocks
is performing a simple atomic compare, swap and branch. Given that, I
think it will
be possible to replace them all with compare and swaps.

Of course, we got a fair number of volatile fields in here as well, so figuring
out the exact number of memory barriers required by this code is a
little tricky.
It is somewhat simplified by the fact that the CAS probably acts as a memory
barrier on most machines.

Actually, I suspect that the code we code generate from this would be
more efficient
than Michael's non-blocking queue algorithm, partially because having
garbage collection
makes a number of things simpler (it eliminates the need for a
double-word CAS and a counter

I need to think this over some more, but I'm going to be brave and
post the code
I've got at the moment. I would be too surprised if I have to email
out some bug fixes
tomorrow morning.


Queue implementation that can be optimized to use non-blocking CAS:

public class Queue {

     static class Node {
         Node(Object o) {
             value = o;
         Object value;
         volatile Node next;

     private final Object lock = new Object();
     volatile Node head = new Node(null);
     volatile Node tail = head;

     void enqueue(Object o) {
         Node nn = new Node(o);
         Node t;
         while (true) {
             t = tail;
             synchronized(lock) {
                 if ( == null) {
            = nn;

             Node nxt =;
             synchronized(lock) {
                 if (tail == t) tail = nxt;
         synchronized(lock) {
             if (tail == t) tail = nn;

     Object dequeue() {
         Node h;
         while(true) {
             h = head;
             Node nxt =;
             if (nxt == null) return null;
             Node t = tail;
             if (h == t)
                 synchronized(lock) {
                     if (tail == t) tail = nxt;
             else synchronized(lock) {
                 if (head == h) {
                     head = nxt;
         Object tmp = h.value;
        h.value = null;
         return tmp;


JavaMemoryModel mailing list -

This archive was generated by hypermail 2b29 : Thu Oct 13 2005 - 07:00:22 EDT