JavaMemoryModel: Cost of memory barriers, and the difficulty of removing them

From: Bill Pugh (pugh@cs.umd.edu)
Date: Tue Jul 06 1999 - 08:54:19 EDT


As a number of people have noted, there are a number of compiler
analysis techniques that would allow you to remove lock operations.

1) If you know that you already held a lock on an object, you
wouldn't have to try to lock it a second time (e.g., when a
synchronized method calls another synchronized method on the same
object)

2) If you know that no other thread has a reference to the object
(perhaps the object has just been allocated and you know it hasn't
escaped yet).

While these techniques would allow you to eliminate the locks
associated with the synchronization. However, the current semantics
of monitorenter and monitorexit essentially require a memory barrier
at both enter and exit.
You can do some analysis and transformations to move memory barriers
across operations that aren't visible to other threads, and to
combine adjacent memory barriers. But removing them altogether is
difficult.

For example, the following (contrived) code:

Point tmp = new Point();
synchronized (tmp) { tmp.x = 17; }
tmp.y = 41;
synchronized (tmp) { tmp.y++; }

would have the following naive translation to intermediate code (treating
lock and unlock as separate from memory barriers):

R1 = new Point();
lock R1
memoryBarrier
R1.x = 17
memoryBarrier
unlock R1
R1.y = 41
lock R1
memoryBarrier
R2 = R1.y
R2 = R2 + 1
R1.y = R2
memoryBarrier
unlock R1

Assuming we have analyzed Point() are know that it doesn't allow the
reference to the newly allocate point to escape, we know that no
other thread has a reference to the Point referenced by R1. So we can
eliminate all of the lock and unlock instructions:

R1 = new Point();
memoryBarrier
R1.x = 17
memoryBarrier
R1.y = 41
memoryBarrier
R2 = R1.y
R2 = R2 + 1
R1.y = R2
memoryBarrier

Now, we can move a memoryBarrier past a read/write to a memory
location that isn't yet visible to any other thread, giving us:

R1 = new Point();
R1.x = 17
R1.y = 41
R2 = R1.y
R2 = R2 + 1
R1.y = R2
memoryBarrier
memoryBarrier
memoryBarrier
memoryBarrier

We can now combine memory barriers, do forward substitution and
constant propagation, getting:

R1 = new Point();
R1.x = 17
R1.y = 42
memoryBarrier

However, we can't get rid of that remaining memory barrier.

Another way of looking at it, is should the following statement be a
no-op, or should it be a memory barrier:

synchronized (new Object()) {}

Right now, the semantics of a synchronized block are approximately
that monitorenter and monitorexit are memory barriers. Are there
alternative definitions of the flushing/barrier effects of memory
barriers that would still
work for the uses people are making of synchronized blocks, yet work
better for high performance parallel processors? Some of the people
from IBM have said that the inability to remove memory barriers
required by synchronization on thread-local objects has a substantial
performance penalty (having real numbers on this would be great).

Part of the problem is that all of the semantics we've talked about
treat the monitorenter and monitorexit instructions as isolated
instructions. If our semantics took into account the fact that they
must be executed as matched pairs, and defined the semantics in terms
of matched pairs, could we get something that would be easier to
remove?

Here are some possibilities. I haven't thought these out in detail,
so I'm not proposing them nor am I prepared to defend them. Just
tossing them out:

Possibility 1:

        The flushing caused by monitorenter/monitorexit only applies to
        memory locations read or written between the monitorenter and
        monitorexit.

Possibility 2:

        No other thread is allowed to see a memory action inside the
synchronized block precede a memory action before the synchronized
block, nor is any other thread allowed to see a memory action inside
the synchronized block follow a memory action after the synchronized
block. However, transitive ordering constraints to do not apply. If a
thread doesn't see any memory actions inside the synchronized block,
it may see actions before the synchronized block follow actions after
the synchronized block. In the case where all of the memory
references inside a synchronized block are to thread local memory,
all memory barriers can be eliminated.

        Bill

-------------------------------
This is the JavaMemoryModel mailing list, managed by Majordomo 1.94.4.

To send a message to the list, email JavaMemoryModel@cs.umd.edu
To send a request to the list, email majordomo@cs.umd.edu and put
your request in the body of the message (use the request "help" for help).
For more information, visit http://www.cs.umd.edu/~pugh/java/memoryModel



This archive was generated by hypermail 2b29 : Thu Oct 13 2005 - 07:00:16 EDT