RE: JavaMemoryModel: Re: "Double-Checked Locking is Broken"

From: TEREKHOV@de.ibm.com
Date: Wed Mar 28 2001 - 01:53:30 EST


Hi, terekhov@de.ibm.com wrote:

> "Boehm, Hans" <hans_boehm@hp.com> wrote:

[...]

> > In the per-CPU case, you may get switched between the CPU id lookup
> > and the actual read, so unless you somehow inhibit preemption,
> > you may still get the wrong one.
>
> yup. but this is not a problem (well, at least for DCL). memory view is
> the same after the switch as it was before it. So if on some CPU
> a thread was allowed (via CPULocal) to pick up the singleton
> pointer/ref and use it, even after the switch to some other CPU it
> will be safe to continue w/o extra synch. (read memory barrier).

i think that now i see your point. yeah, CPULocal::get should provide
atomicity with respect to "CPU id lookup and the actual read" - e.g.
re-check the CPU id after the actual read and repeat the read in case
of a mismatch.
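
A rough sketch of that re-check, in Java for illustration only. Everything
here is an assumption rather than an existing API: Java has no standard
call for the current CPU id, so the lookup is passed in as an IntSupplier,
and the class name CpuLocal and the slot array are hypothetical. The sketch
uses AtomicReferenceArray for simplicity, so it shows only the retry logic,
not the barrier-free reads discussed in this thread.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.IntSupplier;

// Hypothetical per-CPU slot table. The CPU id lookup is supplied by the
// caller because Java offers no standard way to obtain it (assumption).
final class CpuLocal<T> {
    private final AtomicReferenceArray<T> slots;
    private final IntSupplier cpuId; // assumed to return a value in [0, cpuCount)

    CpuLocal(int cpuCount, IntSupplier cpuId) {
        this.slots = new AtomicReferenceArray<>(cpuCount);
        this.cpuId = cpuId;
    }

    // Read the slot for the CPU id seen before the read. If the id differs
    // after the read, the thread may have been switched in between, so the
    // value may belong to another CPU: repeat the read.
    T get() {
        while (true) {
            int before = cpuId.getAsInt();
            T value = slots.get(before);
            if (cpuId.getAsInt() == before) {
                return value;
            }
        }
    }

    // set() does its own lookup; it must not assume that it updates the
    // slot that an earlier get() checked.
    void set(T value) {
        slots.set(cpuId.getAsInt(), value);
    }
}
```

Note that the re-check only detects a changed CPU id around the read; a
switch away and back to the same CPU would go unnoticed by this sketch.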

thanks.

regards,
alexander.

---------------------- Forwarded by Alexander Terekhov/Germany/IBM on
03/28/2001 08:41 AM ---------------------------

TEREKHOV@de.ibm.com on 03/27/2001 09:45:00 PM

Please respond to TEREKHOV@de.ibm.com

To: "Boehm, Hans" <hans_boehm@hp.com>
cc: "'dl@cs.oswego.edu'" <dl@cs.oswego.edu>, dfb@watson.ibm.com,
      bogda@cs.ucsb.edu, paul@paulhaahr.com, tom@go2net.com,
      jmaessen@MIT.EDU, pugh@cs.umd.edu, egs@cs.washington.edu,
      schmidt@cs.wustl.edu, javaMemoryModel@cs.umd.edu

Subject: RE: JavaMemoryModel: Re: "Double-Checked Locking is Broken"

Hi, "Boehm, Hans" <hans_boehm@hp.com> wrote:

[...]

> And CPU locals don't seem to be very intuitive for the
> programmer.

agree. indeed, they are good for _very_few_ things (such as DCL ;-)

> I guess I'm still confused as to what we're trying to accomplish.
> Why are CPU locals better than thread locals?

they would just help to save some storage. with 2 CPUs, YY singletons
and XXX threads we would need 2 x YY CPULocal instances (instead
of XXX x YY ThreadLocal instances). however, they would also speed
things up a _little_bit_, because they could also reduce the number of
lock/unlock/set operations (with ThreadLocals we need XXX of them).
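
For reference, a minimal sketch of the ThreadLocal variant of DCL that the
lock/unlock/set count above refers to. The names Singleton, SingletonHolder
and lockedEntries are hypothetical, and the counter exists only to make the
"one lock/unlock/set per thread" cost visible; it is not part of the idiom.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class Singleton { }

final class SingletonHolder {
    // Per-thread flag: non-null once this thread has passed through the lock.
    private final ThreadLocal<Boolean> seenByThisThread = new ThreadLocal<>();
    private Singleton instance; // written only while holding the lock
    final AtomicInteger lockedEntries = new AtomicInteger(); // illustration only

    Singleton get() {
        // Fast path: a thread that has already synchronized once can read
        // the field without further locking.
        if (seenByThisThread.get() == null) {
            initOrSync();
        }
        return instance;
    }

    // Slow path, taken once per thread: lock, create if needed, unlock,
    // then set the per-thread flag.
    private void initOrSync() {
        synchronized (this) {
            lockedEntries.incrementAndGet();
            if (instance == null) {
                instance = new Singleton();
            }
        }
        seenByThisThread.set(Boolean.TRUE);
    }
}
```

Each thread pays the lock/unlock/set once per holder, which is where the
XXX figure comes from; a per-CPU flag would pay it once per CPU instead.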

[...]

> In the per-CPU case, you may get switched between the CPU id lookup
> and the actual read, so unless you somehow inhibit preemption,
> you may still get the wrong one.

yup. but this is not a problem (well, at least for DCL). memory view is
the same after the switch as it was before it. So if on some CPU
a thread was allowed (via CPULocal) to pick up the singleton
pointer/ref and use it, even after the switch to some other CPU it
will be safe to continue w/o extra synch. (read memory barrier).

regards,
alexander.

"Boehm, Hans" <hans_boehm@hp.com> on 03/27/2001 08:53:05 PM

Please respond to "Boehm, Hans" <hans_boehm@hp.com>

To: Alexander Terekhov/Germany/IBM@IBMDE, "Boehm, Hans"
      <hans_boehm@hp.com>
cc: "'dl@cs.oswego.edu'" <dl@cs.oswego.edu>, dfb@watson.ibm.com,
      bogda@cs.ucsb.edu, paul@paulhaahr.com, tom@go2net.com,
      jmaessen@MIT.EDU, pugh@cs.umd.edu, egs@cs.washington.edu,
      schmidt@cs.wustl.edu, javaMemoryModel@cs.umd.edu
Subject: RE: JavaMemoryModel: Re: "Double-Checked Locking is Broken"

> -----Original Message-----
> From: TEREKHOV@de.ibm.com [mailto:TEREKHOV@de.ibm.com]
>
> > Does that really buy you performance at user level? Unlike thread
> > locals, the semantics of "CPULocals" seem nontrivial, since you can
> > get preempted halfway through an update ...
>
> that should not be a problem for atomic writes to
> pre-allocated storage.
Yes, but it seems to add lots of additional corner cases that would need to
be defined. And CPU locals don't seem to be very intuitive for the
programmer.

> and i also think that there should not be any problem to add full mutex
> based synchronization (still atomic writes but to dynamically allocated
> storage using internal DCL w/o memory barriers) so that get/lookup calls
> would still _not_ need any synchronization / memory barriers - that is
> what really buys performance. however, it is really important that set()
> after get() should not assume that it updates the same CPULocal variable
> which was checked via get() - CPU may change - a thread could be running
> on a different CPU after get() (fortunately that will make _no_
> difference with respect to memory visibility).
>
I guess I'm still confused as to what we're trying to accomplish. Why are
CPU locals better than thread locals?

Thread locals seem to be implementable with moderate overhead, though I
would clearly like to see faster implementations. But I'm having trouble
coming up with an implementation of user-level CPU locals that's much
faster. My understanding is that there are substantial costs associated
with having either per-thread or per-cpu address mappings. Thus in either
case, you would have to look up a (thread, CPU) id, and then use that to
either get to a local storage pointer in the (thread, CPU) descriptor, or
use that as an index into a multiple-concurrent-reader hash table of some
sort. In the per-CPU case, you may get switched between the CPU id lookup
and the actual read, so unless you somehow inhibit preemption, you may
still get the wrong one.

Hans

-------------------------------
JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel




This archive was generated by hypermail 2b29 : Thu Oct 13 2005 - 07:00:30 EDT