Hi, "Boehm, Hans" <email@example.com> wrote:
> And CPU locals don't seem to be very intuitive for the
agree. indeed, they are good for _very_few_ things (such as DCL ;-)
> I guess I'm still confused as to what we're trying to accomplish.
> Why are CPU locals better than thread locals?
they would just help to save some storage. with 2 CPUs, YY singletons
and XXX threads we would need 2 x YY CPULocal instances (instead
of XXX x YY ThreadLocal instances). however, they would also speed
things up a _little_bit_, because they could also reduce the number of
lock/unlock/set operations (with ThreadLocals we need XXX lock/unlock/set
operations).
> In the per-CPU case, you may get switched between the CPU id lookup
> and the actual read, so unless you somehow inhibit preemption,
> you may still get the wrong one.
yup. but this is not a problem (well, at least for DCL). the memory view
is the same after the switch as it was before it. so if, on some CPU,
a thread was allowed (via CPULocal) to pick up the singleton
pointer/ref and use it, then even after a switch to some other CPU it
will be safe to continue w/o extra synch. (read memory barrier).
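a minimal sketch of what such a CPULocal-based DCL might look like. note
that `currentCpuId()` is hypothetical -- standard Java exposes no per-CPU
id, so it is simulated here just to make the sketch runnable:

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of CPULocal-style DCL, not a real API.
// currentCpuId() stands in for a per-CPU id lookup that standard
// Java does not provide; it is simulated so the sketch runs.
class CpuLocalSingleton {
    private static final int NCPUS = 2;                           // assumed CPU count
    private static final Object[] published = new Object[NCPUS];  // per-CPU slots
    private static final Object lock = new Object();
    private static Object instance;                               // created once, under lock

    private static int currentCpuId() {
        // Simulation only: a real implementation would ask the OS/VM.
        return ThreadLocalRandom.current().nextInt(NCPUS);
    }

    static Object get() {
        // Fast path: no lock, no barrier. Per the argument above, a
        // non-null value read here stays valid even if the thread then
        // migrates to another CPU.
        Object o = published[currentCpuId()];
        if (o != null) return o;
        synchronized (lock) {                                     // slow path
            if (instance == null) instance = new Object();
            // Re-read the CPU id: we may be on a different CPU than
            // the one whose slot was checked in the fast path.
            published[currentCpuId()] = instance;
            return instance;
        }
    }
}
```

once both slots are filled, every get() takes the lock-free fast path;
the 2 slots (vs. one ThreadLocal entry per thread) are where the
2 x YY vs. XXX x YY storage saving above comes from.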
"Boehm, Hans" <firstname.lastname@example.org> on 03/27/2001 08:53:05 PM
Please respond to "Boehm, Hans" <email@example.com>
To: Alexander Terekhov/Germany/IBM@IBMDE, "Boehm, Hans"
cc: "'firstname.lastname@example.org'" <email@example.com>, firstname.lastname@example.org,
email@example.com, firstname.lastname@example.org, email@example.com,
jmaessen@MIT.EDU, firstname.lastname@example.org, email@example.com,
Subject: RE: JavaMemoryModel: Re: "Double-Checked Locking is Broken"
> -----Original Message-----
> From: TEREKHOV@de.ibm.com [mailto:TEREKHOV@de.ibm.com]
> > Does that really buy you performance at user level? Unlike thread
> > the semantics of "CPULocals" seem nontrivial, since you can
> get preempted
> > halfway through an update ...
> that should not be a problem for atomic writes to
> pre-allocated storage.
Yes, but it seems to add lots of additional corner cases that would need to
be defined. And CPU locals don't seem to be very intuitive for the
> and i also think that there should not be any problem to add full
> mutex-based synchronization (still atomic writes, but to dynamically
> allocated storage using internal DCL w/o memory barriers) so that
> get/lookup calls would still _not_ need any synchronization / memory
> barriers - that is what really buys performance. however, it is really
> important that set() after get() should not assume that it updates the
> same CPULocal variable which was checked via get() - the CPU may
> change - a thread could be running on a different CPU after get()
> (fortunately that will make _no_ difference with respect to memory
> visibility).
I guess I'm still confused as to what we're trying to accomplish. Why are
CPU locals better than thread locals?
Thread locals seem to be implementable with moderate overhead, though I
would clearly like to see faster implementations. But I'm having trouble
coming up with an implementation of user-level CPU locals that's much
faster. My understanding is that there are substantial costs associated
with having either per-thread or per-cpu address mappings. Thus in either
case, you would have to look up a (thread, CPU) id, and then use that to
either get to a local storage pointer in the (thread, CPU) descriptor, or
use that as an index into a multiple-concurrent-reader hash table of some
sort. In the per-CPU case, you may get switched between the CPU id lookup
and the actual read, so unless you somehow inhibit preemption, you may
get the wrong one.
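The window described here is visible in the fast path itself. In this
sketch (with the same hypothetical, simulated currentCpuId() as before),
preemption between the two steps means the read uses a possibly stale id:

```java
// Sketch of the two-step hazard: the CPU id lookup and the slot read
// are separate operations, so a thread can be preempted in between and
// resume on a different CPU. currentCpuId() is hypothetical/simulated.
class CpuIdRace {
    private static final Object[] slots = { "cpu0-value", "cpu1-value" };

    private static int currentCpuId() {
        return 0; // simulated; a real lookup could differ on each call
    }

    static Object read() {
        int cpu = currentCpuId();   // step 1: look up the CPU id
        // <-- preemption can happen here; thread may resume on another CPU
        return slots[cpu];          // step 2: read via a possibly stale id
    }
}
```

For DCL this staleness is harmless, as Terekhov argues (each slot only
ever holds null or the fully published singleton), but for mutable
per-CPU state it is exactly the corner case raised here.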
JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel
This archive was generated by hypermail 2b29 : Thu Oct 13 2005 - 07:00:30 EDT