JavaMemoryModel: Re: "Double-Checked Locking is Broken"

From: Doug Lea (
Date: Mon Mar 26 2001 - 22:12:43 EST

> last question.. is there any overhead related to type-cast
> here:
> > > static Singleton getInstance() {
> > > // get our thread's reference to the singleton
> > > Singleton instance = (Singleton)(perThreadInstance.get());
> ^^^^^^^^^^^

This is probably the least of your concerns. Casts are relatively
cheap. The main concern, that I should have mentioned before, is that
ThreadLocal varies tremendously in speed across JVMs and JDK versions.
On most 1.2.x JVMs, performance is so bad in this context that you'd
never want to use it. (The main reason is that until 1.3 ThreadLocal
internally used WeakHashMaps, which are needlessly heavy. The 1.4
version will in turn be faster than 1.3.)
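
For readers without the earlier messages, the quoted getInstance()
fragment belongs to an idiom along the following lines. This is a
minimal sketch, not code from this thread: the class name and the
createInstance() helper are hypothetical, and it is written for a
current JDK where ThreadLocal is generic, so the cast asked about
above disappears.

```java
// Sketch only: hypothetical names, written for a JDK with generics.
class PerThreadSingleton {
    // The one shared instance; only touched while holding the class lock.
    private static PerThreadSingleton theInstance;

    // Each thread caches its own reference here after its first call.
    private static final ThreadLocal<PerThreadSingleton> perThreadInstance =
        new ThreadLocal<>();

    private PerThreadSingleton() {}

    static PerThreadSingleton getInstance() {
        // get our thread's reference to the singleton
        PerThreadSingleton instance = perThreadInstance.get();
        if (instance == null) {
            // First call on this thread: go through the lock once,
            // then remember the result for later calls.
            instance = createInstance();
            perThreadInstance.set(instance);
        }
        return instance;
    }

    private static synchronized PerThreadSingleton createInstance() {
        if (theInstance == null) {
            theInstance = new PerThreadSingleton();
        }
        return theInstance;
    }
}
```

Every thread synchronizes at most once, so the cost of later calls is
whatever ThreadLocal.get() costs on the JVM at hand.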

You can usually avoid this uncertainty though if you need to.

If you can create and use your own thread subclass, you can implement
your own variants of ThreadLocals. (See Section of the 2nd
edition of my CPJ book). In fact, if you know in advance all of the
singletons you'll use, you don't need a table; fields in the
thread subclass will do. You can squeeze times even further if you
can just pass in Thread refs rather than looking them up each time via
Thread.currentThread. The attached file shows examples/hacks. I'm not
sure I recommend any of this, but if you are going to go this route,
you might as well make it both fast and correct.
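
The thread-subclass idea can be sketched roughly as follows. This is
not the attached file; CachedSingleton, SingletonThread and
sharedInstance() are hypothetical names, and the fallback for threads
that are not SingletonThreads is an assumption.

```java
// Sketch only: hypothetical names, not the attachment from this post.
class CachedSingleton {
    private static CachedSingleton theInstance;

    private CachedSingleton() {}

    // Slow path: every thread passes through this lock at most once.
    static synchronized CachedSingleton sharedInstance() {
        if (theInstance == null) {
            theInstance = new CachedSingleton();
        }
        return theInstance;
    }

    // Looks the thread up via Thread.currentThread on every call.
    static CachedSingleton getInstance() {
        Thread t = Thread.currentThread();
        if (t instanceof SingletonThread) {
            return getInstance((SingletonThread) t);
        }
        // Assumption: foreign threads simply take the lock each time.
        return sharedInstance();
    }

    // Caller passes in its own thread ref, skipping the lookup.
    // The field is unsynchronized, so 'self' must be the calling thread.
    static CachedSingleton getInstance(SingletonThread self) {
        CachedSingleton s = self.cachedSingleton;
        if (s == null) {
            s = sharedInstance();
            self.cachedSingleton = s;
        }
        return s;
    }
}

class SingletonThread extends Thread {
    // One plain field per known singleton; no table needed.
    CachedSingleton cachedSingleton;

    SingletonThread(Runnable body) {
        super(body);
    }
}
```

The field needs no locking only because each thread reads and writes
its own copy, and obtains the instance through the synchronized
method the first time.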


Due to the nice folks at, I did test
out some of this on alphas. (Testdrive is a very nice service! Anyone
can register. It would be great if other MP vendors did this too.)

The fastest versions of Java I could find on MP alphas at testdrive
were 1.2.2 VMs on a 2X500 running Tru64 and a 4X667 running linux. The
4-CPU box failed some of Bill's "volatile" tests (at ). I gather that these
JVMs don't use enough barriers even for "old" volatile (which is
itself insufficient to guarantee double check).
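
For reference, double check with a volatile field usually looks
something like the sketch below (hypothetical class name; not
necessarily the exact code that was timed). As noted above, "old"
volatile does not guarantee that this works. The revised memory model
that later shipped with Java 5 (JSR-133) strengthened volatile enough
that this form is safe there.

```java
// Sketch only: double-checked locking on a volatile field.
class VolatileSingleton {
    private static volatile VolatileSingleton instance;

    private VolatileSingleton() {}

    static VolatileSingleton getInstance() {
        // Fast path: a single volatile read, no lock.
        VolatileSingleton local = instance;
        if (local == null) {
            synchronized (VolatileSingleton.class) {
                // Second check, now under the lock.
                local = instance;
                if (local == null) {
                    local = new VolatileSingleton();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```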

The machines were NOT idle (load average was usually around one), but
repeated tests gave about the same ratios, so these figures are
probably in the right ballpark.

Here are results (the 3rd and 4th columns are 4-CPU sparc, and the last 2
columns are results on basically the same tests, taken from last post).
Table entries are ratios compared to "Eager" version of Singleton.

chip                 alpha     alpha     sparc     sparc       x86     sparc
OS                   linux     Tru64     sol 8      sol8         ?    Sol 98
JDK                  1.2.2     1.2.2       1.3  1.2.2_07       1.3       1.3

Eager                 1.00      1.00      1.00      1.00      1.00      1.00
Volatile(DCL)         1.09      1.01      1.22      1.34      1.31      1.18
ThreadLocal         300.80     17.84      6.32    240.74      6.50      5.01
SimThreadLocal        4.43      4.19      4.81      2.39         ?         ?
Synch               189.26      5.73     69.03     66.41     32.12      9.64
Thread Field          2.16      2.71      4.16      2.00         ?         ?
Direct Field          1.00      1.25      1.18      1.29         ?         ?


* The run on 4-CPU sparc under 1.2.2_07 demonstrates the above
  remark that ThreadLocal was unusable in this context until 1.3.

* SimThreadLocal handcrafts something close to the 1.4 ThreadLocal
  implementation, in a way that works on pre-1.4.

* Again, I'm pretty sure the alpha JVMs didn't put in enough barriers
  in Volatile/DCL code. This is not their fault. They weren't
  required to. But these results are wrong (too fast) for a properly
  barriered version. In fact, on this set of runs, NONE of the
  volatile results are likely to be exactly right (all too fast).

* "Direct Field" differs from "Thread Field" by directly referencing
  the singleton field off the thread object rather than going through
  Thread.currentThread. This doesn't apply very often in practice,
  but shows the best possible results you could ever get via this
  kind of design.

* As always, remember that this is a microbenchmark that might
  not have much relevance to practical use of singletons.
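
A hand-rolled ThreadLocal variant of the kind the SimThreadLocal note
refers to, with a table held in a thread subclass, might look roughly
like this. It is a guess at the general shape only: the class layout,
the array-of-slots table, the TableThread name and the fallback to a
plain ThreadLocal are all assumptions, not the attached code and not
the JDK implementation.

```java
import java.util.Arrays;

// Sketch only: a thread subclass carrying a small per-thread table.
class TableThread extends Thread {
    Object[] slots = new Object[4];

    TableThread(Runnable body) {
        super(body);
    }
}

// Each SimThreadLocal owns one index into every TableThread's table.
class SimThreadLocal<T> {
    private static int nextIndex = 0;

    private static synchronized int allocateIndex() {
        return nextIndex++;
    }

    private final int index = allocateIndex();

    // Assumption: threads that are not TableThreads use a plain ThreadLocal.
    private final ThreadLocal<T> fallback = new ThreadLocal<>();

    T get() {
        Thread t = Thread.currentThread();
        return (t instanceof TableThread) ? get((TableThread) t) : fallback.get();
    }

    // Variant for callers that already hold their own thread ref.
    @SuppressWarnings("unchecked")
    T get(TableThread self) {
        Object[] s = self.slots;
        return index < s.length ? (T) s[index] : null;
    }

    void set(T value) {
        Thread t = Thread.currentThread();
        if (t instanceof TableThread) {
            set((TableThread) t, value);
        } else {
            fallback.set(value);
        }
    }

    void set(TableThread self, T value) {
        if (index >= self.slots.length) {
            // Grow the table so that this index fits.
            int newLength = Math.max(index + 1, self.slots.length * 2);
            self.slots = Arrays.copyOf(self.slots, newLength);
        }
        self.slots[index] = value;
    }
}
```

As with the field version, the table is unsynchronized and is only
safe when each thread touches its own TableThread.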

Doug Lea, Computer Science Department, SUNY Oswego, Oswego, NY 13126 USA 315-312-2688 FAX:315-312-5424  
