JavaMemoryModel: Re: "Double-Checked Locking is Broken"

From: Doug Lea (
Date: Sat Mar 24 2001 - 13:53:51 EST

Hi, I'm CCing this reply to the JavaMemoryModel list, so I'll
repeat most of your mail:

> i believe that a portable solution for Double-Checked Locking in a
> multithreaded environment does exist. (well, at least for POSIX and
> C/C++) it is quite simple and does not require _synchronization_ on
> _every_ call and also does not incur the _memory_barrier_ overhead
> on _every_ call. however, it does incur another type of overhead -
> the overhead related to access to thread specific data (storage):

  [first version of this code omitted]

> static Singleton* g_pSingleton = 0;
> _thread_specific_ Singleton* tsd_pSingleton = 0; // thread_specific ptr
> (pseudo-code)
> Singleton*
> Singleton::get_instance()
> {
> Singleton* pSingleton = tsd_pSinglton; // tsd::get
> if ( 0 == pSingleton ) {
> {(
> Guard guard( somelock );
> if ( 0 == g_pSingleton ) {
> g_pSingleton = new Singleton();
> }
> }
> tsd_pSingleton = pSingleton = g_pSingleton; // tsd::set
> }
> return pSingleton;
> }
> or.. other variations with one tsd key/lock for multiple singletons...

Yes, I think this can be a reasonable tactic on systems with
thread-specific-data/thread-local-storage, where you can set up this
kind of "diffusion" to safely propagate values/references across
threads. In Java, it might look like:

  class Singleton {
    final static ThreadLocal perThreadInstance = new ThreadLocal();
    static Singleton theInstance;

    static Singleton getInstance() {
      // get our thread's reference to the singleton
      Singleton instance = (Singleton)(perThreadInstance.get());

      if (instance == null) {

        // Might as well use the ThreadLocal itself as the lock
        synchronized(perThreadInstance) {
          instance = theInstance;
          if (instance == null)
            instance = theInstance = new Singleton();

        // copy global to per-thread

      return instance;

This guarantees that "theInstance" is always accessed under the common

In POSIX, I'm not sure how to do allocation, cleanup, and management
of TSD. These could be problems, especially for singletons in dlls.
(Or maybe not. I hope someone else knows.) Otherwise, so long as you
aren't using very large numbers of singletons and threads, this looks
like a competitive option.

> Note that it would be sufficient to have a processor specific flag
> (instead of thread specific flag) but AFAIK there is no portable
> way for doing it...

Right. Maybe we need "CPULocals" in Java.

Out of curiosity, I coded up a quick performance test (attached). It
has 8 threads each calling getInstance() and then running a trivial
method on the singleton, a million times on a 4-CPU sparc. (This is
an extreme and unrealistic test, but accentuates differences.)

The performance is pretty good, but, on a Sparc, not as good as a
version using lazy-singletons implemented as volatiles, run on a
preliminary test version of hotspot1.4 that conforms to our proposed
revised "volatile" spec. Results (median of three runs) are:

Eager: 384ms
Volatile: 408ms
ThreadSpecific: 1859ms
Synch: 33944ms

On machines with costlier memory barriers, the relative cost
of thread-specific version vs volatile will probably be smaller.

Reminder to others on this list: In Java, because of dynamic class
loading, there is much less motivation to do ANY kind of lazy
initialization of singletons.

Doug Lea, Computer Science Department, SUNY Oswego, Oswego, NY 13126 USA 315-312-2688 FAX:315-312-5424  

------------------------------- JavaMemoryModel mailing list -

This archive was generated by hypermail 2b29 : Thu Oct 13 2005 - 07:00:30 EDT