Re: JavaMemoryModel: String literals and String.intern()

From: David Detlefs - Sun Microsystems Labs BOS (david.detlefs@sun.com)
Date: Mon Apr 19 2004 - 09:48:39 EDT


Eliot, Thomas, Doron --

> If strings are actually interned, then there is an interning table
> somewhere. To be able to collect such strings, the table would have to
> operate using weak pointers (one of the java.lang.Reference
> classes). Perhaps it is indeed coded that way.

True in the Sun implementation.

> However, if any thread has
> hold of such an object (e.g., via having synchronized on it but not yet
> released the lock), then the object will not be collected.

Yes. But I still find Doron's example interesting:

Thread 1, Thread 2:
    synchronized ("some string") {
        ++a;
    }

Let's expand Doron's example a little further. Let's say that we have

class C0 {
  public static int a = 0;
}

class C1 {
  void foo() {
    synchronized ("some string") {
        ++C0.a;
    }
  }
}

class C2 {
  void foo() {
    synchronized ("some string") {
        ++C0.a;
    }
  }
}

Consider the following execution.

Main thread
-----------
Initializes C0
Creates and starts
   Thread1, Thread2

Thread 1
--------
Loads C1, which interns "some string".
Executes C1.foo to completion.

A GC happens, which unloads C1 -- "some string" is now unreferenced,
and the collection removes it from the interned string table.

Thread 2
--------
Loads C2, interning a new instance of "some string".
It executes C2.foo.

Now, it seems to me that Threads 1 and 2 never synchronized on the
same object, and therefore we could not conclude that C0.a == 2 at the
end.
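
For contrast, here is a minimal runnable sketch of the ordinary case, in
which C1 and C2 are both loaded by the application class loader and so
are never unloaded while the program runs. The two literals then denote
the same interned object, both threads contend for one lock, and the
joins make the final value visible to the caller. LiteralLockDemo and
run() are names I made up for illustration; the sketch does not
reproduce the unloading scenario above, which would need separate,
collectible class loaders.

```java
class C0 {
    static int a = 0;
}

class C1 {
    void foo() {
        // Same literal as in C2; with both classes still loaded this is one object.
        synchronized ("some string") {
            ++C0.a;
        }
    }
}

class C2 {
    void foo() {
        synchronized ("some string") {
            ++C0.a;
        }
    }
}

// Hypothetical driver, not part of the original example.
class LiteralLockDemo {
    static int run() {
        C0.a = 0;
        Thread t1 = new Thread(() -> new C1().foo());
        Thread t2 = new Thread(() -> new C2().foo());
        t1.start();
        t2.start();
        try {
            // join() orders each thread's increment before the read below.
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return C0.a;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The conclusion C0.a == 2 here leans on both literals staying reachable
for the whole run, which is exactly the assumption the execution above
removes.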

While this is an interesting example, I don't think this is a memory
model issue: it's an illustration that one has to be careful about what
string interning means in the presence of garbage collection,
especially garbage collection with class unloading:

* For a given string constant s, to conclude that two invocations of
  s.intern() at times t1 and t2 return the same (==) result,
  you'd have to show that the result of the first invocation remained
  reachable for the entire interval [t1..t2].

* Of course, the definition of "reachable" in the above has to take
  into account the rules for reachability of classes and class
  unloading. Remember that these rules are (unfortunately) somewhat
  complicated, having a lot to do with the reachability of the class
  loader that loaded the class.
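
The first bullet can be sketched as follows, assuming the caller itself
holds the result of the first intern() call in a local variable for the
whole interval. InternReachabilityDemo, freshKey and sameWhileReachable
are hypothetical names of mine. The string is built at run time so that
no literal in the class keeps an interned copy alive, and System.gc()
is only a hint that the VM may ignore.

```java
// Hypothetical demo class, not from the original discussion.
final class InternReachabilityDemo {
    // Built from characters at run time, so no class constant pool
    // entry is responsible for keeping the interned copy reachable.
    static String freshKey() {
        return new String(new char[] {'k', 'e', 'y', '-', '4', '2'});
    }

    static boolean sameWhileReachable() {
        String first = freshKey().intern();   // time t1
        System.gc();                          // a hint only; may do nothing
        String second = freshKey().intern();  // time t2
        // 'first' is still used below, so it stayed strongly reachable
        // over [t1..t2] and the table entry could not have been dropped.
        return first == second;
    }

    public static void main(String[] args) {
        System.out.println(sameWhileReachable());
    }
}
```

If the caller kept nothing but, say, the identity hash code of the first
result, nothing in this sketch would justify expecting the same object
back at t2.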

This is what I would favor. The alternative is Eliot's argument:

> On the other hand, if there is an intervening garbage collection, I claim
> that all threads have synchronized (one way or another) with the GC,

Introducing this idea into the semantics seems to let the camel's nose
into the tent: this would be the first time GC (as opposed to more
abstract definitions involving reachability) would be mentioned in the
semantics, right?

Maybe there's another way to phrase this idea that doesn't introduce
GC explicitly into the semantics, but I don't immediately see it.

Maybe this has already been said, but you could have the same problem
with (for example) a static factory method to map integer constants
to canonical corresponding java.lang.Integer instances, using a
weak-reference-based backing table. If you assumed too much about the
equality of such results globally, by for example synchronizing on
them, you could get into the same kind of trouble.
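
A rough sketch of such a factory, under assumptions of my own: the
class CanonicalTable, its canonical method and the small Boxed value
class are all hypothetical, and Boxed stands in for java.lang.Integer
only because the Integer constructors are deprecated in current JDKs.
The table holds its canonical instances through WeakReference, so an
entry lasts only as long as somebody else keeps the instance strongly
reachable.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical canonicalizing factory with a weak-reference-based table.
final class CanonicalTable {
    // Stand-in for java.lang.Integer (an assumption made for this sketch).
    static final class Boxed {
        final int value;
        Boxed(int value) { this.value = value; }
    }

    private static final Map<Integer, WeakReference<Boxed>> TABLE = new HashMap<>();

    static synchronized Boxed canonical(int value) {
        WeakReference<Boxed> ref = TABLE.get(value);
        Boxed cached = (ref == null) ? null : ref.get();
        if (cached == null) {
            // Either never created, or collected since the last request:
            // hand out a new instance, distinct from any earlier one.
            cached = new Boxed(value);
            TABLE.put(value, new WeakReference<>(cached));
        }
        return cached;
    }
}
```

Two calls to canonical(1000) return the same object only if the first
result was kept reachable in between. Code that synchronizes on the
result without holding it elsewhere can end up locking two different
objects for the same integer.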

-- 
============================================= AntiSpamString: nix flubber now
=============================================================================
Dave Detlefs                           http://www.sunlabs.com/people/detlefs/
Sun Microsystems Laboratories                           david.detlefs@sun.com
1 Network Drive, Burlington, MA 01803-0902                     (781)-442-0841

-------------------------------
JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel


