Re: JavaMemoryModel: Finalization idioms

From: David Detlefs - Sun Microsystems Labs BOS (david.detlefs@sun.com)
Date: Mon May 02 2005 - 10:10:28 EDT


Jeremy says --

> To David D. and Evan - You could finalize man in any of the places David
> H. mentions if useA() and useB() don't access this, and are inlined by
> the compiler. In such code, there would be no uses of man after its
> initialization.

And David Holmes confirms that this was the intention of his example:

> Thanks Jeremy, I thought for a moment I had completely misunderstood this
> issue once again. Though your clarification does restrict the problem to a
> subset of situations to what I was thinking - ie I don't completely
> understand it either.

And continues:

> To Dave Detlets [Well, Detlefs, actually :-], and Evan, the issue
> relates to the optimization mentioned in JLS 12.6.1:
>
> "A reachable object is any object that can be accessed in any potential
> continuing computation from any live thread. Optimizing transformations of a
> program can be designed that reduce the number of objects that are reachable
> to be less than those which would naively be considered reachable."

Thank you, that helps make your intent more explicit, and it is an
interesting point. I had been thinking that since in Java all methods
are virtual, and require a null pointer check on the invocation
target, that this (at the very least) would make the invocation target
reachable. But I will concede the point: you could verify that the
reference is non-null and still collect the referent, when none of
its fields will ever be accessed again.

However: let's reconsider Hans' original motivating example. We have
a static native array holding some native resource. Allocating
a Foo object also allocates an integer index into this array,
associating the Foo with the native resource. This index is stored in
the Foo object. The finalizer reads this index, and makes it free,
available for later allocation.
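
A minimal sketch of that arrangement, for concreteness. All names here
(NativeTable, SLOTS, release, resourceIsLive) are hypothetical, a
boolean array stands in for the native array, and the idempotent
release() is my own addition so the behavior can be checked without
waiting for a GC:

```java
// Hypothetical stand-in for the static native array of resources.
final class NativeTable {
    static final int SLOTS = 4;
    private static final boolean[] inUse = new boolean[SLOTS];

    // Hand out the lowest free index.
    static synchronized int allocate() {
        for (int i = 0; i < SLOTS; i++) {
            if (!inUse[i]) {
                inUse[i] = true;
                return i;
            }
        }
        throw new IllegalStateException("no free slots");
    }

    static synchronized void free(int index) {
        inUse[index] = false;
    }

    static synchronized boolean isInUse(int index) {
        return inUse[index];
    }
}

class Foo {
    // The only link between this Java object and its native resource.
    private final int index = NativeTable.allocate();
    private boolean released;

    int index() {
        return index;
    }

    // Any method that touches the resource has to read this.index first.
    boolean resourceIsLive() {
        return NativeTable.isInUse(index);
    }

    // Idempotent, so an explicit release followed by finalization frees once.
    synchronized void release() {
        if (!released) {
            released = true;
            NativeTable.free(index);
        }
    }

    // finalize() is deprecated in current Java; it is shown only because
    // it is the mechanism under discussion here.
    @SuppressWarnings({"deprecation", "removal"})
    @Override
    protected void finalize() {
        release();
    }
}
```

Note that both resourceIsLive() and the finalizer reach the resource
only by reading the index field of "this".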

Note that this situation is not compatible with your example. If
ResourceManager=Foo, and neither useA nor useB accesses the index, then
they cannot be accessing the native resource, since the index is
required to reach it, and so it is perfectly OK for the finalizer to
run concurrently with them.

I don't know how one would phrase a proof, but the argument seems
pretty convincing to me: if a connection between a Java instance and a
native resource is made by a field in an object, then finalizers and
methods that use the resource will both use the field, and both access
this. If they don't make a connection that way, then methods
accessing the resource must already be considered concurrent, and a
finalizer is just another such method.

Anyone else find this convincing?

> The JLS is not clear on what sort of optimisations are possible. The
> discussion on this list in the past seemed - from my recollection - to
> imply more situations that the JLS alludes to. The motivation for my
> example was the idea that the local reference to the resource manager
> object might be stored in a register and not be seen during GC as
> referring to a live object, hence the object could be considered
> unreachable and so finalizable.

Well, I think the optimizations in question are those consistent with
section 12.6.1, which you quoted above. I.e., it's certainly not OK
for a GC to consider an
object unreachable if its only reference is stored in a register, *if*
that reference may be used again in the future! I think this is the
intent of the "naive" that you took objection to:

> (I resent the use of "naive" there. I don't consider it to be naive to think
> that an object is reachable when it is still being used!)

I think the intention is that it is "naive" to assume that if the
operational semantics of bytecode interpretation say that some local
variable, ostack location, or field contains a pointer to object X at
program point P, that X is therefore reachable at P -- if this value
is "dead", that is, will not be accessed in the future, then X may be
considered unreachable.

> To Hans: What you have said seems to imply that the existence of the sync
> block requires the system to know that the sync block exists and act
> accordingly. I don't understand this.

Given what I said above, a still-undefined concept is "access", which
I used in the phrase "accessed in the future" above. Hans wants to
recommend *something* that users can put in methods like your "useB"
that counts as an "access" wrt reachability, so that the "this" must be
considered reachable by a (correct) VM.

> I find it hard to believe that the VM/JIT would actively try to
> perform this optimization.

The optimization of deciding whether a local variable would be
accessed in the future? Many do, and it can be an important
optimization. Consider:

  void foo() {
     DataStructure dt = allocate_ten_meg_data_structure();
     DataStructure2 dt2 = calc_some_derived_data_structure(dt);
     do_rest_of_program(dt2);
  }

Assume that "calc_some_derived_data_structure(dt)" is purely
functional; it doesn't store "dt" anywhere. When does the local
variable "dt" cease to be a root that makes this data structure
reachable? "Naively", it's at the end of "foo". But a better answer is
immediately after its last use, once it has been copied onto the ostack
to become an argument to "calc_some_derived_data_structure".
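
As a rough illustration, here is a hand-written version of what such a
liveness analysis effectively does. The class and method names are
hypothetical, and the explicit "dt = null" only spells out what an
optimizing compiler may conclude on its own; nothing requires a
programmer to write it:

```java
// Hypothetical, self-contained variant of the foo() example above.
final class LivenessSketch {
    // Stands in for allocate_ten_meg_data_structure().
    static int[] allocateLarge() {
        return new int[1 << 20];
    }

    // Purely functional: reads its argument but never stores it anywhere.
    static long derive(int[] data) {
        long sum = 0;
        for (int v : data) {
            sum += v + 1;
        }
        return sum;
    }

    // Stands in for do_rest_of_program(dt2).
    static long restOfProgram(long derived) {
        return derived * 2;
    }

    static long foo() {
        int[] dt = allocateLarge();
        long dt2 = derive(dt);
        // Last use of dt was the call above. A compiler doing liveness
        // analysis may treat dt as dead from here on; the assignment
        // below makes that explicit in source form.
        dt = null;
        return restOfProgram(dt2);
    }
}
```

Whether or not the assignment is written out, the large array need not
be treated as reachable while restOfProgram runs.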

> My thought - which could be quite wrong - was that it arose as
> a side-effect of the JIT's actions eg. a local reference is stored in a
> register and so is not seen by GC and thus an object is considered
> unreachable.

Modern VMs (at least the correct ones :-) certainly deal comfortably
with references in machine registers.

> However, Jeremy also indicated that the premature finalization could only
> occur if the methods useA and useB don't refer to 'this' - yet the JIT
> doesn't know this without examining the internals of those methods, which
> again seems unreasonable to me.

He also said "and are inlined by the compiler", meaning the JIT
compiler, in which case it's perfectly reasonable for the JIT compiler
to examine the internals of those methods.

Look, objects are only going to become unreachable early via
aggressive compiler optimization. A (correct) compiler can only be as
aggressive as allowed by the semantics and the analysis it does of the
portion of the program it sees. If the JIT compilation of your
"someMethod" doesn't inline "useB" to determine whether or not it uses
its "this" argument, then it must conservatively assume it will, and
keep "man" alive locally up until the point of the call, after which
it can let "useB" keep its "this" alive or not, depending on the
semantics of "useB". Hans is trying to allow the user a mechanism for
requiring the compiler to keep "this" alive to the end of "useB".
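
For readers coming to this later, here is one way such a mechanism can
look. This is only a sketch under assumptions: it uses
java.lang.ref.Reference.reachabilityFence, which was added in Java 9
and so postdates this discussion, and the class body (the table array,
the constructor arguments, the return value of useB) is hypothetical.
The idiom actually under discussion in this thread is a sync block on
"this".

```java
// Hypothetical resource manager; an int array stands in for native state.
final class ResourceManager {
    private static final int[] table = new int[8];
    private final int index;

    ResourceManager(int index, int value) {
        this.index = index;
        synchronized (table) {
            table[index] = value;
        }
    }

    int useB() {
        int i = index;  // last read of a field of "this"
        try {
            // From here on only the local i is needed, so without the
            // fence below "this" could be treated as unreachable.
            synchronized (table) {
                return table[i];
            }
        } finally {
            // Keeps "this" reachable up to this point (Java 9+).
            java.lang.ref.Reference.reachabilityFence(this);
        }
    }

    // Clears the slot; must not run while useB is still reading it.
    @SuppressWarnings({"deprecation", "removal"})
    @Override
    protected void finalize() {
        synchronized (table) {
            table[index] = 0;
        }
    }
}
```

With the fence in place, the finalizer cannot clear the slot before
useB has finished reading it, even if useB is inlined into its caller.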

> So I'm left - as usual - with no clear understanding of exactly what kinds
> of situations can lead to premature finalizer execution. And hence no clear
> understanding of the validity of proposed solutions.

I hope this helps somewhat...

-- 
=============================================================================
Dave Detlefs                           http://www.sunlabs.com/people/detlefs/
Sun Microsystems Laboratories                           david.detlefs@sun.com
1 Network Drive, Burlington, MA 01803-0902                     (781)-442-0841

------------------------------- JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel


