OK, I've finally got some results to report.
I'm going to describe a way to implement atomic object creation on
Alpha chips (other processors should be easier or not require
anything special). For purposes of this discussion, the creation of
an object ends when a reference to the object is stored into the heap or
the constructor finishes, whichever comes first. I'm not really going
to discuss the case in which a reference to the object escapes before
the object is constructed, and there are probably some loose ends
that need to be worked out in that case. Similarly, I won't discuss
where the write barriers are required when objects are constructed;
it is fairly straightforward and not very expensive.
One way to implement atomic object creation is, as Sanjay suggested,
to put a memory barrier in front of each getfield. You can then try
to optimize away some of the memory barriers.
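To make the scheme concrete, here is a minimal sketch of the "barrier in front of each getfield" idea. On a 1999-era Alpha JVM the interpreter would emit an MB instruction; as a stand-in, this sketch uses the fence methods that Java 9 later added to VarHandle. The Point class and readX method are hypothetical, purely for illustration.

```java
import java.lang.invoke.VarHandle;

// Hypothetical class whose field we read via getfield.
class Point {
    int x;
    Point(int x) { this.x = x; }
}

public class BarrierBeforeGetfield {
    static int readX(Point p) {
        // The memory barrier in front of the getfield: no subsequent load
        // may be satisfied by a value read before this fence, so p.x
        // reflects at least the constructed state of p.
        VarHandle.loadLoadFence();
        return p.x;   // the getfield itself
    }

    public static void main(String[] args) {
        System.out.println(readX(new Point(42)));
    }
}
```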
An alternative is to maintain an invariant that if a thread/stack has
a reference to a heap allocated object, then the thread has done the
memory barriers required to see the constructed version of that
object. The simple implementation of this is to put a memory barrier
after each operation that loads a reference (getfield, getstatic or
aaload). Since we only need to put a barrier after getfields that
load a reference/address, as opposed to putting one before all
getfields, we can hope to do better. Note that since arrays
don't have constructors, we can decide to skip the memory barrier for
loading references to arrays, although it means we need to worry
about things like getting zero for array length.
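The invariant version looks something like the following sketch. The key point is that the barrier follows the load of the reference, so only reference loads pay the cost, and a later plain getfield of a scalar through that reference needs nothing. Again, VarHandle fences stand in for the Alpha MB, and the Node class is made up for illustration.

```java
import java.lang.invoke.VarHandle;

// Hypothetical linked-node class published between threads.
class Node {
    int payload;
    Node next;
    Node(int payload, Node next) { this.payload = payload; this.next = next; }
}

public class BarrierAfterRefLoad {
    static Node head;   // imagine another thread stores a new Node here

    static int readPayload() {
        Node n = head;              // getstatic that loads a reference
        VarHandle.loadLoadFence();  // barrier AFTER the reference load:
                                    // this thread is now "up to date"
                                    // with respect to n
        return n.payload;           // plain getfield of an int: no barrier
    }

    public static void main(String[] args) {
        head = new Node(5, null);
        System.out.println(readPayload());
    }
}
```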
A further optimization is to note that we don't need a memory barrier
after a getfield of a final field. The reasoning here is that if a
thread is up to date with respect to an object X, and then loads a
final field of X that references Y, then Y must be older than X and
the thread must therefore be up to date with respect to Y.
We can do a similar optimization for final static fields.
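The final-field argument above can be pictured with two hypothetical classes. If the thread did its barrier when it acquired the reference to x, and x.y is final, then y was fully constructed before x's constructor finished, so y is "older" than x and no further barrier is needed to read it:

```java
// Hypothetical classes illustrating the final-field optimization.
class Y {
    int v;
    Y(int v) { this.v = v; }
}

class X {
    final Y y;           // final: set before X's constructor completes
    X(Y y) { this.y = y; }
}

public class FinalFieldChain {
    static int read(X x) {
        // Precondition: the thread already did a barrier when it loaded
        // the reference to x, so it is up to date with respect to x.
        Y y = x.y;       // getfield of a final reference: barrier elided
        return y.v;      // y is older than x, so the thread is already
                         // up to date with respect to y as well
    }

    public static void main(String[] args) {
        System.out.println(read(new X(new Y(3))));
    }
}
```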
Also, if the reference we load is null, we can skip the memory
barrier (this would involve a runtime test to see if the ref is null,
and doing the memory barrier only if it is not).
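A sketch of the null test, with made-up names: there is nothing to be up to date about for a null reference, so the fence is done only on the non-null path.

```java
import java.lang.invoke.VarHandle;

// Hypothetical list cell used to show the null-test optimization.
class Cell {
    int val;
    Cell next;
    Cell(int val, Cell next) { this.val = val; this.next = next; }
}

public class NullCheckBarrier {
    static int sum(Cell c) {
        int total = 0;
        while (c != null) {
            Cell n = c.next;                // getfield loading a reference
            if (n != null) {
                VarHandle.loadLoadFence();  // barrier only when non-null
            }
            total += c.val;
            c = n;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new Cell(1, new Cell(2, null))));
    }
}
```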
I've got experimental results for all of the above, more on them in a moment.
We could also do optimizations to combine memory barriers (although I
don't have any studies on that). Combining memory barriers would
interact with only doing memory barriers for non-null references.
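One way to picture the interaction: two adjacent reference loads with no intervening dereference could share a single trailing barrier, but then the null test must cover both references (or the combined barrier becomes unconditional). A hypothetical sketch:

```java
import java.lang.invoke.VarHandle;

// Hypothetical class holding two references loaded back to back.
class Pair {
    Object left, right;
    Pair(Object l, Object r) { left = l; right = r; }
}

public class CombinedBarriers {
    static boolean bothPresent(Pair p) {
        Object a = p.left;    // reference load #1, barrier deferred
        Object b = p.right;   // reference load #2, barrier deferred
        if (a != null || b != null) {
            // One barrier covers both loads; note the null test had to
            // be widened to "any of them non-null".
            VarHandle.loadLoadFence();
        }
        return a != null && b != null;
    }

    public static void main(String[] args) {
        System.out.println(bothPresent(new Pair("x", "y")));
    }
}
```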
OK, on to the experiments.
Jeremy Manson (a student working with me) instrumented Sun's jdk1.2
jvm interpreter to collect lots of statistics. We ran the jvm in pure
interpreted mode (even turning off the quick opcodes for getfield and
getstatic), and didn't do any optimizations to eliminate redundant
getfields or combine memory barriers (which would probably help our
results). We simply counted statistics.
In addition to running on the original spec jvm98 programs, I wrote a
tool that made fields into blank finals if doing so was legal, and
applied it to the entire spec code. In some cases, this requires
assuming that the package is closed, because otherwise a package
accessible field might be modified by a class loaded later, in which
case it couldn't be final. In some important cases, it made protected
fields final, which is even more dubious. However, the authors of the
spec jvm98 code (compress in particular) obviously didn't understand
protected, because there is absolutely no reason for the fields to be
declared protected. At any rate, this transformation could be done as
a load-time analysis that could be de-optimized if code loaded later
invalidated the assumption.
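The transformation the tool performs amounts to the following before/after pair (the classes here are made up, not taken from spec jvm98): a field that is assigned exactly once in every constructor and never written anywhere else can legally become a blank final.

```java
// Hypothetical before/after illustration of the blank-final transformation.
class BeforeTransform {
    protected int length;           // written only in the constructor,
                                    // so eligible to become final
    BeforeTransform(int length) { this.length = length; }
}

class AfterTransform {
    protected final int length;     // blank final: assigned exactly once
                                    // in each constructor
    AfterTransform(int length) { this.length = length; }
}

public class BlankFinalDemo {
    public static void main(String[] args) {
        System.out.println(new AfterTransform(10).length);
    }
}
```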
The results are available as an HTML table, an Excel spreadsheet, or a PDF file.
To quickly summarize, the total for one execution of each of the
programs in the spec jvm98 suite:
# of getfields                         202,869,185
# of loads of refs                      64,993,844
# of nonfinal loads of refs             64,809,454
# of nonfinal nonnull loads of refs     50,011,951
Once modified to make more fields final:
# of nonfinal loads of refs             22,628,420
# of nonfinal nonnull loads of refs      7,832,852
So doing this gives us about a factor of 25 improvement over the
number of barriers required if we associate a barrier with each
getfield.
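A quick arithmetic check of that factor, using the two totals quoted above (all getfields versus nonfinal nonnull reference loads after the final-izing transformation):

```java
// Sanity check of the improvement factor quoted in the text:
// 202,869,185 getfields vs. 7,832,852 remaining barriers.
public class BarrierRatio {
    static double ratio(long allGetfields, long remainingBarriers) {
        return (double) allGetfields / remainingBarriers;
    }

    public static void main(String[] args) {
        // prints 25.9, i.e. roughly the factor of 25 claimed
        System.out.printf("%.1f%n", ratio(202_869_185L, 7_832_852L));
    }
}
```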
By the way, looking at the total is deceptive. The number of getfield
instructions in the spec jvm98 suite is dominated by the compress
benchmark.
However, there is one big problem I don't understand. Sanjay's numbers came out to be
4,329,317,558 total memory barriers for one run of each program (his
original numbers reflected 3 runs of each program). That is about 21
times more memory barriers than we get if we simply instrument every
getfield. I guess Sanjay and I will have to figure out the reason for
the huge difference and get back to you on that.
JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel
This archive was generated by hypermail 2b29 : Thu Oct 13 2005 - 07:00:19 EDT