JavaMemoryModel: cost of initialization-safety on an Alpha SMP

From: sanjay@pa.dec.com
Date: Wed Aug 18 1999 - 18:11:16 EDT


This note describes the results of experiments I ran to determine the
cost of initialization safety on an Alpha SMP. The bottom line is a
slowdown of about 63% on Spec JVM98 programs when appropriate
memory-barriers are added to guarantee initialization safety for all
fields. This seems like too large a performance penalty to pay in
general, so I would advocate only requiring initialization-safety for
some classes (such as String) and/or fields that have been marked in
some way.

The rest of this message goes into more detail.

PROBLEM DESCRIPTION
===================

Consider the following execution sequences for two separate threads.
Assume that the type T is immutable after construction. Also assume
that "Foo.x" is a global variable which is initially null.

Thread 1:
        a: t = new T; // Allocate object
        b: start t.<init>(...) // Call constructor
        c: t.f = 1; // Set field inside constructor
        d: finish t.<init>(...) // Constructor finishes
        e: Foo.x = t; // Make object available to other threads

Thread 2:
        f: t = Foo.x; // Get published object
        g: if (t != null) {
        h: w = t.f; // Read field out of published object
        i: }

There is no synchronization in either of the two pieces of code, so
there is a data race between the initialization of "t.f" and its read.
What should be the value of "w"? The two camps, which I will call the
"initialization-safety" and "data-race" camps, say the following:

        initialization-safety: w == 1
        data-race: w == 0, or w == 1
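
Written as Java source (using the names from the sequences above), the
same race looks like this:

        class T {
            int f;
            T() { f = 1; }           // c: set field inside constructor
        }

        class Foo {
            static T x;              // initially null

            static void writer() {   // Thread 1
                Foo.x = new T();     // a-e: construct, then publish
            }

            static void reader() {   // Thread 2
                T t = Foo.x;         // f: read the published reference
                if (t != null) {
                    int w = t.f;     // h: initialization-safety says w == 1;
                }                    //    the data-race camp also allows w == 0
            }
        }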

Implementing initialization-safety will be easy on some architectures
and hard on others. For example, on a Sparc/TSO machine, just
generating the obvious load and store instructions will be sufficient.
On an Alpha, with its relaxed memory ordering rules, the following
code sequences will be needed:

Thread 1:
        a: t = new T;
        b: start t.<init>(...)
        c: t.f = 1;
        d: finish t.<init>(...)
           wmb // Prevent re-ordering of "c" and "e"
        e: Foo.x = t;

Thread 2:
        f: t = Foo.x;
        g: if (t != null) {
               mb // Prevent re-ordering of "f" and "h"
        h: w = t.f;
        i: }

The requirement for a memory-barrier between the two reads in thread 2
surprises most people, since they do not realize that reads of
locations written by other processors carry no implicit ordering
guarantees on the Alpha.

SIMPLE SOLUTION
===============

This section describes a simple mechanism for implementing
initialization-safety on an Alpha. It can be easily extended to other
architectures.

1. Emit a write-memory-barrier (wmb) immediately after a constructor call
   for a new object returns.

2. Emit a memory-barrier (mb) immediately before a "getfield" instruction.

Note: No special rule is needed for the "aaload" instruction (loading
from an array of references) because we provide initialization safety
for the contents of an array only if the array is encapsulated within
an object and not modified after the object is constructed.
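
As a minimal sketch of where these two barriers would be emitted by a
JIT (the CodeEmitter interface and its methods here are hypothetical
placeholders, not the API of any actual compiler):

        interface CodeEmitter {
            void allocate(String className);          // a: t = new T
            void invokeConstructor(String className); // b-d: t.<init>(...)
            void loadField(String fieldName);         // h: w = t.f
            void wmb();                               // write-memory-barrier
            void mb();                                // full memory-barrier
        }

        class SimpleScheme {
            // Rule 1: wmb immediately after the constructor returns.
            static void emitNew(CodeEmitter e, String className) {
                e.allocate(className);
                e.invokeConstructor(className);
                e.wmb();
            }

            // Rule 2: mb immediately before every getfield.
            static void emitGetfield(CodeEmitter e, String fieldName) {
                e.mb();
                e.loadField(fieldName);
            }
        }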

OPTIMIZATIONS
=============

Most of the slowdown caused by requiring initialization-safety is
caused by the memory-barriers on the reader side. If these
memory-barriers are blindly removed while the wmb's on the writer side
are left in, the overall slowdown is only about 1%. Therefore the
optimizations described in this section focus on removing the
memory-barriers on the reader side.

It's overkill to require a memory-barrier before every "getfield". For
example, consider the following code sequence:

        a: x = Foo.x;
        b: y = Foo.y;
        c: v = x.f;
        d: w = y.f;

Placing a single memory-barrier just before "c" is sufficient: it
obviates the need for another one before "d". In general, the
memory-barrier before a read "R" through variable "v" may be omitted
if a memory-barrier already appears between the operation that
produced the pointer now held in "v" and the read "R".
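
Applied to the sequence above, the rule places a single barrier like
this:

        a: x = Foo.x;
        b: y = Foo.y;
           mb // Covers both "c" and "d": the pointers in "x" and "y"
              // were loaded before this barrier
        c: v = x.f;
        d: w = y.f;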

The following forward-dataflow analysis of a method can be used to
determine the points where memory-barriers are needed. In the
description, "variable" refers to either a JVM local-variable, or a
slot on the operand-stack.

The state propagated by the dataflow algorithm consists of a flag for
each pointer variable. The flag records whether the variable is
"clean" or "dirty". The idea is that a variable is marked "dirty" when
a pointer value loaded from memory is stored into it, and is marked
"clean" again when execution passes a memory-barrier. (A small Java
sketch of this state and the barrier-placement rules appears at the
end of this section.)

1. At the start of a method, all variables are marked "dirty".
2. If an instruction copies variable "v" to "w", "v"'s state is copied
   to "w". This takes care of all of the JVM stack/variable copying
   instructions.
3. If an instruction loads a pointer value from memory into variable "v",
   "v" is made "dirty" (the instructions that can load pointer values
   are: aaload, getfield, getstatic, invoke*).
4. Some instructions that produce pointer values into variable "v" cause
   that variable to be marked "clean" (aconst_null, ldc1, ldc2, new*).
5. At merge points, "dirty" and "clean" merge to "dirty".
6. At a "getfield" instruction, if the pointer input to the "getfield"
   is "dirty", a memory-barrier is issued and ALL variables are marked
   "clean".

A second optimization provides special treatment of "this". The dataflow
algorithm described above is modified as follows:

1. When making a non-static method-call, if the target of the method-call
   is "dirty", a memory-barrier is issued and all variables are marked
   as "clean".
2. When starting the dataflow for a non-static method, the variable
   corresponding to the "this" argument is initialized to the "clean"
   state.

Note that this second optimization could theoretically increase the
memory-barrier overhead, since a barrier is now issued at every call
on a "dirty" target even when the callee never reads a field. In
practice, however, most methods do read fields of "this", so starting
"this" in the "clean" state removes more reader-side barriers than the
call-site rule adds, and overall performance improves.
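
The following is a minimal Java sketch of the analysis state and the
barrier-placement decisions described above (rules 1, 5, and 6, plus
the two "this" modifications). The class and method names are mine,
not the actual JIT's; rules 2-4 would form the rest of the
per-instruction transfer function.

        enum Taint { CLEAN, DIRTY }

        final class BarrierAnalysis {
            // Rule 1: at method entry every local/stack slot starts "dirty".
            Taint[] entryState(int numSlots) {
                Taint[] s = new Taint[numSlots];
                java.util.Arrays.fill(s, Taint.DIRTY);
                return s;
            }

            // "This" modification 2: in a non-static method, slot 0 holds
            // "this" and starts "clean".
            Taint[] entryStateNonStatic(int numSlots) {
                Taint[] s = entryState(numSlots);
                s[0] = Taint.CLEAN;
                return s;
            }

            // Rule 5: at merge points, "dirty" wins.
            Taint merge(Taint a, Taint b) {
                return (a == Taint.CLEAN && b == Taint.CLEAN)
                        ? Taint.CLEAN : Taint.DIRTY;
            }

            // Rule 6: before a getfield, a barrier is needed only if the
            // receiver is "dirty"; the barrier then cleans every slot.
            boolean barrierBeforeGetfield(Taint[] state, int receiverSlot) {
                if (state[receiverSlot] == Taint.DIRTY) {
                    java.util.Arrays.fill(state, Taint.CLEAN);
                    return true;    // emit mb at this program point
                }
                return false;
            }

            // "This" modification 1: a non-static call on a "dirty" target
            // also emits a barrier and cleans every slot, so the callee may
            // start with a "clean" "this".
            boolean barrierBeforeNonStaticCall(Taint[] state, int targetSlot) {
                return barrierBeforeGetfield(state, targetSlot);
            }
        }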

RESULTS
=======

The results presented here are given as a slowdown on the Spec JVM98
benchmarks. The base-case is the overall Spec number (higher is
better) for a JVM that does not provide initialization safety. The
slow-down number reported for a given experiment is the Spec number
for the base case divided by the Spec number for the experiment.
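
For example, a slowdown of 1.63 means that the initialization-safe
configuration achieves about 1/1.63, i.e. roughly 61%, of the
base-case Spec number.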

Configuration:
        #Processors: 2
        Processor: 21264a
        Speed: 667 MHz
        On-chip-caches: 64KB D + 64KB I
        Board-cache: 4MB per processor
        RAM: 512MB
        OS: Tru64 Unix 4.0f
(The JVM used was built at the Systems Research Center. It contains a
just-in-time compiler and good garbage-collection/synchronization
implementations. It is the fastest JVM for Alphas that I know of.)

Config            Slowdown
------------------------------------------------------
Base              1.0      -- no initialization-safety
Simple scheme     3.0      -- mb before every getfield
Dataflow          1.87     -- only dataflow optimization on
Dataflow+This     1.63     -- both optimizations on

As you can see, the optimizations help significantly, but even so
there is a large slowdown. In my opinion, this slowdown is too large
to justify requiring initialization-safety for all fields. Some way of
marking the fields or classes that should be safe with respect to
initialization races would be preferable.

-Sanjay Ghemawat
 Compaq Systems Research Center
