JSR-000XXX Revise the Specification of the Java™ Threads

Section 1: Identification

Submitting Participant: Univ. of Maryland
Name of Contact Person: William Pugh
E-Mail Address: pugh@cs.umd.edu
Telephone Number: 301-405-2705
Fax Number: 301-405-2744

Section 2: Request

2.1 - Please describe the proposed Specification:

A specification that describes the semantics of threads, locks, volatile variables and data races. This specification will be a replacement for Chapter 17 of the Java Language Specification (and Chapter 8 of the Java Virtual Machine Specification).

2.2 - What is the target Java platform? (i.e., desktop, server, personal, embedded, card, etc.)

All platforms

2.3 - What need of the Java community will be addressed by the proposed specification?

Programmers need to be able to understand which thread communication idioms are legal and to write reliable multithreaded software; JVM implementors need to be able to implement a high-performance JVM without violating the Java specification.

2.4 - Why isn't this need met by existing specifications?

Chapter 17 of the Java Language Specification (and Chapter 8 of the Java Virtual Machine Specification) describes the semantics of threads and locks, as well as related features such as volatile variables.

Unfortunately, that specification has been found to be very hard to understand and has many subtle, unintended implications. It is unclear whether anyone actually understands the entire specification and its implications. Many synchronization idioms recommended in books and articles are invalid according to the existing specification. Subtle, unintended implications of the existing specification prohibit common compiler optimizations done by essentially all Java virtual machines, and would be prohibitively expensive to enforce on many existing processor architectures.

A number of people have looked at the specification and decided that patching or modifying the existing specification could not produce a satisfactory and understandable result. Therefore, we recommend that a replacement specification be developed.

2.5 - Please give a short description of the underlying technology or technologies:

Some of the issues/features/goals we would consider in revising this specification are:

  1. For "correctly synchronized" programs, the semantics should be simple and intuitive. Of course, this depends on defining "correctly synchronized" both formally and intuitively.
  2. For incorrectly synchronized programs, it is not acceptable to just say that the semantics are undefined (as has been done in a number of other language specifications, such as Modula-3 and Ada). Instead, we need to determine which safety guarantees need to be made so that incorrectly synchronized programs cannot be used to attack the security of a system.
  3. A primary concern is the ability of unsophisticated programmers to create reliable/correct multithreaded programs. To accomplish this goal, we must balance the needs for
    1. a simple, easy to understand model, and
    2. a model that does not overly restrict the possible ways to write reliable programs.
  4. A secondary concern is to allow the creation of high performance JVM implementations across a wide range of platforms.
  5. There exist a number of dubious coding idioms, such as the double-check idiom, that are designed to allow threads to communicate without synchronization. Almost all such idioms are broken under the existing semantics. Changing the semantics to allow such idioms to work would impose substantial performance penalties on certain platforms, even for code that did not use the dubious idioms.
    1. It is expected that many of these synchronization-avoiding idioms will also be broken under the revised semantics.
    2. We will develop educational material and lead an educational effort to inform developers of commonly used incorrect idioms.
    3. Where possible, we will develop tools that statically detect some occurrences of common incorrect idioms.
    4. Strengthened semantics for volatile (see item 6) should allow many of these idioms to be fixed by making a single field volatile.
    5. Except for a very few subtle and dubious cases, we do not anticipate breaking any code that is guaranteed to work under the existing semantics.
  6. The ability to declare a field as volatile was adopted from C/C++, where it was originally used for memory-mapped I/O devices. In Java, volatile is primarily/solely used for fields that will be accessed without synchronization.
    1. Very little existing Java code uses volatile, because programmers are unsure of the semantics of volatile and because they are unaware of the importance of volatile for code not using synchronization.
    2. Unfortunately, the existing semantics of volatile are weak enough that many (apparently reasonable) uses of volatile are invalid. Furthermore, most existing JVMs do not correctly implement the existing semantics for volatile.
    3. We will look at strengthening the semantics of volatile to make it easier to use correctly.
  7. Most programmers assume that immutable objects -- objects whose fields are only set in their constructor -- such as String do not need synchronization. Unfortunately, in an incorrectly synchronized program, it is possible for a thread to observe an unsynchronized immutable object change. In particular, it is possible for a String to first appear to have the value "/tmp", and on later observation appear to have the value "/usr". This has clear and serious implications for security.
    1. The existing semantics allow this behavior, as do the specifications of shared memory multiprocessors with weak memory models. This could be fixed by making all of the methods of the String class synchronized. But this would be non-intuitive, and would impose a performance penalty on all Java platforms, even though it is needed only on a tiny percentage of them.
    2. We need to allow programmers to create classes that represent truly immutable objects, while not imposing a significant performance penalty on platforms where nothing needs to be done to ensure true immutability.
    3. We expect to do this by strengthening the semantics of final fields to allow a guarantee of true immutability, even in the presence of data races. This strengthening would also permit more aggressive compiler optimization of code using final fields.
    4. As part of this change, we may prohibit the use of native code and/or reflection to change final fields (with some sort of back door provided to allow backwards compatibility for System.in, System.out and System.err).
  8. As part of this effort, we should try to understand the potential implementation impact of any proposed semantics. In particular, some proposed semantics will be more expensive to implement on processors with weak memory models and expensive synchronization.
  9. Some of the changes contemplated, particularly the changes to the semantics of volatile and final, will require that JVMs be changed in order to be compliant with the new specification.
  10. We will develop compatibility tests that will automatically test whether a JVM enforces some of the guarantees made by the thread specification (other guarantees may be difficult or infeasible to test).
  11. We will also consider standards as to how the thread safety properties of an API should be documented. Javadoc (correctly) does not list whether a method is synchronized, because a method could be thread safe due to internal synchronization. For example, is java.io.ByteArrayInputStream guaranteed to be thread safe? Nothing in the documentation says that it is, but Sun's standard implementation is. Making the implementations of input and output streams synchronized was a dubious decision incurring substantial performance penalties. Could a valid Java implementation provide unsynchronized I/O streams?
    One possibility would be to devise Javadoc tags for thread safety and guidelines for using them.
  12. The semantics of multithreaded programs are far more subtle than previously thought. Even simple ideas like "happened previously" can become subtle and complicated in the presence of multiple threads. We will review other thread related issues, such as class initialization, asynchronous exceptions, finalizers, sleep, wait, join and interrupts, in accordance with the goals described above.
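
As an illustration of items 5 through 7, the following is a minimal Java sketch of the double-check idiom repaired by making a single field volatile, with the lazily created object built only from final fields. It assumes the strengthened semantics for volatile and final contemplated above. The names Config, ConfigHolder and DoubleCheckDemo are hypothetical and are not part of any proposed API.

```java
// Hypothetical names throughout; a sketch, not part of the proposed specification.

/** Immutable value: its only field is final and is set only in the constructor. */
final class Config {
    private final String tempDir;

    Config(String tempDir) {
        this.tempDir = tempDir;
    }

    String tempDir() {
        return tempDir;
    }
}

/** Double-check idiom, repaired by declaring the shared field volatile. */
final class ConfigHolder {
    // Assumption: volatile has the strengthened semantics contemplated in item 6.
    private static volatile Config instance;

    private ConfigHolder() {}

    static Config get() {
        Config local = instance;              // first check, no lock taken
        if (local == null) {
            synchronized (ConfigHolder.class) {
                local = instance;             // second check, under the lock
                if (local == null) {
                    local = new Config("/tmp");
                    instance = local;         // volatile write publishes the object
                }
            }
        }
        return local;
    }
}

final class DoubleCheckDemo {
    public static void main(String[] args) throws InterruptedException {
        Config[] seen = new Config[2];
        Thread a = new Thread(() -> seen[0] = ConfigHolder.get());
        Thread b = new Thread(() -> seen[1] = ConfigHolder.get());
        a.start();
        b.start();
        a.join();                             // join makes the array writes visible here
        b.join();
        System.out.println(seen[0] == seen[1]);
        System.out.println(seen[0].tempDir());
    }
}
```

Without the volatile modifier on instance, this is the form of the idiom that item 5 describes as broken under the existing semantics.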

2.6 - Is there a proposed package name for the API Specification? (i.e., javapi.something, org.something, com.something, etc.)

No

2.7 - Does the proposed specification have any dependencies on specific operating systems, CPUs, or I/O devices that you know of?

Many of the most surprising behaviors of a multithreaded program can happen only on a shared memory multiprocessor with a weak memory model (e.g., SMP Alpha systems). Similarly, the cost of strengthening the specification will be highest on shared memory multiprocessors with a weak memory model.

However, the specification should make the same guarantees about behavior on all platforms, even if in practice it might be impossible for some legal behaviors to occur on certain platforms.
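
To make the concern concrete, here is a minimal Java sketch of a hand-off between two threads through a flag. It assumes the strengthened semantics for volatile discussed in section 2.5, under which a write to a volatile flag makes earlier ordinary writes visible to a thread that later reads the flag. The names Handoff and HandoffDemo are hypothetical. If ready were a plain field, nothing would prevent a weak-memory multiprocessor or an optimizing compiler from letting the reader see ready as true while still seeing a stale payload, or from never seeing the update at all.

```java
// Hypothetical names; a sketch under the assumed strengthened volatile semantics.

final class Handoff {
    private int payload;              // plain field, written before the flag
    private volatile boolean ready;   // volatile flag used to publish payload

    void publish(int value) {
        payload = value;              // ordinary write
        ready = true;                 // volatile write, performed last
    }

    int awaitPayload() {
        while (!ready) {              // volatile read on every iteration
            Thread.yield();
        }
        return payload;               // read only after the flag was seen as true
    }
}

final class HandoffDemo {
    /** Runs one writer and one reader thread and returns what the reader saw. */
    static int run() {
        Handoff handoff = new Handoff();
        int[] result = new int[1];
        Thread reader = new Thread(() -> result[0] = handoff.awaitPayload());
        Thread writer = new Thread(() -> handoff.publish(42));
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();            // join makes result[0] visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The intent is that such a program behaves the same way on every platform, whether or not the hardware would ever reorder the two writes in practice.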

2.8 - Are there any security issues that cannot be addressed by the current security model?

No

2.9 - Are there any internationalization or localization issues?

No.

2.10 - Are there any existing specifications that might be rendered obsolete, deprecated, or in need of revision as a result of this work?

Chapter 17 of the JLS and Chapter 8 of the JVMS will be completely replaced. Other changes may be needed elsewhere in the JLS and JVMS.

Section 3: Contributions

3.1 - Please list any existing documents, specifications, or implementations that describe the technology. Please include links to the documents if they are publicly available.

Discussion of the issues in this JSR has been ongoing for some time. Some of these discussions have taken place on the Java Memory Model mailing list. The Java Memory Model web page contains archives of those discussions, as well as links to related resources. Some of the relevant resources:

  1. Technical papers:
    1. Fixing the Java memory model, Proceedings of the ACM 1999 conference on Java Grande, 1999, Pages 89-98. (Slides from talk) -- this paper sets out some of the problems with the existing specification, although the solution proposed in this paper has been found to be inadequate and is not recommended for consideration.
    2. Improving the Java Memory Model with CRF, Jan-Willem Maessen, Arvind and Xiaowei Shen, Computation Structures Group Memo 428, MIT, 2000.
    3. Bill's current proposal

  2. Descriptions of double-check idiom:
    1. Lazy instantiation, Philip Bishop and Nigel Warren, JavaWorld Magazine.
    2. Reality Check, Douglas C. Schmidt, C++ Report, SIGS, Vol. 8, No. 3, March 1996.
    3. Double-Checked Locking: An Optimization Pattern for Efficiently Initializing and Accessing Thread-safe Objects, Douglas Schmidt and Tim Harrison, 3rd annual Pattern Languages of Program Design conference, 1996.
    4. Programming Java threads in the real world, Part 7, Allen Holub, JavaWorld Magazine, April 1999.

3.2 - Explanation of how these items might be used as a starting point for the work.

The specifications listed above could be used as a basis for the public draft. Where they fall short of stated goals or constraints, further work will be needed to determine the best course of action.