If you look at Doug Lea's JSR-133 cookbook page (http://gee.cs.oswego.edu/dl/jmm/cookbook.html), there is only one common architecture that makes it cheap to implement sequential consistency (SC), and that's HP's PA-RISC. And HP is clearly moving towards Itanium. I think the probability that most of us will switch to a fundamentally new architecture, or an old architecture with a new memory model, within the next 5 years is near zero. As a practical matter, I don't think SC will be viable for a long time, if ever.
My guess is that it's cheaper to implement SC on Itanium than on most other processors, but not cheap enough. All operations involving the heap would need acquire loads and release stores. And you would lose a lot of the benefit of compiler scheduling, reuse of final fields, etc. I suspect the performance cost of this is a large fraction of, but less than, the cost encountered by hardware x86 emulation, which needs the same treatment for all memory operations.
In addition, heap writes would often need to be followed by a full memory barrier, as on essentially all other architectures. I don't recall statistics about the frequency of heap stores in Java, but I guess it's typically between one every 10 cycles and one every 100 cycles. Current Itanium implementations have an advantage here in that the penalty for a full barrier is often "only" on the order of 10 cycles, as compared to over 100 on some other processor implementations.
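To illustrate why a store often has to be followed by a full barrier, here is a minimal Java sketch of the classic store-then-load (Dekker-style) pattern. The class and field names are made up for illustration. Declaring x and y volatile asks for SC on just those two fields; an SC implementation would have to give the same guarantee for every heap field, which is where the barrier after each store comes from.

```java
// Hypothetical example: names are illustrative only.
class StoreLoad {
    // volatile gives accesses to these two fields sequentially consistent semantics
    static volatile int x, y;
    // plain fields; only read by the caller after join()
    static int r1, r2;

    // Runs one trial and reports whether both threads read 0.
    static boolean bothZero() {
        x = 0; y = 0; r1 = -1; r2 = -1;
        Thread a = new Thread(() -> { x = 1; r1 = y; }); // store x, then load y
        Thread b = new Thread(() -> { y = 1; r2 = x; }); // store y, then load x
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        // Under SC at least one store precedes the other thread's load,
        // so r1 == 0 && r2 == 0 must not happen.
        return r1 == 0 && r2 == 0;
    }

    public static void main(String[] args) {
        int forbidden = 0;
        for (int i = 0; i < 1000; i++) {
            if (bothZero()) forbidden++;
        }
        System.out.println("both-zero outcomes: " + forbidden);
    }
}
```

If x and y were plain fields, the both-zero outcome would be allowed, since hardware store buffering or compiler reordering may move each load ahead of the preceding store. Ruling that out is what the full barrier pays for.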
If I had to make a wild guess, I'd say we're in the ballpark of a factor of 2 overall performance loss, which is less than what I think it would be on a Pentium 4, or probably Power. But all of these are wild guesses.
Aggressive compiler escape analysis can help a bit with much of this, but I think there are no known techniques that are good enough to really solve the problem.
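As a rough back-of-the-envelope check on the barrier component alone, the sketch below plugs in the figures mentioned above (a store every 10 to 100 cycles, a barrier costing about 10 or about 100 cycles). The model is an assumption for illustration: it charges one full barrier for every store with no overlap, and ignores the other costs (acquire/release operations, lost compiler optimizations), so it is not a measurement.

```java
// Hypothetical cost model: names and the formula are illustrative assumptions.
class BarrierCost {
    // Slowdown factor when a store occurs once per storeInterval cycles and
    // each store is followed by a barrier costing barrierCycles, no overlap.
    static double slowdown(double storeInterval, double barrierCycles) {
        return (storeInterval + barrierCycles) / storeInterval;
    }

    public static void main(String[] args) {
        double[] barriers = {10, 100};   // cycles per full barrier
        double[] intervals = {10, 100};  // cycles between heap stores
        for (double b : barriers) {
            for (double n : intervals) {
                System.out.printf("barrier=%.0f interval=%.0f slowdown=%.2fx%n",
                        b, n, slowdown(n, b));
            }
        }
    }
}
```

With a 10-cycle barrier this simple model gives between 1.1x and 2x; with a 100-cycle barrier it gives between 2x and 11x.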
I'm also not sure that SC is that much of an improvement for the programmer. It's certainly a bit easier to explain. But for code that correctly uses locks to protect shared variables, it doesn't matter. For other code, it suffices to annotate shared variables with "volatile" to get back to SC. I'd certainly prefer to read the code with the "volatile" annotations.
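A minimal sketch of what the "volatile" annotation looks like in practice, with made-up class and field names. Only the flag that is actually raced on is marked; the data it publishes stays a plain field.

```java
// Hypothetical example: names are illustrative only.
class VolatileFlag {
    static int data;                // plain field, published via the flag
    static volatile boolean ready;  // the annotation documents the shared variable

    // Publishes 42 through the flag and returns what the reader thread saw.
    static int runOnce() {
        data = 0;
        ready = false;
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin until the writer sets the flag
            }
            seen[0] = data; // ordered after the volatile read of ready
        });
        reader.start();
        data = 42;    // ordinary write
        ready = true; // volatile write: makes the write to data visible
        try {
            reader.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

Without "volatile" on ready, the reader could spin forever or see a stale value of data. With it, a reader of the code can tell at a glance which variable is shared without a lock.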
> -----Original Message-----
> From: Sarita Adve [mailto:email@example.com]
> Sent: Thursday, August 07, 2003 11:23 PM
> To: 'Ben Wint'; firstname.lastname@example.org
> Subject: RE: JavaMemoryModel: That gap again
> Note that this paper does not take compiler optimizations
> into account. From
> the hardware side, it is unclear how IA-64 processors would
> perform with
> sequential consistency (IA-64 has a relaxed model).
> > -----Original Message-----
> > From: email@example.com
> > [mailto:firstname.lastname@example.org] On Behalf Of Ben Wint
> > Sent: Thursday, August 07, 2003 3:41 AM
> > To: email@example.com
> > Subject: JavaMemoryModel: That gap again
> > Presumably you'd like community review feedback sent to
> > some community review forum rather than cluttering up this
> > mailing list. Where is that?
> > Hill, IEEE Computer'98 >> How will the performance gap
> > [between seq consistency & other models] change over the
> > next ten years? One argument is that it will grow, because
> > the latency to memory ... is likely to grow. On the contrary,
> > I see two reasons that make it likely to shrink. <<
> > Five years on, which way is it going?
> > Ben Wint
> > -------------------------------
> > JavaMemoryModel mailing list -
> > http://www.cs.umd.edu/~pugh/java/memoryModel