> -----Original Message-----
> From: Boehm, Hans [mailto:email@example.com]
> Sent: Wednesday, July 18, 2001 4:46 PM
> > Ok, I classify this issue as write atomicity. In other words, when a
> > write becomes visible to a processor, does it become visible to all
> > processors at the same time? There are at least three options:
> > (1) Yes, a write becomes visible to all processors at the same time.
> > (2) A write can become visible to its own processor early, but becomes
> > visible to all other processors at the same time.
> > (3) A write can become visible to some other processors early.
> > I believe your question is about whether we should permit option 3.
> Are there cases in which option 2 gives you some useful properties that
> option 3 doesn't?
> I'm not sure that as a programmer the distinction between 2 and 3 matters.
> Does it?
It depends on what the other ordering constraints are. For example, for a
model that prohibits all reorderings except W -> R (a la TSO), here is
code that can distinguish between options 2 and 3:
P1                      P2
Flag1 = 1               Flag2 = 1
A = 1                   A = 2
register1 = A           register3 = A
register2 = Flag2       register4 = Flag1
Result: register1=1, register2=0, register3=2, register4=0
This result can only happen if option 2 is permitted, and is independent of
option 3.
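To make the store-buffer reading of this example concrete, here is a rough sketch in Java that searches every interleaving of the two processors. It is a toy model built on assumptions, not a description of any real machine: each processor has a FIFO store buffer, "option 2 permitted" is modelled as a read being forwarded from the processor's own buffer, and "option 2 hidden" as a read stalling until the processor's own pending write to that location has drained. It also assumes the first processor performs A = 1 before reading A. The class and method names (StoreBufferLitmus, reachable, and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: two processors, FIFO store buffers, optional forwarding.
class StoreBufferLitmus {
    /** One instruction: a write of a constant (reg == null) or a read into reg. */
    static final class Op {
        final String loc; final int value; final String reg;
        Op(String loc, int value, String reg) { this.loc = loc; this.value = value; this.reg = reg; }
    }
    static Op write(String loc, int v) { return new Op(loc, v, null); }
    static Op read(String reg, String loc) { return new Op(loc, 0, reg); }

    // Assumed program text; the "A = 1" on the first processor is an assumption.
    static final List<List<Op>> PROGRAM = Arrays.asList(
        Arrays.asList(write("Flag1", 1), write("A", 1), read("register1", "A"), read("register2", "Flag2")),
        Arrays.asList(write("Flag2", 1), write("A", 2), read("register3", "A"), read("register4", "Flag1")));

    /** Newest buffered value for loc, or null if no write to loc is pending. */
    static Integer pending(List<Op> buffer, String loc) {
        for (int i = buffer.size() - 1; i >= 0; i--)
            if (buffer.get(i).loc.equals(loc)) return buffer.get(i).value;
        return null;
    }

    static List<List<Op>> copy(List<List<Op>> buffers) {
        List<List<Op>> c = new ArrayList<>();
        for (List<Op> b : buffers) c.add(new ArrayList<>(b));
        return c;
    }

    /** Depth-first search over all interleavings of instruction steps and buffer drains. */
    static boolean search(boolean forwarding, Map<String, Integer> target, int[] pc,
                          List<List<Op>> buffers, Map<String, Integer> mem, Map<String, Integer> regs) {
        boolean finished = true;
        for (int p = 0; p < 2; p++) {
            if (!buffers.get(p).isEmpty()) {            // step: drain p's oldest write to memory
                finished = false;
                List<List<Op>> b = copy(buffers);
                Op w = b.get(p).remove(0);
                Map<String, Integer> m = new HashMap<>(mem);
                m.put(w.loc, w.value);
                if (search(forwarding, target, pc, b, m, regs)) return true;
            }
            if (pc[p] == PROGRAM.get(p).size()) continue;
            finished = false;
            Op op = PROGRAM.get(p).get(pc[p]);
            int[] next = pc.clone();
            next[p]++;
            if (op.reg == null) {                       // step: write enters p's store buffer
                List<List<Op>> b = copy(buffers);
                b.get(p).add(op);
                if (search(forwarding, target, next, b, mem, regs)) return true;
            } else {                                    // step: read
                Integer own = pending(buffers.get(p), op.loc);
                if (own != null && !forwarding) continue;   // stall until own write drains
                Map<String, Integer> r = new HashMap<>(regs);
                r.put(op.reg, own != null ? own : mem.getOrDefault(op.loc, 0));
                if (search(forwarding, target, next, buffers, mem, r)) return true;
            }
        }
        return finished && regs.equals(target);
    }

    /** Can the quoted result occur? forwarding == true models option 2 being permitted. */
    static boolean reachable(boolean forwarding) {
        Map<String, Integer> target = new HashMap<>();
        target.put("register1", 1); target.put("register2", 0);
        target.put("register3", 2); target.put("register4", 0);
        List<List<Op>> empty = new ArrayList<>();
        empty.add(new ArrayList<>());
        empty.add(new ArrayList<>());
        return search(forwarding, target, new int[2], empty, new HashMap<>(), new HashMap<>());
    }

    public static void main(String[] args) {
        System.out.println("with forwarding (option 2 permitted): " + reachable(true));
        System.out.println("without forwarding (option 2 hidden): " + reachable(false));
    }
}
```

In this model the result is reachable only with forwarding. Without it, each read of A has to wait for the processor's own writes to Flag and A to reach memory, and the two zero flag reads then form a cycle. Writes in the model always reach memory in a single step, so option 3 plays no part.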
The example you've been citing (with four processors) needs option 3, and is
independent of option 2. Another such example with only three processors is:
P1              P2                      P3
A = 1           if (A == 1) B = 1       if (B == 1) register1 = A
Result: B=1, register1=0
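A small Java sketch of how option 3 could produce this result. It assumes the three-processor example has the usual causality shape: P1 writes A, P2 writes B only after seeing A == 1, and P3 reads A only after seeing B == 1. The model is hypothetical: each processor has its own view of memory and a write is delivered to each view separately. Only one fixed schedule is replayed, and the names (NonAtomicWriteDemo, deliver, run) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: every processor has its own view of memory, and a write
// is delivered to each view separately.
class NonAtomicWriteDemo {
    static final int P1 = 0, P2 = 1, P3 = 2;
    private final List<Map<String, Integer>> views = new ArrayList<>();

    NonAtomicWriteDemo() {
        for (int p = 0; p < 3; p++) views.add(new HashMap<>());
    }

    /** Makes a write visible to the listed processors only. */
    void deliver(String loc, int value, int... observers) {
        for (int p : observers) views.get(p).put(loc, value);
    }

    /** Reads loc as processor p currently sees it; unwritten locations read 0. */
    int read(int p, String loc) {
        return views.get(p).getOrDefault(loc, 0);
    }

    /** Replays one fixed schedule and returns {B as seen by P3, register1}. */
    static int[] run(boolean atomicWrites) {
        NonAtomicWriteDemo m = new NonAtomicWriteDemo();
        int register1 = -1;                       // -1 means P3 never read A

        // P1: A = 1
        if (atomicWrites) m.deliver("A", 1, P1, P2, P3);
        else m.deliver("A", 1, P1, P2);           // option 3: P2 sees the write before P3

        // P2: if (A == 1) B = 1
        if (m.read(P2, "A") == 1) m.deliver("B", 1, P1, P2, P3);

        // P3: if (B == 1) register1 = A
        if (m.read(P3, "B") == 1) register1 = m.read(P3, "A");

        m.deliver("A", 1, P3);                    // the write to A finally reaches P3
        return new int[] { m.read(P3, "B"), register1 };
    }

    public static void main(String[] args) {
        int[] early = run(false), atomic = run(true);
        System.out.println("non-atomic: B=" + early[0] + ", register1=" + early[1]);
        System.out.println("atomic:     B=" + atomic[0] + ", register1=" + atomic[1]);
    }
}
```

With per-processor delivery the replay ends with B=1, register1=0. When the write to A reaches all views at once, the same schedule gives register1=1. No processor reads its own write early here, so option 2 is not involved.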
How useful is it to prohibit 2 or 3 for programmers?
One could argue that it is important to prohibit 3 for the above code (also
called causality by some).
I'm not sure how important it is to explicitly prohibit 2 for programmers. I
haven't come across an algorithm that relies on it, but others on this list
would know better. This optimization is tricky because whether or not it
becomes visible depends a lot on other system constraints. If it becomes an
important issue, I can look up previous work to remind myself of all the
related issues. Certainly once we finesse the rest of the semantics, I'll go
back to it.
In this context, I should also clarify one thing from my previous message:
> > Option (2) is related to local data dependences and program order
> > related memory model constraints. If we go with requiring that program
> > ordered volatile write followed by volatile read should be usable for
> > ordering, then option (2) is prohibited directly.
I should have said "if we go with requiring that program ordered volatile
write followed by volatile read should be usable for ordering, then **it
becomes easier to say that option (2) should not be visible for volatiles.**"
That is, we don't have to explicitly prohibit the optimization, but because
of the other parts of the spec, its effect may not be visible anyway.
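As an illustration of what "volatile write followed by volatile read usable for ordering" would buy a programmer, here is a minimal Java sketch of the two-flag test using volatile fields. It assumes the semantics Java later adopted with JSR-133, where volatile accesses are totally ordered consistently with program order. Under that assumption at least one thread must see the other's flag. The harness and its names (VolatileStoreLoad, bothZeroObserved) are hypothetical.

```java
// Hypothetical harness; relies on the volatile semantics Java adopted with JSR-133.
class VolatileStoreLoad {
    static volatile int flag1, flag2;
    static int register1, register2;      // plain fields; read only after join()

    /** Runs the two-thread test repeatedly; true if both reads ever return 0. */
    static boolean bothZeroObserved(int iterations) {
        try {
            for (int i = 0; i < iterations; i++) {
                flag1 = 0;
                flag2 = 0;
                // Each thread does a volatile write followed by a volatile read.
                Thread t1 = new Thread(() -> { flag1 = 1; register1 = flag2; });
                Thread t2 = new Thread(() -> { flag2 = 1; register2 = flag1; });
                t1.start();
                t2.start();
                t1.join();
                t2.join();
                if (register1 == 0 && register2 == 0) return true;
            }
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("both reads saw 0: " + bothZeroObserved(2000));
    }
}
```

If the flags were ordinary (non-volatile) fields, both reads returning 0 would be a legal outcome. A test like this cannot prove an ordering guarantee; it can only fail to find a violation.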
> > As far as real machines - The last time I did a survey of this issue
> > (with Kourosh about five years ago), the only real machine that
> > permitted option (3) was the Cray T3D.
> Did you also look at large NUMA machines like an SGI O2K? Or was this
> done too early? I suspect we don't need to worry about the T3D, but it
> would be nice to have some evidence that NUMA machines aren't likely to
> end up weakening memory models in this respect.
SGI O2K is supposed to be sequentially consistent. We looked at Convex
Exemplar and Sequent NUMA-Q. We certainly didn't find evidence then that
these machines permitted (3). Unfortunately, I don't remember whether we found
definitive documentation that they prohibited (3).
> > My opinion: requiring non-normal writes to be atomic (i.e., prohibit
> > options 2 and 3) should be acceptable from hardware point of view.
> It sounds like it's possible on most architectures, but would in many
> cases require some sort of a barrier after a volatile write. Presumably
> that barrier would entail some cost, since it would wait for the store
> buffer to drain, and we know the store buffer is nonempty at that point?
> If the hardware guarantees option (2), I still don't see how you can
> eliminate that barrier, though you might be able to postpone it. Thus
> "acceptable" means acceptable at substantial cost?
Note that for option (3), the barrier would need to go after the read that
returned the value of the non-atomic write. But most machines today wouldn't
require it (including IA-64, Alpha, Origin, TSO). For option (2), yes, we
would need a barrier after volatile writes, but we would need this anyway if
we went with program order constraints for volatile write --> volatile read.
Also, I should reword the above paragraph from the previous message too:
> > My opinion: requiring non-normal writes to ***appear*** atomic (i.e.,
> > ***making options 2 and 3 invisible***) should be acceptable from
> > hardware point of view.
JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel