The OP of a recent question added a comment to it linking a paper entitled Common Compiler Optimisations are Invalid in the C11 Memory Model and what we can do about it, which apparently was presented at POPL 2015. Among other things, it purports to show several unexpected and counterintuitive conclusions derived from the specifications for what it calls the "C11 memory model", which I take to consist largely of the provisions of section 5.1.2.4 of the C11 language specification.
The paper is somewhat lengthy, but for the purposes of this question I focus on the discussion of scheme "SEQ" on the second page. This concerns a multithreaded program in which ...
a
is non-atomic (for example, an int
),
x
and y
are atomic (for example, _Atomic int
), and
a
, x
, and y
all initially have value 0
,
... and the following occurs (transliterated from pseudocode):
Thread 1
a = 1;
Thread 2
if (atomic_load_explicit(&x, memory_order_relaxed))
if (a)
atomic_store_explicit(&y, 1, memory_order_relaxed);
Thread 3
if (atomic_load_explicit(&y, memory_order_relaxed))
atomic_store_explicit(&x, 1, memory_order_relaxed);
The paper makes this argument:
First, notice that there is no execution (consistent execution in
the terminology of Section 2) in which the load of a
occurs. We show this
by contradiction. Suppose that there is an execution in which a load
of a
occurs. In such an execution the load of a
can only return 0
(the
initial value of a
) because the store a=1
does not happen before it
(because it is in a different thread that has not been synchronised
with) and non-atomic loads must return the latest write that happens
before them. Therefore, in this execution the store to y
does not happen,
which in turn means that the load of y
cannot return 1
and the store
to x
also does not happen. Then, x
cannot read 1
, and thus the load of a
does
not occur. As a consequence this program is not racy: since the load
of a
does not occur in any execution, there are no executions with
conflicting accesses on the same non-atomic variable. We conclude that
the only possible final state is a=1 ∧ x=y=0
.
(Question 1) But isn't that argument fatally flawed?
The assertion that the load of a
can only return 0 is made subject to the assumption that a
is in fact read, which the argument intends to contradict. But in that case, as the paper observes, there is no happens before relationship between the store to a
in thread 1 and the load from a
in thread 2. These are conflicting accesses, neither is atomic, and one is a write. Therefore, per paragraph 5.1.2.4/25, the program contains a data race resulting in undefined behavior. Because the behavior is undefined, nothing can be concluded about the value loaded from a
by thread 2, and in particular, it cannot be concluded from the specification that the load must return 0
. The rest of the argument then collapses.
Although the paper claims that the argument shows that the program does not contain a data race ("is not racy"), in fact that is not a consequence of the argument but rather a hidden assumption. Only if, contrary to 5.1.2.4/25, the program did not contain a data race would the argument stand up.
Now perhaps the key is that the argument above considers only "consistent executions", a term defined in a later section of the paper. I confess that it gets a little deep for me at that point, but if in fact constraining the behavior to consistent executions is sufficient to support the assertion that the load of a
must return 0, then it seems that it is no longer (just) the rules of the C11 memory model that we are talking about.
This matters because the authors conclude that a source-to-source translation combining threads 1 & 2 to yield ...
Thread 2'
a = 1;
if (atomic_load_explicit(&x, memory_order_relaxed))
if (a)
atomic_store_explicit(&y, 1, memory_order_relaxed);
... is not permitted of implementations by the C11 memory model, on the basis that it allows executions in which the final state is a = x = y = 1
. That this and various other code transformations and optimizations are invalid is the thesis of the paper.
(Question 2) But isn't it indeed valid for a C11 implementation to treat the original three-threaded program as if it were the two-threaded program consisting of threads 2' and 3?
If that allows "consistent executions" with results that could not be produced by any consistent execution of the original program, then I think that just shows that C11 does not constrain programs to exhibiting consistent executions. Am I missing something here?