Are Linux Kernel Memory Barriers Transitive?

Please note that although this is not a trick question, what follows below is most certainly a trick answer. Please read it in its entirety or don't read it at all. You have been warned!!!

The question involved the following code fragment, where each function foo_n() runs on CPU n:

int x, y; /* shared variables */
int r1, r2, r3; /* private variables */

void foo_0(void)
{
	ACCESS_ONCE(x) = 1;
}

void foo_1(void)
{
	r1 = x;
	smp_mb();
	r2 = y;
}

void foo_2(void)
{
	y = 1;
	smp_mb();
	r3 = x;
}

Suppose that the following assertion runs after all of the preceding functions complete. Can this assertion ever trigger?

assert(!(r1 == 1 && r2 == 0 && r3 == 0));

Appealing to Intuition

When it comes to memory barriers, a little intuition can be a very dangerous thing. Nevertheless, let's at least see where it leads.

Let's assume that the assertion can trigger. This means that r1 == 1, which means that foo_0() must have executed before foo_1() did. For the assertion to trigger, we must also have r2 == 0, which means that foo_1() must have executed before foo_2() did. But if foo_0() executed before foo_1() and foo_1() executed before foo_2(), then foo_2()'s load from x must assuredly see foo_0()'s store to x. This in turn means that r3 == 1, preventing the assertion from triggering.

Most people's intuitions would be much happier if the assertion could never trigger.

Appealing to Linux Kernel Documentation

Another time-honored (if somewhat rare) approach is to dig through the Linux kernel documentation, particularly Documentation/memory-barriers.txt. Although this document says that smp_mb() orders prior accesses against subsequent accesses, it says nothing about what happens to independent accesses from other threads, such as foo_0()'s store to x.

A guarantee not stated is no guarantee at all, so we would have to conclude that the assertion might trigger after all.

Appealing to Hardware Documentation

But the truly insane will read the actual code along with all of the relevant hardware documentation. Those that manage to either avoid or return from the relevant institutions would conclude that the assertion is guaranteed not to trigger on all of the recent SMP system that Linux supports. ARM and Power each provide recursive definitions of their full memory barriers (DMB and sync, respectively), Itanium guarantees a total order on all “release” operations, and s390, x86, SPARC TSO, and PA-RISC guarantee that causality is preserved. (Did I miss any? If so, please let me know!)

So perhaps intuition wins out for once! ;–)

But more on this topic later this week...