1*ebb477cbSPaul E. McKenneyCONTROL DEPENDENCIES 2*ebb477cbSPaul E. McKenney==================== 3*ebb477cbSPaul E. McKenney 4*ebb477cbSPaul E. McKenneyA major difficulty with control dependencies is that current compilers 5*ebb477cbSPaul E. McKenneydo not support them. One purpose of this document is therefore to 6*ebb477cbSPaul E. McKenneyhelp you prevent your compiler from breaking your code. However, 7*ebb477cbSPaul E. McKenneycontrol dependencies also pose other challenges, which leads to the 8*ebb477cbSPaul E. McKenneysecond purpose of this document, namely to help you to avoid breaking 9*ebb477cbSPaul E. McKenneyyour own code, even in the absence of help from your compiler. 10*ebb477cbSPaul E. McKenney 11*ebb477cbSPaul E. McKenneyOne such challenge is that control dependencies order only later stores. 12*ebb477cbSPaul E. McKenneyTherefore, a load-load control dependency will not preserve ordering 13*ebb477cbSPaul E. McKenneyunless a read memory barrier is provided. Consider the following code: 14*ebb477cbSPaul E. McKenney 15*ebb477cbSPaul E. McKenney q = READ_ONCE(a); 16*ebb477cbSPaul E. McKenney if (q) 17*ebb477cbSPaul E. McKenney p = READ_ONCE(b); 18*ebb477cbSPaul E. McKenney 19*ebb477cbSPaul E. McKenneyThis is not guaranteed to provide any ordering because some types of CPUs 20*ebb477cbSPaul E. McKenneyare permitted to predict the result of the load from "b". This prediction 21*ebb477cbSPaul E. McKenneycan cause other CPUs to see this load as having happened before the load 22*ebb477cbSPaul E. McKenneyfrom "a". This means that an explicit read barrier is required, for example 23*ebb477cbSPaul E. McKenneyas follows: 24*ebb477cbSPaul E. McKenney 25*ebb477cbSPaul E. McKenney q = READ_ONCE(a); 26*ebb477cbSPaul E. McKenney if (q) { 27*ebb477cbSPaul E. McKenney smp_rmb(); 28*ebb477cbSPaul E. McKenney p = READ_ONCE(b); 29*ebb477cbSPaul E. McKenney } 30*ebb477cbSPaul E. McKenney 31*ebb477cbSPaul E. McKenneyHowever, stores are not speculated. This means that ordering is 32*ebb477cbSPaul E. McKenney(usually) guaranteed for load-store control dependencies, as in the 33*ebb477cbSPaul E. McKenneyfollowing example: 34*ebb477cbSPaul E. McKenney 35*ebb477cbSPaul E. McKenney q = READ_ONCE(a); 36*ebb477cbSPaul E. McKenney if (q) 37*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 1); 38*ebb477cbSPaul E. McKenney 39*ebb477cbSPaul E. McKenneyControl dependencies can pair with each other and with other types 40*ebb477cbSPaul E. McKenneyof ordering. But please note that neither the READ_ONCE() nor the 41*ebb477cbSPaul E. McKenneyWRITE_ONCE() are optional. Without the READ_ONCE(), the compiler might 42*ebb477cbSPaul E. McKenneyfuse the load from "a" with other loads. Without the WRITE_ONCE(), 43*ebb477cbSPaul E. McKenneythe compiler might fuse the store to "b" with other stores. Worse yet, 44*ebb477cbSPaul E. McKenneythe compiler might convert the store into a load and a check followed 45*ebb477cbSPaul E. McKenneyby a store, and this compiler-generated load would not be ordered by 46*ebb477cbSPaul E. McKenneythe control dependency. 47*ebb477cbSPaul E. McKenney 48*ebb477cbSPaul E. McKenneyFurthermore, if the compiler is able to prove that the value of variable 49*ebb477cbSPaul E. McKenney"a" is always non-zero, it would be well within its rights to optimize 50*ebb477cbSPaul E. McKenneythe original example by eliminating the "if" statement as follows: 51*ebb477cbSPaul E. McKenney 52*ebb477cbSPaul E. McKenney q = a; 53*ebb477cbSPaul E. McKenney b = 1; /* BUG: Compiler and CPU can both reorder!!! */ 54*ebb477cbSPaul E. McKenney 55*ebb477cbSPaul E. McKenneySo don't leave out either the READ_ONCE() or the WRITE_ONCE(). 56*ebb477cbSPaul E. McKenneyIn particular, although READ_ONCE() does force the compiler to emit a 57*ebb477cbSPaul E. McKenneyload, it does *not* force the compiler to actually use the loaded value. 58*ebb477cbSPaul E. McKenney 59*ebb477cbSPaul E. McKenneyIt is tempting to try use control dependencies to enforce ordering on 60*ebb477cbSPaul E. McKenneyidentical stores on both branches of the "if" statement as follows: 61*ebb477cbSPaul E. McKenney 62*ebb477cbSPaul E. McKenney q = READ_ONCE(a); 63*ebb477cbSPaul E. McKenney if (q) { 64*ebb477cbSPaul E. McKenney barrier(); 65*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 1); 66*ebb477cbSPaul E. McKenney do_something(); 67*ebb477cbSPaul E. McKenney } else { 68*ebb477cbSPaul E. McKenney barrier(); 69*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 1); 70*ebb477cbSPaul E. McKenney do_something_else(); 71*ebb477cbSPaul E. McKenney } 72*ebb477cbSPaul E. McKenney 73*ebb477cbSPaul E. McKenneyUnfortunately, current compilers will transform this as follows at high 74*ebb477cbSPaul E. McKenneyoptimization levels: 75*ebb477cbSPaul E. McKenney 76*ebb477cbSPaul E. McKenney q = READ_ONCE(a); 77*ebb477cbSPaul E. McKenney barrier(); 78*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 1); /* BUG: No ordering vs. load from a!!! */ 79*ebb477cbSPaul E. McKenney if (q) { 80*ebb477cbSPaul E. McKenney /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */ 81*ebb477cbSPaul E. McKenney do_something(); 82*ebb477cbSPaul E. McKenney } else { 83*ebb477cbSPaul E. McKenney /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */ 84*ebb477cbSPaul E. McKenney do_something_else(); 85*ebb477cbSPaul E. McKenney } 86*ebb477cbSPaul E. McKenney 87*ebb477cbSPaul E. McKenneyNow there is no conditional between the load from "a" and the store to 88*ebb477cbSPaul E. McKenney"b", which means that the CPU is within its rights to reorder them: The 89*ebb477cbSPaul E. McKenneyconditional is absolutely required, and must be present in the final 90*ebb477cbSPaul E. McKenneyassembly code, after all of the compiler and link-time optimizations 91*ebb477cbSPaul E. McKenneyhave been applied. Therefore, if you need ordering in this example, 92*ebb477cbSPaul E. McKenneyyou must use explicit memory ordering, for example, smp_store_release(): 93*ebb477cbSPaul E. McKenney 94*ebb477cbSPaul E. McKenney q = READ_ONCE(a); 95*ebb477cbSPaul E. McKenney if (q) { 96*ebb477cbSPaul E. McKenney smp_store_release(&b, 1); 97*ebb477cbSPaul E. McKenney do_something(); 98*ebb477cbSPaul E. McKenney } else { 99*ebb477cbSPaul E. McKenney smp_store_release(&b, 1); 100*ebb477cbSPaul E. McKenney do_something_else(); 101*ebb477cbSPaul E. McKenney } 102*ebb477cbSPaul E. McKenney 103*ebb477cbSPaul E. McKenneyWithout explicit memory ordering, control-dependency-based ordering is 104*ebb477cbSPaul E. McKenneyguaranteed only when the stores differ, for example: 105*ebb477cbSPaul E. McKenney 106*ebb477cbSPaul E. McKenney q = READ_ONCE(a); 107*ebb477cbSPaul E. McKenney if (q) { 108*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 1); 109*ebb477cbSPaul E. McKenney do_something(); 110*ebb477cbSPaul E. McKenney } else { 111*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 2); 112*ebb477cbSPaul E. McKenney do_something_else(); 113*ebb477cbSPaul E. McKenney } 114*ebb477cbSPaul E. McKenney 115*ebb477cbSPaul E. McKenneyThe initial READ_ONCE() is still required to prevent the compiler from 116*ebb477cbSPaul E. McKenneyknowing too much about the value of "a". 117*ebb477cbSPaul E. McKenney 118*ebb477cbSPaul E. McKenneyBut please note that you need to be careful what you do with the local 119*ebb477cbSPaul E. McKenneyvariable "q", otherwise the compiler might be able to guess the value 120*ebb477cbSPaul E. McKenneyand again remove the conditional branch that is absolutely required to 121*ebb477cbSPaul E. McKenneypreserve ordering. For example: 122*ebb477cbSPaul E. McKenney 123*ebb477cbSPaul E. McKenney q = READ_ONCE(a); 124*ebb477cbSPaul E. McKenney if (q % MAX) { 125*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 1); 126*ebb477cbSPaul E. McKenney do_something(); 127*ebb477cbSPaul E. McKenney } else { 128*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 2); 129*ebb477cbSPaul E. McKenney do_something_else(); 130*ebb477cbSPaul E. McKenney } 131*ebb477cbSPaul E. McKenney 132*ebb477cbSPaul E. McKenneyIf MAX is compile-time defined to be 1, then the compiler knows that 133*ebb477cbSPaul E. McKenney(q % MAX) must be equal to zero, regardless of the value of "q". 134*ebb477cbSPaul E. McKenneyThe compiler is therefore within its rights to transform the above code 135*ebb477cbSPaul E. McKenneyinto the following: 136*ebb477cbSPaul E. McKenney 137*ebb477cbSPaul E. McKenney q = READ_ONCE(a); 138*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 2); 139*ebb477cbSPaul E. McKenney do_something_else(); 140*ebb477cbSPaul E. McKenney 141*ebb477cbSPaul E. McKenneyGiven this transformation, the CPU is not required to respect the ordering 142*ebb477cbSPaul E. McKenneybetween the load from variable "a" and the store to variable "b". It is 143*ebb477cbSPaul E. McKenneytempting to add a barrier(), but this does not help. The conditional 144*ebb477cbSPaul E. McKenneyis gone, and the barrier won't bring it back. Therefore, if you need 145*ebb477cbSPaul E. McKenneyto relying on control dependencies to produce this ordering, you should 146*ebb477cbSPaul E. McKenneymake sure that MAX is greater than one, perhaps as follows: 147*ebb477cbSPaul E. McKenney 148*ebb477cbSPaul E. McKenney q = READ_ONCE(a); 149*ebb477cbSPaul E. McKenney BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */ 150*ebb477cbSPaul E. McKenney if (q % MAX) { 151*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 1); 152*ebb477cbSPaul E. McKenney do_something(); 153*ebb477cbSPaul E. McKenney } else { 154*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 2); 155*ebb477cbSPaul E. McKenney do_something_else(); 156*ebb477cbSPaul E. McKenney } 157*ebb477cbSPaul E. McKenney 158*ebb477cbSPaul E. McKenneyPlease note once again that each leg of the "if" statement absolutely 159*ebb477cbSPaul E. McKenneymust store different values to "b". As in previous examples, if the two 160*ebb477cbSPaul E. McKenneyvalues were identical, the compiler could pull this store outside of the 161*ebb477cbSPaul E. McKenney"if" statement, destroying the control dependency's ordering properties. 162*ebb477cbSPaul E. McKenney 163*ebb477cbSPaul E. McKenneyYou must also be careful avoid relying too much on boolean short-circuit 164*ebb477cbSPaul E. McKenneyevaluation. Consider this example: 165*ebb477cbSPaul E. McKenney 166*ebb477cbSPaul E. McKenney q = READ_ONCE(a); 167*ebb477cbSPaul E. McKenney if (q || 1 > 0) 168*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 1); 169*ebb477cbSPaul E. McKenney 170*ebb477cbSPaul E. McKenneyBecause the first condition cannot fault and the second condition is 171*ebb477cbSPaul E. McKenneyalways true, the compiler can transform this example as follows, again 172*ebb477cbSPaul E. McKenneydestroying the control dependency's ordering: 173*ebb477cbSPaul E. McKenney 174*ebb477cbSPaul E. McKenney q = READ_ONCE(a); 175*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 1); 176*ebb477cbSPaul E. McKenney 177*ebb477cbSPaul E. McKenneyThis is yet another example showing the importance of preventing the 178*ebb477cbSPaul E. McKenneycompiler from out-guessing your code. Again, although READ_ONCE() really 179*ebb477cbSPaul E. McKenneydoes force the compiler to emit code for a given load, the compiler is 180*ebb477cbSPaul E. McKenneywithin its rights to discard the loaded value. 181*ebb477cbSPaul E. McKenney 182*ebb477cbSPaul E. McKenneyIn addition, control dependencies apply only to the then-clause and 183*ebb477cbSPaul E. McKenneyelse-clause of the "if" statement in question. In particular, they do 184*ebb477cbSPaul E. McKenneynot necessarily order the code following the entire "if" statement: 185*ebb477cbSPaul E. McKenney 186*ebb477cbSPaul E. McKenney q = READ_ONCE(a); 187*ebb477cbSPaul E. McKenney if (q) { 188*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 1); 189*ebb477cbSPaul E. McKenney } else { 190*ebb477cbSPaul E. McKenney WRITE_ONCE(b, 2); 191*ebb477cbSPaul E. McKenney } 192*ebb477cbSPaul E. McKenney WRITE_ONCE(c, 1); /* BUG: No ordering against the read from "a". */ 193*ebb477cbSPaul E. McKenney 194*ebb477cbSPaul E. McKenneyIt is tempting to argue that there in fact is ordering because the 195*ebb477cbSPaul E. McKenneycompiler cannot reorder volatile accesses and also cannot reorder 196*ebb477cbSPaul E. McKenneythe writes to "b" with the condition. Unfortunately for this line 197*ebb477cbSPaul E. McKenneyof reasoning, the compiler might compile the two writes to "b" as 198*ebb477cbSPaul E. McKenneyconditional-move instructions, as in this fanciful pseudo-assembly 199*ebb477cbSPaul E. McKenneylanguage: 200*ebb477cbSPaul E. McKenney 201*ebb477cbSPaul E. McKenney ld r1,a 202*ebb477cbSPaul E. McKenney cmp r1,$0 203*ebb477cbSPaul E. McKenney cmov,ne r4,$1 204*ebb477cbSPaul E. McKenney cmov,eq r4,$2 205*ebb477cbSPaul E. McKenney st r4,b 206*ebb477cbSPaul E. McKenney st $1,c 207*ebb477cbSPaul E. McKenney 208*ebb477cbSPaul E. McKenneyThe control dependencies would then extend only to the pair of cmov 209*ebb477cbSPaul E. McKenneyinstructions and the store depending on them. This means that a weakly 210*ebb477cbSPaul E. McKenneyordered CPU would have no dependency of any sort between the load from 211*ebb477cbSPaul E. McKenney"a" and the store to "c". In short, control dependencies provide ordering 212*ebb477cbSPaul E. McKenneyonly to the stores in the then-clause and else-clause of the "if" statement 213*ebb477cbSPaul E. McKenneyin question (including functions invoked by those two clauses), and not 214*ebb477cbSPaul E. McKenneyto code following that "if" statement. 215*ebb477cbSPaul E. McKenney 216*ebb477cbSPaul E. McKenney 217*ebb477cbSPaul E. McKenneyIn summary: 218*ebb477cbSPaul E. McKenney 219*ebb477cbSPaul E. McKenney (*) Control dependencies can order prior loads against later stores. 220*ebb477cbSPaul E. McKenney However, they do *not* guarantee any other sort of ordering: 221*ebb477cbSPaul E. McKenney Not prior loads against later loads, nor prior stores against 222*ebb477cbSPaul E. McKenney later anything. If you need these other forms of ordering, use 223*ebb477cbSPaul E. McKenney smp_load_acquire(), smp_store_release(), or, in the case of prior 224*ebb477cbSPaul E. McKenney stores and later loads, smp_mb(). 225*ebb477cbSPaul E. McKenney 226*ebb477cbSPaul E. McKenney (*) If both legs of the "if" statement contain identical stores to 227*ebb477cbSPaul E. McKenney the same variable, then you must explicitly order those stores, 228*ebb477cbSPaul E. McKenney either by preceding both of them with smp_mb() or by using 229*ebb477cbSPaul E. McKenney smp_store_release(). Please note that it is *not* sufficient to use 230*ebb477cbSPaul E. McKenney barrier() at beginning and end of each leg of the "if" statement 231*ebb477cbSPaul E. McKenney because, as shown by the example above, optimizing compilers can 232*ebb477cbSPaul E. McKenney destroy the control dependency while respecting the letter of the 233*ebb477cbSPaul E. McKenney barrier() law. 234*ebb477cbSPaul E. McKenney 235*ebb477cbSPaul E. McKenney (*) Control dependencies require at least one run-time conditional 236*ebb477cbSPaul E. McKenney between the prior load and the subsequent store, and this 237*ebb477cbSPaul E. McKenney conditional must involve the prior load. If the compiler is able 238*ebb477cbSPaul E. McKenney to optimize the conditional away, it will have also optimized 239*ebb477cbSPaul E. McKenney away the ordering. Careful use of READ_ONCE() and WRITE_ONCE() 240*ebb477cbSPaul E. McKenney can help to preserve the needed conditional. 241*ebb477cbSPaul E. McKenney 242*ebb477cbSPaul E. McKenney (*) Control dependencies require that the compiler avoid reordering the 243*ebb477cbSPaul E. McKenney dependency into nonexistence. Careful use of READ_ONCE() or 244*ebb477cbSPaul E. McKenney atomic{,64}_read() can help to preserve your control dependency. 245*ebb477cbSPaul E. McKenney 246*ebb477cbSPaul E. McKenney (*) Control dependencies apply only to the then-clause and else-clause 247*ebb477cbSPaul E. McKenney of the "if" statement containing the control dependency, including 248*ebb477cbSPaul E. McKenney any functions that these two clauses call. Control dependencies 249*ebb477cbSPaul E. McKenney do *not* apply to code beyond the end of that "if" statement. 250*ebb477cbSPaul E. McKenney 251*ebb477cbSPaul E. McKenney (*) Control dependencies pair normally with other types of barriers. 252*ebb477cbSPaul E. McKenney 253*ebb477cbSPaul E. McKenney (*) Control dependencies do *not* provide multicopy atomicity. If you 254*ebb477cbSPaul E. McKenney need all the CPUs to agree on the ordering of a given store against 255*ebb477cbSPaul E. McKenney all other accesses, use smp_mb(). 256*ebb477cbSPaul E. McKenney 257*ebb477cbSPaul E. McKenney (*) Compilers do not understand control dependencies. It is therefore 258*ebb477cbSPaul E. McKenney your job to ensure that they do not break your code. 259