
On atomic types (atomic_t atomic64_t and atomic_long_t).

The atomic type provides an interface to the architecture's means of atomic
RMW operations between CPUs (atomic operations on MMIO are not supported and
can lead to fatal traps on some platforms).

API
---

The 'full' API consists of (atomic64_ and atomic_long_ prefixes omitted for
brevity):

Non-RMW ops:

  atomic_read(), atomic_set()
  atomic_read_acquire(), atomic_set_release()


RMW atomic operations:

Arithmetic:

  atomic_{add,sub,inc,dec}()
  atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}()
  atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}()


Bitwise:

  atomic_{and,or,xor,andnot}()
  atomic_fetch_{and,or,xor,andnot}{,_relaxed,_acquire,_release}()


Swap:

  atomic_xchg{,_relaxed,_acquire,_release}()
  atomic_cmpxchg{,_relaxed,_acquire,_release}()
  atomic_try_cmpxchg{,_relaxed,_acquire,_release}()


Reference count (but please see refcount_t):

  atomic_add_unless(), atomic_inc_not_zero()
  atomic_sub_and_test(), atomic_dec_and_test()


Misc:

  atomic_inc_and_test(), atomic_add_negative()
  atomic_dec_unless_positive(), atomic_inc_unless_negative()


Barriers:

  smp_mb__{before,after}_atomic()


TYPES (signed vs unsigned)
-----

While atomic_t, atomic_long_t and atomic64_t use int, long and s64
respectively (for hysterical raisins), the kernel uses -fno-strict-overflow
(which implies -fwrapv) and defines signed overflow to behave like
2s-complement.

Therefore, an explicitly unsigned variant of the atomic ops is strictly
unnecessary and we can simply cast; there is no UB.

There was a bug in UBSAN prior to GCC-8 that would generate UB warnings for
signed types.

With this we also conform to the C/C++ _Atomic behaviour and things like
P1236R1.

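As a rough user-space illustration of the wrap-around point (C11
<stdatomic.h>, not the kernel API), the sketch below lets a signed atomic
counter overflow and reports the result through an unsigned cast. The helper
name wrap_inc_as_unsigned() is made up for this example. C11 defines signed
atomic arithmetic to wrap silently, which is assumed here to stand in for what
the kernel gets from -fno-strict-overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/*
 * Hypothetical helper, not a kernel function: increment a signed atomic
 * counter and report the new value as an unsigned quantity.
 */
static unsigned int wrap_inc_as_unsigned(atomic_int *v)
{
	int old;

	assert(v);

	/* atomic_fetch_add() returns the old value; the stored sum wraps. */
	old = atomic_fetch_add(v, 1);

	/* Unsigned arithmetic is modular, so this add cannot be UB. */
	return (unsigned int)old + 1u;
}
```

Starting from INT_MAX, the stored value becomes INT_MIN and the helper returns
(unsigned int)INT_MAX + 1, so the signed counter and its unsigned view agree.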

SEMANTICS
---------

Non-RMW ops:

The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
smp_store_release() respectively.

The one detail to this is that atomic_set{}() should be observable to the RMW
ops. That is:

  C atomic-set

  {
    atomic_t v = ATOMIC_INIT(1);
  }

  P0(atomic_t *v)
  {
    (void)atomic_add_unless(v, 1, 0);
  }

  P1(atomic_t *v)
  {
    atomic_set(v, 0);
  }

  exists
  (v=2)

In this case we would expect the atomic_set() from CPU1 to either happen
before the atomic_add_unless(), in which case the latter would be a no-op, or
_after_ in which case we'd overwrite its result. In no case is "2" a valid
outcome.

This is typically true on 'normal' platforms, where a regular competing STORE
will invalidate an LL/SC or fail a CMPXCHG.

The obvious case where this is not so is when we need to implement atomic ops
with a lock:

  CPU0						CPU1

  atomic_add_unless(v, 1, 0);
    lock();
    ret = READ_ONCE(v->counter); // == 1
						atomic_set(v, 0);
    if (ret != u)				  WRITE_ONCE(v->counter, 0);
      WRITE_ONCE(v->counter, ret + 1);
    unlock();

Here the plain store from CPU1 lands between CPU0's read and write and is then
overwritten, leaving v == 2. The typical solution is then to implement
atomic_set{}() with atomic_xchg().

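A minimal user-space sketch of the lock-based case, assuming a toy spinlock
built from a C11 atomic_flag. The type struct locked_atomic and the la_*()
helpers are hypothetical names for this example, not kernel code. It shows the
set operation routed through xchg, so the store takes the same lock as the
other RMW ops and cannot land in the middle of one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock-based atomic type; all names here are illustrative. */
struct locked_atomic {
	atomic_flag lock;	/* tiny spinlock guarding counter */
	int counter;
};

static void la_lock(struct locked_atomic *v)
{
	while (atomic_flag_test_and_set_explicit(&v->lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void la_unlock(struct locked_atomic *v)
{
	atomic_flag_clear_explicit(&v->lock, memory_order_release);
}

/* xchg: store the new value under the lock and return the old one. */
static int la_xchg(struct locked_atomic *v, int newval)
{
	int old;

	assert(v);
	la_lock(v);
	old = v->counter;
	v->counter = newval;
	la_unlock(v);
	return old;
}

/* set is built on xchg, so it serializes against every locked RMW op. */
static void la_set(struct locked_atomic *v, int i)
{
	(void)la_xchg(v, i);
}

/* Add a to v unless v == u; returns true if the add happened. */
static bool la_add_unless(struct locked_atomic *v, int a, int u)
{
	bool added = false;

	assert(v);
	la_lock(v);
	if (v->counter != u) {
		v->counter += a;
		added = true;
	}
	la_unlock(v);
	return added;
}
```

With this arrangement a concurrent la_set(v, 0) either runs entirely before
la_add_unless(v, 1, 0), which then does nothing, or entirely after it, which
overwrites its result. The counter never ends up at 2.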

RMW ops:

These come in various forms:

 - plain operations without return value: atomic_{}()

 - operations which return the modified value: atomic_{}_return()

   these are limited to the arithmetic operations because those are
   reversible. Bitops are irreversible and therefore the modified value
   is of dubious utility.

 - operations which return the original value: atomic_fetch_{}()

 - swap operations: xchg(), cmpxchg() and try_cmpxchg()

 - misc; the special purpose operations that are commonly used and would,
   given the interface, normally be implemented using (try_)cmpxchg loops but
   are time critical and can, (typically) on LL/SC architectures, be more
   efficiently implemented.

All these operations are SMP atomic; that is, the operations (for a single
atomic variable) can be fully ordered and no intermediate state is lost or
visible.

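The forms listed above can be sketched in user space with C11 <stdatomic.h>.
The sketch_*() and wrap_add() names are invented for this example and only
approximate the kernel operations they are named after. The last helper shows
the kind of (try_)cmpxchg loop that a 'misc' operation such as
atomic_add_unless() would generically be built from.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add with 2s-complement wrap-around. The unsigned-to-int conversion is
 * implementation-defined in ISO C; this assumes the usual wrapping behaviour.
 */
static int wrap_add(int a, int b)
{
	return (int)((unsigned int)a + (unsigned int)b);
}

/* Analogue of atomic_fetch_add(): returns the ORIGINAL value. */
static int sketch_fetch_add(atomic_int *v, int i)
{
	assert(v);
	return atomic_fetch_add(v, i);
}

/* Analogue of atomic_add_return(): returns the MODIFIED value. */
static int sketch_add_return(atomic_int *v, int i)
{
	assert(v);
	/* The add is reversible, so old and new are derivable from each other. */
	return wrap_add(atomic_fetch_add(v, i), i);
}

/* Analogue of atomic_add_unless(), written as a compare-exchange loop. */
static bool sketch_add_unless(atomic_int *v, int a, int u)
{
	int old;

	assert(v);
	old = atomic_load_explicit(v, memory_order_relaxed);
	do {
		if (old == u)
			return false;	/* condition failed, nothing written */
		/* A failed compare-exchange reloads 'old' with the current value. */
	} while (!atomic_compare_exchange_weak(v, &old, wrap_add(old, a)));

	return true;
}
```

Note how the value returned by sketch_add_return() minus the operand gives
back what sketch_fetch_add() would have returned. No such inverse exists for
and/or, which is why only the fetch form is offered for bitops.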

ORDERING  (go read memory-barriers.txt first)
--------

The rule of thumb:

 - non-RMW operations are unordered;

 - RMW operations that have no return value are unordered;

 - RMW operations that have a return value are fully ordered;

 - RMW operations that are conditional are unordered on FAILURE,
   otherwise the above rules apply.

Except of course when an operation has an explicit ordering like:

 {}_relaxed: unordered
 {}_acquire: the R of the RMW (or atomic_read) is an ACQUIRE
 {}_release: the W of the RMW (or atomic_set)  is a  RELEASE

Where 'unordered' is against other memory locations. Address dependencies are
not defeated.

Fully ordered primitives are ordered against everything prior and everything
subsequent. Therefore a fully ordered primitive is like having an smp_mb()
before and an smp_mb() after the primitive.

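The ordering variants can be approximated in user space with C11 memory
orders. This is only an analogy: the sketch_*() names are hypothetical, and a
C11 seq_cst fence is assumed here as a stand-in for smp_mb() even though the
C11 and kernel memory models are not identical. The fully ordered variant is
written as a relaxed RMW with a full fence on either side, mirroring the
description above.

```c
#include <assert.h>
#include <stdatomic.h>

/* {}_relaxed analogue: atomic, but unordered against other locations. */
static int sketch_fetch_add_relaxed(atomic_int *v, int i)
{
	assert(v);
	return atomic_fetch_add_explicit(v, i, memory_order_relaxed);
}

/* {}_acquire analogue: the read half of the RMW is an ACQUIRE. */
static int sketch_fetch_add_acquire(atomic_int *v, int i)
{
	assert(v);
	return atomic_fetch_add_explicit(v, i, memory_order_acquire);
}

/* {}_release analogue: the write half of the RMW is a RELEASE. */
static int sketch_fetch_add_release(atomic_int *v, int i)
{
	assert(v);
	return atomic_fetch_add_explicit(v, i, memory_order_release);
}

/* Fully ordered analogue: full fence, relaxed RMW, full fence. */
static int sketch_fetch_add_ordered(atomic_int *v, int i)
{
	int old;

	assert(v);
	atomic_thread_fence(memory_order_seq_cst);	/* stands in for smp_mb() */
	old = sketch_fetch_add_relaxed(v, i);
	atomic_thread_fence(memory_order_seq_cst);	/* stands in for smp_mb() */
	return old;
}
```

All four return the original value, as atomic_fetch_add() does. They differ
only in which surrounding memory accesses may be reordered across them.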

The barriers:

  smp_mb__{before,after}_atomic()

only apply to the RMW ops and can be used to augment/upgrade the ordering
inherent to the atomic op used. These barriers provide a full smp_mb().

These helper barriers exist because architectures have varying implicit
ordering on their SMP atomic primitives. For example our TSO architectures
provide fully ordered atomics and these barriers are no-ops.

Thus:

  atomic_fetch_add();

is equivalent to:

  smp_mb__before_atomic();
  atomic_fetch_add_relaxed();
  smp_mb__after_atomic();

However the atomic_fetch_add() might be implemented more efficiently.

Further, while something like:

  smp_mb__before_atomic();
  atomic_dec(&X);

is a 'typical' RELEASE pattern, the barrier is strictly stronger than
a RELEASE. Similarly, something like:

  atomic_inc(&X);
  smp_mb__after_atomic();

is an ACQUIRE pattern (though very much not typical), but again the barrier is
strictly stronger than ACQUIRE. As illustrated:

  C strong-acquire

  {
  }

  P1(int *x, atomic_t *y)
  {
    r0 = READ_ONCE(*x);
    smp_rmb();
    r1 = atomic_read(y);
  }

  P2(int *x, atomic_t *y)
  {
    atomic_inc(y);
    smp_mb__after_atomic();
    WRITE_ONCE(*x, 1);
  }

  exists
  (r0=1 /\ r1=0)

This should not happen; but a hypothetical atomic_inc_acquire() --
(void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
since then:

  P1			P2

			t = LL.acq *y (0)
			t++;
			*x = 1;
  r0 = *x (1)
  RMB
  r1 = *y (0)
			SC *y, t;

is allowed.
