1======================================
2Sequence counters and sequential locks
3======================================
4
5Introduction
6============
7
8Sequence counters are a reader-writer consistency mechanism with
9lockless readers (read-only retry loops), and no writer starvation. They
10are used for data that's rarely written to (e.g. system time), where the
11reader wants a consistent set of information and is willing to retry if
12that information changes.
13
14A data set is consistent when the sequence count at the beginning of the
15read side critical section is even and the same sequence count value is
16read again at the end of the critical section. The data in the set must
17be copied out inside the read side critical section. If the sequence
18count has changed between the start and the end of the critical section,
19the reader must retry.
20
21Writers increment the sequence count at the start and the end of their
22critical section. After starting the critical section the sequence count
23is odd and indicates to the readers that an update is in progress. At
24the end of the write side critical section the sequence count becomes
25even again which lets readers make progress.
26
27A sequence counter write side critical section must never be preempted
28or interrupted by read side sections. Otherwise the reader will spin for
29the entire scheduler tick due to the odd sequence count value and the
30interrupted writer. If that reader belongs to a real-time scheduling
31class, it can spin forever and the kernel will livelock.
32
33This mechanism cannot be used if the protected data contains pointers,
34as the writer can invalidate a pointer that the reader is following.
35
36
37.. _seqcount_t:
38
39Sequence counters (``seqcount_t``)
40==================================
41
42This is the the raw counting mechanism, which does not protect against
43multiple writers.  Write side critical sections must thus be serialized
44by an external lock.
45
46If the write serialization primitive is not implicitly disabling
47preemption, preemption must be explicitly disabled before entering the
48write side section. If the read section can be invoked from hardirq or
49softirq contexts, interrupts or bottom halves must also be respectively
50disabled before entering the write section.
51
52If it's desired to automatically handle the sequence counter
53requirements of writer serialization and non-preemptibility, use
54:ref:`seqlock_t` instead.
55
56Initialization::
57
58	/* dynamic */
59	seqcount_t foo_seqcount;
60	seqcount_init(&foo_seqcount);
61
62	/* static */
63	static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
64
65	/* C99 struct init */
66	struct {
67		.seq   = SEQCNT_ZERO(foo.seq),
68	} foo;
69
70Write path::
71
72	/* Serialized context with disabled preemption */
73
74	write_seqcount_begin(&foo_seqcount);
75
76	/* ... [[write-side critical section]] ... */
77
78	write_seqcount_end(&foo_seqcount);
79
80Read path::
81
82	do {
83		seq = read_seqcount_begin(&foo_seqcount);
84
85		/* ... [[read-side critical section]] ... */
86
87	} while (read_seqcount_retry(&foo_seqcount, seq));
88
89
90.. _seqcount_locktype_t:
91
92Sequence counters with associated locks (``seqcount_LOCKTYPE_t``)
93-----------------------------------------------------------------
94
95As discussed at :ref:`seqcount_t`, sequence count write side critical
96sections must be serialized and non-preemptible. This variant of
97sequence counters associate the lock used for writer serialization at
98initialization time, which enables lockdep to validate that the write
99side critical sections are properly serialized.
100
101This lock association is a NOOP if lockdep is disabled and has neither
102storage nor runtime overhead. If lockdep is enabled, the lock pointer is
103stored in struct seqcount and lockdep's "lock is held" assertions are
104injected at the beginning of the write side critical section to validate
105that it is properly protected.
106
107For lock types which do not implicitly disable preemption, preemption
108protection is enforced in the write side function.
109
110The following sequence counters with associated locks are defined:
111
112  - ``seqcount_spinlock_t``
113  - ``seqcount_raw_spinlock_t``
114  - ``seqcount_rwlock_t``
115  - ``seqcount_mutex_t``
116  - ``seqcount_ww_mutex_t``
117
118The plain seqcount read and write APIs branch out to the specific
119seqcount_LOCKTYPE_t implementation at compile-time. This avoids kernel
120API explosion per each new seqcount LOCKTYPE.
121
122Initialization (replace "LOCKTYPE" with one of the supported locks)::
123
124	/* dynamic */
125	seqcount_LOCKTYPE_t foo_seqcount;
126	seqcount_LOCKTYPE_init(&foo_seqcount, &lock);
127
128	/* static */
129	static seqcount_LOCKTYPE_t foo_seqcount =
130		SEQCNT_LOCKTYPE_ZERO(foo_seqcount, &lock);
131
132	/* C99 struct init */
133	struct {
134		.seq   = SEQCNT_LOCKTYPE_ZERO(foo.seq, &lock),
135	} foo;
136
137Write path: same as in :ref:`seqcount_t`, while running from a context
138with the associated LOCKTYPE lock acquired.
139
140Read path: same as in :ref:`seqcount_t`.
141
142
143.. _seqcount_latch_t:
144
145Latch sequence counters (``seqcount_latch_t``)
146----------------------------------------------
147
148Latch sequence counters are a multiversion concurrency control mechanism
149where the embedded seqcount_t counter even/odd value is used to switch
150between two copies of protected data. This allows the sequence counter
151read path to safely interrupt its own write side critical section.
152
153Use seqcount_latch_t when the write side sections cannot be protected
154from interruption by readers. This is typically the case when the read
155side can be invoked from NMI handlers.
156
157Check `raw_write_seqcount_latch()` for more information.
158
159
160.. _seqlock_t:
161
162Sequential locks (``seqlock_t``)
163================================
164
165This contains the :ref:`seqcount_t` mechanism earlier discussed, plus an
166embedded spinlock for writer serialization and non-preemptibility.
167
168If the read side section can be invoked from hardirq or softirq context,
169use the write side function variants which disable interrupts or bottom
170halves respectively.
171
172Initialization::
173
174	/* dynamic */
175	seqlock_t foo_seqlock;
176	seqlock_init(&foo_seqlock);
177
178	/* static */
179	static DEFINE_SEQLOCK(foo_seqlock);
180
181	/* C99 struct init */
182	struct {
183		.seql   = __SEQLOCK_UNLOCKED(foo.seql)
184	} foo;
185
186Write path::
187
188	write_seqlock(&foo_seqlock);
189
190	/* ... [[write-side critical section]] ... */
191
192	write_sequnlock(&foo_seqlock);
193
194Read path, three categories:
195
1961. Normal Sequence readers which never block a writer but they must
197   retry if a writer is in progress by detecting change in the sequence
198   number.  Writers do not wait for a sequence reader::
199
200	do {
201		seq = read_seqbegin(&foo_seqlock);
202
203		/* ... [[read-side critical section]] ... */
204
205	} while (read_seqretry(&foo_seqlock, seq));
206
2072. Locking readers which will wait if a writer or another locking reader
208   is in progress. A locking reader in progress will also block a writer
209   from entering its critical section. This read lock is
210   exclusive. Unlike rwlock_t, only one locking reader can acquire it::
211
212	read_seqlock_excl(&foo_seqlock);
213
214	/* ... [[read-side critical section]] ... */
215
216	read_sequnlock_excl(&foo_seqlock);
217
2183. Conditional lockless reader (as in 1), or locking reader (as in 2),
219   according to a passed marker. This is used to avoid lockless readers
220   starvation (too much retry loops) in case of a sharp spike in write
221   activity. First, a lockless read is tried (even marker passed). If
222   that trial fails (odd sequence counter is returned, which is used as
223   the next iteration marker), the lockless read is transformed to a
224   full locking read and no retry loop is necessary::
225
226	/* marker; even initialization */
227	int seq = 0;
228	do {
229		read_seqbegin_or_lock(&foo_seqlock, &seq);
230
231		/* ... [[read-side critical section]] ... */
232
233	} while (need_seqretry(&foo_seqlock, seq));
234	done_seqretry(&foo_seqlock, seq);
235
236
237API documentation
238=================
239
240.. kernel-doc:: include/linux/seqlock.h
241