10d24f65eSAhmed S. Darwish======================================
20d24f65eSAhmed S. DarwishSequence counters and sequential locks
30d24f65eSAhmed S. Darwish======================================
40d24f65eSAhmed S. Darwish
50d24f65eSAhmed S. DarwishIntroduction
60d24f65eSAhmed S. Darwish============
70d24f65eSAhmed S. Darwish
80d24f65eSAhmed S. DarwishSequence counters are a reader-writer consistency mechanism with
90d24f65eSAhmed S. Darwishlockless readers (read-only retry loops), and no writer starvation. They
100d24f65eSAhmed S. Darwishare used for data that's rarely written to (e.g. system time), where the
110d24f65eSAhmed S. Darwishreader wants a consistent set of information and is willing to retry if
120d24f65eSAhmed S. Darwishthat information changes.
130d24f65eSAhmed S. Darwish
140d24f65eSAhmed S. DarwishA data set is consistent when the sequence count at the beginning of the
150d24f65eSAhmed S. Darwishread side critical section is even and the same sequence count value is
160d24f65eSAhmed S. Darwishread again at the end of the critical section. The data in the set must
170d24f65eSAhmed S. Darwishbe copied out inside the read side critical section. If the sequence
180d24f65eSAhmed S. Darwishcount has changed between the start and the end of the critical section,
190d24f65eSAhmed S. Darwishthe reader must retry.
200d24f65eSAhmed S. Darwish
210d24f65eSAhmed S. DarwishWriters increment the sequence count at the start and the end of their
220d24f65eSAhmed S. Darwishcritical section. After starting the critical section the sequence count
230d24f65eSAhmed S. Darwishis odd and indicates to the readers that an update is in progress. At
240d24f65eSAhmed S. Darwishthe end of the write side critical section the sequence count becomes
250d24f65eSAhmed S. Darwisheven again which lets readers make progress.
260d24f65eSAhmed S. Darwish
270d24f65eSAhmed S. DarwishA sequence counter write side critical section must never be preempted
280d24f65eSAhmed S. Darwishor interrupted by read side sections. Otherwise the reader will spin for
290d24f65eSAhmed S. Darwishthe entire scheduler tick due to the odd sequence count value and the
300d24f65eSAhmed S. Darwishinterrupted writer. If that reader belongs to a real-time scheduling
310d24f65eSAhmed S. Darwishclass, it can spin forever and the kernel will livelock.
320d24f65eSAhmed S. Darwish
330d24f65eSAhmed S. DarwishThis mechanism cannot be used if the protected data contains pointers,
340d24f65eSAhmed S. Darwishas the writer can invalidate a pointer that the reader is following.
350d24f65eSAhmed S. Darwish
360d24f65eSAhmed S. Darwish
370d24f65eSAhmed S. Darwish.. _seqcount_t:
380d24f65eSAhmed S. Darwish
390d24f65eSAhmed S. DarwishSequence counters (``seqcount_t``)
400d24f65eSAhmed S. Darwish==================================
410d24f65eSAhmed S. Darwish
42*d2bef8e1SAkhil RajThis is the raw counting mechanism, which does not protect against
430d24f65eSAhmed S. Darwishmultiple writers.  Write side critical sections must thus be serialized
440d24f65eSAhmed S. Darwishby an external lock.
450d24f65eSAhmed S. Darwish
460d24f65eSAhmed S. DarwishIf the write serialization primitive is not implicitly disabling
470d24f65eSAhmed S. Darwishpreemption, preemption must be explicitly disabled before entering the
480d24f65eSAhmed S. Darwishwrite side section. If the read section can be invoked from hardirq or
490d24f65eSAhmed S. Darwishsoftirq contexts, interrupts or bottom halves must also be respectively
500d24f65eSAhmed S. Darwishdisabled before entering the write section.
510d24f65eSAhmed S. Darwish
520d24f65eSAhmed S. DarwishIf it's desired to automatically handle the sequence counter
530d24f65eSAhmed S. Darwishrequirements of writer serialization and non-preemptibility, use
540d24f65eSAhmed S. Darwish:ref:`seqlock_t` instead.
550d24f65eSAhmed S. Darwish
560d24f65eSAhmed S. DarwishInitialization::
570d24f65eSAhmed S. Darwish
580d24f65eSAhmed S. Darwish	/* dynamic */
590d24f65eSAhmed S. Darwish	seqcount_t foo_seqcount;
600d24f65eSAhmed S. Darwish	seqcount_init(&foo_seqcount);
610d24f65eSAhmed S. Darwish
620d24f65eSAhmed S. Darwish	/* static */
630d24f65eSAhmed S. Darwish	static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
640d24f65eSAhmed S. Darwish
650d24f65eSAhmed S. Darwish	/* C99 struct init */
660d24f65eSAhmed S. Darwish	struct {
670d24f65eSAhmed S. Darwish		.seq   = SEQCNT_ZERO(foo.seq),
680d24f65eSAhmed S. Darwish	} foo;
690d24f65eSAhmed S. Darwish
700d24f65eSAhmed S. DarwishWrite path::
710d24f65eSAhmed S. Darwish
720d24f65eSAhmed S. Darwish	/* Serialized context with disabled preemption */
730d24f65eSAhmed S. Darwish
740d24f65eSAhmed S. Darwish	write_seqcount_begin(&foo_seqcount);
750d24f65eSAhmed S. Darwish
760d24f65eSAhmed S. Darwish	/* ... [[write-side critical section]] ... */
770d24f65eSAhmed S. Darwish
780d24f65eSAhmed S. Darwish	write_seqcount_end(&foo_seqcount);
790d24f65eSAhmed S. Darwish
800d24f65eSAhmed S. DarwishRead path::
810d24f65eSAhmed S. Darwish
820d24f65eSAhmed S. Darwish	do {
830d24f65eSAhmed S. Darwish		seq = read_seqcount_begin(&foo_seqcount);
840d24f65eSAhmed S. Darwish
850d24f65eSAhmed S. Darwish		/* ... [[read-side critical section]] ... */
860d24f65eSAhmed S. Darwish
870d24f65eSAhmed S. Darwish	} while (read_seqcount_retry(&foo_seqcount, seq));
880d24f65eSAhmed S. Darwish
890d24f65eSAhmed S. Darwish
9055f3560dSAhmed S. Darwish.. _seqcount_locktype_t:
9155f3560dSAhmed S. Darwish
92cf486472SAhmed S. DarwishSequence counters with associated locks (``seqcount_LOCKNAME_t``)
9355f3560dSAhmed S. Darwish-----------------------------------------------------------------
9455f3560dSAhmed S. Darwish
9555f3560dSAhmed S. DarwishAs discussed at :ref:`seqcount_t`, sequence count write side critical
9655f3560dSAhmed S. Darwishsections must be serialized and non-preemptible. This variant of
9755f3560dSAhmed S. Darwishsequence counters associate the lock used for writer serialization at
9855f3560dSAhmed S. Darwishinitialization time, which enables lockdep to validate that the write
9955f3560dSAhmed S. Darwishside critical sections are properly serialized.
10055f3560dSAhmed S. Darwish
10155f3560dSAhmed S. DarwishThis lock association is a NOOP if lockdep is disabled and has neither
10255f3560dSAhmed S. Darwishstorage nor runtime overhead. If lockdep is enabled, the lock pointer is
10355f3560dSAhmed S. Darwishstored in struct seqcount and lockdep's "lock is held" assertions are
10455f3560dSAhmed S. Darwishinjected at the beginning of the write side critical section to validate
10555f3560dSAhmed S. Darwishthat it is properly protected.
10655f3560dSAhmed S. Darwish
10755f3560dSAhmed S. DarwishFor lock types which do not implicitly disable preemption, preemption
10855f3560dSAhmed S. Darwishprotection is enforced in the write side function.
10955f3560dSAhmed S. Darwish
11055f3560dSAhmed S. DarwishThe following sequence counters with associated locks are defined:
11155f3560dSAhmed S. Darwish
11255f3560dSAhmed S. Darwish  - ``seqcount_spinlock_t``
11355f3560dSAhmed S. Darwish  - ``seqcount_raw_spinlock_t``
11455f3560dSAhmed S. Darwish  - ``seqcount_rwlock_t``
11555f3560dSAhmed S. Darwish  - ``seqcount_mutex_t``
11655f3560dSAhmed S. Darwish  - ``seqcount_ww_mutex_t``
11755f3560dSAhmed S. Darwish
118cf486472SAhmed S. DarwishThe sequence counter read and write APIs can take either a plain
119cf486472SAhmed S. Darwishseqcount_t or any of the seqcount_LOCKNAME_t variants above.
12055f3560dSAhmed S. Darwish
121cf486472SAhmed S. DarwishInitialization (replace "LOCKNAME" with one of the supported locks)::
12255f3560dSAhmed S. Darwish
12355f3560dSAhmed S. Darwish	/* dynamic */
124cf486472SAhmed S. Darwish	seqcount_LOCKNAME_t foo_seqcount;
125cf486472SAhmed S. Darwish	seqcount_LOCKNAME_init(&foo_seqcount, &lock);
12655f3560dSAhmed S. Darwish
12755f3560dSAhmed S. Darwish	/* static */
128cf486472SAhmed S. Darwish	static seqcount_LOCKNAME_t foo_seqcount =
129cf486472SAhmed S. Darwish		SEQCNT_LOCKNAME_ZERO(foo_seqcount, &lock);
13055f3560dSAhmed S. Darwish
13155f3560dSAhmed S. Darwish	/* C99 struct init */
13255f3560dSAhmed S. Darwish	struct {
133cf486472SAhmed S. Darwish		.seq   = SEQCNT_LOCKNAME_ZERO(foo.seq, &lock),
13455f3560dSAhmed S. Darwish	} foo;
13555f3560dSAhmed S. Darwish
13655f3560dSAhmed S. DarwishWrite path: same as in :ref:`seqcount_t`, while running from a context
137cf486472SAhmed S. Darwishwith the associated write serialization lock acquired.
13855f3560dSAhmed S. Darwish
13955f3560dSAhmed S. DarwishRead path: same as in :ref:`seqcount_t`.
14055f3560dSAhmed S. Darwish
14180793c34SAhmed S. Darwish
14280793c34SAhmed S. Darwish.. _seqcount_latch_t:
14380793c34SAhmed S. Darwish
14480793c34SAhmed S. DarwishLatch sequence counters (``seqcount_latch_t``)
14580793c34SAhmed S. Darwish----------------------------------------------
14680793c34SAhmed S. Darwish
14780793c34SAhmed S. DarwishLatch sequence counters are a multiversion concurrency control mechanism
14880793c34SAhmed S. Darwishwhere the embedded seqcount_t counter even/odd value is used to switch
14980793c34SAhmed S. Darwishbetween two copies of protected data. This allows the sequence counter
15080793c34SAhmed S. Darwishread path to safely interrupt its own write side critical section.
15180793c34SAhmed S. Darwish
15280793c34SAhmed S. DarwishUse seqcount_latch_t when the write side sections cannot be protected
15380793c34SAhmed S. Darwishfrom interruption by readers. This is typically the case when the read
15480793c34SAhmed S. Darwishside can be invoked from NMI handlers.
15580793c34SAhmed S. Darwish
15680793c34SAhmed S. DarwishCheck `raw_write_seqcount_latch()` for more information.
15780793c34SAhmed S. Darwish
15880793c34SAhmed S. Darwish
1590d24f65eSAhmed S. Darwish.. _seqlock_t:
1600d24f65eSAhmed S. Darwish
1610d24f65eSAhmed S. DarwishSequential locks (``seqlock_t``)
1620d24f65eSAhmed S. Darwish================================
1630d24f65eSAhmed S. Darwish
1640d24f65eSAhmed S. DarwishThis contains the :ref:`seqcount_t` mechanism earlier discussed, plus an
1650d24f65eSAhmed S. Darwishembedded spinlock for writer serialization and non-preemptibility.
1660d24f65eSAhmed S. Darwish
1670d24f65eSAhmed S. DarwishIf the read side section can be invoked from hardirq or softirq context,
1680d24f65eSAhmed S. Darwishuse the write side function variants which disable interrupts or bottom
1690d24f65eSAhmed S. Darwishhalves respectively.
1700d24f65eSAhmed S. Darwish
1710d24f65eSAhmed S. DarwishInitialization::
1720d24f65eSAhmed S. Darwish
1730d24f65eSAhmed S. Darwish	/* dynamic */
1740d24f65eSAhmed S. Darwish	seqlock_t foo_seqlock;
1750d24f65eSAhmed S. Darwish	seqlock_init(&foo_seqlock);
1760d24f65eSAhmed S. Darwish
1770d24f65eSAhmed S. Darwish	/* static */
1780d24f65eSAhmed S. Darwish	static DEFINE_SEQLOCK(foo_seqlock);
1790d24f65eSAhmed S. Darwish
1800d24f65eSAhmed S. Darwish	/* C99 struct init */
1810d24f65eSAhmed S. Darwish	struct {
1820d24f65eSAhmed S. Darwish		.seql   = __SEQLOCK_UNLOCKED(foo.seql)
1830d24f65eSAhmed S. Darwish	} foo;
1840d24f65eSAhmed S. Darwish
1850d24f65eSAhmed S. DarwishWrite path::
1860d24f65eSAhmed S. Darwish
1870d24f65eSAhmed S. Darwish	write_seqlock(&foo_seqlock);
1880d24f65eSAhmed S. Darwish
1890d24f65eSAhmed S. Darwish	/* ... [[write-side critical section]] ... */
1900d24f65eSAhmed S. Darwish
1910d24f65eSAhmed S. Darwish	write_sequnlock(&foo_seqlock);
1920d24f65eSAhmed S. Darwish
1930d24f65eSAhmed S. DarwishRead path, three categories:
1940d24f65eSAhmed S. Darwish
1950d24f65eSAhmed S. Darwish1. Normal Sequence readers which never block a writer but they must
1960d24f65eSAhmed S. Darwish   retry if a writer is in progress by detecting change in the sequence
1970d24f65eSAhmed S. Darwish   number.  Writers do not wait for a sequence reader::
1980d24f65eSAhmed S. Darwish
1990d24f65eSAhmed S. Darwish	do {
2000d24f65eSAhmed S. Darwish		seq = read_seqbegin(&foo_seqlock);
2010d24f65eSAhmed S. Darwish
2020d24f65eSAhmed S. Darwish		/* ... [[read-side critical section]] ... */
2030d24f65eSAhmed S. Darwish
2040d24f65eSAhmed S. Darwish	} while (read_seqretry(&foo_seqlock, seq));
2050d24f65eSAhmed S. Darwish
2060d24f65eSAhmed S. Darwish2. Locking readers which will wait if a writer or another locking reader
2070d24f65eSAhmed S. Darwish   is in progress. A locking reader in progress will also block a writer
2080d24f65eSAhmed S. Darwish   from entering its critical section. This read lock is
2090d24f65eSAhmed S. Darwish   exclusive. Unlike rwlock_t, only one locking reader can acquire it::
2100d24f65eSAhmed S. Darwish
2110d24f65eSAhmed S. Darwish	read_seqlock_excl(&foo_seqlock);
2120d24f65eSAhmed S. Darwish
2130d24f65eSAhmed S. Darwish	/* ... [[read-side critical section]] ... */
2140d24f65eSAhmed S. Darwish
2150d24f65eSAhmed S. Darwish	read_sequnlock_excl(&foo_seqlock);
2160d24f65eSAhmed S. Darwish
2170d24f65eSAhmed S. Darwish3. Conditional lockless reader (as in 1), or locking reader (as in 2),
2180d24f65eSAhmed S. Darwish   according to a passed marker. This is used to avoid lockless readers
2190d24f65eSAhmed S. Darwish   starvation (too much retry loops) in case of a sharp spike in write
2200d24f65eSAhmed S. Darwish   activity. First, a lockless read is tried (even marker passed). If
2210d24f65eSAhmed S. Darwish   that trial fails (odd sequence counter is returned, which is used as
2220d24f65eSAhmed S. Darwish   the next iteration marker), the lockless read is transformed to a
2230d24f65eSAhmed S. Darwish   full locking read and no retry loop is necessary::
2240d24f65eSAhmed S. Darwish
2250d24f65eSAhmed S. Darwish	/* marker; even initialization */
2260d24f65eSAhmed S. Darwish	int seq = 0;
2270d24f65eSAhmed S. Darwish	do {
2280d24f65eSAhmed S. Darwish		read_seqbegin_or_lock(&foo_seqlock, &seq);
2290d24f65eSAhmed S. Darwish
2300d24f65eSAhmed S. Darwish		/* ... [[read-side critical section]] ... */
2310d24f65eSAhmed S. Darwish
2320d24f65eSAhmed S. Darwish	} while (need_seqretry(&foo_seqlock, seq));
2330d24f65eSAhmed S. Darwish	done_seqretry(&foo_seqlock, seq);
2340d24f65eSAhmed S. Darwish
2350d24f65eSAhmed S. Darwish
2360d24f65eSAhmed S. DarwishAPI documentation
2370d24f65eSAhmed S. Darwish=================
2380d24f65eSAhmed S. Darwish
2390d24f65eSAhmed S. Darwish.. kernel-doc:: include/linux/seqlock.h
240