10d24f65eSAhmed S. Darwish====================================== 20d24f65eSAhmed S. DarwishSequence counters and sequential locks 30d24f65eSAhmed S. Darwish====================================== 40d24f65eSAhmed S. Darwish 50d24f65eSAhmed S. DarwishIntroduction 60d24f65eSAhmed S. Darwish============ 70d24f65eSAhmed S. Darwish 80d24f65eSAhmed S. DarwishSequence counters are a reader-writer consistency mechanism with 90d24f65eSAhmed S. Darwishlockless readers (read-only retry loops), and no writer starvation. They 100d24f65eSAhmed S. Darwishare used for data that's rarely written to (e.g. system time), where the 110d24f65eSAhmed S. Darwishreader wants a consistent set of information and is willing to retry if 120d24f65eSAhmed S. Darwishthat information changes. 130d24f65eSAhmed S. Darwish 140d24f65eSAhmed S. DarwishA data set is consistent when the sequence count at the beginning of the 150d24f65eSAhmed S. Darwishread side critical section is even and the same sequence count value is 160d24f65eSAhmed S. Darwishread again at the end of the critical section. The data in the set must 170d24f65eSAhmed S. Darwishbe copied out inside the read side critical section. If the sequence 180d24f65eSAhmed S. Darwishcount has changed between the start and the end of the critical section, 190d24f65eSAhmed S. Darwishthe reader must retry. 200d24f65eSAhmed S. Darwish 210d24f65eSAhmed S. DarwishWriters increment the sequence count at the start and the end of their 220d24f65eSAhmed S. Darwishcritical section. After starting the critical section the sequence count 230d24f65eSAhmed S. Darwishis odd and indicates to the readers that an update is in progress. At 240d24f65eSAhmed S. Darwishthe end of the write side critical section the sequence count becomes 250d24f65eSAhmed S. Darwisheven again which lets readers make progress. 260d24f65eSAhmed S. Darwish 270d24f65eSAhmed S. DarwishA sequence counter write side critical section must never be preempted 280d24f65eSAhmed S. Darwishor interrupted by read side sections. Otherwise the reader will spin for 290d24f65eSAhmed S. Darwishthe entire scheduler tick due to the odd sequence count value and the 300d24f65eSAhmed S. Darwishinterrupted writer. If that reader belongs to a real-time scheduling 310d24f65eSAhmed S. Darwishclass, it can spin forever and the kernel will livelock. 320d24f65eSAhmed S. Darwish 330d24f65eSAhmed S. DarwishThis mechanism cannot be used if the protected data contains pointers, 340d24f65eSAhmed S. Darwishas the writer can invalidate a pointer that the reader is following. 350d24f65eSAhmed S. Darwish 360d24f65eSAhmed S. Darwish 370d24f65eSAhmed S. Darwish.. _seqcount_t: 380d24f65eSAhmed S. Darwish 390d24f65eSAhmed S. DarwishSequence counters (``seqcount_t``) 400d24f65eSAhmed S. Darwish================================== 410d24f65eSAhmed S. Darwish 42d2bef8e1SAkhil RajThis is the raw counting mechanism, which does not protect against 430d24f65eSAhmed S. Darwishmultiple writers. Write side critical sections must thus be serialized 440d24f65eSAhmed S. Darwishby an external lock. 450d24f65eSAhmed S. Darwish 460d24f65eSAhmed S. DarwishIf the write serialization primitive is not implicitly disabling 470d24f65eSAhmed S. Darwishpreemption, preemption must be explicitly disabled before entering the 480d24f65eSAhmed S. Darwishwrite side section. If the read section can be invoked from hardirq or 490d24f65eSAhmed S. Darwishsoftirq contexts, interrupts or bottom halves must also be respectively 500d24f65eSAhmed S. Darwishdisabled before entering the write section. 510d24f65eSAhmed S. Darwish 520d24f65eSAhmed S. DarwishIf it's desired to automatically handle the sequence counter 530d24f65eSAhmed S. Darwishrequirements of writer serialization and non-preemptibility, use 540d24f65eSAhmed S. Darwish:ref:`seqlock_t` instead. 550d24f65eSAhmed S. Darwish 560d24f65eSAhmed S. DarwishInitialization:: 570d24f65eSAhmed S. Darwish 580d24f65eSAhmed S. Darwish /* dynamic */ 590d24f65eSAhmed S. Darwish seqcount_t foo_seqcount; 600d24f65eSAhmed S. Darwish seqcount_init(&foo_seqcount); 610d24f65eSAhmed S. Darwish 620d24f65eSAhmed S. Darwish /* static */ 630d24f65eSAhmed S. Darwish static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount); 640d24f65eSAhmed S. Darwish 650d24f65eSAhmed S. Darwish /* C99 struct init */ 660d24f65eSAhmed S. Darwish struct { 670d24f65eSAhmed S. Darwish .seq = SEQCNT_ZERO(foo.seq), 680d24f65eSAhmed S. Darwish } foo; 690d24f65eSAhmed S. Darwish 700d24f65eSAhmed S. DarwishWrite path:: 710d24f65eSAhmed S. Darwish 720d24f65eSAhmed S. Darwish /* Serialized context with disabled preemption */ 730d24f65eSAhmed S. Darwish 740d24f65eSAhmed S. Darwish write_seqcount_begin(&foo_seqcount); 750d24f65eSAhmed S. Darwish 760d24f65eSAhmed S. Darwish /* ... [[write-side critical section]] ... */ 770d24f65eSAhmed S. Darwish 780d24f65eSAhmed S. Darwish write_seqcount_end(&foo_seqcount); 790d24f65eSAhmed S. Darwish 800d24f65eSAhmed S. DarwishRead path:: 810d24f65eSAhmed S. Darwish 820d24f65eSAhmed S. Darwish do { 830d24f65eSAhmed S. Darwish seq = read_seqcount_begin(&foo_seqcount); 840d24f65eSAhmed S. Darwish 850d24f65eSAhmed S. Darwish /* ... [[read-side critical section]] ... */ 860d24f65eSAhmed S. Darwish 870d24f65eSAhmed S. Darwish } while (read_seqcount_retry(&foo_seqcount, seq)); 880d24f65eSAhmed S. Darwish 890d24f65eSAhmed S. Darwish 9055f3560dSAhmed S. Darwish.. _seqcount_locktype_t: 9155f3560dSAhmed S. Darwish 92cf486472SAhmed S. DarwishSequence counters with associated locks (``seqcount_LOCKNAME_t``) 9355f3560dSAhmed S. Darwish----------------------------------------------------------------- 9455f3560dSAhmed S. Darwish 9555f3560dSAhmed S. DarwishAs discussed at :ref:`seqcount_t`, sequence count write side critical 9655f3560dSAhmed S. Darwishsections must be serialized and non-preemptible. This variant of 9755f3560dSAhmed S. Darwishsequence counters associate the lock used for writer serialization at 9855f3560dSAhmed S. Darwishinitialization time, which enables lockdep to validate that the write 9955f3560dSAhmed S. Darwishside critical sections are properly serialized. 10055f3560dSAhmed S. Darwish 10155f3560dSAhmed S. DarwishThis lock association is a NOOP if lockdep is disabled and has neither 10255f3560dSAhmed S. Darwishstorage nor runtime overhead. If lockdep is enabled, the lock pointer is 10355f3560dSAhmed S. Darwishstored in struct seqcount and lockdep's "lock is held" assertions are 10455f3560dSAhmed S. Darwishinjected at the beginning of the write side critical section to validate 10555f3560dSAhmed S. Darwishthat it is properly protected. 10655f3560dSAhmed S. Darwish 10755f3560dSAhmed S. DarwishFor lock types which do not implicitly disable preemption, preemption 10855f3560dSAhmed S. Darwishprotection is enforced in the write side function. 10955f3560dSAhmed S. Darwish 11055f3560dSAhmed S. DarwishThe following sequence counters with associated locks are defined: 11155f3560dSAhmed S. Darwish 11255f3560dSAhmed S. Darwish - ``seqcount_spinlock_t`` 11355f3560dSAhmed S. Darwish - ``seqcount_raw_spinlock_t`` 11455f3560dSAhmed S. Darwish - ``seqcount_rwlock_t`` 11555f3560dSAhmed S. Darwish - ``seqcount_mutex_t`` 11655f3560dSAhmed S. Darwish - ``seqcount_ww_mutex_t`` 11755f3560dSAhmed S. Darwish 118cf486472SAhmed S. DarwishThe sequence counter read and write APIs can take either a plain 119cf486472SAhmed S. Darwishseqcount_t or any of the seqcount_LOCKNAME_t variants above. 12055f3560dSAhmed S. Darwish 121cf486472SAhmed S. DarwishInitialization (replace "LOCKNAME" with one of the supported locks):: 12255f3560dSAhmed S. Darwish 12355f3560dSAhmed S. Darwish /* dynamic */ 124cf486472SAhmed S. Darwish seqcount_LOCKNAME_t foo_seqcount; 125cf486472SAhmed S. Darwish seqcount_LOCKNAME_init(&foo_seqcount, &lock); 12655f3560dSAhmed S. Darwish 12755f3560dSAhmed S. Darwish /* static */ 128cf486472SAhmed S. Darwish static seqcount_LOCKNAME_t foo_seqcount = 129cf486472SAhmed S. Darwish SEQCNT_LOCKNAME_ZERO(foo_seqcount, &lock); 13055f3560dSAhmed S. Darwish 13155f3560dSAhmed S. Darwish /* C99 struct init */ 13255f3560dSAhmed S. Darwish struct { 133cf486472SAhmed S. Darwish .seq = SEQCNT_LOCKNAME_ZERO(foo.seq, &lock), 13455f3560dSAhmed S. Darwish } foo; 13555f3560dSAhmed S. Darwish 13655f3560dSAhmed S. DarwishWrite path: same as in :ref:`seqcount_t`, while running from a context 137cf486472SAhmed S. Darwishwith the associated write serialization lock acquired. 13855f3560dSAhmed S. Darwish 13955f3560dSAhmed S. DarwishRead path: same as in :ref:`seqcount_t`. 14055f3560dSAhmed S. Darwish 14180793c34SAhmed S. Darwish 14280793c34SAhmed S. Darwish.. _seqcount_latch_t: 14380793c34SAhmed S. Darwish 14480793c34SAhmed S. DarwishLatch sequence counters (``seqcount_latch_t``) 14580793c34SAhmed S. Darwish---------------------------------------------- 14680793c34SAhmed S. Darwish 14780793c34SAhmed S. DarwishLatch sequence counters are a multiversion concurrency control mechanism 14880793c34SAhmed S. Darwishwhere the embedded seqcount_t counter even/odd value is used to switch 14980793c34SAhmed S. Darwishbetween two copies of protected data. This allows the sequence counter 15080793c34SAhmed S. Darwishread path to safely interrupt its own write side critical section. 15180793c34SAhmed S. Darwish 15280793c34SAhmed S. DarwishUse seqcount_latch_t when the write side sections cannot be protected 15380793c34SAhmed S. Darwishfrom interruption by readers. This is typically the case when the read 15480793c34SAhmed S. Darwishside can be invoked from NMI handlers. 15580793c34SAhmed S. Darwish 156*5b12a7e7SMarco ElverCheck `write_seqcount_latch()` for more information. 15780793c34SAhmed S. Darwish 15880793c34SAhmed S. Darwish 1590d24f65eSAhmed S. Darwish.. _seqlock_t: 1600d24f65eSAhmed S. Darwish 1610d24f65eSAhmed S. DarwishSequential locks (``seqlock_t``) 1620d24f65eSAhmed S. Darwish================================ 1630d24f65eSAhmed S. Darwish 1640d24f65eSAhmed S. DarwishThis contains the :ref:`seqcount_t` mechanism earlier discussed, plus an 1650d24f65eSAhmed S. Darwishembedded spinlock for writer serialization and non-preemptibility. 1660d24f65eSAhmed S. Darwish 1670d24f65eSAhmed S. DarwishIf the read side section can be invoked from hardirq or softirq context, 1680d24f65eSAhmed S. Darwishuse the write side function variants which disable interrupts or bottom 1690d24f65eSAhmed S. Darwishhalves respectively. 1700d24f65eSAhmed S. Darwish 1710d24f65eSAhmed S. DarwishInitialization:: 1720d24f65eSAhmed S. Darwish 1730d24f65eSAhmed S. Darwish /* dynamic */ 1740d24f65eSAhmed S. Darwish seqlock_t foo_seqlock; 1750d24f65eSAhmed S. Darwish seqlock_init(&foo_seqlock); 1760d24f65eSAhmed S. Darwish 1770d24f65eSAhmed S. Darwish /* static */ 1780d24f65eSAhmed S. Darwish static DEFINE_SEQLOCK(foo_seqlock); 1790d24f65eSAhmed S. Darwish 1800d24f65eSAhmed S. Darwish /* C99 struct init */ 1810d24f65eSAhmed S. Darwish struct { 1820d24f65eSAhmed S. Darwish .seql = __SEQLOCK_UNLOCKED(foo.seql) 1830d24f65eSAhmed S. Darwish } foo; 1840d24f65eSAhmed S. Darwish 1850d24f65eSAhmed S. DarwishWrite path:: 1860d24f65eSAhmed S. Darwish 1870d24f65eSAhmed S. Darwish write_seqlock(&foo_seqlock); 1880d24f65eSAhmed S. Darwish 1890d24f65eSAhmed S. Darwish /* ... [[write-side critical section]] ... */ 1900d24f65eSAhmed S. Darwish 1910d24f65eSAhmed S. Darwish write_sequnlock(&foo_seqlock); 1920d24f65eSAhmed S. Darwish 1930d24f65eSAhmed S. DarwishRead path, three categories: 1940d24f65eSAhmed S. Darwish 1950d24f65eSAhmed S. Darwish1. Normal Sequence readers which never block a writer but they must 1960d24f65eSAhmed S. Darwish retry if a writer is in progress by detecting change in the sequence 1970d24f65eSAhmed S. Darwish number. Writers do not wait for a sequence reader:: 1980d24f65eSAhmed S. Darwish 1990d24f65eSAhmed S. Darwish do { 2000d24f65eSAhmed S. Darwish seq = read_seqbegin(&foo_seqlock); 2010d24f65eSAhmed S. Darwish 2020d24f65eSAhmed S. Darwish /* ... [[read-side critical section]] ... */ 2030d24f65eSAhmed S. Darwish 2040d24f65eSAhmed S. Darwish } while (read_seqretry(&foo_seqlock, seq)); 2050d24f65eSAhmed S. Darwish 2060d24f65eSAhmed S. Darwish2. Locking readers which will wait if a writer or another locking reader 2070d24f65eSAhmed S. Darwish is in progress. A locking reader in progress will also block a writer 2080d24f65eSAhmed S. Darwish from entering its critical section. This read lock is 2090d24f65eSAhmed S. Darwish exclusive. Unlike rwlock_t, only one locking reader can acquire it:: 2100d24f65eSAhmed S. Darwish 2110d24f65eSAhmed S. Darwish read_seqlock_excl(&foo_seqlock); 2120d24f65eSAhmed S. Darwish 2130d24f65eSAhmed S. Darwish /* ... [[read-side critical section]] ... */ 2140d24f65eSAhmed S. Darwish 2150d24f65eSAhmed S. Darwish read_sequnlock_excl(&foo_seqlock); 2160d24f65eSAhmed S. Darwish 2170d24f65eSAhmed S. Darwish3. Conditional lockless reader (as in 1), or locking reader (as in 2), 2180d24f65eSAhmed S. Darwish according to a passed marker. This is used to avoid lockless readers 2190d24f65eSAhmed S. Darwish starvation (too much retry loops) in case of a sharp spike in write 2200d24f65eSAhmed S. Darwish activity. First, a lockless read is tried (even marker passed). If 2210d24f65eSAhmed S. Darwish that trial fails (odd sequence counter is returned, which is used as 2220d24f65eSAhmed S. Darwish the next iteration marker), the lockless read is transformed to a 2230d24f65eSAhmed S. Darwish full locking read and no retry loop is necessary:: 2240d24f65eSAhmed S. Darwish 2250d24f65eSAhmed S. Darwish /* marker; even initialization */ 2260d24f65eSAhmed S. Darwish int seq = 0; 2270d24f65eSAhmed S. Darwish do { 2280d24f65eSAhmed S. Darwish read_seqbegin_or_lock(&foo_seqlock, &seq); 2290d24f65eSAhmed S. Darwish 2300d24f65eSAhmed S. Darwish /* ... [[read-side critical section]] ... */ 2310d24f65eSAhmed S. Darwish 2320d24f65eSAhmed S. Darwish } while (need_seqretry(&foo_seqlock, seq)); 2330d24f65eSAhmed S. Darwish done_seqretry(&foo_seqlock, seq); 2340d24f65eSAhmed S. Darwish 2350d24f65eSAhmed S. Darwish 2360d24f65eSAhmed S. DarwishAPI documentation 2370d24f65eSAhmed S. Darwish================= 2380d24f65eSAhmed S. Darwish 2390d24f65eSAhmed S. Darwish.. kernel-doc:: include/linux/seqlock.h 240