xref: /openbmc/linux/Documentation/trace/ring-buffer-design.rst (revision cbecf716ca618fd44feda6bd9a64a8179d031fc5)
1*e0bc9cf0SMauro Carvalho Chehab.. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-only
2f00c313bSMauro Carvalho Chehab
3f00c313bSMauro Carvalho Chehab===========================
4f00c313bSMauro Carvalho ChehabLockless Ring Buffer Design
5f00c313bSMauro Carvalho Chehab===========================
6f00c313bSMauro Carvalho Chehab
7f00c313bSMauro Carvalho ChehabCopyright 2009 Red Hat Inc.
8f00c313bSMauro Carvalho Chehab
9f00c313bSMauro Carvalho Chehab:Author:   Steven Rostedt <srostedt@redhat.com>
10f00c313bSMauro Carvalho Chehab:License:  The GNU Free Documentation License, Version 1.2
11f00c313bSMauro Carvalho Chehab           (dual licensed under the GPL v2)
12f00c313bSMauro Carvalho Chehab:Reviewers:  Mathieu Desnoyers, Huang Ying, Hidetoshi Seto,
13f00c313bSMauro Carvalho Chehab	     and Frederic Weisbecker.
14f00c313bSMauro Carvalho Chehab
15f00c313bSMauro Carvalho Chehab
16f00c313bSMauro Carvalho ChehabWritten for: 2.6.31
17f00c313bSMauro Carvalho Chehab
18f00c313bSMauro Carvalho ChehabTerminology used in this Document
19f00c313bSMauro Carvalho Chehab---------------------------------
20f00c313bSMauro Carvalho Chehab
21f00c313bSMauro Carvalho Chehabtail
22f00c313bSMauro Carvalho Chehab	- where new writes happen in the ring buffer.
23f00c313bSMauro Carvalho Chehab
24f00c313bSMauro Carvalho Chehabhead
25f00c313bSMauro Carvalho Chehab	- where new reads happen in the ring buffer.
26f00c313bSMauro Carvalho Chehab
27f00c313bSMauro Carvalho Chehabproducer
28f00c313bSMauro Carvalho Chehab	- the task that writes into the ring buffer (same as writer)
29f00c313bSMauro Carvalho Chehab
30f00c313bSMauro Carvalho Chehabwriter
31f00c313bSMauro Carvalho Chehab	- same as producer
32f00c313bSMauro Carvalho Chehab
33f00c313bSMauro Carvalho Chehabconsumer
34f00c313bSMauro Carvalho Chehab	- the task that reads from the buffer (same as reader)
35f00c313bSMauro Carvalho Chehab
36f00c313bSMauro Carvalho Chehabreader
37f00c313bSMauro Carvalho Chehab	- same as consumer.
38f00c313bSMauro Carvalho Chehab
39f00c313bSMauro Carvalho Chehabreader_page
40f00c313bSMauro Carvalho Chehab	- A page outside the ring buffer used solely (for the most part)
41f00c313bSMauro Carvalho Chehab	  by the reader.
42f00c313bSMauro Carvalho Chehab
43f00c313bSMauro Carvalho Chehabhead_page
44f00c313bSMauro Carvalho Chehab	- a pointer to the page that the reader will use next
45f00c313bSMauro Carvalho Chehab
46f00c313bSMauro Carvalho Chehabtail_page
47f00c313bSMauro Carvalho Chehab	- a pointer to the page that will be written to next
48f00c313bSMauro Carvalho Chehab
49f00c313bSMauro Carvalho Chehabcommit_page
50f00c313bSMauro Carvalho Chehab	- a pointer to the page with the last finished non-nested write.
51f00c313bSMauro Carvalho Chehab
52f00c313bSMauro Carvalho Chehabcmpxchg
53f00c313bSMauro Carvalho Chehab	- hardware-assisted atomic transaction that performs the following::
54f00c313bSMauro Carvalho Chehab
55f00c313bSMauro Carvalho Chehab	    A = B if previous A == C
56f00c313bSMauro Carvalho Chehab
57f00c313bSMauro Carvalho Chehab	    R = cmpxchg(A, C, B) is saying that we replace A with B if and only
58f00c313bSMauro Carvalho Chehab		if current A is equal to C, and we put the old (current)
59f00c313bSMauro Carvalho Chehab		A into R
60f00c313bSMauro Carvalho Chehab
61f00c313bSMauro Carvalho Chehab	    R gets the previous A regardless if A is updated with B or not.
62f00c313bSMauro Carvalho Chehab
63f00c313bSMauro Carvalho Chehab	  To see if the update was successful a compare of ``R == C``
64f00c313bSMauro Carvalho Chehab	  may be used.
65f00c313bSMauro Carvalho Chehab
66f00c313bSMauro Carvalho ChehabThe Generic Ring Buffer
67f00c313bSMauro Carvalho Chehab-----------------------
68f00c313bSMauro Carvalho Chehab
69f00c313bSMauro Carvalho ChehabThe ring buffer can be used in either an overwrite mode or in
70f00c313bSMauro Carvalho Chehabproducer/consumer mode.
71f00c313bSMauro Carvalho Chehab
72f00c313bSMauro Carvalho ChehabProducer/consumer mode is where if the producer were to fill up the
73f00c313bSMauro Carvalho Chehabbuffer before the consumer could free up anything, the producer
74f00c313bSMauro Carvalho Chehabwill stop writing to the buffer. This will lose most recent events.
75f00c313bSMauro Carvalho Chehab
76f00c313bSMauro Carvalho ChehabOverwrite mode is where if the producer were to fill up the buffer
77f00c313bSMauro Carvalho Chehabbefore the consumer could free up anything, the producer will
78f00c313bSMauro Carvalho Chehaboverwrite the older data. This will lose the oldest events.
79f00c313bSMauro Carvalho Chehab
80f00c313bSMauro Carvalho ChehabNo two writers can write at the same time (on the same per-cpu buffer),
81f00c313bSMauro Carvalho Chehabbut a writer may interrupt another writer, but it must finish writing
82f00c313bSMauro Carvalho Chehabbefore the previous writer may continue. This is very important to the
83f00c313bSMauro Carvalho Chehabalgorithm. The writers act like a "stack". The way interrupts works
84f00c313bSMauro Carvalho Chehabenforces this behavior::
85f00c313bSMauro Carvalho Chehab
86f00c313bSMauro Carvalho Chehab
87f00c313bSMauro Carvalho Chehab  writer1 start
88f00c313bSMauro Carvalho Chehab     <preempted> writer2 start
89f00c313bSMauro Carvalho Chehab         <preempted> writer3 start
90f00c313bSMauro Carvalho Chehab                     writer3 finishes
91f00c313bSMauro Carvalho Chehab                 writer2 finishes
92f00c313bSMauro Carvalho Chehab  writer1 finishes
93f00c313bSMauro Carvalho Chehab
94f00c313bSMauro Carvalho ChehabThis is very much like a writer being preempted by an interrupt and
95f00c313bSMauro Carvalho Chehabthe interrupt doing a write as well.
96f00c313bSMauro Carvalho Chehab
97f00c313bSMauro Carvalho ChehabReaders can happen at any time. But no two readers may run at the
98f00c313bSMauro Carvalho Chehabsame time, nor can a reader preempt/interrupt another reader. A reader
99f00c313bSMauro Carvalho Chehabcannot preempt/interrupt a writer, but it may read/consume from the
100f00c313bSMauro Carvalho Chehabbuffer at the same time as a writer is writing, but the reader must be
101f00c313bSMauro Carvalho Chehabon another processor to do so. A reader may read on its own processor
102f00c313bSMauro Carvalho Chehaband can be preempted by a writer.
103f00c313bSMauro Carvalho Chehab
104f00c313bSMauro Carvalho ChehabA writer can preempt a reader, but a reader cannot preempt a writer.
105f00c313bSMauro Carvalho ChehabBut a reader can read the buffer at the same time (on another processor)
106f00c313bSMauro Carvalho Chehabas a writer.
107f00c313bSMauro Carvalho Chehab
108f00c313bSMauro Carvalho ChehabThe ring buffer is made up of a list of pages held together by a linked list.
109f00c313bSMauro Carvalho Chehab
110f00c313bSMauro Carvalho ChehabAt initialization a reader page is allocated for the reader that is not
111f00c313bSMauro Carvalho Chehabpart of the ring buffer.
112f00c313bSMauro Carvalho Chehab
113f00c313bSMauro Carvalho ChehabThe head_page, tail_page and commit_page are all initialized to point
114f00c313bSMauro Carvalho Chehabto the same page.
115f00c313bSMauro Carvalho Chehab
116f00c313bSMauro Carvalho ChehabThe reader page is initialized to have its next pointer pointing to
117f00c313bSMauro Carvalho Chehabthe head page, and its previous pointer pointing to a page before
118f00c313bSMauro Carvalho Chehabthe head page.
119f00c313bSMauro Carvalho Chehab
120f00c313bSMauro Carvalho ChehabThe reader has its own page to use. At start up time, this page is
121f00c313bSMauro Carvalho Chehaballocated but is not attached to the list. When the reader wants
122f00c313bSMauro Carvalho Chehabto read from the buffer, if its page is empty (like it is on start-up),
123f00c313bSMauro Carvalho Chehabit will swap its page with the head_page. The old reader page will
124f00c313bSMauro Carvalho Chehabbecome part of the ring buffer and the head_page will be removed.
125f00c313bSMauro Carvalho ChehabThe page after the inserted page (old reader_page) will become the
126f00c313bSMauro Carvalho Chehabnew head page.
127f00c313bSMauro Carvalho Chehab
128f00c313bSMauro Carvalho ChehabOnce the new page is given to the reader, the reader could do what
129f00c313bSMauro Carvalho Chehabit wants with it, as long as a writer has left that page.
130f00c313bSMauro Carvalho Chehab
131f00c313bSMauro Carvalho ChehabA sample of how the reader page is swapped: Note this does not
132f00c313bSMauro Carvalho Chehabshow the head page in the buffer, it is for demonstrating a swap
133f00c313bSMauro Carvalho Chehabonly.
134f00c313bSMauro Carvalho Chehab
135f00c313bSMauro Carvalho Chehab::
136f00c313bSMauro Carvalho Chehab
137f00c313bSMauro Carvalho Chehab  +------+
138f00c313bSMauro Carvalho Chehab  |reader|          RING BUFFER
139f00c313bSMauro Carvalho Chehab  |page  |
140f00c313bSMauro Carvalho Chehab  +------+
141f00c313bSMauro Carvalho Chehab                  +---+   +---+   +---+
142f00c313bSMauro Carvalho Chehab                  |   |-->|   |-->|   |
143f00c313bSMauro Carvalho Chehab                  |   |<--|   |<--|   |
144f00c313bSMauro Carvalho Chehab                  +---+   +---+   +---+
145f00c313bSMauro Carvalho Chehab                   ^ |             ^ |
146f00c313bSMauro Carvalho Chehab                   | +-------------+ |
147f00c313bSMauro Carvalho Chehab                   +-----------------+
148f00c313bSMauro Carvalho Chehab
149f00c313bSMauro Carvalho Chehab
150f00c313bSMauro Carvalho Chehab  +------+
151f00c313bSMauro Carvalho Chehab  |reader|          RING BUFFER
152f00c313bSMauro Carvalho Chehab  |page  |-------------------+
153f00c313bSMauro Carvalho Chehab  +------+                   v
154f00c313bSMauro Carvalho Chehab    |             +---+   +---+   +---+
155f00c313bSMauro Carvalho Chehab    |             |   |-->|   |-->|   |
156f00c313bSMauro Carvalho Chehab    |             |   |<--|   |<--|   |<-+
157f00c313bSMauro Carvalho Chehab    |             +---+   +---+   +---+  |
158f00c313bSMauro Carvalho Chehab    |              ^ |             ^ |   |
159f00c313bSMauro Carvalho Chehab    |              | +-------------+ |   |
160f00c313bSMauro Carvalho Chehab    |              +-----------------+   |
161f00c313bSMauro Carvalho Chehab    +------------------------------------+
162f00c313bSMauro Carvalho Chehab
163f00c313bSMauro Carvalho Chehab  +------+
164f00c313bSMauro Carvalho Chehab  |reader|          RING BUFFER
165f00c313bSMauro Carvalho Chehab  |page  |-------------------+
166f00c313bSMauro Carvalho Chehab  +------+ <---------------+ v
167f00c313bSMauro Carvalho Chehab    |  ^          +---+   +---+   +---+
168f00c313bSMauro Carvalho Chehab    |  |          |   |-->|   |-->|   |
169f00c313bSMauro Carvalho Chehab    |  |          |   |   |   |<--|   |<-+
170f00c313bSMauro Carvalho Chehab    |  |          +---+   +---+   +---+  |
171f00c313bSMauro Carvalho Chehab    |  |             |             ^ |   |
172f00c313bSMauro Carvalho Chehab    |  |             +-------------+ |   |
173f00c313bSMauro Carvalho Chehab    |  +-----------------------------+   |
174f00c313bSMauro Carvalho Chehab    +------------------------------------+
175f00c313bSMauro Carvalho Chehab
176f00c313bSMauro Carvalho Chehab  +------+
177f00c313bSMauro Carvalho Chehab  |buffer|          RING BUFFER
178f00c313bSMauro Carvalho Chehab  |page  |-------------------+
179f00c313bSMauro Carvalho Chehab  +------+ <---------------+ v
180f00c313bSMauro Carvalho Chehab    |  ^          +---+   +---+   +---+
181f00c313bSMauro Carvalho Chehab    |  |          |   |   |   |-->|   |
182f00c313bSMauro Carvalho Chehab    |  |  New     |   |   |   |<--|   |<-+
183f00c313bSMauro Carvalho Chehab    |  | Reader   +---+   +---+   +---+  |
184f00c313bSMauro Carvalho Chehab    |  |  page ----^                 |   |
185f00c313bSMauro Carvalho Chehab    |  |                             |   |
186f00c313bSMauro Carvalho Chehab    |  +-----------------------------+   |
187f00c313bSMauro Carvalho Chehab    +------------------------------------+
188f00c313bSMauro Carvalho Chehab
189f00c313bSMauro Carvalho Chehab
190f00c313bSMauro Carvalho Chehab
191f00c313bSMauro Carvalho ChehabIt is possible that the page swapped is the commit page and the tail page,
192f00c313bSMauro Carvalho Chehabif what is in the ring buffer is less than what is held in a buffer page.
193f00c313bSMauro Carvalho Chehab
194f00c313bSMauro Carvalho Chehab::
195f00c313bSMauro Carvalho Chehab
196f00c313bSMauro Carvalho Chehab            reader page    commit page   tail page
197f00c313bSMauro Carvalho Chehab                |              |             |
198f00c313bSMauro Carvalho Chehab                v              |             |
199f00c313bSMauro Carvalho Chehab               +---+           |             |
200f00c313bSMauro Carvalho Chehab               |   |<----------+             |
201f00c313bSMauro Carvalho Chehab               |   |<------------------------+
202f00c313bSMauro Carvalho Chehab               |   |------+
203f00c313bSMauro Carvalho Chehab               +---+      |
204f00c313bSMauro Carvalho Chehab                          |
205f00c313bSMauro Carvalho Chehab                          v
206f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
207f00c313bSMauro Carvalho Chehab  <---|   |--->|   |--->|   |--->|   |--->
208f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
209f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
210f00c313bSMauro Carvalho Chehab
211f00c313bSMauro Carvalho ChehabThis case is still valid for this algorithm.
212f00c313bSMauro Carvalho ChehabWhen the writer leaves the page, it simply goes into the ring buffer
213f00c313bSMauro Carvalho Chehabsince the reader page still points to the next location in the ring
214f00c313bSMauro Carvalho Chehabbuffer.
215f00c313bSMauro Carvalho Chehab
216f00c313bSMauro Carvalho Chehab
217f00c313bSMauro Carvalho ChehabThe main pointers:
218f00c313bSMauro Carvalho Chehab
219f00c313bSMauro Carvalho Chehab  reader page
220f00c313bSMauro Carvalho Chehab	    - The page used solely by the reader and is not part
221f00c313bSMauro Carvalho Chehab              of the ring buffer (may be swapped in)
222f00c313bSMauro Carvalho Chehab
223f00c313bSMauro Carvalho Chehab  head page
224f00c313bSMauro Carvalho Chehab	    - the next page in the ring buffer that will be swapped
225f00c313bSMauro Carvalho Chehab              with the reader page.
226f00c313bSMauro Carvalho Chehab
227f00c313bSMauro Carvalho Chehab  tail page
228f00c313bSMauro Carvalho Chehab	    - the page where the next write will take place.
229f00c313bSMauro Carvalho Chehab
230f00c313bSMauro Carvalho Chehab  commit page
231f00c313bSMauro Carvalho Chehab	    - the page that last finished a write.
232f00c313bSMauro Carvalho Chehab
233f00c313bSMauro Carvalho ChehabThe commit page only is updated by the outermost writer in the
234f00c313bSMauro Carvalho Chehabwriter stack. A writer that preempts another writer will not move the
235f00c313bSMauro Carvalho Chehabcommit page.
236f00c313bSMauro Carvalho Chehab
237f00c313bSMauro Carvalho ChehabWhen data is written into the ring buffer, a position is reserved
238f00c313bSMauro Carvalho Chehabin the ring buffer and passed back to the writer. When the writer
239f00c313bSMauro Carvalho Chehabis finished writing data into that position, it commits the write.
240f00c313bSMauro Carvalho Chehab
241f00c313bSMauro Carvalho ChehabAnother write (or a read) may take place at anytime during this
242f00c313bSMauro Carvalho Chehabtransaction. If another write happens it must finish before continuing
243f00c313bSMauro Carvalho Chehabwith the previous write.
244f00c313bSMauro Carvalho Chehab
245f00c313bSMauro Carvalho Chehab
246f00c313bSMauro Carvalho Chehab   Write reserve::
247f00c313bSMauro Carvalho Chehab
248f00c313bSMauro Carvalho Chehab       Buffer page
249f00c313bSMauro Carvalho Chehab      +---------+
250f00c313bSMauro Carvalho Chehab      |written  |
251f00c313bSMauro Carvalho Chehab      +---------+  <--- given back to writer (current commit)
252f00c313bSMauro Carvalho Chehab      |reserved |
253f00c313bSMauro Carvalho Chehab      +---------+ <--- tail pointer
254f00c313bSMauro Carvalho Chehab      | empty   |
255f00c313bSMauro Carvalho Chehab      +---------+
256f00c313bSMauro Carvalho Chehab
257f00c313bSMauro Carvalho Chehab   Write commit::
258f00c313bSMauro Carvalho Chehab
259f00c313bSMauro Carvalho Chehab       Buffer page
260f00c313bSMauro Carvalho Chehab      +---------+
261f00c313bSMauro Carvalho Chehab      |written  |
262f00c313bSMauro Carvalho Chehab      +---------+
263f00c313bSMauro Carvalho Chehab      |written  |
264f00c313bSMauro Carvalho Chehab      +---------+  <--- next position for write (current commit)
265f00c313bSMauro Carvalho Chehab      | empty   |
266f00c313bSMauro Carvalho Chehab      +---------+
267f00c313bSMauro Carvalho Chehab
268f00c313bSMauro Carvalho Chehab
269f00c313bSMauro Carvalho Chehab If a write happens after the first reserve::
270f00c313bSMauro Carvalho Chehab
271f00c313bSMauro Carvalho Chehab       Buffer page
272f00c313bSMauro Carvalho Chehab      +---------+
273f00c313bSMauro Carvalho Chehab      |written  |
274f00c313bSMauro Carvalho Chehab      +---------+  <-- current commit
275f00c313bSMauro Carvalho Chehab      |reserved |
276f00c313bSMauro Carvalho Chehab      +---------+  <--- given back to second writer
277f00c313bSMauro Carvalho Chehab      |reserved |
278f00c313bSMauro Carvalho Chehab      +---------+ <--- tail pointer
279f00c313bSMauro Carvalho Chehab
280f00c313bSMauro Carvalho Chehab  After second writer commits::
281f00c313bSMauro Carvalho Chehab
282f00c313bSMauro Carvalho Chehab
283f00c313bSMauro Carvalho Chehab       Buffer page
284f00c313bSMauro Carvalho Chehab      +---------+
285f00c313bSMauro Carvalho Chehab      |written  |
286f00c313bSMauro Carvalho Chehab      +---------+  <--(last full commit)
287f00c313bSMauro Carvalho Chehab      |reserved |
288f00c313bSMauro Carvalho Chehab      +---------+
289f00c313bSMauro Carvalho Chehab      |pending  |
290f00c313bSMauro Carvalho Chehab      |commit   |
291f00c313bSMauro Carvalho Chehab      +---------+ <--- tail pointer
292f00c313bSMauro Carvalho Chehab
293f00c313bSMauro Carvalho Chehab  When the first writer commits::
294f00c313bSMauro Carvalho Chehab
295f00c313bSMauro Carvalho Chehab       Buffer page
296f00c313bSMauro Carvalho Chehab      +---------+
297f00c313bSMauro Carvalho Chehab      |written  |
298f00c313bSMauro Carvalho Chehab      +---------+
299f00c313bSMauro Carvalho Chehab      |written  |
300f00c313bSMauro Carvalho Chehab      +---------+
301f00c313bSMauro Carvalho Chehab      |written  |
302f00c313bSMauro Carvalho Chehab      +---------+  <--(last full commit and tail pointer)
303f00c313bSMauro Carvalho Chehab
304f00c313bSMauro Carvalho Chehab
305f00c313bSMauro Carvalho ChehabThe commit pointer points to the last write location that was
306f00c313bSMauro Carvalho Chehabcommitted without preempting another write. When a write that
307f00c313bSMauro Carvalho Chehabpreempted another write is committed, it only becomes a pending commit
308f00c313bSMauro Carvalho Chehaband will not be a full commit until all writes have been committed.
309f00c313bSMauro Carvalho Chehab
310f00c313bSMauro Carvalho ChehabThe commit page points to the page that has the last full commit.
311f00c313bSMauro Carvalho ChehabThe tail page points to the page with the last write (before
312f00c313bSMauro Carvalho Chehabcommitting).
313f00c313bSMauro Carvalho Chehab
314f00c313bSMauro Carvalho ChehabThe tail page is always equal to or after the commit page. It may
315f00c313bSMauro Carvalho Chehabbe several pages ahead. If the tail page catches up to the commit
316f00c313bSMauro Carvalho Chehabpage then no more writes may take place (regardless of the mode
317f00c313bSMauro Carvalho Chehabof the ring buffer: overwrite and produce/consumer).
318f00c313bSMauro Carvalho Chehab
319f00c313bSMauro Carvalho ChehabThe order of pages is::
320f00c313bSMauro Carvalho Chehab
321f00c313bSMauro Carvalho Chehab head page
322f00c313bSMauro Carvalho Chehab commit page
323f00c313bSMauro Carvalho Chehab tail page
324f00c313bSMauro Carvalho Chehab
325f00c313bSMauro Carvalho ChehabPossible scenario::
326f00c313bSMauro Carvalho Chehab
327f00c313bSMauro Carvalho Chehab                               tail page
328f00c313bSMauro Carvalho Chehab    head page         commit page  |
329f00c313bSMauro Carvalho Chehab        |                 |        |
330f00c313bSMauro Carvalho Chehab        v                 v        v
331f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
332f00c313bSMauro Carvalho Chehab  <---|   |--->|   |--->|   |--->|   |--->
333f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
334f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
335f00c313bSMauro Carvalho Chehab
336f00c313bSMauro Carvalho ChehabThere is a special case that the head page is after either the commit page
337f00c313bSMauro Carvalho Chehaband possibly the tail page. That is when the commit (and tail) page has been
338f00c313bSMauro Carvalho Chehabswapped with the reader page. This is because the head page is always
339f00c313bSMauro Carvalho Chehabpart of the ring buffer, but the reader page is not. Whenever there
340f00c313bSMauro Carvalho Chehabhas been less than a full page that has been committed inside the ring buffer,
341f00c313bSMauro Carvalho Chehaband a reader swaps out a page, it will be swapping out the commit page.
342f00c313bSMauro Carvalho Chehab
343f00c313bSMauro Carvalho Chehab::
344f00c313bSMauro Carvalho Chehab
345f00c313bSMauro Carvalho Chehab            reader page    commit page   tail page
346f00c313bSMauro Carvalho Chehab                |              |             |
347f00c313bSMauro Carvalho Chehab                v              |             |
348f00c313bSMauro Carvalho Chehab               +---+           |             |
349f00c313bSMauro Carvalho Chehab               |   |<----------+             |
350f00c313bSMauro Carvalho Chehab               |   |<------------------------+
351f00c313bSMauro Carvalho Chehab               |   |------+
352f00c313bSMauro Carvalho Chehab               +---+      |
353f00c313bSMauro Carvalho Chehab                          |
354f00c313bSMauro Carvalho Chehab                          v
355f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
356f00c313bSMauro Carvalho Chehab  <---|   |--->|   |--->|   |--->|   |--->
357f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
358f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
359f00c313bSMauro Carvalho Chehab                          ^
360f00c313bSMauro Carvalho Chehab                          |
361f00c313bSMauro Carvalho Chehab                      head page
362f00c313bSMauro Carvalho Chehab
363f00c313bSMauro Carvalho Chehab
364f00c313bSMauro Carvalho ChehabIn this case, the head page will not move when the tail and commit
365f00c313bSMauro Carvalho Chehabmove back into the ring buffer.
366f00c313bSMauro Carvalho Chehab
367f00c313bSMauro Carvalho ChehabThe reader cannot swap a page into the ring buffer if the commit page
368f00c313bSMauro Carvalho Chehabis still on that page. If the read meets the last commit (real commit
369f00c313bSMauro Carvalho Chehabnot pending or reserved), then there is nothing more to read.
370f00c313bSMauro Carvalho ChehabThe buffer is considered empty until another full commit finishes.
371f00c313bSMauro Carvalho Chehab
372f00c313bSMauro Carvalho ChehabWhen the tail meets the head page, if the buffer is in overwrite mode,
373f00c313bSMauro Carvalho Chehabthe head page will be pushed ahead one. If the buffer is in producer/consumer
374f00c313bSMauro Carvalho Chehabmode, the write will fail.
375f00c313bSMauro Carvalho Chehab
376f00c313bSMauro Carvalho ChehabOverwrite mode::
377f00c313bSMauro Carvalho Chehab
378f00c313bSMauro Carvalho Chehab              tail page
379f00c313bSMauro Carvalho Chehab                 |
380f00c313bSMauro Carvalho Chehab                 v
381f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
382f00c313bSMauro Carvalho Chehab  <---|   |--->|   |--->|   |--->|   |--->
383f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
384f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
385f00c313bSMauro Carvalho Chehab                          ^
386f00c313bSMauro Carvalho Chehab                          |
387f00c313bSMauro Carvalho Chehab                      head page
388f00c313bSMauro Carvalho Chehab
389f00c313bSMauro Carvalho Chehab
390f00c313bSMauro Carvalho Chehab              tail page
391f00c313bSMauro Carvalho Chehab                 |
392f00c313bSMauro Carvalho Chehab                 v
393f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
394f00c313bSMauro Carvalho Chehab  <---|   |--->|   |--->|   |--->|   |--->
395f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
396f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
397f00c313bSMauro Carvalho Chehab                                   ^
398f00c313bSMauro Carvalho Chehab                                   |
399f00c313bSMauro Carvalho Chehab                               head page
400f00c313bSMauro Carvalho Chehab
401f00c313bSMauro Carvalho Chehab
402f00c313bSMauro Carvalho Chehab                      tail page
403f00c313bSMauro Carvalho Chehab                          |
404f00c313bSMauro Carvalho Chehab                          v
405f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
406f00c313bSMauro Carvalho Chehab  <---|   |--->|   |--->|   |--->|   |--->
407f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
408f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
409f00c313bSMauro Carvalho Chehab                                   ^
410f00c313bSMauro Carvalho Chehab                                   |
411f00c313bSMauro Carvalho Chehab                               head page
412f00c313bSMauro Carvalho Chehab
413f00c313bSMauro Carvalho ChehabNote, the reader page will still point to the previous head page.
414f00c313bSMauro Carvalho ChehabBut when a swap takes place, it will use the most recent head page.
415f00c313bSMauro Carvalho Chehab
416f00c313bSMauro Carvalho Chehab
417f00c313bSMauro Carvalho ChehabMaking the Ring Buffer Lockless:
418f00c313bSMauro Carvalho Chehab--------------------------------
419f00c313bSMauro Carvalho Chehab
420f00c313bSMauro Carvalho ChehabThe main idea behind the lockless algorithm is to combine the moving
421f00c313bSMauro Carvalho Chehabof the head_page pointer with the swapping of pages with the reader.
422f00c313bSMauro Carvalho ChehabState flags are placed inside the pointer to the page. To do this,
423f00c313bSMauro Carvalho Chehabeach page must be aligned in memory by 4 bytes. This will allow the 2
424f00c313bSMauro Carvalho Chehableast significant bits of the address to be used as flags, since
425f00c313bSMauro Carvalho Chehabthey will always be zero for the address. To get the address,
426f00c313bSMauro Carvalho Chehabsimply mask out the flags::
427f00c313bSMauro Carvalho Chehab
428f00c313bSMauro Carvalho Chehab  MASK = ~3
429f00c313bSMauro Carvalho Chehab
430f00c313bSMauro Carvalho Chehab  address & MASK
431f00c313bSMauro Carvalho Chehab
432f00c313bSMauro Carvalho ChehabTwo flags will be kept by these two bits:
433f00c313bSMauro Carvalho Chehab
434f00c313bSMauro Carvalho Chehab   HEADER
435f00c313bSMauro Carvalho Chehab	- the page being pointed to is a head page
436f00c313bSMauro Carvalho Chehab
437f00c313bSMauro Carvalho Chehab   UPDATE
438f00c313bSMauro Carvalho Chehab	- the page being pointed to is being updated by a writer
439f00c313bSMauro Carvalho Chehab          and was or is about to be a head page.
440f00c313bSMauro Carvalho Chehab
441f00c313bSMauro Carvalho Chehab::
442f00c313bSMauro Carvalho Chehab
443f00c313bSMauro Carvalho Chehab	      reader page
444f00c313bSMauro Carvalho Chehab		  |
445f00c313bSMauro Carvalho Chehab		  v
446f00c313bSMauro Carvalho Chehab		+---+
447f00c313bSMauro Carvalho Chehab		|   |------+
448f00c313bSMauro Carvalho Chehab		+---+      |
449f00c313bSMauro Carvalho Chehab			    |
450f00c313bSMauro Carvalho Chehab			    v
451f00c313bSMauro Carvalho Chehab	+---+    +---+    +---+    +---+
452f00c313bSMauro Carvalho Chehab    <---|   |--->|   |-H->|   |--->|   |--->
453f00c313bSMauro Carvalho Chehab    --->|   |<---|   |<---|   |<---|   |<---
454f00c313bSMauro Carvalho Chehab	+---+    +---+    +---+    +---+
455f00c313bSMauro Carvalho Chehab
456f00c313bSMauro Carvalho Chehab
457f00c313bSMauro Carvalho ChehabThe above pointer "-H->" would have the HEADER flag set. That is
458f00c313bSMauro Carvalho Chehabthe next page is the next page to be swapped out by the reader.
459f00c313bSMauro Carvalho ChehabThis pointer means the next page is the head page.
460f00c313bSMauro Carvalho Chehab
461f00c313bSMauro Carvalho ChehabWhen the tail page meets the head pointer, it will use cmpxchg to
462f00c313bSMauro Carvalho Chehabchange the pointer to the UPDATE state::
463f00c313bSMauro Carvalho Chehab
464f00c313bSMauro Carvalho Chehab
465f00c313bSMauro Carvalho Chehab              tail page
466f00c313bSMauro Carvalho Chehab                 |
467f00c313bSMauro Carvalho Chehab                 v
468f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
469f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-H->|   |--->|   |--->
470f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
471f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
472f00c313bSMauro Carvalho Chehab
473f00c313bSMauro Carvalho Chehab              tail page
474f00c313bSMauro Carvalho Chehab                 |
475f00c313bSMauro Carvalho Chehab                 v
476f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
477f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |--->|   |--->
478f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
479f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
480f00c313bSMauro Carvalho Chehab
481f00c313bSMauro Carvalho Chehab"-U->" represents a pointer in the UPDATE state.
482f00c313bSMauro Carvalho Chehab
483f00c313bSMauro Carvalho ChehabAny access to the reader will need to take some sort of lock to serialize
484f00c313bSMauro Carvalho Chehabthe readers. But the writers will never take a lock to write to the
485f00c313bSMauro Carvalho Chehabring buffer. This means we only need to worry about a single reader,
486f00c313bSMauro Carvalho Chehaband writes only preempt in "stack" formation.
487f00c313bSMauro Carvalho Chehab
488f00c313bSMauro Carvalho ChehabWhen the reader tries to swap the page with the ring buffer, it
489f00c313bSMauro Carvalho Chehabwill also use cmpxchg. If the flag bit in the pointer to the
490f00c313bSMauro Carvalho Chehabhead page does not have the HEADER flag set, the compare will fail
491f00c313bSMauro Carvalho Chehaband the reader will need to look for the new head page and try again.
492f00c313bSMauro Carvalho ChehabNote, the flags UPDATE and HEADER are never set at the same time.
493f00c313bSMauro Carvalho Chehab
494f00c313bSMauro Carvalho ChehabThe reader swaps the reader page as follows::
495f00c313bSMauro Carvalho Chehab
496f00c313bSMauro Carvalho Chehab  +------+
497f00c313bSMauro Carvalho Chehab  |reader|          RING BUFFER
498f00c313bSMauro Carvalho Chehab  |page  |
499f00c313bSMauro Carvalho Chehab  +------+
500f00c313bSMauro Carvalho Chehab                  +---+    +---+    +---+
501f00c313bSMauro Carvalho Chehab                  |   |--->|   |--->|   |
502f00c313bSMauro Carvalho Chehab                  |   |<---|   |<---|   |
503f00c313bSMauro Carvalho Chehab                  +---+    +---+    +---+
504f00c313bSMauro Carvalho Chehab                   ^ |               ^ |
505f00c313bSMauro Carvalho Chehab                   | +---------------+ |
506f00c313bSMauro Carvalho Chehab                   +-----H-------------+
507f00c313bSMauro Carvalho Chehab
508f00c313bSMauro Carvalho ChehabThe reader sets the reader page next pointer as HEADER to the page after
509f00c313bSMauro Carvalho Chehabthe head page::
510f00c313bSMauro Carvalho Chehab
511f00c313bSMauro Carvalho Chehab
512f00c313bSMauro Carvalho Chehab  +------+
513f00c313bSMauro Carvalho Chehab  |reader|          RING BUFFER
514f00c313bSMauro Carvalho Chehab  |page  |-------H-----------+
515f00c313bSMauro Carvalho Chehab  +------+                   v
516f00c313bSMauro Carvalho Chehab    |             +---+    +---+    +---+
517f00c313bSMauro Carvalho Chehab    |             |   |--->|   |--->|   |
518f00c313bSMauro Carvalho Chehab    |             |   |<---|   |<---|   |<-+
519f00c313bSMauro Carvalho Chehab    |             +---+    +---+    +---+  |
520f00c313bSMauro Carvalho Chehab    |              ^ |               ^ |   |
521f00c313bSMauro Carvalho Chehab    |              | +---------------+ |   |
522f00c313bSMauro Carvalho Chehab    |              +-----H-------------+   |
523f00c313bSMauro Carvalho Chehab    +--------------------------------------+
524f00c313bSMauro Carvalho Chehab
525f00c313bSMauro Carvalho ChehabIt does a cmpxchg with the pointer to the previous head page to make it
526f00c313bSMauro Carvalho Chehabpoint to the reader page. Note that the new pointer does not have the HEADER
527f00c313bSMauro Carvalho Chehabflag set.  This action atomically moves the head page forward::
528f00c313bSMauro Carvalho Chehab
529f00c313bSMauro Carvalho Chehab  +------+
530f00c313bSMauro Carvalho Chehab  |reader|          RING BUFFER
531f00c313bSMauro Carvalho Chehab  |page  |-------H-----------+
532f00c313bSMauro Carvalho Chehab  +------+                   v
533f00c313bSMauro Carvalho Chehab    |  ^          +---+   +---+   +---+
534f00c313bSMauro Carvalho Chehab    |  |          |   |-->|   |-->|   |
535f00c313bSMauro Carvalho Chehab    |  |          |   |<--|   |<--|   |<-+
536f00c313bSMauro Carvalho Chehab    |  |          +---+   +---+   +---+  |
537f00c313bSMauro Carvalho Chehab    |  |             |             ^ |   |
538f00c313bSMauro Carvalho Chehab    |  |             +-------------+ |   |
539f00c313bSMauro Carvalho Chehab    |  +-----------------------------+   |
540f00c313bSMauro Carvalho Chehab    +------------------------------------+
541f00c313bSMauro Carvalho Chehab
542f00c313bSMauro Carvalho ChehabAfter the new head page is set, the previous pointer of the head page is
543f00c313bSMauro Carvalho Chehabupdated to the reader page::
544f00c313bSMauro Carvalho Chehab
545f00c313bSMauro Carvalho Chehab  +------+
546f00c313bSMauro Carvalho Chehab  |reader|          RING BUFFER
547f00c313bSMauro Carvalho Chehab  |page  |-------H-----------+
548f00c313bSMauro Carvalho Chehab  +------+ <---------------+ v
549f00c313bSMauro Carvalho Chehab    |  ^          +---+   +---+   +---+
550f00c313bSMauro Carvalho Chehab    |  |          |   |-->|   |-->|   |
551f00c313bSMauro Carvalho Chehab    |  |          |   |   |   |<--|   |<-+
552f00c313bSMauro Carvalho Chehab    |  |          +---+   +---+   +---+  |
553f00c313bSMauro Carvalho Chehab    |  |             |             ^ |   |
554f00c313bSMauro Carvalho Chehab    |  |             +-------------+ |   |
555f00c313bSMauro Carvalho Chehab    |  +-----------------------------+   |
556f00c313bSMauro Carvalho Chehab    +------------------------------------+
557f00c313bSMauro Carvalho Chehab
558f00c313bSMauro Carvalho Chehab  +------+
559f00c313bSMauro Carvalho Chehab  |buffer|          RING BUFFER
560f00c313bSMauro Carvalho Chehab  |page  |-------H-----------+  <--- New head page
561f00c313bSMauro Carvalho Chehab  +------+ <---------------+ v
562f00c313bSMauro Carvalho Chehab    |  ^          +---+   +---+   +---+
563f00c313bSMauro Carvalho Chehab    |  |          |   |   |   |-->|   |
564f00c313bSMauro Carvalho Chehab    |  |  New     |   |   |   |<--|   |<-+
565f00c313bSMauro Carvalho Chehab    |  | Reader   +---+   +---+   +---+  |
566f00c313bSMauro Carvalho Chehab    |  |  page ----^                 |   |
567f00c313bSMauro Carvalho Chehab    |  |                             |   |
568f00c313bSMauro Carvalho Chehab    |  +-----------------------------+   |
569f00c313bSMauro Carvalho Chehab    +------------------------------------+
570f00c313bSMauro Carvalho Chehab
571f00c313bSMauro Carvalho ChehabAnother important point: The page that the reader page points back to
572f00c313bSMauro Carvalho Chehabby its previous pointer (the one that now points to the new head page)
573f00c313bSMauro Carvalho Chehabnever points back to the reader page. That is because the reader page is
574f00c313bSMauro Carvalho Chehabnot part of the ring buffer. Traversing the ring buffer via the next pointers
575f00c313bSMauro Carvalho Chehabwill always stay in the ring buffer. Traversing the ring buffer via the
576f00c313bSMauro Carvalho Chehabprev pointers may not.
577f00c313bSMauro Carvalho Chehab
578f00c313bSMauro Carvalho ChehabNote, the way to determine a reader page is simply by examining the previous
579f00c313bSMauro Carvalho Chehabpointer of the page. If the next pointer of the previous page does not
580f00c313bSMauro Carvalho Chehabpoint back to the original page, then the original page is a reader page::
581f00c313bSMauro Carvalho Chehab
582f00c313bSMauro Carvalho Chehab
583f00c313bSMauro Carvalho Chehab             +--------+
584f00c313bSMauro Carvalho Chehab             | reader |  next   +----+
585f00c313bSMauro Carvalho Chehab             |  page  |-------->|    |<====== (buffer page)
586f00c313bSMauro Carvalho Chehab             +--------+         +----+
587f00c313bSMauro Carvalho Chehab                 |                | ^
588f00c313bSMauro Carvalho Chehab                 |                v | next
589f00c313bSMauro Carvalho Chehab            prev |              +----+
590f00c313bSMauro Carvalho Chehab                 +------------->|    |
591f00c313bSMauro Carvalho Chehab                                +----+
592f00c313bSMauro Carvalho Chehab
593f00c313bSMauro Carvalho ChehabThe way the head page moves forward:
594f00c313bSMauro Carvalho Chehab
595f00c313bSMauro Carvalho ChehabWhen the tail page meets the head page and the buffer is in overwrite mode
596f00c313bSMauro Carvalho Chehaband more writes take place, the head page must be moved forward before the
597f00c313bSMauro Carvalho Chehabwriter may move the tail page. The way this is done is that the writer
598f00c313bSMauro Carvalho Chehabperforms a cmpxchg to convert the pointer to the head page from the HEADER
599f00c313bSMauro Carvalho Chehabflag to have the UPDATE flag set. Once this is done, the reader will
600f00c313bSMauro Carvalho Chehabnot be able to swap the head page from the buffer, nor will it be able to
601f00c313bSMauro Carvalho Chehabmove the head page, until the writer is finished with the move.
602f00c313bSMauro Carvalho Chehab
603f00c313bSMauro Carvalho ChehabThis eliminates any races that the reader can have on the writer. The reader
604f00c313bSMauro Carvalho Chehabmust spin, and this is why the reader cannot preempt the writer::
605f00c313bSMauro Carvalho Chehab
606f00c313bSMauro Carvalho Chehab              tail page
607f00c313bSMauro Carvalho Chehab                 |
608f00c313bSMauro Carvalho Chehab                 v
609f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
610f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-H->|   |--->|   |--->
611f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
612f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
613f00c313bSMauro Carvalho Chehab
614f00c313bSMauro Carvalho Chehab              tail page
615f00c313bSMauro Carvalho Chehab                 |
616f00c313bSMauro Carvalho Chehab                 v
617f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
618f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |--->|   |--->
619f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
620f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
621f00c313bSMauro Carvalho Chehab
622f00c313bSMauro Carvalho ChehabThe following page will be made into the new head page::
623f00c313bSMauro Carvalho Chehab
624f00c313bSMauro Carvalho Chehab             tail page
625f00c313bSMauro Carvalho Chehab                 |
626f00c313bSMauro Carvalho Chehab                 v
627f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
628f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |-H->|   |--->
629f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
630f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
631f00c313bSMauro Carvalho Chehab
632f00c313bSMauro Carvalho ChehabAfter the new head page has been set, we can set the old head page
633f00c313bSMauro Carvalho Chehabpointer back to NORMAL::
634f00c313bSMauro Carvalho Chehab
635f00c313bSMauro Carvalho Chehab             tail page
636f00c313bSMauro Carvalho Chehab                 |
637f00c313bSMauro Carvalho Chehab                 v
638f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
639f00c313bSMauro Carvalho Chehab  <---|   |--->|   |--->|   |-H->|   |--->
640f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
641f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
642f00c313bSMauro Carvalho Chehab
643f00c313bSMauro Carvalho ChehabAfter the head page has been moved, the tail page may now move forward::
644f00c313bSMauro Carvalho Chehab
645f00c313bSMauro Carvalho Chehab                      tail page
646f00c313bSMauro Carvalho Chehab                          |
647f00c313bSMauro Carvalho Chehab                          v
648f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
649f00c313bSMauro Carvalho Chehab  <---|   |--->|   |--->|   |-H->|   |--->
650f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
651f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
652f00c313bSMauro Carvalho Chehab
653f00c313bSMauro Carvalho Chehab
654f00c313bSMauro Carvalho ChehabThe above are the trivial updates. Now for the more complex scenarios.
655f00c313bSMauro Carvalho Chehab
656f00c313bSMauro Carvalho Chehab
657f00c313bSMauro Carvalho ChehabAs stated before, if enough writes preempt the first write, the
658f00c313bSMauro Carvalho Chehabtail page may make it all the way around the buffer and meet the commit
659f00c313bSMauro Carvalho Chehabpage. At this time, we must start dropping writes (usually with some kind
660f00c313bSMauro Carvalho Chehabof warning to the user). But what happens if the commit was still on the
661f00c313bSMauro Carvalho Chehabreader page? The commit page is not part of the ring buffer. The tail page
662f00c313bSMauro Carvalho Chehabmust account for this::
663f00c313bSMauro Carvalho Chehab
664f00c313bSMauro Carvalho Chehab
665f00c313bSMauro Carvalho Chehab            reader page    commit page
666f00c313bSMauro Carvalho Chehab                |              |
667f00c313bSMauro Carvalho Chehab                v              |
668f00c313bSMauro Carvalho Chehab               +---+           |
669f00c313bSMauro Carvalho Chehab               |   |<----------+
670f00c313bSMauro Carvalho Chehab               |   |
671f00c313bSMauro Carvalho Chehab               |   |------+
672f00c313bSMauro Carvalho Chehab               +---+      |
673f00c313bSMauro Carvalho Chehab                          |
674f00c313bSMauro Carvalho Chehab                          v
675f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
676f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-H->|   |--->|   |--->
677f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
678f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
679f00c313bSMauro Carvalho Chehab                 ^
680f00c313bSMauro Carvalho Chehab                 |
681f00c313bSMauro Carvalho Chehab             tail page
682f00c313bSMauro Carvalho Chehab
683f00c313bSMauro Carvalho ChehabIf the tail page were to simply push the head page forward, the commit when
684f00c313bSMauro Carvalho Chehableaving the reader page would not be pointing to the correct page.
685f00c313bSMauro Carvalho Chehab
686f00c313bSMauro Carvalho ChehabThe solution to this is to test if the commit page is on the reader page
687f00c313bSMauro Carvalho Chehabbefore pushing the head page. If it is, then it can be assumed that the
688f00c313bSMauro Carvalho Chehabtail page wrapped the buffer, and we must drop new writes.
689f00c313bSMauro Carvalho Chehab
690f00c313bSMauro Carvalho ChehabThis is not a race condition, because the commit page can only be moved
691f00c313bSMauro Carvalho Chehabby the outermost writer (the writer that was preempted).
692f00c313bSMauro Carvalho ChehabThis means that the commit will not move while a writer is moving the
693f00c313bSMauro Carvalho Chehabtail page. The reader cannot swap the reader page if it is also being
694f00c313bSMauro Carvalho Chehabused as the commit page. The reader can simply check that the commit
695f00c313bSMauro Carvalho Chehabis off the reader page. Once the commit page leaves the reader page
696f00c313bSMauro Carvalho Chehabit will never go back on it unless a reader does another swap with the
697f00c313bSMauro Carvalho Chehabbuffer page that is also the commit page.
698f00c313bSMauro Carvalho Chehab
699f00c313bSMauro Carvalho Chehab
700f00c313bSMauro Carvalho ChehabNested writes
701f00c313bSMauro Carvalho Chehab-------------
702f00c313bSMauro Carvalho Chehab
703f00c313bSMauro Carvalho ChehabIn the pushing forward of the tail page we must first push forward
704f00c313bSMauro Carvalho Chehabthe head page if the head page is the next page. If the head page
705f00c313bSMauro Carvalho Chehabis not the next page, the tail page is simply updated with a cmpxchg.
706f00c313bSMauro Carvalho Chehab
707f00c313bSMauro Carvalho ChehabOnly writers move the tail page. This must be done atomically to protect
708f00c313bSMauro Carvalho Chehabagainst nested writers::
709f00c313bSMauro Carvalho Chehab
710f00c313bSMauro Carvalho Chehab  temp_page = tail_page
711f00c313bSMauro Carvalho Chehab  next_page = temp_page->next
712f00c313bSMauro Carvalho Chehab  cmpxchg(tail_page, temp_page, next_page)
713f00c313bSMauro Carvalho Chehab
714f00c313bSMauro Carvalho ChehabThe above will update the tail page if it is still pointing to the expected
715f00c313bSMauro Carvalho Chehabpage. If this fails, a nested write pushed it forward, the current write
716f00c313bSMauro Carvalho Chehabdoes not need to push it::
717f00c313bSMauro Carvalho Chehab
718f00c313bSMauro Carvalho Chehab
719f00c313bSMauro Carvalho Chehab             temp page
720f00c313bSMauro Carvalho Chehab                 |
721f00c313bSMauro Carvalho Chehab                 v
722f00c313bSMauro Carvalho Chehab              tail page
723f00c313bSMauro Carvalho Chehab                 |
724f00c313bSMauro Carvalho Chehab                 v
725f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
726f00c313bSMauro Carvalho Chehab  <---|   |--->|   |--->|   |--->|   |--->
727f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
728f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
729f00c313bSMauro Carvalho Chehab
730f00c313bSMauro Carvalho ChehabNested write comes in and moves the tail page forward::
731f00c313bSMauro Carvalho Chehab
732f00c313bSMauro Carvalho Chehab                      tail page (moved by nested writer)
733f00c313bSMauro Carvalho Chehab              temp page   |
734f00c313bSMauro Carvalho Chehab                 |        |
735f00c313bSMauro Carvalho Chehab                 v        v
736f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
737f00c313bSMauro Carvalho Chehab  <---|   |--->|   |--->|   |--->|   |--->
738f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
739f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
740f00c313bSMauro Carvalho Chehab
741f00c313bSMauro Carvalho ChehabThe above would fail the cmpxchg, but since the tail page has already
742f00c313bSMauro Carvalho Chehabbeen moved forward, the writer will just try again to reserve storage
743f00c313bSMauro Carvalho Chehabon the new tail page.
744f00c313bSMauro Carvalho Chehab
745f00c313bSMauro Carvalho ChehabBut the moving of the head page is a bit more complex::
746f00c313bSMauro Carvalho Chehab
747f00c313bSMauro Carvalho Chehab              tail page
748f00c313bSMauro Carvalho Chehab                 |
749f00c313bSMauro Carvalho Chehab                 v
750f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
751f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-H->|   |--->|   |--->
752f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
753f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
754f00c313bSMauro Carvalho Chehab
755f00c313bSMauro Carvalho ChehabThe write converts the head page pointer to UPDATE::
756f00c313bSMauro Carvalho Chehab
757f00c313bSMauro Carvalho Chehab              tail page
758f00c313bSMauro Carvalho Chehab                 |
759f00c313bSMauro Carvalho Chehab                 v
760f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
761f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |--->|   |--->
762f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
763f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
764f00c313bSMauro Carvalho Chehab
765f00c313bSMauro Carvalho ChehabBut if a nested writer preempts here, it will see that the next
766f00c313bSMauro Carvalho Chehabpage is a head page, but it is also nested. It will detect that
767f00c313bSMauro Carvalho Chehabit is nested and will save that information. The detection is the
768f00c313bSMauro Carvalho Chehabfact that it sees the UPDATE flag instead of a HEADER or NORMAL
769f00c313bSMauro Carvalho Chehabpointer.
770f00c313bSMauro Carvalho Chehab
771f00c313bSMauro Carvalho ChehabThe nested writer will set the new head page pointer::
772f00c313bSMauro Carvalho Chehab
773f00c313bSMauro Carvalho Chehab             tail page
774f00c313bSMauro Carvalho Chehab                 |
775f00c313bSMauro Carvalho Chehab                 v
776f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
777f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |-H->|   |--->
778f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
779f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
780f00c313bSMauro Carvalho Chehab
781f00c313bSMauro Carvalho ChehabBut it will not reset the update back to normal. Only the writer
782f00c313bSMauro Carvalho Chehabthat converted a pointer from HEAD to UPDATE will convert it back
783f00c313bSMauro Carvalho Chehabto NORMAL::
784f00c313bSMauro Carvalho Chehab
785f00c313bSMauro Carvalho Chehab                      tail page
786f00c313bSMauro Carvalho Chehab                          |
787f00c313bSMauro Carvalho Chehab                          v
788f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
789f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |-H->|   |--->
790f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
791f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
792f00c313bSMauro Carvalho Chehab
793f00c313bSMauro Carvalho ChehabAfter the nested writer finishes, the outermost writer will convert
794f00c313bSMauro Carvalho Chehabthe UPDATE pointer to NORMAL::
795f00c313bSMauro Carvalho Chehab
796f00c313bSMauro Carvalho Chehab
797f00c313bSMauro Carvalho Chehab                      tail page
798f00c313bSMauro Carvalho Chehab                          |
799f00c313bSMauro Carvalho Chehab                          v
800f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
801f00c313bSMauro Carvalho Chehab  <---|   |--->|   |--->|   |-H->|   |--->
802f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
803f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
804f00c313bSMauro Carvalho Chehab
805f00c313bSMauro Carvalho Chehab
806f00c313bSMauro Carvalho ChehabIt can be even more complex if several nested writes came in and moved
807f00c313bSMauro Carvalho Chehabthe tail page ahead several pages::
808f00c313bSMauro Carvalho Chehab
809f00c313bSMauro Carvalho Chehab
810f00c313bSMauro Carvalho Chehab  (first writer)
811f00c313bSMauro Carvalho Chehab
812f00c313bSMauro Carvalho Chehab              tail page
813f00c313bSMauro Carvalho Chehab                 |
814f00c313bSMauro Carvalho Chehab                 v
815f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
816f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-H->|   |--->|   |--->
817f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
818f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
819f00c313bSMauro Carvalho Chehab
820f00c313bSMauro Carvalho ChehabThe write converts the head page pointer to UPDATE::
821f00c313bSMauro Carvalho Chehab
822f00c313bSMauro Carvalho Chehab              tail page
823f00c313bSMauro Carvalho Chehab                 |
824f00c313bSMauro Carvalho Chehab                 v
825f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
826f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |--->|   |--->
827f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
828f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
829f00c313bSMauro Carvalho Chehab
830f00c313bSMauro Carvalho ChehabNext writer comes in, and sees the update and sets up the new
831f00c313bSMauro Carvalho Chehabhead page::
832f00c313bSMauro Carvalho Chehab
833f00c313bSMauro Carvalho Chehab  (second writer)
834f00c313bSMauro Carvalho Chehab
835f00c313bSMauro Carvalho Chehab             tail page
836f00c313bSMauro Carvalho Chehab                 |
837f00c313bSMauro Carvalho Chehab                 v
838f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
839f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |-H->|   |--->
840f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
841f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
842f00c313bSMauro Carvalho Chehab
843f00c313bSMauro Carvalho ChehabThe nested writer moves the tail page forward. But does not set the old
844f00c313bSMauro Carvalho Chehabupdate page to NORMAL because it is not the outermost writer::
845f00c313bSMauro Carvalho Chehab
846f00c313bSMauro Carvalho Chehab                      tail page
847f00c313bSMauro Carvalho Chehab                          |
848f00c313bSMauro Carvalho Chehab                          v
849f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
850f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |-H->|   |--->
851f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
852f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
853f00c313bSMauro Carvalho Chehab
854f00c313bSMauro Carvalho ChehabAnother writer preempts and sees the page after the tail page is a head page.
855f00c313bSMauro Carvalho ChehabIt changes it from HEAD to UPDATE::
856f00c313bSMauro Carvalho Chehab
857f00c313bSMauro Carvalho Chehab  (third writer)
858f00c313bSMauro Carvalho Chehab
859f00c313bSMauro Carvalho Chehab                      tail page
860f00c313bSMauro Carvalho Chehab                          |
861f00c313bSMauro Carvalho Chehab                          v
862f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
863f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |-U->|   |--->
864f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
865f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
866f00c313bSMauro Carvalho Chehab
867f00c313bSMauro Carvalho ChehabThe writer will move the head page forward::
868f00c313bSMauro Carvalho Chehab
869f00c313bSMauro Carvalho Chehab
870f00c313bSMauro Carvalho Chehab  (third writer)
871f00c313bSMauro Carvalho Chehab
872f00c313bSMauro Carvalho Chehab                      tail page
873f00c313bSMauro Carvalho Chehab                          |
874f00c313bSMauro Carvalho Chehab                          v
875f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
876f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |-U->|   |-H->
877f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
878f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
879f00c313bSMauro Carvalho Chehab
880f00c313bSMauro Carvalho ChehabBut now that the third writer did change the HEAD flag to UPDATE it
881f00c313bSMauro Carvalho Chehabwill convert it to normal::
882f00c313bSMauro Carvalho Chehab
883f00c313bSMauro Carvalho Chehab
884f00c313bSMauro Carvalho Chehab  (third writer)
885f00c313bSMauro Carvalho Chehab
886f00c313bSMauro Carvalho Chehab                      tail page
887f00c313bSMauro Carvalho Chehab                          |
888f00c313bSMauro Carvalho Chehab                          v
889f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
890f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |--->|   |-H->
891f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
892f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
893f00c313bSMauro Carvalho Chehab
894f00c313bSMauro Carvalho Chehab
895f00c313bSMauro Carvalho ChehabThen it will move the tail page, and return back to the second writer::
896f00c313bSMauro Carvalho Chehab
897f00c313bSMauro Carvalho Chehab
898f00c313bSMauro Carvalho Chehab  (second writer)
899f00c313bSMauro Carvalho Chehab
900f00c313bSMauro Carvalho Chehab                               tail page
901f00c313bSMauro Carvalho Chehab                                   |
902f00c313bSMauro Carvalho Chehab                                   v
903f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
904f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |--->|   |-H->
905f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
906f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
907f00c313bSMauro Carvalho Chehab
908f00c313bSMauro Carvalho Chehab
909f00c313bSMauro Carvalho ChehabThe second writer will fail to move the tail page because it was already
910f00c313bSMauro Carvalho Chehabmoved, so it will try again and add its data to the new tail page.
911f00c313bSMauro Carvalho ChehabIt will return to the first writer::
912f00c313bSMauro Carvalho Chehab
913f00c313bSMauro Carvalho Chehab
914f00c313bSMauro Carvalho Chehab  (first writer)
915f00c313bSMauro Carvalho Chehab
916f00c313bSMauro Carvalho Chehab                               tail page
917f00c313bSMauro Carvalho Chehab                                   |
918f00c313bSMauro Carvalho Chehab                                   v
919f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
920f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |--->|   |-H->
921f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
922f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
923f00c313bSMauro Carvalho Chehab
924f00c313bSMauro Carvalho ChehabThe first writer cannot know atomically if the tail page moved
925f00c313bSMauro Carvalho Chehabwhile it updates the HEAD page. It will then update the head page to
926f00c313bSMauro Carvalho Chehabwhat it thinks is the new head page::
927f00c313bSMauro Carvalho Chehab
928f00c313bSMauro Carvalho Chehab
929f00c313bSMauro Carvalho Chehab  (first writer)
930f00c313bSMauro Carvalho Chehab
931f00c313bSMauro Carvalho Chehab                               tail page
932f00c313bSMauro Carvalho Chehab                                   |
933f00c313bSMauro Carvalho Chehab                                   v
934f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
935f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |-H->|   |-H->
936f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
937f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
938f00c313bSMauro Carvalho Chehab
939f00c313bSMauro Carvalho ChehabSince the cmpxchg returns the old value of the pointer the first writer
940f00c313bSMauro Carvalho Chehabwill see it succeeded in updating the pointer from NORMAL to HEAD.
941f00c313bSMauro Carvalho ChehabBut as we can see, this is not good enough. It must also check to see
942f00c313bSMauro Carvalho Chehabif the tail page is either where it use to be or on the next page::
943f00c313bSMauro Carvalho Chehab
944f00c313bSMauro Carvalho Chehab
945f00c313bSMauro Carvalho Chehab  (first writer)
946f00c313bSMauro Carvalho Chehab
947f00c313bSMauro Carvalho Chehab                 A        B    tail page
948f00c313bSMauro Carvalho Chehab                 |        |        |
949f00c313bSMauro Carvalho Chehab                 v        v        v
950f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
951f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |-H->|   |-H->
952f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
953f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
954f00c313bSMauro Carvalho Chehab
955f00c313bSMauro Carvalho ChehabIf tail page != A and tail page != B, then it must reset the pointer
956f00c313bSMauro Carvalho Chehabback to NORMAL. The fact that it only needs to worry about nested
957f00c313bSMauro Carvalho Chehabwriters means that it only needs to check this after setting the HEAD page::
958f00c313bSMauro Carvalho Chehab
959f00c313bSMauro Carvalho Chehab
960f00c313bSMauro Carvalho Chehab  (first writer)
961f00c313bSMauro Carvalho Chehab
962f00c313bSMauro Carvalho Chehab                 A        B    tail page
963f00c313bSMauro Carvalho Chehab                 |        |        |
964f00c313bSMauro Carvalho Chehab                 v        v        v
965f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
966f00c313bSMauro Carvalho Chehab  <---|   |--->|   |-U->|   |--->|   |-H->
967f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
968f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
969f00c313bSMauro Carvalho Chehab
970f00c313bSMauro Carvalho ChehabNow the writer can update the head page. This is also why the head page must
971f00c313bSMauro Carvalho Chehabremain in UPDATE and only reset by the outermost writer. This prevents
972f00c313bSMauro Carvalho Chehabthe reader from seeing the incorrect head page::
973f00c313bSMauro Carvalho Chehab
974f00c313bSMauro Carvalho Chehab
975f00c313bSMauro Carvalho Chehab  (first writer)
976f00c313bSMauro Carvalho Chehab
977f00c313bSMauro Carvalho Chehab                 A        B    tail page
978f00c313bSMauro Carvalho Chehab                 |        |        |
979f00c313bSMauro Carvalho Chehab                 v        v        v
980f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
981f00c313bSMauro Carvalho Chehab  <---|   |--->|   |--->|   |--->|   |-H->
982f00c313bSMauro Carvalho Chehab  --->|   |<---|   |<---|   |<---|   |<---
983f00c313bSMauro Carvalho Chehab      +---+    +---+    +---+    +---+
984