1*e0bc9cf0SMauro Carvalho Chehab.. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-only 2f00c313bSMauro Carvalho Chehab 3f00c313bSMauro Carvalho Chehab=========================== 4f00c313bSMauro Carvalho ChehabLockless Ring Buffer Design 5f00c313bSMauro Carvalho Chehab=========================== 6f00c313bSMauro Carvalho Chehab 7f00c313bSMauro Carvalho ChehabCopyright 2009 Red Hat Inc. 8f00c313bSMauro Carvalho Chehab 9f00c313bSMauro Carvalho Chehab:Author: Steven Rostedt <srostedt@redhat.com> 10f00c313bSMauro Carvalho Chehab:License: The GNU Free Documentation License, Version 1.2 11f00c313bSMauro Carvalho Chehab (dual licensed under the GPL v2) 12f00c313bSMauro Carvalho Chehab:Reviewers: Mathieu Desnoyers, Huang Ying, Hidetoshi Seto, 13f00c313bSMauro Carvalho Chehab and Frederic Weisbecker. 14f00c313bSMauro Carvalho Chehab 15f00c313bSMauro Carvalho Chehab 16f00c313bSMauro Carvalho ChehabWritten for: 2.6.31 17f00c313bSMauro Carvalho Chehab 18f00c313bSMauro Carvalho ChehabTerminology used in this Document 19f00c313bSMauro Carvalho Chehab--------------------------------- 20f00c313bSMauro Carvalho Chehab 21f00c313bSMauro Carvalho Chehabtail 22f00c313bSMauro Carvalho Chehab - where new writes happen in the ring buffer. 23f00c313bSMauro Carvalho Chehab 24f00c313bSMauro Carvalho Chehabhead 25f00c313bSMauro Carvalho Chehab - where new reads happen in the ring buffer. 26f00c313bSMauro Carvalho Chehab 27f00c313bSMauro Carvalho Chehabproducer 28f00c313bSMauro Carvalho Chehab - the task that writes into the ring buffer (same as writer) 29f00c313bSMauro Carvalho Chehab 30f00c313bSMauro Carvalho Chehabwriter 31f00c313bSMauro Carvalho Chehab - same as producer 32f00c313bSMauro Carvalho Chehab 33f00c313bSMauro Carvalho Chehabconsumer 34f00c313bSMauro Carvalho Chehab - the task that reads from the buffer (same as reader) 35f00c313bSMauro Carvalho Chehab 36f00c313bSMauro Carvalho Chehabreader 37f00c313bSMauro Carvalho Chehab - same as consumer. 38f00c313bSMauro Carvalho Chehab 39f00c313bSMauro Carvalho Chehabreader_page 40f00c313bSMauro Carvalho Chehab - A page outside the ring buffer used solely (for the most part) 41f00c313bSMauro Carvalho Chehab by the reader. 42f00c313bSMauro Carvalho Chehab 43f00c313bSMauro Carvalho Chehabhead_page 44f00c313bSMauro Carvalho Chehab - a pointer to the page that the reader will use next 45f00c313bSMauro Carvalho Chehab 46f00c313bSMauro Carvalho Chehabtail_page 47f00c313bSMauro Carvalho Chehab - a pointer to the page that will be written to next 48f00c313bSMauro Carvalho Chehab 49f00c313bSMauro Carvalho Chehabcommit_page 50f00c313bSMauro Carvalho Chehab - a pointer to the page with the last finished non-nested write. 51f00c313bSMauro Carvalho Chehab 52f00c313bSMauro Carvalho Chehabcmpxchg 53f00c313bSMauro Carvalho Chehab - hardware-assisted atomic transaction that performs the following:: 54f00c313bSMauro Carvalho Chehab 55f00c313bSMauro Carvalho Chehab A = B if previous A == C 56f00c313bSMauro Carvalho Chehab 57f00c313bSMauro Carvalho Chehab R = cmpxchg(A, C, B) is saying that we replace A with B if and only 58f00c313bSMauro Carvalho Chehab if current A is equal to C, and we put the old (current) 59f00c313bSMauro Carvalho Chehab A into R 60f00c313bSMauro Carvalho Chehab 61f00c313bSMauro Carvalho Chehab R gets the previous A regardless if A is updated with B or not. 62f00c313bSMauro Carvalho Chehab 63f00c313bSMauro Carvalho Chehab To see if the update was successful a compare of ``R == C`` 64f00c313bSMauro Carvalho Chehab may be used. 65f00c313bSMauro Carvalho Chehab 66f00c313bSMauro Carvalho ChehabThe Generic Ring Buffer 67f00c313bSMauro Carvalho Chehab----------------------- 68f00c313bSMauro Carvalho Chehab 69f00c313bSMauro Carvalho ChehabThe ring buffer can be used in either an overwrite mode or in 70f00c313bSMauro Carvalho Chehabproducer/consumer mode. 71f00c313bSMauro Carvalho Chehab 72f00c313bSMauro Carvalho ChehabProducer/consumer mode is where if the producer were to fill up the 73f00c313bSMauro Carvalho Chehabbuffer before the consumer could free up anything, the producer 74f00c313bSMauro Carvalho Chehabwill stop writing to the buffer. This will lose most recent events. 75f00c313bSMauro Carvalho Chehab 76f00c313bSMauro Carvalho ChehabOverwrite mode is where if the producer were to fill up the buffer 77f00c313bSMauro Carvalho Chehabbefore the consumer could free up anything, the producer will 78f00c313bSMauro Carvalho Chehaboverwrite the older data. This will lose the oldest events. 79f00c313bSMauro Carvalho Chehab 80f00c313bSMauro Carvalho ChehabNo two writers can write at the same time (on the same per-cpu buffer), 81f00c313bSMauro Carvalho Chehabbut a writer may interrupt another writer, but it must finish writing 82f00c313bSMauro Carvalho Chehabbefore the previous writer may continue. This is very important to the 83f00c313bSMauro Carvalho Chehabalgorithm. The writers act like a "stack". The way interrupts works 84f00c313bSMauro Carvalho Chehabenforces this behavior:: 85f00c313bSMauro Carvalho Chehab 86f00c313bSMauro Carvalho Chehab 87f00c313bSMauro Carvalho Chehab writer1 start 88f00c313bSMauro Carvalho Chehab <preempted> writer2 start 89f00c313bSMauro Carvalho Chehab <preempted> writer3 start 90f00c313bSMauro Carvalho Chehab writer3 finishes 91f00c313bSMauro Carvalho Chehab writer2 finishes 92f00c313bSMauro Carvalho Chehab writer1 finishes 93f00c313bSMauro Carvalho Chehab 94f00c313bSMauro Carvalho ChehabThis is very much like a writer being preempted by an interrupt and 95f00c313bSMauro Carvalho Chehabthe interrupt doing a write as well. 96f00c313bSMauro Carvalho Chehab 97f00c313bSMauro Carvalho ChehabReaders can happen at any time. But no two readers may run at the 98f00c313bSMauro Carvalho Chehabsame time, nor can a reader preempt/interrupt another reader. A reader 99f00c313bSMauro Carvalho Chehabcannot preempt/interrupt a writer, but it may read/consume from the 100f00c313bSMauro Carvalho Chehabbuffer at the same time as a writer is writing, but the reader must be 101f00c313bSMauro Carvalho Chehabon another processor to do so. A reader may read on its own processor 102f00c313bSMauro Carvalho Chehaband can be preempted by a writer. 103f00c313bSMauro Carvalho Chehab 104f00c313bSMauro Carvalho ChehabA writer can preempt a reader, but a reader cannot preempt a writer. 105f00c313bSMauro Carvalho ChehabBut a reader can read the buffer at the same time (on another processor) 106f00c313bSMauro Carvalho Chehabas a writer. 107f00c313bSMauro Carvalho Chehab 108f00c313bSMauro Carvalho ChehabThe ring buffer is made up of a list of pages held together by a linked list. 109f00c313bSMauro Carvalho Chehab 110f00c313bSMauro Carvalho ChehabAt initialization a reader page is allocated for the reader that is not 111f00c313bSMauro Carvalho Chehabpart of the ring buffer. 112f00c313bSMauro Carvalho Chehab 113f00c313bSMauro Carvalho ChehabThe head_page, tail_page and commit_page are all initialized to point 114f00c313bSMauro Carvalho Chehabto the same page. 115f00c313bSMauro Carvalho Chehab 116f00c313bSMauro Carvalho ChehabThe reader page is initialized to have its next pointer pointing to 117f00c313bSMauro Carvalho Chehabthe head page, and its previous pointer pointing to a page before 118f00c313bSMauro Carvalho Chehabthe head page. 119f00c313bSMauro Carvalho Chehab 120f00c313bSMauro Carvalho ChehabThe reader has its own page to use. At start up time, this page is 121f00c313bSMauro Carvalho Chehaballocated but is not attached to the list. When the reader wants 122f00c313bSMauro Carvalho Chehabto read from the buffer, if its page is empty (like it is on start-up), 123f00c313bSMauro Carvalho Chehabit will swap its page with the head_page. The old reader page will 124f00c313bSMauro Carvalho Chehabbecome part of the ring buffer and the head_page will be removed. 125f00c313bSMauro Carvalho ChehabThe page after the inserted page (old reader_page) will become the 126f00c313bSMauro Carvalho Chehabnew head page. 127f00c313bSMauro Carvalho Chehab 128f00c313bSMauro Carvalho ChehabOnce the new page is given to the reader, the reader could do what 129f00c313bSMauro Carvalho Chehabit wants with it, as long as a writer has left that page. 130f00c313bSMauro Carvalho Chehab 131f00c313bSMauro Carvalho ChehabA sample of how the reader page is swapped: Note this does not 132f00c313bSMauro Carvalho Chehabshow the head page in the buffer, it is for demonstrating a swap 133f00c313bSMauro Carvalho Chehabonly. 134f00c313bSMauro Carvalho Chehab 135f00c313bSMauro Carvalho Chehab:: 136f00c313bSMauro Carvalho Chehab 137f00c313bSMauro Carvalho Chehab +------+ 138f00c313bSMauro Carvalho Chehab |reader| RING BUFFER 139f00c313bSMauro Carvalho Chehab |page | 140f00c313bSMauro Carvalho Chehab +------+ 141f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ 142f00c313bSMauro Carvalho Chehab | |-->| |-->| | 143f00c313bSMauro Carvalho Chehab | |<--| |<--| | 144f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ 145f00c313bSMauro Carvalho Chehab ^ | ^ | 146f00c313bSMauro Carvalho Chehab | +-------------+ | 147f00c313bSMauro Carvalho Chehab +-----------------+ 148f00c313bSMauro Carvalho Chehab 149f00c313bSMauro Carvalho Chehab 150f00c313bSMauro Carvalho Chehab +------+ 151f00c313bSMauro Carvalho Chehab |reader| RING BUFFER 152f00c313bSMauro Carvalho Chehab |page |-------------------+ 153f00c313bSMauro Carvalho Chehab +------+ v 154f00c313bSMauro Carvalho Chehab | +---+ +---+ +---+ 155f00c313bSMauro Carvalho Chehab | | |-->| |-->| | 156f00c313bSMauro Carvalho Chehab | | |<--| |<--| |<-+ 157f00c313bSMauro Carvalho Chehab | +---+ +---+ +---+ | 158f00c313bSMauro Carvalho Chehab | ^ | ^ | | 159f00c313bSMauro Carvalho Chehab | | +-------------+ | | 160f00c313bSMauro Carvalho Chehab | +-----------------+ | 161f00c313bSMauro Carvalho Chehab +------------------------------------+ 162f00c313bSMauro Carvalho Chehab 163f00c313bSMauro Carvalho Chehab +------+ 164f00c313bSMauro Carvalho Chehab |reader| RING BUFFER 165f00c313bSMauro Carvalho Chehab |page |-------------------+ 166f00c313bSMauro Carvalho Chehab +------+ <---------------+ v 167f00c313bSMauro Carvalho Chehab | ^ +---+ +---+ +---+ 168f00c313bSMauro Carvalho Chehab | | | |-->| |-->| | 169f00c313bSMauro Carvalho Chehab | | | | | |<--| |<-+ 170f00c313bSMauro Carvalho Chehab | | +---+ +---+ +---+ | 171f00c313bSMauro Carvalho Chehab | | | ^ | | 172f00c313bSMauro Carvalho Chehab | | +-------------+ | | 173f00c313bSMauro Carvalho Chehab | +-----------------------------+ | 174f00c313bSMauro Carvalho Chehab +------------------------------------+ 175f00c313bSMauro Carvalho Chehab 176f00c313bSMauro Carvalho Chehab +------+ 177f00c313bSMauro Carvalho Chehab |buffer| RING BUFFER 178f00c313bSMauro Carvalho Chehab |page |-------------------+ 179f00c313bSMauro Carvalho Chehab +------+ <---------------+ v 180f00c313bSMauro Carvalho Chehab | ^ +---+ +---+ +---+ 181f00c313bSMauro Carvalho Chehab | | | | | |-->| | 182f00c313bSMauro Carvalho Chehab | | New | | | |<--| |<-+ 183f00c313bSMauro Carvalho Chehab | | Reader +---+ +---+ +---+ | 184f00c313bSMauro Carvalho Chehab | | page ----^ | | 185f00c313bSMauro Carvalho Chehab | | | | 186f00c313bSMauro Carvalho Chehab | +-----------------------------+ | 187f00c313bSMauro Carvalho Chehab +------------------------------------+ 188f00c313bSMauro Carvalho Chehab 189f00c313bSMauro Carvalho Chehab 190f00c313bSMauro Carvalho Chehab 191f00c313bSMauro Carvalho ChehabIt is possible that the page swapped is the commit page and the tail page, 192f00c313bSMauro Carvalho Chehabif what is in the ring buffer is less than what is held in a buffer page. 193f00c313bSMauro Carvalho Chehab 194f00c313bSMauro Carvalho Chehab:: 195f00c313bSMauro Carvalho Chehab 196f00c313bSMauro Carvalho Chehab reader page commit page tail page 197f00c313bSMauro Carvalho Chehab | | | 198f00c313bSMauro Carvalho Chehab v | | 199f00c313bSMauro Carvalho Chehab +---+ | | 200f00c313bSMauro Carvalho Chehab | |<----------+ | 201f00c313bSMauro Carvalho Chehab | |<------------------------+ 202f00c313bSMauro Carvalho Chehab | |------+ 203f00c313bSMauro Carvalho Chehab +---+ | 204f00c313bSMauro Carvalho Chehab | 205f00c313bSMauro Carvalho Chehab v 206f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 207f00c313bSMauro Carvalho Chehab <---| |--->| |--->| |--->| |---> 208f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 209f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 210f00c313bSMauro Carvalho Chehab 211f00c313bSMauro Carvalho ChehabThis case is still valid for this algorithm. 212f00c313bSMauro Carvalho ChehabWhen the writer leaves the page, it simply goes into the ring buffer 213f00c313bSMauro Carvalho Chehabsince the reader page still points to the next location in the ring 214f00c313bSMauro Carvalho Chehabbuffer. 215f00c313bSMauro Carvalho Chehab 216f00c313bSMauro Carvalho Chehab 217f00c313bSMauro Carvalho ChehabThe main pointers: 218f00c313bSMauro Carvalho Chehab 219f00c313bSMauro Carvalho Chehab reader page 220f00c313bSMauro Carvalho Chehab - The page used solely by the reader and is not part 221f00c313bSMauro Carvalho Chehab of the ring buffer (may be swapped in) 222f00c313bSMauro Carvalho Chehab 223f00c313bSMauro Carvalho Chehab head page 224f00c313bSMauro Carvalho Chehab - the next page in the ring buffer that will be swapped 225f00c313bSMauro Carvalho Chehab with the reader page. 226f00c313bSMauro Carvalho Chehab 227f00c313bSMauro Carvalho Chehab tail page 228f00c313bSMauro Carvalho Chehab - the page where the next write will take place. 229f00c313bSMauro Carvalho Chehab 230f00c313bSMauro Carvalho Chehab commit page 231f00c313bSMauro Carvalho Chehab - the page that last finished a write. 232f00c313bSMauro Carvalho Chehab 233f00c313bSMauro Carvalho ChehabThe commit page only is updated by the outermost writer in the 234f00c313bSMauro Carvalho Chehabwriter stack. A writer that preempts another writer will not move the 235f00c313bSMauro Carvalho Chehabcommit page. 236f00c313bSMauro Carvalho Chehab 237f00c313bSMauro Carvalho ChehabWhen data is written into the ring buffer, a position is reserved 238f00c313bSMauro Carvalho Chehabin the ring buffer and passed back to the writer. When the writer 239f00c313bSMauro Carvalho Chehabis finished writing data into that position, it commits the write. 240f00c313bSMauro Carvalho Chehab 241f00c313bSMauro Carvalho ChehabAnother write (or a read) may take place at anytime during this 242f00c313bSMauro Carvalho Chehabtransaction. If another write happens it must finish before continuing 243f00c313bSMauro Carvalho Chehabwith the previous write. 244f00c313bSMauro Carvalho Chehab 245f00c313bSMauro Carvalho Chehab 246f00c313bSMauro Carvalho Chehab Write reserve:: 247f00c313bSMauro Carvalho Chehab 248f00c313bSMauro Carvalho Chehab Buffer page 249f00c313bSMauro Carvalho Chehab +---------+ 250f00c313bSMauro Carvalho Chehab |written | 251f00c313bSMauro Carvalho Chehab +---------+ <--- given back to writer (current commit) 252f00c313bSMauro Carvalho Chehab |reserved | 253f00c313bSMauro Carvalho Chehab +---------+ <--- tail pointer 254f00c313bSMauro Carvalho Chehab | empty | 255f00c313bSMauro Carvalho Chehab +---------+ 256f00c313bSMauro Carvalho Chehab 257f00c313bSMauro Carvalho Chehab Write commit:: 258f00c313bSMauro Carvalho Chehab 259f00c313bSMauro Carvalho Chehab Buffer page 260f00c313bSMauro Carvalho Chehab +---------+ 261f00c313bSMauro Carvalho Chehab |written | 262f00c313bSMauro Carvalho Chehab +---------+ 263f00c313bSMauro Carvalho Chehab |written | 264f00c313bSMauro Carvalho Chehab +---------+ <--- next position for write (current commit) 265f00c313bSMauro Carvalho Chehab | empty | 266f00c313bSMauro Carvalho Chehab +---------+ 267f00c313bSMauro Carvalho Chehab 268f00c313bSMauro Carvalho Chehab 269f00c313bSMauro Carvalho Chehab If a write happens after the first reserve:: 270f00c313bSMauro Carvalho Chehab 271f00c313bSMauro Carvalho Chehab Buffer page 272f00c313bSMauro Carvalho Chehab +---------+ 273f00c313bSMauro Carvalho Chehab |written | 274f00c313bSMauro Carvalho Chehab +---------+ <-- current commit 275f00c313bSMauro Carvalho Chehab |reserved | 276f00c313bSMauro Carvalho Chehab +---------+ <--- given back to second writer 277f00c313bSMauro Carvalho Chehab |reserved | 278f00c313bSMauro Carvalho Chehab +---------+ <--- tail pointer 279f00c313bSMauro Carvalho Chehab 280f00c313bSMauro Carvalho Chehab After second writer commits:: 281f00c313bSMauro Carvalho Chehab 282f00c313bSMauro Carvalho Chehab 283f00c313bSMauro Carvalho Chehab Buffer page 284f00c313bSMauro Carvalho Chehab +---------+ 285f00c313bSMauro Carvalho Chehab |written | 286f00c313bSMauro Carvalho Chehab +---------+ <--(last full commit) 287f00c313bSMauro Carvalho Chehab |reserved | 288f00c313bSMauro Carvalho Chehab +---------+ 289f00c313bSMauro Carvalho Chehab |pending | 290f00c313bSMauro Carvalho Chehab |commit | 291f00c313bSMauro Carvalho Chehab +---------+ <--- tail pointer 292f00c313bSMauro Carvalho Chehab 293f00c313bSMauro Carvalho Chehab When the first writer commits:: 294f00c313bSMauro Carvalho Chehab 295f00c313bSMauro Carvalho Chehab Buffer page 296f00c313bSMauro Carvalho Chehab +---------+ 297f00c313bSMauro Carvalho Chehab |written | 298f00c313bSMauro Carvalho Chehab +---------+ 299f00c313bSMauro Carvalho Chehab |written | 300f00c313bSMauro Carvalho Chehab +---------+ 301f00c313bSMauro Carvalho Chehab |written | 302f00c313bSMauro Carvalho Chehab +---------+ <--(last full commit and tail pointer) 303f00c313bSMauro Carvalho Chehab 304f00c313bSMauro Carvalho Chehab 305f00c313bSMauro Carvalho ChehabThe commit pointer points to the last write location that was 306f00c313bSMauro Carvalho Chehabcommitted without preempting another write. When a write that 307f00c313bSMauro Carvalho Chehabpreempted another write is committed, it only becomes a pending commit 308f00c313bSMauro Carvalho Chehaband will not be a full commit until all writes have been committed. 309f00c313bSMauro Carvalho Chehab 310f00c313bSMauro Carvalho ChehabThe commit page points to the page that has the last full commit. 311f00c313bSMauro Carvalho ChehabThe tail page points to the page with the last write (before 312f00c313bSMauro Carvalho Chehabcommitting). 313f00c313bSMauro Carvalho Chehab 314f00c313bSMauro Carvalho ChehabThe tail page is always equal to or after the commit page. It may 315f00c313bSMauro Carvalho Chehabbe several pages ahead. If the tail page catches up to the commit 316f00c313bSMauro Carvalho Chehabpage then no more writes may take place (regardless of the mode 317f00c313bSMauro Carvalho Chehabof the ring buffer: overwrite and produce/consumer). 318f00c313bSMauro Carvalho Chehab 319f00c313bSMauro Carvalho ChehabThe order of pages is:: 320f00c313bSMauro Carvalho Chehab 321f00c313bSMauro Carvalho Chehab head page 322f00c313bSMauro Carvalho Chehab commit page 323f00c313bSMauro Carvalho Chehab tail page 324f00c313bSMauro Carvalho Chehab 325f00c313bSMauro Carvalho ChehabPossible scenario:: 326f00c313bSMauro Carvalho Chehab 327f00c313bSMauro Carvalho Chehab tail page 328f00c313bSMauro Carvalho Chehab head page commit page | 329f00c313bSMauro Carvalho Chehab | | | 330f00c313bSMauro Carvalho Chehab v v v 331f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 332f00c313bSMauro Carvalho Chehab <---| |--->| |--->| |--->| |---> 333f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 334f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 335f00c313bSMauro Carvalho Chehab 336f00c313bSMauro Carvalho ChehabThere is a special case that the head page is after either the commit page 337f00c313bSMauro Carvalho Chehaband possibly the tail page. That is when the commit (and tail) page has been 338f00c313bSMauro Carvalho Chehabswapped with the reader page. This is because the head page is always 339f00c313bSMauro Carvalho Chehabpart of the ring buffer, but the reader page is not. Whenever there 340f00c313bSMauro Carvalho Chehabhas been less than a full page that has been committed inside the ring buffer, 341f00c313bSMauro Carvalho Chehaband a reader swaps out a page, it will be swapping out the commit page. 342f00c313bSMauro Carvalho Chehab 343f00c313bSMauro Carvalho Chehab:: 344f00c313bSMauro Carvalho Chehab 345f00c313bSMauro Carvalho Chehab reader page commit page tail page 346f00c313bSMauro Carvalho Chehab | | | 347f00c313bSMauro Carvalho Chehab v | | 348f00c313bSMauro Carvalho Chehab +---+ | | 349f00c313bSMauro Carvalho Chehab | |<----------+ | 350f00c313bSMauro Carvalho Chehab | |<------------------------+ 351f00c313bSMauro Carvalho Chehab | |------+ 352f00c313bSMauro Carvalho Chehab +---+ | 353f00c313bSMauro Carvalho Chehab | 354f00c313bSMauro Carvalho Chehab v 355f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 356f00c313bSMauro Carvalho Chehab <---| |--->| |--->| |--->| |---> 357f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 358f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 359f00c313bSMauro Carvalho Chehab ^ 360f00c313bSMauro Carvalho Chehab | 361f00c313bSMauro Carvalho Chehab head page 362f00c313bSMauro Carvalho Chehab 363f00c313bSMauro Carvalho Chehab 364f00c313bSMauro Carvalho ChehabIn this case, the head page will not move when the tail and commit 365f00c313bSMauro Carvalho Chehabmove back into the ring buffer. 366f00c313bSMauro Carvalho Chehab 367f00c313bSMauro Carvalho ChehabThe reader cannot swap a page into the ring buffer if the commit page 368f00c313bSMauro Carvalho Chehabis still on that page. If the read meets the last commit (real commit 369f00c313bSMauro Carvalho Chehabnot pending or reserved), then there is nothing more to read. 370f00c313bSMauro Carvalho ChehabThe buffer is considered empty until another full commit finishes. 371f00c313bSMauro Carvalho Chehab 372f00c313bSMauro Carvalho ChehabWhen the tail meets the head page, if the buffer is in overwrite mode, 373f00c313bSMauro Carvalho Chehabthe head page will be pushed ahead one. If the buffer is in producer/consumer 374f00c313bSMauro Carvalho Chehabmode, the write will fail. 375f00c313bSMauro Carvalho Chehab 376f00c313bSMauro Carvalho ChehabOverwrite mode:: 377f00c313bSMauro Carvalho Chehab 378f00c313bSMauro Carvalho Chehab tail page 379f00c313bSMauro Carvalho Chehab | 380f00c313bSMauro Carvalho Chehab v 381f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 382f00c313bSMauro Carvalho Chehab <---| |--->| |--->| |--->| |---> 383f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 384f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 385f00c313bSMauro Carvalho Chehab ^ 386f00c313bSMauro Carvalho Chehab | 387f00c313bSMauro Carvalho Chehab head page 388f00c313bSMauro Carvalho Chehab 389f00c313bSMauro Carvalho Chehab 390f00c313bSMauro Carvalho Chehab tail page 391f00c313bSMauro Carvalho Chehab | 392f00c313bSMauro Carvalho Chehab v 393f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 394f00c313bSMauro Carvalho Chehab <---| |--->| |--->| |--->| |---> 395f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 396f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 397f00c313bSMauro Carvalho Chehab ^ 398f00c313bSMauro Carvalho Chehab | 399f00c313bSMauro Carvalho Chehab head page 400f00c313bSMauro Carvalho Chehab 401f00c313bSMauro Carvalho Chehab 402f00c313bSMauro Carvalho Chehab tail page 403f00c313bSMauro Carvalho Chehab | 404f00c313bSMauro Carvalho Chehab v 405f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 406f00c313bSMauro Carvalho Chehab <---| |--->| |--->| |--->| |---> 407f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 408f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 409f00c313bSMauro Carvalho Chehab ^ 410f00c313bSMauro Carvalho Chehab | 411f00c313bSMauro Carvalho Chehab head page 412f00c313bSMauro Carvalho Chehab 413f00c313bSMauro Carvalho ChehabNote, the reader page will still point to the previous head page. 414f00c313bSMauro Carvalho ChehabBut when a swap takes place, it will use the most recent head page. 415f00c313bSMauro Carvalho Chehab 416f00c313bSMauro Carvalho Chehab 417f00c313bSMauro Carvalho ChehabMaking the Ring Buffer Lockless: 418f00c313bSMauro Carvalho Chehab-------------------------------- 419f00c313bSMauro Carvalho Chehab 420f00c313bSMauro Carvalho ChehabThe main idea behind the lockless algorithm is to combine the moving 421f00c313bSMauro Carvalho Chehabof the head_page pointer with the swapping of pages with the reader. 422f00c313bSMauro Carvalho ChehabState flags are placed inside the pointer to the page. To do this, 423f00c313bSMauro Carvalho Chehabeach page must be aligned in memory by 4 bytes. This will allow the 2 424f00c313bSMauro Carvalho Chehableast significant bits of the address to be used as flags, since 425f00c313bSMauro Carvalho Chehabthey will always be zero for the address. To get the address, 426f00c313bSMauro Carvalho Chehabsimply mask out the flags:: 427f00c313bSMauro Carvalho Chehab 428f00c313bSMauro Carvalho Chehab MASK = ~3 429f00c313bSMauro Carvalho Chehab 430f00c313bSMauro Carvalho Chehab address & MASK 431f00c313bSMauro Carvalho Chehab 432f00c313bSMauro Carvalho ChehabTwo flags will be kept by these two bits: 433f00c313bSMauro Carvalho Chehab 434f00c313bSMauro Carvalho Chehab HEADER 435f00c313bSMauro Carvalho Chehab - the page being pointed to is a head page 436f00c313bSMauro Carvalho Chehab 437f00c313bSMauro Carvalho Chehab UPDATE 438f00c313bSMauro Carvalho Chehab - the page being pointed to is being updated by a writer 439f00c313bSMauro Carvalho Chehab and was or is about to be a head page. 440f00c313bSMauro Carvalho Chehab 441f00c313bSMauro Carvalho Chehab:: 442f00c313bSMauro Carvalho Chehab 443f00c313bSMauro Carvalho Chehab reader page 444f00c313bSMauro Carvalho Chehab | 445f00c313bSMauro Carvalho Chehab v 446f00c313bSMauro Carvalho Chehab +---+ 447f00c313bSMauro Carvalho Chehab | |------+ 448f00c313bSMauro Carvalho Chehab +---+ | 449f00c313bSMauro Carvalho Chehab | 450f00c313bSMauro Carvalho Chehab v 451f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 452f00c313bSMauro Carvalho Chehab <---| |--->| |-H->| |--->| |---> 453f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 454f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 455f00c313bSMauro Carvalho Chehab 456f00c313bSMauro Carvalho Chehab 457f00c313bSMauro Carvalho ChehabThe above pointer "-H->" would have the HEADER flag set. That is 458f00c313bSMauro Carvalho Chehabthe next page is the next page to be swapped out by the reader. 459f00c313bSMauro Carvalho ChehabThis pointer means the next page is the head page. 460f00c313bSMauro Carvalho Chehab 461f00c313bSMauro Carvalho ChehabWhen the tail page meets the head pointer, it will use cmpxchg to 462f00c313bSMauro Carvalho Chehabchange the pointer to the UPDATE state:: 463f00c313bSMauro Carvalho Chehab 464f00c313bSMauro Carvalho Chehab 465f00c313bSMauro Carvalho Chehab tail page 466f00c313bSMauro Carvalho Chehab | 467f00c313bSMauro Carvalho Chehab v 468f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 469f00c313bSMauro Carvalho Chehab <---| |--->| |-H->| |--->| |---> 470f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 471f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 472f00c313bSMauro Carvalho Chehab 473f00c313bSMauro Carvalho Chehab tail page 474f00c313bSMauro Carvalho Chehab | 475f00c313bSMauro Carvalho Chehab v 476f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 477f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |--->| |---> 478f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 479f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 480f00c313bSMauro Carvalho Chehab 481f00c313bSMauro Carvalho Chehab"-U->" represents a pointer in the UPDATE state. 482f00c313bSMauro Carvalho Chehab 483f00c313bSMauro Carvalho ChehabAny access to the reader will need to take some sort of lock to serialize 484f00c313bSMauro Carvalho Chehabthe readers. But the writers will never take a lock to write to the 485f00c313bSMauro Carvalho Chehabring buffer. This means we only need to worry about a single reader, 486f00c313bSMauro Carvalho Chehaband writes only preempt in "stack" formation. 487f00c313bSMauro Carvalho Chehab 488f00c313bSMauro Carvalho ChehabWhen the reader tries to swap the page with the ring buffer, it 489f00c313bSMauro Carvalho Chehabwill also use cmpxchg. If the flag bit in the pointer to the 490f00c313bSMauro Carvalho Chehabhead page does not have the HEADER flag set, the compare will fail 491f00c313bSMauro Carvalho Chehaband the reader will need to look for the new head page and try again. 492f00c313bSMauro Carvalho ChehabNote, the flags UPDATE and HEADER are never set at the same time. 493f00c313bSMauro Carvalho Chehab 494f00c313bSMauro Carvalho ChehabThe reader swaps the reader page as follows:: 495f00c313bSMauro Carvalho Chehab 496f00c313bSMauro Carvalho Chehab +------+ 497f00c313bSMauro Carvalho Chehab |reader| RING BUFFER 498f00c313bSMauro Carvalho Chehab |page | 499f00c313bSMauro Carvalho Chehab +------+ 500f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ 501f00c313bSMauro Carvalho Chehab | |--->| |--->| | 502f00c313bSMauro Carvalho Chehab | |<---| |<---| | 503f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ 504f00c313bSMauro Carvalho Chehab ^ | ^ | 505f00c313bSMauro Carvalho Chehab | +---------------+ | 506f00c313bSMauro Carvalho Chehab +-----H-------------+ 507f00c313bSMauro Carvalho Chehab 508f00c313bSMauro Carvalho ChehabThe reader sets the reader page next pointer as HEADER to the page after 509f00c313bSMauro Carvalho Chehabthe head page:: 510f00c313bSMauro Carvalho Chehab 511f00c313bSMauro Carvalho Chehab 512f00c313bSMauro Carvalho Chehab +------+ 513f00c313bSMauro Carvalho Chehab |reader| RING BUFFER 514f00c313bSMauro Carvalho Chehab |page |-------H-----------+ 515f00c313bSMauro Carvalho Chehab +------+ v 516f00c313bSMauro Carvalho Chehab | +---+ +---+ +---+ 517f00c313bSMauro Carvalho Chehab | | |--->| |--->| | 518f00c313bSMauro Carvalho Chehab | | |<---| |<---| |<-+ 519f00c313bSMauro Carvalho Chehab | +---+ +---+ +---+ | 520f00c313bSMauro Carvalho Chehab | ^ | ^ | | 521f00c313bSMauro Carvalho Chehab | | +---------------+ | | 522f00c313bSMauro Carvalho Chehab | +-----H-------------+ | 523f00c313bSMauro Carvalho Chehab +--------------------------------------+ 524f00c313bSMauro Carvalho Chehab 525f00c313bSMauro Carvalho ChehabIt does a cmpxchg with the pointer to the previous head page to make it 526f00c313bSMauro Carvalho Chehabpoint to the reader page. Note that the new pointer does not have the HEADER 527f00c313bSMauro Carvalho Chehabflag set. This action atomically moves the head page forward:: 528f00c313bSMauro Carvalho Chehab 529f00c313bSMauro Carvalho Chehab +------+ 530f00c313bSMauro Carvalho Chehab |reader| RING BUFFER 531f00c313bSMauro Carvalho Chehab |page |-------H-----------+ 532f00c313bSMauro Carvalho Chehab +------+ v 533f00c313bSMauro Carvalho Chehab | ^ +---+ +---+ +---+ 534f00c313bSMauro Carvalho Chehab | | | |-->| |-->| | 535f00c313bSMauro Carvalho Chehab | | | |<--| |<--| |<-+ 536f00c313bSMauro Carvalho Chehab | | +---+ +---+ +---+ | 537f00c313bSMauro Carvalho Chehab | | | ^ | | 538f00c313bSMauro Carvalho Chehab | | +-------------+ | | 539f00c313bSMauro Carvalho Chehab | +-----------------------------+ | 540f00c313bSMauro Carvalho Chehab +------------------------------------+ 541f00c313bSMauro Carvalho Chehab 542f00c313bSMauro Carvalho ChehabAfter the new head page is set, the previous pointer of the head page is 543f00c313bSMauro Carvalho Chehabupdated to the reader page:: 544f00c313bSMauro Carvalho Chehab 545f00c313bSMauro Carvalho Chehab +------+ 546f00c313bSMauro Carvalho Chehab |reader| RING BUFFER 547f00c313bSMauro Carvalho Chehab |page |-------H-----------+ 548f00c313bSMauro Carvalho Chehab +------+ <---------------+ v 549f00c313bSMauro Carvalho Chehab | ^ +---+ +---+ +---+ 550f00c313bSMauro Carvalho Chehab | | | |-->| |-->| | 551f00c313bSMauro Carvalho Chehab | | | | | |<--| |<-+ 552f00c313bSMauro Carvalho Chehab | | +---+ +---+ +---+ | 553f00c313bSMauro Carvalho Chehab | | | ^ | | 554f00c313bSMauro Carvalho Chehab | | +-------------+ | | 555f00c313bSMauro Carvalho Chehab | +-----------------------------+ | 556f00c313bSMauro Carvalho Chehab +------------------------------------+ 557f00c313bSMauro Carvalho Chehab 558f00c313bSMauro Carvalho Chehab +------+ 559f00c313bSMauro Carvalho Chehab |buffer| RING BUFFER 560f00c313bSMauro Carvalho Chehab |page |-------H-----------+ <--- New head page 561f00c313bSMauro Carvalho Chehab +------+ <---------------+ v 562f00c313bSMauro Carvalho Chehab | ^ +---+ +---+ +---+ 563f00c313bSMauro Carvalho Chehab | | | | | |-->| | 564f00c313bSMauro Carvalho Chehab | | New | | | |<--| |<-+ 565f00c313bSMauro Carvalho Chehab | | Reader +---+ +---+ +---+ | 566f00c313bSMauro Carvalho Chehab | | page ----^ | | 567f00c313bSMauro Carvalho Chehab | | | | 568f00c313bSMauro Carvalho Chehab | +-----------------------------+ | 569f00c313bSMauro Carvalho Chehab +------------------------------------+ 570f00c313bSMauro Carvalho Chehab 571f00c313bSMauro Carvalho ChehabAnother important point: The page that the reader page points back to 572f00c313bSMauro Carvalho Chehabby its previous pointer (the one that now points to the new head page) 573f00c313bSMauro Carvalho Chehabnever points back to the reader page. That is because the reader page is 574f00c313bSMauro Carvalho Chehabnot part of the ring buffer. Traversing the ring buffer via the next pointers 575f00c313bSMauro Carvalho Chehabwill always stay in the ring buffer. Traversing the ring buffer via the 576f00c313bSMauro Carvalho Chehabprev pointers may not. 577f00c313bSMauro Carvalho Chehab 578f00c313bSMauro Carvalho ChehabNote, the way to determine a reader page is simply by examining the previous 579f00c313bSMauro Carvalho Chehabpointer of the page. If the next pointer of the previous page does not 580f00c313bSMauro Carvalho Chehabpoint back to the original page, then the original page is a reader page:: 581f00c313bSMauro Carvalho Chehab 582f00c313bSMauro Carvalho Chehab 583f00c313bSMauro Carvalho Chehab +--------+ 584f00c313bSMauro Carvalho Chehab | reader | next +----+ 585f00c313bSMauro Carvalho Chehab | page |-------->| |<====== (buffer page) 586f00c313bSMauro Carvalho Chehab +--------+ +----+ 587f00c313bSMauro Carvalho Chehab | | ^ 588f00c313bSMauro Carvalho Chehab | v | next 589f00c313bSMauro Carvalho Chehab prev | +----+ 590f00c313bSMauro Carvalho Chehab +------------->| | 591f00c313bSMauro Carvalho Chehab +----+ 592f00c313bSMauro Carvalho Chehab 593f00c313bSMauro Carvalho ChehabThe way the head page moves forward: 594f00c313bSMauro Carvalho Chehab 595f00c313bSMauro Carvalho ChehabWhen the tail page meets the head page and the buffer is in overwrite mode 596f00c313bSMauro Carvalho Chehaband more writes take place, the head page must be moved forward before the 597f00c313bSMauro Carvalho Chehabwriter may move the tail page. The way this is done is that the writer 598f00c313bSMauro Carvalho Chehabperforms a cmpxchg to convert the pointer to the head page from the HEADER 599f00c313bSMauro Carvalho Chehabflag to have the UPDATE flag set. Once this is done, the reader will 600f00c313bSMauro Carvalho Chehabnot be able to swap the head page from the buffer, nor will it be able to 601f00c313bSMauro Carvalho Chehabmove the head page, until the writer is finished with the move. 602f00c313bSMauro Carvalho Chehab 603f00c313bSMauro Carvalho ChehabThis eliminates any races that the reader can have on the writer. The reader 604f00c313bSMauro Carvalho Chehabmust spin, and this is why the reader cannot preempt the writer:: 605f00c313bSMauro Carvalho Chehab 606f00c313bSMauro Carvalho Chehab tail page 607f00c313bSMauro Carvalho Chehab | 608f00c313bSMauro Carvalho Chehab v 609f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 610f00c313bSMauro Carvalho Chehab <---| |--->| |-H->| |--->| |---> 611f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 612f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 613f00c313bSMauro Carvalho Chehab 614f00c313bSMauro Carvalho Chehab tail page 615f00c313bSMauro Carvalho Chehab | 616f00c313bSMauro Carvalho Chehab v 617f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 618f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |--->| |---> 619f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 620f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 621f00c313bSMauro Carvalho Chehab 622f00c313bSMauro Carvalho ChehabThe following page will be made into the new head page:: 623f00c313bSMauro Carvalho Chehab 624f00c313bSMauro Carvalho Chehab tail page 625f00c313bSMauro Carvalho Chehab | 626f00c313bSMauro Carvalho Chehab v 627f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 628f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |-H->| |---> 629f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 630f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 631f00c313bSMauro Carvalho Chehab 632f00c313bSMauro Carvalho ChehabAfter the new head page has been set, we can set the old head page 633f00c313bSMauro Carvalho Chehabpointer back to NORMAL:: 634f00c313bSMauro Carvalho Chehab 635f00c313bSMauro Carvalho Chehab tail page 636f00c313bSMauro Carvalho Chehab | 637f00c313bSMauro Carvalho Chehab v 638f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 639f00c313bSMauro Carvalho Chehab <---| |--->| |--->| |-H->| |---> 640f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 641f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 642f00c313bSMauro Carvalho Chehab 643f00c313bSMauro Carvalho ChehabAfter the head page has been moved, the tail page may now move forward:: 644f00c313bSMauro Carvalho Chehab 645f00c313bSMauro Carvalho Chehab tail page 646f00c313bSMauro Carvalho Chehab | 647f00c313bSMauro Carvalho Chehab v 648f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 649f00c313bSMauro Carvalho Chehab <---| |--->| |--->| |-H->| |---> 650f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 651f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 652f00c313bSMauro Carvalho Chehab 653f00c313bSMauro Carvalho Chehab 654f00c313bSMauro Carvalho ChehabThe above are the trivial updates. Now for the more complex scenarios. 655f00c313bSMauro Carvalho Chehab 656f00c313bSMauro Carvalho Chehab 657f00c313bSMauro Carvalho ChehabAs stated before, if enough writes preempt the first write, the 658f00c313bSMauro Carvalho Chehabtail page may make it all the way around the buffer and meet the commit 659f00c313bSMauro Carvalho Chehabpage. At this time, we must start dropping writes (usually with some kind 660f00c313bSMauro Carvalho Chehabof warning to the user). But what happens if the commit was still on the 661f00c313bSMauro Carvalho Chehabreader page? The commit page is not part of the ring buffer. The tail page 662f00c313bSMauro Carvalho Chehabmust account for this:: 663f00c313bSMauro Carvalho Chehab 664f00c313bSMauro Carvalho Chehab 665f00c313bSMauro Carvalho Chehab reader page commit page 666f00c313bSMauro Carvalho Chehab | | 667f00c313bSMauro Carvalho Chehab v | 668f00c313bSMauro Carvalho Chehab +---+ | 669f00c313bSMauro Carvalho Chehab | |<----------+ 670f00c313bSMauro Carvalho Chehab | | 671f00c313bSMauro Carvalho Chehab | |------+ 672f00c313bSMauro Carvalho Chehab +---+ | 673f00c313bSMauro Carvalho Chehab | 674f00c313bSMauro Carvalho Chehab v 675f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 676f00c313bSMauro Carvalho Chehab <---| |--->| |-H->| |--->| |---> 677f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 678f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 679f00c313bSMauro Carvalho Chehab ^ 680f00c313bSMauro Carvalho Chehab | 681f00c313bSMauro Carvalho Chehab tail page 682f00c313bSMauro Carvalho Chehab 683f00c313bSMauro Carvalho ChehabIf the tail page were to simply push the head page forward, the commit when 684f00c313bSMauro Carvalho Chehableaving the reader page would not be pointing to the correct page. 685f00c313bSMauro Carvalho Chehab 686f00c313bSMauro Carvalho ChehabThe solution to this is to test if the commit page is on the reader page 687f00c313bSMauro Carvalho Chehabbefore pushing the head page. If it is, then it can be assumed that the 688f00c313bSMauro Carvalho Chehabtail page wrapped the buffer, and we must drop new writes. 689f00c313bSMauro Carvalho Chehab 690f00c313bSMauro Carvalho ChehabThis is not a race condition, because the commit page can only be moved 691f00c313bSMauro Carvalho Chehabby the outermost writer (the writer that was preempted). 692f00c313bSMauro Carvalho ChehabThis means that the commit will not move while a writer is moving the 693f00c313bSMauro Carvalho Chehabtail page. The reader cannot swap the reader page if it is also being 694f00c313bSMauro Carvalho Chehabused as the commit page. The reader can simply check that the commit 695f00c313bSMauro Carvalho Chehabis off the reader page. Once the commit page leaves the reader page 696f00c313bSMauro Carvalho Chehabit will never go back on it unless a reader does another swap with the 697f00c313bSMauro Carvalho Chehabbuffer page that is also the commit page. 698f00c313bSMauro Carvalho Chehab 699f00c313bSMauro Carvalho Chehab 700f00c313bSMauro Carvalho ChehabNested writes 701f00c313bSMauro Carvalho Chehab------------- 702f00c313bSMauro Carvalho Chehab 703f00c313bSMauro Carvalho ChehabIn the pushing forward of the tail page we must first push forward 704f00c313bSMauro Carvalho Chehabthe head page if the head page is the next page. If the head page 705f00c313bSMauro Carvalho Chehabis not the next page, the tail page is simply updated with a cmpxchg. 706f00c313bSMauro Carvalho Chehab 707f00c313bSMauro Carvalho ChehabOnly writers move the tail page. This must be done atomically to protect 708f00c313bSMauro Carvalho Chehabagainst nested writers:: 709f00c313bSMauro Carvalho Chehab 710f00c313bSMauro Carvalho Chehab temp_page = tail_page 711f00c313bSMauro Carvalho Chehab next_page = temp_page->next 712f00c313bSMauro Carvalho Chehab cmpxchg(tail_page, temp_page, next_page) 713f00c313bSMauro Carvalho Chehab 714f00c313bSMauro Carvalho ChehabThe above will update the tail page if it is still pointing to the expected 715f00c313bSMauro Carvalho Chehabpage. If this fails, a nested write pushed it forward, the current write 716f00c313bSMauro Carvalho Chehabdoes not need to push it:: 717f00c313bSMauro Carvalho Chehab 718f00c313bSMauro Carvalho Chehab 719f00c313bSMauro Carvalho Chehab temp page 720f00c313bSMauro Carvalho Chehab | 721f00c313bSMauro Carvalho Chehab v 722f00c313bSMauro Carvalho Chehab tail page 723f00c313bSMauro Carvalho Chehab | 724f00c313bSMauro Carvalho Chehab v 725f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 726f00c313bSMauro Carvalho Chehab <---| |--->| |--->| |--->| |---> 727f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 728f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 729f00c313bSMauro Carvalho Chehab 730f00c313bSMauro Carvalho ChehabNested write comes in and moves the tail page forward:: 731f00c313bSMauro Carvalho Chehab 732f00c313bSMauro Carvalho Chehab tail page (moved by nested writer) 733f00c313bSMauro Carvalho Chehab temp page | 734f00c313bSMauro Carvalho Chehab | | 735f00c313bSMauro Carvalho Chehab v v 736f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 737f00c313bSMauro Carvalho Chehab <---| |--->| |--->| |--->| |---> 738f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 739f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 740f00c313bSMauro Carvalho Chehab 741f00c313bSMauro Carvalho ChehabThe above would fail the cmpxchg, but since the tail page has already 742f00c313bSMauro Carvalho Chehabbeen moved forward, the writer will just try again to reserve storage 743f00c313bSMauro Carvalho Chehabon the new tail page. 744f00c313bSMauro Carvalho Chehab 745f00c313bSMauro Carvalho ChehabBut the moving of the head page is a bit more complex:: 746f00c313bSMauro Carvalho Chehab 747f00c313bSMauro Carvalho Chehab tail page 748f00c313bSMauro Carvalho Chehab | 749f00c313bSMauro Carvalho Chehab v 750f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 751f00c313bSMauro Carvalho Chehab <---| |--->| |-H->| |--->| |---> 752f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 753f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 754f00c313bSMauro Carvalho Chehab 755f00c313bSMauro Carvalho ChehabThe write converts the head page pointer to UPDATE:: 756f00c313bSMauro Carvalho Chehab 757f00c313bSMauro Carvalho Chehab tail page 758f00c313bSMauro Carvalho Chehab | 759f00c313bSMauro Carvalho Chehab v 760f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 761f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |--->| |---> 762f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 763f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 764f00c313bSMauro Carvalho Chehab 765f00c313bSMauro Carvalho ChehabBut if a nested writer preempts here, it will see that the next 766f00c313bSMauro Carvalho Chehabpage is a head page, but it is also nested. It will detect that 767f00c313bSMauro Carvalho Chehabit is nested and will save that information. The detection is the 768f00c313bSMauro Carvalho Chehabfact that it sees the UPDATE flag instead of a HEADER or NORMAL 769f00c313bSMauro Carvalho Chehabpointer. 770f00c313bSMauro Carvalho Chehab 771f00c313bSMauro Carvalho ChehabThe nested writer will set the new head page pointer:: 772f00c313bSMauro Carvalho Chehab 773f00c313bSMauro Carvalho Chehab tail page 774f00c313bSMauro Carvalho Chehab | 775f00c313bSMauro Carvalho Chehab v 776f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 777f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |-H->| |---> 778f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 779f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 780f00c313bSMauro Carvalho Chehab 781f00c313bSMauro Carvalho ChehabBut it will not reset the update back to normal. Only the writer 782f00c313bSMauro Carvalho Chehabthat converted a pointer from HEAD to UPDATE will convert it back 783f00c313bSMauro Carvalho Chehabto NORMAL:: 784f00c313bSMauro Carvalho Chehab 785f00c313bSMauro Carvalho Chehab tail page 786f00c313bSMauro Carvalho Chehab | 787f00c313bSMauro Carvalho Chehab v 788f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 789f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |-H->| |---> 790f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 791f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 792f00c313bSMauro Carvalho Chehab 793f00c313bSMauro Carvalho ChehabAfter the nested writer finishes, the outermost writer will convert 794f00c313bSMauro Carvalho Chehabthe UPDATE pointer to NORMAL:: 795f00c313bSMauro Carvalho Chehab 796f00c313bSMauro Carvalho Chehab 797f00c313bSMauro Carvalho Chehab tail page 798f00c313bSMauro Carvalho Chehab | 799f00c313bSMauro Carvalho Chehab v 800f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 801f00c313bSMauro Carvalho Chehab <---| |--->| |--->| |-H->| |---> 802f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 803f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 804f00c313bSMauro Carvalho Chehab 805f00c313bSMauro Carvalho Chehab 806f00c313bSMauro Carvalho ChehabIt can be even more complex if several nested writes came in and moved 807f00c313bSMauro Carvalho Chehabthe tail page ahead several pages:: 808f00c313bSMauro Carvalho Chehab 809f00c313bSMauro Carvalho Chehab 810f00c313bSMauro Carvalho Chehab (first writer) 811f00c313bSMauro Carvalho Chehab 812f00c313bSMauro Carvalho Chehab tail page 813f00c313bSMauro Carvalho Chehab | 814f00c313bSMauro Carvalho Chehab v 815f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 816f00c313bSMauro Carvalho Chehab <---| |--->| |-H->| |--->| |---> 817f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 818f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 819f00c313bSMauro Carvalho Chehab 820f00c313bSMauro Carvalho ChehabThe write converts the head page pointer to UPDATE:: 821f00c313bSMauro Carvalho Chehab 822f00c313bSMauro Carvalho Chehab tail page 823f00c313bSMauro Carvalho Chehab | 824f00c313bSMauro Carvalho Chehab v 825f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 826f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |--->| |---> 827f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 828f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 829f00c313bSMauro Carvalho Chehab 830f00c313bSMauro Carvalho ChehabNext writer comes in, and sees the update and sets up the new 831f00c313bSMauro Carvalho Chehabhead page:: 832f00c313bSMauro Carvalho Chehab 833f00c313bSMauro Carvalho Chehab (second writer) 834f00c313bSMauro Carvalho Chehab 835f00c313bSMauro Carvalho Chehab tail page 836f00c313bSMauro Carvalho Chehab | 837f00c313bSMauro Carvalho Chehab v 838f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 839f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |-H->| |---> 840f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 841f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 842f00c313bSMauro Carvalho Chehab 843f00c313bSMauro Carvalho ChehabThe nested writer moves the tail page forward. But does not set the old 844f00c313bSMauro Carvalho Chehabupdate page to NORMAL because it is not the outermost writer:: 845f00c313bSMauro Carvalho Chehab 846f00c313bSMauro Carvalho Chehab tail page 847f00c313bSMauro Carvalho Chehab | 848f00c313bSMauro Carvalho Chehab v 849f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 850f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |-H->| |---> 851f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 852f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 853f00c313bSMauro Carvalho Chehab 854f00c313bSMauro Carvalho ChehabAnother writer preempts and sees the page after the tail page is a head page. 855f00c313bSMauro Carvalho ChehabIt changes it from HEAD to UPDATE:: 856f00c313bSMauro Carvalho Chehab 857f00c313bSMauro Carvalho Chehab (third writer) 858f00c313bSMauro Carvalho Chehab 859f00c313bSMauro Carvalho Chehab tail page 860f00c313bSMauro Carvalho Chehab | 861f00c313bSMauro Carvalho Chehab v 862f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 863f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |-U->| |---> 864f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 865f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 866f00c313bSMauro Carvalho Chehab 867f00c313bSMauro Carvalho ChehabThe writer will move the head page forward:: 868f00c313bSMauro Carvalho Chehab 869f00c313bSMauro Carvalho Chehab 870f00c313bSMauro Carvalho Chehab (third writer) 871f00c313bSMauro Carvalho Chehab 872f00c313bSMauro Carvalho Chehab tail page 873f00c313bSMauro Carvalho Chehab | 874f00c313bSMauro Carvalho Chehab v 875f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 876f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |-U->| |-H-> 877f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 878f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 879f00c313bSMauro Carvalho Chehab 880f00c313bSMauro Carvalho ChehabBut now that the third writer did change the HEAD flag to UPDATE it 881f00c313bSMauro Carvalho Chehabwill convert it to normal:: 882f00c313bSMauro Carvalho Chehab 883f00c313bSMauro Carvalho Chehab 884f00c313bSMauro Carvalho Chehab (third writer) 885f00c313bSMauro Carvalho Chehab 886f00c313bSMauro Carvalho Chehab tail page 887f00c313bSMauro Carvalho Chehab | 888f00c313bSMauro Carvalho Chehab v 889f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 890f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |--->| |-H-> 891f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 892f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 893f00c313bSMauro Carvalho Chehab 894f00c313bSMauro Carvalho Chehab 895f00c313bSMauro Carvalho ChehabThen it will move the tail page, and return back to the second writer:: 896f00c313bSMauro Carvalho Chehab 897f00c313bSMauro Carvalho Chehab 898f00c313bSMauro Carvalho Chehab (second writer) 899f00c313bSMauro Carvalho Chehab 900f00c313bSMauro Carvalho Chehab tail page 901f00c313bSMauro Carvalho Chehab | 902f00c313bSMauro Carvalho Chehab v 903f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 904f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |--->| |-H-> 905f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 906f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 907f00c313bSMauro Carvalho Chehab 908f00c313bSMauro Carvalho Chehab 909f00c313bSMauro Carvalho ChehabThe second writer will fail to move the tail page because it was already 910f00c313bSMauro Carvalho Chehabmoved, so it will try again and add its data to the new tail page. 911f00c313bSMauro Carvalho ChehabIt will return to the first writer:: 912f00c313bSMauro Carvalho Chehab 913f00c313bSMauro Carvalho Chehab 914f00c313bSMauro Carvalho Chehab (first writer) 915f00c313bSMauro Carvalho Chehab 916f00c313bSMauro Carvalho Chehab tail page 917f00c313bSMauro Carvalho Chehab | 918f00c313bSMauro Carvalho Chehab v 919f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 920f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |--->| |-H-> 921f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 922f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 923f00c313bSMauro Carvalho Chehab 924f00c313bSMauro Carvalho ChehabThe first writer cannot know atomically if the tail page moved 925f00c313bSMauro Carvalho Chehabwhile it updates the HEAD page. It will then update the head page to 926f00c313bSMauro Carvalho Chehabwhat it thinks is the new head page:: 927f00c313bSMauro Carvalho Chehab 928f00c313bSMauro Carvalho Chehab 929f00c313bSMauro Carvalho Chehab (first writer) 930f00c313bSMauro Carvalho Chehab 931f00c313bSMauro Carvalho Chehab tail page 932f00c313bSMauro Carvalho Chehab | 933f00c313bSMauro Carvalho Chehab v 934f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 935f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |-H->| |-H-> 936f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 937f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 938f00c313bSMauro Carvalho Chehab 939f00c313bSMauro Carvalho ChehabSince the cmpxchg returns the old value of the pointer the first writer 940f00c313bSMauro Carvalho Chehabwill see it succeeded in updating the pointer from NORMAL to HEAD. 941f00c313bSMauro Carvalho ChehabBut as we can see, this is not good enough. It must also check to see 942f00c313bSMauro Carvalho Chehabif the tail page is either where it use to be or on the next page:: 943f00c313bSMauro Carvalho Chehab 944f00c313bSMauro Carvalho Chehab 945f00c313bSMauro Carvalho Chehab (first writer) 946f00c313bSMauro Carvalho Chehab 947f00c313bSMauro Carvalho Chehab A B tail page 948f00c313bSMauro Carvalho Chehab | | | 949f00c313bSMauro Carvalho Chehab v v v 950f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 951f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |-H->| |-H-> 952f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 953f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 954f00c313bSMauro Carvalho Chehab 955f00c313bSMauro Carvalho ChehabIf tail page != A and tail page != B, then it must reset the pointer 956f00c313bSMauro Carvalho Chehabback to NORMAL. The fact that it only needs to worry about nested 957f00c313bSMauro Carvalho Chehabwriters means that it only needs to check this after setting the HEAD page:: 958f00c313bSMauro Carvalho Chehab 959f00c313bSMauro Carvalho Chehab 960f00c313bSMauro Carvalho Chehab (first writer) 961f00c313bSMauro Carvalho Chehab 962f00c313bSMauro Carvalho Chehab A B tail page 963f00c313bSMauro Carvalho Chehab | | | 964f00c313bSMauro Carvalho Chehab v v v 965f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 966f00c313bSMauro Carvalho Chehab <---| |--->| |-U->| |--->| |-H-> 967f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 968f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 969f00c313bSMauro Carvalho Chehab 970f00c313bSMauro Carvalho ChehabNow the writer can update the head page. This is also why the head page must 971f00c313bSMauro Carvalho Chehabremain in UPDATE and only reset by the outermost writer. This prevents 972f00c313bSMauro Carvalho Chehabthe reader from seeing the incorrect head page:: 973f00c313bSMauro Carvalho Chehab 974f00c313bSMauro Carvalho Chehab 975f00c313bSMauro Carvalho Chehab (first writer) 976f00c313bSMauro Carvalho Chehab 977f00c313bSMauro Carvalho Chehab A B tail page 978f00c313bSMauro Carvalho Chehab | | | 979f00c313bSMauro Carvalho Chehab v v v 980f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 981f00c313bSMauro Carvalho Chehab <---| |--->| |--->| |--->| |-H-> 982f00c313bSMauro Carvalho Chehab --->| |<---| |<---| |<---| |<--- 983f00c313bSMauro Carvalho Chehab +---+ +---+ +---+ +---+ 984