.. SPDX-License-Identifier: GPL-2.0+

======
XArray
======

:Author: Matthew Wilcox

Overview
========

The XArray is an abstract data type which behaves like a very large array
of pointers. It meets many of the same needs as a hash or a conventional
resizable array. Unlike a hash, it allows you to sensibly go to the
next or previous entry in a cache-efficient manner. In contrast to a
resizable array, there is no need to copy data or change MMU mappings in
order to grow the array. It is more memory-efficient, parallelisable
and cache-friendly than a doubly-linked list. It takes advantage of
RCU to perform lookups without locking.

The XArray implementation is efficient when the indices used are densely
clustered; hashing the object and using the hash as the index will not
perform well. The XArray is optimised for small indices, but still has
good performance with large indices. If your index can be larger than
``ULONG_MAX`` then the XArray is not the data type for you. The most
important user of the XArray is the page cache.

Normal pointers may be stored in the XArray directly. They must be 4-byte
aligned, which is true for any pointer returned from kmalloc() and
alloc_page(). It isn't true for arbitrary user-space pointers,
nor for function pointers. You can store pointers to statically allocated
objects, as long as those objects have an alignment of at least 4.

You can also store integers between 0 and ``LONG_MAX`` in the XArray.
You must first convert each integer into an entry using xa_mk_value().
When you retrieve an entry from the XArray, you can check whether it is
a value entry by calling xa_is_value(), and convert it back to
an integer by calling xa_to_value().

Some users want to tag the pointers they store in the XArray.
You can call xa_tag_pointer() to create an entry with a tag,
xa_untag_pointer() to turn a tagged entry back into an untagged pointer
and xa_pointer_tag() to retrieve the tag of an entry. Tagged pointers use
the same bits that are used to distinguish value entries from normal
pointers, so you must decide whether you want to store value entries or
tagged pointers in any particular XArray.

The XArray does not support storing IS_ERR() pointers as some
conflict with value entries or internal entries.

An unusual feature of the XArray is the ability to create entries which
occupy a range of indices. Once stored to, looking up any index in
the range will return the same entry as looking up any other index in
the range. Storing to any index will store to all of them. Multi-index
entries can be explicitly split into smaller entries, or storing ``NULL``
into any entry will cause the XArray to forget about the range.

Normal API
==========

Start by initialising an XArray, either with DEFINE_XARRAY()
for statically allocated XArrays or xa_init() for dynamically
allocated ones. A freshly-initialised XArray contains a ``NULL``
pointer at every index.

You can then set entries using xa_store() and get entries
using xa_load(). xa_store() will overwrite any entry with the
new entry and return the previous entry stored at that index. You can
use xa_erase() instead of calling xa_store() with a
``NULL`` entry. There is no difference between an entry that has never
been stored to, one that has been erased and one that has most recently
had ``NULL`` stored to it.

You can conditionally replace an entry at an index by using
xa_cmpxchg(). Like cmpxchg(), it will only succeed if
the entry at that index has the 'old' value. It also returns the entry
which was at that index; if it returns the same entry which was passed as
'old', then xa_cmpxchg() succeeded.
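
The return-value conventions of xa_store(), xa_erase() and xa_cmpxchg()
can be modelled in plain userspace C. The sketch below is not kernel code
and says nothing about how the XArray is implemented: ``struct toy_xa``
and the ``toy_*`` helpers are hypothetical names for a fixed 16-slot
array, and the gfp_t argument and error entries of the real functions are
left out because the model never allocates memory::

    #include <assert.h>
    #include <stddef.h>

    #define TOY_SLOTS 16

    /* Hypothetical stand-in for struct xarray: a fixed array of pointers. */
    struct toy_xa {
        void *slot[TOY_SLOTS];
    };

    /* Like xa_load(): return the entry at index, or NULL if it is empty. */
    static void *toy_load(struct toy_xa *xa, unsigned long index)
    {
        assert(index < TOY_SLOTS);
        return xa->slot[index];
    }

    /* Like xa_store(): overwrite unconditionally, return the old entry. */
    static void *toy_store(struct toy_xa *xa, unsigned long index, void *entry)
    {
        void *old = toy_load(xa, index);

        xa->slot[index] = entry;
        return old;
    }

    /* Like xa_erase(): equivalent to storing NULL. */
    static void *toy_erase(struct toy_xa *xa, unsigned long index)
    {
        return toy_store(xa, index, NULL);
    }

    /*
     * Like xa_cmpxchg(): store entry only if the slot currently holds old.
     * The previous entry is returned either way, so the caller detects
     * success by comparing the return value with old.
     */
    static void *toy_cmpxchg(struct toy_xa *xa, unsigned long index,
                             void *old, void *entry)
    {
        void *curr = toy_load(xa, index);

        if (curr == old)
            xa->slot[index] = entry;
        return curr;
    }

In this model, a caller that passes ``NULL`` as 'old' and gets ``NULL``
back knows the slot was empty and now holds the new entry; any other
return value means the slot was occupied and has been left untouched.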

If you want to only store a new entry to an index if the current entry
at that index is ``NULL``, you can use xa_insert() which
returns ``-EBUSY`` if the entry is not empty.

You can copy entries out of the XArray into a plain array by calling
xa_extract(). Or you can iterate over the present entries in
the XArray by calling xa_for_each(). You may prefer to use
xa_find() or xa_find_after() to move to the next present
entry in the XArray.

Calling xa_store_range() stores the same entry in a range
of indices. If you do this, some of the other operations will behave
in a slightly odd way. For example, marking the entry at one index
may result in the entry being marked at some, but not all of the other
indices. Storing into one index may result in the entry retrieved by
some, but not all of the other indices changing.

Sometimes you need to ensure that a subsequent call to xa_store()
will not need to allocate memory. The xa_reserve() function
will store a reserved entry at the indicated index. Users of the
normal API will see this entry as containing ``NULL``. If you do
not need to use the reserved entry, you can call xa_release()
to remove the unused entry. If another user has stored to the entry
in the meantime, xa_release() will do nothing; if instead you
want the entry to become ``NULL``, you should use xa_erase().
Using xa_insert() on a reserved entry will fail.

If all entries in the array are ``NULL``, the xa_empty() function
will return ``true``.

Finally, you can remove all entries from an XArray by calling
xa_destroy(). If the XArray entries are pointers, you may wish
to free the entries first. You can do this by iterating over all present
entries in the XArray using the xa_for_each() iterator.

Search Marks
------------

Each entry in the array has three bits associated with it called marks.
Each mark may be set or cleared independently of the others.
You can iterate over marked entries by using the xa_for_each_marked()
iterator.

You can enquire whether a mark is set on an entry by using
xa_get_mark(). If the entry is not ``NULL``, you can set a mark on it
by using xa_set_mark() and remove the mark from an entry by calling
xa_clear_mark(). You can ask whether any entry in the XArray has a
particular mark set by calling xa_marked(). Erasing an entry from the
XArray causes all marks associated with that entry to be cleared.

Setting or clearing a mark on any index of a multi-index entry will
affect all indices covered by that entry. Querying the mark on any
index will return the same result.

There is no way to iterate over entries which are not marked; the data
structure does not allow this to be implemented efficiently. There are
not currently iterators to search for logical combinations of bits (eg
iterate over all entries which have both ``XA_MARK_1`` and ``XA_MARK_2``
set, or iterate over all entries which have ``XA_MARK_0`` or ``XA_MARK_2``
set). It would be possible to add these if a user arises.

Allocating XArrays
------------------

If you use DEFINE_XARRAY_ALLOC() to define the XArray, or
initialise it by passing ``XA_FLAGS_ALLOC`` to xa_init_flags(),
the XArray changes to track whether entries are in use or not.

You can call xa_alloc() to store the entry at an unused index
in the XArray. If you need to modify the array from interrupt context,
you can use xa_alloc_bh() or xa_alloc_irq() to disable
interrupts while allocating the ID.

Using xa_store(), xa_cmpxchg() or xa_insert() will
also mark the entry as being allocated. Unlike a normal XArray, storing
``NULL`` will mark the entry as being in use, like xa_reserve().
To free an entry, use xa_erase() (or xa_release() if
you only want to free the entry if it's ``NULL``).

By default, the lowest free entry is allocated starting from 0.
If you want to allocate entries starting at 1, it is more efficient to use
DEFINE_XARRAY_ALLOC1() or ``XA_FLAGS_ALLOC1``. If you want to
allocate IDs up to a maximum, then wrap back around to the lowest free
ID, you can use xa_alloc_cyclic().

You cannot use ``XA_MARK_0`` with an allocating XArray as this mark
is used to track whether an entry is free or not. The other marks are
available for your use.

Memory allocation
-----------------

The xa_store(), xa_cmpxchg(), xa_alloc(),
xa_reserve() and xa_insert() functions take a gfp_t
parameter in case the XArray needs to allocate memory to store this entry.
If the entry is being deleted, no memory allocation needs to be performed,
and the GFP flags specified will be ignored.

It is possible for no memory to be allocatable, particularly if you pass
a restrictive set of GFP flags. In that case, the functions return a
special value which can be turned into an errno using xa_err().
If you don't need to know exactly which error occurred, using
xa_is_err() is slightly more efficient.

Locking
-------

When using the Normal API, you do not have to worry about locking.
The XArray uses RCU and an internal spinlock to synchronise access:

No lock needed:
 * xa_empty()
 * xa_marked()

Takes RCU read lock:
 * xa_load()
 * xa_for_each()
 * xa_find()
 * xa_find_after()
 * xa_extract()
 * xa_get_mark()

Takes xa_lock internally:
 * xa_store()
 * xa_store_bh()
 * xa_store_irq()
 * xa_insert()
 * xa_insert_bh()
 * xa_insert_irq()
 * xa_erase()
 * xa_erase_bh()
 * xa_erase_irq()
 * xa_cmpxchg()
 * xa_cmpxchg_bh()
 * xa_cmpxchg_irq()
 * xa_store_range()
 * xa_alloc()
 * xa_alloc_bh()
 * xa_alloc_irq()
 * xa_reserve()
 * xa_reserve_bh()
 * xa_reserve_irq()
 * xa_destroy()
 * xa_set_mark()
 * xa_clear_mark()

Assumes xa_lock held on entry:
 * __xa_store()
 * __xa_insert()
 * __xa_erase()
 * __xa_cmpxchg()
 * __xa_alloc()
 * __xa_set_mark()
 * __xa_clear_mark()

If you want to take advantage of the lock to protect the data structures
that you are storing in the XArray, you can call xa_lock()
before calling xa_load(), then take a reference count on the
object you have found before calling xa_unlock(). This will
prevent stores from removing the object from the array between looking
up the object and incrementing the refcount. You can also use RCU to
avoid dereferencing freed memory, but an explanation of that is beyond
the scope of this document.

The XArray does not disable interrupts or softirqs while modifying
the array. It is safe to read the XArray from interrupt or softirq
context as the RCU lock provides enough protection.

If, for example, you want to store entries in the XArray in process
context and then erase them in softirq context, you can do that this way::

    void foo_init(struct foo *foo)
    {
        xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH);
    }

    int foo_store(struct foo *foo, unsigned long index, void *entry)
    {
        int err;

        xa_lock_bh(&foo->array);
        err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL));
        if (!err)
            foo->count++;
        xa_unlock_bh(&foo->array);
        return err;
    }

    /* foo_erase() is only called from softirq context */
    void foo_erase(struct foo *foo, unsigned long index)
    {
        xa_lock(&foo->array);
        __xa_erase(&foo->array, index);
        foo->count--;
        xa_unlock(&foo->array);
    }

If you are going to modify the XArray from interrupt or softirq context,
you need to initialise the array using xa_init_flags(), passing
``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``.

The above example also shows a common pattern of wanting to extend the
coverage of the xa_lock on the store side to protect some statistics
associated with the array.

Sharing the XArray with interrupt context is also possible, either
using xa_lock_irqsave() in both the interrupt handler and process
context, or xa_lock_irq() in process context and xa_lock()
in the interrupt handler. Some of the more common patterns have helper
functions such as xa_store_bh(), xa_store_irq(),
xa_erase_bh(), xa_erase_irq(), xa_cmpxchg_bh()
and xa_cmpxchg_irq().

Sometimes you need to protect access to the XArray with a mutex because
that lock sits above another mutex in the locking hierarchy. That does
not entitle you to use functions like __xa_erase() without taking
the xa_lock; the xa_lock is used for lockdep validation and will be used
for other purposes in the future.

The __xa_set_mark() and __xa_clear_mark() functions are also
available for situations where you look up an entry and want to atomically
set or clear a mark. It may be more efficient to use the advanced API
in this case, as it will save you from walking the tree twice.

Advanced API
============

The advanced API offers more flexibility and better performance at the
cost of an interface which can be harder to use and has fewer safeguards.
No locking is done for you by the advanced API, and you are required
to use the xa_lock while modifying the array. You can choose whether
to use the xa_lock or the RCU lock while doing read-only operations on
the array. You can mix advanced and normal operations on the same array;
indeed the normal API is implemented in terms of the advanced API. The
advanced API is only available to modules with a GPL-compatible license.

The advanced API is based around the xa_state. This is an opaque data
structure which you declare on the stack using the XA_STATE()
macro. This macro initialises the xa_state ready to start walking
around the XArray. It is used as a cursor to maintain the position
in the XArray and let you compose various operations together without
having to restart from the top every time.

The xa_state is also used to store errors. You can call
xas_error() to retrieve the error. All operations check whether
the xa_state is in an error state before proceeding, so there's no need
for you to check for an error after each call; you can make multiple
calls in succession and only check at a convenient point. The only
errors currently generated by the XArray code itself are ``ENOMEM`` and
``EINVAL``, but it supports arbitrary errors in case you want to call
xas_set_err() yourself.
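
The "check once at a convenient point" behaviour can be illustrated with
a small userspace model of a cursor that latches its first error. This is
only a sketch of the pattern, not the xa_state itself: ``struct
toy_state``, ``toy_error()``, ``toy_set_err()`` and ``toy_state_store()``
are hypothetical names, and treating an out-of-range index as ``-EINVAL``
is an assumption made purely for the example::

    #include <errno.h>
    #include <stddef.h>

    #define TOY_SLOTS 4

    /* Hypothetical cursor: backing storage plus a latched error code. */
    struct toy_state {
        void **slots;
        int err;    /* 0, or a negative errno once an operation failed */
    };

    /* Modelled on xas_error(): report the latched error, 0 if none. */
    static int toy_error(const struct toy_state *ts)
    {
        return ts->err;
    }

    /* Modelled on xas_set_err(): latch an error in the cursor. */
    static void toy_set_err(struct toy_state *ts, int err)
    {
        ts->err = err;
    }

    /* Store an entry; does nothing once the cursor holds an error. */
    static void toy_state_store(struct toy_state *ts, unsigned long index,
                                void *entry)
    {
        if (toy_error(ts))
            return;
        if (index >= TOY_SLOTS) {
            /* Assumption for this model only: bad index gives -EINVAL. */
            toy_set_err(ts, -EINVAL);
            return;
        }
        ts->slots[index] = entry;
    }

A caller of this model can issue several toy_state_store() calls in a row
and test toy_error() once afterwards; every call made after the first
failure is skipped, so nothing is modified behind a latched error.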

If the xa_state is holding an ``ENOMEM`` error, calling xas_nomem()
will attempt to allocate more memory using the specified gfp flags and
cache it in the xa_state for the next attempt. The idea is that you take
the xa_lock, attempt the operation and drop the lock. The operation
attempts to allocate memory while holding the lock, but it is more
likely to fail. Once you have dropped the lock, xas_nomem()
can try harder to allocate more memory. It will return ``true`` if it
is worth retrying the operation (i.e. that there was a memory error *and*
more memory was allocated). If it has previously allocated memory, and
that memory wasn't used, and there is no error (or some error that isn't
``ENOMEM``), then it will free the memory previously allocated.

Internal Entries
----------------

The XArray reserves some entries for its own purposes. These are never
exposed through the normal API, but when using the advanced API, it's
possible to see them. Usually the best way to handle them is to pass them
to xas_retry(), and retry the operation if it returns ``true``.

.. flat-table::
   :widths: 1 1 6

   * - Name
     - Test
     - Usage

   * - Node
     - xa_is_node()
     - An XArray node. May be visible when using a multi-index xa_state.

   * - Sibling
     - xa_is_sibling()
     - A non-canonical entry for a multi-index entry. The value indicates
       which slot in this node has the canonical entry.

   * - Retry
     - xa_is_retry()
     - This entry is currently being modified by a thread which has the
       xa_lock. The node containing this entry may be freed at the end
       of this RCU period. You should restart the lookup from the head
       of the array.

   * - Zero
     - xa_is_zero()
     - Zero entries appear as ``NULL`` through the Normal API, but occupy
       an entry in the XArray which can be used to reserve the index for
       future use.
       This is used by allocating XArrays for allocated entries
       which are ``NULL``.

Other internal entries may be added in the future. As far as possible, they
will be handled by xas_retry().

Additional functionality
------------------------

The xas_create_range() function allocates all the necessary memory
to store every entry in a range. It will set ``ENOMEM`` in the xa_state if
it cannot allocate memory.

You can use xas_init_marks() to reset the marks on an entry
to their default state. This is usually all marks clear, unless the
XArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set
and all other marks are clear. Replacing one entry with another using
xas_store() will not reset the marks on that entry; if you want
the marks reset, you should do that explicitly.

The xas_load() function will walk the xa_state as close to the entry
as it can. If you know the xa_state has already been walked to the
entry and need to check that the entry hasn't changed, you can use
xas_reload() to save a function call.

If you need to move to a different index in the XArray, call
xas_set(). This resets the cursor to the top of the tree, which
will generally make the next operation walk the cursor to the desired
spot in the tree. If you want to move to the next or previous index,
call xas_next() or xas_prev(). Setting the index does
not walk the cursor around the array so does not require a lock to be
held, while moving to the next or previous index does.

You can search for the next present entry using xas_find(). This
is the equivalent of both xa_find() and xa_find_after();
if the cursor has been walked to an entry, then it will find the next
entry after the one currently referenced. If not, it will return the
entry at the index of the xa_state.
Using xas_next_entry() to move to the next present entry instead of
xas_find() will save a function call in the majority of cases at the
expense of emitting more inline code.

The xas_find_marked() function is similar. If the xa_state has
not been walked, it will return the entry at the index of the xa_state,
if it is marked. Otherwise, it will return the first marked entry after
the entry referenced by the xa_state. The xas_next_marked()
function is the equivalent of xas_next_entry().

When iterating over a range of the XArray using xas_for_each()
or xas_for_each_marked(), it may be necessary to temporarily stop
the iteration. The xas_pause() function exists for this purpose.
After you have done the necessary work and wish to resume, the xa_state
is in an appropriate state to continue the iteration after the entry
you last processed. If you have interrupts disabled while iterating,
then it is good manners to pause the iteration and reenable interrupts
every ``XA_CHECK_SCHED`` entries.

The xas_get_mark(), xas_set_mark() and xas_clear_mark() functions require
the xa_state cursor to have been moved to the appropriate location in the
XArray; they will do nothing if you have called xas_pause() or xas_set()
immediately before.

You can call xas_set_update() to have a callback function
called each time the XArray updates a node. This is used by the page
cache workingset code to maintain its list of nodes which contain only
shadow entries.

Multi-Index Entries
-------------------

The XArray has the ability to tie multiple indices together so that
operations on one index affect all indices. For example, storing into
any index will change the value of the entry retrieved from any index.
Setting or clearing a mark on any index will set or clear the mark
on every index that is tied together.
The current implementation only allows tying ranges which are aligned
powers of two together; eg indices 64-127 may be tied together, but 2-6
may not be. This may save substantial quantities of memory; for example
tying 512 entries together will save over 4kB.

You can create a multi-index entry by using XA_STATE_ORDER()
or xas_set_order() followed by a call to xas_store().
Calling xas_load() with a multi-index xa_state will walk the
xa_state to the right location in the tree, but the return value is not
meaningful, potentially being an internal entry or ``NULL`` even when there
is an entry stored within the range. Calling xas_find_conflict()
will return the first entry within the range or ``NULL`` if there are no
entries in the range. The xas_for_each_conflict() iterator will
iterate over every entry which overlaps the specified range.

If xas_load() encounters a multi-index entry, the xa_index
in the xa_state will not be changed. When iterating over an XArray
or calling xas_find(), if the initial index is in the middle
of a multi-index entry, it will not be altered. Subsequent calls
or iterations will move the index to the first index in the range.
Each entry will only be returned once, no matter how many indices it
occupies.

Using xas_next() or xas_prev() with a multi-index xa_state
is not supported. Using either of these functions on a multi-index entry
will reveal sibling entries; these should be skipped over by the caller.

Storing ``NULL`` into any index of a multi-index entry will set the entry
at every index to ``NULL`` and dissolve the tie. Splitting a multi-index
entry into entries occupying smaller ranges is not yet supported.

Functions and structures
========================

.. kernel-doc:: include/linux/xarray.h
.. kernel-doc:: lib/xarray.c