.. SPDX-License-Identifier: GPL-2.0+

======
XArray
======

:Author: Matthew Wilcox

Overview
========

The XArray is an abstract data type which behaves like a very large array
of pointers. It meets many of the same needs as a hash or a conventional
resizable array. Unlike a hash, it allows you to sensibly go to the
next or previous entry in a cache-efficient manner. In contrast to a
resizable array, there is no need to copy data or change MMU mappings in
order to grow the array. It is more memory-efficient, parallelisable
and cache-friendly than a doubly-linked list. It takes advantage of
RCU to perform lookups without locking.

The XArray implementation is efficient when the indices used are densely
clustered; hashing the object and using the hash as the index will not
perform well. The XArray is optimised for small indices, but still has
good performance with large indices. If your index can be larger than
``ULONG_MAX``, then the XArray is not the data type for you. The most
important user of the XArray is the page cache.

Each non-``NULL`` entry in the array has three bits associated with
it called marks. Each mark may be set or cleared independently of
the others. You can iterate over entries which are marked.

Normal pointers may be stored in the XArray directly. They must be 4-byte
aligned, which is true for any pointer returned from :c:func:`kmalloc` and
:c:func:`alloc_page`. It isn't true for arbitrary user-space pointers,
nor for function pointers. You can store pointers to statically allocated
objects, as long as those objects have an alignment of at least 4.

You can also store integers between 0 and ``LONG_MAX`` in the XArray.
You must first convert each integer into an entry using :c:func:`xa_mk_value`.
When you retrieve an entry from the XArray, you can check whether it is
a value entry by calling :c:func:`xa_is_value`, and convert it back to
an integer by calling :c:func:`xa_to_value`.
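
To make the value-entry encoding concrete, here is a small userspace sketch. It assumes, based on our reading of ``include/linux/xarray.h``, that :c:func:`xa_mk_value` shifts the integer left by one bit and sets bit 0. The ``model_*`` functions are hypothetical stand-ins for :c:func:`xa_mk_value`, :c:func:`xa_is_value` and :c:func:`xa_to_value`; they are not the kernel API. Under the same assumption, the alignment rule above follows naturally: an aligned pointer leaves its low bits clear, so it is never mistaken for a value entry.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of xa_mk_value(): assumes the kernel encodes an
 * integer v as (v << 1) | 1, so bit 0 marks a value entry.
 */
static void *model_mk_value(unsigned long v)
{
	/* Only 0..LONG_MAX survive the left shift without losing a bit. */
	assert(v <= (unsigned long)LONG_MAX);
	return (void *)(uintptr_t)((v << 1) | 1UL);
}

/* Hypothetical model of xa_is_value(): aligned pointers have bit 0 clear. */
static bool model_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

/* Hypothetical model of xa_to_value(): undo the shift. */
static unsigned long model_to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);
}
```

Because the shift discards the top bit, an integer larger than ``LONG_MAX`` could not round-trip through this encoding, which matches the 0 to ``LONG_MAX`` range stated above.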

Some users want to store tagged pointers instead of using the marks
described above. They can call :c:func:`xa_tag_pointer` to create an
entry with a tag, :c:func:`xa_untag_pointer` to turn a tagged entry
back into an untagged pointer and :c:func:`xa_pointer_tag` to retrieve
the tag of an entry. Tagged pointers use the same bits that are used
to distinguish value entries from normal pointers, so each user must
decide whether they want to store value entries or tagged pointers in
any particular XArray.

The XArray does not support storing :c:func:`IS_ERR` pointers, because
some of them conflict with value entries or internal entries.

An unusual feature of the XArray is the ability to create entries which
occupy a range of indices. Once stored to, looking up any index in
the range will return the same entry as looking up any other index in
the range. Setting a mark on one index will set it on all of them.
Storing to any index will store to all of them. Multi-index entries can
be explicitly split into smaller entries, or storing ``NULL`` into any
entry will cause the XArray to forget about the range.

Normal API
==========

Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY`
for statically allocated XArrays or :c:func:`xa_init` for dynamically
allocated ones. A freshly-initialised XArray contains a ``NULL``
pointer at every index.

You can then set entries using :c:func:`xa_store` and get entries
using :c:func:`xa_load`. :c:func:`xa_store` will overwrite any entry with the
new entry and return the previous entry stored at that index. You can
use :c:func:`xa_erase` instead of calling :c:func:`xa_store` with a
``NULL`` entry. There is no difference between an entry that has never
been stored to, one that has been erased, and one that has most recently
had ``NULL`` stored to it.

You can conditionally replace an entry at an index by using
:c:func:`xa_cmpxchg`.
Like :c:func:`cmpxchg`, it will only succeed if
the entry at that index has the 'old' value. It also returns the entry
which was at that index; if it returns the same entry which was passed as
'old', then :c:func:`xa_cmpxchg` succeeded.

If you want to store a new entry only if the current entry
at that index is ``NULL``, you can use :c:func:`xa_insert`, which
returns ``-EEXIST`` if the entry is not empty.

You can enquire whether a mark is set on an entry by using
:c:func:`xa_get_mark`. If the entry is not ``NULL``, you can set a mark
on it by using :c:func:`xa_set_mark` and remove the mark from an entry by
calling :c:func:`xa_clear_mark`. You can ask whether any entry in the
XArray has a particular mark set by calling :c:func:`xa_marked`.

You can copy entries out of the XArray into a plain array by calling
:c:func:`xa_extract`. Alternatively, you can iterate over the present entries in
the XArray by calling :c:func:`xa_for_each`. You may prefer to use
:c:func:`xa_find` or :c:func:`xa_find_after` to move to the next present
entry in the XArray.

Calling :c:func:`xa_store_range` stores the same entry in a range
of indices. If you do this, some of the other operations will behave
in a slightly odd way. For example, marking the entry at one index
may result in the entry being marked at some, but not all, of the other
indices. Storing into one index may result in the entry retrieved by
some, but not all, of the other indices changing.

Sometimes you need to ensure that a subsequent call to :c:func:`xa_store`
will not need to allocate memory. The :c:func:`xa_reserve` function
will store a reserved entry at the indicated index. Users of the
normal API will see this entry as containing ``NULL``. If you do
not need to use the reserved entry, you can call :c:func:`xa_release`
to remove the unused entry.
If another user has stored to the entry
in the meantime, :c:func:`xa_release` will do nothing; if instead you
want the entry to become ``NULL``, you should use :c:func:`xa_erase`.
Using :c:func:`xa_insert` on a reserved entry will fail.

If all entries in the array are ``NULL``, the :c:func:`xa_empty` function
will return ``true``.

Finally, you can remove all entries from an XArray by calling
:c:func:`xa_destroy`. If the XArray entries are pointers, you may wish
to free the entries first. You can do this by iterating over all present
entries in the XArray using the :c:func:`xa_for_each` iterator.

Allocating XArrays
------------------

If you use :c:func:`DEFINE_XARRAY_ALLOC` to define the XArray, or
initialise it by passing ``XA_FLAGS_ALLOC`` to :c:func:`xa_init_flags`,
the XArray changes to track whether entries are in use or not.

You can call :c:func:`xa_alloc` to store the entry at any unused index
in the XArray. If you need to modify the array from softirq or interrupt
context, you can use :c:func:`xa_alloc_bh` or :c:func:`xa_alloc_irq` to
disable softirqs or interrupts, respectively, while allocating the ID.

Using :c:func:`xa_store`, :c:func:`xa_cmpxchg` or :c:func:`xa_insert`
will mark the entry as being allocated. Unlike a normal XArray, storing
``NULL`` will mark the entry as being in use, like :c:func:`xa_reserve`.
To free an entry, use :c:func:`xa_erase` (or :c:func:`xa_release` if
you only want to free the entry if it's ``NULL``).

You cannot use ``XA_MARK_0`` with an allocating XArray, because this mark
is used to track whether an entry is free or not. The other marks are
available for your use.

Memory allocation
-----------------

The :c:func:`xa_store`, :c:func:`xa_cmpxchg`, :c:func:`xa_alloc`,
:c:func:`xa_reserve` and :c:func:`xa_insert` functions take a ``gfp_t``
parameter in case the XArray needs to allocate memory to store this entry.
If the entry is being deleted, no memory allocation needs to be performed,
and the GFP flags specified will be ignored.

Memory allocation can fail, particularly if you pass a restrictive set
of GFP flags. In that case, the functions return a
special value which can be turned into an errno using :c:func:`xa_err`.
If you don't need to know exactly which error occurred, using
:c:func:`xa_is_err` is slightly more efficient.

Locking
-------

When using the Normal API, you do not have to worry about locking.
The XArray uses RCU and an internal spinlock to synchronise access:

No lock needed:
 * :c:func:`xa_empty`
 * :c:func:`xa_marked`

Takes RCU read lock:
 * :c:func:`xa_load`
 * :c:func:`xa_for_each`
 * :c:func:`xa_find`
 * :c:func:`xa_find_after`
 * :c:func:`xa_extract`
 * :c:func:`xa_get_mark`

Takes xa_lock internally:
 * :c:func:`xa_store`
 * :c:func:`xa_store_bh`
 * :c:func:`xa_store_irq`
 * :c:func:`xa_insert`
 * :c:func:`xa_insert_bh`
 * :c:func:`xa_insert_irq`
 * :c:func:`xa_erase`
 * :c:func:`xa_erase_bh`
 * :c:func:`xa_erase_irq`
 * :c:func:`xa_cmpxchg`
 * :c:func:`xa_cmpxchg_bh`
 * :c:func:`xa_cmpxchg_irq`
 * :c:func:`xa_store_range`
 * :c:func:`xa_alloc`
 * :c:func:`xa_alloc_bh`
 * :c:func:`xa_alloc_irq`
 * :c:func:`xa_reserve`
 * :c:func:`xa_reserve_bh`
 * :c:func:`xa_reserve_irq`
 * :c:func:`xa_destroy`
 * :c:func:`xa_set_mark`
 * :c:func:`xa_clear_mark`

Assumes xa_lock held on entry:
 * :c:func:`__xa_store`
 * :c:func:`__xa_insert`
 * :c:func:`__xa_erase`
 * :c:func:`__xa_cmpxchg`
 * :c:func:`__xa_alloc`
 * :c:func:`__xa_reserve`
 * :c:func:`__xa_set_mark`
 * :c:func:`__xa_clear_mark`

If you want to take advantage of the lock to protect the data structures
that you are storing in the XArray, you can call :c:func:`xa_lock`
before calling
:c:func:`xa_load`, then take a reference count on the
object you have found before calling :c:func:`xa_unlock`. This will
prevent stores from removing the object from the array between looking
up the object and incrementing the refcount. You can also use RCU to
avoid dereferencing freed memory, but an explanation of that is beyond
the scope of this document.

The XArray does not disable interrupts or softirqs while modifying
the array. It is safe to read the XArray from interrupt or softirq
context, because the RCU lock provides enough protection.

If, for example, you want to store entries in the XArray in process
context and then erase them in softirq context, you can do that this way::

    #include <linux/xarray.h>

    /* Hypothetical container used by this example */
    struct foo {
        struct xarray array;
        unsigned long count;    /* protected by the xa_lock */
    };

    void foo_init(struct foo *foo)
    {
        xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH);
    }

    int foo_store(struct foo *foo, unsigned long index, void *entry)
    {
        int err;

        xa_lock_bh(&foo->array);
        err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL));
        if (!err)
            foo->count++;
        xa_unlock_bh(&foo->array);
        return err;
    }

    /* foo_erase() is only called from softirq context */
    void foo_erase(struct foo *foo, unsigned long index)
    {
        xa_lock(&foo->array);
        __xa_erase(&foo->array, index);
        foo->count--;
        xa_unlock(&foo->array);
    }

If you are going to modify the XArray from interrupt or softirq context,
you need to initialise the array using :c:func:`xa_init_flags`, passing
``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``.

The above example also shows a common pattern of wanting to extend the
coverage of the xa_lock on the store side to protect some statistics
associated with the array.

Sharing the XArray with interrupt context is also possible, either
using :c:func:`xa_lock_irqsave` in both the interrupt handler and process
context, or :c:func:`xa_lock_irq` in process context and :c:func:`xa_lock`
in the interrupt handler.
Some of the more common patterns have helper
functions such as :c:func:`xa_store_bh`, :c:func:`xa_store_irq`,
:c:func:`xa_erase_bh`, :c:func:`xa_erase_irq`, :c:func:`xa_cmpxchg_bh`
and :c:func:`xa_cmpxchg_irq`.

Sometimes you need to protect access to the XArray with a mutex because
that lock sits above another mutex in the locking hierarchy. That does
not entitle you to use functions like :c:func:`__xa_erase` without taking
the xa_lock; the xa_lock is used for lockdep validation and will be used
for other purposes in the future.

The :c:func:`__xa_set_mark` and :c:func:`__xa_clear_mark` functions are also
available for situations where you look up an entry and want to atomically
set or clear a mark. It may be more efficient to use the advanced API
in this case, as it will save you from walking the tree twice.

Advanced API
============

The advanced API offers more flexibility and better performance at the
cost of an interface which can be harder to use and has fewer safeguards.
No locking is done for you by the advanced API, and you are required
to use the xa_lock while modifying the array. You can choose whether
to use the xa_lock or the RCU lock while doing read-only operations on
the array. You can mix advanced and normal operations on the same array;
indeed, the normal API is implemented in terms of the advanced API. The
advanced API is only available to modules with a GPL-compatible license.

The advanced API is based around the xa_state. This is an opaque data
structure which you declare on the stack using the :c:func:`XA_STATE`
macro. This macro initialises the xa_state ready to start walking
around the XArray. It is used as a cursor to maintain the position
in the XArray and lets you compose various operations together without
having to restart from the top every time.

The xa_state is also used to store errors.
You can call
:c:func:`xas_error` to retrieve the error. All operations check whether
the xa_state is in an error state before proceeding, so there's no need
for you to check for an error after each call; you can make multiple
calls in succession and only check at a convenient point. The only
errors currently generated by the XArray code itself are ``ENOMEM`` and
``EINVAL``, but it supports arbitrary errors in case you want to call
:c:func:`xas_set_err` yourself.

If the xa_state is holding an ``ENOMEM`` error, calling :c:func:`xas_nomem`
will attempt to allocate more memory using the specified GFP flags and
cache it in the xa_state for the next attempt. The idea is that you take
the xa_lock, attempt the operation and drop the lock. The operation
attempts to allocate memory while holding the lock, but that allocation
is more likely to fail. Once you have dropped the lock, :c:func:`xas_nomem`
can try harder to allocate more memory. It will return ``true`` if it
is worth retrying the operation (i.e. there was a memory error *and*
more memory was allocated). If it has previously allocated memory, and
that memory wasn't used, and there is no error (or some error that isn't
``ENOMEM``), then it will free the memory previously allocated.

Internal Entries
----------------

The XArray reserves some entries for its own purposes. These are never
exposed through the normal API, but when using the advanced API, it's
possible to see them. Usually the best way to handle them is to pass them
to :c:func:`xas_retry`, and retry the operation if it returns ``true``.

.. flat-table::
   :widths: 1 1 6

   * - Name
     - Test
     - Usage

   * - Node
     - :c:func:`xa_is_node`
     - An XArray node. May be visible when using a multi-index xa_state.

   * - Sibling
     - :c:func:`xa_is_sibling`
     - A non-canonical entry for a multi-index entry.
       The value indicates
       which slot in this node has the canonical entry.

   * - Retry
     - :c:func:`xa_is_retry`
     - This entry is currently being modified by a thread which has the
       xa_lock. The node containing this entry may be freed at the end
       of the current RCU grace period. You should restart the lookup
       from the head of the array.

   * - Zero
     - :c:func:`xa_is_zero`
     - Zero entries appear as ``NULL`` through the Normal API, but occupy
       an entry in the XArray which can be used to reserve the index for
       future use. This is used by allocating XArrays for allocated entries
       which are ``NULL``.

Other internal entries may be added in the future. As far as possible, they
will be handled by :c:func:`xas_retry`.

Additional functionality
------------------------

The :c:func:`xas_create_range` function allocates all the necessary memory
to store every entry in a range. It will set ``ENOMEM`` in the xa_state if
it cannot allocate memory.

You can use :c:func:`xas_init_marks` to reset the marks on an entry
to their default state. This is usually all marks clear, unless the
XArray was initialised with ``XA_FLAGS_TRACK_FREE``, in which case mark 0
is set and all other marks are clear. Replacing one entry with another using
:c:func:`xas_store` will not reset the marks on that entry; if you want
the marks reset, you should do that explicitly.

The :c:func:`xas_load` function will walk the xa_state as close to the entry
as it can. If you know the xa_state has already been walked to the
entry and need to check that the entry hasn't changed, you can use
:c:func:`xas_reload` to save a function call.

If you need to move to a different index in the XArray, call
:c:func:`xas_set`. This resets the cursor to the top of the tree, which
will generally make the next operation walk the cursor to the desired
spot in the tree.
If you want to move to the next or previous index,
call :c:func:`xas_next` or :c:func:`xas_prev`. Setting the index does
not walk the cursor around the array, so it does not require a lock to be
held, whereas moving to the next or previous index does.

You can search for the next present entry using :c:func:`xas_find`. This
is the equivalent of both :c:func:`xa_find` and :c:func:`xa_find_after`;
if the cursor has been walked to an entry, then it will find the next
entry after the one currently referenced. If not, the search starts at,
and includes, the index of the xa_state. Using :c:func:`xas_next_entry` to
move to the next present entry instead of :c:func:`xas_find` will save
a function call in the majority of cases at the expense of emitting more
inline code.

The :c:func:`xas_find_marked` function is similar. If the xa_state has
not been walked, it will return the entry at the index of the xa_state,
if it is marked. Otherwise, it will return the first marked entry after
the entry referenced by the xa_state. The :c:func:`xas_next_marked`
function is the equivalent of :c:func:`xas_next_entry`.

When iterating over a range of the XArray using :c:func:`xas_for_each`
or :c:func:`xas_for_each_marked`, it may be necessary to temporarily stop
the iteration. The :c:func:`xas_pause` function exists for this purpose.
After you have done the necessary work and wish to resume, the xa_state
is in an appropriate state to continue the iteration after the entry
you last processed. If you have interrupts disabled while iterating,
then it is good manners to pause the iteration and re-enable interrupts
every ``XA_CHECK_SCHED`` entries.
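
The cursor rules above can be modelled on a flat array in userspace: an un-walked cursor searches from its own index inclusively, a walked cursor searches strictly after its current entry, and pausing leaves the cursor ready to resume after the last processed entry. This is only a sketch under stated assumptions. ``struct model_cursor``, ``model_find()``, ``model_pause()``, ``model_walk()`` and ``MODEL_CHECK_SCHED`` are hypothetical names, the real xa_state walks a tree of nodes rather than an array, and no locking is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for XA_CHECK_SCHED, kept tiny for illustration. */
#define MODEL_CHECK_SCHED 2

/* Hypothetical cursor: an index plus whether it references an entry. */
struct model_cursor {
	unsigned long index;
	bool walked;
};

/*
 * Model of xas_find() over n slots: an un-walked cursor searches from its
 * own index (inclusive); a walked cursor searches strictly after the entry
 * it references.  Returns NULL when no present entry remains.
 */
static void *model_find(void *const *slots, unsigned long n,
			struct model_cursor *c)
{
	unsigned long i = c->walked ? c->index + 1 : c->index;

	for (; i < n; i++) {
		if (slots[i]) {
			c->index = i;
			c->walked = true;
			return slots[i];
		}
	}
	return NULL;
}

/*
 * Model of xas_pause(): forget the walk, but advance the index so the next
 * model_find() resumes after the entry that was just processed.
 */
static void model_pause(struct model_cursor *c)
{
	if (c->walked) {
		c->index++;
		c->walked = false;
	}
}

/*
 * Visit every present entry, pausing every MODEL_CHECK_SCHED entries;
 * real code would drop its lock and re-enable interrupts at that point.
 * Returns the number of entries visited and stores the pause count.
 */
static unsigned long model_walk(void *const *slots, unsigned long n,
				unsigned long *pauses)
{
	struct model_cursor c = { .index = 0, .walked = false };
	unsigned long visited = 0;

	assert(pauses != NULL);
	*pauses = 0;
	while (model_find(slots, n, &c)) {
		visited++;
		if (visited % MODEL_CHECK_SCHED == 0) {
			model_pause(&c);
			(*pauses)++;
		}
	}
	return visited;
}
```

Each present entry is visited exactly once, however often the walk pauses. The model assumes every entry occupies a single index; multi-index entries are not modelled.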

The :c:func:`xas_get_mark`, :c:func:`xas_set_mark` and
:c:func:`xas_clear_mark` functions require the xa_state cursor to have
been moved to the appropriate location in the XArray; they will do
nothing if you have called :c:func:`xas_pause` or :c:func:`xas_set`
immediately before.

You can call :c:func:`xas_set_update` to have a callback function
called each time the XArray updates a node. This is used by the page
cache workingset code to maintain its list of nodes which contain only
shadow entries.

Multi-Index Entries
-------------------

The XArray has the ability to tie multiple indices together so that
operations on one index affect all indices. For example, storing into
any index will change the value of the entry retrieved from any index.
Setting or clearing a mark on any index will set or clear the mark
on every index that is tied together. The current implementation
only allows tying together ranges whose size is a power of two and
which are aligned to that size; e.g. indices 64-127 may be tied together,
but 2-6 may not be. This may save substantial quantities of memory; for
example, tying 512 entries together will save over 4kB.

You can create a multi-index entry by using :c:func:`XA_STATE_ORDER`
or :c:func:`xas_set_order` followed by a call to :c:func:`xas_store`.
Calling :c:func:`xas_load` with a multi-index xa_state will walk the
xa_state to the right location in the tree, but the return value is not
meaningful, potentially being an internal entry or ``NULL`` even when there
is an entry stored within the range. Calling :c:func:`xas_find_conflict`
will return the first entry within the range or ``NULL`` if there are no
entries in the range. The :c:func:`xas_for_each_conflict` iterator will
iterate over every entry which overlaps the specified range.

If :c:func:`xas_load` encounters a multi-index entry, the xa_index
in the xa_state will not be changed.
When iterating over an XArray
or calling :c:func:`xas_find`, if the initial index is in the middle
of a multi-index entry, it will not be altered. Subsequent calls
or iterations will move the index to the first index in the range.
Each entry will only be returned once, no matter how many indices it
occupies.

Using :c:func:`xas_next` or :c:func:`xas_prev` with a multi-index xa_state
is not supported. Using either of these functions on a multi-index entry
will reveal sibling entries; these should be skipped over by the caller.

Storing ``NULL`` into any index of a multi-index entry will set the entry
at every index to ``NULL`` and dissolve the tie. Splitting a multi-index
entry into entries occupying smaller ranges is not yet supported.

Functions and structures
========================

.. kernel-doc:: include/linux/xarray.h
.. kernel-doc:: lib/xarray.c