1992a8e60SMatthew Wilcox.. SPDX-License-Identifier: GPL-2.0+ 2992a8e60SMatthew Wilcox 3992a8e60SMatthew Wilcox====== 4992a8e60SMatthew WilcoxXArray 5992a8e60SMatthew Wilcox====== 6992a8e60SMatthew Wilcox 7992a8e60SMatthew Wilcox:Author: Matthew Wilcox 8992a8e60SMatthew Wilcox 9992a8e60SMatthew WilcoxOverview 10992a8e60SMatthew Wilcox======== 11992a8e60SMatthew Wilcox 12992a8e60SMatthew WilcoxThe XArray is an abstract data type which behaves like a very large array 13992a8e60SMatthew Wilcoxof pointers. It meets many of the same needs as a hash or a conventional 14992a8e60SMatthew Wilcoxresizable array. Unlike a hash, it allows you to sensibly go to the 15992a8e60SMatthew Wilcoxnext or previous entry in a cache-efficient manner. In contrast to a 16992a8e60SMatthew Wilcoxresizable array, there is no need to copy data or change MMU mappings in 17992a8e60SMatthew Wilcoxorder to grow the array. It is more memory-efficient, parallelisable 18992a8e60SMatthew Wilcoxand cache friendly than a doubly-linked list. It takes advantage of 19992a8e60SMatthew WilcoxRCU to perform lookups without locking. 20992a8e60SMatthew Wilcox 21992a8e60SMatthew WilcoxThe XArray implementation is efficient when the indices used are densely 22992a8e60SMatthew Wilcoxclustered; hashing the object and using the hash as the index will not 23992a8e60SMatthew Wilcoxperform well. The XArray is optimised for small indices, but still has 24992a8e60SMatthew Wilcoxgood performance with large indices. If your index can be larger than 25992a8e60SMatthew Wilcox``ULONG_MAX`` then the XArray is not the data type for you. The most 26992a8e60SMatthew Wilcoximportant user of the XArray is the page cache. 27992a8e60SMatthew Wilcox 28992a8e60SMatthew WilcoxNormal pointers may be stored in the XArray directly. They must be 4-byte 299c79df7fSJonathan Corbetaligned, which is true for any pointer returned from kmalloc() and 309c79df7fSJonathan Corbetalloc_page(). It isn't true for arbitrary user-space pointers, 31992a8e60SMatthew Wilcoxnor for function pointers. You can store pointers to statically allocated 32992a8e60SMatthew Wilcoxobjects, as long as those objects have an alignment of at least 4. 33992a8e60SMatthew Wilcox 34992a8e60SMatthew WilcoxYou can also store integers between 0 and ``LONG_MAX`` in the XArray. 359c79df7fSJonathan CorbetYou must first convert it into an entry using xa_mk_value(). 36992a8e60SMatthew WilcoxWhen you retrieve an entry from the XArray, you can check whether it is 379c79df7fSJonathan Corbeta value entry by calling xa_is_value(), and convert it back to 389c79df7fSJonathan Corbetan integer by calling xa_to_value(). 39992a8e60SMatthew Wilcox 406b81141dSMatthew Wilcox (Oracle)Some users want to tag the pointers they store in the XArray. You can 416b81141dSMatthew Wilcox (Oracle)call xa_tag_pointer() to create an entry with a tag, xa_untag_pointer() 426b81141dSMatthew Wilcox (Oracle)to turn a tagged entry back into an untagged pointer and xa_pointer_tag() 436b81141dSMatthew Wilcox (Oracle)to retrieve the tag of an entry. Tagged pointers use the same bits that 446b81141dSMatthew Wilcox (Oracle)are used to distinguish value entries from normal pointers, so you must 45992a8e60SMatthew Wilcoxdecide whether they want to store value entries or tagged pointers in 46992a8e60SMatthew Wilcoxany particular XArray. 47992a8e60SMatthew Wilcox 489c79df7fSJonathan CorbetThe XArray does not support storing IS_ERR() pointers as some 49992a8e60SMatthew Wilcoxconflict with value entries or internal entries. 50992a8e60SMatthew Wilcox 51992a8e60SMatthew WilcoxAn unusual feature of the XArray is the ability to create entries which 52992a8e60SMatthew Wilcoxoccupy a range of indices. Once stored to, looking up any index in 53992a8e60SMatthew Wilcoxthe range will return the same entry as looking up any other index in 546b81141dSMatthew Wilcox (Oracle)the range. Storing to any index will store to all of them. Multi-index 556b81141dSMatthew Wilcox (Oracle)entries can be explicitly split into smaller entries, or storing ``NULL`` 566b81141dSMatthew Wilcox (Oracle)into any entry will cause the XArray to forget about the range. 57992a8e60SMatthew Wilcox 58992a8e60SMatthew WilcoxNormal API 59992a8e60SMatthew Wilcox========== 60992a8e60SMatthew Wilcox 619c79df7fSJonathan CorbetStart by initialising an XArray, either with DEFINE_XARRAY() 629c79df7fSJonathan Corbetfor statically allocated XArrays or xa_init() for dynamically 63992a8e60SMatthew Wilcoxallocated ones. A freshly-initialised XArray contains a ``NULL`` 64992a8e60SMatthew Wilcoxpointer at every index. 65992a8e60SMatthew Wilcox 669c79df7fSJonathan CorbetYou can then set entries using xa_store() and get entries 679c79df7fSJonathan Corbetusing xa_load(). xa_store will overwrite any entry with the 68992a8e60SMatthew Wilcoxnew entry and return the previous entry stored at that index. You can 699c79df7fSJonathan Corbetuse xa_erase() instead of calling xa_store() with a 70992a8e60SMatthew Wilcox``NULL`` entry. There is no difference between an entry that has never 71804dfaf0SMatthew Wilcoxbeen stored to, one that has been erased and one that has most recently 72804dfaf0SMatthew Wilcoxhad ``NULL`` stored to it. 73992a8e60SMatthew Wilcox 74992a8e60SMatthew WilcoxYou can conditionally replace an entry at an index by using 759c79df7fSJonathan Corbetxa_cmpxchg(). Like cmpxchg(), it will only succeed if 76992a8e60SMatthew Wilcoxthe entry at that index has the 'old' value. It also returns the entry 77992a8e60SMatthew Wilcoxwhich was at that index; if it returns the same entry which was passed as 789c79df7fSJonathan Corbet'old', then xa_cmpxchg() succeeded. 79992a8e60SMatthew Wilcox 80992a8e60SMatthew WilcoxIf you want to only store a new entry to an index if the current entry 819c79df7fSJonathan Corbetat that index is ``NULL``, you can use xa_insert() which 82fd9dc93eSMatthew Wilcoxreturns ``-EBUSY`` if the entry is not empty. 83992a8e60SMatthew Wilcox 84992a8e60SMatthew WilcoxYou can copy entries out of the XArray into a plain array by calling 8500ed452cSMatthew Wilcox (Oracle)xa_extract(). Or you can iterate over the present entries in the XArray 8600ed452cSMatthew Wilcox (Oracle)by calling xa_for_each(), xa_for_each_start() or xa_for_each_range(). 8700ed452cSMatthew Wilcox (Oracle)You may prefer to use xa_find() or xa_find_after() to move to the next 8800ed452cSMatthew Wilcox (Oracle)present entry in the XArray. 89992a8e60SMatthew Wilcox 909c79df7fSJonathan CorbetCalling xa_store_range() stores the same entry in a range 910e9446c3SMatthew Wilcoxof indices. If you do this, some of the other operations will behave 920e9446c3SMatthew Wilcoxin a slightly odd way. For example, marking the entry at one index 930e9446c3SMatthew Wilcoxmay result in the entry being marked at some, but not all of the other 940e9446c3SMatthew Wilcoxindices. Storing into one index may result in the entry retrieved by 950e9446c3SMatthew Wilcoxsome, but not all of the other indices changing. 960e9446c3SMatthew Wilcox 979c79df7fSJonathan CorbetSometimes you need to ensure that a subsequent call to xa_store() 989c79df7fSJonathan Corbetwill not need to allocate memory. The xa_reserve() function 99b0606fedSMatthew Wilcoxwill store a reserved entry at the indicated index. Users of the 100b0606fedSMatthew Wilcoxnormal API will see this entry as containing ``NULL``. If you do 1019c79df7fSJonathan Corbetnot need to use the reserved entry, you can call xa_release() 102b0606fedSMatthew Wilcoxto remove the unused entry. If another user has stored to the entry 1039c79df7fSJonathan Corbetin the meantime, xa_release() will do nothing; if instead you 1049c79df7fSJonathan Corbetwant the entry to become ``NULL``, you should use xa_erase(). 1059c79df7fSJonathan CorbetUsing xa_insert() on a reserved entry will fail. 1064c0608f4SMatthew Wilcox 1079c79df7fSJonathan CorbetIf all entries in the array are ``NULL``, the xa_empty() function 108804dfaf0SMatthew Wilcoxwill return ``true``. 109804dfaf0SMatthew Wilcox 110992a8e60SMatthew WilcoxFinally, you can remove all entries from an XArray by calling 1119c79df7fSJonathan Corbetxa_destroy(). If the XArray entries are pointers, you may wish 112992a8e60SMatthew Wilcoxto free the entries first. You can do this by iterating over all present 1139c79df7fSJonathan Corbetentries in the XArray using the xa_for_each() iterator. 114992a8e60SMatthew Wilcox 1156b81141dSMatthew Wilcox (Oracle)Search Marks 1166b81141dSMatthew Wilcox (Oracle)------------ 1176b81141dSMatthew Wilcox (Oracle) 1186b81141dSMatthew Wilcox (Oracle)Each entry in the array has three bits associated with it called marks. 1196b81141dSMatthew Wilcox (Oracle)Each mark may be set or cleared independently of the others. You can 1206b81141dSMatthew Wilcox (Oracle)iterate over marked entries by using the xa_for_each_marked() iterator. 1216b81141dSMatthew Wilcox (Oracle) 1226b81141dSMatthew Wilcox (Oracle)You can enquire whether a mark is set on an entry by using 1236b81141dSMatthew Wilcox (Oracle)xa_get_mark(). If the entry is not ``NULL``, you can set a mark on it 1246b81141dSMatthew Wilcox (Oracle)by using xa_set_mark() and remove the mark from an entry by calling 1256b81141dSMatthew Wilcox (Oracle)xa_clear_mark(). You can ask whether any entry in the XArray has a 1266b81141dSMatthew Wilcox (Oracle)particular mark set by calling xa_marked(). Erasing an entry from the 1276b81141dSMatthew Wilcox (Oracle)XArray causes all marks associated with that entry to be cleared. 1286b81141dSMatthew Wilcox (Oracle) 1296b81141dSMatthew Wilcox (Oracle)Setting or clearing a mark on any index of a multi-index entry will 1306b81141dSMatthew Wilcox (Oracle)affect all indices covered by that entry. Querying the mark on any 1316b81141dSMatthew Wilcox (Oracle)index will return the same result. 1326b81141dSMatthew Wilcox (Oracle) 1336b81141dSMatthew Wilcox (Oracle)There is no way to iterate over entries which are not marked; the data 1346b81141dSMatthew Wilcox (Oracle)structure does not allow this to be implemented efficiently. There are 1356b81141dSMatthew Wilcox (Oracle)not currently iterators to search for logical combinations of bits (eg 1366b81141dSMatthew Wilcox (Oracle)iterate over all entries which have both ``XA_MARK_1`` and ``XA_MARK_2`` 1376b81141dSMatthew Wilcox (Oracle)set, or iterate over all entries which have ``XA_MARK_0`` or ``XA_MARK_2`` 1386b81141dSMatthew Wilcox (Oracle)set). It would be possible to add these if a user arises. 1396b81141dSMatthew Wilcox (Oracle) 140d9c48043SMatthew WilcoxAllocating XArrays 141d9c48043SMatthew Wilcox------------------ 142d9c48043SMatthew Wilcox 1439c79df7fSJonathan CorbetIf you use DEFINE_XARRAY_ALLOC() to define the XArray, or 1449c79df7fSJonathan Corbetinitialise it by passing ``XA_FLAGS_ALLOC`` to xa_init_flags(), 145d9c48043SMatthew Wilcoxthe XArray changes to track whether entries are in use or not. 146371c752dSMatthew Wilcox 1479c79df7fSJonathan CorbetYou can call xa_alloc() to store the entry at an unused index 148371c752dSMatthew Wilcoxin the XArray. If you need to modify the array from interrupt context, 1499c79df7fSJonathan Corbetyou can use xa_alloc_bh() or xa_alloc_irq() to disable 150d9c48043SMatthew Wilcoxinterrupts while allocating the ID. 151d9c48043SMatthew Wilcox 1529c79df7fSJonathan CorbetUsing xa_store(), xa_cmpxchg() or xa_insert() will 1533ccaf57aSMatthew Wilcoxalso mark the entry as being allocated. Unlike a normal XArray, storing 1549c79df7fSJonathan Corbet``NULL`` will mark the entry as being in use, like xa_reserve(). 1559c79df7fSJonathan CorbetTo free an entry, use xa_erase() (or xa_release() if 156d9c48043SMatthew Wilcoxyou only want to free the entry if it's ``NULL``). 157d9c48043SMatthew Wilcox 1583ccaf57aSMatthew WilcoxBy default, the lowest free entry is allocated starting from 0. If you 1593ccaf57aSMatthew Wilcoxwant to allocate entries starting at 1, it is more efficient to use 1609c79df7fSJonathan CorbetDEFINE_XARRAY_ALLOC1() or ``XA_FLAGS_ALLOC1``. If you want to 1612fa044e5SMatthew Wilcoxallocate IDs up to a maximum, then wrap back around to the lowest free 1629c79df7fSJonathan CorbetID, you can use xa_alloc_cyclic(). 1633ccaf57aSMatthew Wilcox 164d9c48043SMatthew WilcoxYou cannot use ``XA_MARK_0`` with an allocating XArray as this mark 165d9c48043SMatthew Wilcoxis used to track whether an entry is free or not. The other marks are 166d9c48043SMatthew Wilcoxavailable for your use. 167371c752dSMatthew Wilcox 168992a8e60SMatthew WilcoxMemory allocation 169992a8e60SMatthew Wilcox----------------- 170992a8e60SMatthew Wilcox 1719c79df7fSJonathan CorbetThe xa_store(), xa_cmpxchg(), xa_alloc(), 1729c79df7fSJonathan Corbetxa_reserve() and xa_insert() functions take a gfp_t 173371c752dSMatthew Wilcoxparameter in case the XArray needs to allocate memory to store this entry. 174992a8e60SMatthew WilcoxIf the entry is being deleted, no memory allocation needs to be performed, 175992a8e60SMatthew Wilcoxand the GFP flags specified will be ignored. 176992a8e60SMatthew Wilcox 177992a8e60SMatthew WilcoxIt is possible for no memory to be allocatable, particularly if you pass 178992a8e60SMatthew Wilcoxa restrictive set of GFP flags. In that case, the functions return a 1799c79df7fSJonathan Corbetspecial value which can be turned into an errno using xa_err(). 180992a8e60SMatthew WilcoxIf you don't need to know exactly which error occurred, using 1819c79df7fSJonathan Corbetxa_is_err() is slightly more efficient. 182992a8e60SMatthew Wilcox 183992a8e60SMatthew WilcoxLocking 184992a8e60SMatthew Wilcox------- 185992a8e60SMatthew Wilcox 186992a8e60SMatthew WilcoxWhen using the Normal API, you do not have to worry about locking. 187992a8e60SMatthew WilcoxThe XArray uses RCU and an internal spinlock to synchronise access: 188992a8e60SMatthew Wilcox 189992a8e60SMatthew WilcoxNo lock needed: 1909c79df7fSJonathan Corbet * xa_empty() 1919c79df7fSJonathan Corbet * xa_marked() 192992a8e60SMatthew Wilcox 193992a8e60SMatthew WilcoxTakes RCU read lock: 1949c79df7fSJonathan Corbet * xa_load() 1959c79df7fSJonathan Corbet * xa_for_each() 19600ed452cSMatthew Wilcox (Oracle) * xa_for_each_start() 19700ed452cSMatthew Wilcox (Oracle) * xa_for_each_range() 1989c79df7fSJonathan Corbet * xa_find() 1999c79df7fSJonathan Corbet * xa_find_after() 2009c79df7fSJonathan Corbet * xa_extract() 2019c79df7fSJonathan Corbet * xa_get_mark() 202992a8e60SMatthew Wilcox 203992a8e60SMatthew WilcoxTakes xa_lock internally: 2049c79df7fSJonathan Corbet * xa_store() 2059c79df7fSJonathan Corbet * xa_store_bh() 2069c79df7fSJonathan Corbet * xa_store_irq() 2079c79df7fSJonathan Corbet * xa_insert() 2089c79df7fSJonathan Corbet * xa_insert_bh() 2099c79df7fSJonathan Corbet * xa_insert_irq() 2109c79df7fSJonathan Corbet * xa_erase() 2119c79df7fSJonathan Corbet * xa_erase_bh() 2129c79df7fSJonathan Corbet * xa_erase_irq() 2139c79df7fSJonathan Corbet * xa_cmpxchg() 2149c79df7fSJonathan Corbet * xa_cmpxchg_bh() 2159c79df7fSJonathan Corbet * xa_cmpxchg_irq() 2169c79df7fSJonathan Corbet * xa_store_range() 2179c79df7fSJonathan Corbet * xa_alloc() 2189c79df7fSJonathan Corbet * xa_alloc_bh() 2199c79df7fSJonathan Corbet * xa_alloc_irq() 2209c79df7fSJonathan Corbet * xa_reserve() 2219c79df7fSJonathan Corbet * xa_reserve_bh() 2229c79df7fSJonathan Corbet * xa_reserve_irq() 2239c79df7fSJonathan Corbet * xa_destroy() 2249c79df7fSJonathan Corbet * xa_set_mark() 2259c79df7fSJonathan Corbet * xa_clear_mark() 226992a8e60SMatthew Wilcox 227992a8e60SMatthew WilcoxAssumes xa_lock held on entry: 2289c79df7fSJonathan Corbet * __xa_store() 2299c79df7fSJonathan Corbet * __xa_insert() 2309c79df7fSJonathan Corbet * __xa_erase() 2319c79df7fSJonathan Corbet * __xa_cmpxchg() 2329c79df7fSJonathan Corbet * __xa_alloc() 2339c79df7fSJonathan Corbet * __xa_set_mark() 2349c79df7fSJonathan Corbet * __xa_clear_mark() 235992a8e60SMatthew Wilcox 236992a8e60SMatthew WilcoxIf you want to take advantage of the lock to protect the data structures 2379c79df7fSJonathan Corbetthat you are storing in the XArray, you can call xa_lock() 2389c79df7fSJonathan Corbetbefore calling xa_load(), then take a reference count on the 2399c79df7fSJonathan Corbetobject you have found before calling xa_unlock(). This will 240992a8e60SMatthew Wilcoxprevent stores from removing the object from the array between looking 241992a8e60SMatthew Wilcoxup the object and incrementing the refcount. You can also use RCU to 242992a8e60SMatthew Wilcoxavoid dereferencing freed memory, but an explanation of that is beyond 243992a8e60SMatthew Wilcoxthe scope of this document. 244992a8e60SMatthew Wilcox 245992a8e60SMatthew WilcoxThe XArray does not disable interrupts or softirqs while modifying 246992a8e60SMatthew Wilcoxthe array. It is safe to read the XArray from interrupt or softirq 247992a8e60SMatthew Wilcoxcontext as the RCU lock provides enough protection. 248992a8e60SMatthew Wilcox 249992a8e60SMatthew WilcoxIf, for example, you want to store entries in the XArray in process 250992a8e60SMatthew Wilcoxcontext and then erase them in softirq context, you can do that this way:: 251992a8e60SMatthew Wilcox 252992a8e60SMatthew Wilcox void foo_init(struct foo *foo) 253992a8e60SMatthew Wilcox { 254992a8e60SMatthew Wilcox xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH); 255992a8e60SMatthew Wilcox } 256992a8e60SMatthew Wilcox 257992a8e60SMatthew Wilcox int foo_store(struct foo *foo, unsigned long index, void *entry) 258992a8e60SMatthew Wilcox { 259992a8e60SMatthew Wilcox int err; 260992a8e60SMatthew Wilcox 261992a8e60SMatthew Wilcox xa_lock_bh(&foo->array); 262992a8e60SMatthew Wilcox err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL)); 263992a8e60SMatthew Wilcox if (!err) 264992a8e60SMatthew Wilcox foo->count++; 265992a8e60SMatthew Wilcox xa_unlock_bh(&foo->array); 266992a8e60SMatthew Wilcox return err; 267992a8e60SMatthew Wilcox } 268992a8e60SMatthew Wilcox 269992a8e60SMatthew Wilcox /* foo_erase() is only called from softirq context */ 270992a8e60SMatthew Wilcox void foo_erase(struct foo *foo, unsigned long index) 271992a8e60SMatthew Wilcox { 272992a8e60SMatthew Wilcox xa_lock(&foo->array); 273992a8e60SMatthew Wilcox __xa_erase(&foo->array, index); 274992a8e60SMatthew Wilcox foo->count--; 275992a8e60SMatthew Wilcox xa_unlock(&foo->array); 276992a8e60SMatthew Wilcox } 277992a8e60SMatthew Wilcox 278992a8e60SMatthew WilcoxIf you are going to modify the XArray from interrupt or softirq context, 2799c79df7fSJonathan Corbetyou need to initialise the array using xa_init_flags(), passing 280992a8e60SMatthew Wilcox``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``. 281992a8e60SMatthew Wilcox 282992a8e60SMatthew WilcoxThe above example also shows a common pattern of wanting to extend the 283992a8e60SMatthew Wilcoxcoverage of the xa_lock on the store side to protect some statistics 284992a8e60SMatthew Wilcoxassociated with the array. 285992a8e60SMatthew Wilcox 286992a8e60SMatthew WilcoxSharing the XArray with interrupt context is also possible, either 2879c79df7fSJonathan Corbetusing xa_lock_irqsave() in both the interrupt handler and process 2889c79df7fSJonathan Corbetcontext, or xa_lock_irq() in process context and xa_lock() 289992a8e60SMatthew Wilcoxin the interrupt handler. Some of the more common patterns have helper 2909c79df7fSJonathan Corbetfunctions such as xa_store_bh(), xa_store_irq(), 2919c79df7fSJonathan Corbetxa_erase_bh(), xa_erase_irq(), xa_cmpxchg_bh() 2929c79df7fSJonathan Corbetand xa_cmpxchg_irq(). 293992a8e60SMatthew Wilcox 294992a8e60SMatthew WilcoxSometimes you need to protect access to the XArray with a mutex because 295992a8e60SMatthew Wilcoxthat lock sits above another mutex in the locking hierarchy. That does 2969c79df7fSJonathan Corbetnot entitle you to use functions like __xa_erase() without taking 297992a8e60SMatthew Wilcoxthe xa_lock; the xa_lock is used for lockdep validation and will be used 298992a8e60SMatthew Wilcoxfor other purposes in the future. 299992a8e60SMatthew Wilcox 3009c79df7fSJonathan CorbetThe __xa_set_mark() and __xa_clear_mark() functions are also 301992a8e60SMatthew Wilcoxavailable for situations where you look up an entry and want to atomically 302992a8e60SMatthew Wilcoxset or clear a mark. It may be more efficient to use the advanced API 303992a8e60SMatthew Wilcoxin this case, as it will save you from walking the tree twice. 304992a8e60SMatthew Wilcox 305992a8e60SMatthew WilcoxAdvanced API 306992a8e60SMatthew Wilcox============ 307992a8e60SMatthew Wilcox 308992a8e60SMatthew WilcoxThe advanced API offers more flexibility and better performance at the 309992a8e60SMatthew Wilcoxcost of an interface which can be harder to use and has fewer safeguards. 310992a8e60SMatthew WilcoxNo locking is done for you by the advanced API, and you are required 311992a8e60SMatthew Wilcoxto use the xa_lock while modifying the array. You can choose whether 312992a8e60SMatthew Wilcoxto use the xa_lock or the RCU lock while doing read-only operations on 313992a8e60SMatthew Wilcoxthe array. You can mix advanced and normal operations on the same array; 314992a8e60SMatthew Wilcoxindeed the normal API is implemented in terms of the advanced API. The 315992a8e60SMatthew Wilcoxadvanced API is only available to modules with a GPL-compatible license. 316992a8e60SMatthew Wilcox 317992a8e60SMatthew WilcoxThe advanced API is based around the xa_state. This is an opaque data 318*ac23d1a9SMatthew Wilcox (Oracle)structure which you declare on the stack using the XA_STATE() macro. 319*ac23d1a9SMatthew Wilcox (Oracle)This macro initialises the xa_state ready to start walking around the 320*ac23d1a9SMatthew Wilcox (Oracle)XArray. It is used as a cursor to maintain the position in the XArray 321*ac23d1a9SMatthew Wilcox (Oracle)and let you compose various operations together without having to restart 322*ac23d1a9SMatthew Wilcox (Oracle)from the top every time. The contents of the xa_state are protected by 323*ac23d1a9SMatthew Wilcox (Oracle)the rcu_read_lock() or the xas_lock(). If you need to drop whichever of 324*ac23d1a9SMatthew Wilcox (Oracle)those locks is protecting your state and tree, you must call xas_pause() 325*ac23d1a9SMatthew Wilcox (Oracle)so that future calls do not rely on the parts of the state which were 326*ac23d1a9SMatthew Wilcox (Oracle)left unprotected. 327992a8e60SMatthew Wilcox 328992a8e60SMatthew WilcoxThe xa_state is also used to store errors. You can call 3299c79df7fSJonathan Corbetxas_error() to retrieve the error. All operations check whether 330992a8e60SMatthew Wilcoxthe xa_state is in an error state before proceeding, so there's no need 331992a8e60SMatthew Wilcoxfor you to check for an error after each call; you can make multiple 332992a8e60SMatthew Wilcoxcalls in succession and only check at a convenient point. The only 333992a8e60SMatthew Wilcoxerrors currently generated by the XArray code itself are ``ENOMEM`` and 334992a8e60SMatthew Wilcox``EINVAL``, but it supports arbitrary errors in case you want to call 3359c79df7fSJonathan Corbetxas_set_err() yourself. 336992a8e60SMatthew Wilcox 3379c79df7fSJonathan CorbetIf the xa_state is holding an ``ENOMEM`` error, calling xas_nomem() 338992a8e60SMatthew Wilcoxwill attempt to allocate more memory using the specified gfp flags and 339992a8e60SMatthew Wilcoxcache it in the xa_state for the next attempt. The idea is that you take 340992a8e60SMatthew Wilcoxthe xa_lock, attempt the operation and drop the lock. The operation 341992a8e60SMatthew Wilcoxattempts to allocate memory while holding the lock, but it is more 3429c79df7fSJonathan Corbetlikely to fail. Once you have dropped the lock, xas_nomem() 343992a8e60SMatthew Wilcoxcan try harder to allocate more memory. It will return ``true`` if it 344992a8e60SMatthew Wilcoxis worth retrying the operation (i.e. that there was a memory error *and* 345992a8e60SMatthew Wilcoxmore memory was allocated). If it has previously allocated memory, and 346992a8e60SMatthew Wilcoxthat memory wasn't used, and there is no error (or some error that isn't 347992a8e60SMatthew Wilcox``ENOMEM``), then it will free the memory previously allocated. 348992a8e60SMatthew Wilcox 349992a8e60SMatthew WilcoxInternal Entries 350992a8e60SMatthew Wilcox---------------- 351992a8e60SMatthew Wilcox 352992a8e60SMatthew WilcoxThe XArray reserves some entries for its own purposes. These are never 353992a8e60SMatthew Wilcoxexposed through the normal API, but when using the advanced API, it's 354992a8e60SMatthew Wilcoxpossible to see them. Usually the best way to handle them is to pass them 3559c79df7fSJonathan Corbetto xas_retry(), and retry the operation if it returns ``true``. 356992a8e60SMatthew Wilcox 357992a8e60SMatthew Wilcox.. flat-table:: 358992a8e60SMatthew Wilcox :widths: 1 1 6 359992a8e60SMatthew Wilcox 360992a8e60SMatthew Wilcox * - Name 361992a8e60SMatthew Wilcox - Test 362992a8e60SMatthew Wilcox - Usage 363992a8e60SMatthew Wilcox 364992a8e60SMatthew Wilcox * - Node 3659c79df7fSJonathan Corbet - xa_is_node() 366992a8e60SMatthew Wilcox - An XArray node. May be visible when using a multi-index xa_state. 367992a8e60SMatthew Wilcox 368992a8e60SMatthew Wilcox * - Sibling 3699c79df7fSJonathan Corbet - xa_is_sibling() 370992a8e60SMatthew Wilcox - A non-canonical entry for a multi-index entry. The value indicates 371992a8e60SMatthew Wilcox which slot in this node has the canonical entry. 372992a8e60SMatthew Wilcox 373992a8e60SMatthew Wilcox * - Retry 3749c79df7fSJonathan Corbet - xa_is_retry() 375992a8e60SMatthew Wilcox - This entry is currently being modified by a thread which has the 376992a8e60SMatthew Wilcox xa_lock. The node containing this entry may be freed at the end 377992a8e60SMatthew Wilcox of this RCU period. You should restart the lookup from the head 378992a8e60SMatthew Wilcox of the array. 379992a8e60SMatthew Wilcox 3809f14d4f1SMatthew Wilcox * - Zero 3819c79df7fSJonathan Corbet - xa_is_zero() 3829f14d4f1SMatthew Wilcox - Zero entries appear as ``NULL`` through the Normal API, but occupy 3839f14d4f1SMatthew Wilcox an entry in the XArray which can be used to reserve the index for 384d9c48043SMatthew Wilcox future use. This is used by allocating XArrays for allocated entries 385d9c48043SMatthew Wilcox which are ``NULL``. 3869f14d4f1SMatthew Wilcox 387992a8e60SMatthew WilcoxOther internal entries may be added in the future. As far as possible, they 3889c79df7fSJonathan Corbetwill be handled by xas_retry(). 389992a8e60SMatthew Wilcox 390992a8e60SMatthew WilcoxAdditional functionality 391992a8e60SMatthew Wilcox------------------------ 392992a8e60SMatthew Wilcox 3939c79df7fSJonathan CorbetThe xas_create_range() function allocates all the necessary memory 394992a8e60SMatthew Wilcoxto store every entry in a range. It will set ENOMEM in the xa_state if 395992a8e60SMatthew Wilcoxit cannot allocate memory. 396992a8e60SMatthew Wilcox 3979c79df7fSJonathan CorbetYou can use xas_init_marks() to reset the marks on an entry 398992a8e60SMatthew Wilcoxto their default state. This is usually all marks clear, unless the 399992a8e60SMatthew WilcoxXArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set 400992a8e60SMatthew Wilcoxand all other marks are clear. Replacing one entry with another using 4019c79df7fSJonathan Corbetxas_store() will not reset the marks on that entry; if you want 402992a8e60SMatthew Wilcoxthe marks reset, you should do that explicitly. 403992a8e60SMatthew Wilcox 4049c79df7fSJonathan CorbetThe xas_load() will walk the xa_state as close to the entry 405992a8e60SMatthew Wilcoxas it can. If you know the xa_state has already been walked to the 406992a8e60SMatthew Wilcoxentry and need to check that the entry hasn't changed, you can use 4079c79df7fSJonathan Corbetxas_reload() to save a function call. 408992a8e60SMatthew Wilcox 409992a8e60SMatthew WilcoxIf you need to move to a different index in the XArray, call 4109c79df7fSJonathan Corbetxas_set(). This resets the cursor to the top of the tree, which 411992a8e60SMatthew Wilcoxwill generally make the next operation walk the cursor to the desired 412992a8e60SMatthew Wilcoxspot in the tree. If you want to move to the next or previous index, 4139c79df7fSJonathan Corbetcall xas_next() or xas_prev(). Setting the index does 414992a8e60SMatthew Wilcoxnot walk the cursor around the array so does not require a lock to be 415992a8e60SMatthew Wilcoxheld, while moving to the next or previous index does. 416992a8e60SMatthew Wilcox 4179c79df7fSJonathan CorbetYou can search for the next present entry using xas_find(). This 4189c79df7fSJonathan Corbetis the equivalent of both xa_find() and xa_find_after(); 419992a8e60SMatthew Wilcoxif the cursor has been walked to an entry, then it will find the next 420992a8e60SMatthew Wilcoxentry after the one currently referenced. If not, it will return the 4219c79df7fSJonathan Corbetentry at the index of the xa_state. Using xas_next_entry() to 4229c79df7fSJonathan Corbetmove to the next present entry instead of xas_find() will save 423992a8e60SMatthew Wilcoxa function call in the majority of cases at the expense of emitting more 424992a8e60SMatthew Wilcoxinline code. 425992a8e60SMatthew Wilcox 4269c79df7fSJonathan CorbetThe xas_find_marked() function is similar. If the xa_state has 427992a8e60SMatthew Wilcoxnot been walked, it will return the entry at the index of the xa_state, 428992a8e60SMatthew Wilcoxif it is marked. Otherwise, it will return the first marked entry after 4299c79df7fSJonathan Corbetthe entry referenced by the xa_state. The xas_next_marked() 4309c79df7fSJonathan Corbetfunction is the equivalent of xas_next_entry(). 431992a8e60SMatthew Wilcox 4329c79df7fSJonathan CorbetWhen iterating over a range of the XArray using xas_for_each() 4339c79df7fSJonathan Corbetor xas_for_each_marked(), it may be necessary to temporarily stop 4349c79df7fSJonathan Corbetthe iteration. The xas_pause() function exists for this purpose. 435992a8e60SMatthew WilcoxAfter you have done the necessary work and wish to resume, the xa_state 436992a8e60SMatthew Wilcoxis in an appropriate state to continue the iteration after the entry 437992a8e60SMatthew Wilcoxyou last processed. If you have interrupts disabled while iterating, 438992a8e60SMatthew Wilcoxthen it is good manners to pause the iteration and reenable interrupts 439992a8e60SMatthew Wilcoxevery ``XA_CHECK_SCHED`` entries. 440992a8e60SMatthew Wilcox 4416b81141dSMatthew Wilcox (Oracle)The xas_get_mark(), xas_set_mark() and xas_clear_mark() functions require 4426b81141dSMatthew Wilcox (Oracle)the xa_state cursor to have been moved to the appropriate location in the 4436b81141dSMatthew Wilcox (Oracle)XArray; they will do nothing if you have called xas_pause() or xas_set() 444992a8e60SMatthew Wilcoximmediately before. 445992a8e60SMatthew Wilcox 4469c79df7fSJonathan CorbetYou can call xas_set_update() to have a callback function 447992a8e60SMatthew Wilcoxcalled each time the XArray updates a node. This is used by the page 448992a8e60SMatthew Wilcoxcache workingset code to maintain its list of nodes which contain only 449992a8e60SMatthew Wilcoxshadow entries. 450992a8e60SMatthew Wilcox 451992a8e60SMatthew WilcoxMulti-Index Entries 452992a8e60SMatthew Wilcox------------------- 453992a8e60SMatthew Wilcox 454992a8e60SMatthew WilcoxThe XArray has the ability to tie multiple indices together so that 455992a8e60SMatthew Wilcoxoperations on one index affect all indices. For example, storing into 456992a8e60SMatthew Wilcoxany index will change the value of the entry retrieved from any index. 457992a8e60SMatthew WilcoxSetting or clearing a mark on any index will set or clear the mark 458992a8e60SMatthew Wilcoxon every index that is tied together. The current implementation 459992a8e60SMatthew Wilcoxonly allows tying ranges which are aligned powers of two together; 460992a8e60SMatthew Wilcoxeg indices 64-127 may be tied together, but 2-6 may not be. This may 461992a8e60SMatthew Wilcoxsave substantial quantities of memory; for example tying 512 entries 462992a8e60SMatthew Wilcoxtogether will save over 4kB. 463992a8e60SMatthew Wilcox 4649c79df7fSJonathan CorbetYou can create a multi-index entry by using XA_STATE_ORDER() 4659c79df7fSJonathan Corbetor xas_set_order() followed by a call to xas_store(). 4669c79df7fSJonathan CorbetCalling xas_load() with a multi-index xa_state will walk the 467992a8e60SMatthew Wilcoxxa_state to the right location in the tree, but the return value is not 468992a8e60SMatthew Wilcoxmeaningful, potentially being an internal entry or ``NULL`` even when there 4699c79df7fSJonathan Corbetis an entry stored within the range. Calling xas_find_conflict() 470992a8e60SMatthew Wilcoxwill return the first entry within the range or ``NULL`` if there are no 4719c79df7fSJonathan Corbetentries in the range. The xas_for_each_conflict() iterator will 472992a8e60SMatthew Wilcoxiterate over every entry which overlaps the specified range. 473992a8e60SMatthew Wilcox 4749c79df7fSJonathan CorbetIf xas_load() encounters a multi-index entry, the xa_index 475992a8e60SMatthew Wilcoxin the xa_state will not be changed. When iterating over an XArray 4769c79df7fSJonathan Corbetor calling xas_find(), if the initial index is in the middle 477992a8e60SMatthew Wilcoxof a multi-index entry, it will not be altered. Subsequent calls 478992a8e60SMatthew Wilcoxor iterations will move the index to the first index in the range. 479992a8e60SMatthew WilcoxEach entry will only be returned once, no matter how many indices it 480992a8e60SMatthew Wilcoxoccupies. 481992a8e60SMatthew Wilcox 4828fc75643SMatthew Wilcox (Oracle)Using xas_next() or xas_prev() with a multi-index xa_state is not 4838fc75643SMatthew Wilcox (Oracle)supported. Using either of these functions on a multi-index entry will 4848fc75643SMatthew Wilcox (Oracle)reveal sibling entries; these should be skipped over by the caller. 485992a8e60SMatthew Wilcox 4868fc75643SMatthew Wilcox (Oracle)Storing ``NULL`` into any index of a multi-index entry will set the 4878fc75643SMatthew Wilcox (Oracle)entry at every index to ``NULL`` and dissolve the tie. A multi-index 4888fc75643SMatthew Wilcox (Oracle)entry can be split into entries occupying smaller ranges by calling 4898fc75643SMatthew Wilcox (Oracle)xas_split_alloc() without the xa_lock held, followed by taking the lock 4908fc75643SMatthew Wilcox (Oracle)and calling xas_split(). 491992a8e60SMatthew Wilcox 492992a8e60SMatthew WilcoxFunctions and structures 493992a8e60SMatthew Wilcox======================== 494992a8e60SMatthew Wilcox 495992a8e60SMatthew Wilcox.. kernel-doc:: include/linux/xarray.h 496992a8e60SMatthew Wilcox.. kernel-doc:: lib/xarray.c 497