1992a8e60SMatthew Wilcox.. SPDX-License-Identifier: GPL-2.0+
2992a8e60SMatthew Wilcox
3992a8e60SMatthew Wilcox======
4992a8e60SMatthew WilcoxXArray
5992a8e60SMatthew Wilcox======
6992a8e60SMatthew Wilcox
7992a8e60SMatthew Wilcox:Author: Matthew Wilcox
8992a8e60SMatthew Wilcox
9992a8e60SMatthew WilcoxOverview
10992a8e60SMatthew Wilcox========
11992a8e60SMatthew Wilcox
12992a8e60SMatthew WilcoxThe XArray is an abstract data type which behaves like a very large array
13992a8e60SMatthew Wilcoxof pointers.  It meets many of the same needs as a hash or a conventional
14992a8e60SMatthew Wilcoxresizable array.  Unlike a hash, it allows you to sensibly go to the
15992a8e60SMatthew Wilcoxnext or previous entry in a cache-efficient manner.  In contrast to a
16992a8e60SMatthew Wilcoxresizable array, there is no need to copy data or change MMU mappings in
17992a8e60SMatthew Wilcoxorder to grow the array.  It is more memory-efficient, parallelisable
18992a8e60SMatthew Wilcoxand cache friendly than a doubly-linked list.  It takes advantage of
19992a8e60SMatthew WilcoxRCU to perform lookups without locking.
20992a8e60SMatthew Wilcox
21992a8e60SMatthew WilcoxThe XArray implementation is efficient when the indices used are densely
22992a8e60SMatthew Wilcoxclustered; hashing the object and using the hash as the index will not
23992a8e60SMatthew Wilcoxperform well.  The XArray is optimised for small indices, but still has
24992a8e60SMatthew Wilcoxgood performance with large indices.  If your index can be larger than
25992a8e60SMatthew Wilcox``ULONG_MAX`` then the XArray is not the data type for you.  The most
26992a8e60SMatthew Wilcoximportant user of the XArray is the page cache.
27992a8e60SMatthew Wilcox
28992a8e60SMatthew WilcoxNormal pointers may be stored in the XArray directly.  They must be 4-byte
299c79df7fSJonathan Corbetaligned, which is true for any pointer returned from kmalloc() and
309c79df7fSJonathan Corbetalloc_page().  It isn't true for arbitrary user-space pointers,
31992a8e60SMatthew Wilcoxnor for function pointers.  You can store pointers to statically allocated
32992a8e60SMatthew Wilcoxobjects, as long as those objects have an alignment of at least 4.
33992a8e60SMatthew Wilcox
34992a8e60SMatthew WilcoxYou can also store integers between 0 and ``LONG_MAX`` in the XArray.
359c79df7fSJonathan CorbetYou must first convert it into an entry using xa_mk_value().
36992a8e60SMatthew WilcoxWhen you retrieve an entry from the XArray, you can check whether it is
379c79df7fSJonathan Corbeta value entry by calling xa_is_value(), and convert it back to
389c79df7fSJonathan Corbetan integer by calling xa_to_value().
39992a8e60SMatthew Wilcox
406b81141dSMatthew Wilcox (Oracle)Some users want to tag the pointers they store in the XArray.  You can
416b81141dSMatthew Wilcox (Oracle)call xa_tag_pointer() to create an entry with a tag, xa_untag_pointer()
426b81141dSMatthew Wilcox (Oracle)to turn a tagged entry back into an untagged pointer and xa_pointer_tag()
436b81141dSMatthew Wilcox (Oracle)to retrieve the tag of an entry.  Tagged pointers use the same bits that
446b81141dSMatthew Wilcox (Oracle)are used to distinguish value entries from normal pointers, so you must
45992a8e60SMatthew Wilcoxdecide whether they want to store value entries or tagged pointers in
46992a8e60SMatthew Wilcoxany particular XArray.
47992a8e60SMatthew Wilcox
489c79df7fSJonathan CorbetThe XArray does not support storing IS_ERR() pointers as some
49992a8e60SMatthew Wilcoxconflict with value entries or internal entries.
50992a8e60SMatthew Wilcox
51992a8e60SMatthew WilcoxAn unusual feature of the XArray is the ability to create entries which
52992a8e60SMatthew Wilcoxoccupy a range of indices.  Once stored to, looking up any index in
53992a8e60SMatthew Wilcoxthe range will return the same entry as looking up any other index in
546b81141dSMatthew Wilcox (Oracle)the range.  Storing to any index will store to all of them.  Multi-index
556b81141dSMatthew Wilcox (Oracle)entries can be explicitly split into smaller entries, or storing ``NULL``
566b81141dSMatthew Wilcox (Oracle)into any entry will cause the XArray to forget about the range.
57992a8e60SMatthew Wilcox
58992a8e60SMatthew WilcoxNormal API
59992a8e60SMatthew Wilcox==========
60992a8e60SMatthew Wilcox
619c79df7fSJonathan CorbetStart by initialising an XArray, either with DEFINE_XARRAY()
629c79df7fSJonathan Corbetfor statically allocated XArrays or xa_init() for dynamically
63992a8e60SMatthew Wilcoxallocated ones.  A freshly-initialised XArray contains a ``NULL``
64992a8e60SMatthew Wilcoxpointer at every index.
65992a8e60SMatthew Wilcox
669c79df7fSJonathan CorbetYou can then set entries using xa_store() and get entries
679c79df7fSJonathan Corbetusing xa_load().  xa_store will overwrite any entry with the
68992a8e60SMatthew Wilcoxnew entry and return the previous entry stored at that index.  You can
699c79df7fSJonathan Corbetuse xa_erase() instead of calling xa_store() with a
70992a8e60SMatthew Wilcox``NULL`` entry.  There is no difference between an entry that has never
71804dfaf0SMatthew Wilcoxbeen stored to, one that has been erased and one that has most recently
72804dfaf0SMatthew Wilcoxhad ``NULL`` stored to it.
73992a8e60SMatthew Wilcox
74992a8e60SMatthew WilcoxYou can conditionally replace an entry at an index by using
759c79df7fSJonathan Corbetxa_cmpxchg().  Like cmpxchg(), it will only succeed if
76992a8e60SMatthew Wilcoxthe entry at that index has the 'old' value.  It also returns the entry
77992a8e60SMatthew Wilcoxwhich was at that index; if it returns the same entry which was passed as
789c79df7fSJonathan Corbet'old', then xa_cmpxchg() succeeded.
79992a8e60SMatthew Wilcox
80992a8e60SMatthew WilcoxIf you want to only store a new entry to an index if the current entry
819c79df7fSJonathan Corbetat that index is ``NULL``, you can use xa_insert() which
82fd9dc93eSMatthew Wilcoxreturns ``-EBUSY`` if the entry is not empty.
83992a8e60SMatthew Wilcox
84992a8e60SMatthew WilcoxYou can copy entries out of the XArray into a plain array by calling
8500ed452cSMatthew Wilcox (Oracle)xa_extract().  Or you can iterate over the present entries in the XArray
8600ed452cSMatthew Wilcox (Oracle)by calling xa_for_each(), xa_for_each_start() or xa_for_each_range().
8700ed452cSMatthew Wilcox (Oracle)You may prefer to use xa_find() or xa_find_after() to move to the next
8800ed452cSMatthew Wilcox (Oracle)present entry in the XArray.
89992a8e60SMatthew Wilcox
909c79df7fSJonathan CorbetCalling xa_store_range() stores the same entry in a range
910e9446c3SMatthew Wilcoxof indices.  If you do this, some of the other operations will behave
920e9446c3SMatthew Wilcoxin a slightly odd way.  For example, marking the entry at one index
930e9446c3SMatthew Wilcoxmay result in the entry being marked at some, but not all of the other
940e9446c3SMatthew Wilcoxindices.  Storing into one index may result in the entry retrieved by
950e9446c3SMatthew Wilcoxsome, but not all of the other indices changing.
960e9446c3SMatthew Wilcox
979c79df7fSJonathan CorbetSometimes you need to ensure that a subsequent call to xa_store()
989c79df7fSJonathan Corbetwill not need to allocate memory.  The xa_reserve() function
99b0606fedSMatthew Wilcoxwill store a reserved entry at the indicated index.  Users of the
100b0606fedSMatthew Wilcoxnormal API will see this entry as containing ``NULL``.  If you do
1019c79df7fSJonathan Corbetnot need to use the reserved entry, you can call xa_release()
102b0606fedSMatthew Wilcoxto remove the unused entry.  If another user has stored to the entry
1039c79df7fSJonathan Corbetin the meantime, xa_release() will do nothing; if instead you
1049c79df7fSJonathan Corbetwant the entry to become ``NULL``, you should use xa_erase().
1059c79df7fSJonathan CorbetUsing xa_insert() on a reserved entry will fail.
1064c0608f4SMatthew Wilcox
1079c79df7fSJonathan CorbetIf all entries in the array are ``NULL``, the xa_empty() function
108804dfaf0SMatthew Wilcoxwill return ``true``.
109804dfaf0SMatthew Wilcox
110992a8e60SMatthew WilcoxFinally, you can remove all entries from an XArray by calling
1119c79df7fSJonathan Corbetxa_destroy().  If the XArray entries are pointers, you may wish
112992a8e60SMatthew Wilcoxto free the entries first.  You can do this by iterating over all present
1139c79df7fSJonathan Corbetentries in the XArray using the xa_for_each() iterator.
114992a8e60SMatthew Wilcox
1156b81141dSMatthew Wilcox (Oracle)Search Marks
1166b81141dSMatthew Wilcox (Oracle)------------
1176b81141dSMatthew Wilcox (Oracle)
1186b81141dSMatthew Wilcox (Oracle)Each entry in the array has three bits associated with it called marks.
1196b81141dSMatthew Wilcox (Oracle)Each mark may be set or cleared independently of the others.  You can
1206b81141dSMatthew Wilcox (Oracle)iterate over marked entries by using the xa_for_each_marked() iterator.
1216b81141dSMatthew Wilcox (Oracle)
1226b81141dSMatthew Wilcox (Oracle)You can enquire whether a mark is set on an entry by using
1236b81141dSMatthew Wilcox (Oracle)xa_get_mark().  If the entry is not ``NULL``, you can set a mark on it
1246b81141dSMatthew Wilcox (Oracle)by using xa_set_mark() and remove the mark from an entry by calling
1256b81141dSMatthew Wilcox (Oracle)xa_clear_mark().  You can ask whether any entry in the XArray has a
1266b81141dSMatthew Wilcox (Oracle)particular mark set by calling xa_marked().  Erasing an entry from the
1276b81141dSMatthew Wilcox (Oracle)XArray causes all marks associated with that entry to be cleared.
1286b81141dSMatthew Wilcox (Oracle)
1296b81141dSMatthew Wilcox (Oracle)Setting or clearing a mark on any index of a multi-index entry will
1306b81141dSMatthew Wilcox (Oracle)affect all indices covered by that entry.  Querying the mark on any
1316b81141dSMatthew Wilcox (Oracle)index will return the same result.
1326b81141dSMatthew Wilcox (Oracle)
1336b81141dSMatthew Wilcox (Oracle)There is no way to iterate over entries which are not marked; the data
1346b81141dSMatthew Wilcox (Oracle)structure does not allow this to be implemented efficiently.  There are
1356b81141dSMatthew Wilcox (Oracle)not currently iterators to search for logical combinations of bits (eg
1366b81141dSMatthew Wilcox (Oracle)iterate over all entries which have both ``XA_MARK_1`` and ``XA_MARK_2``
1376b81141dSMatthew Wilcox (Oracle)set, or iterate over all entries which have ``XA_MARK_0`` or ``XA_MARK_2``
1386b81141dSMatthew Wilcox (Oracle)set).  It would be possible to add these if a user arises.
1396b81141dSMatthew Wilcox (Oracle)
140d9c48043SMatthew WilcoxAllocating XArrays
141d9c48043SMatthew Wilcox------------------
142d9c48043SMatthew Wilcox
1439c79df7fSJonathan CorbetIf you use DEFINE_XARRAY_ALLOC() to define the XArray, or
1449c79df7fSJonathan Corbetinitialise it by passing ``XA_FLAGS_ALLOC`` to xa_init_flags(),
145d9c48043SMatthew Wilcoxthe XArray changes to track whether entries are in use or not.
146371c752dSMatthew Wilcox
1479c79df7fSJonathan CorbetYou can call xa_alloc() to store the entry at an unused index
148371c752dSMatthew Wilcoxin the XArray.  If you need to modify the array from interrupt context,
1499c79df7fSJonathan Corbetyou can use xa_alloc_bh() or xa_alloc_irq() to disable
150d9c48043SMatthew Wilcoxinterrupts while allocating the ID.
151d9c48043SMatthew Wilcox
1529c79df7fSJonathan CorbetUsing xa_store(), xa_cmpxchg() or xa_insert() will
1533ccaf57aSMatthew Wilcoxalso mark the entry as being allocated.  Unlike a normal XArray, storing
1549c79df7fSJonathan Corbet``NULL`` will mark the entry as being in use, like xa_reserve().
1559c79df7fSJonathan CorbetTo free an entry, use xa_erase() (or xa_release() if
156d9c48043SMatthew Wilcoxyou only want to free the entry if it's ``NULL``).
157d9c48043SMatthew Wilcox
1583ccaf57aSMatthew WilcoxBy default, the lowest free entry is allocated starting from 0.  If you
1593ccaf57aSMatthew Wilcoxwant to allocate entries starting at 1, it is more efficient to use
1609c79df7fSJonathan CorbetDEFINE_XARRAY_ALLOC1() or ``XA_FLAGS_ALLOC1``.  If you want to
1612fa044e5SMatthew Wilcoxallocate IDs up to a maximum, then wrap back around to the lowest free
1629c79df7fSJonathan CorbetID, you can use xa_alloc_cyclic().
1633ccaf57aSMatthew Wilcox
164d9c48043SMatthew WilcoxYou cannot use ``XA_MARK_0`` with an allocating XArray as this mark
165d9c48043SMatthew Wilcoxis used to track whether an entry is free or not.  The other marks are
166d9c48043SMatthew Wilcoxavailable for your use.
167371c752dSMatthew Wilcox
168992a8e60SMatthew WilcoxMemory allocation
169992a8e60SMatthew Wilcox-----------------
170992a8e60SMatthew Wilcox
1719c79df7fSJonathan CorbetThe xa_store(), xa_cmpxchg(), xa_alloc(),
1729c79df7fSJonathan Corbetxa_reserve() and xa_insert() functions take a gfp_t
173371c752dSMatthew Wilcoxparameter in case the XArray needs to allocate memory to store this entry.
174992a8e60SMatthew WilcoxIf the entry is being deleted, no memory allocation needs to be performed,
175992a8e60SMatthew Wilcoxand the GFP flags specified will be ignored.
176992a8e60SMatthew Wilcox
177992a8e60SMatthew WilcoxIt is possible for no memory to be allocatable, particularly if you pass
178992a8e60SMatthew Wilcoxa restrictive set of GFP flags.  In that case, the functions return a
1799c79df7fSJonathan Corbetspecial value which can be turned into an errno using xa_err().
180992a8e60SMatthew WilcoxIf you don't need to know exactly which error occurred, using
1819c79df7fSJonathan Corbetxa_is_err() is slightly more efficient.
182992a8e60SMatthew Wilcox
183992a8e60SMatthew WilcoxLocking
184992a8e60SMatthew Wilcox-------
185992a8e60SMatthew Wilcox
186992a8e60SMatthew WilcoxWhen using the Normal API, you do not have to worry about locking.
187992a8e60SMatthew WilcoxThe XArray uses RCU and an internal spinlock to synchronise access:
188992a8e60SMatthew Wilcox
189992a8e60SMatthew WilcoxNo lock needed:
1909c79df7fSJonathan Corbet * xa_empty()
1919c79df7fSJonathan Corbet * xa_marked()
192992a8e60SMatthew Wilcox
193992a8e60SMatthew WilcoxTakes RCU read lock:
1949c79df7fSJonathan Corbet * xa_load()
1959c79df7fSJonathan Corbet * xa_for_each()
19600ed452cSMatthew Wilcox (Oracle) * xa_for_each_start()
19700ed452cSMatthew Wilcox (Oracle) * xa_for_each_range()
1989c79df7fSJonathan Corbet * xa_find()
1999c79df7fSJonathan Corbet * xa_find_after()
2009c79df7fSJonathan Corbet * xa_extract()
2019c79df7fSJonathan Corbet * xa_get_mark()
202992a8e60SMatthew Wilcox
203992a8e60SMatthew WilcoxTakes xa_lock internally:
2049c79df7fSJonathan Corbet * xa_store()
2059c79df7fSJonathan Corbet * xa_store_bh()
2069c79df7fSJonathan Corbet * xa_store_irq()
2079c79df7fSJonathan Corbet * xa_insert()
2089c79df7fSJonathan Corbet * xa_insert_bh()
2099c79df7fSJonathan Corbet * xa_insert_irq()
2109c79df7fSJonathan Corbet * xa_erase()
2119c79df7fSJonathan Corbet * xa_erase_bh()
2129c79df7fSJonathan Corbet * xa_erase_irq()
2139c79df7fSJonathan Corbet * xa_cmpxchg()
2149c79df7fSJonathan Corbet * xa_cmpxchg_bh()
2159c79df7fSJonathan Corbet * xa_cmpxchg_irq()
2169c79df7fSJonathan Corbet * xa_store_range()
2179c79df7fSJonathan Corbet * xa_alloc()
2189c79df7fSJonathan Corbet * xa_alloc_bh()
2199c79df7fSJonathan Corbet * xa_alloc_irq()
2209c79df7fSJonathan Corbet * xa_reserve()
2219c79df7fSJonathan Corbet * xa_reserve_bh()
2229c79df7fSJonathan Corbet * xa_reserve_irq()
2239c79df7fSJonathan Corbet * xa_destroy()
2249c79df7fSJonathan Corbet * xa_set_mark()
2259c79df7fSJonathan Corbet * xa_clear_mark()
226992a8e60SMatthew Wilcox
227992a8e60SMatthew WilcoxAssumes xa_lock held on entry:
2289c79df7fSJonathan Corbet * __xa_store()
2299c79df7fSJonathan Corbet * __xa_insert()
2309c79df7fSJonathan Corbet * __xa_erase()
2319c79df7fSJonathan Corbet * __xa_cmpxchg()
2329c79df7fSJonathan Corbet * __xa_alloc()
2339c79df7fSJonathan Corbet * __xa_set_mark()
2349c79df7fSJonathan Corbet * __xa_clear_mark()
235992a8e60SMatthew Wilcox
236992a8e60SMatthew WilcoxIf you want to take advantage of the lock to protect the data structures
2379c79df7fSJonathan Corbetthat you are storing in the XArray, you can call xa_lock()
2389c79df7fSJonathan Corbetbefore calling xa_load(), then take a reference count on the
2399c79df7fSJonathan Corbetobject you have found before calling xa_unlock().  This will
240992a8e60SMatthew Wilcoxprevent stores from removing the object from the array between looking
241992a8e60SMatthew Wilcoxup the object and incrementing the refcount.  You can also use RCU to
242992a8e60SMatthew Wilcoxavoid dereferencing freed memory, but an explanation of that is beyond
243992a8e60SMatthew Wilcoxthe scope of this document.
244992a8e60SMatthew Wilcox
245992a8e60SMatthew WilcoxThe XArray does not disable interrupts or softirqs while modifying
246992a8e60SMatthew Wilcoxthe array.  It is safe to read the XArray from interrupt or softirq
247992a8e60SMatthew Wilcoxcontext as the RCU lock provides enough protection.
248992a8e60SMatthew Wilcox
249992a8e60SMatthew WilcoxIf, for example, you want to store entries in the XArray in process
250992a8e60SMatthew Wilcoxcontext and then erase them in softirq context, you can do that this way::
251992a8e60SMatthew Wilcox
252992a8e60SMatthew Wilcox    void foo_init(struct foo *foo)
253992a8e60SMatthew Wilcox    {
254992a8e60SMatthew Wilcox        xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH);
255992a8e60SMatthew Wilcox    }
256992a8e60SMatthew Wilcox
257992a8e60SMatthew Wilcox    int foo_store(struct foo *foo, unsigned long index, void *entry)
258992a8e60SMatthew Wilcox    {
259992a8e60SMatthew Wilcox        int err;
260992a8e60SMatthew Wilcox
261992a8e60SMatthew Wilcox        xa_lock_bh(&foo->array);
262992a8e60SMatthew Wilcox        err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL));
263992a8e60SMatthew Wilcox        if (!err)
264992a8e60SMatthew Wilcox            foo->count++;
265992a8e60SMatthew Wilcox        xa_unlock_bh(&foo->array);
266992a8e60SMatthew Wilcox        return err;
267992a8e60SMatthew Wilcox    }
268992a8e60SMatthew Wilcox
269992a8e60SMatthew Wilcox    /* foo_erase() is only called from softirq context */
270992a8e60SMatthew Wilcox    void foo_erase(struct foo *foo, unsigned long index)
271992a8e60SMatthew Wilcox    {
272992a8e60SMatthew Wilcox        xa_lock(&foo->array);
273992a8e60SMatthew Wilcox        __xa_erase(&foo->array, index);
274992a8e60SMatthew Wilcox        foo->count--;
275992a8e60SMatthew Wilcox        xa_unlock(&foo->array);
276992a8e60SMatthew Wilcox    }
277992a8e60SMatthew Wilcox
278992a8e60SMatthew WilcoxIf you are going to modify the XArray from interrupt or softirq context,
2799c79df7fSJonathan Corbetyou need to initialise the array using xa_init_flags(), passing
280992a8e60SMatthew Wilcox``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``.
281992a8e60SMatthew Wilcox
282992a8e60SMatthew WilcoxThe above example also shows a common pattern of wanting to extend the
283992a8e60SMatthew Wilcoxcoverage of the xa_lock on the store side to protect some statistics
284992a8e60SMatthew Wilcoxassociated with the array.
285992a8e60SMatthew Wilcox
286992a8e60SMatthew WilcoxSharing the XArray with interrupt context is also possible, either
2879c79df7fSJonathan Corbetusing xa_lock_irqsave() in both the interrupt handler and process
2889c79df7fSJonathan Corbetcontext, or xa_lock_irq() in process context and xa_lock()
289992a8e60SMatthew Wilcoxin the interrupt handler.  Some of the more common patterns have helper
2909c79df7fSJonathan Corbetfunctions such as xa_store_bh(), xa_store_irq(),
2919c79df7fSJonathan Corbetxa_erase_bh(), xa_erase_irq(), xa_cmpxchg_bh()
2929c79df7fSJonathan Corbetand xa_cmpxchg_irq().
293992a8e60SMatthew Wilcox
294992a8e60SMatthew WilcoxSometimes you need to protect access to the XArray with a mutex because
295992a8e60SMatthew Wilcoxthat lock sits above another mutex in the locking hierarchy.  That does
2969c79df7fSJonathan Corbetnot entitle you to use functions like __xa_erase() without taking
297992a8e60SMatthew Wilcoxthe xa_lock; the xa_lock is used for lockdep validation and will be used
298992a8e60SMatthew Wilcoxfor other purposes in the future.
299992a8e60SMatthew Wilcox
3009c79df7fSJonathan CorbetThe __xa_set_mark() and __xa_clear_mark() functions are also
301992a8e60SMatthew Wilcoxavailable for situations where you look up an entry and want to atomically
302992a8e60SMatthew Wilcoxset or clear a mark.  It may be more efficient to use the advanced API
303992a8e60SMatthew Wilcoxin this case, as it will save you from walking the tree twice.
304992a8e60SMatthew Wilcox
305992a8e60SMatthew WilcoxAdvanced API
306992a8e60SMatthew Wilcox============
307992a8e60SMatthew Wilcox
308992a8e60SMatthew WilcoxThe advanced API offers more flexibility and better performance at the
309992a8e60SMatthew Wilcoxcost of an interface which can be harder to use and has fewer safeguards.
310992a8e60SMatthew WilcoxNo locking is done for you by the advanced API, and you are required
311992a8e60SMatthew Wilcoxto use the xa_lock while modifying the array.  You can choose whether
312992a8e60SMatthew Wilcoxto use the xa_lock or the RCU lock while doing read-only operations on
313992a8e60SMatthew Wilcoxthe array.  You can mix advanced and normal operations on the same array;
314992a8e60SMatthew Wilcoxindeed the normal API is implemented in terms of the advanced API.  The
315992a8e60SMatthew Wilcoxadvanced API is only available to modules with a GPL-compatible license.
316992a8e60SMatthew Wilcox
317992a8e60SMatthew WilcoxThe advanced API is based around the xa_state.  This is an opaque data
318*ac23d1a9SMatthew Wilcox (Oracle)structure which you declare on the stack using the XA_STATE() macro.
319*ac23d1a9SMatthew Wilcox (Oracle)This macro initialises the xa_state ready to start walking around the
320*ac23d1a9SMatthew Wilcox (Oracle)XArray.  It is used as a cursor to maintain the position in the XArray
321*ac23d1a9SMatthew Wilcox (Oracle)and let you compose various operations together without having to restart
322*ac23d1a9SMatthew Wilcox (Oracle)from the top every time.  The contents of the xa_state are protected by
323*ac23d1a9SMatthew Wilcox (Oracle)the rcu_read_lock() or the xas_lock().  If you need to drop whichever of
324*ac23d1a9SMatthew Wilcox (Oracle)those locks is protecting your state and tree, you must call xas_pause()
325*ac23d1a9SMatthew Wilcox (Oracle)so that future calls do not rely on the parts of the state which were
326*ac23d1a9SMatthew Wilcox (Oracle)left unprotected.
327992a8e60SMatthew Wilcox
328992a8e60SMatthew WilcoxThe xa_state is also used to store errors.  You can call
3299c79df7fSJonathan Corbetxas_error() to retrieve the error.  All operations check whether
330992a8e60SMatthew Wilcoxthe xa_state is in an error state before proceeding, so there's no need
331992a8e60SMatthew Wilcoxfor you to check for an error after each call; you can make multiple
332992a8e60SMatthew Wilcoxcalls in succession and only check at a convenient point.  The only
333992a8e60SMatthew Wilcoxerrors currently generated by the XArray code itself are ``ENOMEM`` and
334992a8e60SMatthew Wilcox``EINVAL``, but it supports arbitrary errors in case you want to call
3359c79df7fSJonathan Corbetxas_set_err() yourself.
336992a8e60SMatthew Wilcox
3379c79df7fSJonathan CorbetIf the xa_state is holding an ``ENOMEM`` error, calling xas_nomem()
338992a8e60SMatthew Wilcoxwill attempt to allocate more memory using the specified gfp flags and
339992a8e60SMatthew Wilcoxcache it in the xa_state for the next attempt.  The idea is that you take
340992a8e60SMatthew Wilcoxthe xa_lock, attempt the operation and drop the lock.  The operation
341992a8e60SMatthew Wilcoxattempts to allocate memory while holding the lock, but it is more
3429c79df7fSJonathan Corbetlikely to fail.  Once you have dropped the lock, xas_nomem()
343992a8e60SMatthew Wilcoxcan try harder to allocate more memory.  It will return ``true`` if it
344992a8e60SMatthew Wilcoxis worth retrying the operation (i.e. that there was a memory error *and*
345992a8e60SMatthew Wilcoxmore memory was allocated).  If it has previously allocated memory, and
346992a8e60SMatthew Wilcoxthat memory wasn't used, and there is no error (or some error that isn't
347992a8e60SMatthew Wilcox``ENOMEM``), then it will free the memory previously allocated.
348992a8e60SMatthew Wilcox
349992a8e60SMatthew WilcoxInternal Entries
350992a8e60SMatthew Wilcox----------------
351992a8e60SMatthew Wilcox
352992a8e60SMatthew WilcoxThe XArray reserves some entries for its own purposes.  These are never
353992a8e60SMatthew Wilcoxexposed through the normal API, but when using the advanced API, it's
354992a8e60SMatthew Wilcoxpossible to see them.  Usually the best way to handle them is to pass them
3559c79df7fSJonathan Corbetto xas_retry(), and retry the operation if it returns ``true``.
356992a8e60SMatthew Wilcox
357992a8e60SMatthew Wilcox.. flat-table::
358992a8e60SMatthew Wilcox   :widths: 1 1 6
359992a8e60SMatthew Wilcox
360992a8e60SMatthew Wilcox   * - Name
361992a8e60SMatthew Wilcox     - Test
362992a8e60SMatthew Wilcox     - Usage
363992a8e60SMatthew Wilcox
364992a8e60SMatthew Wilcox   * - Node
3659c79df7fSJonathan Corbet     - xa_is_node()
366992a8e60SMatthew Wilcox     - An XArray node.  May be visible when using a multi-index xa_state.
367992a8e60SMatthew Wilcox
368992a8e60SMatthew Wilcox   * - Sibling
3699c79df7fSJonathan Corbet     - xa_is_sibling()
370992a8e60SMatthew Wilcox     - A non-canonical entry for a multi-index entry.  The value indicates
371992a8e60SMatthew Wilcox       which slot in this node has the canonical entry.
372992a8e60SMatthew Wilcox
373992a8e60SMatthew Wilcox   * - Retry
3749c79df7fSJonathan Corbet     - xa_is_retry()
375992a8e60SMatthew Wilcox     - This entry is currently being modified by a thread which has the
376992a8e60SMatthew Wilcox       xa_lock.  The node containing this entry may be freed at the end
377992a8e60SMatthew Wilcox       of this RCU period.  You should restart the lookup from the head
378992a8e60SMatthew Wilcox       of the array.
379992a8e60SMatthew Wilcox
3809f14d4f1SMatthew Wilcox   * - Zero
3819c79df7fSJonathan Corbet     - xa_is_zero()
3829f14d4f1SMatthew Wilcox     - Zero entries appear as ``NULL`` through the Normal API, but occupy
3839f14d4f1SMatthew Wilcox       an entry in the XArray which can be used to reserve the index for
384d9c48043SMatthew Wilcox       future use.  This is used by allocating XArrays for allocated entries
385d9c48043SMatthew Wilcox       which are ``NULL``.
3869f14d4f1SMatthew Wilcox
387992a8e60SMatthew WilcoxOther internal entries may be added in the future.  As far as possible, they
3889c79df7fSJonathan Corbetwill be handled by xas_retry().
389992a8e60SMatthew Wilcox
390992a8e60SMatthew WilcoxAdditional functionality
391992a8e60SMatthew Wilcox------------------------
392992a8e60SMatthew Wilcox
3939c79df7fSJonathan CorbetThe xas_create_range() function allocates all the necessary memory
394992a8e60SMatthew Wilcoxto store every entry in a range.  It will set ENOMEM in the xa_state if
395992a8e60SMatthew Wilcoxit cannot allocate memory.
396992a8e60SMatthew Wilcox
3979c79df7fSJonathan CorbetYou can use xas_init_marks() to reset the marks on an entry
398992a8e60SMatthew Wilcoxto their default state.  This is usually all marks clear, unless the
399992a8e60SMatthew WilcoxXArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set
400992a8e60SMatthew Wilcoxand all other marks are clear.  Replacing one entry with another using
4019c79df7fSJonathan Corbetxas_store() will not reset the marks on that entry; if you want
402992a8e60SMatthew Wilcoxthe marks reset, you should do that explicitly.
403992a8e60SMatthew Wilcox
4049c79df7fSJonathan CorbetThe xas_load() will walk the xa_state as close to the entry
405992a8e60SMatthew Wilcoxas it can.  If you know the xa_state has already been walked to the
406992a8e60SMatthew Wilcoxentry and need to check that the entry hasn't changed, you can use
4079c79df7fSJonathan Corbetxas_reload() to save a function call.
408992a8e60SMatthew Wilcox
409992a8e60SMatthew WilcoxIf you need to move to a different index in the XArray, call
4109c79df7fSJonathan Corbetxas_set().  This resets the cursor to the top of the tree, which
411992a8e60SMatthew Wilcoxwill generally make the next operation walk the cursor to the desired
412992a8e60SMatthew Wilcoxspot in the tree.  If you want to move to the next or previous index,
4139c79df7fSJonathan Corbetcall xas_next() or xas_prev().  Setting the index does
414992a8e60SMatthew Wilcoxnot walk the cursor around the array so does not require a lock to be
415992a8e60SMatthew Wilcoxheld, while moving to the next or previous index does.
416992a8e60SMatthew Wilcox
4179c79df7fSJonathan CorbetYou can search for the next present entry using xas_find().  This
4189c79df7fSJonathan Corbetis the equivalent of both xa_find() and xa_find_after();
419992a8e60SMatthew Wilcoxif the cursor has been walked to an entry, then it will find the next
420992a8e60SMatthew Wilcoxentry after the one currently referenced.  If not, it will return the
4219c79df7fSJonathan Corbetentry at the index of the xa_state.  Using xas_next_entry() to
4229c79df7fSJonathan Corbetmove to the next present entry instead of xas_find() will save
423992a8e60SMatthew Wilcoxa function call in the majority of cases at the expense of emitting more
424992a8e60SMatthew Wilcoxinline code.
425992a8e60SMatthew Wilcox
4269c79df7fSJonathan CorbetThe xas_find_marked() function is similar.  If the xa_state has
427992a8e60SMatthew Wilcoxnot been walked, it will return the entry at the index of the xa_state,
428992a8e60SMatthew Wilcoxif it is marked.  Otherwise, it will return the first marked entry after
4299c79df7fSJonathan Corbetthe entry referenced by the xa_state.  The xas_next_marked()
4309c79df7fSJonathan Corbetfunction is the equivalent of xas_next_entry().
431992a8e60SMatthew Wilcox
4329c79df7fSJonathan CorbetWhen iterating over a range of the XArray using xas_for_each()
4339c79df7fSJonathan Corbetor xas_for_each_marked(), it may be necessary to temporarily stop
4349c79df7fSJonathan Corbetthe iteration.  The xas_pause() function exists for this purpose.
435992a8e60SMatthew WilcoxAfter you have done the necessary work and wish to resume, the xa_state
436992a8e60SMatthew Wilcoxis in an appropriate state to continue the iteration after the entry
437992a8e60SMatthew Wilcoxyou last processed.  If you have interrupts disabled while iterating,
438992a8e60SMatthew Wilcoxthen it is good manners to pause the iteration and reenable interrupts
439992a8e60SMatthew Wilcoxevery ``XA_CHECK_SCHED`` entries.
440992a8e60SMatthew Wilcox
4416b81141dSMatthew Wilcox (Oracle)The xas_get_mark(), xas_set_mark() and xas_clear_mark() functions require
4426b81141dSMatthew Wilcox (Oracle)the xa_state cursor to have been moved to the appropriate location in the
4436b81141dSMatthew Wilcox (Oracle)XArray; they will do nothing if you have called xas_pause() or xas_set()
444992a8e60SMatthew Wilcoximmediately before.
445992a8e60SMatthew Wilcox
4469c79df7fSJonathan CorbetYou can call xas_set_update() to have a callback function
447992a8e60SMatthew Wilcoxcalled each time the XArray updates a node.  This is used by the page
448992a8e60SMatthew Wilcoxcache workingset code to maintain its list of nodes which contain only
449992a8e60SMatthew Wilcoxshadow entries.
450992a8e60SMatthew Wilcox
451992a8e60SMatthew WilcoxMulti-Index Entries
452992a8e60SMatthew Wilcox-------------------
453992a8e60SMatthew Wilcox
454992a8e60SMatthew WilcoxThe XArray has the ability to tie multiple indices together so that
455992a8e60SMatthew Wilcoxoperations on one index affect all indices.  For example, storing into
456992a8e60SMatthew Wilcoxany index will change the value of the entry retrieved from any index.
457992a8e60SMatthew WilcoxSetting or clearing a mark on any index will set or clear the mark
458992a8e60SMatthew Wilcoxon every index that is tied together.  The current implementation
459992a8e60SMatthew Wilcoxonly allows tying ranges which are aligned powers of two together;
460992a8e60SMatthew Wilcoxeg indices 64-127 may be tied together, but 2-6 may not be.  This may
461992a8e60SMatthew Wilcoxsave substantial quantities of memory; for example tying 512 entries
462992a8e60SMatthew Wilcoxtogether will save over 4kB.
463992a8e60SMatthew Wilcox
4649c79df7fSJonathan CorbetYou can create a multi-index entry by using XA_STATE_ORDER()
4659c79df7fSJonathan Corbetor xas_set_order() followed by a call to xas_store().
4669c79df7fSJonathan CorbetCalling xas_load() with a multi-index xa_state will walk the
467992a8e60SMatthew Wilcoxxa_state to the right location in the tree, but the return value is not
468992a8e60SMatthew Wilcoxmeaningful, potentially being an internal entry or ``NULL`` even when there
4699c79df7fSJonathan Corbetis an entry stored within the range.  Calling xas_find_conflict()
470992a8e60SMatthew Wilcoxwill return the first entry within the range or ``NULL`` if there are no
4719c79df7fSJonathan Corbetentries in the range.  The xas_for_each_conflict() iterator will
472992a8e60SMatthew Wilcoxiterate over every entry which overlaps the specified range.
473992a8e60SMatthew Wilcox
4749c79df7fSJonathan CorbetIf xas_load() encounters a multi-index entry, the xa_index
475992a8e60SMatthew Wilcoxin the xa_state will not be changed.  When iterating over an XArray
4769c79df7fSJonathan Corbetor calling xas_find(), if the initial index is in the middle
477992a8e60SMatthew Wilcoxof a multi-index entry, it will not be altered.  Subsequent calls
478992a8e60SMatthew Wilcoxor iterations will move the index to the first index in the range.
479992a8e60SMatthew WilcoxEach entry will only be returned once, no matter how many indices it
480992a8e60SMatthew Wilcoxoccupies.
481992a8e60SMatthew Wilcox
4828fc75643SMatthew Wilcox (Oracle)Using xas_next() or xas_prev() with a multi-index xa_state is not
4838fc75643SMatthew Wilcox (Oracle)supported.  Using either of these functions on a multi-index entry will
4848fc75643SMatthew Wilcox (Oracle)reveal sibling entries; these should be skipped over by the caller.
485992a8e60SMatthew Wilcox
4868fc75643SMatthew Wilcox (Oracle)Storing ``NULL`` into any index of a multi-index entry will set the
4878fc75643SMatthew Wilcox (Oracle)entry at every index to ``NULL`` and dissolve the tie.  A multi-index
4888fc75643SMatthew Wilcox (Oracle)entry can be split into entries occupying smaller ranges by calling
4898fc75643SMatthew Wilcox (Oracle)xas_split_alloc() without the xa_lock held, followed by taking the lock
4908fc75643SMatthew Wilcox (Oracle)and calling xas_split().
491992a8e60SMatthew Wilcox
492992a8e60SMatthew WilcoxFunctions and structures
493992a8e60SMatthew Wilcox========================
494992a8e60SMatthew Wilcox
495992a8e60SMatthew Wilcox.. kernel-doc:: include/linux/xarray.h
496992a8e60SMatthew Wilcox.. kernel-doc:: lib/xarray.c
497