xref: /openbmc/linux/Documentation/core-api/rbtree.rst (revision 14bbe3e33710be52f21d61253a94c5f44a696d02)
1*14bbe3e3SMatthew Wilcox (Oracle)=================================
2*14bbe3e3SMatthew Wilcox (Oracle)Red-black Trees (rbtree) in Linux
3*14bbe3e3SMatthew Wilcox (Oracle)=================================
4*14bbe3e3SMatthew Wilcox (Oracle)
5*14bbe3e3SMatthew Wilcox (Oracle)
6*14bbe3e3SMatthew Wilcox (Oracle):Date: January 18, 2007
7*14bbe3e3SMatthew Wilcox (Oracle):Author: Rob Landley <rob@landley.net>
8*14bbe3e3SMatthew Wilcox (Oracle)
9*14bbe3e3SMatthew Wilcox (Oracle)What are red-black trees, and what are they for?
10*14bbe3e3SMatthew Wilcox (Oracle)------------------------------------------------
11*14bbe3e3SMatthew Wilcox (Oracle)
12*14bbe3e3SMatthew Wilcox (Oracle)Red-black trees are a type of self-balancing binary search tree, used for
13*14bbe3e3SMatthew Wilcox (Oracle)storing sortable key/value data pairs.  This differs from radix trees (which
14*14bbe3e3SMatthew Wilcox (Oracle)are used to efficiently store sparse arrays and thus use long integer indexes
15*14bbe3e3SMatthew Wilcox (Oracle)to insert/access/delete nodes) and hash tables (which are not kept sorted to
16*14bbe3e3SMatthew Wilcox (Oracle)be easily traversed in order, and must be tuned for a specific size and
17*14bbe3e3SMatthew Wilcox (Oracle)hash function where rbtrees scale gracefully storing arbitrary keys).
18*14bbe3e3SMatthew Wilcox (Oracle)
19*14bbe3e3SMatthew Wilcox (Oracle)Red-black trees are similar to AVL trees, but provide faster real-time bounded
20*14bbe3e3SMatthew Wilcox (Oracle)worst case performance for insertion and deletion (at most two rotations and
21*14bbe3e3SMatthew Wilcox (Oracle)three rotations, respectively, to balance the tree), with slightly slower
22*14bbe3e3SMatthew Wilcox (Oracle)(but still O(log n)) lookup time.
23*14bbe3e3SMatthew Wilcox (Oracle)
24*14bbe3e3SMatthew Wilcox (Oracle)To quote Linux Weekly News:
25*14bbe3e3SMatthew Wilcox (Oracle)
26*14bbe3e3SMatthew Wilcox (Oracle)    There are a number of red-black trees in use in the kernel.
27*14bbe3e3SMatthew Wilcox (Oracle)    The deadline and CFQ I/O schedulers employ rbtrees to
28*14bbe3e3SMatthew Wilcox (Oracle)    track requests; the packet CD/DVD driver does the same.
29*14bbe3e3SMatthew Wilcox (Oracle)    The high-resolution timer code uses an rbtree to organize outstanding
30*14bbe3e3SMatthew Wilcox (Oracle)    timer requests.  The ext3 filesystem tracks directory entries in a
31*14bbe3e3SMatthew Wilcox (Oracle)    red-black tree.  Virtual memory areas (VMAs) are tracked with red-black
32*14bbe3e3SMatthew Wilcox (Oracle)    trees, as are epoll file descriptors, cryptographic keys, and network
33*14bbe3e3SMatthew Wilcox (Oracle)    packets in the "hierarchical token bucket" scheduler.
34*14bbe3e3SMatthew Wilcox (Oracle)
35*14bbe3e3SMatthew Wilcox (Oracle)This document covers use of the Linux rbtree implementation.  For more
36*14bbe3e3SMatthew Wilcox (Oracle)information on the nature and implementation of Red Black Trees,  see:
37*14bbe3e3SMatthew Wilcox (Oracle)
38*14bbe3e3SMatthew Wilcox (Oracle)  Linux Weekly News article on red-black trees
39*14bbe3e3SMatthew Wilcox (Oracle)    http://lwn.net/Articles/184495/
40*14bbe3e3SMatthew Wilcox (Oracle)
41*14bbe3e3SMatthew Wilcox (Oracle)  Wikipedia entry on red-black trees
42*14bbe3e3SMatthew Wilcox (Oracle)    http://en.wikipedia.org/wiki/Red-black_tree
43*14bbe3e3SMatthew Wilcox (Oracle)
44*14bbe3e3SMatthew Wilcox (Oracle)Linux implementation of red-black trees
45*14bbe3e3SMatthew Wilcox (Oracle)---------------------------------------
46*14bbe3e3SMatthew Wilcox (Oracle)
47*14bbe3e3SMatthew Wilcox (Oracle)Linux's rbtree implementation lives in the file "lib/rbtree.c".  To use it,
48*14bbe3e3SMatthew Wilcox (Oracle)"#include <linux/rbtree.h>".
49*14bbe3e3SMatthew Wilcox (Oracle)
50*14bbe3e3SMatthew Wilcox (Oracle)The Linux rbtree implementation is optimized for speed, and thus has one
51*14bbe3e3SMatthew Wilcox (Oracle)less layer of indirection (and better cache locality) than more traditional
52*14bbe3e3SMatthew Wilcox (Oracle)tree implementations.  Instead of using pointers to separate rb_node and data
53*14bbe3e3SMatthew Wilcox (Oracle)structures, each instance of struct rb_node is embedded in the data structure
54*14bbe3e3SMatthew Wilcox (Oracle)it organizes.  And instead of using a comparison callback function pointer,
55*14bbe3e3SMatthew Wilcox (Oracle)users are expected to write their own tree search and insert functions
56*14bbe3e3SMatthew Wilcox (Oracle)which call the provided rbtree functions.  Locking is also left up to the
57*14bbe3e3SMatthew Wilcox (Oracle)user of the rbtree code.
58*14bbe3e3SMatthew Wilcox (Oracle)
59*14bbe3e3SMatthew Wilcox (Oracle)Creating a new rbtree
60*14bbe3e3SMatthew Wilcox (Oracle)---------------------
61*14bbe3e3SMatthew Wilcox (Oracle)
62*14bbe3e3SMatthew Wilcox (Oracle)Data nodes in an rbtree tree are structures containing a struct rb_node member::
63*14bbe3e3SMatthew Wilcox (Oracle)
64*14bbe3e3SMatthew Wilcox (Oracle)  struct mytype {
65*14bbe3e3SMatthew Wilcox (Oracle)  	struct rb_node node;
66*14bbe3e3SMatthew Wilcox (Oracle)  	char *keystring;
67*14bbe3e3SMatthew Wilcox (Oracle)  };
68*14bbe3e3SMatthew Wilcox (Oracle)
69*14bbe3e3SMatthew Wilcox (Oracle)When dealing with a pointer to the embedded struct rb_node, the containing data
70*14bbe3e3SMatthew Wilcox (Oracle)structure may be accessed with the standard container_of() macro.  In addition,
71*14bbe3e3SMatthew Wilcox (Oracle)individual members may be accessed directly via rb_entry(node, type, member).
72*14bbe3e3SMatthew Wilcox (Oracle)
73*14bbe3e3SMatthew Wilcox (Oracle)At the root of each rbtree is an rb_root structure, which is initialized to be
74*14bbe3e3SMatthew Wilcox (Oracle)empty via:
75*14bbe3e3SMatthew Wilcox (Oracle)
76*14bbe3e3SMatthew Wilcox (Oracle)  struct rb_root mytree = RB_ROOT;
77*14bbe3e3SMatthew Wilcox (Oracle)
78*14bbe3e3SMatthew Wilcox (Oracle)Searching for a value in an rbtree
79*14bbe3e3SMatthew Wilcox (Oracle)----------------------------------
80*14bbe3e3SMatthew Wilcox (Oracle)
81*14bbe3e3SMatthew Wilcox (Oracle)Writing a search function for your tree is fairly straightforward: start at the
82*14bbe3e3SMatthew Wilcox (Oracle)root, compare each value, and follow the left or right branch as necessary.
83*14bbe3e3SMatthew Wilcox (Oracle)
84*14bbe3e3SMatthew Wilcox (Oracle)Example::
85*14bbe3e3SMatthew Wilcox (Oracle)
86*14bbe3e3SMatthew Wilcox (Oracle)  struct mytype *my_search(struct rb_root *root, char *string)
87*14bbe3e3SMatthew Wilcox (Oracle)  {
88*14bbe3e3SMatthew Wilcox (Oracle)  	struct rb_node *node = root->rb_node;
89*14bbe3e3SMatthew Wilcox (Oracle)
90*14bbe3e3SMatthew Wilcox (Oracle)  	while (node) {
91*14bbe3e3SMatthew Wilcox (Oracle)  		struct mytype *data = container_of(node, struct mytype, node);
92*14bbe3e3SMatthew Wilcox (Oracle)		int result;
93*14bbe3e3SMatthew Wilcox (Oracle)
94*14bbe3e3SMatthew Wilcox (Oracle)		result = strcmp(string, data->keystring);
95*14bbe3e3SMatthew Wilcox (Oracle)
96*14bbe3e3SMatthew Wilcox (Oracle)		if (result < 0)
97*14bbe3e3SMatthew Wilcox (Oracle)  			node = node->rb_left;
98*14bbe3e3SMatthew Wilcox (Oracle)		else if (result > 0)
99*14bbe3e3SMatthew Wilcox (Oracle)  			node = node->rb_right;
100*14bbe3e3SMatthew Wilcox (Oracle)		else
101*14bbe3e3SMatthew Wilcox (Oracle)  			return data;
102*14bbe3e3SMatthew Wilcox (Oracle)	}
103*14bbe3e3SMatthew Wilcox (Oracle)	return NULL;
104*14bbe3e3SMatthew Wilcox (Oracle)  }
105*14bbe3e3SMatthew Wilcox (Oracle)
106*14bbe3e3SMatthew Wilcox (Oracle)Inserting data into an rbtree
107*14bbe3e3SMatthew Wilcox (Oracle)-----------------------------
108*14bbe3e3SMatthew Wilcox (Oracle)
109*14bbe3e3SMatthew Wilcox (Oracle)Inserting data in the tree involves first searching for the place to insert the
110*14bbe3e3SMatthew Wilcox (Oracle)new node, then inserting the node and rebalancing ("recoloring") the tree.
111*14bbe3e3SMatthew Wilcox (Oracle)
112*14bbe3e3SMatthew Wilcox (Oracle)The search for insertion differs from the previous search by finding the
113*14bbe3e3SMatthew Wilcox (Oracle)location of the pointer on which to graft the new node.  The new node also
114*14bbe3e3SMatthew Wilcox (Oracle)needs a link to its parent node for rebalancing purposes.
115*14bbe3e3SMatthew Wilcox (Oracle)
116*14bbe3e3SMatthew Wilcox (Oracle)Example::
117*14bbe3e3SMatthew Wilcox (Oracle)
118*14bbe3e3SMatthew Wilcox (Oracle)  int my_insert(struct rb_root *root, struct mytype *data)
119*14bbe3e3SMatthew Wilcox (Oracle)  {
120*14bbe3e3SMatthew Wilcox (Oracle)  	struct rb_node **new = &(root->rb_node), *parent = NULL;
121*14bbe3e3SMatthew Wilcox (Oracle)
122*14bbe3e3SMatthew Wilcox (Oracle)  	/* Figure out where to put new node */
123*14bbe3e3SMatthew Wilcox (Oracle)  	while (*new) {
124*14bbe3e3SMatthew Wilcox (Oracle)  		struct mytype *this = container_of(*new, struct mytype, node);
125*14bbe3e3SMatthew Wilcox (Oracle)  		int result = strcmp(data->keystring, this->keystring);
126*14bbe3e3SMatthew Wilcox (Oracle)
127*14bbe3e3SMatthew Wilcox (Oracle)		parent = *new;
128*14bbe3e3SMatthew Wilcox (Oracle)  		if (result < 0)
129*14bbe3e3SMatthew Wilcox (Oracle)  			new = &((*new)->rb_left);
130*14bbe3e3SMatthew Wilcox (Oracle)  		else if (result > 0)
131*14bbe3e3SMatthew Wilcox (Oracle)  			new = &((*new)->rb_right);
132*14bbe3e3SMatthew Wilcox (Oracle)  		else
133*14bbe3e3SMatthew Wilcox (Oracle)  			return FALSE;
134*14bbe3e3SMatthew Wilcox (Oracle)  	}
135*14bbe3e3SMatthew Wilcox (Oracle)
136*14bbe3e3SMatthew Wilcox (Oracle)  	/* Add new node and rebalance tree. */
137*14bbe3e3SMatthew Wilcox (Oracle)  	rb_link_node(&data->node, parent, new);
138*14bbe3e3SMatthew Wilcox (Oracle)  	rb_insert_color(&data->node, root);
139*14bbe3e3SMatthew Wilcox (Oracle)
140*14bbe3e3SMatthew Wilcox (Oracle)	return TRUE;
141*14bbe3e3SMatthew Wilcox (Oracle)  }
142*14bbe3e3SMatthew Wilcox (Oracle)
143*14bbe3e3SMatthew Wilcox (Oracle)Removing or replacing existing data in an rbtree
144*14bbe3e3SMatthew Wilcox (Oracle)------------------------------------------------
145*14bbe3e3SMatthew Wilcox (Oracle)
146*14bbe3e3SMatthew Wilcox (Oracle)To remove an existing node from a tree, call::
147*14bbe3e3SMatthew Wilcox (Oracle)
148*14bbe3e3SMatthew Wilcox (Oracle)  void rb_erase(struct rb_node *victim, struct rb_root *tree);
149*14bbe3e3SMatthew Wilcox (Oracle)
150*14bbe3e3SMatthew Wilcox (Oracle)Example::
151*14bbe3e3SMatthew Wilcox (Oracle)
152*14bbe3e3SMatthew Wilcox (Oracle)  struct mytype *data = mysearch(&mytree, "walrus");
153*14bbe3e3SMatthew Wilcox (Oracle)
154*14bbe3e3SMatthew Wilcox (Oracle)  if (data) {
155*14bbe3e3SMatthew Wilcox (Oracle)  	rb_erase(&data->node, &mytree);
156*14bbe3e3SMatthew Wilcox (Oracle)  	myfree(data);
157*14bbe3e3SMatthew Wilcox (Oracle)  }
158*14bbe3e3SMatthew Wilcox (Oracle)
159*14bbe3e3SMatthew Wilcox (Oracle)To replace an existing node in a tree with a new one with the same key, call::
160*14bbe3e3SMatthew Wilcox (Oracle)
161*14bbe3e3SMatthew Wilcox (Oracle)  void rb_replace_node(struct rb_node *old, struct rb_node *new,
162*14bbe3e3SMatthew Wilcox (Oracle)  			struct rb_root *tree);
163*14bbe3e3SMatthew Wilcox (Oracle)
164*14bbe3e3SMatthew Wilcox (Oracle)Replacing a node this way does not re-sort the tree: If the new node doesn't
165*14bbe3e3SMatthew Wilcox (Oracle)have the same key as the old node, the rbtree will probably become corrupted.
166*14bbe3e3SMatthew Wilcox (Oracle)
167*14bbe3e3SMatthew Wilcox (Oracle)Iterating through the elements stored in an rbtree (in sort order)
168*14bbe3e3SMatthew Wilcox (Oracle)------------------------------------------------------------------
169*14bbe3e3SMatthew Wilcox (Oracle)
170*14bbe3e3SMatthew Wilcox (Oracle)Four functions are provided for iterating through an rbtree's contents in
171*14bbe3e3SMatthew Wilcox (Oracle)sorted order.  These work on arbitrary trees, and should not need to be
172*14bbe3e3SMatthew Wilcox (Oracle)modified or wrapped (except for locking purposes)::
173*14bbe3e3SMatthew Wilcox (Oracle)
174*14bbe3e3SMatthew Wilcox (Oracle)  struct rb_node *rb_first(struct rb_root *tree);
175*14bbe3e3SMatthew Wilcox (Oracle)  struct rb_node *rb_last(struct rb_root *tree);
176*14bbe3e3SMatthew Wilcox (Oracle)  struct rb_node *rb_next(struct rb_node *node);
177*14bbe3e3SMatthew Wilcox (Oracle)  struct rb_node *rb_prev(struct rb_node *node);
178*14bbe3e3SMatthew Wilcox (Oracle)
179*14bbe3e3SMatthew Wilcox (Oracle)To start iterating, call rb_first() or rb_last() with a pointer to the root
180*14bbe3e3SMatthew Wilcox (Oracle)of the tree, which will return a pointer to the node structure contained in
181*14bbe3e3SMatthew Wilcox (Oracle)the first or last element in the tree.  To continue, fetch the next or previous
182*14bbe3e3SMatthew Wilcox (Oracle)node by calling rb_next() or rb_prev() on the current node.  This will return
183*14bbe3e3SMatthew Wilcox (Oracle)NULL when there are no more nodes left.
184*14bbe3e3SMatthew Wilcox (Oracle)
185*14bbe3e3SMatthew Wilcox (Oracle)The iterator functions return a pointer to the embedded struct rb_node, from
186*14bbe3e3SMatthew Wilcox (Oracle)which the containing data structure may be accessed with the container_of()
187*14bbe3e3SMatthew Wilcox (Oracle)macro, and individual members may be accessed directly via
188*14bbe3e3SMatthew Wilcox (Oracle)rb_entry(node, type, member).
189*14bbe3e3SMatthew Wilcox (Oracle)
190*14bbe3e3SMatthew Wilcox (Oracle)Example::
191*14bbe3e3SMatthew Wilcox (Oracle)
192*14bbe3e3SMatthew Wilcox (Oracle)  struct rb_node *node;
193*14bbe3e3SMatthew Wilcox (Oracle)  for (node = rb_first(&mytree); node; node = rb_next(node))
194*14bbe3e3SMatthew Wilcox (Oracle)	printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring);
195*14bbe3e3SMatthew Wilcox (Oracle)
196*14bbe3e3SMatthew Wilcox (Oracle)Cached rbtrees
197*14bbe3e3SMatthew Wilcox (Oracle)--------------
198*14bbe3e3SMatthew Wilcox (Oracle)
199*14bbe3e3SMatthew Wilcox (Oracle)Computing the leftmost (smallest) node is quite a common task for binary
200*14bbe3e3SMatthew Wilcox (Oracle)search trees, such as for traversals or users relying on a the particular
201*14bbe3e3SMatthew Wilcox (Oracle)order for their own logic. To this end, users can use 'struct rb_root_cached'
202*14bbe3e3SMatthew Wilcox (Oracle)to optimize O(logN) rb_first() calls to a simple pointer fetch avoiding
203*14bbe3e3SMatthew Wilcox (Oracle)potentially expensive tree iterations. This is done at negligible runtime
204*14bbe3e3SMatthew Wilcox (Oracle)overhead for maintanence; albeit larger memory footprint.
205*14bbe3e3SMatthew Wilcox (Oracle)
206*14bbe3e3SMatthew Wilcox (Oracle)Similar to the rb_root structure, cached rbtrees are initialized to be
207*14bbe3e3SMatthew Wilcox (Oracle)empty via::
208*14bbe3e3SMatthew Wilcox (Oracle)
209*14bbe3e3SMatthew Wilcox (Oracle)  struct rb_root_cached mytree = RB_ROOT_CACHED;
210*14bbe3e3SMatthew Wilcox (Oracle)
211*14bbe3e3SMatthew Wilcox (Oracle)Cached rbtree is simply a regular rb_root with an extra pointer to cache the
212*14bbe3e3SMatthew Wilcox (Oracle)leftmost node. This allows rb_root_cached to exist wherever rb_root does,
213*14bbe3e3SMatthew Wilcox (Oracle)which permits augmented trees to be supported as well as only a few extra
214*14bbe3e3SMatthew Wilcox (Oracle)interfaces::
215*14bbe3e3SMatthew Wilcox (Oracle)
216*14bbe3e3SMatthew Wilcox (Oracle)  struct rb_node *rb_first_cached(struct rb_root_cached *tree);
217*14bbe3e3SMatthew Wilcox (Oracle)  void rb_insert_color_cached(struct rb_node *, struct rb_root_cached *, bool);
218*14bbe3e3SMatthew Wilcox (Oracle)  void rb_erase_cached(struct rb_node *node, struct rb_root_cached *);
219*14bbe3e3SMatthew Wilcox (Oracle)
220*14bbe3e3SMatthew Wilcox (Oracle)Both insert and erase calls have their respective counterpart of augmented
221*14bbe3e3SMatthew Wilcox (Oracle)trees::
222*14bbe3e3SMatthew Wilcox (Oracle)
223*14bbe3e3SMatthew Wilcox (Oracle)  void rb_insert_augmented_cached(struct rb_node *node, struct rb_root_cached *,
224*14bbe3e3SMatthew Wilcox (Oracle)				  bool, struct rb_augment_callbacks *);
225*14bbe3e3SMatthew Wilcox (Oracle)  void rb_erase_augmented_cached(struct rb_node *, struct rb_root_cached *,
226*14bbe3e3SMatthew Wilcox (Oracle)				 struct rb_augment_callbacks *);
227*14bbe3e3SMatthew Wilcox (Oracle)
228*14bbe3e3SMatthew Wilcox (Oracle)
229*14bbe3e3SMatthew Wilcox (Oracle)Support for Augmented rbtrees
230*14bbe3e3SMatthew Wilcox (Oracle)-----------------------------
231*14bbe3e3SMatthew Wilcox (Oracle)
232*14bbe3e3SMatthew Wilcox (Oracle)Augmented rbtree is an rbtree with "some" additional data stored in
233*14bbe3e3SMatthew Wilcox (Oracle)each node, where the additional data for node N must be a function of
234*14bbe3e3SMatthew Wilcox (Oracle)the contents of all nodes in the subtree rooted at N. This data can
235*14bbe3e3SMatthew Wilcox (Oracle)be used to augment some new functionality to rbtree. Augmented rbtree
236*14bbe3e3SMatthew Wilcox (Oracle)is an optional feature built on top of basic rbtree infrastructure.
237*14bbe3e3SMatthew Wilcox (Oracle)An rbtree user who wants this feature will have to call the augmentation
238*14bbe3e3SMatthew Wilcox (Oracle)functions with the user provided augmentation callback when inserting
239*14bbe3e3SMatthew Wilcox (Oracle)and erasing nodes.
240*14bbe3e3SMatthew Wilcox (Oracle)
241*14bbe3e3SMatthew Wilcox (Oracle)C files implementing augmented rbtree manipulation must include
242*14bbe3e3SMatthew Wilcox (Oracle)<linux/rbtree_augmented.h> instead of <linux/rbtree.h>. Note that
243*14bbe3e3SMatthew Wilcox (Oracle)linux/rbtree_augmented.h exposes some rbtree implementations details
244*14bbe3e3SMatthew Wilcox (Oracle)you are not expected to rely on; please stick to the documented APIs
245*14bbe3e3SMatthew Wilcox (Oracle)there and do not include <linux/rbtree_augmented.h> from header files
246*14bbe3e3SMatthew Wilcox (Oracle)either so as to minimize chances of your users accidentally relying on
247*14bbe3e3SMatthew Wilcox (Oracle)such implementation details.
248*14bbe3e3SMatthew Wilcox (Oracle)
249*14bbe3e3SMatthew Wilcox (Oracle)On insertion, the user must update the augmented information on the path
250*14bbe3e3SMatthew Wilcox (Oracle)leading to the inserted node, then call rb_link_node() as usual and
251*14bbe3e3SMatthew Wilcox (Oracle)rb_augment_inserted() instead of the usual rb_insert_color() call.
252*14bbe3e3SMatthew Wilcox (Oracle)If rb_augment_inserted() rebalances the rbtree, it will callback into
253*14bbe3e3SMatthew Wilcox (Oracle)a user provided function to update the augmented information on the
254*14bbe3e3SMatthew Wilcox (Oracle)affected subtrees.
255*14bbe3e3SMatthew Wilcox (Oracle)
256*14bbe3e3SMatthew Wilcox (Oracle)When erasing a node, the user must call rb_erase_augmented() instead of
257*14bbe3e3SMatthew Wilcox (Oracle)rb_erase(). rb_erase_augmented() calls back into user provided functions
258*14bbe3e3SMatthew Wilcox (Oracle)to updated the augmented information on affected subtrees.
259*14bbe3e3SMatthew Wilcox (Oracle)
260*14bbe3e3SMatthew Wilcox (Oracle)In both cases, the callbacks are provided through struct rb_augment_callbacks.
261*14bbe3e3SMatthew Wilcox (Oracle)3 callbacks must be defined:
262*14bbe3e3SMatthew Wilcox (Oracle)
263*14bbe3e3SMatthew Wilcox (Oracle)- A propagation callback, which updates the augmented value for a given
264*14bbe3e3SMatthew Wilcox (Oracle)  node and its ancestors, up to a given stop point (or NULL to update
265*14bbe3e3SMatthew Wilcox (Oracle)  all the way to the root).
266*14bbe3e3SMatthew Wilcox (Oracle)
267*14bbe3e3SMatthew Wilcox (Oracle)- A copy callback, which copies the augmented value for a given subtree
268*14bbe3e3SMatthew Wilcox (Oracle)  to a newly assigned subtree root.
269*14bbe3e3SMatthew Wilcox (Oracle)
270*14bbe3e3SMatthew Wilcox (Oracle)- A tree rotation callback, which copies the augmented value for a given
271*14bbe3e3SMatthew Wilcox (Oracle)  subtree to a newly assigned subtree root AND recomputes the augmented
272*14bbe3e3SMatthew Wilcox (Oracle)  information for the former subtree root.
273*14bbe3e3SMatthew Wilcox (Oracle)
274*14bbe3e3SMatthew Wilcox (Oracle)The compiled code for rb_erase_augmented() may inline the propagation and
275*14bbe3e3SMatthew Wilcox (Oracle)copy callbacks, which results in a large function, so each augmented rbtree
276*14bbe3e3SMatthew Wilcox (Oracle)user should have a single rb_erase_augmented() call site in order to limit
277*14bbe3e3SMatthew Wilcox (Oracle)compiled code size.
278*14bbe3e3SMatthew Wilcox (Oracle)
279*14bbe3e3SMatthew Wilcox (Oracle)
280*14bbe3e3SMatthew Wilcox (Oracle)Sample usage
281*14bbe3e3SMatthew Wilcox (Oracle)^^^^^^^^^^^^
282*14bbe3e3SMatthew Wilcox (Oracle)
283*14bbe3e3SMatthew Wilcox (Oracle)Interval tree is an example of augmented rb tree. Reference -
284*14bbe3e3SMatthew Wilcox (Oracle)"Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein.
285*14bbe3e3SMatthew Wilcox (Oracle)More details about interval trees:
286*14bbe3e3SMatthew Wilcox (Oracle)
287*14bbe3e3SMatthew Wilcox (Oracle)Classical rbtree has a single key and it cannot be directly used to store
288*14bbe3e3SMatthew Wilcox (Oracle)interval ranges like [lo:hi] and do a quick lookup for any overlap with a new
289*14bbe3e3SMatthew Wilcox (Oracle)lo:hi or to find whether there is an exact match for a new lo:hi.
290*14bbe3e3SMatthew Wilcox (Oracle)
291*14bbe3e3SMatthew Wilcox (Oracle)However, rbtree can be augmented to store such interval ranges in a structured
292*14bbe3e3SMatthew Wilcox (Oracle)way making it possible to do efficient lookup and exact match.
293*14bbe3e3SMatthew Wilcox (Oracle)
294*14bbe3e3SMatthew Wilcox (Oracle)This "extra information" stored in each node is the maximum hi
295*14bbe3e3SMatthew Wilcox (Oracle)(max_hi) value among all the nodes that are its descendants. This
296*14bbe3e3SMatthew Wilcox (Oracle)information can be maintained at each node just be looking at the node
297*14bbe3e3SMatthew Wilcox (Oracle)and its immediate children. And this will be used in O(log n) lookup
298*14bbe3e3SMatthew Wilcox (Oracle)for lowest match (lowest start address among all possible matches)
299*14bbe3e3SMatthew Wilcox (Oracle)with something like::
300*14bbe3e3SMatthew Wilcox (Oracle)
301*14bbe3e3SMatthew Wilcox (Oracle)  struct interval_tree_node *
302*14bbe3e3SMatthew Wilcox (Oracle)  interval_tree_first_match(struct rb_root *root,
303*14bbe3e3SMatthew Wilcox (Oracle)			    unsigned long start, unsigned long last)
304*14bbe3e3SMatthew Wilcox (Oracle)  {
305*14bbe3e3SMatthew Wilcox (Oracle)	struct interval_tree_node *node;
306*14bbe3e3SMatthew Wilcox (Oracle)
307*14bbe3e3SMatthew Wilcox (Oracle)	if (!root->rb_node)
308*14bbe3e3SMatthew Wilcox (Oracle)		return NULL;
309*14bbe3e3SMatthew Wilcox (Oracle)	node = rb_entry(root->rb_node, struct interval_tree_node, rb);
310*14bbe3e3SMatthew Wilcox (Oracle)
311*14bbe3e3SMatthew Wilcox (Oracle)	while (true) {
312*14bbe3e3SMatthew Wilcox (Oracle)		if (node->rb.rb_left) {
313*14bbe3e3SMatthew Wilcox (Oracle)			struct interval_tree_node *left =
314*14bbe3e3SMatthew Wilcox (Oracle)				rb_entry(node->rb.rb_left,
315*14bbe3e3SMatthew Wilcox (Oracle)					 struct interval_tree_node, rb);
316*14bbe3e3SMatthew Wilcox (Oracle)			if (left->__subtree_last >= start) {
317*14bbe3e3SMatthew Wilcox (Oracle)				/*
318*14bbe3e3SMatthew Wilcox (Oracle)				 * Some nodes in left subtree satisfy Cond2.
319*14bbe3e3SMatthew Wilcox (Oracle)				 * Iterate to find the leftmost such node N.
320*14bbe3e3SMatthew Wilcox (Oracle)				 * If it also satisfies Cond1, that's the match
321*14bbe3e3SMatthew Wilcox (Oracle)				 * we are looking for. Otherwise, there is no
322*14bbe3e3SMatthew Wilcox (Oracle)				 * matching interval as nodes to the right of N
323*14bbe3e3SMatthew Wilcox (Oracle)				 * can't satisfy Cond1 either.
324*14bbe3e3SMatthew Wilcox (Oracle)				 */
325*14bbe3e3SMatthew Wilcox (Oracle)				node = left;
326*14bbe3e3SMatthew Wilcox (Oracle)				continue;
327*14bbe3e3SMatthew Wilcox (Oracle)			}
328*14bbe3e3SMatthew Wilcox (Oracle)		}
329*14bbe3e3SMatthew Wilcox (Oracle)		if (node->start <= last) {		/* Cond1 */
330*14bbe3e3SMatthew Wilcox (Oracle)			if (node->last >= start)	/* Cond2 */
331*14bbe3e3SMatthew Wilcox (Oracle)				return node;	/* node is leftmost match */
332*14bbe3e3SMatthew Wilcox (Oracle)			if (node->rb.rb_right) {
333*14bbe3e3SMatthew Wilcox (Oracle)				node = rb_entry(node->rb.rb_right,
334*14bbe3e3SMatthew Wilcox (Oracle)					struct interval_tree_node, rb);
335*14bbe3e3SMatthew Wilcox (Oracle)				if (node->__subtree_last >= start)
336*14bbe3e3SMatthew Wilcox (Oracle)					continue;
337*14bbe3e3SMatthew Wilcox (Oracle)			}
338*14bbe3e3SMatthew Wilcox (Oracle)		}
339*14bbe3e3SMatthew Wilcox (Oracle)		return NULL;	/* No match */
340*14bbe3e3SMatthew Wilcox (Oracle)	}
341*14bbe3e3SMatthew Wilcox (Oracle)  }
342*14bbe3e3SMatthew Wilcox (Oracle)
343*14bbe3e3SMatthew Wilcox (Oracle)Insertion/removal are defined using the following augmented callbacks::
344*14bbe3e3SMatthew Wilcox (Oracle)
345*14bbe3e3SMatthew Wilcox (Oracle)  static inline unsigned long
346*14bbe3e3SMatthew Wilcox (Oracle)  compute_subtree_last(struct interval_tree_node *node)
347*14bbe3e3SMatthew Wilcox (Oracle)  {
348*14bbe3e3SMatthew Wilcox (Oracle)	unsigned long max = node->last, subtree_last;
349*14bbe3e3SMatthew Wilcox (Oracle)	if (node->rb.rb_left) {
350*14bbe3e3SMatthew Wilcox (Oracle)		subtree_last = rb_entry(node->rb.rb_left,
351*14bbe3e3SMatthew Wilcox (Oracle)			struct interval_tree_node, rb)->__subtree_last;
352*14bbe3e3SMatthew Wilcox (Oracle)		if (max < subtree_last)
353*14bbe3e3SMatthew Wilcox (Oracle)			max = subtree_last;
354*14bbe3e3SMatthew Wilcox (Oracle)	}
355*14bbe3e3SMatthew Wilcox (Oracle)	if (node->rb.rb_right) {
356*14bbe3e3SMatthew Wilcox (Oracle)		subtree_last = rb_entry(node->rb.rb_right,
357*14bbe3e3SMatthew Wilcox (Oracle)			struct interval_tree_node, rb)->__subtree_last;
358*14bbe3e3SMatthew Wilcox (Oracle)		if (max < subtree_last)
359*14bbe3e3SMatthew Wilcox (Oracle)			max = subtree_last;
360*14bbe3e3SMatthew Wilcox (Oracle)	}
361*14bbe3e3SMatthew Wilcox (Oracle)	return max;
362*14bbe3e3SMatthew Wilcox (Oracle)  }
363*14bbe3e3SMatthew Wilcox (Oracle)
364*14bbe3e3SMatthew Wilcox (Oracle)  static void augment_propagate(struct rb_node *rb, struct rb_node *stop)
365*14bbe3e3SMatthew Wilcox (Oracle)  {
366*14bbe3e3SMatthew Wilcox (Oracle)	while (rb != stop) {
367*14bbe3e3SMatthew Wilcox (Oracle)		struct interval_tree_node *node =
368*14bbe3e3SMatthew Wilcox (Oracle)			rb_entry(rb, struct interval_tree_node, rb);
369*14bbe3e3SMatthew Wilcox (Oracle)		unsigned long subtree_last = compute_subtree_last(node);
370*14bbe3e3SMatthew Wilcox (Oracle)		if (node->__subtree_last == subtree_last)
371*14bbe3e3SMatthew Wilcox (Oracle)			break;
372*14bbe3e3SMatthew Wilcox (Oracle)		node->__subtree_last = subtree_last;
373*14bbe3e3SMatthew Wilcox (Oracle)		rb = rb_parent(&node->rb);
374*14bbe3e3SMatthew Wilcox (Oracle)	}
375*14bbe3e3SMatthew Wilcox (Oracle)  }
376*14bbe3e3SMatthew Wilcox (Oracle)
377*14bbe3e3SMatthew Wilcox (Oracle)  static void augment_copy(struct rb_node *rb_old, struct rb_node *rb_new)
378*14bbe3e3SMatthew Wilcox (Oracle)  {
379*14bbe3e3SMatthew Wilcox (Oracle)	struct interval_tree_node *old =
380*14bbe3e3SMatthew Wilcox (Oracle)		rb_entry(rb_old, struct interval_tree_node, rb);
381*14bbe3e3SMatthew Wilcox (Oracle)	struct interval_tree_node *new =
382*14bbe3e3SMatthew Wilcox (Oracle)		rb_entry(rb_new, struct interval_tree_node, rb);
383*14bbe3e3SMatthew Wilcox (Oracle)
384*14bbe3e3SMatthew Wilcox (Oracle)	new->__subtree_last = old->__subtree_last;
385*14bbe3e3SMatthew Wilcox (Oracle)  }
386*14bbe3e3SMatthew Wilcox (Oracle)
387*14bbe3e3SMatthew Wilcox (Oracle)  static void augment_rotate(struct rb_node *rb_old, struct rb_node *rb_new)
388*14bbe3e3SMatthew Wilcox (Oracle)  {
389*14bbe3e3SMatthew Wilcox (Oracle)	struct interval_tree_node *old =
390*14bbe3e3SMatthew Wilcox (Oracle)		rb_entry(rb_old, struct interval_tree_node, rb);
391*14bbe3e3SMatthew Wilcox (Oracle)	struct interval_tree_node *new =
392*14bbe3e3SMatthew Wilcox (Oracle)		rb_entry(rb_new, struct interval_tree_node, rb);
393*14bbe3e3SMatthew Wilcox (Oracle)
394*14bbe3e3SMatthew Wilcox (Oracle)	new->__subtree_last = old->__subtree_last;
395*14bbe3e3SMatthew Wilcox (Oracle)	old->__subtree_last = compute_subtree_last(old);
396*14bbe3e3SMatthew Wilcox (Oracle)  }
397*14bbe3e3SMatthew Wilcox (Oracle)
398*14bbe3e3SMatthew Wilcox (Oracle)  static const struct rb_augment_callbacks augment_callbacks = {
399*14bbe3e3SMatthew Wilcox (Oracle)	augment_propagate, augment_copy, augment_rotate
400*14bbe3e3SMatthew Wilcox (Oracle)  };
401*14bbe3e3SMatthew Wilcox (Oracle)
402*14bbe3e3SMatthew Wilcox (Oracle)  void interval_tree_insert(struct interval_tree_node *node,
403*14bbe3e3SMatthew Wilcox (Oracle)			    struct rb_root *root)
404*14bbe3e3SMatthew Wilcox (Oracle)  {
405*14bbe3e3SMatthew Wilcox (Oracle)	struct rb_node **link = &root->rb_node, *rb_parent = NULL;
406*14bbe3e3SMatthew Wilcox (Oracle)	unsigned long start = node->start, last = node->last;
407*14bbe3e3SMatthew Wilcox (Oracle)	struct interval_tree_node *parent;
408*14bbe3e3SMatthew Wilcox (Oracle)
409*14bbe3e3SMatthew Wilcox (Oracle)	while (*link) {
410*14bbe3e3SMatthew Wilcox (Oracle)		rb_parent = *link;
411*14bbe3e3SMatthew Wilcox (Oracle)		parent = rb_entry(rb_parent, struct interval_tree_node, rb);
412*14bbe3e3SMatthew Wilcox (Oracle)		if (parent->__subtree_last < last)
413*14bbe3e3SMatthew Wilcox (Oracle)			parent->__subtree_last = last;
414*14bbe3e3SMatthew Wilcox (Oracle)		if (start < parent->start)
415*14bbe3e3SMatthew Wilcox (Oracle)			link = &parent->rb.rb_left;
416*14bbe3e3SMatthew Wilcox (Oracle)		else
417*14bbe3e3SMatthew Wilcox (Oracle)			link = &parent->rb.rb_right;
418*14bbe3e3SMatthew Wilcox (Oracle)	}
419*14bbe3e3SMatthew Wilcox (Oracle)
420*14bbe3e3SMatthew Wilcox (Oracle)	node->__subtree_last = last;
421*14bbe3e3SMatthew Wilcox (Oracle)	rb_link_node(&node->rb, rb_parent, link);
422*14bbe3e3SMatthew Wilcox (Oracle)	rb_insert_augmented(&node->rb, root, &augment_callbacks);
423*14bbe3e3SMatthew Wilcox (Oracle)  }
424*14bbe3e3SMatthew Wilcox (Oracle)
425*14bbe3e3SMatthew Wilcox (Oracle)  void interval_tree_remove(struct interval_tree_node *node,
426*14bbe3e3SMatthew Wilcox (Oracle)			    struct rb_root *root)
427*14bbe3e3SMatthew Wilcox (Oracle)  {
428*14bbe3e3SMatthew Wilcox (Oracle)	rb_erase_augmented(&node->rb, root, &augment_callbacks);
429*14bbe3e3SMatthew Wilcox (Oracle)  }
430