1*14bbe3e3SMatthew Wilcox (Oracle)================================= 2*14bbe3e3SMatthew Wilcox (Oracle)Red-black Trees (rbtree) in Linux 3*14bbe3e3SMatthew Wilcox (Oracle)================================= 4*14bbe3e3SMatthew Wilcox (Oracle) 5*14bbe3e3SMatthew Wilcox (Oracle) 6*14bbe3e3SMatthew Wilcox (Oracle):Date: January 18, 2007 7*14bbe3e3SMatthew Wilcox (Oracle):Author: Rob Landley <rob@landley.net> 8*14bbe3e3SMatthew Wilcox (Oracle) 9*14bbe3e3SMatthew Wilcox (Oracle)What are red-black trees, and what are they for? 10*14bbe3e3SMatthew Wilcox (Oracle)------------------------------------------------ 11*14bbe3e3SMatthew Wilcox (Oracle) 12*14bbe3e3SMatthew Wilcox (Oracle)Red-black trees are a type of self-balancing binary search tree, used for 13*14bbe3e3SMatthew Wilcox (Oracle)storing sortable key/value data pairs. This differs from radix trees (which 14*14bbe3e3SMatthew Wilcox (Oracle)are used to efficiently store sparse arrays and thus use long integer indexes 15*14bbe3e3SMatthew Wilcox (Oracle)to insert/access/delete nodes) and hash tables (which are not kept sorted to 16*14bbe3e3SMatthew Wilcox (Oracle)be easily traversed in order, and must be tuned for a specific size and 17*14bbe3e3SMatthew Wilcox (Oracle)hash function where rbtrees scale gracefully storing arbitrary keys). 18*14bbe3e3SMatthew Wilcox (Oracle) 19*14bbe3e3SMatthew Wilcox (Oracle)Red-black trees are similar to AVL trees, but provide faster real-time bounded 20*14bbe3e3SMatthew Wilcox (Oracle)worst case performance for insertion and deletion (at most two rotations and 21*14bbe3e3SMatthew Wilcox (Oracle)three rotations, respectively, to balance the tree), with slightly slower 22*14bbe3e3SMatthew Wilcox (Oracle)(but still O(log n)) lookup time. 23*14bbe3e3SMatthew Wilcox (Oracle) 24*14bbe3e3SMatthew Wilcox (Oracle)To quote Linux Weekly News: 25*14bbe3e3SMatthew Wilcox (Oracle) 26*14bbe3e3SMatthew Wilcox (Oracle) There are a number of red-black trees in use in the kernel. 27*14bbe3e3SMatthew Wilcox (Oracle) The deadline and CFQ I/O schedulers employ rbtrees to 28*14bbe3e3SMatthew Wilcox (Oracle) track requests; the packet CD/DVD driver does the same. 29*14bbe3e3SMatthew Wilcox (Oracle) The high-resolution timer code uses an rbtree to organize outstanding 30*14bbe3e3SMatthew Wilcox (Oracle) timer requests. The ext3 filesystem tracks directory entries in a 31*14bbe3e3SMatthew Wilcox (Oracle) red-black tree. Virtual memory areas (VMAs) are tracked with red-black 32*14bbe3e3SMatthew Wilcox (Oracle) trees, as are epoll file descriptors, cryptographic keys, and network 33*14bbe3e3SMatthew Wilcox (Oracle) packets in the "hierarchical token bucket" scheduler. 34*14bbe3e3SMatthew Wilcox (Oracle) 35*14bbe3e3SMatthew Wilcox (Oracle)This document covers use of the Linux rbtree implementation. For more 36*14bbe3e3SMatthew Wilcox (Oracle)information on the nature and implementation of Red Black Trees, see: 37*14bbe3e3SMatthew Wilcox (Oracle) 38*14bbe3e3SMatthew Wilcox (Oracle) Linux Weekly News article on red-black trees 39*14bbe3e3SMatthew Wilcox (Oracle) http://lwn.net/Articles/184495/ 40*14bbe3e3SMatthew Wilcox (Oracle) 41*14bbe3e3SMatthew Wilcox (Oracle) Wikipedia entry on red-black trees 42*14bbe3e3SMatthew Wilcox (Oracle) http://en.wikipedia.org/wiki/Red-black_tree 43*14bbe3e3SMatthew Wilcox (Oracle) 44*14bbe3e3SMatthew Wilcox (Oracle)Linux implementation of red-black trees 45*14bbe3e3SMatthew Wilcox (Oracle)--------------------------------------- 46*14bbe3e3SMatthew Wilcox (Oracle) 47*14bbe3e3SMatthew Wilcox (Oracle)Linux's rbtree implementation lives in the file "lib/rbtree.c". To use it, 48*14bbe3e3SMatthew Wilcox (Oracle)"#include <linux/rbtree.h>". 49*14bbe3e3SMatthew Wilcox (Oracle) 50*14bbe3e3SMatthew Wilcox (Oracle)The Linux rbtree implementation is optimized for speed, and thus has one 51*14bbe3e3SMatthew Wilcox (Oracle)less layer of indirection (and better cache locality) than more traditional 52*14bbe3e3SMatthew Wilcox (Oracle)tree implementations. Instead of using pointers to separate rb_node and data 53*14bbe3e3SMatthew Wilcox (Oracle)structures, each instance of struct rb_node is embedded in the data structure 54*14bbe3e3SMatthew Wilcox (Oracle)it organizes. And instead of using a comparison callback function pointer, 55*14bbe3e3SMatthew Wilcox (Oracle)users are expected to write their own tree search and insert functions 56*14bbe3e3SMatthew Wilcox (Oracle)which call the provided rbtree functions. Locking is also left up to the 57*14bbe3e3SMatthew Wilcox (Oracle)user of the rbtree code. 58*14bbe3e3SMatthew Wilcox (Oracle) 59*14bbe3e3SMatthew Wilcox (Oracle)Creating a new rbtree 60*14bbe3e3SMatthew Wilcox (Oracle)--------------------- 61*14bbe3e3SMatthew Wilcox (Oracle) 62*14bbe3e3SMatthew Wilcox (Oracle)Data nodes in an rbtree tree are structures containing a struct rb_node member:: 63*14bbe3e3SMatthew Wilcox (Oracle) 64*14bbe3e3SMatthew Wilcox (Oracle) struct mytype { 65*14bbe3e3SMatthew Wilcox (Oracle) struct rb_node node; 66*14bbe3e3SMatthew Wilcox (Oracle) char *keystring; 67*14bbe3e3SMatthew Wilcox (Oracle) }; 68*14bbe3e3SMatthew Wilcox (Oracle) 69*14bbe3e3SMatthew Wilcox (Oracle)When dealing with a pointer to the embedded struct rb_node, the containing data 70*14bbe3e3SMatthew Wilcox (Oracle)structure may be accessed with the standard container_of() macro. In addition, 71*14bbe3e3SMatthew Wilcox (Oracle)individual members may be accessed directly via rb_entry(node, type, member). 72*14bbe3e3SMatthew Wilcox (Oracle) 73*14bbe3e3SMatthew Wilcox (Oracle)At the root of each rbtree is an rb_root structure, which is initialized to be 74*14bbe3e3SMatthew Wilcox (Oracle)empty via: 75*14bbe3e3SMatthew Wilcox (Oracle) 76*14bbe3e3SMatthew Wilcox (Oracle) struct rb_root mytree = RB_ROOT; 77*14bbe3e3SMatthew Wilcox (Oracle) 78*14bbe3e3SMatthew Wilcox (Oracle)Searching for a value in an rbtree 79*14bbe3e3SMatthew Wilcox (Oracle)---------------------------------- 80*14bbe3e3SMatthew Wilcox (Oracle) 81*14bbe3e3SMatthew Wilcox (Oracle)Writing a search function for your tree is fairly straightforward: start at the 82*14bbe3e3SMatthew Wilcox (Oracle)root, compare each value, and follow the left or right branch as necessary. 83*14bbe3e3SMatthew Wilcox (Oracle) 84*14bbe3e3SMatthew Wilcox (Oracle)Example:: 85*14bbe3e3SMatthew Wilcox (Oracle) 86*14bbe3e3SMatthew Wilcox (Oracle) struct mytype *my_search(struct rb_root *root, char *string) 87*14bbe3e3SMatthew Wilcox (Oracle) { 88*14bbe3e3SMatthew Wilcox (Oracle) struct rb_node *node = root->rb_node; 89*14bbe3e3SMatthew Wilcox (Oracle) 90*14bbe3e3SMatthew Wilcox (Oracle) while (node) { 91*14bbe3e3SMatthew Wilcox (Oracle) struct mytype *data = container_of(node, struct mytype, node); 92*14bbe3e3SMatthew Wilcox (Oracle) int result; 93*14bbe3e3SMatthew Wilcox (Oracle) 94*14bbe3e3SMatthew Wilcox (Oracle) result = strcmp(string, data->keystring); 95*14bbe3e3SMatthew Wilcox (Oracle) 96*14bbe3e3SMatthew Wilcox (Oracle) if (result < 0) 97*14bbe3e3SMatthew Wilcox (Oracle) node = node->rb_left; 98*14bbe3e3SMatthew Wilcox (Oracle) else if (result > 0) 99*14bbe3e3SMatthew Wilcox (Oracle) node = node->rb_right; 100*14bbe3e3SMatthew Wilcox (Oracle) else 101*14bbe3e3SMatthew Wilcox (Oracle) return data; 102*14bbe3e3SMatthew Wilcox (Oracle) } 103*14bbe3e3SMatthew Wilcox (Oracle) return NULL; 104*14bbe3e3SMatthew Wilcox (Oracle) } 105*14bbe3e3SMatthew Wilcox (Oracle) 106*14bbe3e3SMatthew Wilcox (Oracle)Inserting data into an rbtree 107*14bbe3e3SMatthew Wilcox (Oracle)----------------------------- 108*14bbe3e3SMatthew Wilcox (Oracle) 109*14bbe3e3SMatthew Wilcox (Oracle)Inserting data in the tree involves first searching for the place to insert the 110*14bbe3e3SMatthew Wilcox (Oracle)new node, then inserting the node and rebalancing ("recoloring") the tree. 111*14bbe3e3SMatthew Wilcox (Oracle) 112*14bbe3e3SMatthew Wilcox (Oracle)The search for insertion differs from the previous search by finding the 113*14bbe3e3SMatthew Wilcox (Oracle)location of the pointer on which to graft the new node. The new node also 114*14bbe3e3SMatthew Wilcox (Oracle)needs a link to its parent node for rebalancing purposes. 115*14bbe3e3SMatthew Wilcox (Oracle) 116*14bbe3e3SMatthew Wilcox (Oracle)Example:: 117*14bbe3e3SMatthew Wilcox (Oracle) 118*14bbe3e3SMatthew Wilcox (Oracle) int my_insert(struct rb_root *root, struct mytype *data) 119*14bbe3e3SMatthew Wilcox (Oracle) { 120*14bbe3e3SMatthew Wilcox (Oracle) struct rb_node **new = &(root->rb_node), *parent = NULL; 121*14bbe3e3SMatthew Wilcox (Oracle) 122*14bbe3e3SMatthew Wilcox (Oracle) /* Figure out where to put new node */ 123*14bbe3e3SMatthew Wilcox (Oracle) while (*new) { 124*14bbe3e3SMatthew Wilcox (Oracle) struct mytype *this = container_of(*new, struct mytype, node); 125*14bbe3e3SMatthew Wilcox (Oracle) int result = strcmp(data->keystring, this->keystring); 126*14bbe3e3SMatthew Wilcox (Oracle) 127*14bbe3e3SMatthew Wilcox (Oracle) parent = *new; 128*14bbe3e3SMatthew Wilcox (Oracle) if (result < 0) 129*14bbe3e3SMatthew Wilcox (Oracle) new = &((*new)->rb_left); 130*14bbe3e3SMatthew Wilcox (Oracle) else if (result > 0) 131*14bbe3e3SMatthew Wilcox (Oracle) new = &((*new)->rb_right); 132*14bbe3e3SMatthew Wilcox (Oracle) else 133*14bbe3e3SMatthew Wilcox (Oracle) return FALSE; 134*14bbe3e3SMatthew Wilcox (Oracle) } 135*14bbe3e3SMatthew Wilcox (Oracle) 136*14bbe3e3SMatthew Wilcox (Oracle) /* Add new node and rebalance tree. */ 137*14bbe3e3SMatthew Wilcox (Oracle) rb_link_node(&data->node, parent, new); 138*14bbe3e3SMatthew Wilcox (Oracle) rb_insert_color(&data->node, root); 139*14bbe3e3SMatthew Wilcox (Oracle) 140*14bbe3e3SMatthew Wilcox (Oracle) return TRUE; 141*14bbe3e3SMatthew Wilcox (Oracle) } 142*14bbe3e3SMatthew Wilcox (Oracle) 143*14bbe3e3SMatthew Wilcox (Oracle)Removing or replacing existing data in an rbtree 144*14bbe3e3SMatthew Wilcox (Oracle)------------------------------------------------ 145*14bbe3e3SMatthew Wilcox (Oracle) 146*14bbe3e3SMatthew Wilcox (Oracle)To remove an existing node from a tree, call:: 147*14bbe3e3SMatthew Wilcox (Oracle) 148*14bbe3e3SMatthew Wilcox (Oracle) void rb_erase(struct rb_node *victim, struct rb_root *tree); 149*14bbe3e3SMatthew Wilcox (Oracle) 150*14bbe3e3SMatthew Wilcox (Oracle)Example:: 151*14bbe3e3SMatthew Wilcox (Oracle) 152*14bbe3e3SMatthew Wilcox (Oracle) struct mytype *data = mysearch(&mytree, "walrus"); 153*14bbe3e3SMatthew Wilcox (Oracle) 154*14bbe3e3SMatthew Wilcox (Oracle) if (data) { 155*14bbe3e3SMatthew Wilcox (Oracle) rb_erase(&data->node, &mytree); 156*14bbe3e3SMatthew Wilcox (Oracle) myfree(data); 157*14bbe3e3SMatthew Wilcox (Oracle) } 158*14bbe3e3SMatthew Wilcox (Oracle) 159*14bbe3e3SMatthew Wilcox (Oracle)To replace an existing node in a tree with a new one with the same key, call:: 160*14bbe3e3SMatthew Wilcox (Oracle) 161*14bbe3e3SMatthew Wilcox (Oracle) void rb_replace_node(struct rb_node *old, struct rb_node *new, 162*14bbe3e3SMatthew Wilcox (Oracle) struct rb_root *tree); 163*14bbe3e3SMatthew Wilcox (Oracle) 164*14bbe3e3SMatthew Wilcox (Oracle)Replacing a node this way does not re-sort the tree: If the new node doesn't 165*14bbe3e3SMatthew Wilcox (Oracle)have the same key as the old node, the rbtree will probably become corrupted. 166*14bbe3e3SMatthew Wilcox (Oracle) 167*14bbe3e3SMatthew Wilcox (Oracle)Iterating through the elements stored in an rbtree (in sort order) 168*14bbe3e3SMatthew Wilcox (Oracle)------------------------------------------------------------------ 169*14bbe3e3SMatthew Wilcox (Oracle) 170*14bbe3e3SMatthew Wilcox (Oracle)Four functions are provided for iterating through an rbtree's contents in 171*14bbe3e3SMatthew Wilcox (Oracle)sorted order. These work on arbitrary trees, and should not need to be 172*14bbe3e3SMatthew Wilcox (Oracle)modified or wrapped (except for locking purposes):: 173*14bbe3e3SMatthew Wilcox (Oracle) 174*14bbe3e3SMatthew Wilcox (Oracle) struct rb_node *rb_first(struct rb_root *tree); 175*14bbe3e3SMatthew Wilcox (Oracle) struct rb_node *rb_last(struct rb_root *tree); 176*14bbe3e3SMatthew Wilcox (Oracle) struct rb_node *rb_next(struct rb_node *node); 177*14bbe3e3SMatthew Wilcox (Oracle) struct rb_node *rb_prev(struct rb_node *node); 178*14bbe3e3SMatthew Wilcox (Oracle) 179*14bbe3e3SMatthew Wilcox (Oracle)To start iterating, call rb_first() or rb_last() with a pointer to the root 180*14bbe3e3SMatthew Wilcox (Oracle)of the tree, which will return a pointer to the node structure contained in 181*14bbe3e3SMatthew Wilcox (Oracle)the first or last element in the tree. To continue, fetch the next or previous 182*14bbe3e3SMatthew Wilcox (Oracle)node by calling rb_next() or rb_prev() on the current node. This will return 183*14bbe3e3SMatthew Wilcox (Oracle)NULL when there are no more nodes left. 184*14bbe3e3SMatthew Wilcox (Oracle) 185*14bbe3e3SMatthew Wilcox (Oracle)The iterator functions return a pointer to the embedded struct rb_node, from 186*14bbe3e3SMatthew Wilcox (Oracle)which the containing data structure may be accessed with the container_of() 187*14bbe3e3SMatthew Wilcox (Oracle)macro, and individual members may be accessed directly via 188*14bbe3e3SMatthew Wilcox (Oracle)rb_entry(node, type, member). 189*14bbe3e3SMatthew Wilcox (Oracle) 190*14bbe3e3SMatthew Wilcox (Oracle)Example:: 191*14bbe3e3SMatthew Wilcox (Oracle) 192*14bbe3e3SMatthew Wilcox (Oracle) struct rb_node *node; 193*14bbe3e3SMatthew Wilcox (Oracle) for (node = rb_first(&mytree); node; node = rb_next(node)) 194*14bbe3e3SMatthew Wilcox (Oracle) printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring); 195*14bbe3e3SMatthew Wilcox (Oracle) 196*14bbe3e3SMatthew Wilcox (Oracle)Cached rbtrees 197*14bbe3e3SMatthew Wilcox (Oracle)-------------- 198*14bbe3e3SMatthew Wilcox (Oracle) 199*14bbe3e3SMatthew Wilcox (Oracle)Computing the leftmost (smallest) node is quite a common task for binary 200*14bbe3e3SMatthew Wilcox (Oracle)search trees, such as for traversals or users relying on a the particular 201*14bbe3e3SMatthew Wilcox (Oracle)order for their own logic. To this end, users can use 'struct rb_root_cached' 202*14bbe3e3SMatthew Wilcox (Oracle)to optimize O(logN) rb_first() calls to a simple pointer fetch avoiding 203*14bbe3e3SMatthew Wilcox (Oracle)potentially expensive tree iterations. This is done at negligible runtime 204*14bbe3e3SMatthew Wilcox (Oracle)overhead for maintanence; albeit larger memory footprint. 205*14bbe3e3SMatthew Wilcox (Oracle) 206*14bbe3e3SMatthew Wilcox (Oracle)Similar to the rb_root structure, cached rbtrees are initialized to be 207*14bbe3e3SMatthew Wilcox (Oracle)empty via:: 208*14bbe3e3SMatthew Wilcox (Oracle) 209*14bbe3e3SMatthew Wilcox (Oracle) struct rb_root_cached mytree = RB_ROOT_CACHED; 210*14bbe3e3SMatthew Wilcox (Oracle) 211*14bbe3e3SMatthew Wilcox (Oracle)Cached rbtree is simply a regular rb_root with an extra pointer to cache the 212*14bbe3e3SMatthew Wilcox (Oracle)leftmost node. This allows rb_root_cached to exist wherever rb_root does, 213*14bbe3e3SMatthew Wilcox (Oracle)which permits augmented trees to be supported as well as only a few extra 214*14bbe3e3SMatthew Wilcox (Oracle)interfaces:: 215*14bbe3e3SMatthew Wilcox (Oracle) 216*14bbe3e3SMatthew Wilcox (Oracle) struct rb_node *rb_first_cached(struct rb_root_cached *tree); 217*14bbe3e3SMatthew Wilcox (Oracle) void rb_insert_color_cached(struct rb_node *, struct rb_root_cached *, bool); 218*14bbe3e3SMatthew Wilcox (Oracle) void rb_erase_cached(struct rb_node *node, struct rb_root_cached *); 219*14bbe3e3SMatthew Wilcox (Oracle) 220*14bbe3e3SMatthew Wilcox (Oracle)Both insert and erase calls have their respective counterpart of augmented 221*14bbe3e3SMatthew Wilcox (Oracle)trees:: 222*14bbe3e3SMatthew Wilcox (Oracle) 223*14bbe3e3SMatthew Wilcox (Oracle) void rb_insert_augmented_cached(struct rb_node *node, struct rb_root_cached *, 224*14bbe3e3SMatthew Wilcox (Oracle) bool, struct rb_augment_callbacks *); 225*14bbe3e3SMatthew Wilcox (Oracle) void rb_erase_augmented_cached(struct rb_node *, struct rb_root_cached *, 226*14bbe3e3SMatthew Wilcox (Oracle) struct rb_augment_callbacks *); 227*14bbe3e3SMatthew Wilcox (Oracle) 228*14bbe3e3SMatthew Wilcox (Oracle) 229*14bbe3e3SMatthew Wilcox (Oracle)Support for Augmented rbtrees 230*14bbe3e3SMatthew Wilcox (Oracle)----------------------------- 231*14bbe3e3SMatthew Wilcox (Oracle) 232*14bbe3e3SMatthew Wilcox (Oracle)Augmented rbtree is an rbtree with "some" additional data stored in 233*14bbe3e3SMatthew Wilcox (Oracle)each node, where the additional data for node N must be a function of 234*14bbe3e3SMatthew Wilcox (Oracle)the contents of all nodes in the subtree rooted at N. This data can 235*14bbe3e3SMatthew Wilcox (Oracle)be used to augment some new functionality to rbtree. Augmented rbtree 236*14bbe3e3SMatthew Wilcox (Oracle)is an optional feature built on top of basic rbtree infrastructure. 237*14bbe3e3SMatthew Wilcox (Oracle)An rbtree user who wants this feature will have to call the augmentation 238*14bbe3e3SMatthew Wilcox (Oracle)functions with the user provided augmentation callback when inserting 239*14bbe3e3SMatthew Wilcox (Oracle)and erasing nodes. 240*14bbe3e3SMatthew Wilcox (Oracle) 241*14bbe3e3SMatthew Wilcox (Oracle)C files implementing augmented rbtree manipulation must include 242*14bbe3e3SMatthew Wilcox (Oracle)<linux/rbtree_augmented.h> instead of <linux/rbtree.h>. Note that 243*14bbe3e3SMatthew Wilcox (Oracle)linux/rbtree_augmented.h exposes some rbtree implementations details 244*14bbe3e3SMatthew Wilcox (Oracle)you are not expected to rely on; please stick to the documented APIs 245*14bbe3e3SMatthew Wilcox (Oracle)there and do not include <linux/rbtree_augmented.h> from header files 246*14bbe3e3SMatthew Wilcox (Oracle)either so as to minimize chances of your users accidentally relying on 247*14bbe3e3SMatthew Wilcox (Oracle)such implementation details. 248*14bbe3e3SMatthew Wilcox (Oracle) 249*14bbe3e3SMatthew Wilcox (Oracle)On insertion, the user must update the augmented information on the path 250*14bbe3e3SMatthew Wilcox (Oracle)leading to the inserted node, then call rb_link_node() as usual and 251*14bbe3e3SMatthew Wilcox (Oracle)rb_augment_inserted() instead of the usual rb_insert_color() call. 252*14bbe3e3SMatthew Wilcox (Oracle)If rb_augment_inserted() rebalances the rbtree, it will callback into 253*14bbe3e3SMatthew Wilcox (Oracle)a user provided function to update the augmented information on the 254*14bbe3e3SMatthew Wilcox (Oracle)affected subtrees. 255*14bbe3e3SMatthew Wilcox (Oracle) 256*14bbe3e3SMatthew Wilcox (Oracle)When erasing a node, the user must call rb_erase_augmented() instead of 257*14bbe3e3SMatthew Wilcox (Oracle)rb_erase(). rb_erase_augmented() calls back into user provided functions 258*14bbe3e3SMatthew Wilcox (Oracle)to updated the augmented information on affected subtrees. 259*14bbe3e3SMatthew Wilcox (Oracle) 260*14bbe3e3SMatthew Wilcox (Oracle)In both cases, the callbacks are provided through struct rb_augment_callbacks. 261*14bbe3e3SMatthew Wilcox (Oracle)3 callbacks must be defined: 262*14bbe3e3SMatthew Wilcox (Oracle) 263*14bbe3e3SMatthew Wilcox (Oracle)- A propagation callback, which updates the augmented value for a given 264*14bbe3e3SMatthew Wilcox (Oracle) node and its ancestors, up to a given stop point (or NULL to update 265*14bbe3e3SMatthew Wilcox (Oracle) all the way to the root). 266*14bbe3e3SMatthew Wilcox (Oracle) 267*14bbe3e3SMatthew Wilcox (Oracle)- A copy callback, which copies the augmented value for a given subtree 268*14bbe3e3SMatthew Wilcox (Oracle) to a newly assigned subtree root. 269*14bbe3e3SMatthew Wilcox (Oracle) 270*14bbe3e3SMatthew Wilcox (Oracle)- A tree rotation callback, which copies the augmented value for a given 271*14bbe3e3SMatthew Wilcox (Oracle) subtree to a newly assigned subtree root AND recomputes the augmented 272*14bbe3e3SMatthew Wilcox (Oracle) information for the former subtree root. 273*14bbe3e3SMatthew Wilcox (Oracle) 274*14bbe3e3SMatthew Wilcox (Oracle)The compiled code for rb_erase_augmented() may inline the propagation and 275*14bbe3e3SMatthew Wilcox (Oracle)copy callbacks, which results in a large function, so each augmented rbtree 276*14bbe3e3SMatthew Wilcox (Oracle)user should have a single rb_erase_augmented() call site in order to limit 277*14bbe3e3SMatthew Wilcox (Oracle)compiled code size. 278*14bbe3e3SMatthew Wilcox (Oracle) 279*14bbe3e3SMatthew Wilcox (Oracle) 280*14bbe3e3SMatthew Wilcox (Oracle)Sample usage 281*14bbe3e3SMatthew Wilcox (Oracle)^^^^^^^^^^^^ 282*14bbe3e3SMatthew Wilcox (Oracle) 283*14bbe3e3SMatthew Wilcox (Oracle)Interval tree is an example of augmented rb tree. Reference - 284*14bbe3e3SMatthew Wilcox (Oracle)"Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein. 285*14bbe3e3SMatthew Wilcox (Oracle)More details about interval trees: 286*14bbe3e3SMatthew Wilcox (Oracle) 287*14bbe3e3SMatthew Wilcox (Oracle)Classical rbtree has a single key and it cannot be directly used to store 288*14bbe3e3SMatthew Wilcox (Oracle)interval ranges like [lo:hi] and do a quick lookup for any overlap with a new 289*14bbe3e3SMatthew Wilcox (Oracle)lo:hi or to find whether there is an exact match for a new lo:hi. 290*14bbe3e3SMatthew Wilcox (Oracle) 291*14bbe3e3SMatthew Wilcox (Oracle)However, rbtree can be augmented to store such interval ranges in a structured 292*14bbe3e3SMatthew Wilcox (Oracle)way making it possible to do efficient lookup and exact match. 293*14bbe3e3SMatthew Wilcox (Oracle) 294*14bbe3e3SMatthew Wilcox (Oracle)This "extra information" stored in each node is the maximum hi 295*14bbe3e3SMatthew Wilcox (Oracle)(max_hi) value among all the nodes that are its descendants. This 296*14bbe3e3SMatthew Wilcox (Oracle)information can be maintained at each node just be looking at the node 297*14bbe3e3SMatthew Wilcox (Oracle)and its immediate children. And this will be used in O(log n) lookup 298*14bbe3e3SMatthew Wilcox (Oracle)for lowest match (lowest start address among all possible matches) 299*14bbe3e3SMatthew Wilcox (Oracle)with something like:: 300*14bbe3e3SMatthew Wilcox (Oracle) 301*14bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node * 302*14bbe3e3SMatthew Wilcox (Oracle) interval_tree_first_match(struct rb_root *root, 303*14bbe3e3SMatthew Wilcox (Oracle) unsigned long start, unsigned long last) 304*14bbe3e3SMatthew Wilcox (Oracle) { 305*14bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *node; 306*14bbe3e3SMatthew Wilcox (Oracle) 307*14bbe3e3SMatthew Wilcox (Oracle) if (!root->rb_node) 308*14bbe3e3SMatthew Wilcox (Oracle) return NULL; 309*14bbe3e3SMatthew Wilcox (Oracle) node = rb_entry(root->rb_node, struct interval_tree_node, rb); 310*14bbe3e3SMatthew Wilcox (Oracle) 311*14bbe3e3SMatthew Wilcox (Oracle) while (true) { 312*14bbe3e3SMatthew Wilcox (Oracle) if (node->rb.rb_left) { 313*14bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *left = 314*14bbe3e3SMatthew Wilcox (Oracle) rb_entry(node->rb.rb_left, 315*14bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node, rb); 316*14bbe3e3SMatthew Wilcox (Oracle) if (left->__subtree_last >= start) { 317*14bbe3e3SMatthew Wilcox (Oracle) /* 318*14bbe3e3SMatthew Wilcox (Oracle) * Some nodes in left subtree satisfy Cond2. 319*14bbe3e3SMatthew Wilcox (Oracle) * Iterate to find the leftmost such node N. 320*14bbe3e3SMatthew Wilcox (Oracle) * If it also satisfies Cond1, that's the match 321*14bbe3e3SMatthew Wilcox (Oracle) * we are looking for. Otherwise, there is no 322*14bbe3e3SMatthew Wilcox (Oracle) * matching interval as nodes to the right of N 323*14bbe3e3SMatthew Wilcox (Oracle) * can't satisfy Cond1 either. 324*14bbe3e3SMatthew Wilcox (Oracle) */ 325*14bbe3e3SMatthew Wilcox (Oracle) node = left; 326*14bbe3e3SMatthew Wilcox (Oracle) continue; 327*14bbe3e3SMatthew Wilcox (Oracle) } 328*14bbe3e3SMatthew Wilcox (Oracle) } 329*14bbe3e3SMatthew Wilcox (Oracle) if (node->start <= last) { /* Cond1 */ 330*14bbe3e3SMatthew Wilcox (Oracle) if (node->last >= start) /* Cond2 */ 331*14bbe3e3SMatthew Wilcox (Oracle) return node; /* node is leftmost match */ 332*14bbe3e3SMatthew Wilcox (Oracle) if (node->rb.rb_right) { 333*14bbe3e3SMatthew Wilcox (Oracle) node = rb_entry(node->rb.rb_right, 334*14bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node, rb); 335*14bbe3e3SMatthew Wilcox (Oracle) if (node->__subtree_last >= start) 336*14bbe3e3SMatthew Wilcox (Oracle) continue; 337*14bbe3e3SMatthew Wilcox (Oracle) } 338*14bbe3e3SMatthew Wilcox (Oracle) } 339*14bbe3e3SMatthew Wilcox (Oracle) return NULL; /* No match */ 340*14bbe3e3SMatthew Wilcox (Oracle) } 341*14bbe3e3SMatthew Wilcox (Oracle) } 342*14bbe3e3SMatthew Wilcox (Oracle) 343*14bbe3e3SMatthew Wilcox (Oracle)Insertion/removal are defined using the following augmented callbacks:: 344*14bbe3e3SMatthew Wilcox (Oracle) 345*14bbe3e3SMatthew Wilcox (Oracle) static inline unsigned long 346*14bbe3e3SMatthew Wilcox (Oracle) compute_subtree_last(struct interval_tree_node *node) 347*14bbe3e3SMatthew Wilcox (Oracle) { 348*14bbe3e3SMatthew Wilcox (Oracle) unsigned long max = node->last, subtree_last; 349*14bbe3e3SMatthew Wilcox (Oracle) if (node->rb.rb_left) { 350*14bbe3e3SMatthew Wilcox (Oracle) subtree_last = rb_entry(node->rb.rb_left, 351*14bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node, rb)->__subtree_last; 352*14bbe3e3SMatthew Wilcox (Oracle) if (max < subtree_last) 353*14bbe3e3SMatthew Wilcox (Oracle) max = subtree_last; 354*14bbe3e3SMatthew Wilcox (Oracle) } 355*14bbe3e3SMatthew Wilcox (Oracle) if (node->rb.rb_right) { 356*14bbe3e3SMatthew Wilcox (Oracle) subtree_last = rb_entry(node->rb.rb_right, 357*14bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node, rb)->__subtree_last; 358*14bbe3e3SMatthew Wilcox (Oracle) if (max < subtree_last) 359*14bbe3e3SMatthew Wilcox (Oracle) max = subtree_last; 360*14bbe3e3SMatthew Wilcox (Oracle) } 361*14bbe3e3SMatthew Wilcox (Oracle) return max; 362*14bbe3e3SMatthew Wilcox (Oracle) } 363*14bbe3e3SMatthew Wilcox (Oracle) 364*14bbe3e3SMatthew Wilcox (Oracle) static void augment_propagate(struct rb_node *rb, struct rb_node *stop) 365*14bbe3e3SMatthew Wilcox (Oracle) { 366*14bbe3e3SMatthew Wilcox (Oracle) while (rb != stop) { 367*14bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *node = 368*14bbe3e3SMatthew Wilcox (Oracle) rb_entry(rb, struct interval_tree_node, rb); 369*14bbe3e3SMatthew Wilcox (Oracle) unsigned long subtree_last = compute_subtree_last(node); 370*14bbe3e3SMatthew Wilcox (Oracle) if (node->__subtree_last == subtree_last) 371*14bbe3e3SMatthew Wilcox (Oracle) break; 372*14bbe3e3SMatthew Wilcox (Oracle) node->__subtree_last = subtree_last; 373*14bbe3e3SMatthew Wilcox (Oracle) rb = rb_parent(&node->rb); 374*14bbe3e3SMatthew Wilcox (Oracle) } 375*14bbe3e3SMatthew Wilcox (Oracle) } 376*14bbe3e3SMatthew Wilcox (Oracle) 377*14bbe3e3SMatthew Wilcox (Oracle) static void augment_copy(struct rb_node *rb_old, struct rb_node *rb_new) 378*14bbe3e3SMatthew Wilcox (Oracle) { 379*14bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *old = 380*14bbe3e3SMatthew Wilcox (Oracle) rb_entry(rb_old, struct interval_tree_node, rb); 381*14bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *new = 382*14bbe3e3SMatthew Wilcox (Oracle) rb_entry(rb_new, struct interval_tree_node, rb); 383*14bbe3e3SMatthew Wilcox (Oracle) 384*14bbe3e3SMatthew Wilcox (Oracle) new->__subtree_last = old->__subtree_last; 385*14bbe3e3SMatthew Wilcox (Oracle) } 386*14bbe3e3SMatthew Wilcox (Oracle) 387*14bbe3e3SMatthew Wilcox (Oracle) static void augment_rotate(struct rb_node *rb_old, struct rb_node *rb_new) 388*14bbe3e3SMatthew Wilcox (Oracle) { 389*14bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *old = 390*14bbe3e3SMatthew Wilcox (Oracle) rb_entry(rb_old, struct interval_tree_node, rb); 391*14bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *new = 392*14bbe3e3SMatthew Wilcox (Oracle) rb_entry(rb_new, struct interval_tree_node, rb); 393*14bbe3e3SMatthew Wilcox (Oracle) 394*14bbe3e3SMatthew Wilcox (Oracle) new->__subtree_last = old->__subtree_last; 395*14bbe3e3SMatthew Wilcox (Oracle) old->__subtree_last = compute_subtree_last(old); 396*14bbe3e3SMatthew Wilcox (Oracle) } 397*14bbe3e3SMatthew Wilcox (Oracle) 398*14bbe3e3SMatthew Wilcox (Oracle) static const struct rb_augment_callbacks augment_callbacks = { 399*14bbe3e3SMatthew Wilcox (Oracle) augment_propagate, augment_copy, augment_rotate 400*14bbe3e3SMatthew Wilcox (Oracle) }; 401*14bbe3e3SMatthew Wilcox (Oracle) 402*14bbe3e3SMatthew Wilcox (Oracle) void interval_tree_insert(struct interval_tree_node *node, 403*14bbe3e3SMatthew Wilcox (Oracle) struct rb_root *root) 404*14bbe3e3SMatthew Wilcox (Oracle) { 405*14bbe3e3SMatthew Wilcox (Oracle) struct rb_node **link = &root->rb_node, *rb_parent = NULL; 406*14bbe3e3SMatthew Wilcox (Oracle) unsigned long start = node->start, last = node->last; 407*14bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *parent; 408*14bbe3e3SMatthew Wilcox (Oracle) 409*14bbe3e3SMatthew Wilcox (Oracle) while (*link) { 410*14bbe3e3SMatthew Wilcox (Oracle) rb_parent = *link; 411*14bbe3e3SMatthew Wilcox (Oracle) parent = rb_entry(rb_parent, struct interval_tree_node, rb); 412*14bbe3e3SMatthew Wilcox (Oracle) if (parent->__subtree_last < last) 413*14bbe3e3SMatthew Wilcox (Oracle) parent->__subtree_last = last; 414*14bbe3e3SMatthew Wilcox (Oracle) if (start < parent->start) 415*14bbe3e3SMatthew Wilcox (Oracle) link = &parent->rb.rb_left; 416*14bbe3e3SMatthew Wilcox (Oracle) else 417*14bbe3e3SMatthew Wilcox (Oracle) link = &parent->rb.rb_right; 418*14bbe3e3SMatthew Wilcox (Oracle) } 419*14bbe3e3SMatthew Wilcox (Oracle) 420*14bbe3e3SMatthew Wilcox (Oracle) node->__subtree_last = last; 421*14bbe3e3SMatthew Wilcox (Oracle) rb_link_node(&node->rb, rb_parent, link); 422*14bbe3e3SMatthew Wilcox (Oracle) rb_insert_augmented(&node->rb, root, &augment_callbacks); 423*14bbe3e3SMatthew Wilcox (Oracle) } 424*14bbe3e3SMatthew Wilcox (Oracle) 425*14bbe3e3SMatthew Wilcox (Oracle) void interval_tree_remove(struct interval_tree_node *node, 426*14bbe3e3SMatthew Wilcox (Oracle) struct rb_root *root) 427*14bbe3e3SMatthew Wilcox (Oracle) { 428*14bbe3e3SMatthew Wilcox (Oracle) rb_erase_augmented(&node->rb, root, &augment_callbacks); 429*14bbe3e3SMatthew Wilcox (Oracle) } 430