114bbe3e3SMatthew Wilcox (Oracle)================================= 214bbe3e3SMatthew Wilcox (Oracle)Red-black Trees (rbtree) in Linux 314bbe3e3SMatthew Wilcox (Oracle)================================= 414bbe3e3SMatthew Wilcox (Oracle) 514bbe3e3SMatthew Wilcox (Oracle) 614bbe3e3SMatthew Wilcox (Oracle):Date: January 18, 2007 714bbe3e3SMatthew Wilcox (Oracle):Author: Rob Landley <rob@landley.net> 814bbe3e3SMatthew Wilcox (Oracle) 914bbe3e3SMatthew Wilcox (Oracle)What are red-black trees, and what are they for? 1014bbe3e3SMatthew Wilcox (Oracle)------------------------------------------------ 1114bbe3e3SMatthew Wilcox (Oracle) 1214bbe3e3SMatthew Wilcox (Oracle)Red-black trees are a type of self-balancing binary search tree, used for 1314bbe3e3SMatthew Wilcox (Oracle)storing sortable key/value data pairs. This differs from radix trees (which 1414bbe3e3SMatthew Wilcox (Oracle)are used to efficiently store sparse arrays and thus use long integer indexes 1514bbe3e3SMatthew Wilcox (Oracle)to insert/access/delete nodes) and hash tables (which are not kept sorted to 1614bbe3e3SMatthew Wilcox (Oracle)be easily traversed in order, and must be tuned for a specific size and 1714bbe3e3SMatthew Wilcox (Oracle)hash function where rbtrees scale gracefully storing arbitrary keys). 1814bbe3e3SMatthew Wilcox (Oracle) 1914bbe3e3SMatthew Wilcox (Oracle)Red-black trees are similar to AVL trees, but provide faster real-time bounded 2014bbe3e3SMatthew Wilcox (Oracle)worst case performance for insertion and deletion (at most two rotations and 2114bbe3e3SMatthew Wilcox (Oracle)three rotations, respectively, to balance the tree), with slightly slower 2214bbe3e3SMatthew Wilcox (Oracle)(but still O(log n)) lookup time. 2314bbe3e3SMatthew Wilcox (Oracle) 2414bbe3e3SMatthew Wilcox (Oracle)To quote Linux Weekly News: 2514bbe3e3SMatthew Wilcox (Oracle) 2614bbe3e3SMatthew Wilcox (Oracle) There are a number of red-black trees in use in the kernel. 2714bbe3e3SMatthew Wilcox (Oracle) The deadline and CFQ I/O schedulers employ rbtrees to 2814bbe3e3SMatthew Wilcox (Oracle) track requests; the packet CD/DVD driver does the same. 2914bbe3e3SMatthew Wilcox (Oracle) The high-resolution timer code uses an rbtree to organize outstanding 3014bbe3e3SMatthew Wilcox (Oracle) timer requests. The ext3 filesystem tracks directory entries in a 3114bbe3e3SMatthew Wilcox (Oracle) red-black tree. Virtual memory areas (VMAs) are tracked with red-black 3214bbe3e3SMatthew Wilcox (Oracle) trees, as are epoll file descriptors, cryptographic keys, and network 3314bbe3e3SMatthew Wilcox (Oracle) packets in the "hierarchical token bucket" scheduler. 3414bbe3e3SMatthew Wilcox (Oracle) 3514bbe3e3SMatthew Wilcox (Oracle)This document covers use of the Linux rbtree implementation. For more 3614bbe3e3SMatthew Wilcox (Oracle)information on the nature and implementation of Red Black Trees, see: 3714bbe3e3SMatthew Wilcox (Oracle) 3814bbe3e3SMatthew Wilcox (Oracle) Linux Weekly News article on red-black trees 3993431e06SAlexander A. Klimov https://lwn.net/Articles/184495/ 4014bbe3e3SMatthew Wilcox (Oracle) 4114bbe3e3SMatthew Wilcox (Oracle) Wikipedia entry on red-black trees 4293431e06SAlexander A. Klimov https://en.wikipedia.org/wiki/Red-black_tree 4314bbe3e3SMatthew Wilcox (Oracle) 4414bbe3e3SMatthew Wilcox (Oracle)Linux implementation of red-black trees 4514bbe3e3SMatthew Wilcox (Oracle)--------------------------------------- 4614bbe3e3SMatthew Wilcox (Oracle) 4714bbe3e3SMatthew Wilcox (Oracle)Linux's rbtree implementation lives in the file "lib/rbtree.c". To use it, 4814bbe3e3SMatthew Wilcox (Oracle)"#include <linux/rbtree.h>". 4914bbe3e3SMatthew Wilcox (Oracle) 5014bbe3e3SMatthew Wilcox (Oracle)The Linux rbtree implementation is optimized for speed, and thus has one 5114bbe3e3SMatthew Wilcox (Oracle)less layer of indirection (and better cache locality) than more traditional 5214bbe3e3SMatthew Wilcox (Oracle)tree implementations. Instead of using pointers to separate rb_node and data 5314bbe3e3SMatthew Wilcox (Oracle)structures, each instance of struct rb_node is embedded in the data structure 5414bbe3e3SMatthew Wilcox (Oracle)it organizes. And instead of using a comparison callback function pointer, 5514bbe3e3SMatthew Wilcox (Oracle)users are expected to write their own tree search and insert functions 5614bbe3e3SMatthew Wilcox (Oracle)which call the provided rbtree functions. Locking is also left up to the 5714bbe3e3SMatthew Wilcox (Oracle)user of the rbtree code. 5814bbe3e3SMatthew Wilcox (Oracle) 5914bbe3e3SMatthew Wilcox (Oracle)Creating a new rbtree 6014bbe3e3SMatthew Wilcox (Oracle)--------------------- 6114bbe3e3SMatthew Wilcox (Oracle) 6214bbe3e3SMatthew Wilcox (Oracle)Data nodes in an rbtree tree are structures containing a struct rb_node member:: 6314bbe3e3SMatthew Wilcox (Oracle) 6414bbe3e3SMatthew Wilcox (Oracle) struct mytype { 6514bbe3e3SMatthew Wilcox (Oracle) struct rb_node node; 6614bbe3e3SMatthew Wilcox (Oracle) char *keystring; 6714bbe3e3SMatthew Wilcox (Oracle) }; 6814bbe3e3SMatthew Wilcox (Oracle) 6914bbe3e3SMatthew Wilcox (Oracle)When dealing with a pointer to the embedded struct rb_node, the containing data 7014bbe3e3SMatthew Wilcox (Oracle)structure may be accessed with the standard container_of() macro. In addition, 7114bbe3e3SMatthew Wilcox (Oracle)individual members may be accessed directly via rb_entry(node, type, member). 7214bbe3e3SMatthew Wilcox (Oracle) 7314bbe3e3SMatthew Wilcox (Oracle)At the root of each rbtree is an rb_root structure, which is initialized to be 7414bbe3e3SMatthew Wilcox (Oracle)empty via: 7514bbe3e3SMatthew Wilcox (Oracle) 7614bbe3e3SMatthew Wilcox (Oracle) struct rb_root mytree = RB_ROOT; 7714bbe3e3SMatthew Wilcox (Oracle) 7814bbe3e3SMatthew Wilcox (Oracle)Searching for a value in an rbtree 7914bbe3e3SMatthew Wilcox (Oracle)---------------------------------- 8014bbe3e3SMatthew Wilcox (Oracle) 8114bbe3e3SMatthew Wilcox (Oracle)Writing a search function for your tree is fairly straightforward: start at the 8214bbe3e3SMatthew Wilcox (Oracle)root, compare each value, and follow the left or right branch as necessary. 8314bbe3e3SMatthew Wilcox (Oracle) 8414bbe3e3SMatthew Wilcox (Oracle)Example:: 8514bbe3e3SMatthew Wilcox (Oracle) 8614bbe3e3SMatthew Wilcox (Oracle) struct mytype *my_search(struct rb_root *root, char *string) 8714bbe3e3SMatthew Wilcox (Oracle) { 8814bbe3e3SMatthew Wilcox (Oracle) struct rb_node *node = root->rb_node; 8914bbe3e3SMatthew Wilcox (Oracle) 9014bbe3e3SMatthew Wilcox (Oracle) while (node) { 9114bbe3e3SMatthew Wilcox (Oracle) struct mytype *data = container_of(node, struct mytype, node); 9214bbe3e3SMatthew Wilcox (Oracle) int result; 9314bbe3e3SMatthew Wilcox (Oracle) 9414bbe3e3SMatthew Wilcox (Oracle) result = strcmp(string, data->keystring); 9514bbe3e3SMatthew Wilcox (Oracle) 9614bbe3e3SMatthew Wilcox (Oracle) if (result < 0) 9714bbe3e3SMatthew Wilcox (Oracle) node = node->rb_left; 9814bbe3e3SMatthew Wilcox (Oracle) else if (result > 0) 9914bbe3e3SMatthew Wilcox (Oracle) node = node->rb_right; 10014bbe3e3SMatthew Wilcox (Oracle) else 10114bbe3e3SMatthew Wilcox (Oracle) return data; 10214bbe3e3SMatthew Wilcox (Oracle) } 10314bbe3e3SMatthew Wilcox (Oracle) return NULL; 10414bbe3e3SMatthew Wilcox (Oracle) } 10514bbe3e3SMatthew Wilcox (Oracle) 10614bbe3e3SMatthew Wilcox (Oracle)Inserting data into an rbtree 10714bbe3e3SMatthew Wilcox (Oracle)----------------------------- 10814bbe3e3SMatthew Wilcox (Oracle) 10914bbe3e3SMatthew Wilcox (Oracle)Inserting data in the tree involves first searching for the place to insert the 11014bbe3e3SMatthew Wilcox (Oracle)new node, then inserting the node and rebalancing ("recoloring") the tree. 11114bbe3e3SMatthew Wilcox (Oracle) 11214bbe3e3SMatthew Wilcox (Oracle)The search for insertion differs from the previous search by finding the 11314bbe3e3SMatthew Wilcox (Oracle)location of the pointer on which to graft the new node. The new node also 11414bbe3e3SMatthew Wilcox (Oracle)needs a link to its parent node for rebalancing purposes. 11514bbe3e3SMatthew Wilcox (Oracle) 11614bbe3e3SMatthew Wilcox (Oracle)Example:: 11714bbe3e3SMatthew Wilcox (Oracle) 11814bbe3e3SMatthew Wilcox (Oracle) int my_insert(struct rb_root *root, struct mytype *data) 11914bbe3e3SMatthew Wilcox (Oracle) { 12014bbe3e3SMatthew Wilcox (Oracle) struct rb_node **new = &(root->rb_node), *parent = NULL; 12114bbe3e3SMatthew Wilcox (Oracle) 12214bbe3e3SMatthew Wilcox (Oracle) /* Figure out where to put new node */ 12314bbe3e3SMatthew Wilcox (Oracle) while (*new) { 12414bbe3e3SMatthew Wilcox (Oracle) struct mytype *this = container_of(*new, struct mytype, node); 12514bbe3e3SMatthew Wilcox (Oracle) int result = strcmp(data->keystring, this->keystring); 12614bbe3e3SMatthew Wilcox (Oracle) 12714bbe3e3SMatthew Wilcox (Oracle) parent = *new; 12814bbe3e3SMatthew Wilcox (Oracle) if (result < 0) 12914bbe3e3SMatthew Wilcox (Oracle) new = &((*new)->rb_left); 13014bbe3e3SMatthew Wilcox (Oracle) else if (result > 0) 13114bbe3e3SMatthew Wilcox (Oracle) new = &((*new)->rb_right); 13214bbe3e3SMatthew Wilcox (Oracle) else 13314bbe3e3SMatthew Wilcox (Oracle) return FALSE; 13414bbe3e3SMatthew Wilcox (Oracle) } 13514bbe3e3SMatthew Wilcox (Oracle) 13614bbe3e3SMatthew Wilcox (Oracle) /* Add new node and rebalance tree. */ 13714bbe3e3SMatthew Wilcox (Oracle) rb_link_node(&data->node, parent, new); 13814bbe3e3SMatthew Wilcox (Oracle) rb_insert_color(&data->node, root); 13914bbe3e3SMatthew Wilcox (Oracle) 14014bbe3e3SMatthew Wilcox (Oracle) return TRUE; 14114bbe3e3SMatthew Wilcox (Oracle) } 14214bbe3e3SMatthew Wilcox (Oracle) 14314bbe3e3SMatthew Wilcox (Oracle)Removing or replacing existing data in an rbtree 14414bbe3e3SMatthew Wilcox (Oracle)------------------------------------------------ 14514bbe3e3SMatthew Wilcox (Oracle) 14614bbe3e3SMatthew Wilcox (Oracle)To remove an existing node from a tree, call:: 14714bbe3e3SMatthew Wilcox (Oracle) 14814bbe3e3SMatthew Wilcox (Oracle) void rb_erase(struct rb_node *victim, struct rb_root *tree); 14914bbe3e3SMatthew Wilcox (Oracle) 15014bbe3e3SMatthew Wilcox (Oracle)Example:: 15114bbe3e3SMatthew Wilcox (Oracle) 15214bbe3e3SMatthew Wilcox (Oracle) struct mytype *data = mysearch(&mytree, "walrus"); 15314bbe3e3SMatthew Wilcox (Oracle) 15414bbe3e3SMatthew Wilcox (Oracle) if (data) { 15514bbe3e3SMatthew Wilcox (Oracle) rb_erase(&data->node, &mytree); 15614bbe3e3SMatthew Wilcox (Oracle) myfree(data); 15714bbe3e3SMatthew Wilcox (Oracle) } 15814bbe3e3SMatthew Wilcox (Oracle) 15914bbe3e3SMatthew Wilcox (Oracle)To replace an existing node in a tree with a new one with the same key, call:: 16014bbe3e3SMatthew Wilcox (Oracle) 16114bbe3e3SMatthew Wilcox (Oracle) void rb_replace_node(struct rb_node *old, struct rb_node *new, 16214bbe3e3SMatthew Wilcox (Oracle) struct rb_root *tree); 16314bbe3e3SMatthew Wilcox (Oracle) 16414bbe3e3SMatthew Wilcox (Oracle)Replacing a node this way does not re-sort the tree: If the new node doesn't 16514bbe3e3SMatthew Wilcox (Oracle)have the same key as the old node, the rbtree will probably become corrupted. 16614bbe3e3SMatthew Wilcox (Oracle) 16714bbe3e3SMatthew Wilcox (Oracle)Iterating through the elements stored in an rbtree (in sort order) 16814bbe3e3SMatthew Wilcox (Oracle)------------------------------------------------------------------ 16914bbe3e3SMatthew Wilcox (Oracle) 17014bbe3e3SMatthew Wilcox (Oracle)Four functions are provided for iterating through an rbtree's contents in 17114bbe3e3SMatthew Wilcox (Oracle)sorted order. These work on arbitrary trees, and should not need to be 17214bbe3e3SMatthew Wilcox (Oracle)modified or wrapped (except for locking purposes):: 17314bbe3e3SMatthew Wilcox (Oracle) 17414bbe3e3SMatthew Wilcox (Oracle) struct rb_node *rb_first(struct rb_root *tree); 17514bbe3e3SMatthew Wilcox (Oracle) struct rb_node *rb_last(struct rb_root *tree); 17614bbe3e3SMatthew Wilcox (Oracle) struct rb_node *rb_next(struct rb_node *node); 17714bbe3e3SMatthew Wilcox (Oracle) struct rb_node *rb_prev(struct rb_node *node); 17814bbe3e3SMatthew Wilcox (Oracle) 17914bbe3e3SMatthew Wilcox (Oracle)To start iterating, call rb_first() or rb_last() with a pointer to the root 18014bbe3e3SMatthew Wilcox (Oracle)of the tree, which will return a pointer to the node structure contained in 18114bbe3e3SMatthew Wilcox (Oracle)the first or last element in the tree. To continue, fetch the next or previous 18214bbe3e3SMatthew Wilcox (Oracle)node by calling rb_next() or rb_prev() on the current node. This will return 18314bbe3e3SMatthew Wilcox (Oracle)NULL when there are no more nodes left. 18414bbe3e3SMatthew Wilcox (Oracle) 18514bbe3e3SMatthew Wilcox (Oracle)The iterator functions return a pointer to the embedded struct rb_node, from 18614bbe3e3SMatthew Wilcox (Oracle)which the containing data structure may be accessed with the container_of() 18714bbe3e3SMatthew Wilcox (Oracle)macro, and individual members may be accessed directly via 18814bbe3e3SMatthew Wilcox (Oracle)rb_entry(node, type, member). 18914bbe3e3SMatthew Wilcox (Oracle) 19014bbe3e3SMatthew Wilcox (Oracle)Example:: 19114bbe3e3SMatthew Wilcox (Oracle) 19214bbe3e3SMatthew Wilcox (Oracle) struct rb_node *node; 19314bbe3e3SMatthew Wilcox (Oracle) for (node = rb_first(&mytree); node; node = rb_next(node)) 19414bbe3e3SMatthew Wilcox (Oracle) printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring); 19514bbe3e3SMatthew Wilcox (Oracle) 19614bbe3e3SMatthew Wilcox (Oracle)Cached rbtrees 19714bbe3e3SMatthew Wilcox (Oracle)-------------- 19814bbe3e3SMatthew Wilcox (Oracle) 19914bbe3e3SMatthew Wilcox (Oracle)Computing the leftmost (smallest) node is quite a common task for binary 20014bbe3e3SMatthew Wilcox (Oracle)search trees, such as for traversals or users relying on a the particular 20114bbe3e3SMatthew Wilcox (Oracle)order for their own logic. To this end, users can use 'struct rb_root_cached' 20214bbe3e3SMatthew Wilcox (Oracle)to optimize O(logN) rb_first() calls to a simple pointer fetch avoiding 20314bbe3e3SMatthew Wilcox (Oracle)potentially expensive tree iterations. This is done at negligible runtime 204*399bfc8bSBhaskar Chowdhuryoverhead for maintenance; albeit larger memory footprint. 20514bbe3e3SMatthew Wilcox (Oracle) 20614bbe3e3SMatthew Wilcox (Oracle)Similar to the rb_root structure, cached rbtrees are initialized to be 20714bbe3e3SMatthew Wilcox (Oracle)empty via:: 20814bbe3e3SMatthew Wilcox (Oracle) 20914bbe3e3SMatthew Wilcox (Oracle) struct rb_root_cached mytree = RB_ROOT_CACHED; 21014bbe3e3SMatthew Wilcox (Oracle) 21114bbe3e3SMatthew Wilcox (Oracle)Cached rbtree is simply a regular rb_root with an extra pointer to cache the 21214bbe3e3SMatthew Wilcox (Oracle)leftmost node. This allows rb_root_cached to exist wherever rb_root does, 21314bbe3e3SMatthew Wilcox (Oracle)which permits augmented trees to be supported as well as only a few extra 21414bbe3e3SMatthew Wilcox (Oracle)interfaces:: 21514bbe3e3SMatthew Wilcox (Oracle) 21614bbe3e3SMatthew Wilcox (Oracle) struct rb_node *rb_first_cached(struct rb_root_cached *tree); 21714bbe3e3SMatthew Wilcox (Oracle) void rb_insert_color_cached(struct rb_node *, struct rb_root_cached *, bool); 21814bbe3e3SMatthew Wilcox (Oracle) void rb_erase_cached(struct rb_node *node, struct rb_root_cached *); 21914bbe3e3SMatthew Wilcox (Oracle) 22014bbe3e3SMatthew Wilcox (Oracle)Both insert and erase calls have their respective counterpart of augmented 22114bbe3e3SMatthew Wilcox (Oracle)trees:: 22214bbe3e3SMatthew Wilcox (Oracle) 22314bbe3e3SMatthew Wilcox (Oracle) void rb_insert_augmented_cached(struct rb_node *node, struct rb_root_cached *, 22414bbe3e3SMatthew Wilcox (Oracle) bool, struct rb_augment_callbacks *); 22514bbe3e3SMatthew Wilcox (Oracle) void rb_erase_augmented_cached(struct rb_node *, struct rb_root_cached *, 22614bbe3e3SMatthew Wilcox (Oracle) struct rb_augment_callbacks *); 22714bbe3e3SMatthew Wilcox (Oracle) 22814bbe3e3SMatthew Wilcox (Oracle) 22914bbe3e3SMatthew Wilcox (Oracle)Support for Augmented rbtrees 23014bbe3e3SMatthew Wilcox (Oracle)----------------------------- 23114bbe3e3SMatthew Wilcox (Oracle) 23214bbe3e3SMatthew Wilcox (Oracle)Augmented rbtree is an rbtree with "some" additional data stored in 23314bbe3e3SMatthew Wilcox (Oracle)each node, where the additional data for node N must be a function of 23414bbe3e3SMatthew Wilcox (Oracle)the contents of all nodes in the subtree rooted at N. This data can 23514bbe3e3SMatthew Wilcox (Oracle)be used to augment some new functionality to rbtree. Augmented rbtree 23614bbe3e3SMatthew Wilcox (Oracle)is an optional feature built on top of basic rbtree infrastructure. 23714bbe3e3SMatthew Wilcox (Oracle)An rbtree user who wants this feature will have to call the augmentation 23814bbe3e3SMatthew Wilcox (Oracle)functions with the user provided augmentation callback when inserting 23914bbe3e3SMatthew Wilcox (Oracle)and erasing nodes. 24014bbe3e3SMatthew Wilcox (Oracle) 24114bbe3e3SMatthew Wilcox (Oracle)C files implementing augmented rbtree manipulation must include 24214bbe3e3SMatthew Wilcox (Oracle)<linux/rbtree_augmented.h> instead of <linux/rbtree.h>. Note that 24314bbe3e3SMatthew Wilcox (Oracle)linux/rbtree_augmented.h exposes some rbtree implementations details 24414bbe3e3SMatthew Wilcox (Oracle)you are not expected to rely on; please stick to the documented APIs 24514bbe3e3SMatthew Wilcox (Oracle)there and do not include <linux/rbtree_augmented.h> from header files 24614bbe3e3SMatthew Wilcox (Oracle)either so as to minimize chances of your users accidentally relying on 24714bbe3e3SMatthew Wilcox (Oracle)such implementation details. 24814bbe3e3SMatthew Wilcox (Oracle) 24914bbe3e3SMatthew Wilcox (Oracle)On insertion, the user must update the augmented information on the path 25014bbe3e3SMatthew Wilcox (Oracle)leading to the inserted node, then call rb_link_node() as usual and 25114bbe3e3SMatthew Wilcox (Oracle)rb_augment_inserted() instead of the usual rb_insert_color() call. 25214bbe3e3SMatthew Wilcox (Oracle)If rb_augment_inserted() rebalances the rbtree, it will callback into 25314bbe3e3SMatthew Wilcox (Oracle)a user provided function to update the augmented information on the 25414bbe3e3SMatthew Wilcox (Oracle)affected subtrees. 25514bbe3e3SMatthew Wilcox (Oracle) 25614bbe3e3SMatthew Wilcox (Oracle)When erasing a node, the user must call rb_erase_augmented() instead of 25714bbe3e3SMatthew Wilcox (Oracle)rb_erase(). rb_erase_augmented() calls back into user provided functions 25814bbe3e3SMatthew Wilcox (Oracle)to updated the augmented information on affected subtrees. 25914bbe3e3SMatthew Wilcox (Oracle) 26014bbe3e3SMatthew Wilcox (Oracle)In both cases, the callbacks are provided through struct rb_augment_callbacks. 26114bbe3e3SMatthew Wilcox (Oracle)3 callbacks must be defined: 26214bbe3e3SMatthew Wilcox (Oracle) 26314bbe3e3SMatthew Wilcox (Oracle)- A propagation callback, which updates the augmented value for a given 26414bbe3e3SMatthew Wilcox (Oracle) node and its ancestors, up to a given stop point (or NULL to update 26514bbe3e3SMatthew Wilcox (Oracle) all the way to the root). 26614bbe3e3SMatthew Wilcox (Oracle) 26714bbe3e3SMatthew Wilcox (Oracle)- A copy callback, which copies the augmented value for a given subtree 26814bbe3e3SMatthew Wilcox (Oracle) to a newly assigned subtree root. 26914bbe3e3SMatthew Wilcox (Oracle) 27014bbe3e3SMatthew Wilcox (Oracle)- A tree rotation callback, which copies the augmented value for a given 27114bbe3e3SMatthew Wilcox (Oracle) subtree to a newly assigned subtree root AND recomputes the augmented 27214bbe3e3SMatthew Wilcox (Oracle) information for the former subtree root. 27314bbe3e3SMatthew Wilcox (Oracle) 27414bbe3e3SMatthew Wilcox (Oracle)The compiled code for rb_erase_augmented() may inline the propagation and 27514bbe3e3SMatthew Wilcox (Oracle)copy callbacks, which results in a large function, so each augmented rbtree 27614bbe3e3SMatthew Wilcox (Oracle)user should have a single rb_erase_augmented() call site in order to limit 27714bbe3e3SMatthew Wilcox (Oracle)compiled code size. 27814bbe3e3SMatthew Wilcox (Oracle) 27914bbe3e3SMatthew Wilcox (Oracle) 28014bbe3e3SMatthew Wilcox (Oracle)Sample usage 28114bbe3e3SMatthew Wilcox (Oracle)^^^^^^^^^^^^ 28214bbe3e3SMatthew Wilcox (Oracle) 28314bbe3e3SMatthew Wilcox (Oracle)Interval tree is an example of augmented rb tree. Reference - 28414bbe3e3SMatthew Wilcox (Oracle)"Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein. 28514bbe3e3SMatthew Wilcox (Oracle)More details about interval trees: 28614bbe3e3SMatthew Wilcox (Oracle) 28714bbe3e3SMatthew Wilcox (Oracle)Classical rbtree has a single key and it cannot be directly used to store 28814bbe3e3SMatthew Wilcox (Oracle)interval ranges like [lo:hi] and do a quick lookup for any overlap with a new 28914bbe3e3SMatthew Wilcox (Oracle)lo:hi or to find whether there is an exact match for a new lo:hi. 29014bbe3e3SMatthew Wilcox (Oracle) 29114bbe3e3SMatthew Wilcox (Oracle)However, rbtree can be augmented to store such interval ranges in a structured 29214bbe3e3SMatthew Wilcox (Oracle)way making it possible to do efficient lookup and exact match. 29314bbe3e3SMatthew Wilcox (Oracle) 29414bbe3e3SMatthew Wilcox (Oracle)This "extra information" stored in each node is the maximum hi 29514bbe3e3SMatthew Wilcox (Oracle)(max_hi) value among all the nodes that are its descendants. This 29614bbe3e3SMatthew Wilcox (Oracle)information can be maintained at each node just be looking at the node 29714bbe3e3SMatthew Wilcox (Oracle)and its immediate children. And this will be used in O(log n) lookup 29814bbe3e3SMatthew Wilcox (Oracle)for lowest match (lowest start address among all possible matches) 29914bbe3e3SMatthew Wilcox (Oracle)with something like:: 30014bbe3e3SMatthew Wilcox (Oracle) 30114bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node * 30214bbe3e3SMatthew Wilcox (Oracle) interval_tree_first_match(struct rb_root *root, 30314bbe3e3SMatthew Wilcox (Oracle) unsigned long start, unsigned long last) 30414bbe3e3SMatthew Wilcox (Oracle) { 30514bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *node; 30614bbe3e3SMatthew Wilcox (Oracle) 30714bbe3e3SMatthew Wilcox (Oracle) if (!root->rb_node) 30814bbe3e3SMatthew Wilcox (Oracle) return NULL; 30914bbe3e3SMatthew Wilcox (Oracle) node = rb_entry(root->rb_node, struct interval_tree_node, rb); 31014bbe3e3SMatthew Wilcox (Oracle) 31114bbe3e3SMatthew Wilcox (Oracle) while (true) { 31214bbe3e3SMatthew Wilcox (Oracle) if (node->rb.rb_left) { 31314bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *left = 31414bbe3e3SMatthew Wilcox (Oracle) rb_entry(node->rb.rb_left, 31514bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node, rb); 31614bbe3e3SMatthew Wilcox (Oracle) if (left->__subtree_last >= start) { 31714bbe3e3SMatthew Wilcox (Oracle) /* 31814bbe3e3SMatthew Wilcox (Oracle) * Some nodes in left subtree satisfy Cond2. 31914bbe3e3SMatthew Wilcox (Oracle) * Iterate to find the leftmost such node N. 32014bbe3e3SMatthew Wilcox (Oracle) * If it also satisfies Cond1, that's the match 32114bbe3e3SMatthew Wilcox (Oracle) * we are looking for. Otherwise, there is no 32214bbe3e3SMatthew Wilcox (Oracle) * matching interval as nodes to the right of N 32314bbe3e3SMatthew Wilcox (Oracle) * can't satisfy Cond1 either. 32414bbe3e3SMatthew Wilcox (Oracle) */ 32514bbe3e3SMatthew Wilcox (Oracle) node = left; 32614bbe3e3SMatthew Wilcox (Oracle) continue; 32714bbe3e3SMatthew Wilcox (Oracle) } 32814bbe3e3SMatthew Wilcox (Oracle) } 32914bbe3e3SMatthew Wilcox (Oracle) if (node->start <= last) { /* Cond1 */ 33014bbe3e3SMatthew Wilcox (Oracle) if (node->last >= start) /* Cond2 */ 33114bbe3e3SMatthew Wilcox (Oracle) return node; /* node is leftmost match */ 33214bbe3e3SMatthew Wilcox (Oracle) if (node->rb.rb_right) { 33314bbe3e3SMatthew Wilcox (Oracle) node = rb_entry(node->rb.rb_right, 33414bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node, rb); 33514bbe3e3SMatthew Wilcox (Oracle) if (node->__subtree_last >= start) 33614bbe3e3SMatthew Wilcox (Oracle) continue; 33714bbe3e3SMatthew Wilcox (Oracle) } 33814bbe3e3SMatthew Wilcox (Oracle) } 33914bbe3e3SMatthew Wilcox (Oracle) return NULL; /* No match */ 34014bbe3e3SMatthew Wilcox (Oracle) } 34114bbe3e3SMatthew Wilcox (Oracle) } 34214bbe3e3SMatthew Wilcox (Oracle) 34314bbe3e3SMatthew Wilcox (Oracle)Insertion/removal are defined using the following augmented callbacks:: 34414bbe3e3SMatthew Wilcox (Oracle) 34514bbe3e3SMatthew Wilcox (Oracle) static inline unsigned long 34614bbe3e3SMatthew Wilcox (Oracle) compute_subtree_last(struct interval_tree_node *node) 34714bbe3e3SMatthew Wilcox (Oracle) { 34814bbe3e3SMatthew Wilcox (Oracle) unsigned long max = node->last, subtree_last; 34914bbe3e3SMatthew Wilcox (Oracle) if (node->rb.rb_left) { 35014bbe3e3SMatthew Wilcox (Oracle) subtree_last = rb_entry(node->rb.rb_left, 35114bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node, rb)->__subtree_last; 35214bbe3e3SMatthew Wilcox (Oracle) if (max < subtree_last) 35314bbe3e3SMatthew Wilcox (Oracle) max = subtree_last; 35414bbe3e3SMatthew Wilcox (Oracle) } 35514bbe3e3SMatthew Wilcox (Oracle) if (node->rb.rb_right) { 35614bbe3e3SMatthew Wilcox (Oracle) subtree_last = rb_entry(node->rb.rb_right, 35714bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node, rb)->__subtree_last; 35814bbe3e3SMatthew Wilcox (Oracle) if (max < subtree_last) 35914bbe3e3SMatthew Wilcox (Oracle) max = subtree_last; 36014bbe3e3SMatthew Wilcox (Oracle) } 36114bbe3e3SMatthew Wilcox (Oracle) return max; 36214bbe3e3SMatthew Wilcox (Oracle) } 36314bbe3e3SMatthew Wilcox (Oracle) 36414bbe3e3SMatthew Wilcox (Oracle) static void augment_propagate(struct rb_node *rb, struct rb_node *stop) 36514bbe3e3SMatthew Wilcox (Oracle) { 36614bbe3e3SMatthew Wilcox (Oracle) while (rb != stop) { 36714bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *node = 36814bbe3e3SMatthew Wilcox (Oracle) rb_entry(rb, struct interval_tree_node, rb); 36914bbe3e3SMatthew Wilcox (Oracle) unsigned long subtree_last = compute_subtree_last(node); 37014bbe3e3SMatthew Wilcox (Oracle) if (node->__subtree_last == subtree_last) 37114bbe3e3SMatthew Wilcox (Oracle) break; 37214bbe3e3SMatthew Wilcox (Oracle) node->__subtree_last = subtree_last; 37314bbe3e3SMatthew Wilcox (Oracle) rb = rb_parent(&node->rb); 37414bbe3e3SMatthew Wilcox (Oracle) } 37514bbe3e3SMatthew Wilcox (Oracle) } 37614bbe3e3SMatthew Wilcox (Oracle) 37714bbe3e3SMatthew Wilcox (Oracle) static void augment_copy(struct rb_node *rb_old, struct rb_node *rb_new) 37814bbe3e3SMatthew Wilcox (Oracle) { 37914bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *old = 38014bbe3e3SMatthew Wilcox (Oracle) rb_entry(rb_old, struct interval_tree_node, rb); 38114bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *new = 38214bbe3e3SMatthew Wilcox (Oracle) rb_entry(rb_new, struct interval_tree_node, rb); 38314bbe3e3SMatthew Wilcox (Oracle) 38414bbe3e3SMatthew Wilcox (Oracle) new->__subtree_last = old->__subtree_last; 38514bbe3e3SMatthew Wilcox (Oracle) } 38614bbe3e3SMatthew Wilcox (Oracle) 38714bbe3e3SMatthew Wilcox (Oracle) static void augment_rotate(struct rb_node *rb_old, struct rb_node *rb_new) 38814bbe3e3SMatthew Wilcox (Oracle) { 38914bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *old = 39014bbe3e3SMatthew Wilcox (Oracle) rb_entry(rb_old, struct interval_tree_node, rb); 39114bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *new = 39214bbe3e3SMatthew Wilcox (Oracle) rb_entry(rb_new, struct interval_tree_node, rb); 39314bbe3e3SMatthew Wilcox (Oracle) 39414bbe3e3SMatthew Wilcox (Oracle) new->__subtree_last = old->__subtree_last; 39514bbe3e3SMatthew Wilcox (Oracle) old->__subtree_last = compute_subtree_last(old); 39614bbe3e3SMatthew Wilcox (Oracle) } 39714bbe3e3SMatthew Wilcox (Oracle) 39814bbe3e3SMatthew Wilcox (Oracle) static const struct rb_augment_callbacks augment_callbacks = { 39914bbe3e3SMatthew Wilcox (Oracle) augment_propagate, augment_copy, augment_rotate 40014bbe3e3SMatthew Wilcox (Oracle) }; 40114bbe3e3SMatthew Wilcox (Oracle) 40214bbe3e3SMatthew Wilcox (Oracle) void interval_tree_insert(struct interval_tree_node *node, 40314bbe3e3SMatthew Wilcox (Oracle) struct rb_root *root) 40414bbe3e3SMatthew Wilcox (Oracle) { 40514bbe3e3SMatthew Wilcox (Oracle) struct rb_node **link = &root->rb_node, *rb_parent = NULL; 40614bbe3e3SMatthew Wilcox (Oracle) unsigned long start = node->start, last = node->last; 40714bbe3e3SMatthew Wilcox (Oracle) struct interval_tree_node *parent; 40814bbe3e3SMatthew Wilcox (Oracle) 40914bbe3e3SMatthew Wilcox (Oracle) while (*link) { 41014bbe3e3SMatthew Wilcox (Oracle) rb_parent = *link; 41114bbe3e3SMatthew Wilcox (Oracle) parent = rb_entry(rb_parent, struct interval_tree_node, rb); 41214bbe3e3SMatthew Wilcox (Oracle) if (parent->__subtree_last < last) 41314bbe3e3SMatthew Wilcox (Oracle) parent->__subtree_last = last; 41414bbe3e3SMatthew Wilcox (Oracle) if (start < parent->start) 41514bbe3e3SMatthew Wilcox (Oracle) link = &parent->rb.rb_left; 41614bbe3e3SMatthew Wilcox (Oracle) else 41714bbe3e3SMatthew Wilcox (Oracle) link = &parent->rb.rb_right; 41814bbe3e3SMatthew Wilcox (Oracle) } 41914bbe3e3SMatthew Wilcox (Oracle) 42014bbe3e3SMatthew Wilcox (Oracle) node->__subtree_last = last; 42114bbe3e3SMatthew Wilcox (Oracle) rb_link_node(&node->rb, rb_parent, link); 42214bbe3e3SMatthew Wilcox (Oracle) rb_insert_augmented(&node->rb, root, &augment_callbacks); 42314bbe3e3SMatthew Wilcox (Oracle) } 42414bbe3e3SMatthew Wilcox (Oracle) 42514bbe3e3SMatthew Wilcox (Oracle) void interval_tree_remove(struct interval_tree_node *node, 42614bbe3e3SMatthew Wilcox (Oracle) struct rb_root *root) 42714bbe3e3SMatthew Wilcox (Oracle) { 42814bbe3e3SMatthew Wilcox (Oracle) rb_erase_augmented(&node->rb, root, &augment_callbacks); 42914bbe3e3SMatthew Wilcox (Oracle) } 430