===================================================
Adding reference counters (krefs) to kernel objects
===================================================

:Author: Corey Minyard <minyard@acm.org>
:Author: Thomas Hellstrom <thellstrom@vmware.com>

A lot of this was lifted from Greg Kroah-Hartman's 2004 OLS paper and
presentation on krefs, which can be found at:

  - http://www.kroah.com/linux/talks/ols_2004_kref_paper/Reprint-Kroah-Hartman-OLS2004.pdf
  - http://www.kroah.com/linux/talks/ols_2004_kref_talk/

Introduction
============

krefs allow you to add reference counters to your objects.  If you
have objects that are used in multiple places and passed around, and
you don't have refcounts, your code is almost certainly broken.  If
you want refcounts, krefs are the way to go.

To use a kref, add one to your data structures like::

    struct my_data
    {
        .
        .
        struct kref refcount;
        .
        .
    };

The kref can occur anywhere within the data structure.

Initialization
==============

You must initialize the kref after you allocate it.  To do this, call
kref_init() like so::

    struct my_data *data;

    data = kmalloc(sizeof(*data), GFP_KERNEL);
    if (!data)
        return -ENOMEM;
    kref_init(&data->refcount);

This sets the refcount in the kref to 1.

Kref rules
==========

Once you have an initialized kref, you must follow these rules:

1) If you make a non-temporary copy of a pointer, especially if
   it can be passed to another thread of execution, you must
   increment the refcount with kref_get() before passing it off::

       kref_get(&data->refcount);

   If you already have a valid pointer to a kref-ed structure (the
   refcount cannot go to zero) you may do this without a lock.

2) When you are done with a pointer, you must call kref_put()::

       kref_put(&data->refcount, data_release);

   If this is the last reference to the pointer, the release
   routine will be called.  If the code never tries to get
   a valid pointer to a kref-ed structure without already
   holding a valid pointer, it is safe to do this without
   a lock.

3) If the code attempts to gain a reference to a kref-ed structure
   without already holding a valid pointer, it must serialize access
   so that a kref_put() cannot occur during the kref_get(), and the
   structure must remain valid during the kref_get().

For example, if you allocate some data and then pass it to another
thread to process::

    void data_release(struct kref *ref)
    {
        struct my_data *data = container_of(ref, struct my_data, refcount);
        kfree(data);
    }

    /* kthread_run() expects a thread function that returns int. */
    int more_data_handling(void *cb_data)
    {
        struct my_data *data = cb_data;
        .
        . do stuff with data here
        .
        kref_put(&data->refcount, data_release);
        return 0;
    }

    int my_data_handler(void)
    {
        int rv = 0;
        struct my_data *data;
        struct task_struct *task;

        data = kmalloc(sizeof(*data), GFP_KERNEL);
        if (!data)
            return -ENOMEM;
        kref_init(&data->refcount);

        kref_get(&data->refcount);
        task = kthread_run(more_data_handling, data, "more_data_handling");
        if (IS_ERR(task)) {
            rv = PTR_ERR(task);
            /* The thread never started, so drop the reference taken for it. */
            kref_put(&data->refcount, data_release);
            goto out;
        }

        .
        . do stuff with data here
        .
    out:
        kref_put(&data->refcount, data_release);
        return rv;
    }

This way, it doesn't matter in what order the two threads handle the
data; kref_put() takes care of knowing when the data is no longer
referenced and releasing it.  The kref_get() does not require a lock,
since we already have a valid pointer that we own a refcount for.
The put needs no lock because nothing tries to get the data without
already holding a pointer.

In the above example, kref_put() will be called 2 times in both success
and error paths.  This is necessary because kref_init() set the reference
count to 1 and kref_get() then raised it to 2.

Note that the "before" in rule 1 is very important.  You should never
do something like::

    task = kthread_run(more_data_handling, data, "more_data_handling");
    if (IS_ERR(task)) {
        rv = PTR_ERR(task);
        goto out;
    } else
        /* BAD BAD BAD - get is after the handoff */
        kref_get(&data->refcount);

Don't assume you know what you are doing and use the above construct.
First of all, you may not know what you are doing.  Second, you may
know what you are doing (there are some situations where locking is
involved where the above may be legal) but someone else who doesn't
know what they are doing may change the code or copy the code.  It's
bad style.  Don't do it.

There are some situations where you can optimize the gets and puts.
For instance, if you are done with an object and enqueuing it for
something else or passing it off to something else, there is no reason
to do a get then a put::

    /* Silly extra get and put */
    kref_get(&obj->ref);
    enqueue(obj);
    kref_put(&obj->ref, obj_cleanup);

Just do the enqueue.  A comment about this is always welcome::

    enqueue(obj);
    /* We are done with obj, so we pass our refcount off
       to the queue.  DON'T TOUCH obj AFTER HERE! */

The last rule (rule 3) is the nastiest one to handle.  Say, for
instance, you have a list of items that are each kref-ed, and you wish
to get the first one.  You can't just pull the first item off the list
and kref_get() it.  That violates rule 3 because you are not already
holding a valid pointer.  You must add a mutex (or some other lock).
For instance::

    static DEFINE_MUTEX(mutex);
    static LIST_HEAD(q);
    struct my_data
    {
        struct kref      refcount;
        struct list_head link;
    };

    static struct my_data *get_entry(void)
    {
        struct my_data *entry = NULL;
        mutex_lock(&mutex);
        if (!list_empty(&q)) {
            entry = container_of(q.next, struct my_data, link);
            kref_get(&entry->refcount);
        }
        mutex_unlock(&mutex);
        return entry;
    }

    static void release_entry(struct kref *ref)
    {
        struct my_data *entry = container_of(ref, struct my_data, refcount);

        list_del(&entry->link);
        kfree(entry);
    }

    static void put_entry(struct my_data *entry)
    {
        mutex_lock(&mutex);
        kref_put(&entry->refcount, release_entry);
        mutex_unlock(&mutex);
    }

The kref_put() return
value is useful if you do not want to hold the
lock during the whole release operation.  Say you didn't want to call
kfree() with the lock held in the example above (since it is kind of
pointless to do so).  You could use kref_put() as follows::

    static void release_entry(struct kref *ref)
    {
        /* All work is done after the return from kref_put(). */
    }

    static void put_entry(struct my_data *entry)
    {
        mutex_lock(&mutex);
        if (kref_put(&entry->refcount, release_entry)) {
            list_del(&entry->link);
            mutex_unlock(&mutex);
            kfree(entry);
        } else
            mutex_unlock(&mutex);
    }

This is really more useful if you have to call other routines as part
of the free operations that could take a long time or might claim the
same lock.  Note that doing everything in the release routine is still
preferred as it is a little neater.
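
To make the return-value convention concrete, here is a minimal userspace
sketch.  It is not the kernel implementation: it models a kref with a C11
atomic counter, and the names toy_kref, toy_kref_init(), toy_kref_get(),
toy_kref_put(), toy_release() and toy_released are hypothetical stand-ins
chosen for illustration.  It only mirrors the behaviour described above, where
the put returns 1 if it dropped the last reference and called the release
routine, and 0 otherwise.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct kref (assumption, not kernel code). */
struct toy_kref {
    atomic_int refcount;
};

/* Like kref_init(): the creator starts out holding one reference. */
static void toy_kref_init(struct toy_kref *k)
{
    atomic_init(&k->refcount, 1);
}

/* Like kref_get(): only legal while the caller already holds a reference. */
static void toy_kref_get(struct toy_kref *k)
{
    int old = atomic_fetch_add(&k->refcount, 1);

    /* A count of zero here would mean the object was already released. */
    assert(old > 0);
    (void)old;
}

/*
 * Like kref_put(): drop one reference.  Returns 1 if this was the last
 * reference and release() was called, 0 otherwise.
 */
static int toy_kref_put(struct toy_kref *k, void (*release)(struct toy_kref *))
{
    if (atomic_fetch_sub(&k->refcount, 1) == 1) {
        release(k);
        return 1;
    }
    return 0;
}

/* Counts how often the release routine ran, for demonstration only. */
static int toy_released;

static void toy_release(struct toy_kref *k)
{
    (void)k;
    toy_released++;
}
```

With one toy_kref_init() and one toy_kref_get(), the first toy_kref_put()
returns 0 and the second returns 1, which is the point at which a caller such
as put_entry() above would do its deferred cleanup.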

The above example could also be optimized using kref_get_unless_zero() in
the following way::

    static struct my_data *get_entry(void)
    {
        struct my_data *entry = NULL;
        mutex_lock(&mutex);
        if (!list_empty(&q)) {
            entry = container_of(q.next, struct my_data, link);
            if (!kref_get_unless_zero(&entry->refcount))
                entry = NULL;
        }
        mutex_unlock(&mutex);
        return entry;
    }

    static void release_entry(struct kref *ref)
    {
        struct my_data *entry = container_of(ref, struct my_data, refcount);

        mutex_lock(&mutex);
        list_del(&entry->link);
        mutex_unlock(&mutex);
        kfree(entry);
    }

    static void put_entry(struct my_data *entry)
    {
        kref_put(&entry->refcount, release_entry);
    }

This removes the need for the mutex around kref_put() in put_entry(), but
it's important that kref_get_unless_zero() is enclosed in the same critical
section that finds the entry in the lookup table,
otherwise kref_get_unless_zero() may reference already freed memory.
Note that it is illegal to use kref_get_unless_zero() without checking its
return value.  If you are sure (by already having a valid pointer) that
kref_get_unless_zero() will return true, then use kref_get() instead.

Krefs and RCU
=============

The function kref_get_unless_zero() also makes it possible to use RCU
locking for lookups in the above example::

    struct my_data
    {
        struct rcu_head rhead;
        .
        struct kref refcount;
        .
        .
    };

    static struct my_data *get_entry_rcu(void)
    {
        struct my_data *entry = NULL;
        rcu_read_lock();
        if (!list_empty(&q)) {
            entry = container_of(q.next, struct my_data, link);
            if (!kref_get_unless_zero(&entry->refcount))
                entry = NULL;
        }
        rcu_read_unlock();
        return entry;
    }

    static void release_entry_rcu(struct kref *ref)
    {
        struct my_data *entry = container_of(ref, struct my_data, refcount);

        mutex_lock(&mutex);
        list_del_rcu(&entry->link);
        mutex_unlock(&mutex);
        kfree_rcu(entry, rhead);
    }

    static void put_entry(struct my_data *entry)
    {
        kref_put(&entry->refcount, release_entry_rcu);
    }

But note that the struct kref member needs to remain in valid memory for an
RCU grace period after release_entry_rcu() was called.
That can be accomplished
by using kfree_rcu(entry, rhead) as done above, or by calling synchronize_rcu()
before using kfree(), but note that synchronize_rcu() may sleep for a
substantial amount of time.
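
The reason the memory must stay valid is that an RCU reader may still find an
entry whose count has already reached zero, and it has to read that counter in
order to refuse the reference.  The following userspace sketch illustrates the
"get unless zero" semantics with a C11 compare-and-swap loop.  It is a model
under stated assumptions, not the kernel's code: toy_ref, toy_get_unless_zero()
and toy_lookup() are hypothetical names, and no RCU is involved here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a refcounted object (assumption). */
struct toy_ref {
    atomic_int refcount;
};

/*
 * Take a reference only if the count is still non-zero.  Returns true if
 * a reference was taken, false if the object is already on its way out.
 */
static bool toy_get_unless_zero(struct toy_ref *r)
{
    int old = atomic_load(&r->refcount);

    while (old != 0) {
        /* A negative count would indicate a put without a matching get. */
        assert(old > 0);
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&r->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/*
 * Lookup helper in the style of get_entry_rcu(): the candidate is only
 * handed back to the caller if a reference could be taken.
 */
static struct toy_ref *toy_lookup(struct toy_ref *candidate)
{
    if (candidate && toy_get_unless_zero(candidate))
        return candidate;
    return NULL;
}
```

A count of zero is never raised again, so a reader that loses the race with
the final put simply gets NULL back, exactly as get_entry_rcu() does when
kref_get_unless_zero() fails.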