===========================
Unreliable Guide To Locking
===========================

:Author: Rusty Russell

Introduction
============

Welcome to Rusty's Remarkably Unreliable Guide to Kernel Locking
issues. This document describes the locking systems in the Linux Kernel
in 2.6.

With the wide availability of HyperThreading, and preemption in the
Linux Kernel, everyone hacking on the kernel needs to know the
fundamentals of concurrency and locking for SMP.

The Problem With Concurrency
============================

(Skip this if you know what a Race Condition is).

In a normal program, you can increment a counter like so:

::

    very_important_count++;

This is what you would expect to happen:

.. table:: Expected Results

  +------------------------------------+------------------------------------+
  | Instance 1                         | Instance 2                         |
  +====================================+====================================+
  | read very_important_count (5)      |                                    |
  +------------------------------------+------------------------------------+
  | add 1 (6)                          |                                    |
  +------------------------------------+------------------------------------+
  | write very_important_count (6)     |                                    |
  +------------------------------------+------------------------------------+
  |                                    | read very_important_count (6)      |
  +------------------------------------+------------------------------------+
  |                                    | add 1 (7)                          |
  +------------------------------------+------------------------------------+
  |                                    | write very_important_count (7)     |
  +------------------------------------+------------------------------------+

This is what might happen:

.. table:: Possible Results

  +------------------------------------+------------------------------------+
  | Instance 1                         | Instance 2                         |
  +====================================+====================================+
  | read very_important_count (5)      |                                    |
  +------------------------------------+------------------------------------+
  |                                    | read very_important_count (5)      |
  +------------------------------------+------------------------------------+
  | add 1 (6)                          |                                    |
  +------------------------------------+------------------------------------+
  |                                    | add 1 (6)                          |
  +------------------------------------+------------------------------------+
  | write very_important_count (6)     |                                    |
  +------------------------------------+------------------------------------+
  |                                    | write very_important_count (6)     |
  +------------------------------------+------------------------------------+

Race Conditions and Critical Regions
------------------------------------

This overlap, where the result depends on the relative timing of
multiple tasks, is called a race condition. The piece of code containing
the concurrency issue is called a critical region. Especially since
Linux started running on SMP machines, race conditions became one of the
major issues in kernel design and implementation.

Preemption can have the same effect, even if there is only one CPU: by
preempting one task during the critical region, we have exactly the same
race condition. In this case the thread which preempts might run the
critical region itself.

The solution is to recognize when these simultaneous accesses occur, and
use locks to make sure that only one instance can enter the critical
region at any time. There are many friendly primitives in the Linux
kernel to help you do this. And then there are the unfriendly
primitives, but I'll pretend they don't exist.
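
Once you know which contexts are involved, the fix is usually one of the
lock primitives described in the rest of this guide. As a minimal sketch
(the function and lock names are made up for illustration, and the plain
:c:func:`spin_lock()` variant assumes no interrupt context ever touches
the counter), protecting the increment with a spinlock looks like this::

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(counter_lock);   /* protects very_important_count */
    static int very_important_count;

    void increment_counter(void)
    {
            /* Only one holder can be inside the critical region at a
             * time, so the read-modify-write can no longer interleave
             * as in the "Possible Results" table.  Which spin_lock_*()
             * variant is right depends on the contexts involved: that
             * choice is what the rest of this guide is about. */
            spin_lock(&counter_lock);
            very_important_count++;
            spin_unlock(&counter_lock);
    }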

Locking in the Linux Kernel
===========================

If I could give you one piece of advice: never sleep with anyone crazier
than yourself. But if I had to give you advice on locking: **keep it
simple**.

Be reluctant to introduce new locks.

Strangely enough, this last one is the exact reverse of my advice when
you **have** slept with someone crazier than yourself. And you should
think about getting a big dog.

Two Main Types of Kernel Locks: Spinlocks and Mutexes
-----------------------------------------------------

There are two main types of kernel locks. The fundamental type is the
spinlock (``include/asm/spinlock.h``), which is a very simple
single-holder lock: if you can't get the spinlock, you keep trying
(spinning) until you can. Spinlocks are very small and fast, and can be
used anywhere.

The second type is a mutex (``include/linux/mutex.h``): it is like a
spinlock, but you may block holding a mutex. If you can't lock a mutex,
your task will suspend itself, and be woken up when the mutex is
released. This means the CPU can do something else while you are
waiting. There are many cases when you simply can't sleep (see
`What Functions Are Safe To Call From Interrupts? <#sleeping-things>`__),
and so have to use a spinlock instead.

Neither type of lock is recursive: see
`Deadlock: Simple and Advanced <#deadlock>`__.

Locks and Uniprocessor Kernels
------------------------------

For kernels compiled without ``CONFIG_SMP``, and without
``CONFIG_PREEMPT``, spinlocks do not exist at all. This is an excellent
design decision: when no-one else can run at the same time, there is no
reason to have a lock.

If the kernel is compiled without ``CONFIG_SMP``, but ``CONFIG_PREEMPT``
is set, then spinlocks simply disable preemption, which is sufficient to
prevent any races. For most purposes, we can think of preemption as
equivalent to SMP, and not worry about it separately.

You should always test your locking code with ``CONFIG_SMP`` and
``CONFIG_PREEMPT`` enabled, even if you don't have an SMP test box,
because it will still catch some kinds of locking bugs.

Mutexes still exist, because they are required for synchronization
between user contexts, as we will see below.

Locking Only In User Context
----------------------------

If you have a data structure which is only ever accessed from user
context, then you can use a simple mutex (``include/linux/mutex.h``) to
protect it. This is the most trivial case: you initialize the mutex.
Then you can call :c:func:`mutex_lock_interruptible()` to grab the
mutex, and :c:func:`mutex_unlock()` to release it. There is also a
:c:func:`mutex_lock()`, which should be avoided, because it will
not return if a signal is received.

Example: ``net/netfilter/nf_sockopt.c`` allows registration of new
:c:func:`setsockopt()` and :c:func:`getsockopt()` calls, with
:c:func:`nf_register_sockopt()`. Registration and de-registration
are only done on module load and unload (and boot time, where there is
no concurrency), and the list of registrations is only consulted for an
unknown :c:func:`setsockopt()` or :c:func:`getsockopt()` system
call. The ``nf_sockopt_mutex`` is perfect to protect this, especially
since the setsockopt and getsockopt calls may well sleep.
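
To make the user-context case concrete, here is a minimal sketch of that
registration pattern. The names (``reg_mutex``, ``reg_list``,
``struct thing``, ``register_thing()``) are invented for this example
and are not taken from ``net/netfilter/nf_sockopt.c``::

    #include <linux/list.h>
    #include <linux/mutex.h>
    #include <linux/errno.h>

    struct thing {
            struct list_head list;
            /* ... */
    };

    static DEFINE_MUTEX(reg_mutex);          /* protects reg_list */
    static LIST_HEAD(reg_list);

    int register_thing(struct thing *t)
    {
            /* Only ever called from user context (module load), so
             * sleeping while waiting for the mutex is fine. */
            if (mutex_lock_interruptible(&reg_mutex))
                    return -EINTR;
            list_add(&t->list, &reg_list);
            mutex_unlock(&reg_mutex);
            return 0;
    }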

Locking Between User Context and Softirqs
-----------------------------------------

If a softirq shares data with user context, you have two problems.
Firstly, the current user context can be interrupted by a softirq, and
secondly, the critical region could be entered from another CPU. This is
where :c:func:`spin_lock_bh()` (``include/linux/spinlock.h``) is
used. It disables softirqs on that CPU, then grabs the lock.
:c:func:`spin_unlock_bh()` does the reverse. (The '_bh' suffix is
a historical reference to "Bottom Halves", the old name for software
interrupts. In a perfect world it would really be called
'spin_lock_softirq()'.)

Note that you can also use :c:func:`spin_lock_irq()` or
:c:func:`spin_lock_irqsave()` here, which stop hardware interrupts
as well: see `Hard IRQ Context <#hardirq-context>`__.

This works perfectly for UP as well: the spin lock vanishes, and this
macro simply becomes :c:func:`local_bh_disable()`
(``include/linux/interrupt.h``), which protects you from the softirq
being run.

Locking Between User Context and Tasklets
-----------------------------------------

This is exactly the same as above, because tasklets are actually run
from a softirq.

Locking Between User Context and Timers
---------------------------------------

This, too, is exactly the same as above, because timers are actually run
from a softirq. From a locking point of view, tasklets and timers are
identical.

Locking Between Tasklets/Timers
-------------------------------

Sometimes a tasklet or timer might want to share data with another
tasklet or timer.

The Same Tasklet/Timer
~~~~~~~~~~~~~~~~~~~~~~

Since a tasklet is never run on two CPUs at once, you don't need to
worry about your tasklet being reentrant (running twice at once), even
on SMP.

Different Tasklets/Timers
~~~~~~~~~~~~~~~~~~~~~~~~~

If another tasklet/timer wants to share data with your tasklet or
timer, you will both need to use :c:func:`spin_lock()` and
:c:func:`spin_unlock()` calls. :c:func:`spin_lock_bh()` is
unnecessary here, as you are already in a tasklet, and no other tasklet
or softirq will be run at the same time on the same CPU.

Locking Between Softirqs
------------------------

Often a softirq might want to share data with itself or a tasklet/timer.

The Same Softirq
~~~~~~~~~~~~~~~~

The same softirq can run on other CPUs: you can use a per-CPU array
(see `Per-CPU Data <#per-cpu>`__) for better performance. If you're
going so far as to use a softirq, you probably care about scalable
performance enough to justify the extra complexity.

You'll need to use :c:func:`spin_lock()` and
:c:func:`spin_unlock()` for shared data.

Different Softirqs
~~~~~~~~~~~~~~~~~~

You'll need to use :c:func:`spin_lock()` and
:c:func:`spin_unlock()` for shared data, whether it be a timer,
tasklet, different softirq or the same or another softirq: any of them
could be running on a different CPU.
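
To tie these rules together, here is a small sketch of data shared
between a tasklet and user context. The names are illustrative, and the
tasklet uses the classic :c:func:`DECLARE_TASKLET()` interface::

    #include <linux/interrupt.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(shared_lock);     /* protects shared_count */
    static unsigned long shared_count;

    /* Tasklet: already in softirq context, so plain spin_lock() is
     * enough; no other softirq can run on this CPU meanwhile. */
    static void count_tasklet_fn(unsigned long data)
    {
            spin_lock(&shared_lock);
            shared_count++;
            spin_unlock(&shared_lock);
    }
    static DECLARE_TASKLET(count_tasklet, count_tasklet_fn, 0);

    /* User context: must also keep softirqs off this CPU while we
     * hold the lock, hence spin_lock_bh(). */
    unsigned long read_shared_count(void)
    {
            unsigned long val;

            spin_lock_bh(&shared_lock);
            val = shared_count;
            spin_unlock_bh(&shared_lock);
            return val;
    }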

Hard IRQ Context
================

Hardware interrupts usually communicate with a tasklet or softirq.
Frequently this involves putting work in a queue, which the softirq will
take out.

Locking Between Hard IRQ and Softirqs/Tasklets
----------------------------------------------

If a hardware irq handler shares data with a softirq, you have two
concerns. Firstly, the softirq processing can be interrupted by a
hardware interrupt, and secondly, the critical region could be entered
by a hardware interrupt on another CPU. This is where
:c:func:`spin_lock_irq()` is used. It is defined to disable
interrupts on that cpu, then grab the lock.
:c:func:`spin_unlock_irq()` does the reverse.

The irq handler does not need to use :c:func:`spin_lock_irq()`, because
the softirq cannot run while the irq handler is running: it can use
:c:func:`spin_lock()`, which is slightly faster. The only exception
would be if a different hardware irq handler uses the same lock:
:c:func:`spin_lock_irq()` will stop that from interrupting us.

This works perfectly for UP as well: the spin lock vanishes, and this
macro simply becomes :c:func:`local_irq_disable()`
(``include/asm/smp.h``), which protects you from the softirq/tasklet/BH
being run.

:c:func:`spin_lock_irqsave()` (``include/linux/spinlock.h``) is a
variant which saves whether interrupts were on or off in a flags word,
which is passed to :c:func:`spin_unlock_irqrestore()`. This means
that the same code can be used inside a hard irq handler (where
interrupts are already off) and in softirqs (where the irq disabling is
required).

Note that softirqs (and hence tasklets and timers) are run on return
from hardware interrupts, so :c:func:`spin_lock_irq()` also stops
these. In that sense, :c:func:`spin_lock_irqsave()` is the most
general and powerful locking function.

Locking Between Two Hard IRQ Handlers
-------------------------------------

It is rare to have to share data between two IRQ handlers, but if you
do, :c:func:`spin_lock_irqsave()` should be used: it is
architecture-specific whether all interrupts are disabled inside irq
handlers themselves.
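
To tie the hard IRQ rules together, here is a sketch of an irq handler
that queues work for a tasklet. Everything here is illustrative
(``struct packet``, ``fetch_packet()`` and ``process_packet()`` are
hypothetical helpers); it simply applies the rules above: the handler
can use plain :c:func:`spin_lock()`, while the tasklet must use
:c:func:`spin_lock_irq()` to keep the handler off its CPU::

    #include <linux/interrupt.h>
    #include <linux/list.h>
    #include <linux/spinlock.h>

    struct packet {                          /* hypothetical work item */
            struct list_head list;
            /* ... */
    };

    static DEFINE_SPINLOCK(queue_lock);      /* protects pkt_queue */
    static LIST_HEAD(pkt_queue);
    static void pkt_tasklet_fn(unsigned long data);
    static DECLARE_TASKLET(pkt_tasklet, pkt_tasklet_fn, 0);

    /* Hard irq handler: the tasklet cannot run while we are running,
     * so plain spin_lock() is sufficient (and slightly faster). */
    static irqreturn_t my_irq_handler(int irq, void *dev)
    {
            struct packet *pkt = fetch_packet(dev);  /* hypothetical */

            spin_lock(&queue_lock);
            list_add_tail(&pkt->list, &pkt_queue);
            spin_unlock(&queue_lock);

            tasklet_schedule(&pkt_tasklet);
            return IRQ_HANDLED;
    }

    /* Tasklet (softirq context): must keep the irq handler off this
     * CPU while holding the lock, hence spin_lock_irq(). */
    static void pkt_tasklet_fn(unsigned long data)
    {
            spin_lock_irq(&queue_lock);
            while (!list_empty(&pkt_queue)) {
                    struct packet *pkt;

                    pkt = list_first_entry(&pkt_queue, struct packet, list);
                    list_del(&pkt->list);
                    process_packet(pkt);     /* hypothetical, must not sleep */
            }
            spin_unlock_irq(&queue_lock);
    }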

Cheat Sheet For Locking
=======================

Pete Zaitcev gives the following summary:

- If you are in a process context (any syscall) and want to lock other
  processes out, use a mutex. You can take a mutex and sleep
  (``copy_from_user()`` or ``kmalloc(x, GFP_KERNEL)``).

- Otherwise (== data can be touched in an interrupt), use
  :c:func:`spin_lock_irqsave()` and
  :c:func:`spin_unlock_irqrestore()`.

- Avoid holding a spinlock for more than 5 lines of code and across any
  function call (except accessors like :c:func:`readb()`).

Table of Minimum Requirements
-----------------------------

The following table lists the **minimum** locking requirements between
various contexts. In some cases, the same context can only be running on
one CPU at a time, so no locking is required for that context (eg. a
particular thread can only run on one CPU at a time, but if it needs to
share data with another thread, locking is required).

Remember the advice above: you can always use
:c:func:`spin_lock_irqsave()`, which is a superset of all other
spinlock primitives.

============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============
.              IRQ Handler A IRQ Handler B Softirq A Softirq B Tasklet A Tasklet B Timer A Timer B User Context A User Context B
============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============
IRQ Handler A  None
IRQ Handler B  SLIS          None
Softirq A      SLI           SLI           SL
Softirq B      SLI           SLI           SL        SL
Tasklet A      SLI           SLI           SL        SL        None
Tasklet B      SLI           SLI           SL        SL        SL        None
Timer A        SLI           SLI           SL        SL        SL        SL        None
Timer B        SLI           SLI           SL        SL        SL        SL        SL      None
User Context A SLI           SLI           SLBH      SLBH      SLBH      SLBH      SLBH    SLBH    None
User Context B SLI           SLI           SLBH      SLBH      SLBH      SLBH      SLBH    SLBH    MLI            None
============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============

Table: Table of Locking Requirements

+--------+----------------------------+
| SLIS   | spin_lock_irqsave          |
+--------+----------------------------+
| SLI    | spin_lock_irq              |
+--------+----------------------------+
| SL     | spin_lock                  |
+--------+----------------------------+
| SLBH   | spin_lock_bh               |
+--------+----------------------------+
| MLI    | mutex_lock_interruptible   |
+--------+----------------------------+

Table: Legend for Locking Requirements Table

The trylock Functions
=====================

There are functions that try to acquire a lock only once and immediately
return a value telling about success or failure to acquire the lock.
They can be used if you need no access to the data protected with the
lock when some other thread is holding the lock. You should acquire the
lock later if you then need access to the data protected with the lock.

:c:func:`spin_trylock()` does not spin but returns non-zero if it
acquires the spinlock on the first try or 0 if not. This function can be
used in all contexts like :c:func:`spin_lock()`: you must have
disabled the contexts that might interrupt you and acquire the spin
lock.

:c:func:`mutex_trylock()` does not suspend your task but returns
non-zero if it could lock the mutex on the first try or 0 if not. This
function cannot be safely used in hardware or software interrupt
contexts despite not sleeping.
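
The trylock functions make "do it now or not at all" patterns easy. As a
small sketch (the lock, counter and function names are made up), a
statistics update that must never wait might look like this::

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(stats_lock);      /* protects dropped_packets */
    static unsigned long dropped_packets;

    /* Best-effort update: if someone else holds the lock, skip the
     * update rather than spin.  Returns non-zero on success, like
     * spin_trylock() itself. */
    int try_count_drop(void)
    {
            if (!spin_trylock(&stats_lock))
                    return 0;                /* lock busy, give up */
            dropped_packets++;
            spin_unlock(&stats_lock);
            return 1;
    }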

Common Examples
===============

Let's step through a simple example: a cache of number to name mappings.
The cache keeps a count of how often each of the objects is used, and
when it gets full, throws out the least used one.

All In User Context
-------------------

For our first example, we assume that all operations are in user context
(ie. from system calls), so we can sleep. This means we can use a mutex
to protect the cache and all the objects within it. Here's the code::

    #include <linux/list.h>
    #include <linux/slab.h>
    #include <linux/string.h>
    #include <linux/mutex.h>
    #include <asm/errno.h>

    struct object
    {
            struct list_head list;
            int id;
            char name[32];
            int popularity;
    };

    /* Protects the cache, cache_num, and the objects within it */
    static DEFINE_MUTEX(cache_lock);
    static LIST_HEAD(cache);
    static unsigned int cache_num = 0;
    #define MAX_CACHE_SIZE 10

    /* Must be holding cache_lock */
    static struct object *__cache_find(int id)
    {
            struct object *i;

            list_for_each_entry(i, &cache, list)
                    if (i->id == id) {
                            i->popularity++;
                            return i;
                    }
            return NULL;
    }

    /* Must be holding cache_lock */
    static void __cache_delete(struct object *obj)
    {
            BUG_ON(!obj);
            list_del(&obj->list);
            kfree(obj);
            cache_num--;
    }

    /* Must be holding cache_lock */
    static void __cache_add(struct object *obj)
    {
            list_add(&obj->list, &cache);
            if (++cache_num > MAX_CACHE_SIZE) {
                    struct object *i, *outcast = NULL;
                    list_for_each_entry(i, &cache, list) {
                            if (!outcast || i->popularity < outcast->popularity)
                                    outcast = i;
                    }
                    __cache_delete(outcast);
            }
    }

    int cache_add(int id, const char *name)
    {
            struct object *obj;

            if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL)
                    return -ENOMEM;

            strlcpy(obj->name, name, sizeof(obj->name));
            obj->id = id;
            obj->popularity = 0;

            mutex_lock(&cache_lock);
            __cache_add(obj);
            mutex_unlock(&cache_lock);
            return 0;
    }

    void cache_delete(int id)
    {
            mutex_lock(&cache_lock);
            __cache_delete(__cache_find(id));
            mutex_unlock(&cache_lock);
    }

    int cache_find(int id, char *name)
    {
            struct object *obj;
            int ret = -ENOENT;

            mutex_lock(&cache_lock);
            obj = __cache_find(id);
            if (obj) {
                    ret = 0;
                    strcpy(name, obj->name);
            }
            mutex_unlock(&cache_lock);
            return ret;
    }

Note that we always make sure we have the cache_lock when we add,
delete, or look up the cache: both the cache infrastructure itself and
the contents of the objects are protected by the lock. In this case it's
easy, since we copy the data for the user, and never let them access the
objects directly.

There is a slight (and common) optimization here: in
:c:func:`cache_add()` we set up the fields of the object before
grabbing the lock. This is safe, as no-one else can access it until we
put it in cache.
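
As a usage sketch, this is how a caller might drive the interface above
(``remember_and_greet()`` and its behaviour are made up; ``cache_add()``
and ``cache_find()`` are the functions from the code above)::

    /* All in user context, so sleeping inside the cache calls is fine. */
    int remember_and_greet(int id, const char *name)
    {
            char found[32];
            int ret;

            ret = cache_add(id, name);
            if (ret)
                    return ret;

            if (cache_find(id, found) == 0)
                    printk(KERN_INFO "cache: %d is %s\n", id, found);
            return 0;
    }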

Accessing From Interrupt Context
--------------------------------

Now consider the case where :c:func:`cache_find()` can be called
from interrupt context: either a hardware interrupt or a softirq. An
example would be a timer which deletes objects from the cache.

The change is shown below, in standard patch format: the ``-`` are lines
which are taken away, and the ``+`` are lines which are added.

::

    --- cache.c.usercontext 2003-12-09 13:58:54.000000000 +1100
    +++ cache.c.interrupt 2003-12-09 14:07:49.000000000 +1100
    @@ -12,7 +12,7 @@
             int popularity;
     };

    -static DEFINE_MUTEX(cache_lock);
    +static DEFINE_SPINLOCK(cache_lock);
     static LIST_HEAD(cache);
     static unsigned int cache_num = 0;
     #define MAX_CACHE_SIZE 10
    @@ -55,6 +55,7 @@
     int cache_add(int id, const char *name)
     {
             struct object *obj;
    +        unsigned long flags;

             if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL)
                     return -ENOMEM;
    @@ -63,30 +64,33 @@
             obj->id = id;
             obj->popularity = 0;

    -        mutex_lock(&cache_lock);
    +        spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);
    -        mutex_unlock(&cache_lock);
    +        spin_unlock_irqrestore(&cache_lock, flags);
             return 0;
     }

     void cache_delete(int id)
     {
    -        mutex_lock(&cache_lock);
    +        unsigned long flags;
    +
    +        spin_lock_irqsave(&cache_lock, flags);
             __cache_delete(__cache_find(id));
    -        mutex_unlock(&cache_lock);
    +        spin_unlock_irqrestore(&cache_lock, flags);
     }

     int cache_find(int id, char *name)
     {
             struct object *obj;
             int ret = -ENOENT;
    +        unsigned long flags;

    -        mutex_lock(&cache_lock);
    +        spin_lock_irqsave(&cache_lock, flags);
             obj = __cache_find(id);
             if (obj) {
                     ret = 0;
                     strcpy(name, obj->name);
             }
    -        mutex_unlock(&cache_lock);
    +        spin_unlock_irqrestore(&cache_lock, flags);
             return ret;
     }

Note that :c:func:`spin_lock_irqsave()` will turn off interrupts if
they are on, and do nothing if they are already off (ie. we are already
in an interrupt handler); hence these functions are safe to call from
any context.

Unfortunately, :c:func:`cache_add()` calls :c:func:`kmalloc()`
with the ``GFP_KERNEL`` flag, which is only legal in user context. I
have assumed that :c:func:`cache_add()` is still only called in
user context, otherwise this should become a parameter to
:c:func:`cache_add()`.

Exposing Objects Outside This File
----------------------------------

If our objects contained more information, it might not be sufficient to
copy the information in and out: other parts of the code might want to
keep pointers to these objects, for example, rather than looking up the
id every time. This produces two problems.

The first problem is that we use the ``cache_lock`` to protect objects:
we'd need to make this non-static so the rest of the code can use it.
This makes locking trickier, as it is no longer all in one place.

The second problem is the lifetime problem: if another structure keeps a
pointer to an object, it presumably expects that pointer to remain
valid. Unfortunately, this is only guaranteed while you hold the lock,
otherwise someone might call :c:func:`cache_delete()` and even
worse, add another object, re-using the same address.

As there is only one lock, you can't hold it forever: no-one else would
get any work done.

The solution to this problem is to use a reference count: everyone who
has a pointer to the object increases it when they first get the object,
and drops the reference count when they're finished with it. Whoever
drops it to zero knows it is unused, and can actually delete it.

Here is the code::

    --- cache.c.interrupt 2003-12-09 14:25:43.000000000 +1100
    +++ cache.c.refcnt 2003-12-09 14:33:05.000000000 +1100
    @@ -7,6 +7,7 @@
     struct object
     {
             struct list_head list;
    +        unsigned int refcnt;
             int id;
             char name[32];
             int popularity;
    @@ -17,6 +18,35 @@
     static unsigned int cache_num = 0;
     #define MAX_CACHE_SIZE 10

    +static void __object_put(struct object *obj)
    +{
    +        if (--obj->refcnt == 0)
    +                kfree(obj);
    +}
    +
    +static void __object_get(struct object *obj)
    +{
    +        obj->refcnt++;
    +}
    +
    +void object_put(struct object *obj)
    +{
    +        unsigned long flags;
    +
    +        spin_lock_irqsave(&cache_lock, flags);
    +        __object_put(obj);
    +        spin_unlock_irqrestore(&cache_lock, flags);
    +}
    +
    +void object_get(struct object *obj)
    +{
    +        unsigned long flags;
    +
    +        spin_lock_irqsave(&cache_lock, flags);
    +        __object_get(obj);
    +        spin_unlock_irqrestore(&cache_lock, flags);
    +}
    +
     /* Must be holding cache_lock */
     static struct object *__cache_find(int id)
     {
    @@ -35,6 +65,7 @@
     {
             BUG_ON(!obj);
             list_del(&obj->list);
    +        __object_put(obj);
             cache_num--;
     }

    @@ -63,6 +94,7 @@
             strlcpy(obj->name, name, sizeof(obj->name));
             obj->id = id;
             obj->popularity = 0;
    +        obj->refcnt = 1;                /* The cache holds a reference */

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);
    @@ -79,18 +111,15 @@
             spin_unlock_irqrestore(&cache_lock, flags);
     }

    -int cache_find(int id, char *name)
    +struct object *cache_find(int id)
     {
             struct object *obj;
    -        int ret = -ENOENT;
             unsigned long flags;

             spin_lock_irqsave(&cache_lock, flags);
             obj = __cache_find(id);
    -        if (obj) {
    -                ret = 0;
    -                strcpy(name, obj->name);
    -        }
    +        if (obj)
    +                __object_get(obj);
             spin_unlock_irqrestore(&cache_lock, flags);
    -        return ret;
    +        return obj;
     }

We encapsulate the reference counting in the standard 'get' and 'put'
functions. Now we can return the object itself from
:c:func:`cache_find()`, which has the advantage that the user can
now sleep holding the object (eg. to :c:func:`copy_to_user()` the
name to userspace).

The other point to note is that I said a reference should be held for
every pointer to the object: thus the reference count is 1 when first
inserted into the cache. In some versions the framework does not hold a
reference count, but they are more complicated.

Using Atomic Operations For The Reference Count
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In practice, :c:type:`atomic_t` would usually be used for refcnt. There are a
number of atomic operations defined in ``include/asm/atomic.h``: these
are guaranteed to be seen atomically from all CPUs in the system, so no
lock is required. In this case, it is simpler than using spinlocks,
although for anything non-trivial using spinlocks is clearer. The
:c:func:`atomic_inc()` and :c:func:`atomic_dec_and_test()`
are used instead of the standard increment and decrement operators, and
the lock is no longer used to protect the reference count itself.

::

    --- cache.c.refcnt 2003-12-09 15:00:35.000000000 +1100
    +++ cache.c.refcnt-atomic 2003-12-11 15:49:42.000000000 +1100
    @@ -7,7 +7,7 @@
     struct object
     {
             struct list_head list;
    -        unsigned int refcnt;
    +        atomic_t refcnt;
             int id;
             char name[32];
             int popularity;
    @@ -18,33 +18,15 @@
     static unsigned int cache_num = 0;
     #define MAX_CACHE_SIZE 10

    -static void __object_put(struct object *obj)
    -{
    -        if (--obj->refcnt == 0)
    -                kfree(obj);
    -}
    -
    -static void __object_get(struct object *obj)
    -{
    -        obj->refcnt++;
    -}
    -
     void object_put(struct object *obj)
     {
    -        unsigned long flags;
    -
    -        spin_lock_irqsave(&cache_lock, flags);
    -        __object_put(obj);
    -        spin_unlock_irqrestore(&cache_lock, flags);
    +        if (atomic_dec_and_test(&obj->refcnt))
    +                kfree(obj);
     }

     void object_get(struct object *obj)
     {
    -        unsigned long flags;
    -
    -        spin_lock_irqsave(&cache_lock, flags);
    -        __object_get(obj);
    -        spin_unlock_irqrestore(&cache_lock, flags);
    +        atomic_inc(&obj->refcnt);
     }

     /* Must be holding cache_lock */
    @@ -65,7 +47,7 @@
     {
             BUG_ON(!obj);
             list_del(&obj->list);
    -        __object_put(obj);
    +        object_put(obj);
             cache_num--;
     }

    @@ -94,7 +76,7 @@
             strlcpy(obj->name, name, sizeof(obj->name));
             obj->id = id;
             obj->popularity = 0;
    -        obj->refcnt = 1;                /* The cache holds a reference */
    +        atomic_set(&obj->refcnt, 1);    /* The cache holds a reference */

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);
    @@ -119,7 +101,7 @@
             spin_lock_irqsave(&cache_lock, flags);
             obj = __cache_find(id);
             if (obj)
    -                __object_get(obj);
    +                object_get(obj);
             spin_unlock_irqrestore(&cache_lock, flags);
             return obj;
     }

Protecting The Objects Themselves
---------------------------------

In these examples, we assumed that the objects (except the reference
counts) never change once they are created. If we wanted to allow the
name to change, there are three possibilities:

- You can make ``cache_lock`` non-static, and tell people to grab that
  lock before changing the name in any object.

- You can provide a :c:func:`cache_obj_rename()` which grabs this
  lock and changes the name for the caller, and tell everyone to use
  that function.

- You can make the ``cache_lock`` protect only the cache itself, and
  use another lock to protect the name.

Theoretically, you can make the locks as fine-grained as one lock for
every field, for every object. In practice, the most common variants
are:

- One lock which protects the infrastructure (the ``cache`` list in
  this example) and all the objects. This is what we have done so far.

- One lock which protects the infrastructure (including the list
  pointers inside the objects), and one lock inside the object which
  protects the rest of that object.

- Multiple locks to protect the infrastructure (eg. one lock per hash
  chain), possibly with a separate per-object lock.

Here is the "lock-per-object" implementation:

::

    --- cache.c.refcnt-atomic 2003-12-11 15:50:54.000000000 +1100
    +++ cache.c.perobjectlock 2003-12-11 17:15:03.000000000 +1100
    @@ -6,11 +6,17 @@

     struct object
     {
    +        /* These two protected by cache_lock. */
             struct list_head list;
    +        int popularity;
    +
             atomic_t refcnt;
    +
    +        /* Doesn't change once created. */
             int id;
    +
    +        spinlock_t lock;        /* Protects the name */
             char name[32];
    -        int popularity;
     };

     static DEFINE_SPINLOCK(cache_lock);
    @@ -77,6 +84,7 @@
             obj->id = id;
             obj->popularity = 0;
             atomic_set(&obj->refcnt, 1);    /* The cache holds a reference */
    +        spin_lock_init(&obj->lock);

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);

Note that I decided that the popularity count should be protected by the
``cache_lock`` rather than the per-object lock: this is because it (like
the :c:type:`struct list_head <list_head>` inside the object)
is logically part of the infrastructure. This way, I don't need to grab
the lock of every object in :c:func:`__cache_add()` when seeking
the least popular.

I also decided that the id member is unchangeable, so I don't need to
grab each object lock in :c:func:`__cache_find()` to examine the
id: the object lock is only used by a caller who wants to read or write
the name field.

Note also that I added a comment describing what data was protected by
which locks. This is extremely important, as it describes the runtime
behavior of the code, and can be hard to deduce from just reading. And
as Alan Cox says, “Lock data, not code”.

Common Problems
===============

Deadlock: Simple and Advanced
-----------------------------

There is a coding bug where a piece of code tries to grab a spinlock
twice: it will spin forever, waiting for the lock to be released
(spinlocks, rwlocks and mutexes are not recursive in Linux). This is
trivial to diagnose: not a
stay-up-five-nights-talk-to-fluffy-code-bunnies kind of problem.

For a slightly more complex case, imagine you have a region shared by a
softirq and user context. If you use a :c:func:`spin_lock()` call
to protect it, it is possible that the user context will be interrupted
by the softirq while it holds the lock, and the softirq will then spin
forever trying to get the same lock.

Both of these are called deadlock, and as shown above, it can occur even
with a single CPU (although not on UP compiles, since spinlocks vanish
on kernel compiles with ``CONFIG_SMP``\ =n. You'll still get data
corruption in the second example).

This complete lockup is easy to diagnose: on SMP boxes the watchdog
timer or compiling with ``DEBUG_SPINLOCK`` set
(``include/linux/spinlock.h``) will show this up immediately when it
happens.
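
To make the second case concrete, here is a deliberately broken sketch
(all names are invented): user context takes the lock with plain
:c:func:`spin_lock()`, so a tasklet firing on the same CPU while the
lock is held will spin forever::

    static DEFINE_SPINLOCK(queue_lock);      /* shared with the tasklet */

    /* User context -- BROKEN: this should be spin_lock_bh(). */
    void queue_from_user(struct item *item)  /* struct item: hypothetical */
    {
            spin_lock(&queue_lock);
            /* If the tasklet below fires on this CPU right here, it
             * spins on queue_lock forever: deadlock. */
            add_to_queue(item);              /* hypothetical helper */
            spin_unlock(&queue_lock);
    }

    /* Tasklet (softirq context). */
    void queue_consume(unsigned long data)
    {
            spin_lock(&queue_lock);
            /* ... never reached on the deadlocked CPU ... */
            spin_unlock(&queue_lock);
    }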

A more complex problem is the so-called 'deadly embrace', involving two
or more locks. Say you have a hash table: each entry in the table is a
spinlock, and a chain of hashed objects. Inside a softirq handler, you
sometimes want to alter an object from one place in the hash to another:
you grab the spinlock of the old hash chain and the spinlock of the new
hash chain, and delete the object from the old one, and insert it in the
new one.

There are two problems here. First, if your code ever tries to move the
object to the same chain, it will deadlock with itself as it tries to
lock it twice. Secondly, if the same softirq on another CPU is trying to
move another object in the reverse direction, the following could
happen:

+-----------------------+-----------------------+
| CPU 1                 | CPU 2                 |
+=======================+=======================+
| Grab lock A -> OK     | Grab lock B -> OK     |
+-----------------------+-----------------------+
| Grab lock B -> spin   | Grab lock A -> spin   |
+-----------------------+-----------------------+

Table: Consequences

The two CPUs will spin forever, waiting for the other to give up their
lock. It will look, smell, and feel like a crash.

Preventing Deadlock
-------------------

Textbooks will tell you that if you always lock in the same order, you
will never get this kind of deadlock. Practice will tell you that this
approach doesn't scale: when I create a new lock, I don't understand
enough of the kernel to figure out where in the 5000 lock hierarchy it
will fit.

The best locks are encapsulated: they never get exposed in headers, and
are never held around calls to non-trivial functions outside the same
file. You can read through this code and see that it will never
deadlock, because it never tries to grab another lock while it has that
one. People using your code don't even need to know you are using a
lock.

A classic problem here is when you provide callbacks or hooks: if you
call these with the lock held, you risk simple deadlock, or a deadly
embrace (who knows what the callback will do?). Remember, the other
programmers are out to get you, so don't do this.

Overzealous Prevention Of Deadlocks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Deadlocks are problematic, but not as bad as data corruption. Code which
grabs a read lock, searches a list, fails to find what it wants, drops
the read lock, grabs a write lock and inserts the object has a race
condition.

If you don't see why, please stay the fuck away from my code.

Racing Timers: A Kernel Pastime
-------------------------------

Timers can produce their own special problems with races. Consider a
collection of objects (list, hash, etc) where each object has a timer
which is due to destroy it.

If you want to destroy the entire collection (say on module removal),
you might do the following::

    /* THIS CODE BAD BAD BAD BAD: IF IT WAS ANY WORSE IT WOULD USE
       HUNGARIAN NOTATION */
    spin_lock_bh(&list_lock);

    while (list) {
            struct foo *next = list->next;
            del_timer(&list->timer);
            kfree(list);
            list = next;
    }

    spin_unlock_bh(&list_lock);

Sooner or later, this will crash on SMP, because a timer can have just
gone off before the :c:func:`spin_lock_bh()`, and it will only get
the lock after we :c:func:`spin_unlock_bh()`, and then try to free
the element (which has already been freed!).

This can be avoided by checking the result of
:c:func:`del_timer()`: if it returns 1, the timer has been deleted.

If it returns 0, it means (in this case) that the timer is currently
running, so we can do::

    retry:
            spin_lock_bh(&list_lock);

            while (list) {
                    struct foo *next = list->next;
                    if (!del_timer(&list->timer)) {
                            /* Give timer a chance to delete this */
                            spin_unlock_bh(&list_lock);
                            goto retry;
                    }
                    kfree(list);
                    list = next;
            }

            spin_unlock_bh(&list_lock);

Another common problem is deleting timers which restart themselves (by
calling :c:func:`add_timer()` at the end of their timer function).
Because this is a fairly common case which is prone to races, you should
use :c:func:`del_timer_sync()` (``include/linux/timer.h``) to
handle this case. It returns the number of times the timer had to be
deleted before we finally stopped it from adding itself back in.

Locking Speed
=============

There are three main things to worry about when considering the speed of
some code which does locking. First is concurrency: how many things are
going to be waiting while someone else is holding a lock. Second is the
time taken to actually acquire and release an uncontended lock. Third is
using fewer, or smarter, locks. I'm assuming that the lock is used fairly
often: otherwise, you wouldn't be concerned about efficiency.

Concurrency depends on how long the lock is usually held: you should
hold the lock for as long as needed, but no longer. In the cache
example, we always create the object without the lock held, and then
grab the lock only when we are ready to insert it in the list.

Acquisition times depend on how much damage the lock operations do to
the pipeline (pipeline stalls) and how likely it is that this CPU was
the last one to grab the lock (ie. is the lock cache-hot for this CPU):
on a machine with more CPUs, this likelihood drops fast. Consider a
700MHz Intel Pentium III: an instruction takes about 0.7ns, an atomic
increment takes about 58ns, a lock which is cache-hot on this CPU takes
160ns, and a cacheline transfer from another CPU takes an additional 170
to 360ns. (These figures from Paul McKenney's `Linux Journal RCU
article <http://www.linuxjournal.com/article.php?sid=6993>`__).

These two aims conflict: holding a lock for a short time might be done
by splitting locks into parts (such as in our final per-object-lock
example), but this increases the number of lock acquisitions, and the
results are often slower than having a single lock. This is another
reason to advocate locking simplicity.

The third concern is addressed below: there are some methods to reduce
the amount of locking which needs to be done.

Read/Write Lock Variants
------------------------

Both spinlocks and mutexes have read/write variants: ``rwlock_t`` and
:c:type:`struct rw_semaphore <rw_semaphore>`. These divide
users into two classes: the readers and the writers. If you are only
reading the data, you can get a read lock, but to write to the data you
need the write lock. Many people can hold a read lock, but a writer must
be the sole holder.

If your code divides neatly along reader/writer lines (as our cache code
does), and the lock is held by readers for significant lengths of time,
using these locks can help. They are slightly slower than the normal
locks though, so in practice ``rwlock_t`` is not usually worthwhile.
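
For completeness, here is a minimal sketch of the reader/writer split
using ``rwlock_t`` (the structure and function names are invented)::

    #include <linux/spinlock.h>

    struct settings {
            int threshold;
            int interval;
    };

    static DEFINE_RWLOCK(settings_lock);     /* protects current_settings */
    static struct settings current_settings;

    /* Readers can run concurrently with each other. */
    int get_threshold(void)
    {
            int t;

            read_lock(&settings_lock);
            t = current_settings.threshold;
            read_unlock(&settings_lock);
            return t;
    }

    /* A writer excludes readers and other writers. */
    void set_settings(int threshold, int interval)
    {
            write_lock(&settings_lock);
            current_settings.threshold = threshold;
            current_settings.interval = interval;
            write_unlock(&settings_lock);
    }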

Avoiding Locks: Read Copy Update
--------------------------------

There is a special method of read/write locking called Read Copy Update.
Using RCU, the readers can avoid taking a lock altogether: as we expect
our cache to be read more often than updated (otherwise the cache is a
waste of time), it is a candidate for this optimization.

How do we get rid of read locks? Getting rid of read locks means that
writers may be changing the list underneath the readers. That is
actually quite simple: we can read a linked list while an element is
being added if the writer adds the element very carefully. For example,
adding ``new`` to a singly linked list called ``list``::

    new->next = list->next;
    wmb();
    list->next = new;

The :c:func:`wmb()` is a write memory barrier. It ensures that the
first operation (setting the new element's ``next`` pointer) is complete
and will be seen by all CPUs, before the second operation is (putting
the new element into the list). This is important, since modern
compilers and modern CPUs can both reorder instructions unless told
otherwise: we want a reader to either not see the new element at all, or
see the new element with the ``next`` pointer correctly pointing at the
rest of the list.

Fortunately, there is a function to do this for standard
:c:type:`struct list_head <list_head>` lists:
:c:func:`list_add_rcu()` (``include/linux/list.h``).

Removing an element from the list is even simpler: we replace the
pointer to the old element with a pointer to its successor, and readers
will either see it, or skip over it.

::

    list->next = old->next;

There is :c:func:`list_del_rcu()` (``include/linux/list.h``) which
does this (the normal version poisons the old object, which we don't
want).

The reader must also be careful: some CPUs can look through the ``next``
pointer to start reading the contents of the next element early, but
don't realize that the pre-fetched contents are wrong when the ``next``
pointer changes underneath them. Once again, there is a
:c:func:`list_for_each_entry_rcu()` (``include/linux/list.h``)
to help you. Of course, writers can just use
:c:func:`list_for_each_entry()`, since there cannot be two
simultaneous writers.

Our final dilemma is this: when can we actually destroy the removed
element? Remember, a reader might be stepping through this element in
the list right now: if we free this element and the ``next`` pointer
changes, the reader will jump off into garbage and crash. We need to
wait until we know that all the readers who were traversing the list
when we deleted the element are finished. We use
:c:func:`call_rcu()` to register a callback which will actually
destroy the object once all pre-existing readers are finished.
Alternatively, :c:func:`synchronize_rcu()` may be used to block
until all pre-existing readers are finished.

But how does Read Copy Update know when the readers are finished? The
method is this: firstly, the readers always traverse the list inside
:c:func:`rcu_read_lock()`/:c:func:`rcu_read_unlock()` pairs:
these simply disable preemption so the reader won't go to sleep while
reading the list.

RCU then waits until every other CPU has slept at least once: since
readers cannot sleep, we know that any readers which were traversing the
list during the deletion are finished, and the callback is triggered.
The real Read Copy Update code is a little more optimized than this, but
this is the fundamental idea.

::

    --- cache.c.perobjectlock 2003-12-11 17:15:03.000000000 +1100
    +++ cache.c.rcupdate 2003-12-11 17:55:14.000000000 +1100
    @@ -1,15 +1,18 @@
     #include <linux/list.h>
     #include <linux/slab.h>
     #include <linux/string.h>
    +#include <linux/rcupdate.h>
     #include <linux/mutex.h>
     #include <asm/errno.h>

     struct object
     {
    -        /* These two protected by cache_lock. */
    +        /* This is protected by RCU */
             struct list_head list;
             int popularity;

    +        struct rcu_head rcu;
    +
             atomic_t refcnt;

             /* Doesn't change once created. */
    @@ -40,7 +43,7 @@
     {
             struct object *i;

    -        list_for_each_entry(i, &cache, list) {
    +        list_for_each_entry_rcu(i, &cache, list) {
                     if (i->id == id) {
                             i->popularity++;
                             return i;
    @@ -49,19 +52,25 @@
             return NULL;
     }

    +/* Final discard done once we know no readers are looking. */
    +static void cache_delete_rcu(void *arg)
    +{
    +        object_put(arg);
    +}
    +
     /* Must be holding cache_lock */
     static void __cache_delete(struct object *obj)
     {
             BUG_ON(!obj);
    -        list_del(&obj->list);
    -        object_put(obj);
    +        list_del_rcu(&obj->list);
             cache_num--;
    +        call_rcu(&obj->rcu, cache_delete_rcu);
     }

     /* Must be holding cache_lock */
     static void __cache_add(struct object *obj)
     {
    -        list_add(&obj->list, &cache);
    +        list_add_rcu(&obj->list, &cache);
             if (++cache_num > MAX_CACHE_SIZE) {
                     struct object *i, *outcast = NULL;
                     list_for_each_entry(i, &cache, list) {
    @@ -104,12 +114,11 @@
     struct object *cache_find(int id)
     {
             struct object *obj;
    -        unsigned long flags;

    -        spin_lock_irqsave(&cache_lock, flags);
    +        rcu_read_lock();
             obj = __cache_find(id);
             if (obj)
                     object_get(obj);
    -        spin_unlock_irqrestore(&cache_lock, flags);
    +        rcu_read_unlock();
             return obj;
     }

Note that the reader will alter the popularity member in
:c:func:`__cache_find()`, and now it doesn't hold a lock. One
solution would be to make it an ``atomic_t``, but for this usage, we
don't really care about races: an approximate result is good enough, so
I didn't change it.

The result is that :c:func:`cache_find()` requires no
synchronization with any other functions, so is almost as fast on SMP as
it would be on UP.

There is a further optimization possible here: remember our original
cache code, where there were no reference counts and the caller simply
held the lock whenever using the object? This is still possible: if you
hold the lock, no one can delete the object, so you don't need to get
and put the reference count.

Now, because the 'read lock' in RCU is simply disabling preemption, a
caller which always has preemption disabled between calling
:c:func:`cache_find()` and :c:func:`object_put()` does not
need to actually get and put the reference count: we could expose
:c:func:`__cache_find()` by making it non-static, and such
callers could simply call that.

The benefit here is that the reference count is not written to: the
object is not altered in any way, which is much faster on SMP machines
due to caching.
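
Putting the pieces together, a caller of this final version might look
something like the following sketch. ``cache_find()`` and
``object_put()`` are the functions from the example above; the caller
itself (``cache_report_name()``) is invented for illustration::

    /* User context: look up an object and report its name. */
    int cache_report_name(int id, char __user *buf)
    {
            struct object *obj;
            int ret = 0;

            obj = cache_find(id);
            if (!obj)
                    return -ENOENT;

            /* We hold a reference, not a lock, so sleeping inside
             * copy_to_user() is fine. */
            if (copy_to_user(buf, obj->name, sizeof(obj->name)))
                    ret = -EFAULT;

            object_put(obj);
            return ret;
    }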

Per-CPU Data
------------

Another technique for avoiding locking which is used fairly widely is to
duplicate information for each CPU. For example, if you wanted to keep a
count of a common condition, you could use a spin lock and a single
counter. Nice and simple.

If that was too slow (it's usually not, but if you've got a really big
machine to test on and can show that it is), you could instead use a
counter for each CPU, then none of them need an exclusive lock. See
:c:func:`DEFINE_PER_CPU()`, :c:func:`get_cpu_var()` and
:c:func:`put_cpu_var()` (``include/linux/percpu.h``).

Of particular use for simple per-cpu counters is the ``local_t`` type,
and the :c:func:`cpu_local_inc()` and related functions, which are
more efficient than simple code on some architectures
(``include/asm/local.h``).

Note that there is no simple, reliable way of getting an exact value of
such a counter, without introducing more locks. This is not a problem
for some uses.

Data Which Is Mostly Used By An IRQ Handler
-------------------------------------------

If data is always accessed from within the same IRQ handler, you don't
need a lock at all: the kernel already guarantees that the irq handler
will not run simultaneously on multiple CPUs.

Manfred Spraul points out that you can still do this, even if the data
is very occasionally accessed in user context or softirqs/tasklets. The
irq handler doesn't use a lock, and all other accesses are done as so::

    spin_lock(&lock);
    disable_irq(irq);
    ...
    enable_irq(irq);
    spin_unlock(&lock);

The :c:func:`disable_irq()` prevents the irq handler from running
(and waits for it to finish if it's currently running on other CPUs).
The spinlock prevents any other accesses happening at the same time.
Naturally, this is slower than just a :c:func:`spin_lock_irq()`
call, so it only makes sense if this type of access happens extremely
rarely.
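
To flesh that skeleton out, here is a sketch under the same assumptions.
The device structure and function names are invented; the irq handler
owns ``rx_packets``, and the rare user-context reader excludes both the
handler and other readers exactly as described above::

    struct mydev {
            int irq;
            spinlock_t stats_lock;           /* taken only by rare readers */
            unsigned long rx_packets;        /* normally touched only in the irq handler */
    };

    /* IRQ handler: no lock needed for data only it touches. */
    static irqreturn_t mydev_interrupt(int irq, void *data)
    {
            struct mydev *dev = data;

            dev->rx_packets++;
            return IRQ_HANDLED;
    }

    /* Rare user-context reader (eg. a statistics ioctl). */
    static unsigned long mydev_read_stats(struct mydev *dev)
    {
            unsigned long val;

            spin_lock(&dev->stats_lock);     /* exclude other rare readers */
            disable_irq(dev->irq);           /* exclude the irq handler    */
            val = dev->rx_packets;
            enable_irq(dev->irq);
            spin_unlock(&dev->stats_lock);
            return val;
    }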

What Functions Are Safe To Call From Interrupts?
================================================

Many functions in the kernel sleep (ie. call schedule()) directly or
indirectly: you can never call them while holding a spinlock, or with
preemption disabled. This also means you need to be in user context:
calling them from an interrupt is illegal.

Some Functions Which Sleep
--------------------------

The most common ones are listed below, but you usually have to read the
code to find out if other calls are safe. If everyone else who calls it
can sleep, you probably need to be able to sleep, too. In particular,
registration and deregistration functions usually expect to be called
from user context, and can sleep.

- Accesses to userspace:

  - :c:func:`copy_from_user()`

  - :c:func:`copy_to_user()`

  - :c:func:`get_user()`

  - :c:func:`put_user()`

- :c:func:`kmalloc(GFP_KERNEL) <kmalloc>`

- :c:func:`mutex_lock_interruptible()` and
  :c:func:`mutex_lock()`

  There is a :c:func:`mutex_trylock()` which does not sleep.
  Still, it must not be used inside interrupt context since its
  implementation is not safe for that. :c:func:`mutex_unlock()`
  will also never sleep. It cannot be used in interrupt context either
  since a mutex must be released by the same task that acquired it.

Some Functions Which Don't Sleep
--------------------------------

Some functions are safe to call from any context, or holding almost any
lock.

- :c:func:`printk()`

- :c:func:`kfree()`

- :c:func:`add_timer()` and :c:func:`del_timer()`

Mutex API reference
===================

.. kernel-doc:: include/linux/mutex.h
   :internal:

.. kernel-doc:: kernel/locking/mutex.c
   :export:

Futex API reference
===================

.. kernel-doc:: kernel/futex.c
   :internal:

Further reading
===============

- ``Documentation/locking/spinlocks.txt``: Linus Torvalds' spinlocking
  tutorial in the kernel sources.

- Unix Systems for Modern Architectures: Symmetric Multiprocessing and
  Caching for Kernel Programmers:

  Curt Schimmel's very good introduction to kernel level locking (not
  written for Linux, but nearly everything applies). The book is
  expensive, but really worth every penny to understand SMP locking.
  [ISBN: 0201633388]

Thanks
======

Thanks to Telsa Gwynne for DocBooking, neatening and adding style.

Thanks to Martin Pool, Philipp Rumpf, Stephen Rothwell, Paul Mackerras,
Ruedi Aschwanden, Alan Cox, Manfred Spraul, Tim Waugh, Pete Zaitcev,
James Morris, Robert Love, Paul McKenney, John Ashby for proofreading,
correcting, flaming, commenting.

Thanks to the cabal for having no influence on this document.

Glossary
========

preemption
  Prior to 2.5, or when ``CONFIG_PREEMPT`` is unset, processes in user
  context inside the kernel would not preempt each other (ie. you had that
  CPU until you gave it up, except for interrupts). With the addition of
  ``CONFIG_PREEMPT`` in 2.5.4, this changed: when in user context, higher
  priority tasks can "cut in": spinlocks were changed to disable
  preemption, even on UP.

bh
  Bottom Half: for historical reasons, functions with '_bh' in them often
  now refer to any software interrupt, e.g. :c:func:`spin_lock_bh()`
  blocks any software interrupt on the current CPU. Bottom halves are
  deprecated, and will eventually be replaced by tasklets. Only one bottom
  half will be running at any time.

Hardware Interrupt / Hardware IRQ
  Hardware interrupt request. :c:func:`in_irq()` returns true in a
  hardware interrupt handler.

Interrupt Context
  Not user context: processing a hardware irq or software irq. Indicated
  by the :c:func:`in_interrupt()` macro returning true.

SMP
  Symmetric Multi-Processor: kernels compiled for multiple-CPU machines.
  (``CONFIG_SMP=y``).

Software Interrupt / softirq
  Software interrupt handler. :c:func:`in_irq()` returns false;
  :c:func:`in_softirq()` returns true. Tasklets and softirqs both
  fall into the category of 'software interrupts'.

  Strictly speaking a softirq is one of up to 32 enumerated software
  interrupts which can run on multiple CPUs at once. Sometimes used to
  refer to tasklets as well (ie. all software interrupts).

tasklet
  A dynamically-registrable software interrupt, which is guaranteed to
  only run on one CPU at a time.

timer
  A dynamically-registrable software interrupt, which is run at (or close
  to) a given time. When running, it is just like a tasklet (in fact, they
  are called from the ``TIMER_SOFTIRQ``).

UP
  Uni-Processor: Non-SMP. (``CONFIG_SMP=n``).

User Context
  The kernel executing on behalf of a particular process (ie. a system
  call or trap) or kernel thread. You can tell which process with the
  ``current`` macro. Not to be confused with userspace. Can be
  interrupted by software or hardware interrupts.

Userspace
  A process executing its own code outside the kernel.