The Kernel Concurrency Sanitizer (KCSAN)
========================================

The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector, which
relies on compile-time instrumentation, and uses a watchpoint-based sampling
approach to detect races. KCSAN's primary purpose is to detect `data races`_.

Usage
-----

KCSAN requires Clang version 11 or later.

To enable KCSAN, configure the kernel with::

    CONFIG_KCSAN=y

KCSAN provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kcsan`` for more info).

Error reports
~~~~~~~~~~~~~

A typical data race report looks like this::

    ==================================================================
    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode

    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
     kernfs_refresh_inode+0x70/0x170
     kernfs_iop_permission+0x4f/0x90
     inode_permission+0x190/0x200
     link_path_walk.part.0+0x503/0x8e0
     path_lookupat.isra.0+0x69/0x4d0
     filename_lookup+0x136/0x280
     user_path_at_empty+0x47/0x60
     vfs_statx+0x9b/0x130
     __do_sys_newlstat+0x50/0xb0
     __x64_sys_newlstat+0x37/0x50
     do_syscall_64+0x85/0x260
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
     generic_permission+0x5b/0x2a0
     kernfs_iop_permission+0x66/0x90
     inode_permission+0x190/0x200
     link_path_walk.part.0+0x503/0x8e0
     path_lookupat.isra.0+0x69/0x4d0
     filename_lookup+0x136/0x280
     user_path_at_empty+0x47/0x60
     do_faccessat+0x11a/0x390
     __x64_sys_access+0x3c/0x50
     do_syscall_64+0x85/0x260
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    ==================================================================

The header of the report provides a short summary of the functions involved in
the race. It is followed by the access types and stack traces of the two
threads involved in the data race.

The other, less common type of data race report looks like this::

    ==================================================================
    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10

    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
     e1000_clean_rx_irq+0x551/0xb10
     e1000_clean+0x533/0xda0
     net_rx_action+0x329/0x900
     __do_softirq+0xdb/0x2db
     irq_exit+0x9b/0xa0
     do_IRQ+0x9c/0xf0
     ret_from_intr+0x0/0x18
     default_idle+0x3f/0x220
     arch_cpu_idle+0x21/0x30
     do_idle+0x1df/0x230
     cpu_startup_entry+0x14/0x20
     rest_init+0xc5/0xcb
     arch_call_rest_init+0x13/0x2b
     start_kernel+0x6db/0x700

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    ==================================================================

This report is generated when it was not possible to determine the other
racing thread, but a race was inferred due to the data value of the watched
memory location having changed. These can occur either due to missing
instrumentation or e.g. DMA accesses. These reports will only be generated if
``CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN=y`` (selected by default).

Selective analysis
~~~~~~~~~~~~~~~~~~

It may be desirable to disable data race detection for specific accesses,
functions, compilation units, or entire subsystems. For static blacklisting,
the options below are available:

* KCSAN understands the ``data_race(expr)`` annotation, which tells KCSAN that
  any data races due to accesses in ``expr`` should be ignored and resulting
  behaviour when encountering a data race is deemed safe.

* Disabling data race detection for entire functions can be accomplished by
  using the function attribute ``__no_kcsan``::

    __no_kcsan
    void foo(void) {
        ...

  To dynamically limit the functions for which reports are generated, see the
  `DebugFS interface`_ blacklist/whitelist feature.

  For ``__always_inline`` functions, replace ``__always_inline`` with
  ``__no_kcsan_or_inline`` (which implies ``__always_inline``)::

    static __no_kcsan_or_inline void foo(void) {
        ...

* To disable data race detection for a particular compilation unit, add to the
  ``Makefile``::

    KCSAN_SANITIZE_file.o := n

* To disable data race detection for all compilation units listed in a
  ``Makefile``, add to the respective ``Makefile``::

    KCSAN_SANITIZE := n

Furthermore, it is possible to tell KCSAN to show or hide entire classes of
data races, depending on preferences. These can be changed via the following
Kconfig options:

* ``CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY``: If enabled and a conflicting write
  is observed via a watchpoint, but the data value of the memory location was
  observed to remain unchanged, do not report the data race.

* ``CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC``: Assume that plain aligned writes
  up to word size are atomic by default. Assumes that such writes are not
  subject to unsafe compiler optimizations resulting in data races. The option
  causes KCSAN to not report data races due to conflicts where the only plain
  accesses are aligned writes up to word size.

DebugFS interface
~~~~~~~~~~~~~~~~~

The file ``/sys/kernel/debug/kcsan`` provides the following interface:

* Reading ``/sys/kernel/debug/kcsan`` returns various runtime statistics.

* Writing ``on`` or ``off`` to ``/sys/kernel/debug/kcsan`` allows turning KCSAN
  on or off, respectively.

* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
  ``some_func_name`` to the report filter list, which (by default) blacklists
  reporting data races where either one of the top stackframes is a function
  in the list.

* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
  changes the report filtering behaviour. For example, the blacklist feature
  can be used to silence frequently occurring data races; the whitelist feature
  can help with reproduction and testing of fixes.

Tuning performance
~~~~~~~~~~~~~~~~~~

Core parameters that affect KCSAN's overall performance and bug detection
ability are exposed as kernel command-line arguments whose defaults can also be
changed via the corresponding Kconfig options.

* ``kcsan.skip_watch`` (``CONFIG_KCSAN_SKIP_WATCH``): Number of per-CPU memory
  operations to skip, before another watchpoint is set up. Setting up
  watchpoints more frequently increases the likelihood of observing races.
  This parameter has the most significant impact on overall system performance
  and race detection ability.

* ``kcsan.udelay_task`` (``CONFIG_KCSAN_UDELAY_TASK``): For tasks, the
  microsecond delay to stall execution after a watchpoint has been set up.
  Larger values widen the window in which a race may be observed.

* ``kcsan.udelay_interrupt`` (``CONFIG_KCSAN_UDELAY_INTERRUPT``): For
  interrupts, the microsecond delay to stall execution after a watchpoint has
  been set up. Interrupts have tighter latency requirements, and their delay
  should generally be smaller than the one chosen for tasks.

These parameters may be tweaked at runtime via
``/sys/module/kcsan/parameters/``.
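
As a hedged illustration, the three parameters listed above could be passed
together on the kernel command line. The values shown are arbitrary
placeholders chosen for this example; they are neither the defaults nor a
recommendation::

    kcsan.skip_watch=2000 kcsan.udelay_task=100 kcsan.udelay_interrupt=30

Smaller ``skip_watch`` values and larger delays would be expected to trade
system performance for a higher chance of observing a race.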

Data Races
----------

In an execution, two memory accesses form a *data race* if they *conflict*,
they happen concurrently in different threads, and at least one of them is a
*plain access*; they *conflict* if both access the same memory location, and at
least one is a write. For a more thorough discussion and definition, see `"Plain
Accesses and Data Races" in the LKMM`_.

.. _"Plain Accesses and Data Races" in the LKMM: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/explanation.txt#n1922

Relationship with the Linux-Kernel Memory Consistency Model (LKMM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The LKMM defines the propagation and ordering rules of various memory
operations, which gives developers the ability to reason about concurrent code.
Ultimately, this allows developers to determine the possible executions of
concurrent code, and whether that code is free from data races.

KCSAN is aware of *marked atomic operations* (``READ_ONCE``, ``WRITE_ONCE``,
``atomic_*``, etc.), but is oblivious to any ordering guarantees and simply
assumes that memory barriers are placed correctly. In other words, KCSAN
assumes that as long as a plain access is not observed to race with another
conflicting access, memory operations are correctly ordered.

This means that KCSAN will not report *potential* data races due to missing
memory ordering. Developers should therefore carefully consider the memory
ordering requirements that remain unchecked. If, however, missing memory
ordering (that is observable with a particular compiler and architecture) leads
to an observable data race (e.g. entering a critical section erroneously),
KCSAN would report the resulting data race.
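
As a rough illustration of the definitions above, the following userspace
sketch contrasts plain and marked accesses. The ``READ_ONCE`` and
``WRITE_ONCE`` definitions here are simplified stand-ins (assumed to be plain
volatile casts, not the kernel's actual macros), and ``shared_counter`` and
the helper functions are hypothetical names:

.. code-block:: c

    /* Simplified stand-ins for the kernel macros; an assumption of this sketch. */
    #define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
    #define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

    /* Hypothetical variable that several threads may access. */
    static int shared_counter;

    /* Plain write: races with any concurrent conflicting access. */
    static void update_plain(int v)
    {
        shared_counter = v;
    }

    /* Marked write: not a data race against other marked accesses. */
    static void update_marked(int v)
    {
        WRITE_ONCE(shared_counter, v);
    }

    /* Plain read: races with any concurrent write, marked or not. */
    static int read_plain(void)
    {
        return shared_counter;
    }

    /* Marked read: still races with a concurrent plain write. */
    static int read_marked(void)
    {
        return READ_ONCE(shared_counter);
    }

If ``update_plain()`` and ``read_marked()`` ran concurrently in different
threads, the two accesses would conflict and one of them is plain, so they
would form a data race that KCSAN may report. ``update_marked()`` running
concurrently with ``read_marked()`` is not a data race, yet nothing in this
sketch orders those accesses against other memory operations, which is the
part KCSAN leaves unchecked.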

Race Detection Beyond Data Races
--------------------------------

For code with complex concurrency design, race-condition bugs may not always
manifest as data races. Race conditions occur if concurrently executing
operations result in unexpected system behaviour. On the other hand, data races
are defined at the C-language level. The following macros can be used to check
properties of concurrent code where bugs would not manifest as data races.

.. kernel-doc:: include/linux/kcsan-checks.h
    :functions: ASSERT_EXCLUSIVE_WRITER ASSERT_EXCLUSIVE_WRITER_SCOPED
                ASSERT_EXCLUSIVE_ACCESS ASSERT_EXCLUSIVE_ACCESS_SCOPED
                ASSERT_EXCLUSIVE_BITS

Implementation Details
----------------------

KCSAN relies on observing that two accesses happen concurrently. Crucially, we
want to (a) increase the chances of observing races (especially for races that
manifest rarely), and (b) be able to actually observe them. We can accomplish
(a) by injecting various delays, and (b) by using address watchpoints (or
breakpoints).

If we deliberately stall a memory access, while we have a watchpoint for its
address set up, and then observe the watchpoint to fire, two accesses to the
same address just raced. Using hardware watchpoints, this is the approach taken
in `DataCollider
<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
relies on compiler instrumentation and "soft watchpoints".

In KCSAN, watchpoints are implemented using an efficient encoding that stores
access type, size, and address in a long; the benefits of using "soft
watchpoints" are portability and greater flexibility. KCSAN then relies on the
compiler instrumenting plain accesses. For each instrumented plain access:

1. Check if a matching watchpoint exists; if yes, and at least one access is a
   write, then we encountered a racing access.

2. Periodically, if no matching watchpoint exists, set up a watchpoint and
   stall for a small randomized delay.

3. Also check the data value before the delay, and re-check the data value
   after the delay; if the values mismatch, we infer a race of unknown origin.

To detect data races between plain and marked accesses, KCSAN also annotates
marked accesses, but only to check if a watchpoint exists; i.e. KCSAN never
sets up a watchpoint on marked accesses. By never setting up watchpoints for
marked operations, if all accesses to a variable that is accessed concurrently
are properly marked, KCSAN will never trigger a watchpoint and therefore never
report the accesses.

Key Properties
~~~~~~~~~~~~~~

1. **Memory Overhead:** The overall memory overhead is only a few MiB
   depending on configuration. The current implementation uses a small array of
   longs to encode watchpoint information, which is negligible.

2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
   efficient watchpoint encoding that does not require acquiring any shared
   locks in the fast-path. For kernel boot on a system with 8 CPUs:

   - 5.0x slow-down with the default KCSAN config;
   - 2.8x slow-down from runtime fast-path overhead only (set very large
     ``KCSAN_SKIP_WATCH`` and unset ``KCSAN_SKIP_WATCH_RANDOMIZE``).

3. **Annotation Overheads:** Minimal annotations are required outside the KCSAN
   runtime. As a result, maintenance overheads are minimal as the kernel
   evolves.

4. **Detects Racy Writes from Devices:** Due to checking data values upon
   setting up watchpoints, racy writes from devices can also be detected.

5. **Memory Ordering:** KCSAN is *not* explicitly aware of the LKMM's ordering
   rules; this may result in missed data races (false negatives).

6. **Analysis Accuracy:** For observed executions, due to using a sampling
   strategy, the analysis is *unsound* (false negatives possible), but aims to
   be complete (no false positives).

Alternatives Considered
-----------------------

An alternative data race detection approach for the kernel can be found in the
`Kernel Thread Sanitizer (KTSAN) <https://github.com/google/ktsan/wiki>`_.
KTSAN is a happens-before data race detector, which explicitly establishes the
happens-before order between memory operations, which can then be used to
determine data races as defined in `Data Races`_.

To build a correct happens-before relation, KTSAN must be aware of all ordering
rules of the LKMM and synchronization primitives. Unfortunately, any omission
leads to large numbers of false positives, which is especially detrimental in
the context of the kernel which includes numerous custom synchronization
mechanisms. To track the happens-before relation, KTSAN's implementation
requires metadata for each memory location (shadow memory), which for each page
corresponds to 4 pages of shadow memory, and can translate into overhead of
tens of GiB on a large system.
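
To make the contrast with shadow memory concrete, the soft watchpoints
described in the implementation details can be pictured as a small fixed-size
array of machine words. The following single-threaded userspace sketch is only
an illustration under assumed parameters: the bit layout, the ``NUM_SLOTS``
value, and all ``wp_*`` names are hypothetical and do not reflect KCSAN's
actual encoding, which additionally has to cope with concurrent updates:

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical layout: bit 63 = write flag, bits 62..56 = size, rest = address. */
    #define WP_WRITE_BIT  (UINT64_C(1) << 63)
    #define WP_SIZE_SHIFT 56
    #define WP_SIZE_MAX   0x7f
    #define WP_ADDR_MASK  ((UINT64_C(1) << WP_SIZE_SHIFT) - 1)
    #define NUM_SLOTS     64

    /* One word per watchpoint; 0 marks an empty slot. */
    static uint64_t watchpoints[NUM_SLOTS];

    /* Pack access type, size, and address into a single word. */
    static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
    {
        assert(size >= 1 && size <= WP_SIZE_MAX);
        return (is_write ? WP_WRITE_BIT : 0) |
               ((uint64_t)size << WP_SIZE_SHIFT) | (addr & WP_ADDR_MASK);
    }

    /* True if the access [addr, addr + size) conflicts with watchpoint wp. */
    static bool wp_conflicts(uint64_t wp, uint64_t addr, size_t size, bool is_write)
    {
        uint64_t wp_addr = wp & WP_ADDR_MASK;
        uint64_t wp_size = (wp >> WP_SIZE_SHIFT) & WP_SIZE_MAX;
        bool wp_write = (wp & WP_WRITE_BIT) != 0;

        if (wp == 0)
            return false;               /* empty slot */
        if (!wp_write && !is_write)
            return false;               /* two reads never conflict */
        addr &= WP_ADDR_MASK;
        return addr < wp_addr + wp_size && wp_addr < addr + size;
    }

    /* Return the slot of a conflicting watchpoint, or NULL if there is none. */
    static uint64_t *wp_find(uint64_t addr, size_t size, bool is_write)
    {
        for (size_t i = 0; i < NUM_SLOTS; i++)
            if (wp_conflicts(watchpoints[i], addr, size, is_write))
                return &watchpoints[i];
        return NULL;
    }

    /* Claim the first empty slot; returns NULL if the table is full. */
    static uint64_t *wp_insert(uint64_t addr, size_t size, bool is_write)
    {
        for (size_t i = 0; i < NUM_SLOTS; i++) {
            if (watchpoints[i] == 0) {
                watchpoints[i] = wp_encode(addr, size, is_write);
                return &watchpoints[i];
            }
        }
        return NULL;
    }

    /* Release a slot after the stall has finished. */
    static void wp_remove(uint64_t *slot)
    {
        *slot = 0;
    }

With 64 slots of 8 bytes each, the table in this sketch occupies 512 bytes
regardless of how much memory is being monitored, whereas per-location shadow
memory grows with the size of RAM.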