.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2022, Google LLC.

===================================
The Kernel Memory Sanitizer (KMSAN)
===================================

KMSAN is a dynamic error detector aimed at finding uses of uninitialized
values. It is based on compiler instrumentation, and is quite similar to the
userspace `MemorySanitizer tool`_.

An important note is that KMSAN is not intended for production use, because it
drastically increases kernel memory footprint and slows the whole system down.

Usage
=====

Building the kernel
-------------------

In order to build a kernel with KMSAN you will need a fresh Clang (14.0.6+).
Please refer to `LLVM documentation`_ for the instructions on how to build Clang.

Now configure and build the kernel with CONFIG_KMSAN enabled.

Example report
--------------

Here is an example of a KMSAN report::

  =====================================================
  BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test]
   test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273
   kunit_run_case_internal lib/kunit/test.c:333
   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
   kthread+0x721/0x850 kernel/kthread.c:327
   ret_from_fork+0x1f/0x30 ??:?

  Uninit was stored to memory at:
   do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260
   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
   kunit_run_case_internal lib/kunit/test.c:333
   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
   kthread+0x721/0x850 kernel/kthread.c:327
   ret_from_fork+0x1f/0x30 ??:?

  Local variable uninit created at:
   do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256
   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271

  Bytes 4-7 of 8 are uninitialized
  Memory access of size 8 starts at ffff888083fe3da0

  CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G B E 5.16.0-rc3+ #104
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
  =====================================================

The report says that the local variable ``uninit`` was created uninitialized in
``do_uninit_local_array()``. The third stack trace corresponds to the place
where this variable was created.

The first stack trace shows where the uninit value was used (in
``test_uninit_kmsan_check_memory()``). The tool shows the bytes which were left
uninitialized in the local variable, as well as the stack where the value was
copied to another memory location before use.

A use of uninitialized value ``v`` is reported by KMSAN in the following cases:
 - in a condition, e.g. ``if (v) { ... }``;
 - when it is used as an index or dereferenced as a pointer, e.g. ``array[v]``
   or ``*v``;
 - when it is copied to userspace or hardware, e.g. ``copy_to_user(..., &v, ...)``;
 - when it is passed as an argument to a function, and
   ``CONFIG_KMSAN_CHECK_PARAM_RETVAL`` is enabled (see below).

The mentioned cases (apart from copying data to userspace or hardware, which is
a security issue) are considered undefined behavior from the C11 Standard point
of view.

Disabling the instrumentation
-----------------------------

A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN
ignore uninitialized values in that function and mark its output as initialized.
As a result, the user will not get KMSAN reports related to that function.

Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
Applying this attribute to a function will result in KMSAN not instrumenting
it, which can be helpful if we do not want the compiler to interfere with some
low-level code (e.g. that marked with ``noinstr`` which implicitly adds
``__no_sanitize_memory``).

This however comes at a cost: stack allocations from such functions will have
incorrect shadow/origin values, likely leading to false positives. Functions
called from non-instrumented code may also receive incorrect metadata for their
parameters.

As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly.

It is also possible to disable KMSAN for a single file (e.g. main.o)::

  KMSAN_SANITIZE_main.o := n

or for the whole directory::

  KMSAN_SANITIZE := n

in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every
function in the file or directory. Most users won't need KMSAN_SANITIZE, unless
their code gets broken by KMSAN (e.g. runs at early boot time).

Support
=======

In order for KMSAN to work the kernel must be built with Clang, which so far is
the only compiler that has KMSAN support. The kernel instrumentation pass is
based on the userspace `MemorySanitizer tool`_.

The runtime library only supports x86_64 at the moment.

How KMSAN works
===============

KMSAN shadow memory
-------------------

KMSAN associates a metadata byte (also called shadow byte) with every byte of
kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
setting its shadow bytes to ``0xff``) is called poisoning, marking it
initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.
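
The poisoning and unpoisoning operations can be pictured with a small userspace
model. This is only a sketch: the flat ``model_shadow`` array and the helpers
``model_poison()``, ``model_unpoison()`` and ``model_is_initialized()`` are
hypothetical names made up for illustration and are not part of the KMSAN
runtime.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MEM_SIZE 64

/* Hypothetical model: one shadow byte per byte of modelled memory. */
static uint8_t model_shadow[MODEL_MEM_SIZE];

/* Poison: set every shadow bit, marking the range as uninitialized. */
void model_poison(size_t off, size_t size)
{
    memset(&model_shadow[off], 0xff, size);
}

/* Unpoison: clear every shadow bit, marking the range as initialized. */
void model_unpoison(size_t off, size_t size)
{
    memset(&model_shadow[off], 0x00, size);
}

/* A range is fully initialized iff none of its shadow bits are set. */
bool model_is_initialized(size_t off, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (model_shadow[off + i])
            return false;
    }
    return true;
}
```

In this model, poisoning bytes 0-7 and then unpoisoning bytes 0-3 leaves bytes
4-7 marked as uninitialized, which is the same kind of state that the
"Bytes 4-7 of 8 are uninitialized" line of a report describes.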

When a new variable is allocated on the stack, it is poisoned by default by
instrumentation code inserted by the compiler (unless it is a stack variable
that is immediately initialized). Any new heap allocation done without
``__GFP_ZERO`` is also poisoned.

Compiler instrumentation also tracks the shadow values as they are used along
the code. When needed, instrumentation code invokes the runtime library in
``mm/kmsan/`` to persist shadow values.

The shadow value of a basic or compound type is an array of bytes of the same
length. When a constant value is written into memory, that memory is unpoisoned.
When a value is read from memory, its shadow memory is also obtained and
propagated into all the operations which use that value. For every instruction
that takes one or more values the compiler generates code that calculates the
shadow of the result depending on those values and their shadows.

Example::

  int a = 0xff;  // i.e. 0x000000ff
  int b;
  int c = a | b;

In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
``c`` are uninitialized, while the lower byte is initialized.

Origin tracking
---------------

Every four bytes of kernel memory also have a so-called origin mapped to them.
This origin describes the point in program execution at which the uninitialized
value was created. Every origin is associated with either the full allocation
stack (for heap-allocated memory), or the function containing the uninitialized
variable (for locals).

When an uninitialized variable is allocated on stack or heap, a new origin
value is created, and that variable's origin is filled with that value. When a
value is read from memory, its origin is also read and kept together with the
shadow.
For every instruction that takes one or more values, the origin of the
result is one of the origins corresponding to any of the uninitialized inputs.
If a poisoned value is written into memory, its origin is written to the
corresponding storage as well.

Example 1::

  int a = 42;
  int b;
  int c = a + b;

In this case the origin of ``b`` is generated upon function entry, and is
stored to the origin of ``c`` right before the addition result is written into
memory.

Several variables may share the same origin address, if they are stored in the
same four-byte chunk. In this case every write to either variable updates the
origin for all of them. We have to sacrifice precision in this case, because
storing origins for individual bits (and even bytes) would be too costly.

Example 2::

  int combine(short a, short b) {
    union ret_t {
      int i;
      short s[2];
    } ret;
    ret.s[0] = a;
    ret.s[1] = b;
    return ret.i;
  }

If ``a`` is initialized and ``b`` is not, the shadow of the result would be
``0xffff0000``, and the origin of the result would be the origin of ``b``.
``ret.s[0]`` would have the same origin, but it will never be used, because
that variable is initialized.

If both function arguments are uninitialized, only the origin of the second
argument is preserved.

Origin chaining
~~~~~~~~~~~~~~~

To ease debugging, KMSAN creates a new origin for every store of an
uninitialized value to memory. The new origin references both its creation stack
and the previous origin the value had. This may cause increased memory
consumption, so we limit the length of origin chains in the runtime.

Clang instrumentation API
-------------------------

Clang instrumentation pass inserts calls to functions defined in
``mm/kmsan/instrumentation.c`` into the kernel code.

Shadow manipulation
~~~~~~~~~~~~~~~~~~~

For every memory access the compiler emits a call to a function that returns a
pair of pointers to the shadow and origin addresses of the given memory::

  typedef struct {
    void *shadow, *origin;
  } shadow_origin_ptr_t

  shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
  shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
  shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size)
  shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size)

The function name depends on the memory access size.

The compiler makes sure that for every loaded value its shadow and origin
values are read from memory. When a value is stored to memory, its shadow and
origin are also stored using the metadata pointers.

Handling locals
~~~~~~~~~~~~~~~

A special function is used to create a new origin value for a local variable and
set the origin of that variable to that value::

  void __msan_poison_alloca(void *addr, uintptr_t size, char *descr)

Access to per-task data
~~~~~~~~~~~~~~~~~~~~~~~

At the beginning of every instrumented function KMSAN inserts a call to
``__msan_get_context_state()``::

  kmsan_context_state *__msan_get_context_state(void)

``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::

  struct kmsan_context_state {
    char param_tls[KMSAN_PARAM_SIZE];
    char retval_tls[KMSAN_RETVAL_SIZE];
    char va_arg_tls[KMSAN_PARAM_SIZE];
    char va_arg_origin_tls[KMSAN_PARAM_SIZE];
    u64 va_arg_overflow_size_tls;
    char param_origin_tls[KMSAN_PARAM_SIZE];
    depot_stack_handle_t retval_origin_tls;
  };

This structure is used by KMSAN to pass parameter shadows and origins between
instrumented functions (unless the parameters are checked immediately by
``CONFIG_KMSAN_CHECK_PARAM_RETVAL``).
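
The hand-off through ``kmsan_context_state`` can be sketched in plain C. The
model below is an illustration under stated assumptions, not the code Clang
emits: ``model_state``, ``model_caller()`` and ``model_callee()`` are
hypothetical names, the buffer sizes are arbitrary, only the first parameter
slot is used, and origins are left out.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_PARAM_SIZE 64   /* arbitrary size chosen for this sketch */
#define MODEL_RETVAL_SIZE 64  /* arbitrary size chosen for this sketch */

/* Cut-down stand-in for struct kmsan_context_state (shadows only). */
struct model_context_state {
    char param_tls[MODEL_PARAM_SIZE];
    char retval_tls[MODEL_RETVAL_SIZE];
};

static struct model_context_state model_state;

/* Stand-in for an instrumented "int callee(int x) { return x; }". */
int model_callee(int x)
{
    uint32_t x_shadow;

    /* Prologue: fetch the shadow the caller left for parameter 0. */
    memcpy(&x_shadow, model_state.param_tls, sizeof(x_shadow));
    /* Epilogue: the return value is x itself, so it inherits x's shadow. */
    memcpy(model_state.retval_tls, &x_shadow, sizeof(x_shadow));
    return x;
}

/*
 * Stand-in for an instrumented call site. The caller publishes the shadow
 * of its argument, makes the call, and reads back the shadow of the result.
 */
uint32_t model_caller(int arg, uint32_t arg_shadow)
{
    uint32_t ret_shadow;

    memcpy(model_state.param_tls, &arg_shadow, sizeof(arg_shadow));
    (void)model_callee(arg);
    memcpy(&ret_shadow, model_state.retval_tls, sizeof(ret_shadow));
    return ret_shadow;
}
```

Calling ``model_caller(42, 0xffff0000)`` in this model returns ``0xffff0000``:
the partially poisoned argument yields an equally poisoned return value without
any report being raised at the call boundary.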

Passing uninitialized values to functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Clang's MemorySanitizer instrumentation has an option,
``-fsanitize-memory-param-retval``, which makes the compiler check function
parameters passed by value, as well as function return values.

The option is controlled by ``CONFIG_KMSAN_CHECK_PARAM_RETVAL``, which is
enabled by default to let KMSAN report uninitialized values earlier.
Please refer to the `LKML discussion`_ for more details.

Because of the way the checks are implemented in LLVM (they are only applied to
parameters marked as ``noundef``), not all parameters are guaranteed to be
checked, so we cannot give up the metadata storage in ``kmsan_context_state``.

String functions
~~~~~~~~~~~~~~~~

The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
following functions. These functions are also called when data structures are
initialized or copied, making sure shadow and origin values are copied alongside
the data::

  void *__msan_memcpy(void *dst, void *src, uintptr_t n)
  void *__msan_memmove(void *dst, void *src, uintptr_t n)
  void *__msan_memset(void *dst, int c, uintptr_t n)

Error reporting
~~~~~~~~~~~~~~~

For each use of a value the compiler emits a shadow check that calls
``__msan_warning()`` if that value is poisoned::

  void __msan_warning(u32 origin)

``__msan_warning()`` causes KMSAN runtime to print an error report.

Inline assembly instrumentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

KMSAN instruments every inline assembly output with a call to the following
function, which unpoisons the memory region::

  void __msan_instrument_asm_store(void *addr, uintptr_t size)

This approach may mask certain errors, but it also helps to avoid a lot of
false positives in bitwise operations, atomics etc.

Sometimes the pointers passed into inline assembly do not point to valid memory.
In such cases they are ignored at runtime.


Runtime library
---------------

The code is located in ``mm/kmsan/``.

Per-task KMSAN state
~~~~~~~~~~~~~~~~~~~~

Every task_struct has an associated KMSAN task state that holds the KMSAN
context (see above) and a per-task flag disallowing KMSAN reports::

  struct kmsan_context {
    ...
    bool allow_reporting;
    struct kmsan_context_state cstate;
    ...
  }

  struct task_struct {
    ...
    struct kmsan_context kmsan;
    ...
  }

KMSAN contexts
~~~~~~~~~~~~~~

When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
hold the metadata for function parameters and return values.

But when the kernel is running in interrupt, softirq or NMI context, where
``current`` is unavailable, KMSAN switches to per-cpu interrupt state::

  DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);

Metadata allocation
~~~~~~~~~~~~~~~~~~~

There are several places in the kernel for which the metadata is stored.

1. Each ``struct page`` instance contains two pointers to its shadow and
origin pages::

  struct page {
    ...
    struct page *shadow, *origin;
    ...
  };

At boot-time, the kernel allocates shadow and origin pages for every available
kernel page. This is done quite late, when the kernel address space is already
fragmented, so normal data pages may arbitrarily interleave with the metadata
pages.

This means that in general for two contiguous memory pages their shadow/origin
pages may not be contiguous. Consequently, if a memory access crosses the
boundary of a memory block, accesses to shadow/origin memory may potentially
corrupt other pages or read incorrect values from them.
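
Whether an access is at risk can be decided from its address and size alone.
The helper below is a minimal sketch of such a check, not the runtime's actual
code: ``model_crosses_page()`` is a hypothetical name, and the 4096-byte page
size is an assumption made for the example (the kernel uses ``PAGE_SIZE``).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/*
 * Return true if the access [addr, addr + size) touches more than one page,
 * i.e. if its metadata may live in two unrelated shadow/origin pages.
 */
bool model_crosses_page(uintptr_t addr, size_t size)
{
    uintptr_t offset = addr & (MODEL_PAGE_SIZE - 1);

    return size != 0 && offset + size > MODEL_PAGE_SIZE;
}
```

An 8-byte access at offset ``0xff8`` within a page ends exactly at the page
boundary and is not flagged, whereas the same access at offset ``0xffc`` spans
two pages and is.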

In practice, contiguous memory pages returned by the same ``alloc_pages()``
call will have contiguous metadata, whereas if these pages belong to two
different allocations their metadata pages can be fragmented.

For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions
there also are no guarantees on metadata contiguity.

In the case ``__msan_metadata_ptr_for_XXX_YYY()`` hits the border between two
pages with non-contiguous metadata, it returns pointers to fake shadow/origin regions::

  char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
  char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));

``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
All stores to ``dummy_store_page`` are ignored.

2. For vmalloc memory and modules, there is a direct mapping between the memory
range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making only
the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
area contains shadow memory for the first quarter, the third one holds the
origins. A small part of the fourth quarter contains shadow and origins for the
kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
more details.

When an array of pages is mapped into a contiguous virtual memory space, their
shadow and origin pages are similarly mapped into contiguous regions.

References
==========

E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized
memory use in C++
<https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_.
In Proceedings of CGO 2015.

.. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html
.. _LLVM documentation: https://llvm.org/docs/GettingStarted.html
.. _LKML discussion: https://lore.kernel.org/all/20220614144853.3693273-1-glider@google.com/