.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2020, Google LLC.

Kernel Electric-Fence (KFENCE)
==============================

Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
invalid-free errors.

KFENCE is designed to be enabled in production kernels, and has near zero
performance overhead. Compared to KASAN, KFENCE gives up precision in exchange
for performance. The main motivation behind KFENCE's design is that with enough
total uptime KFENCE will detect bugs in code paths not typically exercised by
non-production test workloads. One way to quickly achieve a large enough total
uptime is to deploy the tool across a large fleet of machines.

Usage
-----

To enable KFENCE, configure the kernel with::

    CONFIG_KFENCE=y

To build a kernel with KFENCE support, but disabled by default (to enable, set
``kfence.sample_interval`` to a non-zero value), configure the kernel with::

    CONFIG_KFENCE=y
    CONFIG_KFENCE_SAMPLE_INTERVAL=0

KFENCE provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kfence`` for more info).

Tuning performance
~~~~~~~~~~~~~~~~~~

The most important parameter is KFENCE's sample interval, which can be set via
the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
sample interval determines the frequency with which heap allocations will be
guarded by KFENCE. The default is configurable via the Kconfig option
``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
disables KFENCE.

The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
255), the number of available guarded objects can be controlled.
Each object requires 2 pages, one for the object itself and the other one used
as a guard page; object pages are interleaved with guard pages, and every
object page is therefore surrounded by two guard pages.

The total memory dedicated to the KFENCE memory pool can be computed as::

    ( #objects + 1 ) * 2 * PAGE_SIZE

Using the default config and assuming a page size of 4 KiB, this gives
( 255 + 1 ) * 2 * 4 KiB = 2 MiB dedicated to the KFENCE memory pool.

Note: On architectures that support huge pages, KFENCE will ensure that the
pool is using pages of size ``PAGE_SIZE``. This will result in additional page
tables being allocated.

Error reports
~~~~~~~~~~~~~

A typical out-of-bounds access looks like this::

    ==================================================================
    BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa3/0x22b

    Out-of-bounds read at 0xffffffffb672efff (1B left of kfence-#17):
     test_out_of_bounds_read+0xa3/0x22b
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_out_of_bounds_read+0x98/0x22b
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

The header of the report provides a short summary of the function involved in
the access. It is followed by more detailed information about the access and
its origin. Note that real kernel addresses are only shown when using the
kernel command line option ``no_hash_pointers``.
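
The position of the faulting address relative to the object, as shown in the
report header, follows from simple arithmetic on the object's start address and
size. The C sketch below is illustrative only: ``describe_access()``,
``struct access_desc`` and ``enum access_side`` are hypothetical names, not
kernel APIs, and for accesses past the end of an object it assumes the distance
is counted from the object's last byte, which may not match how the kernel's
own report code words it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not part of the kernel. */
enum access_side { ACCESS_LEFT, ACCESS_INSIDE, ACCESS_RIGHT };

struct access_desc {
	enum access_side side;
	uint64_t distance;	/* bytes away from the object; 0 if inside */
};

/*
 * Classify @addr relative to an object occupying [@start, @start + @size).
 * The byte just before @start is "1B left". By assumption (see above), the
 * byte just after the object's last byte is "1B right".
 */
static struct access_desc describe_access(uint64_t addr, uint64_t start,
					  uint64_t size)
{
	struct access_desc d = { ACCESS_INSIDE, 0 };
	uint64_t last;

	assert(size > 0);
	last = start + size - 1;	/* inclusive end, as printed in reports */

	if (addr < start) {
		d.side = ACCESS_LEFT;
		d.distance = start - addr;
	} else if (addr > last) {
		d.side = ACCESS_RIGHT;
		d.distance = addr - last;
	}
	return d;
}
```

For the report above, kfence-#17 starts at 0xffffffffb672f000 with size 32
(inclusive end 0xffffffffb672f01f), so a read at 0xffffffffb672efff lies one
byte to the left of the object, matching "1B left of kfence-#17".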

Use-after-free accesses are reported as::

    ==================================================================
    BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143

    Use-after-free read at 0xffffffffb673dfe0 (in kfence-#24):
     test_use_after_free_read+0xb3/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_use_after_free_read+0x76/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    freed by task 507:
     test_use_after_free_read+0xa8/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

KFENCE also reports on invalid frees, such as double-frees::

    ==================================================================
    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171

    Invalid free of 0xffffffffb6741000:
     test_double_free+0xdc/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_double_free+0x76/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    freed by task 507:
     test_double_free+0xa8/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

KFENCE also uses pattern-based redzones on the side of an object opposite its
guard page, to detect out-of-bounds writes on the unprotected side of the
object. These are reported when the object is freed::

    ==================================================================
    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184

    Corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ] (in kfence-#69):
     test_kmalloc_aligned_oob_write+0xef/0x184
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_kmalloc_aligned_oob_write+0x57/0x184
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

For such errors, the address where the corruption occurred as well as the
invalidly written bytes (offset from the address) are shown; in this
representation, '.' denotes an untouched byte. In the example above, ``0xac``
is the value written to the invalid address at offset 0, and the remaining '.'
show that none of the following bytes were touched. Note that real values are
only shown if the kernel was booted with ``no_hash_pointers``; otherwise, to
avoid information disclosure, '!'
is used instead to denote invalidly written bytes.

And finally, KFENCE may also report on invalid accesses to any protected page
where it was not possible to determine an associated object, e.g. if adjacent
object pages had not yet been allocated::

    ==================================================================
    BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0

    Invalid read at 0xffffffffb670b00a:
     test_invalid_access+0x26/0xe0
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

DebugFS interface
~~~~~~~~~~~~~~~~~

Some debugging information is exposed via debugfs:

* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.

* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
  allocated via KFENCE, including those already freed but protected.

Implementation Details
----------------------

Guarded allocations are set up based on the sample interval. After expiration
of the sample interval, the next allocation through the main allocator (SLAB or
SLUB) returns a guarded allocation from the KFENCE object pool (allocation
sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
the next allocation is set up after the expiration of the interval. To "gate" a
KFENCE allocation through the main allocator's fast-path without overhead,
KFENCE relies on static branches via the static keys infrastructure. The static
branch is toggled to redirect the allocation to KFENCE.

KFENCE objects each reside on a dedicated page, at either the left or the right
page boundary, selected at random.
The pages to the left and right of the object page are "guard pages", whose
attributes are changed to a protected state so that any attempted access causes
a page fault. Such page faults are then intercepted by KFENCE, which handles
the fault gracefully by reporting an out-of-bounds access and marking the page
as accessible, so that the faulting code can (wrongly) continue executing (set
``panic_on_warn`` to panic instead).

To detect out-of-bounds writes to memory within the object's page itself,
KFENCE also uses pattern-based redzones. For each object page, a redzone is set
up for all non-object memory. For typical alignments, the redzone is only
required on the unguarded side of an object. Because KFENCE must honor the
cache's requested alignment, special alignments may result in unprotected gaps
on either side of an object, all of which are redzoned.

The following figure illustrates the page layout::

    ---+-----------+-----------+-----------+-----------+-----------+---
       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
    ---+-----------+-----------+-----------+-----------+-----------+---

Upon deallocation of a KFENCE object, the object's page is again protected and
the object is marked as freed. Any further access to the object causes a fault
and KFENCE reports a use-after-free access. Freed objects are inserted at the
tail of KFENCE's freelist, so that the least recently freed objects are reused
first, and the chances of detecting use-after-frees of recently freed objects
are increased.
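
As a rough illustration of the placement and redzone rules above, the
following C sketch computes where an object lands on its page and how many
bytes on each side are left over for redzones. It is a simplified model, not
KFENCE's implementation: it assumes 4 KiB pages and a power-of-two alignment no
larger than a page, lets the caller pick the side instead of choosing it at
random, and ``place_object()``, ``align_down()``, ``struct sketch_layout`` and
``SKETCH_PAGE_SIZE`` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical result type; not a kernel structure. */
struct sketch_layout {
	uint64_t obj_start;	/* first byte of the object */
	uint64_t left_gap;	/* redzoned bytes before the object */
	uint64_t right_gap;	/* redzoned bytes after the object */
};

/* Round @x down to a multiple of @align, which must be a power of two. */
static uint64_t align_down(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/*
 * Place an object of @size bytes in the object page starting at @page.
 * Left placement starts at the page boundary; right placement ends as close
 * to the page boundary as the alignment allows. Here the caller chooses the
 * side, whereas KFENCE chooses it at random.
 */
static struct sketch_layout place_object(uint64_t page, uint64_t size,
					 uint64_t align, bool right_side)
{
	struct sketch_layout l;

	assert(page % SKETCH_PAGE_SIZE == 0);
	assert(size > 0 && size <= SKETCH_PAGE_SIZE);
	assert(align > 0 && align <= SKETCH_PAGE_SIZE &&
	       (align & (align - 1)) == 0);

	l.obj_start = page;
	if (right_side)
		l.obj_start = align_down(page + SKETCH_PAGE_SIZE - size, align);

	/* All non-object memory on the page is covered by a redzone. */
	l.left_gap = l.obj_start - page;
	l.right_gap = page + SKETCH_PAGE_SIZE - (l.obj_start + size);
	return l;
}
```

Applied to the memory-corruption report earlier in this document, and assuming
that ``kmalloc-96`` objects use 8-byte alignment,
``place_object(0xffffffffb6797000, 73, 8, true)`` places the 73-byte object at
0xffffffffb6797fb0-0xffffffffb6797ff8. The alignment leaves a 7-byte redzone
starting at 0xffffffffb6797ff9, which is the address where the corruption was
reported.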

Interface
---------

The following describes the functions which are used by allocators as well as
page handling code to set up and deal with KFENCE allocations.

.. kernel-doc:: include/linux/kfence.h
   :functions: is_kfence_address
               kfence_shutdown_cache
               kfence_alloc kfence_free __kfence_free
               kfence_ksize kfence_object_start
               kfence_handle_page_fault

Related Tools
-------------

In userspace, a similar approach is taken by `GWP-ASan
<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
similar but non-sampling approach, which also inspired the name "KFENCE", can
be found in the userspace `Electric Fence Malloc Debugger
<https://linux.die.net/man/3/efence>`_.

In the kernel, several tools exist to debug memory access errors, and in
particular KASAN can detect all bug classes that KFENCE can detect. KASAN is
more precise because it relies on compiler instrumentation, but this comes at a
performance cost.

It is worth highlighting that KASAN and KFENCE are complementary, with
different target environments. For instance, KASAN is the better debugging aid
where test cases or reproducers exist: because KFENCE is less likely to detect
any given error, debugging with it would require more effort. Deployments at
scale that cannot afford to enable KASAN, however, would benefit from using
KFENCE to discover bugs in code paths not exercised by test cases or fuzzers.