.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2020, Google LLC.

Kernel Electric-Fence (KFENCE)
==============================

Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
invalid-free errors.

KFENCE is designed to be enabled in production kernels, and has near-zero
performance overhead. Compared to KASAN, KFENCE gives up precision in exchange
for performance. The main motivation behind KFENCE's design is that, with
enough total uptime, KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly accumulate
enough total uptime is to deploy the tool across a large fleet of machines.

Usage
-----

To enable KFENCE, configure the kernel with::

    CONFIG_KFENCE=y

To build a kernel with KFENCE support but have it disabled by default (to
enable it, set ``kfence.sample_interval`` to a non-zero value), configure the
kernel with::

    CONFIG_KFENCE=y
    CONFIG_KFENCE_SAMPLE_INTERVAL=0

KFENCE provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kfence`` for more info).

Tuning performance
~~~~~~~~~~~~~~~~~~

The most important parameter is KFENCE's sample interval, which can be set in
milliseconds via the kernel boot parameter ``kfence.sample_interval``. The
sample interval determines how frequently heap allocations are guarded by
KFENCE. Its default is configurable via the Kconfig option
``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
disables KFENCE.

The KFENCE memory pool is of fixed size; once the pool is exhausted, no
further KFENCE allocations occur. The number of available guarded objects can
be controlled with ``CONFIG_KFENCE_NUM_OBJECTS`` (default 255).

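
The interplay of these two parameters can be pictured with a deliberately
simplified userspace model. This is a sketch under stated assumptions, not
KFENCE's implementation: the ``toy_*`` names are hypothetical, the sample
interval is modeled as a timestamp comparison rather than a timer, and pool
slots are simply counted.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /*
     * Hypothetical toy model (not KFENCE code) of the two tuning knobs:
     * the sample interval and the fixed-size object pool.
     */
    struct toy_kfence {
        unsigned long sample_interval_ms; /* 0 disables sampling */
        unsigned long next_sample_ms;     /* earliest next guarded alloc */
        size_t num_objects;               /* pool capacity */
        size_t used_objects;              /* objects currently allocated */
    };

    /* Decide whether an allocation at time now_ms would be guarded. */
    static bool toy_should_guard(struct toy_kfence *k, unsigned long now_ms)
    {
        if (k->sample_interval_ms == 0)
            return false;                 /* sampling disabled */
        if (now_ms < k->next_sample_ms)
            return false;                 /* interval not yet expired */
        if (k->used_objects >= k->num_objects)
            return false;                 /* pool exhausted */
        k->used_objects++;
        k->next_sample_ms = now_ms + k->sample_interval_ms; /* restart */
        return true;
    }

    /* Freeing a guarded object returns its slot to the pool. */
    static void toy_free_guarded(struct toy_kfence *k)
    {
        assert(k->used_objects > 0);
        k->used_objects--;
    }

With a 100 ms interval and a two-object pool, allocations at 0 ms and 100 ms
are guarded while one at 50 ms is not; once both objects are live, no further
allocation is guarded until one of them is freed.
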
Each object requires 2 pages: one for the object itself and one used as a
guard page. Object pages are interleaved with guard pages, so every object
page is surrounded by two guard pages.

The total memory dedicated to the KFENCE memory pool can be computed as::

    ( #objects + 1 ) * 2 * PAGE_SIZE

Using the default config and assuming a page size of 4 KiB, this results in
dedicating (255 + 1) * 2 * 4 KiB = 2 MiB to the KFENCE memory pool.

Note: On architectures that support huge pages, KFENCE will ensure that the
pool is mapped using pages of size ``PAGE_SIZE``. This will result in
additional page tables being allocated.

Error reports
~~~~~~~~~~~~~

A typical out-of-bounds access looks like this::

    ==================================================================
    BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa6/0x234

    Out-of-bounds read at 0xffff8c3f2e291fff (1B left of kfence-#72):
     test_out_of_bounds_read+0xa6/0x234
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#72: 0xffff8c3f2e292000-0xffff8c3f2e29201f, size=32, cache=kmalloc-32

    allocated by task 484 on cpu 0 at 32.919330s:
     test_alloc+0xfe/0x738
     test_out_of_bounds_read+0x9b/0x234
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 0 PID: 484 Comm: kunit_try_catch Not tainted 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

The header of the report provides a short summary of the function involved in
the access. It is followed by more detailed information about the access and
its origin. Note that real kernel addresses are only shown when using the
kernel command line option ``no_hash_pointers``.

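
The annotation following the access address, such as ``(1B left of
kfence-#72)``, relates that address to the object bounds printed on the
``kfence-#72:`` line. The hypothetical helper below (``toy_describe_access()``
is not a kernel function) reproduces this annotation from the numbers in the
report above. Only the "left of" case appears in the example, so the "right
of" distance convention (bytes past the object's last byte) is an assumption
of this sketch.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Hypothetical helper, not a kernel API: describe addr relative to
     * object kfence-#idx spanning [start, end] (inclusive), in the style
     * of the annotation that follows the address in a KFENCE report.
     */
    static int toy_describe_access(char *buf, size_t len, uint64_t addr,
                                   uint64_t start, uint64_t end, int idx)
    {
        if (addr < start)   /* before the object's first byte */
            return snprintf(buf, len, "%lluB left of kfence-#%d",
                            (unsigned long long)(start - addr), idx);
        if (addr > end)     /* assumption: distance past the last byte */
            return snprintf(buf, len, "%lluB right of kfence-#%d",
                            (unsigned long long)(addr - end), idx);
        return snprintf(buf, len, "in kfence-#%d", idx);
    }

    /* Reproduce the annotation of the out-of-bounds report above. */
    static void toy_describe_access_selftest(void)
    {
        char buf[64];

        toy_describe_access(buf, sizeof(buf), 0xffff8c3f2e291fffULL,
                            0xffff8c3f2e292000ULL, 0xffff8c3f2e29201fULL, 72);
        assert(strcmp(buf, "1B left of kfence-#72") == 0);

        /* An address inside the object's bounds. */
        toy_describe_access(buf, sizeof(buf), 0xffff8c3f2e292010ULL,
                            0xffff8c3f2e292000ULL, 0xffff8c3f2e29201fULL, 72);
        assert(strcmp(buf, "in kfence-#72") == 0);
    }
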

Use-after-free accesses are reported as::

    ==================================================================
    BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143

    Use-after-free read at 0xffff8c3f2e2a0000 (in kfence-#79):
     test_use_after_free_read+0xb3/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#79: 0xffff8c3f2e2a0000-0xffff8c3f2e2a001f, size=32, cache=kmalloc-32

    allocated by task 488 on cpu 2 at 33.871326s:
     test_alloc+0xfe/0x738
     test_use_after_free_read+0x76/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    freed by task 488 on cpu 2 at 33.871358s:
     test_use_after_free_read+0xa8/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 2 PID: 488 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

KFENCE also reports invalid frees, such as double-frees::

    ==================================================================
    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171

    Invalid free of 0xffff8c3f2e2a4000 (in kfence-#81):
     test_double_free+0xdc/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#81: 0xffff8c3f2e2a4000-0xffff8c3f2e2a401f, size=32, cache=kmalloc-32

    allocated by task 490 on cpu 1 at 34.175321s:
     test_alloc+0xfe/0x738
     test_double_free+0x76/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    freed by task 490 on cpu 1 at 34.175348s:
     test_double_free+0xa8/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 1 PID: 490 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

To detect out-of-bounds writes that do not touch a guard page, KFENCE also
uses pattern-based redzones in the parts of the object's page not occupied by
the object, i.e. on the unprotected side of the object. Corruptions of these
redzones are detected and reported when the object is freed::

    ==================================================================
    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184

    Corrupted memory at 0xffff8c3f2e33aff9 [ 0xac . . . . . . ] (in kfence-#156):
     test_kmalloc_aligned_oob_write+0xef/0x184
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#156: 0xffff8c3f2e33afb0-0xffff8c3f2e33aff8, size=73, cache=kmalloc-96

    allocated by task 502 on cpu 7 at 42.159302s:
     test_alloc+0xfe/0x738
     test_kmalloc_aligned_oob_write+0x57/0x184
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 7 PID: 502 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

For such errors, the report shows the address where the corruption occurred
as well as the invalidly written bytes (as offsets from that address); in this
representation, '.' denotes an untouched byte. In the example above, ``0xac``
is the value written to the invalid address at offset 0, and the remaining '.'
characters indicate that none of the following bytes have been touched.

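
The redzone mechanism can be illustrated with a hypothetical userspace sketch
that fills every non-object byte of a 4 KiB page with a canary pattern and
checks it when the object is "freed", formatting the result like the report
above. The ``toy_*`` names, the canary pattern, and the choice to print up to
the next 16-byte boundary are assumptions made for this sketch (the last one
happens to reproduce the 7-byte window of the example); none of them is taken
from KFENCE's source.

.. code-block:: c

    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    #define TOY_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

    /* Hypothetical canary pattern; KFENCE's real pattern may differ. */
    static unsigned char toy_canary(size_t off)
    {
        return (unsigned char)(0xaa ^ (off & 0x7));
    }

    static int toy_in_object(size_t i, size_t obj_off, size_t obj_size)
    {
        return i >= obj_off && i < obj_off + obj_size;
    }

    /* Fill every non-object byte of the page (both sides) with the canary. */
    static void toy_set_redzones(unsigned char *page, size_t obj_off,
                                 size_t obj_size)
    {
        for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
            if (!toy_in_object(i, obj_off, obj_size))
                page[i] = toy_canary(i);
    }

    /*
     * Check the redzones, e.g. when the object is freed. On corruption,
     * write "[ 0xNN . . ]" to out: the bytes from the first corrupted one
     * up to the next 16-byte boundary, with '.' for intact canary bytes.
     * Returns the page offset of the first corrupted byte, or -1 if all
     * redzones are intact.
     */
    static long toy_check_redzones(const unsigned char *page, size_t obj_off,
                                   size_t obj_size, char *out, size_t out_len)
    {
        size_t first, end, pos;

        assert(out_len >= 84); /* "[" + 16 * " 0xNN" + " ]" + NUL */
        for (first = 0; first < TOY_PAGE_SIZE; first++)
            if (!toy_in_object(first, obj_off, obj_size) &&
                page[first] != toy_canary(first))
                break;
        if (first == TOY_PAGE_SIZE)
            return -1;

        end = (first | 15) + 1;          /* next 16-byte boundary */
        if (first < obj_off && end > obj_off)
            end = obj_off;               /* stop before the object itself */

        pos = (size_t)snprintf(out, out_len, "[");
        for (size_t i = first; i < end; i++) {
            if (page[i] == toy_canary(i))
                pos += (size_t)snprintf(out + pos, out_len - pos, " .");
            else
                pos += (size_t)snprintf(out + pos, out_len - pos, " 0x%02x",
                                        page[i]);
        }
        snprintf(out + pos, out_len - pos, " ]");
        return (long)first;
    }

    /* Reproduce the kfence-#156 example: 73-byte object at offset 0xfb0. */
    static void toy_redzone_selftest(void)
    {
        static unsigned char page[TOY_PAGE_SIZE];
        char buf[96];

        memset(page, 0, sizeof(page));
        toy_set_redzones(page, 0xfb0, 73);
        assert(toy_check_redzones(page, 0xfb0, 73, buf, sizeof(buf)) == -1);

        page[0xff9] = 0xac; /* one byte past the object's last byte, 0xff8 */
        assert(toy_check_redzones(page, 0xfb0, 73, buf, sizeof(buf)) == 0xff9);
        assert(strcmp(buf, "[ 0xac . . . . . . ]") == 0);
    }

Note that a check based on a fixed pattern cannot detect a stray write that
happens to store the expected canary value.
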

Note that real values are only shown if the kernel was booted with
``no_hash_pointers``; otherwise, to avoid information disclosure, '!' is used
instead to denote invalidly written bytes.

Finally, KFENCE may also report invalid accesses to any protected page for
which it was not possible to determine an associated object, e.g. if adjacent
object pages had not yet been allocated::

    ==================================================================
    BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0

    Invalid read at 0xffffffffb670b00a:
     test_invalid_access+0x26/0xe0
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

DebugFS interface
~~~~~~~~~~~~~~~~~

Some debugging information is exposed via debugfs:

* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.

* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
  allocated via KFENCE, including those already freed but protected.

Implementation Details
----------------------

Guarded allocations are set up based on the sample interval. After the sample
interval expires, the next allocation through the main allocator (SLAB or
SLUB) returns a guarded allocation from the KFENCE object pool (allocation
sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
the next guarded allocation is set up after the interval expires again.

When using ``CONFIG_KFENCE_STATIC_KEYS=y``, KFENCE allocations are "gated"
through the main allocator's fast-path by relying on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE. Depending on the sample interval, target workloads, and
system architecture, this may perform better than the simple dynamic branch.
Careful benchmarking is recommended.

KFENCE objects each reside on a dedicated page, at either the left or the right
page boundary, selected at random. The pages to the left and right of the
object page are "guard pages", whose attributes are changed to a protected
state so that any attempted access causes a page fault. Such page faults are
then intercepted by KFENCE, which handles the fault gracefully by reporting an
out-of-bounds access and marking the page as accessible so that the faulting
code can (wrongly) continue executing (set ``panic_on_warn`` to panic instead).

To detect out-of-bounds writes to memory within the object's page itself,
KFENCE also uses pattern-based redzones. For each object page, a redzone is set
up for all non-object memory. For typical alignments, the redzone is only
required on the unguarded side of an object. Because KFENCE must honor the
cache's requested alignment, special alignments may result in unprotected gaps
on either side of an object, all of which are redzoned.

The following figure illustrates the page layout::

    ---+-----------+-----------+-----------+-----------+-----------+---
       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
    ---+-----------+-----------+-----------+-----------+-----------+---

Upon deallocation of a KFENCE object, the object's page is again protected and
the object is marked as freed. Any further access to the object causes a fault
and KFENCE reports a use-after-free access. Freed objects are inserted at the
tail of KFENCE's freelist, so that the least recently freed objects are reused
first and the chances of detecting use-after-frees of recently freed objects
are increased.

If pool utilization reaches 75% (default) or above, KFENCE no longer guards
allocations from a source that is already covered by a currently allocated
KFENCE object. This reduces the risk of the pool eventually being fully
occupied by allocated objects, while still ensuring diverse coverage of
allocations. The "source" of an allocation is based on its partial allocation
stack trace. A side effect is that this also prevents frequent long-lived
allocations (e.g. pagecache) of the same source from filling up the pool
permanently, which is the most common risk for the pool becoming full and the
sampled allocation rate dropping to zero. The threshold at which to start
limiting currently covered allocations can be configured via the boot
parameter ``kfence.skip_covered_thresh`` (pool usage%).

Interface
---------

The following describes the functions which are used by allocators as well as
page handling code to set up and deal with KFENCE allocations.

.. kernel-doc:: include/linux/kfence.h
   :functions: is_kfence_address
               kfence_shutdown_cache
               kfence_alloc kfence_free __kfence_free
               kfence_ksize kfence_object_start
               kfence_handle_page_fault

Related Tools
-------------

In userspace, a similar approach is taken by `GWP-ASan
<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
similar but non-sampling approach, which also inspired the name "KFENCE", can
be found in the userspace `Electric Fence Malloc Debugger
<https://linux.die.net/man/3/efence>`_.

In the kernel, several tools exist to debug memory access errors, and in
particular KASAN can detect all bug classes that KFENCE can detect. KASAN is
more precise because it relies on compiler instrumentation, but this comes at
a performance cost.

It is worth highlighting that KASAN and KFENCE are complementary, with
different target environments. For instance, KASAN is the better debugging aid
where test cases or reproducers exist: because KFENCE has a lower chance of
detecting any given error, debugging with it would require more effort.
Deployments at scale that cannot afford to enable KASAN, however, would
benefit from using KFENCE to discover bugs in code paths not exercised by test
cases or fuzzers.