===============================================
Memory Tagging Extension (MTE) in AArch64 Linux
===============================================

Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
         Catalin Marinas <catalin.marinas@arm.com>

Date: 2020-02-25

This document describes the provision of the Memory Tagging Extension
functionality in AArch64 Linux.

Introduction
============

ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
(Top Byte Ignore) feature and allows software to access a 4-bit
allocation tag for each 16-byte granule in the physical address space.
Such a memory range must be mapped with the Normal-Tagged memory
attribute. A logical tag is derived from bits 59-56 of the virtual
address used for the memory access. A CPU with MTE enabled will compare
the logical tag against the allocation tag and potentially raise an
exception on mismatch, subject to system register configuration.

Userspace Support
=================

When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
supported by the hardware, the kernel advertises the feature to
userspace via ``HWCAP2_MTE``.

PROT_MTE
--------

To access the allocation tags, a user process must enable the Tagged
memory attribute on an address range using a new ``prot`` flag for
``mmap()`` and ``mprotect()``:

``PROT_MTE`` - Pages allow access to the MTE allocation tags.

The allocation tag is set to 0 when such pages are first mapped in the
user address space and preserved on copy-on-write. ``MAP_SHARED`` is
supported and the allocation tags can be shared between processes.

**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
types of mapping will result in ``-EINVAL`` returned by these system
calls.
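
The address arithmetic described in the Introduction can be illustrated
in portable C. The following is a minimal sketch, not part of the kernel
ABI: the helper and macro names are hypothetical and exist only for this
illustration. It assumes nothing beyond what is stated above, namely that
the logical tag occupies bits 59-56 of the virtual address and that one
allocation tag covers a 16-byte granule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, used only in this sketch. */
#define MTE_GRANULE_SIZE        16UL    /* bytes covered by one allocation tag */
#define MTE_LOGICAL_TAG_SHIFT   56      /* logical tag lives in bits 59-56 */
#define MTE_LOGICAL_TAG_MASK    (0xfULL << MTE_LOGICAL_TAG_SHIFT)

/* Extract the 4-bit logical tag from a (possibly tagged) virtual address. */
static inline unsigned int mte_logical_tag(uint64_t addr)
{
	return (unsigned int)((addr & MTE_LOGICAL_TAG_MASK) >>
			      MTE_LOGICAL_TAG_SHIFT);
}

/* Return addr with its logical tag replaced by tag (0-15). */
static inline uint64_t mte_with_logical_tag(uint64_t addr, unsigned int tag)
{
	assert(tag <= 0xf);
	return (addr & ~MTE_LOGICAL_TAG_MASK) |
	       ((uint64_t)tag << MTE_LOGICAL_TAG_SHIFT);
}

/* Round addr down to the start of its 16-byte granule, keeping the tag. */
static inline uint64_t mte_granule_base(uint64_t addr)
{
	return addr & ~(uint64_t)(MTE_GRANULE_SIZE - 1);
}

/* Number of allocation tags covering len bytes (len must be granule-aligned). */
static inline size_t mte_granule_count(size_t len)
{
	assert(len % MTE_GRANULE_SIZE == 0);
	return len / MTE_GRANULE_SIZE;
}
```

For instance, ``mte_granule_count(4096)`` evaluates to 256, so a 4K page
mapped with ``PROT_MTE`` carries 256 allocation tags.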

**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
be cleared by ``mprotect()``.

**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and
``MADV_FREE`` may have the allocation tags cleared (set to 0) at any
point after the system call.

Tag Check Faults
----------------

When ``PROT_MTE`` is enabled on an address range and a mismatch between
the logical and allocation tags occurs on access, there are four
possible behaviours:

- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
  tag check fault.

- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
  memory access is not performed. If ``SIGSEGV`` is ignored or blocked
  by the offending thread, the containing process is terminated with a
  ``coredump``.

- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
  thread, asynchronously following one or multiple tag check faults,
  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
  address is unknown).

- *Asymmetric* - Reads are handled as for synchronous mode while writes
  are handled as for asynchronous mode.

The user can select the above modes, per thread, using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags``
contains any number of the following values in the ``PR_MTE_TCF_MASK``
bit-field:

- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
  (ignored if combined with other options)
- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode

If no modes are specified, tag check faults are ignored. If a single
mode is specified, the program will run in that mode. If multiple
modes are specified, the mode is selected as described in the "Per-CPU
preferred tag checking mode" section below.
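
The way the mode bits combine inside ``PR_MTE_TCF_MASK`` can be sketched
as follows. The constants are the values listed in the example program
at the end of this document; the two helper functions are hypothetical
and only illustrate the bit manipulation. No system call is made here.

```c
#include <assert.h>

/* Values as listed in this document's example (include/uapi/linux/prctl.h). */
#define PR_MTE_TCF_SHIFT        1
#define PR_MTE_TCF_NONE         (0UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_SYNC         (1UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_ASYNC        (2UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_MASK         (3UL << PR_MTE_TCF_SHIFT)

/* Hypothetical helper: build the tag check fault part of flags. */
static unsigned long mte_tcf_flags(int want_sync, int want_async)
{
	/* PR_MTE_TCF_NONE is 0, so it has no effect once a mode is OR-ed in. */
	unsigned long flags = PR_MTE_TCF_NONE;

	if (want_sync)
		flags |= PR_MTE_TCF_SYNC;
	if (want_async)
		flags |= PR_MTE_TCF_ASYNC;

	assert((flags & ~PR_MTE_TCF_MASK) == 0);
	return flags;
}

/* Hypothetical helper: number of modes requested by a flags value. */
static int mte_tcf_mode_count(unsigned long flags)
{
	unsigned long tcf = flags & PR_MTE_TCF_MASK;

	return !!(tcf & PR_MTE_TCF_SYNC) + !!(tcf & PR_MTE_TCF_ASYNC);
}
```

In a real program the result would be OR-ed with ``PR_TAGGED_ADDR_ENABLE``
and the tag include mask before being passed to
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)``.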

The current tag check fault configuration can be read using the
``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call. If
multiple modes were requested then all will be reported.

Tag checking can also be disabled for a user thread by setting the
``PSTATE.TCO`` bit with ``MSR TCO, #1``.

**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
irrespective of the interrupted context. ``PSTATE.TCO`` is restored on
``sigreturn()``.

**Note**: There are no *match-all* logical tags available for user
applications.

**Note**: Kernel accesses to the user address space (e.g. ``read()``
system call) are not checked if the user thread tag checking mode is
``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
address accesses, however it cannot always guarantee it. Kernel accesses
to user addresses are always performed with an effective ``PSTATE.TCO``
value of zero, regardless of the user configuration.

Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
-----------------------------------------------------------------

The architecture allows certain tags to be excluded from random
generation via the ``GCR_EL1.Exclude`` register bit-field. By default,
Linux excludes all tags other than 0. A user thread can enable specific
tags in the randomly generated set using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
``flags`` contains the tags bitmap in the ``PR_MTE_TAG_MASK`` bit-field.

**Note**: The hardware uses an exclude mask but the ``prctl()``
interface provides an include mask. An include mask of ``0`` (exclusion
mask ``0xffff``) results in the CPU always generating tag ``0``.
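
The relationship between the ``prctl()`` include mask and the hardware
exclude mask noted above is a 16-bit complement. The sketch below
illustrates it together with the placement of the include mask in the
``PR_MTE_TAG_MASK`` bit-field. The two constants are the values listed in
the example program at the end of this document; the helper names are
hypothetical.

```c
#include <assert.h>

/* Values as listed in this document's example (include/uapi/linux/prctl.h). */
#define PR_MTE_TAG_SHIFT        3
#define PR_MTE_TAG_MASK         (0xffffUL << PR_MTE_TAG_SHIFT)

/* Hypothetical helper: exclude mask equivalent to a prctl() include mask. */
static unsigned int mte_exclude_from_include(unsigned int include)
{
	assert(include <= 0xffff);
	return ~include & 0xffffU;
}

/* Hypothetical helper: place a 16-bit include mask into the flags bit-field. */
static unsigned long mte_tag_mask_flags(unsigned int include)
{
	assert(include <= 0xffff);
	return ((unsigned long)include << PR_MTE_TAG_SHIFT) & PR_MTE_TAG_MASK;
}
```

With an include mask of ``0xfffe`` (all non-zero tags), the equivalent
exclude mask is ``0x0001``, so only tag 0 is kept out of the randomly
generated set.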

Per-CPU preferred tag checking mode
-----------------------------------

On some CPUs the performance of MTE in stricter tag checking modes
is similar to that of less strict tag checking modes. This makes it
worthwhile to enable stricter checks on those CPUs when a less strict
checking mode is requested, in order to gain the error detection
benefits of the stricter checks without the performance downsides. To
support this scenario, a privileged user may configure a stricter
tag checking mode as the CPU's preferred tag checking mode.

The preferred tag checking mode for each CPU is controlled by
``/sys/devices/system/cpu/cpu<N>/mte_tcf_preferred``, to which a
privileged user may write the value ``async``, ``sync`` or ``asymm``. The
default preferred mode for each CPU is ``async``.

To allow a program to potentially run in the CPU's preferred tag
checking mode, the user program may set multiple tag check fault mode
bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
flags, 0, 0, 0)`` system call. If both synchronous and asynchronous
modes are requested then asymmetric mode may also be selected by the
kernel. If the CPU's preferred tag checking mode is in the task's set
of provided tag checking modes, that mode will be selected. Otherwise,
one of the modes in the task's mode set will be selected by the kernel
using the preference order:

    1. Asynchronous
    2. Asymmetric
    3. Synchronous

Note that there is no way for userspace to request multiple modes and
also disable asymmetric mode.
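
The selection rules above can be modelled in a few lines of C. This is a
sketch of the behaviour as described in this section, not the kernel's
implementation; the ``enum mte_mode`` values and the function name are
hypothetical and are not the ``prctl()`` ABI values.

```c
#include <assert.h>

/* Hypothetical bit values, local to this sketch (not the prctl() ABI). */
enum mte_mode {
	MTE_MODE_NONE  = 0,
	MTE_MODE_SYNC  = 1 << 0,
	MTE_MODE_ASYNC = 1 << 1,
	MTE_MODE_ASYMM = 1 << 2,
};

/*
 * requested:     OR of MTE_MODE_SYNC / MTE_MODE_ASYNC chosen by the task.
 * cpu_preferred: the single mode configured as the CPU's preference.
 */
static enum mte_mode mte_resolve_mode(unsigned int requested,
				      enum mte_mode cpu_preferred)
{
	unsigned int set = requested & (MTE_MODE_SYNC | MTE_MODE_ASYNC);

	assert(cpu_preferred == MTE_MODE_SYNC ||
	       cpu_preferred == MTE_MODE_ASYNC ||
	       cpu_preferred == MTE_MODE_ASYMM);

	/* Requesting both sync and async also makes asymmetric eligible. */
	if ((set & MTE_MODE_SYNC) && (set & MTE_MODE_ASYNC))
		set |= MTE_MODE_ASYMM;

	/* The CPU's preferred mode wins if the task's set contains it. */
	if (set & cpu_preferred)
		return cpu_preferred;

	/* Otherwise fall back to the documented preference order. */
	if (set & MTE_MODE_ASYNC)
		return MTE_MODE_ASYNC;
	if (set & MTE_MODE_ASYMM)
		return MTE_MODE_ASYMM;
	if (set & MTE_MODE_SYNC)
		return MTE_MODE_SYNC;

	return MTE_MODE_NONE;	/* no modes requested: faults are ignored */
}
```

Under this model a task requesting both modes always runs in whatever
mode the CPU prefers, while a task requesting a single mode keeps that
mode regardless of the CPU's preference.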

Initial process state
---------------------

On ``execve()``, the new process has the following configuration:

- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
- No tag checking modes are selected (tag check faults ignored)
- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
- ``PSTATE.TCO`` set to 0
- ``PROT_MTE`` not set on any of the initial memory maps

On ``fork()``, the new process inherits the parent's configuration and
memory map attributes with the exception of the ``madvise()`` ranges
with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set
to 0).

The ``ptrace()`` interface
--------------------------

``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
the tags from or set the tags to a tracee's address space. The
``ptrace()`` system call is invoked as ``ptrace(request, pid, addr,
data)`` where:

- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``.
- ``pid`` - the tracee's PID.
- ``addr`` - address in the tracee's address space.
- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
  a buffer of ``iov_len`` length in the tracer's address space.

The tags in the tracer's ``iov_base`` buffer are represented as one
4-bit tag per byte and correspond to a 16-byte MTE tag granule in the
tracee's address space.

**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel
will use the corresponding aligned address.

``ptrace()`` return value:

- 0 - tags were copied, the tracer's ``iov_len`` was updated to the
  number of tags transferred. This may be smaller than the requested
  ``iov_len`` if the requested address range in the tracee's or the
  tracer's space cannot be accessed or does not have valid tags.
- ``-EPERM`` - the specified process cannot be traced.
- ``-EIO`` - the tracee's address range cannot be accessed (e.g.
  invalid address) and no tags copied. ``iov_len`` not updated.
- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
  or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never
  mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated.

**Note**: There are no transient errors for the requests above, so user
programs should not retry in case of a non-zero system call return.

``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with
``addr == NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged
address ABI control and MTE configuration of a process as per the
``prctl()`` options described in
Documentation/arch/arm64/tagged-address-abi.rst and above. The corresponding
``regset`` is 1 element of 8 bytes (``sizeof(long)``).

Core dump support
-----------------

The allocation tags for user memory mapped with ``PROT_MTE`` are dumped
in the core file as additional ``PT_AARCH64_MEMTAG_MTE`` segments. The
program header for such a segment is defined as:

:``p_type``: ``PT_AARCH64_MEMTAG_MTE``
:``p_flags``: 0
:``p_offset``: segment file offset
:``p_vaddr``: segment virtual address, same as the corresponding
  ``PT_LOAD`` segment
:``p_paddr``: 0
:``p_filesz``: segment size in file, calculated as ``p_memsz / 32``
  (two 4-bit tags cover 32 bytes of memory)
:``p_memsz``: segment size in memory, same as the corresponding
  ``PT_LOAD`` segment
:``p_align``: 0

The tags are stored in the core file at ``p_offset`` as two 4-bit tags
in a byte. With the tag granule of 16 bytes, a 4K page requires 128
bytes in the core file.

Example of correct usage
========================

*MTE Example code*

.. code-block:: c

    /*
     * To be compiled with -march=armv8.5-a+memtag
     */
    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/auxv.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>

    /*
     * From arch/arm64/include/uapi/asm/hwcap.h
     */
    #define HWCAP2_MTE              (1 << 18)

    /*
     * From arch/arm64/include/uapi/asm/mman.h
     */
    #define PROT_MTE                 0x20

    /*
     * From include/uapi/linux/prctl.h
     */
    #define PR_SET_TAGGED_ADDR_CTRL 55
    #define PR_GET_TAGGED_ADDR_CTRL 56
    # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
    # define PR_MTE_TCF_SHIFT       1
    # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TAG_SHIFT       3
    # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)

    /*
     * Insert a random logical tag into the given pointer.
     */
    #define insert_random_tag(ptr) ({                       \
            uint64_t __val;                                 \
            asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
            __val;                                          \
    })

    /*
     * Set the allocation tag on the destination address.
     */
    #define set_tag(tagged_addr) do {                                      \
            asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
    } while (0)

    int main(void)
    {
            unsigned char *a;
            unsigned long page_sz = sysconf(_SC_PAGESIZE);
            unsigned long hwcap2 = getauxval(AT_HWCAP2);

            /* check if MTE is present */
            if (!(hwcap2 & HWCAP2_MTE))
                    return EXIT_FAILURE;

            /*
             * Enable the tagged address ABI, synchronous or asynchronous MTE
             * tag check faults (based on per-CPU preference) and allow all
             * non-zero tags in the randomly generated set.
             */
            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC |
                      (0xfffe << PR_MTE_TAG_SHIFT),
                      0, 0, 0)) {
                    perror("prctl() failed");
                    return EXIT_FAILURE;
            }

            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (a == MAP_FAILED) {
                    perror("mmap() failed");
                    return EXIT_FAILURE;
            }

            /*
             * Enable MTE on the above anonymous mmap. The flag could be passed
             * directly to mmap() and skip this step.
             */
            if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
                    perror("mprotect() failed");
                    return EXIT_FAILURE;
            }

            /* access with the default tag (0) */
            a[0] = 1;
            a[1] = 2;

            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /* set the logical and allocation tags */
            a = (unsigned char *)insert_random_tag(a);
            set_tag(a);

            printf("%p\n", a);

            /* non-zero tag access */
            a[0] = 3;
            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /*
             * If MTE is enabled correctly the next instruction will generate an
             * exception.
             */
            printf("Expecting SIGSEGV...\n");
            a[16] = 0xdd;

            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
            printf("...haven't got one\n");

            return EXIT_FAILURE;
    }
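
A program that wants to report which kind of tag check fault it received,
rather than simply terminating, could install a ``SIGSEGV`` handler that
inspects ``si_code`` as described under "Tag Check Faults". The following
is a minimal sketch and not part of the example above: the function and
enum names are hypothetical, and the fallback ``SEGV_MTEAERR`` and
``SEGV_MTESERR`` definitions assume the values from
``include/uapi/asm-generic/siginfo.h`` for C libraries whose headers do
not provide them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Assumed values from include/uapi/asm-generic/siginfo.h. */
#ifndef SEGV_MTEAERR
#define SEGV_MTEAERR 8	/* asynchronous MTE tag check fault */
#endif
#ifndef SEGV_MTESERR
#define SEGV_MTESERR 9	/* synchronous MTE tag check fault */
#endif

/* Hypothetical classification used only by this sketch. */
enum mte_fault_kind { MTE_FAULT_OTHER, MTE_FAULT_SYNC, MTE_FAULT_ASYNC };

static enum mte_fault_kind mte_classify(int si_code)
{
	switch (si_code) {
	case SEGV_MTESERR:
		return MTE_FAULT_SYNC;	/* si_addr holds the fault address */
	case SEGV_MTEAERR:
		return MTE_FAULT_ASYNC;	/* si_addr is 0, address unknown */
	default:
		return MTE_FAULT_OTHER;
	}
}

/* Only async-signal-safe calls (write, _exit) are used in the handler. */
static void mte_segv_handler(int sig, siginfo_t *info, void *ucontext)
{
	static const char sync_msg[] = "synchronous MTE tag check fault\n";
	static const char async_msg[] = "asynchronous MTE tag check fault\n";
	static const char other_msg[] = "SIGSEGV not caused by MTE\n";
	const char *msg = other_msg;
	size_t len = sizeof(other_msg) - 1;

	(void)sig;
	(void)ucontext;

	switch (mte_classify(info->si_code)) {
	case MTE_FAULT_SYNC:
		msg = sync_msg;
		len = sizeof(sync_msg) - 1;
		break;
	case MTE_FAULT_ASYNC:
		msg = async_msg;
		len = sizeof(async_msg) - 1;
		break;
	default:
		break;
	}

	if (write(STDERR_FILENO, msg, len) < 0) {
		/* nothing useful can be done about a failed write here */
	}
	_exit(1);
}

/* Install the handler; returns 0 on success, -1 on error (see sigaction). */
static int mte_install_segv_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = mte_segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);

	return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling ``mte_install_segv_handler()`` before the faulting store in the
example would make the process print the fault kind and exit instead of
being terminated by the default ``SIGSEGV`` action.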