.. SPDX-License-Identifier: GPL-2.0

================
Page Table Check
================

Introduction
============

Page table check hardens the kernel by ensuring that some types of memory
corruption are prevented.

Page table check performs extra verification at the time when new pages become
accessible from userspace, that is, when their page table entries (PTEs, PMDs,
etc.) are added to the page tables.

In most cases where corruption is detected, the kernel is crashed. There is a
small performance and memory overhead associated with the page table check.
Therefore, it is disabled by default, but can optionally be enabled on systems
where the extra hardening outweighs the performance costs. Also, because page
table check is synchronous, it can help with debugging double map memory
corruption issues by crashing the kernel at the time the wrong mapping occurs,
instead of later, which is often the case with memory corruption bugs.

It can also be used to check page table entry flags and to dump warnings when
illegal combinations of entry flags are detected. Currently, userfaultfd is the
only user of this, to sanity check the wr-protect bit against any writable
flags. In this case, illegal flag combinations do not directly cause data
corruption, but they make read-only data writable, leading to corruption when
the page content is later modified.

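As a rough illustration of the userfaultfd flag check described above, the
following user-space C sketch rejects an entry that is marked both
write-protected and writable. The bit positions and the names
``SKETCH_PTE_WRITE``, ``SKETCH_PTE_UFFD_WP`` and ``sketch_pte_flags_legal()``
are hypothetical and chosen only for this example. Real PTE layouts are
architecture-specific, and the kernel dumps a warning rather than returning a
value.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical bit positions, for illustration only. */
   #define SKETCH_PTE_WRITE   (UINT64_C(1) << 1)
   #define SKETCH_PTE_UFFD_WP (UINT64_C(1) << 10)

   /*
    * Return true if the flag combination is legal, i.e. the entry is not
    * both userfaultfd write-protected and writable at the same time.
    */
   static bool sketch_pte_flags_legal(uint64_t pte)
   {
           bool writable = (pte & SKETCH_PTE_WRITE) != 0;
           bool wr_protected = (pte & SKETCH_PTE_UFFD_WP) != 0;

           return !(writable && wr_protected);
   }

An entry with only one of the two bits set passes the check; an entry with both
bits set is the illegal combination that would let supposedly read-only data be
modified.
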
Double mapping detection logic
==============================

+-------------------+-------------------+-------------------+------------------+
| Current Mapping   | New Mapping       | Permissions       | Rule             |
+===================+===================+===================+==================+
| Anonymous         | Anonymous         | Read              | Allow            |
+-------------------+-------------------+-------------------+------------------+
| Anonymous         | Anonymous         | Read / Write      | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Anonymous         | Named             | Any               | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Named             | Anonymous         | Any               | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Named             | Named             | Any               | Allow            |
+-------------------+-------------------+-------------------+------------------+

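The rules in the table can be summarised as a small decision function. The
following user-space C sketch is illustrative only: ``enum map_kind`` and
``double_map_allowed()`` are hypothetical names, not kernel identifiers, and the
sketch assumes that the Permissions column describes the new mapping. It returns
a boolean, whereas the kernel in most cases crashes on a prohibited combination.

.. code-block:: c

   #include <stdbool.h>

   /* Hypothetical type for this sketch; not a kernel identifier. */
   enum map_kind { KIND_ANON, KIND_NAMED };

   /*
    * Return true if a page currently mapped as 'current' may gain an
    * additional mapping of kind 'new_kind'. 'writable' describes the
    * new mapping (an assumption made for this sketch).
    */
   static bool double_map_allowed(enum map_kind current,
                                  enum map_kind new_kind, bool writable)
   {
           /* Mixing anonymous and named mappings is always prohibited. */
           if (current != new_kind)
                   return false;

           /* Anonymous pages may only be shared read-only. */
           if (current == KIND_ANON)
                   return !writable;

           /* Named pages may be shared with any permissions. */
           return true;
   }

For example, ``double_map_allowed(KIND_ANON, KIND_ANON, true)`` is false, which
corresponds to the second row of the table.
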
Enabling Page Table Check
=========================

Enable page table check in two steps:

- Build the kernel with PAGE_TABLE_CHECK=y. Note that it can only be enabled on
  platforms where ARCH_SUPPORTS_PAGE_TABLE_CHECK is available.

- Boot with the 'page_table_check=on' kernel parameter.

Optionally, build the kernel with PAGE_TABLE_CHECK_ENFORCED in order to have
page table check enabled without the extra kernel parameter.

Implementation notes
====================

We specifically decided not to use VMA information in order to avoid relying on
MM states (except for limited "struct page" info). The page table check is a
state machine, separate from Linux-MM, that verifies that user-accessible
pages are not falsely shared.

PAGE_TABLE_CHECK depends on EXCLUSIVE_SYSTEM_RAM. The reason is that without
EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary physical memory
regions into userspace via /dev/mem. At the same time, pages may change
their properties (e.g., from anonymous pages to named pages) while they are
still mapped in userspace, leading to "corruption" being detected by the
page table check.

Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may still be allowed to be mapped via
/dev/mem. However, these pages are always considered named pages, so they
won't break the logic used in the page table check.