.. SPDX-License-Identifier: GPL-2.0

================
Page Table Check
================

Introduction
============

Page table check hardens the kernel by ensuring that some types of memory
corruption are prevented.

Page table check performs extra verifications at the time when new pages become
accessible from userspace by getting their page table entries (PTEs, PMDs,
etc.) added into the table.

In most cases of detected corruption, the kernel is crashed. There is a small
performance and memory overhead associated with the page table check. Therefore,
it is disabled by default, but can be optionally enabled on systems where the
extra hardening outweighs the performance costs. Also, because page table check
is synchronous, it can help with debugging double map memory corruption issues
by crashing the kernel at the time the wrong mapping occurs, instead of later,
which is often the case with memory corruption bugs.

It can also be used to check page table entries for various flags and dump
warnings when illegal combinations of entry flags are detected. Currently,
userfaultfd is the only user of this, to sanity check the wr-protect bit against
any writable flags.
Illegal flag combinations will not directly cause data
corruption in this case, but they will cause read-only data to
be writable, leading to corruption when the page content is later modified.

Double mapping detection logic
==============================

+-------------------+-------------------+-------------------+------------------+
| Current Mapping   | New Mapping       | Permissions       | Rule             |
+===================+===================+===================+==================+
| Anonymous         | Anonymous         | Read              | Allow            |
+-------------------+-------------------+-------------------+------------------+
| Anonymous         | Anonymous         | Read / Write      | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Anonymous         | Named             | Any               | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Named             | Anonymous         | Any               | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Named             | Named             | Any               | Allow            |
+-------------------+-------------------+-------------------+------------------+

Enabling Page Table Check
=========================

Build kernel with:

- PAGE_TABLE_CHECK=y
  Note, it can only be enabled on platforms where ARCH_SUPPORTS_PAGE_TABLE_CHECK
  is available.

- Boot with 'page_table_check=on' kernel parameter.

Optionally, build kernel with PAGE_TABLE_CHECK_ENFORCED in order to have page
table check support without an extra kernel parameter.

Implementation notes
====================

We specifically decided not to use VMA information in order to avoid relying on
MM states (except for limited "struct page" info). The page table check is a
state machine, separate from Linux-MM, that verifies that the user accessible
pages are not falsely shared.

PAGE_TABLE_CHECK depends on EXCLUSIVE_SYSTEM_RAM. The reason is that without
EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary physical memory
regions into userspace via /dev/mem. At the same time, pages may change
their properties (e.g., from anonymous pages to named pages) while they are
still being mapped in userspace, leading to "corruption" detected by the
page table check.

Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may still be allowed to be mapped via
/dev/mem. However, these pages are always considered as named pages, so they
won't break the logic used in the page table check.
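
As an illustration, the rules in the double mapping detection table can be
written as a small predicate. This is a minimal userspace sketch, not the
kernel implementation: the ``ptc_kind`` enum and ``ptc_rule_allows()`` are
hypothetical names used only here, and the ``writable`` argument is assumed to
correspond to the Permissions column of the table.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not the kernel's own types. */
enum ptc_kind {
	PTC_ANON,	/* page currently or newly mapped as anonymous */
	PTC_NAMED,	/* page currently or newly mapped as named */
};

/*
 * Return true if the table allows adding a mapping of kind 'incoming' to a
 * page that is already mapped as kind 'current'. 'writable' is assumed to
 * mean Read / Write in the Permissions column, false meaning Read only.
 */
static bool ptc_rule_allows(enum ptc_kind current, enum ptc_kind incoming,
			    bool writable)
{
	/* Anonymous with Named, in either direction: Prohibit. */
	if (current != incoming)
		return false;

	/* Named with Named: Allow, whatever the permissions. */
	if (current == PTC_NAMED)
		return true;

	/* Anonymous with Anonymous: Allow only when read-only. */
	return !writable;
}
```

Where this sketch returns false, the page table check would treat the new
mapping as a detected corruption.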