=====================
Split page table lock
=====================

Originally, the mm->page_table_lock spinlock protected all page tables of the
mm_struct. But this approach leads to poor page fault scalability of
multi-threaded applications due to high contention on the lock. To improve
scalability, the split page table lock was introduced.

With the split page table lock we have a separate per-table lock to serialize
access to the table. At the moment we use the split lock for PTE and PMD
tables. Access to higher-level tables is protected by mm->page_table_lock.
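The locking scheme described above can be pictured with a small userspace
model. This is only a sketch under stated assumptions: every ``model_*`` name
is hypothetical, and a C11 ``atomic_flag`` stands in for the kernel's
spinlock_t. It shows which lock serializes access to a table of a given level.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for spinlock_t. */
typedef struct { atomic_flag held; } model_spinlock_t;

static void model_lock_init(model_spinlock_t *l)
{
	atomic_flag_clear(&l->held);
}

static bool model_trylock(model_spinlock_t *l)
{
	/* test_and_set returns the previous state: false means we took it. */
	return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void model_unlock(model_spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

enum model_level { MODEL_PTE, MODEL_PMD, MODEL_PUD };

/* One page table of any level; PTE and PMD tables use their own lock. */
struct model_table {
	enum model_level level;
	model_spinlock_t ptl;
};

/* Stand-in for mm_struct: one lock shared by all higher-level tables. */
struct model_mm {
	model_spinlock_t page_table_lock;
};

/* Return the lock that serializes access to @table. */
static model_spinlock_t *model_table_lock(struct model_mm *mm,
					  struct model_table *table)
{
	assert(mm != NULL && table != NULL);
	if (table->level == MODEL_PTE || table->level == MODEL_PMD)
		return &table->ptl;		/* split, per-table lock */
	return &mm->page_table_lock;		/* shared by higher levels */
}
```

In this model two threads working on different PTE tables take different
locks and therefore do not contend, while all work on higher-level tables
still goes through the single shared lock.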

There are helpers to lock/unlock a table and other accessor functions:

 - pte_offset_map_lock()
	maps pte and takes PTE table lock, returns pointer to the taken
	lock;
 - pte_unmap_unlock()
	unlocks and unmaps PTE table;
 - pte_alloc_map_lock()
	allocates PTE table if needed and takes the lock, returns pointer
	to the taken lock or NULL if allocation failed;
 - pte_lockptr()
	returns pointer to PTE table lock;
 - pmd_lock()
	takes PMD table lock, returns pointer to the taken lock;
 - pmd_lockptr()
	returns pointer to PMD table lock.
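The helpers above are normally used as a pair around the code that touches
an entry. The following userspace model is a minimal sketch of that
lock / modify / unlock shape, not kernel code: the ``model_*`` functions, the
table layout and the entry type are assumptions made for illustration. The
lookup helper hands the taken lock back through a pointer argument so that
the caller can pass it to the unlock helper.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_PTRS_PER_PTE 512	/* illustrative table size */

/* Hypothetical userspace stand-in for spinlock_t. */
typedef struct { atomic_flag held; } model_spinlock_t;

static void model_spin_lock(model_spinlock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;	/* spin until the current holder clears the flag */
}

static void model_spin_unlock(model_spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* A PTE table together with its own split lock. */
struct model_pte_table {
	model_spinlock_t ptl;
	unsigned long entries[MODEL_PTRS_PER_PTE];
};

static void model_pte_table_init(struct model_pte_table *t)
{
	atomic_flag_clear(&t->ptl.held);
	for (size_t i = 0; i < MODEL_PTRS_PER_PTE; i++)
		t->entries[i] = 0;
}

/* Models the lookup helper: take the table lock, report it via @ptlp. */
static unsigned long *model_pte_offset_map_lock(struct model_pte_table *t,
						 size_t index,
						 model_spinlock_t **ptlp)
{
	assert(index < MODEL_PTRS_PER_PTE);
	*ptlp = &t->ptl;
	model_spin_lock(*ptlp);
	return &t->entries[index];
}

/* Models the release helper: nothing to unmap here, only unlock. */
static void model_pte_unmap_unlock(unsigned long *pte, model_spinlock_t *ptl)
{
	(void)pte;
	model_spin_unlock(ptl);
}

/* Typical caller: the entry is only touched while the lock is held. */
static void model_set_entry(struct model_pte_table *t, size_t index,
			    unsigned long val)
{
	model_spinlock_t *ptl;
	unsigned long *pte = model_pte_offset_map_lock(t, index, &ptl);

	*pte = val;
	model_pte_unmap_unlock(pte, ptl);
}
```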

Split page table lock for PTE tables is enabled at compile time if
CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS.
If split lock is disabled, all tables are guarded by mm->page_table_lock.

Split page table lock for PMD tables is enabled if it's enabled for PTE
tables and the architecture supports it (see below).
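The two conditions above can be written down as constant expressions. The
sketch below is illustrative only: the ``MODEL_*`` macros and functions are
hypothetical names, and the values given to NR_CPUS and
CONFIG_SPLIT_PTLOCK_CPUS are example defaults, whereas a real build takes
them from the kernel configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Example values only; a real build gets these from the configuration. */
#ifndef NR_CPUS
#define NR_CPUS 8
#endif
#ifndef CONFIG_SPLIT_PTLOCK_CPUS
#define CONFIG_SPLIT_PTLOCK_CPUS 4
#endif
/* Hypothetical switch: 1 models an architecture with PMD split lock support. */
#ifndef MODEL_ARCH_ENABLE_SPLIT_PMD_PTLOCK
#define MODEL_ARCH_ENABLE_SPLIT_PMD_PTLOCK 1
#endif

static_assert(CONFIG_SPLIT_PTLOCK_CPUS > 0, "threshold must be positive");

/* PTE split lock: the threshold is less than or equal to NR_CPUS. */
#define MODEL_USE_SPLIT_PTE_PTLOCKS \
	(NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)

/* PMD split lock: PTE split lock plus architecture support. */
#define MODEL_USE_SPLIT_PMD_PTLOCKS \
	(MODEL_USE_SPLIT_PTE_PTLOCKS && MODEL_ARCH_ENABLE_SPLIT_PMD_PTLOCK)

/* Runtime mirrors of the same rules, handy for trying other values. */
static bool model_split_pte(int nr_cpus, int split_ptlock_cpus)
{
	return nr_cpus >= split_ptlock_cpus;
}

static bool model_split_pmd(int nr_cpus, int split_ptlock_cpus,
			    bool arch_supports)
{
	return model_split_pte(nr_cpus, split_ptlock_cpus) && arch_supports;
}
```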

Hugetlb and split page table lock
=================================

Hugetlb can support several page sizes. We use split lock only for PMD
level, but not for PUD.

Hugetlb-specific helpers:

 - huge_pte_lock()
	takes pmd split lock for PMD_SIZE page, mm->page_table_lock
	otherwise;
 - huge_pte_lockptr()
	returns pointer to table lock.
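The size-based choice made by the hugetlb helpers can be modelled in a few
lines. This is a sketch under assumptions, not the kernel implementation:
the ``model_*`` names are hypothetical, the page sizes are example values
(4 KiB, 2 MiB and 1 GiB), and the lock type is a dummy because only the
identity of the chosen lock matters here.

```c
#include <assert.h>
#include <stddef.h>

/* Dummy lock type: this model only compares lock addresses. */
typedef struct { int unused; } model_spinlock_t;

/* Example sizes; the real values depend on the architecture. */
#define MODEL_PAGE_SIZE	(1UL << 12)
#define MODEL_PMD_SIZE	(1UL << 21)
#define MODEL_PUD_SIZE	(1UL << 30)

struct model_mm {
	model_spinlock_t page_table_lock;
};

struct model_pmd_table {
	model_spinlock_t ptl;	/* split lock of this PMD table */
};

/* Split lock for PMD_SIZE pages, the mm-wide lock for anything larger. */
static model_spinlock_t *model_huge_pte_lockptr(unsigned long huge_page_size,
						 struct model_mm *mm,
						 struct model_pmd_table *table)
{
	assert(huge_page_size != MODEL_PAGE_SIZE);	/* not a huge page */
	if (huge_page_size == MODEL_PMD_SIZE)
		return &table->ptl;
	return &mm->page_table_lock;
}
```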

Support of split page table lock by an architecture
===================================================

There's no need to specially enable the PTE split page table lock: everything
required is done by pgtable_pte_page_ctor() and pgtable_pte_page_dtor(), which
must be called on PTE table allocation / freeing.

Make sure the architecture doesn't use the slab allocator for page table
allocation: slab uses page->slab_cache for its pages.
This field shares storage with page->ptl.

PMD split lock only makes sense if you have more than two page table
levels.

Enabling the PMD split lock requires a pgtable_pmd_page_ctor() call on PMD
table allocation and pgtable_pmd_page_dtor() on freeing.

Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
paths: e.g. X86_PAE preallocates a few PMDs on pgd_alloc().

With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.

NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail; the
failure must be handled properly.
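The allocation and freeing rules above, including the failure case from the
NOTE, can be sketched as a userspace model. All ``model_*`` names, the
simplified page descriptor and the failure-injection flag are assumptions
made for illustration; the sketch only shows the shape of an allocation path
that undoes its work when the constructor fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-in for the struct page that backs a PMD table. */
struct model_page {
	void *ptl;	/* lock storage set up by the constructor */
};

/* Hypothetical knob: make the next constructor call fail. */
static bool model_fail_next_ctor;

/* Models the constructor: returns false if the lock cannot be set up. */
static bool model_pmd_page_ctor(struct model_page *page)
{
	if (model_fail_next_ctor) {
		model_fail_next_ctor = false;
		return false;
	}
	page->ptl = malloc(sizeof(int));
	return page->ptl != NULL;
}

/* Models the destructor: releases what the constructor set up. */
static void model_pmd_page_dtor(struct model_page *page)
{
	free(page->ptl);
	page->ptl = NULL;
}

/* Allocation path: a constructor failure must not leak the page. */
static struct model_page *model_pmd_alloc_one(void)
{
	struct model_page *page = calloc(1, sizeof(*page));

	if (!page)
		return NULL;
	if (!model_pmd_page_ctor(page)) {
		free(page);	/* undo the allocation and report failure */
		return NULL;
	}
	return page;
}

/* Freeing path: always pair the destructor with the constructor. */
static void model_pmd_free(struct model_page *page)
{
	assert(page != NULL);
	model_pmd_page_dtor(page);
	free(page);
}
```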

page->ptl
=========

page->ptl is used to access the split page table lock, where 'page' is the
struct page of the page containing the table. It shares storage with
page->private (and a few other fields in the union).

To avoid increasing the size of struct page and to get the best performance,
we use a trick:

 - if spinlock_t fits into long, we use page->ptl as the spinlock, so we
   can avoid indirect access and save a cache line;
 - if the size of spinlock_t is bigger than the size of long, we use
   page->ptl as a pointer to spinlock_t and allocate it dynamically. This
   allows split lock to be used with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC
   enabled, but costs one more cache line for indirect access.

The spinlock_t is allocated in pgtable_pte_page_ctor() for PTE tables and in
pgtable_pmd_page_ctor() for PMD tables.
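The trick described above can be sketched in userspace C. This is a model
under stated assumptions, not the kernel's definition of struct page: all
``model_*`` names are hypothetical, an ``atomic_flag`` stands in for a small
spinlock_t, and defining ``MODEL_DEBUG_LOCK`` at compile time switches to a
larger lock that has to be allocated separately.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#ifdef MODEL_DEBUG_LOCK
/* Debug-style lock: extra bookkeeping makes it larger than a long. */
typedef struct {
	atomic_flag held;
	const char *owner;
	unsigned long magic;
} model_spinlock_t;
#define MODEL_ALLOC_SPLIT_PTLOCKS 1
#else
/* Small lock that fits into a long and can live inside the page. */
typedef struct { atomic_flag held; } model_spinlock_t;
#define MODEL_ALLOC_SPLIT_PTLOCKS 0
#endif

/* Simplified page descriptor: the lock slot shares storage with private. */
struct model_page {
	union {
		unsigned long private;
#if MODEL_ALLOC_SPLIT_PTLOCKS
		model_spinlock_t *ptl;	/* pointer to a separate allocation */
#else
		model_spinlock_t ptl;	/* lock embedded in the descriptor */
#endif
	};
};

_Static_assert(sizeof(struct model_page) <= sizeof(void *),
	       "the lock slot must not grow the page descriptor");

/* The only accessor callers should use to reach the lock. */
static model_spinlock_t *model_ptlock_ptr(struct model_page *page)
{
#if MODEL_ALLOC_SPLIT_PTLOCKS
	assert(page->ptl != NULL);
	return page->ptl;	/* one extra indirection */
#else
	return &page->ptl;	/* no indirection */
#endif
}

/* Called from the constructor; can fail only in the allocating variant. */
static bool model_ptlock_init(struct model_page *page)
{
#if MODEL_ALLOC_SPLIT_PTLOCKS
	page->ptl = malloc(sizeof(*page->ptl));
	if (!page->ptl)
		return false;
#endif
	atomic_flag_clear(&model_ptlock_ptr(page)->held);
	return true;
}

/* Called from the destructor. */
static void model_ptlock_free(struct model_page *page)
{
#if MODEL_ALLOC_SPLIT_PTLOCKS
	free(page->ptl);
#else
	(void)page;	/* nothing was allocated */
#endif
}
```

Because callers only go through the accessor, the same calling code works
with either layout, which is the reason for not touching the lock field
directly.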

Please never access page->ptl directly; use the appropriate helper.