=====================
Split page table lock
=====================

Originally, the mm->page_table_lock spinlock protected all page tables of the
mm_struct. This approach leads to poor page fault scalability in
multi-threaded applications due to high contention on the lock. To improve
scalability, the split page table lock was introduced.

With the split page table lock, each table has its own lock to serialize
access to it. At the moment the split lock is used for PTE and PMD
tables. Access to higher-level tables is protected by mm->page_table_lock.

There are helpers to lock/unlock a table and other accessor functions:

 - pte_offset_map_lock()
	maps PTE and takes PTE table lock, returns pointer to PTE with
	pointer to its PTE table lock, or returns NULL if there is no PTE table;
 - pte_offset_map_nolock()
	maps PTE, returns pointer to PTE with pointer to its PTE table
	lock (not taken), or returns NULL if there is no PTE table;
 - pte_offset_map()
	maps PTE, returns pointer to PTE, or returns NULL if there is no PTE
	table;
 - pte_unmap()
	unmaps PTE table;
 - pte_unmap_unlock()
	unlocks and unmaps PTE table;
 - pte_alloc_map_lock()
	allocates PTE table if needed and takes its lock, returns pointer to
	PTE with pointer to its lock, or returns NULL if allocation failed;
 - pmd_lock()
	takes PMD table lock, returns pointer to taken lock;
 - pmd_lockptr()
	returns pointer to PMD table lock.
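
The calling convention of the PTE helpers can be illustrated with a small
user-space model. This is only a sketch, not kernel code: every ``model_*``
name is a hypothetical stand-in, a pthread mutex plays the role of the
spinlock, and a plain array plays the role of a PTE table. What it mirrors
from the list above is that the lookup returns NULL when there is no PTE
table, hands the table's lock back through an out-parameter, and is paired
with an unlock-and-unmap call.

.. code-block:: c

    #include <pthread.h>
    #include <stddef.h>

    #define MODEL_PTRS_PER_PTE 8    /* arbitrary size for the model */

    /* Hypothetical stand-in for a PTE table together with its own lock. */
    struct model_pte_table {
        pthread_mutex_t ptl;                    /* per-table "split" lock */
        unsigned long pte[MODEL_PTRS_PER_PTE];  /* the entries it protects */
    };

    /* Hypothetical stand-in for a PMD entry: points to a PTE table or NULL. */
    struct model_pmd {
        struct model_pte_table *table;
    };

    /*
     * Models pte_offset_map_lock(): returns the entry for addr with the
     * table lock taken and stored in *ptlp, or NULL if there is no table.
     */
    static unsigned long *model_pte_offset_map_lock(struct model_pmd *pmd,
                                                    unsigned long addr,
                                                    pthread_mutex_t **ptlp)
    {
        struct model_pte_table *table = pmd->table;

        if (!table)
            return NULL;
        pthread_mutex_lock(&table->ptl);
        *ptlp = &table->ptl;
        return &table->pte[addr % MODEL_PTRS_PER_PTE];
    }

    /* Models pte_unmap_unlock(): there is nothing to unmap in the model. */
    static void model_pte_unmap_unlock(unsigned long *pte, pthread_mutex_t *ptl)
    {
        (void)pte;
        pthread_mutex_unlock(ptl);
    }

    /* Caller pattern: look up, check for NULL, modify, then unlock. */
    static int model_set_pte(struct model_pmd *pmd, unsigned long addr,
                             unsigned long val)
    {
        pthread_mutex_t *ptl;
        unsigned long *pte = model_pte_offset_map_lock(pmd, addr, &ptl);

        if (!pte)
            return -1;      /* no PTE table: the caller must cope */
        *pte = val;
        model_pte_unmap_unlock(pte, ptl);
        return 0;
    }

Two threads calling ``model_set_pte()`` on different tables never contend in
this model, because each table carries its own lock.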

Split page table lock for PTE tables is enabled at compile time if
CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS.
If split lock is disabled, all tables are guarded by mm->page_table_lock.

Split page table lock for PMD tables is enabled if it is enabled for PTE
tables and the architecture supports it (see below).
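
The two conditions can be written as compile-time expressions. The sketch
below is a standalone illustration: the sample values and the ``MODEL_*``
names are assumptions made up for this example, and the kernel's own macros
may be spelled differently.

.. code-block:: c

    /* Sample values, assumed for illustration only. */
    #define NR_CPUS                     8
    #define CONFIG_SPLIT_PTLOCK_CPUS    4
    #define MODEL_ARCH_SPLIT_PMD        1   /* architecture opted in */

    /* PTE split lock: enabled if CONFIG_SPLIT_PTLOCK_CPUS <= NR_CPUS. */
    #define MODEL_SPLIT_PTE_PTLOCKS (CONFIG_SPLIT_PTLOCK_CPUS <= NR_CPUS)

    /* PMD split lock: needs the PTE split lock and architecture support. */
    #define MODEL_SPLIT_PMD_PTLOCKS \
        (MODEL_SPLIT_PTE_PTLOCKS && MODEL_ARCH_SPLIT_PMD)

With NR_CPUS set to 2 instead, both expressions evaluate to 0 and every table
falls back to mm->page_table_lock.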

Hugetlb and split page table lock
=================================

Hugetlb can support several page sizes. The split lock is used only at the
PMD level, not at the PUD level.

Hugetlb-specific helpers:

 - huge_pte_lock()
	takes pmd split lock for PMD_SIZE page, mm->page_table_lock
	otherwise;
 - huge_pte_lockptr()
	returns pointer to table lock.

Support of split page table lock by an architecture
===================================================

No special enabling is needed for the PTE split page table lock: everything
required is done by pagetable_pte_ctor() and pagetable_pte_dtor(), which
must be called on PTE table allocation / freeing.

Make sure the architecture doesn't use the slab allocator for page table
allocation: slab uses page->slab_cache for its pages.
This field shares storage with page->ptl.

PMD split lock only makes sense if you have more than two page table
levels.

Enabling the PMD split lock requires a pagetable_pmd_ctor() call on PMD
table allocation and a pagetable_pmd_dtor() call on freeing.

Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
paths: e.g. X86_PAE preallocates a few PMDs on pgd_alloc().

With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.

NOTE: pagetable_pte_ctor() and pagetable_pmd_ctor() can fail -- the failure
must be handled properly.

page->ptl
=========

page->ptl is used to access the split page table lock, where 'page' is the
struct page of the page containing the table. It shares storage with
page->private (and a few other fields in the union).

To avoid increasing the size of struct page and to get the best performance,
we use a trick:

 - if spinlock_t fits into long, we use page->ptl as the spinlock itself, so
   we can avoid indirect access and save a cache line;
 - if the size of spinlock_t is bigger than the size of long, we use
   page->ptl as a pointer to spinlock_t and allocate it dynamically. This
   allows the split lock to be used with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC
   enabled, but costs one more cache line for indirect access.

The spinlock_t is allocated in pagetable_pte_ctor() for a PTE table and in
pagetable_pmd_ctor() for a PMD table.
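
The trick can be sketched in user space as follows. This is a model under
stated assumptions, not the kernel's definition: ``model_lock_t``,
``struct model_page`` and the ``model_ptlock_*()`` functions are hypothetical
names, and the ``MODEL_DEBUG_LOCK`` build flag stands in for a debug
configuration that makes the lock larger than a long. It shows the embedded
and the allocated layout behind one accessor, and a constructor step that can
fail only when allocation is needed.

.. code-block:: c

    #include <stdatomic.h>
    #include <stdlib.h>

    /*
     * Hypothetical stand-in for spinlock_t. Building with -DMODEL_DEBUG_LOCK
     * adds fields to mimic a debug configuration that no longer fits in a
     * long.
     */
    typedef struct {
        atomic_flag locked;
    #ifdef MODEL_DEBUG_LOCK
        const char *owner;
        unsigned long magic;
    #endif
    } model_lock_t;

    /* Hypothetical stand-in for the part of struct page that holds ptl. */
    struct model_page {
        union {
            unsigned long private;  /* shares storage with ptl */
    #ifdef MODEL_DEBUG_LOCK
            model_lock_t *ptl;      /* too big: pointer to an allocated lock */
    #else
            model_lock_t ptl;       /* fits in a long: embedded directly */
    #endif
        };
    };

    #ifndef MODEL_DEBUG_LOCK
    _Static_assert(sizeof(model_lock_t) <= sizeof(long),
                   "embedded lock must fit in a long");
    #endif

    /* The only accessor callers should use; hides which layout is in effect. */
    static model_lock_t *model_ptlock_ptr(struct model_page *page)
    {
    #ifdef MODEL_DEBUG_LOCK
        return page->ptl;
    #else
        return &page->ptl;
    #endif
    }

    /* Models the constructor step: 0 on success, -1 if allocation fails. */
    static int model_ptlock_init(struct model_page *page)
    {
    #ifdef MODEL_DEBUG_LOCK
        page->ptl = malloc(sizeof(*page->ptl));
        if (!page->ptl)
            return -1;
    #endif
        atomic_flag_clear(&model_ptlock_ptr(page)->locked);
        return 0;
    }

    /* Models the destructor step: frees the lock only if it was allocated. */
    static void model_ptlock_free(struct model_page *page)
    {
    #ifdef MODEL_DEBUG_LOCK
        free(page->ptl);
    #else
        (void)page;
    #endif
    }

Code that always goes through ``model_ptlock_ptr()`` works unchanged in both
builds, whereas code that touches the ``ptl`` member directly would compile
in only one of them.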

Please never access page->ptl directly -- use the appropriate helper.