=====================
Split page table lock
=====================

Originally, the mm->page_table_lock spinlock protected all page tables of the
mm_struct. This approach leads to poor page fault scalability in
multi-threaded applications due to high contention on the lock. To improve
scalability, the split page table lock was introduced.

With the split page table lock, each table has its own lock to serialize
access to it. At the moment, split locks are used for PTE and PMD
tables. Access to higher-level tables is protected by mm->page_table_lock.

There are helpers to lock/unlock a table and other accessor functions:

 - pte_offset_map_lock()
	maps PTE and takes PTE table lock, returns pointer to PTE with
	pointer to its PTE table lock, or returns NULL if no PTE table;
 - pte_offset_map_nolock()
	maps PTE, returns pointer to PTE with pointer to its PTE table
	lock (not taken), or returns NULL if no PTE table;
 - pte_offset_map()
	maps PTE, returns pointer to PTE, or returns NULL if no PTE table;
 - pte_unmap()
	unmaps PTE table;
 - pte_unmap_unlock()
	unlocks and unmaps PTE table;
 - pte_alloc_map_lock()
	allocates PTE table if needed and takes its lock, returns pointer to
	PTE with pointer to its lock, or returns NULL if allocation failed;
 - pmd_lock()
	takes PMD table lock, returns pointer to taken lock;
 - pmd_lockptr()
	returns pointer to PMD table lock.

The split page table lock for PTE tables is enabled at compile time if
CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS.
If the split lock is disabled, all tables are guarded by mm->page_table_lock.

The split page table lock for PMD tables is enabled if it is enabled for PTE
tables and the architecture supports it (see below).

Hugetlb and split page table lock
=================================

Hugetlb can support several page sizes. The split lock is used only at the
PMD level, not at the PUD level.
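
The PMD-only rule can be illustrated with a small userspace model. This is a
sketch, not kernel code: every ``model_*`` name and both ``MODEL_*_SIZE``
values are hypothetical, and a C11 atomic flag stands in for the spinlock. It
only shows the selection rule: a PMD-sized huge page is guarded by the lock of
its table, any other size by the single mm-wide lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical page sizes for this model; real values depend on the architecture. */
#define MODEL_PMD_SIZE (1UL << 21)
#define MODEL_PUD_SIZE (1UL << 30)

/* Minimal spinlock stand-in built on a C11 atomic flag. */
struct model_lock {
	atomic_flag held;
};

/* One PMD table with its own (split) lock. */
struct model_pmd_table {
	struct model_lock ptl;
};

/* Address space with the single mm-wide lock. */
struct model_mm {
	struct model_lock page_table_lock;
};

/* Put a lock into the unlocked state. */
static void model_lock_init(struct model_lock *lock)
{
	atomic_flag_clear(&lock->held);
}

/* Select the lock for a huge page of the given size: split lock only for PMD size. */
static struct model_lock *model_huge_lockptr(struct model_mm *mm,
					     struct model_pmd_table *table,
					     unsigned long size)
{
	assert(mm && table);
	if (size == MODEL_PMD_SIZE)
		return &table->ptl;
	return &mm->page_table_lock;
}

/* Spin until the selected lock is acquired, then return it to the caller. */
static struct model_lock *model_huge_lock(struct model_mm *mm,
					  struct model_pmd_table *table,
					  unsigned long size)
{
	struct model_lock *lock = model_huge_lockptr(mm, table, size);

	while (atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire))
		;
	return lock;
}

/* Release a lock previously returned by model_huge_lock(). */
static void model_unlock(struct model_lock *lock)
{
	atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

Returning the lock pointer from ``model_huge_lock()`` means the caller does not
need to know which of the two locks was chosen in order to release it.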

Hugetlb-specific helpers:

 - huge_pte_lock()
	takes pmd split lock for PMD_SIZE page, mm->page_table_lock
	otherwise;
 - huge_pte_lockptr()
	returns pointer to table lock.

Support of split page table lock by an architecture
===================================================

No special enabling is needed for the PTE split page table lock: everything
required is done by pagetable_pte_ctor() and pagetable_pte_dtor(), which
must be called on PTE table allocation / freeing.

Make sure the architecture doesn't use the slab allocator for page table
allocation: slab uses page->slab_cache for its pages.
This field shares storage with page->ptl.

The PMD split lock only makes sense if you have more than two page table
levels.

Enabling the PMD split lock requires a pagetable_pmd_ctor() call on PMD table
allocation and a pagetable_pmd_dtor() call on freeing.

Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
paths: e.g. X86_PAE preallocates a few PMDs in pgd_alloc().

With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.

NOTE: pagetable_pte_ctor() and pagetable_pmd_ctor() can fail -- the failure
must be handled properly.

page->ptl
=========

page->ptl is used to access the split page table lock, where 'page' is the
struct page of the page containing the table. It shares storage with
page->private (and a few other fields in the union).

To avoid increasing the size of struct page and to get the best performance,
we use a trick:

 - if spinlock_t fits into a long, we use page->ptl as the spinlock itself, so
   we can avoid indirect access and save a cache line;
 - if spinlock_t is larger than a long, we use page->ptl as a
   pointer to spinlock_t and allocate it dynamically.
   This allows using the
   split lock with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC enabled, but costs
   one more cache line for indirect access.

The spinlock_t is allocated in pagetable_pte_ctor() for PTE tables and in
pagetable_pmd_ctor() for PMD tables.

Please never access page->ptl directly -- use the appropriate helper.
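
The two storage strategies can be sketched in a userspace model. This is an
illustration under stated assumptions, not the kernel implementation: all
``model_*`` names are hypothetical, a C11 atomic flag stands in for the
spinlock, and the debug fields are invented only to make the second lock type
larger than a long. The sketch shows why the indirect case needs a constructor
that can fail, and why callers should go through a helper rather than touch
the field.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Small lock: stands in for a spinlock_t that fits into a long. */
struct model_lock {
	atomic_flag held;
};

/* Large lock: stands in for a spinlock_t grown by debug fields (invented layout). */
struct model_debug_lock {
	atomic_flag held;
	long debug_info[4];
};

static_assert(sizeof(struct model_lock) <= sizeof(long),
	      "small lock must fit into a long");
static_assert(sizeof(struct model_debug_lock) > sizeof(long),
	      "debug lock must be larger than a long");

/* Case 1: the lock is stored directly in the table descriptor. */
struct model_page_embedded {
	struct model_lock ptl;
};

/* Case 2: the descriptor holds only a pointer; the lock is allocated separately. */
struct model_page_indirect {
	struct model_debug_lock *ptl;
};

/* Constructor for case 2. Returns false if the allocation fails. */
static bool model_ctor_indirect(struct model_page_indirect *page)
{
	page->ptl = malloc(sizeof(*page->ptl));
	if (!page->ptl)
		return false;
	atomic_flag_clear(&page->ptl->held);
	return true;
}

/* Destructor for case 2: release the separately allocated lock. */
static void model_dtor_indirect(struct model_page_indirect *page)
{
	free(page->ptl);
	page->ptl = NULL;
}

/* Helper for case 1: the flag is reached without any extra dereference. */
static atomic_flag *model_lockptr_embedded(struct model_page_embedded *page)
{
	return &page->ptl.held;
}

/* Helper for case 2: the flag is reached through one extra pointer dereference. */
static atomic_flag *model_lockptr_indirect(struct model_page_indirect *page)
{
	return &page->ptl->held;
}
```

Because both helpers return the same type, code that takes the lock looks
identical in the two cases; only the descriptor layout and the need for a
constructor/destructor pair differ.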