============================
Subsystem Trace Points: kmem
============================

The kmem tracing system captures events related to object and page allocation
within the kernel. Broadly speaking there are five major subheadings.

 - Slab allocation of small objects of unknown type (kmalloc)
 - Slab allocation of small objects of known type
 - Page allocation
 - Per-CPU Allocator Activity
 - External Fragmentation

This document describes what each of the tracepoints is and why they
might be useful.

1. Slab allocation of small objects of unknown type
===================================================
::

  kmalloc               call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmalloc_node          call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kfree                 call_site=%lx ptr=%p

Heavy activity for these events may indicate that a specific cache is
justified, particularly if kmalloc slab pages are suffering significant
internal fragmentation as a result of the allocation pattern. By correlating
kmalloc with kfree, it may be possible to identify memory leaks and where
the allocation sites were.


2. Slab allocation of small objects of known type
=================================================
::

  kmem_cache_alloc      call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmem_cache_alloc_node call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kmem_cache_free       call_site=%lx ptr=%p

These events are similar in usage to the kmalloc-related events except that
it is likely easier to pin the event down to a specific cache. At the time
of writing, no information is available on what slab is being allocated from,
but the call_site can usually be used to extrapolate that information.

3. Page allocation
==================
::

  mm_page_alloc             page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
  mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_free              page=%p pfn=%lu order=%d
  mm_page_free_batched      page=%p pfn=%lu order=%d cold=%d

These four events deal with page allocation and freeing. mm_page_alloc is
a simple indicator of page allocator activity. Pages may be allocated from
the per-CPU allocator (high performance) or the buddy allocator.

If pages are allocated directly from the buddy allocator, the
mm_page_alloc_zone_locked event is triggered. This event is important as high
amounts of activity imply high activity on the zone->lock.
Taking this lock impairs performance by disabling interrupts, dirtying
cache lines between CPUs and serialising many CPUs.

When a page is freed directly by the caller, only the mm_page_free event
is triggered. Significant amounts of activity here could indicate that the
callers should be batching their activities.

When pages are freed in batch, the mm_page_free_batched event is also
triggered. Broadly speaking, pages are taken off the LRU list in bulk and
freed in batch with a page list. Significant amounts of activity here could
indicate that the system is under memory pressure and can also indicate
contention on the lruvec->lru_lock.

4. Per-CPU Allocator Activity
=============================
::

  mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_pcpu_drain        page=%p pfn=%lu order=%d cpu=%d migratetype=%d

In front of the page allocator is a per-CPU page allocator. It exists only
for order-0 pages, reduces contention on the zone->lock and reduces the
amount of writing on struct page.

When a per-CPU list is empty, or pages of the wrong type are allocated,
the zone->lock will be taken once and the per-CPU list refilled. The
mm_page_alloc_zone_locked event is triggered for each page allocated, with
the event indicating whether the allocation was part of a percpu_refill.
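
As an illustration of how this event can be consumed, the sketch below
post-processes trace output and counts, per CPU, how many
mm_page_alloc_zone_locked events were per-CPU refills versus direct buddy
allocations. The sample lines are fabricated for the example; only the field
layout follows the format shown above.

```python
import re

# Fabricated sample trace lines; only the field layout follows the
# mm_page_alloc_zone_locked format documented above.
SAMPLE = """\
mm_page_alloc_zone_locked page=0x1000 pfn=4096 order=0 migratetype=0 cpu=0 percpu_refill=1
mm_page_alloc_zone_locked page=0x1001 pfn=4097 order=0 migratetype=0 cpu=0 percpu_refill=1
mm_page_alloc_zone_locked page=0x2000 pfn=8192 order=3 migratetype=1 cpu=1 percpu_refill=0
"""

EVENT = re.compile(r"mm_page_alloc_zone_locked .*cpu=(\d+) percpu_refill=(\d)")

def refill_counts(trace_text):
    """Count per-CPU-refill vs direct buddy allocations, keyed by CPU."""
    counts = {}
    for cpu, refill in EVENT.findall(trace_text):
        kind = "refill" if refill == "1" else "direct"
        per_cpu = counts.setdefault(int(cpu), {"refill": 0, "direct": 0})
        per_cpu[kind] += 1
    return counts

print(refill_counts(SAMPLE))
# -> {0: {'refill': 2, 'direct': 0}, 1: {'refill': 0, 'direct': 1}}
```

A high refill count relative to direct allocations on one CPU points at the
refill behaviour described in the preceding paragraph.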

When the per-CPU list is too full, a number of pages are freed, each of
which triggers a mm_page_pcpu_drain event.

The events are triggered for individual pages so that pages can be tracked
between allocation and freeing. A run of consecutive drain or refill events
implies that the zone->lock was taken once. Large amounts of per-CPU
refills and drains could imply an imbalance between CPUs where too much work
is being concentrated in one place. It could also indicate that the per-CPU
lists should be larger. Finally, large amounts of refills on one CPU and
drains on another could cause large numbers of cache line bounces due to
writes between CPUs. It is worth investigating whether pages can be
allocated and freed on the same CPU through some algorithm change.

5. External Fragmentation
=========================
::

  mm_page_alloc_extfrag page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d

External fragmentation affects whether a high-order allocation will be
successful or not. For some types of hardware, this is important although
it is avoided where possible. If the system is using huge pages and needs
to be able to resize the pool over the lifetime of the system, this value
is important.
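
To gauge how much fallback activity a workload is generating, the
mm_page_alloc_extfrag output can be summarised as in the sketch below. The
sample lines are made up for illustration; only the field names are taken
from the format above.

```python
import re

# Fabricated sample trace lines; only the field names follow the
# mm_page_alloc_extfrag format documented above.
SAMPLE = """\
mm_page_alloc_extfrag page=0x1 pfn=100 alloc_order=3 fallback_order=5 pageblock_order=9 alloc_migratetype=1 fallback_migratetype=0 fragmenting=1 change_ownership=0
mm_page_alloc_extfrag page=0x2 pfn=612 alloc_order=9 fallback_order=9 pageblock_order=9 alloc_migratetype=0 fallback_migratetype=1 fragmenting=1 change_ownership=1
mm_page_alloc_extfrag page=0x3 pfn=900 alloc_order=0 fallback_order=9 pageblock_order=9 alloc_migratetype=1 fallback_migratetype=0 fragmenting=0 change_ownership=0
"""

FIELDS = re.compile(r"fragmenting=(\d) change_ownership=(\d)")

def extfrag_summary(trace_text):
    """Tally fallback events, the fragmenting subset, and ownership changes."""
    summary = {"total": 0, "fragmenting": 0, "change_ownership": 0}
    for fragmenting, change_ownership in FIELDS.findall(trace_text):
        summary["total"] += 1
        summary["fragmenting"] += fragmenting == "1"
        summary["change_ownership"] += change_ownership == "1"
    return summary

print(extfrag_summary(SAMPLE))
# -> {'total': 3, 'fragmenting': 2, 'change_ownership': 1}
```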

Large numbers of this event imply that memory is fragmenting and that
high-order allocations will start failing at some time in the future. One
means of reducing the occurrence of this event is to increase the size of
min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes, where
pageblock_size is usually the default hugepage size.
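
As a worked example of this guideline (the node count and pageblock size
here are illustrative assumptions, not recommendations): with 2 MiB
pageblocks (the x86-64 default hugepage size, i.e. 2048 kB) on a two-node
machine, each increment would be 3 * 2048 * 2 = 12288 kB.

```python
def min_free_kbytes_increment(pageblock_kb, nr_online_nodes):
    """One increment of min_free_kbytes, per the guideline above:
    3 * pageblock_size * nr_online_nodes, with pageblock_size in kB."""
    return 3 * pageblock_kb * nr_online_nodes

# Assumed example values: 2 MiB pageblock (2048 kB), 2 online NUMA nodes.
print(min_free_kbytes_increment(2048, 2))  # -> 12288
```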