============================
Subsystem Trace Points: kmem
============================

The kmem tracing system captures events related to object and page allocation
within the kernel. Broadly speaking there are five major subheadings.

  - Slab allocation of small objects of unknown type (kmalloc)
  - Slab allocation of small objects of known type
  - Page allocation
  - Per-CPU Allocator Activity
  - External Fragmentation

This document describes what each of the tracepoints is and why they
might be useful.
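
The tracepoints described here are grouped under events/kmem in tracefs. The
following is a minimal sketch of enabling them and reading the formatted
output, assuming tracefs is mounted at /sys/kernel/tracing (older systems may
use /sys/kernel/debug/tracing instead)::

  import os

  TRACEFS = "/sys/kernel/tracing"   # assumed mount point

  def set_kmem_events(enable):
      # Toggle every tracepoint in the kmem group at once.
      with open(os.path.join(TRACEFS, "events/kmem/enable"), "w") as f:
          f.write("1" if enable else "0")

  def dump_trace(nr_lines=20):
      # trace_pipe streams formatted events as they are emitted.
      with open(os.path.join(TRACEFS, "trace_pipe")) as pipe:
          for _ in range(nr_lines):
              print(pipe.readline().rstrip())

  set_kmem_events(True)
  dump_trace()
  set_kmem_events(False)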

1. Slab allocation of small objects of unknown type
===================================================
::

  kmalloc		call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmalloc_node	call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kfree		call_site=%lx ptr=%p

Heavy activity for these events may indicate that a specific cache is
justified, particularly if kmalloc slab pages are becoming significantly
internally fragmented as a result of the allocation pattern. By correlating
kmalloc with kfree, it may be possible to identify memory leaks and where
the allocation sites were.
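
As a minimal sketch of that correlation, assuming the formatted trace has been
saved to a file (kmem.trace is a hypothetical name), allocations whose pointer
never shows up in a later kfree can be grouped by call_site::

  import re
  from collections import Counter

  ALLOC_RE = re.compile(r"kmalloc(?:_node)?: call_site=(\S+) ptr=(\S+)")
  FREE_RE = re.compile(r"kfree: call_site=\S+ ptr=(\S+)")

  def leak_suspects(path="kmem.trace"):
      # Track live allocations by pointer; whatever remains at the end
      # of the trace was never freed within the captured window.
      live = {}
      with open(path) as trace:
          for line in trace:
              m = ALLOC_RE.search(line)
              if m:
                  live[m.group(2)] = m.group(1)
                  continue
              m = FREE_RE.search(line)
              if m:
                  live.pop(m.group(1), None)
      return Counter(live.values())

  for call_site, count in leak_suspects().most_common(10):
      print(f"{count:6d} unfreed allocations from {call_site}")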


2. Slab allocation of small objects of known type
=================================================
::

  kmem_cache_alloc	call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmem_cache_alloc_node	call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kmem_cache_free		call_site=%lx ptr=%p

These events are similar in usage to the kmalloc-related events except that
it is likely easier to pin the event down to a specific cache. At the time
of writing, no information is available on what slab is being allocated from,
but the call_site can usually be used to extrapolate that information.
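
As a hedged illustration of that extrapolation, the call_site addresses can be
mapped back to the nearest kernel symbol. The helper below is a sketch and
assumes /proc/kallsyms is readable with real addresses (this may require root
and a relaxed kptr_restrict setting)::

  import bisect

  def load_kallsyms(path="/proc/kallsyms"):
      # Build a sorted list of (address, symbol) pairs for text symbols.
      syms = []
      with open(path) as f:
          for line in f:
              fields = line.split()
              addr, kind, name = fields[0], fields[1], fields[2]
              if kind.lower() == "t":
                  syms.append((int(addr, 16), name))
      syms.sort()
      return syms

  def resolve(call_site, syms):
      # Find the nearest symbol at or below the call_site address.
      i = bisect.bisect_right(syms, (call_site, "\uffff")) - 1
      return syms[i][1] if i >= 0 else "unknown"

  syms = load_kallsyms()
  print(resolve(0xffffffff81234567, syms))   # hypothetical example address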

3. Page allocation
==================
::

  mm_page_alloc		  page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
  mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_free		  page=%p pfn=%lu order=%d
  mm_page_free_batched	  page=%p pfn=%lu order=%d cold=%d

These four events deal with page allocation and freeing. mm_page_alloc is
a simple indicator of page allocator activity. Pages may be allocated from
the per-CPU allocator (high performance) or the buddy allocator.

If pages are allocated directly from the buddy allocator, the
mm_page_alloc_zone_locked event is triggered. This event is important as high
amounts of activity imply high activity on the zone->lock. Taking this lock
impairs performance by disabling interrupts, dirtying cache lines between
CPUs and serialising many CPUs.
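
One rough way to gauge how often allocations take this locked path is to
compare the two event counts in a captured trace. The sketch below assumes a
saved trace file in the default formatted layout (kmem.trace is a hypothetical
name); the two events are not emitted at exactly the same granularity, so the
ratio is only an indicator::

  def zone_lock_ratio(path="kmem.trace"):
      # Rough fraction of traced page allocations that hit the zone->lock path.
      alloc = locked = 0
      with open(path) as trace:
          for line in trace:
              if "mm_page_alloc_zone_locked:" in line:
                  locked += 1
              elif "mm_page_alloc:" in line:
                  alloc += 1
      return locked / alloc if alloc else 0.0

  print(f"{zone_lock_ratio():.1%} of allocations went through zone->lock")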

When a page is freed directly by the caller, only the mm_page_free event
is triggered. Significant amounts of activity here could indicate that the
callers should be batching their activities.

When pages are freed in batch, the mm_page_free_batched event is also
triggered. Broadly speaking, pages are taken off the LRU list in bulk and
freed in batch with a page list. Significant amounts of activity here could
indicate that the system is under memory pressure and can also indicate
contention on the lruvec->lru_lock.

4. Per-CPU Allocator Activity
=============================
::

  mm_page_alloc_zone_locked	page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_pcpu_drain		page=%p pfn=%lu order=%d cpu=%d migratetype=%d

In front of the page allocator is a per-cpu page allocator. It exists only
for order-0 pages, reduces contention on the zone->lock and reduces the
amount of writing on struct page.

When a per-CPU list is empty or pages of the wrong type are allocated,
the zone->lock will be taken once and the per-CPU list refilled. The
mm_page_alloc_zone_locked event is triggered for each page allocated, with
the event indicating whether it is for a percpu_refill or not.

When the per-CPU list is too full, a number of pages are freed, each of
which triggers a mm_page_pcpu_drain event.

The events are emitted for individual pages so that pages can be tracked
between allocation and freeing. A run of consecutive drain or refill events
implies that the zone->lock was taken once. Large amounts of per-CPU
refills and drains could imply an imbalance between CPUs where too much work
is being concentrated in one place. It could also indicate that the per-CPU
lists should be a larger size. Finally, large amounts of refills on one CPU
and drains on another could be a factor in causing large amounts of cache
line bouncing due to writes between CPUs; it is worth investigating whether
pages can be allocated and freed on the same CPU through some algorithm change.
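
A minimal sketch of that kind of per-CPU accounting, again assuming a saved
trace file in the default formatted layout (kmem.trace is a hypothetical
name)::

  import re
  from collections import defaultdict

  CPU_RE = re.compile(r"cpu=(\d+)")

  def refills_and_drains(path="kmem.trace"):
      # Count refill and drain pages per CPU to spot imbalances between CPUs.
      stats = defaultdict(lambda: [0, 0])   # cpu -> [refills, drains]
      with open(path) as trace:
          for line in trace:
              m = CPU_RE.search(line)
              if not m:
                  continue
              cpu = int(m.group(1))
              if "mm_page_alloc_zone_locked:" in line and "percpu_refill=1" in line:
                  stats[cpu][0] += 1
              elif "mm_page_pcpu_drain:" in line:
                  stats[cpu][1] += 1
      return stats

  for cpu, (refills, drains) in sorted(refills_and_drains().items()):
      print(f"cpu{cpu}: refills={refills} drains={drains}")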

5. External Fragmentation
=========================
::

  mm_page_alloc_extfrag		page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d

External fragmentation affects whether a high-order allocation will be
successful or not. For some types of hardware this is important, although
high-order allocations are avoided where possible. If the system is using
huge pages and needs to be able to resize the pool over the lifetime of the
system, this event is important.

Large numbers of this event imply that memory is fragmenting and
high-order allocations will start failing at some time in the future. One
means of reducing the occurrence of this event is to increase the size of
min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes, where
pageblock_size is usually the default hugepage size.
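
As a worked example of that increment, with assumed values for illustration
(a 2048 kB pageblock, matching a 2 MB default hugepage size, and two online
nodes)::

  PAGEBLOCK_SIZE_KB = 2048   # assumed: 2 MB default hugepage size
  NR_ONLINE_NODES = 2        # assumed number of online NUMA nodes

  increment_kb = 3 * PAGEBLOCK_SIZE_KB * NR_ONLINE_NODES
  print(f"raise min_free_kbytes in steps of {increment_kb} kB")   # 12288 kB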