============================
Subsystem Trace Points: kmem
============================

The kmem tracing system captures events related to object and page allocation
within the kernel. Broadly speaking there are five major subheadings.

 - Slab allocation of small objects of unknown type (kmalloc)
 - Slab allocation of small objects of known type
 - Page allocation
 - Per-CPU Allocator Activity
 - External Fragmentation

This document describes what each of the tracepoints is and why they
might be useful.

1. Slab allocation of small objects of unknown type
===================================================
::

  kmalloc               call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmalloc_node          call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kfree                 call_site=%lx ptr=%p

Heavy activity for these events may indicate that a specific cache is
justified, particularly if kmalloc slab pages are suffering significant
internal fragmentation as a result of the allocation pattern. By correlating
kmalloc with kfree, it may be possible to identify memory leaks and where
the allocation sites were.


2. Slab allocation of small objects of known type
=================================================
::

  kmem_cache_alloc      call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmem_cache_alloc_node call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kmem_cache_free       call_site=%lx ptr=%p

These events are similar in usage to the kmalloc-related events except that
it is likely easier to pin the event down to a specific cache. At the time
of writing, no information is available on what slab is being allocated from,
but the call_site can usually be used to extrapolate that information.

3. Page allocation
==================
::

  mm_page_alloc             page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
  mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_free              page=%p pfn=%lu order=%d
  mm_page_free_batched      page=%p pfn=%lu order=%d cold=%d

These four events deal with page allocation and freeing. mm_page_alloc is
a simple indicator of page allocator activity. Pages may be allocated from
the per-CPU allocator (high performance) or the buddy allocator.

If pages are allocated directly from the buddy allocator, the
mm_page_alloc_zone_locked event is triggered. This event is important as high
amounts of activity imply high activity on the zone->lock.
Taking this lock impairs performance by disabling interrupts, dirtying
cache lines between CPUs and serialising many CPUs.

When a page is freed directly by the caller, only the mm_page_free event
is triggered. Significant amounts of activity here could indicate that the
callers should be batching their activities.

When pages are freed in batch, the mm_page_free_batched event is also
triggered. Broadly speaking, pages are taken off the LRU list in bulk and
freed in batch with a page list. Significant amounts of activity here could
indicate that the system is under memory pressure and can also indicate
contention on the lruvec->lru_lock.

4. Per-CPU Allocator Activity
=============================
::

  mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_pcpu_drain        page=%p pfn=%lu order=%d cpu=%d migratetype=%d

In front of the page allocator is a per-CPU page allocator. It exists only
for order-0 pages, reduces contention on the zone->lock and reduces the
amount of writing on struct page.

When a per-CPU list is empty, or pages of the wrong type are allocated,
the zone->lock will be taken once and the per-CPU list refilled. The
mm_page_alloc_zone_locked event is triggered for each page allocated, with
the event indicating whether the allocation was part of a percpu_refill.
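
As an illustration of how this event can be consumed, the sketch below
post-processes trace output and counts, per CPU, how many
mm_page_alloc_zone_locked events were per-CPU refills versus direct buddy
allocations. The sample lines are fabricated for the example; only the field
layout follows the format shown above.

```python
import re

# Fabricated sample trace lines; only the field layout follows the
# mm_page_alloc_zone_locked format documented above.
SAMPLE = """\
mm_page_alloc_zone_locked page=0x1000 pfn=4096 order=0 migratetype=0 cpu=0 percpu_refill=1
mm_page_alloc_zone_locked page=0x1001 pfn=4097 order=0 migratetype=0 cpu=0 percpu_refill=1
mm_page_alloc_zone_locked page=0x2000 pfn=8192 order=3 migratetype=1 cpu=1 percpu_refill=0
"""

EVENT = re.compile(r"mm_page_alloc_zone_locked .*cpu=(\d+) percpu_refill=(\d)")

def refill_counts(trace_text):
    """Count per-CPU-refill vs direct buddy allocations, keyed by CPU."""
    counts = {}
    for cpu, refill in EVENT.findall(trace_text):
        kind = "refill" if refill == "1" else "direct"
        per_cpu = counts.setdefault(int(cpu), {"refill": 0, "direct": 0})
        per_cpu[kind] += 1
    return counts

print(refill_counts(SAMPLE))
# -> {0: {'refill': 2, 'direct': 0}, 1: {'refill': 0, 'direct': 1}}
```

A high refill count relative to direct allocations on one CPU points at the
refill behaviour described in the preceding paragraph.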

When the per-CPU list is too full, a number of pages are freed, each of
which triggers a mm_page_pcpu_drain event.

The events are triggered for individual pages so that pages can be tracked
between allocation and freeing. A run of consecutive drain or refill events
implies that the zone->lock was taken once. Large amounts of per-CPU
refills and drains could imply an imbalance between CPUs where too much work
is being concentrated in one place. It could also indicate that the per-CPU
lists should be larger. Finally, large amounts of refills on one CPU and
drains on another could cause large numbers of cache line bounces due to
writes between CPUs. It is worth investigating whether pages can be
allocated and freed on the same CPU through some algorithm change.

5. External Fragmentation
=========================
::

  mm_page_alloc_extfrag page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d

External fragmentation affects whether a high-order allocation will be
successful or not. For some types of hardware, this is important although
it is avoided where possible. If the system is using huge pages and needs
to be able to resize the pool over the lifetime of the system, this value
is important.
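
To gauge how much fallback activity a workload is generating, the
mm_page_alloc_extfrag output can be summarised as in the sketch below. The
sample lines are made up for illustration; only the field names are taken
from the format above.

```python
import re

# Fabricated sample trace lines; only the field names follow the
# mm_page_alloc_extfrag format documented above.
SAMPLE = """\
mm_page_alloc_extfrag page=0x1 pfn=100 alloc_order=3 fallback_order=5 pageblock_order=9 alloc_migratetype=1 fallback_migratetype=0 fragmenting=1 change_ownership=0
mm_page_alloc_extfrag page=0x2 pfn=612 alloc_order=9 fallback_order=9 pageblock_order=9 alloc_migratetype=0 fallback_migratetype=1 fragmenting=1 change_ownership=1
mm_page_alloc_extfrag page=0x3 pfn=900 alloc_order=0 fallback_order=9 pageblock_order=9 alloc_migratetype=1 fallback_migratetype=0 fragmenting=0 change_ownership=0
"""

FIELDS = re.compile(r"fragmenting=(\d) change_ownership=(\d)")

def extfrag_summary(trace_text):
    """Tally fallback events, the fragmenting subset, and ownership changes."""
    summary = {"total": 0, "fragmenting": 0, "change_ownership": 0}
    for fragmenting, change_ownership in FIELDS.findall(trace_text):
        summary["total"] += 1
        summary["fragmenting"] += fragmenting == "1"
        summary["change_ownership"] += change_ownership == "1"
    return summary

print(extfrag_summary(SAMPLE))
# -> {'total': 3, 'fragmenting': 2, 'change_ownership': 1}
```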

Large numbers of this event imply that memory is fragmenting and that
high-order allocations will start failing at some time in the future. One
means of reducing the occurrence of this event is to increase the size of
min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes, where
pageblock_size is usually the default hugepage size.
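
As a worked example of this guideline (the node count and pageblock size
here are illustrative assumptions, not recommendations): with 2 MiB
pageblocks (the x86-64 default hugepage size, i.e. 2048 kB) on a two-node
machine, each increment would be 3 * 2048 * 2 = 12288 kB.

```python
def min_free_kbytes_increment(pageblock_kb, nr_online_nodes):
    """One increment of min_free_kbytes, per the guideline above:
    3 * pageblock_size * nr_online_nodes, with pageblock_size in kB."""
    return 3 * pageblock_kb * nr_online_nodes

# Assumed example values: 2 MiB pageblock (2048 kB), 2 online NUMA nodes.
print(min_free_kbytes_increment(2048, 2))  # -> 12288
```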