==========================
Short users guide for SLUB
==========================

The basic philosophy of SLUB is very different from SLAB. SLAB
requires rebuilding the kernel to activate debug options for all
slab caches. SLUB always includes full debugging but it is off by default.
SLUB can enable debugging only for selected slabs in order to avoid
an impact on overall system performance which may make a bug more
difficult to find.

In order to switch debugging on one can add an option ``slub_debug``
to the kernel command line. That will enable full debugging for
all slabs.

Typically one would then use the ``slabinfo`` command to get statistical
data and perform operations on the slabs. By default ``slabinfo`` only lists
slabs that have data in them. See "slabinfo -h" for more options when
running the command. ``slabinfo`` can be compiled with
::

        gcc -o slabinfo tools/mm/slabinfo.c

Some of the modes of operation of ``slabinfo`` require that slub debugging
be enabled on the command line. For example, no tracking information will be
available without debugging on and validation can only partially
be performed if debugging was not switched on.
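
Whether the running kernel was booted with ``slub_debug`` can be read back
from ``/proc/cmdline``. The following is only a minimal sketch, not part of
the SLUB tooling: the function name ``slub_debug_setting`` and the word
``full`` printed for a bare ``slub_debug`` are assumptions made for this
illustration.

```shell
#!/bin/sh
# Print the slub_debug setting found on a kernel command line, if any.
# Reads /proc/cmdline unless a command line string is passed as $1
# (the argument form is a convenience of this sketch, for trying it out).
slub_debug_setting() {
    cmdline=${1-$(cat /proc/cmdline)}
    # Split on spaces, one word per line, and look for the option.
    printf '%s\n' "$cmdline" | tr ' ' '\n' | while IFS= read -r word; do
        case $word in
            slub_debug)   echo "full" ;;   # bare option: full debugging
            slub_debug=*) printf '%s\n' "${word#slub_debug=}" ;;
        esac
    done
}
```

Calling ``slub_debug_setting`` without arguments prints nothing if the option
was not given at boot, which is the case in which ``slabinfo`` can only offer
its limited modes.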

Some more sophisticated uses of slub_debug:
-------------------------------------------

Parameters may be given to ``slub_debug``. If none is specified then full
debugging is enabled. Format:

slub_debug=<Debug-Options>
        Enable options for all slabs

slub_debug=<Debug-Options>,<slab name1>,<slab name2>,...
        Enable options only for select slabs (no spaces
        after a comma)

Multiple blocks of options for all slabs or selected slabs can be given, with
blocks of options delimited by ';'. The last of "all slabs" blocks is applied
to all slabs except those that match one of the "select slabs" blocks. Options
of the first "select slabs" block that matches the slab's name are applied.

Possible debug options are::

        F               Sanity checks on (enables SLAB_DEBUG_CONSISTENCY_CHECKS
                        Sorry SLAB legacy issues)
        Z               Red zoning
        P               Poisoning (object and padding)
        U               User tracking (free and alloc)
        T               Trace (please only use on single slabs)
        A               Enable failslab filter mark for the cache
        O               Switch debugging off for caches that would have
                        caused higher minimum slab orders
        -               Switch all debugging off (useful if the kernel is
                        configured with CONFIG_SLUB_DEBUG_ON)

For example,
in order to boot just with sanity checks and red zoning one would specify::

        slub_debug=FZ

Trying to find an issue in the dentry cache? Try::

        slub_debug=,dentry

to only enable debugging on the dentry cache. You may use an asterisk at the
end of the slab name, in order to cover all slabs with the same prefix. For
example, here's how you can poison the dentry cache as well as all kmalloc
slabs::

        slub_debug=P,kmalloc-*,dentry

Red zoning and tracking may realign the slab. We can just apply sanity checks
to the dentry cache with::

        slub_debug=F,dentry

Debugging options may require the minimum possible slab order to increase as
a result of storing the metadata (for example, caches with PAGE_SIZE object
sizes). This has a higher likelihood of resulting in slab allocation errors
in low memory situations or if there's high fragmentation of memory. To
switch off debugging for such caches by default, use::

        slub_debug=O

You can apply different options to different lists of slab names, using blocks
of options. This will enable red zoning for dentry and user tracking for
kmalloc.
All other slabs will not get any debugging enabled::

        slub_debug=Z,dentry;U,kmalloc-*

You can also enable options (e.g. sanity checks and red zoning) for all caches
except some that are deemed too performance critical and don't need to be
debugged by specifying global debug options followed by a list of slab names
with "-" as options::

        slub_debug=FZ;-,zs_handle,zspage

The state of each debug option for a slab can be found in the respective files
under::

        /sys/kernel/slab/<slab name>/

If the file contains 1, the option is enabled, 0 means disabled. The debug
options from the ``slub_debug`` parameter translate to the following files::

        F       sanity_checks
        Z       red_zone
        P       poison
        U       store_user
        T       trace
        A       failslab

The failslab file is writable, so writing 1 or 0 will enable or disable
the option at runtime. The write returns -EINVAL if the cache is an alias.
Careful with tracing: It may spew out lots of information and never stop if
used on the wrong slab.
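
The option-to-file mapping above can be walked with a few lines of shell.
This is a hedged sketch rather than an existing tool: the function name
``show_slab_debug_flags`` and the ``SLAB_SYSFS`` override variable are
hypothetical and exist only so the helper can be pointed at a test directory.
Files that are missing or not readable (reading them may require root) are
reported as ``n/a``.

```shell
#!/bin/sh
# Show the state of each slub_debug option for one cache.
# SLAB_SYSFS is a hypothetical override for this sketch; by default the
# real sysfs location /sys/kernel/slab is used.
show_slab_debug_flags() {
    cache=$1
    dir=${SLAB_SYSFS:-/sys/kernel/slab}/$cache
    # Each pair is <option letter>:<sysfs file>, as listed in the text.
    for pair in F:sanity_checks Z:red_zone P:poison U:store_user T:trace A:failslab; do
        letter=${pair%%:*}
        file=${pair#*:}
        if [ -r "$dir/$file" ]; then
            printf '%s %s %s\n' "$letter" "$file" "$(cat "$dir/$file")"
        else
            printf '%s %s n/a\n' "$letter" "$file"
        fi
    done
}
```

For example, ``show_slab_debug_flags dentry`` prints one line per option for
the dentry cache.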

Slab merging
============

If no debug options are specified then SLUB may merge similar slabs together
in order to reduce overhead and increase cache hotness of objects.
``slabinfo -a`` displays which slabs were merged together.

Slab validation
===============

SLUB can validate all objects if the kernel was booted with slub_debug. In
order to do so you must have the ``slabinfo`` tool. Then you can do
::

        slabinfo -v

which will test all objects. Output will be generated to the syslog.

This also works in a more limited way if boot was without slab debug.
In that case ``slabinfo -v`` simply tests all reachable objects. Usually
these are in the cpu slabs and the partial slabs. Full slabs are not
tracked by SLUB in a non debug situation.

Getting more performance
========================

To some degree SLUB's performance is limited by the need to take the
list_lock once in a while to deal with partial slabs. That overhead is
governed by the order of the allocation for each slab. The allocations
can be influenced by kernel parameters:

.. slub_min_objects=x           (default 4)
.. slub_min_order=x             (default 0)
.. slub_max_order=x             (default 3 (PAGE_ALLOC_COSTLY_ORDER))

``slub_min_objects``
        specifies how many objects must at least fit into one
        slab in order for the allocation order to be acceptable. In
        general slub will be able to perform this number of
        allocations on a slab without consulting centralized resources
        (list_lock) where contention may occur.

``slub_min_order``
        specifies a minimum order of slabs. This has a similar effect to
        ``slub_min_objects``.

``slub_max_order``
        specifies the order at which ``slub_min_objects`` should no
        longer be checked. This is useful to avoid SLUB trying to
        generate super large order pages to fit ``slub_min_objects``
        of a slab cache with large object sizes into one high order
        page. Setting the command line parameter
        ``debug_guardpage_minorder=N`` (N > 0) forces
        ``slub_max_order`` to 0, which causes slabs to be allocated
        at the minimum possible order.

SLUB Debug output
=================

Here is a sample of slub debug output::

 ====================================================================
 BUG kmalloc-8: Right Redzone overwritten
 --------------------------------------------------------------------

 INFO: 0xc90f6d28-0xc90f6d2b. First byte 0x00 instead of 0xcc
 INFO: Slab 0xc528c530 flags=0x400000c3 inuse=61 fp=0xc90f6d58
 INFO: Object 0xc90f6d20 @offset=3360 fp=0xc90f6d58
 INFO: Allocated in get_modalias+0x61/0xf5 age=53 cpu=1 pid=554

 Bytes b4 (0xc90f6d10): 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
 Object   (0xc90f6d20): 31 30 31 39 2e 30 30 35                         1019.005
 Redzone  (0xc90f6d28): 00 cc cc cc                                     .
 Padding  (0xc90f6d50): 5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ

   [<c010523d>] dump_trace+0x63/0x1eb
   [<c01053df>] show_trace_log_lvl+0x1a/0x2f
   [<c010601d>] show_trace+0x12/0x14
   [<c0106035>] dump_stack+0x16/0x18
   [<c017e0fa>] object_err+0x143/0x14b
   [<c017e2cc>] check_object+0x66/0x234
   [<c017eb43>] __slab_free+0x239/0x384
   [<c017f446>] kfree+0xa6/0xc6
   [<c02e2335>] get_modalias+0xb9/0xf5
   [<c02e23b7>] dmi_dev_uevent+0x27/0x3c
   [<c027866a>] dev_uevent+0x1ad/0x1da
   [<c0205024>] kobject_uevent_env+0x20a/0x45b
   [<c020527f>] kobject_uevent+0xa/0xf
   [<c02779f1>] store_uevent+0x4f/0x58
   [<c027758e>] dev_attr_store+0x29/0x2f
   [<c01bec4f>] sysfs_write_file+0x16e/0x19c
   [<c0183ba7>] vfs_write+0xd1/0x15a
   [<c01841d7>] sys_write+0x3d/0x72
   [<c0104112>] sysenter_past_esp+0x5f/0x99
   [<b7f7b410>] 0xb7f7b410
   =======================

 FIX kmalloc-8: Restoring Redzone 0xc90f6d28-0xc90f6d2b=0xcc

If SLUB encounters a corrupted object (full detection requires the kernel
to be booted with slub_debug) then the following output will be dumped
into the syslog:

1. Description of the problem encountered

   This will be a message in the system log starting with::

     ===============================================
     BUG <slab cache affected>: <What went wrong>
     -----------------------------------------------

     INFO: <corruption start>-<corruption_end> <more info>
     INFO: Slab <address> <slab information>
     INFO: Object <address> <object information>
     INFO: Allocated in <kernel function> age=<jiffies since alloc> cpu=<allocated by
        cpu> pid=<pid of the process>
     INFO: Freed in <kernel function> age=<jiffies since free> cpu=<freed by cpu>
        pid=<pid of the process>

   (Object allocation / free information is only available if SLAB_STORE_USER is
   set for the slab. slub_debug sets that option)

2. The object contents if an object was involved.

   Various types of lines can follow the BUG SLUB line:

   Bytes b4 <address> : <bytes>
        Shows a few bytes before the object where the problem was detected.
        Can be useful if the corruption does not stop with the start of the
        object.

   Object <address> : <bytes>
        The bytes of the object. If the object is inactive then the bytes
        typically contain poison values. Any non-poison value shows a
        corruption by a write after free.

   Redzone <address> : <bytes>
        The Redzone following the object. The Redzone is used to detect
        writes after the object. All bytes should always have the same
        value. If there is any deviation then it is due to a write after
        the object boundary.

        (Redzone information is only available if SLAB_RED_ZONE is set.
        slub_debug sets that option)

   Padding <address> : <bytes>
        Unused data to fill up the space in order to get the next object
        properly aligned. In the debug case we make sure that there are
        at least 4 bytes of padding. This allows the detection of writes
        before the object.

3. A stackdump

   The stackdump describes the location where the error was detected. The cause
   of the corruption is more likely to be found by looking at the function that
   allocated or freed the object.

4. Report on how the problem was dealt with in order to ensure the continued
   operation of the system.

   These are messages in the system log beginning with::

        FIX <slab cache affected>: <corrective action taken>

   In the above sample SLUB found that the Redzone of an active object has
   been overwritten. Here a string of 8 characters was written into a slab that
   has the length of 8 characters. However, an 8 character string needs a
   terminating 0. That zero has overwritten the first byte of the Redzone field.
   After reporting the details of the issue encountered the FIX SLUB message
   tells us that SLUB has restored the Redzone to its proper value and then
   system operations continue.

Emergency operations
====================

Minimal debugging (sanity checks alone) can be enabled by booting with::

        slub_debug=F

This will generally be enough to enable the resiliency features of slub
which will keep the system running even if a bad kernel component keeps
corrupting objects. This may be important for production systems.
Performance will be impacted by the sanity checks and there will be a
continual stream of error messages to the syslog but no additional memory
will be used (unlike full debugging).

No guarantees. The kernel component still needs to be fixed.
Performance may be optimized further by locating the slab that experiences
corruption and enabling debugging only for that cache.

I.e.::

        slub_debug=F,dentry

If the corruption occurs by writing after the end of the object then it
may be advisable to enable a Redzone to avoid corrupting the beginning
of other objects::

        slub_debug=FZ,dentry

Extended slabinfo mode and plotting
===================================

The ``slabinfo`` tool has a special 'extended' ('-X') mode that includes:

- Slabcache Totals
- Slabs sorted by size (up to -N <num> slabs, default 1)
- Slabs sorted by loss (up to -N <num> slabs, default 1)

Additionally, in this mode ``slabinfo`` does not dynamically scale
sizes (G/M/K) and reports everything in bytes (this functionality is
also available to other slabinfo modes via '-B' option) which makes
reporting more precise and accurate. Moreover, in some sense the '-X'
mode also simplifies the analysis of slabs' behaviour, because its
output can be plotted using the ``slabinfo-gnuplot.sh`` script. So it
pushes the analysis from looking through the numbers (tons of numbers)
to something easier -- visual analysis.

To generate plots:

a) collect slabinfo extended records, for example::

        while [ 1 ]; do slabinfo -X >> FOO_STATS; sleep 1; done

b) pass stats file(-s) to ``slabinfo-gnuplot.sh`` script::

        slabinfo-gnuplot.sh FOO_STATS [FOO_STATS2 .. FOO_STATSN]

   The ``slabinfo-gnuplot.sh`` script pre-processes the collected records
   and generates 3 png files (and 3 pre-processing cache files) per STATS
   file:

   - Slabcache Totals: FOO_STATS-totals.png
   - Slabs sorted by size: FOO_STATS-slabs-by-size.png
   - Slabs sorted by loss: FOO_STATS-slabs-by-loss.png

Another use case, when ``slabinfo-gnuplot.sh`` can be useful, is when you
need to compare slabs' behaviour "prior to" and "after" some code
modification. To help you out there, ``slabinfo-gnuplot.sh`` script
can 'merge' the `Slabcache Totals` sections from different
measurements. To visually compare N plots:

a) Collect as many STATS1, STATS2, .. STATSN files as you need::

        while [ 1 ]; do slabinfo -X >> STATS<X>; sleep 1; done

b) Pre-process those STATS files::

        slabinfo-gnuplot.sh STATS1 STATS2 .. STATSN

c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
   generated pre-processed \*-totals::

        slabinfo-gnuplot.sh -t STATS1-totals STATS2-totals .. STATSN-totals

   This will produce a single plot (png file).

   Plots, expectedly, can be large so some fluctuations or small spikes
   can go unnoticed. To deal with that, ``slabinfo-gnuplot.sh`` has two
   options to 'zoom-in'/'zoom-out':

   a) ``-s %d,%d`` -- overwrites the default image width and height
   b) ``-r %d,%d`` -- specifies a range of samples to use (for example,
      in ``slabinfo -X >> FOO_STATS; sleep 1;`` case, using a
      ``-r 40,60`` range will plot only samples collected between 40th and
      60th seconds).


DebugFS files for SLUB
======================

For more information about current state of SLUB caches with the user tracking
debug option enabled, debugfs files are available, typically under
/sys/kernel/debug/slab/<cache>/ (created only for caches with enabled user
tracking). There are 2 types of these files with the following debug
information:

1. alloc_traces::

    Prints information about unique allocation traces of the currently
    allocated objects. The output is sorted by frequency of each trace.

    Information in the output:
    Number of objects, allocating function, possible memory wastage of
    kmalloc objects (total/per-object), minimal/average/maximal jiffies
    since alloc, pid range of the allocating processes, cpu mask of
    allocating cpus, numa node mask of origins of memory, and stack trace.

    Example:::

    338 pci_alloc_dev+0x2c/0xa0 waste=521872/1544 age=290837/291891/293509 pid=1 cpus=106 nodes=0-1
        __kmem_cache_alloc_node+0x11f/0x4e0
        kmalloc_trace+0x26/0xa0
        pci_alloc_dev+0x2c/0xa0
        pci_scan_single_device+0xd2/0x150
        pci_scan_slot+0xf7/0x2d0
        pci_scan_child_bus_extend+0x4e/0x360
        acpi_pci_root_create+0x32e/0x3b0
        pci_acpi_scan_root+0x2b9/0x2d0
        acpi_pci_root_add.cold.11+0x110/0xb0a
        acpi_bus_attach+0x262/0x3f0
        device_for_each_child+0xb7/0x110
        acpi_dev_for_each_child+0x77/0xa0
        acpi_bus_attach+0x108/0x3f0
        device_for_each_child+0xb7/0x110
        acpi_dev_for_each_child+0x77/0xa0
        acpi_bus_attach+0x108/0x3f0

2. free_traces::

    Prints information about unique freeing traces of the currently allocated
    objects. The freeing traces thus come from the previous life-cycle of the
    objects and are reported as not available for objects allocated for the first
    time. The output is sorted by frequency of each trace.

    Information in the output:
    Number of objects, freeing function, minimal/average/maximal jiffies since free,
    pid range of the freeing processes, cpu mask of freeing cpus, and stack trace.

    Example:::

    1980 <not-available> age=4294912290 pid=0 cpus=0
    51 acpi_ut_update_ref_count+0x6a6/0x782 age=236886/237027/237772 pid=1 cpus=1
        kfree+0x2db/0x420
        acpi_ut_update_ref_count+0x6a6/0x782
        acpi_ut_update_object_reference+0x1ad/0x234
        acpi_ut_remove_reference+0x7d/0x84
        acpi_rs_get_prt_method_data+0x97/0xd6
        acpi_get_irq_routing_table+0x82/0xc4
        acpi_pci_irq_find_prt_entry+0x8e/0x2e0
        acpi_pci_irq_lookup+0x3a/0x1e0
        acpi_pci_irq_enable+0x77/0x240
        pcibios_enable_device+0x39/0x40
        do_pci_enable_device.part.0+0x5d/0xe0
        pci_enable_device_flags+0xfc/0x120
        pci_enable_device+0x13/0x20
        virtio_pci_probe+0x9e/0x170
        local_pci_probe+0x48/0x80
        pci_device_probe+0x105/0x1c0

Christoph Lameter, May 30, 2007
Sergey Senozhatsky, October 23, 2015