.. SPDX-License-Identifier: GPL-2.0

=====================
Fake NUMA For CPUSets
=====================

:Author: David Rientjes <rientjes@cs.washington.edu>

Using numa=fake and CPUSets for Resource Management

This document describes how the numa=fake x86_64 command-line option can be used
in conjunction with cpusets for coarse memory management. Using this feature,
you can create fake NUMA nodes that represent contiguous chunks of memory and
assign them to cpusets and their attached tasks. This is a way of limiting the
amount of system memory that is available to a certain class of tasks.

For more information on the features of cpusets, see
Documentation/admin-guide/cgroup-v1/cpusets.rst.

There are a number of different configurations you can use for your needs. For
more information on the numa=fake command line option and its various ways of
configuring fake nodes, see Documentation/arch/x86/x86_64/boot-options.rst.

For the purposes of this introduction, we'll assume a very primitive NUMA
emulation setup of "numa=fake=4*512,". This will split our system memory into
four equal chunks of 512M each, which we can then assign to cpusets.
As you become more familiar with using this combination for resource control,
you'll be able to choose a setup that minimizes the number of nodes you have to
deal with.

With "numa=fake=4*512,", dmesg may report a split like the following::

	Faking node 0 at 0000000000000000-0000000020000000 (512MB)
	Faking node 1 at 0000000020000000-0000000040000000 (512MB)
	Faking node 2 at 0000000040000000-0000000060000000 (512MB)
	Faking node 3 at 0000000060000000-0000000080000000 (512MB)
	...
	On node 0 totalpages: 130975
	On node 1 totalpages: 131072
	On node 2 totalpages: 131072
	On node 3 totalpages: 131072

Now, following the instructions for mounting the cpusets filesystem in
Documentation/admin-guide/cgroup-v1/cpusets.rst, you can assign fake nodes
(i.e. contiguous memory address spaces) to individual cpusets::

	[root@xroads /]# mkdir exampleset
	[root@xroads /]# mount -t cpuset none exampleset
	[root@xroads /]# mkdir exampleset/ddset
	[root@xroads /]# cd exampleset/ddset
	[root@xroads /exampleset/ddset]# echo 0-1 > cpus
	[root@xroads /exampleset/ddset]# echo 0-1 > mems

Now this cpuset, 'ddset', will only be allowed access to fake nodes 0 and 1 for
memory allocations (1G in total).
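
The arithmetic behind the dmesg lines and the 1G figure above can be reproduced
with a short shell sketch. It assumes equally sized nodes laid out back to back
from physical address 0, 4 KiB pages and no memory holes; the function names
fake_node_ranges, pages_per_node and mems_range_mb are hypothetical helpers for
illustration, not kernel interfaces. Each 512MB node spans 0x20000000 bytes, or
131072 pages, which matches nodes 1-3; node 0 reports fewer pages, most likely
because of holes and reserved ranges in low memory that this sketch ignores.

```sh
#!/bin/sh
# Sketch only: hypothetical helpers that redo the arithmetic behind
# "numa=fake=4*512,". Assumes equal-sized nodes starting at physical
# address 0 with no holes; the kernel's real layout accounts for holes.

# Print one line per fake node, in the same shape as the dmesg output.
fake_node_ranges() {
    count=$1      # number of fake nodes (the "4" in 4*512)
    size_mb=$2    # size of each node in MB (the "512")
    size=$((size_mb * 1024 * 1024))
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$((i * size))
        end=$((start + size))
        printf 'Faking node %d at %016x-%016x (%dMB)\n' \
            "$i" "$start" "$end" "$size_mb"
        i=$((i + 1))
    done
}

# Pages in one node of the given size in MB, assuming 4 KiB pages, no holes.
pages_per_node() {
    echo $(( $1 * 1024 * 1024 / 4096 ))
}

# MB reachable through a single mems range such as "0-1" (or a single
# node such as "2"), assuming every fake node has the same size.
mems_range_mb() {
    first=${1%-*}
    last=${1#*-}
    echo $(( (last - first + 1) * $2 ))
}

fake_node_ranges 4 512
pages_per_node 512        # 131072 pages per 512MB node
mems_range_mb 0-1 512     # 1024 MB, the 1G available to 'ddset'
```

mems_range_mb only understands one range or one node; the cpuset list format
also accepts comma-separated lists such as "0-1,3", which this sketch does not
handle.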

You can now assign tasks to these cpusets to limit the memory resources
available to them according to the fake nodes assigned as mems::

	[root@xroads /exampleset/ddset]# echo $$ > tasks
	[root@xroads /exampleset/ddset]# dd if=/dev/zero of=tmp bs=1024 count=1G &
	[1] 13425

Notice the difference in system memory usage, as reported by /proc/meminfo,
between the restricted cpuset case above and the unrestricted case (i.e. running
the same 'dd' command without assigning it to a fake NUMA cpuset):

	======== ============ ==========
	Name     Unrestricted Restricted
	======== ============ ==========
	MemTotal 3091900 kB   3091900 kB
	MemFree  42113 kB     1513236 kB
	======== ============ ==========

In the restricted case nearly half of memory remains free, because the page
cache filled by 'dd' can only be allocated from fake nodes 0 and 1.

This allows for coarse memory management for the tasks you assign to particular
cpusets. Since cpusets can form a hierarchy, you can combine them to give
different classes of tasks different sets of fake nodes, and therefore
different memory limits, to suit your memory management needs.
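
To make the same comparison on your own system, the following shell sketch
computes MemTotal minus MemFree from /proc/meminfo-formatted text on standard
input. The helper name meminfo_used_kb is hypothetical, and the "used" figure it
reports includes page cache, which is where the data written by 'dd' ends up.

```sh
#!/bin/sh
# Sketch only: hypothetical helper that reports MemTotal - MemFree in kB
# from /proc/meminfo-formatted text read on stdin. MemFree does not count
# page cache as free, so cached file data (e.g. from dd) shows up as used.
meminfo_used_kb() {
    awk '/^MemTotal:/ { total = $2 }
         /^MemFree:/  { free = $2 }
         END          { print total - free }'
}

# Restricted column from the table: prints 1578664
printf 'MemTotal:        3091900 kB\nMemFree:         1513236 kB\n' | meminfo_used_kb

# Unrestricted column from the table: prints 3049787
printf 'MemTotal:        3091900 kB\nMemFree:           42113 kB\n' | meminfo_used_kb
```

On a live system, run "meminfo_used_kb < /proc/meminfo" while each 'dd' is
running. Reading standard input rather than a fixed path keeps the helper usable
on saved snapshots of /proc/meminfo as well.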