.. SPDX-License-Identifier: GPL-2.0

=====================
Fake NUMA For CPUSets
=====================

:Author: David Rientjes <rientjes@cs.washington.edu>

Using numa=fake and CPUSets for Resource Management

This document describes how the numa=fake x86_64 command-line option can be used
in conjunction with cpusets for coarse memory management.  Using this feature,
you can create fake NUMA nodes that represent contiguous chunks of memory and
assign them to cpusets and their attached tasks.  This is a way of limiting the
amount of system memory that is available to a certain class of tasks.

For more information on the features of cpusets, see
Documentation/admin-guide/cgroup-v1/cpusets.rst.
There are a number of different configurations you can use for your needs.  For
more information on the numa=fake command line option and its various ways of
configuring fake nodes, see Documentation/arch/x86/x86_64/boot-options.rst.

For the purposes of this introduction, we'll assume a very primitive NUMA
emulation setup of "numa=fake=4*512,".  This will split our system memory into
four equal chunks of 512M each that we can now use to assign to cpusets.  As
you become more familiar with using this combination for resource control,
you'll determine a better setup to minimize the number of nodes you have to deal
with.

A machine may be split as follows with "numa=fake=4*512," as reported by dmesg::

	Faking node 0 at 0000000000000000-0000000020000000 (512MB)
	Faking node 1 at 0000000020000000-0000000040000000 (512MB)
	Faking node 2 at 0000000040000000-0000000060000000 (512MB)
	Faking node 3 at 0000000060000000-0000000080000000 (512MB)
	...
	On node 0 totalpages: 130975
	On node 1 totalpages: 131072
	On node 2 totalpages: 131072
	On node 3 totalpages: 131072

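As a quick sanity check, the sizes in the dmesg output can be recomputed from
the address ranges and page counts.  The following is a minimal sketch, not
part of the kernel tooling: the helper names ``node_size_mb`` and
``pages_to_mb`` are hypothetical, and the page-count conversion assumes the
usual 4 KiB x86_64 base page size.

```shell
#!/bin/sh
# Size of a fake node in MB, given the start and end addresses printed by
# dmesg (hexadecimal, without a 0x prefix).
node_size_mb() {
	echo $(( (0x$2 - 0x$1) >> 20 ))
}

# Size in MB (rounded down) of a node with the given page count.
# Assumption: 4 KiB pages.
pages_to_mb() {
	echo $(( $1 * 4 / 1024 ))
}

node_size_mb 0000000000000000 0000000020000000   # node 0 range: prints 512
pages_to_mb 131072                               # nodes 1-3:    prints 512
pages_to_mb 130975                               # node 0:       prints 511
```

Node 0 reports 97 fewer pages than the other nodes, so its usable size rounds
down to just under 512 MB.
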
Now following the instructions for mounting the cpusets filesystem from
Documentation/admin-guide/cgroup-v1/cpusets.rst, you can assign fake nodes
(i.e. contiguous memory address spaces) to individual cpusets::

	[root@xroads /]# mkdir exampleset
	[root@xroads /]# mount -t cpuset none exampleset
	[root@xroads /]# mkdir exampleset/ddset
	[root@xroads /]# cd exampleset/ddset
	[root@xroads /exampleset/ddset]# echo 0-1 > cpus
	[root@xroads /exampleset/ddset]# echo 0-1 > mems

Now this cpuset, 'ddset', will only be allowed access to fake nodes 0 and 1 for
memory allocations (1G).

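The 1G figure follows from the node list written to ``mems``: two nodes of
512 MB each.  The sketch below shows one way to derive that number from a
cpuset list string.  The helpers ``count_nodes`` and ``cpuset_mem_mb`` are
hypothetical names introduced for illustration, and the calculation assumes
that every fake node has the same size.

```shell
#!/bin/sh
# Count the nodes named by a cpuset list string such as "0-1" or "0,2-3".
count_nodes() {
	total=0
	old_ifs=$IFS
	IFS=,
	for part in $1; do
		case $part in
		*-*) total=$(( total + ${part#*-} - ${part%-*} + 1 )) ;;
		*)   total=$(( total + 1 )) ;;
		esac
	done
	IFS=$old_ifs
	echo "$total"
}

# Memory in MB available to a cpuset.
# Assumption: all fake nodes are the same size, given as the second argument.
cpuset_mem_mb() {
	echo $(( $(count_nodes "$1") * $2 ))
}

cpuset_mem_mb 0-1 512      # ddset: prints 1024
```

From inside the cpuset directory, the list can be read from the ``mems`` file
created above, for example ``cpuset_mem_mb "$(cat mems)" 512``.
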
You can now assign tasks to these cpusets to limit the memory resources
available to them according to the fake nodes assigned as mems::

	[root@xroads /exampleset/ddset]# echo $$ > tasks
	[root@xroads /exampleset/ddset]# dd if=/dev/zero of=tmp bs=1024 count=1G
	[1] 13425

Notice the difference between the system memory usage as reported by
/proc/meminfo between the restricted cpuset case above and the unrestricted
case (i.e. running the same 'dd' command without assigning it to a fake NUMA
cpuset):

	========	============	==========
	Name		Unrestricted	Restricted
	========	============	==========
	MemTotal	3091900 kB	3091900 kB
	MemFree		42113 kB	1513236 kB
	========	============	==========

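To repeat this comparison on another system, the MemFree figure can be
sampled while each 'dd' run is in progress.  This is a minimal sketch:
``memfree_kb`` is a hypothetical helper that reads a file in /proc/meminfo
format and defaults to /proc/meminfo itself.

```shell
#!/bin/sh
# Print the MemFree value (in kB) from a file in /proc/meminfo format.
memfree_kb() {
	awk '$1 == "MemFree:" { print $2 }' "${1:-/proc/meminfo}"
}

# Difference in kB between the two MemFree figures in the table above.
echo $(( 1513236 - 42113 ))    # prints 1471123
```

In the table above, the restricted run left roughly 1.4 GB more memory free
than the unrestricted run.
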
This allows for coarse memory management for the tasks you assign to particular
cpusets.  Since cpusets can form a hierarchy, you can create some pretty
interesting combinations of use-cases for various classes of tasks for your
memory management needs.