=========================
CPU hotplug in the Kernel
=========================

:Date: December, 2016
:Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
          Rusty Russell <rusty@rustcorp.com.au>,
          Srivatsa Vaddagiri <vatsa@in.ibm.com>,
          Ashok Raj <ashok.raj@intel.com>,
          Joel Schopp <jschopp@austin.ibm.com>

Introduction
============

Modern advances in system architectures have introduced advanced error
reporting and correction capabilities in processors. There are a couple of OEMs
that support NUMA hardware which is hot pluggable as well, where physical node
insertion and removal require support for CPU hotplug.

Such advances require CPUs available to a kernel to be removed either for
provisioning reasons, or for RAS purposes to keep an offending CPU off the
system execution path. Hence the need for CPU hotplug support in the
Linux kernel.

A more novel use of CPU-hotplug support is its use today in suspend resume
support for SMP. Dual-core and HT support makes even a laptop run SMP kernels
which didn't support these methods.


Command Line Switches
=====================
``maxcpus=n``
  Restrict boot time CPUs to *n*. For example, if you have four CPUs, using
  ``maxcpus=2`` will only boot two. You can choose to bring the
  other CPUs online later.

``nr_cpus=n``
  Restrict the total number of CPUs the kernel will support. If the number
  supplied here is lower than the number of physically available CPUs, then
  those CPUs cannot be brought online later.

``additional_cpus=n``
  Use this to limit hotpluggable CPUs. This option sets
  ``cpu_possible_mask = cpu_present_mask + additional_cpus``

  This option is limited to the IA64 architecture.

``possible_cpus=n``
  This option sets ``possible_cpus`` bits in ``cpu_possible_mask``.

  This option is limited to the X86 and S390 architectures.

``cpu0_hotplug``
  Allow CPU0 to be shut down.

  This option is limited to the X86 architecture.

CPU maps
========

``cpu_possible_mask``
  Bitmap of possible CPUs that can ever be available in the
  system. This is used to allocate some boot time memory for per_cpu variables
  that aren't designed to grow/shrink as CPUs are made available or removed.
  Once set during the boot time discovery phase, the map is static, i.e. no
  bits are added or removed at any time. Trimming it accurately for your system
  needs upfront can save some boot time memory.

``cpu_online_mask``
  Bitmap of all CPUs currently online. It is set in ``__cpu_up()``
  after a CPU is available for kernel scheduling and ready to receive
  interrupts from devices. It is cleared when a CPU is brought down using
  ``__cpu_disable()``, before which all OS services including interrupts are
  migrated to another target CPU.

``cpu_present_mask``
  Bitmap of CPUs currently present in the system. Not all
  of them may be online. When physical hotplug is processed by the relevant
  subsystem (e.g. ACPI), the map can change: a bit is added to or removed
  from it depending on whether the event is a hot-add or a hot-remove. There
  are currently no locking rules. Typical usage is to init topology during
  boot, at which time hotplug is disabled.

You really don't need to manipulate any of the system CPU maps. They should
be read-only for most use. When setting up per-cpu resources almost always use
``cpu_possible_mask`` or ``for_each_possible_cpu()`` to iterate. The macro
``for_each_cpu()`` can be used to iterate over a custom CPU mask.

Never use anything other than ``cpumask_t`` to represent a bitmap of CPUs.


Using CPU hotplug
=================
The kernel option *CONFIG_HOTPLUG_CPU* needs to be enabled. It is currently
available on multiple architectures including ARM, MIPS, PowerPC and X86.
The configuration is done via the sysfs interface: ::

  $ ls -lh /sys/devices/system/cpu
  total 0
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu0
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu1
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu2
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu3
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu4
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu5
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu6
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu7
  drwxr-xr-x  2 root root    0 Dec 21 16:33 hotplug
  -r--r--r--  1 root root 4.0K Dec 21 16:33 offline
  -r--r--r--  1 root root 4.0K Dec 21 16:33 online
  -r--r--r--  1 root root 4.0K Dec 21 16:33 possible
  -r--r--r--  1 root root 4.0K Dec 21 16:33 present

The files *offline*, *online*, *possible*, *present* represent the CPU masks.
Each CPU folder contains an *online* file which controls the logical on (1) and
off (0) state. To logically shut down CPU4: ::

  $ echo 0 > /sys/devices/system/cpu/cpu4/online
  smpboot: CPU 4 is now offline

Once the CPU is shut down, it will be removed from */proc/interrupts* and
*/proc/cpuinfo*, and should no longer be shown by the *top* command. To
bring CPU4 back online: ::

  $ echo 1 > /sys/devices/system/cpu/cpu4/online
  smpboot: Booting Node 0 Processor 4 APIC 0x1

The CPU is usable again. This should work on all CPUs. CPU0 is often special
and excluded from CPU hotplug. On X86 the kernel option
*CONFIG_BOOTPARAM_HOTPLUG_CPU0* has to be enabled in order to be able to
shut down CPU0. Alternatively the kernel command line option *cpu0_hotplug*
can be used. Some known dependencies of CPU0:

* Resume from hibernate/suspend. Hibernate/suspend will fail if CPU0 is offline.
* PIC interrupts. CPU0 can't be removed if a PIC interrupt is detected.

Please let Fenghua Yu <fenghua.yu@intel.com> know if you find any dependencies
on CPU0.
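
The mask files described earlier in this section (*offline*, *online*,
*possible*, *present*) can also be evaluated from a program. The following is a
minimal userspace sketch, assuming the files use the comma-separated range-list
form (for instance ``0-7`` or ``0-3,5``), which the text above does not spell
out. The function name ``cpulist_count()`` is hypothetical and not part of any
kernel or libc API.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper: count the CPUs named in a list such as "0-3,5,7-8".
 * Assumes the range-list format described in the lead-in.
 * Returns the number of CPUs, or -1 if the string is malformed.
 * An empty string (or a lone newline) is treated as zero CPUs.
 */
int cpulist_count(const char *list)
{
	const char *p = list;
	int count = 0;

	if (*p == '\0' || *p == '\n')
		return 0;

	for (;;) {
		char *end;
		long first, last;

		/* Every entry has to start with a CPU number. */
		if (!isdigit((unsigned char)*p))
			return -1;
		first = strtol(p, &end, 10);
		last = first;
		p = end;

		/* An optional "-N" turns the entry into a range. */
		if (*p == '-') {
			p++;
			if (!isdigit((unsigned char)*p))
				return -1;
			last = strtol(p, &end, 10);
			if (last < first)
				return -1;
			p = end;
		}
		count += (int)(last - first + 1);

		if (*p == ',') {
			p++;
			continue;
		}
		/* Accept end of string, optionally preceded by one newline. */
		if (*p == '\0' || (*p == '\n' && p[1] == '\0'))
			return count;
		return -1;
	}
}
```

For the eight-CPU system listed above, ``cpulist_count("0-7")`` returns 8. The
string would typically be read from one of the mask files with ``fgets()``
before being passed in.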

The CPU hotplug coordination
============================

The offline case
----------------
Once a CPU has been logically shut down, the teardown callbacks of registered
hotplug states will be invoked, starting with ``CPUHP_ONLINE`` and terminating
at state ``CPUHP_OFFLINE``. This includes:

* If tasks are frozen due to a suspend operation then *cpuhp_tasks_frozen*
  will be set to true.
* All processes are migrated away from this outgoing CPU to new CPUs.
  The new CPU is chosen from each process' current cpuset, which may be
  a subset of all online CPUs.
* All interrupts targeted to this CPU are migrated to a new CPU.
* Timers are also migrated to a new CPU.
* Once all services are migrated, the kernel calls an arch specific routine
  ``__cpu_disable()`` to perform arch specific cleanup.

Using the hotplug API
---------------------
It is possible to receive notifications once a CPU is offlined or onlined. This
might be important to certain drivers which need to perform some kind of setup
or clean up functions based on the number of available CPUs: ::

  #include <linux/cpuhotplug.h>

  ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "X/Y:online",
                          Y_online, Y_prepare_down);

*X* is the subsystem and *Y* the particular driver. The *Y_online* callback
will be invoked during registration on all online CPUs. If an error
occurs during the online callback the *Y_prepare_down* callback will be
invoked on all CPUs on which the online callback was previously invoked.
After registration has completed, the *Y_online* callback will be invoked
once a CPU is brought online and *Y_prepare_down* will be invoked when a
CPU is shut down. All resources which were previously allocated in
*Y_online* should be released in *Y_prepare_down*.
The return value *ret* is negative if an error occurred during the
registration process.
Otherwise a positive value is returned which
contains the allocated hotplug state for dynamically allocated states
(*CPUHP_AP_ONLINE_DYN*). It will return zero for predefined states.

The callback can be removed by invoking ``cpuhp_remove_state()``. In case of a
dynamically allocated state (*CPUHP_AP_ONLINE_DYN*) use the returned state.
During the removal of a hotplug state the teardown callback will be invoked.

Multiple instances
~~~~~~~~~~~~~~~~~~
If a driver has multiple instances and each instance needs to perform the
callback independently then it is likely that a "multi-state" should be used.
First a multi-state state needs to be registered: ::

  ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "X/Y:online",
                                Y_online, Y_prepare_down);
  Y_hp_online = ret;

The ``cpuhp_setup_state_multi()`` behaves similarly to ``cpuhp_setup_state()``
except it prepares the callbacks for a multi state and does not invoke
the callbacks. This is a one time setup.
Once a new instance is allocated, you need to register this new instance: ::

  ret = cpuhp_state_add_instance(Y_hp_online, &d->node);

This function will add this instance to your previously allocated
*Y_hp_online* state and invoke the previously registered callback
(*Y_online*) on all online CPUs. The *node* element is a ``struct
hlist_node`` member of your per-instance data structure.

On removal of the instance: ::

  cpuhp_state_remove_instance(Y_hp_online, &d->node)

should be invoked which will invoke the teardown callback on all online
CPUs.

Manual setup
~~~~~~~~~~~~
Usually it is handy to invoke setup and teardown callbacks on registration or
removal of a state because usually the operation needs to be performed once a
CPU goes online (offline) and during initial setup (shutdown) of the driver.
However
each registration and removal function is also available with a ``_nocalls``
suffix which does not invoke the provided callbacks if the invocation of the
callbacks is not desired. During the manual setup (or teardown) the functions
``get_online_cpus()`` and ``put_online_cpus()`` should be used to inhibit CPU
hotplug operations.


The ordering of the events
--------------------------
The hotplug states are defined in ``include/linux/cpuhotplug.h``:

* The states *CPUHP_OFFLINE* … *CPUHP_AP_OFFLINE* are invoked before the
  CPU is up.
* The states *CPUHP_AP_OFFLINE* … *CPUHP_AP_ONLINE* are invoked
  just after the CPU has been brought up. The interrupts are off and
  the scheduler is not yet active on this CPU. Starting with *CPUHP_AP_OFFLINE*
  the callbacks are invoked on the target CPU.
* The states between *CPUHP_AP_ONLINE_DYN* and *CPUHP_AP_ONLINE_DYN_END* are
  reserved for the dynamic allocation.
* The states are invoked in the reverse order on CPU shutdown starting with
  *CPUHP_ONLINE* and stopping at *CPUHP_OFFLINE*. Here the callbacks are
  invoked on the CPU that will be shut down until *CPUHP_AP_OFFLINE*.

A dynamically allocated state via *CPUHP_AP_ONLINE_DYN* is often enough.
However if an earlier invocation during the bring up or shutdown is required
then an explicit state should be acquired. An explicit state might also be
required if the hotplug event requires specific ordering with respect to
another hotplug event.

Testing of hotplug states
=========================
One way to verify whether a custom state is working as expected or not is to
shut down a CPU and then put it online again. It is also possible to put the
CPU into a certain state (for instance *CPUHP_AP_ONLINE*) and then go back to
*CPUHP_ONLINE*. This would simulate an error one state after *CPUHP_AP_ONLINE*
which would lead to a rollback to the online state.
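
The ordering and rollback behaviour described above can be pictured with a
small userspace model. This is only a sketch under simplifying assumptions and
not the kernel's implementation: the states are plain integers, and every name
in it (``toy_step``, ``toy_set_target()``, ``toy_log``, ``toy_fail()``) is
hypothetical. Startup callbacks run in ascending state order, teardown
callbacks run in descending order for every state above the target, and a
failing startup callback undoes the steps taken so far.

```c
#include <stdio.h>
#include <string.h>

#define TOY_NR_STATES 4		/* hypothetical states 1..4, 0 means offline */

typedef int (*toy_cb)(unsigned int cpu);

struct toy_step {
	toy_cb startup;		/* run when the CPU moves up into the state */
	toy_cb teardown;	/* run when the CPU moves down out of the state */
};

char toy_log[128];		/* records the order of the invoked steps */

/* Append "+N " (startup) or "-N " (teardown) to the log. */
void toy_record(char dir, int state)
{
	char buf[16];

	snprintf(buf, sizeof(buf), "%c%d ", dir, state);
	strncat(toy_log, buf, sizeof(toy_log) - strlen(toy_log) - 1);
}

/* A startup callback that always fails, useful to provoke a rollback. */
int toy_fail(unsigned int cpu)
{
	(void)cpu;
	return -1;
}

/*
 * Move the CPU from *state to target. steps[] is indexed by state number.
 * Returns 0 on success. If a startup callback fails, the steps taken so
 * far are torn down again, *state is restored and the error is returned.
 */
int toy_set_target(const struct toy_step *steps, unsigned int cpu,
		   int *state, int target)
{
	int start = *state;

	if (target < 0 || target > TOY_NR_STATES)
		return -1;

	while (*state < target) {
		int next = *state + 1;
		int ret = 0;

		toy_record('+', next);
		if (steps[next].startup)
			ret = steps[next].startup(cpu);
		if (ret) {
			/* Roll back to where this request started. */
			while (*state > start) {
				toy_record('-', *state);
				if (steps[*state].teardown)
					steps[*state].teardown(cpu);
				(*state)--;
			}
			return ret;
		}
		*state = next;
	}

	while (*state > target) {
		toy_record('-', *state);
		if (steps[*state].teardown)
			steps[*state].teardown(cpu);
		(*state)--;
	}
	return 0;
}
```

In this model, moving a CPU from state 4 to state 2 records ``-4 -3`` and
moving it back records ``+3 +4``. If the startup callback of state 3 is
``toy_fail()`` and the CPU starts in state 1, a request for state 4 records
``+2 +3 -2`` and leaves the CPU in state 1.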

All registered states are enumerated in ``/sys/devices/system/cpu/hotplug/states``: ::

  $ tail /sys/devices/system/cpu/hotplug/states
  138: mm/vmscan:online
  139: mm/vmstat:online
  140: lib/percpu_cnt:online
  141: acpi/cpu-drv:online
  142: base/cacheinfo:online
  143: virtio/net:online
  144: x86/mce:online
  145: printk:online
  168: sched:active
  169: online

To rollback CPU4 to ``lib/percpu_cnt:online`` and back online just issue: ::

  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  169
  $ echo 140 > /sys/devices/system/cpu/cpu4/hotplug/target
  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  140

It is important to note that the teardown callback of state 140 has been
invoked. And now get back online: ::

  $ echo 169 > /sys/devices/system/cpu/cpu4/hotplug/target
  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  169

With trace events enabled, the individual steps are visible, too: ::

  #  TASK-PID   CPU#    TIMESTAMP  FUNCTION
  #     | |       |        |         |
      bash-394  [001]  22.976: cpuhp_enter: cpu: 0004 target: 140 step: 169 (cpuhp_kick_ap_work)
   cpuhp/4-31   [004]  22.977: cpuhp_enter: cpu: 0004 target: 140 step: 168 (sched_cpu_deactivate)
   cpuhp/4-31   [004]  22.990: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
   cpuhp/4-31   [004]  22.991: cpuhp_enter: cpu: 0004 target: 140 step: 144 (mce_cpu_pre_down)
   cpuhp/4-31   [004]  22.992: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
   cpuhp/4-31   [004]  22.993: cpuhp_multi_enter: cpu: 0004 target: 140 step: 143 (virtnet_cpu_down_prep)
   cpuhp/4-31   [004]  22.994: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
   cpuhp/4-31   [004]  22.995: cpuhp_enter: cpu: 0004 target: 140 step: 142 (cacheinfo_cpu_pre_down)
   cpuhp/4-31   [004]  22.996: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
      bash-394  [001]  22.997: cpuhp_exit:  cpu: 0004  state: 140 step: 169 ret: 0
      bash-394  [005]  95.540: cpuhp_enter: cpu: 0004 target: 169 step: 140 (cpuhp_kick_ap_work)
   cpuhp/4-31   [004]  95.541: cpuhp_enter: cpu: 0004 target: 169 step: 141 (acpi_soft_cpu_online)
   cpuhp/4-31   [004]  95.542: cpuhp_exit:  cpu: 0004  state: 141 step: 141 ret: 0
   cpuhp/4-31   [004]  95.543: cpuhp_enter: cpu: 0004 target: 169 step: 142 (cacheinfo_cpu_online)
   cpuhp/4-31   [004]  95.544: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
   cpuhp/4-31   [004]  95.545: cpuhp_multi_enter: cpu: 0004 target: 169 step: 143 (virtnet_cpu_online)
   cpuhp/4-31   [004]  95.546: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
   cpuhp/4-31   [004]  95.547: cpuhp_enter: cpu: 0004 target: 169 step: 144 (mce_cpu_online)
   cpuhp/4-31   [004]  95.548: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
   cpuhp/4-31   [004]  95.549: cpuhp_enter: cpu: 0004 target: 169 step: 145 (console_cpu_notify)
   cpuhp/4-31   [004]  95.550: cpuhp_exit:  cpu: 0004  state: 145 step: 145 ret: 0
   cpuhp/4-31   [004]  95.551: cpuhp_enter: cpu: 0004 target: 169 step: 168 (sched_cpu_activate)
   cpuhp/4-31   [004]  95.552: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
      bash-394  [005]  95.553: cpuhp_exit:  cpu: 0004  state: 169 step: 140 ret: 0

As can be seen, CPU4 went down until timestamp 22.996 and then back up until
95.552. All invoked callbacks including their return codes are visible in the
trace.

Architecture's requirements
===========================
The following functions and configurations are required:

``CONFIG_HOTPLUG_CPU``
  This entry needs to be enabled in Kconfig.

``__cpu_up()``
  Arch interface to bring up a CPU.

``__cpu_disable()``
  Arch interface to shut down a CPU. No more interrupts can be handled by the
  kernel after the routine returns. This includes the shutdown of the timer.

``__cpu_die()``
  This is actually supposed to ensure the death of the CPU. Look at some
  example code in other architectures that implement CPU hotplug. The
  processor is taken down from the ``idle()`` loop for that specific
  architecture. ``__cpu_die()`` typically waits for some per_cpu state to be
  set, to positively ensure that the processor dead routine has been called.

User Space Notification
=======================
After a CPU has been successfully onlined or offlined, udev events are sent. A
udev rule like: ::

  SUBSYSTEM=="cpu", DRIVERS=="processor", DEVPATH=="/devices/system/cpu/*", RUN+="the_hotplug_receiver.sh"

will receive all events. A script like: ::

  #!/bin/sh

  if [ "${ACTION}" = "offline" ]
  then
      echo "CPU ${DEVPATH##*/} offline"

  elif [ "${ACTION}" = "online" ]
  then
      echo "CPU ${DEVPATH##*/} online"

  fi

can process the event further.

Kernel Inline Documentations Reference
======================================

.. kernel-doc:: include/linux/cpuhotplug.h