=========================
CPU hotplug in the Kernel
=========================

:Date: September, 2021
:Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
         Rusty Russell <rusty@rustcorp.com.au>,
         Srivatsa Vaddagiri <vatsa@in.ibm.com>,
         Ashok Raj <ashok.raj@intel.com>,
         Joel Schopp <jschopp@austin.ibm.com>,
         Thomas Gleixner <tglx@linutronix.de>

Introduction
============

Modern advances in system architectures have introduced advanced error
reporting and correction capabilities in processors. There are a couple of
OEMs that support NUMA hardware which is hot pluggable as well, where physical
node insertion and removal require support for CPU hotplug.

Such advances require CPUs available to a kernel to be removed either for
provisioning reasons, or for RAS purposes to keep an offending CPU off the
system execution path. Hence the need for CPU hotplug support in the
Linux kernel.

A more novel use of CPU-hotplug support is its use today in suspend resume
support for SMP. Dual-core and HT support makes even a laptop run SMP kernels
which didn't support these methods.


Command Line Switches
=====================

``maxcpus=n``
  Restrict boot time CPUs to *n*. For example, if you have four CPUs, using
  ``maxcpus=2`` will only boot two. You can choose to bring the
  other CPUs online later.

``nr_cpus=n``
  Restrict the total number of CPUs the kernel will support. If the number
  supplied here is lower than the number of physically available CPUs, then
  those CPUs cannot be brought online later.

``additional_cpus=n``
  Use this to limit hotpluggable CPUs. This option sets
  ``cpu_possible_mask = cpu_present_mask + additional_cpus``

  This option is limited to the IA64 architecture.

``possible_cpus=n``
  This option sets ``possible_cpus`` bits in ``cpu_possible_mask``.

  This option is limited to the X86 and S390 architectures.

``cpu0_hotplug``
  Allow CPU0 to be shut down.

  This option is limited to the X86 architecture.

CPU maps
========

``cpu_possible_mask``
  Bitmap of possible CPUs that can ever be available in the
  system. This is used to allocate some boot time memory for per_cpu variables
  that aren't designed to grow/shrink as CPUs are made available or removed.
  Once set during the boot time discovery phase, the map is static, i.e. no
  bits are added or removed at any time. Trimming it accurately for your
  system needs upfront can save some boot time memory.

``cpu_online_mask``
  Bitmap of all CPUs currently online. It is set in ``__cpu_up()``
  after a CPU is available for kernel scheduling and ready to receive
  interrupts from devices. It is cleared when a CPU is brought down using
  ``__cpu_disable()``, before which all OS services including interrupts are
  migrated to another target CPU.

``cpu_present_mask``
  Bitmap of CPUs currently present in the system. Not all
  of them may be online. When physical hotplug is processed by the relevant
  subsystem (e.g. ACPI), the map can change: a bit is either added to or
  removed from the map depending on whether the event is a hot-add or a
  hot-remove. There are currently no locking rules. Typical usage is to
  init topology during boot, at which time hotplug is disabled.

You really don't need to manipulate any of the system CPU maps. They should
be read-only for most use. When setting up per-cpu resources almost always use
``cpu_possible_mask`` or ``for_each_possible_cpu()`` to iterate. The macro
``for_each_cpu()`` can be used to iterate over a custom CPU mask.

Never use anything other than ``cpumask_t`` to represent a bitmap of CPUs.


Using CPU hotplug
=================

The kernel option *CONFIG_HOTPLUG_CPU* needs to be enabled. It is currently
available on multiple architectures including ARM, MIPS, PowerPC and X86.
The configuration is done via the sysfs interface::

 $ ls -lh /sys/devices/system/cpu
 total 0
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu0
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu1
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu2
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu3
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu4
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu5
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu6
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu7
 drwxr-xr-x  2 root root    0 Dec 21 16:33 hotplug
 -r--r--r--  1 root root 4.0K Dec 21 16:33 offline
 -r--r--r--  1 root root 4.0K Dec 21 16:33 online
 -r--r--r--  1 root root 4.0K Dec 21 16:33 possible
 -r--r--r--  1 root root 4.0K Dec 21 16:33 present

The files *offline*, *online*, *possible*, *present* represent the CPU masks.
Each CPU folder contains an *online* file which controls the logical on (1) and
off (0) state. To logically shut down CPU4::

 $ echo 0 > /sys/devices/system/cpu/cpu4/online
 smpboot: CPU 4 is now offline

Once the CPU is shut down, it will be removed from */proc/interrupts* and
*/proc/cpuinfo*, and should also no longer be shown by the *top* command. To
bring CPU4 back online::

 $ echo 1 > /sys/devices/system/cpu/cpu4/online
 smpboot: Booting Node 0 Processor 4 APIC 0x1

The CPU is usable again. This should work on all CPUs. CPU0 is often special
and excluded from CPU hotplug. On X86 the kernel option
*CONFIG_BOOTPARAM_HOTPLUG_CPU0* has to be enabled in order to be able to
shut down CPU0. Alternatively the kernel command line option *cpu0_hotplug*
can be used. Some known dependencies of CPU0:

* Resume from hibernate/suspend. Hibernate/suspend will fail if CPU0 is offline.
* PIC interrupts. CPU0 can't be removed if a PIC interrupt is detected.

Please let Fenghua Yu <fenghua.yu@intel.com> know if you find any dependencies
on CPU0.
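
The mask files mentioned in this section (*offline*, *online*, *possible*,
*present*) hold CPU lists written as comma-separated numbers and ranges, for
example ``0-3,5``. The following user-space sketch parses such a list into a
64-bit mask. The function name ``demo_parse_cpulist()`` and the limit of 64
CPUs are assumptions made for this illustration; they are not part of any
kernel interface::

  #include <assert.h>
  #include <stdint.h>
  #include <stdlib.h>

  /*
   * Hypothetical helper: parse a CPU list such as "0-3,5" into a mask.
   * Returns 0 on success, -1 on malformed input or CPU numbers above 63.
   * An empty list (optionally followed by a newline) yields an empty mask.
   */
  int demo_parse_cpulist(const char *s, uint64_t *mask)
  {
          assert(s != NULL && mask != NULL);
          *mask = 0;

          while (*s && *s != '\n') {
                  char *end;
                  unsigned long first, last;

                  if (*s < '0' || *s > '9')
                          return -1;
                  first = strtoul(s, &end, 10);
                  last = first;

                  /* A '-' after the first number introduces a range. */
                  if (*end == '-') {
                          s = end + 1;
                          if (*s < '0' || *s > '9')
                                  return -1;
                          last = strtoul(s, &end, 10);
                  }
                  if (first > last || last > 63)
                          return -1;

                  for (unsigned long cpu = first; cpu <= last; cpu++)
                          *mask |= UINT64_C(1) << cpu;

                  /* Only ',' , a newline or the end of string may follow. */
                  s = end;
                  if (*s == ',')
                          s++;
                  else if (*s && *s != '\n')
                          return -1;
          }
          return 0;
  }

A caller would typically read one of the mask files into a buffer and pass
the buffer to the helper; systems with more than 64 possible CPUs need a
wider mask type than this sketch uses.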

The CPU hotplug coordination
============================

The offline case
----------------

Once a CPU has been logically shut down, the teardown callbacks of registered
hotplug states will be invoked, starting with ``CPUHP_ONLINE`` and terminating
at state ``CPUHP_OFFLINE``. This includes:

* If tasks are frozen due to a suspend operation then *cpuhp_tasks_frozen*
  will be set to true.
* All processes are migrated away from this outgoing CPU to new CPUs.
  The new CPU is chosen from each process' current cpuset, which may be
  a subset of all online CPUs.
* All interrupts targeted to this CPU are migrated to a new CPU.
* Timers are also migrated to a new CPU.
* Once all services are migrated, the kernel calls an arch specific routine
  ``__cpu_disable()`` to perform arch specific cleanup.


The CPU hotplug API
===================

CPU hotplug state machine
-------------------------

CPU hotplug uses a trivial state machine with a linear state space from
CPUHP_OFFLINE to CPUHP_ONLINE. Each state has a startup and a teardown
callback.

When a CPU is onlined, the startup callbacks are invoked sequentially until
the state CPUHP_ONLINE is reached. They can also be invoked when the
callbacks of a state are set up or an instance is added to a multi-instance
state.

When a CPU is offlined the teardown callbacks are invoked in the reverse
order sequentially until the state CPUHP_OFFLINE is reached. They can also
be invoked when the callbacks of a state are removed or an instance is
removed from a multi-instance state.

If a usage site requires only a callback in one direction of the hotplug
operations (CPU online or CPU offline) then the other not-required callback
can be set to NULL when the state is set up.
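
The linear walk described above can be modelled in plain user-space C. The
sketch below is not kernel code: ``struct model_step``, ``model_cpu_up()``,
``model_cpu_down()`` and the demo callbacks are hypothetical names invented
for this illustration. Startup callbacks run in ascending order, teardown
callbacks in descending order, and NULL callbacks are skipped. When a
startup callback fails, the model undoes the steps it has already completed,
which mirrors the failed online trace shown later in this document::

  #include <assert.h>
  #include <stddef.h>
  #include <string.h>

  /* Hypothetical model types; these are not kernel definitions. */
  typedef int (*model_cb)(unsigned int cpu);

  struct model_step {
          const char *name;
          model_cb startup;       /* NULL: skipped on the way up */
          model_cb teardown;      /* NULL: skipped on the way down */
  };

  /*
   * Run startup callbacks from *state up to nsteps. On failure, undo the
   * steps completed by this call in reverse order and return the error.
   */
  int model_cpu_up(const struct model_step *steps, size_t nsteps,
                   unsigned int cpu, size_t *state)
  {
          size_t first;

          assert(steps != NULL && state != NULL && *state <= nsteps);
          first = *state;

          while (*state < nsteps) {
                  model_cb cb = steps[*state].startup;
                  int ret = cb ? cb(cpu) : 0;

                  if (ret) {
                          while (*state > first) {
                                  model_cb undo = steps[--(*state)].teardown;

                                  if (undo)
                                          undo(cpu);
                          }
                          return ret;
                  }
                  (*state)++;
          }
          return 0;
  }

  /* Run teardown callbacks in reverse order until the model is offline. */
  void model_cpu_down(const struct model_step *steps, unsigned int cpu,
                      size_t *state)
  {
          assert(steps != NULL && state != NULL);

          while (*state > 0) {
                  model_cb cb = steps[--(*state)].teardown;

                  if (cb)
                          cb(cpu);  /* simplification: result ignored */
          }
  }

  /* Demo callbacks which record the order in which they run. */
  static char model_trace[16];

  static int model_note(char c)
  {
          size_t len = strlen(model_trace);

          if (len + 1 < sizeof(model_trace)) {
                  model_trace[len] = c;
                  model_trace[len + 1] = '\0';
          }
          return 0;
  }

  static int a_up(unsigned int cpu)   { (void)cpu; return model_note('A'); }
  static int a_down(unsigned int cpu) { (void)cpu; return model_note('a'); }
  static int c_up(unsigned int cpu)   { (void)cpu; return model_note('C'); }
  static int c_down(unsigned int cpu) { (void)cpu; return model_note('c'); }
  static int bad_up(unsigned int cpu) { (void)cpu; model_note('X'); return -1; }

  /* The middle step has no callbacks and is skipped in both directions. */
  static const struct model_step model_good[] = {
          { "demo:a", a_up, a_down },
          { "demo:b", NULL, NULL },
          { "demo:c", c_up, c_down },
  };

  /* The second startup callback fails, which triggers the rollback. */
  static const struct model_step model_bad[] = {
          { "demo:a", a_up, a_down },
          { "demo:x", bad_up, NULL },
          { "demo:c", c_up, c_down },
  };

The model deliberately ignores teardown failures and the split into
sections; both are described in the remainder of this chapter.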

The state space is divided into three sections:

* The PREPARE section

  The PREPARE section covers the state space from CPUHP_OFFLINE to
  CPUHP_BRINGUP_CPU.

  The startup callbacks in this section are invoked before the CPU is
  started during a CPU online operation. The teardown callbacks are invoked
  after the CPU has become dysfunctional during a CPU offline operation.

  The callbacks are invoked on a control CPU as they obviously can't run on
  the hotplugged CPU which is either not yet started or has become
  dysfunctional already.

  The startup callbacks are used to set up resources which are required to
  bring a CPU successfully online. The teardown callbacks are used to free
  resources or to move pending work to an online CPU after the hotplugged
  CPU became dysfunctional.

  The startup callbacks are allowed to fail. If a callback fails, the CPU
  online operation is aborted and the CPU is brought down to the previous
  state (usually CPUHP_OFFLINE) again.

  The teardown callbacks in this section are not allowed to fail.

* The STARTING section

  The STARTING section covers the state space between CPUHP_BRINGUP_CPU + 1
  and CPUHP_AP_ONLINE.

  The startup callbacks in this section are invoked on the hotplugged CPU
  with interrupts disabled during a CPU online operation in the early CPU
  setup code. The teardown callbacks are invoked with interrupts disabled
  on the hotplugged CPU during a CPU offline operation shortly before the
  CPU is completely shut down.

  The callbacks in this section are not allowed to fail.

  The callbacks are used for low level hardware initialization/shutdown and
  for core subsystems.

* The ONLINE section

  The ONLINE section covers the state space between CPUHP_AP_ONLINE + 1 and
  CPUHP_ONLINE.

  The startup callbacks in this section are invoked on the hotplugged CPU
  during a CPU online operation. The teardown callbacks are invoked on the
  hotplugged CPU during a CPU offline operation.

  The callbacks are invoked in the context of the per CPU hotplug thread,
  which is pinned on the hotplugged CPU. The callbacks are invoked with
  interrupts and preemption enabled.

  The callbacks are allowed to fail. When a callback fails the hotplug
  operation is aborted and the CPU is brought back to the previous state.

CPU online/offline operations
-----------------------------

A successful online operation looks like this::

  [CPUHP_OFFLINE]
  [CPUHP_OFFLINE + 1]->startup()       -> success
  [CPUHP_OFFLINE + 2]->startup()       -> success
  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
  ...
  [CPUHP_BRINGUP_CPU]->startup()       -> success
  === End of PREPARE section
  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
  ...
  [CPUHP_AP_ONLINE]->startup()         -> success
  === End of STARTING section
  [CPUHP_AP_ONLINE + 1]->startup()     -> success
  ...
  [CPUHP_ONLINE - 1]->startup()        -> success
  [CPUHP_ONLINE]

A successful offline operation looks like this::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_AP_ONLINE + 1]->teardown()    -> success
  === Start of STARTING section
  [CPUHP_AP_ONLINE]->teardown()        -> success
  ...
  [CPUHP_BRINGUP_CPU + 1]->teardown()
  ...
  === Start of PREPARE section
  [CPUHP_BRINGUP_CPU]->teardown()
  [CPUHP_OFFLINE + 3]->teardown()
  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
  [CPUHP_OFFLINE + 1]->teardown()
  [CPUHP_OFFLINE]

A failed online operation looks like this::

  [CPUHP_OFFLINE]
  [CPUHP_OFFLINE + 1]->startup()       -> success
  [CPUHP_OFFLINE + 2]->startup()       -> success
  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
  ...
  [CPUHP_BRINGUP_CPU]->startup()       -> success
  === End of PREPARE section
  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
  ...
  [CPUHP_AP_ONLINE]->startup()         -> success
  === End of STARTING section
  [CPUHP_AP_ONLINE + 1]->startup()     -> success
  ...
  [CPUHP_AP_ONLINE + N]->startup()     -> fail
  [CPUHP_AP_ONLINE + (N - 1)]->teardown()
  ...
  [CPUHP_AP_ONLINE + 1]->teardown()
  === Start of STARTING section
  [CPUHP_AP_ONLINE]->teardown()
  ...
  [CPUHP_BRINGUP_CPU + 1]->teardown()
  ...
  === Start of PREPARE section
  [CPUHP_BRINGUP_CPU]->teardown()
  [CPUHP_OFFLINE + 3]->teardown()
  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
  [CPUHP_OFFLINE + 1]->teardown()
  [CPUHP_OFFLINE]

A failed offline operation looks like this::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()
  ...
  [CPUHP_ONLINE - 1]->startup()
  [CPUHP_ONLINE]

Recursive failures cannot be handled sensibly. Look at the following
example of a recursive fail due to a failed offline operation::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()  -> success
  [CPUHP_ONLINE - (N - 2)]->startup()  -> fail

The CPU hotplug state machine stops right here and does not try to go back
down again because that would likely result in an endless loop::

  [CPUHP_ONLINE - (N - 1)]->teardown() -> success
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()  -> success
  [CPUHP_ONLINE - (N - 2)]->startup()  -> fail
  [CPUHP_ONLINE - (N - 1)]->teardown() -> success
  [CPUHP_ONLINE - N]->teardown()       -> fail

Lather, rinse and repeat.
In this case the CPU is left in state::

  [CPUHP_ONLINE - (N - 1)]

which at least lets the system make progress and gives the user a chance to
debug or even resolve the situation.

Allocating a state
------------------

There are two ways to allocate a CPU hotplug state:

* Static allocation

  Static allocation has to be used when the subsystem or driver has
  ordering requirements versus other CPU hotplug states. E.g. the PERF core
  startup callback has to be invoked before the PERF driver startup
  callbacks during a CPU online operation. During a CPU offline operation
  the driver teardown callbacks have to be invoked before the core teardown
  callback. The statically allocated states are described by constants in
  the cpuhp_state enum which can be found in include/linux/cpuhotplug.h.

  Insert the state into the enum at the proper place so the ordering
  requirements are fulfilled. The state constant has to be used for state
  setup and removal.

  Static allocation is also required when the state callbacks are not set
  up at runtime and are part of the initializer of the CPU hotplug state
  array in kernel/cpu.c.

* Dynamic allocation

  When there are no ordering requirements for the state callbacks then
  dynamic allocation is the preferred method. The state number is allocated
  by the setup function and returned to the caller on success.

  Only the PREPARE and ONLINE sections provide a dynamic allocation
  range. The STARTING section does not as most of the callbacks in that
  section have explicit ordering requirements.

Setup of a CPU hotplug state
----------------------------

The core code provides the following functions to set up a state:

* cpuhp_setup_state(state, name, startup, teardown)
* cpuhp_setup_state_nocalls(state, name, startup, teardown)
* cpuhp_setup_state_cpuslocked(state, name, startup, teardown)
* cpuhp_setup_state_nocalls_cpuslocked(state, name, startup, teardown)

For cases where a driver or a subsystem has multiple instances and the same
CPU hotplug state callbacks need to be invoked for each instance, the CPU
hotplug core provides multi-instance support. The advantage over driver
specific instance lists is that the instance related functions are fully
serialized against CPU hotplug operations and provide the automatic
invocations of the state callbacks on add and removal. To set up such a
multi-instance state the following function is available:

* cpuhp_setup_state_multi(state, name, startup, teardown)

The @state argument is either a statically allocated state or one of the
constants for dynamically allocated states - CPUHP_PREPARE_DYN,
CPUHP_ONLINE_DYN - depending on the state section (PREPARE, ONLINE) for
which a dynamic state should be allocated.

The @name argument is used for sysfs output and for instrumentation. The
naming convention is "subsys:mode" or "subsys/driver:mode",
e.g. "perf:mode" or "perf/x86:mode".
The common mode names are:

======== =======================================================
prepare  For states in the PREPARE section

dead     For states in the PREPARE section which do not provide
         a startup callback

starting For states in the STARTING section

dying    For states in the STARTING section which do not provide
         a startup callback

online   For states in the ONLINE section

offline  For states in the ONLINE section which do not provide
         a startup callback
======== =======================================================

As the @name argument is only used for sysfs and instrumentation, other mode
descriptors can be used as well if they describe the nature of the state
better than the common ones.

Examples for @name arguments: "perf/online", "perf/x86:prepare",
"RCU/tree:dying", "sched/waitempty"

The @startup argument is a function pointer to the callback which should be
invoked during a CPU online operation. If the usage site does not require a
startup callback set the pointer to NULL.

The @teardown argument is a function pointer to the callback which should
be invoked during a CPU offline operation. If the usage site does not
require a teardown callback set the pointer to NULL.

The functions differ in how the installed callbacks are treated:

  * cpuhp_setup_state_nocalls(), cpuhp_setup_state_nocalls_cpuslocked()
    and cpuhp_setup_state_multi() only install the callbacks

  * cpuhp_setup_state() and cpuhp_setup_state_cpuslocked() install the
    callbacks and invoke the @startup callback (if not NULL) for all online
    CPUs which currently have a state greater than the newly installed
    state. Depending on the state section the callback is either invoked on
    the current CPU (PREPARE section) or on each online CPU (ONLINE
    section) in the context of the CPU's hotplug thread.

    If a callback fails for CPU N then the teardown callback for CPU
    0 .. N-1 is invoked to roll back the operation. The state setup fails,
    the callbacks for the state are not installed and in case of dynamic
    allocation the allocated state is freed.

The state setup and the callback invocations are serialized against CPU
hotplug operations. If the setup function has to be called from a CPU
hotplug read locked region, then the _cpuslocked() variants have to be
used. These functions cannot be used from within CPU hotplug callbacks.

The function return values:
  ======== ===================================================================
  0        Statically allocated state was successfully set up

  >0       Dynamically allocated state was successfully set up.

           The returned number is the state number which was allocated. If
           the state callbacks have to be removed later, e.g. module
           removal, then this number has to be saved by the caller and used
           as @state argument for the state remove function. For
           multi-instance states the dynamically allocated state number is
           also required as @state argument for the instance add/remove
           operations.

  <0       Operation failed
  ======== ===================================================================

Removal of a CPU hotplug state
------------------------------

To remove a previously set up state, the following functions are provided:

* cpuhp_remove_state(state)
* cpuhp_remove_state_nocalls(state)
* cpuhp_remove_state_nocalls_cpuslocked(state)
* cpuhp_remove_multi_state(state)

The @state argument is either a statically allocated state or the state
number which was allocated in the dynamic range by cpuhp_setup_state*(). If
the state is in the dynamic range, then the state number is freed and
available for dynamic allocation again.
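
The life cycle of a dynamically allocated state number can be pictured with
a small user-space model. This is only a sketch under stated assumptions:
``demo_setup_dyn_state()`` and ``demo_remove_dyn_state()`` are hypothetical
names, and the base number, the range size, the first-fit policy and the
``-ENOSPC`` error code are choices made for this illustration rather than a
description of the kernel implementation. The model returns a positive
state number on success and a negative value on failure, and a freed number
becomes available for allocation again::

  #include <assert.h>
  #include <errno.h>
  #include <stdbool.h>

  #define DEMO_DYN_BASE   100     /* arbitrary first number of the range */
  #define DEMO_DYN_SLOTS  4       /* arbitrary size of the dynamic range */

  static bool demo_dyn_used[DEMO_DYN_SLOTS];

  /* Returns a state number > 0, or a negative errno if the range is full. */
  int demo_setup_dyn_state(void)
  {
          for (int i = 0; i < DEMO_DYN_SLOTS; i++) {
                  if (!demo_dyn_used[i]) {
                          demo_dyn_used[i] = true;
                          return DEMO_DYN_BASE + i;
                  }
          }
          return -ENOSPC;
  }

  /* Frees a state number returned by demo_setup_dyn_state(). */
  void demo_remove_dyn_state(int state)
  {
          int slot = state - DEMO_DYN_BASE;

          assert(slot >= 0 && slot < DEMO_DYN_SLOTS && demo_dyn_used[slot]);
          demo_dyn_used[slot] = false;
  }

The caller has to keep the returned number, as it is the only handle which
can be passed to the remove function later.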

The functions differ in how the installed callbacks are treated:

  * cpuhp_remove_state_nocalls(), cpuhp_remove_state_nocalls_cpuslocked()
    and cpuhp_remove_multi_state() only remove the callbacks.

  * cpuhp_remove_state() removes the callbacks and invokes the teardown
    callback (if not NULL) for all online CPUs which currently have a state
    greater than the removed state. Depending on the state section the
    callback is either invoked on the current CPU (PREPARE section) or on
    each online CPU (ONLINE section) in the context of the CPU's hotplug
    thread.

    In order to complete the removal, the teardown callback should not fail.

The state removal and the callback invocations are serialized against CPU
hotplug operations. If the remove function has to be called from a CPU
hotplug read locked region, then the _cpuslocked() variants have to be
used. These functions cannot be used from within CPU hotplug callbacks.

If a multi-instance state is removed then the caller has to remove all
instances first.

Multi-Instance state instance management
----------------------------------------

Once the multi-instance state is set up, instances can be added to the
state:

  * cpuhp_state_add_instance(state, node)
  * cpuhp_state_add_instance_nocalls(state, node)

The @state argument is either a statically allocated state or the state
number which was allocated in the dynamic range by cpuhp_setup_state_multi().

The @node argument is a pointer to an hlist_node which is embedded in the
instance's data structure. The pointer is handed to the multi-instance
state callbacks and can be used by the callback to retrieve the instance
via container_of().

The functions differ in how the installed callbacks are treated:

  * cpuhp_state_add_instance_nocalls() only adds the instance to the
    multi-instance state's node list.

  * cpuhp_state_add_instance() adds the instance and invokes the startup
    callback (if not NULL) associated with @state for all online CPUs which
    currently have a state greater than @state. The callback is only
    invoked for the instance which is being added. Depending on the state
    section the callback is either invoked on the current CPU (PREPARE
    section) or on each online CPU (ONLINE section) in the context of the
    CPU's hotplug thread.

    If a callback fails for CPU N then the teardown callback for CPU
    0 .. N-1 is invoked to roll back the operation, the function fails and
    the instance is not added to the node list of the multi-instance state.

To remove an instance from the state's node list these functions are
available:

  * cpuhp_state_remove_instance(state, node)
  * cpuhp_state_remove_instance_nocalls(state, node)

The arguments are the same as for the cpuhp_state_add_instance*()
variants above.

The functions differ in how the installed callbacks are treated:

  * cpuhp_state_remove_instance_nocalls() only removes the instance from the
    state's node list.

  * cpuhp_state_remove_instance() removes the instance and invokes the
    teardown callback (if not NULL) associated with @state for all online
    CPUs which currently have a state greater than @state. The callback is
    only invoked for the instance which is being removed. Depending on the
    state section the callback is either invoked on the current CPU (PREPARE
    section) or on each online CPU (ONLINE section) in the context of the
    CPU's hotplug thread.

    In order to complete the removal, the teardown callback should not fail.

The node list add/remove operations and the callback invocations are
serialized against CPU hotplug operations. These functions cannot be used
from within CPU hotplug callbacks and CPU hotplug read locked regions.
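
The way a multi-instance callback gets from the @node pointer back to its
instance can be tried out in user space. The sketch below is an illustration
only: ``struct demo_hlist_node`` is a stand-in for the kernel's hlist_node,
``demo_container_of()`` is a simplified version of container_of() without
the kernel's type checking, and ``struct demo_instance`` and
``demo_cpu_online()`` are hypothetical names. The callback takes a CPU
number and a node pointer, as a multi-instance callback does::

  #include <assert.h>
  #include <stddef.h>

  /* User-space stand-in for the kernel's struct hlist_node. */
  struct demo_hlist_node {
          struct demo_hlist_node *next;
          struct demo_hlist_node **pprev;
  };

  /* Simplified container_of(): step back from the member to its parent. */
  #define demo_container_of(ptr, type, member) \
          ((type *)((char *)(ptr) - offsetof(type, member)))

  /* Hypothetical per-instance data with the node embedded in it. */
  struct demo_instance {
          int id;
          struct demo_hlist_node node;
  };

  /* Records which instance the callback was last invoked for. */
  static int demo_last_id = -1;

  /* Shaped like a multi-instance startup callback. */
  static int demo_cpu_online(unsigned int cpu, struct demo_hlist_node *node)
  {
          struct demo_instance *inst;

          assert(node != NULL);
          (void)cpu;

          inst = demo_container_of(node, struct demo_instance, node);
          demo_last_id = inst->id;
          return 0;
  }

Because the node is embedded in the instance, no separate lookup table is
needed; the pointer arithmetic alone recovers the enclosing structure.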

Examples
--------

Setup and teardown a statically allocated state in the STARTING section for
notifications on online and offline operations::

   ret = cpuhp_setup_state(CPUHP_SUBSYS_STARTING, "subsys:starting", subsys_cpu_starting, subsys_cpu_dying);
   if (ret < 0)
        return ret;
   ....
   cpuhp_remove_state(CPUHP_SUBSYS_STARTING);

Setup and teardown a dynamically allocated state in the ONLINE section
for notifications on offline operations::

   state = cpuhp_setup_state(CPUHP_ONLINE_DYN, "subsys:offline", NULL, subsys_cpu_offline);
   if (state < 0)
       return state;
   ....
   cpuhp_remove_state(state);

Setup and teardown a dynamically allocated state in the ONLINE section
for notifications on online operations without invoking the callbacks::

   state = cpuhp_setup_state_nocalls(CPUHP_ONLINE_DYN, "subsys:online", subsys_cpu_online, NULL);
   if (state < 0)
       return state;
   ....
   cpuhp_remove_state_nocalls(state);

Setup, use and teardown a dynamically allocated multi-instance state in the
ONLINE section for notifications on online and offline operations::

   state = cpuhp_setup_state_multi(CPUHP_ONLINE_DYN, "subsys:online", subsys_cpu_online, subsys_cpu_offline);
   if (state < 0)
       return state;
   ....
   ret = cpuhp_state_add_instance(state, &inst1->node);
   if (ret)
        return ret;
   ....
   ret = cpuhp_state_add_instance(state, &inst2->node);
   if (ret)
        return ret;
   ....
   cpuhp_state_remove_instance(state, &inst1->node);
   ....
   cpuhp_state_remove_instance(state, &inst2->node);
   ....
   cpuhp_remove_multi_state(state);


Testing of hotplug states
=========================

One way to verify whether a custom state is working as expected or not is to
shut down a CPU and then put it online again. It is also possible to put the
CPU into a certain state (for instance *CPUHP_AP_ONLINE*) and then go back to
*CPUHP_ONLINE*.
This would simulate an error one state after *CPUHP_AP_ONLINE*
which would lead to a rollback to the online state.

All registered states are enumerated in ``/sys/devices/system/cpu/hotplug/states`` ::

 $ tail /sys/devices/system/cpu/hotplug/states
 138: mm/vmscan:online
 139: mm/vmstat:online
 140: lib/percpu_cnt:online
 141: acpi/cpu-drv:online
 142: base/cacheinfo:online
 143: virtio/net:online
 144: x86/mce:online
 145: printk:online
 168: sched:active
 169: online

To roll back CPU4 to ``lib/percpu_cnt:online`` and back online just issue::

  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  169
  $ echo 140 > /sys/devices/system/cpu/cpu4/hotplug/target
  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  140

It is important to note that the teardown callbacks of the states above 140
have been invoked. And now get back online::

  $ echo 169 > /sys/devices/system/cpu/cpu4/hotplug/target
  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  169

With trace events enabled, the individual steps are visible, too::

  #  TASK-PID   CPU#    TIMESTAMP  FUNCTION
  #     | |       |        |         |
      bash-394  [001]  22.976: cpuhp_enter: cpu: 0004 target: 140 step: 169 (cpuhp_kick_ap_work)
   cpuhp/4-31   [004]  22.977: cpuhp_enter: cpu: 0004 target: 140 step: 168 (sched_cpu_deactivate)
   cpuhp/4-31   [004]  22.990: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
   cpuhp/4-31   [004]  22.991: cpuhp_enter: cpu: 0004 target: 140 step: 144 (mce_cpu_pre_down)
   cpuhp/4-31   [004]  22.992: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
   cpuhp/4-31   [004]  22.993: cpuhp_multi_enter: cpu: 0004 target: 140 step: 143 (virtnet_cpu_down_prep)
   cpuhp/4-31   [004]  22.994: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
   cpuhp/4-31   [004]  22.995: cpuhp_enter: cpu: 0004 target: 140 step: 142 (cacheinfo_cpu_pre_down)
   cpuhp/4-31   [004]  22.996: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
      bash-394  [001]  22.997: cpuhp_exit:  cpu: 0004  state: 140 step: 169 ret: 0
      bash-394  [005]  95.540: cpuhp_enter: cpu: 0004 target: 169 step: 140 (cpuhp_kick_ap_work)
   cpuhp/4-31   [004]  95.541: cpuhp_enter: cpu: 0004 target: 169 step: 141 (acpi_soft_cpu_online)
   cpuhp/4-31   [004]  95.542: cpuhp_exit:  cpu: 0004  state: 141 step: 141 ret: 0
   cpuhp/4-31   [004]  95.543: cpuhp_enter: cpu: 0004 target: 169 step: 142 (cacheinfo_cpu_online)
   cpuhp/4-31   [004]  95.544: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
   cpuhp/4-31   [004]  95.545: cpuhp_multi_enter: cpu: 0004 target: 169 step: 143 (virtnet_cpu_online)
   cpuhp/4-31   [004]  95.546: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
   cpuhp/4-31   [004]  95.547: cpuhp_enter: cpu: 0004 target: 169 step: 144 (mce_cpu_online)
   cpuhp/4-31   [004]  95.548: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
   cpuhp/4-31   [004]  95.549: cpuhp_enter: cpu: 0004 target: 169 step: 145 (console_cpu_notify)
   cpuhp/4-31   [004]  95.550: cpuhp_exit:  cpu: 0004  state: 145 step: 145 ret: 0
   cpuhp/4-31   [004]  95.551: cpuhp_enter: cpu: 0004 target: 169 step: 168 (sched_cpu_activate)
   cpuhp/4-31   [004]  95.552: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
      bash-394  [005]  95.553: cpuhp_exit:  cpu: 0004  state: 169 step: 140 ret: 0

As can be seen, CPU4 went down until timestamp 22.996 and then back up until
95.552. All invoked callbacks including their return codes are visible in the
trace.

Architecture's requirements
===========================

The following functions and configurations are required:

``CONFIG_HOTPLUG_CPU``
  This entry needs to be enabled in Kconfig

``__cpu_up()``
  Arch interface to bring up a CPU

``__cpu_disable()``
  Arch interface to shut down a CPU. No more interrupts can be handled by the
  kernel after the routine returns. This includes the shutdown of the timer.

``__cpu_die()``
  This is actually supposed to ensure the death of the CPU.
  Look at some example code in other architectures that implement CPU
  hotplug. The processor is taken down from the ``idle()`` loop for that
  specific architecture. ``__cpu_die()`` typically waits for some per_cpu
  state to be set, to positively confirm that the processor's dead routine
  has been called.

User Space Notification
=======================

After a CPU has been successfully onlined or offlined, udev events are sent.
A udev rule like::

  SUBSYSTEM=="cpu", DRIVERS=="processor", DEVPATH=="/devices/system/cpu/*", RUN+="the_hotplug_receiver.sh"

will receive all events. A script like::

  #!/bin/sh

  if [ "${ACTION}" = "offline" ]
  then
      echo "CPU ${DEVPATH##*/} offline"

  elif [ "${ACTION}" = "online" ]
  then
      echo "CPU ${DEVPATH##*/} online"

  fi

can process the event further.

Kernel Inline Documentations Reference
======================================

.. kernel-doc:: include/linux/cpuhotplug.h