.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

========================
CPU Idle Time Management
========================

:Copyright: |copy| 2019 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


CPU Idle Time Management Subsystem
==================================

Every time one of the logical CPUs in the system (the entities that appear to
fetch and execute instructions: hardware threads, if present, or processor
cores) is idle after an interrupt or equivalent wakeup event, which means that
there are no tasks to run on it except for the special "idle" task associated
with it, there is an opportunity to save energy for the processor that it
belongs to.  That can be done by making the idle logical CPU stop fetching
instructions from memory and putting some of the processor's functional units
that it depends on into an idle state in which they will draw less power.

However, there may be multiple different idle states that can be used in such a
situation in principle, so it may be necessary to find the most suitable one
(from the kernel perspective) and ask the processor to use (or "enter") that
particular idle state.
That is the role of the CPU idle time management
subsystem in the kernel, called ``CPUIdle``.

The design of ``CPUIdle`` is modular and based on the code duplication avoidance
principle, so the generic code that, in principle, need not depend on hardware
or platform design details is kept separate from the code that interacts with
the hardware.  It generally is divided into three categories of functional
units: *governors* responsible for selecting idle states to ask the processor
to enter, *drivers* that pass the governors' decisions on to the hardware and
the *core* providing a common framework for them.


CPU Idle Time Governors
=======================

A CPU idle time (``CPUIdle``) governor is a bundle of policy code invoked when
one of the logical CPUs in the system turns out to be idle.  Its role is to
select an idle state to ask the processor to enter in order to save some energy.

``CPUIdle`` governors are generic and each of them can be used on any hardware
platform that the Linux kernel can run on.  For this reason, the data structures
operated on by them cannot depend on any hardware architecture or platform
design details either.

The governor itself is represented by a struct cpuidle_governor object
containing four callback pointers (:c:member:`enable`, :c:member:`disable`,
:c:member:`select` and :c:member:`reflect`), a :c:member:`rating` field described
below, and a name (string) used for identifying it.

For the governor to be available at all, that object needs to be registered
with the ``CPUIdle`` core by calling :c:func:`cpuidle_register_governor()` with
a pointer to it passed as the argument.  If successful, that causes the core to
add the governor to the global list of available governors and, if it is the
only one in the list (that is, the list was empty before) or the value of its
:c:member:`rating` field is greater than the value of that field for the
governor currently in use, or the name of the new governor was passed to the
kernel as the value of the ``cpuidle.governor=`` command line parameter, the new
governor will be used from that point on (there can be only one ``CPUIdle``
governor in use at a time).  Also, user space can choose the ``CPUIdle``
governor to use at run time via ``sysfs``.

Once registered, ``CPUIdle`` governors cannot be unregistered, so it is not
practical to put them into loadable kernel modules.

The interface between ``CPUIdle`` governors and the core consists of four
callbacks:

:c:member:`enable`
        ::

          int (*enable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);

        The role of this callback is to prepare the governor for handling the
        (logical) CPU represented by the struct cpuidle_device object pointed
        to by the ``dev`` argument.  The struct cpuidle_driver object pointed
        to by the ``drv`` argument represents the ``CPUIdle`` driver to be used
        with that CPU (among other things, it should contain the list of
        struct cpuidle_state objects representing idle states that the
        processor holding the given CPU can be asked to enter).

        It may fail, in which case it is expected to return a negative error
        code, and that causes the kernel to run the architecture-specific
        default code for idle CPUs on the CPU in question instead of ``CPUIdle``
        until the ``->enable()`` governor callback is invoked for that CPU
        again.

:c:member:`disable`
        ::

          void (*disable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);

        Called to make the governor stop handling the (logical) CPU represented
        by the struct cpuidle_device object pointed to by the ``dev``
        argument.

        It is expected to reverse any changes made by the ``->enable()``
        callback when it was last invoked for the target CPU, free all memory
        allocated by that callback and so on.

:c:member:`select`
        ::

          int (*select) (struct cpuidle_driver *drv, struct cpuidle_device *dev,
                         bool *stop_tick);

        Called to select an idle state for the processor holding the (logical)
        CPU represented by the struct cpuidle_device object pointed to by the
        ``dev`` argument.

        The list of idle states to take into consideration is represented by the
        :c:member:`states` array of struct cpuidle_state objects held by the
        struct cpuidle_driver object pointed to by the ``drv`` argument (which
        represents the ``CPUIdle`` driver to be used with the CPU at hand).  The
        value returned by this callback is interpreted as an index into that
        array (unless it is a negative error code).

        The ``stop_tick`` argument is used to indicate whether or not to stop
        the scheduler tick before asking the processor to enter the selected
        idle state.  When the ``bool`` variable pointed to by it (which is set
        to ``true`` before invoking this callback) is cleared to ``false``, the
        processor will be asked to enter the selected idle state without
        stopping the scheduler tick on the given CPU (if the tick has been
        stopped on that CPU already, however, it will not be restarted before
        asking the processor to enter the idle state).

        This callback is mandatory (i.e. the :c:member:`select` callback pointer
        in struct cpuidle_governor must not be ``NULL`` for the registration
        of the governor to succeed).

:c:member:`reflect`
        ::

          void (*reflect) (struct cpuidle_device *dev, int index);

        Called to allow the governor to evaluate the accuracy of the idle state
        selection made by the ``->select()`` callback (when it was last
        invoked) and possibly use the result of that to improve the accuracy of
        idle state selections in the future.

In addition, ``CPUIdle`` governors are required to take power management
quality of service (PM QoS) constraints on the processor wakeup latency into
account when selecting idle states.  In order to obtain the current effective
PM QoS wakeup latency constraint for a given CPU, a ``CPUIdle`` governor is
expected to pass the number of the CPU to
:c:func:`cpuidle_governor_latency_req()`.  Then, the governor's ``->select()``
callback must not return the index of an idle state whose
:c:member:`exit_latency` value is greater than the number returned by that
function.


CPU Idle Time Management Drivers
================================

CPU idle time management (``CPUIdle``) drivers provide an interface between the
other parts of ``CPUIdle`` and the hardware.

First of all, a ``CPUIdle`` driver has to populate the :c:member:`states` array
of struct cpuidle_state objects included in the struct cpuidle_driver object
representing it.  From then on, this array represents the list of available
idle states that the processor hardware can be asked to enter, shared by all of
the logical CPUs handled by the given driver.

The entries in the :c:member:`states` array are expected to be sorted by the
value of the :c:member:`target_residency` field in struct cpuidle_state in
ascending order (that is, index 0 should correspond to the idle state with
the minimum value of :c:member:`target_residency`).  [Since the
:c:member:`target_residency` value is expected to reflect the "depth" of the
idle state represented by the struct cpuidle_state object holding it, this
sorting order should be the same as the ascending sorting order by the idle
state "depth".]

Three fields in struct cpuidle_state are used by the existing ``CPUIdle``
governors for computations related to idle state selection:

:c:member:`target_residency`
        Minimum time to spend in this idle state including the time needed to
        enter it (which may be substantial) to save more energy than could
        be saved by staying in a shallower idle state for the same amount of
        time, in microseconds.

:c:member:`exit_latency`
        Maximum time it will take a CPU asking the processor to enter this idle
        state to start executing the first instruction after a wakeup from it,
        in microseconds.

:c:member:`flags`
        Flags representing idle state properties.  Currently, governors only use
        the ``CPUIDLE_FLAG_POLLING`` flag which is set if the given object
        does not represent a real idle state, but an interface to a software
        "loop" that can be used in order to avoid asking the processor to enter
        any idle state at all.  [There are other flags used by the ``CPUIdle``
        core in special situations.]

The :c:member:`enter` callback pointer in struct cpuidle_state, which must not
be ``NULL``, points to the routine to execute in order to ask the processor to
enter this particular idle state:

::

  int (*enter) (struct cpuidle_device *dev, struct cpuidle_driver *drv,
                int index);

The first two arguments of it point to the struct cpuidle_device object
representing the logical CPU running this callback and the
struct cpuidle_driver object representing the driver itself, respectively,
and the last one is an index of the struct cpuidle_state entry in the driver's
:c:member:`states` array representing the idle state to ask the processor to
enter.

The analogous ``->enter_s2idle()`` callback in struct cpuidle_state is used
only for implementing the suspend-to-idle system-wide power management feature.
The difference between it and ``->enter()`` is that it must not re-enable
interrupts at any point (even temporarily) or attempt to change the states of
clock event devices, which the ``->enter()`` callback may do sometimes.

Once the :c:member:`states` array has been populated, the number of valid
entries in it has to be stored in the :c:member:`state_count` field of the
struct cpuidle_driver object representing the driver.  Moreover, if any
entries in the :c:member:`states` array represent "coupled" idle states (that
is, idle states that can only be asked for if multiple related logical CPUs are
idle), the :c:member:`safe_state_index` field in struct cpuidle_driver needs
to be the index of an idle state that is not "coupled" (that is, one that can be
asked for if only one logical CPU is idle).
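
As a rough illustration of how the fields described above fit together, the
following user-space sketch models a populated table of idle states and a
selection routine that honors a wakeup latency limit.  It is not kernel code:
the ``mock_*`` types, the ``MOCK_*`` macros, the example state names and all of
the numbers are hypothetical stand-ins for struct cpuidle_state and
struct cpuidle_driver, and the selection policy is a deliberately simple
assumption rather than what any existing governor does.

```c
#include <assert.h>

#define MOCK_STATE_MAX		10	/* hypothetical table size */
#define MOCK_FLAG_POLLING	0x1u	/* stands in for CPUIDLE_FLAG_POLLING */

/* Hypothetical, reduced stand-in for struct cpuidle_state. */
struct mock_idle_state {
	const char *name;
	unsigned int exit_latency;	/* microseconds */
	unsigned int target_residency;	/* microseconds */
	unsigned int flags;
};

/* Hypothetical, reduced stand-in for struct cpuidle_driver. */
struct mock_idle_driver {
	struct mock_idle_state states[MOCK_STATE_MAX];
	int state_count;	/* number of valid entries in states[] */
	int safe_state_index;	/* index of a state that is not "coupled" */
};

/* Example table, sorted by target_residency; the values are made up. */
static const struct mock_idle_driver example_drv = {
	.states = {
		{ "POLL", 0,   0,   MOCK_FLAG_POLLING },
		{ "C1",   2,   2,   0 },
		{ "C6",   100, 400, 0 },
	},
	.state_count = 3,
	.safe_state_index = 0,
};

/* Return 1 if states[] is sorted by target_residency in ascending order. */
static int mock_states_sorted(const struct mock_idle_driver *drv)
{
	for (int i = 1; i < drv->state_count; i++) {
		if (drv->states[i].target_residency <
		    drv->states[i - 1].target_residency)
			return 0;
	}
	return 1;
}

/*
 * Pick the deepest state whose exit_latency does not exceed the latency
 * limit and whose target_residency does not exceed the predicted idle
 * duration.  Falls back to index 0, assumed here to be a polling-like
 * state with zero exit latency.
 */
static int mock_select(const struct mock_idle_driver *drv,
		       unsigned int latency_req_us,
		       unsigned int predicted_idle_us)
{
	int best = 0;

	assert(drv->state_count > 0 && drv->state_count <= MOCK_STATE_MAX);

	for (int i = 1; i < drv->state_count; i++) {
		const struct mock_idle_state *s = &drv->states[i];

		if (s->exit_latency > latency_req_us)
			continue;	/* would violate the latency limit */
		if (s->target_residency > predicted_idle_us)
			break;		/* sorted: deeper states fail too */
		best = i;
	}
	return best;
}
```

With the example table, a latency limit of 50 microseconds rules out the
deepest state (its exit latency is 100 microseconds), so index 1 is chosen even
if the predicted idle duration is long.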

In addition to that, if the given ``CPUIdle`` driver is only going to handle a
subset of logical CPUs in the system, the :c:member:`cpumask` field in its
struct cpuidle_driver object must point to the set (mask) of CPUs that will be
handled by it.

A ``CPUIdle`` driver can only be used after it has been registered.  If there
are no "coupled" idle state entries in the driver's :c:member:`states` array,
that can be accomplished by passing the driver's struct cpuidle_driver object
to :c:func:`cpuidle_register_driver()`.  Otherwise, :c:func:`cpuidle_register()`
should be used for this purpose.

However, it also is necessary to register struct cpuidle_device objects for
all of the logical CPUs to be handled by the given ``CPUIdle`` driver with the
help of :c:func:`cpuidle_register_device()` after the driver has been registered,
and :c:func:`cpuidle_register_driver()`, unlike :c:func:`cpuidle_register()`,
does not do that automatically.  For this reason, the drivers that use
:c:func:`cpuidle_register_driver()` to register themselves must also take care
of registering the struct cpuidle_device objects as needed, so it is generally
recommended to use :c:func:`cpuidle_register()` for ``CPUIdle`` driver
registration in all cases.

The registration of a struct cpuidle_device object causes the ``CPUIdle``
``sysfs`` interface to be created and the governor's ``->enable()`` callback to
be invoked for the logical CPU represented by it, so it must take place after
registering the driver that will handle the CPU in question.

``CPUIdle`` drivers and struct cpuidle_device objects can be unregistered
when they are no longer needed, which allows some resources associated with
them to be released.  Due to dependencies between them, all of the
struct cpuidle_device objects representing CPUs handled by the given
``CPUIdle`` driver must be unregistered, with the help of
:c:func:`cpuidle_unregister_device()`, before calling
:c:func:`cpuidle_unregister_driver()` to unregister the driver.  Alternatively,
:c:func:`cpuidle_unregister()` can be called to unregister a ``CPUIdle`` driver
along with all of the struct cpuidle_device objects representing CPUs handled
by it.

``CPUIdle`` drivers can respond to runtime system configuration changes that
lead to modifications of the list of available processor idle states (which can
happen, for example, when the system's power source is switched from AC to
battery or the other way around).  Upon a notification of such a change,
a ``CPUIdle`` driver is expected to call :c:func:`cpuidle_pause_and_lock()` to
turn ``CPUIdle`` off temporarily and then :c:func:`cpuidle_disable_device()` for
all of the struct cpuidle_device objects representing CPUs affected by that
change.  Next, it can update its :c:member:`states` array in accordance with
the new configuration of the system, call :c:func:`cpuidle_enable_device()` for
all of the relevant struct cpuidle_device objects and invoke
:c:func:`cpuidle_resume_and_unlock()` to allow ``CPUIdle`` to be used again.
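
The ordering of the reconfiguration steps above can be sketched in user space
as follows.  This is only a model of the call sequence: every ``mock_*``
function is a hypothetical stub that records the step it stands for instead of
calling into the kernel, struct mock_cpuidle_device is a made-up stand-in for
struct cpuidle_device, and the error handling is an assumption.

```c
#include <assert.h>

#define MOCK_TRACE_MAX	32

/* Steps of the sequence, recorded by the stubs below. */
enum mock_event {
	EV_PAUSE_AND_LOCK,
	EV_DISABLE_DEVICE,
	EV_UPDATE_STATES,
	EV_ENABLE_DEVICE,
	EV_RESUME_AND_UNLOCK,
};

/* Hypothetical stand-in for struct cpuidle_device. */
struct mock_cpuidle_device {
	unsigned int cpu;
	int enabled;
};

static enum mock_event trace[MOCK_TRACE_MAX];
static int trace_len;

static void trace_add(enum mock_event ev)
{
	assert(trace_len < MOCK_TRACE_MAX);
	trace[trace_len++] = ev;
}

/* Stub modeling cpuidle_pause_and_lock(); it only logs the step. */
static void mock_cpuidle_pause_and_lock(void)
{
	trace_add(EV_PAUSE_AND_LOCK);
}

/* Stub modeling cpuidle_resume_and_unlock(); it only logs the step. */
static void mock_cpuidle_resume_and_unlock(void)
{
	trace_add(EV_RESUME_AND_UNLOCK);
}

/* Stub modeling cpuidle_disable_device(). */
static void mock_cpuidle_disable_device(struct mock_cpuidle_device *dev)
{
	dev->enabled = 0;
	trace_add(EV_DISABLE_DEVICE);
}

/* Stub modeling cpuidle_enable_device(); always succeeds in this model. */
static int mock_cpuidle_enable_device(struct mock_cpuidle_device *dev)
{
	dev->enabled = 1;
	trace_add(EV_ENABLE_DEVICE);
	return 0;
}

/* Placeholder for the driver-specific update of its states array. */
static void mock_update_states(void)
{
	trace_add(EV_UPDATE_STATES);
}

/*
 * Pause CPUIdle, disable the affected devices, update the states, enable
 * the devices again and resume.  Returns the first error reported while
 * re-enabling devices, or 0.
 */
static int mock_reconfigure(struct mock_cpuidle_device *devs, int count,
			    void (*update_states)(void))
{
	int ret = 0;

	mock_cpuidle_pause_and_lock();

	for (int i = 0; i < count; i++)
		mock_cpuidle_disable_device(&devs[i]);

	update_states();

	for (int i = 0; i < count; i++) {
		int err = mock_cpuidle_enable_device(&devs[i]);

		if (err && !ret)
			ret = err;
	}

	mock_cpuidle_resume_and_unlock();
	return ret;
}
```

In the model, the states are only touched while every affected device is
disabled and ``CPUIdle`` is paused, which is the property the sequence
described above is meant to guarantee.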