.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

========================
CPU Idle Time Management
========================

:Copyright: |copy| 2019 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


CPU Idle Time Management Subsystem
==================================

Every time one of the logical CPUs in the system (the entities that appear to
fetch and execute instructions: hardware threads, if present, or processor
cores) is idle after an interrupt or equivalent wakeup event, which means that
there are no tasks to run on it except for the special "idle" task associated
with it, there is an opportunity to save energy for the processor that it
belongs to.  That can be done by making the idle logical CPU stop fetching
instructions from memory and putting some of the processor's functional units
that it depends on into an idle state in which they will draw less power.

However, there may be multiple different idle states that can be used in such a
situation in principle, so it may be necessary to find the most suitable one
(from the kernel perspective) and ask the processor to use (or "enter") that
particular idle state.  That is the role of the CPU idle time management
subsystem in the kernel, called ``CPUIdle``.

The design of ``CPUIdle`` is modular and based on the code duplication avoidance
principle, so the generic code in it, which in principle need not depend on
hardware or platform design details, is separate from the code that interacts
with the hardware.  It generally is divided into three categories of functional
units: *governors* responsible for selecting idle states to ask the processor
to enter, *drivers* that pass the governors' decisions on to the hardware and
the *core* providing a common framework for them.


CPU Idle Time Governors
=======================

A CPU idle time (``CPUIdle``) governor is a bundle of policy code invoked when
one of the logical CPUs in the system turns out to be idle.  Its role is to
select an idle state to ask the processor to enter in order to save some energy.

``CPUIdle`` governors are generic and each of them can be used on any hardware
platform that the Linux kernel can run on.  For this reason, data structures
operated on by them cannot depend on any hardware architecture or platform
design details either.

The governor itself is represented by a struct cpuidle_governor object
containing four callback pointers, :c:member:`enable`, :c:member:`disable`,
:c:member:`select`, :c:member:`reflect`, a :c:member:`rating` field described
below, and a name (string) used for identifying it.

For the governor to be available at all, that object needs to be registered
with the ``CPUIdle`` core by calling :c:func:`cpuidle_register_governor()` with
a pointer to it passed as the argument.  If successful, that causes the core to
add the governor to the global list of available governors.  The new governor
will then be used from that point on if it is the only one in the list (that
is, the list was empty before), if the value of its :c:member:`rating` field is
greater than the value of that field for the governor currently in use, or if
its name was passed to the kernel as the value of the ``cpuidle.governor=``
command line parameter (there can be only one ``CPUIdle`` governor in use at a
time).  Also, user space can choose the ``CPUIdle`` governor to use at run time
via ``sysfs``.

Once registered, ``CPUIdle`` governors cannot be unregistered, so it is not
practical to put them into loadable kernel modules.
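
The rule for choosing the governor in use at registration time can be
illustrated with a small user-space model.  It is only a sketch of the rule as
stated in the text, not the kernel's code: ``struct toy_governor``,
``toy_curr_governor`` and ``toy_register_governor()`` are hypothetical
stand-ins, and ``param_name`` stands for the value of ``cpuidle.governor=``
(assumed to be an empty string if the parameter was not given).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct cpuidle_governor. */
struct toy_governor {
	const char *name;
	unsigned int rating;
};

/* The governor currently in use; NULL until the first registration. */
static const struct toy_governor *toy_curr_governor;

/*
 * Model of the rule described in the text: the new governor is used if it is
 * the first one, if its rating is greater than that of the governor in use,
 * or if its name matches param_name (standing for cpuidle.governor=).
 */
static void toy_register_governor(const struct toy_governor *gov,
				  const char *param_name)
{
	assert(gov && gov->name && param_name);

	if (!toy_curr_governor ||
	    gov->rating > toy_curr_governor->rating ||
	    strcmp(gov->name, param_name) == 0)
		toy_curr_governor = gov;
}
```

In this model, registering a governor with a lower rating than the one in use
leaves the current choice unchanged unless its name matches ``param_name``.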

The interface between ``CPUIdle`` governors and the core consists of four
callbacks:

:c:member:`enable`
	::

	  int (*enable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);

	The role of this callback is to prepare the governor for handling the
	(logical) CPU represented by the struct cpuidle_device object pointed
	to by the ``dev`` argument.  The struct cpuidle_driver object pointed
	to by the ``drv`` argument represents the ``CPUIdle`` driver to be used
	with that CPU (among other things, it should contain the list of
	struct cpuidle_state objects representing idle states that the
	processor holding the given CPU can be asked to enter).

	It may fail, in which case it is expected to return a negative error
	code, and that causes the kernel to run the architecture-specific
	default code for idle CPUs on the CPU in question instead of ``CPUIdle``
	until the ``->enable()`` governor callback is invoked for that CPU
	again.

:c:member:`disable`
	::

	  void (*disable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);

	Called to make the governor stop handling the (logical) CPU represented
	by the struct cpuidle_device object pointed to by the ``dev``
	argument.

	It is expected to reverse any changes made by the ``->enable()``
	callback when it was last invoked for the target CPU, free all memory
	allocated by that callback and so on.

:c:member:`select`
	::

	  int (*select) (struct cpuidle_driver *drv, struct cpuidle_device *dev,
	                 bool *stop_tick);

	Called to select an idle state for the processor holding the (logical)
	CPU represented by the struct cpuidle_device object pointed to by the
	``dev`` argument.

	The list of idle states to take into consideration is represented by the
	:c:member:`states` array of struct cpuidle_state objects held by the
	struct cpuidle_driver object pointed to by the ``drv`` argument (which
	represents the ``CPUIdle`` driver to be used with the CPU at hand).  The
	value returned by this callback is interpreted as an index into that
	array (unless it is a negative error code).

	The ``stop_tick`` argument is used to indicate whether or not to stop
	the scheduler tick before asking the processor to enter the selected
	idle state.  When the ``bool`` variable pointed to by it (which is set
	to ``true`` before invoking this callback) is cleared to ``false``, the
	processor will be asked to enter the selected idle state without
	stopping the scheduler tick on the given CPU (if the tick has been
	stopped on that CPU already, however, it will not be restarted before
	asking the processor to enter the idle state).

	This callback is mandatory (i.e. the :c:member:`select` callback pointer
	in struct cpuidle_governor must not be ``NULL`` for the registration
	of the governor to succeed).

:c:member:`reflect`
	::

	  void (*reflect) (struct cpuidle_device *dev, int index);

	Called to allow the governor to evaluate the accuracy of the idle state
	selection made by the ``->select()`` callback (when it was invoked last
	time) and possibly use the result of that to improve the accuracy of
	idle state selections in the future.

In addition, ``CPUIdle`` governors are required to take power management
quality of service (PM QoS) constraints on the processor wakeup latency into
account when selecting idle states.  In order to obtain the current effective
PM QoS wakeup latency constraint for a given CPU, a ``CPUIdle`` governor is
expected to pass the number of the CPU to
:c:func:`cpuidle_governor_latency_req()`.  Then, the governor's ``->select()``
callback must not return the index of an idle state whose
:c:member:`exit_latency` value is greater than the number returned by that
function.
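
The latency constraint above can be sketched as a user-space selection loop.
This is a simplified illustration, not the policy of any existing governor:
``struct toy_state`` and ``toy_select()`` are hypothetical, ``latency_req`` is
assumed to hold the value obtained from
:c:func:`cpuidle_governor_latency_req()` expressed in the same units as
``exit_latency``, ``predicted_us`` is an assumed idle duration estimate, and
state 0 is assumed to be a fallback whose exit latency is negligible.

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for struct cpuidle_state. */
struct toy_state {
	unsigned int exit_latency;	/* microseconds */
	unsigned int target_residency;	/* microseconds */
};

/*
 * Return the index of the deepest state whose exit latency does not exceed
 * latency_req and whose target residency does not exceed predicted_us.
 * The array is assumed to be sorted by ascending target_residency.
 * Index 0 is returned if no deeper state qualifies.
 */
static int toy_select(const struct toy_state *states, int count,
		      unsigned int latency_req, unsigned int predicted_us)
{
	int i, idx = 0;

	assert(states && count > 0);

	for (i = 1; i < count; i++) {
		/* Never pick a state that violates the latency constraint. */
		if (states[i].exit_latency > latency_req)
			continue;
		/* Deeper states need even more time, so stop looking. */
		if (states[i].target_residency > predicted_us)
			break;
		idx = i;
	}
	return idx;
}
```

A tighter latency constraint or a shorter predicted idle duration both push
the result of this sketch towards shallower states.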


CPU Idle Time Management Drivers
================================

CPU idle time management (``CPUIdle``) drivers provide an interface between the
other parts of ``CPUIdle`` and the hardware.

First of all, a ``CPUIdle`` driver has to populate the :c:member:`states` array
of struct cpuidle_state objects included in the struct cpuidle_driver object
representing it.  Going forward, this array will represent the list of
available idle states that the processor hardware can be asked to enter, shared
by all of the logical CPUs handled by the given driver.

The entries in the :c:member:`states` array are expected to be sorted by the
value of the :c:member:`target_residency` field in struct cpuidle_state in
the ascending order (that is, index 0 should correspond to the idle state with
the minimum value of :c:member:`target_residency`).  [Since the
:c:member:`target_residency` value is expected to reflect the "depth" of the
idle state represented by the struct cpuidle_state object holding it, this
sorting order should be the same as the ascending sorting order by the idle
state "depth".]

Three fields in struct cpuidle_state are used by the existing ``CPUIdle``
governors for computations related to idle state selection:

:c:member:`target_residency`
	Minimum time to spend in this idle state including the time needed to
	enter it (which may be substantial) to save more energy than could
	be saved by staying in a shallower idle state for the same amount of
	time, in microseconds.

:c:member:`exit_latency`
	Maximum time it will take a CPU asking the processor to enter this idle
	state to start executing the first instruction after a wakeup from it,
	in microseconds.

:c:member:`flags`
	Flags representing idle state properties.  Currently, governors only use
	the ``CPUIDLE_FLAG_POLLING`` flag which is set if the given object
	does not represent a real idle state, but an interface to a software
	"loop" that can be used in order to avoid asking the processor to enter
	any idle state at all.  [There are other flags used by the ``CPUIdle``
	core in special situations.]
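
As an illustration of the three fields and of the expected sorting order, the
following user-space sketch describes a made-up set of idle states and checks
that it is sorted by ascending ``target_residency``.  Everything in it is an
assumption made for the example: ``struct toy_state``, ``TOY_FLAG_POLLING``
(standing in for ``CPUIDLE_FLAG_POLLING``), ``toy_states_sorted()``, the state
names and all of the numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_FLAG_POLLING	(1u << 0)	/* stand-in for CPUIDLE_FLAG_POLLING */

/* Hypothetical, reduced stand-in for struct cpuidle_state. */
struct toy_state {
	const char *name;
	unsigned int flags;
	unsigned int exit_latency;	/* microseconds */
	unsigned int target_residency;	/* microseconds */
};

/* Made-up table: index 0 is the shallowest state, the last is the deepest. */
static const struct toy_state toy_states[] = {
	{ "POLL", TOY_FLAG_POLLING,   0,   0 },
	{ "C1",   0,                  2,   2 },
	{ "C2",   0,                 50, 150 },
	{ "C3",   0,                200, 800 },
};

/* Check that the entries are sorted by ascending target_residency. */
static bool toy_states_sorted(const struct toy_state *s, size_t n)
{
	size_t i;

	assert(s || n == 0);

	for (i = 1; i < n; i++)
		if (s[i].target_residency < s[i - 1].target_residency)
			return false;
	return true;
}
```

A check of this kind is cheap to run once, before the table is handed over for
registration.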

The :c:member:`enter` callback pointer in struct cpuidle_state, which must not
be ``NULL``, points to the routine to execute in order to ask the processor to
enter this particular idle state:

::

  int (*enter) (struct cpuidle_device *dev, struct cpuidle_driver *drv,
                int index);

The first two arguments of it point to the struct cpuidle_device object
representing the logical CPU running this callback and the
struct cpuidle_driver object representing the driver itself, respectively,
and the last one is an index of the struct cpuidle_state entry in the driver's
:c:member:`states` array representing the idle state to ask the processor to
enter.

The analogous ``->enter_s2idle()`` callback in struct cpuidle_state is used
only for implementing the suspend-to-idle system-wide power management feature.
The difference between it and ``->enter()`` is that it must not re-enable
interrupts at any point (even temporarily) or attempt to change the states of
clock event devices, which the ``->enter()`` callback may do sometimes.

Once the :c:member:`states` array has been populated, the number of valid
entries in it has to be stored in the :c:member:`state_count` field of the
struct cpuidle_driver object representing the driver.  Moreover, if any
entries in the :c:member:`states` array represent "coupled" idle states (that
is, idle states that can only be asked for if multiple related logical CPUs are
idle), the :c:member:`safe_state_index` field in struct cpuidle_driver needs
to be the index of an idle state that is not "coupled" (that is, one that can be
asked for if only one logical CPU is idle).

In addition to that, if the given ``CPUIdle`` driver is only going to handle a
subset of logical CPUs in the system, the :c:member:`cpumask` field in its
struct cpuidle_driver object must point to the set (mask) of CPUs that will be
handled by it.

A ``CPUIdle`` driver can only be used after it has been registered.  If there
are no "coupled" idle state entries in the driver's :c:member:`states` array,
that can be accomplished by passing the driver's struct cpuidle_driver object
to :c:func:`cpuidle_register_driver()`.  Otherwise, :c:func:`cpuidle_register()`
should be used for this purpose.

However, it also is necessary to register struct cpuidle_device objects for
all of the logical CPUs to be handled by the given ``CPUIdle`` driver with the
help of :c:func:`cpuidle_register_device()` after the driver has been
registered, and :c:func:`cpuidle_register_driver()`, unlike
:c:func:`cpuidle_register()`, does not do that automatically.  For this reason,
the drivers that use :c:func:`cpuidle_register_driver()` to register themselves
must also take care of registering the struct cpuidle_device objects as needed,
so it is generally recommended to use :c:func:`cpuidle_register()` for
``CPUIdle`` driver registration in all cases.

The registration of a struct cpuidle_device object causes the ``CPUIdle``
``sysfs`` interface to be created and the governor's ``->enable()`` callback to
be invoked for the logical CPU represented by it, so it must take place after
registering the driver that will handle the CPU in question.

``CPUIdle`` drivers and struct cpuidle_device objects can be unregistered
when they are no longer needed, which allows some resources associated with
them to be released.  Due to dependencies between them, all of the
struct cpuidle_device objects representing CPUs handled by the given
``CPUIdle`` driver must be unregistered, with the help of
:c:func:`cpuidle_unregister_device()`, before calling
:c:func:`cpuidle_unregister_driver()` to unregister the driver.  Alternatively,
:c:func:`cpuidle_unregister()` can be called to unregister a ``CPUIdle`` driver
along with all of the struct cpuidle_device objects representing CPUs handled
by it.

``CPUIdle`` drivers can respond to runtime system configuration changes that
lead to modifications of the list of available processor idle states (which can
happen, for example, when the system's power source is switched from AC to
battery or the other way around).  Upon a notification of such a change,
a ``CPUIdle`` driver is expected to call :c:func:`cpuidle_pause_and_lock()` to
turn ``CPUIdle`` off temporarily and then :c:func:`cpuidle_disable_device()` for
all of the struct cpuidle_device objects representing CPUs affected by that
change.  Next, it can update its :c:member:`states` array in accordance with
the new configuration of the system, call :c:func:`cpuidle_enable_device()` for
all of the relevant struct cpuidle_device objects and invoke
:c:func:`cpuidle_resume_and_unlock()` to allow ``CPUIdle`` to be used again.
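
The ordering of the calls in the sequence above can be sketched in user space
with stubs that only record the order in which they run.  All of the ``toy_``
names are hypothetical stand-ins for the kernel functions with similar names,
the stubs take a plain CPU number instead of a struct cpuidle_device pointer
for brevity, and ``toy_update_states()`` stands for whatever driver-specific
code rewrites the :c:member:`states` array.

```c
#include <assert.h>
#include <string.h>

/* One character is appended to this log for every stub that runs. */
static char toy_log[64];

static void toy_note(char c)
{
	size_t n = strlen(toy_log);

	assert(n + 1 < sizeof(toy_log));
	toy_log[n] = c;
	toy_log[n + 1] = '\0';
}

/* Stubs standing in for the kernel functions named in the text. */
static void toy_pause_and_lock(void)	{ toy_note('P'); }
static void toy_disable_device(int cpu)	{ (void)cpu; toy_note('d'); }
static void toy_update_states(void)	{ toy_note('U'); }
static void toy_enable_device(int cpu)	{ (void)cpu; toy_note('e'); }
static void toy_resume_and_unlock(void)	{ toy_note('R'); }

/* Apply a configuration change affecting the given CPUs. */
static void toy_handle_config_change(const int *cpus, int ncpus)
{
	int i;

	toy_pause_and_lock();
	for (i = 0; i < ncpus; i++)
		toy_disable_device(cpus[i]);

	toy_update_states();

	for (i = 0; i < ncpus; i++)
		toy_enable_device(cpus[i]);
	toy_resume_and_unlock();
}
```

The point of the ordering is that the states array is only modified while
every affected device is disabled and ``CPUIdle`` is paused.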