.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

.. |struct cpufreq_policy| replace:: :c:type:`struct cpufreq_policy <cpufreq_policy>`
.. |intel_pstate| replace:: :doc:`intel_pstate <intel_pstate>`

=======================
CPU Performance Scaling
=======================

:Copyright: |copy| 2017 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


The Concept of CPU Performance Scaling
======================================

The majority of modern processors are capable of operating in a number of
different clock frequency and voltage configurations, often referred to as
Operating Performance Points or P-states (in ACPI terminology). As a rule,
the higher the clock frequency and the higher the voltage, the more instructions
can be retired by the CPU over a unit of time, but also the higher the clock
frequency and the higher the voltage, the more energy is consumed over a unit of
time (or the more power is drawn) by the CPU in the given P-state. Therefore
there is a natural tradeoff between the CPU capacity (the number of instructions
that can be executed over a unit of time) and the power drawn by the CPU.

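
One way to picture this tradeoff is to pick the lowest P-state whose capacity
still covers an estimated demand. The sketch below is only an illustration: the
P-state table and its values are hypothetical, capacity is assumed to scale
linearly with the clock frequency, and ``pick_pstate()`` is not a kernel
interface.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class PState:
        """A hypothetical frequency/voltage configuration."""

        freq_khz: int
        voltage_mv: int  # not used by the selection below; shown for completeness


    # Hypothetical P-state table, ordered from the lowest to the highest
    # performance (and, per the tradeoff above, the highest power draw).
    PSTATES = [
        PState(800_000, 700),
        PState(1_600_000, 850),
        PState(2_400_000, 1_000),
        PState(3_200_000, 1_150),
    ]


    def capacity(pstate, top):
        """Capacity relative to the top P-state, assumed linear in frequency."""
        return pstate.freq_khz / top.freq_khz


    def pick_pstate(required, table=PSTATES):
        """Return the lowest P-state whose relative capacity covers `required`."""
        top = table[-1]
        for pstate in table:
            if capacity(pstate, top) >= required:
                return pstate
        # Demand exceeds what the hardware can deliver: use the top P-state.
        return top

For example, ``pick_pstate(0.3)`` returns the 1600000 kHz entry: the 800000 kHz
entry provides only 25% of the top capacity, while any faster entry would draw
more power without a corresponding benefit.
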

In some situations it is desirable or even necessary to run the program as fast
as possible, in which case there is no reason to use any P-state other than the
highest one (i.e. the highest-performance frequency/voltage configuration
available). In other cases, however, it may not be necessary to execute
instructions so quickly, and maintaining the highest available CPU capacity for
a relatively long time without utilizing it entirely may be regarded as
wasteful. It may also be physically impossible to sustain maximum CPU capacity
for long because of thermal, power supply capacity or similar constraints. To
cover those cases, there are hardware interfaces allowing CPUs to be switched
between different frequency/voltage configurations or (in ACPI terminology) to
be put into different P-states.

Typically, these interfaces are used along with algorithms that estimate the
required CPU capacity in order to decide which P-states to put the CPUs into.
Since the utilization of the system generally changes over time, that has to be
done repeatedly on a regular basis. The activity by which this happens is
referred to as CPU performance scaling or CPU frequency scaling (because it
involves adjusting the CPU clock frequency).


CPU Performance Scaling in Linux
================================

The Linux kernel supports CPU performance scaling by means of the ``CPUFreq``
(CPU Frequency scaling) subsystem that consists of three layers of code: the
core, scaling governors and scaling drivers.

The ``CPUFreq`` core provides the common code infrastructure and user space
interfaces for all platforms that support CPU performance scaling. It defines
the basic framework in which the other components operate.

Scaling governors implement algorithms to estimate the required CPU capacity.
As a rule, each governor implements one, possibly parametrized, scaling
algorithm.

Scaling drivers talk to the hardware. They provide scaling governors with
information on the available P-states (or P-state ranges in some cases) and
access platform-specific hardware interfaces to change CPU P-states as requested
by scaling governors.

In principle, all available scaling governors can be used with every scaling
driver. That design is based on the observation that the information used by
performance scaling algorithms for P-state selection can be represented in a
platform-independent form in the majority of cases, so it should be possible
to use the same performance scaling algorithm implemented in exactly the same
way regardless of which scaling driver is used. Consequently, the same set of
scaling governors should be suitable for every supported platform.

However, that observation may not hold for performance scaling algorithms
based on information provided by the hardware itself, for example through
feedback registers, as that information is typically specific to the hardware
interface it comes from and may not be easily represented in an abstract,
platform-independent way. For this reason, ``CPUFreq`` allows scaling drivers
to bypass the governor layer and implement their own performance scaling
algorithms. That is done by the |intel_pstate| scaling driver.


``CPUFreq`` Policy Objects
==========================

In some cases the hardware interface for P-state control is shared by multiple
CPUs. For example, the same register (or set of registers) may be used to
control the P-state of multiple CPUs at the same time, so that writing to it
affects all of those CPUs simultaneously.

Sets of CPUs sharing hardware P-state control interfaces are represented by
``CPUFreq`` as |struct cpufreq_policy| objects. For consistency,
|struct cpufreq_policy| is also used when there is only one CPU in the given
set.

The ``CPUFreq`` core maintains a pointer to a |struct cpufreq_policy| object for
every CPU in the system, including CPUs that are currently offline.
If multiple CPUs share the same hardware P-state control interface, all of the
pointers corresponding to them point to the same |struct cpufreq_policy| object.

``CPUFreq`` uses |struct cpufreq_policy| as its basic data type and the design
of its user space interface is based on the policy concept.


CPU Initialization
==================

First of all, a scaling driver has to be registered for ``CPUFreq`` to work.
It is only possible to register one scaling driver at a time, so the scaling
driver is expected to be able to handle all CPUs in the system.

The scaling driver may be registered before or after CPU registration. If
CPUs are registered earlier, the driver core invokes the ``CPUFreq`` core to
take note of all of the already registered CPUs during the registration of the
scaling driver. In turn, if any CPUs are registered after the registration of
the scaling driver, the ``CPUFreq`` core will be invoked to take note of them
at their registration time.

In any case, the ``CPUFreq`` core is invoked to take note of any logical CPU it
has not seen so far as soon as it is ready to handle that CPU. [Note that the
logical CPU may be a physical single-core processor, or a single core in a
multicore processor, or a hardware thread in a physical processor or processor
core. In what follows "CPU" always means "logical CPU" unless explicitly stated
otherwise and the word "processor" is used to refer to the physical part
possibly including multiple logical CPUs.]

Once invoked, the ``CPUFreq`` core checks if the policy pointer is already set
for the given CPU and if so, it skips the policy object creation. Otherwise,
a new policy object is created and initialized, which involves the creation of
a new policy directory in ``sysfs``, and the policy pointer corresponding to
the given CPU is set to the new policy object's address in memory.

Next, the scaling driver's ``->init()`` callback is invoked with the policy
pointer of the new CPU passed to it as the argument. That callback is expected
to initialize the performance scaling hardware interface for the given CPU (or,
more precisely, for the set of CPUs sharing the hardware interface it belongs
to, represented by its policy object) and, if the policy object it has been
called for is new, to set parameters of the policy, like the minimum and maximum
frequencies supported by the hardware, the table of available frequencies (if
the set of supported P-states is not a continuous range), and the mask of CPUs
that belong to the same policy (including both online and offline CPUs). That
mask is then used by the core to populate the policy pointers for all of the
CPUs in it.

The next major initialization step for a new policy object is to attach a
scaling governor to it (to begin with, that is the default scaling governor
determined by the kernel configuration, but it may be changed later
via ``sysfs``). First, a pointer to the new policy object is passed to the
governor's ``->init()`` callback, which is expected to initialize all of the
data structures necessary to handle the given policy and, possibly, to add
a governor ``sysfs`` interface to it. Next, the governor is started by
invoking its ``->start()`` callback.

That callback is expected to register per-CPU utilization update callbacks for
all of the online CPUs belonging to the given policy with the CPU scheduler.
The utilization update callbacks will be invoked by the CPU scheduler on
important events, like task enqueue and dequeue, on every iteration of the
scheduler tick or generally whenever the CPU utilization may change (from the
scheduler's perspective). They are expected to carry out computations needed
to determine the P-state to use for the given policy going forward and to
invoke the scaling driver to make changes to the hardware in accordance with
the P-state selection. The scaling driver may be invoked directly from
scheduler context or asynchronously, via a kernel thread or workqueue, depending
on the configuration and capabilities of the scaling driver and the governor.

Similar steps are taken for policy objects that are not new, but were "inactive"
previously, meaning that all of the CPUs belonging to them were offline. The
only practical difference in that case is that the ``CPUFreq`` core will attempt
to use the scaling governor previously used with the policy that became
"inactive" (and is re-initialized now) instead of the default governor.

In turn, if a previously offline CPU is being brought back online, but some
other CPUs sharing the policy object with it are online already, there is no
need to re-initialize the policy object at all. In that case, it is only
necessary to restart the scaling governor so that it can take the new online CPU
into account. That is achieved by invoking the governor's ``->stop()`` and
``->start()`` callbacks, in this order, for the entire policy.

As mentioned before, the |intel_pstate| scaling driver bypasses the scaling
governor layer of ``CPUFreq`` and provides its own P-state selection algorithms.
Consequently, if |intel_pstate| is used, scaling governors are not attached to
new policy objects. Instead, the driver's ``->setpolicy()`` callback is invoked
to register per-CPU utilization update callbacks for each policy. These
callbacks are invoked by the CPU scheduler in the same way as for scaling
governors, but in the |intel_pstate| case they both determine the P-state to
use and change the hardware configuration accordingly in one go from scheduler
context.

The policy objects created during CPU initialization and other data structures
associated with them are torn down when the scaling driver is unregistered
(which happens when the kernel module containing it is unloaded, for example) or
when the last CPU belonging to the given policy is unregistered.


Policy Interface in ``sysfs``
=============================

During the initialization of the kernel, the ``CPUFreq`` core creates a
``sysfs`` directory (kobject) called ``cpufreq`` under
:file:`/sys/devices/system/cpu/`.

That directory contains a ``policyX`` subdirectory (where ``X`` represents an
integer number) for every policy object maintained by the ``CPUFreq`` core.
Each ``policyX`` directory is pointed to by ``cpufreq`` symbolic links
under :file:`/sys/devices/system/cpu/cpuY/` (where ``Y`` represents an integer
that may be different from the one represented by ``X``) for all of the CPUs
associated with (or belonging to) the given policy. The ``policyX`` directories
in :file:`/sys/devices/system/cpu/cpufreq` each contain policy-specific
attributes (files) to control ``CPUFreq`` behavior for the corresponding policy
objects (that is, for all of the CPUs associated with them).

Some of those attributes are generic. They are created by the ``CPUFreq`` core
and their behavior generally does not depend on which scaling driver is in use
or which scaling governor is attached to the given policy. Some scaling drivers
also add driver-specific attributes to the policy directories in ``sysfs`` to
control policy-specific aspects of driver behavior.

The generic attributes under :file:`/sys/devices/system/cpu/cpufreq/policyX/`
are the following:

``affected_cpus``
    List of online CPUs belonging to this policy (i.e. sharing the hardware
    performance scaling interface represented by the ``policyX`` policy
    object).

``bios_limit``
    If the platform firmware (BIOS) tells the OS to apply an upper limit to
    CPU frequencies, that limit will be reported through this attribute (if
    present).

    The existence of the limit may be a result of some (often unintentional)
    BIOS settings, restrictions coming from a service processor or other
    BIOS/HW-based mechanisms.

    This does not cover ACPI thermal limitations, which can be discovered
    through a generic thermal driver.

    This attribute is not present if the scaling driver in use does not
    support it.

``cpuinfo_cur_freq``
    Current frequency of the CPUs belonging to this policy as obtained from
    the hardware (in kHz).

    This is expected to be the frequency the hardware actually runs at.
    If that frequency cannot be determined, this attribute should not
    be present.

``cpuinfo_max_freq``
    Maximum possible operating frequency the CPUs belonging to this policy
    can run at (in kHz).

``cpuinfo_min_freq``
    Minimum possible operating frequency the CPUs belonging to this policy
    can run at (in kHz).

``cpuinfo_transition_latency``
    The time it takes to switch the CPUs belonging to this policy from one
    P-state to another, in nanoseconds.

    If unknown or if known to be so high that the scaling driver does not
    work with the `ondemand`_ governor, -1 (:c:macro:`CPUFREQ_ETERNAL`)
    will be returned by reads from this attribute.

``related_cpus``
    List of all (online and offline) CPUs belonging to this policy.

``scaling_available_governors``
    List of ``CPUFreq`` scaling governors present in the kernel that can
    be attached to this policy or (if the |intel_pstate| scaling driver is
    in use) list of scaling algorithms provided by the driver that can be
    applied to this policy.

    [Note that some governors are modular and it may be necessary to load
    the kernel module containing a governor for that governor to become
    available and be listed by this attribute.]

``scaling_cur_freq``
    Current frequency of all of the CPUs belonging to this policy (in kHz).

    In the majority of cases, this is the frequency of the last P-state
    requested by the scaling driver from the hardware using the scaling
    interface provided by it, which may or may not reflect the frequency
    the CPU is actually running at (due to hardware design and other
    limitations).

    Some architectures (e.g. ``x86``) may attempt to provide information
    more precisely reflecting the current CPU frequency through this
    attribute, but that still may not be the exact current CPU frequency as
    seen by the hardware at the moment.

``scaling_driver``
    The scaling driver currently in use.

``scaling_governor``
    The scaling governor currently attached to this policy or (if the
    |intel_pstate| scaling driver is in use) the scaling algorithm
    provided by the driver that is currently applied to this policy.

    This attribute is read-write and writing to it will cause a new scaling
    governor to be attached to this policy or a new scaling algorithm
    provided by the scaling driver to be applied to it (in the
    |intel_pstate| case), as indicated by the string written to this
    attribute (which must be one of the names listed by the
    ``scaling_available_governors`` attribute described above).

``scaling_max_freq``
    Maximum frequency the CPUs belonging to this policy are allowed to be
    running at (in kHz).

    This attribute is read-write and writing a string representing an
    integer to it will cause a new limit to be set (it must not be lower
    than the value of the ``scaling_min_freq`` attribute).

``scaling_min_freq``
    Minimum frequency the CPUs belonging to this policy are allowed to be
    running at (in kHz).

    This attribute is read-write and writing a string representing a
    non-negative integer to it will cause a new limit to be set (it must not
    be higher than the value of the ``scaling_max_freq`` attribute).

``scaling_setspeed``
    This attribute is functional only if the `userspace`_ scaling governor
    is attached to the given policy.

    It returns the last frequency requested by the governor (in kHz) or can
    be written to in order to set a new frequency for the policy.


Generic Scaling Governors
=========================

``CPUFreq`` provides generic scaling governors that can be used with all
scaling drivers. As stated before, each of them implements a single, possibly
parametrized, performance scaling algorithm.

Scaling governors are attached to policy objects and different policy objects
can be handled by different scaling governors at the same time (although that
may lead to suboptimal results in some cases).

The scaling governor for a given policy object can be changed at any time with
the help of the ``scaling_governor`` policy attribute in ``sysfs``.

Some governors expose ``sysfs`` attributes to control or fine-tune the scaling
algorithms implemented by them. Those attributes, referred to as governor
tunables, can be either global (system-wide) or per-policy, depending on the
scaling driver in use.
If the driver requires governor tunables to be per-policy, they are
located in a subdirectory of each policy directory. Otherwise, they are located
in a subdirectory under :file:`/sys/devices/system/cpu/cpufreq/`. In either
case the name of the subdirectory containing the governor tunables is the name
of the governor providing them.

``performance``
---------------

When attached to a policy object, this governor causes the highest frequency,
within the ``scaling_max_freq`` policy limit, to be requested for that policy.

The request is made once at the time the governor for the policy is set to
``performance`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq``
policy limits change after that.

``powersave``
-------------

When attached to a policy object, this governor causes the lowest frequency,
within the ``scaling_min_freq`` policy limit, to be requested for that policy.

The request is made once at the time the governor for the policy is set to
``powersave`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq``
policy limits change after that.

``userspace``
-------------

This governor does not do anything by itself. Instead, it allows user space
to set the CPU frequency for the policy it is attached to by writing to the
``scaling_setspeed`` attribute of that policy.

``schedutil``
-------------

This governor uses CPU utilization data available from the CPU scheduler. It
is generally regarded as a part of the CPU scheduler, so it can access the
scheduler's internal data structures directly.

It runs entirely in scheduler context, although in some cases it may need to
invoke the scaling driver asynchronously when it decides that the CPU frequency
should be changed for a given policy (that depends on whether or not the driver
is capable of changing the CPU frequency from scheduler context).

The actions of this governor for a particular CPU depend on the scheduling class
invoking its utilization update callback for that CPU. If it is invoked by the
RT or deadline scheduling classes, the governor will increase the frequency to
the allowed maximum (that is, the ``scaling_max_freq`` policy limit). In turn,
if it is invoked by the CFS scheduling class, the governor will use the
Per-Entity Load Tracking (PELT) metric for the root control group of the
given CPU as the CPU utilization estimate (see the *Per-entity load tracking*
LWN.net article [1]_ for a description of the PELT mechanism). Then, the new
CPU frequency to apply is computed in accordance with the formula

    f = 1.25 * ``f_0`` * ``util`` / ``max``

where ``util`` is the PELT number, ``max`` is the theoretical maximum of
``util``, and ``f_0`` is either the maximum possible CPU frequency for the given
policy (if the PELT number is frequency-invariant), or the current CPU frequency
(otherwise).

This governor also employs a mechanism allowing it to temporarily bump up the
CPU frequency for tasks that have been waiting on I/O most recently, called
"IO-wait boosting". That happens when the :c:macro:`SCHED_CPUFREQ_IOWAIT` flag
is passed by the scheduler to the governor callback, which causes the frequency
to go up to the allowed maximum immediately and then fall back to the value
returned by the above formula over time.

This governor exposes only one tunable:

``rate_limit_us``
    Minimum time (in microseconds) that has to pass between two consecutive
    runs of governor computations (default: 1000 times the scaling driver's
    transition latency).

    The purpose of this tunable is to reduce the scheduler context overhead
    of the governor, which might be excessive without it.
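
The frequency selection and rate limiting described above can be sketched in
Python as follows. This is a simplified illustration, not the kernel code:
``schedutil_next_freq()`` and ``rate_limit_elapsed()`` are hypothetical names,
utilization is a plain integer on an example scale of 1024, clamping the result
to the ``scaling_min_freq`` and ``scaling_max_freq`` limits is an assumption of
the sketch, and IO-wait boosting is left out.

.. code-block:: python

    def schedutil_next_freq(util, util_max, f0_khz, min_khz, max_khz,
                            rt_or_dl=False):
        """Sketch of f = 1.25 * f_0 * util / max (hypothetical helper).

        f0_khz is the maximum possible frequency of the policy when the PELT
        signal is frequency-invariant, or the current frequency otherwise.
        min_khz and max_khz stand for the scaling_min_freq and scaling_max_freq
        policy limits.
        """
        if rt_or_dl:
            # RT and deadline updates raise the frequency to the allowed maximum.
            return max_khz
        # f_0 + f_0 // 4 is 1.25 * f_0 rounded down; integers avoid floating point.
        freq = (f0_khz + f0_khz // 4) * util // util_max
        # Assumption of this sketch: keep the result within the policy limits.
        return max(min_khz, min(freq, max_khz))


    def rate_limit_elapsed(now_us, last_update_us, rate_limit_us):
        """True if at least rate_limit_us has passed since the last computation."""
        return now_us - last_update_us >= rate_limit_us

With a frequency-invariant utilization of 512 out of 1024 and a 2000000 kHz
maximum, ``schedutil_next_freq(512, 1024, 2_000_000, 800_000, 2_000_000)``
yields 1250000 kHz. Because of the 1.25 factor, the maximum frequency is
requested once ``util`` reaches about 80% of ``max`` in the frequency-invariant
case, which leaves some headroom for utilization growth.
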

This governor is generally regarded as a replacement for the older `ondemand`_
and `conservative`_ governors (described below), as it is simpler and more
tightly integrated with the CPU scheduler, its overhead in terms of CPU context
switches and similar is less significant, and it uses the scheduler's own CPU
utilization metric, so in principle its decisions should not contradict the
decisions made by the other parts of the scheduler.

``ondemand``
------------

This governor uses CPU load as a CPU frequency selection metric.

In order to estimate the current CPU load, it measures the time elapsed between
consecutive invocations of its worker routine and computes the fraction of that
time in which the given CPU was not idle. The ratio of the non-idle (active)
time to the total CPU time is taken as an estimate of the load.

If this governor is attached to a policy shared by multiple CPUs, the load is
estimated for all of them and the greatest result is taken as the load estimate
for the entire policy.

The worker routine of this governor has to run in process context, so it is
invoked asynchronously (via a workqueue) and CPU P-states are updated from
there if necessary. As a result, the scheduler context overhead from this
governor is minimal, but it causes additional CPU context switches to happen
relatively often and the CPU P-state updates triggered by it can be relatively
irregular. Also, it affects its own CPU load metric by running code that
reduces the CPU idle time (even though the CPU idle time is only reduced very
slightly by it).

It generally selects CPU frequencies proportional to the estimated load, so that
the value of the ``cpuinfo_max_freq`` policy attribute corresponds to the load of
1 (or 100%), and the value of the ``cpuinfo_min_freq`` policy attribute
corresponds to the load of 0, unless the load exceeds a (configurable)
speedup threshold, in which case it will go straight for the highest frequency
it is allowed to use (the ``scaling_max_freq`` policy limit).

This governor exposes the following tunables:

``sampling_rate``
	This is how often the governor's worker routine should run, in
	microseconds.

	Typically, it is set to values of the order of 10000 (10 ms). Its
	default value is equal to the value of ``cpuinfo_transition_latency``
	for each policy this governor is attached to (but since ``sampling_rate``
	is expressed in microseconds and ``cpuinfo_transition_latency`` in
	nanoseconds, this means that the time represented by ``sampling_rate``
	is 1000 times greater than the transition latency by default).

	If this tunable is per-policy, the following shell command, run from
	within the policy directory, sets the time represented by it to be 750
	times as high as the transition latency::

		# echo $(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate

``up_threshold``
	If the estimated CPU load is above this value (in percent), the governor
	will set the frequency to the maximum value allowed for the policy.
	Otherwise, the selected frequency will be proportional to the estimated
	CPU load.

``ignore_nice_load``
	If set to 1 (default 0), it will cause the CPU load estimation code to
	treat the CPU time spent on executing tasks with "nice" levels greater
	than 0 as CPU idle time.

	This may be useful if there are tasks in the system that should not be
	taken into account when deciding what frequency to run the CPUs at.
	Then, to make that happen it is sufficient to increase the "nice" level
	of those tasks above 0 and set this attribute to 1.

``sampling_down_factor``
	Temporary multiplier, between 1 (default) and 100 inclusive, to apply to
	the ``sampling_rate`` value if the CPU load goes above ``up_threshold``.

	This causes the next execution of the governor's worker routine (after
	setting the frequency to the allowed maximum) to be delayed, so the
	frequency stays at the maximum level for a longer time.

	Frequency fluctuations in some bursty workloads may be avoided this way
	at the cost of additional energy spent on maintaining the maximum CPU
	capacity.

``powersave_bias``
	Reduction factor to apply to the original frequency target of the
	governor (including the maximum value used when the ``up_threshold``
	value is exceeded by the estimated CPU load) or sensitivity threshold
	for the AMD frequency sensitivity powersave bias driver
	(:file:`drivers/cpufreq/amd_freq_sensitivity.c`), between 0 and 1000
	inclusive.

	If the AMD frequency sensitivity powersave bias driver is not loaded,
	the effective frequency to apply is given by

		f * (1 - ``powersave_bias`` / 1000)

	where f is the governor's original frequency target. The default value
	of this attribute is 0 in that case.

	If the AMD frequency sensitivity powersave bias driver is loaded, the
	value of this attribute is 400 by default and it is interpreted
	differently, as a sensitivity threshold.

	On Family 16h (and later) AMD processors there is a mechanism to get a
	measured workload sensitivity, between 0 and 100% inclusive, from the
	hardware. That value can be used to estimate how the performance of the
	workload running on a CPU will change in response to frequency changes.

	The performance of a workload with a sensitivity of 0 (memory-bound or
	IO-bound) is not expected to increase at all as a result of increasing
	the CPU frequency, whereas workloads with a sensitivity of 100%
	(CPU-bound) are expected to perform much better if the CPU frequency is
	increased.

	If the workload sensitivity is less than the threshold represented by
	the ``powersave_bias`` value, the sensitivity powersave bias driver
	will cause the governor to select a frequency lower than its original
	target, so as to avoid over-provisioning workloads that will not benefit
	from running at higher CPU frequencies.

``conservative``
----------------

This governor uses CPU load as a CPU frequency selection metric.

It estimates the CPU load in the same way as the `ondemand`_ governor described
above, but the CPU frequency selection algorithm implemented by it is different.

Namely, it avoids changing the frequency significantly over short time
intervals, as such changes may not be suitable for systems with limited power
supply capacity (e.g.
battery-powered). To achieve that, it changes the frequency in relatively
small steps, one step at a time, up or down, depending on whether or not a
(configurable) threshold has been exceeded by the estimated CPU load.

This governor exposes the following tunables:

``freq_step``
	Frequency step in percent of the maximum frequency the governor is
	allowed to set (the ``scaling_max_freq`` policy limit), between 0 and
	100 (5 by default).

	This is how much the frequency is allowed to change in one go. Setting
	it to 0 will cause the default frequency step (5 percent) to be used
	and setting it to 100 effectively causes the governor to periodically
	switch the frequency between the ``scaling_min_freq`` and
	``scaling_max_freq`` policy limits.

``down_threshold``
	Threshold value (in percent, 20 by default) used to determine the
	frequency change direction.

	If the estimated CPU load is greater than this value, the frequency will
	go up (by ``freq_step``). If the load is less than this value (and the
	``sampling_down_factor`` mechanism is not in effect), the frequency will
	go down. Otherwise, the frequency will not be changed.

``sampling_down_factor``
	Frequency decrease deferral factor, between 1 (default) and 10
	inclusive.

	It effectively causes the frequency to go down ``sampling_down_factor``
	times slower than it ramps up.


Frequency Boost Support
=======================

Background
----------

Some processors support a mechanism to raise the operating frequency of some
cores in a multicore package temporarily (and above the sustainable frequency
threshold for the whole package) under certain conditions, for example if the
whole chip is not fully utilized and below its intended thermal or power budget.

Different names are used by different vendors to refer to this functionality.
For Intel processors it is referred to as "Turbo Boost", AMD calls it
"Turbo-Core" or (in technical documentation) "Core Performance Boost" and so on.
As a rule, it is also implemented differently by different vendors. The simple
term "frequency boost" is used here for brevity to refer to all of those
implementations.

The frequency boost mechanism may be either hardware-based or software-based.
If it is hardware-based (e.g. on x86), the decision to trigger the boosting is
made by the hardware (although in general it requires the hardware to be put
into a special state in which it can control the CPU frequency within certain
limits). If it is software-based (e.g. on ARM), the scaling driver decides
whether or not to trigger boosting and when to do that.

The ``boost`` File in ``sysfs``
-------------------------------

This file is located under :file:`/sys/devices/system/cpu/cpufreq/` and controls
the "boost" setting for the whole system. It is not present if the underlying
scaling driver does not support the frequency boost mechanism (or supports it,
but provides a driver-specific interface for controlling it, like
|intel_pstate|).

If the value in this file is 1, the frequency boost mechanism is enabled. This
means that either the hardware can be put into states in which it is able to
trigger boosting (in the hardware-based case), or the software is allowed to
trigger boosting (in the software-based case). It does not mean that boosting
is actually in use at the moment on any CPUs in the system. It only grants
permission to use the frequency boost mechanism (which still may never be used
for other reasons).

If the value in this file is 0, the frequency boost mechanism is disabled and
cannot be used at all.
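
Checking and toggling this knob from a script could look like the following
minimal POSIX shell sketch. The function names and the ``BOOST_FILE`` variable
are hypothetical (they are not part of any kernel interface); ``BOOST_FILE``
defaults to the real file and exists only so the functions can be pointed at a
scratch file for testing. Writing to the real file normally requires root
privileges. A missing file is reported as unsupported, matching the cases
described above, and values other than 0 and 1 are rejected before anything is
written.

```shell
#!/bin/sh
# Minimal sketch: query and toggle the global cpufreq "boost" knob.
# BOOST_FILE is a hypothetical override hook (not a kernel interface);
# it defaults to the real sysfs path described in the text.
BOOST_FILE=${BOOST_FILE:-/sys/devices/system/cpu/cpufreq/boost}

# Print the knob state. "enabled" only means boosting is permitted,
# not that any CPU is boosting right now.
boost_status() {
    if [ ! -r "$BOOST_FILE" ]; then
        # No knob: the driver lacks boost support or uses its own interface.
        echo "unsupported"
        return 1
    fi
    case "$(cat "$BOOST_FILE")" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Write 0 or 1 to the knob; anything else is rejected without writing.
boost_set() {
    case "$1" in
        0|1) echo "$1" > "$BOOST_FILE" ;;
        *)
            echo "boost_set: expected 0 or 1, got '$1'" >&2
            return 1
            ;;
    esac
}
```

For example, running ``boost_set 0`` before a benchmark and ``boost_set 1``
afterwards changes the setting without restarting the system.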

The only values that can be written to this file are 0 and 1.

Rationale for Boost Control Knob
--------------------------------

The frequency boost mechanism is generally intended to help achieve optimum
CPU performance on time scales below software resolution (e.g. below the
scheduler tick interval) and it is demonstrably suitable for many workloads, but
it may lead to problems in certain situations.

For this reason, many systems make it possible to disable the frequency boost
mechanism in the platform firmware (BIOS) setup, but that requires the system to
be restarted for the setting to be adjusted as desired, which may not be
practical in some cases. For example:

 1. Boosting means overclocking the processor, although under controlled
    conditions. Generally, the processor's energy consumption increases
    as a result of increasing its frequency and voltage, even temporarily.
    That may not be desirable on systems that switch to power sources of
    limited capacity, such as batteries, so the ability to disable the boost
    mechanism while the system is running may help there (but that depends on
    the workload too).

 2. In some situations deterministic behavior is more important than
    performance or energy consumption (or both) and the ability to disable
    boosting while the system is running may be useful then.

 3. To examine the impact of the frequency boost mechanism itself, it is useful
    to be able to run tests with and without boosting, preferably without
    restarting the system in the meantime.

 4. Reproducible results are important when running benchmarks. Since
    the boosting functionality depends on the load of the whole package,
    single-thread performance may vary because of it, which can sometimes
    lead to unreproducible results. That can be avoided by disabling the
    frequency boost mechanism before running benchmarks sensitive to that
    issue.

Legacy AMD ``cpb`` Knob
-----------------------

The AMD powernow-k8 scaling driver supports a ``sysfs`` knob very similar to
the global ``boost`` one. It is used for disabling/enabling the "Core
Performance Boost" feature of some AMD processors.

If present, that knob is located in every ``CPUFreq`` policy directory in
``sysfs`` (:file:`/sys/devices/system/cpu/cpufreq/policyX/`) and is called
``cpb``, which suggests a more fine-grained control interface. The actual
implementation, however, works on a system-wide basis and setting that knob
for one policy causes the same value of it to be set for all of the other
policies at the same time.

That knob is still supported on AMD processors that support its underlying
hardware feature, but it may be configured out of the kernel (via the
:c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option) and the global
``boost`` knob is present regardless. Thus it is always possible to use the
``boost`` knob instead of the ``cpb`` one, which is highly recommended, as that
is more consistent with what all of the other systems do (and the ``cpb`` knob
may not be supported any more in the future).

The ``cpb`` knob is never present for any processors without the underlying
hardware feature (e.g. all Intel ones), even if the
:c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option is set.


References
==========

.. [1] Jonathan Corbet, *Per-entity load tracking*,
       https://lwn.net/Articles/531853/