.. SPDX-License-Identifier: GPL-2.0

=======================
Energy Model of devices
=======================

1. Overview
-----------

The Energy Model (EM) framework serves as an interface between drivers knowing
the power consumed by devices at various performance levels, and the kernel
subsystems willing to use that information to make energy-aware decisions.

The source of the information about the power consumed by devices can vary greatly
from one platform to another. These power costs can be estimated using
devicetree data in some cases. In others, the firmware will know better.
Alternatively, userspace might be best positioned. And so on. To spare each
client subsystem from re-implementing support for every possible source of
information on its own, the EM framework intervenes as an abstraction layer
which standardizes the format of power cost tables in the kernel, thereby
avoiding redundant work.

The power values might be expressed in milli-Watts or in an 'abstract scale'.
Multiple subsystems might use the EM and it is up to the system integrator to
check that the requirements for the power value scale types are met. An example
can be found in the Energy-Aware Scheduler documentation
Documentation/scheduler/sched-energy.rst. For some subsystems like thermal or
powercap, power values expressed in an 'abstract scale' might cause issues.
These subsystems are more interested in estimation of power used in the past,
thus the real milli-Watts might be needed. An example of these requirements can
be found in the Intelligent Power Allocation in
Documentation/driver-api/thermal/power_allocator.rst.
Kernel subsystems might implement automatic detection to check whether EM
registered devices have inconsistent scale (based on EM internal flag).
An important thing to keep in mind is that when the power values are expressed
in an 'abstract scale', deriving real energy in milli-Joules is not possible.
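
The automatic scale detection mentioned above could look roughly like the
following userspace sketch. It is only an illustration: ``struct em_dev_info``
and ``em_scales_consistent()`` are hypothetical names invented here, not part
of the kernel API, and the real EM flag lives inside the framework's own
structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one EM-registered device. */
struct em_dev_info {
	const char *name;
	bool milliwatts;	/* true: power in mW, false: abstract scale */
};

/*
 * Return true when every device uses the same power scale.
 * An empty or single-entry list is trivially consistent.
 */
static bool em_scales_consistent(const struct em_dev_info *devs, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (devs[i].milliwatts != devs[0].milliwatts)
			return false;
	}
	return true;
}
```

A subsystem that needs real milli-Watts could run such a check once over the
devices it cares about and refuse to operate, or only warn, when it fails.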

The figure below depicts an example of drivers (Arm-specific here, but the
approach is applicable to any architecture) providing power costs to the EM
framework, and interested clients reading the data from it::

       +---------------+  +-----------------+  +---------------+
       | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
       +---------------+  +-----------------+  +---------------+
               |                   | em_cpu_energy()   |
               |                   | em_cpu_get()      |
               +---------+         |         +---------+
                         |         |         |
                         v         v         v
                        +---------------------+
                        |    Energy Model     |
                        |     Framework       |
                        +---------------------+
                           ^       ^       ^
                           |       |       | em_dev_register_perf_domain()
                +----------+       |       +---------+
                |                  |                 |
        +---------------+  +---------------+  +--------------+
        |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
        +---------------+  +---------------+  +--------------+
                ^                  ^                 ^
                |                  |                 |
        +--------------+   +---------------+  +--------------+
        | Device Tree  |   |   Firmware    |  |      ?       |
        +--------------+   +---------------+  +--------------+

In case of CPU devices the EM framework manages power cost tables per
'performance domain' in the system. A performance domain is a group of CPUs
whose performance is scaled together. Performance domains generally have a
1-to-1 mapping with CPUFreq policies. All CPUs in a performance domain are
required to have the same micro-architecture. CPUs in different performance
domains can have different micro-architectures.


2. Core APIs
------------

2.1 Config options
^^^^^^^^^^^^^^^^^^

CONFIG_ENERGY_MODEL must be enabled to use the EM framework.


2.2 Registration of performance domains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Drivers are expected to register performance domains into the EM framework by
calling the following API::

  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
		struct em_data_callback *cb, cpumask_t *cpus, bool milliwatts);

Drivers must provide a callback function returning <frequency, power> tuples
for each performance state.
The callback function provided by the driver is free
to fetch data from any relevant location (DT, firmware, ...), and by any means
deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
performance domains using cpumask. For devices other than CPUs, the 'cpus'
argument must be set to NULL.
The last argument 'milliwatts' must be set to the correct value. Kernel
subsystems which use EM might rely on this flag to check if all EM devices use
the same scale. If there are different scales, these subsystems might decide
to: return warning/error, stop working or panic.
See Section 3. for an example of a driver implementing this
callback, or Section 2.4 for further documentation on this API.


2.3 Accessing performance domains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are two API functions which provide access to the energy model:
em_cpu_get(), which takes a CPU id as an argument, and em_pd_get(), which takes
a device pointer as an argument. Which interface to use depends on the
subsystem, but in case of CPU devices both functions return the same
performance domain.

Subsystems interested in the energy model of a CPU can retrieve it using the
em_cpu_get() API. The energy model tables are allocated once upon creation of
the performance domains, and kept in memory untouched.
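
Because the tables are left untouched after creation, a client can treat them
as read-only data. The userspace sketch below models that access pattern. The
types ``struct perf_state`` and ``struct perf_domain`` and the helper
``pd_find_state()`` are hypothetical simplifications made up for illustration,
not the definitions from ``<linux/energy_model.h>``; in the kernel the domain
would be obtained through em_cpu_get() or em_pd_get().

```c
#include <stddef.h>

/* Hypothetical, simplified mirror of one row of a power cost table. */
struct perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* mW or abstract scale */
};

/* Hypothetical, simplified performance domain holding a read-only table. */
struct perf_domain {
	const struct perf_state *table;	/* sorted by ascending frequency */
	size_t nr_states;
};

/*
 * Return the lowest performance state whose frequency is at least 'freq',
 * or the highest state when 'freq' exceeds every entry. Returns NULL for a
 * missing or empty domain. The table is never modified.
 */
static const struct perf_state *
pd_find_state(const struct perf_domain *pd, unsigned long freq)
{
	if (!pd || pd->nr_states == 0)
		return NULL;

	for (size_t i = 0; i < pd->nr_states; i++) {
		if (pd->table[i].frequency >= freq)
			return &pd->table[i];
	}
	return &pd->table[pd->nr_states - 1];
}
```

Taking the domain and its table through ``const`` pointers makes the
read-only contract visible to the compiler in this model.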

The energy consumed by a performance domain can be estimated using the
em_cpu_energy() API. The estimation is performed assuming that the schedutil
CPUfreq governor is in use in case of CPU device. Currently this calculation is
not provided for other types of devices.

More details about the above APIs can be found in ``<linux/energy_model.h>``
or in Section 2.4.


2.4 Description details of this API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. kernel-doc:: include/linux/energy_model.h
   :internal:

.. kernel-doc:: kernel/power/energy_model.c
   :export:


3. Example driver
-----------------

The CPUFreq framework supports a dedicated callback for registering
the EM for a given CPU(s) 'policy' object: cpufreq_driver::register_em().
That callback has to be implemented properly for a given driver,
because the framework calls it at the right time during setup.
This section provides a simple example of a CPUFreq driver registering a
performance domain in the Energy Model framework using the (fake) 'foo'
protocol.
The driver implements an est_power() function to be provided to the
EM framework::

  -> drivers/cpufreq/foo_cpufreq.c

  static int est_power(unsigned long *mW, unsigned long *KHz,
  		struct device *dev)
  {
  	long freq, power;

  	/* Use the 'foo' protocol to ceil the frequency */
  	freq = foo_get_freq_ceil(dev, *KHz);
  	if (freq < 0)
  		return freq;

  	/* Estimate the power cost for the dev at the relevant freq. */
  	power = foo_estimate_power(dev, freq);
  	if (power < 0)
  		return power;

  	/* Return the values to the EM framework */
  	*mW = power;
  	*KHz = freq;

  	return 0;
  }

  static void foo_cpufreq_register_em(struct cpufreq_policy *policy)
  {
  	struct em_data_callback em_cb = EM_DATA_CB(est_power);
  	struct device *cpu_dev;
  	int nr_opp;

  	cpu_dev = get_cpu_device(cpumask_first(policy->cpus));

  	/* Find the number of OPPs for this policy */
  	nr_opp = foo_get_nr_opp(policy);

  	/* And register the new performance domain */
  	em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
  				    true);
  }

  static struct cpufreq_driver foo_cpufreq_driver = {
  	.register_em = foo_cpufreq_register_em,
  };
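
To show why est_power() ceils the frequency it is given, the userspace sketch
below models one plausible way a framework could drive such a callback to fill
a table of nr_states entries. It is an assumption made for illustration, not a
copy of kernel/power/energy_model.c. All names (``struct toy_state``,
``power_cb_t``, ``build_table()``, ``toy_est_power()``) are hypothetical, and
the device argument is dropped for brevity.

```c
#include <stddef.h>

/* Hypothetical, simplified <frequency, power> tuple. */
struct toy_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* mW */
};

/* Callback type modelled on est_power(), minus the device argument. */
typedef int (*power_cb_t)(unsigned long *power, unsigned long *freq);

/* Stand-in for the 'foo' protocol: three fixed <kHz, mW> pairs. */
static const struct toy_state toy_opps[] = {
	{ 500000, 100 }, { 1000000, 300 }, { 1500000, 700 },
};

/* Ceil *freq to the next available OPP and report its power. */
static int toy_est_power(unsigned long *power, unsigned long *freq)
{
	for (size_t i = 0; i < sizeof(toy_opps) / sizeof(toy_opps[0]); i++) {
		if (toy_opps[i].frequency >= *freq) {
			*freq = toy_opps[i].frequency;
			*power = toy_opps[i].power;
			return 0;
		}
	}
	return -1;	/* no OPP at or above the requested frequency */
}

/*
 * Fill 'table' with 'nr_states' entries. Each iteration asks the callback
 * to ceil a candidate frequency one step above the previous result, so the
 * states are discovered in ascending order. Returns 0 on success, or a
 * negative value if the callback fails or the frequencies are not non-zero
 * and strictly increasing.
 */
static int build_table(struct toy_state *table, size_t nr_states,
		       power_cb_t cb)
{
	unsigned long freq = 0, prev_freq = 0, power = 0;

	for (size_t i = 0; i < nr_states; i++, freq++) {
		int ret = cb(&power, &freq);

		if (ret)
			return ret;
		if (freq <= prev_freq)
			return -1;

		table[i].frequency = prev_freq = freq;
		table[i].power = power;
	}
	return 0;
}
```

In this model, asking for more states than the callback can supply makes
build_table() fail, which is one reason the nr_states value passed at
registration has to match what the driver can actually report.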