.. SPDX-License-Identifier: GPL-2.0

=======================
Energy Model of devices
=======================

1. Overview
-----------

The Energy Model (EM) framework serves as an interface between drivers knowing
the power consumed by devices at various performance levels, and the kernel
subsystems willing to use that information to make energy-aware decisions.

The source of the information about the power consumed by devices can vary
greatly from one platform to another. These power costs can be estimated using
devicetree data in some cases. In others, the firmware will know better.
Alternatively, userspace might be best positioned. To spare each client
subsystem from re-implementing support for every possible source of
information on its own, the EM framework acts as an abstraction layer which
standardizes the format of power cost tables in the kernel, thereby avoiding
redundant work.

The power values might be expressed in micro-Watts or in an 'abstract scale'.
Multiple subsystems might use the EM and it is up to the system integrator to
check that the requirements for the power value scale types are met. An example
can be found in the Energy-Aware Scheduler documentation
Documentation/scheduler/sched-energy.rst. For some subsystems, like thermal or
powercap, power values expressed in an 'abstract scale' might cause issues.
These subsystems are more interested in estimating the power used in the past,
thus real micro-Watts might be needed. An example of these requirements can
be found in the Intelligent Power Allocation documentation in
Documentation/driver-api/thermal/power_allocator.rst.
Kernel subsystems might implement automatic detection to check whether EM
registered devices have an inconsistent scale (based on an EM internal flag).
An important thing to keep in mind is that when the power values are expressed
in an 'abstract scale', deriving real energy in micro-Joules is not possible.
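
As a rough illustration of the scale check described above, the following
userspace C sketch models EM-registered devices with a hypothetical
``struct em_dev_info`` carrying a ``microwatts`` flag. Neither the structure
nor the two helpers are kernel APIs; they only assume that each registered
device records which scale its power values use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an EM-registered device. */
struct em_dev_info {
	const char *name;
	bool microwatts;	/* true: micro-Watts, false: abstract scale */
};

/*
 * Return true when every device uses the same power scale.
 * An empty or single-entry list is trivially consistent.
 */
static bool em_scales_consistent(const struct em_dev_info *devs, size_t nr)
{
	size_t i;

	for (i = 1; i < nr; i++) {
		if (devs[i].microwatts != devs[0].microwatts)
			return false;
	}

	return true;
}

/*
 * Derive energy in micro-Joules from a power value and a duration:
 * uJ = uW * us / 10^6. This is only meaningful for micro-Watt values,
 * so the helper refuses abstract-scale devices and returns false.
 */
static bool em_energy_uj(const struct em_dev_info *dev,
			 unsigned long long power,
			 unsigned long long duration_us,
			 unsigned long long *uj)
{
	if (!dev->microwatts)
		return false;

	*uj = power * duration_us / 1000000ULL;
	return true;
}
```

A consumer that needs real energy figures could run such a consistency check
once at initialization and fall back to a degraded mode when it fails.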

The figure below depicts an example of drivers (Arm-specific here, but the
approach is applicable to any architecture) providing power costs to the EM
framework, and interested clients reading the data from it::

       +---------------+  +-----------------+  +---------------+
       | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
       +---------------+  +-----------------+  +---------------+
               |                   | em_cpu_energy()   |
               |                   | em_cpu_get()      |
               +---------+         |         +---------+
                         |         |         |
                         v         v         v
                        +---------------------+
                        |    Energy Model     |
                        |     Framework       |
                        +---------------------+
                           ^       ^       ^
                           |       |       | em_dev_register_perf_domain()
                +----------+       |       +---------+
                |                  |                 |
        +---------------+  +---------------+  +--------------+
        |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
        +---------------+  +---------------+  +--------------+
                ^                  ^                 ^
                |                  |                 |
        +--------------+   +---------------+  +--------------+
        | Device Tree  |   |   Firmware    |  |      ?       |
        +--------------+   +---------------+  +--------------+

In the case of CPU devices, the EM framework manages power cost tables per
'performance domain' in the system. A performance domain is a group of CPUs
whose performance is scaled together. Performance domains generally have a
1-to-1 mapping with CPUFreq policies. All CPUs in a performance domain are
required to have the same micro-architecture. CPUs in different performance
domains can have different micro-architectures.


2. Core APIs
------------

2.1 Config options
^^^^^^^^^^^^^^^^^^

CONFIG_ENERGY_MODEL must be enabled to use the EM framework.


2.2 Registration of performance domains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Registration of 'advanced' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 'advanced' EM gets its name from the fact that the driver is allowed
to provide a more precise power model. It is not limited to a math formula
implemented in the framework (as is the case for the 'simple' EM). It can
better reflect the real power measurements performed for each performance
state. Thus, this registration method should be preferred when accounting for
static power (leakage) in the EM is important.

Drivers are expected to register performance domains into the EM framework by
calling the following API::

  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
		struct em_data_callback *cb, cpumask_t *cpus, bool microwatts);

Drivers must provide a callback function returning <frequency, power> tuples
for each performance state. The callback function provided by the driver is free
to fetch data from any relevant location (DT, firmware, ...), and by any means
deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
performance domains using cpumask. For devices other than CPUs, the 'cpus'
argument must be set to NULL.
The last argument, 'microwatts', must be set to the correct value. Kernel
subsystems which use the EM might rely on this flag to check if all EM devices
use the same scale. If there are different scales, these subsystems might
decide to return a warning/error, stop working or panic.
See Section 3. for an example of a driver implementing this
callback, or Section 2.4 for further documentation on this API.
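
The callback contract can be pictured with a small userspace C sketch. It is
not kernel code: ``struct foo_opp``, ``foo_table``, ``foo_active_power()`` and
``foo_enumerate_states()`` are hypothetical names, the ``struct device``
argument is left out, and the table values are made up. The sketch assumes the
callback rounds the requested frequency up to the next supported performance
state and reports the matching power, and that a caller could discover all
states by asking again with the last returned frequency plus one.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical <frequency, power> tuple: kHz and micro-Watts. */
struct foo_opp {
	unsigned long khz;
	unsigned long uw;
};

/* Made-up table, sorted by ascending frequency. */
static const struct foo_opp foo_table[] = {
	{  500000,  80000 },
	{ 1000000, 250000 },
	{ 1500000, 600000 },
};

/*
 * Round *freq up to the next supported state and report its power.
 * Returns 0 on success, -EINVAL when *freq is above the highest state.
 */
static int foo_active_power(unsigned long *power, unsigned long *freq)
{
	size_t i;

	for (i = 0; i < sizeof(foo_table) / sizeof(foo_table[0]); i++) {
		if (foo_table[i].khz >= *freq) {
			*freq = foo_table[i].khz;
			*power = foo_table[i].uw;
			return 0;
		}
	}

	return -EINVAL;
}

/*
 * Fill 'out' with nr_states tuples by repeatedly calling the callback,
 * each time starting just above the previously returned frequency.
 */
static int foo_enumerate_states(struct foo_opp *out, size_t nr_states)
{
	unsigned long freq = 0, power = 0;
	size_t i;
	int ret;

	for (i = 0; i < nr_states; i++, freq++) {
		ret = foo_active_power(&power, &freq);
		if (ret)
			return ret;

		out[i].khz = freq;
		out[i].uw = power;
	}

	return 0;
}
```

With this shape, asking for more states than the table holds makes the
enumeration fail, which is one reason 'nr_states' has to match what the
callback can actually report.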

Registration of EM using DT
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The EM can also be registered using the OPP framework and the information in
the DT "operating-points-v2" node. Each OPP entry in DT can be extended with a
property "opp-microwatt" containing a power value in micro-Watts. This OPP DT
property allows a platform to register EM power values which reflect the total
power (static + dynamic). These power values might come directly from
experiments and measurements.

Registration of 'artificial' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is an option to provide a custom callback for drivers missing detailed
knowledge about the power value for each performance state. The callback
.get_cost() is optional and provides the 'cost' values used by the EAS.
This is useful for platforms that only provide information on relative
efficiency between CPU types, where one could use the information to
create an abstract power model. But even an abstract power model can
sometimes be hard to fit in, given the input power value size restrictions.
The .get_cost() callback allows drivers to provide 'cost' values which reflect
the efficiency of the CPUs. This makes it possible to give EAS information
with a different relation than the one forced by the EM internal formulas
calculating 'cost' values. To register an EM for such a platform, the
driver must set the flag 'microwatts' to 0, provide the .get_power() callback
and provide the .get_cost() callback. The EM framework handles such a platform
properly during registration. A flag EM_PERF_DOMAIN_ARTIFICIAL is set for such
a platform. Other frameworks which use the EM should take special care to
test and treat this flag properly.

Registration of 'simple' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 'simple' EM is registered using the framework helper function
cpufreq_register_em_with_opp(). It implements a power model which is tied to
a math formula::

	Power = C * V^2 * f

An EM registered using this method might not correctly reflect the
physics of a real device, e.g. when static power (leakage) is important.
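
To make the units in the formula concrete, here is a minimal C sketch of the
calculation. The helper name ``simple_em_power_uw()`` and the choice of units
are assumptions made for illustration: the coefficient C is taken to be in
uW/MHz/V^2, the voltage in millivolts and the frequency in MHz. The sketch
shows the arithmetic only and is not the framework's implementation.

```c
#include <stdint.h>

/*
 * Power = C * V^2 * f, with the assumed units:
 *   cap: dynamic power coefficient in uW/MHz/V^2
 *   mv:  voltage in millivolts
 *   mhz: frequency in MHz
 *
 * mV^2 equals V^2 * 10^6, so dividing once by 10^6 yields micro-Watts.
 * Static power (leakage) is not part of this model.
 */
static uint64_t simple_em_power_uw(uint64_t cap, uint64_t mv, uint64_t mhz)
{
	return cap * mv * mv * mhz / 1000000ULL;
}
```

For instance, with C = 100, 1000 mV and 1000 MHz the sketch gives 100000 uW.
Halving the voltage alone divides the result by four, whereas halving the
frequency alone only divides it by two.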
15708374410SLukasz Luba
158151f4e2bSMauro Carvalho Chehab
159151f4e2bSMauro Carvalho Chehab2.3 Accessing performance domains
160151f4e2bSMauro Carvalho Chehab^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
161151f4e2bSMauro Carvalho Chehab
1627b7570adSLukasz LubaThere are two API functions which provide the access to the energy model:
1637b7570adSLukasz Lubaem_cpu_get() which takes CPU id as an argument and em_pd_get() with device
1647b7570adSLukasz Lubapointer as an argument. It depends on the subsystem which interface it is
1657b7570adSLukasz Lubagoing to use, but in case of CPU devices both functions return the same
1667b7570adSLukasz Lubaperformance domain.
1677b7570adSLukasz Luba
168151f4e2bSMauro Carvalho ChehabSubsystems interested in the energy model of a CPU can retrieve it using the
169151f4e2bSMauro Carvalho Chehabem_cpu_get() API. The energy model tables are allocated once upon creation of
170151f4e2bSMauro Carvalho Chehabthe performance domains, and kept in memory untouched.
171151f4e2bSMauro Carvalho Chehab
The energy consumed by a performance domain can be estimated using the
em_cpu_energy() API. In the case of CPU devices, the estimation is performed
assuming that the schedutil CPUfreq governor is in use. Currently this
calculation is not provided for other types of devices.
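
The idea behind such an estimation can be sketched in userspace C. This is a
deliberately simplified model and not the code behind em_cpu_energy():
``struct perf_state`` and ``pd_energy_estimate()`` are hypothetical, the
table values used in the usage note are made up, and the model assumes that
the governor picks the lowest performance state able to serve the busiest CPU,
without any headroom margin. The energy is then the power of that state scaled
by the ratio between the summed utilization and the capacity available at that
state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical performance state: frequency in kHz and its power cost. */
struct perf_state {
	uint64_t khz;
	uint64_t power;
};

/*
 * Estimate the energy of a performance domain.
 *   table:     states sorted by ascending frequency
 *   max_util:  utilization of the busiest CPU in the domain
 *   sum_util:  sum of the utilization of all CPUs in the domain
 *   scale_cpu: capacity of one CPU at the highest frequency
 * The result is in the same scale as the power values.
 */
static uint64_t pd_energy_estimate(const struct perf_state *table, size_t nr,
				   uint64_t max_util, uint64_t sum_util,
				   uint64_t scale_cpu)
{
	uint64_t max_khz, target_khz, ps_cap;
	size_t i;

	if (!nr || !scale_cpu)
		return 0;

	/* Frequency needed to serve the busiest CPU. */
	max_khz = table[nr - 1].khz;
	target_khz = max_khz * max_util / scale_cpu;

	/* Lowest state at or above the target, capped at the last state. */
	for (i = 0; i < nr - 1; i++) {
		if (table[i].khz >= target_khz)
			break;
	}

	/* Capacity of one CPU when running at the selected state. */
	ps_cap = scale_cpu * table[i].khz / max_khz;
	if (!ps_cap)
		return 0;

	return table[i].power * sum_util / ps_cap;
}
```

With states {500000 kHz, 80000}, {1000000 kHz, 250000} and
{2000000 kHz, 600000}, a capacity of 1024, a busiest-CPU utilization of 256
and a summed utilization of 512, the sketch selects the first state and
returns 160000.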

More details about the above APIs can be found in ``<linux/energy_model.h>``
or in Section 2.4.


2.4 Description details of this API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. kernel-doc:: include/linux/energy_model.h
   :internal:

.. kernel-doc:: kernel/power/energy_model.c
   :export:


3. Example driver
-----------------

The CPUFreq framework supports a dedicated callback for registering
the EM for a given CPU(s) 'policy' object: cpufreq_driver::register_em().
That callback has to be implemented properly for a given driver,
because the framework calls it at the right time during setup.
This section provides a simple example of a CPUFreq driver registering a
performance domain in the Energy Model framework using the (fake) 'foo'
protocol. The driver implements an est_power() function to be provided to the
EM framework::

  -> drivers/cpufreq/foo_cpufreq.c

	/*
	 * Power is reported in micro-Watts, matching the 'microwatts'
	 * flag passed at registration below.
	 */
	static int est_power(struct device *dev, unsigned long *uW,
			unsigned long *KHz)
	{
		long freq, power;

		/* Use the 'foo' protocol to ceil the frequency */
		freq = foo_get_freq_ceil(dev, *KHz);
		if (freq < 0)
			return freq;

		/* Estimate the power cost for the dev at the relevant freq. */
		power = foo_estimate_power(dev, freq);
		if (power < 0)
			return power;

		/* Return the values to the EM framework */
		*uW = power;
		*KHz = freq;

		return 0;
	}

	static void foo_cpufreq_register_em(struct cpufreq_policy *policy)
	{
		struct em_data_callback em_cb = EM_DATA_CB(est_power);
		struct device *cpu_dev;
		int nr_opp;

		cpu_dev = get_cpu_device(cpumask_first(policy->cpus));

		/* Find the number of OPPs for this policy */
		nr_opp = foo_get_nr_opp(policy);

		/* And register the new performance domain */
		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
					    true);
	}

	static struct cpufreq_driver foo_cpufreq_driver = {
		.register_em = foo_cpufreq_register_em,
	};