.. SPDX-License-Identifier: GPL-2.0

=======================
Energy Model of devices
=======================

1. Overview
-----------

The Energy Model (EM) framework serves as an interface between drivers that
know the power consumed by devices at various performance levels, and the
kernel subsystems that want to use that information to make energy-aware
decisions.

The source of the information about the power consumed by devices can vary
greatly from one platform to another. These power costs can be estimated using
devicetree data in some cases. In others, the firmware will know better.
Alternatively, userspace might be best positioned. And so on. To avoid having
each client subsystem re-implement support for every possible source of
information on its own, the EM framework acts as an abstraction layer which
standardizes the format of power cost tables in the kernel, thereby avoiding
redundant work.

The power values might be expressed in milli-Watts or in an 'abstract scale'.
Multiple subsystems might use the EM and it is up to the system integrator to
check that the requirements for the power value scale types are met. An example
can be found in the Energy-Aware Scheduler documentation
Documentation/scheduler/sched-energy.rst. For some subsystems, like thermal or
powercap, power values expressed in an 'abstract scale' might cause issues.
These subsystems are more interested in an estimation of the power used in the
past, thus the real milli-Watts might be needed. An example of these
requirements can be found in the Intelligent Power Allocation documentation in
Documentation/driver-api/thermal/power_allocator.rst.
An important thing to keep in mind is that when the power values are expressed
in an 'abstract scale', deriving real energy in milli-Joules is not possible.

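The unit issue can be illustrated with a small standalone userspace sketch. It
is not kernel code, and the helper name power_mw_to_energy_mj() is hypothetical.
It only shows that energy in milli-Joules follows from power multiplied by time
when the power is a physical quantity in milli-Watts:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): energy in milli-Joules used by a
 * device drawing power_mw milli-Watts for duration_ms milliseconds.
 * mW * ms gives micro-Joules, hence the division by 1000.
 */
static uint64_t power_mw_to_energy_mj(uint64_t power_mw, uint64_t duration_ms)
{
	return power_mw * duration_ms / 1000;
}
```

For example, 500 mW sustained for 2000 ms yields 1000 mJ. If the value 500 came
from an 'abstract scale' instead, the same arithmetic would only produce a
relative number with no physical unit attached.
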
The figure below depicts an example of drivers (Arm-specific here, but the
approach is applicable to any architecture) providing power costs to the EM
framework, and interested clients reading the data from it::

       +---------------+  +-----------------+  +---------------+
       | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
       +---------------+  +-----------------+  +---------------+
               |                   | em_cpu_energy()   |
               |                   | em_cpu_get()      |
               +---------+         |         +---------+
                         |         |         |
                         v         v         v
                        +---------------------+
                        |    Energy Model     |
                        |     Framework       |
                        +---------------------+
                           ^       ^       ^
                           |       |       | em_dev_register_perf_domain()
                +----------+       |       +---------+
                |                  |                 |
        +---------------+  +---------------+  +--------------+
        |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
        +---------------+  +---------------+  +--------------+
                ^                  ^                 ^
                |                  |                 |
        +--------------+   +---------------+  +--------------+
        | Device Tree  |   |   Firmware    |  |      ?       |
        +--------------+   +---------------+  +--------------+

In the case of CPU devices, the EM framework manages power cost tables per
'performance domain' in the system. A performance domain is a group of CPUs
whose performance is scaled together. Performance domains generally have a
1-to-1 mapping with CPUFreq policies. All CPUs in a performance domain are
required to have the same micro-architecture. CPUs in different performance
domains can have different micro-architectures.


2. Core APIs
------------

2.1 Config options
^^^^^^^^^^^^^^^^^^

CONFIG_ENERGY_MODEL must be enabled to use the EM framework.


2.2 Registration of performance domains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Drivers are expected to register performance domains into the EM framework by
calling the following API::

  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
		struct em_data_callback *cb, cpumask_t *cpus);

Drivers must provide a callback function returning <frequency, power> tuples
for each performance state. The callback function provided by the driver is free
to fetch data from any relevant location (DT, firmware, ...), and by any means
deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
performance domains using cpumask. For devices other than CPUs, the last
argument must be set to NULL.
See Section 3 for an example of a driver implementing this
callback, and kernel/power/energy_model.c for further documentation on this
API.

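The interaction between the framework and the driver callback can be pictured
with the userspace sketch below. It is a simplified model, not the kernel
implementation: the types, the foo_opps table, foo_active_power() and
build_table() are all hypothetical stand-ins, and the device argument is
dropped. The assumed behaviour is that the framework queries the callback once
per performance state, each time asking for the first frequency above the
previous one:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for illustration only; not the kernel type. */
struct perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* mW or abstract scale */
};

/* Same shape as the driver callback described above, minus the device. */
typedef int (*active_power_fn)(unsigned long *power, unsigned long *freq);

/* Hypothetical operating points a driver could learn from DT or firmware. */
static const struct perf_state foo_opps[] = {
	{  500000, 100 },
	{ 1000000, 300 },
	{ 1500000, 700 },
};

/* Driver side: round *freq up to the next operating point, report power. */
static int foo_active_power(unsigned long *power, unsigned long *freq)
{
	size_t i;

	for (i = 0; i < sizeof(foo_opps) / sizeof(foo_opps[0]); i++) {
		if (foo_opps[i].frequency >= *freq) {
			*freq = foo_opps[i].frequency;
			*power = foo_opps[i].power;
			return 0;
		}
	}
	return -1;	/* no operating point at or above *freq */
}

/*
 * Framework side (assumed behaviour): call the callback once per state,
 * each time requesting a frequency strictly above the previous state.
 */
static int build_table(struct perf_state *table, unsigned int nr_states,
		       active_power_fn cb)
{
	unsigned long freq = 0, power = 0;
	unsigned int i;

	assert(table && cb);

	for (i = 0; i < nr_states; i++) {
		freq++;
		if (cb(&power, &freq))
			return -1;
		table[i].frequency = freq;
		table[i].power = power;
	}
	return 0;
}
```

Calling build_table() with three states fills the table with the three
hypothetical operating points in ascending order. Asking for a fourth state
fails, because the callback has no operating point left to report.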

2.3 Accessing performance domains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are two API functions which provide access to the energy model:
em_cpu_get(), which takes a CPU id as an argument, and em_pd_get(), which takes
a device pointer as an argument. Which interface to use depends on the
subsystem, but in the case of CPU devices both functions return the same
performance domain.

Subsystems interested in the energy model of a CPU can retrieve it using the
em_cpu_get() API. The energy model tables are allocated once upon creation of
the performance domains, and kept in memory untouched.

The energy consumed by a performance domain can be estimated using the
em_cpu_energy() API. For CPU devices, the estimation is performed assuming that
the schedutil CPUfreq governor is in use. Currently this calculation is not
provided for other types of devices.

More details about the above APIs can be found in include/linux/energy_model.h.

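To make the schedutil assumption more concrete, the userspace sketch below
models one plausible way such an estimation can work. It is not the kernel's
em_cpu_energy() and should not be read as its exact arithmetic: the function
estimate_energy(), its parameters and the 25% frequency headroom are
assumptions of this sketch. The idea is to pick the lowest performance state
able to serve the busiest CPU of the domain, then scale the cost of that state
by the total utilization of the domain:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for illustration only; not the kernel type. */
struct perf_state {
	uint64_t frequency;	/* kHz */
	uint64_t power;		/* mW or abstract scale */
};

/*
 * Assumed model: table is sorted by ascending frequency, max_util is the
 * utilization of the busiest CPU, sum_util the total utilization of the
 * domain, and scale_cpu the capacity of one CPU at the highest frequency.
 */
static uint64_t estimate_energy(const struct perf_state *table,
				size_t nr_states, uint64_t max_util,
				uint64_t sum_util, uint64_t scale_cpu)
{
	const struct perf_state *ps;
	uint64_t max_freq, target, cost;
	size_t i;

	assert(table && nr_states > 0 && scale_cpu > 0);

	/* Default to the highest state if nothing lower is sufficient. */
	ps = &table[nr_states - 1];
	max_freq = ps->frequency;

	/* Frequency request with 25% headroom over max_util (assumption). */
	target = (max_freq + (max_freq >> 2)) * max_util / scale_cpu;

	for (i = 0; i < nr_states; i++) {
		if (table[i].frequency >= target) {
			ps = &table[i];
			break;
		}
	}

	/* Power normalized by how much slower this state is than the top. */
	cost = ps->power * max_freq / ps->frequency;

	return cost * sum_util / scale_cpu;
}
```

The returned value is mainly useful for comparing alternatives, for example
two candidate task placements evaluated against the same table. With power
values on an 'abstract scale', it has no physical unit.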

3. Example driver
-----------------

This section provides a simple example of a CPUFreq driver registering a
performance domain in the Energy Model framework using the (fake) 'foo'
protocol. The driver implements an est_power() function to be provided to the
EM framework::

  -> drivers/cpufreq/foo_cpufreq.c

  static int est_power(unsigned long *mW, unsigned long *KHz,
  		struct device *dev)
  {
  	long freq, power;

  	/* Use the 'foo' protocol to ceil the frequency */
  	freq = foo_get_freq_ceil(dev, *KHz);
  	if (freq < 0)
  		return freq;

  	/* Estimate the power cost for the dev at the relevant freq. */
  	power = foo_estimate_power(dev, freq);
  	if (power < 0)
  		return power;

  	/* Return the values to the EM framework */
  	*mW = power;
  	*KHz = freq;

  	return 0;
  }

  static int foo_cpufreq_init(struct cpufreq_policy *policy)
  {
  	struct em_data_callback em_cb = EM_DATA_CB(est_power);
  	struct device *cpu_dev;
  	int nr_opp, ret;

  	cpu_dev = get_cpu_device(cpumask_first(policy->cpus));

  	/* Do the actual CPUFreq init work ... */
  	ret = do_foo_cpufreq_init(policy);
  	if (ret)
  		return ret;

  	/* Find the number of OPPs for this policy */
  	nr_opp = foo_get_nr_opp(policy);

  	/* And register the new performance domain */
  	em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);

  	return 0;
  }