====================
PCI Power Management
====================

Copyright (c) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.

An overview of concepts and the Linux kernel's interfaces related to PCI power
management.  Based on previous work by Patrick Mochel <mochel@transmeta.com>
(and others).

This document only covers the aspects of power management specific to PCI
devices.  For a general description of the kernel's interfaces related to device
power management, refer to Documentation/driver-api/pm/devices.rst and
Documentation/power/runtime_pm.rst.

.. contents:

   1. Hardware and Platform Support for PCI Power Management
   2. PCI Subsystem and Device Power Management
   3. PCI Device Drivers and Power Management
   4. Resources


1. Hardware and Platform Support for PCI Power Management
=========================================================

1.1. Native and Platform-Based Power Management
-----------------------------------------------

In general, power management is a feature allowing one to save energy by putting
devices into states in which they draw less power (low-power states) at the
price of reduced functionality or performance.

Usually, a device is put into a low-power state when it is underutilized or
completely inactive.  However, when it is necessary to use the device once
again, it has to be put back into the "fully functional" state (full-power
state).  This may happen when there are some data for the device to handle or
as a result of an external event requiring the device to be active, which may
be signaled by the device itself.

PCI devices may be put into low-power states in two ways: by using the device
capabilities introduced by the PCI Bus Power Management Interface Specification,
or with the help of platform firmware, such as an ACPI BIOS.  In the first
approach, which is referred to as native PCI power management (native PCI PM)
in what follows, the device power state is changed as a result of writing a
specific value into one of its standard configuration registers.  The second
approach requires the platform firmware to provide special methods that may be
used by the kernel to change the device's power state.

Devices supporting the native PCI PM usually can generate wakeup signals called
Power Management Events (PMEs) to let the kernel know about external events
requiring the device to be active.  After receiving a PME the kernel is supposed
to put the device that sent it into the full-power state.  However, the PCI Bus
Power Management Interface Specification doesn't define any standard method of
delivering the PME from the device to the CPU and the operating system kernel.
It is assumed that the platform firmware will perform this task and therefore,
even though a PCI device is set up to generate PMEs, it also may be necessary to
prepare the platform firmware for notifying the CPU of the PMEs coming from the
device (e.g. by generating interrupts).

In turn, if the methods provided by the platform firmware are used for changing
the power state of a device, usually the platform also provides a method for
preparing the device to generate wakeup signals.  In that case, however, it
is often also necessary to prepare the device for generating PMEs using the
native PCI PM mechanism, because the method provided by the platform depends on
that.

Thus in many situations both the native and the platform-based power management
mechanisms have to be used simultaneously to obtain the desired result.

1.2. Native PCI Power Management
--------------------------------

The PCI Bus Power Management Interface Specification (PCI PM Spec) was
introduced between the PCI 2.1 and PCI 2.2 Specifications.  It defined a
standard interface for performing various operations related to power
management.

The implementation of the PCI PM Spec is optional for conventional PCI devices,
but it is mandatory for PCI Express devices.  If a device supports the PCI PM
Spec, it has an 8 byte power management capability field in its PCI
configuration space.  This field is used to describe and control the standard
features related to the native PCI power management.

The PCI PM Spec defines 4 operating states for devices (D0-D3) and for buses
(B0-B3).  The higher the number, the less power is drawn by the device or bus
in that state.  However, the higher the number, the longer the latency for
the device or bus to return to the full-power state (D0 or B0, respectively).

There are two variants of the D3 state defined by the specification.  The first
one is D3hot, referred to as the software accessible D3, because devices can be
programmed to go into it.  The second one, D3cold, is the state that PCI devices
are in when the supply voltage (Vcc) is removed from them.  It is not possible
to program a PCI device to go into D3cold, although there may be a programmable
interface for putting the bus the device is on into a state in which Vcc is
removed from all devices on the bus.

PCI bus power management, however, is not supported by the Linux kernel at the
time of this writing and therefore it is not covered by this document.

Note that every PCI device can be in the full-power state (D0) or in D3cold,
regardless of whether or not it implements the PCI PM Spec.  In addition to
that, if the PCI PM Spec is implemented by the device, it must support D3hot
as well as D0.  The support for the D1 and D2 power states is optional.

PCI devices supporting the PCI PM Spec can be programmed to go to any of the
supported low-power states (except for D3cold).  While in D1-D3hot the
standard configuration registers of the device must be accessible to software
(i.e. the device is required to respond to PCI configuration accesses), although
its I/O and memory spaces are then disabled.  This allows the device to be
programmatically put into D0.  Thus the kernel can switch the device back and
forth between D0 and the supported low-power states (except for D3cold) and the
possible power state transitions the device can undergo are the following:

+---------------+------------+
| Current State | New State  |
+===============+============+
| D0            | D1, D2, D3 |
+---------------+------------+
| D1            | D2, D3     |
+---------------+------------+
| D2            | D3         |
+---------------+------------+
| D1, D2, D3    | D0         |
+---------------+------------+
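
The table can be read as a simple rule: software may move a device to any
deeper state, and from any low-power state back to D0.  The following is only a
userspace sketch of that rule, not kernel code; the enum and the function name
are hypothetical and D3 stands for D3hot here.

```c
#include <stdbool.h>

/* Hypothetical enum for illustration; the kernel uses pci_power_t instead. */
enum d_state { D0 = 0, D1 = 1, D2 = 2, D3HOT = 3 };

/*
 * Return true if software may move a device from 'cur' to 'next' according
 * to the table: any move to a deeper state is allowed, and any low-power
 * state may go back to D0.  D3cold is deliberately not modeled, because it
 * cannot be entered by programming the device.
 */
static bool d_transition_allowed(enum d_state cur, enum d_state next)
{
	if (next > cur)
		return true;		/* D0 -> D1/D2/D3, D1 -> D2/D3, D2 -> D3 */
	return next == D0 && cur != D0;	/* D1/D2/D3 -> D0 */
}
```

For instance, D2 -> D1 is rejected by this check, so a device in D2 would have
to be brought back to D0 first and then put into D1.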

The transition from D3cold to D0 occurs when the supply voltage is provided to
the device (i.e. power is restored).  In that case the device returns to D0 with
a full power-on reset sequence and the power-on defaults are restored to the
device by hardware just as at initial power up.

PCI devices supporting the PCI PM Spec can be programmed to generate PMEs
while in any power state (D0-D3), but they are not required to be capable
of generating PMEs from all supported power states.  In particular, the
capability of generating PMEs from D3cold is optional and depends on the
presence of additional voltage (3.3Vaux) allowing the device to remain
sufficiently active to generate a wakeup signal.
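
Both the power state and PME generation are controlled through the control/status
register (PMCSR) of the power management capability.  The sketch below shows how
a new PMCSR value might be computed from the value read back from the device.
It is a userspace illustration only: the macro and function names are
hypothetical, the bit positions follow the PCI PM Spec layout, and always
clearing PME_Status is a choice made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* PMCSR (offset 4 of the PM capability) bits used here. */
#define PMCSR_STATE_MASK	0x0003u	/* PowerState: 0-3 = D0-D3hot */
#define PMCSR_PME_ENABLE	0x0100u	/* PME_En */
#define PMCSR_PME_STATUS	0x8000u	/* PME_Status, write 1 to clear */

/* Compute the PMCSR value to write back, starting from the value read. */
static uint16_t pmcsr_next(uint16_t pmcsr, unsigned int state, bool pme_enable)
{
	pmcsr &= ~(PMCSR_STATE_MASK | PMCSR_PME_ENABLE);
	pmcsr |= state & PMCSR_STATE_MASK;
	if (pme_enable)
		pmcsr |= PMCSR_PME_ENABLE;
	/* Writing 1 clears a stale PME_Status so it cannot trigger at once. */
	return pmcsr | PMCSR_PME_STATUS;
}
```

A real driver should not write this register directly; the PCI subsystem
handles it on the driver's behalf.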

1.3. ACPI Device Power Management
---------------------------------

The platform firmware support for the power management of PCI devices is
system-specific.  However, if the system in question is compliant with the
Advanced Configuration and Power Interface (ACPI) Specification, like the
majority of x86-based systems, it is supposed to implement device power
management interfaces defined by the ACPI standard.

For this purpose the ACPI BIOS provides special functions called "control
methods" that may be executed by the kernel to perform specific tasks, such as
putting a device into a low-power state.  These control methods are encoded
using a special byte-code language called the ACPI Machine Language (AML) and
stored in the machine's BIOS.  The kernel loads them from the BIOS and executes
them as needed using an AML interpreter that translates the AML byte code into
computations and memory or I/O space accesses.  This way, in theory, a BIOS
writer can provide the kernel with a means to perform actions depending
on the system design in a system-specific fashion.

ACPI control methods may be divided into global control methods, which are not
associated with any particular devices, and device control methods, which have
to be defined separately for each device supposed to be handled with the help of
the platform.  This means, in particular, that ACPI device control methods can
only be used to handle devices that the BIOS writer knew about in advance.  The
ACPI methods used for device power management fall into that category.

The ACPI specification assumes that devices can be in one of four power states
labeled as D0, D1, D2, and D3 that roughly correspond to the native PCI PM
D0-D3 states (although the difference between D3hot and D3cold is not taken
into account by ACPI).  Moreover, for each power state of a device there is a
set of power resources that have to be enabled for the device to be put into
that state.  These power resources are controlled (i.e. enabled or disabled)
with the help of their own control methods, _ON and _OFF, that have to be
defined individually for each of them.

To put a device into the ACPI power state Dx (where x is a number between 0 and
3 inclusive) the kernel is supposed to (1) enable the power resources required
by the device in this state using their _ON control methods and (2) execute the
_PSx control method defined for the device.  In addition to that, if the device
is going to be put into a low-power state (D1-D3) and is supposed to generate
wakeup signals from that state, the _DSW (or _PSW, replaced with _DSW by ACPI
3.0) control method defined for it has to be executed before _PSx.  Power
resources that are not required by the device in the target power state and are
not required any more by any other device should be disabled (by executing their
_OFF control methods).  If the current power state of the device is D3, it can
only be put into D0 this way.
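
The ordering described above can be sketched as a small userspace model that
lists the control methods to run for a transition to Dx.  The function name and
its parameters are hypothetical, and placing _DSW before _ON is an assumption of
this sketch: the text only requires _DSW to run before _PSx.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_STEPS 4

/*
 * Fill 'steps' with the names of the ACPI control methods to run, in order,
 * when putting a device into Dx.  Returns the number of steps, or 0 if x is
 * out of range.  'unused_resources' says whether some power resource is no
 * longer needed by this or any other device after the transition.
 */
static size_t acpi_dx_steps(int x, bool wakeup, bool unused_resources,
			    const char *steps[MAX_STEPS])
{
	static const char *const psx[] = { "_PS0", "_PS1", "_PS2", "_PS3" };
	size_t n = 0;

	if (x < 0 || x > 3)
		return 0;
	if (wakeup && x > 0)
		steps[n++] = "_DSW";	/* arm wakeup before _PSx (D1-D3 only) */
	steps[n++] = "_ON";		/* (1) enable resources needed in Dx */
	steps[n++] = psx[x];		/* (2) switch the device itself */
	if (unused_resources)
		steps[n++] = "_OFF";	/* drop resources nobody needs now */
	return n;
}
```

In the kernel these steps are carried out by the ACPI core rather than by PCI
drivers, so the model is only meant to make the sequence easier to follow.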

However, quite often the power states of devices are changed during a
system-wide transition into a sleep state or back into the working state.  ACPI
defines four system sleep states, S1, S2, S3, and S4, and denotes the system
working state as S0.  In general, the target system sleep (or working) state
determines the highest power (lowest number) state the device can be put
into and the kernel is supposed to obtain this information by executing the
device's _SxD control method (where x is a number between 0 and 4 inclusive).
If the device is required to wake up the system from the target sleep state, the
lowest power (highest number) state it can be put into is also determined by the
target state of the system.  The kernel is then supposed to use the device's
_SxW control method to obtain the number of that state.  It also is supposed to
use the device's _PRW control method to learn which power resources need to be
enabled for the device to be able to generate wakeup signals.
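
Put differently, _SxD and _SxW bound the range of D-states that may be used for
a given system sleep state.  The sketch below picks the deepest state within
those bounds.  It is a simplified userspace model with a hypothetical function
name, and it assumes that both methods exist and return values between 0 and 3;
the kernel's actual logic has to cope with missing methods and invalid states.

```c
#include <stdbool.h>

/*
 * Pick a target D-state number (0-3) for a system sleep transition.
 *   sxd: value reported by _SxD, the shallowest state allowed
 *   sxw: value reported by _SxW, the deepest state that can still wake
 */
static int pick_sleep_d_state(int sxd, int sxw, bool wakeup)
{
	int d_min = sxd;		/* highest-power bound */
	int d_max = wakeup ? sxw : 3;	/* lowest-power bound */

	if (d_max < d_min)
		d_max = d_min;		/* never go shallower than _SxD allows */
	return d_max;			/* deepest state within the bounds */
}
```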

1.4. Wakeup Signaling
---------------------

Wakeup signals generated by PCI devices, either as native PCI PMEs, or as
a result of the execution of the _DSW (or _PSW) ACPI control method before
putting the device into a low-power state, have to be caught and handled as
appropriate.  If they are sent while the system is in the working state
(ACPI S0), they should be translated into interrupts so that the kernel can
put the devices generating them into the full-power state and take care of the
events that triggered them.  In turn, if they are sent while the system is
sleeping, they should cause the system's core logic to trigger wakeup.

On ACPI-based systems wakeup signals sent by conventional PCI devices are
converted into ACPI General-Purpose Events (GPEs) which are hardware signals
from the system core logic generated in response to various events that need to
be acted upon.  Every GPE is associated with one or more sources of potentially
interesting events.  In particular, a GPE may be associated with a PCI device
capable of signaling wakeup.  The information on the connections between GPEs
and event sources is recorded in the system's ACPI BIOS from where it can be
read by the kernel.

If a PCI device known to the system's ACPI BIOS signals wakeup, the GPE
associated with it (if there is one) is triggered.  The GPEs associated with PCI
bridges may also be triggered in response to a wakeup signal from one of the
devices below the bridge (this also is the case for root bridges) and, for
example, native PCI PMEs from devices unknown to the system's ACPI BIOS may be
handled this way.

A GPE may be triggered when the system is sleeping (i.e. when it is in one of
the ACPI S1-S4 states), in which case system wakeup is started by its core logic
(the device that was the source of the signal causing the system wakeup to occur
may be identified later).  The GPEs used in such situations are referred to as
wakeup GPEs.

Usually, however, GPEs are also triggered when the system is in the working
state (ACPI S0) and in that case the system's core logic generates a System
Control Interrupt (SCI) to notify the kernel of the event.  Then, the SCI
handler identifies the GPE that caused the interrupt to be generated which,
in turn, allows the kernel to identify the source of the event (that may be
a PCI device signaling wakeup).  The GPEs used for notifying the kernel of
events occurring while the system is in the working state are referred to as
runtime GPEs.

Unfortunately, there is no standard way of handling wakeup signals sent by
conventional PCI devices on systems that are not ACPI-based, but there is one
for PCI Express devices.  Namely, the PCI Express Base Specification introduced
a native mechanism for converting native PCI PMEs into interrupts generated by
root ports.  For conventional PCI devices native PMEs are out-of-band, so they
are routed separately and they need not pass through bridges (in principle they
may be routed directly to the system's core logic), but for PCI Express devices
they are in-band messages that have to pass through the PCI Express hierarchy,
including the root port on the path from the device to the Root Complex.  Thus
it was possible to introduce a mechanism by which a root port generates an
interrupt whenever it receives a PME message from one of the devices below it.
The PCI Express Requester ID of the device that sent the PME message is then
recorded in one of the root port's configuration registers from where it may be
read by the interrupt handler allowing the device to be identified.  [PME
messages sent by PCI Express endpoints integrated with the Root Complex don't
pass through root ports, but instead they cause a Root Complex Event Collector
(if there is one) to generate interrupts.]
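
To illustrate how the sender can be identified, the sketch below decodes the PME
Requester ID from a value read from the root port's Root Status register.  The
macro, structure and function names are hypothetical, and the sketch assumes the
traditional bus/device/function split of the Requester ID (that is, no ARI).

```c
#include <stdbool.h>
#include <stdint.h>

/* Root Status register layout (PCI Express Capability). */
#define RTSTA_PME_RID_MASK	0x0000ffffu	/* PME Requester ID */
#define RTSTA_PME_STATUS	0x00010000u	/* PME Status */

struct pme_source {
	uint8_t bus;	/* Requester ID bits 15:8 */
	uint8_t dev;	/* Requester ID bits 7:3 */
	uint8_t fn;	/* Requester ID bits 2:0 */
};

/* Decode the PME sender from a Root Status value; false if no PME is logged. */
static bool decode_pme_source(uint32_t rtsta, struct pme_source *src)
{
	uint16_t rid;

	if (!(rtsta & RTSTA_PME_STATUS))
		return false;
	rid = rtsta & RTSTA_PME_RID_MASK;
	src->bus = rid >> 8;
	src->dev = (rid >> 3) & 0x1f;
	src->fn = rid & 0x07;
	return true;
}
```

With a Root Status value of 0x00010308, for example, the sender decodes to bus 3,
device 1, function 0.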

In principle the native PCI Express PME signaling may also be used on ACPI-based
systems along with the GPEs, but to use it the kernel has to ask the system's
ACPI BIOS to release control of root port configuration registers.  The ACPI
BIOS, however, is not required to allow the kernel to control these registers
and if it doesn't do that, the kernel must not modify their contents.  Of course
the native PCI Express PME signaling cannot be used by the kernel in that case.


2. PCI Subsystem and Device Power Management
============================================

2.1. Device Power Management Callbacks
--------------------------------------

The PCI Subsystem participates in the power management of PCI devices in a
number of ways.  First of all, it provides an intermediate code layer between
the device power management core (PM core) and PCI device drivers.
Specifically, the pm field of the PCI subsystem's struct bus_type object,
pci_bus_type, points to a struct dev_pm_ops object, pci_dev_pm_ops, containing
pointers to several device power management callbacks::

  const struct dev_pm_ops pci_dev_pm_ops = {
	.prepare = pci_pm_prepare,
	.complete = pci_pm_complete,
	.suspend = pci_pm_suspend,
	.resume = pci_pm_resume,
	.freeze = pci_pm_freeze,
	.thaw = pci_pm_thaw,
	.poweroff = pci_pm_poweroff,
	.restore = pci_pm_restore,
	.suspend_noirq = pci_pm_suspend_noirq,
	.resume_noirq = pci_pm_resume_noirq,
	.freeze_noirq = pci_pm_freeze_noirq,
	.thaw_noirq = pci_pm_thaw_noirq,
	.poweroff_noirq = pci_pm_poweroff_noirq,
	.restore_noirq = pci_pm_restore_noirq,
	.runtime_suspend = pci_pm_runtime_suspend,
	.runtime_resume = pci_pm_runtime_resume,
	.runtime_idle = pci_pm_runtime_idle,
  };

These callbacks are executed by the PM core in various situations related to
device power management and they, in turn, execute power management callbacks
provided by PCI device drivers.  They also perform power management operations
involving some standard configuration registers of PCI devices that device
drivers need not know or care about.

The structure representing a PCI device, struct pci_dev, contains several fields
that these callbacks operate on::

  struct pci_dev {
	...
	pci_power_t     current_state;  /* Current operating state. */
	int		pm_cap;		/* PM capability offset in the
					   configuration space */
	unsigned int	pme_support:5;	/* Bitmask of states from which PME#
					   can be generated */
	unsigned int	pme_poll:1;	/* Poll device's PME status bit */
	unsigned int	d1_support:1;	/* Low power state D1 is supported */
	unsigned int	d2_support:1;	/* Low power state D2 is supported */
	unsigned int	no_d1d2:1;	/* D1 and D2 are forbidden */
	unsigned int	wakeup_prepared:1;  /* Device prepared for wake up */
	unsigned int	d3hot_delay;	/* D3hot->D0 transition time in ms */
	...
  };

They also indirectly use some fields of the struct device that is embedded in
struct pci_dev.

2.2. Device Initialization
--------------------------

The PCI subsystem's first task related to device power management is to
prepare the device for power management and initialize the fields of struct
pci_dev used for this purpose.  This happens in two functions defined in
drivers/pci/pci.c, pci_pm_init() and platform_pci_wakeup_init().

The first of these functions checks if the device supports native PCI PM
and if that's the case the offset of its power management capability structure
in the configuration space is stored in the pm_cap field of the device's struct
pci_dev object.  Next, the function checks which PCI low-power states are
supported by the device and from which low-power states the device can generate
native PCI PMEs.  The power management fields of the device's struct pci_dev and
the struct device embedded in it are updated accordingly and the generation of
PMEs by the device is disabled.
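
The capability checks described above amount to decoding a few bits of the
16-bit Power Management Capabilities (PMC) register.  The following userspace
sketch shows the idea; struct pm_caps and decode_pmc() are hypothetical
stand-ins for the corresponding struct pci_dev fields and kernel code, and the
bit positions follow the PCI PM Spec layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the 16-bit PMC register at offset 2 of the PM capability. */
#define PMC_D1		0x0200u	/* D1 supported */
#define PMC_D2		0x0400u	/* D2 supported */
#define PMC_PME_MASK	0xf800u	/* states that can assert PME# */
#define PMC_PME_SHIFT	11

/* Hypothetical stand-in for the matching fields of struct pci_dev. */
struct pm_caps {
	unsigned int pme_support:5;	/* bit n set: PME# possible from state n
					   (0-3 = D0-D3hot, 4 = D3cold) */
	unsigned int d1_support:1;
	unsigned int d2_support:1;
};

/* Decode the fields of interest from a raw PMC register value. */
static struct pm_caps decode_pmc(uint16_t pmc)
{
	struct pm_caps caps = {
		.pme_support = (pmc & PMC_PME_MASK) >> PMC_PME_SHIFT,
		.d1_support = !!(pmc & PMC_D1),
		.d2_support = !!(pmc & PMC_D2),
	};
	return caps;
}
```

A PMC value of 0xca00, for example, describes a device that supports D1 but not
D2 and that can generate PMEs from D0, D3hot and D3cold.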

The second function checks if the device can be prepared to signal wakeup with
the help of the platform firmware, such as the ACPI BIOS.  If that is the case,
the function updates the wakeup fields in struct device embedded in the
device's struct pci_dev and uses the firmware-provided method to prevent the
device from signaling wakeup.

At this point the device is ready for power management.  For driverless devices,
however, this functionality is limited to a few basic operations carried out
during system-wide transitions to a sleep state and back to the working state.

2.3. Runtime Device Power Management
------------------------------------

The PCI subsystem plays a vital role in the runtime power management of PCI
devices.  For this purpose it uses the general runtime power management
(runtime PM) framework described in Documentation/power/runtime_pm.rst.
Namely, it provides subsystem-level callbacks::

	pci_pm_runtime_suspend()
	pci_pm_runtime_resume()
	pci_pm_runtime_idle()

that are executed by the core runtime PM routines.  It also implements the
entire mechanics necessary for handling runtime wakeup signals from PCI devices
in low-power states, which at the time of this writing works for both the native
PCI Express PME signaling and the ACPI GPE-based wakeup signaling described in
Section 1.
374151f4e2bSMauro Carvalho Chehab
375151f4e2bSMauro Carvalho ChehabFirst, a PCI device is put into a low-power state, or suspended, with the help
376151f4e2bSMauro Carvalho Chehabof pm_schedule_suspend() or pm_runtime_suspend() which for PCI devices call
377151f4e2bSMauro Carvalho Chehabpci_pm_runtime_suspend() to do the actual job.  For this to work, the device's
378151f4e2bSMauro Carvalho Chehabdriver has to provide a pm->runtime_suspend() callback (see below), which is
379151f4e2bSMauro Carvalho Chehabrun by pci_pm_runtime_suspend() as the first action.  If the driver's callback
380151f4e2bSMauro Carvalho Chehabreturns successfully, the device's standard configuration registers are saved,
381151f4e2bSMauro Carvalho Chehabthe device is prepared to generate wakeup signals and, finally, it is put into
382151f4e2bSMauro Carvalho Chehabthe target low-power state.
383151f4e2bSMauro Carvalho Chehab
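The ordering described above can be sketched as a small userspace model.  It is
an illustration only, not kernel code: struct model_dev, its fields and the
sample driver callbacks are hypothetical names introduced here, and the real
pci_pm_runtime_suspend() does considerably more work::

	#include <errno.h>
	#include <stdbool.h>
	#include <stddef.h>

	/* Hypothetical model of a device; not the kernel's struct pci_dev. */
	struct model_dev {
		/* Driver's runtime suspend callback, may be NULL. */
		int (*runtime_suspend)(struct model_dev *dev);
		bool config_saved;
		bool wakeup_armed;
		bool low_power;
	};

	/* Follows the order of steps in the text; returns 0 on success. */
	static int model_runtime_suspend(struct model_dev *dev)
	{
		int ret;

		/* The text requires the driver to provide this callback. */
		if (!dev->runtime_suspend)
			return -ENOSYS;

		/* 1. The driver's callback runs first. */
		ret = dev->runtime_suspend(dev);
		if (ret)
			return ret;	/* device stays at full power */

		dev->config_saved = true;	/* 2. save config registers */
		dev->wakeup_armed = true;	/* 3. prepare wakeup signaling */
		dev->low_power = true;		/* 4. enter low-power state */
		return 0;
	}

	/* Sample driver callbacks used to exercise the model. */
	static int drv_suspend_ok(struct model_dev *dev)
	{
		(void)dev;
		return 0;
	}

	static int drv_suspend_busy(struct model_dev *dev)
	{
		(void)dev;
		return -EBUSY;
	}

With drv_suspend_busy() installed, model_runtime_suspend() returns -EBUSY and
none of the three flags is set, which mirrors the rule that the PCI-level steps
only run after the driver's callback has succeeded.
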
The low-power state to put the device into is the lowest-power (highest number)
state from which it can signal wakeup.  The exact method of signaling wakeup is
system-dependent and is determined by the PCI subsystem on the basis of the
reported capabilities of the device and the platform firmware.  To prepare the
device for signaling wakeup and put it into the selected low-power state, the
PCI subsystem can use the platform firmware as well as the device's native PCI
PM capabilities, if supported.

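The selection rule ("highest-numbered state from which wakeup can be signaled")
can be illustrated with a short, self-contained sketch.  The enum, the bit mask
encoding and the function name below are assumptions made for this example; in
particular, the mask is not the layout of the PCI PM capability register, and
the kernel's own selection logic also consults the platform firmware::

	/* PCI power states, numbered from shallowest to deepest (model only). */
	enum model_pci_state {
		MODEL_D0,
		MODEL_D1,
		MODEL_D2,
		MODEL_D3HOT,
		MODEL_D3COLD,
	};

	/*
	 * wake_mask: bit N set means the device can signal wakeup from
	 * state N (hypothetical encoding).  Returns the deepest such state,
	 * or -1 if the device cannot signal wakeup from any state.
	 */
	static int deepest_wakeup_state(unsigned int wake_mask)
	{
		int state;

		for (state = MODEL_D3COLD; state >= MODEL_D0; state--) {
			if (wake_mask & (1u << state))
				return state;
		}
		return -1;
	}

For a device that can signal wakeup from D0 and D3hot only, the function
returns MODEL_D3HOT, because D3hot is the deeper of the two states.
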
It is expected that the device driver's pm->runtime_suspend() callback will
not attempt to prepare the device for signaling wakeup or to put it into a
low-power state.  The driver ought to leave these tasks to the PCI subsystem
that has all of the information necessary to perform them.

A suspended device is brought back into the "active" state, or resumed,
with the help of pm_request_resume() or pm_runtime_resume() which both call
pci_pm_runtime_resume() for PCI devices.  Again, this only works if the device's
driver provides a pm->runtime_resume() callback (see below).  However, before
the driver's callback is executed, pci_pm_runtime_resume() brings the device
back into the full-power state, prevents it from signaling wakeup while in that
state and restores its standard configuration registers.  Thus the driver's
callback need not worry about the PCI-specific aspects of the device resume.

Note that generally pci_pm_runtime_resume() may be called in two different
situations.  First, it may be called at the request of the device's driver, for
example if there are some data for it to process.  Second, it may be called
as a result of a wakeup signal from the device itself (this is sometimes
referred to as "remote wakeup").  Of course, for this purpose the wakeup signal
is handled in one of the ways described in Section 1 and finally converted into
a notification for the PCI subsystem after the source device has been
identified.

The pci_pm_runtime_idle() function, called for PCI devices by pm_runtime_idle()
and pm_request_idle(), executes the device driver's pm->runtime_idle()
callback, if defined, and if that callback doesn't return an error code (or is
not present at all), suspends the device with the help of pm_runtime_suspend().
Sometimes pci_pm_runtime_idle() is called automatically by the PM core (for
example, it is called right after the device has just been resumed), in which
case it is expected to suspend the device if that makes sense.  Usually,
however, the PCI subsystem doesn't know whether the device can really be
suspended, so it lets the device's driver decide by running its
pm->runtime_idle() callback.

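The idle handling described above amounts to "ask the driver, then suspend
unless it objects".  A minimal userspace sketch of that decision follows; the
structure, the users counter and all function names are hypothetical and only
model the behavior stated in the text::

	#include <errno.h>
	#include <stdbool.h>
	#include <stddef.h>

	/* Hypothetical model of a device; not the kernel's struct pci_dev. */
	struct model_dev {
		/* Driver's idle callback, may be NULL. */
		int (*runtime_idle)(struct model_dev *dev);
		int users;		/* made-up "still in use" counter */
		bool suspended;
	};

	static int model_runtime_idle(struct model_dev *dev)
	{
		if (dev->runtime_idle) {
			int ret = dev->runtime_idle(dev);

			if (ret)
				return ret;	/* the driver vetoed the suspend */
		}

		/* No callback, or it returned 0: suspend the device. */
		dev->suspended = true;
		return 0;
	}

	/* Sample driver callback: refuse while the device has users. */
	static int drv_runtime_idle(struct model_dev *dev)
	{
		return dev->users ? -EBUSY : 0;
	}

A device without an idle callback is suspended unconditionally in this model,
while a driver that returns -EBUSY from its callback keeps the device active.
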
2.4. System-Wide Power Transitions
----------------------------------

There are a few different types of system-wide power transitions, described in
Documentation/driver-api/pm/devices.rst.  Each of them requires devices to be
handled in a specific way and the PM core executes subsystem-level power
management callbacks for this purpose.  They are executed in phases such that
each phase involves executing the same subsystem-level callback for every device
belonging to the given subsystem before the next phase begins.  These phases
always run after tasks have been frozen.

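The phase ordering can be pictured as two nested loops, with phases on the
outside and devices on the inside.  The following userspace sketch shows only
that ordering; all names in it are hypothetical, and it ignores details such as
the order in which devices are visited within a phase and error recovery::

	#include <stddef.h>

	/* Hypothetical stand-in for a device. */
	struct model_dev {
		int id;
	};

	/* Hypothetical subsystem-level callback type, 0 means success. */
	typedef int (*phase_fn)(struct model_dev *dev);

	/* Trace of calls, recorded as phase number * 10 + device id. */
	static int trace[16];
	static size_t trace_len;

	static int record(int phase, struct model_dev *dev)
	{
		if (trace_len < sizeof(trace) / sizeof(trace[0]))
			trace[trace_len++] = phase * 10 + dev->id;
		return 0;
	}

	static int model_prepare(struct model_dev *dev)
	{
		return record(1, dev);
	}

	static int model_suspend(struct model_dev *dev)
	{
		return record(2, dev);
	}

	static int model_suspend_noirq(struct model_dev *dev)
	{
		return record(3, dev);
	}

	/* Finish each phase for all devices before starting the next one. */
	static int run_transition(const phase_fn *phases, size_t nr_phases,
				  struct model_dev *devs, size_t nr_devs)
	{
		size_t p, d;

		for (p = 0; p < nr_phases; p++) {
			for (d = 0; d < nr_devs; d++) {
				int ret = phases[p](&devs[d]);

				if (ret)
					return ret;
			}
		}
		return 0;
	}

Running the three sample phases over two devices with ids 0 and 1 fills the
trace with 10, 11, 20, 21, 30, 31: no device reaches the second phase until
both have completed the first.
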
2.4.1. System Suspend
^^^^^^^^^^^^^^^^^^^^^

When the system is going into a sleep state in which the contents of memory will
be preserved, such as one of the ACPI sleep states S1-S3, the phases are:

	prepare, suspend, suspend_noirq.

The following PCI bus type's callbacks, respectively, are used in these phases::

	pci_pm_prepare()
	pci_pm_suspend()
	pci_pm_suspend_noirq()

The pci_pm_prepare() routine first puts the device into the "fully functional"
state with the help of pm_runtime_resume().  Then, it executes the device
driver's pm->prepare() callback if defined (i.e. if the driver's struct
dev_pm_ops object is present and the prepare pointer in that object is valid).

The pci_pm_suspend() routine first checks if the device's driver implements
legacy PCI suspend routines (see Section 3), in which case the driver's legacy
suspend callback is executed, if present, and its result is returned.  Next, if
the device's driver doesn't provide a struct dev_pm_ops object (containing
pointers to the driver's callbacks), pci_pm_default_suspend() is called, which
simply turns off the device's bus master capability and runs
pcibios_disable_device() to disable it, unless the device is a bridge (PCI
bridges are ignored by this routine).  Next, the device driver's pm->suspend()
callback is executed, if defined, and its result is returned if it fails.
Finally, pci_fixup_device() is called to apply hardware suspend quirks related
to the device if necessary.

Note that the suspend phase is carried out asynchronously for PCI devices, so
the pci_pm_suspend() callback may be executed in parallel for any pair of PCI
devices that don't depend on each other in a known way (i.e. none of the paths
in the device tree from the root bridge to a leaf device contains both of them).

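The condition in parentheses is equivalent to saying that neither device is an
ancestor of the other in the device tree.  A minimal sketch of such a check,
using a hypothetical node type with a parent pointer, could look as follows
(it covers only the tree-based dependency named in the text)::

	#include <stdbool.h>
	#include <stddef.h>

	/* Hypothetical tree node; parent is NULL for the root bridge. */
	struct model_dev {
		struct model_dev *parent;
	};

	/* True if 'anc' lies on the path from the root down to 'dev'. */
	static bool is_ancestor(const struct model_dev *anc,
				const struct model_dev *dev)
	{
		for (dev = dev->parent; dev; dev = dev->parent) {
			if (dev == anc)
				return true;
		}
		return false;
	}

	/* Distinct devices are independent if neither is above the other. */
	static bool may_run_in_parallel(const struct model_dev *a,
					const struct model_dev *b)
	{
		return a != b && !is_ancestor(a, b) && !is_ancestor(b, a);
	}

For example, two endpoints behind different bridges pass this check, whereas a
bridge and an endpoint below it do not, so the endpoint has to be suspended
before the bridge.
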
The pci_pm_suspend_noirq() routine is executed after suspend_device_irqs() has
been called, which means that the device driver's interrupt handler won't be
invoked while this routine is running.  It first checks if the device's driver
implements legacy PCI suspend routines (Section 3), in which case the legacy
late suspend routine is called and its result is returned (the standard
configuration registers of the device are saved if the driver's callback hasn't
done that).  Second, if the device driver's struct dev_pm_ops object is not
present, the device's standard configuration registers are saved and the routine
returns success.  Otherwise the device driver's pm->suspend_noirq() callback is
executed, if present, and its result is returned if it fails.  Next, if the
device's standard configuration registers haven't been saved yet (one of the
device driver's callbacks executed before might do that), pci_pm_suspend_noirq()
saves them, prepares the device to signal wakeup (if necessary) and puts it into
a low-power state.

The low-power state to put the device into is the lowest-power (highest number)
state from which it can signal wakeup while the system is in the target sleep
state.  Just like in the runtime PM case described above, the mechanism of
signaling wakeup is system-dependent and determined by the PCI subsystem, which
is also responsible for preparing the device to signal wakeup from the system's
target sleep state as appropriate.

PCI device drivers (that don't implement legacy power management callbacks) are
generally not expected to prepare devices for signaling wakeup or to put them
into low-power states.  However, if one of the driver's suspend callbacks
(pm->suspend() or pm->suspend_noirq()) saves the device's standard configuration
registers, pci_pm_suspend_noirq() will assume that the device has been prepared
to signal wakeup and put into a low-power state by the driver (the driver is
then assumed to have used the helper functions provided by the PCI subsystem for
this purpose).  PCI device drivers are not encouraged to do that, but in some
rare cases doing that in the driver may be the optimum approach.

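The hand-over rule in the last paragraph can be modeled in a few lines of
userspace C.  The sketch below is not kernel code; the structure, its flags and
the sample callbacks are hypothetical and exist only to show what "saving the
registers means taking over the rest" implies for a driver::

	#include <stdbool.h>
	#include <stddef.h>

	/* Hypothetical model of a device; not the kernel's struct pci_dev. */
	struct model_dev {
		/* Driver's late suspend callback, may be NULL. */
		int (*suspend_noirq)(struct model_dev *dev);
		bool state_saved;	/* config registers were saved */
		bool low_power;		/* device was put into low power */
		bool handled_by_subsystem;
	};

	static int model_suspend_noirq(struct model_dev *dev)
	{
		if (dev->suspend_noirq) {
			int ret = dev->suspend_noirq(dev);

			if (ret)
				return ret;
		}

		/* Driver saved the state: assume it did everything else too. */
		if (dev->state_saved)
			return 0;

		dev->state_saved = true;
		dev->low_power = true;
		dev->handled_by_subsystem = true;
		return 0;
	}

	/* Driver that takes over completely. */
	static int drv_takes_over(struct model_dev *dev)
	{
		dev->state_saved = true;
		dev->low_power = true;
		return 0;
	}

	/* Driver that saves the state but never powers the device down. */
	static int drv_saves_only(struct model_dev *dev)
	{
		dev->state_saved = true;
		return 0;
	}

With drv_saves_only() the modeled device stays at full power, because the
subsystem-side steps are skipped as soon as the saved-state flag is set.  This
is the reason a driver that saves the registers itself also has to carry out
the remaining steps.
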
2.4.2. System Resume
^^^^^^^^^^^^^^^^^^^^

When the system is undergoing a transition from a sleep state in which the
contents of memory have been preserved, such as one of the ACPI sleep states
S1-S3, into the working state (ACPI S0), the phases are:

	resume_noirq, resume, complete.

The following PCI bus type's callbacks, respectively, are executed in these
phases::

	pci_pm_resume_noirq()
	pci_pm_resume()
	pci_pm_complete()

The pci_pm_resume_noirq() routine first puts the device into the full-power
state, restores its standard configuration registers and applies early resume
hardware quirks related to the device, if necessary.  This is done
unconditionally, regardless of whether or not the device's driver implements
legacy PCI power management callbacks (this way all PCI devices are in the
full-power state and their standard configuration registers have been restored
when their interrupt handlers are invoked for the first time during resume,
which allows the kernel to avoid problems with the handling of shared interrupts
by drivers whose devices are still suspended).  If legacy PCI power management
callbacks (see Section 3) are implemented by the device's driver, the legacy
early resume callback is executed and its result is returned.  Otherwise, the
device driver's pm->resume_noirq() callback is executed, if defined, and its
result is returned.

The pci_pm_resume() routine first checks if the device's standard configuration
registers have been restored and restores them if that's not the case (this
is only necessary in the error path during a failing suspend).  Next, resume
hardware quirks related to the device are applied, if necessary, and if the
device's driver implements legacy PCI power management callbacks (see
Section 3), the driver's legacy resume callback is executed and its result is
returned.  Otherwise, the device's wakeup signaling mechanisms are blocked and
its driver's pm->resume() callback is executed, if defined (the callback's
result is then returned).

The resume phase is carried out asynchronously for PCI devices, like the
suspend phase described above, which means that if two PCI devices don't depend
on each other in a known way, the pci_pm_resume() routine may be executed for
both of them in parallel.

The pci_pm_complete() routine only executes the device driver's pm->complete()
callback, if defined.

2.4.3. System Hibernation
^^^^^^^^^^^^^^^^^^^^^^^^^

System hibernation is more complicated than system suspend, because it requires
a system image to be created and written into a persistent storage medium.  The
image is created atomically and all devices are quiesced, or frozen, before that
happens.

The freezing of devices is carried out after enough memory has been freed (at
the time of this writing the image creation requires at least 50% of system RAM
to be free) in the following three phases:

	prepare, freeze, freeze_noirq

that correspond to the PCI bus type's callbacks::

	pci_pm_prepare()
	pci_pm_freeze()
	pci_pm_freeze_noirq()

This means that the prepare phase is exactly the same as for system suspend.
The other two phases, however, are different.

The pci_pm_freeze() routine is quite similar to pci_pm_suspend(), but it runs
the device driver's pm->freeze() callback, if defined, instead of pm->suspend(),
and it doesn't apply the suspend-related hardware quirks.  It is executed
asynchronously for different PCI devices that don't depend on each other in a
known way.

The pci_pm_freeze_noirq() routine, in turn, is similar to
pci_pm_suspend_noirq(), but it calls the device driver's pm->freeze_noirq()
routine instead of pm->suspend_noirq().  It also doesn't attempt to prepare the
device for signaling wakeup and put it into a low-power state.  Still, it saves
the device's standard configuration registers if they haven't been saved by one
of the driver's callbacks.

Once the image has been created, it has to be saved.  However, at this point all
devices are frozen and they cannot handle I/O, while their ability to handle
I/O is obviously necessary for the image saving.  Thus they have to be brought
back to the fully functional state and this is done in the following phases:

	thaw_noirq, thaw, complete

using the following PCI bus type's callbacks::

	pci_pm_thaw_noirq()
	pci_pm_thaw()
	pci_pm_complete()

respectively.

The first of them, pci_pm_thaw_noirq(), is analogous to pci_pm_resume_noirq().
It puts the device into the full power state and restores its standard
configuration registers.  It also executes the device driver's pm->thaw_noirq()
callback, if defined, instead of pm->resume_noirq().

The pci_pm_thaw() routine is similar to pci_pm_resume(), but it runs the device
driver's pm->thaw() callback instead of pm->resume().  It is executed
asynchronously for different PCI devices that don't depend on each other in a
known way.

The complete phase is the same as for system resume.

After saving the image, devices need to be powered down before the system can
enter the target sleep state (ACPI S4 for ACPI-based systems).  This is done in
three phases:

	prepare, poweroff, poweroff_noirq

where the prepare phase is exactly the same as for system suspend.  The other
two phases are analogous to the suspend and suspend_noirq phases, respectively.
The PCI subsystem-level callbacks they correspond to::

	pci_pm_poweroff()
	pci_pm_poweroff_noirq()

work in analogy with pci_pm_suspend() and pci_pm_suspend_noirq(), respectively,
although they don't attempt to save the device's standard configuration
registers.

2.4.4. System Restore
^^^^^^^^^^^^^^^^^^^^^

System restore requires a hibernation image to be loaded into memory and the
pre-hibernation memory contents to be restored before the pre-hibernation system
activity can be resumed.

As described in Documentation/driver-api/pm/devices.rst, the hibernation image
is loaded into memory by a fresh instance of the kernel, called the boot kernel,
which in turn is loaded and run by a boot loader in the usual way.  After the
boot kernel has loaded the image, it needs to replace its own code and data with
the code and data of the "hibernated" kernel stored within the image, called the
image kernel.  For this purpose all devices are frozen just like before creating
the image during hibernation, in the

	prepare, freeze, freeze_noirq

phases described above.  However, the devices affected by these phases are only
those having drivers in the boot kernel; other devices will still be in whatever
state the boot loader left them.

Should the restoration of the pre-hibernation memory contents fail, the boot
kernel would go through the "thawing" procedure described above, using the
thaw_noirq, thaw, and complete phases (that will only affect the devices having
drivers in the boot kernel), and then continue running normally.

If the pre-hibernation memory contents are restored successfully, which is the
usual situation, control is passed to the image kernel, which then becomes
responsible for bringing the system back to the working state.  To achieve this,
it must restore the devices' pre-hibernation functionality, which is done much
like waking up from the memory sleep state, although it involves different
phases:

	restore_noirq, restore, complete

The first two of these are analogous to the resume_noirq and resume phases
described above, respectively, and correspond to the following PCI subsystem
callbacks::

	pci_pm_restore_noirq()
	pci_pm_restore()

These callbacks work in analogy with pci_pm_resume_noirq() and pci_pm_resume(),
respectively, but they execute the device driver's pm->restore_noirq() and
pm->restore() callbacks, if available.

The complete phase is carried out in exactly the same way as during system
resume.


3. PCI Device Drivers and Power Management
==========================================

3.1. Power Management Callbacks
-------------------------------

PCI device drivers participate in power management by providing callbacks to be
executed by the PCI subsystem's power management routines described above and by
controlling the runtime power management of their devices.

At the time of this writing there are two ways to define power management
callbacks for a PCI device driver: the recommended one, based on using a
dev_pm_ops structure described in Documentation/driver-api/pm/devices.rst, and
the "legacy" one, in which the .suspend() and .resume() callbacks from struct
pci_driver are used.  The legacy approach, however, doesn't allow one to define
runtime power management callbacks and is not really suitable for any new
drivers.  Therefore it is not covered by this document (refer to the source code
to learn more about it).

It is recommended that all PCI device drivers define a struct dev_pm_ops object
containing pointers to power management (PM) callbacks that will be executed by
the PCI subsystem's PM routines in various circumstances.  A pointer to the
driver's struct dev_pm_ops object has to be assigned to the driver.pm field in
its struct pci_driver object.  Once that has happened, the "legacy" PM callbacks
in struct pci_driver are ignored (even if they are not NULL).

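The precedence rule stated above (a callback table, once assigned, hides the
legacy callbacks) can be shown with a cut-down userspace model.  All types and
names below are hypothetical simplifications made for this example; they follow
the wording of this section rather than the kernel sources, and the kernel's
struct dev_pm_ops and struct pci_driver contain many more members::

	#include <stddef.h>

	struct model_dev;

	/* Cut-down model of a PM callback table. */
	struct model_pm_ops {
		int (*suspend)(struct model_dev *dev);
		int (*resume)(struct model_dev *dev);
	};

	/* Cut-down model of a driver with both kinds of callbacks. */
	struct model_driver {
		int (*legacy_suspend)(struct model_dev *dev);
		int (*legacy_resume)(struct model_dev *dev);
		const struct model_pm_ops *pm;	/* if set, legacy is ignored */
	};

	struct model_dev {
		const struct model_driver *drv;
		int last_called;	/* 1 = legacy suspend, 2 = pm->suspend */
	};

	static int model_bus_suspend(struct model_dev *dev)
	{
		const struct model_driver *drv = dev->drv;

		if (!drv)
			return 0;	/* driverless device */
		if (drv->pm)
			return drv->pm->suspend ? drv->pm->suspend(dev) : 0;
		if (drv->legacy_suspend)
			return drv->legacy_suspend(dev);
		return 0;
	}

	/* Sample callbacks that record which path was taken. */
	static int legacy_suspend_cb(struct model_dev *dev)
	{
		dev->last_called = 1;
		return 0;
	}

	static int pm_suspend_cb(struct model_dev *dev)
	{
		dev->last_called = 2;
		return 0;
	}

A driver that fills in both legacy_suspend and pm ends up with only
pm_suspend_cb() being called.  If the table is present but its suspend member
is unset, neither callback runs, and the subsystem would fall back to its
default handling.
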
The PM callbacks in struct dev_pm_ops are not mandatory and if they are not
defined (i.e. the respective fields of struct dev_pm_ops are unset) the PCI
subsystem will handle the device in a simplified default manner.  If they are
defined, though, they are expected to behave as described in the following
subsections.

3.1.1. prepare()
^^^^^^^^^^^^^^^^

The prepare() callback is executed during system suspend, during hibernation
(when a hibernation image is about to be created), during power-off after
saving a hibernation image and during system restore, when a hibernation image
has just been loaded into memory.

This callback is only necessary if the driver's device has children that in
general may be registered at any time.  In that case the role of the prepare()
callback is to prevent new children of the device from being registered until
one of the resume_noirq(), thaw_noirq(), or restore_noirq() callbacks is run.

In addition to that the prepare() callback may carry out some operations
preparing the device to be suspended, although it should not allocate memory
(if additional memory is required to suspend the device, it has to be
preallocated earlier, for example in a suspend/hibernate notifier as described
in Documentation/driver-api/pm/notifiers.rst).

3.1.2. suspend()
^^^^^^^^^^^^^^^^

The suspend() callback is only executed during system suspend, after prepare()
callbacks have been executed for all devices in the system.

This callback is expected to quiesce the device and prepare it to be put into a
low-power state by the PCI subsystem.  It is not required (in fact it is not
even recommended) that a PCI driver's suspend() callback save the standard
configuration registers of the device, prepare it for waking up the system, or
put it into a low-power state.  All of these operations can very well be taken
care of by the PCI subsystem, without the driver's participation.

However, in some rare cases it is convenient to carry out these operations in
a PCI driver.  Then, pci_save_state(), pci_prepare_to_sleep(), and
pci_set_power_state() should be used to save the device's standard configuration
registers, to prepare it for system wakeup (if necessary), and to put it into a
low-power state, respectively.  Moreover, if the driver calls pci_save_state(),
the PCI subsystem will execute neither pci_prepare_to_sleep() nor
pci_set_power_state() for its device, so the driver is then responsible for
handling the device as appropriate.

While the suspend() callback is being executed, the driver's interrupt handler
can be invoked to handle an interrupt from the device, so all suspend-related
operations relying on the driver's ability to handle interrupts should be
carried out in this callback.

760151f4e2bSMauro Carvalho Chehab3.1.3. suspend_noirq()
761151f4e2bSMauro Carvalho Chehab^^^^^^^^^^^^^^^^^^^^^^
762151f4e2bSMauro Carvalho Chehab
763151f4e2bSMauro Carvalho ChehabThe suspend_noirq() callback is only executed during system suspend, after
764151f4e2bSMauro Carvalho Chehabsuspend() callbacks have been executed for all devices in the system and
765151f4e2bSMauro Carvalho Chehabafter device interrupts have been disabled by the PM core.
766151f4e2bSMauro Carvalho Chehab
767151f4e2bSMauro Carvalho ChehabThe difference between suspend_noirq() and suspend() is that the driver's
768151f4e2bSMauro Carvalho Chehabinterrupt handler will not be invoked while suspend_noirq() is running.  Thus
769151f4e2bSMauro Carvalho Chehabsuspend_noirq() can carry out operations that would cause race conditions to
770151f4e2bSMauro Carvalho Chehabarise if they were performed in suspend().

3.1.4. freeze()
^^^^^^^^^^^^^^^

The freeze() callback is hibernation-specific and is executed in two situations:
during hibernation, after prepare() callbacks have been executed for all devices
in preparation for the creation of a system image, and during restore,
after a system image has been loaded into memory from persistent storage and the
prepare() callbacks have been executed for all devices.

The role of this callback is analogous to the role of the suspend() callback
described above.  In fact, they only need to be different in the rare cases when
the driver takes the responsibility for putting the device into a low-power
state.

In those cases the freeze() callback should not prepare the device for system
wakeup or put it into a low-power state.  Still, either it or freeze_noirq()
should save the device's standard configuration registers using
pci_save_state().

3.1.5. freeze_noirq()
^^^^^^^^^^^^^^^^^^^^^

The freeze_noirq() callback is hibernation-specific.  It is executed during
hibernation, after prepare() and freeze() callbacks have been executed for all
devices in preparation for the creation of a system image, and during restore,
after a system image has been loaded into memory and after prepare() and
freeze() callbacks have been executed for all devices.  It is always executed
after device interrupts have been disabled by the PM core.

The role of this callback is analogous to the role of the suspend_noirq()
callback described above, and it is very rarely necessary to define
freeze_noirq().

The difference between freeze_noirq() and freeze() is analogous to the
difference between suspend_noirq() and suspend().

3.1.6. poweroff()
^^^^^^^^^^^^^^^^^

The poweroff() callback is hibernation-specific.  It is executed when the system
is about to be powered off after saving a hibernation image to persistent
storage.  prepare() callbacks are executed for all devices before poweroff() is
called.

The role of this callback is analogous to the role of the suspend() and freeze()
callbacks described above, although it does not need to save the contents of
the device's registers.  In particular, if the driver wants to put the device
into a low-power state itself instead of allowing the PCI subsystem to do that,
the poweroff() callback should use pci_prepare_to_sleep() and
pci_set_power_state() to prepare the device for system wakeup and to put it
into a low-power state, respectively, but it need not save the device's standard
configuration registers.

3.1.7. poweroff_noirq()
^^^^^^^^^^^^^^^^^^^^^^^

The poweroff_noirq() callback is hibernation-specific.  It is executed after
poweroff() callbacks have been executed for all devices in the system.

The role of this callback is analogous to the role of the suspend_noirq() and
freeze_noirq() callbacks described above, but it does not need to save the
contents of the device's registers.

The difference between poweroff_noirq() and poweroff() is analogous to the
difference between suspend_noirq() and suspend().

3.1.8. resume_noirq()
^^^^^^^^^^^^^^^^^^^^^

The resume_noirq() callback is only executed during system resume, after the
PM core has enabled the non-boot CPUs.  The driver's interrupt handler will not
be invoked while resume_noirq() is running, so this callback can carry out
operations that might race with the interrupt handler.

Since the PCI subsystem unconditionally puts all devices into the full power
state in the resume_noirq phase of system resume and restores their standard
configuration registers, resume_noirq() is usually not necessary.  In general
it should only be used for performing operations that would lead to race
conditions if carried out by resume().

3.1.9. resume()
^^^^^^^^^^^^^^^

The resume() callback is only executed during system resume, after
resume_noirq() callbacks have been executed for all devices in the system and
device interrupts have been enabled by the PM core.

This callback is responsible for restoring the pre-suspend configuration of the
device and bringing it back to the fully functional state.  The device should be
able to process I/O in the usual way after resume() has returned.

3.1.10. thaw_noirq()
^^^^^^^^^^^^^^^^^^^^

The thaw_noirq() callback is hibernation-specific.  It is executed after a
system image has been created and the non-boot CPUs have been enabled by the PM
core, in the thaw_noirq phase of hibernation.  It may also be executed if the
loading of a hibernation image fails during system restore (it is then executed
after enabling the non-boot CPUs).  The driver's interrupt handler will not be
invoked while thaw_noirq() is running.

The role of this callback is analogous to the role of resume_noirq().  The
difference between these two callbacks is that thaw_noirq() is executed after
freeze() and freeze_noirq(), so in general it does not need to modify the
contents of the device's registers.

3.1.11. thaw()
^^^^^^^^^^^^^^

The thaw() callback is hibernation-specific.  It is executed after thaw_noirq()
callbacks have been executed for all devices in the system and after device
interrupts have been enabled by the PM core.

This callback is responsible for restoring the pre-freeze configuration of
the device, so that it will work in the usual way after thaw() has returned.

3.1.12. restore_noirq()
^^^^^^^^^^^^^^^^^^^^^^^

The restore_noirq() callback is hibernation-specific.  It is executed in the
restore_noirq phase of hibernation, when the boot kernel has passed control to
the image kernel and the non-boot CPUs have been enabled by the image kernel's
PM core.

This callback is analogous to resume_noirq() with the exception that it cannot
make any assumptions about the previous state of the device, even if the BIOS
(or generally the platform firmware) is known to preserve that state over a
suspend-resume cycle.

For the vast majority of PCI device drivers there is no difference between
resume_noirq() and restore_noirq().

3.1.13. restore()
^^^^^^^^^^^^^^^^^

The restore() callback is hibernation-specific.  It is executed after
restore_noirq() callbacks have been executed for all devices in the system and
after the PM core has enabled device drivers' interrupt handlers to be invoked.

This callback is analogous to resume(), just like restore_noirq() is analogous
to resume_noirq().  Consequently, the difference between restore_noirq() and
restore() is analogous to the difference between resume_noirq() and resume().

For the vast majority of PCI device drivers there is no difference between
resume() and restore().

3.1.14. complete()
^^^^^^^^^^^^^^^^^^

The complete() callback is executed in the following situations:

  - during system resume, after resume() callbacks have been executed for all
    devices,
  - during hibernation, before saving the system image, after thaw() callbacks
    have been executed for all devices,
  - during system restore, when the system is going back to its pre-hibernation
    state, after restore() callbacks have been executed for all devices.

It may also be executed if the loading of a hibernation image into memory fails
(in that case it is run after thaw() callbacks have been executed for all
devices that have drivers in the boot kernel).

This callback is entirely optional, although it may be necessary if the
prepare() callback performs operations that need to be reversed.
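
The ordering rules spread over the preceding subsections can be collected into
one place.  The following user-space sketch is a simplified model assembled
from those descriptions only; it leaves out any phases this section does not
describe, and the array and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* System suspend followed by system resume. */
const char *const model_suspend_resume_order[] = {
	"prepare", "suspend", "suspend_noirq",
	"resume_noirq", "resume", "complete", NULL
};

/* Hibernation, part 1: creating the system image. */
const char *const model_hibernation_image_order[] = {
	"prepare", "freeze", "freeze_noirq",
	"thaw_noirq", "thaw", "complete", NULL
};

/* Hibernation, part 2: powering off after the image has been saved. */
const char *const model_hibernation_poweroff_order[] = {
	"prepare", "poweroff", "poweroff_noirq", NULL
};

/* Restore: the boot kernel freezes devices, the image kernel restores them. */
const char *const model_restore_order[] = {
	"prepare", "freeze", "freeze_noirq",
	"restore_noirq", "restore", "complete", NULL
};

/* Return the position of a callback in a sequence, or -1 if it is absent. */
int model_phase_index(const char *const *order, const char *name)
{
	for (int i = 0; order[i] != NULL; i++)
		if (strcmp(order[i], name) == 0)
			return i;
	return -1;
}
```

A callback with a smaller index in a given sequence has finished for all
devices before the next one starts.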

3.1.15. runtime_suspend()
^^^^^^^^^^^^^^^^^^^^^^^^^

The runtime_suspend() callback is specific to device runtime power management
(runtime PM).  It is executed by the PM core's runtime PM framework when the
device is about to be suspended (i.e. quiesced and put into a low-power state)
at run time.

This callback is responsible for freezing the device and preparing it to be
put into a low-power state, but it must allow the PCI subsystem to perform all
of the PCI-specific actions necessary for suspending the device.

3.1.16. runtime_resume()
^^^^^^^^^^^^^^^^^^^^^^^^

The runtime_resume() callback is specific to device runtime PM.  It is executed
by the PM core's runtime PM framework when the device is about to be resumed
(i.e. put into the full-power state and programmed to process I/O normally) at
run time.

This callback is responsible for restoring the normal functionality of the
device after it has been put into the full-power state by the PCI subsystem.
The device is expected to be able to process I/O in the usual way after
runtime_resume() has returned.

3.1.17. runtime_idle()
^^^^^^^^^^^^^^^^^^^^^^

The runtime_idle() callback is specific to device runtime PM.  It is executed
by the PM core's runtime PM framework whenever it may be desirable to suspend
the device according to the PM core's information.  In particular, it is
automatically executed right after runtime_resume() has returned in case the
resume of the device has happened as a result of a spurious event.

This callback is optional, but if it is not implemented or if it returns 0, the
PCI subsystem will call pm_runtime_suspend() for the device, which in turn will
cause the driver's runtime_suspend() callback to be executed.
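
The rule for runtime_idle() can be sketched as a user-space model.  The types
and functions below are hypothetical stand-ins, and returning -EBUSY to veto
the suspend is an assumption chosen for illustration; the text only requires a
nonzero return value.

```c
#include <errno.h>
#include <stddef.h>

struct model_dev;

/* Hypothetical subset of struct dev_pm_ops used by this model. */
struct model_pm_ops {
	int (*runtime_idle)(struct model_dev *dev);
	int (*runtime_suspend)(struct model_dev *dev);
};

struct model_dev {
	const struct model_pm_ops *ops;
	int busy;        /* driver-private: nonzero while work is pending */
	int suspended;   /* 1 after runtime_suspend() has succeeded */
};

int model_runtime_suspend(struct model_dev *dev)
{
	dev->suspended = 1;
	return 0;
}

/* Veto the suspend while the device is busy by returning nonzero. */
int model_runtime_idle(struct model_dev *dev)
{
	return dev->busy ? -EBUSY : 0;
}

/*
 * Model of the idle check: the device is suspended only if runtime_idle()
 * is not implemented or returns 0.
 */
int model_handle_idle(struct model_dev *dev)
{
	int ret = 0;

	if (dev->ops->runtime_idle)
		ret = dev->ops->runtime_idle(dev);
	if (ret != 0)
		return ret;
	return dev->ops->runtime_suspend(dev);
}
```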

3.1.18. Pointing Multiple Callback Pointers to One Routine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Although in principle each of the callbacks described in the previous
subsections can be defined as a separate function, it is often convenient to
point two or more members of struct dev_pm_ops to the same routine.  There are
a few convenience macros that can be used for this purpose.

The SIMPLE_DEV_PM_OPS macro declares a struct dev_pm_ops object with one
suspend routine pointed to by the .suspend(), .freeze(), and .poweroff()
members and one resume routine pointed to by the .resume(), .thaw(), and
.restore() members.  The other function pointers in this struct dev_pm_ops are
unset.

The UNIVERSAL_DEV_PM_OPS macro is similar to SIMPLE_DEV_PM_OPS, but it
additionally sets the .runtime_resume() pointer to the same value as
.resume() (and .thaw(), and .restore()) and the .runtime_suspend() pointer to
the same value as .suspend() (and .freeze() and .poweroff()).

The SET_SYSTEM_SLEEP_PM_OPS macro can be used inside of a declaration of struct
dev_pm_ops to indicate that one suspend routine is to be pointed to by the
.suspend(), .freeze(), and .poweroff() members and one resume routine is to
be pointed to by the .resume(), .thaw(), and .restore() members.
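
The effect of these macros can be reproduced in user space.  The ``MODEL_``
macros and the trimmed-down structure below are illustrative re-creations
written for this sketch, not the kernel definitions; the extra idle argument
of the "universal" variant is an assumption about how the runtime_idle()
pointer gets filled in, and foo_suspend()/foo_resume() are hypothetical driver
routines.

```c
#include <stddef.h>

struct model_device;   /* opaque stand-in for struct device */
typedef int (*model_pm_cb)(struct model_device *dev);

/* Subset of the struct dev_pm_ops members discussed in this section. */
struct model_dev_pm_ops {
	model_pm_cb suspend, resume, freeze, thaw, poweroff, restore;
	model_pm_cb runtime_suspend, runtime_resume, runtime_idle;
};

/* One suspend routine and one resume routine for all system sleep slots. */
#define MODEL_SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
	.suspend = suspend_fn, .resume = resume_fn, \
	.freeze = suspend_fn, .thaw = resume_fn, \
	.poweroff = suspend_fn, .restore = resume_fn,

/* Declares an object; members not listed stay NULL. */
#define MODEL_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \
	const struct model_dev_pm_ops name = { \
		MODEL_SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
	}

/* Same as above, plus the runtime PM pointers. */
#define MODEL_UNIVERSAL_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn) \
	const struct model_dev_pm_ops name = { \
		MODEL_SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
		.runtime_suspend = suspend_fn, \
		.runtime_resume = resume_fn, \
		.runtime_idle = idle_fn, \
	}

int foo_suspend(struct model_device *dev) { (void)dev; return 0; }
int foo_resume(struct model_device *dev) { (void)dev; return 0; }

MODEL_SIMPLE_DEV_PM_OPS(foo_simple_pm_ops, foo_suspend, foo_resume);
MODEL_UNIVERSAL_DEV_PM_OPS(foo_universal_pm_ops, foo_suspend, foo_resume, NULL);
```

Sharing one routine this way only makes sense when the driver lets the PCI
subsystem handle the PCI-specific steps, because that is the case in which
suspend(), freeze() and poweroff() do not need to differ.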

3.1.19. Driver Flags for Power Management
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The PM core allows device drivers to set flags that influence the handling of
power management for the devices by the core itself and by middle layer code
including the PCI bus type.  The flags should be set once at driver probe
time with the help of the dev_pm_set_driver_flags() function and they should not
be updated directly afterwards.

The DPM_FLAG_NO_DIRECT_COMPLETE flag prevents the PM core from using the
direct-complete mechanism, which allows device suspend/resume callbacks to be
skipped if the device is in runtime suspend when the system suspend starts.
That also affects all of the ancestors of the device, so this flag should only
be used if absolutely necessary.

The DPM_FLAG_SMART_PREPARE flag causes the PCI bus type to return a positive
value from pci_pm_prepare() only if the ->prepare callback provided by the
driver of the device returns a positive value.  That allows the driver to opt
out of using the direct-complete mechanism dynamically (whereas setting
DPM_FLAG_NO_DIRECT_COMPLETE means permanent opt-out).
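
The difference between the permanent and the dynamic opt-out can be shown with
a user-space model.  Everything prefixed with ``model_`` or ``MODEL_`` is a
hypothetical stand-in, the bit values are illustrative only, and the model
assumes that nothing other than the driver's ->prepare result stands in the way
of direct-complete.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the kernel's definitions live in its headers. */
#define MODEL_DPM_FLAG_NO_DIRECT_COMPLETE  (1u << 0)
#define MODEL_DPM_FLAG_SMART_PREPARE       (1u << 1)

struct model_device {
	uint32_t driver_flags;                     /* set once at probe time */
	int (*prepare)(struct model_device *dev);  /* driver's ->prepare */
};

/* Stand-in for dev_pm_set_driver_flags(): record the flags at probe time. */
void model_set_driver_flags(struct model_device *dev, uint32_t flags)
{
	dev->driver_flags = flags;
}

/*
 * Model of the bus-type prepare step.  A positive return value means that
 * direct-complete may be used for the device.
 */
int model_bus_prepare(struct model_device *dev)
{
	if (dev->prepare) {
		int ret = dev->prepare(dev);

		if (ret < 0)
			return ret;   /* propagate driver errors */
		if (ret == 0 &&
		    (dev->driver_flags & MODEL_DPM_FLAG_SMART_PREPARE))
			return 0;     /* driver opted out this time */
	}
	return 1;   /* assumption: no other reason to resume the device */
}

/* Whether the modelled PM core may use direct-complete for the device. */
bool model_direct_complete_allowed(struct model_device *dev)
{
	if (dev->driver_flags & MODEL_DPM_FLAG_NO_DIRECT_COMPLETE)
		return false;         /* permanent opt-out */
	return model_bus_prepare(dev) > 0;
}

/* Two hypothetical driver ->prepare callbacks. */
int model_prepare_returns_zero(struct model_device *dev)
{
	(void)dev;
	return 0;
}

int model_prepare_returns_positive(struct model_device *dev)
{
	(void)dev;
	return 1;
}
```

With the "smart prepare" flag set, the driver decides on every transition by
what its ->prepare callback returns; with the "no direct complete" flag the
outcome is fixed at probe time.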

The DPM_FLAG_SMART_SUSPEND flag tells the PCI bus type that from the driver's
perspective the device can be safely left in runtime suspend during system
suspend.  That causes pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff()
to avoid resuming the device from runtime suspend unless there are PCI-specific
reasons for doing that.  Also, it causes pci_pm_suspend_late/noirq() and
pci_pm_poweroff_late/noirq() to return early if the device remains in runtime
suspend during the "late" phase of the system-wide transition under way.
Moreover, if the device is in runtime suspend in pci_pm_resume_noirq() or
pci_pm_restore_noirq(), its runtime PM status will be changed to "active" (as it
is going to be put into D0 going forward).

Setting the DPM_FLAG_MAY_SKIP_RESUME flag means that the driver allows its
"noirq" and "early" resume callbacks to be skipped if the device can be left
in suspend after a system-wide transition into the working state.  This flag is
taken into consideration by the PM core along with the power.may_skip_resume
status bit of the device which is set by pci_pm_suspend_noirq() in certain
situations.  If the PM core determines that the driver's "noirq" and "early"
resume callbacks should be skipped, the dev_pm_skip_resume() helper function
will return "true" and that will cause pci_pm_resume_noirq() and
pci_pm_resume_early() to return upfront without touching the device and
executing the driver callbacks.

3.2. Device Runtime Power Management
------------------------------------

In addition to providing device power management callbacks, PCI device drivers
are responsible for controlling the runtime power management (runtime PM) of
their devices.

The PCI device runtime PM is optional, but it is recommended that PCI device
drivers implement it at least in the cases where there is a reliable way of
verifying that the device is not used (like when the network cable is detached
from an Ethernet adapter or there are no devices attached to a USB controller).

To support the PCI runtime PM the driver first needs to implement the
runtime_suspend() and runtime_resume() callbacks.  It also may need to implement
the runtime_idle() callback to prevent the device from being suspended again
every time right after the runtime_resume() callback has returned
(alternatively, the runtime_suspend() callback will have to check if the
device should really be suspended and return -EAGAIN if that is not the case).
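
The alternative mentioned in parentheses, a runtime_suspend() callback that
re-checks whether the device should be suspended, might look as follows in a
user-space model.  The model_nic structure and its link_up field are
hypothetical driver-private state, loosely based on the Ethernet example above.

```c
#include <errno.h>

/* Hypothetical driver-private state; not a kernel structure. */
struct model_nic {
	int link_up;     /* 1 while the network cable is attached */
	int suspended;   /* 1 while the device is runtime-suspended */
};

/* runtime_suspend() variant that refuses to suspend a device in use. */
int model_nic_runtime_suspend(struct model_nic *nic)
{
	if (nic->link_up)
		return -EAGAIN;   /* still in use: leave the device active */

	nic->suspended = 1;       /* quiesce the device (details omitted) */
	return 0;
}

int model_nic_runtime_resume(struct model_nic *nic)
{
	nic->suspended = 0;       /* reprogram the device (details omitted) */
	return 0;
}
```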

The runtime PM of PCI devices is enabled by default by the PCI core.  PCI
device drivers do not need to enable it and should not attempt to do so.
However, it is blocked by pci_pm_init(), which runs the pm_runtime_forbid()
helper function.  In addition to that, the runtime PM usage counter of
each PCI device is incremented by local_pci_probe() before executing the
probe callback provided by the device's driver.

If a PCI driver implements the runtime PM callbacks and intends to use the
runtime PM framework provided by the PM core and the PCI subsystem, it needs
to decrement the device's runtime PM usage counter in its probe callback
function.  If it doesn't do that, the counter will always be different from
zero for the device and it will never be runtime-suspended.  The simplest
way to do that is by calling pm_runtime_put_noidle(), but if the driver
wants to schedule an autosuspend right away, for example, it may call
pm_runtime_put_autosuspend() instead for this purpose.  Generally, it
just needs to call a function that decrements the device's usage counter
from its probe routine to make runtime PM work for the device.

It is important to remember that the driver's runtime_suspend() callback
may be executed right after the usage counter has been decremented, because
user space may already have used sysfs to run the pm_runtime_allow() helper
function, which unblocks the runtime PM of the device, so the driver must
be prepared to cope with that.

The driver itself should not call pm_runtime_allow(), though.  Instead, it
should let user space or some platform-specific code do that (user space can
do it via sysfs as stated above), but it must be prepared to handle the
runtime PM of the device correctly as soon as pm_runtime_allow() is called
(which may happen at any time, even before the driver is loaded).

When the driver's remove callback runs, it has to balance the decrement
of the device's runtime PM usage counter made at probe time.  For this reason,
if it has decremented the counter in its probe callback, it must run
pm_runtime_get_noresume() in its remove callback.  [Since the core carries
out a runtime resume of the device and bumps up the device's usage counter
before running the driver's remove callback, the runtime PM of the device
is effectively disabled for the duration of the remove execution and all
runtime PM helper functions incrementing the device's usage counter are
then effectively equivalent to pm_runtime_get_noresume().]
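
The usage counter bookkeeping described above can be traced with a small
user-space model.  All names are hypothetical stand-ins; in particular, the
point at which the core drops the extra reference it took before calling the
remove callback is an assumption of this sketch and is not spelled out in the
text.

```c
/* Hypothetical stand-in for the runtime PM part of struct device. */
struct model_device {
	int usage_count;   /* runtime PM usage counter */
};

/* Stand-ins for helpers that only adjust the counter. */
void model_get_noresume(struct model_device *dev) { dev->usage_count++; }
void model_put_noidle(struct model_device *dev)   { dev->usage_count--; }

/* Driver side: drop one reference in probe, take it back in remove. */
int model_driver_probe(struct model_device *dev)
{
	model_put_noidle(dev);
	return 0;
}

void model_driver_remove(struct model_device *dev)
{
	model_get_noresume(dev);
}

/* Core side: the counter is incremented before the probe callback runs. */
int model_core_probe(struct model_device *dev)
{
	model_get_noresume(dev);   /* models the core's increment */
	return model_driver_probe(dev);
}

/*
 * Core side: the counter is bumped up before the remove callback runs.
 * Dropping that reference right afterwards is an assumption of this model.
 */
void model_core_remove(struct model_device *dev)
{
	model_get_noresume(dev);
	model_driver_remove(dev);
	model_put_noidle(dev);
}
```

After probe the counter is zero, so the device can be runtime-suspended once
runtime PM has been unblocked; after remove it is back at the value the core
established before calling probe.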

The runtime PM framework works by processing requests to suspend or resume
devices, or to check if they are idle (in which case it is reasonable to
subsequently request that they be suspended).  These requests are represented
by work items put into the power management workqueue, pm_wq.  Although there
are a few situations in which power management requests are automatically
queued by the PM core (for example, after processing a request to resume a
device the PM core automatically queues a request to check if the device is
idle), device drivers are generally responsible for queuing power management
requests for their devices.  For this purpose they should use the runtime PM
helper functions provided by the PM core, discussed in
Documentation/power/runtime_pm.rst.

Devices can also be suspended and resumed synchronously, without placing a
request into pm_wq.  In the majority of cases this is also done by their
drivers, which use helper functions provided by the PM core for this purpose.

For more information on the runtime PM of devices refer to
Documentation/power/runtime_pm.rst.


4. Resources
============

PCI Local Bus Specification, Rev. 3.0

PCI Bus Power Management Interface Specification, Rev. 1.2

Advanced Configuration and Power Interface (ACPI) Specification, Rev. 3.0b

PCI Express Base Specification, Rev. 2.0

Documentation/driver-api/pm/devices.rst

Documentation/power/runtime_pm.rst