====================
PCI Power Management
====================

Copyright (c) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.

An overview of concepts and the Linux kernel's interfaces related to PCI power
management. Based on previous work by Patrick Mochel <mochel@transmeta.com>
(and others).

This document only covers the aspects of power management specific to PCI
devices. For a general description of the kernel's interfaces related to device
power management refer to Documentation/driver-api/pm/devices.rst and
Documentation/power/runtime_pm.rst.

.. contents:

   1. Hardware and Platform Support for PCI Power Management
   2. PCI Subsystem and Device Power Management
   3. PCI Device Drivers and Power Management
   4. Resources


1. Hardware and Platform Support for PCI Power Management
=========================================================

1.1. Native and Platform-Based Power Management
-----------------------------------------------

In general, power management is a feature allowing one to save energy by putting
devices into states in which they draw less power (low-power states) at the
price of reduced functionality or performance.

Usually, a device is put into a low-power state when it is underutilized or
completely inactive. However, when it is necessary to use the device once
again, it has to be put back into the "fully functional" state (full-power
state). This may happen when there are some data for the device to handle or
as a result of an external event requiring the device to be active, which may
be signaled by the device itself.

PCI devices may be put into low-power states in two ways: by using the device
capabilities introduced by the PCI Bus Power Management Interface Specification,
or with the help of platform firmware, such as an ACPI BIOS. In the first
approach, referred to as native PCI power management (native PCI PM) in what
follows, the device power state is changed as a result of writing a
specific value into one of its standard configuration registers. The second
approach requires the platform firmware to provide special methods that may be
used by the kernel to change the device's power state.
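
As a rough illustration of the native mechanism, the sketch below computes the
value that would be written to the power management control/status register
(PMCSR) to request a given D-state. The bit layout mirrors
PCI_PM_CTRL_STATE_MASK in include/uapi/linux/pci_regs.h; the helper names and
the sample register values are hypothetical, and the snippet is plain
user-space C operating on a 16-bit value, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* PowerState field of the PMCSR, bits 1:0 (same value as
 * PCI_PM_CTRL_STATE_MASK in the kernel's pci_regs.h). */
#define PMCSR_STATE_MASK 0x0003u

/* Hypothetical helper: return the D-state (0..3) encoded in a PMCSR value. */
static unsigned int pmcsr_state(uint16_t pmcsr)
{
	return pmcsr & PMCSR_STATE_MASK;
}

/* Hypothetical helper: return the PMCSR value requesting state d (0..3),
 * leaving all other bits of the register untouched. */
static uint16_t pmcsr_with_state(uint16_t pmcsr, unsigned int d)
{
	assert(d <= 3);
	return (uint16_t)((pmcsr & ~PMCSR_STATE_MASK) | d);
}
```

For example, pmcsr_with_state(0x0100, 3) changes only the two PowerState bits
and leaves the rest of the register as it was. In the kernel this work is done
by the PCI core (for instance through pci_set_power_state()), so drivers do not
normally write the register themselves.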

Devices supporting the native PCI PM usually can generate wakeup signals called
Power Management Events (PMEs) to let the kernel know about external events
requiring the device to be active. After receiving a PME the kernel is supposed
to put the device that sent it into the full-power state. However, the PCI Bus
Power Management Interface Specification doesn't define any standard method of
delivering the PME from the device to the CPU and the operating system kernel.
It is assumed that the platform firmware will perform this task and therefore,
even though a PCI device is set up to generate PMEs, it also may be necessary to
prepare the platform firmware for notifying the CPU of the PMEs coming from the
device (e.g. by generating interrupts).

In turn, if the methods provided by the platform firmware are used for changing
the power state of a device, usually the platform also provides a method for
preparing the device to generate wakeup signals. In that case, however, it
is often also necessary to prepare the device for generating PMEs using the
native PCI PM mechanism, because the method provided by the platform depends on
that.

Thus in many situations both the native and the platform-based power management
mechanisms have to be used simultaneously to obtain the desired result.

1.2. Native PCI Power Management
--------------------------------

The PCI Bus Power Management Interface Specification (PCI PM Spec) was
introduced between the PCI 2.1 and PCI 2.2 Specifications. It defined a
standard interface for performing various operations related to power
management.

The implementation of the PCI PM Spec is optional for conventional PCI devices,
but it is mandatory for PCI Express devices. If a device supports the PCI PM
Spec, it has an 8 byte power management capability field in its PCI
configuration space. This field is used to describe and control the standard
features related to the native PCI power management.

The PCI PM Spec defines 4 operating states for devices (D0-D3) and for buses
(B0-B3). The higher the number, the less power is drawn by the device or bus
in that state. However, the higher the number, the longer the latency for
the device or bus to return to the full-power state (D0 or B0, respectively).

There are two variants of the D3 state defined by the specification. The first
one is D3hot, referred to as the software accessible D3, because devices can be
programmed to go into it.
The second one, D3cold, is the state that PCI devices
are in when the supply voltage (Vcc) is removed from them. It is not possible
to program a PCI device to go into D3cold, although there may be a programmable
interface for putting the bus the device is on into a state in which Vcc is
removed from all devices on the bus.

PCI bus power management, however, is not supported by the Linux kernel at the
time of this writing and therefore it is not covered by this document.

Note that every PCI device can be in the full-power state (D0) or in D3cold,
regardless of whether or not it implements the PCI PM Spec. In addition to
that, if the PCI PM Spec is implemented by the device, it must support D3hot
as well as D0. The support for the D1 and D2 power states is optional.

PCI devices supporting the PCI PM Spec can be programmed to go to any of the
supported low-power states (except for D3cold). While in D1-D3hot the
standard configuration registers of the device must be accessible to software
(i.e. the device is required to respond to PCI configuration accesses), although
its I/O and memory spaces are then disabled. This allows the device to be
programmatically put into D0.
Thus the kernel can switch the device back and
forth between D0 and the supported low-power states (except for D3cold) and the
possible power state transitions the device can undergo are the following:

+---------------+------------+
| Current State | New State  |
+===============+============+
| D0            | D1, D2, D3 |
+---------------+------------+
| D1            | D2, D3     |
+---------------+------------+
| D2            | D3         |
+---------------+------------+
| D1, D2, D3    | D0         |
+---------------+------------+

The transition from D3cold to D0 occurs when the supply voltage is provided to
the device (i.e. power is restored). In that case the device returns to D0 with
a full power-on reset sequence and the power-on defaults are restored to the
device by hardware just as at initial power up.

PCI devices supporting the PCI PM Spec can be programmed to generate PMEs
while in any power state (D0-D3), but they are not required to be capable
of generating PMEs from all supported power states.
In particular, the
capability of generating PMEs from D3cold is optional and depends on the
presence of additional voltage (3.3Vaux) allowing the device to remain
sufficiently active to generate a wakeup signal.

1.3. ACPI Device Power Management
---------------------------------

The platform firmware support for the power management of PCI devices is
system-specific. However, if the system in question is compliant with the
Advanced Configuration and Power Interface (ACPI) Specification, like the
majority of x86-based systems, it is supposed to implement device power
management interfaces defined by the ACPI standard.

For this purpose the ACPI BIOS provides special functions called "control
methods" that may be executed by the kernel to perform specific tasks, such as
putting a device into a low-power state. These control methods are encoded
using a special byte-code language called the ACPI Machine Language (AML) and
stored in the machine's BIOS. The kernel loads them from the BIOS and executes
them as needed using an AML interpreter that translates the AML byte code into
computations and memory or I/O space accesses.
This way, in theory, a BIOS
writer can provide the kernel with a means to perform actions depending
on the system design in a system-specific fashion.

ACPI control methods may be divided into global control methods, which are not
associated with any particular devices, and device control methods, which have
to be defined separately for each device supposed to be handled with the help of
the platform. This means, in particular, that ACPI device control methods can
only be used to handle devices that the BIOS writer knew about in advance. The
ACPI methods used for device power management fall into that category.

The ACPI specification assumes that devices can be in one of four power states
labeled as D0, D1, D2, and D3 that roughly correspond to the native PCI PM
D0-D3 states (although the difference between D3hot and D3cold is not taken
into account by ACPI). Moreover, for each power state of a device there is a
set of power resources that have to be enabled for the device to be put into
that state. These power resources are controlled (i.e. enabled or disabled)
with the help of their own control methods, _ON and _OFF, that have to be
defined individually for each of them.

To put a device into the ACPI power state Dx (where x is a number between 0 and
3 inclusive) the kernel is supposed to (1) enable the power resources required
by the device in this state using their _ON control methods and (2) execute the
_PSx control method defined for the device. In addition to that, if the device
is going to be put into a low-power state (D1-D3) and is supposed to generate
wakeup signals from that state, the _DSW (or _PSW, replaced with _DSW by ACPI
3.0) control method defined for it has to be executed before _PSx. Power
resources that are not required by the device in the target power state and are
not required any more by any other device should be disabled (by executing their
_OFF control methods). If the current power state of the device is D3, it can
only be put into D0 this way.

However, quite often the power states of devices are changed during a
system-wide transition into a sleep state or back into the working state. ACPI
defines four system sleep states, S1, S2, S3, and S4, and denotes the system
working state as S0.
In general, the target system sleep (or working) state
determines the highest power (lowest number) state the device can be put
into and the kernel is supposed to obtain this information by executing the
device's _SxD control method (where x is a number between 0 and 4 inclusive).
If the device is required to wake up the system from the target sleep state, the
lowest power (highest number) state it can be put into is also determined by the
target state of the system. The kernel is then supposed to use the device's
_SxW control method to obtain the number of that state. It also is supposed to
use the device's _PRW control method to learn which power resources need to be
enabled for the device to be able to generate wakeup signals.

1.4. Wakeup Signaling
---------------------

Wakeup signals generated by PCI devices, either as native PCI PMEs, or as
a result of the execution of the _DSW (or _PSW) ACPI control method before
putting the device into a low-power state, have to be caught and handled as
appropriate. If they are sent while the system is in the working state
(ACPI S0), they should be translated into interrupts so that the kernel can
put the devices generating them into the full-power state and take care of the
events that triggered them.
In turn, if they are sent while the system is
sleeping, they should cause the system's core logic to trigger wakeup.

On ACPI-based systems wakeup signals sent by conventional PCI devices are
converted into ACPI General-Purpose Events (GPEs), which are hardware signals
from the system core logic generated in response to various events that need to
be acted upon. Every GPE is associated with one or more sources of potentially
interesting events. In particular, a GPE may be associated with a PCI device
capable of signaling wakeup. The information on the connections between GPEs
and event sources is recorded in the system's ACPI BIOS from where it can be
read by the kernel.

If a PCI device known to the system's ACPI BIOS signals wakeup, the GPE
associated with it (if there is one) is triggered. The GPEs associated with PCI
bridges may also be triggered in response to a wakeup signal from one of the
devices below the bridge (this also is the case for root bridges) and, for
example, native PCI PMEs from devices unknown to the system's ACPI BIOS may be
handled this way.

A GPE may be triggered when the system is sleeping (i.e.
when it is in one of
the ACPI S1-S4 states), in which case system wakeup is started by its core logic
(the device that was the source of the signal causing the system wakeup to occur
may be identified later). The GPEs used in such situations are referred to as
wakeup GPEs.

Usually, however, GPEs are also triggered when the system is in the working
state (ACPI S0) and in that case the system's core logic generates a System
Control Interrupt (SCI) to notify the kernel of the event. Then, the SCI
handler identifies the GPE that caused the interrupt to be generated which,
in turn, allows the kernel to identify the source of the event (that may be
a PCI device signaling wakeup). The GPEs used for notifying the kernel of
events occurring while the system is in the working state are referred to as
runtime GPEs.

Unfortunately, there is no standard way of handling wakeup signals sent by
conventional PCI devices on systems that are not ACPI-based, but there is one
for PCI Express devices. Namely, the PCI Express Base Specification introduced
a native mechanism for converting native PCI PMEs into interrupts generated by
root ports.
For conventional PCI devices native PMEs are out-of-band, so they
are routed separately and they need not pass through bridges (in principle they
may be routed directly to the system's core logic), but for PCI Express devices
they are in-band messages that have to pass through the PCI Express hierarchy,
including the root port on the path from the device to the Root Complex. Thus
it was possible to introduce a mechanism by which a root port generates an
interrupt whenever it receives a PME message from one of the devices below it.
The PCI Express Requester ID of the device that sent the PME message is then
recorded in one of the root port's configuration registers from where it may be
read by the interrupt handler allowing the device to be identified. [PME
messages sent by PCI Express endpoints integrated with the Root Complex don't
pass through root ports, but instead they cause a Root Complex Event Collector
(if there is one) to generate interrupts.]

In principle the native PCI Express PME signaling may also be used on ACPI-based
systems along with the GPEs, but to use it the kernel has to ask the system's
ACPI BIOS to release control of root port configuration registers. The ACPI
BIOS, however, is not required to allow the kernel to control these registers
and if it doesn't do that, the kernel must not modify their contents.
Of course
the native PCI Express PME signaling cannot be used by the kernel in that case.


2. PCI Subsystem and Device Power Management
============================================

2.1. Device Power Management Callbacks
--------------------------------------

The PCI Subsystem participates in the power management of PCI devices in a
number of ways. First of all, it provides an intermediate code layer between
the device power management core (PM core) and PCI device drivers.
Specifically, the pm field of the PCI subsystem's struct bus_type object,
pci_bus_type, points to a struct dev_pm_ops object, pci_dev_pm_ops, containing
pointers to several device power management callbacks::

  const struct dev_pm_ops pci_dev_pm_ops = {
	.prepare = pci_pm_prepare,
	.complete = pci_pm_complete,
	.suspend = pci_pm_suspend,
	.resume = pci_pm_resume,
	.freeze = pci_pm_freeze,
	.thaw = pci_pm_thaw,
	.poweroff = pci_pm_poweroff,
	.restore = pci_pm_restore,
	.suspend_noirq = pci_pm_suspend_noirq,
	.resume_noirq = pci_pm_resume_noirq,
	.freeze_noirq = pci_pm_freeze_noirq,
	.thaw_noirq = pci_pm_thaw_noirq,
	.poweroff_noirq = pci_pm_poweroff_noirq,
	.restore_noirq = pci_pm_restore_noirq,
	.runtime_suspend = pci_pm_runtime_suspend,
	.runtime_resume = pci_pm_runtime_resume,
	.runtime_idle = pci_pm_runtime_idle,
  };

These callbacks are executed by the PM core in various situations related to
device power management and they, in turn, execute power management callbacks
provided by PCI device drivers. They also perform power management operations
involving some standard configuration registers of PCI devices that device
drivers need not know or care about.

The structure representing a PCI device, struct pci_dev, contains several fields
that these callbacks operate on::

  struct pci_dev {
	...
	pci_power_t	current_state;	/* Current operating state. */
	int		pm_cap;		/* PM capability offset in the
					   configuration space */
	unsigned int	pme_support:5;	/* Bitmask of states from which PME#
					   can be generated */
	unsigned int	pme_poll:1;	/* Poll device's PME status bit */
	unsigned int	d1_support:1;	/* Low power state D1 is supported */
	unsigned int	d2_support:1;	/* Low power state D2 is supported */
	unsigned int	no_d1d2:1;	/* D1 and D2 are forbidden */
	unsigned int	wakeup_prepared:1;	/* Device prepared for wake up */
	unsigned int	d3hot_delay;	/* D3hot->D0 transition time in ms */
	...
  };

They also indirectly use some fields of the struct device that is embedded in
struct pci_dev.

2.2. Device Initialization
--------------------------

The PCI subsystem's first task related to device power management is to
prepare the device for power management and initialize the fields of struct
pci_dev used for this purpose. This happens in two functions defined in
drivers/pci/pci.c, pci_pm_init() and platform_pci_wakeup_init().

The first of these functions checks if the device supports native PCI PM
and if that's the case the offset of its power management capability structure
in the configuration space is stored in the pm_cap field of the device's struct
pci_dev object. Next, the function checks which PCI low-power states are
supported by the device and from which low-power states the device can generate
native PCI PMEs. The power management fields of the device's struct pci_dev and
the struct device embedded in it are updated accordingly and the generation of
PMEs by the device is disabled.

The second function checks if the device can be prepared to signal wakeup with
the help of the platform firmware, such as the ACPI BIOS. If that is the case,
the function updates the wakeup fields in struct device embedded in the
device's struct pci_dev and uses the firmware-provided method to prevent the
device from signaling wakeup.

At this point the device is ready for power management. For driverless devices,
however, this functionality is limited to a few basic operations carried out
during system-wide transitions to a sleep state and back to the working state.

2.3. Runtime Device Power Management
------------------------------------

The PCI subsystem plays a vital role in the runtime power management of PCI
devices. For this purpose it uses the general runtime power management
(runtime PM) framework described in Documentation/power/runtime_pm.rst.
Namely, it provides subsystem-level callbacks::

	pci_pm_runtime_suspend()
	pci_pm_runtime_resume()
	pci_pm_runtime_idle()

that are executed by the core runtime PM routines. It also implements all of
the mechanics necessary for handling runtime wakeup signals from PCI devices
in low-power states, which at the time of this writing works for both the native
PCI Express PME signaling and the ACPI GPE-based wakeup signaling described in
Section 1.

First, a PCI device is put into a low-power state, or suspended, with the help
of pm_schedule_suspend() or pm_runtime_suspend() which for PCI devices call
pci_pm_runtime_suspend() to do the actual job. For this to work, the device's
driver has to provide a pm->runtime_suspend() callback (see below), which is
run by pci_pm_runtime_suspend() as the first action.
If the driver's callback
returns successfully, the device's standard configuration registers are saved,
the device is prepared to generate wakeup signals and, finally, it is put into
the target low-power state.

The low-power state to put the device into is the lowest-power (highest number)
state from which it can signal wakeup. The exact method of signaling wakeup is
system-dependent and is determined by the PCI subsystem on the basis of the
reported capabilities of the device and the platform firmware. To prepare the
device for signaling wakeup and put it into the selected low-power state, the
PCI subsystem can use the platform firmware as well as the device's native PCI
PM capabilities, if supported.

It is expected that the device driver's pm->runtime_suspend() callback will
not attempt to prepare the device for signaling wakeup or to put it into a
low-power state. The driver ought to leave these tasks to the PCI subsystem
that has all of the information necessary to perform them.

A suspended device is brought back into the "active" state, or resumed,
with the help of pm_request_resume() or pm_runtime_resume() which both call
pci_pm_runtime_resume() for PCI devices. Again, this only works if the device's
driver provides a pm->runtime_resume() callback (see below).
However, before
the driver's callback is executed, pci_pm_runtime_resume() brings the device
back into the full-power state, prevents it from signaling wakeup while in that
state and restores its standard configuration registers. Thus the driver's
callback need not worry about the PCI-specific aspects of the device resume.

Note that generally pci_pm_runtime_resume() may be called in two different
situations. First, it may be called at the request of the device's driver, for
example if there are some data for it to process. Second, it may be called
as a result of a wakeup signal from the device itself (this sometimes is
referred to as "remote wakeup"). Of course, for this purpose the wakeup signal
is handled in one of the ways described in Section 1 and finally converted into
a notification for the PCI subsystem after the source device has been
identified.

The pci_pm_runtime_idle() function, called for PCI devices by pm_runtime_idle()
and pm_request_idle(), executes the device driver's pm->runtime_idle()
callback, if defined, and if that callback doesn't return an error code (or is
not present at all), suspends the device with the help of pm_runtime_suspend().
Sometimes pci_pm_runtime_idle() is called automatically by the PM core (for
example, it is called right after the device has just been resumed), in which
cases it is expected to suspend the device if that makes sense. Usually,
however, the PCI subsystem doesn't know whether the device can really be
suspended, so it lets the device's driver decide by running its
pm->runtime_idle() callback.

2.4. System-Wide Power Transitions
----------------------------------

There are a few different types of system-wide power transitions, described in
Documentation/driver-api/pm/devices.rst. Each of them requires devices to be
handled in a specific way and the PM core executes subsystem-level power
management callbacks for this purpose. They are executed in phases such that
each phase involves executing the same subsystem-level callback for every device
belonging to the given subsystem before the next phase begins. These phases
always run after tasks have been frozen.

2.4.1. System Suspend
^^^^^^^^^^^^^^^^^^^^^

When the system is going into a sleep state in which the contents of memory will
be preserved, such as one of the ACPI sleep states S1-S3, the phases are:

	prepare, suspend, suspend_noirq.

The following PCI bus type's callbacks, respectively, are used in these phases::

	pci_pm_prepare()
	pci_pm_suspend()
	pci_pm_suspend_noirq()

The pci_pm_prepare() routine first puts the device into the "fully functional"
state with the help of pm_runtime_resume(). Then, it executes the device
driver's pm->prepare() callback if defined (i.e. if the driver's struct
dev_pm_ops object is present and the prepare pointer in that object is valid).

The pci_pm_suspend() routine first checks if the device's driver implements
legacy PCI suspend routines (see Section 3), in which case the driver's legacy
suspend callback is executed, if present, and its result is returned. Next, if
the device's driver doesn't provide a struct dev_pm_ops object (containing
pointers to the driver's callbacks), pci_pm_default_suspend() is called, which
simply turns off the device's bus master capability and runs
pcibios_disable_device() to disable it, unless the device is a bridge (PCI
bridges are ignored by this routine). Next, the device driver's pm->suspend()
callback is executed, if defined, and its result is returned if it fails.
Finally, pci_fixup_device() is called to apply hardware suspend quirks related
to the device if necessary.
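
The order of checks performed by pci_pm_suspend(), as described above, can be
sketched as a small userspace model. This is not the kernel's implementation:
struct model_dev, its fields, model_pci_pm_suspend() and the two sample
callbacks are hypothetical names introduced only to illustrate the sequence,
and returning 0 when a legacy driver has no suspend callback is an assumption
of the model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state consulted by the PCI core. */
struct model_dev {
	bool has_legacy_pm;		/* driver uses legacy pci_driver callbacks */
	int (*legacy_suspend)(void);	/* legacy suspend callback, may be NULL */
	bool has_pm_ops;		/* driver provides a struct dev_pm_ops */
	int (*pm_suspend)(void);	/* models pm->suspend(), may be NULL */
	bool is_bridge;
	/* Effects recorded by the model so that they can be inspected. */
	bool default_suspend_done;
	bool suspend_quirks_applied;
};

/* Sample callbacks: one succeeds, one fails with an arbitrary error code. */
static int model_cb_ok(void) { return 0; }
static int model_cb_busy(void) { return -16; }

static int model_pci_pm_suspend(struct model_dev *dev)
{
	/* 1. Legacy driver: run its callback, if present, and return. */
	if (dev->has_legacy_pm)
		return dev->legacy_suspend ? dev->legacy_suspend() : 0;

	if (!dev->has_pm_ops) {
		/* 2. No dev_pm_ops: default handling, which ignores bridges. */
		if (!dev->is_bridge)
			dev->default_suspend_done = true;
	} else if (dev->pm_suspend) {
		/* 3. Driver callback: its result is returned only on failure. */
		int error = dev->pm_suspend();

		if (error)
			return error;
	}

	/* 4. Suspend quirks are applied last. */
	dev->suspend_quirks_applied = true;
	return 0;
}
```

In this model a failing pm->suspend() callback returns before the quirks step
is reached, which mirrors the "its result is returned if it fails" wording
above.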

Note that the suspend phase is carried out asynchronously for PCI devices, so
the pci_pm_suspend() callback may be executed in parallel for any pair of PCI
devices that don't depend on each other in a known way (i.e. none of the paths
in the device tree from the root bridge to a leaf device contains both of them).

The pci_pm_suspend_noirq() routine is executed after suspend_device_irqs() has
been called, which means that the device driver's interrupt handler won't be
invoked while this routine is running. It first checks if the device's driver
implements legacy PCI suspend routines (Section 3), in which case the legacy
late suspend routine is called and its result is returned (the standard
configuration registers of the device are saved if the driver's callback hasn't
done that). Second, if the device driver's struct dev_pm_ops object is not
present, the device's standard configuration registers are saved and the routine
returns success. Otherwise the device driver's pm->suspend_noirq() callback is
executed, if present, and its result is returned if it fails.
Next, if the
device's standard configuration registers haven't been saved yet (one of the
device driver's callbacks executed before might do that), pci_pm_suspend_noirq()
saves them, prepares the device to signal wakeup (if necessary) and puts it into
a low-power state.

The low-power state to put the device into is the lowest-power (highest number)
state from which it can signal wakeup while the system is in the target sleep
state. Just like in the runtime PM case described above, the mechanism of
signaling wakeup is system-dependent and determined by the PCI subsystem, which
is also responsible for preparing the device to signal wakeup from the system's
target sleep state as appropriate.

PCI device drivers (that don't implement legacy power management callbacks) are
generally not expected to prepare devices for signaling wakeup or to put them
into low-power states. However, if one of the driver's suspend callbacks
(pm->suspend() or pm->suspend_noirq()) saves the device's standard configuration
registers, pci_pm_suspend_noirq() will assume that the device has been prepared
to signal wakeup and put into a low-power state by the driver (the driver is
then assumed to have used the helper functions provided by the PCI subsystem for
this purpose).
PCI device drivers are not encouraged to do that, but in some
rare cases doing that in the driver may be the optimum approach.

2.4.2. System Resume
^^^^^^^^^^^^^^^^^^^^

When the system is undergoing a transition from a sleep state in which the
contents of memory have been preserved, such as one of the ACPI sleep states
S1-S3, into the working state (ACPI S0), the phases are:

	resume_noirq, resume, complete.

The following PCI bus type's callbacks, respectively, are executed in these
phases::

	pci_pm_resume_noirq()
	pci_pm_resume()
	pci_pm_complete()

The pci_pm_resume_noirq() routine first puts the device into the full-power
state, restores its standard configuration registers and applies early resume
hardware quirks related to the device, if necessary.
This is done
unconditionally, regardless of whether or not the device's driver implements
legacy PCI power management callbacks (this way all PCI devices are in the
full-power state and their standard configuration registers have been restored
when their interrupt handlers are invoked for the first time during resume,
which allows the kernel to avoid problems with the handling of shared interrupts
by drivers whose devices are still suspended). If legacy PCI power management
callbacks (see Section 3) are implemented by the device's driver, the legacy
early resume callback is executed and its result is returned. Otherwise, the
device driver's pm->resume_noirq() callback is executed, if defined, and its
result is returned.

The pci_pm_resume() routine first checks if the device's standard configuration
registers have been restored and restores them if that's not the case (this
only is necessary in the error path during a failing suspend). Next, resume
hardware quirks related to the device are applied, if necessary, and if the
device's driver implements legacy PCI power management callbacks (see
Section 3), the driver's legacy resume callback is executed and its result is
returned.
Otherwise, the device's wakeup signaling mechanisms are blocked and
its driver's pm->resume() callback is executed, if defined (the callback's
result is then returned).

The resume phase is carried out asynchronously for PCI devices, like the
suspend phase described above, which means that if two PCI devices don't depend
on each other in a known way, the pci_pm_resume() routine may be executed for
both of them in parallel.

The pci_pm_complete() routine only executes the device driver's pm->complete()
callback, if defined.

2.4.3. System Hibernation
^^^^^^^^^^^^^^^^^^^^^^^^^

System hibernation is more complicated than system suspend, because it requires
a system image to be created and written into a persistent storage medium. The
image is created atomically and all devices are quiesced, or frozen, before that
happens.

The freezing of devices is carried out after enough memory has been freed (at
the time of this writing the image creation requires at least 50% of system RAM
to be free) in the following three phases:

	prepare, freeze, freeze_noirq

that correspond to the PCI bus type's callbacks::

	pci_pm_prepare()
	pci_pm_freeze()
	pci_pm_freeze_noirq()

This means that the prepare phase is exactly the same as for system suspend.
The other two phases, however, are different.

The pci_pm_freeze() routine is quite similar to pci_pm_suspend(), but it runs
the device driver's pm->freeze() callback, if defined, instead of pm->suspend(),
and it doesn't apply the suspend-related hardware quirks. It is executed
asynchronously for different PCI devices that don't depend on each other in a
known way.

The pci_pm_freeze_noirq() routine, in turn, is similar to
pci_pm_suspend_noirq(), but it calls the device driver's pm->freeze_noirq()
routine instead of pm->suspend_noirq(). It also doesn't attempt to prepare the
device for signaling wakeup and put it into a low-power state.
Still, it saves
the device's standard configuration registers if they haven't been saved by one
of the driver's callbacks.

Once the image has been created, it has to be saved. However, at this point all
devices are frozen and they cannot handle I/O, while their ability to handle
I/O is obviously necessary for the image saving. Thus they have to be brought
back to the fully functional state and this is done in the following phases:

	thaw_noirq, thaw, complete

using the following PCI bus type's callbacks::

	pci_pm_thaw_noirq()
	pci_pm_thaw()
	pci_pm_complete()

respectively.

The first of them, pci_pm_thaw_noirq(), is analogous to pci_pm_resume_noirq().
It puts the device into the full power state and restores its standard
configuration registers. It also executes the device driver's pm->thaw_noirq()
callback, if defined, instead of pm->resume_noirq().

The pci_pm_thaw() routine is similar to pci_pm_resume(), but it runs the device
driver's pm->thaw() callback instead of pm->resume(). It is executed
asynchronously for different PCI devices that don't depend on each other in a
known way.

The complete phase is the same as for system resume.

After saving the image, devices need to be powered down before the system can
enter the target sleep state (ACPI S4 for ACPI-based systems). This is done in
three phases:

	prepare, poweroff, poweroff_noirq

where the prepare phase is exactly the same as for system suspend. The other
two phases are analogous to the suspend and suspend_noirq phases, respectively.
The PCI subsystem-level callbacks they correspond to::

	pci_pm_poweroff()
	pci_pm_poweroff_noirq()

work in analogy with pci_pm_suspend() and pci_pm_suspend_noirq(), respectively,
although they don't attempt to save the device's standard configuration
registers.

2.4.4. System Restore
^^^^^^^^^^^^^^^^^^^^^

System restore requires a hibernation image to be loaded into memory and the
pre-hibernation memory contents to be restored before the pre-hibernation system
activity can be resumed.

As described in Documentation/driver-api/pm/devices.rst, the hibernation image
is loaded into memory by a fresh instance of the kernel, called the boot kernel,
which in turn is loaded and run by a boot loader in the usual way. After the
boot kernel has loaded the image, it needs to replace its own code and data with
the code and data of the "hibernated" kernel stored within the image, called the
image kernel. For this purpose all devices are frozen just like before creating
the image during hibernation, in the

	prepare, freeze, freeze_noirq

phases described above. However, the devices affected by these phases are only
those having drivers in the boot kernel; other devices will still be in whatever
state the boot loader left them.

Should the restoration of the pre-hibernation memory contents fail, the boot
kernel would go through the "thawing" procedure described above, using the
thaw_noirq, thaw, and complete phases (that will only affect the devices having
drivers in the boot kernel), and then continue running normally.

If the pre-hibernation memory contents are restored successfully, which is the
usual situation, control is passed to the image kernel, which then becomes
responsible for bringing the system back to the working state.
To achieve this,
it must restore the devices' pre-hibernation functionality, which is done much
like waking up from the memory sleep state, although it involves different
phases:

	restore_noirq, restore, complete

The first two of these are analogous to the resume_noirq and resume phases
described above, respectively, and correspond to the following PCI subsystem
callbacks::

	pci_pm_restore_noirq()
	pci_pm_restore()

These callbacks work in analogy with pci_pm_resume_noirq() and pci_pm_resume(),
respectively, but they execute the device driver's pm->restore_noirq() and
pm->restore() callbacks, if available.

The complete phase is carried out in exactly the same way as during system
resume.


3. PCI Device Drivers and Power Management
==========================================

3.1. Power Management Callbacks
-------------------------------

PCI device drivers participate in power management by providing callbacks to be
executed by the PCI subsystem's power management routines described above and by
controlling the runtime power management of their devices.

At the time of this writing there are two ways to define power management
callbacks for a PCI device driver, the recommended one, based on using a
dev_pm_ops structure described in Documentation/driver-api/pm/devices.rst, and
the "legacy" one, in which the .suspend() and .resume() callbacks from struct
pci_driver are used. The legacy approach, however, doesn't allow one to define
runtime power management callbacks and is not really suitable for any new
drivers. Therefore it is not covered by this document (refer to the source code
to learn more about it).

It is recommended that all PCI device drivers define a struct dev_pm_ops object
containing pointers to power management (PM) callbacks that will be executed by
the PCI subsystem's PM routines in various circumstances. A pointer to the
driver's struct dev_pm_ops object has to be assigned to the driver.pm field in
its struct pci_driver object. Once that has happened, the "legacy" PM callbacks
in struct pci_driver are ignored (even if they are not NULL).
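
The assignment to driver.pm and the precedence rule above can be illustrated
with a userspace sketch. The structure definitions below are heavily reduced
stand-ins, not the kernel's own (the real ones live in include/linux/pm.h,
include/linux/device/driver.h and include/linux/pci.h and have many more
members); the foo_*() and old_suspend() callbacks and the uses_legacy_pm()
helper are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Reduced stand-ins for the kernel types, kept only to make this compile. */
struct device { int unused; };
struct pci_dev { struct device dev; };
typedef struct { int event; } pm_message_t;

struct dev_pm_ops {
	int (*suspend)(struct device *dev);
	int (*resume)(struct device *dev);
};

struct device_driver {
	const struct dev_pm_ops *pm;
};

struct pci_driver {
	const char *name;
	int (*suspend)(struct pci_dev *dev, pm_message_t state);	/* legacy */
	int (*resume)(struct pci_dev *dev);				/* legacy */
	struct device_driver driver;
};

/* Hypothetical driver callbacks that do nothing and report success. */
static int foo_suspend(struct device *dev) { (void)dev; return 0; }
static int foo_resume(struct device *dev) { (void)dev; return 0; }

/* Hypothetical legacy-style callback, used to show that it is ignored. */
static int old_suspend(struct pci_dev *dev, pm_message_t state)
{
	(void)dev;
	(void)state;
	return 0;
}

static const struct dev_pm_ops foo_pm_ops = {
	.suspend = foo_suspend,
	.resume = foo_resume,
};

/* The recommended form: point driver.pm at the dev_pm_ops object. */
static struct pci_driver foo_driver = {
	.name = "foo",
	.suspend = old_suspend,		/* ignored because driver.pm is set */
	.driver.pm = &foo_pm_ops,
};

/* Model of the rule: legacy callbacks only count when driver.pm is unset. */
static int uses_legacy_pm(const struct pci_driver *drv)
{
	return !drv->driver.pm && (drv->suspend || drv->resume);
}
```

With these definitions uses_legacy_pm(&foo_driver) evaluates to 0 even though
the legacy .suspend pointer is set, which is the behavior the paragraph above
describes.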

The PM callbacks in struct dev_pm_ops are not mandatory and if they are not
defined (i.e. the respective fields of struct dev_pm_ops are unset) the PCI
subsystem will handle the device in a simplified default manner. If they are
defined, though, they are expected to behave as described in the following
subsections.

3.1.1. prepare()
^^^^^^^^^^^^^^^^

The prepare() callback is executed during system suspend, during hibernation
(when a hibernation image is about to be created), during power-off after
saving a hibernation image and during system restore, when a hibernation image
has just been loaded into memory.

This callback is only necessary if the driver's device has children that in
general may be registered at any time. In that case the role of the prepare()
callback is to prevent new children of the device from being registered until
one of the resume_noirq(), thaw_noirq(), or restore_noirq() callbacks is run.

In addition to that the prepare() callback may carry out some operations
preparing the device to be suspended, although it should not allocate memory
(if additional memory is required to suspend the device, it has to be
preallocated earlier, for example in a suspend/hibernate notifier as described
in Documentation/driver-api/pm/notifiers.rst).

3.1.2. suspend()
^^^^^^^^^^^^^^^^

The suspend() callback is only executed during system suspend, after prepare()
callbacks have been executed for all devices in the system.

This callback is expected to quiesce the device and prepare it to be put into a
low-power state by the PCI subsystem. It is not required (in fact it even is
not recommended) that a PCI driver's suspend() callback save the standard
configuration registers of the device, prepare it for waking up the system, or
put it into a low-power state. All of these operations can very well be taken
care of by the PCI subsystem, without the driver's participation.

However, in some rare cases it is convenient to carry out these operations in
a PCI driver.
Then, pci_save_state(), pci_prepare_to_sleep(), and
pci_set_power_state() should be used to save the device's standard configuration
registers, to prepare it for system wakeup (if necessary), and to put it into a
low-power state, respectively. Moreover, if the driver calls pci_save_state(),
the PCI subsystem will execute neither pci_prepare_to_sleep() nor
pci_set_power_state() for its device, so the driver is then responsible for
handling the device as appropriate.

While the suspend() callback is being executed, the driver's interrupt handler
can be invoked to handle an interrupt from the device, so all suspend-related
operations relying on the driver's ability to handle interrupts should be
carried out in this callback.

3.1.3. suspend_noirq()
^^^^^^^^^^^^^^^^^^^^^^

The suspend_noirq() callback is only executed during system suspend, after
suspend() callbacks have been executed for all devices in the system and
after device interrupts have been disabled by the PM core.

The difference between suspend_noirq() and suspend() is that the driver's
interrupt handler will not be invoked while suspend_noirq() is running.
Thus suspend_noirq() can carry out operations that would cause race conditions
to arise if they were performed in suspend().

3.1.4. freeze()
^^^^^^^^^^^^^^^

The freeze() callback is hibernation-specific and is executed in two situations:
during hibernation, after prepare() callbacks have been executed for all devices
in preparation for the creation of a system image, and during restore,
after a system image has been loaded into memory from persistent storage and the
prepare() callbacks have been executed for all devices.

The role of this callback is analogous to the role of the suspend() callback
described above. In fact, they only need to be different in the rare cases when
the driver takes the responsibility for putting the device into a low-power
state.

In those cases, the freeze() callback should not prepare the device for system
wakeup or put it into a low-power state. Still, either it or freeze_noirq()
should save the device's standard configuration registers using
pci_save_state().

3.1.5. freeze_noirq()
^^^^^^^^^^^^^^^^^^^^^

The freeze_noirq() callback is hibernation-specific.
It is executed during
hibernation, after prepare() and freeze() callbacks have been executed for all
devices in preparation for the creation of a system image, and during restore,
after a system image has been loaded into memory and after prepare() and
freeze() callbacks have been executed for all devices. It is always executed
after device interrupts have been disabled by the PM core.

The role of this callback is analogous to the role of the suspend_noirq()
callback described above, and it is very rarely necessary to define
freeze_noirq().

The difference between freeze_noirq() and freeze() is analogous to the
difference between suspend_noirq() and suspend().

3.1.6. poweroff()
^^^^^^^^^^^^^^^^^

The poweroff() callback is hibernation-specific. It is executed when the system
is about to be powered off after saving a hibernation image to persistent
storage. The prepare() callbacks are executed for all devices before poweroff()
is called.

The role of this callback is analogous to the role of the suspend() and freeze()
callbacks described above, although it does not need to save the contents of
the device's registers.
In particular, if the driver wants to put the device
into a low-power state itself instead of allowing the PCI subsystem to do that,
the poweroff() callback should use pci_prepare_to_sleep() and
pci_set_power_state() to prepare the device for system wakeup and to put it
into a low-power state, respectively, but it need not save the device's standard
configuration registers.

3.1.7. poweroff_noirq()
^^^^^^^^^^^^^^^^^^^^^^^

The poweroff_noirq() callback is hibernation-specific. It is executed after
poweroff() callbacks have been executed for all devices in the system.

The role of this callback is analogous to the role of the suspend_noirq() and
freeze_noirq() callbacks described above, but it does not need to save the
contents of the device's registers.

The difference between poweroff_noirq() and poweroff() is analogous to the
difference between suspend_noirq() and suspend().

3.1.8. resume_noirq()
^^^^^^^^^^^^^^^^^^^^^

The resume_noirq() callback is only executed during system resume, after the
PM core has enabled the non-boot CPUs.
The driver's interrupt handler will not
be invoked while resume_noirq() is running, so this callback can carry out
operations that might race with the interrupt handler.

Since the PCI subsystem unconditionally puts all devices into the full-power
state in the resume_noirq phase of system resume and restores their standard
configuration registers, resume_noirq() is usually not necessary. In general,
it should only be used for performing operations that would lead to race
conditions if carried out by resume().

3.1.9. resume()
^^^^^^^^^^^^^^^

The resume() callback is only executed during system resume, after
resume_noirq() callbacks have been executed for all devices in the system and
device interrupts have been enabled by the PM core.

This callback is responsible for restoring the pre-suspend configuration of the
device and bringing it back to the fully functional state. The device should be
able to process I/O in the usual way after resume() has returned.

3.1.10. thaw_noirq()
^^^^^^^^^^^^^^^^^^^^

The thaw_noirq() callback is hibernation-specific.
It is executed after a
system image has been created and the non-boot CPUs have been enabled by the PM
core, in the thaw_noirq phase of hibernation. It may also be executed if the
loading of a hibernation image fails during system restore (it is then executed
after enabling the non-boot CPUs). The driver's interrupt handler will not be
invoked while thaw_noirq() is running.

The role of this callback is analogous to the role of resume_noirq(). The
difference between these two callbacks is that thaw_noirq() is executed after
freeze() and freeze_noirq(), so in general it does not need to modify the
contents of the device's registers.

3.1.11. thaw()
^^^^^^^^^^^^^^

The thaw() callback is hibernation-specific. It is executed after thaw_noirq()
callbacks have been executed for all devices in the system and after device
interrupts have been enabled by the PM core.

This callback is responsible for restoring the pre-freeze configuration of
the device, so that it will work in the usual way after thaw() has returned.

3.1.12. restore_noirq()
^^^^^^^^^^^^^^^^^^^^^^^

The restore_noirq() callback is hibernation-specific.
It is executed in the
restore_noirq phase of hibernation, when the boot kernel has passed control to
the image kernel and the non-boot CPUs have been enabled by the image kernel's
PM core.

This callback is analogous to resume_noirq() with the exception that it cannot
make any assumptions about the previous state of the device, even if the BIOS
(or generally the platform firmware) is known to preserve that state over a
suspend-resume cycle.

For the vast majority of PCI device drivers there is no difference between
resume_noirq() and restore_noirq().

3.1.13. restore()
^^^^^^^^^^^^^^^^^

The restore() callback is hibernation-specific. It is executed after
restore_noirq() callbacks have been executed for all devices in the system and
after the PM core has enabled device drivers' interrupt handlers to be invoked.

This callback is analogous to resume(), just like restore_noirq() is analogous
to resume_noirq(). Consequently, the difference between restore_noirq() and
restore() is analogous to the difference between resume_noirq() and resume().

For the vast majority of PCI device drivers there is no difference between
resume() and restore().
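
The ordering rule that recurs in the subsections above (a callback of one phase
runs only after the callbacks of the previous phase have been executed for all
devices) can be illustrated with a small userspace model. This is only a sketch
and not kernel code: struct model_dev, model_run_phase() and the other names are
hypothetical, the order in which devices are visited within a phase is
arbitrary here, and everything the PM core and the PCI subsystem do between the
phases is left out.

```c
#include <stdio.h>
#include <string.h>

#define MODEL_NDEVS   2
#define MODEL_LOG_MAX 256

/* Hypothetical stand-in for a device; not a kernel structure. */
struct model_dev {
	const char *name;
};

static char model_log[MODEL_LOG_MAX];

/* Append "<callback>:<device> " to the log. */
static void model_record(const char *cb, const struct model_dev *dev)
{
	size_t used = strlen(model_log);

	snprintf(model_log + used, sizeof(model_log) - used, "%s:%s ",
		 cb, dev->name);
}

/* Run one phase for all devices before the next phase may start. */
static void model_run_phase(const char *cb, const struct model_dev *devs,
			    int ndevs)
{
	int i;

	for (i = 0; i < ndevs; i++)
		model_record(cb, &devs[i]);
}

/* One system suspend followed by one system resume, phase by phase. */
const char *model_suspend_resume_cycle(void)
{
	static const struct model_dev devs[MODEL_NDEVS] = {
		{ "a" }, { "b" },
	};
	static const char *const phases[] = {
		"prepare", "suspend", "suspend_noirq",
		"resume_noirq", "resume",
	};
	size_t p;

	model_log[0] = '\0';
	for (p = 0; p < sizeof(phases) / sizeof(phases[0]); p++)
		model_run_phase(phases[p], devs, MODEL_NDEVS);

	return model_log;
}
```

Calling model_suspend_resume_cycle() returns a log in which every device has
finished one phase before any device enters the next one.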

3.1.14. complete()
^^^^^^^^^^^^^^^^^^

The complete() callback is executed in the following situations:

  - during system resume, after resume() callbacks have been executed for all
    devices,
  - during hibernation, before saving the system image, after thaw() callbacks
    have been executed for all devices,
  - during system restore, when the system is going back to its pre-hibernation
    state, after restore() callbacks have been executed for all devices.

It also may be executed if the loading of a hibernation image into memory fails
(in that case it is run after thaw() callbacks have been executed for all
devices that have drivers in the boot kernel).

This callback is entirely optional, although it may be necessary if the
prepare() callback performs operations that need to be reversed.

3.1.15. runtime_suspend()
^^^^^^^^^^^^^^^^^^^^^^^^^

The runtime_suspend() callback is specific to device runtime power management
(runtime PM). It is executed by the PM core's runtime PM framework when the
device is about to be suspended (i.e. quiesced and put into a low-power state)
at run time.

This callback is responsible for freezing the device and preparing it to be
put into a low-power state, but it must allow the PCI subsystem to perform all
of the PCI-specific actions necessary for suspending the device.

3.1.16. runtime_resume()
^^^^^^^^^^^^^^^^^^^^^^^^

The runtime_resume() callback is specific to device runtime PM. It is executed
by the PM core's runtime PM framework when the device is about to be resumed
(i.e. put into the full-power state and programmed to process I/O normally) at
run time.

This callback is responsible for restoring the normal functionality of the
device after it has been put into the full-power state by the PCI subsystem.
The device is expected to be able to process I/O in the usual way after
runtime_resume() has returned.

3.1.17. runtime_idle()
^^^^^^^^^^^^^^^^^^^^^^

The runtime_idle() callback is specific to device runtime PM. It is executed
by the PM core's runtime PM framework whenever it may be desirable to suspend
the device according to the PM core's information.
In particular, it is
automatically executed right after runtime_resume() has returned in case the
resume of the device has happened as a result of a spurious event.

This callback is optional, but if it is not implemented or if it returns 0, the
PCI subsystem will call pm_runtime_suspend() for the device, which in turn will
cause the driver's runtime_suspend() callback to be executed.

3.1.18. Pointing Multiple Callback Pointers to One Routine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Although in principle each of the callbacks described in the previous
subsections can be defined as a separate function, it often is convenient to
point two or more members of struct dev_pm_ops to the same routine. There are
a few convenience macros that can be used for this purpose.

The SIMPLE_DEV_PM_OPS macro declares a struct dev_pm_ops object with one
suspend routine pointed to by the .suspend(), .freeze(), and .poweroff()
members and one resume routine pointed to by the .resume(), .thaw(), and
.restore() members. The other function pointers in this struct dev_pm_ops are
unset.
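
The effect described for SIMPLE_DEV_PM_OPS can be sketched with a userspace
model. The structure and macro below are hypothetical stand-ins written for
this illustration (struct model_dev_pm_ops, MODEL_SIMPLE_DEV_PM_OPS, the
model_foo_* routines); they mimic the behavior described above and are not the
kernel's own definitions, which contain many more members.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device; not the kernel's struct device. */
struct model_device {
	int quiesced;
};

/* Reduced model of a PM operations table. */
struct model_dev_pm_ops {
	int (*suspend)(struct model_device *dev);
	int (*resume)(struct model_device *dev);
	int (*freeze)(struct model_device *dev);
	int (*thaw)(struct model_device *dev);
	int (*poweroff)(struct model_device *dev);
	int (*restore)(struct model_device *dev);
	int (*runtime_suspend)(struct model_device *dev);
	int (*runtime_resume)(struct model_device *dev);
};

/*
 * One suspend routine and one resume routine, each shared by three members.
 * Members not named in the initializer are left NULL.
 */
#define MODEL_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn)	\
	const struct model_dev_pm_ops name = {			\
		.suspend = suspend_fn,				\
		.resume = resume_fn,				\
		.freeze = suspend_fn,				\
		.thaw = resume_fn,				\
		.poweroff = suspend_fn,				\
		.restore = resume_fn,				\
	}

static int model_foo_suspend(struct model_device *dev)
{
	dev->quiesced = 1;	/* stop I/O; the bus type does the rest */
	return 0;
}

static int model_foo_resume(struct model_device *dev)
{
	dev->quiesced = 0;	/* ready to process I/O again */
	return 0;
}

MODEL_SIMPLE_DEV_PM_OPS(model_foo_pm_ops, model_foo_suspend, model_foo_resume);
```

In this model the runtime PM members stay NULL, which corresponds to the
statement that the other function pointers are unset.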

The UNIVERSAL_DEV_PM_OPS macro is similar to SIMPLE_DEV_PM_OPS, but it
additionally sets the .runtime_resume() pointer to the same value as
.resume() (and .thaw() and .restore()) and the .runtime_suspend() pointer to
the same value as .suspend() (and .freeze() and .poweroff()).

The SET_SYSTEM_SLEEP_PM_OPS macro can be used inside a declaration of struct
dev_pm_ops to indicate that one suspend routine is to be pointed to by the
.suspend(), .freeze(), and .poweroff() members and one resume routine is to
be pointed to by the .resume(), .thaw(), and .restore() members.

3.1.19. Driver Flags for Power Management
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The PM core allows device drivers to set flags that influence the handling of
power management for the devices by the core itself and by middle layer code
including the PCI bus type. The flags should be set once at the driver probe
time with the help of the dev_pm_set_driver_flags() function and they should not
be updated directly afterwards.

The DPM_FLAG_NO_DIRECT_COMPLETE flag prevents the PM core from using the
direct-complete mechanism allowing device suspend/resume callbacks to be skipped
if the device is in runtime suspend when the system suspend starts.
That also
affects all of the ancestors of the device, so this flag should only be used if
absolutely necessary.

The DPM_FLAG_SMART_PREPARE flag causes the PCI bus type to return a positive
value from pci_pm_prepare() only if the ->prepare callback provided by the
driver of the device returns a positive value. That allows the driver to opt
out of using the direct-complete mechanism dynamically (whereas setting
DPM_FLAG_NO_DIRECT_COMPLETE means permanent opt-out).

The DPM_FLAG_SMART_SUSPEND flag tells the PCI bus type that from the driver's
perspective the device can be safely left in runtime suspend during system
suspend. That causes pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff()
to avoid resuming the device from runtime suspend unless there are PCI-specific
reasons for doing that. Also, it causes pci_pm_suspend_late/noirq() and
pci_pm_poweroff_late/noirq() to return early if the device remains in runtime
suspend during the "late" phase of the system-wide transition under way.
Moreover, if the device is in runtime suspend in pci_pm_resume_noirq() or
pci_pm_restore_noirq(), its runtime PM status will be changed to "active" (as it
is going to be put into D0 going forward).

Setting the DPM_FLAG_MAY_SKIP_RESUME flag means that the driver allows its
"noirq" and "early" resume callbacks to be skipped if the device can be left
in suspend after a system-wide transition into the working state. This flag is
taken into consideration by the PM core along with the power.may_skip_resume
status bit of the device which is set by pci_pm_suspend_noirq() in certain
situations. If the PM core determines that the driver's "noirq" and "early"
resume callbacks should be skipped, the dev_pm_skip_resume() helper function
will return "true" and that will cause pci_pm_resume_noirq() and
pci_pm_resume_early() to return upfront without touching the device and
executing the driver callbacks.

3.2. Device Runtime Power Management
------------------------------------

In addition to providing device power management callbacks, PCI device drivers
are responsible for controlling the runtime power management (runtime PM) of
their devices.

The PCI device runtime PM is optional, but it is recommended that PCI device
drivers implement it at least in the cases where there is a reliable way of
verifying that the device is not used (like when the network cable is detached
from an Ethernet adapter or there are no devices attached to a USB controller).

To support the PCI runtime PM the driver first needs to implement the
runtime_suspend() and runtime_resume() callbacks. It also may need to implement
the runtime_idle() callback to prevent the device from being suspended again
every time right after the runtime_resume() callback has returned
(alternatively, the runtime_suspend() callback will have to check if the
device should really be suspended and return -EAGAIN if that is not the case).

The runtime PM of PCI devices is enabled by default by the PCI core. PCI
device drivers do not need to enable it and should not attempt to do so.
However, it is blocked by pci_pm_init(), which runs the pm_runtime_forbid()
helper function. In addition to that, the runtime PM usage counter of
each PCI device is incremented by local_pci_probe() before executing the
probe callback provided by the device's driver.

If a PCI driver implements the runtime PM callbacks and intends to use the
runtime PM framework provided by the PM core and the PCI subsystem, it needs
to decrement the device's runtime PM usage counter in its probe callback
function. If it doesn't do that, the counter will always be different from
zero for the device and it will never be runtime-suspended.
The simplest
way to do that is by calling pm_runtime_put_noidle(), but if the driver
wants to schedule an autosuspend right away, for example, it may call
pm_runtime_put_autosuspend() instead for this purpose. Generally, it
just needs to call a function that decrements the device's usage counter
from its probe routine to make runtime PM work for the device.

It is important to remember that the driver's runtime_suspend() callback
may be executed right after the usage counter has been decremented, because
user space may already have caused the pm_runtime_allow() helper function,
which unblocks the runtime PM of the device, to run via sysfs, so the driver
must be prepared to cope with that.

The driver itself should not call pm_runtime_allow(), though. Instead, it
should let user space or some platform-specific code do that (user space can
do it via sysfs as stated above), but it must be prepared to handle the
runtime PM of the device correctly as soon as pm_runtime_allow() is called
(which may happen at any time, even before the driver is loaded).

When the driver's remove callback runs, it has to balance the decrementation
of the device's runtime PM usage counter at the probe time.
For this reason,
if it has decremented the counter in its probe callback, it must run
pm_runtime_get_noresume() in its remove callback. [Since the core carries
out a runtime resume of the device and bumps up the device's usage counter
before running the driver's remove callback, the runtime PM of the device
is effectively disabled for the duration of the remove execution and all
runtime PM helper functions incrementing the device's usage counter are
then effectively equivalent to pm_runtime_get_noresume().]

The runtime PM framework works by processing requests to suspend or resume
devices, or to check if they are idle (in which case it is reasonable to
subsequently request that they be suspended). These requests are represented
by work items put into the power management workqueue, pm_wq. Although there
are a few situations in which power management requests are automatically
queued by the PM core (for example, after processing a request to resume a
device the PM core automatically queues a request to check if the device is
idle), device drivers are generally responsible for queuing power management
requests for their devices. For this purpose they should use the runtime PM
helper functions provided by the PM core, discussed in
Documentation/power/runtime_pm.rst.
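
The usage-counter bookkeeping described above for the probe and remove
callbacks can be followed in a small userspace model. It is a sketch under
stated assumptions, not kernel code: all model_* names are hypothetical, the
block set up by pm_runtime_forbid() is represented by a plain flag, and the
way the core drops its own references after the remove callback is an
assumption of this model rather than something stated in this document.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the runtime PM state of a device. */
struct model_pci_dev {
	int usage_count;	/* models the runtime PM usage counter */
	bool forbidden;		/* models the block set up at device init */
};

static void model_get_noresume(struct model_pci_dev *dev)
{
	dev->usage_count++;
}

static void model_put_noidle(struct model_pci_dev *dev)
{
	dev->usage_count--;
}

/* Driver side: balance the core's increment at probe time ... */
static void model_driver_probe(struct model_pci_dev *dev)
{
	model_put_noidle(dev);
}

/* ... and undo that decrement in the remove callback. */
static void model_driver_remove(struct model_pci_dev *dev)
{
	model_get_noresume(dev);
}

/* Core side: runtime PM starts out blocked for every device. */
void model_core_device_init(struct model_pci_dev *dev)
{
	dev->usage_count = 0;
	dev->forbidden = true;
}

/* Core side: the counter is incremented before the driver's probe runs. */
void model_core_probe(struct model_pci_dev *dev)
{
	model_get_noresume(dev);
	model_driver_probe(dev);
}

/*
 * Core side: the counter is bumped before the driver's remove runs.
 * Assumption of this model: afterwards the core drops both that reference
 * and the one it took at probe time.
 */
void model_core_remove(struct model_pci_dev *dev)
{
	model_get_noresume(dev);
	model_driver_remove(dev);
	model_put_noidle(dev);
	model_put_noidle(dev);
}

/* User space unblocking runtime PM (done via sysfs on a real system). */
void model_user_allow(struct model_pci_dev *dev)
{
	dev->forbidden = false;
}

/* The device may be runtime-suspended only if unblocked and unused. */
bool model_may_runtime_suspend(const struct model_pci_dev *dev)
{
	return !dev->forbidden && dev->usage_count == 0;
}
```

In this model a driver that skips the decrement in its probe routine leaves the
counter above zero, so model_may_runtime_suspend() never returns true for its
device, which mirrors the behavior described above.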

Devices can also be suspended and resumed synchronously, without placing a
request into pm_wq. In the majority of cases this is also done by their
drivers, which use helper functions provided by the PM core for this purpose.

For more information on the runtime PM of devices refer to
Documentation/power/runtime_pm.rst.


4. Resources
============

PCI Local Bus Specification, Rev. 3.0

PCI Bus Power Management Interface Specification, Rev. 1.2

Advanced Configuration and Power Interface (ACPI) Specification, Rev. 3.0b

PCI Express Base Specification, Rev. 2.0

Documentation/driver-api/pm/devices.rst

Documentation/power/runtime_pm.rst