.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

=========================
System Suspend Code Flows
=========================

:Copyright: |copy| 2020 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

At least one global system-wide transition needs to be carried out for the
system to get from the working state into one of the supported
:doc:`sleep states <sleep-states>`.  Hibernation requires more than one
transition to occur for this purpose, but the other sleep states, commonly
referred to as *system-wide suspend* (or simply *system suspend*) states, need
only one.

For those sleep states, the transition from the working state of the system into
the target sleep state is referred to as *system suspend* too (in the majority
of cases, whether this means a transition or a sleep state of the system should
be clear from the context) and the transition back from the sleep state into the
working state is referred to as *system resume*.

The kernel code flows associated with the suspend and resume transitions for
different sleep states of the system are quite similar, but there are some
significant differences between the :ref:`suspend-to-idle <s2idle>` code flows
and the code flows related to the :ref:`suspend-to-RAM <s2ram>` and
:ref:`standby <standby>` sleep states.

The :ref:`suspend-to-RAM <s2ram>` and :ref:`standby <standby>` sleep states
cannot be implemented without platform support and the difference between them
boils down to the platform-specific actions carried out by the suspend and
resume hooks that need to be provided by the platform driver to make them
available.  Apart from that, the suspend and resume code flows for these sleep
states are mostly identical, so the two of them will be referred to collectively
as *platform-dependent suspend* states in what follows.
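The terminology introduced above can be summarized with a small user-space C
model.  This is only an illustrative sketch: the enum and both helper functions
are hypothetical names made up for this example and do not correspond to any
kernel interface.

```c
#include <stdbool.h>

/* Hypothetical model of the sleep states discussed above; not a kernel API. */
enum sleep_state {
	STATE_S2IDLE,
	STATE_STANDBY,
	STATE_S2RAM,
	STATE_HIBERNATION,
};

/*
 * Standby and suspend-to-RAM need suspend and resume hooks provided by a
 * platform driver, so they are the "platform-dependent suspend" states.
 */
static bool is_platform_dependent_suspend(enum sleep_state s)
{
	return s == STATE_STANDBY || s == STATE_S2RAM;
}

/*
 * Every sleep state except hibernation is entered with one system-wide
 * transition, which makes it a "system suspend" state.
 */
static bool is_system_suspend(enum sleep_state s)
{
	return s != STATE_HIBERNATION;
}
```

In this model suspend-to-idle is a system suspend state that is not
platform-dependent, which is why its code flows are described separately below.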


.. _s2idle_suspend:

Suspend-to-idle Suspend Code Flow
=================================

The following steps are taken in order to transition the system from the working
state to the :ref:`suspend-to-idle <s2idle>` sleep state:

 1. Invoking system-wide suspend notifiers.

    Kernel subsystems can register callbacks to be invoked when the suspend
    transition is about to occur and when the resume transition has finished.

    That allows them to prepare for the change of the system state and to clean
    up after getting back to the working state.

 2. Freezing tasks.

    Tasks are frozen primarily in order to avoid unchecked hardware accesses
    from user space through MMIO regions or I/O registers exposed directly to
    it and to prevent user space from entering the kernel while the next step
    of the transition is in progress (which might be problematic for various
    reasons).

    All user space tasks are intercepted as though they were sent a signal and
    put into uninterruptible sleep until the end of the subsequent system resume
    transition.

    The kernel threads that choose to be frozen during system suspend for
    specific reasons are frozen subsequently, but they are not intercepted.
    Instead, they are expected to periodically check whether or not they need
    to be frozen and to put themselves into uninterruptible sleep if so.  [Note,
    however, that kernel threads can use locking and other concurrency controls
    available in kernel space to synchronize themselves with system suspend and
    resume, which can be much more precise than the freezing, so the latter is
    not a recommended option for kernel threads.]

 3. Suspending devices and reconfiguring IRQs.

    Devices are suspended in four phases called *prepare*, *suspend*,
    *late suspend* and *noirq suspend* (see :ref:`driverapi_pm_devices` for more
    information on what exactly happens in each phase).

    Every device is visited in each phase, but typically it is not physically
    accessed in more than two of them.

    The runtime PM API is disabled for every device during the *late* suspend
    phase and high-level ("action") interrupt handlers are prevented from being
    invoked before the *noirq* suspend phase.

    Interrupts are still handled after that, but they are only acknowledged to
    interrupt controllers without performing any device-specific actions that
    would be triggered in the working state of the system (those actions are
    deferred until the subsequent system resume transition as described
    `below <s2idle_resume_>`_).

    IRQs associated with system wakeup devices are "armed" so that the resume
    transition of the system is started when one of them signals an event.

 4. Freezing the scheduler tick and suspending timekeeping.

    When all devices have been suspended, CPUs enter the idle loop and are put
    into the deepest available idle state.  While doing that, each of them
    "freezes" its own scheduler tick so that the timer events associated with
    the tick do not occur until the CPU is woken up by another interrupt source.

    The last CPU to enter the idle state also stops the timekeeping which
    (among other things) prevents high resolution timers from triggering going
    forward until the first CPU that is woken up restarts the timekeeping.
    That allows the CPUs to stay in the deep idle state relatively long in one
    go.

    From this point on, the CPUs can only be woken up by non-timer hardware
    interrupts.  If that happens, they go back to the idle state unless the
    interrupt that woke up one of them comes from an IRQ that has been armed for
    system wakeup, in which case the system resume transition is started.
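The bookkeeping described in step 4 can be illustrated with a small user-space C
model.  It is a sketch under stated assumptions, not the kernel implementation:
the structure, the function names and the fixed CPU count are all hypothetical,
and only the counting idea (the last CPU in stops the timekeeping, the first CPU
out restarts it) is taken from the text above.  It loosely follows the counting
approach of the kernel's ``tick_freeze()`` and ``tick_unfreeze()``.

```c
#include <stdbool.h>

#define NR_CPUS 4	/* assumed CPU count for this model */

/* Hypothetical model state; none of these names are kernel symbols. */
struct s2idle_model {
	int frozen_cpus;		/* CPUs that have frozen their tick */
	bool tick_frozen[NR_CPUS];	/* per-CPU scheduler tick state */
	bool timekeeping_running;
};

static void model_init(struct s2idle_model *m)
{
	m->frozen_cpus = 0;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		m->tick_frozen[cpu] = false;
	m->timekeeping_running = true;
}

/* Called for each CPU as it enters the idle state during suspend. */
static void model_cpu_enter_idle(struct s2idle_model *m, int cpu)
{
	m->tick_frozen[cpu] = true;
	m->frozen_cpus++;
	/* The last CPU to get here also stops the timekeeping. */
	if (m->frozen_cpus == NR_CPUS)
		m->timekeeping_running = false;
}

/* Called for a CPU woken up by a non-timer hardware interrupt. */
static void model_cpu_wake(struct s2idle_model *m, int cpu)
{
	/* The first CPU to wake up restarts the timekeeping. */
	if (!m->timekeeping_running)
		m->timekeeping_running = true;
	m->tick_frozen[cpu] = false;
	m->frozen_cpus--;
}

/*
 * Returns true if the system resume transition should start.  A wakeup
 * caused by an IRQ that is not armed for system wakeup sends the CPU
 * straight back to the idle state.
 */
static bool model_handle_wakeup(struct s2idle_model *m, int cpu,
				bool irq_armed_for_wakeup)
{
	model_cpu_wake(m, cpu);
	if (irq_armed_for_wakeup)
		return true;
	model_cpu_enter_idle(m, cpu);
	return false;
}
```

A caller would invoke ``model_cpu_enter_idle()`` once per CPU and then feed
interrupts to ``model_handle_wakeup()``; the model assumes that the two calls
alternate for any given CPU.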


.. _s2idle_resume:

Suspend-to-idle Resume Code Flow
================================

The following steps are taken in order to transition the system from the
:ref:`suspend-to-idle <s2idle>` sleep state into the working state:

 1. Resuming timekeeping and unfreezing the scheduler tick.

    When one of the CPUs is woken up (by a non-timer hardware interrupt), it
    leaves the idle state entered in the last step of the preceding suspend
    transition, restarts the timekeeping (unless it has been restarted already
    by another CPU that woke up earlier) and the scheduler tick on that CPU is
    unfrozen.

    If the interrupt that has woken up the CPU was armed for system wakeup,
    the system resume transition begins.

 2. Resuming devices and restoring the working-state configuration of IRQs.

    Devices are resumed in four phases called *noirq resume*, *early resume*,
    *resume* and *complete* (see :ref:`driverapi_pm_devices` for more
    information on what exactly happens in each phase).

    Every device is visited in each phase, but typically it is not physically
    accessed in more than two of them.

    The working-state configuration of IRQs is restored after the *noirq* resume
    phase and the runtime PM API is re-enabled for every device whose driver
    supports it during the *early* resume phase.

 3. Thawing tasks.

    Tasks frozen in step 2 of the preceding `suspend <s2idle_suspend_>`_
    transition are "thawed", which means that they are woken up from the
    uninterruptible sleep that they went into at that time and user space tasks
    are allowed to exit the kernel.

 4. Invoking system-wide resume notifiers.

    This is analogous to step 1 of the `suspend <s2idle_suspend_>`_ transition
    and the same set of callbacks is invoked at this point, but a different
    "notification type" parameter value is passed to them.
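The notifier mechanism used in step 1 of the suspend transition and step 4 of
the resume transition can be sketched as a user-space C model.  In the kernel,
such callbacks are registered with ``register_pm_notifier()`` and receive
values such as ``PM_SUSPEND_PREPARE`` and ``PM_POST_SUSPEND``; everything in
the block below (types, function names, the fixed-size chain) is a hypothetical
stand-in meant only to show one set of callbacks being invoked with two
different notification types.

```c
#include <stddef.h>

/* Model-only event values; they stand in for the "notification type". */
enum pm_event_model {
	EVENT_SUSPEND_PREPARE,	/* suspend transition is about to occur */
	EVENT_POST_SUSPEND,	/* resume transition has finished */
};

typedef void (*pm_callback_t)(enum pm_event_model event, void *data);

#define MAX_NOTIFIERS 8

struct notifier_chain_model {
	pm_callback_t cb[MAX_NOTIFIERS];
	void *data[MAX_NOTIFIERS];
	size_t count;
};

/* Returns 0 on success, -1 if the chain is full. */
static int chain_register(struct notifier_chain_model *c, pm_callback_t cb,
			  void *data)
{
	if (c->count >= MAX_NOTIFIERS)
		return -1;
	c->cb[c->count] = cb;
	c->data[c->count] = data;
	c->count++;
	return 0;
}

/* Invokes every registered callback with the given notification type. */
static void chain_notify(const struct notifier_chain_model *c,
			 enum pm_event_model event)
{
	for (size_t i = 0; i < c->count; i++)
		c->cb[i](event, c->data[i]);
}

/* Example subscriber: records whether it is prepared for suspend. */
static void example_subsystem_cb(enum pm_event_model event, void *data)
{
	int *quiesced = data;

	*quiesced = (event == EVENT_SUSPEND_PREPARE);
}
```

The same ``example_subsystem_cb()`` handles both ends of the cycle and tells
them apart by the event value alone, which mirrors the "same callbacks,
different notification type" behavior described in step 4.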


Platform-dependent Suspend Code Flow
====================================

The following steps are taken in order to transition the system from the working
state to a platform-dependent suspend state:

 1. Invoking system-wide suspend notifiers.

    This step is the same as step 1 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_.

 2. Freezing tasks.

    This step is the same as step 2 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_.

 3. Suspending devices and reconfiguring IRQs.

    This step is analogous to step 3 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_, but the arming of IRQs for system
    wakeup generally does not have any effect on the platform.

    There are platforms that can go into a very deep low-power state internally
    when all CPUs in them are in sufficiently deep idle states and all I/O
    devices have been put into low-power states.  On those platforms,
    suspend-to-idle can reduce system power very effectively.

    On the other platforms, however, low-level components (like interrupt
    controllers) need to be turned off in a platform-specific way (implemented
    in the hooks provided by the platform driver) to achieve comparable power
    reduction.

    That usually prevents in-band hardware interrupts from waking up the system,
    which must be done in a special platform-dependent way.  Then, the
    configuration of system wakeup sources usually starts when system wakeup
    devices are suspended and is finalized by the platform suspend hooks later
    on.

 4. Disabling non-boot CPUs.

    On some platforms the suspend hooks mentioned above must run in a one-CPU
    configuration of the system (in particular, the hardware cannot be accessed
    by any code running in parallel with the platform suspend hooks that may,
    and often do, trap into the platform firmware in order to finalize the
    suspend transition).

    For this reason, the CPU offline/online (CPU hotplug) framework is used
    to take all of the CPUs in the system, except for one (the boot CPU),
    offline (typically, the CPUs that have been taken offline go into deep idle
    states).

    This means that all tasks are migrated away from those CPUs and all IRQs are
    rerouted to the only CPU that remains online.

 5. Suspending core system components.

    This prepares the core system components for (possibly) losing power going
    forward and suspends the timekeeping.

 6. Platform-specific power removal.

    This is expected to remove power from all of the system components except
    for the memory controller and RAM (in order to preserve the contents of the
    latter) and some devices designated for system wakeup.

    In many cases control is passed to the platform firmware which is expected
    to finalize the suspend transition as needed.
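The effect of step 4 on the system configuration can be illustrated with a
minimal user-space C model.  The structure, the constants and the function
names are assumptions made for this sketch; in the kernel the work is done by
the CPU hotplug framework, which is not reproduced here.

```c
#include <stdbool.h>

#define NR_CPUS 4	/* assumed CPU count for this model */
#define NR_IRQS 6	/* assumed IRQ count for this model */
#define BOOT_CPU 0

/* Hypothetical model state; not a kernel data structure. */
struct platform_model {
	bool cpu_online[NR_CPUS];
	int irq_target[NR_IRQS];	/* CPU each IRQ is routed to */
};

/* Takes every CPU except the boot CPU offline and reroutes all IRQs to it. */
static void model_disable_nonboot_cpus(struct platform_model *p)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		p->cpu_online[cpu] = (cpu == BOOT_CPU);

	for (int irq = 0; irq < NR_IRQS; irq++)
		p->irq_target[irq] = BOOT_CPU;
}

/* Counts the CPUs that are still online. */
static int model_online_cpu_count(const struct platform_model *p)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		n += p->cpu_online[cpu];
	return n;
}
```

After ``model_disable_nonboot_cpus()`` the model is in the one-CPU
configuration that the platform suspend hooks may require, with every IRQ
routed to the boot CPU.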


Platform-dependent Resume Code Flow
===================================

The following steps are taken in order to transition the system from a
platform-dependent suspend state into the working state:

 1. Platform-specific system wakeup.

    The platform is woken up by a signal from one of the designated system
    wakeup devices (which need not be an in-band hardware interrupt) and
    control is passed back to the kernel (the working configuration of the
    platform may need to be restored by the platform firmware before the
    kernel gets control again).

 2. Resuming core system components.

    The suspend-time configuration of the core system components is restored and
    the timekeeping is resumed.

 3. Re-enabling non-boot CPUs.

    The CPUs disabled in step 4 of the preceding suspend transition are taken
    back online and their suspend-time configuration is restored.

 4. Resuming devices and restoring the working-state configuration of IRQs.

    This step is the same as step 2 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.

 5. Thawing tasks.

    This step is the same as step 3 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.

 6. Invoking system-wide resume notifiers.

    This step is the same as step 4 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.
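The platform-dependent resume steps undo the platform-dependent suspend steps
in reverse order.  The user-space C sketch below makes that pairing explicit.
The step names are paraphrased from the two lists in this document, and the
helper functions are hypothetical names introduced only for illustration.

```c
#include <stddef.h>

/* Step names paraphrased from the lists in this document. */
static const char *const platform_suspend_steps[] = {
	"invoke suspend notifiers",
	"freeze tasks",
	"suspend devices",
	"disable non-boot CPUs",
	"suspend core system components",
	"platform-specific power removal",
};

static const char *const platform_resume_steps[] = {
	"platform-specific system wakeup",
	"resume core system components",
	"re-enable non-boot CPUs",
	"resume devices",
	"thaw tasks",
	"invoke resume notifiers",
};

#define NR_STEPS \
	(sizeof(platform_suspend_steps) / sizeof(platform_suspend_steps[0]))

/*
 * Maps a 1-based suspend step number to the 1-based resume step that
 * undoes it.  The caller must pass a value between 1 and NR_STEPS.
 */
static size_t resume_step_undoing(size_t suspend_step)
{
	return NR_STEPS + 1 - suspend_step;
}

/* Returns the name of the resume step undoing the given suspend step. */
static const char *resume_step_name(size_t suspend_step)
{
	return platform_resume_steps[resume_step_undoing(suspend_step) - 1];
}
```

For example, suspend step 4 (disabling non-boot CPUs) maps to resume step 3
(re-enabling them), which matches the cross-reference in the list above.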
271