=================================
Debugging hibernation and suspend
=================================

	(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL

1. Testing hibernation (aka suspend to disk or STD)
===================================================

To check if hibernation works, you can try to hibernate in the "reboot" mode::

	# echo reboot > /sys/power/disk
	# echo disk > /sys/power/state

and the system should create a hibernation image, reboot, resume and get back to
the command prompt where you have started the transition.  If that happens,
hibernation is most likely to work correctly.  Still, you need to repeat the
test at least a couple of times in a row for confidence.  [This is necessary,
because some problems only show up on a second attempt at suspending and
resuming the system.]  Moreover, hibernating in the "reboot" and "shutdown"
modes causes the PM core to skip some platform-related callbacks which on ACPI
systems might be necessary to make hibernation work.  Thus, if your machine
fails to hibernate or resume in the "reboot" mode, you should try the
"platform" mode::

	# echo platform > /sys/power/disk
	# echo disk > /sys/power/state

which is the default and recommended mode of hibernation.

Unfortunately, the "platform" mode of hibernation does not work on some systems
with broken BIOSes.  In such cases the "shutdown" mode of hibernation might
work::

	# echo shutdown > /sys/power/disk
	# echo disk > /sys/power/state

(it is similar to the "reboot" mode, but it requires you to press the power
button to make the system resume).

If neither "platform" nor "shutdown" hibernation mode works, you will need to
identify what goes wrong.
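
The command sequences above differ only in the string written to
/sys/power/disk, so they can be wrapped in a small helper.  The following is a
minimal sketch and not part of the kernel interface: the function name
"hibernate_in_mode" and the PM_SYSFS variable (which defaults to /sys/power and
exists only so that the helper can be pointed at a scratch directory for a dry
run) are assumptions made for illustration.

```shell
# Directory holding the "disk" and "state" files.  PM_SYSFS is a
# hypothetical override for dry runs; the real files are in /sys/power.
PM_SYSFS="${PM_SYSFS:-/sys/power}"

# hibernate_in_mode MODE
# Select a hibernation mode ("reboot", "platform", "shutdown", ...) and
# start the transition.  Has to be run as root on a real system.
hibernate_in_mode() {
	mode="$1"
	echo "$mode" > "$PM_SYSFS/disk" || return 1
	echo disk > "$PM_SYSFS/state"
}
```

For example, running "hibernate_in_mode reboot" a few times in a row
corresponds to the repeated test in the "reboot" mode.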

a) Test modes of hibernation
----------------------------

To find out why hibernation fails on your system, you can use a special testing
facility available if the kernel is compiled with CONFIG_PM_DEBUG set.  Then,
there is the file /sys/power/pm_test that can be used to make the hibernation
core run in a test mode.  There are 5 test modes available:

freezer
	- test the freezing of processes

devices
	- test the freezing of processes and suspending of devices

platform
	- test the freezing of processes, suspending of devices and platform
	  global control methods [1]_

processors
	- test the freezing of processes, suspending of devices, platform
	  global control methods [1]_ and the disabling of nonboot CPUs

core
	- test the freezing of processes, suspending of devices, platform global
	  control methods\ [1]_, the disabling of nonboot CPUs and suspending
	  of platform/system devices

.. [1]

    the platform global control methods are only available on ACPI systems
    and are only tested if the hibernation mode is set to "platform"

To use one of them it is necessary to write the corresponding string to
/sys/power/pm_test (e.g. "devices" to test the freezing of processes and
suspending of devices) and issue the standard hibernation commands.  For
example, to use the "devices" test mode along with the "platform" mode of
hibernation, you should do the following::

	# echo devices > /sys/power/pm_test
	# echo platform > /sys/power/disk
	# echo disk > /sys/power/state

Then, the kernel will try to freeze processes, suspend devices, wait a few
seconds (5 by default, but configurable by the suspend.pm_test_delay module
parameter), resume devices and thaw processes.  If "platform" is written to
/sys/power/pm_test, then after suspending devices the kernel will additionally
invoke the global control methods (e.g. ACPI global control methods) used to
prepare the platform firmware for hibernation.  Next, it will wait a
configurable number of seconds and invoke the platform (e.g. ACPI) global
methods used to cancel hibernation etc.

Writing "none" to /sys/power/pm_test causes the kernel to switch to the normal
hibernation/suspend operations.  Also, when open for reading, /sys/power/pm_test
contains a space-separated list of all available tests (including "none" that
represents the normal functionality) in which the current test level is
indicated by square brackets.
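
Because the current level is the only entry in square brackets, it can be
extracted with sed.  The helper below is an illustrative sketch; the function
name "current_pm_test_level" and its optional file argument (useful for trying
it on saved output) are assumptions, not an existing tool.

```shell
# current_pm_test_level [FILE]
# Print the test level that is shown in square brackets.
# FILE defaults to /sys/power/pm_test.
current_pm_test_level() {
	sed -n 's/.*\[\([^]]*\)].*/\1/p' "${1:-/sys/power/pm_test}"
}
```

Given a line such as "none core processors platform [devices] freezer", the
function prints "devices".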

Generally, as you can see, each test level is more "invasive" than the previous
one and the "core" level tests the hardware and drivers as deeply as possible
without creating a hibernation image.  Obviously, if the "devices" test fails,
the "platform" test will fail as well and so on.  Thus, as a rule of thumb, you
should try the test modes starting from "freezer", through "devices", "platform"
and "processors" up to "core" (repeat the test on each level a couple of times
to make sure that any random factors are avoided).
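
The order of tests recommended above can be scripted.  The sketch below is only
an illustration: the function name "run_pm_test_levels" and the PM_SYSFS
override are hypothetical, and it assumes that a failed test shows up as a
failed write to the "state" file, which should be cross-checked against dmesg.
It cannot detect a system that hangs, and it leaves the hibernation mode in
/sys/power/disk as it is.

```shell
# PM_SYSFS is a hypothetical override for dry runs in a scratch directory.
PM_SYSFS="${PM_SYSFS:-/sys/power}"

# run_pm_test_levels [REPEATS]
# Walk through the test levels from the least to the most invasive one,
# running each level REPEATS times (2 by default) and stopping at the
# first level for which the write to "state" reports an error.
run_pm_test_levels() {
	repeats="${1:-2}"
	for level in freezer devices platform processors core; do
		echo "$level" > "$PM_SYSFS/pm_test" || return 1
		i=0
		while [ "$i" -lt "$repeats" ]; do
			if ! echo disk > "$PM_SYSFS/state"; then
				echo "test level '$level' failed" >&2
				echo none > "$PM_SYSFS/pm_test"
				return 1
			fi
			i=$((i + 1))
		done
	done
	# Switch back to the normal hibernation/suspend operations.
	echo none > "$PM_SYSFS/pm_test"
}
```

The first level reported as failing is the one to investigate, because all of
the more invasive levels include it.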

If the "freezer" test fails, there is a task that cannot be frozen (in that case
it usually is possible to identify the offending task by analysing the output of
dmesg obtained after the failing test).  Failure at this level usually means
that there is a problem with the task freezer subsystem that should be
reported.

If the "devices" test fails, most likely there is a driver that cannot suspend
or resume its device (in the latter case the system may hang or become unstable
after the test, so please take that into consideration).  To find this driver,
you can carry out a binary search according to the rules:

- if the test fails, unload half of the drivers currently loaded and repeat
  (that would probably involve rebooting the system, so always note what drivers
  have been loaded before the test),
- if the test succeeds, load half of the drivers you have unloaded most
  recently and repeat.
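
Two small helpers can support the bookkeeping for this search.  They are a
sketch under stated assumptions: the function names "loaded_modules" and
"first_half" are made up for illustration, and the module name is taken to be
the first space-separated field of each line in /proc/modules.  Unloading
itself is left out on purpose, because modules that are in use or that others
depend on cannot simply be removed in an arbitrary order.

```shell
# loaded_modules [FILE]
# Print the names of the currently loaded modules, one per line.
# FILE defaults to /proc/modules; another path can be given to work on
# a list saved before the test.
loaded_modules() {
	cut -d' ' -f1 "${1:-/proc/modules}"
}

# first_half
# Read names on standard input and print the first half of them
# (rounded up), i.e. the candidates for the next unloading step.
first_half() {
	awk '{ name[NR] = $0 }
	     END { for (i = 1; i <= int((NR + 1) / 2); i++) print name[i] }'
}
```

For instance, "loaded_modules > modules.before" records the state before a
test and "loaded_modules | first_half" lists the modules to consider unloading
next.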

Once you have found the failing driver (there can be more than just one of
them), you have to unload it every time before hibernation.  In that case please
make sure to report the problem with the driver.

It is also possible that the "devices" test will still fail after you have
unloaded all modules. In that case, you may want to look in your kernel
configuration for the drivers that can be compiled as modules (and test again
with these drivers compiled as modules).  You may also try to use some special
kernel command line options such as "noapic", "noacpi" or even "acpi=off".

If the "platform" test fails, there is a problem with the handling of the
platform (e.g. ACPI) firmware on your system.  In that case the "platform" mode
of hibernation is not likely to work.  You can try the "shutdown" mode, but that
is rather a poor man's workaround.

If the "processors" test fails, the disabling/enabling of nonboot CPUs does not
work (of course, this may only be an issue on SMP systems) and the problem
should be reported.  In that case you can also try to switch the nonboot CPUs
off and on using the /sys/devices/system/cpu/cpu*/online sysfs attributes and
see if that works.
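
Switching the CPUs off and on through the "online" attributes can be done in a
loop.  This is a minimal sketch: the function name "cycle_nonboot_cpus" and the
CPU_SYSFS override (for dry runs against a scratch directory) are assumptions,
and CPUs that do not expose an "online" attribute are skipped.

```shell
# CPU_SYSFS is a hypothetical override; the real attributes are in
# /sys/devices/system/cpu.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# cycle_nonboot_cpus
# Switch every CPU that has an "online" attribute off and back on,
# stopping at the first write that fails.
cycle_nonboot_cpus() {
	for f in "$CPU_SYSFS"/cpu[0-9]*/online; do
		[ -e "$f" ] || continue
		echo 0 > "$f" || { echo "cannot offline ${f%/online}" >&2; return 1; }
		echo 1 > "$f" || { echo "cannot online ${f%/online}" >&2; return 1; }
	done
}
```

If this fails or hangs outside of any hibernation test, the CPU hotplug problem
can be reported on its own.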

If the "core" test fails, the suspending of the system/platform devices has
failed (these devices are suspended on one CPU with interrupts off).  The
problem is most probably hardware-related and serious, so it should be
reported.

A failure of any of the "platform", "processors" or "core" tests may cause your
system to hang or become unstable, so please beware.  Such a failure usually
indicates a serious problem that very well may be related to the hardware, but
please report it anyway.

b) Testing minimal configuration
--------------------------------

If all of the hibernation test modes work, you can boot the system with the
"init=/bin/bash" command line parameter and attempt to hibernate in the
"reboot", "shutdown" and "platform" modes.  If that does not work, there
probably is a problem with a driver statically compiled into the kernel and you
can try to compile more drivers as modules, so that they can be tested
individually.  Otherwise, there is a problem with a modular driver and you can
find it by loading half of the modules you normally use and binary searching
in accordance with the algorithm:

- if there are n modules loaded and the attempt to suspend and resume fails,
  unload n/2 of the modules and try again (that would probably involve
  rebooting the system),
- if there are n modules loaded and the attempt to suspend and resume succeeds,
  load n/2 modules more and try again.

Again, if you find the offending module or modules, they must be unloaded every
time before hibernation, and please report the problem with them.

c) Using the "test_resume" hibernation option
---------------------------------------------

/sys/power/disk generally tells the kernel what to do after creating a
hibernation image.  One of the available options is "test_resume" which
causes the just created image to be used for immediate restoration.  Namely,
after doing::

	# echo test_resume > /sys/power/disk
	# echo disk > /sys/power/state

a hibernation image will be created and a resume from it will be triggered
immediately without involving the platform firmware in any way.

That test can be used to check if failures to resume from hibernation are
related to bad interactions with the platform firmware.  That is, if the above
works every time, but resume from actual hibernation does not work or is
unreliable, the platform firmware may be responsible for the failures.

On architectures and platforms that support using different kernels to restore
hibernation images (that is, the kernel used to read the image from storage and
load it into memory is different from the one included in the image) or support
kernel address space randomization, it also can be used to check if failures
to resume may be related to the differences between the restore and image
kernels.

d) Advanced debugging
---------------------

If hibernation does not work on your system even in the minimal configuration
and compiling more drivers as modules is not practical or some modules cannot
be unloaded, you can use one of the more advanced debugging techniques to find
the problem.  First, if there is a serial port in your box, you can boot the
kernel with the 'no_console_suspend' parameter and try to log kernel messages
using the serial console.  This may provide you with some information about the
reasons for the suspend (resume) failure.  Alternatively, it may be possible to
use a FireWire port for debugging with firescope
(http://v3.sk/~lkundrak/firescope/).  On x86 it is also possible to
use the PM_TRACE mechanism documented in Documentation/power/s2ram.rst.

2. Testing suspend to RAM (STR)
===============================

To verify that the STR works, it is generally more convenient to use the s2ram
tool available from http://suspend.sf.net and documented at
http://en.opensuse.org/SDB:Suspend_to_RAM (S2RAM_LINK).

The /sys/power/pm_test test modes can be used for STR, too.  After writing
"freezer", "devices", "platform", "processors", or "core" into
/sys/power/pm_test (available if the kernel is compiled with CONFIG_PM_DEBUG
set) the suspend code will work in the test mode corresponding to the given
string.  The STR test modes are defined in the same way as for hibernation, so
please refer to Section 1 for more information about them.  In particular, the
"core" test allows you to test everything except for the actual invocation of
the platform firmware in order to put the system into the sleep state.

Among other things, the testing with the help of /sys/power/pm_test may allow
you to identify drivers that fail to suspend or resume their devices.  They
should be unloaded every time before an STR transition.

Next, you can follow the instructions at S2RAM_LINK to test the system, but if
it does not work "out of the box", you may need to boot it with
"init=/bin/bash" and test s2ram in the minimal configuration.  In that case,
you may be able to search for failing drivers by following the procedure
analogous to the one described in Section 1.  If you find some failing drivers,
you will have to unload them every time before an STR transition (i.e. before
you run s2ram), and please report the problems with them.

There is a debugfs entry which shows the suspend to RAM statistics. Here is an
example of its output::

	# mount -t debugfs none /sys/kernel/debug
	# cat /sys/kernel/debug/suspend_stats
	success: 20
	fail: 5
	failed_freeze: 0
	failed_prepare: 0
	failed_suspend: 5
	failed_suspend_noirq: 0
	failed_resume: 0
	failed_resume_noirq: 0
	failures:
	  last_failed_dev:	alarm
				adc
	  last_failed_errno:	-16
				-16
	  last_failed_step:	suspend
				suspend

The "success" field is the number of successful suspend to RAM transitions and
the "fail" field is the number of failed ones.  The other counters give the
number of failures in the individual steps of suspend to RAM.  suspend_stats
lists only the last 2 failed devices, together with the error number and the
failed step of suspend for each of them.
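
To see at a glance which step fails, the counters that are not zero can be
filtered out of this output.  The helper below is an illustrative sketch: the
function name "nonzero_suspend_failures" is made up, and it relies only on the
"name: value" layout of the example above.

```shell
# nonzero_suspend_failures [FILE]
# Print the "fail" and "failed_*" counters that are greater than zero
# as name=value pairs.  FILE defaults to the debugfs entry.
nonzero_suspend_failures() {
	awk -F': *' '
		/^[ \t]*(fail|failed_[a-z_]+):/ && $2 > 0 {
			gsub(/^[ \t]+/, "", $1)
			print $1 "=" $2
		}
	' "${1:-/sys/kernel/debug/suspend_stats}"
}
```

For the example output above this prints "fail=5" and "failed_suspend=5", which
points at the device suspend step.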