=================================
Debugging hibernation and suspend
=================================

	(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL

1. Testing hibernation (aka suspend to disk or STD)
===================================================

To check if hibernation works, you can try to hibernate in the "reboot" mode::

	# echo reboot > /sys/power/disk
	# echo disk > /sys/power/state

and the system should create a hibernation image, reboot, resume and get back to
the command prompt where you started the transition.  If that happens,
hibernation most likely works correctly.  Still, you need to repeat the
test at least a couple of times in a row for confidence.  (This is necessary
because some problems only show up on a second attempt at suspending and
resuming the system.)  Moreover, hibernating in the "reboot" and "shutdown"
modes causes the PM core to skip some platform-related callbacks which on ACPI
systems might be necessary to make hibernation work.  Thus, if your machine
fails to hibernate or resume in the "reboot" mode, you should try the
"platform" mode::

	# echo platform > /sys/power/disk
	# echo disk > /sys/power/state

which is the default and recommended mode of hibernation.

Unfortunately, the "platform" mode of hibernation does not work on some systems
with broken BIOSes.  In such cases the "shutdown" mode of hibernation might
work::

	# echo shutdown > /sys/power/disk
	# echo disk > /sys/power/state

(it is similar to the "reboot" mode, but it requires you to press the power
button to make the system resume).

If neither "platform" nor "shutdown" hibernation mode works, you will need to
identify what goes wrong.

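Because each mode should be tried several times in a row, the commands above
can be wrapped in a small loop.  The following is only a sketch: the function
name hibernate_cycles and its arguments are made up for illustration, and the
sysfs directory is a parameter so that the function can first be tried on a
scratch directory.

```shell
#!/bin/sh
# hibernate_cycles MODE [COUNT] [SYSDIR]
# Hypothetical helper: select hibernation MODE and trigger hibernation COUNT
# times in a row, stopping at the first write that fails.
hibernate_cycles() {
    mode=$1
    count=${2:-3}
    sysdir=${3:-/sys/power}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "cycle $i: $mode"
        echo "$mode" > "$sysdir/disk" || return 1
        # On a real system this write returns only after the resume.
        echo disk > "$sysdir/state" || return 1
        i=$((i + 1))
    done
}
```

Run as root, "hibernate_cycles reboot 3" would then attempt three hibernation
cycles in the "reboot" mode.
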
a) Test modes of hibernation
----------------------------

To find out why hibernation fails on your system, you can use a special testing
facility available if the kernel is compiled with CONFIG_PM_DEBUG set.  Then,
there is the file /sys/power/pm_test that can be used to make the hibernation
core run in a test mode.  There are 5 test modes available:

freezer
	- test the freezing of processes

devices
	- test the freezing of processes and suspending of devices

platform
	- test the freezing of processes, suspending of devices and platform
	  global control methods [1]_

processors
	- test the freezing of processes, suspending of devices, platform
	  global control methods [1]_ and the disabling of nonboot CPUs

core
	- test the freezing of processes, suspending of devices, platform global
	  control methods\ [1]_, the disabling of nonboot CPUs and suspending
	  of platform/system devices

.. [1]

    the platform global control methods are only available on ACPI systems
    and are only tested if the hibernation mode is set to "platform"

To use one of them it is necessary to write the corresponding string to
/sys/power/pm_test (e.g. "devices" to test the freezing of processes and
suspending of devices) and issue the standard hibernation commands.  For
example, to use the "devices" test mode along with the "platform" mode of
hibernation, you should do the following::

	# echo devices > /sys/power/pm_test
	# echo platform > /sys/power/disk
	# echo disk > /sys/power/state

Then, the kernel will try to freeze processes, suspend devices, wait a few
seconds (5 by default, but configurable by the suspend.pm_test_delay module
parameter), resume devices and thaw processes.  If "platform" is written to
/sys/power/pm_test, then after suspending devices the kernel will additionally
invoke the global control methods (e.g. ACPI global control methods) used to
prepare the platform firmware for hibernation.  Next, it will wait a
configurable number of seconds and invoke the platform (e.g. ACPI) global
methods used to cancel hibernation etc.

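The delay can be given on the kernel command line as
suspend.pm_test_delay=<seconds>.  As a sketch only, and assuming that the
parameter is also exported as a writable file under
/sys/module/suspend/parameters/ (this path is an assumption, so check that the
file exists on your kernel), it could be changed at run time as follows.  The
function name is hypothetical.

```shell
#!/bin/sh
# set_pm_test_delay SECONDS [FILE]
# Hypothetical helper: validate SECONDS and write it to the parameter file.
# The default FILE path is an assumption, not taken from this document.
set_pm_test_delay() {
    secs=$1
    file=${2:-/sys/module/suspend/parameters/pm_test_delay}
    case $secs in
    ''|*[!0-9]*)
        echo "not a number of seconds: $secs" >&2
        return 1
        ;;
    esac
    [ -w "$file" ] || { echo "$file is not writable" >&2; return 1; }
    echo "$secs" > "$file"
}
```
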
Writing "none" to /sys/power/pm_test causes the kernel to switch to the normal
hibernation/suspend operations.  Also, when read, /sys/power/pm_test
contains a space-separated list of all available tests (including "none" that
represents the normal functionality) in which the current test level is
indicated by square brackets.

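If a script needs to know the current test level, the bracketed entry can be
extracted from that list.  This is a minimal sketch; the function name is
hypothetical and the file is a parameter so that it can be tried on a sample
file.

```shell
#!/bin/sh
# pm_test_current [FILE]
# Print the entry enclosed in square brackets, i.e. the current test level.
pm_test_current() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "${1:-/sys/power/pm_test}"
}
```

For a file containing "none core processors platform [devices] freezer" this
prints "devices".
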
Generally, as you can see, each test level is more "invasive" than the previous
one and the "core" level tests the hardware and drivers as deeply as possible
without creating a hibernation image.  Obviously, if the "devices" test fails,
the "platform" test will fail as well and so on.  Thus, as a rule of thumb, you
should try the test modes starting from "freezer", through "devices", "platform"
and "processors" up to "core" (repeat the test on each level a couple of times
to rule out random factors).

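The order described above could be scripted roughly as follows.  This is a
sketch under assumptions: the function name is made up, and it assumes that a
failing test level shows up as a failed write to /sys/power/state.  A hang
obviously cannot be detected this way.

```shell
#!/bin/sh
# run_pm_test_levels [MODE] [SYSDIR]
# Hypothetical helper: walk through the test levels from the least to the
# most invasive one and stop at the first level that reports a failure.
run_pm_test_levels() {
    mode=${1:-platform}
    sysdir=${2:-/sys/power}
    for level in freezer devices platform processors core; do
        echo "$level" > "$sysdir/pm_test" || return 1
        echo "$mode" > "$sysdir/disk" || return 1
        if echo disk > "$sysdir/state"; then
            echo "$level: ok"
        else
            echo "$level: FAILED"
            break
        fi
    done
    # Switch back to normal operation whatever the outcome was.
    echo none > "$sysdir/pm_test"
}
```
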
If the "freezer" test fails, there is a task that cannot be frozen (in that case
it usually is possible to identify the offending task by analysing the output of
dmesg obtained after the failing test).  Failure at this level usually means
that there is a problem with the task freezer subsystem that should be
reported.

If the "devices" test fails, most likely there is a driver that cannot suspend
or resume its device (in the latter case the system may hang or become unstable
after the test, so please take that into consideration).  To find this driver,
you can carry out a binary search according to the rules:

- if the test fails, unload half of the drivers currently loaded and repeat
  (that would probably involve rebooting the system, so always note what drivers
  have been loaded before the test),
- if the test succeeds, load half of the drivers you have unloaded most
  recently and repeat.

Once you have found the failing driver (there can be more than just one of
them), you have to unload it every time before hibernation.  In that case please
make sure to report the problem with the driver.

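For the bookkeeping part of the binary search, it helps to save the list of
loaded modules and split it into halves.  The helpers below are a hypothetical
sketch that only manipulates lists of names.  Which modules can actually be
unloaded, and in what order, depends on their dependencies.

```shell
#!/bin/sh
# loaded_modules [FILE]
# Print the names of the loaded modules, one per line (first column of
# /proc/modules).
loaded_modules() {
    awk '{ print $1 }' "${1:-/proc/modules}"
}

# first_half
# Print the first half (rounded up) of the lines read from standard input.
first_half() {
    awk '{ line[NR] = $0 }
         END {
             n = int((NR + 1) / 2)
             for (i = 1; i <= n; i++)
                 print line[i]
         }'
}
```

For example, "loaded_modules > modules.before" records the starting point and
"first_half < modules.before" lists the candidates to unload in the first
step.
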
It is also possible that the "devices" test will still fail after you have
unloaded all modules. In that case, you may want to look in your kernel
configuration for the drivers that can be compiled as modules (and test again
with these drivers compiled as modules).  You may also try to use some special
kernel command line options such as "noapic", "noacpi" or even "acpi=off".

If the "platform" test fails, there is a problem with the handling of the
platform (e.g. ACPI) firmware on your system.  In that case the "platform" mode
of hibernation is not likely to work.  You can try the "shutdown" mode, but that
is rather a poor man's workaround.

If the "processors" test fails, the disabling/enabling of nonboot CPUs does not
work (of course, this may only be an issue on SMP systems) and the problem
should be reported.  In that case you can also try to switch the nonboot CPUs
off and on using the /sys/devices/system/cpu/cpu*/online sysfs attributes and
see if that works.

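Switching the CPUs off and on by hand could look like the sketch below.  The
function name is hypothetical.  CPUs without an "online" attribute (usually
the boot CPU) are skipped simply because the glob does not match them.

```shell
#!/bin/sh
# cycle_nonboot_cpus [CPUDIR]
# Hypothetical helper: take every CPU that has an "online" attribute offline
# and bring it back online, reporting the result for each CPU.
cycle_nonboot_cpus() {
    cpudir=${1:-/sys/devices/system/cpu}
    for f in "$cpudir"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue    # the glob did not match anything
        cpu=${f%/online}
        cpu=${cpu##*/}
        if echo 0 > "$f" && echo 1 > "$f"; then
            echo "$cpu: ok"
        else
            echo "$cpu: FAILED"
        fi
    done
}
```
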
If the "core" test fails, which means that suspending of the system/platform
devices has failed (these devices are suspended on one CPU with interrupts off),
the problem is most probably hardware-related and serious, so it should be
reported.

A failure of any of the "platform", "processors" or "core" tests may cause your
system to hang or become unstable, so please beware.  Such a failure usually
indicates a serious problem that may very well be related to the hardware, but
please report it anyway.

b) Testing minimal configuration
--------------------------------

If all of the hibernation test modes work, you can boot the system with the
"init=/bin/bash" command line parameter and attempt to hibernate in the
"reboot", "shutdown" and "platform" modes.  If that does not work, there
probably is a problem with a driver statically compiled into the kernel and you
can try to compile more drivers as modules, so that they can be tested
individually.  Otherwise, there is a problem with a modular driver and you can
find it by loading half of the modules you normally use and binary searching
in accordance with the algorithm:

- if there are n modules loaded and the attempt to suspend and resume fails,
  unload n/2 of the modules and try again (that would probably involve rebooting
  the system),
- if there are n modules loaded and the attempt to suspend and resume succeeds,
  load n/2 modules more and try again.

Again, if you find the offending modules, they must be unloaded every time
before hibernation, and please report the problem with them.

c) Using the "test_resume" hibernation option
---------------------------------------------

/sys/power/disk generally tells the kernel what to do after creating a
hibernation image.  One of the available options is "test_resume" which
causes the just created image to be used for immediate restoration.  Namely,
after doing::

	# echo test_resume > /sys/power/disk
	# echo disk > /sys/power/state

a hibernation image will be created and a resume from it will be triggered
immediately without involving the platform firmware in any way.

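Before relying on "test_resume", a script can check that the running kernel
offers it.  The sketch below assumes that reading /sys/power/disk yields a
space-separated list of the supported options with the current one in square
brackets (an assumption not spelled out in this document); the function name
is hypothetical.

```shell
#!/bin/sh
# disk_mode_available MODE [FILE]
# Succeed if MODE is one of the options listed in FILE.
disk_mode_available() {
    # Strip the brackets marking the current option, then look for an exact
    # match among the remaining words.
    sed 's/[][]//g' "${2:-/sys/power/disk}" | tr ' ' '\n' | grep -qxF -- "$1"
}
```

It can be used as in "disk_mode_available test_resume && echo test_resume >
/sys/power/disk".
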
That test can be used to check if failures to resume from hibernation are
related to bad interactions with the platform firmware.  That is, if the above
works every time, but resume from actual hibernation does not work or is
unreliable, the platform firmware may be responsible for the failures.

On architectures and platforms that support using different kernels to restore
hibernation images (that is, the kernel used to read the image from storage and
load it into memory is different from the one included in the image) or support
kernel address space randomization, it can also be used to check if failures
to resume may be related to the differences between the restore and image
kernels.

d) Advanced debugging
---------------------

If hibernation does not work on your system even in the minimal
configuration and compiling more drivers as modules is not practical or some
modules cannot be unloaded, you can use one of the more advanced debugging
techniques to find the problem.  First, if there is a serial port in your box,
you can boot the kernel with the 'no_console_suspend' parameter and try to log
kernel messages using the serial console.  This may provide you with some
information about the reasons for the suspend (resume) failure.  Alternatively,
it may be possible to use a FireWire port for debugging with firescope
(http://v3.sk/~lkundrak/firescope/).  On x86 it is also possible to
use the PM_TRACE mechanism documented in Documentation/power/s2ram.rst .

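To confirm that the kernel was really booted with 'no_console_suspend', the
command line of the running kernel can be inspected.  A minimal sketch, with a
hypothetical function name:

```shell
#!/bin/sh
# cmdline_has PARAM [FILE]
# Succeed if PARAM appears as a separate word on the kernel command line.
cmdline_has() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For instance, "cmdline_has no_console_suspend || echo 'parameter missing'".
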
2. Testing suspend to RAM (STR)
===============================

To verify that STR works, it is generally more convenient to use the s2ram
tool available from http://suspend.sf.net and documented at
http://en.opensuse.org/SDB:Suspend_to_RAM (S2RAM_LINK).

In addition, the suspend code can be run in a test mode: after writing
"freezer", "devices", "platform", "processors", or "core" into
/sys/power/pm_test (available if the kernel is compiled with CONFIG_PM_DEBUG
set) the suspend code will work in the test mode corresponding to the given
string.  The STR test modes are defined in the same way as for hibernation, so
please refer to Section 1 for more information about them.  In particular, the
"core" test allows you to test everything except for the actual invocation of
the platform firmware in order to put the system into the sleep state.

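A single STR test run could be scripted as sketched below.  It assumes that
"mem" is the string that selects suspend to RAM in /sys/power/state (this
document does not spell it out), and the function name is hypothetical.

```shell
#!/bin/sh
# str_test LEVEL [SYSDIR]
# Hypothetical helper: run one suspend-to-RAM transition in test mode LEVEL
# and switch back to normal operation afterwards.
str_test() {
    level=$1
    sysdir=${2:-/sys/power}
    echo "$level" > "$sysdir/pm_test" || return 1
    echo mem > "$sysdir/state"
    status=$?
    echo none > "$sysdir/pm_test"
    return "$status"
}
```
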
Among other things, testing with the help of /sys/power/pm_test may allow
you to identify drivers that fail to suspend or resume their devices.  They
should be unloaded every time before an STR transition.

Next, you can follow the instructions at S2RAM_LINK to test the system, but if
it does not work "out of the box", you may need to boot it with
"init=/bin/bash" and test s2ram in the minimal configuration.  In that case,
you may be able to search for failing drivers by following a procedure
analogous to the one described in Section 1.  If you find some failing drivers,
you will have to unload them every time before an STR transition (i.e. before
you run s2ram), and please report the problems with them.

There is a debugfs entry which shows the suspend to RAM statistics. Here is an
example of its output::

	# mount -t debugfs none /sys/kernel/debug
	# cat /sys/kernel/debug/suspend_stats
	success: 20
	fail: 5
	failed_freeze: 0
	failed_prepare: 0
	failed_suspend: 5
	failed_suspend_noirq: 0
	failed_resume: 0
	failed_resume_noirq: 0
	failures:
	  last_failed_dev:	alarm
				adc
	  last_failed_errno:	-16
				-16
	  last_failed_step:	suspend
				suspend

The success field is the number of successful suspend-to-RAM transitions and
the fail field is the number of failed ones.  The other counters give the number
of failures at the individual steps of suspend to RAM.  suspend_stats lists only
the last 2 failed devices, along with the corresponding error numbers and failed
steps.

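To pick a single value out of that output in a script, something like the
following sketch can be used.  The function name is hypothetical and, for
fields that span several lines, only the first value is printed.

```shell
#!/bin/sh
# suspend_stat FIELD [FILE]
# Print the first value that follows "FIELD:" in the statistics file.
suspend_stat() {
    awk -v key="$1:" '$1 == key { print $2; exit }' \
        "${2:-/sys/kernel/debug/suspend_stats}"
}
```

With the example output above, "suspend_stat fail" prints 5 and
"suspend_stat last_failed_dev" prints alarm.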