=============================
The Linux Watchdog driver API
=============================

Last reviewed: 10/05/2007



Copyright 2002 Christer Weingel <wingel@nano-system.com>

Some parts of this document are copied verbatim from the sbc60xxwdt
driver which is (c) Copyright 2000 Jakob Oestergaard <jakob@ostenfeld.dk>

This document describes the state of the Linux 2.4.18 kernel.

Introduction
============

A Watchdog Timer (WDT) is a hardware circuit that can reset the
computer system in case of a software fault.  You probably knew that
already.

Usually a userspace daemon will notify the kernel watchdog driver via the
/dev/watchdog special device file that userspace is still alive, at
regular intervals.  When such a notification occurs, the driver will
usually tell the hardware watchdog that everything is in order, and
that the watchdog should wait for yet another little while to reset
the system.  If userspace fails (RAM error, kernel bug, whatever), the
notifications cease to occur, and the hardware watchdog will reset the
system (causing a reboot) after the timeout occurs.

The Linux watchdog API is a rather ad-hoc construction and different
drivers implement different, and sometimes incompatible, parts of it.
This file is an attempt to document the existing usage and allow
future driver writers to use it as a reference.

The simplest API
================

All drivers support the basic mode of operation, where the watchdog
activates as soon as /dev/watchdog is opened and will reboot unless
the watchdog is pinged within a certain time.  This time is called the
timeout or margin.  The simplest way to ping the watchdog is to write
some data to the device.  A very simple watchdog daemon would look
like this source file: see samples/watchdog/watchdog-simple.c
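
The write-based ping described above can be sketched as a small helper.
This is only an illustration and not the contents of
samples/watchdog/watchdog-simple.c; the function name wd_ping_write is
hypothetical, and fd is assumed to be a descriptor obtained by opening
/dev/watchdog for writing.

```c
#include <unistd.h>

/* Hypothetical helper: ping the watchdog by writing one byte to the
 * already opened device.  Returns 0 on success, -1 if the write failed. */
int wd_ping_write(int fd)
{
	/* Any data counts as a ping.  A NUL byte is used here rather than
	 * 'V', which Magic Close drivers interpret specially. */
	const char byte = '\0';

	return write(fd, &byte, 1) == 1 ? 0 : -1;
}
```

A daemon would call this helper in a loop, sleeping for less than the
timeout between calls.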

A more advanced watchdog daemon could, for example, check that an HTTP
server is still responding before doing the write call to ping the watchdog.

When the device is closed, the watchdog is disabled, unless the "Magic
Close" feature is supported (see below).  Disabling on close is not
always a good idea, since if there is a bug in the watchdog daemon and it
crashes, the system will not reboot.  Because of this, some of the
drivers support the configuration option "Disable watchdog shutdown on
close", CONFIG_WATCHDOG_NOWAYOUT.  If it is set to Y when compiling
the kernel, there is no way of disabling the watchdog once it has been
started.  So, if the watchdog daemon crashes, the system will reboot
after the timeout has passed.  Watchdog devices also usually support
the nowayout module parameter so that this option can be controlled at
runtime.

Magic Close feature
===================

If a driver supports "Magic Close", the driver will not disable the
watchdog unless a specific magic character 'V' has been sent to
/dev/watchdog just before closing the file.  If the userspace daemon
closes the file without sending this special character, the driver
will assume that the daemon (and userspace in general) died, and will
stop pinging the watchdog without disabling it first.  This will then
cause a reboot if the watchdog is not re-opened in sufficient time.
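
A clean shutdown of the daemon under Magic Close might look like the
sketch below.  The helper name wd_magic_close is hypothetical, and the
code assumes that fd refers to an open /dev/watchdog and that the driver
is not running with nowayout set.

```c
#include <unistd.h>

/* Hypothetical helper: send the magic character 'V' and then close the
 * device, so that a Magic Close driver disables the watchdog.
 * Returns 0 on success, -1 if the write or the close failed. */
int wd_magic_close(int fd)
{
	const char magic = 'V';
	int ok = write(fd, &magic, 1) == 1;

	/* Close even if the write failed so the descriptor is not leaked;
	 * in that case the watchdog should be expected to keep running. */
	if (close(fd) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Because the character has to be sent just before the close, the helper
does both in one place and writes nothing else in between.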

The ioctl API
=============

All conforming drivers also support an ioctl API.

Pinging the watchdog using an ioctl:

All drivers that have an ioctl interface support at least one ioctl,
KEEPALIVE.  This ioctl does exactly the same thing as a write to the
watchdog device, so the main loop in the above program could be
replaced with::

	while (1) {
		ioctl(fd, WDIOC_KEEPALIVE, 0);
		sleep(10);
	}

The argument to the ioctl is ignored.
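
The loop above ignores the return value of ioctl().  A slightly more
defensive variant is sketched here; the function name wd_keepalive_loop
and its max_pings parameter are hypothetical and exist only so that the
loop can terminate in a demonstration.

```c
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper: ping up to max_pings times, sleeping interval
 * seconds after each ping.  Stops at the first failed ioctl and returns
 * the number of successful pings. */
int wd_keepalive_loop(int fd, unsigned int interval, int max_pings)
{
	int sent = 0;

	while (sent < max_pings) {
		/* The third argument is ignored by the driver. */
		if (ioctl(fd, WDIOC_KEEPALIVE, 0) != 0)
			break;	/* errno tells the caller why */
		sent++;
		sleep(interval);
	}
	return sent;
}
```

A return value smaller than max_pings tells the caller that a ping
failed, which a real daemon would probably want to log.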

Setting and getting the timeout
===============================

For some drivers it is possible to modify the watchdog timeout on the
fly with the SETTIMEOUT ioctl.  Those drivers have the WDIOF_SETTIMEOUT
flag set in their option field.  The argument is an integer
representing the timeout in seconds.  The driver returns the real
timeout used in the same variable, and this timeout might differ from
the requested one due to limitations of the hardware::

    int timeout = 45;
    ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
    printf("The timeout was set to %d seconds\n", timeout);

This example might actually print "The timeout was set to 60 seconds"
if the device has a granularity of minutes for its timeout.

Starting with the Linux 2.4.18 kernel, it is possible to query the
current timeout using the GETTIMEOUT ioctl::

    ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
    printf("The timeout is %d seconds\n", timeout);
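
Since the driver may apply a different timeout than the one requested,
it can be convenient to wrap both ioctls so that the caller always sees
the value actually in effect.  The helper names wd_set_timeout and
wd_get_timeout below are hypothetical; this is a sketch, not part of
the API.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical helper: request a timeout in seconds.
 * Returns the timeout the driver actually applied, or -1 on failure. */
int wd_set_timeout(int fd, int seconds)
{
	int timeout = seconds;

	if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) != 0)
		return -1;
	return timeout;	/* may differ from the request */
}

/* Hypothetical helper: query the current timeout in seconds.
 * Returns the timeout, or -1 on failure. */
int wd_get_timeout(int fd)
{
	int timeout;

	if (ioctl(fd, WDIOC_GETTIMEOUT, &timeout) != 0)
		return -1;
	return timeout;
}
```

A daemon can then derive its ping interval from the returned value
instead of from the value it asked for.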

Pretimeouts
===========

Some watchdog timers can be set to have a trigger go off before the
actual time they will reset the system.  This can be done with an NMI,
interrupt, or other mechanism.  This allows Linux to record useful
information (like panic information and kernel coredumps) before it
resets::

    int pretimeout = 10;
    ioctl(fd, WDIOC_SETPRETIMEOUT, &pretimeout);

Note that the pretimeout is the number of seconds before the time
when the timeout will go off.  It is not the number of seconds until
the pretimeout.  So, for instance, if you set the timeout to 60 seconds
and the pretimeout to 10 seconds, the pretimeout will go off in 50
seconds.  Setting a pretimeout to zero disables it.
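
The arithmetic in the paragraph above can be written down as a small
function.  The name wd_pretimeout_delay is hypothetical, and treating a
pretimeout that is not smaller than the timeout as invalid is an
assumption of this sketch rather than something stated in this document.

```c
/* Hypothetical helper: seconds from the last ping until the pretimeout
 * fires.  Returns -1 if the pretimeout is disabled (zero) or, by
 * assumption, if it is not smaller than the timeout. */
int wd_pretimeout_delay(int timeout, int pretimeout)
{
	if (pretimeout <= 0 || pretimeout >= timeout)
		return -1;
	return timeout - pretimeout;
}
```

With a timeout of 60 and a pretimeout of 10 the function returns 50,
matching the example in the text.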

There is also a get function for getting the pretimeout::

    ioctl(fd, WDIOC_GETPRETIMEOUT, &pretimeout);
    printf("The pretimeout is %d seconds\n", pretimeout);

Not all watchdog drivers will support a pretimeout.

Get the number of seconds before reboot
=======================================

Some watchdog drivers have the ability to report the remaining time
before the system will reboot.  The WDIOC_GETTIMELEFT ioctl
returns the number of seconds before reboot::

    int timeleft;
    ioctl(fd, WDIOC_GETTIMELEFT, &timeleft);
    printf("The time left before reboot is %d seconds\n", timeleft);

Environmental monitoring
========================

All watchdog drivers are required to return more information about the
system; some do temperature, fan and power level monitoring, and some can
tell you the reason for the last reboot of the system.  The GETSUPPORT
ioctl is available to ask what the device can do::

	struct watchdog_info ident;
	ioctl(fd, WDIOC_GETSUPPORT, &ident);

The fields returned in the ident struct are:

	================	=============================================
	identity		a string identifying the watchdog driver
	firmware_version	the firmware version of the card if available
	options			flags describing what the device supports
	================	=============================================

The options field can have the following bits set, and describes what
kind of information the GETSTATUS and GETBOOTSTATUS ioctls can
return.

	================	=========================
	WDIOF_OVERHEAT		Reset due to CPU overheat
	================	=========================

The machine was last rebooted by the watchdog because the thermal limit was
exceeded.

	==============		==========
	WDIOF_FANFAULT		Fan failed
	==============		==========

A system fan monitored by the watchdog card has failed.

	=============		================
	WDIOF_EXTERN1		External relay 1
	=============		================

External monitoring relay/source 1 was triggered.  Controllers intended for
real world applications include external monitoring pins that will trigger
a reset.

	=============		================
	WDIOF_EXTERN2		External relay 2
	=============		================

External monitoring relay/source 2 was triggered.

	================	=====================
	WDIOF_POWERUNDER	Power bad/power fault
	================	=====================

The machine is showing an undervoltage status.

	===============		=============================
	WDIOF_CARDRESET		Card previously reset the CPU
	===============		=============================

The last reboot was caused by the watchdog card.

	================	=====================
	WDIOF_POWEROVER		Power over voltage
	================	=====================

The machine is showing an overvoltage status.  Note that if one level is
under and one over, both bits will be set.  This may seem odd but makes
sense.

	===================	=====================
	WDIOF_KEEPALIVEPING	Keep alive ping reply
	===================	=====================

The watchdog saw a keepalive ping since it was last queried.

	================	=======================
	WDIOF_SETTIMEOUT	Can set/get the timeout
	================	=======================

The watchdog timeout can be set and queried.

	================	================================
	WDIOF_PRETIMEOUT	Pretimeout (in seconds), get/set
	================	================================

The watchdog can do pretimeouts.
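
To make the bit values easier to inspect, a daemon could turn the
options field (or a status word) into a readable string.  The sketch
below is one possible way to do that; the names wd_flag_names and
wd_describe_flags are hypothetical, while the WDIOF_* constants are
assumed to come from the linux/watchdog.h header.

```c
#include <stddef.h>
#include <stdio.h>
#include <linux/watchdog.h>

struct wd_flag_name {
	int bit;
	const char *name;
};

/* Flags in the order in which this document lists them. */
static const struct wd_flag_name wd_flag_names[] = {
	{ WDIOF_OVERHEAT,	"OVERHEAT" },
	{ WDIOF_FANFAULT,	"FANFAULT" },
	{ WDIOF_EXTERN1,	"EXTERN1" },
	{ WDIOF_EXTERN2,	"EXTERN2" },
	{ WDIOF_POWERUNDER,	"POWERUNDER" },
	{ WDIOF_CARDRESET,	"CARDRESET" },
	{ WDIOF_POWEROVER,	"POWEROVER" },
	{ WDIOF_KEEPALIVEPING,	"KEEPALIVEPING" },
	{ WDIOF_SETTIMEOUT,	"SETTIMEOUT" },
	{ WDIOF_PRETIMEOUT,	"PRETIMEOUT" },
};

/* Hypothetical helper: write the names of the set flags into buf,
 * separated by spaces and truncated to fit len bytes.  Returns the
 * number of listed flags that are set, or 0 if len is 0. */
int wd_describe_flags(int flags, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;
	int count = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < sizeof(wd_flag_names) / sizeof(wd_flag_names[0]); i++) {
		int n;

		if (!(flags & wd_flag_names[i].bit))
			continue;
		count++;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", wd_flag_names[i].name);
		/* On truncation, pin used to the terminating NUL. */
		if (n > 0)
			used = (size_t)n < len - used ? used + (size_t)n : len - 1;
	}
	return count;
}
```

The function would typically be called with ident.options after a
GETSUPPORT ioctl, or with the flags returned by a status ioctl.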

For those drivers that return any bits set in the option field, the
GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
status, and the status at the last reboot, respectively::

    int flags;
    ioctl(fd, WDIOC_GETSTATUS, &flags);

or::

    ioctl(fd, WDIOC_GETBOOTSTATUS, &flags);

Note that not all devices support these two calls, and some only
support the GETBOOTSTATUS call.
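
A common use of GETBOOTSTATUS is to find out whether the previous
reboot was triggered by the watchdog.  A minimal sketch follows; the
function name wd_last_reset_by_watchdog is hypothetical, and the
three-way return value is a design choice of this example because the
call may not be supported by the device.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical helper: returns 1 if the boot status has WDIOF_CARDRESET
 * set, 0 if it does not, and -1 if the ioctl failed (for example because
 * the driver does not implement GETBOOTSTATUS). */
int wd_last_reset_by_watchdog(int fd)
{
	int flags = 0;

	if (ioctl(fd, WDIOC_GETBOOTSTATUS, &flags) != 0)
		return -1;
	return (flags & WDIOF_CARDRESET) ? 1 : 0;
}
```

Keeping "unknown" (-1) separate from "no" (0) avoids reporting a clean
boot when the driver simply cannot tell.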

Some drivers can measure the temperature using the GETTEMP ioctl.  The
returned value is the temperature in degrees Fahrenheit::

    int temperature;
    ioctl(fd, WDIOC_GETTEMP, &temperature);
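
Programs that work in degrees Celsius need to convert the value
returned by GETTEMP.  The conversion is the usual one; the function
name wd_fahrenheit_to_celsius is hypothetical.

```c
/* Hypothetical helper: convert a GETTEMP reading, which is in degrees
 * Fahrenheit, to degrees Celsius. */
double wd_fahrenheit_to_celsius(int fahrenheit)
{
	return (fahrenheit - 32) * 5.0 / 9.0;
}
```

For example, a reading of 212 converts to 100 degrees Celsius.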

Finally the SETOPTIONS ioctl can be used to control some aspects of
the card's operation::

    int options = 0;
    ioctl(fd, WDIOC_SETOPTIONS, &options);

The following options are available:

	=================	================================
	WDIOS_DISABLECARD	Turn off the watchdog timer
	WDIOS_ENABLECARD	Turn on the watchdog timer
	WDIOS_TEMPPANIC		Kernel panic on temperature trip
	=================	================================
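
As an illustration of the first two options, the sketch below turns
the watchdog timer on or off through SETOPTIONS.  The function name
wd_set_enabled is hypothetical, and whether a given driver honours the
request (for instance when nowayout is in effect) is not something this
example can guarantee.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical helper: enable the watchdog timer if enable is non-zero,
 * disable it otherwise.  Returns 0 on success, -1 on failure. */
int wd_set_enabled(int fd, int enable)
{
	int options = enable ? WDIOS_ENABLECARD : WDIOS_DISABLECARD;

	return ioctl(fd, WDIOC_SETOPTIONS, &options) == 0 ? 0 : -1;
}
```

The return value should be checked, since a driver may refuse the
request.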

[FIXME -- better explanations]