=============================
The Linux Watchdog driver API
=============================

Last reviewed: 10/05/2007

Copyright 2002 Christer Weingel <wingel@nano-system.com>

Some parts of this document are copied verbatim from the sbc60xxwdt
driver which is (c) Copyright 2000 Jakob Oestergaard <jakob@ostenfeld.dk>

This document describes the state of the Linux 2.4.18 kernel.

Introduction
============

A Watchdog Timer (WDT) is a hardware circuit that can reset the
computer system in case of a software fault.  You probably knew that
already.

Usually a userspace daemon will notify the kernel watchdog driver via the
/dev/watchdog special device file that userspace is still alive, at
regular intervals.  When such a notification occurs, the driver will
usually tell the hardware watchdog that everything is in order, and
that the watchdog should wait for yet another little while to reset
the system.
If userspace fails (RAM error, kernel bug, whatever), the
notifications cease to occur, and the hardware watchdog will reset the
system (causing a reboot) after the timeout occurs.

The Linux watchdog API is a rather ad-hoc construction and different
drivers implement different, and sometimes incompatible, parts of it.
This file is an attempt to document the existing usage and allow
future driver writers to use it as a reference.

The simplest API
================

All drivers support the basic mode of operation, where the watchdog
activates as soon as /dev/watchdog is opened and will reboot unless
the watchdog is pinged within a certain time; this time is called the
timeout or margin.  The simplest way to ping the watchdog is to write
some data to the device.  So a very simple watchdog daemon would look
like this source file:  see samples/watchdog/watchdog-simple.c

A more advanced daemon could, for example, check that an HTTP server is
still responding before doing the write call to ping the watchdog.

When the device is closed, the watchdog is disabled, unless the "Magic
Close" feature is supported (see below).
This is not always such a
good idea, since if there is a bug in the watchdog daemon and it
crashes, the system will not reboot.  Because of this, some of the
drivers support the configuration option "Disable watchdog shutdown on
close", CONFIG_WATCHDOG_NOWAYOUT.  If it is set to Y when compiling
the kernel, there is no way of disabling the watchdog once it has been
started.  So, if the watchdog daemon crashes, the system will reboot
after the timeout has passed.  Watchdog devices also usually support
the nowayout module parameter so that this option can be controlled at
runtime.

Magic Close feature
===================

If a driver supports "Magic Close", the driver will not disable the
watchdog unless a specific magic character 'V' has been sent to
/dev/watchdog just before closing the file.  If the userspace daemon
closes the file without sending this special character, the driver
will assume that the daemon (and userspace in general) died, and will
stop pinging the watchdog without disabling it first.  This will then
cause a reboot if the watchdog is not re-opened in sufficient time.

The ioctl API
=============

All conforming drivers also support an ioctl API.

Pinging the watchdog using an ioctl:

All drivers that have an ioctl interface support at least one ioctl,
KEEPALIVE.  This ioctl does exactly the same thing as a write to the
watchdog device, so the main loop in the above program could be
replaced with::

    while (1) {
        ioctl(fd, WDIOC_KEEPALIVE, 0);
        sleep(10);
    }

The argument to the ioctl is ignored.

Setting and getting the timeout
===============================

For some drivers it is possible to modify the watchdog timeout on the
fly with the SETTIMEOUT ioctl; those drivers have the WDIOF_SETTIMEOUT
flag set in their option field.  The argument is an integer
representing the timeout in seconds.
The driver returns the real
timeout used in the same variable, and this timeout might differ from
the requested one due to limitations of the hardware::

    int timeout = 45;
    ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
    printf("The timeout was set to %d seconds\n", timeout);

This example might actually print "The timeout was set to 60 seconds"
if the device has a granularity of minutes for its timeout.

Starting with the Linux 2.4.18 kernel, it is possible to query the
current timeout using the GETTIMEOUT ioctl::

    ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
    printf("The timeout is %d seconds\n", timeout);

Pretimeouts
===========

Some watchdog timers can be set to have a trigger go off before the
actual time they will reset the system.  This can be done with an NMI,
interrupt, or other mechanism.
This allows Linux to record useful
information (like panic information and kernel coredumps) before it
resets::

    int pretimeout = 10;
    ioctl(fd, WDIOC_SETPRETIMEOUT, &pretimeout);

Note that the pretimeout is the number of seconds before the time
when the timeout will go off.  It is not the number of seconds until
the pretimeout.  So, for instance, if you set the timeout to 60 seconds
and the pretimeout to 10 seconds, the pretimeout will go off in 50
seconds.  Setting a pretimeout to zero disables it.

There is also a get function for getting the pretimeout::

    ioctl(fd, WDIOC_GETPRETIMEOUT, &pretimeout);
    printf("The pretimeout is %d seconds\n", pretimeout);

Not all watchdog drivers will support a pretimeout.

Get the number of seconds before reboot
=======================================

Some watchdog drivers have the ability to report the remaining time
before the system will reboot.
WDIOC_GETTIMELEFT is the ioctl
that returns the number of seconds before reboot::

    int timeleft;
    ioctl(fd, WDIOC_GETTIMELEFT, &timeleft);
    printf("The time left is %d seconds\n", timeleft);

Environmental monitoring
========================

All watchdog drivers are required to return more information about the
system; some do temperature, fan and power level monitoring, some can tell
you the reason for the last reboot of the system.  The GETSUPPORT ioctl is
available to ask what the device can do::

    struct watchdog_info ident;
    ioctl(fd, WDIOC_GETSUPPORT, &ident);

The fields returned in the ident struct are:

    ================  =============================================
    identity          a string identifying the watchdog driver
    firmware_version  the firmware version of the card if available
    options           flags describing what the device supports
    ================  =============================================

The options field can have the following bits set, and describes what
kind of information the GETSTATUS and GETBOOTSTATUS ioctls can
return.

    ================  =========================
    WDIOF_OVERHEAT    Reset due to CPU overheat
    ================  =========================

The machine was last rebooted by the watchdog because the thermal limit was
exceeded.

    ==============  ==========
    WDIOF_FANFAULT  Fan failed
    ==============  ==========

A system fan monitored by the watchdog card has failed.

    =============  ================
    WDIOF_EXTERN1  External relay 1
    =============  ================

External monitoring relay/source 1 was triggered.  Controllers intended for
real world applications include external monitoring pins that will trigger
a reset.

    =============  ================
    WDIOF_EXTERN2  External relay 2
    =============  ================

External monitoring relay/source 2 was triggered.

    ================  =====================
    WDIOF_POWERUNDER  Power bad/power fault
    ================  =====================

The machine is showing an undervoltage status.

    ===============  =============================
    WDIOF_CARDRESET  Card previously reset the CPU
    ===============  =============================

The last reboot was caused by the watchdog card.

    ================  =====================
    WDIOF_POWEROVER   Power over voltage
    ================  =====================

The machine is showing an overvoltage status.  Note that if one level is
under and one over, both bits will be set - this may seem odd but makes
sense.

    ===================  =====================
    WDIOF_KEEPALIVEPING  Keep alive ping reply
    ===================  =====================

The watchdog saw a keepalive ping since it was last queried.

    ================  =======================
    WDIOF_SETTIMEOUT  Can set/get the timeout
    ================  =======================

    ================  ================================
    WDIOF_PRETIMEOUT  Pretimeout (in seconds), get/set
    ================  ================================

The watchdog can do pretimeouts.

For those drivers that return any bits set in the option field, the
GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
status, and the status at the last reboot, respectively::

    int flags;
    ioctl(fd, WDIOC_GETSTATUS, &flags);

    or

    ioctl(fd, WDIOC_GETBOOTSTATUS, &flags);

Note that not all devices support these two calls, and some only
support the GETBOOTSTATUS call.
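
The value returned by GETSTATUS or GETBOOTSTATUS is a bitmask of the
WDIOF_* flags listed above.  As a minimal sketch of how a daemon might turn
it into readable text, the helper below walks a table of flag descriptions.
The names wd_flag_names and wd_describe_flags are hypothetical and not part
of the kernel API; only the WDIOF_* constants come from <linux/watchdog.h>.

```c
#include <stdio.h>
#include <linux/watchdog.h>

/* Hypothetical lookup table: one entry per WDIOF_* bit described above. */
struct wd_flag_name {
	int bit;
	const char *text;
};

const struct wd_flag_name wd_flag_names[] = {
	{ WDIOF_OVERHEAT,      "Reset due to CPU overheat" },
	{ WDIOF_FANFAULT,      "Fan failed" },
	{ WDIOF_EXTERN1,       "External relay 1" },
	{ WDIOF_EXTERN2,       "External relay 2" },
	{ WDIOF_POWERUNDER,    "Power bad/power fault" },
	{ WDIOF_CARDRESET,     "Card previously reset the CPU" },
	{ WDIOF_POWEROVER,     "Power over voltage" },
	{ WDIOF_KEEPALIVEPING, "Keep alive ping reply" },
};

/*
 * Hypothetical helper: write a comma-separated description of the known
 * bits set in flags into buf and return how many bits were described.
 * If buf is too small, the description stops at the last entry that fit.
 */
int wd_describe_flags(int flags, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;
	int count = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < sizeof(wd_flag_names) / sizeof(wd_flag_names[0]); i++) {
		int n;

		if (!(flags & wd_flag_names[i].bit))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     count ? ", " : "", wd_flag_names[i].text);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial entry */
			break;
		}
		used += (size_t)n;
		count++;
	}
	return count;
}
```

A daemon could call wd_describe_flags(flags, buf, sizeof(buf)) right after
the WDIOC_GETBOOTSTATUS call and log the resulting string.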

Some drivers can measure the temperature using the GETTEMP ioctl.  The
returned value is the temperature in degrees Fahrenheit::

    int temperature;
    ioctl(fd, WDIOC_GETTEMP, &temperature);

Finally the SETOPTIONS ioctl can be used to control some aspects of
the card's operation::

    int options = 0;
    ioctl(fd, WDIOC_SETOPTIONS, &options);

The following options are available:

    =================  ================================
    WDIOS_DISABLECARD  Turn off the watchdog timer
    WDIOS_ENABLECARD   Turn on the watchdog timer
    WDIOS_TEMPPANIC    Kernel panic on temperature trip
    =================  ================================

[FIXME -- better explanations]
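
As a sketch of the SETOPTIONS call, the helpers below build the option word
from the WDIOS_* bits in the table above and pass it to the driver.
wd_build_options and wd_apply_options are hypothetical names chosen for
illustration; whether a given driver honours each bit is driver-specific
and is not checked here.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Hypothetical helper: combine the WDIOS_* bits into one option word.
 * Exactly one of ENABLECARD/DISABLECARD is set, so the two never conflict.
 */
int wd_build_options(int enable, int temp_panic)
{
	int options = enable ? WDIOS_ENABLECARD : WDIOS_DISABLECARD;

	if (temp_panic)
		options |= WDIOS_TEMPPANIC;
	return options;
}

/*
 * Hypothetical helper: hand the option word to the driver.
 * Returns 0 on success, or -1 with errno set by ioctl() on failure
 * (for example when fd is not an open watchdog device).
 */
int wd_apply_options(int fd, int options)
{
	return ioctl(fd, WDIOC_SETOPTIONS, &options) < 0 ? -1 : 0;
}
```

For example, wd_apply_options(fd, wd_build_options(0, 0)) asks the driver to
turn the watchdog timer off; the return value should be checked, since not
every driver implements SETOPTIONS.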