What:		/sys/devices/system/machinecheck/machinecheckX/
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		(X = CPU number)

		Machine checks report internal hardware error conditions
		detected by the CPU. Uncorrected errors typically cause a
		machine check (often with a panic); corrected ones cause a
		machine check log entry.

		For more details about the x86 machine check architecture,
		see the Intel and AMD architecture manuals on their
		developer websites.

		For more details about the architecture, see also
		http://one.firstfloor.org/~andi/mce.pdf

		Each CPU has its own directory.
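
		As an illustration of the per-CPU layout, the following Python
		sketch builds the path of an attribute for a given CPU. The
		helper names (mce_attr_path, read_mce_attr) are hypothetical
		and not part of any kernel or library interface; reading only
		works on a system whose kernel exposes these files.

```python
from pathlib import Path

# Base directory taken from the "What:" line of this entry.
MCE_ROOT = Path("/sys/devices/system/machinecheck")


def mce_attr_path(cpu: int, attr: str = "") -> Path:
    """Return the sysfs path of the directory (or of one attribute) for a CPU."""
    if cpu < 0:
        raise ValueError("CPU number must be non-negative")
    base = MCE_ROOT / f"machinecheck{cpu}"
    return base / attr if attr else base


def read_mce_attr(cpu: int, attr: str) -> str:
    """Read an attribute as text; fails if the kernel does not expose the file."""
    return mce_attr_path(cpu, attr).read_text().strip()
```

		For example, mce_attr_path(0, "tolerant") names the tolerant
		file in the directory of CPU 0.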

What:		/sys/devices/system/machinecheck/machinecheckX/bank<Y>
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		(Y = bank number)

		64-bit hexadecimal bitmask enabling/disabling specific
		subevents for bank Y.

		When a bit in the bitmask is zero, the respective
		subevent will not be reported.

		By default all events are enabled.

		Note that the BIOS maintains another mask to disable specific
		events per bank. That mask is not visible here.
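
		The bitmask semantics described above can be sketched in
		Python as follows. The function names are hypothetical, and the
		meaning of each bit position is hardware specific and not
		covered here; this only shows how a 64-bit hexadecimal mask is
		parsed and how one bit is tested or cleared.

```python
MASK_64 = (1 << 64) - 1  # the bank mask is 64 bits wide


def parse_bank_mask(text: str) -> int:
    """Parse hexadecimal mask text (with or without a 0x prefix)."""
    value = int(text.strip(), 16)
    if not 0 <= value <= MASK_64:
        raise ValueError("mask does not fit in 64 bits")
    return value


def subevent_enabled(mask: int, bit: int) -> bool:
    """A subevent is reported only if its bit in the mask is set."""
    if not 0 <= bit < 64:
        raise ValueError("bit index must be in 0..63")
    return bool((mask >> bit) & 1)


def disable_subevent(mask: int, bit: int) -> int:
    """Return a copy of the mask with one subevent bit cleared."""
    if not 0 <= bit < 64:
        raise ValueError("bit index must be in 0..63")
    return mask & ~(1 << bit) & MASK_64
```

		A mask with all 64 bits set corresponds to the default of all
		events being enabled.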

What:		/sys/devices/system/machinecheck/machinecheckX/check_interval
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		The entries appear for each CPU, but they are truly shared
		between all CPUs.

		How often to poll for corrected machine check errors, in
		seconds (note that the output is hexadecimal). Default:
		5 minutes.

		When the poller finds MCEs, it triggers an exponential speedup
		(poll more often) on the polling interval. When the poller
		stops finding MCEs, it triggers an exponential backoff
		(poll less often) on the polling interval. The check_interval
		variable is both the initial and maximum polling interval.

		0 means no polling for corrected machine check errors
		(but some corrected errors might still be reported
		in other ways).
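
		The speedup and backoff behaviour described above can be
		sketched in Python as follows. This is an illustration, not the
		kernel's implementation: the factor of 2 and the lower bound of
		0.01 seconds are assumptions made for the example, and the
		function names are hypothetical. Only the hexadecimal output
		format, the role of check_interval as initial and maximum
		interval, and the meaning of 0 come from the text above.

```python
def parse_check_interval(text: str) -> int:
    """Convert the hexadecimal text read from check_interval to seconds."""
    return int(text.strip(), 16)


def next_poll_interval(current: float, found_mce: bool, check_interval: float,
                       factor: float = 2.0, floor: float = 0.01) -> float:
    """Return the next polling interval in seconds.

    factor and floor are assumed values chosen for illustration.
    """
    if check_interval == 0:
        return 0.0  # 0 means polling is disabled
    if found_mce:
        # Errors found: poll more often, but not below the assumed floor.
        return max(current / factor, floor)
    # No errors found: back off, capped at check_interval.
    return min(current * factor, check_interval)
```

		With the default of 5 minutes, the file reads as 12c, which
		parse_check_interval converts to 300 seconds.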

What:		/sys/devices/system/machinecheck/machinecheckX/tolerant
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		The entries appear for each CPU, but they are truly shared
		between all CPUs.

		Tolerance level. When a machine check exception occurs for an
		uncorrected machine check, the kernel can take different
		actions.

		Since machine check exceptions can happen at any time, it is
		sometimes risky for the kernel to kill a process because it
		defies normal kernel locking rules. The tolerance level
		configures how hard the kernel tries to recover even at some
		risk of deadlock. Higher tolerant values trade potentially
		better uptime for the risk of a crash or even corruption
		(for tolerant >= 3).

		==  ===========================================================
		 0  always panic on uncorrected errors, log corrected errors
		 1  panic or SIGBUS on uncorrected errors, log corrected errors
		 2  SIGBUS or log uncorrected errors, log corrected errors
		 3  never panic or SIGBUS, log all errors (for testing only)
		==  ===========================================================

		Default: 1

		Note that this only makes a difference if the CPU allows
		recovery from a machine check exception. Current x86 CPUs
		generally do not.
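
		The table above can be turned into a small lookup, for example
		in a monitoring script that wants to report the configured
		policy. This Python sketch assumes the file contains a plain
		decimal integer, which the text above does not state; the
		function names are hypothetical.

```python
# Level descriptions copied from the table in this entry.
TOLERANT_LEVELS = {
    0: "always panic on uncorrected errors, log corrected errors",
    1: "panic or SIGBUS on uncorrected errors, log corrected errors",
    2: "SIGBUS or log uncorrected errors, log corrected errors",
    3: "never panic or SIGBUS, log all errors (for testing only)",
}
DEFAULT_TOLERANT = 1


def describe_tolerant(text: str) -> str:
    """Map the text read from the tolerant file to its description.

    Assumes the value is written in decimal.
    """
    level = int(text.strip())
    if level not in TOLERANT_LEVELS:
        raise ValueError(f"unknown tolerant level: {level}")
    return TOLERANT_LEVELS[level]


def risks_corruption(level: int) -> bool:
    """The text above warns of possible corruption for tolerant >= 3."""
    return level >= 3
```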

What:		/sys/devices/system/machinecheck/machinecheckX/trigger
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		The entries appear for each CPU, but they are truly shared
		between all CPUs.

		Program to run when a machine check event is detected.
		This is an alternative to running mcelog regularly from cron
		and allows events to be detected faster.

What:		/sys/devices/system/machinecheck/machinecheckX/monarch_timeout
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		How long to wait for the other CPUs to machine check as well
		when an exception occurs. 0 disables waiting for other CPUs.

		Unit: microseconds (us)

What:		/sys/devices/system/machinecheck/machinecheckX/ignore_ce
Contact:	Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Date:		Jun 2009
Description:
		Disables polling and CMCI for corrected errors.
		Corrected events are not cleared and remain in the bank MSRs.

What:		/sys/devices/system/machinecheck/machinecheckX/dont_log_ce
Contact:	Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Date:		Jun 2009
Description:
		Disables logging for corrected errors.
		All reported corrected errors will be cleared silently.

		This option is useful if you do not care about corrected
		errors.

What:		/sys/devices/system/machinecheck/machinecheckX/cmci_disabled
Contact:	Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Date:		Jun 2009
Description:
		Disables the CMCI feature.
130