What:		/sys/devices/system/machinecheck/machinecheckX/
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		(X = CPU number)

		Machine checks report internal hardware error conditions
		detected by the CPU. Uncorrected errors typically cause a
		machine check (often with a panic); corrected ones cause a
		machine check log entry.

		For more details about the x86 machine check architecture
		see the Intel and AMD architecture manuals from their
		developer websites.

		For more details about the architecture
		see http://one.firstfloor.org/~andi/mce.pdf

		Each CPU has its own directory.

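		As an illustration of the per-CPU layout, the following sketch
		lists the CPU numbers that have a machinecheckX directory. The
		helper name list_mce_cpus is hypothetical, and the root path is
		taken from the What: line of this entry.

```python
import re
from pathlib import Path

# Default sysfs location, taken from the "What:" line of this entry.
MCE_ROOT = Path("/sys/devices/system/machinecheck")

# Matches directory names of the form machinecheckX, where X is a CPU number.
_DIR_RE = re.compile(r"machinecheck(\d+)")


def list_mce_cpus(root=MCE_ROOT):
    """Return the sorted CPU numbers X that have a machinecheckX directory."""
    root = Path(root)
    if not root.is_dir():
        return []
    cpus = []
    for entry in root.iterdir():
        match = _DIR_RE.fullmatch(entry.name)
        if match and entry.is_dir():
            cpus.append(int(match.group(1)))
    return sorted(cpus)


if __name__ == "__main__":
    print(list_mce_cpus())
```

		On a system without this sysfs tree the function returns an
		empty list instead of raising an error.
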
What:		/sys/devices/system/machinecheck/machinecheckX/bank<Y>
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		(Y = bank number)

		64-bit hex bitmask enabling/disabling specific subevents for
		bank Y.

		When a bit in the bitmask is zero, the respective
		subevent will not be reported.

		By default all events are enabled.

		Note that the BIOS maintains another mask to disable specific
		events per bank. That mask is not visible here.

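		The bitmask semantics can be sketched as follows. This is a
		minimal example, assuming the file content is a bare
		hexadecimal number; the helper names are hypothetical, and the
		meaning of individual bits depends on the CPU and bank.

```python
MASK_BITS = 64
FULL_MASK = (1 << MASK_BITS) - 1


def parse_bank_mask(text):
    """Parse the hex text of a bank<Y> file into an integer bitmask."""
    value = int(text.strip(), 16)
    if not 0 <= value <= FULL_MASK:
        raise ValueError("bank mask does not fit in 64 bits")
    return value


def _check_bit(bit):
    if not 0 <= bit < MASK_BITS:
        raise ValueError("bit index must be in the range 0..63")


def subevent_enabled(mask, bit):
    """A set bit means the subevent is reported, a zero bit means it is not."""
    _check_bit(bit)
    return bool((mask >> bit) & 1)


def disable_subevent(mask, bit):
    """Return a copy of the mask with the given subevent bit cleared."""
    _check_bit(bit)
    return mask & ~(1 << bit) & FULL_MASK


def format_bank_mask(mask):
    """Format a mask as hex text, suitable for writing back to the file."""
    return format(mask & FULL_MASK, "x")
```

		For example, starting from the all-enabled default mask,
		disable_subevent(mask, 3) clears bit 3 and leaves all other
		subevents enabled.
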
What:		/sys/devices/system/machinecheck/machinecheckX/check_interval
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		The entries appear for each CPU, but they are truly shared
		between all CPUs.

		How often to poll for corrected machine check errors, in
		seconds (note that the output is hexadecimal). Default: 5
		minutes. When the poller finds MCEs, it triggers an exponential
		speedup (poll more often) on the polling interval. When the
		poller stops finding MCEs, it triggers an exponential backoff
		(poll less often) on the polling interval. The check_interval
		variable is both the initial and the maximum polling interval.
		0 means no polling for corrected machine check errors
		(but some corrected errors might still be reported
		in other ways).

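		The adaptive polling described above can be modelled roughly
		as follows. This is a sketch, not the kernel implementation:
		the factor of 2 and the lower bound min_interval are
		assumptions chosen for illustration, and next_poll_interval is
		a hypothetical name.

```python
def next_poll_interval(current, check_interval, found_mce, min_interval=0.01):
    """Return the next polling interval in seconds.

    current        -- the interval that was just used
    check_interval -- the configured initial and maximum interval
    found_mce      -- True if the last poll found corrected errors
    min_interval   -- assumed lower bound for the speedup
    """
    if check_interval == 0:
        return 0  # 0 disables polling for corrected errors
    if found_mce:
        # Exponential speedup: poll more often while errors keep appearing.
        return max(current / 2, min_interval)
    # Exponential backoff: poll less often, capped at check_interval.
    return min(current * 2, check_interval)


if __name__ == "__main__":
    interval = 300  # default of 5 minutes
    for found in (True, True, False, False, False):
        interval = next_poll_interval(interval, 300, found)
        print(found, interval)
```

		In this model the interval never exceeds check_interval, which
		matches its role as both the initial and the maximum value.
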
What:		/sys/devices/system/machinecheck/machinecheckX/tolerant
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		The entries appear for each CPU, but they are truly shared
		between all CPUs.

		Tolerance level. When a machine check exception occurs for an
		uncorrected machine check, the kernel can take different
		actions.

		Since machine check exceptions can happen at any time, it is
		sometimes risky for the kernel to kill a process because it
		defies normal kernel locking rules. The tolerance level
		configures how hard the kernel tries to recover even at some
		risk of deadlock. Higher tolerant values trade potentially
		better uptime against the risk of a crash or even corruption
		(for tolerant >= 3).

		==  ===========================================================
		 0  always panic on uncorrected errors, log corrected errors
		 1  panic or SIGBUS on uncorrected errors, log corrected errors
		 2  SIGBUS or log uncorrected errors, log corrected errors
		 3  never panic or SIGBUS, log all errors (for testing only)
		==  ===========================================================

		Default: 1

		Note this only makes a difference if the CPU allows recovery
		from a machine check exception. Current x86 CPUs generally
		do not.

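		A tool that sets this value might validate it against the
		table above before writing. The following sketch only encodes
		that table; the names are hypothetical and nothing here talks
		to the kernel.

```python
# Tolerance levels and their meaning, copied from the table in this entry.
TOLERANT_LEVELS = {
    0: "always panic on uncorrected errors, log corrected errors",
    1: "panic or SIGBUS on uncorrected errors, log corrected errors",
    2: "SIGBUS or log uncorrected errors, log corrected errors",
    3: "never panic or SIGBUS, log all errors (for testing only)",
}

DEFAULT_TOLERANT = 1


def describe_tolerant(level):
    """Return the documented behaviour for a tolerance level."""
    try:
        return TOLERANT_LEVELS[level]
    except KeyError:
        raise ValueError("tolerant must be one of 0, 1, 2, 3") from None


def risks_corruption(level):
    """The text above warns about possible corruption for tolerant >= 3."""
    describe_tolerant(level)  # reject undocumented levels
    return level >= 3
```
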
What:		/sys/devices/system/machinecheck/machinecheckX/trigger
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		The entries appear for each CPU, but they are truly shared
		between all CPUs.

		Program to run when a machine check event is detected.
		This is an alternative to running mcelog regularly from cron
		and allows events to be detected faster.

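		Configuring a trigger amounts to writing a program path into
		this file. The sketch below assumes root privileges on a real
		system; the helper names and the example program path are
		hypothetical, and requiring an absolute path is a defensive
		choice of this sketch rather than a documented rule.

```python
from pathlib import Path


def set_trigger(trigger_file, program):
    """Write the program to run on machine check events to a trigger file."""
    program = str(program)
    if not program.startswith("/"):
        raise ValueError("use an absolute program path")
    if "\n" in program:
        raise ValueError("program path must be a single line")
    Path(trigger_file).write_text(program + "\n")


def get_trigger(trigger_file):
    """Return the currently configured program, or '' if none is set."""
    return Path(trigger_file).read_text().strip()


if __name__ == "__main__":
    # Hypothetical usage; both paths are examples only.
    sysfs_file = "/sys/devices/system/machinecheck/machinecheck0/trigger"
    set_trigger(sysfs_file, "/usr/local/sbin/mce-notify")
    print(get_trigger(sysfs_file))
```

		Because the entry is shared between all CPUs, writing it once
		through any machinecheckX directory should be sufficient.
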
What:		/sys/devices/system/machinecheck/machinecheckX/monarch_timeout
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		How long to wait for the other CPUs to machine check as well
		on an exception. 0 disables waiting for other CPUs.

		Unit: us

What:		/sys/devices/system/machinecheck/machinecheckX/ignore_ce
Contact:	Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Date:		Jun 2009
Description:
		Disables polling and CMCI for corrected errors.
		Corrected events are not cleared and are kept in the bank MSRs.

What:		/sys/devices/system/machinecheck/machinecheckX/dont_log_ce
Contact:	Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Date:		Jun 2009
Description:
		Disables logging for corrected errors.
		All reported corrected errors will be cleared silently.

		This option is useful if you do not care about corrected
		errors.

What:		/sys/devices/system/machinecheck/machinecheckX/cmci_disabled
Contact:	Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Date:		Jun 2009
Description:
		Disables the CMCI feature.

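		The three corrected-error switches (ignore_ce, dont_log_ce and
		cmci_disabled) can be inspected together. This sketch assumes
		that each file holds an integer where a non-zero value means
		the option is set; that format and the helper name
		read_ce_flags are assumptions, not part of this ABI text.

```python
from pathlib import Path

# Attribute names taken from the entries in this file.
CE_FLAGS = ("ignore_ce", "dont_log_ce", "cmci_disabled")


def read_ce_flags(cpu_dir):
    """Return a dict mapping each flag found in cpu_dir to True or False.

    Flags whose file is missing are left out of the result.
    """
    flags = {}
    for name in CE_FLAGS:
        path = Path(cpu_dir) / name
        if path.is_file():
            text = path.read_text().strip()
            # Assumed format: "0" (or empty) means off, anything else on.
            flags[name] = text not in ("", "0")
    return flags


if __name__ == "__main__":
    print(read_ce_flags("/sys/devices/system/machinecheck/machinecheck0"))
```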