What:		/sys/devices/system/machinecheck/machinecheckX/
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		(X = CPU number)

		Machine checks report internal hardware error conditions
		detected by the CPU. Uncorrected errors typically cause a
		machine check exception (often with a panic); corrected
		ones cause a machine check log entry.

		For more details about the x86 machine check architecture
		see the Intel and AMD architecture manuals from their
		developer websites.

		For more details about the architecture
		see http://one.firstfloor.org/~andi/mce.pdf

		Each CPU has its own directory.
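
		As an illustration of the per-CPU layout, the following
		minimal sketch lists the machinecheckX directories and maps
		each CPU number X to its path. The helper name
		machinecheck_dirs is hypothetical and not part of any kernel
		or library interface.

```python
import re
from pathlib import Path
from typing import Dict

# Root of the machine check sysfs tree described in this document.
MCE_ROOT = Path("/sys/devices/system/machinecheck")


def machinecheck_dirs(root: Path = MCE_ROOT) -> Dict[int, Path]:
    """Map CPU number X to its machinecheckX directory under root.

    Hypothetical helper: returns an empty mapping if root is missing.
    """
    found = {}
    if not root.is_dir():
        return found
    for entry in root.iterdir():
        match = re.fullmatch(r"machinecheck(\d+)", entry.name)
        if match and entry.is_dir():
            found[int(match.group(1))] = entry
    # Sort numerically so that CPU 10 follows CPU 2.
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for cpu, path in machinecheck_dirs().items():
        print(cpu, path)
```

		On a system without this sysfs tree the sketch simply prints
		nothing.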

What:		/sys/devices/system/machinecheck/machinecheckX/bank<Y>
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		(Y = bank number)

		64-bit hex bitmask enabling/disabling specific subevents for
		bank Y.

		When a bit in the bitmask is zero, the respective
		subevent will not be reported.

		By default all events are enabled.

		Note that the BIOS maintains another mask to disable specific
		events per bank. This is not visible here.
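
		The bitmask semantics above can be sketched in a few lines.
		This is only an illustration: the helper names are
		hypothetical, and the exact text format of the file (hex
		digits, with or without a 0x prefix) is an assumption.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF  # a bank mask is 64 bits wide


def parse_bank_mask(text: str) -> int:
    """Parse hex text, as assumed to be read from bank<Y>, into an int."""
    return int(text.strip(), 16)


def _check_bit(bit: int) -> None:
    if not 0 <= bit <= 63:
        raise ValueError("bit index must be in the range 0..63")


def subevent_enabled(mask: int, bit: int) -> bool:
    """A set bit means the subevent is reported; a zero bit means it is not."""
    _check_bit(bit)
    return bool((mask >> bit) & 1)


def clear_subevent(mask: int, bit: int) -> int:
    """Return a copy of mask with the given subevent disabled."""
    _check_bit(bit)
    return mask & ~(1 << bit) & MASK_64


def format_bank_mask(mask: int) -> str:
    """Format a mask as lowercase hex digits without a prefix."""
    return format(mask & MASK_64, "x")
```

		For example, starting from an all-ones mask and clearing bit
		3 yields a mask whose lowest hex digit is 7.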

What:		/sys/devices/system/machinecheck/machinecheckX/check_interval
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		The entries appear for each CPU, but they are truly shared
		between all CPUs.

		How often to poll for corrected machine check errors, in
		seconds (note that the output is hexadecimal). Default:
		5 minutes.

		When the poller finds MCEs it triggers an exponential speedup
		(poll more often) on the polling interval. When the poller
		stops finding MCEs, it triggers an exponential backoff
		(poll less often) on the polling interval. The check_interval
		variable is both the initial and maximum polling interval.

		0 means no polling for corrected machine check errors
		(but some corrected errors might still be reported
		in other ways).
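
		The polling behaviour described above can be modelled
		roughly as follows. This is a sketch under stated
		assumptions, not the kernel's code: the factor of two and the
		lower bound "floor" are illustrative choices, and both
		function names are hypothetical.

```python
from typing import Optional


def parse_check_interval(text: str) -> int:
    """Parse the hexadecimal output of check_interval into seconds."""
    return int(text.strip(), 16)


def next_interval(current: float, found_errors: bool,
                  check_interval: float,
                  floor: float = 0.01) -> Optional[float]:
    """Return the next polling interval in seconds.

    check_interval is the configured maximum; 0 means polling is
    disabled, in which case None is returned. The halving/doubling
    factor and the floor value are assumptions for illustration.
    """
    if check_interval == 0:
        return None
    if found_errors:
        # Speed up: poll more often, but never below the floor.
        return max(current / 2, floor)
    # Back off: poll less often, capped at check_interval.
    return min(current * 2, check_interval)
```

		With the default of 5 minutes the file would read 12c
		(300 in hexadecimal); one poll that finds errors would then
		shorten the next interval to 150 seconds under these
		assumptions.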

What:		/sys/devices/system/machinecheck/machinecheckX/tolerant
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		The entries appear for each CPU, but they are truly shared
		between all CPUs.

		Tolerance level. When a machine check exception occurs for
		an uncorrected machine check the kernel can take different
		actions.

		Since machine check exceptions can happen at any time, it is
		sometimes risky for the kernel to kill a process because it
		defies normal kernel locking rules. The tolerance level
		configures how hard the kernel tries to recover even at some
		risk of deadlock. Higher tolerant values trade potentially
		better uptime for the risk of a crash or even corruption
		(for tolerant >= 3).

		==  ===========================================================
		 0  always panic on uncorrected errors, log corrected errors
		 1  panic or SIGBUS on uncorrected errors, log corrected errors
		 2  SIGBUS or log uncorrected errors, log corrected errors
		 3  never panic or SIGBUS, log all errors (for testing only)
		==  ===========================================================

		Default: 1

		Note this only makes a difference if the CPU allows recovery
		from a machine check exception. Current x86 CPUs generally
		do not.
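
		The table above can be restated as a small lookup, which may
		be handy in monitoring scripts. The names below are
		hypothetical, and the mapping only mirrors the table; it does
		not model when the kernel picks one action over another.

```python
from typing import Tuple

# Possible outcomes for an uncorrected error at each tolerant level,
# transcribed from the table in this document.
TOLERANT_ACTIONS = {
    0: ("panic",),
    1: ("panic", "SIGBUS"),
    2: ("SIGBUS", "log"),
    3: ("log",),
}


def uncorrected_actions(tolerant: int) -> Tuple[str, ...]:
    """Return the possible actions for an uncorrected error."""
    if tolerant not in TOLERANT_ACTIONS:
        raise ValueError("tolerant must be 0, 1, 2 or 3")
    return TOLERANT_ACTIONS[tolerant]


def may_panic(tolerant: int) -> bool:
    """True if this level can still lead to a panic."""
    return "panic" in uncorrected_actions(tolerant)
```

		For instance, may_panic(1) is true for the default level,
		while may_panic(2) is false.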

What:		/sys/devices/system/machinecheck/machinecheckX/trigger
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		The entries appear for each CPU, but they are truly shared
		between all CPUs.

		Program to run when a machine check event is detected.
		This is an alternative to running mcelog regularly from cron
		and allows events to be detected faster.
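
		A minimal sketch of configuring the trigger from a script
		follows. It assumes the file accepts the path of the program
		as plain text and that an absolute path is wanted; both are
		assumptions, and the function names are hypothetical.
		Writing to the real sysfs file normally requires root.

```python
from pathlib import Path

# Any CPU's directory can be used, since the entry is shared.
TRIGGER_FILE = Path(
    "/sys/devices/system/machinecheck/machinecheck0/trigger")


def set_trigger(program: str, trigger_file: Path = TRIGGER_FILE) -> None:
    """Write the program path to the trigger file.

    The absolute-path check is a precaution chosen for this sketch,
    not a documented requirement.
    """
    if not program.startswith("/"):
        raise ValueError("expected an absolute path to the program")
    trigger_file.write_text(program + "\n")


def get_trigger(trigger_file: Path = TRIGGER_FILE) -> str:
    """Read back the currently configured program path."""
    return trigger_file.read_text().strip()
```

		A call such as set_trigger("/usr/local/bin/mce-notify")
		would then register that (hypothetical) program.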

What:		/sys/devices/system/machinecheck/machinecheckX/monarch_timeout
Contact:	Andi Kleen <ak@linux.intel.com>
Date:		Feb, 2007
Description:
		How long to wait for the other CPUs to machine check too
		when an exception occurs. A value of 0 disables waiting for
		other CPUs.

		Unit: us