Lines Matching +full:al +full:- +full:mc +full:- +full:edac
33 -------------
47 Self-Monitoring, Analysis and Reporting Technology (SMART).
55 ---------------
68 * **Correctable Error (CE)** - the error detection mechanism detected and
72 * **Uncorrected Error (UE)** - the number of errors exceeded the error
73 correction threshold, and the system was unable to auto-correct.
75 * **Fatal Error** - when a UE happens on a critical component of the
79 * **Non-fatal Error** - when a UE happens on an unused component,
87 The mechanism for handling non-fatal errors is usually complex and may
92 ------------------------------------
113 Locator: ChannelA-DIMM0
121 In the above example, a DDR4 SO-DIMM memory module is located at the
150 This kind of memory is called Error-correcting code memory (ECC memory).
157 ----------
182 either by BIOS, by some special CPUs or by Linux EDAC driver. On x86 64
187 mode called "Lock-Step", where it groups two memory modules together,
188 doing 128-bit reads/writes. That gives 16 bits for error correction, with
198 memory modules (or 4 memory modules, if the system is also on Lock-step
204 EDAC - Error Detection And Correction
210 was "out-of-tree" and maintained at http://bluesmoke.sourceforge.net.
215 Kernel 2.6.16, it was renamed to ``EDAC``.
218 -------
220 The ``edac`` kernel module's goal is to detect and report hardware errors
224 ------
240 -----------------------
242 A new feature for EDAC, the ``edac_device`` class of device, was added in
245 This new device type allows for non-memory types of ECC hardware detectors
257 ----------------
263 There are several add-in adapters that do **not** follow the PCI specification
270 the EDAC PCI scanning code. If that attribute is set, PCI parity/error
280 ----------
282 EDAC is composed of a "core" module (``edac_core.ko``) and several Memory
283 Controller (MC) driver modules. On a given system, the CORE is loaded
284 and one MC driver will be loaded. Both the CORE and the MC driver (or
289 both the CORE's and the MC driver's versions.
293 -------
295 If ``edac`` was statically linked with the kernel then no loading
296 is necessary. If ``edac`` was built as modules then simply modprobe
297 the ``edac`` pieces that you need. You should be able to modprobe
298 hardware-specific modules and have the dependencies load the necessary
310 ---------------
312 EDAC presents a ``sysfs`` interface for control and reporting purposes. It
313 lives in the /sys/devices/system/edac directory.
318 mc memory controller(s) system
324 Memory Controller (mc) Model
325 ----------------------------
327 Each ``mc`` device controls a set of memory modules [#f4]_. These modules
328 are laid out in a Chip-Select Row (``csrowX``) and Channel table (``chX``).
331 .. [#f4] Nowadays, the term DIMM (Dual In-line Memory Module) is widely
333 packaging alternatives, like SO-DIMM, SIMM, etc. The UEFI
336 (Type 17). Throughout this document, and inside the EDAC subsystem, the term
346 for more than 2 channels, like Fully Buffered DIMMs (FB-DIMMs) memory
349 +------------+-----------------------+
351 +------------+-----------+-----------+
355 +------------+-----------+-----------+
357 +------------+-----------+-----------+
359 +------------+-----------+-----------+
361 +------------+-----------+-----------+
363 +------------+-----------+-----------+
365 +------------+-----------+-----------+
370 +---------+---------+
372 +---------+---------+
374 +---------+---------+
376 Labels for these slots are usually silk-screened on the motherboard.
393 tree in EDAC's sysfs interface. Starting in directory
394 ``/sys/devices/system/edac/mc``, each memory controller will be
396 index of the MC::
398 ..../edac/mc/
400 |->mc0
401 |->mc1
402 |->mc2
408 .../mc/mc0/
410 |->csrow0
411 |->csrow2
412 |->csrow3
417 order to have dual-channel mode be operational. Since both csrow2 and
421 Within each of the ``mcX`` and ``csrowX`` directories are several EDAC
425 -------------------
427 In ``mcX`` directories are EDAC control and attribute files for
432 Documentation/ABI/testing/sysfs-devices-edac
436 ----------------------------------
438 The recommended way to use the EDAC subsystem is to look at the information
441 A typical EDAC system has the following structure under
442 ``/sys/devices/system/edac/``\ [#f6]_::
444 /sys/devices/system/edac/
445 ├── mc
491 In the ``dimmX`` directories are EDAC control and attribute files for
494 - ``size`` - Total memory managed by this csrow attribute file
499 - ``dimm_ue_count`` - Uncorrectable Errors count attribute file
503 this counter will not have a chance to increment, since EDAC
506 - ``dimm_ce_count`` - Correctable Errors count attribute file
512 monitored for non-zero values and report such information
515 - ``dimm_dev_type`` - Device type attribute file
521 - x1
522 - x2
523 - x4
524 - x8
526 - ``dimm_edac_mode`` - EDAC Mode of operation attribute file
531 - ``dimm_label`` - memory module label control file
545 - ``dimm_location`` - location of the memory module
552 - *csrow* and *channel* - used when the memory controller
553 doesn't identify a single DIMM - e. g. in ``rankX`` dir;
554 - *branch*, *channel*, *slot* - typically used on FB-DIMM memory
556 - *channel*, *slot* - used on Nehalem and newer Intel drivers.
558 - ``dimm_mem_type`` - Memory Type attribute file
564 - Registered-DDR
565 - Unbuffered-DDR
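The counter files described above lend themselves to simple periodic monitoring. The following is a minimal sketch, not part of the kernel documentation: it walks the ``dimmX`` directories and reports the modules whose ``dimm_ce_count`` or ``dimm_ue_count`` is non-zero. The function names, the ``root`` parameter, and the assumption that ``dimmX`` directories sit directly under each ``mcX`` directory are illustrative only.

```python
from pathlib import Path


def read_dimm_counters(root="/sys/devices/system/edac/mc"):
    """Return a list of (dimm_path, label, ce_count, ue_count) tuples.

    Hypothetical helper: assumes the layout <root>/mcX/dimmX/ with the
    attribute files dimm_label, dimm_ce_count and dimm_ue_count.
    """
    results = []
    for dimm_dir in sorted(Path(root).glob("mc*/dimm*")):
        if not dimm_dir.is_dir():
            continue

        def read(name, default=""):
            # An attribute file may be absent or unreadable; fall back
            # to the default instead of failing the whole scan.
            try:
                return (dimm_dir / name).read_text().strip()
            except OSError:
                return default

        label = read("dimm_label")
        ce = int(read("dimm_ce_count", "0") or 0)
        ue = int(read("dimm_ue_count", "0") or 0)
        results.append((str(dimm_dir), label, ce, ue))
    return results


def dimms_with_errors(root="/sys/devices/system/edac/mc"):
    """Keep only the modules with at least one CE or UE recorded."""
    return [r for r in read_dimm_counters(root) if r[2] or r[3]]
```

Passing ``root`` explicitly makes it possible to try the helper against a copy of the sysfs tree rather than the live system.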
577 ----------------------
580 directories. As this API doesn't work properly for Rambus, FB-DIMMs and
584 In the ``csrowX`` directories are EDAC control and attribute files for
588 - ``ue_count`` - Total Uncorrectable Errors count attribute file
592 this counter will not have a chance to increment, since EDAC
596 - ``ce_count`` - Total Correctable Errors count attribute file
602 monitored for non-zero values and report such information
606 - ``size_mb`` - Total memory managed by this csrow attribute file
612 - ``mem_type`` - Memory Type attribute file
618 - Registered-DDR
619 - Unbuffered-DDR
622 - ``edac_mode`` - EDAC Mode of operation attribute file
628 - ``dev_type`` - Device type attribute file
634 - x1
635 - x2
636 - x4
637 - x8
640 - ``ch0_ce_count`` - Channel 0 CE Count attribute file
646 - ``ch0_ue_count`` - Channel 0 UE Count attribute file
652 - ``ch0_dimm_label`` - Channel 0 DIMM Label control file
668 - ``ch1_ce_count`` - Channel 1 CE Count attribute file
675 - ``ch1_ue_count`` - Channel 1 UE Count attribute file
682 - ``ch1_dimm_label`` - Channel 1 DIMM Label control file
698 --------------
703 …EDAC MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0, channel 1 "DIMM_B1": amd76…
704 …EDAC MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0, channel 1 "DIMM_B1": amd76…
709 +---------------------------------------+-------------+
713 +---------------------------------------+-------------+
715 +---------------------------------------+-------------+
717 +---------------------------------------+-------------+
719 +---------------------------------------+-------------+
722 +---------------------------------------+-------------+
724 +---------------------------------------+-------------+
726 +---------------------------------------+-------------+
728 +---------------------------------------+-------------+
730 +---------------------------------------+-------------+
731 | And then an optional, driver-specific | |
734 +---------------------------------------+-------------+
737 type, a notice of "no info" and then an optional, driver-specific error
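As an illustration of the CE message layout described above, here is a hedged sketch of a parser for such log lines. The regular expression covers only the fields visible in the sample messages; the function name and the returned dictionary keys are assumptions made for this example, and anything after the quoted label is kept as opaque driver-specific text.

```python
import re

# Pattern for the fields visible in the sample CE messages. Whatever
# follows the quoted label is treated as free-form, driver-specific text.
CE_PATTERN = re.compile(
    r'EDAC MC(?P<mc>\d+): CE '
    r'page (?P<page>0x[0-9a-fA-F]+), '
    r'offset (?P<offset>0x[0-9a-fA-F]+), '
    r'grain (?P<grain>\d+), '
    r'syndrome (?P<syndrome>0x[0-9a-fA-F]+), '
    r'row (?P<row>\d+), '
    r'channel (?P<channel>\d+) '
    r'"(?P<label>[^"]*)"'
    r'(?::\s*(?P<driver_msg>.*))?'
)


def parse_ce_line(line):
    """Return a dict of CE fields, or None if the line does not match."""
    m = CE_PATTERN.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Decimal fields.
    for key in ("mc", "grain", "row", "channel"):
        fields[key] = int(fields[key])
    # Hexadecimal fields (the 0x prefix is accepted by int(..., 16)).
    for key in ("page", "offset", "syndrome"):
        fields[key] = int(fields[key], 16)
    fields["driver_msg"] = (fields["driver_msg"] or "").strip()
    return fields
```

Because the pattern is applied with ``search``, any timestamp or log prefix in front of ``EDAC`` is ignored.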
742 ------------------------
752 -------------------
754 Under ``/sys/devices/system/edac/pci`` are control and attribute files as
758 - ``check_pci_parity`` - Enable/Disable PCI Parity checking control file
766 echo "1" >/sys/devices/system/edac/pci/check_pci_parity
770 echo "0" >/sys/devices/system/edac/pci/check_pci_parity
773 - ``pci_parity_count`` - Parity Count
780 -----------------
782 - ``edac_mc_panic_on_ue`` - Panic on UE control file
786 occurs - it is indeterminate what was uncorrected and the operating
788 corruption. If the kernel has MCE configured, then EDAC will never
800 - ``edac_mc_log_ue`` - Log UE control file
816 - ``edac_mc_log_ce`` - Log CE control file
832 - ``edac_mc_poll_msec`` - Polling period control file
851 - ``panic_on_pci_parity`` - Panic on PCI PARITY Error
872 EDAC device type
873 ----------------
880 At the location ``/sys/devices/system/edac`` (sysfs) new edac_device devices
883 There is a three-level tree beneath the above ``edac`` directory. For example,
887 /sys/devices/system/edac/test-instance
909 One out-of-tree driver uses controls here to allow
917 ---------
922 +----------------+
923 | test-instance0 |
924 +----------------+
936 ------
941 +-------------+
942 | test-block0 |
943 +-------------+
958 test-block-bits-0 for every POLL cycle this counter
960 test-block-bits-1 every 10 cycles, this counter is bumped once,
961 and test-block-bits-0 is set to 0
962 test-block-bits-2 every 100 cycles, this counter is bumped once,
963 and test-block-bits-1 is set to 0
964 test-block-bits-3 every 1000 cycles, this counter is bumped once,
965 and test-block-bits-2 is set to 0
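The cascading behaviour of the four test counters can be pictured as a decimal odometer. The sketch below simulates it under one reading of the description above: a counter is reset to 0 at the moment the next counter is bumped, and the last counter is never reset. The function name is hypothetical and the code is not taken from any driver.

```python
def simulate_test_block_bits(poll_cycles):
    """Return [bits0, bits1, bits2, bits3] after the given number of cycles."""
    bits = [0, 0, 0, 0]
    for _ in range(poll_cycles):
        # test-block-bits-0 is bumped on every poll cycle.
        bits[0] += 1
        # Carry into the next counter each time a counter reaches 10;
        # the last counter has nothing above it and is never reset.
        for i in range(3):
            if bits[i] == 10:
                bits[i] = 0
                bits[i + 1] += 1
    return bits
```

Under these assumptions, after 1234 poll cycles the counters read 4, 3, 2 and 1 for ``test-block-bits-0`` through ``test-block-bits-3``.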
970 reset-counters writing anything to this control will
979 http://bluesmoke.sourceforge.net project site for EDAC.
982 Usage of EDAC APIs on Nehalem and newer Intel CPUs
983 --------------------------------------------------
988 controller (MC) inside the CPUs.
1004 Each MC has 3 physical read channels, 3 physical write channels and
1009 As the minimum unit in the EDAC API is the csrow, the driver sequentially
1033 2) The MC has the ability to inject errors to test drivers. The drivers
1037 ``/sys/devices/system/edac/mc/mc?/``:
1039 - ``inject_addrmatch/*``:
1057 echo 2 >/sys/devices/system/edac/mc/mc0/inject_addrmatch/dimm
1058 echo 1 >/sys/devices/system/edac/mc/mc0/inject_addrmatch/rank
1062 echo any >/sys/devices/system/edac/mc/mc0/inject_addrmatch/dimm
1063 echo any >/sys/devices/system/edac/mc/mc0/inject_addrmatch/rank
1065 - ``inject_eccmask``:
1068 - ``inject_section``:
1075 - ``inject_type``:
1078 bit 0 - repeat
1079 bit 1 - ecc
1080 bit 2 - parity
1082 - ``inject_enable``:
1094 echo 2 >/sys/devices/system/edac/mc/mc0/inject_addrmatch/channel
1095 echo 2 >/sys/devices/system/edac/mc/mc0/inject_type
1096 echo 64 >/sys/devices/system/edac/mc/mc0/inject_eccmask
1097 echo 3 >/sys/devices/system/edac/mc/mc0/inject_section
1098 echo 1 >/sys/devices/system/edac/mc/mc0/inject_enable
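The same injection sequence can be scripted. This is a minimal sketch that mirrors the ``echo`` commands above: it writes each value to the matching control file, in the same order, with ``inject_enable`` written last. The helper name, its parameters, and the ``EXAMPLE_SETTINGS`` list are hypothetical; only the file names and values come from the commands shown.

```python
from pathlib import Path


def inject_error(mc_dir, settings):
    """Write each (relative_file, value) pair under mc_dir, in order."""
    mc_dir = Path(mc_dir)
    for rel_path, value in settings:
        # Equivalent to: echo VALUE > MC_DIR/REL_PATH
        (mc_dir / rel_path).write_text(f"{value}\n")


# The same values, in the same order, as the echo commands above.
EXAMPLE_SETTINGS = [
    ("inject_addrmatch/channel", 2),
    ("inject_type", 2),
    ("inject_eccmask", 64),
    ("inject_section", 3),
    ("inject_enable", 1),  # written last, as in the shell sequence
]
```

On a real system this would be called as ``inject_error("/sys/devices/system/edac/mc/mc0", EXAMPLE_SETTINGS)``, which requires root privileges and a driver that exposes these control files.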
1106 …EDAC MC0: UE row 0, channel-a= 0 channel-b= 0 labels "-": NON_FATAL (addr = 0x0075b980, socket=0, …
1121 $ for i in /sys/devices/system/edac/mc/mc0/all_channel_counts/*; do echo $i; cat $i; done
1122 /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0
1124 /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1
1126 /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2
1155 ------------------------------------------
1158 (available from http://support.amd.com/en-us/search/tech-docs):
1181 Models 30h-3Fh Processors
1185 :Link: http://support.amd.com/TechDocs/49125_15h_Models_30h-3Fh_BKDG.pdf
1188 Models 60h-6Fh Processors
1192 :Link: http://support.amd.com/TechDocs/50742_15h_Models_60h-6Fh_BKDG.pdf
1195 Models 00h-0Fh Processors
1206 - 7 Dec 2005
1207 - 17 Jul 2007 Updated
1211 - 05 Aug 2009 Nehalem interface
1212 - 26 Oct 2016 Converted to ReST and cleanups at the Nehalem section
1214 * EDAC authors/maintainers:
1216 - Doug Thompson, Dave Jiang, Dave Peterson et al,
1217 - Mauro Carvalho Chehab
1218 - Borislav Petkov
1219 - original author: Thayne Harbaugh