Error Detection And Correction (EDAC) Devices
=============================================

Main Concepts used at the EDAC subsystem
----------------------------------------

There are several things to be aware of that aren't at all obvious, like
*sockets*, *socket sets*, *banks*, *rows*, *chip-select rows*, *channels*,
etc...

These are some of the many terms that are thrown about that don't always
mean what people think they mean (Inconceivable!).  In the interest of
creating a common ground for discussion, terms and their definitions
will be established.

* Memory devices

The individual DRAM chips on a memory stick.  These devices commonly
output 4 and 8 bits each (x4, x8). Grouping several of these in parallel
provides the number of bits that the memory controller expects:
typically 72 bits, in order to provide 64 bits + 8 bits of ECC data.

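As a rough illustration of the grouping described above, the following
Python sketch computes how many x4 or x8 devices have to sit in parallel to
fill a 72-bit bus. The function name ``devices_in_parallel`` is made up for
this example and is not part of any EDAC API.

```python
def devices_in_parallel(bus_width_bits: int, device_width_bits: int) -> int:
    """Return how many DRAM devices are needed side by side to fill the bus."""
    if device_width_bits <= 0 or bus_width_bits % device_width_bits:
        raise ValueError("bus width must be a multiple of the device width")
    return bus_width_bits // device_width_bits


# 64 data bits + 8 ECC bits, the typical width quoted in the text.
ECC_BUS_WIDTH_BITS = 64 + 8

x4_devices = devices_in_parallel(ECC_BUS_WIDTH_BITS, 4)
x8_devices = devices_in_parallel(ECC_BUS_WIDTH_BITS, 8)
print(f"x4: {x4_devices} devices, x8: {x8_devices} devices")
```

With the typical 72-bit width this gives 18 devices for x4 parts and 9
devices for x8 parts.
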
* Memory Stick

A printed circuit board that aggregates multiple memory devices in
parallel.  In general, this is the Field Replaceable Unit (FRU) which
gets replaced, in the case of excessive errors. Most often it is also
called DIMM (Dual Inline Memory Module).

* Memory Socket

A physical connector on the motherboard that accepts a single memory
stick. Also called a "slot" in several datasheets.

* Channel

A memory controller channel, responsible for communicating with a group of
DIMMs. Each channel has its own independent control (command) and data
bus, and can be used independently or grouped with other channels.

* Branch

It is typically the highest level of the hierarchy on a Fully-Buffered
DIMM memory controller. Typically, it contains two channels. Two channels
at the same branch can be used in single mode or in lockstep mode. When
lockstep is enabled, the cacheline is doubled, but it generally brings
some performance penalty. Also, it is generally not possible to point to
just one memory stick when an error occurs, as the error correction code
is calculated using two DIMMs instead of one. Due to that, it is capable
of correcting more errors than in single mode.

* Single-channel

The data accessed by the memory controller is contained in one DIMM
only. E.g. if the data is 64 bits wide, the data flows to the CPU using
one 64-bit parallel access. Typically used with SDR, DDR, DDR2 and DDR3
memories. FB-DIMM and RAMBUS use a different concept for channel, so
this concept doesn't apply there.

* Double-channel

The data accessed by the memory controller is interleaved across two
DIMMs, accessed at the same time. E.g. if the DIMM is 64 bits wide (72
bits with ECC), the data flows to the CPU using a 128-bit parallel
access.

* Chip-select row

This is the name of the DRAM signal used to select the DRAM ranks to be
accessed. Common chip-select rows for single channel are 64 bits, for
dual channel 128 bits. It may not be visible to the memory controller,
as some DIMM types have a memory buffer that can hide direct access to
it from the Memory Controller.

* Single-Ranked stick

A single-ranked stick has 1 chip-select row of memory. Motherboards
commonly drive two chip-select pins to a memory stick. A single-ranked
stick will occupy only one of those rows. The other will be unused.

.. _doubleranked:

* Double-Ranked stick

A double-ranked stick has two chip-select rows which access different
sets of memory devices.  The two rows cannot be accessed concurrently.

* Double-sided stick

**DEPRECATED TERM**, see :ref:`Double-Ranked stick <doubleranked>`.

A double-sided stick has two chip-select rows which access different sets
of memory devices. The two rows cannot be accessed concurrently.
"Double-sided" is irrespective of the memory devices being mounted on
both sides of the memory stick.

* Socket set

All of the memory sticks that are required for a single memory access or
all of the memory sticks spanned by a chip-select row.  A single socket
set has two chip-select rows and if double-sided sticks are used these
will occupy those chip-select rows.

* Bank

This term is avoided because it is unclear when needing to distinguish
between chip-select rows and socket sets.

* High Bandwidth Memory (HBM)

HBM is a new memory type with low power consumption and ultra-wide
communication lanes. It uses vertically stacked memory chips (DRAM dies)
interconnected by microscopic wires called "through-silicon vias," or
TSVs.

Several stacks of HBM chips connect to the CPU or GPU through an ultra-fast
interconnect called the "interposer". Therefore, HBM's characteristics
are nearly indistinguishable from on-chip integrated RAM.

Memory Controllers
------------------

Most of the EDAC core is focused on doing Memory Controller error detection.
Memory controllers are allocated with :c:func:`edac_mc_alloc`. The core
internally uses the struct ``mem_ctl_info`` to describe the memory
controllers, which is an opaque struct for the EDAC drivers. Only the EDAC
core is allowed to touch it.

.. kernel-doc:: include/linux/edac.h

.. kernel-doc:: drivers/edac/edac_mc.h

PCI Controllers
---------------

The EDAC subsystem provides a mechanism to handle PCI controllers by calling
:c:func:`edac_pci_alloc_ctl_info`. It uses the struct
:c:type:`edac_pci_ctl_info` to describe the PCI controllers.

.. kernel-doc:: drivers/edac/edac_pci.h

EDAC Blocks
-----------

The EDAC subsystem also provides a generic mechanism to report errors on
other parts of the hardware via the :c:func:`edac_device_alloc_ctl_info`
function.

The structures :c:type:`edac_dev_sysfs_block_attribute`,
:c:type:`edac_device_block`, :c:type:`edac_device_instance` and
:c:type:`edac_device_ctl_info` provide a generic or abstract 'edac_device'
representation at sysfs.

This set of structures, and the code that implements the APIs for them,
provide for registering EDAC type devices which are NOT standard memory or
PCI, like:

- CPU caches (L1 and L2)
- DMA engines
- Core CPU switches
- Fabric switch units
- PCIe interface controllers
- other EDAC/ECC type devices that can be monitored for
  errors, etc.

It allows for a two-level hierarchy.

For example, a cache could be composed of L1, L2 and L3 levels of cache.
Each CPU core would have its own L1 cache, while sharing L2 and maybe L3
caches. In such a case, those can be represented via the following sysfs
nodes::

	/sys/devices/system/edac/..

	pci/		<existing pci directory (if available)>
	mc/		<existing memory device directory>
	cpu/cpu0/..	<L1 and L2 block directory>
		/L1-cache/ce_count
			 /ue_count
		/L2-cache/ce_count
			 /ue_count
	cpu/cpu1/..	<L1 and L2 block directory>
		/L1-cache/ce_count
			 /ue_count
		/L2-cache/ce_count
			 /ue_count
	...

	the L1 and L2 directories would be "edac_device_block's"

.. kernel-doc:: drivers/edac/edac_device.h

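The two-level instance/block layout shown in the sysfs listing for EDAC
blocks can be read from user space with a few lines of Python. This is only
a sketch: it assumes that every block directory holds plain-text
``ce_count`` and ``ue_count`` files two levels below the control directory,
as in that listing, and the helper name ``read_block_counters`` is made up
for this example.

```python
from pathlib import Path


def read_block_counters(ctl_dir):
    """Map (instance, block) to (ce_count, ue_count) for one control directory.

    Assumed layout: <ctl_dir>/<instance>/<block>/{ce_count,ue_count}.
    """
    counters = {}
    root = Path(ctl_dir)
    if not root.is_dir():
        return counters  # EDAC device not present on this system
    for ce_file in sorted(root.glob("*/*/ce_count")):
        block_dir = ce_file.parent
        ue_file = block_dir / "ue_count"
        if not ue_file.is_file():
            continue  # skip directories that do not look like a block
        key = (block_dir.parent.name, block_dir.name)
        counters[key] = (int(ce_file.read_text()), int(ue_file.read_text()))
    return counters
```

For the listing above, ``read_block_counters("/sys/devices/system/edac/cpu")``
would return entries keyed by pairs such as ``("cpu0", "L1-cache")``. On a
system without that directory it returns an empty dictionary.
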
Heterogeneous system support
----------------------------

An AMD heterogeneous system is built by connecting the data fabrics of
both CPUs and GPUs via custom xGMI links. Thus, the data fabric on the
GPU nodes can be accessed the same way as the data fabric on CPU nodes.

The MI200 accelerators are data center GPUs. They have 2 data fabrics,
and each GPU data fabric contains four Unified Memory Controllers (UMC).
Each UMC contains eight channels. Each UMC channel controls one 128-bit
HBM2e (2GB) channel (equivalent to 8 X 2GB ranks).  This creates a total
of 4096 bits of DRAM data bus.

While the UMC is interfacing a 16GB (8high X 2GB DRAM) HBM stack, each UMC
channel is interfacing 2GB of DRAM (represented as rank).

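The MI200 figures above follow from simple multiplication. The Python
sketch below redoes that arithmetic for a single GPU data fabric; the
constant and function names are invented for this example, and the values
are the ones quoted in the text.

```python
# Per-data-fabric figures quoted in the text for MI200.
UMCS_PER_NODE = 4          # Unified Memory Controllers per GPU data fabric
CHANNELS_PER_UMC = 8       # channels per UMC
CHANNEL_WIDTH_BITS = 128   # width of one HBM2e channel
CHANNEL_SIZE_GB = 2        # capacity behind one UMC channel


def node_summary():
    """Derive bus width and capacity for one GPU data fabric (node)."""
    channels = UMCS_PER_NODE * CHANNELS_PER_UMC
    return {
        "channels": channels,
        "bus_width_bits": channels * CHANNEL_WIDTH_BITS,
        "umc_size_gb": CHANNELS_PER_UMC * CHANNEL_SIZE_GB,
        "node_size_gb": channels * CHANNEL_SIZE_GB,
    }


for name, value in node_summary().items():
    print(f"{name}: {value}")
```

This yields 32 channels, a 4096-bit data bus, 16 GB per UMC and 64 GB per
data fabric.
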
Memory controllers on AMD GPU nodes can be represented in EDAC thusly::

	GPU DF / GPU Node -> EDAC MC
	GPU UMC           -> EDAC CSROW
	GPU UMC channel   -> EDAC CHANNEL

For example, a heterogeneous system may consist of 1 AMD CPU connected to
4 MI200 (Aldebaran) GPUs using xGMI.

Some more heterogeneous hardware details:

- The CPU UMC (Unified Memory Controller) is mostly the same as the GPU UMC.
  They have chip selects (csrows) and channels. However, the layouts are different
  for performance, physical layout, or other reasons.
- CPU UMCs use 1 channel. In this case, UMC = EDAC channel. This follows the
  marketing speak: a CPU has X memory channels, etc.
- CPU UMCs use up to 4 chip selects, so UMC chip select = EDAC CSROW.
- GPU UMCs use 1 chip select, so UMC = EDAC CSROW.
- GPU UMCs use 8 channels, so UMC channel = EDAC channel.

The EDAC subsystem provides a mechanism to handle AMD heterogeneous
systems by calling system specific ops for both CPUs and GPUs.

AMD GPU nodes are enumerated in sequential order based on the PCI
hierarchy, and the first GPU node is assumed to have a Node ID value
following those of the CPU nodes after the latter are fully populated::

	$ ls /sys/devices/system/edac/mc/
		mc0   - CPU MC node 0
		mc1  |
		mc2  |- GPU card[0] => node 0(mc1), node 1(mc2)
		mc3  |
		mc4  |- GPU card[1] => node 0(mc3), node 1(mc4)
		mc5  |
		mc6  |- GPU card[2] => node 0(mc5), node 1(mc6)
		mc7  |
		mc8  |- GPU card[3] => node 0(mc7), node 1(mc8)

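The numbering in the ``ls`` listing above can be expressed as a small
formula. The Python sketch below only mirrors that listing; it is not taken
from the driver source, and the name ``gpu_mc_index`` is hypothetical. It
assumes two nodes per GPU card and zero-based card numbers, as in
``card[0]`` to ``card[3]``.

```python
NODES_PER_GPU_CARD = 2  # each MI200 card shows up as two nodes in the listing


def gpu_mc_index(cpu_nodes: int, card: int, node: int) -> int:
    """Return the mcN index of a GPU node, counting on after the CPU nodes."""
    if cpu_nodes < 1 or card < 0:
        raise ValueError("need at least one CPU node and a non-negative card")
    if not 0 <= node < NODES_PER_GPU_CARD:
        raise ValueError("node must be 0 or 1")
    return cpu_nodes + card * NODES_PER_GPU_CARD + node


# One CPU node (mc0) followed by four GPU cards, as in the listing.
for card in range(4):
    indices = [gpu_mc_index(1, card, node) for node in range(NODES_PER_GPU_CARD)]
    print(f"GPU card[{card}] => mc{indices[0]}, mc{indices[1]}")
```

With one CPU node, card 0 maps to mc1 and mc2, and card 3 maps to mc7 and
mc8, matching the listing.
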
For example, a heterogeneous system with one AMD CPU is connected to
four MI200 (Aldebaran) GPUs using xGMI. This topology can be represented
via the following sysfs entries::

	/sys/devices/system/edac/mc/..

	CPU			# CPU node
	├── mc 0

	GPU Nodes are enumerated sequentially after CPU nodes have been populated
	GPU card 1		# Each MI200 GPU has 2 nodes/mcs
	├── mc 1		# GPU node 0 == mc1, Each MC node has 4 UMCs/CSROWs
	│   ├── csrow 0		# UMC 0
	│   │   ├── channel 0	# Each UMC has 8 channels
	│   │   ├── channel 1   # size of each channel is 2 GB, so each UMC has 16 GB
	│   │   ├── channel 2
	│   │   ├── channel 3
	│   │   ├── channel 4
	│   │   ├── channel 5
	│   │   ├── channel 6
	│   │   ├── channel 7
	│   ├── csrow 1		# UMC 1
	│   │   ├── channel 0
	│   │   ├── ..
	│   │   ├── channel 7
	│   ├── ..		..
	│   ├── csrow 3		# UMC 3
	│   │   ├── channel 0
	│   │   ├── ..
	│   │   ├── channel 7
	│   ├── rank 0
	│   ├── ..		..
	│   ├── rank 31		# total 32 ranks/dimms from 4 UMCs
	│
	├── mc 2		# GPU node 1 == mc2
	│   ├── ..		# each GPU has total 64 GB

	GPU card 2
	├── mc 3
	│   ├── ..
	├── mc 4
	│   ├── ..

	GPU card 3
	├── mc 5
	│   ├── ..
	├── mc 6
	│   ├── ..

	GPU card 4
	├── mc 7
	│   ├── ..
	├── mc 8
	│   ├── ..

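To check which memory controllers are present and how many errors each has
logged, the ``mc`` directory can be walked from user space. The Python
sketch below assumes that every ``mcN`` directory exposes plain-text
``ce_count`` and ``ue_count`` files; the helper name ``read_mc_totals`` is
made up for this example.

```python
import re
from pathlib import Path


def read_mc_totals(mc_root="/sys/devices/system/edac/mc"):
    """Map each memory controller index N (from mcN) to (ce_count, ue_count)."""
    totals = {}
    root = Path(mc_root)
    if not root.is_dir():
        return totals  # no EDAC memory controller driver loaded
    for mc_dir in root.iterdir():
        match = re.fullmatch(r"mc(\d+)", mc_dir.name)
        if not match:
            continue  # ignore entries that are not mcN directories
        try:
            ce = int((mc_dir / "ce_count").read_text())
            ue = int((mc_dir / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # counter missing or unreadable, skip this controller
        totals[int(match.group(1))] = (ce, ue)
    # Sort numerically so that mc10 comes after mc9, not after mc1.
    return dict(sorted(totals.items()))
```

On the example system above, the returned dictionary would have the keys 0
to 8, with key 0 belonging to the CPU node and keys 1 to 8 to the GPU nodes.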