Error Detection And Correction (EDAC) Devices
=============================================

Main Concepts used at the EDAC subsystem
----------------------------------------

There are several things to be aware of that aren't at all obvious, like
*sockets*, *socket sets*, *banks*, *rows*, *chip-select rows*, *channels*,
etc...

These are some of the many terms that are thrown about that don't always
mean what people think they mean (Inconceivable!). In the interest of
creating a common ground for discussion, terms and their definitions
will be established.

* Memory devices

The individual DRAM chips on a memory stick. These devices commonly
output 4 and 8 bits each (x4, x8). Grouping several of these in parallel
provides the number of bits that the memory controller expects:
typically 72 bits, in order to provide 64 bits + 8 bits of ECC data.

* Memory Stick

A printed circuit board that aggregates multiple memory devices in
parallel. In general, this is the Field Replaceable Unit (FRU) which
gets replaced in the case of excessive errors. Most often it is also
called a DIMM (Dual Inline Memory Module).

* Memory Socket

A physical connector on the motherboard that accepts a single memory
stick. Also called a "slot" on several datasheets.

* Channel

A memory controller channel, responsible for communicating with a group
of DIMMs. Each channel has its own independent control (command) and
data bus, and can be used independently or grouped with other channels.

* Branch

It is typically the highest hierarchy on a Fully-Buffered DIMM memory
controller. Typically, it contains two channels. Two channels at the
same branch can be used in single mode or in lockstep mode. When
lockstep is enabled, the cacheline is doubled, but it generally brings
some performance penalty. Also, it is generally not possible to point to
just one memory stick when an error occurs, as the error correction code
is calculated using two DIMMs instead of one. Due to that, it is capable
of correcting more errors than in single mode.

* Single-channel

The data accessed by the memory controller is contained in one DIMM
only. E.g. if the data is 64 bits wide, the data flows to the CPU using
one 64-bit parallel access. Typically used with SDR, DDR, DDR2 and DDR3
memories. FB-DIMM and RAMBUS use a different concept for channel, so
this concept doesn't apply there.

* Double-channel

The data size accessed by the memory controller is interleaved across
two DIMMs, accessed at the same time. E.g. if the DIMM is 64 bits wide
(72 bits with ECC), the data flows to the CPU using a 128-bit parallel
access.

* Chip-select row

This is the name of the DRAM signal used to select the DRAM ranks to be
accessed. Common chip-select rows for single channel are 64 bits, for
dual channel 128 bits. It may not be visible to the memory controller,
as some DIMM types have a memory buffer that can hide direct access to
it from the Memory Controller.

* Single-Ranked stick

A Single-ranked stick has 1 chip-select row of memory. Motherboards
commonly drive two chip-select pins to a memory stick. A single-ranked
stick will occupy only one of those rows. The other will be unused.

.. _doubleranked:

* Double-Ranked stick

A double-ranked stick has two chip-select rows which access different
sets of memory devices. The two rows cannot be accessed concurrently.

* Double-sided stick

**DEPRECATED TERM**, see :ref:`Double-Ranked stick <doubleranked>`.

A double-sided stick has two chip-select rows which access different sets
of memory devices. The two rows cannot be accessed concurrently.
"Double-sided" is irrespective of the memory devices being mounted on
both sides of the memory stick.

* Socket set

All of the memory sticks that are required for a single memory access, or
all of the memory sticks spanned by a chip-select row. A single socket
set has two chip-select rows, and if double-sided sticks are used these
will occupy those chip-select rows.

* Bank

This term is avoided because it is unclear when needing to distinguish
between chip-select rows and socket sets.

* High Bandwidth Memory (HBM)

HBM is a new memory type with low power consumption and ultra-wide
communication lanes. It uses vertically stacked memory chips (DRAM dies)
interconnected by microscopic wires called "through-silicon vias", or
TSVs.

Several stacks of HBM chips connect to the CPU or GPU through an ultra-fast
interconnect called the "interposer". Therefore, HBM's characteristics
are nearly indistinguishable from on-chip integrated RAM.

Memory Controllers
------------------

Most of the EDAC core is focused on doing Memory Controller error detection.
Drivers allocate a memory controller descriptor with
:c:func:`edac_mc_alloc`, which internally uses the struct ``mem_ctl_info``
to describe the memory controllers. This struct is opaque to the EDAC
drivers: only the EDAC core is allowed to touch it.
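
For illustration, below is a minimal sketch of how a driver might allocate
and register such a descriptor. The probe function, the layer sizes and the
``example_pvt`` private struct are hypothetical placeholders;
:c:func:`edac_mc_alloc`, :c:func:`edac_mc_add_mc` and :c:func:`edac_mc_free`
are the core API calls::

  /* Hypothetical driver-private state, sized via edac_mc_alloc(). */
  struct example_pvt {
          void __iomem *regs;
  };

  static int example_probe(struct device *dev)
  {
          struct edac_mc_layer layers[2];
          struct mem_ctl_info *mci;

          /* Describe the topology: e.g. 4 chip-select rows ... */
          layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
          layers[0].size = 4;
          layers[0].is_virt_csrow = true;
          /* ... each spanning 2 channels. */
          layers[1].type = EDAC_MC_LAYER_CHANNEL;
          layers[1].size = 2;
          layers[1].is_virt_csrow = false;

          mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers,
                              sizeof(struct example_pvt));
          if (!mci)
                  return -ENOMEM;

          mci->pdev = dev;                /* device that owns this MC */
          mci->ctl_name = "example_mc";

          /* Hand the descriptor over to the EDAC core. */
          if (edac_mc_add_mc(mci)) {
                  edac_mc_free(mci);
                  return -ENODEV;
          }
          return 0;
  }

The ``layers`` array describes the csrow/channel dimensions that the EDAC
core uses to size the per-controller sysfs hierarchy.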

.. kernel-doc:: include/linux/edac.h

.. kernel-doc:: drivers/edac/edac_mc.h

PCI Controllers
---------------

The EDAC subsystem provides a mechanism to handle PCI controllers by calling
:c:func:`edac_pci_alloc_ctl_info`. It will use the struct
:c:type:`edac_pci_ctl_info` to describe the PCI controllers.
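
A hedged sketch of the usual calling sequence follows, assuming a driver
with no private state (hence the size 0 argument). The ``example_*`` names
are illustrative; :c:func:`edac_pci_alloc_ctl_info`,
:c:func:`edac_pci_add_device` and :c:func:`edac_pci_free_ctl_info` are the
actual core helpers::

  static struct edac_pci_ctl_info *pci;

  static int example_pci_setup(struct device *dev)
  {
          pci = edac_pci_alloc_ctl_info(0, "example_pci");
          if (!pci)
                  return -ENOMEM;

          pci->dev = dev;
          pci->mod_name = "example_module";
          pci->ctl_name = "example_pci";

          /* Index 0: the first (and only) PCI control structure here. */
          if (edac_pci_add_device(pci, 0) < 0) {
                  edac_pci_free_ctl_info(pci);
                  return -ENODEV;
          }
          return 0;
  }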

.. kernel-doc:: drivers/edac/edac_pci.h

EDAC Blocks
-----------

The EDAC subsystem also provides a generic mechanism to report errors on
other parts of the hardware via the :c:func:`edac_device_alloc_ctl_info`
function.

The structures :c:type:`edac_dev_sysfs_block_attribute`,
:c:type:`edac_device_block`, :c:type:`edac_device_instance` and
:c:type:`edac_device_ctl_info` provide a generic or abstract 'edac_device'
representation at sysfs.

This set of structures, and the code that implements their APIs, provides
for registering EDAC-type devices which are NOT standard memory or PCI,
like:

- CPU caches (L1 and L2)
- DMA engines
- Core CPU switches
- Fabric switch units
- PCIe interface controllers
- other EDAC/ECC type devices that can be monitored for
  errors, etc.

It allows for a two-level hierarchy.

For example, a cache could be composed of L1, L2 and L3 levels of cache.
Each CPU core would have its own L1 cache, while sharing L2 and maybe L3
caches. In such a case, those can be represented via the following sysfs
nodes::

  /sys/devices/system/edac/..

    pci/          <existing pci directory (if available)>
    mc/           <existing memory device directory>
    cpu/cpu0/..   <L1 and L2 block directory>
      /L1-cache/ce_count
               /ue_count
      /L2-cache/ce_count
               /ue_count
    cpu/cpu1/..   <L1 and L2 block directory>
      /L1-cache/ce_count
               /ue_count
      /L2-cache/ce_count
               /ue_count
    ...

  the L1 and L2 directories would be "edac_device_block's"
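
To make the flow concrete, here is a rough sketch of registering a
single-instance, single-block 'edac_device' and reporting a corrected error
against it. Note that the exact parameter list of
:c:func:`edac_device_alloc_ctl_info` has varied across kernel versions; the
call below follows the long-standing form, and all ``example_*`` names are
hypothetical. :c:func:`edac_device_add_device` and
:c:func:`edac_device_handle_ce` are the real registration and reporting
helpers::

  static struct edac_device_ctl_info *dci;

  static int example_l2_setup(struct device *dev)
  {
          /* One instance of "l2cache" with one block "l2", no extra
           * per-block sysfs attributes. */
          dci = edac_device_alloc_ctl_info(0, "l2cache", 1, "l2", 1, 0,
                                           NULL, 0,
                                           edac_device_alloc_index());
          if (!dci)
                  return -ENOMEM;

          dci->dev = dev;
          dci->mod_name = "example_module";
          dci->ctl_name = "example_l2";

          if (edac_device_add_device(dci)) {
                  edac_device_free_ctl_info(dci);
                  return -ENODEV;
          }
          return 0;
  }

  /* Called from the driver's error interrupt handler: instance 0,
   * block 0 saw a corrected error, which bumps its ce_count. */
  static void example_l2_report_ce(void)
  {
          edac_device_handle_ce(dci, 0, 0, "l2 ecc");
  }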

.. kernel-doc:: drivers/edac/edac_device.h


Heterogeneous system support
----------------------------

An AMD heterogeneous system is built by connecting the data fabrics of
both CPUs and GPUs via custom xGMI links. Thus, the data fabric on the
GPU nodes can be accessed the same way as the data fabric on CPU nodes.

The MI200 accelerators are data center GPUs. They have 2 data fabrics,
and each GPU data fabric contains four Unified Memory Controllers (UMC).
Each UMC contains eight channels. Each UMC channel controls one 128-bit
HBM2e (2GB) channel (equivalent to 8 X 2GB ranks). This creates a total
DRAM data bus width of 4096 bits.

While each UMC interfaces a 16GB (8-high X 2GB DRAM) HBM stack, each UMC
channel interfaces 2GB of DRAM (represented as a rank).

Memory controllers on AMD GPU nodes can be represented in EDAC thusly::

  GPU DF / GPU Node -> EDAC MC
  GPU UMC           -> EDAC CSROW
  GPU UMC channel   -> EDAC CHANNEL

For example: a heterogeneous system with 1 AMD CPU is connected to
4 MI200 (Aldebaran) GPUs using xGMI.

Some more heterogeneous hardware details:

- The CPU UMC (Unified Memory Controller) is mostly the same as the GPU UMC.
  They have chip selects (csrows) and channels. However, the layouts are
  different for performance, physical layout, or other reasons.
- CPU UMCs use 1 channel; in this case UMC = EDAC channel. This follows the
  marketing speak: a CPU has X memory channels, etc.
- CPU UMCs use up to 4 chip selects, so UMC chip select = EDAC CSROW.
- GPU UMCs use 1 chip select, so UMC = EDAC CSROW.
- GPU UMCs use 8 channels, so UMC channel = EDAC channel.

The EDAC subsystem provides a mechanism to handle AMD heterogeneous
systems by calling system-specific ops for both CPUs and GPUs.

AMD GPU nodes are enumerated in sequential order based on the PCI
hierarchy, and the first GPU node is assumed to have a Node ID value
following those of the CPU nodes after the latter are fully populated::

  $ ls /sys/devices/system/edac/mc/
    mc0 - CPU MC node 0
    mc1  |
    mc2  |- GPU card[0] => node 0(mc1), node 1(mc2)
    mc3  |
    mc4  |- GPU card[1] => node 0(mc3), node 1(mc4)
    mc5  |
    mc6  |- GPU card[2] => node 0(mc5), node 1(mc6)
    mc7  |
    mc8  |- GPU card[3] => node 0(mc7), node 1(mc8)

For example, the heterogeneous system above, with one AMD CPU connected to
four MI200 (Aldebaran) GPUs using xGMI, can be represented via the
following sysfs entries::

  /sys/devices/system/edac/mc/..

  CPU                       # CPU node
  ├── mc 0

  GPU Nodes are enumerated sequentially after CPU nodes have been populated
  GPU card 1                # Each MI200 GPU has 2 nodes/mcs
  ├── mc 1                  # GPU node 0 == mc1, Each MC node has 4 UMCs/CSROWs
  │   ├── csrow 0           # UMC 0
  │   │   ├── channel 0     # Each UMC has 8 channels
  │   │   ├── channel 1     # size of each channel is 2 GB, so each UMC has 16 GB
  │   │   ├── channel 2
  │   │   ├── channel 3
  │   │   ├── channel 4
  │   │   ├── channel 5
  │   │   ├── channel 6
  │   │   ├── channel 7
  │   ├── csrow 1           # UMC 1
  │   │   ├── channel 0
  │   │   ├── ..
  │   │   ├── channel 7
  │   ├── ..  ..
  │   ├── csrow 3           # UMC 3
  │   │   ├── channel 0
  │   │   ├── ..
  │   │   ├── channel 7
  │   ├── rank 0
  │   ├── ..  ..
  │   ├── rank 31           # total 32 ranks/dimms from 4 UMCs
  ├
  ├── mc 2                  # GPU node 1 == mc2
  │   ├── ..                # each GPU node has total 64 GB

  GPU card 2
  ├── mc 3
  │   ├── ..
  ├── mc 4
  │   ├── ..

  GPU card 3
  ├── mc 5
  │   ├── ..
  ├── mc 6
  │   ├── ..

  GPU card 4
  ├── mc 7
  │   ├── ..
  ├── mc 8
  │   ├── ..
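
As a closing illustration, the UMC-to-EDAC mapping above could be described
to the EDAC core roughly as below. This is a simplified sketch rather than
the actual driver implementation: the layer sizes come straight from the
MI200 description (4 UMCs per GPU node, 8 channels per UMC), and the
function name is hypothetical::

  /*
   * Sketch: describe one MI200 GPU node to the EDAC core.
   *   GPU UMC         -> EDAC CSROW   (4 UMCs per GPU node)
   *   GPU UMC channel -> EDAC CHANNEL (8 channels per UMC)
   */
  static struct mem_ctl_info *example_gpu_node_alloc(unsigned int mc_idx)
  {
          struct edac_mc_layer layers[2];

          layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
          layers[0].size = 4;             /* one csrow per UMC */
          layers[0].is_virt_csrow = true;
          layers[1].type = EDAC_MC_LAYER_CHANNEL;
          layers[1].size = 8;             /* one EDAC channel per UMC channel */
          layers[1].is_virt_csrow = false;

          /* 4 UMCs x 8 channels x 2GB = 64GB per GPU node */
          return edac_mc_alloc(mc_idx, ARRAY_SIZE(layers), layers, 0);
  }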