What:		/sys/bus/platform/devices/smpro-errmon.*/error_[core|mem|pcie|other]_[ce|ue]
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Contains the 48-byte Ampere (Vendor-Specific) Error Record printed
		in hex format according to the table below:

		+--------+---------------+-------------+------------------------------------------------------------+
		| Offset | Field         | Size (byte) | Description                                                |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 00     | Error Type    | 1           | See :ref:`the table below <smpro-error-types>` for details |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 01     | Subtype       | 1           | See :ref:`the table below <smpro-error-types>` for details |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 02     | Instance      | 2           | See :ref:`the table below <smpro-error-types>` for details |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 04     | Error status  | 4           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 08     | Error Address | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 16     | Error Misc 0  | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 24     | Error Misc 1  | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 32     | Error Misc 2  | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 40     | Error Misc 3  | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+

		The table below defines the values of the error types and their subtypes,
		subcomponents, and instances:

		.. _smpro-error-types:

		+-----------------+------------+----------+----------------+----------------------------------------+
		| Error Group     | Error Type | Subtype  | Subcomponent   | Instance                               |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| CPM (core)      | 0          | 0        | Snoop-Logic    | CPM #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| CPM (core)      | 0          | 2        | Armv8 Core 1   | CPM #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 1        | ERR1           | MCU # \| SLOT << 11                    |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 2        | ERR2           | MCU # \| SLOT << 11                    |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 3        | ERR3           | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 4        | ERR4           | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 5        | ERR5           | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 6        | ERR6           | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 7        | Link Error     | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| Mesh (other)    | 2          | 0        | Cross Point    | X \| (Y << 5) \| NS <<11               |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| Mesh (other)    | 2          | 1        | Home Node(IO)  | X \| (Y << 5) \| NS <<11               |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| Mesh (other)    | 2          | 2        | Home Node(Mem) | X \| (Y << 5) \| NS <<11 \| device<<12 |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| Mesh (other)    | 2          | 4        | CCIX Node      | X \| (Y << 5) \| NS <<11               |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| 2P Link (other) | 3          | 0        | N/A            | Altra 2P Link #                        |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 0        | ERR0           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 1        | ERR1           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 2        | ERR2           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 3        | ERR3           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 4        | ERR4           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 5        | ERR5           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 6        | ERR6           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 7        | ERR7           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 8        | ERR8           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 9        | ERR9           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 10       | ERR10          | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 11       | ERR11          | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 12       | ERR12          | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 13-21    | ERR13          | RC # + 1                               |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 100      | TCU            | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 0        | TBU0           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 1        | TBU1           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 2        | TBU2           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 3        | TBU3           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 4        | TBU4           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 5        | TBU5           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 6        | TBU6           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 7        | TBU7           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 8        | TBU8           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 9        | TBU9           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe AER (pcie) | 7          | 0        | Root           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe AER (pcie) | 7          | 1        | Device         | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe RC (pcie)  | 8          | 0        | RCA HB         | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe RC (pcie)  | 8          | 1        | RCB HB         | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe RC (pcie)  | 8          | 8        | RASDP          | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| OCM (other)     | 9          | 0        | ERR0           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| OCM (other)     | 9          | 1        | ERR1           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| OCM (other)     | 9          | 2        | ERR2           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMpro (other)   | 10         | 0        | ERR0           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMpro (other)   | 10         | 1        | ERR1           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMpro (other)   | 10         | 2        | MPA_ERR        | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PMpro (other)   | 11         | 0        | ERR0           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PMpro (other)   | 11         | 1        | ERR1           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PMpro (other)   | 11         | 2        | MPA_ERR        | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+

		Example::

		  # cat error_other_ue
		  880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000

		The details of each sysfs entry are as follows:

		+-------------+---------------------------------------------------------+----------------------------------+
		| Error       | Sysfs entry                                             | Description (when triggered)     |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Core's CE   | /sys/bus/platform/devices/smpro-errmon.*/error_core_ce  | Core has a CE error              |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Core's UE   | /sys/bus/platform/devices/smpro-errmon.*/error_core_ue  | Core has a UE error              |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Memory's CE | /sys/bus/platform/devices/smpro-errmon.*/error_mem_ce   | Memory has a CE error            |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Memory's UE | /sys/bus/platform/devices/smpro-errmon.*/error_mem_ue   | Memory has a UE error            |
		+-------------+---------------------------------------------------------+----------------------------------+
		| PCIe's CE   | /sys/bus/platform/devices/smpro-errmon.*/error_pcie_ce  | A PCIe controller has a CE error |
		+-------------+---------------------------------------------------------+----------------------------------+
		| PCIe's UE   | /sys/bus/platform/devices/smpro-errmon.*/error_pcie_ue  | A PCIe controller has a UE error |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Other's CE  | /sys/bus/platform/devices/smpro-errmon.*/error_other_ce | Any other CE error               |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Other's UE  | /sys/bus/platform/devices/smpro-errmon.*/error_other_ue | Any other UE error               |
		+-------------+---------------------------------------------------------+----------------------------------+

		where:

		- UE: Uncorrectable Error
		- CE: Correctable Error

		For details, see section `3.3 Ampere (Vendor-Specific) Error Record Formats,
		Altra Family RAS Supplement`.
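
		A minimal Python sketch of how userspace might split one record into
		the fields of the offset table above. It assumes that the multi-byte
		fields are little-endian (the byte order is not stated here; check the
		Altra Family RAS Supplement) and that the file holds one 96-character
		hex string; anything else, including an empty read, raises
		``ValueError``. ``ErrorRecord``, ``decode_error_record`` and
		``decode_mesh_instance`` are hypothetical names, and the Mesh bit
		widths are inferred from the Instance formula in the error type table::

		  import struct
		  from typing import NamedTuple, Tuple


		  class ErrorRecord(NamedTuple):
		      error_type: int
		      subtype: int
		      instance: int
		      status: int
		      address: int
		      misc: Tuple[int, int, int, int]  # Error Misc 0..3


		  # Offset table as a struct layout: u8, u8, u16, u32, then five u64.
		  # ASSUMPTION: little-endian. "<" also disables padding, so the
		  # layout is exactly 1 + 1 + 2 + 4 + 5 * 8 = 48 bytes.
		  RECORD = struct.Struct("<BBHIQQQQQ")


		  def decode_error_record(hex_text: str) -> ErrorRecord:
		      """Decode the hex text read from one error_*_[ce|ue] file."""
		      raw = bytes.fromhex(hex_text.strip())
		      if len(raw) != RECORD.size:
		          raise ValueError(f"expected {RECORD.size} bytes, got {len(raw)}")
		      etype, sub, inst, status, addr, *misc = RECORD.unpack(raw)
		      return ErrorRecord(etype, sub, inst, status, addr, tuple(misc))


		  def decode_mesh_instance(instance: int) -> dict:
		      """Split a Mesh (type 2) Instance: X | (Y << 5) | NS << 11 | device << 12."""
		      return {
		          "x": instance & 0x1F,          # bits 0-4
		          "y": (instance >> 5) & 0x3F,   # bits 5-10
		          "ns": (instance >> 11) & 0x1,  # bit 11
		          "device": instance >> 12,      # bits 12-15, Home Node (Mem) only
		      }

		For example, ``decode_error_record(text)`` on a line read from
		``error_other_ue`` gives an ``ErrorRecord`` whose ``instance`` can be
		passed to ``decode_mesh_instance`` when ``error_type`` is 2.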

What:		/sys/bus/platform/devices/smpro-errmon.*/overflow_[core|mem|pcie|other]_[ce|ue]
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Returns the overflow status for the corresponding type of
		reported HW error:

		- 0 : No overflow
		- 1 : There is an overflow and the oldest HW errors have been dropped

		The details of each sysfs entry are as follows:

		+-------------+------------------------------------------------------------+---------------------------------------+
		| Overflow    | Sysfs entry                                                | Description                           |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| Core's CE   | /sys/bus/platform/devices/smpro-errmon.*/overflow_core_ce  | Core CE error overflow                |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| Core's UE   | /sys/bus/platform/devices/smpro-errmon.*/overflow_core_ue  | Core UE error overflow                |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| Memory's CE | /sys/bus/platform/devices/smpro-errmon.*/overflow_mem_ce   | Memory CE error overflow              |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| Memory's UE | /sys/bus/platform/devices/smpro-errmon.*/overflow_mem_ue   | Memory UE error overflow              |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| PCIe's CE   | /sys/bus/platform/devices/smpro-errmon.*/overflow_pcie_ce  | Any PCIe controller CE error overflow |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| PCIe's UE   | /sys/bus/platform/devices/smpro-errmon.*/overflow_pcie_ue  | Any PCIe controller UE error overflow |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| Other's CE  | /sys/bus/platform/devices/smpro-errmon.*/overflow_other_ce | Any other CE error overflow           |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| Other's UE  | /sys/bus/platform/devices/smpro-errmon.*/overflow_other_ue | Any other UE error overflow           |
		+-------------+------------------------------------------------------------+---------------------------------------+

		where:

		- UE: Uncorrectable Error
		- CE: Correctable Error

What:		/sys/bus/platform/devices/smpro-errmon.*/[error|warn]_[smpro|pmpro]
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Contains the internal firmware error/warning, printed in hex format.

		The details of each sysfs entry are as follows:

		+---------------+------------------------------------------------------+--------------------------+
		| Error         | Sysfs entry                                          | Description              |
		+---------------+------------------------------------------------------+--------------------------+
		| SMpro error   | /sys/bus/platform/devices/smpro-errmon.*/error_smpro | System has SMpro error   |
		+---------------+------------------------------------------------------+--------------------------+
		| SMpro warning | /sys/bus/platform/devices/smpro-errmon.*/warn_smpro  | System has SMpro warning |
		+---------------+------------------------------------------------------+--------------------------+
		| PMpro error   | /sys/bus/platform/devices/smpro-errmon.*/error_pmpro | System has PMpro error   |
		+---------------+------------------------------------------------------+--------------------------+
		| PMpro warning | /sys/bus/platform/devices/smpro-errmon.*/warn_pmpro  | System has PMpro warning |
		+---------------+------------------------------------------------------+--------------------------+

		For details, see section `5.10 RAS Internal Error Register Definitions,
		Altra Family SoC BMC Interface Specification`.

What:		/sys/bus/platform/devices/smpro-errmon.*/event_[vrd_warn_fault|vrd_hot|dimm_hot]
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Contains detailed information about VRD/DIMM warning/hot events,
		in hex format as below::

		  AAAA

		where:

		- ``AAAA``: The event details

		The details of each sysfs entry are as follows:

		+---------------+---------------------------------------------------------------+---------------------+
		| Event         | Sysfs entry                                                   | Description         |
		+---------------+---------------------------------------------------------------+---------------------+
		| VRD HOT       | /sys/bus/platform/devices/smpro-errmon.*/event_vrd_hot        | VRD Hot             |
		+---------------+---------------------------------------------------------------+---------------------+
		| VR Warn/Fault | /sys/bus/platform/devices/smpro-errmon.*/event_vrd_warn_fault | VR Warning or Fault |
		+---------------+---------------------------------------------------------------+---------------------+
		| DIMM HOT      | /sys/bus/platform/devices/smpro-errmon.*/event_dimm_hot       | DIMM Hot            |
		+---------------+---------------------------------------------------------------+---------------------+

		For more details, see section `5.7 GPI Status Registers,
		Altra Family SoC BMC Interface Specification`.
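
		A minimal Python sketch that reads the three event files of each
		smpro-errmon device and reports the non-zero ``AAAA`` values. It
		assumes each file holds a single hex value and treats an empty read
		as 0; the meaning of individual bits is defined in the GPI Status
		Registers section referenced above and is not decoded here.
		``EVENT_FILES``, ``parse_event`` and ``read_events`` are hypothetical
		names::

		  import glob
		  import os

		  EVENT_FILES = ("event_vrd_warn_fault", "event_vrd_hot", "event_dimm_hot")


		  def parse_event(text: str) -> int:
		      """Parse the AAAA hex value; an empty read is treated as 0 (assumption)."""
		      text = text.strip()
		      return int(text, 16) if text else 0


		  def read_events(device_dir: str) -> dict:
		      """Return {file name: value} for one smpro-errmon.* directory."""
		      values = {}
		      for name in EVENT_FILES:
		          with open(os.path.join(device_dir, name)) as f:
		              values[name] = parse_event(f.read())
		      return values


		  if __name__ == "__main__":
		      for dev in sorted(glob.glob("/sys/bus/platform/devices/smpro-errmon.*")):
		          for name, value in read_events(dev).items():
		              if value:
		                  print(f"{dev}: {name} = 0x{value:04x}")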

What:		/sys/bus/platform/devices/smpro-misc.*/boot_progress
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Contains the boot stage information in hex, in the format below::

		  AABBCCCCCCCC

		where:

		- ``AA`` : The boot stage

		  - 00: SMpro firmware booting
		  - 01: PMpro firmware booting
		  - 02: ATF BL1 firmware booting
		  - 03: DDR initialization
		  - 04: DDR training report status
		  - 05: ATF BL2 firmware booting
		  - 06: ATF BL31 firmware booting
		  - 07: ATF BL32 firmware booting
		  - 08: UEFI firmware booting
		  - 09: OS booting

		- ``BB`` : Boot status

		  - 00: Not started
		  - 01: Started
		  - 02: Completed without error
		  - 03: Failed

		- ``CCCCCCCC``: Boot status information defined for each boot stage

		For details, see section `5.11 Boot Stage Register Definitions`
		and section `6. Processor Boot Progress Codes, Altra Family SoC BMC
		Interface Specification`.


What:		/sys/bus/platform/devices/smpro-misc*/soc_power_limit
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RW) Contains the desired SoC power limit in watts.
		Writing to this file sets the desired SoC power limit (W).
		Reading from this file returns the current SoC power limit (W).
		The valid range is:

		- Minimum: 120 W
		- Maximum: Socket TDP power
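
		A minimal Python sketch of setting the limit from userspace with a
		range check against the bounds above. The socket TDP is not exposed
		by this attribute, so the sketch takes it as a caller-supplied
		parameter (an assumption); ``MIN_LIMIT_W``, ``check_power_limit`` and
		``set_soc_power_limit`` are hypothetical names::

		  import glob

		  MIN_LIMIT_W = 120  # minimum stated in this ABI entry


		  def check_power_limit(watts: int, tdp_watts: int) -> int:
		      """Return watts if it lies within [120, tdp_watts], else raise ValueError."""
		      if not MIN_LIMIT_W <= watts <= tdp_watts:
		          raise ValueError(f"{watts} W is outside [{MIN_LIMIT_W}, {tdp_watts}] W")
		      return watts


		  def set_soc_power_limit(path: str, watts: int, tdp_watts: int) -> int:
		      """Write a range-checked limit, then return the limit read back."""
		      value = check_power_limit(watts, tdp_watts)  # validate before opening
		      with open(path, "w") as f:
		          f.write(f"{value}\n")
		      with open(path) as f:
		          return int(f.read().strip())


		  if __name__ == "__main__":
		      # Read-only by default: print the current limit of each device found.
		      pattern = "/sys/bus/platform/devices/smpro-misc*/soc_power_limit"
		      for path in sorted(glob.glob(pattern)):
		          with open(path) as f:
		              print(f"{path}: {f.read().strip()} W")

		For example, ``set_soc_power_limit(path, 180, tdp_watts=...)`` would
		request 180 W, where ``tdp_watts`` must be the actual socket TDP. The
		userspace check only catches obviously invalid requests; how the
		kernel handles out-of-range writes is not stated here.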