| 7c410e29 | 11-Feb-2026 |
Daniel Osawa <dosawa@nvidia.com> |
nvidia-events: Hex encoding and schema alignment
Wrap eventInfo and context data under variant keys (cpu/gpu, opaque/type1-4/gpuMetadata/gpuLegacyXid/gpuRecommendedActions) to match the Redfish XML schema's wrapper pattern used by all other oneOf types in libcper.
Encode hardware identifiers and opaque fields as hex strings:
- eventHeader: type, subtype, linkId
- eventInfo.cpu: Ecid1-4, InstanceBase
- eventInfo.gpu: Pdi (0x hex), EventOriginator (raw/value dict)
- data.type1/2/3/4: key and value fields
- data.gpuMetadata: configuration, pdi (MAC-style XX:XX:...), architectureId (decomposed raw + architecture name), pciInfo fields (class/subclass/rev/vendorId/deviceId/etc)
- data.gpuRecommendedActions: flags
- data.opaque: flattened to direct hex string
Fix GPU context JSON schemas (gpuMetadata, gpuLegacyXid, gpuRecommendedActions) to match their C struct definitions.
Change get_value_hex_16/32/64 signatures to void* to handle packed struct member addresses safely with clang.
Add GPU init and UCE ECC example CPER pairs with unit tests.
Change-Id: I30602e6a34c18bf511f5eefa5cb1dd7707b2c3b0 Signed-off-by: Daniel Osawa <dosawa@nvidia.com>
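The void* change above can be illustrated with a minimal sketch. It assumes a hypothetical packed struct and a hypothetical helper named format_hex_32; neither is taken from libcper, and the real get_value_hex_32 signature is not shown in this log. The point is only that a helper taking an untyped pointer and loading through memcpy never dereferences a misaligned uint32_t pointer.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical packed layout (GCC/clang attribute): link_id sits at
 * offset 1, so a uint32_t pointer to it would be misaligned. */
struct __attribute__((packed)) example_event_header {
	uint8_t type;
	uint32_t link_id;
};

/* Hypothetical helper, not the libcper API: formats a 32-bit field as
 * "0x" plus eight uppercase hex digits. Taking const void* lets callers
 * pass the address of a packed member without a typed-pointer conversion. */
static int format_hex_32(const void *field, char *out, size_t out_len)
{
	uint32_t value;

	memcpy(&value, field, sizeof(value)); /* alignment-safe load */
	return snprintf(out, out_len, "0x%08" PRIX32, value);
}
```

A caller would pass `&hdr.link_id` directly; the conversion to `const void *` does not promise any alignment, which is what keeps the compiler's packed-member diagnostics quiet.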
|
| 529d0817 | 06-Feb-2026 |
Daniel Osawa <dosawa@nvidia.com> |
nvidia-events: Decode Architecture field in CPU event info
Decode the 32-bit Architecture field into its component bit fields:
- hidFam (bits 3:0): Hardware ID family
- majorRev (bits 7:4): Major revision
- chipId (bits 15:8): Chip ID
- minorRev (bits 19:16): Minor revision
- preSiPlatform (bits 24:20): Silicon/PreSilicon indicator
- einjTag (bit 31): Error injection tag
Change-Id: I7d3030bf5c9c064a3ee0fdbcac65b962ea4a12c6 Signed-off-by: Daniel Osawa <dosawa@nvidia.com>
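The bit layout listed in this commit message can be sketched as a small decoder. The struct and function names below are hypothetical and chosen for illustration; only the bit ranges come from the message.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
struct example_arch_fields {
	uint8_t hid_fam;         /* bits 3:0   */
	uint8_t major_rev;       /* bits 7:4   */
	uint8_t chip_id;         /* bits 15:8  */
	uint8_t minor_rev;       /* bits 19:16 */
	uint8_t pre_si_platform; /* bits 24:20 */
	uint8_t einj_tag;        /* bit 31     */
};

/* Extract the inclusive bit range [hi:lo] from a 32-bit value. */
static uint32_t extract_bits(uint32_t value, unsigned hi, unsigned lo)
{
	unsigned width = hi - lo + 1;
	uint32_t mask = (width >= 32) ? UINT32_MAX
				      : ((UINT32_C(1) << width) - 1);

	return (value >> lo) & mask;
}

static struct example_arch_fields decode_architecture(uint32_t arch)
{
	struct example_arch_fields f = {
		.hid_fam = (uint8_t)extract_bits(arch, 3, 0),
		.major_rev = (uint8_t)extract_bits(arch, 7, 4),
		.chip_id = (uint8_t)extract_bits(arch, 15, 8),
		.minor_rev = (uint8_t)extract_bits(arch, 19, 16),
		.pre_si_platform = (uint8_t)extract_bits(arch, 24, 20),
		.einj_tag = (uint8_t)extract_bits(arch, 31, 31),
	};

	return f;
}
```

Bits 30:25 are not named in the message, so the sketch ignores them.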
|
| 5beecea6 | 06-Feb-2026 |
Daniel Osawa <dosawa@nvidia.com> |
nvidia-events: Fix EventContextCount and remove compat
Fix EventContextCount being written as 0 during JSON-to-binary conversion by moving context counting before the header fwrite().
Also remove backward compatibility for eventContexts as object format - only array format is now supported.
Change-Id: Ied643264148e4faeff6d73b9c3f6519511f11992 Signed-off-by: Daniel Osawa <dosawa@nvidia.com>
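The ordering bug described above follows a common pattern: a header that carries a count must be filled in before it is written. A minimal sketch, assuming a hypothetical two-byte header and a flat array of contexts (neither matches the real NVIDIA event layout):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical header layout, for illustration only. */
struct example_event_header {
	uint8_t version;
	uint8_t context_count;
};

/* Writes the header followed by the contexts. Returns 0 on success. */
static int write_event(FILE *out, const uint32_t *contexts, size_t count)
{
	struct example_event_header hdr = { .version = 1, .context_count = 0 };

	if (count > UINT8_MAX)
		return -1;

	/* Count the contexts BEFORE the header goes out. Writing the header
	 * first and counting afterwards would leave context_count as 0 on disk. */
	hdr.context_count = (uint8_t)count;

	if (fwrite(&hdr, sizeof(hdr), 1, out) != 1)
		return -1;
	if (count > 0 && fwrite(contexts, sizeof(contexts[0]), count, out) != count)
		return -1;
	return 0;
}
```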
|
| 3f810e5b | 04-Feb-2026 |
Ed Tanous <ed@tanous.net> |
Implement add_base64 helper
The majority of our code calling add_string_len is doing so to add a base64 encoded string. Break that out into a helper, and refactor.
Change-Id: I0d49b4636d11b7c307a20788740d1c593c06758d Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| 6c5d2f34 | 02-Feb-2026 |
Ed Tanous <ed@tanous.net> |
Consistently use helpers
add_int and the new add_uint helper functions reduce the amount of code we write, and make adding things to json more "normal". Additionally, add add_string and add_string_len, which add a null-terminated and a non-null-terminated string respectively.
This commit was done largely using ast-grep.
Change-Id: Id18ddc2405e95b9b3c8aeb832b5903eb90e1267c Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| 51c18132 | 26-Nov-2025 |
Daniel Osawa <dosawa@nvidia.com> |
Add NVIDIA Event CPER section support
Add parsing and generation for NVIDIA Event error sections, including:
- CPU and GPU device-specific event info
- Multiple context data formats (key-value pairs, opaque, GPU metadata, legacy XID, recommended actions)
- JSON schema specifications
- Example files and tests
Change-Id: Ibf66e2e4263014c2157958acf2f6158361fc6866 Signed-off-by: Daniel Osawa <dosawa@nvidia.com> Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| e1cba52d | 18-Sep-2025 |
Prachotan Bathi <prachotan.bathi@arm.com> |
cper-section-arm-ras: Support Arm RAS System Architecture node CPER
The DEN0085 - Arm ACPI for the Armv8-A RAS Extension and RAS System Architecture v2.0 specification, section 4, defines additional standard CPER records for Arm RAS architecture. https://developer.arm.com/documentation/den0085/latest/
Added section definitions and a generator that produces an example CPER with one descriptor. Generate using:
./cper-generate --out cper.generated.dump --sections arm-ras-node
Signed-off-by: Prachotan Bathi <prachotan.bathi@arm.com> Signed-off-by: Ed Tanous <etanous@nvidia.com> Change-Id: Ic7fa68a6c584c537a3dc2c4b17795dd7ba3b3f8c
|
| 84752233 | 21-Jan-2026 |
Ed Tanous <etanous@nvidia.com> |
Use size of buffer for copy
When using the untrusted functions, use the length of the buffer, rather than strlen to determine the correct size of the char array.
While we're there, change the generation function to explicitly load all bytes of the signature with zeros.
Tested: Unit tests pass.
Change-Id: I588c7f03dec0f749dad76e776ae818b31351d45c Signed-off-by: Ed Tanous <etanous@nvidia.com>
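Both points in this commit concern fixed-size character fields that may not be nul-terminated. The sketch below uses hypothetical helper names (they are not libcper functions): one zero-fills a field before storing a name, and the other bounds the read by the field size rather than by strlen.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical generator-side helper: zero every byte of the field first,
 * then copy at most field_size bytes of the name into it. */
static void set_fixed_field(char *field, size_t field_size, const char *name)
{
	size_t len = strlen(name);

	if (len > field_size)
		len = field_size;
	memset(field, 0, field_size);
	memcpy(field, name, len);
}

/* Hypothetical reader-side helper: the field may lack a terminator, so the
 * scan is limited to field_size instead of trusting strlen(field). */
static void copy_fixed_field(char *dst, size_t dst_size, const char *field,
			     size_t field_size)
{
	const char *nul;
	size_t len;

	if (dst_size == 0)
		return;
	nul = memchr(field, '\0', field_size);
	len = nul ? (size_t)(nul - field) : field_size;
	if (len > dst_size - 1)
		len = dst_size - 1;
	memcpy(dst, field, len);
	dst[len] = '\0';
}
```

Calling strlen on a field whose every byte is used would read past the end of the array, which is the hazard the bounded scan avoids.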
|
| 9f689326 | 12-Jan-2026 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Enable fuzzing path for nvidia-cmet
Randomize the Signature property to point to any of the pre-existing types. This will ensure Fuzzing catches errors related to this code path.
Change-Id: Ic2ae0c7d57722fa7b573afac11fe3e67c91b220e Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
|
| 680875cb | 13-Jan-2026 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
CMET size check off by 1
The size calculation was off by 1 in the old code. We need to account for EFI_NVIDIA_REGISTER_DATA + EFI_NVIDIA_ERROR_DATA when i==0.
Change-Id: Ib5d29f90cd52a199dd2fa391bac5d4bb634083a6 Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
|
| f125142c | 09-Jan-2026 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Fix memory leaks in to_ir conversions
Memory was not being freed properly on bail conditions in the section to_ir functions. This frees description strings, frees json objects and assigns description pointers to null.
Change-Id: I69c1efdeaeb4796033e7f42b22569c62ced77c42 Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
|
| 043d5f4b | 17-Oct-2025 |
Erwin Tsaur <etsaur@nvidia.com> |
ARM CPER: Decode ErrorType as bit values
ErrorInformation.ErrorType needs to be decoded as bit values instead of as an integer.
Change-Id: Iee09eb6e62561620d0903fea1ae4d6ed35898445 Signed-off-by: Erwin Tsaur <etsaur@nvidia.com>
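Decoding a field as bit values rather than as an integer means testing each bit independently, so several error types can be reported at once. A minimal sketch follows; the bit positions and names in the table are assumptions made for illustration and are not taken from this commit or from libcper's tables.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit layout, for illustration only. */
static const struct {
	uint8_t mask;
	const char *name;
} example_error_types[] = {
	{ 1u << 1, "Cache Error" },
	{ 1u << 2, "TLB Error" },
	{ 1u << 3, "Bus Error" },
	{ 1u << 4, "Micro-Architectural Error" },
};

/* Writes a comma-separated list of every error type whose bit is set.
 * Leaves an empty string when no known bit is set. */
static void describe_error_type(uint8_t type, char *out, size_t out_size)
{
	size_t count = sizeof(example_error_types) / sizeof(example_error_types[0]);
	size_t used = 0;

	if (out_size == 0)
		return;
	out[0] = '\0';
	for (size_t i = 0; i < count; i++) {
		int n;

		if (!(type & example_error_types[i].mask))
			continue;
		n = snprintf(out + used, out_size - used, "%s%s",
			     used ? ", " : "", example_error_types[i].name);
		if (n < 0 || (size_t)n >= out_size - used)
			break; /* output truncated; stop appending */
		used += (size_t)n;
	}
}
```

An integer decode would map the value 0x06 to a single (probably wrong) entry, whereas the bitwise decode reports both set bits.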
|
| f1c89124 | 14-Nov-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Handle Unknown ARM error type's description string
If an arm processor error type is not populated, ensure its description string reflects this. Also handle case where errorInfo type is invalid.
Before: "message":"An ARM Processor Error occurred on CPU 0; Error Type(s): { at Virtual Addr=0xE
After: "message":"An ARM Processor Error occurred on CPU 0; Error Type(s): {Unknown Error at Virtual Addr=0xE
Change-Id: I46e23fcaaa5e3e424a30cd28fc0fe1d5725db5c4 Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
|
| 991ebf22 | 14-Nov-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Initialize string mem buffers to 0
This will prevent garbage from populating the description string if not handled or copied correctly.
Change-Id: Icfb57cdaa283c40fbf8fffdde7621e2a1ac19ba4 Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
|
| da75128c | 28-Jul-2025 |
Peter Benitez <pbenitez@nvidia.com> |
cper-section-memory: Fix validation dependency for Extended field bits
Fixed incorrect dependency between validation bits 16, 17, and 18 for the Extended field. Previously, cardSmbiosHandle (validation bit 16) and moduleSmbiosHandle (validation bit 17) were incorrectly made dependent on the Extended field validation (bit 18), but these are independent components.
Validation bit 18 controls the Extended field containing row address bits 16 and 17, while validation bits 16 and 17 control SMBIOS handle fields. These SMBIOS handle fields are independent components that should be validated separately from the Extended field's row address bits.
Tested: Added memory-validation-bits unit test
Change-Id: I9461c71bf0b782bda74ed24c95b63c080f913b19 Signed-off-by: Peter Benitez <pbenitez@nvidia.com>
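The independence described above can be sketched as three separate bit tests. The macro and struct names are hypothetical; only the bit numbers (16, 17, 18) and their meanings come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the validation bits named in the message. */
#define EXAMPLE_VALID_CARD_HANDLE   (UINT64_C(1) << 16)
#define EXAMPLE_VALID_MODULE_HANDLE (UINT64_C(1) << 17)
#define EXAMPLE_VALID_EXTENDED      (UINT64_C(1) << 18)

struct example_memory_fields {
	bool card_smbios_handle;   /* governed by bit 16 only */
	bool module_smbios_handle; /* governed by bit 17 only */
	bool extended_row_bits;    /* governed by bit 18 only */
};

/* Each field is checked against its own bit; none is nested under
 * the Extended check. */
static struct example_memory_fields fields_to_emit(uint64_t validation_bits)
{
	struct example_memory_fields f = {
		.card_smbios_handle =
			(validation_bits & EXAMPLE_VALID_CARD_HANDLE) != 0,
		.module_smbios_handle =
			(validation_bits & EXAMPLE_VALID_MODULE_HANDLE) != 0,
		.extended_row_bits =
			(validation_bits & EXAMPLE_VALID_EXTENDED) != 0,
	};

	return f;
}
```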
|
| 9147b633 | 09-Jul-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Fix compilation issues with strncat
Use remaining buf size as "num" argument instead of bytes to copy.
Change-Id: Ie9a721fdcdde605bcdfa850b47da472db7412362 Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
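The third argument of strncat is the maximum number of characters to append, and the terminator is written in addition to that, so the safe value is the space left in the destination minus one. A minimal sketch with a hypothetical wrapper name:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: appends src to buf without writing past buf_size.
 * buf must already hold a nul-terminated string. */
static void append_bounded(char *buf, size_t buf_size, const char *src)
{
	size_t used = strlen(buf);

	if (used + 1 >= buf_size)
		return; /* no room left for even one more character */

	/* Remaining space, minus one byte for the terminator strncat adds. */
	strncat(buf, src, buf_size - used - 1);
}
```

Passing the number of bytes to copy (for example strlen(src)) as the bound gives no protection, because it says nothing about how much room the destination has.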
|
| 89833fe4 | 08-Jul-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Use long unsigned for CPU socket type
Use PRIu64 so the format string is compatible with both 32-bit and 64-bit builds. Fix the format descriptor for the CPU socket.
Change-Id: I1485b89d1d3cf896fc58d067469fd8fcff5bd776 Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
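The PRIu64 macro from <inttypes.h> expands to the correct printf conversion for uint64_t on the target, which "%lu" does not guarantee where long is 32 bits wide. A minimal sketch with a hypothetical function name and message text:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical formatter: prints a 64-bit socket number portably. */
static int format_socket(char *out, size_t out_len, uint64_t socket)
{
	return snprintf(out, out_len, "CPU %" PRIu64, socket);
}
```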
|
| df2248dc | 08-Jul-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Use long unsigned for CPU socket type
Fix format descriptor for cpu socket.
Change-Id: I175a09e23393c51fdc466cc83b70ad08fe80822b Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com> |
| ad6c880f | 18-Jun-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Support to stringify CPER output
Initial commit to add a "message" property that provides a single line description of some important properties. This makes it easier to parse multiple CPERs in crowded logs.
For now, "message" is supported for nvidia, arm processor and memory types. The other types contain generic messages.
Example output:
```
"sections":[
  {
    "message":"A Corrected CCPLEXSCF NVIDIA Error occurred on CPU 0",
    "Nvidia":{
      "signature":"CCPLEXSC",

"sections":[
  {
    "message":"An ARM Processor Error occurred on CPU 0; Error Type(s): {Cache Error at Virtual Addr=0x41D6AA12D528 Physical Addr=0x80003A198DDA10}",
    "ArmProcessor":{
      "errorInfoNum":1,

"sections":[
  {
    "message":"A Multi-bit ECC Memory Error occurred at address 0x0000000080000000 at node 0",
```
Change-Id: I395d0370ec60579b8f7fede825b45a3ced8ff18f Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
|
| eda19ff0 | 10-Jun-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Rename PCIe properties
For CSDL compatibility property names shouldn't begin with underscores or digits. Fix PCIe names.
Change-Id: I6a801e26550320f808a2cac2d91f8bd913a0eabf Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
|
| ffa7e17d | 29-May-2025 |
Ed Tanous <etanous@nvidia.com> |
Add schema for PCIe aerInfo
A few commits ago, we punted and didn't include a schema for aerinfo. This commit reenables the json schema, and corrects the config for the PCIe error fields.
There are certain objects that have zero properties. These are commented out temporarily to ensure that we don't have empty objects in the output, which would confuse users.
Change-Id: Id756cd90348cd77a1647c2781a6ce26e7d9a3485 Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| 55968b12 | 06-May-2025 |
Ed Tanous <ed@tanous.net> |
Nvidia add cmet-info
Add decoding of more specific Error codes.
Unit tests pass.
Change-Id: Ia0ca0dfdf550381da435b0fb9041b664784f7476 Signed-off-by: Ed Tanous <etanous@nvidia.com> |
| 9f260e5e | 24-Apr-2025 |
Ed Tanous <etanous@nvidia.com> |
Add decode of pcie device class
This commit makes two major changes. First, the device class tables were being decoded incorrectly as LSB-first. This changes them to MSB-first.
Second, once this value is correct, decode the tables to a string name, making it much easier for users to identify a specific device that caused a PCIe error.
Note: The majority of the mode lookup table was created by LLM. The entries have been manually reviewed for correctness and completeness.
Change-Id: I61bc813dbab39ca6116046e302dafca9fbbb0893 Signed-off-by: Ed Tanous <etanous@nvidia.com>
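The two steps in this commit (read the class code most-significant byte first, then map it to a name) can be sketched as below. The sketch assumes the class code is held as a 24-bit value with the base class in the top byte, and the table is a small illustrative excerpt, not the lookup table added by the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative excerpt of PCI base class codes. */
static const struct {
	uint8_t base_class;
	const char *name;
} example_classes[] = {
	{ 0x01, "Mass Storage Controller" },
	{ 0x02, "Network Controller" },
	{ 0x03, "Display Controller" },
	{ 0x06, "Bridge" },
};

/* Assumes a 24-bit class code: base class in bits 23:16, subclass in
 * bits 15:8, programming interface in bits 7:0. */
static const char *base_class_name(uint32_t class_code)
{
	uint8_t base_class = (uint8_t)((class_code >> 16) & 0xFF);
	size_t count = sizeof(example_classes) / sizeof(example_classes[0]);

	for (size_t i = 0; i < count; i++) {
		if (example_classes[i].base_class == base_class)
			return example_classes[i].name;
	}
	return "Unknown";
}
```

Reading the least significant byte as the base class would, for a class code such as 0x010802, look up 0x02 and report the wrong device type.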
|
| 1bc852ab | 09-Apr-2025 |
Ed Tanous <etanous@nvidia.com> |
Allow decoding nvidia CPERs
There are cases where satmc might not be able to buffer the full CPER message. In those cases, it is advantageous if the decoder continues as far as it can.
This commit moves a range check in nvidia cpers such that if there is a buffer error, the CPER registers are still iterated until we hit the buffer error, then filled in with null after the buffer error. This makes it more clear what the issue is, and decodes more of the output.
Change-Id: Idf202860d4994e719c1b73b811e767ad94ee0cae Signed-off-by: Ed Tanous <etanous@nvidia.com>
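The behaviour described above, decoding every register that fits and marking the rest as missing, can be sketched as a per-entry range check inside the loop instead of a single check up front. The 16-byte address/value layout and all names here are assumptions for illustration, not the real NVIDIA section format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_REG_SIZE 16 /* assumed: 8-byte address + 8-byte value */

struct example_register {
	uint64_t address;
	uint64_t value;
	bool present; /* false stands in for a JSON null in the output */
};

/* Decodes up to `expected` registers from buf. Entries that would read
 * past buf_len are marked absent rather than aborting the whole decode.
 * Returns the number of registers actually decoded. */
static size_t decode_registers(const uint8_t *buf, size_t buf_len,
			       size_t expected, struct example_register *out)
{
	size_t decoded = 0;

	for (size_t i = 0; i < expected; i++) {
		size_t offset = i * EXAMPLE_REG_SIZE;

		if (buf_len < EXAMPLE_REG_SIZE ||
		    offset > buf_len - EXAMPLE_REG_SIZE) {
			out[i] = (struct example_register){ 0 };
			continue;
		}
		memcpy(&out[i].address, buf + offset, sizeof(uint64_t));
		memcpy(&out[i].value, buf + offset + 8, sizeof(uint64_t));
		out[i].present = true;
		decoded++;
	}
	return decoded;
}
```

With a 40-byte buffer and three expected registers, the first two decode normally and the third is reported as absent, which shows the reader where the truncation happened.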
|
| 8870c074 | 28-Feb-2025 |
Erwin Tsaur <etsaur@nvidia.com> |
PCIe CPER Section Enhancement
This commit improves PCIe error reporting capabilities by:
- Adding support for PCIe capability version detection and parsing
- Expanding Advanced Error Reporting information extraction
The changes include:
- New capability_registers structure to decode PCIe capability registers
- Updated PCIe JSON Schema to match
- Support for PCIe 2.0+ extended registers when detected
- Improved error source identification and root error status reporting
- Fix typo for Advanced Error Reporting capabilit[i]es_control
- Updated generate/gen-section-pcie.c and pcie.json example
In the future we could:
- Implement TLP header log parsing with detailed descriptions
- Add support for Flit mode in PCIe 2.0+ devices
Tested:
- test/cper-tests passes
- cper-convert to-json|to-cper on pcie.cper|json in example path
- Tested "cper-convert to-json-section" using an extracted OS GHES PCIE CPER from error injection and compare against expected values
Note, schema validation is intentionally less restrictive than it could be for pcie advanced error reporting as it evolves.
Change-Id: Ifebb9d97d28a3a487a0aab53bf9e757afeedd64a Signed-off-by: Erwin Tsaur <etsaur@nvidia.com> Signed-off-by: Ed Tanous <etanous@nvidia.com>
|