| 043d5f4b | 17-Oct-2025 |
Erwin Tsaur <etsaur@nvidia.com> |
ARM CPER: Decode ErrorType as bit values
ErrorInformation.ErrorType needs to be decoded as bit values instead of as an integer.
Change-Id: Iee09eb6e62561620d0903fea1ae4d6ed35898445 Signed-off-by: Erwin Tsaur <etsaur@nvidia.com>
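A minimal sketch of what decoding a type field "as bit values" can look like, as opposed to switching on a single integer. The `demo_*` names and the bit positions below are assumptions for illustration and are not taken from the libcper source; check them against the UEFI specification revision in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed bit positions, for illustration only. */
#define DEMO_ARM_ERR_CACHE (1u << 1)
#define DEMO_ARM_ERR_TLB (1u << 2)
#define DEMO_ARM_ERR_BUS (1u << 3)
#define DEMO_ARM_ERR_MICRO_ARCH (1u << 4)

static const struct {
	uint8_t mask;
	const char *name;
} demo_types[] = {
	{ DEMO_ARM_ERR_CACHE, "Cache Error" },
	{ DEMO_ARM_ERR_TLB, "TLB Error" },
	{ DEMO_ARM_ERR_BUS, "Bus Error" },
	{ DEMO_ARM_ERR_MICRO_ARCH, "Micro-Architectural Error" },
};

/*
 * Writes a comma-separated list of every error type whose bit is set.
 * Returns the number of recognised bits; writes "Unknown Error" if none.
 */
int demo_describe_error_type(uint8_t type, char *out, size_t out_len)
{
	assert(out != NULL && out_len > 0);
	out[0] = '\0';
	int matched = 0;
	for (size_t i = 0; i < sizeof(demo_types) / sizeof(demo_types[0]); i++) {
		if ((type & demo_types[i].mask) == 0) {
			continue;
		}
		size_t used = strlen(out);
		snprintf(out + used, out_len - used, "%s%s",
			 matched ? ", " : "", demo_types[i].name);
		matched++;
	}
	if (matched == 0) {
		snprintf(out, out_len, "Unknown Error");
	}
	return matched;
}
```

With a bit-field decode, a value with two bits set reports both error types; an integer decode would map the same value to a single (and likely wrong) type.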
|
| f1c89124 | 14-Nov-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Handle Unknown ARM error type's description string
If an ARM processor error type is not populated, ensure its description string reflects this. Also handle the case where the errorInfo type is invalid.
Before: "message":"An ARM Processor Error occurred on CPU 0; Error Type(s): { at Virtual Addr=0xE
After: "message":"An ARM Processor Error occurred on CPU 0; Error Type(s): {Unknown Error at Virtual Addr=0xE
Change-Id: I46e23fcaaa5e3e424a30cd28fc0fe1d5725db5c4 Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
|
| 991ebf22 | 14-Nov-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Initialize string mem buffers to 0
This will prevent garbage from populating the description string if not handled or copied correctly.
Change-Id: Icfb57cdaa283c40fbf8fffdde7621e2a1ac19ba4 Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
|
| da75128c | 28-Jul-2025 |
Peter Benitez <pbenitez@nvidia.com> |
cper-section-memory: Fix validation dependency for Extended field bits
Fixed incorrect dependency between validation bits 16, 17, and 18 for the Extended field. Previously, cardSmbiosHandle (validation bit 16) and moduleSmbiosHandle (validation bit 17) were incorrectly made dependent on the Extended field validation (bit 18), but these are independent components.
Validation bit 18 controls the Extended field containing row address bits 16 and 17, while validation bits 16 and 17 control SMBIOS handle fields. These SMBIOS handle fields are independent components that should be validated separately from the Extended field's row address bits.
Tested: Added memory-validation-bits unit test
Change-Id: I9461c71bf0b782bda74ed24c95b63c080f913b19 Signed-off-by: Peter Benitez <pbenitez@nvidia.com>
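The independence of the three validation bits can be sketched as below. The bit numbers (16, 17, 18) come from the commit message; the macro, struct, and function names are hypothetical and do not reflect the actual libcper code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as stated in the commit message; names are hypothetical. */
#define DEMO_VALID_CARD_HANDLE (UINT64_C(1) << 16)
#define DEMO_VALID_MODULE_HANDLE (UINT64_C(1) << 17)
#define DEMO_VALID_EXTENDED (UINT64_C(1) << 18)

struct demo_memory_fields {
	bool has_card_handle;	/* cardSmbiosHandle */
	bool has_module_handle; /* moduleSmbiosHandle */
	bool has_extended;	/* Extended (row address bits 16 and 17) */
};

/* Each field is gated only by its own bit, never by a neighbour's. */
struct demo_memory_fields demo_check_validation(uint64_t validation_bits)
{
	struct demo_memory_fields f;
	f.has_card_handle = (validation_bits & DEMO_VALID_CARD_HANDLE) != 0;
	f.has_module_handle = (validation_bits & DEMO_VALID_MODULE_HANDLE) != 0;
	f.has_extended = (validation_bits & DEMO_VALID_EXTENDED) != 0;
	return f;
}
```

A record that sets only bit 16 should therefore still report its card SMBIOS handle, even though the Extended field is absent.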
|
| 9147b633 | 09-Jul-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Fix compilation issues with strncat
Use remaining buf size as "num" argument instead of bytes to copy.
Change-Id: Ie9a721fdcdde605bcdfa850b47da472db7412362 Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
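For context, the third argument of `strncat` bounds how many characters are appended, and a terminating NUL is always written after them, so the safe value is the space remaining in the destination minus one. A small sketch follows; `demo_append` is a hypothetical helper, not a function from this repository.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Appends src to the NUL-terminated string in dst without writing past
 * dst_size bytes. Input that does not fit is silently truncated.
 */
void demo_append(char *dst, size_t dst_size, const char *src)
{
	assert(dst != NULL && src != NULL && dst_size > 0);
	size_t used = strlen(dst);
	if (used + 1 >= dst_size) {
		return; /* no room left for anything but the terminator */
	}
	/* Remaining capacity, leaving one byte for the NUL strncat adds. */
	strncat(dst, src, dst_size - used - 1);
}
```

Passing the length of `src` as the bound instead gives no protection at all, which is the pattern compilers tend to warn about.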
|
| 89833fe4 | 08-Jul-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Use long unsigned for CPU socket type
Use PRIu64 to be compatible with both 32-bit and 64-bit builds. Fix the format descriptor for the CPU socket.
Change-Id: I1485b89d1d3cf896fc58d067469fd8fcff5bd776 Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
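As an illustration of why `PRIu64` helps: `uint64_t` is `unsigned long` on some targets and `unsigned long long` on others, so a hard-coded `%lu` or `%llu` is wrong on one of them. The function name below is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats a 64-bit socket number portably; returns snprintf's result. */
int demo_format_socket(uint64_t socket, char *out, size_t out_len)
{
	assert(out != NULL);
	/* PRIu64 expands to the correct length modifier for this target. */
	return snprintf(out, out_len, "CPU %" PRIu64, socket);
}
```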
|
| df2248dc | 08-Jul-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Use long unsigned for CPU socket type
Fix format descriptor for cpu socket.
Change-Id: I175a09e23393c51fdc466cc83b70ad08fe80822b Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com> |
| ad6c880f | 18-Jun-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Support to stringify CPER output
Initial commit to add a "message" property that provides a single line description of some important properties. This makes it easier to parse multiple CPERs in crowded logs.
For now, "message" is supported for nvidia, arm processor and memory types. The other types contain generic messages.
Example output:
```
"sections":[ { "message":"A Corrected CCPLEXSCF NVIDIA Error occurred on CPU 0", "Nvidia":{ "signature":"CCPLEXSC",
"sections":[ { "message":"An ARM Processor Error occurred on CPU 0; Error Type(s): {Cache Error at Virtual Addr=0x41D6AA12D528 Physical Addr=0x80003A198DDA10}", "ArmProcessor":{ "errorInfoNum":1,
"sections":[ { "message":"A Multi-bit ECC Memory Error occurred at address 0x0000000080000000 at node 0",
```
Change-Id: I395d0370ec60579b8f7fede825b45a3ced8ff18f Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
|
| eda19ff0 | 10-Jun-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Rename PCIe properties
For CSDL compatibility property names shouldn't begin with underscores or digits. Fix PCIe names.
Change-Id: I6a801e26550320f808a2cac2d91f8bd913a0eabf Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
|
| ffa7e17d | 29-May-2025 |
Ed Tanous <etanous@nvidia.com> |
Add schema for PCIe aerInfo
A few commits ago, we punted and didn't include a schema for aerinfo. This commit reenables the json schema, and corrects the config for the PCIe error fields.
There are certain objects that have zero properties. These are commented out temporarily to ensure that we don't have empty objects in the output, which would confuse users.
Change-Id: Id756cd90348cd77a1647c2781a6ce26e7d9a3485 Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| 55968b12 | 06-May-2025 |
Ed Tanous <ed@tanous.net> |
Nvidia add cmet-info
Add decoding of more specific Error codes.
Unit tests pass.
Change-Id: Ia0ca0dfdf550381da435b0fb9041b664784f7476 Signed-off-by: Ed Tanous <etanous@nvidia.com> |
| 9f260e5e | 24-Apr-2025 |
Ed Tanous <etanous@nvidia.com> |
Add decode of pcie device class
This commit makes two major changes. First, the device class tables were being decoded incorrectly as LSB-first. This changes them to MSB-first.
Second, once this value is correct, decode the tables to a string name, making it much easier for users to identify the specific device that caused a PCIe error.
Note: The majority of the mode lookup table was created by LLM. The entries have been manually reviewed for correctness and completeness.
Change-Id: I61bc813dbab39ca6116046e302dafca9fbbb0893 Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| 1bc852ab | 09-Apr-2025 |
Ed Tanous <etanous@nvidia.com> |
Allow decoding nvidia CPERs
There are cases where satmc might not be able to buffer the full CPER message. In those cases, it is advantageous if the decoder continues as far as it can.
This commit moves a range check in nvidia cpers such that if there is a buffer error, the CPER registers are still iterated until we hit the buffer error, then filled in with null after the buffer error. This makes it more clear what the issue is, and decodes more of the output.
Change-Id: Idf202860d4994e719c1b73b811e767ad94ee0cae Signed-off-by: Ed Tanous <etanous@nvidia.com>
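The "decode as far as possible, then fill with null" behaviour might look roughly like the sketch below. It assumes registers are packed 64-bit values; the `demo_reg` struct and `demo_decode_regs` function are hypothetical and do not mirror the actual NVIDIA section layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_reg {
	bool present;	/* false stands in for a JSON null */
	uint64_t value;
};

/*
 * Decodes up to `declared` registers from a possibly truncated buffer.
 * Registers that fall past the end of the buffer are marked absent
 * instead of aborting the whole decode. Returns the number decoded.
 */
size_t demo_decode_regs(const uint8_t *buf, size_t len, size_t declared,
			struct demo_reg *out)
{
	assert(buf != NULL && out != NULL);
	size_t decoded = 0;
	for (size_t i = 0; i < declared; i++) {
		size_t offset = i * sizeof(uint64_t);
		if (offset > len || len - offset < sizeof(uint64_t)) {
			out[i].present = false;
			out[i].value = 0;
			continue;
		}
		memcpy(&out[i].value, buf + offset, sizeof(uint64_t));
		out[i].present = true;
		decoded++;
	}
	return decoded;
}
```

The caller still sees one entry per declared register, so the position of the truncation is visible in the output.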
|
| 8870c074 | 28-Feb-2025 |
Erwin Tsaur <etsaur@nvidia.com> |
PCIe CPER Section Enhancement
This commit improves PCIe error reporting capabilities by:
- Adding support for PCIe capability version detection and parsing
- Expanding Advanced Error Reporting information extraction
The changes include:
- New capability_registers structure to decode PCIe capability registers
- Updated PCIe JSON Schema to match
- Support for PCIe 2.0+ extended registers when detected
- Improved error source identification and root error status reporting
- Fix typo for Advanced Error Reporting capabilit[i]es_control
- Updated generate/gen-section-pcie.c and pcie.json example
In the future we could:
- Implement TLP header log parsing with detailed descriptions
- Add support for Flit mode in PCIe 2.0+ devices
Tested:
- test/cper-tests passes
- cper-convert to-json|to-cper on pcie.cper|json in example path
- Tested "cper-convert to-json-section" using an extracted OS GHES PCIE CPER from error injection and compare against expected values
Note, schema validation is intentionally less restrictive than it could be for pcie advanced error reporting as it evolves.
Change-Id: Ifebb9d97d28a3a487a0aab53bf9e757afeedd64a Signed-off-by: Erwin Tsaur <etsaur@nvidia.com> Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| 8e423945 | 15-Mar-2025 |
Ed Tanous <etanous@nvidia.com> |
Make sections return known values
Some sections when being fuzzed might not return valid values. Make sure behavior is consistent.
Change-Id: I5c334acd2208872a48a8a9f887317a5066cd4422 Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| d6b62637 | 14-Mar-2025 |
Ed Tanous <etanous@nvidia.com> |
Fix some json schema validation bugs
There were a couple of places where we would add null objects when they were not allowed. Fix them.
Change-Id: I7c4c12ea1fa2913014e79603995267a9e560e288 Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| 50b966f7 | 11-Mar-2025 |
Ed Tanous <ed@tanous.net> |
Implement common logging function
When used as a library, it's desirable to be able to suppress logging, or pipe logging through a different path. This commit changes behavior such that logging is disabled by default, and introduces 2 new methods, cper_set_log_stdio and cper_set_log_custom.
These allow library integrators to specify their logging mode. In practice, this also allows fuzzing to run faster by not printing errors to the log.
Change-Id: I941476627bc9b8261ba5f6c0b2b2338fdf931dd2 Signed-off-by: Ed Tanous <etanous@nvidia.com>
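The general pattern described here (logging off by default, with a setter for stdio and a setter for a caller-supplied sink) can be sketched as follows. The `demo_*` names and the callback signature are assumptions for illustration; the real signatures of cper_set_log_stdio and cper_set_log_custom should be taken from the library headers.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sink type; the real library's callback may differ. */
typedef void (*demo_log_fn)(const char *fmt, va_list args);

/* NULL means logging is disabled, which is the default. */
static demo_log_fn demo_logger = NULL;

static void demo_log_to_stderr(const char *fmt, va_list args)
{
	vfprintf(stderr, fmt, args);
	fputc('\n', stderr);
}

void demo_set_log_stdio(void)
{
	demo_logger = demo_log_to_stderr;
}

void demo_set_log_custom(demo_log_fn fn)
{
	demo_logger = fn;
}

/* Returns 1 if the message was handed to a sink, 0 if logging is off. */
int demo_log(const char *fmt, ...)
{
	assert(fmt != NULL);
	if (demo_logger == NULL) {
		return 0;
	}
	va_list args;
	va_start(args, fmt);
	demo_logger(fmt, args);
	va_end(args);
	return 1;
}
```

Because the disabled path returns before any formatting happens, a fuzzer that never enables logging pays almost nothing per message.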
|
| 1a648569 | 10-Mar-2025 |
Ed Tanous <etanous@nvidia.com> |
Rework guid for fuzzing
There's a lot of places we do guid comparisons against lists of known guids. Break these out into helper functions to help not duplicate the fuzzing logic in a lot of places, and allow us to fuzz these places appropriately.
Change-Id: I76c79cd62ccc95feb2609d5098db546f740711e1 Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| c2ebdddb | 09-Mar-2025 |
Ed Tanous <ed@tanous.net> |
Fix up guid
GUIDs have some cases where they might not be representable as a string, due to an overflow. To handle this previously, the existing implementation just allocated extra space.
To ensure that we are always publishing correct guids, break this function down into an add_guid method that we can call anytime we add a guid to json. This function can use the appropriate guid string length, and if we go over, we can make sure that we don't publish the string at all, by handling the appropriate error codes.
Change-Id: I98239b7d5ba7567cea1b016579d7566e292b6e81 Signed-off-by: Ed Tanous <ed@tanous.net>
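One way to realise "use the exact GUID string length and publish nothing on overflow" is to format into a fixed-size buffer and check the length `snprintf` reports. This is a sketch under assumed names; `demo_guid` and `demo_guid_to_string` are hypothetical and the actual add_guid helper may be structured differently.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_GUID_STRING_LENGTH 36

struct demo_guid {
	uint32_t data1;
	uint16_t data2;
	uint16_t data3;
	uint8_t data4[8];
};

/*
 * Formats g as xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. Returns false and
 * leaves an empty string if the result is not exactly 36 characters,
 * so the caller can skip publishing it.
 */
bool demo_guid_to_string(const struct demo_guid *g,
			 char out[DEMO_GUID_STRING_LENGTH + 1])
{
	assert(g != NULL && out != NULL);
	int n = snprintf(out, DEMO_GUID_STRING_LENGTH + 1,
			 "%08" PRIx32 "-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x",
			 g->data1, (unsigned)g->data2, (unsigned)g->data3,
			 (unsigned)g->data4[0], (unsigned)g->data4[1],
			 (unsigned)g->data4[2], (unsigned)g->data4[3],
			 (unsigned)g->data4[4], (unsigned)g->data4[5],
			 (unsigned)g->data4[6], (unsigned)g->data4[7]);
	if (n != DEMO_GUID_STRING_LENGTH) {
		out[0] = '\0';
		return false;
	}
	return true;
}
```

A caller would add the string to the JSON output only when the function returns true.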
|
| 5e2164a0 | 09-Mar-2025 |
Ed Tanous <ed@tanous.net> |
More range checks
This is a second patch adding more range checks where appropriate.
Change-Id: Ie169efe8924153c9cc11e4472a1b07b8d04efb3b Signed-off-by: Ed Tanous <ed@tanous.net> |
| 12dbd4fd | 08-Mar-2025 |
Ed Tanous <etanous@nvidia.com> |
Fix range check bugs
This is a patch hunting for fuzzing failures and adding appropriate range checks.
Change-Id: Ieae02b7e461b9a6c5e25de6c663a768f7a0d5e10 Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| 5aedbb26 | 05-Mar-2025 |
Ed Tanous <etanous@nvidia.com> |
Make all section reads const
The way sections are read currently is unsafe in two ways: first, buffers are completely unchecked for length, and second, buffers are passed in as non-const void*.
Start fixing things by making the sections const.
Change-Id: I02e9ded525e9710b56589a47a9cc4f3583c216df Signed-off-by: Ed Tanous <etanous@nvidia.com>
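The two safety properties named above, a const input and an explicit length check, can be illustrated with a small bounded read helper. `demo_read_u32` is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copies a 32-bit value out of a read-only section buffer. The buffer is
 * const so the decoder cannot modify its input, and the read is refused
 * if it would run past the end of the buffer.
 */
bool demo_read_u32(const uint8_t *buf, size_t len, size_t offset,
		   uint32_t *out)
{
	assert(buf != NULL && out != NULL);
	/* Written to avoid overflow in offset + sizeof(*out). */
	if (offset > len || len - offset < sizeof(*out)) {
		return false;
	}
	memcpy(out, buf + offset, sizeof(*out));
	return true;
}
```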
|
| ae8f6d9a | 29-Jan-2025 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Remove validation bits
Discard invalid properties from json decode. JSON output should only contain valid properties. This saves time in preventing post processing of output for valid fields.
Ensure round trip validity with validation bits removed and required properties populated.
Fix bugs in json decode.
Overhaul unit tests to use valijson. Add tests with static examples to validate against schema. Use valijson and nlohmann for better schema validation over intrinsic libcper validation.
Example json output before: { "ValidationBits": { "LevelValid": false, "CorrectedValid": true }, "Level": 1, "Corrected": true }
After: { "Corrected": true }
Change-Id: I188bdc2827a57d938c22a431238fadfcdc939ab8 Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
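The before/after example can be mimicked with a small sketch that emits a property only when its validation bit is set. The bit assignments and `demo_*` names are made up for illustration, and plain string building stands in for whatever JSON library the project uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical validation bits for the two example properties. */
#define DEMO_LEVEL_VALID (1u << 0)
#define DEMO_CORRECTED_VALID (1u << 1)

/* Appends text to out if there is room; otherwise leaves out unchanged. */
static void demo_json_append(char *out, size_t out_len, const char *text)
{
	size_t used = strlen(out);
	if (used + 1 < out_len) {
		snprintf(out + used, out_len - used, "%s", text);
	}
}

/* Emits only the valid properties. Returns how many were written. */
int demo_emit_json(uint32_t validation_bits, unsigned level, bool corrected,
		   char *out, size_t out_len)
{
	assert(out != NULL && out_len > 0);
	char field[48];
	int fields = 0;
	out[0] = '\0';
	demo_json_append(out, out_len, "{");
	if (validation_bits & DEMO_LEVEL_VALID) {
		snprintf(field, sizeof(field), "\"Level\": %u", level);
		demo_json_append(out, out_len, field);
		fields++;
	}
	if (validation_bits & DEMO_CORRECTED_VALID) {
		snprintf(field, sizeof(field), "%s\"Corrected\": %s",
			 fields ? ", " : "", corrected ? "true" : "false");
		demo_json_append(out, out_len, field);
		fields++;
	}
	demo_json_append(out, out_len, "}");
	return fields;
}
```

Consumers then never need to cross-check a separate ValidationBits object before trusting a property.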
|
| cc367011 | 05-Dec-2024 |
Aushim Nagarkatti <anagarkatti@nvidia.com> |
Include hex decode for human readable fields
Some fields, such as deviceAddress, are better represented in hexadecimal than in decimal; this makes scripting and human use easier.
Change-Id: I7d0d100162bc681c3c6885ca01ed23020c3b5063 Signed-off-by: Aushim Nagarkatti <anagarkatti@nvidia.com>
|
| 3cebfc28 | 20-Nov-2024 |
Andrew Adriance <aadriance@nvidia.com> |
Add AER registers to PCIe decoding
Break out AER registers so aerinfo doesn't require manual interpretation
Change-Id: I5e626155270636420a1f6e7c473a2b15bfa7ecf0 Signed-off-by: Andrew Adriance <aadriance@nvidia.com>
|