adda0540 | 06-Apr-2023 |
Zane Shelley <zshelle@us.ibm.com> |
Clarify definition of chip checkstop
Previously, the ATTN_TYPE_CHECKSTOP associated with a signature was synonymous with a system checkstop event. This is certainly true if a processor chip checkstops. However, it is not true if a connected OCMB chip checkstops, because in some cases the system can recover. To differentiate an OCMB chip checkstop from a system checkstop, OCMB chip checkstops were previously reported as unit checkstops. With the addition of Odyssey OCMBs, which have the ability to report both chip and unit checkstops, we decided to fix the confusion and disassociate a chip checkstop from a system checkstop. Now the signatures properly report the chip attention type, and the signature filtering code has been modified to treat only chip checkstops from processor chips as system checkstop attentions.
Signed-off-by: Zane Shelley <zshelle@us.ibm.com> Change-Id: Iff9822ff8c9c0ae1afe84353010e94759dbdf49d
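The rule described in this message can be sketched as follows. This is a minimal illustration only: the enum names and the isSystemCheckstop() helper are hypothetical and are not taken from the analyzer or libhei sources.

```cpp
// Hypothetical types for illustration; not the real analyzer or libhei API.
enum class ChipType
{
    Processor,
    Ocmb
};

enum class AttnType
{
    ChipCheckstop,
    UnitCheckstop,
    Recoverable
};

// Returns true when a signature should be treated as a system checkstop.
// Under the rule in the commit message, only a chip checkstop reported by a
// processor chip qualifies; an OCMB chip checkstop does not.
constexpr bool isSystemCheckstop(ChipType chip, AttnType attn)
{
    return chip == ChipType::Processor && attn == AttnType::ChipCheckstop;
}
```

With this shape, an OCMB signature can carry its true chip checkstop attention type without being mistaken for a system checkstop.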
|
93b001c5 | 24-Mar-2023 |
Zane Shelley <zshelle@us.ibm.com> |
Remove support for deprecated RAS data version 1
Signed-off-by: Zane Shelley <zshelle@us.ibm.com> Change-Id: I91572c057169e3416bc543bad5ccab1a505d1485 |
100c7a26 | 03-Mar-2023 |
Caleb Palmer <cnpalmer@us.ibm.com> |
Updates to Odyssey RAS data for TP and MEM local FIRs
Signed-off-by: Caleb Palmer <cnpalmer@us.ibm.com> Change-Id: Ia916f92fe7afc7278c6f8efc2d46ad5223064eb5 |
51f8202c | 22-Feb-2023 |
Caleb Palmer <cnpalmer@us.ibm.com> |
Update DSTL_FIR callouts in the event of failure to analyze an OCMB
Change-Id: I40c17703ad032aa98f02b43d9cb321b7fc86fea3 Signed-off-by: Caleb Palmer <cnpalmer@us.ibm.com> |
d5fa9584 | 27-Feb-2023 |
Caleb Palmer <cnpalmer@us.ibm.com> |
Add initial RAS data files for Odyssey
Change-Id: I70a596dd364057a4ce555546c36cc8369764b785 Signed-off-by: Caleb Palmer <cnpalmer@us.ibm.com> |
31a8753a | 22-Feb-2023 |
Caleb Palmer <cnpalmer@us.ibm.com> |
Update Explorer RAS data json with sorted keys
Change-Id: I6ebc92345d2cb857abcbe5b73ca73e3c7a3a5191 Signed-off-by: Caleb Palmer <cnpalmer@us.ibm.com> |
5836f4a6 | 09-Feb-2023 |
Zane Shelley <zshelle@us.ibm.com> |
use nlohmann json exceptions instead of std
nlohmann::json::at() will throw nlohmann::json::out_of_range instead of std::out_of_range. As a result, handlers written for std::out_of_range missed these exceptions in the signature filtering code.
Change-Id: I573e1ed4455bbda4f05c100edd315eb0ccdc9c3f Signed-off-by: Zane Shelley <zshelle@us.ibm.com>
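The failure mode can be reproduced without the JSON library. The sketch below uses a stand-in exception type, LibOutOfRange, which is hypothetical and only assumes what the message implies: the library's exception derives from std::exception but not from std::out_of_range, so a std::out_of_range handler does not match it.

```cpp
#include <exception>
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for a library-specific exception (hypothetical). It derives from
// std::exception, but NOT from std::out_of_range.
struct LibOutOfRange : std::exception
{
    const char* what() const noexcept override
    {
        return "key not found";
    }
};

// Hypothetical lookup that throws the library-specific type on a missing key.
int lookup(const std::map<std::string, int>& data, const std::string& key)
{
    auto it = data.find(key);
    if (it == data.end())
    {
        throw LibOutOfRange{};
    }
    return it->second;
}

// Reports which handler, if any, caught the exception from lookup().
std::string whichHandler(const std::map<std::string, int>& data,
                         const std::string& key)
{
    try
    {
        lookup(data, key);
        return "none";
    }
    catch (const std::out_of_range&)
    {
        // Never reached for LibOutOfRange: the types are unrelated.
        return "std::out_of_range";
    }
    catch (const LibOutOfRange&)
    {
        return "LibOutOfRange";
    }
}
```

Without the second handler, the exception would propagate out of the function, which is the kind of missed exception the message describes.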
|
02d59af5 | 07-Feb-2023 |
Zane Shelley <zshelle@us.ibm.com> |
Exception handling with flags in ras-data-parser
It is possible that a signature may not be defined in the RAS data. In that case, trying to access the flags for the undefined signature would throw an exception. This is not the desired behavior. Instead, we'll catch the exception and move on as if the flag is not defined.
Change-Id: I4d3cff52ce5f32074fca9863f60b84726dd590aa Signed-off-by: Zane Shelley <zshelle@us.ibm.com>
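A minimal sketch of the catch-and-continue behavior, assuming a plain std::map in place of the parsed RAS data. The FlagTable alias and isFlagSet() are hypothetical names; the real parser works on JSON and catches the JSON library's exception type instead.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the RAS data: signature name -> set of flags.
using FlagTable = std::map<std::string, std::set<std::string>>;

// Returns true only if the signature is defined and carries the flag.
// An undefined signature is treated as "flag not defined", not as an error.
bool isFlagSet(const FlagTable& table, const std::string& signature,
               const std::string& flag)
{
    try
    {
        // std::map::at() throws std::out_of_range for a missing key.
        return table.at(signature).count(flag) != 0;
    }
    catch (const std::out_of_range&)
    {
        return false;
    }
}
```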
|
ecde53fc | 13-Dec-2022 |
Caleb Palmer <cnpalmer@us.ibm.com> |
Adjust TI root cause filter to skip INT_CQ_FIR[47:50]
These bits are recoverable errors and should not be blamed as the root cause of a TI.
Change-Id: I666eadbde0c2a0935fa47206f337112bc44a100f Signed-off-by: Caleb Palmer <cnpalmer@us.ibm.com>
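The skip rule might look like the sketch below. The Signature struct and both function names are hypothetical, and the code assumes that [47:50] denotes the inclusive bit range 47 through 50.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical, simplified signature: register name plus bit position.
struct Signature
{
    std::string reg;
    unsigned bit;
};

// True for the recoverable bits that should not be blamed for a TI.
// Assumes INT_CQ_FIR[47:50] means bits 47 through 50 inclusive.
bool isSkippedForTi(const Signature& s)
{
    return s.reg == "INT_CQ_FIR" && s.bit >= 47 && s.bit <= 50;
}

// Returns the active signatures that remain root cause candidates for a TI.
std::vector<Signature>
    tiRootCauseCandidates(const std::vector<Signature>& active)
{
    std::vector<Signature> out;
    std::copy_if(active.begin(), active.end(), std::back_inserter(out),
                 [](const Signature& s) { return !isSkippedForTi(s); });
    return out;
}
```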
|
b69b2ba0 | 14-Dec-2022 |
Caleb Palmer <cnpalmer@us.ibm.com> |
Updated P10 RAS data json with added thresholds
Change-Id: Iddc7d587c69560eb8194edf22235b3e9d903412e Signed-off-by: Caleb Palmer <cnpalmer@us.ibm.com> |
8b10d699 | 08-Dec-2022 |
Patrick Williams <patrick@stwcx.xyz> |
prettier: re-format
Prettier is enabled in openbmc-build-scripts on Markdown, JSON, and YAML files to have consistent formatting for these file types. Re-run the formatter on the whole repository.
Change-Id: Ib936836ce0d698dc522bc047a78d4f1b0060c13c Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
|
1a4f0e70 | 07-Nov-2022 |
Caleb Palmer <cnpalmer@us.ibm.com> |
Update root cause filtering to use RAS data flags
Change-Id: I172540905a39533139821d3cb1676424824bd804 Signed-off-by: Caleb Palmer <cnpalmer@us.ibm.com> |
de220920 | 05-Dec-2022 |
Zane Shelley <zshelle@us.ibm.com> |
Change scope of auto-generated build info header
Changes had to be made in libhei to make the build information header more portable. These changes are in reaction to that.
Signed-off-by: Zane Shelley <zshelle@us.ibm.com> Change-Id: Ifeb04f302d850446eff42ae66c2b29b1693c5889
|
934635e0 | 03-Nov-2022 |
Caleb Palmer <cnpalmer@us.ibm.com> |
Update Explorer RAS data json to auto-generated v2
Signed-off-by: Caleb Palmer <cnpalmer@us.ibm.com> Change-Id: I0b5062f3f8dac85d9f8abfe2115c15aae3e12d0c |
f1184392 | 07-Oct-2022 |
Caleb Palmer <cnpalmer@us.ibm.com> |
Add RAS data parser handling for getting RAS data flags
In the future we will be supporting an additional 'flags' type stored in the RAS data files for specific bits. This adds the handling to the RAS data parser to get those flags.
Signed-off-by: Caleb Palmer <cnpalmer@us.ibm.com> Change-Id: Ie7889135ae7a643fec287565143a8ee7edc33777
|
dd74a84f | 02-Nov-2022 |
Caleb Palmer <cnpalmer@us.ibm.com> |
Update RAS data json with flags for root cause filtering
Change-Id: If39f871c3d02c06cb5ad972a361c326ab8391748 Signed-off-by: Caleb Palmer <cnpalmer@us.ibm.com> |
e36866c3 | 31-Oct-2022 |
Caleb Palmer <cnpalmer@us.ibm.com> |
Add auto-generated json RAS data and supporting changes
Moving forward we want to use json RAS data files that have been auto-generated instead of maintaining the json itself. This updates the current json RAS data to version 2 and makes accompanying changes in the RAS data parser and schema.
Change-Id: I1278c65f6479437630de5b9d3440d4a19f42a1f6 Signed-off-by: Caleb Palmer <cnpalmer@us.ibm.com>
|
329dbbde | 03-Oct-2022 |
Caleb Palmer <cnpalmer@us.ibm.com> |
Adjust root cause filtering for IUE thresholds
After handling an IUE threshold, a channel fail will be initiated by firmware. If that channel fail causes a system checkstop, we want to blame the IUE FIR bits as the root cause.
Change-Id: Idd28b0b4310b83b97258755bc8da0dad1f58d2a6 Signed-off-by: Caleb Palmer <cnpalmer@us.ibm.com>
|
7a465259 | 09-Sep-2022 |
Caleb Palmer <cnpalmer@us.ibm.com> |
Add FFDC for signatures stored in scratch registers
If analysis was interrupted by a system checkstop, two Hostboot scratch registers may hold the error signature from that analysis. This commit adds that signature, if it exists, as FFDC to the PEL to indicate that a prior analysis was interrupted and that we may be missing a PEL for that signature.
Change-Id: I53216e2c7910c69c4e7e74010a5c0045b793bfde Signed-off-by: Caleb Palmer <cnpalmer@us.ibm.com>
|
fc7e2476 | 24-Jun-2022 |
Zane Shelley <zshelle@us.ibm.com> |
CORE_FIR recoverables could be blamed as checkstop root cause
If a CORE_FIR recoverable attention fails recovery, it will trigger a core unit checkstop attention via another bit. All core unit checkstop attentions have the potential to trigger a system checkstop attention. Therefore, all CORE_FIR recoverable attentions could be blamed as system checkstop root cause attentions.
Signed-off-by: Zane Shelley <zshelle@us.ibm.com> Change-Id: Ib2f3916218b4dce88797f645a302716ef4fd4d49
|
b82cbf75 | 27-Jun-2022 |
Zane Shelley <zshelle@us.ibm.com> |
Update to clang-format-14
Required because the Jenkins CI tools have moved to v14.
Signed-off-by: Zane Shelley <zshelle@us.ibm.com> Change-Id: I3cf4df1b45325545a423bdcb810040724a598ec5 |
513f64aa | 15-Jun-2022 |
Zane Shelley <zshelle@us.ibm.com> |
Handling for host detected LPC timeout
For reasons not yet explained, hardware will not initiate an LPC timeout attention via the NCU timeout FIR bit as we expected. When the host firmware detects an LPC timeout, it will manually set N1_LOCAL_FIR[61] to force a system checkstop. The service response for this bit will be to call out the hardware as if there was a hardware-reported LPC timeout.
Signed-off-by: Zane Shelley <zshelle@us.ibm.com> Change-Id: I863e8aa3ef50a4b18b5106b3a45c4cf81b2c7808
|
ed3ab8f9 | 24-May-2022 |
Zane Shelley <zshelle@us.ibm.com> |
Fix outdated comment in analyzer filter support
Signed-off-by: Zane Shelley <zshelle@us.ibm.com> Change-Id: I5e14eb82a4017ed794314d2800ea88dd0d706942 |
026e5a3f | 05-May-2022 |
Zane Shelley <zshelle@us.ibm.com> |
Avoid guarding on TOD interface errors
The error could be anywhere between the two processors in the interface. Fatally guarding the MDMT will cause a system outage until service is done. Instead, do not guard on TOD interface errors, to avoid the outage.
Signed-off-by: Zane Shelley <zshelle@us.ibm.com> Change-Id: I446917bad985e5143657398b2fbadacf6e8c4a9d
|
7bf1bfa5 | 27-Apr-2022 |
Zane Shelley <zshelle@us.ibm.com> |
Enable LPC timeout handling
It turns out the plugin exists, but nothing in the RAS data was calling the plugin.
Change-Id: I9d35a61064e5f412f216ffbea96597b4d691a98a |