fbdfc765 | 30-Jun-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Use lg2 in Manager class
Modernize it a bit and it makes it easier to see debug traces which can be done by just running the daemon from the command line. There are quite a few debug traces in this file.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: Id1291e3a3c3882d1ed917d4a414cb6253ebf2644
|
a167a7d9 | 30-Jun-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Use lg2 in DataInterface files
Modernize it a bit and it makes it easier to see debug traces which can be done by just running the daemon from the command line.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: Ic0416222e29be0c6687b6d31cb6ca4feae2ad619
|
527ff346 | 29-Jun-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Handle failing to start a PLDM cmd better
A recent PLDM bug caused the registerReceiveCallback() function, which is used to set up listening for the PLDM response from the host when telling it about a new PEL, to throw an exception.
When this happened, the code got stuck in the 'in progress' state, so it would never try again when the next PEL came in.
Fix that by having startCommand() throw an exception instead of calling the failure response function callback. With this change, the code will continue on to call the cleanupCmd() function so everything is ready when the next PEL comes in.
Tested: With the bad PLDM code, after the first PEL ran out of retry attempts, created another PEL and saw the code attempt again to call PLDM. Also, wrote a new unit test case for it.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I38034440435d6a86e8dd880eef09499f19dd6e9c
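The control flow described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual code: the HostNotifier class, the sendNewPEL() and inProgress() members, and the injected callback are hypothetical names; only startCommand(), cleanupCmd() and registerReceiveCallback() come from the commit message.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the class that tells the host about new PELs.
class HostNotifier
{
  public:
    // The registerReceiveCallback() step is injected so that a failure can
    // be simulated without PLDM.
    explicit HostNotifier(std::function<void()> registerReceiveCallback) :
        _register(std::move(registerReceiveCallback))
    {}

    // Returns true if a command was started for the new PEL.
    bool sendNewPEL()
    {
        if (_inProgress)
        {
            return false; // a command is already outstanding
        }

        _inProgress = true;
        try
        {
            startCommand();
        }
        catch (const std::exception&)
        {
            // Because startCommand() throws instead of invoking a failure
            // callback, control always reaches the cleanup path here.
            cleanupCmd();
            return false;
        }
        return true;
    }

    bool inProgress() const
    {
        return _inProgress;
    }

  private:
    void startCommand()
    {
        _register(); // may throw; the exception propagates to the caller
    }

    void cleanupCmd()
    {
        _inProgress = false; // ready for the next PEL
    }

    std::function<void()> _register;
    bool _inProgress = false;
};
```

With this shape, a second call to sendNewPEL() after a failed one attempts the command again rather than returning early on a stale 'in progress' state.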
|
1b41886d | 29-Jun-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Use lg2 in PLDM related files
There are a lot of debug traces in the code that uses PLDM to send PELs up to the OS. Convert the files that deal with that to lg2 so that the debug traces can be seen by just running phosphor-log-manager from the command line where lg2 will print to the console as opposed to having to change the journal priority in an overlay file.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I818bab1bf636fe83d3e7ff64c1bc31ad5e09705e
|
85f0160d | 06-Jul-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Remove unnecessary call to restore resolution
The Resolution D-Bus property, which holds the PEL callouts, is saved along with the other elog properties in the cereal backing file and doesn't need to be manually restored when the daemon starts up.
Tested: Resolution is still filled in after restarting the daemon.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I18e1b79d40b219a6a37c013355cd965de11ce8cb
|
3387eac9 | 06-Jul-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Fixed serializing of elog properties
The PEL code modifies the Resolution and EventId properties on the standard event log objects to fill them in with PEL specific values. It was originally doing this by using the resolution() and eventId() override functions on the Entry object that would do the elog serialize after it updated those properties.
Since then, the code was changed to call the resolution() and eventId() overloads that also take the bool skipSignal parameter. Those calls bypass the overridden functions, so the properties weren't serialized and therefore weren't restored after a restart.
While we could just also override the functions that take that skipSignal parameter to do the serialize, those get called when the event log object is restored on startup, so it would cause unnecessary calls to serialize.
Instead, just call the serialize() function directly after updating the event ID and resolution when creating a PEL.
Tested: The Resolution and EventId D-Bus properties are now correct after the process is restarted.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I8c1822e9c31925983feddab657644c98f37ef079
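A simplified sketch of the overload problem described above, assuming a base class with two resolution() setters where only the one-argument form is overridden. EntryBase, serializeCount and updateFromPEL() are hypothetical names for illustration; the real Entry class and the generated D-Bus bindings differ.

```cpp
#include <string>

// Hypothetical stand-in for the generated D-Bus property setters.
class EntryBase
{
  public:
    virtual ~EntryBase() = default;

    // Setter that always emits the property signal.
    virtual std::string resolution(const std::string& value)
    {
        return resolution(value, false);
    }

    // Setter that can skip the PropertiesChanged signal.
    std::string resolution(const std::string& value, bool /*skipSignal*/)
    {
        _resolution = value;
        return _resolution;
    }

  private:
    std::string _resolution;
};

class Entry : public EntryBase
{
  public:
    // Keep the two-argument overload visible in this class.
    using EntryBase::resolution;

    // Only the one-argument setter is overridden to serialize.
    std::string resolution(const std::string& value) override
    {
        auto result = EntryBase::resolution(value);
        serialize();
        return result;
    }

    void serialize()
    {
        ++serializeCount; // stands in for writing the backing file
    }

    int serializeCount = 0;
};

// The approach from the commit: set the value without a signal, then
// serialize explicitly.
inline void updateFromPEL(Entry& entry, const std::string& callouts)
{
    entry.resolution(callouts, true);
    entry.serialize();
}
```

Calling entry.resolution(value, true) on its own never reaches the override, which is why the explicit serialize() call is needed. Overriding the two-argument form instead would also serialize on every restore at startup.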
|
36a82ebe | 22-Jun-2023 |
Lakshmi Yadlapati <lakshmiy@us.ibm.com> |
PEL: Error log entries for ECC errors
This commit adds three new error log entries for ECC errors.
The new entries include:
- Correctable ECC memory error collection limit is reached.
- Correctable ECC/other correctable memory error.
- Uncorrectable ECC/other uncorrectable memory error.
Change-Id: Ibfa7654bdfbccc5cf7154664a048d931d10b433c Signed-off-by: Lakshmi Yadlapati <lakshmiy@us.ibm.com>
|
7f5e4410 | 23-Jun-2023 |
Patrick Williams <patrick@stwcx.xyz> |
lg2: simplify source_location for clang-16
As of clang-16, `source_location` is fully supported, so we do not need to use the `experimental::source_location` workarounds anymore. Eliminate them and use `std::source_location` directly rather than the `lg2::source_location` alias.
Tested: Compiled the repository with CXX=clang++.
Signed-off-by: Patrick Williams <patrick@stwcx.xyz> Change-Id: Iba987ed9580d228781acac630681fda172a6cef7
|
f0200b51 | 14-Jun-2023 |
Marri Devender Rao <devenrao@in.ibm.com> |
PEL : Modify faultlog error to include only serviceable record count
Modified the faultlog error to include only serviceable record counts, such as the system guard record count and the count of unresolved PELs with the deconfigured bit set.
Manual guard and FCO deconfigured record counts are not included in the PEL, as these are created by the user/service engineer. These records will still be shown in the nag dump.
Faultlog PEL will be created only if serviceable records are found.
Tested:
root@xxxx:/tmp# peltool -i 0x500090BF
{
  "User Header": {
    "Section Version": "1",
    "Action Flags": [
      "Service Action Required",
      "Report Externally",
      "HMC Call Home"
    ],
  },
  "Primary SRC": {
    "Section Version": "1",
    "Error Details": {
      "Message": "Firmware detected either a FRU deconfigured And/Or guard record",
      "GUARD_RECORD_COUNT": [
        "0x1",
        "Number of system guard records if any"
      ],
      "PEL_WITH_DECONFIG_BIT_COUNT": [
        "0x0",
        "Number of PEL's having deconfig bit set"
      ]
    },
    "Valid Word Count": "0x09",
    "Reference Code": "BD50F138",
    "Hex Word 6": "00000001",
    "Hex Word 7": "00000000",
    "Hex Word 8": "00000000",
    "Hex Word 9": "00000000",
    "Callout Section": {
      "Callout Count": "1",
      "Callouts": [{
        "FRU Type": "Maintenance Procedure Required",
        "Priority": "Mandatory, replace all with this type as a unit",
        "Procedure": "BMC0008"
      }]
    }
  },
  "User Data 1": {
    "Section Version": "1",
    "Sub-section type": "1",
    "Created by": "bmc error logging",
    "GUARD_RECORD_COUNT": "1",
    "PEL_WITH_DECONFIG_BIT_COUNT": "0"
  }
}
Signed-off-by: Marri Devender Rao <devenrao@in.ibm.com> Change-Id: Id845c3ecc45b7be7c553942658b68d4d928e29b2
|
7ba17c34 | 12-Jun-2023 |
Matt Spinler <spinler@us.ibm.com> |
Don't throw on Settings access fail
The isQuiesceOnErrorEnabled() function makes a call to the Settings daemon. If that throws an exception, the code just re-throws the exception. While that exception is caught by sdbusplus, it causes the function to exit early and not actually save the event log in the class.
To fix this, just change the code to say quiesceOnError isn't enabled and continue on.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I814e6dae90483c98e0f24af90baa5f267d22b78d
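The fix amounts to swallowing the exception and reporting the setting as disabled. A minimal sketch, assuming the Settings read is passed in as a callable; the readSetting parameter is a hypothetical stand-in for the D-Bus call, and the real function signature differs.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical sketch: readSetting stands in for the D-Bus call to the
// Settings daemon, which may throw.
bool isQuiesceOnErrorEnabled(const std::function<bool()>& readSetting)
{
    try
    {
        return readSetting();
    }
    catch (const std::exception&)
    {
        // Treat a failed read as "not enabled" so the caller carries on
        // and still saves the event log.
        return false;
    }
}
```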
|
b25e8a32 | 07-Jun-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: No PC signals when updating entry props
The PEL code has to update properties on the xyz.openbmc_project.Logging.Entry interface after the corresponding PEL has been created. These updates don't need to send PropertiesChanged signals since they happen while the event log is still being created.
The ipmid daemon is watching for PC signals and it was triggering a bunch of unnecessary activity.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I6f9a2992abf0454bc95b44bb9897642cecd3200e
|
da5b76b2 | 01-Jun-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Support for CheckstopFlag msg reg field
Similar to the DeconfigFlag field that was recently added, this one indicates the PEL is for a hardware checkstop and results in a bit in SRC hex word 5 being set.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: Ib05de7471ad3e32f48e7f20a5c611abc119fe82a
|
81bc5611 | 01-Jun-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Fixes for gcc13
* Add the cstdint header file as now required to get the uint* types.
* Fix a move assignment test
* Refactor some nlohmann::json code to avoid:

```
/usr/include/c++/13/valarray:1201:1: note: template argument deduction/substitution failed:
../extensions/openpower-pels/registry.cpp:665:43: note: ‘const nlohmann::json_abi_v3_11_2::basic_json<>::value_type’ {aka ‘const nlohmann::json_abi_v3_11_2::basic_json<>’} is not derived from ‘const std::valarray<_Tp>’
  665 | (name == j["SRC"]["ReasonCode"] && type == LookupType::reasonCode));
```
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: Ia3e733602134a60008d0d47934f95a217d2a0eb1
|
5ee3605d | 30-May-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Trace less when watching Present changes
The fans and power supplies in IO expansion drawers are also in the inventory, hosted by PLDM, and have their Present property set to true at least once every boot after the hypervisor starts and tells PLDM they are present. This causes multiple traces every boot from the PEL code watching for fan and power supply hot plugs. Also, one of the times the items become present, the call to get its location code from PLDM fails which causes another trace.
Since the PEL code doesn't care about fans and PSs in IO drawers, just change these traces to debug ones so that no traces show up on normal boots. There will still be a trace if the code does end up clearing the deconfig flag from a PEL with a plugged local fan/PS as a callout.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I18cf95e6adfc6aa5700fe33a74c18014b36d0b9d
|
77930f20 | 25-May-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Up the state PEL journal lines to 100
Change the number of journal lines captured to the max of 100 for state related PELs, since it seems like there's always interesting things going on when those are created. The resulting PEL was about 11KB, which is still well under the max of 16KB.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I8344431386d8eb555814b64ea1b929aa4ec6b925
|
06634e8b | 25-May-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Add reg entry for master FSI detection fail
This error is created when something goes wrong during the scan the FSI device driver does.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: Ifd166ad3395602cc6d1f36356fc121379b3d2765
|
a6c4ba7d | 24-May-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Use uppercase hex in message registry
The code in the Registry class requires the ReasonCode field in the message registry to use upper case hex digits (A-F) to be able to find a match to print the description in peltool, so change the message registry schema to require that. Also fix an entry that had lower case.
Alternatively, the code could be fixed, but with the schema checking that runs in CI to catch errors this method seems less impactful.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I4f26c876575be58232f1408f9357465a3340002a
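The kind of check the schema change enforces can be illustrated with a small validator. This is only a sketch: it assumes ReasonCode strings have the form "0x" followed by four hex digits, and isValidReasonCode() is a hypothetical helper, not part of the registry code or the actual schema.

```cpp
#include <regex>
#include <string>

// Assumed format: "0x" followed by exactly four uppercase hex digits.
bool isValidReasonCode(const std::string& reasonCode)
{
    static const std::regex pattern{"^0x[0-9A-F]{4}$"};
    return std::regex_match(reasonCode, pattern);
}
```

A lookup that compares strings exactly would treat "0x2A0F" and "0x2a0f" as different values, which is why rejecting the lower case form up front avoids silent mismatches.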
|
757c0ef3 | 04-May-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Update README.md with latest functionality
Describe how the code will clear the deconfig flag for replaced fans and power supplies.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I8a78748a8093fb764c890433e9e3d4d3248843bb
|
0dd22c83 | 04-May-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Clear deconfig flag after callout replaced
Making use of the previous commit's framework to call a function when a fan or power supply becomes present, add code to the Manager class to register a callback that will clear the deconfig flag for all PELs created with the power-thermal or fan component ID that have the location code of the replaced fan/PS as a callout.
This way, the degraded mode reporting code will no longer pick up those PELs in its report, since the hardware was replaced and those PELs are no longer relevant.
This is necessary only for fans or power supplies because they're the only N+1 hardware that can be hot plugged at runtime. And also because this is what the IBM service team wants.
Tested: Simulated missing hardware (changed present D-Bus property for fans, toggled PSU presence GPIO in the simulator for PSs). Saw errors get created for it, then simulated replacing it and saw those errors have their deconfig flag cleared, verifying before and after with peltool:
```
// Remove and replace fan
phosphor-fan-monitor: Fan /system/chassis/motherboard/fan0 presence state change to false
phosphor-log-manager: Created PEL 0x50000002 (BMC ID 2) with SRC 110076F1
phosphor-fan-monitor: Fan /system/chassis/motherboard/fan0 presence state change to true
phosphor-log-manager: Detected FRU /xyz/openbmc_project/inventory/system/chassis/motherboard/fan0 (U78DB.ND0.1234567-A0) present
phosphor-log-manager: Clearing deconfig flag in PEL 0x50000002 with SRC 110076F1 because U78DB.ND0.1234567-A0 was replaced

// Remove and replace PS
phosphor-log-manager: Created PEL 0x50000003 (BMC ID 3) with SRC 110015F6
...
phosphor-psu-monitor: Updating inventory present property. present:true invpath:/system/chassis/motherboard/powersupply0 name:powersupply0
phosphor-log-manager: Detected FRU /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply0 (U78DB.ND0.1234567-E0) present
phosphor-log-manager: Clearing deconfig flag in PEL 0x50000003 with SRC 110015F6 because U78DB.ND0.1234567-E0 was replaced
```
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: Iee05b4a612ca8f438f8c89f37b4e7b529a131a9f
|
5b423651 | 04-May-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Watch for fan/PS hotplugs
Code is going to need to know when a fan or power supply (the only hotpluggable redundant FRUs) is added or replaced so that it can clear a flag in PELs that have that HW as a callout. This is so other code that is doing degraded mode notifications will not do any notifications for PELs calling out HW that has been replaced.
To enable this functionality, add support to the DataInterface class to tell subscribers via a function callback when a fan or power supply becomes present, as indicated by the Present property.
Code will watch for changes to the Present property of fans and power supplies via the PropertiesChanged signal, and will also watch for InterfacesAdded to catch when they show up on D-Bus in the first place.
It won't start any of these watches until the BMC gets to ready state so that the inventory has a chance to be populated first.
On the first boot when the inventory was previously empty there will be a round of inventory InterfacesAdded signals that will wake up the daemon when PLDM receives the processor core presence information from hostboot, but it just checks and sees it's the wrong interface and returns, and I'm not sure how it can be avoided.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I93d0727c3082677826db4a4a02c1a30986f6099b
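The subscription mechanism described here might look roughly like the sketch below. PresenceWatcher and its member names are hypothetical; the actual DataInterface API and the D-Bus signal matching are not shown.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch of a presence subscription registry.
class PresenceWatcher
{
  public:
    using Callback = std::function<void(const std::string& locationCode)>;

    // Register a named callback to run when a fan or PS becomes present.
    void subscribe(const std::string& name, Callback callback)
    {
        _subscribers[name] = std::move(callback);
    }

    void unsubscribe(const std::string& name)
    {
        _subscribers.erase(name);
    }

    // Would be called from the PropertiesChanged and InterfacesAdded
    // handlers once a Present value has been extracted.
    void presentChanged(const std::string& locationCode, bool present)
    {
        if (!present)
        {
            return; // only hardware becoming present is of interest
        }

        for (const auto& subscriber : _subscribers)
        {
            subscriber.second(locationCode);
        }
    }

  private:
    std::map<std::string, Callback> _subscribers;
};
```

A subscriber would then use the location code it receives to find PELs that call out that hardware.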
|
d0ccda3c | 04-May-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: Modify Repo::updatePEL behavior
The Repository class's updatePEL function takes a PEL and then calls a passed in function to update it.
Change the behavior slightly so the callback returns a bool: true if the PEL was actually updated, and false otherwise. That way the code knows if the PEL needs to be written back out or not.
There is also a minor change to refresh the PEL attributes map for the PEL inside the updatePEL function so it doesn't need to be done outside of it.
This is all to support upcoming functionality where an updatePEL call won't know if the PEL needs to be updated until the PEL fields can be checked.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: Ic1eabd2fcd8dfc7f559be24142b3e147d4b65062
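A pared-down sketch of the new callback contract, assuming an in-memory repository. The Pel struct, the add() and get() helpers, and the writes counter are hypothetical stand-ins; the real Repository works on PEL files and also refreshes its attributes map.

```cpp
#include <cstdint>
#include <functional>
#include <map>

// Hypothetical, minimal PEL with a single field to modify.
struct Pel
{
    bool deconfigFlag = false;
};

class Repository
{
  public:
    // Returns true if the PEL was modified and must be written back.
    using PelUpdateFunc = std::function<bool(Pel&)>;

    void add(std::uint32_t id, const Pel& pel)
    {
        _pels[id] = pel;
    }

    // Runs func on a copy of the PEL and only stores the result when func
    // reports a change.
    void updatePEL(std::uint32_t id, const PelUpdateFunc& func)
    {
        auto it = _pels.find(id);
        if (it == _pels.end())
        {
            return;
        }

        Pel pel = it->second;
        if (func(pel))
        {
            it->second = pel;
            ++writes; // stands in for writing the PEL back out
        }
    }

    const Pel& get(std::uint32_t id) const
    {
        return _pels.at(id);
    }

    int writes = 0;

  private:
    std::map<std::uint32_t, Pel> _pels;
};
```

The callback can inspect the PEL fields first and return false when nothing applies, so no write happens for PELs that did not need the update.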
|
784b02e7 | 25-Apr-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL:test: Refactor mocked checkDumpStatus usage
When creating an instance of the SRC class in a testcase, it needs a filled in mock of the checkDumpStatus() function. Instead of manually doing that everywhere a PEL or SRC class is created, just do it in the constructor of the mock DataInterface class.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I74790c67251465aae87d318ea37891d9eabab5e5
|
32e36b8c | 25-Apr-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL:pel_manager_test: Refactor temp dir cleanup
Several testcases were creating a temporary directory and then removing it at the end, except for one which missed the cleanup so the dir would stick around in /tmp after the run was done.
To fix that and to make sure it doesn't happen again, just add the creating and deleting of the temp directory to the test fixture class so the deleting happens automatically.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I2abb720b4c0aeb9dec6117d6399c3e4692709c68
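The idea of tying the directory's lifetime to an object can be sketched with a small RAII helper. TempDir and the "pel_test_XXXXXX" name template are hypothetical; a test fixture would do the equivalent work in its setup and teardown. The sketch assumes C++17 and a POSIX system for mkdtemp().

```cpp
#include <stdlib.h>
#include <unistd.h>

#include <filesystem>
#include <stdexcept>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: creates a unique temp directory on construction and
// removes it, with its contents, on destruction.
class TempDir
{
  public:
    TempDir()
    {
        std::string templ =
            (fs::temp_directory_path() / "pel_test_XXXXXX").string();
        if (::mkdtemp(templ.data()) == nullptr)
        {
            throw std::runtime_error("mkdtemp failed");
        }
        _path = templ;
    }

    ~TempDir()
    {
        std::error_code ec;
        fs::remove_all(_path, ec); // non-throwing overload for destructors
    }

    TempDir(const TempDir&) = delete;
    TempDir& operator=(const TempDir&) = delete;

    const fs::path& path() const
    {
        return _path;
    }

  private:
    fs::path _path;
};
```

Because cleanup lives in the destructor, a testcase cannot forget it, which is the failure mode this commit addresses.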
|
4deed972 | 28-Apr-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: peltool: show hostboot deconfig/guard flags
The deconfig and guard flags in the SRC section are also valid for hostboot SRCs, so display them in peltool for hostboot PELs.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I3a36d61ced592e0718e016665587444b7e14f964
|
8e65f4ea | 02-May-2023 |
Matt Spinler <spinler@us.ibm.com> |
PEL: New D-Bus properties on PEL entry iface
Fill in the 4 newly added properties on the PEL entry D-Bus interface:
- Platform log ID (PLID)
- Deconfig flag from the SRC section
- Guard flag from the SRC section
- Creation timestamp
These were also added to the PELAttributes map in the Repository class so that each PEL wouldn't have to be reconstructed from a file again when creating the D-Bus objects.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I7878645f56c634e6111fcecc22ab27673d0c0f5d
|