#
c488bac1 |
| 17-Mar-2025 |
Chris Cain <cjcain@us.ibm.com> |
Clear flags when the host changes to powered off state
When the host is powered off, the OCCs will be stopped, so clear any reset pending flags as well as any outstanding HRESET requests.
This will ensure the next boot will start clean and not react to something that happened on prior boot.
Tested on Rainier for several error scenarios.
Change-Id: Ie4156975a844e823787f7162ee0542d7f099bd12
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
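The behaviour described in this commit can be pictured with a minimal sketch. The `RecoveryFlags`, `HostState` and `RecoveryTracker` names below are hypothetical and are not taken from openpower-occ-control; the sketch only assumes that each OCC carries a reset-pending flag and an outstanding-HRESET flag, both of which are dropped when the host reports powered off.

```cpp
#include <map>

// Hypothetical per-OCC recovery flags; these names are illustrative and are
// not taken from openpower-occ-control.
struct RecoveryFlags
{
    bool resetPending = false;
    bool hresetOutstanding = false;
};

enum class HostState
{
    Off,
    Running
};

class RecoveryTracker
{
  public:
    void requestReset(int occ)
    {
        flags[occ].resetPending = true;
    }

    void requestHReset(int occ)
    {
        flags[occ].hresetOutstanding = true;
    }

    // Clear every flag when the host powers off so the next boot starts clean.
    void hostStateChanged(HostState state)
    {
        if (state == HostState::Off)
        {
            for (auto& entry : flags)
            {
                entry.second = RecoveryFlags{};
            }
        }
    }

    bool anyPending() const
    {
        for (const auto& entry : flags)
        {
            if (entry.second.resetPending || entry.second.hresetOutstanding)
            {
                return true;
            }
        }
        return false;
    }

  private:
    std::map<int, RecoveryFlags> flags;
};
```

Clearing on the power-off transition, rather than at the next power-on, means stale requests are already gone by the time the OCCs restart.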
|
#
92dfb271 |
| 13-Feb-2025 |
Chris Cain <cjcain@us.ibm.com> |
Ignore HRESET_NOT_READY state until HRESET completes
After HRESET has been requested, code will wait for HRESET_READY or HRESET_FAILED status before attempting OCC communication again.
Code will also not clear the outstandingHReset until READY/FAILED, since the reset should still be in progress.
OCC comm will get disabled before the HRESET and re-enabled if reset completes successfully. If failed, no further comm will work.
My testing found that pldm instance ids were not getting freed automatically when receiving a response. So this change will also free those IDs when the response is received.
Tested on Rainier with recoverable and unrecoverable SBE injects.
'''
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: readOccState: Failed to read OCC0 state: Read error on I/O operation - failbit badbit
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: Status::readOccState: open/read failed trying to read OCC0 state (open errno=0)
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: readOccState: Failed to read OCC0 state: Read error on I/O operation - failbit badbit
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: Status::readOccState: open/read failed trying to read OCC0 state (open errno=11)
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: SBE timeout, requesting HRESET (OCC0)
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: Status::occActive OCC0 changed to False
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: got id 15 and set PldmInstanceId to 15
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: openMctpDemuxTransport: pldmFd has fd=9
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: sendPldm: calling pldm_transport_send_msg(OCC0, instance:15, 8 bytes, timeout 30)
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: pldmResetCallback: calling pldm_transport_recv_msg() instance:15
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: pldmResetCallback: pldm_transport_recv_msg() rsp was 4 bytes
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: pldmResetCallback: Reset has been successfully started
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: Freed PLDM instance ID 15
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: pldm: HRESET is NOT READY (OCC0)
Feb 13 18:34:30 p10bmc openpower-occ-control[22740]: HRESET succeeded (OCC0)
Feb 13 18:34:30 p10bmc openpower-occ-control[22740]: Status::occActive OCC0 changed to True
Feb 13 18:34:30 p10bmc openpower-occ-control[22740]: validateOccMaster: OCC0 is master of 4 OCCs
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: Status::readOccState: OCC0 state 0x3 (lastState: 0x0)
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: PowerMode::sendModeChange: SET_MODE(12,0) command to OCC0 (9 bytes)
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: Idle Power Saver Parameters: enabled:True, enter:8%/240s, exit:12%/10s
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: PowerMode::sendIpsData: SET_CFG_DATA[IPS] command to OCC0 (12 bytes)
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: Status::readOccState: successfully read OCC0 state: 3
'''
Change-Id: I7e5bc60576e4e8fa6cba4253be535220cb8048ec
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
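The HRESET handling described above amounts to a small state machine, sketched here under stated assumptions. `HResetStatus` and `HResetTracker` are hypothetical names, and ignoring status updates that arrive when no HRESET is outstanding is an assumption of this sketch, not something the commit message states.

```cpp
// Hypothetical status values mirroring the HRESET_NOT_READY / HRESET_READY /
// HRESET_FAILED states named in the commit message.
enum class HResetStatus
{
    NotReady,
    Ready,
    Failed
};

class HResetTracker
{
  public:
    // OCC communication is disabled before the HRESET is requested.
    void request()
    {
        outstanding = true;
        commEnabled = false;
    }

    void onStatus(HResetStatus status)
    {
        if (!outstanding)
        {
            return; // assumption: nothing to do without a pending HRESET
        }
        switch (status)
        {
            case HResetStatus::NotReady:
                // Reset still in progress: keep waiting, keep the flag set.
                break;
            case HResetStatus::Ready:
                outstanding = false;
                commEnabled = true;
                break;
            case HResetStatus::Failed:
                // Communication stays disabled after a failed HRESET.
                outstanding = false;
                break;
        }
    }

    bool isOutstanding() const
    {
        return outstanding;
    }

    bool isCommEnabled() const
    {
        return commEnabled;
    }

  private:
    bool outstanding = false;
    bool commEnabled = true;
};
```

In the journal excerpt above, the "HRESET is NOT READY" line corresponds to the `NotReady` branch and "HRESET succeeded" to the `Ready` branch.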
|
#
2d6ec909 |
| 01-Feb-2025 |
Patrick Williams <patrick@stwcx.xyz> |
clang-format: update latest spec and reformat
Copy the latest format file from the docs repository and apply.
Change-Id: I289fffda3d8b39bb9b16eab30928d0aa7c8e1821
Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
|
#
37abe9be |
| 31-Oct-2024 |
Chris Cain <cjcain@us.ibm.com> |
Update occ-control to use lg2 for all logging
Convert existing log<level>() trace statements to lg2::level()
Testing: Verified on Rainier - captured journal traces before and after commit during boots, mode, pcap and ips changes.
Change-Id: I318fa7bf3902c641b0c28b09190db4b61d0a2fa9
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
f0295f52 |
| 12-Sep-2024 |
Chris Cain <cjcain@us.ibm.com> |
Improve BMC error handling for OCC comm failures
- Delay starting OCC reset until all OCCs have been detected (or timeout). This prevents multiple resets from being triggered and helps detect when the reset has completed (the active sensor is set after the reset completes).
- Wait for PLDM response to OCC reset and HRESET requests and retry if they fail.
- If HRESET returns NOT_READY, collect SBE FFDC and try OCC reset. A persistent failure will put the system in safe state.
- Prevent overwriting dvfs over-temp filename for p10 and beyond since that old file is only present in old kernels.
- Prevent assert when opening sysfs files (added a catch and then created an OCC Comm failure PEL, which will force an OCC reset).
- Check return code after reading sysfs files to confirm success. If the read fails, try reset to recover.
- Updated traces to include which processor/OCC encountered issues.
- Better recovery to close windows that were leaving the system in a partial good state.
JIRA: PFES-66
Change-Id: I0b087d0e05bd8562682062e1c662f9e18164a720
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
88811adc |
| 27-Aug-2024 |
Eddie James <eajames@linux.ibm.com> |
Fix missing PLDM transport configuration
The PLDM transport option wasn't actually used. In addition, throttle the PLDM open failure trace.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Change-Id: I6c5f9151b3230171a1f050d54bbb143334577ab2
|
#
6213f199 |
| 01-Jul-2024 |
Lakshmi Yadlapati <lakshmiy@us.ibm.com> |
Add kernel MCTP (AF_MCTP) support and transport-implementation option
- Added support for kernel MCTP (AF_MCTP) to enable MCTP communication using AF_MCTP sockets.
- Introduced a new configuration option 'transport-implementation'.
The 'transport-implementation' option can be set to either:
- 'mctp-demux': Uses the existing mctp-demux transport method.
- 'af-mctp': Uses the new kernel AF_MCTP transport method.
Change-Id: I2978273fe4579d1dce00368dabb7f90815dbbce8
Signed-off-by: Lakshmi Yadlapati <lakshmiy@us.ibm.com>
Signed-off-by: Eddie James <eajames@linux.ibm.com>
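As an illustration of the two option values, the sketch below maps the option string to an enum. The `Transport` enum and `parseTransport` helper are hypothetical; the commit does not say how the option is consumed, and it may well be resolved at build time rather than parsed at run time as shown here.

```cpp
#include <optional>
#include <string_view>

// Hypothetical enum for the two documented 'transport-implementation' values.
enum class Transport
{
    MctpDemux,
    AfMctp
};

// Map the option string to a transport; unknown values yield std::nullopt.
inline std::optional<Transport> parseTransport(std::string_view value)
{
    if (value == "mctp-demux")
    {
        return Transport::MctpDemux;
    }
    if (value == "af-mctp")
    {
        return Transport::AfMctp;
    }
    return std::nullopt;
}
```

Returning an empty optional for unrecognised values lets the caller decide whether to fall back to a default or report a configuration error.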
|
#
d7542c83 |
| 16-Aug-2024 |
Patrick Williams <patrick@stwcx.xyz> |
clang-format: re-format for clang-18
clang-format-18 isn't compatible with the clang-format-17 output, so we need to reformat the code with the latest version. The way clang-18 handles lambda formatting also changed, so we have made changes to the organization default style format to better handle lambda formatting.
See I5e08687e696dd240402a2780158664b7113def0e for updated style. See Iea0776aaa7edd483fa395e23de25ebf5a6288f71 for clang-18 enablement.
Change-Id: I94e2bfdc8fae9bc14e30c701a0e622709ee9b0fe
Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
|
#
52328cb4 |
| 14-Feb-2023 |
Rashmica Gupta <rashmica@linux.ibm.com> |
Move to libpldm pldm_transport APIs
- Replaced the deprecated pldm transport APIs with the new libpldm pldm_transport APIs.
This change migrates the application off of the deprecated "requester" APIs in libpldm.
We don't currently have the infrastructure in place to get the correct TIDs, so to keep everything working as before use the EID as the TID in the EID-to-TID mapping.
Change-Id: Iedbfe936a710d37f75e737c3d307295ff55cf07b
Signed-off-by: Rashmica Gupta <rashmica@linux.ibm.com>
Signed-off-by: Lakshmi Yadlapati <lakshmiy@us.ibm.com>
|
#
db38e91a |
| 24-May-2023 |
Rashmica Gupta <rashmica@linux.ibm.com> |
Move to libpldm instance id APIs
Signed-off-by: Rashmica Gupta <rashmica@linux.ibm.com>
Change-Id: I2955097a78c673f65054fa9bff1ef5243da136a2
Signed-off-by: Lakshmi Yadlapati <lakshmiy@us.ibm.com>
|
#
aeba51cd |
| 16-Feb-2023 |
Rashmica Gupta <rashmica@linux.ibm.com> |
Change "mctp instance id" to "pldm instance id"
It's not a MCTP thing.
Change-Id: I922d28f848c8948d43ea4b8926d2310c8fb50228
Signed-off-by: Rashmica Gupta <rashmica@linux.ibm.com>
Signed-off-by: Lakshmi Yadlapati <lakshmiy@us.ibm.com>
|
#
97476a1e |
| 19-Jun-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
pldm: Replace deprecated libpldm header path
There are more OEMs than IBM contributing to libpldm, so the OEM headers were restructured. Replace the deprecated IBM OEM header path with the namespaced path.
The patch was generated with the coccinelle[1] script from [2]:
```
$ spatch \
  --sp-file .../libpldm/origin/evolutions/current/oem-ibm-header-compat.cocci \
  --in-place \
  $(git ls-files | grep -E '\.[ch](pp)?')
```
[1]: https://coccinelle.gitlabpages.inria.fr/website/
[2]: https://gerrit.openbmc.org/c/openbmc/libpldm/+/72202
Change-Id: Ib3b4c5a650a06e816155a91ac3a83c2176639dc7
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
|
#
755af102 |
| 27-Feb-2024 |
Chris Cain <cjcain@us.ibm.com> |
Handle other PLDM_STATE_SET_OPERATIONAL states
- Code will no longer assume the OCC is not running if an unexpected state is received. It will continue to look for a good/known state.
- Added code that will throttle the occ-control pldm journal traces if unable to read the OCC active sensor states. In some error conditions, this tracing would flood the trace and the repeated traces are not helpful for debug.
- Change some journal entries to ERR when the state indicated that the system was in safe mode (OCCs disabled).
- If a request for the OCC active sensor state was sent, and then a PLDM sensor event comes in for that instance, the event status is used and the response is ignored.
- Added README for occ-control.
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
Change-Id: Ic26f1d0c4dc59e7a61b965b052d649e4bc152fde
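One common way to throttle a repeating trace is to emit only every Nth occurrence. The sketch below shows that idea; the `TraceThrottle` class and its interval policy are assumptions for illustration, since the commit message does not describe how the throttling is actually implemented.

```cpp
// Hypothetical throttle: allow the first occurrence and then every Nth one.
class TraceThrottle
{
  public:
    // An interval of 0 is treated as 1 (trace every occurrence).
    explicit TraceThrottle(unsigned interval) :
        interval(interval == 0 ? 1 : interval)
    {}

    // Returns true when the caller should emit the trace this time.
    bool shouldTrace()
    {
        return (count++ % interval) == 0;
    }

    // Call once the condition clears so the next failure is traced at once.
    void reset()
    {
        count = 0;
    }

  private:
    unsigned interval;
    unsigned count = 0;
};
```

A caller would wrap the journal write in `if (throttle.shouldTrace())` and call `reset()` after a successful sensor read.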
|
#
48002498 |
| 13-Feb-2024 |
Patrick Williams <patrick@stwcx.xyz> |
prefer std::format over fmt
Switch to std::format to remove the dependency on fmt.
Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
Change-Id: Id3a1295ba8a90fb756cfc500892dcc5b3235e27b
|
#
159a2279 |
| 27-Sep-2023 |
Pavithra Barithaya <pavithra.b@ibm.com> |
Minor fix in pldmClose() API
The libpldm has introduced new transport APIs for the open(), send(), close() etc. The present APIs are being deprecated in libpldm - https://github.com/openbmc/libpldm/blob/main/src/requester/pldm.c#L30 This commit makes use of the close API as openpower-occ-control causes confusion with the fd resources in HRESET path. This change is needed until openpower-occ-control makes use of the new transport APIs in its code.
Tested: Error injection which initiates HRESET was successful.
Change-Id: Ic54b84001407935bee51c51e633e548366c6accc
Signed-off-by: Pavithra Barithaya <pavithra.b@ibm.com>
|
#
5161a028 |
| 15-Aug-2023 |
Chris Cain <cjcain@us.ibm.com> |
Fix trace of pldm return codes for yocto update
Fix Yocto update compile failure
Change-Id: I392cefaefb5aa3d44e83b1edb7df0dbaaf11d86b
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
a49c987e |
| 10-May-2023 |
Patrick Williams <patrick@stwcx.xyz> |
clang-format: copy latest and re-format
clang-format-16 has some backwards incompatible changes that require additional settings for best compatibility and re-running the formatter. Copy the latest .clang-format from the docs repository and reformat the repository.
Change-Id: I39f8c77091744c8516e043054b4ed7207d85aa08
Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
|
#
082a6ca7 |
| 21-Mar-2023 |
Chris Cain <cjcain@us.ibm.com> |
Handle OCC active sensor updates prior to host runtime
On some systems, occ-control was getting notified that the OCCs were active before the host reached runtime state. This would prevent occ-control from starting communication with the OCCs.
The fix will ignore the early OCC Active sensor enabled messages and once the host gets to runtime, it will re-query the sensors to ensure they are still active.
Verified on fresh boot, warm boot, BMC reset, warm boot after BMC reset on a system that exhibited the early sensors and one that did not.
Also removes an unnecessary InternalFailure when a sensor was cleared, but no OCC objects were found.
Change-Id: Idb6c107cf83d12272aef9179045de73298e6d6b6
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
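The fix described here (ignore early "OCC active" notifications, then re-query once the host reaches runtime) could look roughly like the following. `OccActiveGate` and the `querySensor` callback are hypothetical stand-ins; the real code queries PLDM sensors, which this sketch replaces with a plain function object.

```cpp
#include <functional>
#include <set>
#include <utility>

class OccActiveGate
{
  public:
    // querySensor is a hypothetical stand-in for re-reading the OCC active
    // sensor; it returns true if the given OCC is still active.
    explicit OccActiveGate(std::function<bool(int)> querySensor) :
        querySensor(std::move(querySensor))
    {}

    // Returns true if communication was enabled for this OCC.
    bool sensorReportedActive(int occ)
    {
        known.insert(occ);
        if (!hostAtRuntime)
        {
            return false; // too early: remember the OCC but do not act yet
        }
        active.insert(occ);
        return true;
    }

    // Once the host is at runtime, re-query every OCC seen so far.
    void hostReachedRuntime()
    {
        hostAtRuntime = true;
        for (int occ : known)
        {
            if (querySensor(occ))
            {
                active.insert(occ);
            }
        }
    }

    bool isActive(int occ) const
    {
        return active.count(occ) != 0;
    }

  private:
    std::function<bool(int)> querySensor;
    std::set<int> known;
    std::set<int> active;
    bool hostAtRuntime = false;
};
```

Re-querying instead of trusting the early notification covers the case where an OCC went inactive again before the host reached runtime.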
|
#
7b00cde2 |
| 14-Mar-2023 |
Chris Cain <cjcain@us.ibm.com> |
Check host state before attempting OCC communication
HBRT sends a PLDM message to occ-control when the OCCs go active. If the system is powered down close to that time, there is a window where the PLDM message could get sent even when the host is no longer running. This commit will confirm the host is actually running when the message is received; if not, the OCC communication will not be allowed.
Change-Id: Ia209a5893e8294f1b10bcb143bc59831205223ab
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
c9dc4418 |
| 06-Mar-2023 |
Chris Cain <cjcain@us.ibm.com> |
Trace PLDM response on unexpected states
Commit will trace the PLDM response packet when querying the OCC active state sensors if the state is not one of the expected values:
- PLDM_STATE_SET_OPERATIONAL_RUNNING_STATUS_IN_SERVICE
- PLDM_STATE_SET_OPERATIONAL_RUNNING_STATUS_STOPPED
- PLDM_STATE_SET_OPERATIONAL_RUNNING_STATUS_DORMANT
Change-Id: I87d144b68aed76e473ebf28348ade3df910a5c5b
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
af40808f |
| 22-Jul-2022 |
Patrick Williams <patrick@stwcx.xyz> |
sdbusplus: use shorter type aliases
The sdbusplus headers provide shortened aliases for many types. Switch to using them to provide better code clarity and shorter lines. Possible replacements are for:
* bus_t
* exception_t
* manager_t
* match_t
* message_t
* object_t
* slot_t
Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
Change-Id: I9541d521bf67882215a4a66dce020e38ac2df065
|
#
8cf7496b |
| 29-Jun-2022 |
Chris Cain <cjcain@us.ibm.com> |
Re-fetch StateSensors if unable to find sensor
Saw defect after BMC reset/reload where occ-control only saw PDR for first OCC. Code change will re-fetch the sensors in the case where an expected PDR was not found.
Testing: Forced removal of sensors which triggered the fetch and saw recovery.
Change-Id: I6e180f23b5817bc9ea0575674a318a2673f66f3d
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
157467d0 |
| 24-Jun-2022 |
Chris Cain <cjcain@us.ibm.com> |
Revert clearing PDRs if host power is off
Code was added to clear the PDRs when the host was powered off, but the next power on occ-control never saw the OCC StateSensor updates. This change will be reverted to resolve this issue.
Tested on multiple machines with multiple reboots and guarded procs
Change-Id: Ibea28ede25c81f22e4e9fe2574c1668c4a81352c
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
72d01aab |
| 14-Jun-2022 |
Chris Cain <cjcain@us.ibm.com> |
Fix correlation between OCC StateSensorPDRs and procs
occ-control was not correlating the OCC Active sensors with the correct processor. Code change will now use the Sensor ID to know which OCC/proc is active. Hostboot will also be making a change to ensure that the Sensor IDs are always numbered according to processor order (p0, p1, etc)
Wait for PHYP to start before reading PLDM sensors: occ-control caches the PLDM sensor IDs to limit the dbus queries. The cache was supposed to be cleared when the OS was powered off, but the existing code only cleared it when CurrentHostState was Off. Got a defect where occ-control was using invalid/old sensor IDs when getting notifications of OCC Active sensors. This causes the app to try communicating with the wrong or invalid OCC.
Code change will clear the sensor cache anytime PHYP is not running, and will populate the cache once PHYP is running.
Tested on hardware with various boot types and resets.
Change-Id: I4b32aa848768296065d6570466475f5b17771d2e
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
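The cache policy described in this commit (clear whenever PHYP is not running, populate only once it is) can be sketched as follows. `SensorIdCache` and its methods are hypothetical names, and a single boolean stands in for "PHYP is running".

```cpp
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical cache mapping PLDM sensor IDs to OCC instance numbers.
class SensorIdCache
{
  public:
    // Drop all cached IDs as soon as the host firmware stops running.
    void hostRunningChanged(bool isRunning)
    {
        running = isRunning;
        if (!running)
        {
            ids.clear();
        }
    }

    // Only populate while the host is running; returns false otherwise.
    bool store(std::uint16_t sensorId, int occInstance)
    {
        if (!running)
        {
            return false;
        }
        ids[sensorId] = occInstance;
        return true;
    }

    std::optional<int> lookup(std::uint16_t sensorId) const
    {
        auto it = ids.find(sensorId);
        if (it == ids.end())
        {
            return std::nullopt;
        }
        return it->second;
    }

  private:
    std::map<std::uint16_t, int> ids;
    bool running = false;
};
```

Refusing to store entries while the host is down avoids repopulating the cache with IDs from a previous boot.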
|
#
8b508bfb |
| 26-May-2022 |
Chris Cain <cjcain@us.ibm.com> |
Reuse MCTP instance IDs for PLDM retries
Previously, occ-control requested a new instance ID whenever it timed out waiting for the PLDM response. With this change, it will not request a new ID unless the prior response was received successfully.
Change-Id: I8a3509d7ea583bb706ad2ef41bf90cc5d0f0275b
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
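A minimal sketch of the retry policy, assuming a hypothetical `InstanceIdHolder` with a trivial counter standing in for the real instance ID allocator: the held ID is reused across timeouts and released only after a response arrives.

```cpp
#include <cstdint>
#include <optional>

class InstanceIdHolder
{
  public:
    // Return the held ID if there is one; otherwise allocate a new one.
    std::uint8_t acquire()
    {
        if (!current)
        {
            current = allocate();
        }
        return *current;
    }

    // Release the ID only once a response has actually been received.
    void responseReceived()
    {
        current.reset();
    }

  private:
    // Stand-in allocator (assumption): PLDM instance IDs are 5 bits wide, so
    // wrap at 32. The real application obtains IDs from a shared allocator.
    std::uint8_t allocate()
    {
        std::uint8_t id = next;
        next = static_cast<std::uint8_t>((next + 1) & 0x1F);
        return id;
    }

    std::optional<std::uint8_t> current;
    std::uint8_t next = 0;
};
```

A timeout path simply calls `acquire()` again and gets the same ID back, so retries do not drain the limited ID pool.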
|