History log of /openbmc/openpower-occ-control/pldm.cpp (Results 1 – 25 of 49)
Revision Date Author Comments
# c488bac1 17-Mar-2025 Chris Cain <cjcain@us.ibm.com>

Clear flags when the host changes to powered off state

When the host is powered off, the OCCs will be stopped, so clear any
reset pending flags as well as any outstanding HRESET requests.

This will ensure the next boot will start clean and not react to
something that happened on prior boot.

Tested on Rainier for several error scenarios.

Change-Id: Ie4156975a844e823787f7162ee0542d7f099bd12
Signed-off-by: Chris Cain <cjcain@us.ibm.com>



# 92dfb271 13-Feb-2025 Chris Cain <cjcain@us.ibm.com>

Ignore HRESET_NOT_READY state until HRESET completes

After HRESET has been requested, code will wait for HRESET_READY or
HRESET_FAILED status before attempting OCC communication again.

Code will also not clear the outstandingHReset until READY/FAILED, since
the reset should still be in progress.

OCC comm will get disabled before the HRESET and re-enabled if
reset completes successfully. If failed, no further comm will work.

My testing found that pldm instance ids were not getting freed
automatically when receiving a response. So this change will also free
those IDs when the response is received.

Tested on Rainier with recoverable and unrecoverable SBE injects.

'''
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: readOccState: Failed to read OCC0 state: Read error on I/O operation - failbit badbit
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: Status::readOccState: open/read failed trying to read OCC0 state (open errno=0)
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: readOccState: Failed to read OCC0 state: Read error on I/O operation - failbit badbit
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: Status::readOccState: open/read failed trying to read OCC0 state (open errno=11)
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: SBE timeout, requesting HRESET (OCC0)
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: Status::occActive OCC0 changed to False
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: got id 15 and set PldmInstanceId to 15
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: openMctpDemuxTransport: pldmFd has fd=9
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: sendPldm: calling pldm_transport_send_msg(OCC0, instance:15, 8 bytes, timeout 30)
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: pldmResetCallback: calling pldm_transport_recv_msg() instance:15
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: pldmResetCallback: pldm_transport_recv_msg() rsp was 4 bytes
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: pldmResetCallback: Reset has been successfully started
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: Freed PLDM instance ID 15
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: pldm: HRESET is NOT READY (OCC0)
Feb 13 18:34:30 p10bmc openpower-occ-control[22740]: HRESET succeeded (OCC0)
Feb 13 18:34:30 p10bmc openpower-occ-control[22740]: Status::occActive OCC0 changed to True
Feb 13 18:34:30 p10bmc openpower-occ-control[22740]: validateOccMaster: OCC0 is master of 4 OCCs
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: Status::readOccState: OCC0 state 0x3 (lastState: 0x0)
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: PowerMode::sendModeChange: SET_MODE(12,0) command to OCC0 (9 bytes)
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: Idle Power Saver Parameters: enabled:True, enter:8%/240s, exit:12%/10s
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: PowerMode::sendIpsData: SET_CFG_DATA[IPS] command to OCC0 (12 bytes)
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: Status::readOccState: successfully read OCC0 state: 3
'''

Change-Id: I7e5bc60576e4e8fa6cba4253be535220cb8048ec
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
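
The status handling described in this commit can be pictured as a small state tracker. This is a minimal sketch, not the code in pldm.cpp: the type `HResetTracker`, the enum `HResetStatus`, and the `commEnabled` member are hypothetical names chosen for illustration, and only `outstandingHReset` echoes a name from the commit message.

```cpp
#include <cassert>

// Hypothetical status values mirroring HRESET_NOT_READY / READY / FAILED.
enum class HResetStatus
{
    NotReady,
    Ready,
    Failed
};

// Hypothetical tracker; the real logic lives in pldm.cpp and may differ.
struct HResetTracker
{
    bool outstandingHReset = false; // an HRESET request is in flight
    bool commEnabled = true;        // OCC communication is allowed

    // OCC comm is disabled before the HRESET is requested.
    void request()
    {
        commEnabled = false;
        outstandingHReset = true;
    }

    void onStatus(HResetStatus status)
    {
        if (!outstandingHReset)
        {
            return; // assumption: status is ignored when nothing is pending
        }
        assert(!commEnabled); // comm was disabled by request()
        switch (status)
        {
            case HResetStatus::NotReady:
                break; // reset still in progress: keep the flag set
            case HResetStatus::Ready:
                outstandingHReset = false;
                commEnabled = true; // re-enable comm on success
                break;
            case HResetStatus::Failed:
                outstandingHReset = false; // comm stays disabled
                break;
        }
    }
};
```

The point of the sketch is that NOT_READY changes nothing, so the pending flag is cleared only on READY or FAILED.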



# 2d6ec909 01-Feb-2025 Patrick Williams <patrick@stwcx.xyz>

clang-format: update latest spec and reformat

Copy the latest format file from the docs repository and apply.

Change-Id: I289fffda3d8b39bb9b16eab30928d0aa7c8e1821
Signed-off-by: Patrick Williams <patrick@stwcx.xyz>



# 37abe9be 31-Oct-2024 Chris Cain <cjcain@us.ibm.com>

Update occ-control to use lg2 for all logging

Convert existing log<level>() trace statements to lg2::level()

Testing: Verified on Rainier - captured journal traces before and after
commit during boots, mode, pcap and ips changes.

Change-Id: I318fa7bf3902c641b0c28b09190db4b61d0a2fa9
Signed-off-by: Chris Cain <cjcain@us.ibm.com>



# f0295f52 12-Sep-2024 Chris Cain <cjcain@us.ibm.com>

Improve BMC error handling for OCC comm failures

- Delay starting OCC reset until all OCCs have been detected (or
timeout). This prevents multiple resets from being triggered and
helps detect when the reset has completed (the active sensor is set
after the reset is complete)
- Wait for PLDM response to OCC reset and HRESET requests and retry if
they fail
- If HRESET returns NOT_READY, collect SBE FFDC and try OCC reset. A
persistent failure will put the system in safe state.

- Prevent overwriting dvfs over-temp filename for p10 and beyond since
that old file is only present in old kernel
- Prevent assert when opening sysfs files. (added catch and then created
an OCC Comm failure PEL, which will force an OCC reset.)
- Check return code after reading sysfs files to confirm success. If
read fails, try reset to recover.

- Updated traces to include which processor/OCC encountered issues.
- Better recovery to close windows that were leaving system in partial
good state.

JIRA: PFES-66
Change-Id: I0b087d0e05bd8562682062e1c662f9e18164a720
Signed-off-by: Chris Cain <cjcain@us.ibm.com>



# 88811adc 27-Aug-2024 Eddie James <eajames@linux.ibm.com>

Fix missing PLDM transport configuration

The PLDM transport option wasn't actually used. In addition,
throttle the PLDM open failure trace.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Change-Id: I6c5f9151b3230171a1f050d54bbb143334577ab2



# 6213f199 01-Jul-2024 Lakshmi Yadlapati <lakshmiy@us.ibm.com>

Add kernel MCTP (AF_MCTP) support and transport-implementation option

- Added support for kernel MCTP (AF_MCTP) to enable MCTP communication
using AF_MCTP sockets.

- Introduced a new configuration option 'transport-implementation'

The 'transport-implementation' option can be set to either:
- 'mctp-demux': Uses the existing mctp-demux transport method.
- 'af-mctp': Uses the new kernel AF_MCTP transport method.

Change-Id: I2978273fe4579d1dce00368dabb7f90815dbbce8
Signed-off-by: Lakshmi Yadlapati <lakshmiy@us.ibm.com>
Signed-off-by: Eddie James <eajames@linux.ibm.com>
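
As an illustration of the two accepted values, the sketch below maps an option string to a transport choice. It assumes nothing about how the project actually consumes 'transport-implementation' (that may well happen at build time); the enum `Transport` and the function `parseTransport` are hypothetical names, and only the strings 'mctp-demux' and 'af-mctp' come from the commit message.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical enum for the two transport methods named in the commit.
enum class Transport
{
    MctpDemux, // existing mctp-demux transport
    AfMctp     // kernel AF_MCTP sockets
};

// Hypothetical helper: returns std::nullopt for any unrecognized value.
constexpr std::optional<Transport> parseTransport(std::string_view value)
{
    if (value == "mctp-demux")
    {
        return Transport::MctpDemux;
    }
    if (value == "af-mctp")
    {
        return Transport::AfMctp;
    }
    return std::nullopt;
}

// Usage sketch.
inline void transportExample()
{
    assert(parseTransport("af-mctp") == Transport::AfMctp);
    assert(!parseTransport("something-else").has_value());
}
```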



# d7542c83 16-Aug-2024 Patrick Williams <patrick@stwcx.xyz>

clang-format: re-format for clang-18

clang-format-18 isn't compatible with the clang-format-17 output, so we
need to reformat the code with the latest version. The way clang-18
handles lambda formatting also changed, so we have made changes to the
organization default style format to better handle lambda formatting.

See I5e08687e696dd240402a2780158664b7113def0e for updated style.
See Iea0776aaa7edd483fa395e23de25ebf5a6288f71 for clang-18 enablement.

Change-Id: I94e2bfdc8fae9bc14e30c701a0e622709ee9b0fe
Signed-off-by: Patrick Williams <patrick@stwcx.xyz>



# 52328cb4 14-Feb-2023 Rashmica Gupta <rashmica@linux.ibm.com>

Move to libpldm pldm_transport APIs

- Replaced the deprecated pldm transport APIs with the new libpldm
pldm_transport APIs.

This change migrates the application off of the deprecated "requester"
APIs in libpldm.

We don't currently have the infrastructure in place to get the correct
TIDs, so to keep everything working as before use the EID as the TID in
the EID-to-TID mapping.

Change-Id: Iedbfe936a710d37f75e737c3d307295ff55cf07b
Signed-off-by: Rashmica Gupta <rashmica@linux.ibm.com>
Signed-off-by: Lakshmi Yadlapati <lakshmiy@us.ibm.com>



# db38e91a 24-May-2023 Rashmica Gupta <rashmica@linux.ibm.com>

Move to libpldm instance id APIs

Signed-off-by: Rashmica Gupta <rashmica@linux.ibm.com>
Change-Id: I2955097a78c673f65054fa9bff1ef5243da136a2
Signed-off-by: Lakshmi Yadlapati <lakshmiy@us.ibm.com>


# aeba51cd 16-Feb-2023 Rashmica Gupta <rashmica@linux.ibm.com>

Change "mctp instance id" to "pldm instance id"

It's not a MCTP thing.

Change-Id: I922d28f848c8948d43ea4b8926d2310c8fb50228
Signed-off-by: Rashmica Gupta <rashmica@linux.ibm.com>
Signed-off-by: Lakshmi Yadlapati <lakshmiy@us.ibm.com>



# 97476a1e 19-Jun-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

pldm: Replace deprecated libpldm header path

There are more OEMs than IBM contributing to libpldm, so the OEM headers
were restructured. Replace the deprecated IBM OEM header path with the
namespaced path.

The patch was generated with the coccinelle[1] script from [2]:

```
$ spatch \
--sp-file .../libpldm/origin/evolutions/current/oem-ibm-header-compat.cocci \
--in-place \
$(git ls-files | grep -E '\.[ch](pp)?')
```

[1]: https://coccinelle.gitlabpages.inria.fr/website/
[2]: https://gerrit.openbmc.org/c/openbmc/libpldm/+/72202

Change-Id: Ib3b4c5a650a06e816155a91ac3a83c2176639dc7
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>



# 755af102 27-Feb-2024 Chris Cain <cjcain@us.ibm.com>

Handle other PLDM_STATE_SET_OPERATIONAL states

- Code will no longer assume the OCC is not running if an unexpected
state is received. It will continue to look for a good/known state.
- Added code that will throttle the occ-control pldm journal traces
if unable to read the OCC active sensor states. In some error
conditions, this tracing would flood the trace and the repeated traces
are not helpful for debug.
- Change some journal entries to ERR when the state indicated that the
system was in safe mode (OCCs disabled)
- If request for occ active sensor state was sent, and then a PLDM
sensor event comes in for that instance, the event status is used and
the response is ignored.
- Added README for occ-control

Signed-off-by: Chris Cain <cjcain@us.ibm.com>
Change-Id: Ic26f1d0c4dc59e7a61b965b052d649e4bc152fde
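
One common way to throttle a repeating trace is to log the first few occurrences and then only every Nth one. The sketch below shows that idea under stated assumptions: the class `TraceThrottle` and its parameters are hypothetical, and the commit does not say which throttling policy occ-control actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: allows the first `burst` traces, then one trace
// per `interval` further failures.
class TraceThrottle
{
  public:
    TraceThrottle(std::uint32_t burst, std::uint32_t interval) :
        burst_(burst), interval_(interval)
    {
        assert(interval_ > 0); // avoid modulo by zero
    }

    // Call once per failure; returns true if this failure should be traced.
    bool shouldTrace()
    {
        ++count_;
        return count_ <= burst_ || (count_ - burst_) % interval_ == 0;
    }

    // Call after a successful read so the next failure is traced again.
    void reset()
    {
        count_ = 0;
    }

  private:
    std::uint32_t burst_;
    std::uint32_t interval_;
    std::uint32_t count_ = 0;
};
```

A caller would wrap the journal trace in `if (throttle.shouldTrace())` and call `reset()` once the sensor state can be read again.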



# 48002498 13-Feb-2024 Patrick Williams <patrick@stwcx.xyz>

prefer std::format over fmt

Switch to std::format to remove the dependency on fmt.

Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
Change-Id: Id3a1295ba8a90fb756cfc500892dcc5b3235e27b


# 159a2279 27-Sep-2023 Pavithra Barithaya <pavithra.b@ibm.com>

Minor fix in pldmClose() API

libpldm has introduced new transport APIs for open(), send(), close(),
etc. The present APIs are being deprecated in libpldm:
https://github.com/openbmc/libpldm/blob/main/src/requester/pldm.c#L30
This commit makes use of the close API because openpower-occ-control
causes confusion with the fd resources in the HRESET path.
This change is needed until openpower-occ-control makes use of the
new transport APIs in its code.

Tested: Error injection which initiates HRESET was successful.

Change-Id: Ic54b84001407935bee51c51e633e548366c6accc
Signed-off-by: Pavithra Barithaya <pavithra.b@ibm.com>



# 5161a028 15-Aug-2023 Chris Cain <cjcain@us.ibm.com>

Fix trace of pldm return codes for yocto update

Fix Yocto update compile failure

Change-Id: I392cefaefb5aa3d44e83b1edb7df0dbaaf11d86b
Signed-off-by: Chris Cain <cjcain@us.ibm.com>


# a49c987e 10-May-2023 Patrick Williams <patrick@stwcx.xyz>

clang-format: copy latest and re-format

clang-format-16 has some backwards incompatible changes that require
additional settings for best compatibility and re-running the formatter.
Copy the latest .clang-format from the docs repository and reformat the
repository.

Change-Id: I39f8c77091744c8516e043054b4ed7207d85aa08
Signed-off-by: Patrick Williams <patrick@stwcx.xyz>



# 082a6ca7 21-Mar-2023 Chris Cain <cjcain@us.ibm.com>

Handle OCC active sensor updates prior to host runtime

On some systems, occ-control was getting notified that the OCCs were
active before the host reached runtime state. This would prevent
occ-control from starting communication with the OCCs.

The fix will ignore the early OCC Active sensor enabled messages and
once the host gets to runtime, it will re-query the sensors to ensure
they are still active.

Verified on fresh boot, warm boot, BMC reset, warm boot after BMC reset
on a system that exhibited the early sensors and one that did not.

Also removes an unnecessary InternalFailure when a sensor was cleared,
but no OCC objects were found.

Change-Id: Idb6c107cf83d12272aef9179045de73298e6d6b6
Signed-off-by: Chris Cain <cjcain@us.ibm.com>



# 7b00cde2 14-Mar-2023 Chris Cain <cjcain@us.ibm.com>

Check host state before attempting OCC communication

HBRT sends a PLDM message to occ-control when the OCCs go active. If the
system is powered down close to that time, there is a window where the
PLDM message could get sent even when the host is no longer running.
This commit will confirm the host is actually running when the message
is received; if not, the OCC communication will not be allowed.

Change-Id: Ia209a5893e8294f1b10bcb143bc59831205223ab
Signed-off-by: Chris Cain <cjcain@us.ibm.com>



# c9dc4418 06-Mar-2023 Chris Cain <cjcain@us.ibm.com>

Trace PLDM response on unexpected states

Commit will trace the PLDM response packet when querying the OCC active
state sensors if the state is not one of the expected values:
PLDM_STATE_SET_OPERATIONAL_RUNNING_STATUS_IN_SERVICE
PLDM_STATE_SET_OPERATIONAL_RUNNING_STATUS_STOPPED
PLDM_STATE_SET_OPERATIONAL_RUNNING_STATUS_DORMANT

Change-Id: I87d144b68aed76e473ebf28348ade3df910a5c5b
Signed-off-by: Chris Cain <cjcain@us.ibm.com>



# af40808f 22-Jul-2022 Patrick Williams <patrick@stwcx.xyz>

sdbusplus: use shorter type aliases

The sdbusplus headers provide shortened aliases for many types.
Switch to using them to provide better code clarity and shorter
lines. Possible replacements are for:
* bus_t
* exception_t
* manager_t
* match_t
* message_t
* object_t
* slot_t

Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
Change-Id: I9541d521bf67882215a4a66dce020e38ac2df065



# 8cf7496b 29-Jun-2022 Chris Cain <cjcain@us.ibm.com>

Re-fetch StateSensors if unable to find sensor

Saw defect after BMC reset/reload where occ-control only saw PDR for
first OCC. Code change will re-fetch the sensors in the case where an
expected PDR was not found.

Testing:
Forced removal of sensors which triggered the fetch and saw recovery.

Change-Id: I6e180f23b5817bc9ea0575674a318a2673f66f3d
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
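
The re-fetch behavior can be sketched as a lookup that refreshes its cache once when an expected entry is missing. This is an illustration only: `SensorLookup`, `SensorMap`, and the injected `Fetch` callback are hypothetical stand-ins for however pldm.cpp obtains the StateSensor PDRs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <utility>

// Hypothetical mapping from OCC instance number to PLDM sensor ID.
using SensorMap = std::map<std::uint8_t, std::uint16_t>;

class SensorLookup
{
  public:
    // Hypothetical callback that fetches the current set of sensors.
    using Fetch = std::function<SensorMap()>;

    explicit SensorLookup(Fetch fetch) : fetch_(std::move(fetch))
    {
        assert(static_cast<bool>(fetch_)); // a fetch callback is required
    }

    // Returns the sensor ID for an OCC, re-fetching once if it is missing.
    std::optional<std::uint16_t> sensorFor(std::uint8_t occInstance)
    {
        auto it = cache_.find(occInstance);
        if (it == cache_.end())
        {
            cache_ = fetch_(); // expected entry missing: refresh the cache
            it = cache_.find(occInstance);
        }
        if (it == cache_.end())
        {
            return std::nullopt; // still unknown after the re-fetch
        }
        return it->second;
    }

  private:
    Fetch fetch_;
    SensorMap cache_;
};
```

Injecting the fetch callback keeps the sketch testable without D-Bus or PLDM; a lookup that hits the cache never triggers a fetch.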



# 157467d0 24-Jun-2022 Chris Cain <cjcain@us.ibm.com>

Revert clearing PDRs if host power is off

Code was added to clear the PDRs when the host was powered off, but the
next power on occ-control never saw the OCC StateSensor updates.
This change will be reverted to resolve this issue.

Tested on multiple machines with multiple reboots and guarded procs

Change-Id: Ibea28ede25c81f22e4e9fe2574c1668c4a81352c
Signed-off-by: Chris Cain <cjcain@us.ibm.com>



# 72d01aab 14-Jun-2022 Chris Cain <cjcain@us.ibm.com>

Fix correlation between OCC StateSensorPDRs and procs

occ-control was not correlating the OCC Active sensors with the correct
processor. Code change will now use the Sensor ID to know which
OCC/proc is active. Hostboot will also be making a change to ensure that
the Sensor IDs are always numbered according to processor order (p0, p1,
etc)

Wait for PHYP to start before reading PLDM sensors:
occ-control caches the PLDM sensor IDs to limit the dbus queries.
The cache was supposed to be cleared when the OS was powered off, but
the existing code only cleared it when CurrentHostState was Off.
Got a defect where occ-control was using invalid/old sensor IDs when
getting notifications of OCC Active sensors. This causes the app to try
communicating with the wrong or invalid OCC.

Code change will clear the sensor cache anytime PHYP is not running, and
will populate the cache once PHYP is running.

Tested on hardware with various boot types and resets.

Change-Id: I4b32aa848768296065d6570466475f5b17771d2e
Signed-off-by: Chris Cain <cjcain@us.ibm.com>



# 8b508bfb 26-May-2022 Chris Cain <cjcain@us.ibm.com>

Reuse MCTP instance IDs for PLDM retries

occ-control will request a new instance ID when it times out waiting for
the PLDM response. Code change will not request a new ID unless the
prior response was received successfully.

Change-Id: I8a3509d7ea583bb706ad2ef41bf90cc5d0f0275b
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
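
The retry rule in this commit amounts to holding on to an instance ID until a response arrives. A minimal sketch, assuming a hypothetical `InstanceIdCache` class with an injected allocator (the real code requests IDs from the PLDM stack, which is not modeled here):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

class InstanceIdCache
{
  public:
    // Hypothetical allocator standing in for the real instance ID request.
    using Allocator = std::function<std::uint8_t()>;

    explicit InstanceIdCache(Allocator allocate) :
        allocate_(std::move(allocate))
    {
        assert(static_cast<bool>(allocate_)); // an allocator is required
    }

    // Returns the ID for the next send. A retry after a timeout reuses
    // the ID that is still held instead of requesting a new one.
    std::uint8_t acquire()
    {
        if (!current_)
        {
            current_ = allocate_();
        }
        return *current_;
    }

    // Call only when a response was received successfully.
    void responseReceived()
    {
        current_.reset();
    }

  private:
    Allocator allocate_;
    std::optional<std::uint8_t> current_;
};
```

With this shape, a timeout path simply calls `acquire()` again and gets the same ID; only `responseReceived()` lets the next request obtain a fresh one.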


