#
3ece5b99 |
| 10-Jan-2025 |
Chris Cain <cjcain@us.ibm.com> |
Add processor callout when BMC has no comm to an OCC
The non-informational PEL indicating that communication has been lost to the OCC did not have any callouts. This commit will add a callout for the processor that it was unable to communicate with.
Example:
'''
"Error Details": {
    "Message": "BMC failed to communicate with the OCC"
},
"Valid Word Count": "0x09",
"Reference Code": "BD572684",
"Hex Word 2": "00000055",
"Hex Word 3": "2E2D0010",
"Hex Word 4": "00000000",
"Hex Word 5": "00000000",
"Hex Word 6": "00000000",
"Hex Word 7": "00000000",
"Hex Word 8": "00000000",
"Hex Word 9": "00000000",
"Callout Section": {
    "Callout Count": "1",
    "Callouts": [{
        "FRU Type": "Normal Hardware FRU",
        "Priority": "Mandatory, replace all with this type as a unit",
        "Location Code": "U78DA.ND0.WZS003T-P0-C15",
        "Part Number": "F201110",
        "CCIN": "5C67",
        "Serial Number": "YA39AAAA1828"
    }]
}
'''
Verified on Rainier.
Change-Id: Ia26668af568238d78263b81d16fd30a849f67134 Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
37abe9be |
| 31-Oct-2024 |
Chris Cain <cjcain@us.ibm.com> |
Update occ-control to use lg2 for all logging
Convert existing log<level>() trace statements to lg2::level()
Testing: Verified on Rainier - captured journal traces before and after commit during boots, mode, pcap and ips changes.
Change-Id: I318fa7bf3902c641b0c28b09190db4b61d0a2fa9 Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
1fe436dc |
| 10-Oct-2024 |
Chris Cain <cjcain@us.ibm.com> |
Disable Idle Power Saver support in efficiency modes
IPS will not be published on DBUS when the PowerMode is set to one of the newer efficiency modes: EfficiencyFavorPower EfficiencyFavorPerformance (OEM only)
This prevents the Redfish interface from allowing GET/PATCH commands and gives the GUI a way to know when to suppress displaying the parameters on the customer GUI.
Testing: Verified on Rainier.
When in a non-efficiency mode, the Redfish query returns IPS data: (GET /redfish/v1/Systems/system)
'''
...
"Id": "system",
"IdlePowerSaver": {
    "Enabled": true,
    "EnterDwellTimeSeconds": 240,
    "EnterUtilizationPercent": 8,
    "ExitDwellTimeSeconds": 10,
    "ExitUtilizationPercent": 12
},
"IndicatorLED": "Off",
...
'''
When in an efficiency mode, the Redfish query does not return IPS data.
'''
...
"Id": "system",
"IndicatorLED": "Off",
...
'''
Verified across re-ipls, OCC resets, app restarts.
Change-Id: I45bb0d8e97dab33a1a66c0d791f7bb4848bfce41 Signed-off-by: Chris Cain <cjcain@us.ibm.com>
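The mode check described in this commit can be sketched as a small predicate. This is only an illustration, not the code from the commit: the enum and the function name `ipsSupported` are hypothetical, and `Static` and `PowerSaving` are placeholder non-efficiency modes added for the example. Only the two efficiency mode names and `MaximumPerformance` appear in this history.

```cpp
// Hypothetical mode list for illustration only.
enum class PowerMode
{
    Static,                     // assumed placeholder
    PowerSaving,                // assumed placeholder
    MaximumPerformance,
    EfficiencyFavorPower,
    EfficiencyFavorPerformance,
};

// Returns true when Idle Power Saver data should be published for the mode.
// The two efficiency modes suppress IPS; every other mode keeps it.
constexpr bool ipsSupported(PowerMode mode)
{
    switch (mode)
    {
        case PowerMode::EfficiencyFavorPower:
        case PowerMode::EfficiencyFavorPerformance:
            return false;
        default:
            return true;
    }
}
```

A caller would consult the predicate before creating or removing the IPS object, so that Redfish has nothing to report while an efficiency mode is active.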
|
#
f0295f52 |
| 12-Sep-2024 |
Chris Cain <cjcain@us.ibm.com> |
Improve BMC error handling for OCC comm failures
- Delay starting the OCC reset until all OCCs have been detected (or a timeout expires). This prevents multiple resets from being triggered and helps detect when the reset has completed (the active sensor is set after the reset completes).
- Wait for the PLDM response to OCC reset and HRESET requests, and retry if they fail.
- If HRESET returns NOT_READY, collect SBE FFDC and try an OCC reset. A persistent failure will put the system in safe state.
- Prevent overwriting the dvfs over-temp filename for p10 and beyond, since the old file is only present in old kernels.
- Prevent an assert when opening sysfs files (added a catch that creates an OCC Comm failure PEL, which forces an OCC reset).
- Check the return code after reading sysfs files to confirm success. If a read fails, try a reset to recover.
- Updated traces to include which processor/OCC encountered issues.
- Better recovery to close windows that were leaving the system in a partial good state.
JIRA: PFES-66 Change-Id: I0b087d0e05bd8562682062e1c662f9e18164a720 Signed-off-by: Chris Cain <cjcain@us.ibm.com>
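The first item of this commit (hold the reset until all OCCs are detected or a timeout expires) reduces to a simple gate. The sketch below is an assumption about how such a gate could look; `mayStartReset` and its parameters are hypothetical names and are not taken from occ-control.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: decide whether a pending OCC reset may start.
// The reset is held back until every expected OCC has been detected,
// or until the time spent waiting has reached the timeout.
inline bool mayStartReset(std::size_t detected, std::size_t expected,
                          std::chrono::seconds waited,
                          std::chrono::seconds timeout)
{
    return detected >= expected || waited >= timeout;
}
```

Gating on a single condition like this means several early failures collapse into one reset request instead of each triggering its own.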
|
#
d7542c83 |
| 16-Aug-2024 |
Patrick Williams <patrick@stwcx.xyz> |
clang-format: re-format for clang-18
clang-format-18 isn't compatible with the clang-format-17 output, so we need to reformat the code with the latest version. The way clang-18 handles lambda formatting also changed, so we have made changes to the organization default style format to better handle lambda formatting.
See I5e08687e696dd240402a2780158664b7113def0e for updated style. See Iea0776aaa7edd483fa395e23de25ebf5a6288f71 for clang-18 enablement.
Change-Id: I94e2bfdc8fae9bc14e30c701a0e622709ee9b0fe Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
|
#
48002498 |
| 13-Feb-2024 |
Patrick Williams <patrick@stwcx.xyz> |
prefer std::format over fmt
Switch to std::format to remove the dependency on fmt.
Signed-off-by: Patrick Williams <patrick@stwcx.xyz> Change-Id: Id3a1295ba8a90fb756cfc500892dcc5b3235e27b
|
#
c86d80fa |
| 04-May-2023 |
Chris Cain <cjcain@us.ibm.com> |
Implement processor throttle dbus properties
- create processor throttle dbus objects for each OCC (processor)
- update throttle properties based on OCC poll response data or safe mode status.
Throttle data will be made available via Redfish
NAME                                        TYPE      SIGNATURE RESULT/VALUE FLAGS
org.freedesktop.DBus.Introspectable         interface -         -            -
.Introspect                                 method    -         s            -
org.freedesktop.DBus.Peer                   interface -         -            -
.GetMachineId                               method    -         s            -
.Ping                                       method    -         -            -
org.freedesktop.DBus.Properties             interface -         -            -
.Get                                        method    ss        v            -
.GetAll                                     method    s         a{sv}        -
.Set                                        method    ssv       -            -
.PropertiesChanged                          signal    sa{sv}as  -            -
xyz.openbmc_project.Control.Power.Throttle  interface -         -            -
.ThrottleCauses                             property  as        0            emits-change
.Throttled                                  property  b         false        emits-change
Example of throttled processor (due to a power limit):
as 1 "xyz.openbmc_project.Control.Power.Throttle.ThrottleReasons.PowerLimit"
Change-Id: I0af9d82fab9d694427d0adaa45f4a372d25fbc12 Signed-off-by: Chris Cain <cjcain@us.ibm.com>
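The relationship between the two properties shown above can be sketched as plain data. This is a simplified illustration: it assumes `Throttled` is true exactly when at least one cause is reported, which matches the two samples in the commit message but is not confirmed as a general rule. `ThrottleState` and `makeThrottleState` are hypothetical names; only the PowerLimit reason string comes from the commit.

```cpp
#include <string>
#include <utility>
#include <vector>

// Reason string taken from the example in the commit message.
constexpr const char* powerLimitReason =
    "xyz.openbmc_project.Control.Power.Throttle.ThrottleReasons.PowerLimit";

// Hypothetical plain-data mirror of the Throttled / ThrottleCauses properties.
struct ThrottleState
{
    bool throttled = false;
    std::vector<std::string> causes;
};

// Hypothetical helper: derive both property values from the active causes.
// Assumption: a processor counts as throttled when any cause is present.
inline ThrottleState makeThrottleState(std::vector<std::string> activeCauses)
{
    ThrottleState state;
    state.throttled = !activeCauses.empty();
    state.causes = std::move(activeCauses);
    return state;
}
```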
|
#
a49c987e |
| 10-May-2023 |
Patrick Williams <patrick@stwcx.xyz> |
clang-format: copy latest and re-format
clang-format-16 has some backwards incompatible changes that require additional settings for best compatibility and re-running the formatter. Copy the latest .clang-format from the docs repository and reformat the repository.
Change-Id: I39f8c77091744c8516e043054b4ed7207d85aa08 Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
|
#
52cce032 |
| 09-Feb-2023 |
Chris Cain <cjcain@us.ibm.com> |
Remove Hypervisor check for PowerVM target
Remove the dbus query to determine if system is PowerVM. It is assumed that all systems are PowerVM.
Change-Id: I50d3bb76b469383618aa280c591459674f337130 Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
af40808f |
| 22-Jul-2022 |
Patrick Williams <patrick@stwcx.xyz> |
sdbusplus: use shorter type aliases
The sdbusplus headers provide shortened aliases for many types. Switch to using them to provide better code clarity and shorter lines. Possible replacements are for:
* bus_t
* exception_t
* manager_t
* match_t
* message_t
* object_t
* slot_t
Signed-off-by: Patrick Williams <patrick@stwcx.xyz> Change-Id: I9541d521bf67882215a4a66dce020e38ac2df065
|
#
9789e71f |
| 25-May-2022 |
Eddie James <eajames@linux.ibm.com> |
Log a PEL for communication, presence mismatch, and safe state errors
Add code to log a PEL in various error scenarios. Refactor some of the error handling to get the return code out of the driver.
Signed-off-by: Eddie James <eajames@linux.ibm.com> Change-Id: Ifd91cfc063718e484ec8886df8357d115c6b41e3
|
#
c8dd4599 |
| 12-May-2022 |
Sheldon Bailey <baileysh@us.ibm.com> |
Fix Max fan speed with Deconfigured DIMM
Fan Speeds Remain at Maximum Speed With Deconfigured DIMMs
Tested: Injected Errors on DIMMs.
Signed-off-by: Sheldon Bailey <baileysh@us.ibm.com> Change-Id: I797698f888cd9fb431d4f490c144eb5ad942bcd8
|
#
6d8f37a2 |
| 29-Apr-2022 |
Chris Cain <cjcain@us.ibm.com> |
Ensure occActive called after device enabled
I noticed that the power cap bounds were not getting updated. This happened because the device.master() call was made before the occActive flag was set.
The device object now prevents the device from triggering sysfs reads if the occActive flag is not set. This change will set occActive right after enabling the device.
Also prevent extra calls to checkAllActiveSensors by only setting the waitingForAll flag once.
Change-Id: If19771e53b64dd1d5e27e7ae153bd56e0e070aff Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
bd551de3 |
| 26-Apr-2022 |
Chris Cain <cjcain@us.ibm.com> |
Queue active sensor updates before status objs created
If an OCC Active sensor update comes in before the Status objects have been created, the value is lost. This commit will save that sensor value until the status objects are created. Also, prevent querying the master sysfs file if OCCs are not yet activated.
Change-Id: I8c0d2686024278ec7435e19cf99f59802e1ecceb Signed-off-by: Chris Cain <cjcain@us.ibm.com>
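The queueing behaviour described in this commit can be sketched as a small cache that holds early sensor values and replays them once a consumer exists. This is a minimal illustration under assumptions: `ActiveSensorCache`, `update` and `attach` are hypothetical names, and the real change operates on the occ-control Status objects rather than a generic callback.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical buffer for OCC Active sensor values that arrive early.
class ActiveSensorCache
{
  public:
    using Handler = std::function<void(std::uint8_t instance, bool active)>;

    // Record an update. Once a handler is attached, forward it immediately;
    // before that, keep only the latest value for each OCC instance.
    void update(std::uint8_t instance, bool active)
    {
        if (handler)
        {
            handler(instance, active);
            return;
        }
        pending[instance] = active;
    }

    // Called once the status objects exist: replay saved values, then go live.
    void attach(Handler h)
    {
        handler = std::move(h);
        for (const auto& entry : pending)
        {
            handler(entry.first, entry.second);
        }
        pending.clear();
    }

  private:
    Handler handler;
    std::map<std::uint8_t, bool> pending;
};
```

Keeping only the most recent value per instance is a design choice of this sketch; it avoids replaying stale transitions when several updates arrive before the status objects are ready.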
|
#
aced3098 |
| 22-Apr-2022 |
Eddie James <eajames@linux.ibm.com> |
Switch to setting Device active
With the latest Linux driver, the hwmon device is always bound, so instead use the occ_active attribute to tell the driver when the OCC has gone active.
Signed-off-by: Eddie James <eajames@linux.ibm.com> Change-Id: Ia89b517d90a1947f87bb5937bf0d4c1b50191201
|
#
bae4d07e |
| 28-Feb-2022 |
Chris Cain <cjcain@us.ibm.com> |
Wait for OCC Active PDR to become available
occ-control was attempting to communicate with OCCs before the OCC Active sensor PDR was available. This commit will wait until all PDRs are available before allowing communication with the OCCs. It will also read the OCC Active sensor to determine the state before allowing communication.
When Hostboot moves a system to safe mode, it sets the PLDM state of the OCC Active sensor to PLDM_STATE_SET_OPERATIONAL_RUNNING_STATUS_DORMANT. This commit also handles processing of that DORMANT status.
Change-Id: I348cf14455d9373898deec2ac4a04b3752e81c4a Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
c567dc8d |
| 01-Apr-2022 |
Chris Cain <cjcain@us.ibm.com> |
Prevent termination on OCC open/read/write errors
Instead of creating an elog and terminating, the code will now retry on an Open/Read/Write failure and, if it continues to fail, return a CommFailure to the caller. The caller can then determine how to handle it (reset the PM complex or ignore). Normally all communication errors with the OCC should trigger an OCC reset.
Change-Id: Ibf272270156edae565231ca429f15e8fca2f15bc Signed-off-by: Chris Cain <cjcain@us.ibm.com>
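The retry-then-report pattern from this commit can be sketched as follows. The names `CmdStatus` and `withRetries` are hypothetical and the attempt count is a caller-supplied assumption; the commit itself only states that the operation is retried and that a CommFailure is returned to the caller when it keeps failing.

```cpp
#include <functional>

// Hypothetical result type; the commit only names the CommFailure outcome.
enum class CmdStatus
{
    Success,
    CommFailure,
};

// Hypothetical wrapper: run an open/read/write style operation up to
// maxAttempts times. The operation reports success by returning true.
// No exception is thrown on failure; the caller decides what to do next.
inline CmdStatus withRetries(const std::function<bool()>& op, int maxAttempts)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt)
    {
        if (op())
        {
            return CmdStatus::Success;
        }
    }
    return CmdStatus::CommFailure;
}
```

Returning a status instead of terminating leaves the policy with the caller, which can request an OCC reset on `CommFailure` or ignore it where that is acceptable.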
|
#
373af757 |
| 21-Feb-2022 |
Sheldon Bailey <baileysh@us.ibm.com> |
openpower-occ-control:failure to read OCC state.
On failure to read the OCC state, set the OCC's sensors to NaN/Not Functional.
Tested: cronus error inject on OCC with and without OCC resets.
Signed-off-by: Sheldon Bailey <baileysh@us.ibm.com> Change-Id: I2a6bb6a431f09ea816979b3a482b54a28e21db53 Signed-off-by: Sheldon Bailey <baileysh@us.ibm.com>
|
#
e2d0a43c |
| 28-Mar-2022 |
Chris Cain <cjcain@us.ibm.com> |
Handle file errors when device path gets unbound
occ-control was asserting when the hwmon path would get removed due to device being unbound. This change will gracefully handle the path getting removed / added back.
Tested on Rainier by unbind/bind of devices:
echo occ-hwmon.2 > /sys/bus/platform/drivers/occ-hwmon/unbind
echo occ-hwmon.2 > /sys/bus/platform/drivers/occ-hwmon/bind
Change-Id: I46fd2c2c54868ffb8183d3dc49cd0c2751165d3b Signed-off-by: Chris Cain <cjcain@us.ibm.com>
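Handling a path that may disappear comes down to treating a failed open or read as a normal outcome rather than an assertion. The helper below is a sketch under that assumption; `readFirstLine` is a hypothetical name and is not the function changed by this commit.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper: read the first line of a sysfs-style file.
// Returns std::nullopt instead of throwing or asserting when the path is
// missing or unreadable, for example because the device was unbound.
inline std::optional<std::string> readFirstLine(const std::string& path)
{
    std::ifstream file(path); // stream exceptions are off by default
    std::string line;
    if (!file.is_open() || !std::getline(file, line))
    {
        return std::nullopt;
    }
    return line;
}
```

A caller can skip the update when the result is empty and try again on the next poll, by which time the path may have been added back.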
|
#
40501a23 |
| 14-Mar-2022 |
Chris Cain <cjcain@us.ibm.com> |
Monitor for power cap changes when occ-control started
The object that is monitoring for power cap changes needs to get created when occ-control is started so that changes are not missed. Added validation check to ensure that the value sent to the OCC matches the user configured value on dbus.
Verified on Rainier
Change-Id: I0d77f4569e5459ff58d6fc72147153133d2104a3 Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
5d66a0aa |
| 09-Feb-2022 |
Chris Cain <cjcain@us.ibm.com> |
Expose power cap min/max on dbus
The min and max power cap values will be read from sysfs files and then the data will be put on the dbus. This will allow users to know the valid range of power cap available.
The PowerCap object was moved from Manager to Status.
Change-Id: I5196cc8645f84c31a5282cf844109bae47b09bf7 Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
1be4337b |
| 09-Dec-2021 |
Chris Cain <cjcain@us.ibm.com> |
Read default power mode & IPS parameters from DBus
If the power mode or idle power saver parameters have never been set, they will be read from a set of default parameters that are on DBus. This allows individual systems to have different default settings. Mode/IPS parameters moved from Settings to OCC.Control
Code verified on Everest hardware with multiple scenarios including host reboots, BMC reboots, OCC resets.
Change-Id: Id02613455db7f382791ff37c7dc420dbea1e0906 Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
6fa848a9 |
| 24-Jan-2022 |
Chris Cain <cjcain@us.ibm.com> |
Fix occ-control terminations
1. Ensure PowerMode object created before attempting to use
2. Only call setChassisAssociation after getting successful temperature reading.
Tested on Everest hw
Change-Id: I308c072cf5ab0235086c136ba7644125de0a8c6a Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
36f9cded |
| 22-Nov-2021 |
Chris Cain <cjcain@us.ibm.com> |
Add support for OEM Power Modes
- Allow mode to be set via PassThrough interface
- Allow non-customer OEM power modes to be persisted
- Persist any OEM power mode settings
- moved mode related code from Status to PowerMode object
- merged PowerIPS into PowerMode object
Tested on Everest and Rainier.
Setting mode through PassThrough/ce-login:
busctl call org.open_power.OCC.Control /org/open_power/control/occ0 org.open_power.OCC.PassThrough SetMode yq 11 3600
Trace (via PassThrough interface):
openpower-occ-control[4440]: PassThrough::setMode() Setting Power Mode 11 (data: 3600)
openpower-occ-control[4440]: PowerMode::sendModeChange: SET_MODE(11,3600) command to OCC0 (9 bytes)
Trace (setting mode via GUI/Redfish):
openpower-occ-control[4440]: Power Mode Change Requested: xyz.openbmc_project.Control.Power.Mode.PowerMode.MaximumPerformance
openpower-occ-control[4440]: PowerMode::sendModeChange: SET_MODE(12,0) command to OCC0 (9 bytes)
Verified when system in any OEM mode that Redfish also reports OEM
Verified all modes are persisted across PM Complex resets and reboots
Change-Id: Idd0be05cb6fd74dbd0776145f212c49addd1c365 Signed-off-by: Chris Cain <cjcain@us.ibm.com>
|
#
a7b74dc3 |
| 10-Nov-2021 |
Chris Cain <cjcain@us.ibm.com> |
Clean up error cases on boot or when app restarted
- Detect when no or duplicate masters are found (force reset)
- Prevent sending commands to OCC when OCC is disabled
- Detect OCC state changes to SAFE (after 60 sec force reset)
Tested on Everest and Rainier hardware
Signed-off-by: Chris Cain <cjcain@us.ibm.com> Change-Id: I490f182405e11da207b42a0607a532566479bfd9
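The master check in the first item can be sketched as a count over the known OCCs: exactly one master is expected, and anything else calls for a reset. `OccInfo` and `masterCountValid` are hypothetical names for illustration; how the real code learns which OCC is the master is not shown here.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-OCC record; isMaster would come from the driver.
struct OccInfo
{
    int instance;
    bool isMaster;
};

// Exactly one master is expected. Zero masters or more than one master
// is treated as an error that should force a reset.
inline bool masterCountValid(const std::vector<OccInfo>& occs)
{
    const auto masters =
        std::count_if(occs.begin(), occs.end(),
                      [](const OccInfo& occ) { return occ.isMaster; });
    return masters == 1;
}
```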
|