#
4fa67aa1 |
| 03-Feb-2025 |
Patrick Williams <patrick@stwcx.xyz> |
clang-format: update latest spec and reformat
Copy the latest format file from the docs repository and apply.
Change-Id: If152304b21dd2daaa2f79255a4f98218615efb05 Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
|
#
dfddd648 |
| 16-Aug-2024 |
Patrick Williams <patrick@stwcx.xyz> |
clang-format: re-format for clang-18
clang-format-18 isn't compatible with the clang-format-17 output, so we need to reformat the code with the latest version. The way clang-18 handles lambda formatting also changed, so we have made changes to the organization default style format to better handle lambda formatting.
See I5e08687e696dd240402a2780158664b7113def0e for updated style. See Iea0776aaa7edd483fa395e23de25ebf5a6288f71 for clang-18 enablement.
Change-Id: Ica590f8613f1fb89ab1ca676ac51c1cc7e38d67f Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
|
#
5e15c3ba |
| 20-Oct-2023 |
Patrick Williams <patrick@stwcx.xyz> |
clang-format: copy latest and re-format
clang-format-17 has some backwards incompatible changes that require additional settings for best compatibility and re-running the formatter. Copy the latest .clang-format from the docs repository and reformat the repository.
Change-Id: I3e9e6350864ac267819a4b8d670bef7d3746976e Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
|
#
fbf4703f |
| 17-Jul-2023 |
Patrick Williams <patrick@stwcx.xyz> |
use std::format instead of fmt::format
std::format is sufficient for the uses in this repository except in one file (override_fan_target.cpp, since P2286 isn't supported by GCC yet). Switch to std::format whenever possible.
Signed-off-by: Patrick Williams <patrick@stwcx.xyz> Change-Id: Ib2576fb530a4d7ce238e1b0bd95b40b476ec2107
|
#
18fb12b8 |
| 09-May-2023 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Change Fan/Sensor def tuples to structs
The tuples were big and hard to read/use. Change them to structs so the members are always named.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I79826563faf44636b251e614f45ff86f1e02c607
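As a rough illustration of why named members read better than positional tuple access, the sketch below contrasts the two styles. The type and field names (`SensorTuple`, `SensorDefinition`, `name`, `hasTarget`, `factor`) are hypothetical and are not taken from the repository.

```cpp
#include <string>
#include <tuple>

// Hypothetical tuple-based definition: callers must remember what each
// position means.
using SensorTuple = std::tuple<std::string, bool, double>;

// Hypothetical struct-based definition: every member is named.
struct SensorDefinition
{
    std::string name;
    bool hasTarget;
    double factor;
};

// Convert the positional form into the named form.
inline SensorDefinition toStruct(const SensorTuple& t)
{
    return SensorDefinition{std::get<0>(t), std::get<1>(t), std::get<2>(t)};
}
```

With the struct, a reader sees `def.hasTarget` instead of `std::get<1>(def)`, and adding a member no longer shifts the meaning of existing indices.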
|
#
61b73296 |
| 10-May-2023 |
Patrick Williams <patrick@stwcx.xyz> |
clang-format: copy latest and re-format
clang-format-16 has some backwards incompatible changes that require additional settings for best compatibility and re-running the formatter. Copy the latest .clang-format from the docs repository and reformat the repository.
Change-Id: I152f141a5e8343b92b5ce81d3ca16eec77b5606b Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
|
#
fce14908 |
| 13-Jan-2023 |
Chau Ly <chaul@amperecomputing.com> |
monitor: Use host state to decide power state
The phosphor-fan-monitor service uses pgood to decide the power state. When the power state is off, phosphor-fan-monitor should not check the functionality of fans. However, with Ampere's Softoff (e.g. via power cycle), it takes a long time for pgood to change state after the command to power cycle the host is issued, so phosphor-fan-monitor fails to detect that the power state is off and continues to check functionality. This results in fans being marked non-functional when the host is off during a power cycle. This patch offers a package configuration option for choosing to use CurrentHostState instead of pgood to decide the power state. When CurrentHostState is TransitioningToOff, which is set right after the power cycle command, the power state is considered off.
Signed-off-by: Chau Ly <chaul@amperecomputing.com> Change-Id: I6f459384b1d536f61c5df787d696412acc04ba02
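The decision described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the repository's implementation: `PowerInputs`, `isPowerOn`, and the `useHostState` flag are hypothetical names, the state strings are shortened stand-ins for the real D-Bus values, and treating every state other than `Off` and `TransitioningToOff` as "on" is an assumption.

```cpp
#include <string>

// Hypothetical snapshot of the two inputs; the real service reads these
// values over D-Bus.
struct PowerInputs
{
    bool pgood;            // power-good state
    std::string hostState; // shortened state name, e.g. "Running"
};

// Returns true when the monitor should treat the power state as on.
// useHostState stands in for the package configuration option.
inline bool isPowerOn(const PowerInputs& in, bool useHostState)
{
    if (!useHostState)
    {
        // pgood may lag behind a soft power-off request.
        return in.pgood;
    }
    // Assumption: only these two states count as off.
    return in.hostState != "Off" && in.hostState != "TransitioningToOff";
}
```

In the soft-off scenario, pgood is still asserted while the host state is already `TransitioningToOff`, so only the host-state path reports the power as off.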
|
#
751c8beb |
| 13-Jan-2023 |
Chau Ly <chaul@amperecomputing.com> |
monitor: Add delay for host control
Some OpenBMC platforms use dbus-sensor and entity-manager to create the fan sensors. On those systems, phosphor-fan-monitor starts before the fan sensors are created during BMC boot. phosphor-fan-monitor is designed to shut down the host when there are no fan tach sensors, which is not desirable in this case. This patch adds a package configuration option, delay-host-control, that adds a desired delay before phosphor-fan-monitor turns off the host. The delay can be configured to match each system's timing.
Signed-off-by: Chau Ly <chaul@amperecomputing.com> Change-Id: I63cd85eb5e6cb04069ce7b4c21c2f4621d243502
|
#
4f472a86 |
| 26-Aug-2022 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Use USR1 signal to dump debug data
Similar to what fan control is already doing, this commit adds a handler for the USR1 signal to write debug data to /tmp/fan_monitor_dump.json. The data being written is the same data saved in an event log - the current sensor status plus any of the Logger class's logs.
Example output, which shows fan0 recovering from previous faults:
{
    "logs": [
        ...
        [
            "Aug 26 17:04:47",
            "Setting tach sensor /xyz/openbmc_project/sensors/fan_tach/fan0_0 functional state to false. [target = 18000, input = 3446, allowed range = (10600 - NoMax) owned = true]"
        ],
        [
            "Aug 26 17:04:47",
            "Starting shutdown action 'EPOW Power Off: 60s/60s' due to cause '2 Nonfunctional Fan Rotors'"
        ],
        [
            "Aug 26 17:04:47",
            "Action EPOW Power Off: 60s/60s: Starting service mode timer"
        ],
        [
            "Aug 26 17:04:47",
            "Creating event log for faulted fan /xyz/openbmc_project/inventory/system/chassis/motherboard/fan0 sensor /xyz/openbmc_project/sensors/fan_tach/fan0_0"
        ]
    ],
    "sensors": {
        "sensors": {
            "/xyz/openbmc_project/sensors/fan_tach/fan0_0": {
                "functional": false,
                "in_range": true,
                "present": true,
                "prev_tachs": "[11829,11867,11829,11867,11829,11867,11718,11467]",
                "prev_targets": "[18000,9000,9040,10320,0,0,0,0]",
                "tach": 11829.0,
                "target": 18000,
                "ticks": 18
            },
            "/xyz/openbmc_project/sensors/fan_tach/fan0_1": {
                "functional": false,
                "in_range": true,
                "present": true,
                "prev_tachs": "[17857,17772,17857,17772,17201,17045,16741,16375]",
                "tach": 17857.0,
                "ticks": 20
            },
            "/xyz/openbmc_project/sensors/fan_tach/fan1_0": {
                "functional": true,
                "in_range": true,
                "present": true,
                "prev_tachs": "[11755,11792,11755,11792,11755,11792,11755,11792]",
                "prev_targets": "[18000,9000,9040,10320,0,0,0,0]",
                "tach": 11755.0,
                "target": 18000,
                "ticks": 0
            },
            ...
        }
    }
}
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I84179f78ec83ca6bab788052d0bebe677c1fd29f
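For readers unfamiliar with the pattern, a USR1-triggered dump can be sketched with the standard signal API. This is only an illustration: the names `dumpRequested`, `onUsr1`, and `installDumpHandler` are hypothetical, and the actual service may hook the signal through its event loop rather than `std::signal`. The handler only sets a flag; the main loop would then write the JSON file outside of signal context.

```cpp
#include <csignal>

// Flag set from the signal handler; sig_atomic_t is safe to write there.
volatile std::sig_atomic_t dumpRequested = 0;

// Hypothetical handler: record the request and return immediately.
extern "C" void onUsr1(int /*signo*/)
{
    dumpRequested = 1;
}

// Register the handler for SIGUSR1 (POSIX systems).
inline void installDumpHandler()
{
    std::signal(SIGUSR1, onUsr1);
}
```

Keeping file I/O out of the handler avoids calling functions that are not async-signal-safe.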
|
#
d16d464a |
| 26-Aug-2022 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Capture 'in range' status in sensor FFDC
This field indicates whether the current sensor tach reading is considered healthy, without having to manually redo the math that fan monitor does based on the current input and target values.
Example output:
"/xyz/openbmc_project/sensors/fan_tach/fan0_0": {
    "functional": true,
    "in_range": false,
    "present": true,
    "prev_tachs": "[3135,3132,3130,3127,3130,3125,3127,3125]",
    "prev_targets": "[9000,9040,10320,0,0,0,0,0]",
    "tach": 3135.0,
    "target": 9000,
    "ticks": 27
}
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: Ifbb6693f84fd20351bffd96c0a04e4e4872c4662
|
#
bf8e56f6 |
| 29-Jun-2022 |
Mike Capps <mikepcapps@gmail.com> |
meson support: configuration option removal
D-Bus names and paths that were previously configurable at build-time are now hard-coded and moved to dbus_paths.hpp to reduce the number of configure-time options and simplify maintenance.
Signed-off-by: Mike Capps <mikepcapps@gmail.com> Change-Id: I16d88daad90e747cc40d87c853874b1a5fedf5fa
|
#
87f9adc4 |
| 11-Aug-2022 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Add tick count to error log capture
If the 'count' method of looking for fan faults is configured, add the tick counts to the fan sensor data capture when an error is created. While the count will of course be at the max for the failed sensor, it will show the counts for the other sensors which could help show if other rotors are having issues but just haven't hit the thresholds yet.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I99a3e2480005244df0a0d2d86a36d6e762304bd7
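To make the idea of a per-sensor tick count concrete, here is a minimal sketch of a counter that rises while a reading is out of range and reaches its maximum at the fault threshold. The `TickCounter` type is hypothetical, and the choice to count back down on in-range readings is an assumption made for illustration, not a statement about the repository's code.

```cpp
#include <cstddef>

// Hypothetical per-sensor counter for a count-based fault method.
struct TickCounter
{
    std::size_t threshold; // ticks needed before a fault is declared
    std::size_t ticks = 0; // current count, the value captured in the log

    // Called once per interval; returns true once the threshold is reached.
    bool update(bool inRange)
    {
        if (!inRange)
        {
            if (ticks < threshold)
            {
                ++ticks; // cap at the threshold
            }
        }
        else if (ticks > 0)
        {
            --ticks; // assumption: healthy readings count back down
        }
        return ticks >= threshold;
    }
};
```

A sensor whose count is nonzero but below the threshold has not faulted yet, which is the situation the captured counts help reveal.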
|
#
cb356d48 |
| 22-Jul-2022 |
Patrick Williams <patrick@stwcx.xyz> |
sdbusplus: use shorter type aliases
The sdbusplus headers provide shortened aliases for many types. Switch to using them to provide better code clarity and shorter lines. Possible replacements are for:
* bus_t
* exception_t
* manager_t
* match_t
* message_t
* object_t
* slot_t
Signed-off-by: Patrick Williams <patrick@stwcx.xyz> Change-Id: I9029cc722e7712633c15436bd3868d8c3209f567
|
#
477b13bd |
| 11-Jul-2022 |
Mike Capps <mikepcapps@gmail.com> |
monitor,sensor-monitor: catch exceptions when creating BMC dumps
Catch and log exceptions thrown when creating BMC dumps.
Signed-off-by: Mike Capps <mikepcapps@gmail.com> Change-Id: I986ca3e51302016886ca8ae571054a5b4260a093
|
#
752f24e4 |
| 06-Jul-2022 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Default tach sensors to true
Instead of reading the functional status of the tach sensors out of the inventory on startup, just default them to true. Any issues with the fans could then be rediscovered after the reboot.
This was the original behavior. It was probably changed with the intent that the shutdown timers could immediately start back up again after a reboot if things were nonfunctional before.
In practice, we've found that there can be a race between the shutdown actions turning off the system (due to nonfunctional sensors) and the sensor objects being marked functional again, even when the only reason they were nonfunctional before the reboot was because the fan sensor daemon was turned off before fan monitor on the way down.
For this to make a noticeable change, the shutdown actions/timers would have to be in progress during the reboot anyway, which is pretty unlikely.
Worst case, it would extend a shutdown by the time it takes an error to be rediscovered, which is:
If the 'count' method is configured: monitor_start_delay + (count_interval * threshold)
If the 'timebased' method is configured: monitor_start_delay + nonfunc_rotor_error_delay
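The two worst-case formulas above can be written out as a small calculation. This is a sketch only: the function names are hypothetical and the parameters simply mirror the names used in the formulas.

```cpp
#include <chrono>
#include <cstddef>

using std::chrono::seconds;

// 'count' method: monitor_start_delay + (count_interval * threshold)
inline seconds worstCaseCount(seconds monitorStartDelay, seconds countInterval,
                              std::size_t threshold)
{
    return monitorStartDelay +
           countInterval * static_cast<seconds::rep>(threshold);
}

// 'timebased' method: monitor_start_delay + nonfunc_rotor_error_delay
inline seconds worstCaseTimeBased(seconds monitorStartDelay,
                                  seconds nonfuncRotorErrorDelay)
{
    return monitorStartDelay + nonfuncRotorErrorDelay;
}
```

For instance, with made-up values of a 30 s start delay, a 1 s count interval, and a threshold of 30, the count method would take 60 s to rediscover the error.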
This has no effect on shutdowns caused by missing fans, as the code still reads that out of the inventory on startup, plus it can be instantaneously detected as opposed to being calculated over time.
In summary, extending the shutdown time in very uncommon cases seems better than mistakenly shutting off a running system, which can be a huge deal depending on the user.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I2840c5f2e79bd734626b4144713e4428af28551a
|
#
808d7fe8 |
| 13-Jun-2022 |
Mike Capps <mikepcapps@gmail.com> |
meson support: remove code warnings 1
This commit contains code changes necessary to support the increased warning level from Meson builds. Most changes are for unused variables.
To keep the review size manageable, this commit contains only monitor and presence changes (and top-level json_config.hpp).
Signed-off-by: Mike Capps <mikepcapps@gmail.com> Change-Id: I7280b512c54e8d5aeba3300764a239f3dcbab14d
|
#
7b34ee0f |
| 04-May-2022 |
Mike Capps <mikepcapps@gmail.com> |
monitor: include previous targets and tachs in PEL
To discover the source of certain fan ramp-up failures, this change outputs the previous 8 targets and tach readings. The strategy is to see if hardware limitations prevent attaining the targets quickly enough.
Signed-off-by: Mike Capps <mikepcapps@gmail.com> Change-Id: Ia38867986b8a8a651de5d01766393c07d413273c
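Keeping the previous 8 values amounts to a small fixed-size history. The sketch below is one possible way to do it, not the repository's code: the `History` class is hypothetical, and the newest-first, zero-filled ordering is inferred from the example dumps elsewhere in this log (e.g. `[18000,9000,9040,10320,0,0,0,0]`).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Fixed-size history that keeps the most recent N values, newest first.
template <std::size_t N>
class History
{
  public:
    void push(std::uint64_t value)
    {
        // Shift older entries back by one, dropping the oldest.
        for (std::size_t i = N - 1; i > 0; --i)
        {
            values[i] = values[i - 1];
        }
        values[0] = value;
    }

    // Render as "[a,b,c,...]"; unused slots stay at zero.
    std::string str() const
    {
        std::string out = "[";
        for (std::size_t i = 0; i < N; ++i)
        {
            out += std::to_string(values[i]);
            if (i + 1 < N)
            {
                out += ",";
            }
        }
        return out + "]";
    }

  private:
    std::array<std::uint64_t, N> values{};
};
```

Pushing 10320, 9040, 9000, and 18000 in that order into a `History<8>` yields the string `[18000,9000,9040,10320,0,0,0,0]`.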
|
#
683a96c6 |
| 27-Apr-2022 |
Mike Capps <mikepcapps@gmail.com> |
monitor: Capture BMC dumps on fan/ambient shutdowns
When fan-monitor or sensor-monitor generates an EPOW, this change creates a BMC dump after the system is powered off and all error logs are created.
Change-Id: Iacdd2d2b388e79988e2536d52497f0e697e1d444 Signed-off-by: Mike Capps <mikepcapps@gmail.com>
|
#
b4379a1e |
| 11-Oct-2021 |
Mike Capps <mikepcapps@gmail.com> |
Monitor : handle inventory service offline
Using nameHasOwner and nameOwnerChanged D-Bus signals, a callback is activated when inventory is started.
There are two primary modes of operation. With Compatible Interfaces, the inventory-detection callback will fail; however, start() will be called a second time after EntityManager starts and forces a reload of the proper config for the machine type. Separately, if no EntityManager exists, the inventory-detection callback will succeed and use the default configuration file.
To test: stop the fan monitor and inventory services. Start monitor, wait 10s, then start Inventory; after about 15s you should see the online detection.
Signed-off-by: Mike Capps <mikepcapps@gmail.com> Change-Id: I289493a0aabb849abee8ce8de047513e94ee2219
|
#
ddb773b2 |
| 06-Oct-2021 |
Patrick Williams <patrick@stwcx.xyz> |
catch exceptions as const
Signed-off-by: Patrick Williams <patrick@stwcx.xyz> Change-Id: Id1b5054d3147c39d98309bc11ed7016d6909e2a6
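The commit message carries no description, so as a generic reminder of the idiom: catching by const reference avoids copying or slicing the exception object and signals that the handler does not modify it. The function names below are hypothetical and unrelated to the repository.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical operation that always fails.
inline void mightThrow()
{
    throw std::runtime_error("sensor unavailable");
}

// Returns the failure text, or an empty string if nothing was thrown.
inline std::string describeFailure()
{
    try
    {
        mightThrow();
    }
    catch (const std::exception& e) // const reference: no copy, no slicing
    {
        return e.what();
    }
    return "";
}
```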
|
#
25f0327e |
| 13-Sep-2021 |
Mike Capps <mikepcapps@gmail.com> |
Monitor: Support hwmon service offline during startup
It is possible for fan-monitor to start up before the Hwmonitor service, causing unhandled exceptions that block system initialization. This fix catches the exception until a proper hwmon presence detector is deployed.
If the exception is caught, this code change forces a re-subscription during the poweron event to ensure tach sensors will receive published updates upon resumption of the hwmon service.
Signed-off-by: Mike Capps <mikepcapps@gmail.com> Change-Id: I8e696e747c432d7a6f696c5ccd9dab73abf7708f
|
#
fdcd5db3 |
| 20-May-2021 |
Mike Capps <mikepcapps@gmail.com> |
monitor: Subscribe to tach target and feedback services
Subscribes to nameOwnerChanged signals for the services of the sensor and target interfaces for each configured fan. If those services go offline, the fan tach sensors should get marked nonfunctional due to no longer receiving updated target or feedback values. In this design, we use the existing method of determining when a fan tach sensor should be marked nonfunctional to allow a recovery window, wherein a brief offline/online transition (such as during a restart) will not trigger a nonfunctional state change.
Change-Id: I0a935ccad5a864dc952d023185356a1ef1226830 Signed-off-by: Mike Capps <mikepcapps@gmail.com>
|
#
bb449c1c |
| 14-Jun-2021 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Shut down if no readings at power on
If there are no tach sensors on D-Bus when the power state changes to on, then create an event log and shut down the system. This is done because in this case the code is not able to know the fan state - if there are any present or spinning.
The most likely reason there are no sensors (aside from a glaring error in the config file) is because the fan controller device driver failed its probe and was unable to detect it, maybe because the device didn't have power or there was an I2C problem. To aid in root cause analysis if this were to occur in the field, the code adds the following FFDC (First Failure Data Capture) to the event log:
* All of the loaded hwmon drivers, taken from /sys/class/hwmon/*/name
* Failure related lines in dmesg, which is where driver errors would show up.
Tested: Unbound the fan device driver and then powered on the system. Also disabled I2C to the fan controller device in simulation and tried a power on.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: Ic0b80d67ec79c9401f59324fe1134ff12084112a
|
#
823bc49e |
| 21-Jun-2021 |
Matthew Barth <msbarth@us.ibm.com> |
monitor: Use new JsonConfig object
To simplify handling the loading of config files, use the updated JsonConfig object that populates the available compatibility values used when retrieving the JSON file and loading it. The given load function is called if compatibility values are found upon construction or after an interfacesAdded signal is received; it can then call `getConfFile` to find the JSON config file to be loaded.
Change-Id: Ifc164d36c036cf0ff810018d40e8de52efc6ca58 Signed-off-by: Matthew Barth <msbarth@us.ibm.com>
|
#
f435eb1a |
| 11-May-2021 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Changes for power off errors
When a fan error causes a power off due to a power off action being triggered, the previous fan error is reposted at the time of the power off. For this error, make the following changes that will differentiate it from the first time it was logged:
1. Change severity to Critical
2. Set POWER_THERMAL_CRITICAL_FAULT=TRUE in the additional data
3. Set SEVERITY_DETAIL=SYSTEM_TERM in the additional data
Certain implementations, such as the IBM one, will take additional actions based on these changes.
Signed-off-by: Matt Spinler <spinler@us.ibm.com> Change-Id: I5f36171e58493130114427f9e9fd870cd0d2dd76
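The three changes listed in this commit could be applied to a reposted error roughly as follows. This is a sketch under stated assumptions: the `FanError` struct and `makePowerOffError` function are hypothetical, and only the severity value and the two additional-data keys come from the commit message.

```cpp
#include <map>
#include <string>

// Hypothetical in-memory form of an event log entry.
struct FanError
{
    std::string severity;
    std::map<std::string, std::string> additionalData;
};

// Build the reposted error from the original one, marking it as the
// power-off variant. The original is taken by value and left untouched.
inline FanError makePowerOffError(FanError original)
{
    original.severity = "Critical";
    original.additionalData["POWER_THERMAL_CRITICAL_FAULT"] = "TRUE";
    original.additionalData["SEVERITY_DETAIL"] = "SYSTEM_TERM";
    return original;
}
```

Existing additional-data entries are preserved, so the reposted log still carries whatever the first log recorded.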
|