#
7d135641 |
| 04-Feb-2021 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Support for running with power off

Put in the remaining changes necessary so that fan monitor doesn't need to be killed when power turns off. This includes things like:

* Support for starting before the Present property is on D-Bus.
* Support for starting before the config file name is available.
* Stopping any running timers when power is turned off.
* Checking the power off rules when power turns on.

Most, but not all, of the changes are common between the JSON and YAML modes, but this is only truly supported when compiled for JSON.

This also removes the init vs monitor modes of operation, if compiled for JSON.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: Ic2c6848f24511c9dc763227e05bbebb4c8c80cd1
|
#
ba3ee9ae |
| 06-Jan-2021 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Fill in EpowPowerOff action

This action does the following:

1) Starts a service mode timer, which would allow the system to be serviced before anything happens.
2) On the expiration of that timer, it will:
   a) Set the thermal fault alert D-Bus property. This will be used to send an EPOW alert to the host on IBM systems.
   b) Start the meltdown timer.
3) On the expiration of the meltdown timer, a hard power off will occur. This timer cannot be canceled even if fans start behaving.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I9434699b816b23b68c6d9d1e97283b4ab9befe4f
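The three steps above can be sketched as a small state machine. This is only an illustrative Python model with simulated time, not the project's code: the class name, the delay parameters, and the `thermal_alert` flag (standing in for the D-Bus property) are all hypothetical. It also assumes that the service mode timer can be canceled, which the message implies but does not state.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    SERVICE_MODE = auto()
    MELTDOWN = auto()
    POWERED_OFF = auto()


class EpowPowerOffSketch:
    """Toy model of the two-stage timer sequence (hypothetical names)."""

    def __init__(self, service_mode_delay, meltdown_delay):
        self.service_mode_delay = service_mode_delay  # seconds, hypothetical
        self.meltdown_delay = meltdown_delay          # seconds, hypothetical
        self.state = State.IDLE
        self.remaining = 0
        self.thermal_alert = False  # stands in for the D-Bus property

    def start(self):
        # Step 1: start the service mode timer.
        if self.state is State.IDLE:
            self.state = State.SERVICE_MODE
            self.remaining = self.service_mode_delay

    def cancel(self):
        # Assumption: only the service mode timer may be canceled.
        if self.state is State.SERVICE_MODE:
            self.state = State.IDLE
            self.remaining = 0
            return True
        return False

    def tick(self, seconds=1):
        # Advance simulated time; overshoot is not carried into the next stage.
        if self.state not in (State.SERVICE_MODE, State.MELTDOWN):
            return
        self.remaining -= seconds
        if self.remaining > 0:
            return
        if self.state is State.SERVICE_MODE:
            # Step 2: raise the alert and start the meltdown timer.
            self.thermal_alert = True
            self.state = State.MELTDOWN
            self.remaining = self.meltdown_delay
        else:
            # Step 3: meltdown timer expired, hard power off.
            self.state = State.POWERED_OFF
```

Once the model reaches `State.MELTDOWN`, `cancel()` returns `False` and the sequence runs to `State.POWERED_OFF`, mirroring the "cannot be canceled" rule.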
|
#
c8d3c51f |
| 06-Jan-2021 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Add thermal fault alert D-Bus property

Add a new property to alert of a thermal fault. In this context, it means an imminent power off due to fan faults. On certain IBM systems it will be used as a mechanism to alert the host of the power off when the 'epow_power_off' power off rule is used.

Service: xyz.openbmc_project.Thermal.Alert
Path: /xyz/openbmc_project/alerts/thermal_fault_alert
Interface: xyz.openbmc_project.Object.Enable
Property: Enabled

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I0531de9ce40b6148244fda18a20e144bad85d830
|
#
ac1efc11 |
| 27-Oct-2020 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Re-log fan error on a power off

In the case where a power off rule runs to completion and powers off the system due to either missing or faulted fans, at the point of power off re-post the event log for the previous fan error.

This way, there can be an error associated with the power off, because depending on the power off rule delays the original error could have happened several minutes or more in the past.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I1a38062cf75ffd4a11baa417ef3983b6c1a47ada
|
#
27f6b686 |
| 27-Oct-2020 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Event logs for missing fans

This commit adds the code to create event logs calling out the fan when it has been missing for a certain amount of time.

This is basically identical to the functionality that the fan presence application in this repo provides, but with it in this application all fan errors are created from the same place. This will become important when there is a power off due to a fan missing and the error for that needs to be re-committed at power off time so it can be shown as the cause of the power off.

The functionality is configured in the JSON:

fan_missing_error_delay:
  Defines the number of seconds a fan must be missing with power on before an error will be created. If this isn't present in the JSON, then errors will not be created at all.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I76de9d8d1bf6e283560b1ce46e70f84522e2d708
|
#
f13b42e2 |
| 26-Oct-2020 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Event logs for nonfunc fan sensors

This commit adds the code to create event logs calling out the fan when fan sensors have been nonfunctional for a certain amount of time.

This functionality is configured in the JSON, and will only be enabled if the 'fault_handling' JSON section is present. It uses the following new JSON parameters:

nonfunc_rotor_error_delay (per fan):
  This says how many seconds a fan sensor must be nonfunctional before the event log will be created.

num_nonfunc_rotors_before_error (under fault_handling):
  This specifies how many nonfunctional fan rotors there must be at the same time before an event log with an error severity is created for the rotor. When there are fewer than this many nonfunctional rotors, then event logs with an informational severity will be created.

A new FanError class is used to create the event logs. It adds the Logger output as FFDC, plus any JSON data that is passed in with the commit() API. It uses CALLOUT_INVENTORY_PATH in the AdditionalData property to specify the faulted fan FRU.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I365114357580b4f38ec943a769c1ce7f695b51ab
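The severity rule for num_nonfunc_rotors_before_error can be illustrated with a short Python sketch. The function names and the severity strings are hypothetical, and the JSON layout is assumed only as far as the message describes it (a 'fault_handling' section holding the threshold); the real application may structure this differently.

```python
import json


def load_error_threshold(config_text):
    """Return the threshold, or None when 'fault_handling' is absent.

    Assumption: the threshold key is present whenever the section is.
    """
    config = json.loads(config_text)
    fault_handling = config.get("fault_handling")
    if fault_handling is None:
        return None  # section missing: event log creation is not enabled
    return fault_handling["num_nonfunc_rotors_before_error"]


def event_log_severity(num_nonfunc_rotors, threshold):
    """Pick a severity label (hypothetical strings) for a rotor event log."""
    if num_nonfunc_rotors >= threshold:
        return "Error"
    return "Informational"
```

With a threshold of 2, a single nonfunctional rotor would be logged as informational, and a second simultaneous one as an error.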
|
#
e892e39a |
| 14-Oct-2020 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Start checking power off rules

In the system object, load the power off rules and start checking them. It will check them in the following cases (if power is on):

* When the object is constructed
* When the JSON config is reloaded
* When fan presence or sensor functional state changes
* When the power state changes to on

When the power is turned off, it will cancel any running rules.

Previously, fan monitor was only designed to run with power on, and there still may be more changes than just the ones added here to support it always running.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I8be81612ae4997d7568678471ac0f6f854a0e758
|
#
b63aa09e |
| 14-Oct-2020 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Track fan health in the System object

To prepare for being able to power off the system based on missing fans or nonfunctional fan sensors, put a global view of this health for all fans in the System object. This requires now keeping track of fan presence.

This information is stored in a map based on the fan name. It is done this way, as opposed to just always calling present/functional APIs on the Fan objects, so that the code that will be using this information can be tested in isolation without the System or Fan objects.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: Ieb1d4003bd13cebc806fd06f0064c63ea8ac6180
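A minimal Python sketch of the idea follows: health is kept in a plain map keyed by fan name, so code that consumes it can be exercised without System or Fan objects. The value layout (a presence flag plus one functional flag per sensor) and the helper names are assumptions made for illustration, not taken from the commit.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: fan name -> (present, [sensor functional flags])
FanHealth = Dict[str, Tuple[bool, List[bool]]]


def count_missing_fans(health: FanHealth) -> int:
    """Number of fans whose presence flag is False."""
    return sum(1 for present, _ in health.values() if not present)


def count_nonfunc_rotors(health: FanHealth) -> int:
    """Number of sensors, across all fans, marked nonfunctional."""
    return sum(
        1
        for _, sensors in health.values()
        for functional in sensors
        if not functional
    )
```

Because both helpers take only the map, a test can build a dictionary by hand and check the counts directly.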
|
#
b0412d07 |
| 12-Oct-2020 |
Matt Spinler <spinler@us.ibm.com> |
monitor: Use only init mode when using JSON

Fan monitor is currently split into 2 modes - 'init' which is used right after a power on, and 'monitor', which is used later after the fans-ready target is started. Normally, the 'init' mode just sets the fans to functional and then exits, and the real monitoring work is done in the 'monitor' mode.

In the future this application will need to be able to check for fan problems as soon as it starts up after power on so that it can handle shutting down due to missing fans. To prepare for this, move all functionality into the init mode, and just exit immediately when called to run in the monitor mode.

Only do this when compiled to use the JSON configuration, as this is new and I don't want to change how the existing YAML setups work.

This also creates a new 'monitor_start_delay' entry in the JSON to say how long to wait after startup before actually doing any sensor monitoring, which then gives the same behavior as how the monitor mode would delay by waiting for the fan control ready target, which itself is started by fan control --init after a hardcoded delay. This field is optional to preserve backwards compatibility and defaults to 0s.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I623a233f50233e734f50cd9e80139c60467518d8
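As a rough illustration of the optional 'monitor_start_delay' field, the Python sketch below reads it with a default of 0 seconds. The function name is hypothetical, and placing the key at the top level of the JSON document is an assumption; the commit message does not say where the entry lives.

```python
import json


def read_monitor_start_delay(config_text):
    """Return the start delay in seconds, defaulting to 0 when absent.

    Assumption: the key sits at the top level of the JSON document.
    """
    config = json.loads(config_text)
    return config.get("monitor_start_delay", 0)
```

An existing configuration without the field therefore behaves as before, with monitoring starting immediately.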
|
#
d06905c9 |
| 12-Jun-2020 |
Matthew Barth <msbarth@us.ibm.com> |
monitor:SIGHUP: Handle reloading JSON config thru SIGHUP

Enable capturing the HUP signal to reload the JSON configuration. This will reload the appropriate JSON configuration file found and update the trust groups and fan definitions configured.

Tested:
    JSON configuration is reloaded and updated after SIGHUP
    Single instance of trust groups exist that match the JSON config
    Single instance of fan definitions exist that match the JSON config

Change-Id: If55ca583a67fd76f0733009707bd5c4b5eda3e63
Signed-off-by: Matthew Barth <msbarth@us.ibm.com>
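The reload behaviour can be sketched in Python as below. This is not the application's implementation: the class, the function, and the JSON keys `sensor_trust_groups` and `fans` are assumptions chosen for illustration. The point it shows is that a reload replaces the held trust groups and fan definitions rather than appending to them, so only a single instance of each exists afterwards.

```python
import json
import signal


class ConfigHolderSketch:
    """Holds trust groups and fan definitions loaded from a JSON file."""

    def __init__(self, path):
        self.path = path
        self.trust_groups = []
        self.fans = []
        self.reload()

    def reload(self, signum=None, frame=None):
        # Signature matches what signal.signal() passes to a handler.
        with open(self.path) as config_file:
            config = json.load(config_file)
        # Replace, never extend, so no stale or duplicate entries remain.
        self.trust_groups = list(config.get("sensor_trust_groups", []))
        self.fans = list(config.get("fans", []))


def install_sighup_handler(holder):
    """Reload the configuration whenever the process receives SIGHUP (POSIX)."""
    signal.signal(signal.SIGHUP, holder.reload)
```

After `install_sighup_handler(holder)` has been called from the main thread, sending SIGHUP to the process (for example with `kill -HUP <pid>`) would re-read the file at `holder.path`.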
|
#
c95c527a |
| 15-Jun-2020 |
Matthew Barth <msbarth@us.ibm.com> |
monitor:SIGHUP: Create and use system object

Use a system object to handle retrieving the trust groups and fan definitions configured. This is necessary for handling HUP signals in the future where a reload of the JSON configuration is done.

Tested:
    No change in the loading of the trust groups configuration
    No change in the loading of the fan definitions configured

Change-Id: I5df2d54641f80778bbf09d7b1f4588a458e11c71
Signed-off-by: Matthew Barth <msbarth@us.ibm.com>
|