History log of /openbmc/phosphor-fan-presence/monitor/system.hpp (Results 1 – 18 of 18)
Revision Date Author Comments
# 4f472a86 26-Aug-2022 Matt Spinler <spinler@us.ibm.com>

monitor: Use USR1 signal to dump debug data

Similar to what fan control is already doing, this commit adds a handler
for the USR1 signal to write debug data to /tmp/fan_monitor_dump.json.
The data being written is the same data saved in an event log - the
current sensor status plus any of the Logger class's logs.

Example output, which shows fan0 recovering from previous faults:
{
    "logs": [
        ...
        [
            "Aug 26 17:04:47",
            "Setting tach sensor /xyz/openbmc_project/sensors/fan_tach/fan0_0 functional state to false. [target = 18000, input = 3446, allowed range = (10600 - NoMax) owned = true]"
        ],
        [
            "Aug 26 17:04:47",
            "Starting shutdown action 'EPOW Power Off: 60s/60s' due to cause '2 Nonfunctional Fan Rotors'"
        ],
        [
            "Aug 26 17:04:47",
            "Action EPOW Power Off: 60s/60s: Starting service mode timer"
        ],
        [
            "Aug 26 17:04:47",
            "Creating event log for faulted fan /xyz/openbmc_project/inventory/system/chassis/motherboard/fan0 sensor /xyz/openbmc_project/sensors/fan_tach/fan0_0"
        ]
    ],
    "sensors": {
        "sensors": {
            "/xyz/openbmc_project/sensors/fan_tach/fan0_0": {
                "functional": false,
                "in_range": true,
                "present": true,
                "prev_tachs": "[11829,11867,11829,11867,11829,11867,11718,11467]",
                "prev_targets": "[18000,9000,9040,10320,0,0,0,0]",
                "tach": 11829.0,
                "target": 18000,
                "ticks": 18
            },
            "/xyz/openbmc_project/sensors/fan_tach/fan0_1": {
                "functional": false,
                "in_range": true,
                "present": true,
                "prev_tachs": "[17857,17772,17857,17772,17201,17045,16741,16375]",
                "tach": 17857.0,
                "ticks": 20
            },
            "/xyz/openbmc_project/sensors/fan_tach/fan1_0": {
                "functional": true,
                "in_range": true,
                "present": true,
                "prev_tachs": "[11755,11792,11755,11792,11755,11792,11755,11792]",
                "prev_targets": "[18000,9000,9040,10320,0,0,0,0]",
                "tach": 11755.0,
                "target": 18000,
                "ticks": 0
            },
            ...
        }
    }
}

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I84179f78ec83ca6bab788052d0bebe677c1fd29f
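
As a rough illustration of how the dump described in this commit might be consumed, the following Python sketch lists the nonfunctional sensors in a dump of that shape. It is not part of the project (which is C++); `SAMPLE_DUMP` and `nonfunctional_sensors` are hypothetical names, and the sample is a trimmed copy of the example output, so a real dump may contain fields not handled here.

```python
import json

# Hypothetical sample, trimmed from the example output in the commit message.
SAMPLE_DUMP = """
{
  "logs": [
    ["Aug 26 17:04:47",
     "Action EPOW Power Off: 60s/60s: Starting service mode timer"]
  ],
  "sensors": {
    "sensors": {
      "/xyz/openbmc_project/sensors/fan_tach/fan0_0": {
        "functional": false,
        "prev_tachs": "[11829,11867,11829,11867,11829,11867,11718,11467]",
        "tach": 11829.0,
        "target": 18000,
        "ticks": 18
      },
      "/xyz/openbmc_project/sensors/fan_tach/fan1_0": {
        "functional": true,
        "prev_tachs": "[11755,11792,11755,11792,11755,11792,11755,11792]",
        "tach": 11755.0,
        "target": 18000,
        "ticks": 0
      }
    }
  }
}
"""


def nonfunctional_sensors(dump_text):
    """Return {sensor_path: previous tach readings} for nonfunctional sensors."""
    dump = json.loads(dump_text)
    result = {}
    for path, data in dump["sensors"]["sensors"].items():
        if not data["functional"]:
            # prev_tachs is a string holding a bracketed list, so decode it again.
            result[path] = json.loads(data["prev_tachs"])
    return result
```

On a BMC, the same function could presumably be fed the contents of /tmp/fan_monitor_dump.json after USR1 has been sent to the fan monitor process.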


# cb356d48 22-Jul-2022 Patrick Williams <patrick@stwcx.xyz>

sdbusplus: use shorter type aliases

The sdbusplus headers provide shortened aliases for many types.
Switch to using them to provide better code clarity and shorter
lines. Possible replacements are for:
* bus_t
* exception_t
* manager_t
* match_t
* message_t
* object_t
* slot_t

Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
Change-Id: I9029cc722e7712633c15436bd3868d8c3209f567


# 683a96c6 27-Apr-2022 Mike Capps <mikepcapps@gmail.com>

monitor: Capture BMC dumps on fan/ambient shutdowns

When fan-monitor or sensor-monitor generates an EPOW, this change
creates a BMC dump after the system is powered off and all error logs
are created.

Change-Id: Iacdd2d2b388e79988e2536d52497f0e697e1d444
Signed-off-by: Mike Capps <mikepcapps@gmail.com>


# b4379a1e 11-Oct-2021 Mike Capps <mikepcapps@gmail.com>

Monitor : handle inventory service offline

Using nameHasOwner and nameOwnerChanged D-Bus signals, a callback is
activated when inventory is started.

There are two primary modes of operation. With Compatible Interfaces,
the inventory-detection callback will fail; however, start() will be
called a second time after EntityManager starts and forces a reload of
the proper config for the machine type. Separately, if no EntityManager
exists, then the inventory-detection callback will succeed and use the
default configuration file.

To test: stop the fan monitor and inventory services. Start monitor,
wait 10s, then start Inventory; after about 15s you should see the
online detection.

Signed-off-by: Mike Capps <mikepcapps@gmail.com>
Change-Id: I289493a0aabb849abee8ce8de047513e94ee2219


# 25f0327e 13-Sep-2021 Mike Capps <mikepcapps@gmail.com>

Monitor: Support hwmon service offline during startup

It is possible for fan-monitor to start up before the Hwmonitor service,
causing unhandled exceptions that block system initialization. This fix
catches the exception until a proper hwmon presence detector is
deployed.

If the exception is caught, this code change forces a re-subscription
during the poweron event to ensure tach sensors will receive published
updates upon resumption of the hwmon service.

Signed-off-by: Mike Capps <mikepcapps@gmail.com>
Change-Id: I8e696e747c432d7a6f696c5ccd9dab73abf7708f


# fdcd5db3 20-May-2021 Mike Capps <mikepcapps@gmail.com>

monitor: Subscribe to tach target and feedback services

Subscribes to nameOwnerChanged signals for the services of the sensor
and target interfaces for each configured fan. If those services go
offline, the fan tach sensors should get marked nonfunctional due to no
longer receiving updated target or feedback values. In this design, we
use the existing method of determining when a fan tach sensor should be
marked nonfunctional to allow a recovery window, wherein a brief
offline/online transition (such as during a restart) will not trigger a
nonfunctional state change.

Change-Id: I0a935ccad5a864dc952d023185356a1ef1226830
Signed-off-by: Mike Capps <mikepcapps@gmail.com>


# bb449c1c 14-Jun-2021 Matt Spinler <spinler@us.ibm.com>

monitor: Shut down if no readings at power on

If there are no tach sensors on D-Bus when the power state changes to
on, then create an event log and shut down the system. This is done
because in this case the code cannot determine the fan state, i.e.
whether any fans are present or spinning.

The most likely reason there are no sensors (aside from a glaring error
in the config file) is because the fan controller device driver failed
its probe and was unable to detect it, maybe because the device didn't
have power or there was an I2C problem. To aid in root cause analysis
if this were to occur in the field, the code adds the following FFDC
(First Failure Data Capture) to the event log:

* All of the loaded hwmon drivers, taken from /sys/class/hwmon/*/name
* Failure related lines in dmesg, which is where driver errors would
  show up.

Tested: Unbound the fan device driver and then powered on the system.
Also disabled I2C to the fan controller device in simulation and tried a
power on.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: Ic0b80d67ec79c9401f59324fe1134ff12084112a


# 823bc49e 21-Jun-2021 Matthew Barth <msbarth@us.ibm.com>

monitor: Use new JsonConfig object

To simplify handling the loading of config files, use the updated
JsonConfig object that populates the available compatibility values used
when retrieving the JSON file and loading it. The given load function is
called if compatibility values are found at construction or after an
interfacesAdded signal is received, at which point it can call
`getConfFile` to find the JSON config file to be loaded.

Change-Id: Ifc164d36c036cf0ff810018d40e8de52efc6ca58
Signed-off-by: Matthew Barth <msbarth@us.ibm.com>


# 4283c5d5 01-Mar-2021 Matt Spinler <spinler@us.ibm.com>

monitor: Allow missing D-Bus sensors on startup

Now that phosphor-fan-monitor is starting at the multi-user target, it
may be starting before the fan sensor hwmon daemon is able to put the
tach reading sensors on D-Bus. This was causing the TachSensor class
objects to not get created, so even if the hwmon tach sensor values did
show up later on D-Bus, fan monitor wouldn't notice them.

To fix this, still create the TachSensor objects if the corresponding
hwmon D-Bus objects aren't there, and still set them to functional in
the inventory so that any other monitoring code, such as
phosphor-dbus-monitor, won't shut down the system before the hwmon tach
sensors get a chance to show up on D-Bus, which was happening on
witherspoon when a reboot was done with the power on.

When the monitor delay timer expires to kick off monitoring, a D-Bus
read is forced, and if the hwmon sensors still aren't on D-Bus then the
corresponding TachSensor objects will be set to nonfunctional to start
down the error paths.

Also, when the power state changes to on, instead of blindly setting all
TachSensor objects to functional, again check if their hwmon sensor
values are on D-Bus before doing so.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I3e62727296630bf68602b0472328f4613e1a78e3


# 7d135641 04-Feb-2021 Matt Spinler <spinler@us.ibm.com>

monitor: Support for running with power off

Put in the remaining changes necessary so that fan monitor doesn't need
to be killed when power turns off.

This includes things like:
* Support for starting before the Present property is on D-Bus.
* Support for starting before the config file name is available.
* Stopping any running timers when power is turned off.
* Checking the power off rules when power turns on.

Most, but not all, of the changes are common between the JSON and YAML
modes, but this is only truly supported when compiled for JSON.

This also removes the init vs monitor modes of operation, if compiled
for JSON.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: Ic2c6848f24511c9dc763227e05bbebb4c8c80cd1


# c8d3c51f 06-Jan-2021 Matt Spinler <spinler@us.ibm.com>

monitor: Add thermal fault alert D-Bus property

Add a new property to alert of a thermal fault. In this context, it
means an imminent power off due to fan faults. On certain IBM systems
it will be used as a mechanism to alert the host of the power off when
the 'epow_power_off' power off rule is used.

Service: xyz.openbmc_project.Thermal.Alert
Path: /xyz/openbmc_project/alerts/thermal_fault_alert
Interface: xyz.openbmc_project.Object.Enable
Property: Enabled

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I0531de9ce40b6148244fda18a20e144bad85d830


# ac1efc11 27-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Re-log fan error on a power off

In the case where a power off rule runs to completion and powers off the
system due to either missing or faulted fans, at the point of power off
re-post the event log for the previous fan error.

This way, there can be an error associated with the power off, because
depending on the power off rule delays the original error could have
happened several minutes or more in the past.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I1a38062cf75ffd4a11baa417ef3983b6c1a47ada


# 27f6b686 27-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Event logs for missing fans

This commit adds the code to create event logs calling out the fan when
it has been missing for a certain amount of time.

This is basically identical to the functionality that the fan presence
application in this repo provides, but with it in this application all
fan errors are created from the same place. This will become important
when there is a power off due to a fan missing and the error for that
needs to be re-committed at power off time so it can be shown as the
cause of the power off.

The functionality is configured in the JSON:

fan_missing_error_delay:
Defines the number of seconds a fan must be missing with power on before
an error will be created. If this isn't present in the JSON, then
errors will not be created at all.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I76de9d8d1bf6e283560b1ce46e70f84522e2d708
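
The `fan_missing_error_delay` semantics described in this commit could be sketched roughly as below. This is a Python illustration only, not the project's C++ code: the function name, the flat `config` dictionary, and the choice to treat exactly reaching the delay as "due" are all assumptions.

```python
def missing_fan_error_due(config, seconds_missing, power_on):
    """Decide whether a missing-fan error should be created (hypothetical helper).

    config is assumed to be a dict parsed from the monitor's JSON, with
    fan_missing_error_delay as an optional key.
    """
    delay = config.get("fan_missing_error_delay")
    if delay is None:
        # Key absent: per the commit message, errors are not created at all.
        return False
    # The fan must have been missing, with power on, for at least the delay.
    return bool(power_on and seconds_missing >= delay)
```

For example, with `{"fan_missing_error_delay": 20}` a fan missing for 19 seconds would not yet produce an error under these assumptions, while one missing for 20 seconds with power on would.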


# f13b42e2 26-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Event logs for nonfunc fan sensors

This commit adds the code to create event logs calling out the fan when
fan sensors have been nonfunctional for a certain amount of time.

This functionality is configured in the JSON, and will only be enabled
if the 'fault_handling' JSON section is present. It uses the following
new JSON parameters:

nonfunc_rotor_error_delay (per fan):
This says how many seconds a fan sensor must be nonfunctional before the
event log will be created.

num_nonfunc_rotors_before_error (under fault_handling):
This specifies how many nonfunctional fan rotors there must be at the
same time before an event log with an error severity is created for the
rotor. When there are fewer than this many nonfunctional rotors, then
event logs with an informational severity will be created.

A new FanError class is used to create the event logs. It adds the
Logger output as FFDC, plus any JSON data that is passed in with the
commit() API. It uses CALLOUT_INVENTORY_PATH in the AdditionalData
property to specify the faulted fan FRU.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I365114357580b4f38ec943a769c1ce7f695b51ab
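
The severity rule tied to `num_nonfunc_rotors_before_error` in this commit can be pictured with a small Python sketch. The function name and the severity labels are placeholders chosen for illustration; the actual FanError class is C++ and may represent severities differently.

```python
def event_log_severity(nonfunc_rotor_count, num_nonfunc_rotors_before_error):
    """Pick a severity label for a nonfunctional-rotor event log (hypothetical)."""
    if nonfunc_rotor_count < num_nonfunc_rotors_before_error:
        # Fewer nonfunctional rotors than the threshold: informational only.
        return "Informational"
    # Threshold reached: log with an error severity.
    return "Error"
```

With a threshold of 2, a single nonfunctional rotor would yield an informational log and a second simultaneous one would yield an error, assuming the count passed in reflects rotors that are nonfunctional at the same time.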


# e892e39a 14-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Start checking power off rules

In the system object, load the power off rules and start checking them.
It will check them in the following cases (if power is on):
* When the object is constructed
* When the JSON config is reloaded
* When fan presence or sensor functional state changes
* When the power state changes to on

When the power is turned off, it will cancel any running rules.

Previously, fan monitor was only designed to run with power on, and
there still may be more changes than just the ones added here to support
it always running.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I8be81612ae4997d7568678471ac0f6f854a0e758


# b63aa09e 14-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Track fan health in the System object

To prepare for being able to power off the system based on missing fans
or nonfunctional fan sensors, put a global view of this health for all
fans in the System object. This requires now keeping track of fan
presence.

This information is stored in a map based on the fan name. It is done
this way, as opposed to just always calling present/functional APIs on
the Fan objects, so that the code that will be using this information
can be tested in isolation without the System or Fan objects.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: Ieb1d4003bd13cebc806fd06f0064c63ea8ac6180
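
The idea in this commit of keeping fan health in a plain map keyed by fan name, so that consumers can be tested without System or Fan objects, might look something like the following Python sketch. The `FanHealth` shape (a presence flag plus per-sensor functional flags) and the helper names are assumptions made for illustration; the real map lives in C++ and its value type is not given in the commit message.

```python
from typing import Dict, List, NamedTuple


class FanHealth(NamedTuple):
    """Assumed per-fan record: presence plus one functional flag per sensor."""
    present: bool
    sensors_functional: List[bool]


def count_missing_fans(health: Dict[str, FanHealth]) -> int:
    """Number of fans currently marked not present."""
    return sum(1 for fan in health.values() if not fan.present)


def count_nonfunctional_rotors(health: Dict[str, FanHealth]) -> int:
    """Number of sensors (rotors) marked nonfunctional across all fans."""
    return sum(
        1
        for fan in health.values()
        for functional in fan.sensors_functional
        if not functional
    )
```

Because these helpers take only a dictionary, a check such as "2 Nonfunctional Fan Rotors" can be exercised in a unit test by building the map by hand, for example `{"fan0": FanHealth(True, [False, False])}`.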


# d06905c9 12-Jun-2020 Matthew Barth <msbarth@us.ibm.com>

monitor:SIGHUP: Handle reloading JSON config thru SIGHUP

Enable capturing the HUP signal to reload the JSON configuration. This
will reload the appropriate JSON configuration file found and update the
trust groups and fan definitions configured.

Tested:
JSON configuration is reloaded and updated after SIGHUP
Single instance of trust groups exist that match the JSON config
Single instance of fan definitions exist that match the JSON config

Change-Id: If55ca583a67fd76f0733009707bd5c4b5eda3e63
Signed-off-by: Matthew Barth <msbarth@us.ibm.com>


# c95c527a 15-Jun-2020 Matthew Barth <msbarth@us.ibm.com>

monitor:SIGHUP: Create and use system object

Use a system object to handle retrieving the trust groups and fan
definitions configured. This is necessary for handling HUP signals in
the future where a reload of the JSON configuration is done.

Tested:
No change in the loading of the trust groups configuration
No change in the loading of the fan definitions configured

Change-Id: I5df2d54641f80778bbf09d7b1f4588a458e11c71
Signed-off-by: Matthew Barth <msbarth@us.ibm.com>
