History log of /openbmc/phosphor-fan-presence/monitor/ (Results 101 – 125 of 197)
Revision Date Author Comments
ba3ee9ae 06-Jan-2021 Matt Spinler <spinler@us.ibm.com>

monitor: Fill in EpowPowerOff action

This action does the following:

1) Starts a service mode timer, which would allow the system to be
   serviced before anything happens.
2) On the expiration of that timer, it will:
   a) Set the thermal fault alert D-Bus property. This will be used
      to send an EPOW alert to the host on IBM systems.
   b) Start the meltdown timer.
3) On the expiration of the meltdown timer, a hard power off will
   occur. This timer cannot be canceled even if fans start behaving.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I9434699b816b23b68c6d9d1e97283b4ab9befe4f
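
The three steps in this commit message can be read as a small state machine. The following is a minimal sketch of that sequence, not the code from the commit: the class name `EpowPowerOffSketch`, its methods, and the manual `tick()` clock are all hypothetical, and real timers and D-Bus calls are left out. It also assumes a cancel is honored while the service mode timer is running; the message only states that the meltdown timer cannot be canceled.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the EpowPowerOff action. Time is advanced by
// hand with tick() so the sequence can be followed without real timers
// or D-Bus calls.
class EpowPowerOffSketch
{
  public:
    enum class State
    {
        idle,
        serviceMode,
        meltdown,
        poweredOff
    };

    EpowPowerOffSketch(std::uint32_t serviceModeDelay,
                       std::uint32_t meltdownDelay) :
        _serviceModeDelay(serviceModeDelay),
        _meltdownDelay(meltdownDelay)
    {}

    // Step 1: start the service mode timer.
    void start()
    {
        if (_state == State::idle)
        {
            _state = State::serviceMode;
            _remaining = _serviceModeDelay;
        }
    }

    // Assumption: a cancel is honored only while the service mode timer
    // runs. Once the meltdown timer has started it cannot be canceled.
    bool cancel()
    {
        if (_state != State::serviceMode)
        {
            return false;
        }
        _state = State::idle;
        return true;
    }

    // Advance simulated time, expiring timers as their delays elapse.
    void tick(std::uint32_t seconds)
    {
        while (seconds > 0 && timerRunning())
        {
            const std::uint32_t step = std::min(seconds, _remaining);
            seconds -= step;
            _remaining -= step;
            if (_remaining == 0)
            {
                expire();
            }
        }
    }

    State state() const
    {
        return _state;
    }

    bool thermalAlert() const
    {
        return _thermalAlert;
    }

  private:
    bool timerRunning() const
    {
        return _state == State::serviceMode || _state == State::meltdown;
    }

    void expire()
    {
        if (_state == State::serviceMode)
        {
            // Step 2: raise the thermal fault alert, then start the
            // meltdown timer.
            _thermalAlert = true;
            _state = State::meltdown;
            _remaining = _meltdownDelay;
        }
        else if (_state == State::meltdown)
        {
            // Step 3: the hard power off happens here.
            _state = State::poweredOff;
        }
    }

    std::uint32_t _serviceModeDelay;
    std::uint32_t _meltdownDelay;
    std::uint32_t _remaining = 0;
    State _state = State::idle;
    bool _thermalAlert = false;
};
```

Driving time by hand keeps the ordering visible: the alert is only raised once the service mode delay has fully elapsed, and nothing after that point can stop the power off.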

c8d3c51f 06-Jan-2021 Matt Spinler <spinler@us.ibm.com>

monitor: Add thermal fault alert D-Bus property

Add a new property to alert of a thermal fault. In this context, it
means an imminent power off due to fan faults. On certain IBM systems
it will be used as a mechanism to alert the host of the power off when
the 'epow_power_off' power off rule is used.

Service: xyz.openbmc_project.Thermal.Alert
Path: /xyz/openbmc_project/alerts/thermal_fault_alert
Interface: xyz.openbmc_project.Object.Enable
Property: Enabled

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I0531de9ce40b6148244fda18a20e144bad85d830

c4bed6b8 06-Jan-2021 Matt Spinler <spinler@us.ibm.com>

monitor: Remove _active from PowerOffAction

It isn't used anywhere.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I4697a3ff775206501b1e000b8ce14de7637453b4

b92aa3bf 06-Jan-2021 Matt Spinler <spinler@us.ibm.com>

monitor: Change power off rule trace order

When starting a power off action, trace it before starting it so if the
action traces something too this trace comes first.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: Iae7a196422e9c629098e587e31cb01f2f15eabb3

69f2f48e 20-Oct-2020 Jolie Ku <jolie_ku@wistron.com>

monitor: Add up/down count fault detection

Create an up/down count fault determination algorithm that
could be used in place of the current timer based outOfRange()
function.
The up/down count is a different method for determining when
a fan is faulted by counting up each iteration a rotor is
out of spec and removing those counts when the rotor
returns within spec.

Tested:
1. Removed a fan and ran Mihawk. The counter incremented by 1 each
time the sensor was out of spec. When the fan was put back before
the threshold was hit, the counter decremented back to 0.
2. Removed a fan. The counter incremented, and the removed fan was
marked nonfunctional when the counter reached the threshold. When
the fan was put back, the counter decremented back to 0 and the
fan returned to functional.

Change-Id: I632dd2c7553b007beb7ae6bb694a590d2cfc2a1c
Signed-off-by: Jolie Ku <jolie_ku@wistron.com>
Signed-off-by: Matthew Barth <msbarth@us.ibm.com>
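
The up/down counting described here can be sketched in a few lines. This is an illustration under stated assumptions rather than the code from the commit: the class name `UpDownCounterSketch` and its interface are hypothetical, the count is capped at the threshold, and the rotor is treated as functional again once the count returns to 0, which is how the test notes above read.

```cpp
#include <cstddef>

// Hypothetical up/down counter for a single rotor.
class UpDownCounterSketch
{
  public:
    explicit UpDownCounterSketch(std::size_t threshold) :
        _threshold(threshold)
    {}

    // Feed one monitoring iteration. Returns the functional state after
    // the update.
    bool update(bool outOfSpec)
    {
        if (outOfSpec)
        {
            // Count up, but never past the threshold (assumption).
            if (_count < _threshold)
            {
                ++_count;
            }
            if (_count >= _threshold)
            {
                _functional = false;
            }
        }
        else
        {
            // Remove counts while the rotor is back within spec.
            if (_count > 0)
            {
                --_count;
            }
            // Assumption: functional again only once the count is 0.
            if (_count == 0)
            {
                _functional = true;
            }
        }
        return _functional;
    }

    std::size_t count() const
    {
        return _count;
    }

    bool functional() const
    {
        return _functional;
    }

  private:
    std::size_t _threshold;
    std::size_t _count = 0;
    bool _functional = true;
};
```

Compared with a timer, a counter like this tolerates brief excursions: a rotor that dips out of spec for a couple of iterations and then recovers never reaches the threshold.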

12b32010 27-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Add a README

Add a README.md for fan monitor that provides a high level overview of
what it does.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: Id13ee104005d7328e3ba3102cf6d6f32ee3a1f78

ac1efc11 27-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Re-log fan error on a power off

In the case where a power off rule runs to completion and powers off the
system due to either missing or faulted fans, at the point of power off
re-post the event log for the previous fan error.

This way, there can be an error associated with the power off. Depending
on the power off rule delays, the original error could have happened
several minutes or more in the past.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I1a38062cf75ffd4a11baa417ef3983b6c1a47ada

27f6b686 27-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Event logs for missing fans

This commit adds the code to create event logs calling out the fan when
it has been missing for a certain amount of time.

This is basically identical to the functionality that the fan presence
application in this repo provides, but with it in this application all
fan errors are created from the same place. This will become important
when there is a power off due to a fan missing and the error for that
needs to be re-committed at power off time so it can be shown as the
cause of the power off.

The functionality is configured in the JSON:

fan_missing_error_delay:
Defines the number of seconds a fan must be missing with power on before
an error will be created. If this isn't present in the JSON, then
errors will not be created at all.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I76de9d8d1bf6e283560b1ce46e70f84522e2d708

f13b42e2 26-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Event logs for nonfunc fan sensors

This commit adds the code to create event logs calling out the fan when
fan sensors have been nonfunctional for a certain amount of time.

This functionality is configured in the JSON, and will only be enabled
if the 'fault_handling' JSON section is present. It uses the following
new JSON parameters:

nonfunc_rotor_error_delay (per fan):
This says how many seconds a fan sensor must be nonfunctional before the
event log will be created.

num_nonfunc_rotors_before_error (under fault_handling):
This specifies how many nonfunctional fan rotors there must be at the
same time before an event log with an error severity is created for the
rotor. When there are fewer than this many nonfunctional rotors, then
event logs with an informational severity will be created.

A new FanError class is used to create the event logs. It adds the
Logger output as FFDC, plus any JSON data that is passed in with the
commit() API. It uses CALLOUT_INVENTORY_PATH in the AdditionalData
property to specify the faulted fan FRU.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I365114357580b4f38ec943a769c1ce7f695b51ab
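
The severity rule for `num_nonfunc_rotors_before_error` comes down to a single comparison. The sketch below shows only that comparison; the `Severity` enum and the function `fanErrorSeverity` are hypothetical names, and the event log creation itself is not modeled.

```cpp
#include <cstddef>

// Hypothetical severity levels for the event log.
enum class Severity
{
    informational,
    error
};

// Pick the severity from the number of rotors that are nonfunctional at
// the same time and the configured num_nonfunc_rotors_before_error value.
Severity fanErrorSeverity(std::size_t nonfuncRotors,
                          std::size_t numNonfuncRotorsBeforeError)
{
    return nonfuncRotors < numNonfuncRotorsBeforeError
               ? Severity::informational
               : Severity::error;
}
```

With a configured value of 2, for example, a single nonfunctional rotor would produce an informational log and a second simultaneous one would produce an error.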

ae1f8efe 14-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Allowing ignoring fan FRU func status

Make the 'num_sensors_nonfunc_for_fan_nonfunc' JSON entry be optional,
and if it isn't present then don't set the parent fan FRU inventory
object functional state when the tach sensor functional states change.

This is necessary because on some systems some other entity will be
managing the FRU level functional state.

This also adds a trace when the tach sensor functional state changes,
since if the FRU functional state updating is turned off then the
existing traces won't appear.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I1be9cc335c15a78d342e2e7ea4e5108a66d29de3
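
One way to picture the optional entry is as a `std::optional` threshold: when it is absent, no FRU update is produced at all. This is a sketch under assumptions, not the commit's code. The function name `fruFunctionalUpdate` is hypothetical, and it assumes the fan counts as nonfunctional once the number of nonfunctional sensors reaches the configured value.

```cpp
#include <cstddef>
#include <optional>

// Returns the functional state to write to the parent fan FRU, or
// std::nullopt when the FRU state should be left alone because
// 'num_sensors_nonfunc_for_fan_nonfunc' was not configured.
std::optional<bool> fruFunctionalUpdate(
    std::optional<std::size_t> numSensorsNonfuncForFanNonfunc,
    std::size_t nonfuncSensors)
{
    if (!numSensorsNonfuncForFanNonfunc)
    {
        // Some other entity manages the FRU level functional state.
        return std::nullopt;
    }
    // Assumption: the fan is functional while fewer sensors than the
    // configured number are nonfunctional.
    return nonfuncSensors < *numSensorsNonfuncForFanNonfunc;
}
```

Returning "no update" as its own value keeps the caller from confusing it with either functional state.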

e892e39a 14-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Start checking power off rules

In the system object, load the power off rules and start checking them.
It will check them in the following cases (if power is on):
* When the object is constructed
* When the JSON config is reloaded
* When fan presence or sensor functional state changes
* When the power state changes to on

When the power is turned off, it will cancel any running rules.

Previously, fan monitor was only designed to run with power on, and
there still may be more changes than just the ones added here to support
it always running.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I8be81612ae4997d7568678471ac0f6f854a0e758

f06ab07c 14-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Create PowerOffRules class

This class contains a PowerOffCause and a PowerOffAction. It provides a
check() method that takes the FanHealth map which it then checks against
the cause. If the cause is satisfied, it then starts the power off
action. It provides a cancel method that will force cancel a running
action in the case that the object owner detects a system power off and
so doesn't need to run this power off anymore.

The class's configuration data is read from the JSON config file.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I5c0c168591d6d62c894c4d036ec762797fd759af
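
The check-then-start and forced-cancel behavior described in this message could look roughly like the following. Everything here is hypothetical naming for illustration: `PowerOffRuleSketch`, `FanStatusSketch` and `FanHealthSketch` are not the project's types, the cause and action are reduced to plain callables, and the `_running` flag is an assumption about how repeated checks avoid restarting an action.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumed shape of the health data: fan name -> presence and per-rotor
// functional state.
struct FanStatusSketch
{
    bool present;
    std::vector<bool> rotorsFunctional;
};
using FanHealthSketch = std::map<std::string, FanStatusSketch>;

// Hypothetical rule that pairs a cause with an action.
class PowerOffRuleSketch
{
  public:
    using Cause = std::function<bool(const FanHealthSketch&)>;
    using Hook = std::function<void()>;

    PowerOffRuleSketch(Cause cause, Hook startAction, Hook cancelAction) :
        _cause(std::move(cause)), _start(std::move(startAction)),
        _cancel(std::move(cancelAction))
    {}

    // Check the health map against the cause. If the cause is satisfied
    // and the action is not already running, start the action.
    void check(const FanHealthSketch& health)
    {
        if (!_running && _cause(health))
        {
            _running = true;
            _start();
        }
    }

    // Force cancel a running action, e.g. when the owner sees that the
    // system has already powered off.
    void cancel()
    {
        if (_running)
        {
            _cancel();
            _running = false;
        }
    }

    bool running() const
    {
        return _running;
    }

  private:
    Cause _cause;
    Hook _start;
    Hook _cancel;
    bool _running = false;
};
```

Because the rule only sees a health map and two callables, it can be exercised in a test without any fan, timer, or D-Bus objects.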

69b0cf08 14-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Create PowerOffAction class hierarchy

The PowerOffAction base class and its derived classes will be used to
power off a system due to fan failures.

There are 3 types of power offs:
1. HardPowerOff - Do a hard power off after a delay
2. SoftPowerOff - Do a soft power off after a delay
3. EpowPowerOff - This isn't fully defined yet, but it will involve
powering off after setting an early power off warning
somehow and then waiting through 2 delays.

The code that makes the D-Bus calls to do the power offs is in a
standalone class so that it can be mocked in testcases.

This code also makes use of the Logger class for logging, so this commit
brings that in as a singleton.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I83118963df4ec0b4f89619572f6935329eec3adb
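
The point about keeping the D-Bus calls in a standalone class can be illustrated with a small interface plus a recording test double. This is a sketch with hypothetical names throughout (`PowerInterfaceBase`, `RecordingPowerInterface`, `PowerOffActionSketch` and its subclasses); the delay is reduced to a `delayExpired()` call, and EpowPowerOff is left out because the message says it is not fully defined yet.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface that hides how the power off is requested.
class PowerInterfaceBase
{
  public:
    virtual ~PowerInterfaceBase() = default;
    virtual void hardPowerOff() = 0;
    virtual void softPowerOff() = 0;
};

// Test double that records the requests instead of making D-Bus calls.
class RecordingPowerInterface : public PowerInterfaceBase
{
  public:
    void hardPowerOff() override
    {
        calls.push_back("hard");
    }

    void softPowerOff() override
    {
        calls.push_back("soft");
    }

    std::vector<std::string> calls;
};

// Hypothetical base class: derived classes decide what happens once the
// configured delay has run out.
class PowerOffActionSketch
{
  public:
    explicit PowerOffActionSketch(std::shared_ptr<PowerInterfaceBase> power) :
        _power(std::move(power))
    {}
    virtual ~PowerOffActionSketch() = default;

    // Stand-in for the delay timer expiring.
    virtual void delayExpired() = 0;

  protected:
    std::shared_ptr<PowerInterfaceBase> _power;
};

class HardPowerOffSketch : public PowerOffActionSketch
{
  public:
    using PowerOffActionSketch::PowerOffActionSketch;

    void delayExpired() override
    {
        _power->hardPowerOff();
    }
};

class SoftPowerOffSketch : public PowerOffActionSketch
{
  public:
    using PowerOffActionSketch::PowerOffActionSketch;

    void delayExpired() override
    {
        _power->softPowerOff();
    }
};
```

A test can hand the recording double to an action, trigger `delayExpired()`, and then inspect `calls` to see which kind of power off was requested.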

00237439 14-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Create PowerOffCause class hierarchy

The PowerOffCause base class and its derived classes will be used to
determine when a power off needs to be done based on fan failures.

The 'satisfied()' method, which takes the fan health map, is used to
say if the cause is satisfied and a shut down will need to occur.

It provides two types of causes:
* MissingFanFRUCause - Looks at missing fan FRUs
* NonfuncFanRotorCause - Looks at nonfunctional rotors (sensors)

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I3c43347782dc559eb7c7441bf9c03d3407b248e2
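
A rough picture of the two causes is given below. The type names carry a `Sketch` suffix because they are hypothetical, as is the layout of the health map. The threshold semantics are also an assumption: each cause is taken to be satisfied once the number of missing fans, or of nonfunctional rotors summed over all fans, reaches a configured count.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assumed shape of the health data: fan name -> presence and per-rotor
// functional state.
struct FanStatusSketch
{
    bool present;
    std::vector<bool> rotorsFunctional;
};
using FanHealthSketch = std::map<std::string, FanStatusSketch>;

// Hypothetical base class holding the count that triggers the cause.
class PowerOffCauseSketch
{
  public:
    explicit PowerOffCauseSketch(std::size_t count) : _count(count)
    {}
    virtual ~PowerOffCauseSketch() = default;

    virtual bool satisfied(const FanHealthSketch& health) const = 0;

  protected:
    std::size_t _count;
};

// Looks at missing fan FRUs.
class MissingFanFRUCauseSketch : public PowerOffCauseSketch
{
  public:
    using PowerOffCauseSketch::PowerOffCauseSketch;

    bool satisfied(const FanHealthSketch& health) const override
    {
        std::size_t missing = 0;
        for (const auto& entry : health)
        {
            if (!entry.second.present)
            {
                ++missing;
            }
        }
        return missing >= _count;
    }
};

// Looks at nonfunctional rotors (sensors) across all fans.
class NonfuncFanRotorCauseSketch : public PowerOffCauseSketch
{
  public:
    using PowerOffCauseSketch::PowerOffCauseSketch;

    bool satisfied(const FanHealthSketch& health) const override
    {
        std::size_t nonfunc = 0;
        for (const auto& entry : health)
        {
            const auto& rotors = entry.second.rotorsFunctional;
            nonfunc += static_cast<std::size_t>(
                std::count(rotors.begin(), rotors.end(), false));
        }
        return nonfunc >= _count;
    }
};
```

Since both causes read only the map passed in, they can be tested with hand-built health data.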

b63aa09e 14-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Track fan health in the System object

To prepare for being able to power off the system based on missing fans
or nonfunctional fan sensors, put a global view of this health for all
fans in the System object. This requires now keeping track of fan
presence.

This information is stored in a map based on the fan name. It is done
this way, as opposed to just always calling present/functional APIs on
the Fan objects, so that the code that will be using this information
can be tested in isolation without the System or Fan objects.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: Ieb1d4003bd13cebc806fd06f0064c63ea8ac6180

b0412d07 12-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Use only init mode when using JSON

Fan monitor is currently split into 2 modes - 'init' which is used
right after a power on, and 'monitor', which is used later after the
fans-ready target is started. Normally, the 'init' mode just sets the
fans to functional and then exits, and the real monitoring work is done
in the 'monitor' mode.

In the future this application will need to be able to check for fan
problems as soon as it starts up after power on so that it can handle
shutting down due to missing fans. To prepare for this, move all
functionality into the init mode, and just exit immediately when called
to run in the monitor mode. Only do this when compiled to use the JSON
configuration, as this is new and I don't want to change how the
existing YAML setups work.

This also creates a new 'monitor_start_delay' entry in the JSON to say
how long to wait after startup before actually doing any sensor
monitoring, which then gives the same behavior as how the monitor mode
would delay by waiting for the fan control ready target, which itself is
started by fan control --init after a hardcoded delay. This field is
optional to preserve backwards compatibility and defaults to 0s.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I623a233f50233e734f50cd9e80139c60467518d8

3220350f 05-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Add fault config JSON documentation

Document the entries in fan monitor's JSON config file that relate to
fault handling. This deals with when to create errors against faulted
fans, and when and how to power off the system based on faulted and/or
missing fans.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I814e3d16df5fc4ed268fa92a8cca47747b7d57e9

5d083229 05-Oct-2020 Matt Spinler <spinler@us.ibm.com>

monitor: Add JSON documentation

Add a markdown file to document the fields in the fan monitor JSON
configuration file. A few fields have TODO placeholders with the intent
they will be filled in later.

A placeholder for a new fault handling configuration JSON section was
also included. This section will eventually describe the configuration
of how the fan monitor application will handle creating event logs and
shutting down the system due to missing or faulted fans.

Signed-off-by: Matt Spinler <spinler@us.ibm.com>
Change-Id: I35c242372225310d25c063d36e948433dd9c6c4c

5d564a9f 22-Oct-2020 Jolie Ku <jolie_ku@wistron.com>

monitor: Use the number of failed tach sensors at startup

When marking a fan nonfunctional due to its tach sensors failing
to be read from dbus, the number of failed sensors should be
checked against the configured number of sensors that results in
the fan being marked as nonfunctional.

Tested:
Ran phosphor-fan-monitor --monitor in witherspoon qemu.
When the number of failed tach sensors is larger than the
configured _numSensorFailsForNonFunc, the associated fan is
marked as nonfunctional.

Change-Id: I6ff97b9aae4279d6ce402d3aecda087d45dfa318
Signed-off-by: Jolie Ku <jolie_ku@wistron.com>

a7aed017 06-Oct-2020 Jay Meyer <jaymeyer@us.ibm.com>

monitor: journal message for fan Actual Speed wrong

Problem: Actual speed is formatted with the wrong format type.
Solution: By using the format library, it is not necessary to specify a
format for the result, and speed is correctly displayed.

Tested: Ran with simulation.
In a terminal connected to the simulator:
systemctl disable phosphor-dbus-monitor.service
obmcutil poweron

changed the fan speed in /sys/class/hwmon/hwmon9/fan1_target using an
echo command:
echo 8000 > fan1_target

After jrnl showed the fan had been disabled, setting fan to
nonfunctional showed expected speed:
"Setting fan /system/chassis/motherboard/fan5 to nonfunctional Sensor:
/xyz/openbmc_project/sensors/fan_tach/fan5_0 Actual speed: 8000.0 Target speed:
11200"

Changed the fan speed back:
echo 11200 > fan1_target

journal entry for setting fan back to functional was seen.
"Setting fan /system/chassis/motherboard/fan2 back to functional"

Signed-off-by: Jay Meyer <jaymeyer@us.ibm.com>
Change-Id: I26bf717694ff8a60851dde1a5052945e4336dfa0

4c3c24f8 08-Sep-2020 Jolie Ku <jolie_ku@wistron.com>

monitor: Mark a fan with a missing dbus sensor as nonfunctional

When fan monitor starts up and retrieves the tach feedback
sensor values from dbus, the associated fan should be marked
nonfunctional when the sensor value is not found on dbus.

Tested:
Running phosphor-fan-monitor --monitor marks a missing tach
sensor and its associated fan as nonfunctional upon
poweron in witherspoon qemu.

Change-Id: I3be24504223d3bd9efe8c4306548d6cca93d8224
Signed-off-by: Jolie Ku <jolie_ku@wistron.com>

1826c730 28-Aug-2020 Matthew Barth <msbarth@us.ibm.com>

format: Include format lib and use on errors opening JSON files

Included the format library used to add more details to the journal
message without needing the verbose output and updated the journal
logging when loading a JSON file. When loading a JSON file, now any
errors will produce a journal message at least containing the JSON file
that failed to be loaded.

Tested:
Removed JSON configuration file and attempted to load it
Journal msg shows which JSON configuration file is loaded now
Failure to parse JSON shows file and exception in journal msg

Change-Id: I6bec9bb01d8e95c3dced467ea96163129c59619b
Signed-off-by: Matthew Barth <msbarth@us.ibm.com>

0891e3b3 13-Aug-2020 Matthew Barth <msbarth@us.ibm.com>

monitor: Tach input to double

It was found that after the Value property on the Sensor.Value interface
was changed to be of double type, the tach input failed to get converted
correctly from double to int64 type when the property changed. Need to
set the tach input to be of double type explicitly.

Tested:
Tach input values correctly read from dbus signal

Change-Id: I718375d5de50a88bcfaf8ff419e71f732d0b8a65
Signed-off-by: Matthew Barth <msbarth@us.ibm.com>
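
As a general illustration of why the stored type matters, and not a reconstruction of this fix, the sketch below uses a `std::variant` to stand in for a property value that may arrive as either `int64_t` or `double`. The alias `PropertyValueSketch` and the function `tachInputFrom` are hypothetical. A reader that only asks for `int64_t` finds nothing once the value is carried as a `double`, so the sketch keeps the tach input as a `double` and accepts either alternative.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical stand-in for a property value received in a signal.
using PropertyValueSketch = std::variant<std::int64_t, double>;

// Read the tach input as a double whichever alternative is held.
double tachInputFrom(const PropertyValueSketch& value)
{
    if (const auto* asDouble = std::get_if<double>(&value))
    {
        return *asDouble;
    }
    // Fall back to the integer alternative for older senders.
    return static_cast<double>(std::get<std::int64_t>(value));
}
```

Checking the held alternative explicitly avoids relying on an implicit conversion between the two numeric types.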

8a0c2327 17-Jun-2020 Matthew Barth <msbarth@us.ibm.com>

monitor: `optional` no longer experimental

Change-Id: I29e4fa5cfdf5cefe1af548fd5af2a54d08682a11
Signed-off-by: Matthew Barth <msbarth@us.ibm.com>

fbe86eee 17-Jun-2020 Matthew Barth <msbarth@us.ibm.com>

monitor: Remove never used logging include from main

Change-Id: I2afd45c822033e81eb9a6cd79aeab33136b51179
Signed-off-by: Matthew Barth <msbarth@us.ibm.com>