2a940011 | 25-Jun-2023 |
Jian Zhang <zhangjian.3032@bytedance.com> |
misc: Move to steady_timer
We found that changes to the system clock can affect the deadline timer, causing it to be cancelled when the time does not meet its expectations. Replace it with steady_timer, as many other repositories have already done; this should have no other effect.
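The difference between a wall-clock deadline and a steady one can be sketched outside of Boost. The following is a minimal Python analogy, not the daemon's C++ code: `SteadyDeadline` and `FakeClock` are hypothetical names, and `time.monotonic` stands in for the steady clock that `steady_timer` relies on.

```python
import time


class SteadyDeadline:
    """Deadline measured on a monotonic clock.

    Because the clock never jumps when the wall-clock time is adjusted,
    the deadline can only be reached by time actually passing.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def expired(self):
        return self._clock() >= self._expires_at


class FakeClock:
    """Hypothetical helper for illustration: a clock that moves only on request."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now
```

A deadline built on `time.time` instead would move whenever the system time is set (for example by NTP), which is the class of problem the commit describes.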
Change-Id: Id0d514100cdedc718c4e9dfc328bae8476aa1aeb Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com>
|
658d70aa | 10-May-2023 |
Patrick Williams <patrick@stwcx.xyz> |
clang-format: copy latest and re-format
clang-format-16 has some backwards incompatible changes that require additional settings for best compatibility and re-running the formatter. Copy the latest .clang-format from the docs repository and reformat the repository.
Change-Id: Icdb911b82f317b75b8f147d9e8e0599131c7e601 Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
|
67d40593 | 12-Apr-2023 |
Patrick Williams <patrick@stwcx.xyz> |
meson: remove deprecated get_pkgconfig_variable
Since meson 0.56, the `get_pkgconfig_variable` has been deprecated. In meson 0.58 the `get_variable` was enhanced to no longer require the `pkgconfig` keyword argument. Ensure meson 0.58 is required and update the usage of all `get_pkgconfig_variable` and `get_variable` to be the modern variant.
Change-Id: I9daae807312503e2ba42b6cae0417c54621db394 Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
|
ea0c9bb8 | 16-Mar-2023 |
Sui Chen <suichen@google.com> |
Add "Restart=always" to the systemd unit file
The health-monitor daemon is not one-shot and should restart when it gets terminated, so add "Restart=always" to the systemd unit file.
Tested:
Before this change:
    root@bmc:~# pidof health-monitor
    310
    root@bmc:~# kill -9 310
    root@bmc:~# pidof health-monitor
    root@bmc:~# # (Does not restart)
After this change:
    root@bmc:~# pidof health-monitor
    12839
    root@bmc:~# kill -9 12839
    root@bmc:~# pidof health-monitor
    12904
Signed-off-By: Sui Chen <suichen@google.com> Change-Id: I7a2402bdeb2de369cbaac1dd7c698812948a7003
|
a19c6fb2 | 06-Mar-2023 |
Ed Tanous <edtanous@google.com> |
Move to boost::asio::post
This allows entity_manager to compile with BOOST_ASIO_NO_DEPRECATED set. It was functionally changed a few years ago, and is identical to the other behavior.
Change-Id: Ib8125fcf9bdddf0bafd46118c4e764f7f83a0c2d Signed-off-by: Ed Tanous <edtanous@google.com>
|
ec6601d1 | 09-Jan-2023 |
Sui Chen <suichen@google.com> |
Fix CPU utilization calculation
Currently, CPU utilization is calculated by (activeTimeDiff / (activeTimeDiff + idleTimeDiff)), where idleTimeDiff is defined as "idle time + IO wait time". The idleTimeDiff term is incorrect -- it should be "everything else". As it stands, "kernel utilization", "userspace utilization" and "overall utilization" can all reach "100%" simultaneously, which does not make sense.
This change calculates CPU usage as follows:
* Kernel-space usage is "kernel time delta" / "total time delta".
* Userspace usage is "userspace delta" / "total time delta".
* Overall usage is "(kernel time delta + userspace delta)" / "total time delta".
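The three ratios above can be sketched as follows. This is a Python illustration rather than the daemon's C++ implementation; the function names are hypothetical, and the choice of which `/proc/stat` fields count as "userspace" (user + nice) and "kernel" (system) is an assumption made for the example.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters.

    Fields kept (in order): user, nice, system, idle, iowait, irq, softirq, steal.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:9]]


def cpu_usage_percent(prev, curr):
    """Return kernel, user and overall usage between two samples, in percent."""
    deltas = [c - p for p, c in zip(prev, curr)]
    total = sum(deltas)  # "total time delta": every counter, not just idle + iowait
    if total <= 0:
        return {"kernel": 0.0, "user": 0.0, "overall": 0.0}
    user = deltas[0] + deltas[1]  # assumption: user + nice
    kernel = deltas[2]            # assumption: system only
    return {
        "kernel": 100.0 * kernel / total,
        "user": 100.0 * user / total,
        "overall": 100.0 * (kernel + user) / total,
    }
```

Because all three values share the same denominator, kernel and user usage can no longer both reach 100% at the same time.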
Tested: Compile the "Explicit sampling version of the SmallPT path tracer" (https://www.kevinbeason.com/smallpt/explicit.cpp) as a workload, and run two copies of it on the BMC to fully stress the CPU cores. (Alternatively, any benchmark program can fulfill this purpose, but for this one, I understand what it does and know it's compute-bound.)
One can see from `htop`, both CPU cores are almost 100% occupied. Most (around 90%) CPU time is spent in user-space. Remainder of the CPU usage is attributable to other tasks and background processing.
When one checks the `/xyz/openbmc_project/sensors/utilization/CPU_Kernel` and `/xyz/openbmc_project/sensors/utilization/CPU_User` objects, one can see CPU_User reading ramp up and reach around 90%. CPU_Kernel stabilizes at 10%. When `smallpt_explicit` is terminated, kernel and userspace CPU usage re-converge to their normal values.
Change-Id: I7c0e10e08bd2b6c8b3bd1c1a618fffb2739feecc Signed-off-By: Sui Chen <suichen@google.com> Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
|
f7406df0 | 06-Dec-2022 |
Patrick Williams <patrick@stwcx.xyz> |
prettier: reformat with org-wide settings
Signed-off-by: Patrick Williams <patrick@stwcx.xyz> Change-Id: I29a6b362713a04d53e1e0cc14c3316ebd1023ce6 |
af109947 | 22-Nov-2022 |
Nan Zhou <nanzhoumails@gmail.com> |
sensors: change object_manager path
As per PDI, all sensor implementations shall put the object manager at `/xyz/openbmc_project/sensors`.
[1] https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Sensor/Value.interface.yaml#L20
Tested: on hardware, ipmitool sdr elist is working.
```
CPU              | 36h | ok  |  0.1 |
CPU_Kernel       | 37h | ok  |  0.1 |
CPU_User         | 38h | ok  |  0.1 |
Memory_Available | 39h | ok  |  0.1 |
Storage_RW       | 3Ah | ok  |  0.1 |
```
Signed-off-by: Nan Zhou <nanzhoumails@gmail.com> Change-Id: Ia9b21eebf37cb22cfa4bc19f2369423ea4d5c035
|
9ca00458 | 26-Nov-2022 |
Patrick Williams <patrick@stwcx.xyz> |
sdbusplus: use shorter type aliases
The sdbusplus headers provide shortened aliases for many types. Switch to using them to provide better code clarity and shorter lines. Possible replacements are for:
* bus_t
* exception_t
* manager_t
* match_t
* message_t
* object_t
* slot_t
Change-Id: Ie38db6cd54835ce4f3a4eae520e9ad834e0751c1 Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
|
b6f22cf7 | 22-Nov-2022 |
Jayashree Dhanapal <jayashree-d@hcl.com> |
Remove initial values for Health sensor
For the first 120 samples (N, the window size), monitoring is ignored and default values are stored for the health sensor.
At boot-up, CPU utilization may be high because all the applications are initializing. Therefore, remove the initial values, monitor the values of each health sensor until the window is full, and then update the value on D-Bus.
Tested : Tested and verified in Facebook YosemiteV2 platform.
Signed-off-by: Jayashree Dhanapal <jayashree-d@hcl.com> Change-Id: I60fc25219337607ec911c1f7f674303dcb8fb63c
|
51bcfcb1 | 01-Nov-2021 |
Sui Chen <suichen@google.com> |
Add kernel and user CPU utilization
This adds two utilization sensors, "CPU_Kernel" and "CPU_User" that are populated with the fraction of CPU processing spent in kernel space and user space.
The intended use case is for BMCWeb to read those sensors from phosphor-health-monitor and report them in a Redfish response.
The Redfish resources are:
* ManagerDiagnosticData.ProcessorStatistics.KernelPercent
* ManagerDiagnosticData.ProcessorStatistics.UserPercent
Tested: Installed on a QEMU-emulated BMC
    bmc# busctl tree xyz.openbmc_project.HealthMon
    `-/xyz
      `-/xyz/openbmc_project
        `-/xyz/openbmc_project/sensors
          `-/xyz/openbmc_project/sensors/utilization
            |-/xyz/openbmc_project/sensors/utilization/CPU
            |-/xyz/openbmc_project/sensors/utilization/CPU_Kernel
            `-/xyz/openbmc_project/sensors/utilization/CPU_User
signed-off-by: Sui Chen <suichen@google.com> Change-Id: Ic6792e4c7cd8eba144eb9adec9366c1bc15b1a44
|
517524a0 | 19-Dec-2021 |
Sui Chen <suichen@google.com> |
Change association tuple to "monitoring, monitored_by, bmc"
Per the discussion in 46081, we need a way to differentiate between "Host CPU usage" and "BMC CPU usage" for the DBus objects. Associations have been used previously for this purpose, so the associations for the health utility "sensors" are changed accordingly.
From existing systems, some examples of existing association edge names include:
- "chassis": from a DBus sensor to a Chassis/Board inventory
- "all_sensors": the reverse of "chassis"
- "inventory": from a voltage sensor to a board
- "updateable", "active", "functional": from /software to a particular version
- "software_version": reverse of the above three
- "containedby": from a module to a module container
- "contains": reverse of above
Considering existing association names and the "Requirements and Expectations for dbus interfaces" document, we name the association edges "monitoring" and "monitored_by". This association tuple is to be interpreted as "the utilization sensors are monitoring the current BMC; the current BMC is monitored by the utilization sensors".
Signed-off-by: Sui Chen <suichen@google.com> Change-Id: Ib0c634df83beacb7acac1dc7885ba47e58523a79
|
3928a024 | 31-Aug-2022 |
Andrew Geissler <geissonator@yahoo.com> |
monitor size of /tmp
OpenBMC applications use /tmp for a variety of reasons. If /tmp becomes full, certain applications will start to fail in undefined ways.
/tmp uses a tmpfs filesystem that is backed by BMC memory. Just monitoring BMC memory is not enough though because by default, only 50% of memory is allowed to be used by /tmp.
This commit adds in a default to monitor /tmp and log to the journal if it sees /tmp exceed 85% usage. This will at least offer a clue to users when debugging issues.
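The check described here amounts to comparing the used fraction of the filesystem against a limit. A minimal Python sketch, assuming the 85% figure from the commit message; the function names are hypothetical and the daemon itself is written in C++ and driven by its JSON config.

```python
import shutil


def usage_percent(total, used):
    """Used space as a percentage of the filesystem's total size."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def tmp_over_threshold(path="/tmp", threshold_percent=85.0):
    """Return True when the filesystem holding `path` exceeds the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage_percent(usage.total, usage.used) > threshold_percent
```

Since /tmp is a tmpfs with its own size limit, the total reported here is the tmpfs size, not the BMC's total memory, which is why a separate check is useful.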
Tested: - Tested as a part of the following commit: https://gerrit.openbmc.org/c/openbmc/openbmc/+/56065
Signed-off-by: Andrew Geissler <geissonator@yahoo.com> Change-Id: I1355f0d29a0b7a0840075561069d4ec7d3b27672
|
b7d7bd5a | 22-Aug-2022 |
Potin Lai <potin.lai@quantatw.com> |
healthMonitor: use MemAvailable for memory utilization calculation
We noticed that memory utilization can easily reach 95% or higher when large files are being read, because most of the memory is consumed by the filesystem as file cache.
The current memory utilization calculation treats "total - free" as the used memory size, which includes buff/cache as part of used memory.
Change the method to parse MemAvailable and MemTotal from /proc/meminfo, then calculate the percentage of "unavailable" memory as the memory utilization.
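The calculation can be sketched as below. This is an illustrative Python version, not the daemon's C++ code; the function names are hypothetical, and the sample numbers in the usage note are made up for the example rather than taken from the test output.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def memory_utilization_percent(meminfo):
    """Percentage of memory that is NOT available, per MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def legacy_utilization_percent(meminfo):
    """The older 'total - free' calculation, which counts file cache as used."""
    total = meminfo["MemTotal"]
    return 100.0 * (total - meminfo["MemFree"]) / total
```

With a hypothetical system reporting MemTotal 2000000 kB, MemFree 60000 kB and MemAvailable 1700000 kB, the legacy formula yields 97% while the MemAvailable-based one yields 15%, because reclaimable cache no longer counts as used.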
Tested results:
root@bletchley:~# free
              total        used        free      shared  buff/cache   available
Mem:        2066620      247320       59108       34412     1760192     1713028
Swap:             0           0           0

root@bletchley:~# busctl introspect xyz.openbmc_project.HealthMon \
> /xyz/openbmc_project/sensors/utilization/Memory \
> xyz.openbmc_project.Sensor.Value
NAME       TYPE      SIGNATURE  RESULT/VALUE                             FLAGS
.MaxValue  property  d          100                                      emits-change writable
.MinValue  property  d          0                                        emits-change writable
.Unit      property  s          "xyz.openbmc_project.Sensor.Value.Uni... emits-change writable
.Value     property  d          14.9148                                  emits-change writable
Signed-off-by: Potin Lai <potin.lai@quantatw.com> Change-Id: Ib87edb313bcfbdae1306847babc4c7a5d96766f9
|
c82e6164 | 02-Aug-2022 |
Potin Lai <potin.lai@quantatw.com> |
healthMonitor: wait until enough sensor reading in queue
Usually the BMC is very busy during startup and it is risky to fill up the entire queue with only one initial value.
In this patch, we wait until the queue is filled with enough sensor values before doing any checks.
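The idea of holding back checks until the queue is full can be sketched with a fixed-size window. This Python example is an illustration only: `WindowedSensor` is a hypothetical name, and averaging the window is an assumption about how the readings are combined.

```python
from collections import deque


class WindowedSensor:
    """Collect readings and report a value only once the window is full."""

    def __init__(self, window_size):
        self._readings = deque(maxlen=window_size)

    def add(self, value):
        """Store a reading; return the window average, or None while filling."""
        self._readings.append(value)
        if len(self._readings) < self._readings.maxlen:
            return None  # not enough samples yet: skip threshold checks
        return sum(self._readings) / len(self._readings)
```

With a window of three, a single busy start-up reading of 90 followed by two readings of 30 produces no result for the first two samples and an average of 50 on the third, instead of a queue pre-filled with 90.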
Signed-off-by: Potin Lai <potin.lai@quantatw.com> Change-Id: I38de9351e88fb666051008170bdeae635fbcd103
|
973c1b69 | 04-Aug-2022 |
Patrick Williams <patrick@stwcx.xyz> |
MAINTAINERS: remove file
The MAINTAINERS file is deprecated in favor of OWNERS.
Signed-off-by: Patrick Williams <patrick@stwcx.xyz> Change-Id: I444f9ba834bf7f4540ee5ddf99913ab9bdd2f94a
|
37f8b513 | 02-Aug-2022 |
Potin Lai <potin.lai@quantatw.com> |
bmc_health_config: remove default threshold target unit
Remove reboot.target from the default JSON file to keep the same behavior as before the target unit call was implemented.
Signed-off-by: Potin Lai <potin.lai@quantatw.com> Change-Id: I1e821731a3f5e36d49bd47b7daa19e254b927d58
|
bbfe7186 | 22-Jul-2022 |
Patrick Williams <patrick@stwcx.xyz> |
sdbusplus: use shorter type aliases
The sdbusplus headers provide shortened aliases for many types. Switch to using them to provide better code clarity and shorter lines. Possible replacements are for:
* bus_t
* exception_t
* manager_t
* match_t
* message_t
* object_t
* slot_t
Signed-off-by: Patrick Williams <patrick@stwcx.xyz> Change-Id: Ibd0930728f0a945f424e3a3b5d3e8ea9e54f1979
|
85f31f8b | 22-Jul-2022 |
Patrick Williams <patrick@stwcx.xyz> |
OWNERS: switch 'matches' to 'matchers'
The original OWNERS template had a mistake which used 'matches' instead of the field supported by the Gerrit plugin 'matchers'. Update the OWNERS file to have the correct field.
Signed-off-by: Patrick Williams <patrick@stwcx.xyz> Change-Id: I6806c276b14e4906840c3b90e7f6aff60a1d3fb7
|
156ecf31 | 11-Jul-2022 |
Potin Lai <potin.lai@quantatw.com> |
healthMonitor: call configured unit when threshold exceeded
Add the threshold target implementation. Call the system target unit configured in the config file when the threshold value is exceeded.
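The decision logic can be sketched as follows. This Python example only builds a `systemctl start` command line for illustration; the function name and parameters are hypothetical, and how the daemon actually asks systemd to start the unit is not stated in the commit message.

```python
def start_unit_command(value, threshold, target):
    """Return the command to start `target`, or None if nothing should run.

    Nothing is started when no target is configured or the reading has not
    exceeded the threshold.
    """
    if not target or value <= threshold:
        return None
    return ["systemctl", "start", target]
```

Returning the command instead of running it keeps the sketch free of side effects; a caller could pass the list to `subprocess.run` if it really wanted to start the unit.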
Signed-off-by: Potin Lai <potin.lai@quantatw.com> Change-Id: I31f0cf4df0c913f10c47ecb9fa24b59b7b9de0e5
|
7fc0aa1e | 20-Jun-2022 |
Willy Tu <wltu@google.com> |
cleanup: Remove unnecessary info message for dbus matcher
Remove the needUpdate message once it has been set to true. This is to reduce the amount of spam generated by this daemon.
Change-Id: Id21675e2cbca89ebb35eb29af053bfd9d917b14a Signed-off-by: Willy Tu <wltu@google.com>
|
f8d79737 | 11-Mar-2021 |
Yong Li <yong.b.li@linux.intel.com> |
Make health-monitor compatible with “sensor list” command
When running health-monitor together with the dbus-sensors services, the "ipmitool sensor list" command cannot list all sensors and reports an error. The root cause is that health-monitor exports a utilization sensor that is incompatible with sensor reading:
* Max and min values should be set, and max should be > min
* Critical/warning values should be set
* Object Manager should be registered on dbus "/"
Tested: Ipmitool sensor list can list all these sensors without error.
Signed-off-by: Yong Li <yong.b.li@linux.intel.com> Change-Id: Ie9311f153b647e2ddec09aaa3edf448926c21e97
|
a1ed140b | 21-Mar-2022 |
Patrick Williams <patrick@stwcx.xyz> |
meson: simplify dependencies
Leverage wrapfile `[provide]` directives to simplify the dependency searching in the meson.build.
Signed-off-by: Patrick Williams <patrick@stwcx.xyz> Change-Id: Ib36f2cf3b3547838aaeb89c0867187f248910074
|
a6cd704b | 21-Dec-2021 |
Konstantin Aladyshev <aladyshev22@gmail.com> |
Set unset threshold values to NaN
Currently, unset thresholds are set to 0, but the correct value for unset thresholds is NaN.
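One likely reason NaN is the better "unset" marker is that every ordered comparison against NaN is false, so an unset threshold can never appear to be crossed. A small Python sketch with hypothetical helper names:

```python
import math


def exceeds(value, threshold):
    """True when a reading is above the threshold (always False for NaN)."""
    return value > threshold


def threshold_is_set(threshold):
    """A threshold counts as configured only when it is not NaN."""
    return not math.isnan(threshold)
```

With 0 as the placeholder, `exceeds(50.0, 0.0)` is true even though no threshold was configured; with `math.nan` the same check is false and `threshold_is_set` lets a consumer tell the two cases apart.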
Signed-off-by: Konstantin Aladyshev <aladyshev22@gmail.com> Change-Id: I1fbf120d5ea4af7e470fa4d9b7d77c6a14a15b96
|
9d29b378 | 21-Dec-2021 |
Konstantin Aladyshev <aladyshev22@gmail.com> |
Set correct values for sensor min and max properties
Interface 'xyz.openbmc_project.Sensor.Value' has 'MinValue' and 'MaxValue' properties. For utilization sensors, these values should be set to 0 and 100 respectively.
Signed-off-by: Konstantin Aladyshev <aladyshev22@gmail.com> Change-Id: I3e6c75375c200fd1716a044a63782f310d4f914b
|