2549d792 | 26-Jan-2022 |
Adriana Kobylak <anoo@us.ibm.com> |
psu-ng: Log error on brownout condition
Per design document: https://github.com/openbmc/docs/blob/master/designs/power-recovery.md#brownout
The BMC must log an error indicating the brownout event has occurred.
Check for a brownout condition where all PSUs are either not present or report an AC loss VIN fault. The error log is the same as for a blackout condition. Use a variable to track whether the error has been created so that it is not created multiple times. Clear the variable once the brownout condition is no longer detected or when the system is powered off so that the error gets logged on the next power on.
Tested: On Rainier 2S2U simulation where there are 4 PSU slots but only two power supplies connected, inject a VIN fault on the two present PSUs and verify an error log for AC loss 110000AC is created:
Jan 31 16:57:37 p10bmc phosphor-psu-monitor[963]: INPUT fault: STATUS_WORD = 0x2848, STATUS_MFR_SPECIFIC = 0x0, STATUS_INPUT = 0x38
Jan 31 16:57:37 p10bmc phosphor-psu-monitor[963]: VIN_UV fault: STATUS_WORD = 0x2848, STATUS_MFR_SPECIFIC = 0x0, STATUS_INPUT = 0x38
Jan 31 16:57:38 p10bmc phosphor-psu-monitor[963]: INPUT fault: STATUS_WORD = 0x2848, STATUS_MFR_SPECIFIC = 0x0, STATUS_INPUT = 0x38
Jan 31 16:57:38 p10bmc phosphor-psu-monitor[963]: VIN_UV fault: STATUS_WORD = 0x2848, STATUS_MFR_SPECIFIC = 0x0, STATUS_INPUT = 0x38
Jan 31 16:57:38 p10bmc phosphor-log-manager[305]: Created PEL 0x50000007 (BMC ID 7) with SRC 110000AC
Change-Id: I7760b59a02ef2afc81bd7807c7896183d99a66ec Signed-off-by: Adriana Kobylak <anoo@us.ibm.com>
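The brownout check and the "log only once" variable described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `PsuState`, `BrownoutTracker`, and `update()` are hypothetical names, and the power state is passed in as a plain bool.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified view of one power supply slot.
struct PsuState
{
    bool present;
    bool acLossVinFault; // true when the PSU reports an AC loss VIN fault
};

// Hypothetical tracker for the "error already created" variable.
class BrownoutTracker
{
  public:
    // Returns true when a new brownout error should be logged on this pass.
    bool update(const std::vector<PsuState>& psus, bool poweredOn)
    {
        if (!poweredOn)
        {
            // Powered off: re-arm so the error is logged on the next power on.
            logged = false;
            return false;
        }

        // Brownout: every PSU is either not present or reports AC loss.
        // Note that std::all_of is also true for an empty list.
        const bool brownout =
            std::all_of(psus.begin(), psus.end(), [](const PsuState& p) {
                return !p.present || p.acLossVinFault;
            });

        if (!brownout)
        {
            // Condition no longer detected: re-arm.
            logged = false;
            return false;
        }
        if (logged)
        {
            return false; // already logged for this brownout
        }
        logged = true;
        return true;
    }

  private:
    bool logged = false;
};
```

Calling `update()` once per monitoring pass would then yield at most one error per continuous brownout condition.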
|
f8d8c464 | 27-Jan-2022 |
Ben Tyner <ben.tyner@ibm.com> |
phosphor-power-supply: Populate RT VPD Keywords
Populate the Resource Type (RT) VPD keywords for type VINI and DINF.
Signed-off-by: Ben Tyner <ben.tyner@ibm.com> Change-Id: Ib39f1013111fd00ca9276fd58e5193457e682d39
|
32453e9b | 15-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Continue reading after readFail
Update the PMBus::read() function so that callers can opt out of creating the journal trace and elog. The default keeps the previous behavior of tracing and creating the elog.
If we reach the limit of read failures that results in a communication error log, continue to read, but stop logging failures.
If communication restores, we may be able to detect what caused the read failure, or otherwise detect or clear new faults.
Change-Id: If59b86211ab54c31248ede78f8f117b607298923 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
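The "keep reading, but stop logging" behavior can be sketched as a small counter. This is only an illustration under assumptions: `ReadFailureTracker` and its methods are hypothetical names, and resetting the count on a successful read is an assumption that the commit message does not state.

```cpp
#include <cstddef>

// Hypothetical helper: decides whether a read failure should still be logged.
class ReadFailureTracker
{
  public:
    explicit ReadFailureTracker(std::size_t limit) : limit(limit) {}

    // Record the outcome of one read attempt.
    // Returns true if this failure should be traced/logged.
    bool record(bool readSucceeded)
    {
        if (readSucceeded)
        {
            failures = 0; // assumption: communication restored, start over
            return false;
        }
        if (failures < limit)
        {
            ++failures;
            return true; // still below or just reaching the limit: log it
        }
        return false; // limit already reached: keep reading, stay quiet
    }

    // True once enough failures were seen to report a communication error.
    bool atLimit() const
    {
        return failures >= limit;
    }

  private:
    std::size_t limit;
    std::size_t failures = 0;
};
```

A caller would keep issuing reads on every pass and use the return value of `record()` only to decide whether to trace the failure.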
|
b70eae9a | 20-Jan-2022 |
Adriana Kobylak <anoo@us.ibm.com> |
psu-ng: Check if mismatched PSU is supported
The function that checks that all PSUs have the same model currently calls out the first mismatched PSU. If the mismatched PSU is listed in the supported configurations and the base model is not, the error log may cause confusion because its callout data would indicate replacing the PSU that is supported instead of the one that is not supported on that system.
Therefore check the supported configurations to determine which PSU to call out in case of a mismatch.
Tested: On p10bmc with ps0 model 2B1D (not supported) and ps1 model 2B1E (supported):
- Before change:
  "ACTUAL_MODEL": "2B1E",
  "CALLOUT_INVENTORY_PATH": "/xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply1",
  "EXPECTED_MODEL": "2B1D",
- After change:
  "ACTUAL_MODEL": "2B1D",
  "CALLOUT_INVENTORY_PATH": "/xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply0",
  "EXPECTED_MODEL": "2B1E",
Change-Id: I0b2d487e12f55e08a93e77b6c569726dde9d4e68 Signed-off-by: Adriana Kobylak <anoo@us.ibm.com>
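One way to picture the callout decision is the sketch below. It is an assumption-laden illustration: `Psu`, `Mismatch`, and `findMismatch()` are hypothetical names, the first PSU is treated as the base model, and the supported configurations are reduced to a plain set of model strings.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified PSU record.
struct Psu
{
    std::string inventoryPath;
    std::string model;
};

// Hypothetical result mirroring the fields shown in the test output.
struct Mismatch
{
    std::string calloutPath;
    std::string actualModel;
    std::string expectedModel;
};

// Compare every PSU against the first one (the "base"). On a mismatch,
// call out the base PSU only if the other PSU is supported and the base
// is not; otherwise call out the mismatched PSU as before.
std::optional<Mismatch>
    findMismatch(const std::vector<Psu>& psus,
                 const std::set<std::string>& supportedModels)
{
    if (psus.empty())
    {
        return std::nullopt;
    }
    const Psu& base = psus.front();
    for (std::size_t i = 1; i < psus.size(); ++i)
    {
        const Psu& other = psus[i];
        if (other.model == base.model)
        {
            continue;
        }
        const bool baseSupported = supportedModels.count(base.model) > 0;
        const bool otherSupported = supportedModels.count(other.model) > 0;
        if (otherSupported && !baseSupported)
        {
            return Mismatch{base.inventoryPath, base.model, other.model};
        }
        return Mismatch{other.inventoryPath, other.model, base.model};
    }
    return std::nullopt;
}
```

With ps0 as 2B1D, ps1 as 2B1E, and only 2B1E supported, this sketch calls out powersupply0, which matches the "After change" output in the test notes.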
|
c2906f47 | 21-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: De-glitch all faults
Use DEGLITCH_LIMIT to determine all the faults. If a fault bit is on, do not consider that a fault until it is seen at least DEGLITCH_LIMIT times. With DEGLITCH_LIMIT set to 3, the monitor would need to see a fault bit on 3 times in a row before indicating that the power supply has that fault.
This was done earlier for the PGOOD fault detection.
Change-Id: I918c2fcdd1d90ae253ab268bd04aa7a0da0208b8 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
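The de-glitch rule can be illustrated with a per-fault counter. This is a minimal sketch: `DEGLITCH_LIMIT` and its value of 3 come from the commit message, while the `DeglitchedFault` class and its method names are hypothetical. The `size_t` counter follows the later pgoodFault type change in this log.

```cpp
#include <cstddef>

constexpr std::size_t DEGLITCH_LIMIT = 3;

// Hypothetical wrapper around one fault bit.
class DeglitchedFault
{
  public:
    // Feed the state of the fault bit from one monitoring pass.
    void update(bool bitOn)
    {
        if (!bitOn)
        {
            count = 0; // the run of consecutive sightings is broken
            return;
        }
        if (count < DEGLITCH_LIMIT)
        {
            ++count; // saturate at the limit
        }
    }

    // The fault only counts once it was seen DEGLITCH_LIMIT times in a row.
    bool isFaulted() const
    {
        return count >= DEGLITCH_LIMIT;
    }

  private:
    std::size_t count = 0;
};
```

A single spurious reading therefore never reaches the limit, because the next clean reading resets the count.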
|
82affd94 | 24-Nov-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Clear faults when voltage back in range
If the last read voltage (via READ_VIN) was below the minimum and now it is back in a valid range (100 or 200 volt range valid), clear all the faults to allow for re-detection of faults and logging of new errors.
Trace if INPUT_FAULT_WARN or VIN_UV clear. We should not expect to see that without sending a CLEAR_FAULTS command (or a power cycle).
Tested:
Rainier 2S2U real hardware. ePDU outlet off/on allows re-detection of injected CML fault.
- input fault, vin_uv fault, pgood/off fault.
- repeat shows faults cleared, and new faults logged.
Simulator pgood fault, then low voltage followed by good voltage. Verify simulator can re-detect faults after voltage back in range.
Simulator fake input fault/warn on, then off and other fault on.
- verified tracing input going off without clear faults sent.
Simulator fake input fault/warn on, then no faults.
- verified tracing input going off without clear faults sent.
Change-Id: Ic8022cf137978ff660680e9680f778853cbecf0d Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
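The "voltage back in range" trigger can be sketched as an edge detector on the last READ_VIN value. The commit message does not give the thresholds, so the minimum is a constructor parameter here, and `InputVoltageTracker` and `update()` are hypothetical names.

```cpp
// Hypothetical helper that remembers the previous input voltage reading.
class InputVoltageTracker
{
  public:
    // minValidVolts is an assumed threshold supplied by the caller.
    explicit InputVoltageTracker(int minValidVolts) :
        minValid(minValidVolts), lastVolts(minValidVolts)
    {}

    // Feed the latest READ_VIN value. Returns true when the previous
    // reading was below the minimum and this one is valid again, which
    // is the point where all faults would be cleared.
    bool update(int volts)
    {
        const bool recovered = (lastVolts < minValid) && (volts >= minValid);
        lastVolts = volts;
        return recovered;
    }

  private:
    int minValid;
    int lastVolts; // starts "valid" so the first reading never triggers
};
```

Only the low-to-valid transition returns true, so faults are not cleared repeatedly while the voltage stays in range.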
|
f087f475 | 21-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Refactor VIN_UV fault detection
Split off code checking for VIN_UV fault in STATUS_WORD to its own function.
Change-Id: Ifd07b6958885ed19c7611e10e343d1a5f10ec684 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
d5d9a225 | 21-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Refactor fan fault detection
Split off code checking for fan fault/warning in STATUS_WORD to its own function.
Change-Id: I49b6dc2d62b6ca39a564262a4745aa5ed25c14eb Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
08378784 | 21-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Refactor VOUT_UV_FAULT detection
Split off code checking for VOUT_UV fault in STATUS_WORD to its own function.
Change-Id: I3b5e898a7d4f1ad21317c66b7fcb97c211581dcd Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
a00e7300 | 21-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Refactor IOUT_OC_FAULT detection
Split off code checking for IOUT_OC fault in STATUS_WORD to its own function.
Change-Id: I4925149b0b1f9bce2400c83eb2808aeba0f7d1cc Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
c2c87131 | 21-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Refactor VOUT_OV_FAULT detection
Split off code checking for VOUT_OV fault in STATUS_WORD to its own function.
Change-Id: Id5fef1a3830ff4a60ca235bbf83b3f99caea5986 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
e3b0bb01 | 21-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Refactor input fault detection
Split off input fault checking of STATUS_WORD into its own function.
Change-Id: I72a299ec1e905c9d59460b37f594997eee124e27 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
c220343c | 21-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Refactor CML fault detection
Split off CML fault checking of STATUS_WORD into its own function.
Change-Id: I845fe0dbcc86f085057bd5128303dde04cd9057f Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
52cb3f28 | 21-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Refactor temperature fault detection
Split off temperature fault checking of STATUS_WORD into its own function.
Change-Id: Ia7dc08af12647cc7ca76356c9e9d2b75e2e95f56 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
993b554f | 21-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Refactor PGOOD and UNIT_IS_OFF fault
Split the code for detecting PGOOD and/or UNIT_IS_OFF fault into its own private member function.
The analyze() function is getting a bit long and hard to read.
Change-Id: I48771a5d4e8991ce37b54bd6ad4c3e938924418e Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
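The shape of this series of refactors, where analyze() delegates each STATUS_WORD check to its own private helper, might look roughly like the sketch below. The bit masks follow the PMBus STATUS_WORD layout and agree with the trace values in this log (for example 0x2848 for an input plus VIN_UV fault). The class, struct, and helper names are hypothetical, and de-glitching is left out to keep the sketch short.

```cpp
#include <cstdint>

// PMBus STATUS_WORD bit masks (subset).
namespace status_word
{
constexpr std::uint16_t VIN_UV_FAULT = 0x0008;
constexpr std::uint16_t UNIT_IS_OFF = 0x0040;
constexpr std::uint16_t POWER_GOOD_NEGATED = 0x0800;
constexpr std::uint16_t MFR_SPECIFIC_FAULT = 0x1000;
constexpr std::uint16_t INPUT_FAULT_WARN = 0x2000;
} // namespace status_word

// Hypothetical set of fault flags.
struct Faults
{
    bool input = false;
    bool vinUV = false;
    bool pgood = false;
    bool mfr = false;
};

// Hypothetical analyzer: analyze() stays short, each check is a helper.
class StatusAnalyzer
{
  public:
    Faults analyze(std::uint16_t statusWord) const
    {
        Faults faults;
        analyzeInputFault(statusWord, faults);
        analyzeVinUVFault(statusWord, faults);
        analyzePgoodFault(statusWord, faults);
        analyzeMFRFault(statusWord, faults);
        return faults;
    }

  private:
    static void analyzeInputFault(std::uint16_t sw, Faults& f)
    {
        f.input = (sw & status_word::INPUT_FAULT_WARN) != 0;
    }
    static void analyzeVinUVFault(std::uint16_t sw, Faults& f)
    {
        f.vinUV = (sw & status_word::VIN_UV_FAULT) != 0;
    }
    static void analyzePgoodFault(std::uint16_t sw, Faults& f)
    {
        // PGOOD negated and/or unit off counts as a PGOOD fault.
        f.pgood = (sw & (status_word::POWER_GOOD_NEGATED |
                         status_word::UNIT_IS_OFF)) != 0;
    }
    static void analyzeMFRFault(std::uint16_t sw, Faults& f)
    {
        f.mfr = (sw & status_word::MFR_SPECIFIC_FAULT) != 0;
    }
};
```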
|
6c2ac394 | 21-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Refactor MFR fault detection
Move handling of MFR fault to its own analyzeMFRFault() function.
This will handle checking if the MFR fault bit is on in STATUS_WORD, and determine the power-supply-specific meaning of any STATUS_MFR_SPECIFIC bits by calling determineMFRFault().
The analyze() function is getting a bit long and hard to read.
Change-Id: I401ebcf11943099385044081518a27511075fa94 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
e3f7ad23 | 21-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Refactor clearing fault member variables
Consolidated the copy/paste of clearing the fault member variables into a helper function that both analyze() and clearFaults() can use.
Change-Id: Ib56718b0d4cc36edd000b9ba1f52fb42047e2a8c Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
925c0263 | 21-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Change pgoodFault type to size_t
The Misc Guidelines section of the coding conventions indicates that we should "Always use size_t or ssize_t for things that are sizes, counts, etc. ...."
https://github.com/openbmc/docs/blob/master/cpp-style-and-conventions.md
Change-Id: I23eba141c00e138477e008a40962f0c1af94bb51 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
391a0690 | 08-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: gtest cleanup expectation warnings
A number of the tests are missing various EXPECT_CALL statements, which results in very verbose testlog.txt output. This becomes especially problematic when a test fails, as there are pages of output to look through to narrow down what failed where and why.
Add in the EXPECT_CALL statements that should be there, such as for findHwmonDir, the reading of "in1_input" as part of the fault clearing, etc.
Change-Id: I9f2f88622ad7b682461069df980a50b0b13c44a6 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
39ea02bc | 23-Nov-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Add in handling of specific MFR faults
Add in a function to determine what the various bits in statusMFR may be indicating for a fault, based on the type of power supply (device driver bound).
Add in PS_Kill, 12Vcs, and 12V CS faults for IBM power supply types.
Add in creating error logs for PS_Kill, 12Vcs, and 12V CS faults. The 12Vcs and 12V CS faults can essentially be treated the same as VOUT_UV faults (same error type, same call out).
Tested:
Verified no PS_Kill, 12Vcs, or 12V CS fault on normal Rainier 2S4U
Simulated PS_Kill fault:
MFR fault: STATUS_WORD = 0x1840 STATUS_MFR_SPECIFIC = 0x10
Simulated 12Vcs fault:
PGOOD fault: STATUS_WORD = 0x1840, STATUS_MFR_SPECIFIC = 0x40
MFR fault: STATUS_WORD = 0x1840 STATUS_MFR_SPECIFIC = 0x40
Simulated 12V CS fault/warning:
MFR fault: STATUS_WORD = 0x1000 STATUS_MFR_SPECIFIC = 0x80
Change-Id: Ie89a58836ecec86dfa2e124eb6ab03e9dccce929 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
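A decoder for the manufacturer-specific bits could be sketched as below. The bit values 0x10, 0x40, and 0x80 are inferred from the simulated STATUS_MFR_SPECIFIC values in the test notes and are an assumption, not a documented register layout. `PsuType`, `MfrFaults`, and `determineMfrFault()` are hypothetical names, and a plain enum stands in for the bound device driver.

```cpp
#include <cstdint>

// Hypothetical stand-in for "which device driver is bound".
enum class PsuType
{
    Ibm,
    Other
};

// Hypothetical flags for the three IBM-specific faults.
struct MfrFaults
{
    bool psKill = false;
    bool ps12Vcs = false;
    bool psCS12V = false;
};

// Interpret STATUS_MFR_SPECIFIC according to the power supply type.
// Bit positions are assumptions taken from the simulated values above.
MfrFaults determineMfrFault(PsuType type, std::uint8_t statusMfr)
{
    MfrFaults faults;
    if (type == PsuType::Ibm)
    {
        faults.psKill = (statusMfr & 0x10) != 0;
        faults.ps12Vcs = (statusMfr & 0x40) != 0;
        faults.psCS12V = (statusMfr & 0x80) != 0;
    }
    // Other types: bit meanings unknown, report nothing specific.
    return faults;
}
```

Keeping the decoding behind the type check means another power supply type could assign different meanings to the same bits without touching the callers.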
|
a169b0f9 | 07-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: IBM FN goes to SparePartNumber
When reading the IBM FN (FRU_NUMBER), add it to the SparePartNumber asset properties in the D-Bus inventory.
Tested:
root@p10bmc:~# busctl get-property xyz.openbmc_project.Inventory.Manager \
> /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply0 \
> xyz.openbmc_project.Inventory.Decorator.Asset \
> PartNumber
s "3FP210"
root@p10bmc:~#
root@p10bmc:~# busctl get-property xyz.openbmc_project.Inventory.Manager \
> /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply1 \
> xyz.openbmc_project.Inventory.Decorator.Asset \
> PartNumber
s "3FP210"
root@p10bmc:~#
root@p10bmc:~# busctl get-property xyz.openbmc_project.Inventory.Manager \
> /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply0 \
> xyz.openbmc_project.Inventory.Decorator.Asset \
> SparePartNumber
s ""
root@p10bmc:~#
root@p10bmc:~# busctl get-property xyz.openbmc_project.Inventory.Manager \
> /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply1 \
> xyz.openbmc_project.Inventory.Decorator.Asset \
> SparePartNumber
s ""
root@p10bmc:~#
root@p10bmc:~# # patch needed in /usr
root@p10bmc:~# mkdir -p /tmp/persist/usr
root@p10bmc:~# mkdir -p /tmp/persist/work/usr
root@p10bmc:~# mount -t overlay -o lowerdir=/usr,upperdir=/tmp/persist/usr,workdir=/tmp/persist/work/usr overlay /usr
root@p10bmc:~# md5sum /usr/bin/phosphor-psu-monitor /tmp/phosphor-psu-monitor
ac1a50698a63e53dd3819b9f3c78c378 /usr/bin/phosphor-psu-monitor
4a2806d1a3494d1dd7176cd7b9dadf1a /tmp/phosphor-psu-monitor
root@p10bmc:~# mv /tmp/phosphor-psu-monitor /usr/bin/phosphor-psu-monitor
root@p10bmc:~# md5sum /usr/bin/phosphor-psu-monitor /tmp/phosphor-psu-monitor
4a2806d1a3494d1dd7176cd7b9dadf1a /usr/bin/phosphor-psu-monitor
md5sum: can't open '/tmp/phosphor-psu-monitor': No such file or directory
root@p10bmc:~# systemctl daemon-reload
root@p10bmc:~# systemctl restart phosphor-psu-monitor.service
root@p10bmc:~# busctl get-property xyz.openbmc_project.Inventory.Manager \
> /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply0 \
> xyz.openbmc_project.Inventory.Decorator.Asset \
> PartNumber
s "3FP210"
root@p10bmc:~#
root@p10bmc:~# busctl get-property xyz.openbmc_project.Inventory.Manager \
> /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply1 \
> xyz.openbmc_project.Inventory.Decorator.Asset \
> PartNumber
s "3FP210"
root@p10bmc:~#
root@p10bmc:~# busctl get-property xyz.openbmc_project.Inventory.Manager \
> /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply0 \
> xyz.openbmc_project.Inventory.Decorator.Asset \
> SparePartNumber
s "3FP211"
root@p10bmc:~#
root@p10bmc:~# busctl get-property xyz.openbmc_project.Inventory.Manager \
> /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply1 \
> xyz.openbmc_project.Inventory.Decorator.Asset \
> SparePartNumber
s "3FP211"
root@p10bmc:~#
Change-Id: I4aaa906f576894f62fa36083c40c89d935d646a8 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
9ba38235 | 16-Nov-2021 |
Adriana Kobylak <anoo@us.ibm.com> |
psu-ng: Run validation when PSU is plugged
Subscribe to the Present inventory property so that when a PSU becomes present (it's plugged into the system), it triggers the PSU validation check so that the user can know if the new PSU(s) is supported on the system instead of needing to issue a power on to run the validation.
Tested: At BMC Ready state, set the Present property on powersupply1 to false, then true, and check that the validation ran every time the Present property was set to true. Same when changing the Present property for powersupply0.
Change-Id: I14dc7d5902871284c9c099e81b45e78e4abf83bc Signed-off-by: Adriana Kobylak <anoo@us.ibm.com>
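The trigger logic can be sketched independently of D-Bus. In this illustration a plain callback stands in for the property-changed subscription. `PresenceWatcher`, `onPresentChanged()`, and the injected validation function are hypothetical names, not the sdbusplus API or the names used in the commit.

```cpp
#include <functional>
#include <utility>

// Hypothetical handler for changes of a PSU's Present property.
class PresenceWatcher
{
  public:
    // validate is whatever runs the PSU validation check.
    explicit PresenceWatcher(std::function<void()> validate) :
        validate(std::move(validate))
    {}

    // Would be called from the property-changed subscription.
    void onPresentChanged(bool present)
    {
        if (present)
        {
            // A PSU was plugged in: validate now instead of waiting
            // for the next power on.
            validate();
        }
    }

  private:
    std::function<void()> validate;
};
```

As in the test notes, setting Present to false does nothing here, while every change to true runs the validation again.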
|
06ca4590 | 06-Dec-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Add DEGLITCH_LIMIT, deglitch pgoodFault
While the power supply should not arbitrarily report a PGOOD fault, and then turn it back off, there is a perception that this is indeed possible, a glitch of some sort.
To avoid possibly logging an error for an erroneous fault reporting, make sure the fault is reported more than once before considering it to be a true fault (deglitch the signal).
Tested:
Real Rainier 2S2U:
Verify tracing PGOOD faults seen and cleared, no error logged
Verify PGOOD/OFF error logged when manually set ON_OFF_CONFIG & OPERATION.
Verify deglitched PGOOD again on restart service (ON_OFF_CONFIG reset).
Change-Id: I54f775004d2e363cff21ff0512bd9283408f1f72 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
4aecc295 | 10-Nov-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: INFO journal trace when pgoodFault clears
Change-Id: I416a58c8d6c4276c0fa7edcad85e59a889332109 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
7ee4d7e4 | 19-Nov-2021 |
Brandon Wyman <bjwyman@gmail.com> |
psu-ng: Add in detection of fan faults
If the FANS bit in the STATUS_WORD turns on (a fan or airflow fault or warning has occurred), set a fan fault indicator in the power supply object. During analysis of the power supplies, if a fan fault has occurred, prioritize that over a temperature fault and include the STATUS_TEMPERATURE and STATUS_FANS_1_2 command responses in the error created. Call out the power supply with the fault.
Tested:
Verify no faults detected or logged on real hardware (Rainier 2S4U).
Simulate fan 1 fault on Rainier 2S2U, 110015FF PEL created.
Change-Id: Ifff5b4d96efe44b081a33caa01d70fdb578e57e3 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
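The priority rule between fan and temperature faults can be written as a small selection function. This is only a sketch: `FaultKind` and `selectFault()` are hypothetical names, and only the two faults named in the commit message are modeled.

```cpp
// Hypothetical classification of which error to create.
enum class FaultKind
{
    None,
    Fan,
    Temperature
};

// A fan fault takes priority over a temperature fault, so a power
// supply reporting both is logged as a fan fault.
FaultKind selectFault(bool fanFault, bool temperatureFault)
{
    if (fanFault)
    {
        return FaultKind::Fan;
    }
    if (temperatureFault)
    {
        return FaultKind::Temperature;
    }
    return FaultKind::None;
}
```

Checking the fan fault first reflects that a failed fan can be the cause of an over-temperature condition, so the more specific error is the one reported.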
|