# Power Good Faults

## Overview

The power sequencer device provides a chassis power good (pgood) signal. When
true, this signal indicates that all of the main (non-standby) voltage rails are
powered on.

If the chassis pgood state is false when it should be true, a chassis pgood
fault has occurred.

## Pgood fault while powering on the system

When the power sequencer device is powering on the main voltage rails in order,
one of the rails may fail to power on. This is often due to a hardware problem.

When a voltage rail fails to power on, the power sequencer device may
immediately indicate an error. However, the device may instead wait indefinitely
for the rail to power on. In both cases, the chassis pgood signal never changes
to true.
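Because the device may wait indefinitely, the failure is never reported by the pgood signal itself, so a monitor has to bound the wait. The following Python sketch shows one way to do that; it is not the `phosphor-power-sequencer` implementation. `wait_for_pgood` and `read_pgood` are hypothetical names, and the timeout and poll interval are assumed values that a real system would need to choose.

```python
import time
from typing import Callable


# Hypothetical helper: the timeout and poll interval are assumptions.
def wait_for_pgood(
    read_pgood: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the chassis pgood signal until it is true or the timeout expires.

    Returns True if pgood became true in time. Returns False if it never did,
    which is treated as a pgood fault during power on.
    """
    deadline = clock() + timeout_s
    while True:
        if read_pgood():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

The `clock` and `sleep` parameters are injectable only so the logic can be exercised without hardware or real delays.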

## Pgood fault after the system was powered on

A pgood fault can occur after a system has been powered on. The system may have
been running successfully for days or months.

A voltage rail may suddenly power off or stop providing the expected voltage
level. This could occur if the voltage regulator stops working or if it shuts
itself off after exceeding a temperature, voltage, or current limit.

The power sequencer device detects that the voltage rail has failed and changes
the chassis pgood signal to false. Depending on how the hardware is configured,
the device may also power off several other related voltage rails.

## Pgood fault handling

`phosphor-power-sequencer` detects a pgood fault by monitoring the chassis pgood
signal:

- Chassis is powering on: the pgood signal never changes to true.
- Chassis was powered on: the pgood signal changes from true to false.
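The two detection rules amount to a small state machine. This Python sketch is illustrative only: the `PgoodMonitor` class, its methods, the `FaultType` values, and the power-on timeout are hypothetical and are not part of `phosphor-power-sequencer`. Each sample is passed in with the time elapsed since the previous sample, so the logic does not depend on a real clock.

```python
from enum import Enum
from typing import Optional


class FaultType(Enum):
    DURING_POWER_ON = "pgood never became true while powering on"
    AFTER_POWER_ON = "pgood changed from true to false"


class PgoodMonitor:
    """Hypothetical monitor that classifies chassis pgood faults."""

    def __init__(self, power_on_timeout_s: float):
        # Assumed timeout; the detection rule only says pgood "never" changes.
        self.power_on_timeout_s = power_on_timeout_s
        self.powering_on = False
        self.powered_on = False
        self.elapsed_s = 0.0

    def start_power_on(self) -> None:
        self.powering_on = True
        self.powered_on = False
        self.elapsed_s = 0.0

    def update(self, pgood: bool, delta_s: float) -> Optional[FaultType]:
        """Process one pgood sample taken delta_s seconds after the last one."""
        if self.powering_on:
            if pgood:
                # Power on succeeded; start watching for a true -> false change.
                self.powering_on = False
                self.powered_on = True
                return None
            self.elapsed_s += delta_s
            if self.elapsed_s >= self.power_on_timeout_s:
                self.powering_on = False
                return FaultType.DURING_POWER_ON
            return None
        if self.powered_on and not pgood:
            # pgood dropped after the chassis was powered on.
            self.powered_on = False
            return FaultType.AFTER_POWER_ON
        return None
```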

When a pgood fault is detected, `phosphor-power-sequencer` performs the
following steps:

- Use information from the power sequencer device to determine the cause of the
  fault.
- Log an error with information about the fault.
- If this is a single-chassis system:
  - The system will be [powered off](powering_off.md).
- If this is a multiple-chassis system:
  - Wait briefly, and then check whether all the other chassis that were
    powered on are also experiencing a pgood fault. If so, check whether any
    chassis is experiencing a brownout or blackout. This determines whether
    this is a chassis-specific problem or a system-wide problem due to a
    [Power Loss](power_loss.md).
  - If this is a chassis-specific problem, add the inventory path of the chassis
    to the error log. This may result in hardware isolation, which sets the
    `Enabled` property of the chassis to false.
  - The system will be powered [off](powering_off.md) and then
    [on](powering_on.md) again.
  - Chassis with an `Enabled` value of false will **not** be powered back on.

See [Chassis Status](chassis_status.md) for more information on the `Enabled`
property.

Note that when a pgood error happens **during** a power-on attempt, the
`phosphor-chassis-state-manager` application handles the power off/power cycle.
When the pgood error happens **after** the system was powered on, the
`phosphor-power-sequencer` application handles the power off/power cycle. This
split is due to the complex service file relationships that occur during a
power-on attempt.
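The handling steps, together with the note about which application performs the power off or power cycle, can be condensed into a decision function. This Python sketch is a hypothetical illustration: `ChassisStatus`, `handle_pgood_fault`, and the action strings are invented for the example, the short wait before checking the other chassis is assumed to have already happened, and the function returns descriptions of actions rather than performing them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChassisStatus:
    """Hypothetical per-chassis state gathered after the short wait."""
    inventory_path: str
    powered_on: bool
    pgood_fault: bool
    brownout_or_blackout: bool


def handle_pgood_fault(
    faulted: ChassisStatus,
    all_chassis: List[ChassisStatus],
    during_power_on: bool,
) -> List[str]:
    """Return the actions taken for a pgood fault on `faulted` (hypothetical)."""
    actions = ["determine cause from power sequencer device", "log error"]
    # phosphor-chassis-state-manager handles the power off/power cycle for
    # faults during power on; phosphor-power-sequencer handles faults after
    # the system was powered on.
    owner = (
        "phosphor-chassis-state-manager"
        if during_power_on
        else "phosphor-power-sequencer"
    )
    if len(all_chassis) == 1:
        actions.append(f"power off system ({owner})")
        return actions

    # Multiple chassis: are all other powered-on chassis also faulted, and is
    # any chassis seeing a brownout or blackout? If so, this is a power loss.
    others = [c for c in all_chassis if c is not faulted and c.powered_on]
    all_faulted = all(c.pgood_fault for c in others)
    power_loss = all_faulted and any(c.brownout_or_blackout for c in all_chassis)
    if not power_loss:
        # Chassis-specific problem; may lead to hardware isolation (Enabled=false).
        actions.append(f"add {faulted.inventory_path} to error log")
    actions.append(f"power cycle system ({owner}); skip chassis with Enabled=false")
    return actions
```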

## Determining the cause of a pgood fault

Identifying the voltage rail that caused a pgood fault is very helpful, because
it determines which hardware may need to be replaced.

Determining the correct rail requires the following:

- The power sequencer device type is supported by `phosphor-power-sequencer`.
- A [JSON configuration file](config_file/README.md) is defined for the system.

If those requirements are not met, a general pgood error will be logged.

If those requirements are met, `phosphor-power-sequencer` will attempt to
determine which voltage rail caused the chassis pgood fault. The following
methods are supported in the JSON configuration file:

- Read a GPIO from the power sequencer device
- Check the PMBus STATUS_VOUT command value
- Compare the PMBus READ_VOUT value to the PMBus VOUT_UV_FAULT_LIMIT value

Multiple methods might need to be used on the same rail. For example, the PMBus
STATUS_VOUT error bits might be set for a pgood fault after the system was
powered on, but they might not be set during a power-on attempt because the
power sequencer is waiting indefinitely for the rail to power on.

See the [rail](config_file/rail.md) object in the configuration file for more
information.
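The three methods, and the reason for using more than one on the same rail, can be sketched in Python. `RailReadings` and the check functions are hypothetical; the real checks are driven by the [rail](config_file/rail.md) object. The STATUS_VOUT mask uses the PMBus VOUT_OV_FAULT (bit 7) and VOUT_UV_FAULT (bit 4) bits as an assumed example of which bits count as errors, and treating a READ_VOUT at or below VOUT_UV_FAULT_LIMIT as a fault is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed STATUS_VOUT error bits (PMBus): bit 7 VOUT_OV_FAULT, bit 4 VOUT_UV_FAULT.
STATUS_VOUT_FAULT_MASK = 0x80 | 0x10


@dataclass
class RailReadings:
    """Hypothetical snapshot of the values read for one rail."""
    name: str
    gpio_value: Optional[int] = None             # rail pgood GPIO, if configured
    gpio_active_low: bool = False
    status_vout: Optional[int] = None            # PMBus STATUS_VOUT byte
    read_vout: Optional[float] = None            # PMBus READ_VOUT, volts
    vout_uv_fault_limit: Optional[float] = None  # PMBus VOUT_UV_FAULT_LIMIT, volts


def gpio_shows_fault(r: RailReadings) -> bool:
    if r.gpio_value is None:
        return False
    pgood = (r.gpio_value == 0) if r.gpio_active_low else (r.gpio_value == 1)
    return not pgood


def status_vout_shows_fault(r: RailReadings) -> bool:
    return r.status_vout is not None and bool(r.status_vout & STATUS_VOUT_FAULT_MASK)


def voltage_below_uv_limit(r: RailReadings) -> bool:
    if r.read_vout is None or r.vout_uv_fault_limit is None:
        return False
    # Assumption: a voltage at or below the UV fault limit counts as a fault.
    return r.read_vout <= r.vout_uv_fault_limit


def rail_has_pgood_fault(r: RailReadings) -> bool:
    """A rail is considered failed if ANY configured method detects a fault."""
    return (
        gpio_shows_fault(r)
        or status_vout_shows_fault(r)
        or voltage_below_uv_limit(r)
    )
```

With hypothetical readings of STATUS_VOUT = 0x00, READ_VOUT = 0.0 V, and VOUT_UV_FAULT_LIMIT = 0.9 V, `status_vout_shows_fault` returns False but `voltage_below_uv_limit` returns True. This mirrors the power-on case where the sequencer is still waiting for the rail, which is why the checks are combined with a logical OR.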

If a specific voltage rail is found, an error is logged against that rail.

If the voltage rail is provided by the power supplies, and the
`phosphor-power-supply` application has found a power supply error, then the
power supply error is logged as the cause of the pgood fault.

If no voltage rail is found, a general pgood error is logged.
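The choice of which error to log, including the requirements listed at the start of this section, can be summarized as follows. This Python sketch uses hypothetical parameter names and error strings; it only shows the order of the decisions, not how `phosphor-power-sequencer` creates error log entries.

```python
from typing import Optional


def select_pgood_error(
    device_supported: bool,
    config_found: bool,
    failed_rail: Optional[str],
    rail_is_power_supply: bool = False,
    power_supply_error: Optional[str] = None,
) -> str:
    """Pick which error to log for a chassis pgood fault (hypothetical names)."""
    # Without a supported device type and a JSON configuration file,
    # the failing rail cannot be determined.
    if not (device_supported and config_found):
        return "general pgood error"
    if failed_rail is None:
        return "general pgood error"
    # A power supply rail where phosphor-power-supply found a power supply
    # error: that power supply error is logged as the cause.
    if rail_is_power_supply and power_supply_error is not None:
        return power_supply_error
    return f"pgood error for rail {failed_rail}"
```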