2d248aed | 19-Sep-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add system data for doing GPU callouts
The 2 GPU fault types (pgood & overtemp) are wired from each GPU to a summary bit on the UCD90160. If the code detects that a summary bit is on, it needs to read some GPIOs on an IO expander (pca9552) to tell which GPU actually failed so that it can be called out.
This commit provides the data to know when to read the extra GPIOs, and how those map to the specific faults and GPUs.
Change-Id: I688ddf2ef08b0313b73ed8737eeb01dec059bf40 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
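The mapping this commit describes (summary bit, then expander GPIOs, then a specific GPU) could be modeled roughly as below. This is only a sketch, not the data layout the commit actually adds: the type names, field names, and the `isolateGpus` helper are hypothetical, and the GPIO read is passed in as a function so that the GPIO polarity stays the caller's decision and the example runs without hardware.

```cpp
#include <string>
#include <vector>

// Hypothetical names for the two GPU fault types mentioned in the commit.
enum class GpuFault { pgood, overtemp };

// One IO-expander GPIO and the GPU it identifies (values are illustrative).
struct GpioMapping
{
    unsigned int gpioNumber; // GPIO on the IO expander (e.g. a pca9552)
    std::string gpuCallout;  // hypothetical name of the GPU to call out
};

// What to read when one summary bit on the sequencer is found on.
struct SummaryBitEntry
{
    unsigned int summaryBit;        // assumed bit position, not from the source
    GpuFault fault;                 // fault type this summary bit represents
    std::vector<GpioMapping> gpios; // extra GPIOs to read for isolation
};

// Returns the callouts for every GPU whose GPIO reads as faulted.
// readGpio(n) must return true when GPIO n indicates a fault.
template <typename ReadGpio>
std::vector<std::string> isolateGpus(const SummaryBitEntry& entry,
                                     ReadGpio readGpio)
{
    std::vector<std::string> failed;
    for (const auto& gpio : entry.gpios)
    {
        if (readGpio(gpio.gpioNumber))
        {
            failed.push_back(gpio.gpuCallout);
        }
    }
    return failed;
}
```

Keeping the table as plain data means the isolation loop does not change when a system wires a different number of GPUs to a summary bit.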
|
7b14db24 | 19-Sep-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add GPU error logging functions
Add functions to log the GPU PGOOD and overtemp errors.
Change-Id: I6f58d76883f8a78a3301481dbacd111c74b396d4 Signed-off-by: Matt Spinler <spinler@us.ibm.com> |
fcd4a719 | 19-Sep-2017 |
Matt Spinler <spinler@us.ibm.com> |
Refactor findGPIODevice
Update findGPIODevice() to work on the path passed into it instead of just on the gpioDevice member variable.
Now it can be called both for finding the path for the UCD chip as well as for the device that has the GPIOs used for GPU isolation.
Change-Id: I01b93ece63cf28a11f0b438741689823280c7e2f Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
81be00b1 | 07-Sep-2017 |
Matt Spinler <spinler@us.ibm.com> |
Remove clearFaults calls
The community wasn't thrilled with the device driver providing the clear_logged_faults command. As it isn't absolutely necessary for this code to do now, it is being removed.
Note: Currently Device::clearFaults is a pure virtual function, so it still needs to be defined in the UCD90160 class.
A future commit may remove those definitions as well.
Change-Id: I0b3a33d56987dd97ab7253eb6b5d3b5afd835d67 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
45a054ac | 22-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add method to turn off UCD90160 hardware accesses
As the fault monitoring functionality is going in toward the end of a release, a flag is being provided to quickly turn off the hardware accesses while still leaving the ability to create general errors on PGOOD fails, as well as issuing a shutdown on a runtime PGOOD fail.
This is meant to be used if it turns out there are problems with the hardware that end up taking a lot of time to debug.
The flag is --enable-turn-off-ucd90160-access.
Change-Id: I03f0ab5dc4010bf20ef2871f2e737ce310b4398f Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
7084927e | 22-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Monitor UCD90160 for faults at runtime
Add the RuntimeMonitor class that will monitor the UCD90160 faults in 2 ways:
1) Watch for the PowerLost signal, meaning system PGOOD was lost. When it occurs, analyze the chip for errors and then issue a proper shutdown so a faulted device doesn't keep getting power.
2) Poll on an interval for nonfatal errors that need to be logged but don't cause a PGOOD loss.
The main executable can now launch either the PGOODMonitor or the RuntimeMonitor based on commandline arguments.
Change-Id: If2856f173d5d6288d8333538334b4b4cb4a60097 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
b2d72511 | 22-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Derive PGOODMonitor from DeviceMonitor
Adding this base class so that PGOODMonitor can fulfill its true purpose, which is checking the UCD90160 for errors on PGOOD failures.
Change-Id: Ie0637676ae5239c677d60f14d738ff9709d2b7b0 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
9efb308f | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Fill in the createPowerFaultLog function
Only PGOOD and voltage faults are expected to occur during normal operation, but when the system PGOOD is lost and one of those 2 errors wasn't found, we will log a generic power fault log that captures the status_word and mfr_status registers.
Change-Id: I583c5f6c825adce00ac1b458444fcb8e05900c91 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
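As a rough illustration of the fallback described above, the helper below produces the register data for a generic log only when no specific (PGOOD or voltage) fault was found. The function name, the metadata key names, and the register widths (16-bit status_word, 32-bit mfr_status) are assumptions made for this sketch, not details taken from the commit.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Returns the data to attach to a generic power fault log, or an empty
// string when a specific fault was already logged and no generic log is
// needed. Key names and widths are illustrative assumptions.
std::string genericPowerFaultData(bool specificFaultLogged,
                                  std::uint16_t statusWord,
                                  std::uint32_t mfrStatus)
{
    if (specificFaultLogged)
    {
        return {};
    }

    char buffer[64];
    std::snprintf(buffer, sizeof(buffer),
                  "STATUS_WORD=0x%04X MFR_STATUS=0x%08X",
                  static_cast<unsigned int>(statusWord),
                  static_cast<unsigned int>(mfrStatus));
    return buffer;
}
```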
|
d998b736 | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Fill in PGOOD fault checking code
Fill in the code to check for a PGOOD fault. This is where the power sequencer device detects that one of its child devices lost PGOOD. A separate error log will be created for each input that has a fault. Each input will only have an error logged against it once for the lifetime of the object.
Errors are detected by reading the real time status of the PGOOD input, which is exposed as a GPIO by the device driver.
Ideally we would be able to use a summary bit in the status_word register to see if there is an error before doing any GPIO reads. However, the device driver sends a clear-faults command every time we read that register, and the GPI fault bits are edge triggered in the mfr_status register that feeds status_word, so we would never detect any failures. If this were ever fixed in the core PMBus device driver code, we could add this functionality and save some CPU cycles.
Change-Id: I5000f2ebc2b22dcca946154afd2405b29734ccaf Signed-off-by: Matt Spinler <spinler@us.ibm.com>
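The "one error log per input, once for the lifetime of the object" rule can be sketched with a set of already-logged inputs. The class below is a hypothetical illustration, not the UCD90160 class code: the name `PgoodFaultChecker`, the `ReadInput` callback, and the convention that a read of `true` means PGOOD is present are all assumptions.

```cpp
#include <functional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical checker that reports each faulted PGOOD input only once.
class PgoodFaultChecker
{
  public:
    // Assumed convention: returns true while the input's PGOOD is present.
    using ReadInput = std::function<bool(unsigned int)>;

    PgoodFaultChecker(std::vector<unsigned int> inputs, ReadInput read) :
        inputs_(std::move(inputs)), read_(std::move(read))
    {
    }

    // Reads the real-time status of every input and returns the inputs that
    // are faulted now and have not been reported by this object before.
    std::vector<unsigned int> check()
    {
        std::vector<unsigned int> newFaults;
        for (auto input : inputs_)
        {
            if (read_(input))
            {
                continue; // PGOOD still present on this input
            }
            // insert().second is false if this input was already logged
            if (logged_.insert(input).second)
            {
                newFaults.push_back(input);
            }
        }
        return newFaults;
    }

  private:
    std::vector<unsigned int> inputs_;
    ReadInput read_;
    std::set<unsigned int> logged_; // inputs that already have an error log
};
```

A caller would create one error log per element of the returned vector; repeated polls of a persistent fault then return nothing new.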
|
110b2841 | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Find the path for the GPIO device
This path is required to access a GPIO later.
Change-Id: I4ec64adbf939c5f0eaa12b7e18345d0fa2247a7d Signed-off-by: Matt Spinler <spinler@us.ibm.com> |
ee7adb7f | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add GPIO class
This class will be used for accessing GPIOs off of the UCD90160 to check for PGOOD faults, and also later for isolating GPU overtemps and PGOOD faults down to the specific GPU that failed.
The FileDescriptor class is used by the GPIO class to manage the lifetime of the GPIO file handles.
The class only supports reading a GPIO value. At this point there is no requirement for doing writes.
This class was copied from phosphor-gpio-monitor/gpio-util.
Change-Id: Iee276aed67e1cba549c3070c08238ab5f621c320 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
e7e432b4 | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Fill in voltage fault checking code
Fill in the code to check for a voltage fault. This is where the power sequencer device detects that one of its child devices has a bad voltage. A separate error log will be created for each voltage rail that has a fault. Each rail will only have an error logged against it once for the lifetime of the object.
There will be support documentation that maps the failing rail name to the hardware that the rail corresponds to.
Change-Id: I13380b9898613bf8e76d66a72e1fbe005f816dad Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
1e365698 | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add clear faults support
Add support to send the clear-logged-faults command to the UCD90160 chip. This is a manufacturer specific command. The PMBus standard clear-faults command is done automatically by the device driver every time a PMBus sysfs file is read.
Change-Id: I6a9f670502ce1e1b4ffbaab63db335daa0865f46 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
b54357f6 | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add UCD90160 class
This class represents the UCD90160 power sequencer chip, and provides the ability to check that chip for voltage and PGOOD faults.
This commit just adds function stubs.
Change-Id: Iec6e83e9bcddbd476bdd86a887db08f5875f11cd Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
56d90a89 | 14-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Fill in witherspoon-pseq-monitor main()
Call run() on an instance of the PGOODMonitor class.
Change-Id: I1ec693ece7dd6034c513d2ca6b294e5b0a1e0e6d Signed-off-by: Matt Spinler <spinler@us.ibm.com> |
f02daec1 | 14-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add PGOODMonitor class
This class checks that PGOOD comes on in the amount of time specified. If it doesn't, it will create an error log.
Future commits will analyze the power sequencer chip for failures in this case so a better callout can be done.
Change-Id: Ia3679e5a7d36103f908b70aa0301cd012b0e7b20 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
afb39132 | 14-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add witherspoon-pseq-monitor framework
This application is for monitoring the power sequencer chip for faults.
Change-Id: I6b18fcba75ae0206311e4bcafdebad27221b0796 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|