10295547 | 09-Aug-2017 |
Brandon Wyman <bjwyman@gmail.com> |
Add STATUS_WORD to metadata for VIN_UV_FAULT
Change-Id: Iaa6001f7c5d0c558ad3bc01e209dc316236fea93 Signed-off-by: Brandon Wyman <bjwyman@gmail.com> |
442035f0 | 08-Aug-2017 |
Brandon Wyman <bjwyman@gmail.com> |
Update analyze to check VIN_UV_FAULT
The function is a pure virtual function in DeviceMonitor, add in the implementation of that for PowerSupply. Read the file that represents that bit from the STATUS_WORD. If fault is on, report a fault.
Change-Id: I05a4bff997bb0c8b8b71db444e9db0e506765689 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
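The check described in this message comes down to testing one bit of the PMBus STATUS_WORD value. The following is a minimal sketch, assuming the register has already been read from its sysfs file into a 16-bit integer. The names `VIN_UV_FAULT_MASK` and `hasVinUvFault` are hypothetical and not taken from the repository; the bit position follows the PMBus specification, where VIN_UV_FAULT is bit 3 of the low status byte.

```cpp
#include <cstdint>

// PMBus STATUS_WORD, low byte: bit 3 is VIN_UV_FAULT (input undervoltage).
// The constant and function names are illustrative assumptions.
constexpr std::uint16_t VIN_UV_FAULT_MASK = 0x0008;

// Returns true when the input undervoltage fault bit is set in statusWord.
constexpr bool hasVinUvFault(std::uint16_t statusWord)
{
    return (statusWord & VIN_UV_FAULT_MASK) != 0;
}
```

A caller would report a fault when `hasVinUvFault()` returns true for the value it just read.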
|
1db9a9e2 | 26-Jul-2017 |
Brandon Wyman <bjwyman@gmail.com> |
Update PowerSupply to be derived from Device
The PowerSupply will pass a name and instance number down to the Device class it is derived from, but will also have an inventory path and a path to monitor for PMBus interfaces.
Change-Id: I29f875fda1f07d031b58ec7ffd381d655495f248 Signed-off-by: Brandon Wyman <bjwyman@gmail.com>
|
24e422fe | 25-Jul-2017 |
Brandon Wyman <bjwyman@gmail.com> |
Update build framework for PowerSupply fault app
Change-Id: I98a75efc88d92de0ab016a77d5dd4a1e9345df83 Signed-off-by: Brandon Wyman <bjwyman@gmail.com> |
a8269652 | 19-Sep-2017 |
Matt Spinler <spinler@us.ibm.com> |
Shutdown system on GPU over-temps
Resolves openbmc/openbmc#1726
Change-Id: If3263678bc03df7714f31aa097f38ee6c09389f4 Signed-off-by: Matt Spinler <spinler@us.ibm.com> |
8bc1283f | 19-Sep-2017 |
Matt Spinler <spinler@us.ibm.com> |
Find and call out faulted GPUs
Isolate down to the GPU that caused the GPU PGOOD or overtemp summary fault bit to turn on. On Witherspoon this involves reading GPIOs on a pca9552 device to find the GPU signaling the fault.
GPUs are not currently in the inventory, so the code isn't doing the standard callout by adding a certain metadata field. The GPU number that failed will just be added to the error log metadata, and work will be done with support to make sure that is documented. Also, the other power fault callouts don't use the standard inventory callouts either as they are more complicated than just a single FRU, so this method is consistent with that.
Note that these faults do not cause the system to power off automatically like other power faults, though a future commit will power off the system on a GPU overtemp.
Change-Id: If4053f32a06a335a6612a04a8164d34306530b22 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
2d248aed | 19-Sep-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add system data for doing GPU callouts
The 2 GPU fault types (pgood & overtemp) are wired from each GPU to a summary bit on the UCD90160. If code detects a summary bit on, it will need to read some GPIOs on an IO expander (pca9552) to tell which actual GPU failed in order to call it out.
This commit provides the data to know when to read the extra GPIOs, and how those map to the specific faults and GPUs.
Change-Id: I688ddf2ef08b0313b73ed8737eeb01dec059bf40 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
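The kind of data this message describes could be laid out as a small table that ties each expander GPIO to a fault type and a GPU. The sketch below is only an illustration under stated assumptions: the type names, the function `findFaultedGpus`, the GPIO numbers, and the "asserted means faulted" polarity are all hypothetical, since the log does not give the real pin assignments.

```cpp
#include <cassert>
#include <functional>
#include <vector>

enum class GpuFault { pgood, overtemp };

// One row per GPIO on the IO expander: which fault it reports, for which GPU.
struct GpuGpioEntry
{
    GpuFault fault;
    unsigned gpio; // GPIO number on the expander (placeholder values below)
    unsigned gpu;  // GPU that the line belongs to
};

// Placeholder table; these pin numbers are invented for the example.
const std::vector<GpuGpioEntry> gpuGpioTable = {
    {GpuFault::pgood, 8u, 0u},
    {GpuFault::pgood, 9u, 1u},
    {GpuFault::overtemp, 2u, 0u},
    {GpuFault::overtemp, 3u, 1u},
};

// Returns the GPUs whose line for the given fault type reads as asserted.
// isAsserted stands in for the real GPIO read.
std::vector<unsigned> findFaultedGpus(
    GpuFault fault, const std::function<bool(unsigned)>& isAsserted)
{
    std::vector<unsigned> faulted;
    for (const auto& entry : gpuGpioTable)
    {
        if (entry.fault == fault && isAsserted(entry.gpio))
        {
            faulted.push_back(entry.gpu);
        }
    }
    return faulted;
}
```

Keeping the wiring in a table means the isolation code stays the same if another system routes the lines differently; only the table changes.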
|
7b14db24 | 19-Sep-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add GPU error logging functions
Add functions to log the GPU PGOOD and overtemp errors.
Change-Id: I6f58d76883f8a78a3301481dbacd111c74b396d4 Signed-off-by: Matt Spinler <spinler@us.ibm.com> |
fcd4a719 | 19-Sep-2017 |
Matt Spinler <spinler@us.ibm.com> |
Refactor findGPIODevice
Update findGPIODevice() to work on the path passed into it instead of just on the gpioDevice member variable.
Now it can be called both for finding the path for the UCD chip as well as for the device that has the GPIOs used for GPU isolation.
Change-Id: I01b93ece63cf28a11f0b438741689823280c7e2f Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
403dff0d | 19-Sep-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add GPU errors
Add GPU power fault and overtemp errors.
Change-Id: Iac5782a0db0c5cda2fe478ef8c4e1d7cd3ff4560 Signed-off-by: Matt Spinler <spinler@us.ibm.com> |
2f135445 | 18-Sep-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add testcases for NamesValues
Change-Id: I61c52f2a196a32dbd4d03a2ccafa2b94414f7555 Signed-off-by: Matt Spinler <spinler@us.ibm.com> |
81be00b1 | 07-Sep-2017 |
Matt Spinler <spinler@us.ibm.com> |
Remove clearFaults calls
The community wasn't thrilled with the device driver providing the clear_logged_faults command. As it isn't absolutely necessary for this code to do now, it is being removed.
Note: Currently Device::clearFaults is a pure virtual function so it still needs to be defined in the UCD90160 class.
A future commit may also remove these.
Change-Id: I0b3a33d56987dd97ab7253eb6b5d3b5afd835d67 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
3b090d14 | 08-Sep-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add helper class to handle PMBus metadata
Add a NamesValues class to help when adding multiple PMBus registers to a single metadata entry.
It allows one to add multiple name/value pairs to the object, and provides a get() method to return them all concatenated.
It currently supports numeric values, up to a uint64_t, but more types could be added if ever necessary.
The resulting string will look like: "name1=value1|name2=value2|etc"
Change-Id: I4def8ac313882a0b9efc7cb33488f19ad33f1727 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
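The behaviour described in this message can be sketched in a few lines. This is not the repository's implementation: the `add()` signature is an assumption, and rendering values in decimal is a guess, since the message only specifies the `name=value|name=value` shape and the `get()` method.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch of a NamesValues-style helper.
class NamesValues
{
  public:
    // Appends one name/value pair, separated from earlier pairs by '|'.
    // Decimal formatting is an assumption made for this example.
    void add(const std::string& name, std::uint64_t value)
    {
        assert(!name.empty());
        if (!all.empty())
        {
            all += '|';
        }
        all += name + '=' + std::to_string(value);
    }

    // Returns every pair added so far as "name1=value1|name2=value2".
    const std::string& get() const
    {
        return all;
    }

  private:
    std::string all;
};
```

Adding `STATUS_WORD` with value 8 and then `MFR_STATUS` with value 255 would make `get()` return `STATUS_WORD=8|MFR_STATUS=255`.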
|
45a054ac | 22-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add method to turn off UCD90160 hardware accesses
As the fault monitoring functionality is going in toward the end of a release, a flag is being provided to quickly turn off the hardware accesses while still leaving the ability to create general errors on PGOOD fails, as well as issuing a shutdown on a runtime PGOOD fail.
This is meant to be used if it turns out there are problems with the hardware that end up taking a lot of time to debug.
The flag is --enable-turn-off-ucd90160-access.
Change-Id: I03f0ab5dc4010bf20ef2871f2e737ce310b4398f Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
7084927e | 22-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Monitor UCD90160 for faults at runtime
Add the RuntimeMonitor class that will monitor the UCD90160 faults in 2 ways:
1) Watch for the PowerLost signal, meaning system PGOOD was lost. When it occurs, analyze the chip for errors and then issue a proper shutdown so a faulted device doesn't keep getting power.
2) Poll on an interval for nonfatal errors that need to be logged but don't cause a PGOOD loss.
The main executable can now launch either the PGOODMonitor or the RuntimeMonitor based on commandline arguments.
Change-Id: If2856f173d5d6288d8333538334b4b4cb4a60097 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
b2d72511 | 22-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Derive PGOODMonitor from DeviceMonitor
Adding this base class so that PGOODMonitor can fulfill its true purpose, which is checking the UCD90160 for errors on PGOOD failures.
Change-Id: Ie0637676ae5239c677d60f14d738ff9709d2b7b0 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
9efb308f | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Fill in the createPowerFaultLog function
Only PGOOD and voltage faults are expected to occur during normal operation, but when the system PGOOD is lost and one of those 2 errors wasn't found, we will log a generic power fault log that captures the status_word and mfr_status registers.
Change-Id: I583c5f6c825adce00ac1b458444fcb8e05900c91 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
d998b736 | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Fill in PGOOD fault checking code
Fill in the code to check for a PGOOD fault. This is where the power sequencer device detects that one of its child devices lost PGOOD. A separate error log will be created for each input that has a fault. Each input will only have an error logged against it once for the lifetime of the object.
Errors are detected by reading the real time status of the PGOOD input, which is exposed as a GPIO by the device driver.
Ideally we would be able to use a summary bit in the status_word register to see if there is an error before doing any GPIO reads. However, the device driver sends a clear-faults command every time we read that register, and the GPI fault bits in the mfr_status register that feeds status_word are edge-triggered, so we would never detect any failures. If this were ever fixed in the core PMBus device driver code, we could add this functionality and save some CPU cycles.
Change-Id: I5000f2ebc2b22dcca946154afd2405b29734ccaf Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
110b2841 | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Find the path for the GPIO device
This path is required to access a GPIO later.
Change-Id: I4ec64adbf939c5f0eaa12b7e18345d0fa2247a7d Signed-off-by: Matt Spinler <spinler@us.ibm.com> |
ee7adb7f | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add GPIO class
This class will be used for accessing GPIOs off of the UCD90160 to check for PGOOD faults, and also later for isolating GPU overtemps and PGOOD faults down to the specific GPU that failed.
The FileDescriptor class is used by the GPIO class to manage the lifetime of the GPIO file handles.
The class only supports reading a GPIO value. At this point there is no requirement for doing writes.
This class was copied from phosphor-gpio-monitor/gpio-util.
Change-Id: Iee276aed67e1cba549c3070c08238ab5f621c320 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
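Managing the lifetime of a file handle, the role this message gives to FileDescriptor, is commonly done with RAII. The sketch below shows that pattern under the assumption of a POSIX descriptor; it is not the class from phosphor-gpio-monitor, and its member names and exact interface are hypothetical.

```cpp
#include <cassert>
#include <unistd.h> // close()
#include <utility>  // std::exchange

// Hypothetical RAII wrapper: closes the descriptor when the object is destroyed.
class FileDescriptor
{
  public:
    explicit FileDescriptor(int descriptor) : fd(descriptor) {}

    // Copying would lead to a double close, so only moves are allowed.
    FileDescriptor(const FileDescriptor&) = delete;
    FileDescriptor& operator=(const FileDescriptor&) = delete;

    FileDescriptor(FileDescriptor&& other) noexcept :
        fd(std::exchange(other.fd, -1))
    {}

    FileDescriptor& operator=(FileDescriptor&& other) noexcept
    {
        if (this != &other)
        {
            reset();
            fd = std::exchange(other.fd, -1);
        }
        return *this;
    }

    ~FileDescriptor()
    {
        reset();
    }

    // Gives access to the raw descriptor for read() and similar calls.
    int operator()() const
    {
        return fd;
    }

  private:
    // Closes the descriptor if one is held and marks the object empty.
    void reset()
    {
        if (fd >= 0)
        {
            close(fd);
            fd = -1;
        }
    }

    int fd = -1;
};
```

With a wrapper like this, a GPIO reader can return early or throw without leaking the handle.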
|
e7e432b4 | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Fill in voltage fault checking code
Fill in the code to check for a voltage fault. This is where the power sequencer device detects that one of its child devices has a bad voltage. A separate error log will be created for each voltage rail that has a fault. Each rail will only have an error logged against it once for the lifetime of the object.
There will be support documentation that maps the failing rail name to the hardware that the rail corresponds to.
Change-Id: I13380b9898613bf8e76d66a72e1fbe005f816dad Signed-off-by: Matt Spinler <spinler@us.ibm.com>
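The rule that each rail has an error logged against it only once for the lifetime of the object can be kept with a set of rails that have already been reported. This is a rough sketch of that bookkeeping only; the class name `RailFaultTracker` and the use of a numeric rail index are assumptions, not the repository's design.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical tracker: remembers which rails already had an error logged.
class RailFaultTracker
{
  public:
    // Returns true only the first time a rail is reported as faulted,
    // which is when the caller should create the error log.
    bool shouldLog(std::size_t rail)
    {
        // insert() reports whether the element was newly added.
        return logged.insert(rail).second;
    }

  private:
    std::set<std::size_t> logged;
};
```

Because the set lives in the object, a new object starts with a clean slate, which matches the "lifetime of the object" wording.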
|
ac4b52f7 | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add power sequencer errors
Add the errors that can be created by the UCD90160 device.
Change-Id: Ic12c8cbdaf01602583226c93feeb6b22cdbed283 Signed-off-by: Matt Spinler <spinler@us.ibm.com> |
1e365698 | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add clear faults support
Add support to send the clear-logged-faults command to the UCD90160 chip. This is a manufacturer specific command. The PMBus standard clear-faults command is done automatically by the device driver every time a PMBus sysfs file is read.
Change-Id: I6a9f670502ce1e1b4ffbaab63db335daa0865f46 Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
b54357f6 | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add UCD90160 class
This class represents the UCD90160 power sequencer chip, and provides the ability to check that chip for voltage and PGOOD faults.
This commit just adds function stubs.
Change-Id: Iec6e83e9bcddbd476bdd86a887db08f5875f11cd Signed-off-by: Matt Spinler <spinler@us.ibm.com>
|
8f0d953f | 21-Aug-2017 |
Matt Spinler <spinler@us.ibm.com> |
Add device debug path to PMBus class
Some devices may also have files in /sys/kernel/debug/<device driver name>.<instance>, so adding a DeviceDebug path type to support that.
The device driver name and chip instance number are then required to be passed in to the constructor.
Change-Id: I301d730a29ac7c2c39198e4eb7125aff70d727dc Signed-off-by: Matt Spinler <spinler@us.ibm.com>
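The path layout named in this message shows why the constructor needs both the driver name and the instance number. A minimal sketch of how such a path might be assembled follows; the free function `deviceDebugPath` is hypothetical and is not the PMBus class's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds "/sys/kernel/debug/<device driver name>.<instance>".
// The function name and signature are assumptions made for this example.
std::string deviceDebugPath(const std::string& driverName, std::size_t instance)
{
    assert(!driverName.empty());
    return "/sys/kernel/debug/" + driverName + "." + std::to_string(instance);
}
```

For a driver named `example_driver` and instance 0, this yields `/sys/kernel/debug/example_driver.0`.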
|