# Power Supply Monitoring Application

Author:
  Brandon Wyman !bjwyman

Other contributors:
  Derek Howard

Created:
  2019-06-17

## Problem Description
This is a proposal to provide a set of enhancements to the current OpenBMC
power supply application for enterprise class systems. Some enterprise class
systems may consist of a number of configuration variations including different
power supply types and numbers. An application capable of communicating with the
different power supplies is needed in order to initialize the power supplies,
validate configurations, report invalid configurations, detect and report
various faults, and report vital product data (VPD). Some of these functions
will be configurable so they can be included or excluded on different platforms.

## Background and References
The OpenBMC project currently has a [witherspoon-pfault-analysis][1] repository
that contains a power supply monitor application and a power sequencer monitor
application. The current power supply application lacks features desired for
an enterprise class server.

The intent of this new application design is to enhance the OpenBMC project
with a single power supply application that can communicate with one or more
[PMBus][2] power supplies and provide the enterprise features lacking in the
existing application, which runs multiple instances that each talk to a single
power supply.

## Requirements

Some of these requirements may be deemed business-specific logic, and thus
could be configurable options as appropriate.

1. The power supply application must detect, isolate, and report individual
input power and power FRU faults, during boot and at runtime only.
2. The power supply application must determine power supply presence,
configuration, and status, and report via external interfaces.
3. The power supply application must report power supply failures to IPMI and
Redfish requests (during boot and at runtime only).
4. The power supply application must report power supply present/missing changes
and status to IPMI and Redfish requests, and to the hypervisor. Recipes and code
for presence state monitoring and event log creation may need to be moved from
`phosphor-dbus-monitor` to this application, depending on whether that function
was already written or ported forward from a similar earlier system.
5. The power supply application must ensure proper power supply configuration
and report improper configurations (during boot and at runtime only).
6. The power supply application must collect and report power supply VPD (unless
that VPD is collected and reported via another application reading an EEPROM
device).
7. The power supply application must allow power supply hot-plug and concurrent
maintenance (CM).
8. The power supply application should create and update average and maximum
power consumption metric interfaces for telemetry data.
9. The power supply application must be able to detect how many power supplies
are present in the system, what type of power supply is present (maximum output
power such as 900W, 1400W, 2200W, etc.), and what type of input power is being
supplied (AC input, DC input, input voltage, etc.).
10. The application must be able to recognize whether the power supplies present
form a valid configuration. Certain invalid combinations may result in the
application updating properties for a Minimum Ship Level ([MSL][3]) check.
11. The application must create error logs for invalid configurations, or for
power supplies experiencing some other faulted condition (no input power, output
over voltage, output over current, etc.).
12. The application would periodically communicate with the power supplies via
the sysfs files updated by a PMBus device driver (currently only known to be
created and updated by the [ibm-cffps][4] device driver). Certain device driver
updates may be necessary to support some power supplies or power supply
features. Any power supply that communicates using the PMBus specification
should be supportable, although some manufacturer-specific code paths may be
required for commands in the "User Data and Configuration" (USER_DATA_00
through USER_DATA_15) and "Manufacturer Specific Commands" (MFR_SPECIFIC_00
through MFR_SPECIFIC_45) ranges, as well as bit definitions for
STATUS_MFR_SPECIFIC and any other "MFR" command.
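The fault conditions named in requirements 11 and 12 correspond to bits of the
PMBus STATUS_WORD register. As a minimal sketch, assuming the raw 16-bit value
has already been read from the driver, the fragment below decodes a selection
of those bits. The names `StatusBit`, `statusBits`, and `describeFaults`, and
the label strings, are hypothetical and chosen only for illustration; the bit
positions follow the PMBus STATUS_WORD layout. Manufacturer-specific decoding
(STATUS_MFR_SPECIFIC) is deliberately left out.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One STATUS_WORD bit and an illustrative label for it (labels are assumed).
struct StatusBit
{
    std::uint16_t mask;
    const char* label;
};

// Selected STATUS_WORD bits: upper byte first, then the STATUS_BYTE half.
constexpr StatusBit statusBits[] = {
    {0x8000, "VOUT"},          // output voltage fault or warning
    {0x4000, "IOUT_POUT"},     // output current or power fault or warning
    {0x2000, "INPUT"},         // input fault or warning
    {0x1000, "MFR_SPECIFIC"},  // manufacturer-specific fault or warning
    {0x0400, "FANS"},          // fan fault or warning
    {0x0020, "VOUT_OV_FAULT"}, // output over-voltage fault
    {0x0010, "IOUT_OC_FAULT"}, // output over-current fault
    {0x0008, "VIN_UV_FAULT"},  // input under-voltage fault
    {0x0004, "TEMPERATURE"},   // temperature fault or warning
    {0x0002, "CML"},           // communication, memory, or logic fault
};

// Return the labels of every listed bit that is set in statusWord,
// in the order the bits appear in the table above.
std::vector<std::string> describeFaults(std::uint16_t statusWord)
{
    std::vector<std::string> faults;
    for (const auto& bit : statusBits)
    {
        if (statusWord & bit.mask)
        {
            faults.emplace_back(bit.label);
        }
    }
    return faults;
}
```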

## Proposed Design
The proposal is to create a single new power supply application in the OpenBMC
[phosphor-power][6] repository. The application would be written in C++17.

Upon startup, the power supply application would be passed a parameter giving
the location of a JSON configuration file. This file would contain information
such as the D-Bus object name(s), possible power supply types, the system types
in which each power supply type is valid, I2C/PMBus file location data, read
retries, deglitch counts, etc.
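To make the shape of that configuration data concrete, here is a minimal
sketch of how the loaded values might be held in memory. This document does
not define the JSON schema, so every name here (`SupplyEntry`,
`MonitorConfig`, `i2cDeviceDir`, and the field names) is a hypothetical
placeholder, and JSON parsing itself is omitted. The one grounded detail is the
Linux convention that an I2C device directory is named with the bus number
followed by the address as four hex digits.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// One power supply slot as it might appear in the configuration (assumed).
struct SupplyEntry
{
    std::string inventoryPath; // D-Bus object name for this supply
    unsigned bus;              // I2C bus number
    std::uint16_t address;     // 7-bit I2C address of the supply
};

// Application-wide settings named in the design text (field names assumed).
struct MonitorConfig
{
    std::vector<SupplyEntry> supplies;
    unsigned readRetries;   // retries before a read failure is reported
    unsigned deglitchCount; // consecutive fault reads before logging
};

// Build the sysfs directory for a supply's I2C device, for example
// bus 3 and address 0x68 give /sys/bus/i2c/devices/3-0068.
std::string i2cDeviceDir(const SupplyEntry& supply)
{
    char name[32];
    std::snprintf(name, sizeof(name), "%u-%04x", supply.bus,
                  static_cast<unsigned>(supply.address));
    return std::string("/sys/bus/i2c/devices/") + name;
}
```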

The power supply application would then detect which system type it is running
on, which supplies are present, whether each power supply is ready to have its
VPD read, what type each supply is, etc. The application would then try to
find a matching valid configuration. If no match is found, that configuration
would be considered invalid. The application should continue to check for any
faults that are occurring, logging errors as appropriate.
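The matching step could be sketched as a table lookup. The version below
assumes, purely for illustration, that a valid configuration is described by a
system type, a single supply type, and a set of allowed supply counts, and that
mixed supply types are never valid. The names `ValidConfig` and
`isValidConfiguration` are hypothetical, and the real rules may well be richer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One row of the valid-configuration table (structure is an assumption).
struct ValidConfig
{
    std::string systemType;
    std::string supplyType;
    std::vector<std::size_t> allowedCounts;
};

// Return true if some table row matches the system type, every present
// supply has that row's supply type, and the number present is allowed.
bool isValidConfiguration(const std::vector<ValidConfig>& table,
                          const std::string& systemType,
                          const std::vector<std::string>& presentTypes)
{
    for (const auto& row : table)
    {
        if (row.systemType != systemType)
        {
            continue;
        }
        const bool typesMatch = std::all_of(
            presentTypes.begin(), presentTypes.end(),
            [&row](const std::string& t) { return t == row.supplyType; });
        const bool countAllowed =
            std::find(row.allowedCounts.begin(), row.allowedCounts.end(),
                      presentTypes.size()) != row.allowedCounts.end();
        if (typesMatch && countAllowed)
        {
            return true;
        }
    }
    return false;
}
```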

When the system is powered on, the power supplies should start outputting power
to the system. At that point the application will begin monitoring the supplies
and continue to do so, communicating any changes such as removal of input
voltage, removal of a power supply, or insertion of a power supply, and taking
any necessary actions when fault conditions are detected.
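The deglitch counts mentioned for the configuration file suggest that a fault
should only be acted on after it has been observed on several consecutive
polls. A minimal sketch of that idea follows; the class name `DeglitchedFault`
and the policy of reporting once and re-arming when the fault clears are
assumptions, not behavior specified by this design.

```cpp
// Tracks one fault condition across polling cycles (hypothetical helper).
class DeglitchedFault
{
  public:
    explicit DeglitchedFault(unsigned threshold) : threshold_(threshold) {}

    // Call once per poll with the latest reading. Returns true only on the
    // poll at which the fault has been seen for threshold_ consecutive
    // polls, so that a single error log is created per fault occurrence.
    bool update(bool faultSeen)
    {
        if (!faultSeen)
        {
            // Fault cleared: reset the counter and re-arm reporting.
            count_ = 0;
            reported_ = false;
            return false;
        }
        if (count_ < threshold_)
        {
            ++count_;
        }
        if (count_ >= threshold_ && !reported_)
        {
            reported_ = true;
            return true;
        }
        return false;
    }

  private:
    unsigned threshold_;
    unsigned count_ = 0;
    bool reported_ = false;
};
```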

The proposed power supply application would not control any fans internal to the
power supply; that function would be left to other userspace application(s).

## Alternatives Considered
The current implementation of multiple instances of a power supply monitor was
considered, essentially similar to the [psu-monitor][5] from the
[witherspoon-pfault-analysis][1] repository. This design was avoided due to:
 - Complexity of the various valid and invalid configuration combinations.
 - Power line disturbance communication.
 - Timing/serialization concerns with power supply communication.

## Impacts
The application is expected to have some impact on the PLDM API, due to the
various D-Bus properties it may be updating.

No security impacts are anticipated.

The main documentation impact should be this design document. Future
enhancements or clarifications may be required for this document.

The application is expected to have a similar or lesser performance impact than
the existing approach of one application instance per power supply.

## Testing
Testing can be accomplished via automated or manual testing to verify that:
* A configuration not listed as valid results in appropriate behavior.
* The application detects and logs power supply faults, including input
faults, output faults, shorts, current share faults, communication failures,
etc.
* Power supply VPD is reported for present power supplies.
* Power supply removal and insertion, on a system supporting concurrent
maintenance, does not result in power loss to a powered-on system.
* The system operates through power supply faults and power line disturbances as
appropriate.

CI testing could be impacted if a system being used for testing is in an
unsupported or faulted configuration.

[1]: https://github.com/openbmc/witherspoon-pfault-analysis
[2]: https://en.wikipedia.org/wiki/Power_Management_Bus
[3]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Control/README.msl.md
[4]: https://github.com/openbmc/linux/blob/dev-5.3/drivers/hwmon/pmbus/ibm-cffps.c
[5]: https://github.com/openbmc/witherspoon-pfault-analysis/tree/master/power-supply
[6]: https://github.com/openbmc/phosphor-power/