# Power Supply Monitoring Application

Author: Brandon Wyman !bjwyman

Other contributors: Derek Howard

Created: 2019-06-17

## Problem Description

This is a proposal to provide a set of enhancements to the current OpenBMC power
supply application for enterprise-class systems. Some enterprise-class systems
may have a number of configuration variations, including different power supply
types and quantities. An application capable of communicating with the different
power supplies is needed in order to initialize the power supplies, validate
configurations, report invalid configurations, detect and report various faults,
and report vital product data (VPD). Some of this function will be configurable,
so that it can be included or excluded for use on different platforms.

## Background and References

The OpenBMC project currently has a [witherspoon-pfault-analysis][1] repository
that contains a power supply monitor application and a power sequencer monitor
application. The current power supply application lacks features desired for
an enterprise-class server.

The intent of this new application design is to enhance the OpenBMC project with
a single power supply application that can communicate with one or more
[PMBus][2] power supplies and provide the enterprise features currently lacking
in the existing application, which runs as multiple instances that each talk to
a single power supply.

## Requirements

Some of these requirements may be deemed business-specific logic, and thus
could be configurable options as appropriate.

1. The power supply application must detect, isolate, and report individual
   input power and power FRU faults during boot and at runtime only.
2. The power supply application must determine power supply presence,
   configuration, and status, and report them via external interfaces.
3. The power supply application must report power supply failures to IPMI and
   Redfish requests (during boot and at runtime only).
4. The power supply application must report power supply present/missing changes
   and status to IPMI and Redfish requests, and to the hypervisor. Recipes and
   code for presence state monitoring and event log creation may need to be
   moved from `phosphor-dbus-monitor` to this application, depending on whether
   such function was already written for, or ported forward from, a similar
   previous system.
5. The power supply application must ensure proper power supply configuration
   and report improper configurations (during boot and at runtime only).
6. The power supply application must collect and report power supply VPD (unless
   that VPD is collected and reported via another application reading an EEPROM
   device).
7. The power supply application must allow power supply hot-plug and concurrent
   maintenance (CM).
8. The power supply application should create and update average and maximum
   power consumption metric interfaces for telemetry data.
9. The power supply application must be able to detect how many power supplies
   are present in the system, what type of power supply is present (maximum
   output power such as 900W, 1400W, 2200W, etc.), and what type of input power
   is being supplied (AC input, DC input, input voltage, etc.).
10. The application must be able to recognize whether the power supplies present
    form a valid configuration. Certain invalid combinations may result in the
    application updating properties for a Minimum Ship Level ([MSL][3]) check.
11. The application must create error logs for invalid configurations, or for
    power supplies experiencing some other faulted condition (no input power,
    output over voltage, output over current, etc.).
12. The application would periodically communicate with the power supplies via
    sysfs files updated by a PMBus device driver (the needed files are currently
    only known to be created and updated by the [ibm-cffps][4] device driver).
    Certain device driver updates may be necessary to support some power
    supplies or power supply features. Any power supply that communicates using
    the PMBus specification should be supportable, although some
    manufacturer-specific code paths may be required for commands in the "User
    Data and Configuration" (USER_DATA_00 through USER_DATA_15) and
    "Manufacturer Specific Commands" (MFR_SPECIFIC_00 through MFR_SPECIFIC_45)
    groups, as well as for the bit definitions of STATUS_MFR_SPECIFIC and any
    other "MFR" command.
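
Requirements 11 and 12 both depend on interpreting PMBus status. As a minimal sketch, the standard (non-manufacturer) bits of the PMBus STATUS_WORD command could be decoded into names suitable for attaching to an error log. The `decodeStatusWord` function and the subset of bits it reports are illustrative assumptions, not the phosphor-power implementation; how the status word is obtained from the device driver is not shown, and the manufacturer-specific bits would still need per-supply handling.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Standard bits of the PMBus STATUS_WORD command. The low byte is
// STATUS_BYTE; the high byte summarizes the other STATUS_x commands.
// Only a subset is listed here, for illustration.
namespace status_word
{
constexpr std::uint16_t CML = 0x0002;            // communication/memory/logic
constexpr std::uint16_t TEMPERATURE = 0x0004;    // temperature fault or warning
constexpr std::uint16_t VIN_UV_FAULT = 0x0008;   // input undervoltage fault
constexpr std::uint16_t IOUT_OC_FAULT = 0x0010;  // output overcurrent fault
constexpr std::uint16_t VOUT_OV_FAULT = 0x0020;  // output overvoltage fault
constexpr std::uint16_t OFF = 0x0040;            // unit is not providing power
constexpr std::uint16_t FANS = 0x0400;           // fan fault or warning
constexpr std::uint16_t POWER_GOOD_NEG = 0x0800; // POWER_GOOD is negated
constexpr std::uint16_t MFR_SPECIFIC = 0x1000;   // see STATUS_MFR_SPECIFIC
constexpr std::uint16_t INPUT = 0x2000;          // see STATUS_INPUT
} // namespace status_word

// Hypothetical helper: returns the names of the set status bits, lowest bit
// first, so that they can be attached to an error log entry.
std::vector<std::string> decodeStatusWord(std::uint16_t word)
{
    struct Bit
    {
        std::uint16_t mask;
        const char* name;
    };
    static const Bit bits[] = {
        {status_word::CML, "CML"},
        {status_word::TEMPERATURE, "TEMPERATURE"},
        {status_word::VIN_UV_FAULT, "VIN_UV_FAULT"},
        {status_word::IOUT_OC_FAULT, "IOUT_OC_FAULT"},
        {status_word::VOUT_OV_FAULT, "VOUT_OV_FAULT"},
        {status_word::OFF, "OFF"},
        {status_word::FANS, "FANS"},
        {status_word::POWER_GOOD_NEG, "POWER_GOOD#"},
        {status_word::MFR_SPECIFIC, "MFR_SPECIFIC"},
        {status_word::INPUT, "INPUT"},
    };

    std::vector<std::string> names;
    for (const Bit& bit : bits)
    {
        if ((word & bit.mask) != 0)
        {
            names.emplace_back(bit.name);
        }
    }
    return names;
}
```

For example, a status word of `0x2008` decodes to `VIN_UV_FAULT` and `INPUT`, a combination that points to a loss of input power.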

## Proposed Design

The proposal is to create a single new power supply application in the OpenBMC
[phosphor-power][6] repository. The application would be written in C++17.

Upon startup, the power supply application would be passed a parameter giving
the location of a JSON-format configuration file. This file would contain
information such as the D-Bus object name(s), possible power supply types,
possible system types that the various power supplies are valid to be used in,
I2C/PMBus file location data, read retries, deglitch counts, etc.
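
A minimal sketch of how the parsed configuration data and the read-retry setting might be used. The `PowerSupplyConfig` structure, its field names, and the `readSysfsValue` helper are hypothetical: the design only lists the kinds of data the JSON file would hold, and JSON parsing is omitted here. The helper reads a single integer attribute, as hwmon sysfs files such as `in1_input` present their values, and retries a failed read up to the configured number of times.

```cpp
#include <fstream>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory form of the JSON configuration file after parsing.
struct PowerSupplyConfig
{
    std::string objectPath;               // D-Bus object name for the supply
    std::string sysfsDirectory;           // hwmon directory created by driver
    std::vector<int> supportedWattages;   // possible power supply types
    std::vector<std::string> systemTypes; // systems the supply is valid in
    unsigned readRetries = 3;             // extra attempts after a failed read
    unsigned deglitchCount = 3;           // faulted reads required to log
};

// Reads one integer value from a sysfs attribute. A failed open or parse is
// retried up to `retries` more times; std::nullopt means every attempt failed.
std::optional<long> readSysfsValue(const std::string& path, unsigned retries)
{
    for (unsigned attempt = 0; attempt <= retries; ++attempt)
    {
        std::ifstream file(path);
        long value = 0;
        if (file >> value)
        {
            return value;
        }
    }
    return std::nullopt;
}
```

A caller might then use `readSysfsValue(config.sysfsDirectory + "/in1_input", config.readRetries)`. A real implementation would likely wait between attempts, which this sketch leaves out.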

The power supply application would then detect which system type it is running
on, which supplies are present, whether each power supply is ready for its VPD
to be read, what type each supply is, etc. The application would then try to
find a matching valid configuration. If no match is found, the configuration
would be considered invalid. The application should continue to check which
faults, if any, are occurring, logging errors as appropriate.
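
The matching step could look like the following sketch. Purely for illustration, it assumes that a valid configuration is described by a system type, a required number of supplies, and a single wattage that every present supply must match; real configurations may allow other combinations. The `ValidConfiguration`, `DetectedSupply`, and `findMatchingConfiguration` names are hypothetical.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// One hypothetical entry from the list of valid configurations.
struct ValidConfiguration
{
    std::string systemType; // system this configuration applies to
    int supplyCount;        // number of supplies that must be present
    int wattage;            // maximum output power of each supply
};

// What the application detected about one power supply slot.
struct DetectedSupply
{
    bool present;
    int wattage; // ignored when not present
};

// Returns the first valid configuration matching the detected supplies on
// this system type, or std::nullopt if the combination is invalid.
std::optional<ValidConfiguration> findMatchingConfiguration(
    const std::string& systemType, const std::vector<DetectedSupply>& supplies,
    const std::vector<ValidConfiguration>& validConfigs)
{
    std::vector<int> presentWattages;
    for (const DetectedSupply& supply : supplies)
    {
        if (supply.present)
        {
            presentWattages.push_back(supply.wattage);
        }
    }

    for (const ValidConfiguration& config : validConfigs)
    {
        if (config.systemType != systemType ||
            static_cast<int>(presentWattages.size()) != config.supplyCount)
        {
            continue;
        }
        const bool allMatch =
            std::all_of(presentWattages.begin(), presentWattages.end(),
                        [&config](int w) { return w == config.wattage; });
        if (allMatch)
        {
            return config;
        }
    }
    return std::nullopt;
}
```

With valid configurations `{"sysA", 2, 1400}` and `{"sysA", 4, 2200}`, two present 1400 W supplies on `sysA` match the first entry, while one 1400 W and one 2200 W supply match neither, so that combination would be treated as invalid and logged.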

When the system is powered on, the power supplies should start outputting power
to the system. From that point on, the application will monitor the supplies,
communicate any changes such as removal of input voltage, removal of a power
supply, or insertion of a power supply, and take any necessary actions upon
detection of fault conditions.
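
A sketch of per-supply monitoring that combines presence change detection with the deglitch count from the configuration file: an input fault is only reported after a configured number of consecutive faulted reads, so a brief disturbance does not immediately produce an error log. `DeglitchedFault`, `SupplySample`, and `SupplyMonitor` are hypothetical names, and the returned strings stand in for error logs and D-Bus property updates.

```cpp
#include <string>
#include <utility>
#include <vector>

// Reports a fault only after `limit` consecutive faulted samples; a single
// good sample clears the count. A limit of 0 is treated as 1.
class DeglitchedFault
{
  public:
    explicit DeglitchedFault(unsigned limit) : limit_(limit == 0 ? 1 : limit)
    {}

    // Returns true only on the sample that makes the fault active.
    bool update(bool faulted)
    {
        if (!faulted)
        {
            count_ = 0;
            return false;
        }
        if (count_ < limit_)
        {
            ++count_;
            return count_ == limit_;
        }
        return false; // already active; do not report again
    }

    bool active() const
    {
        return count_ >= limit_;
    }

  private:
    unsigned limit_;
    unsigned count_ = 0;
};

// One poll's worth of readings for a single supply.
struct SupplySample
{
    bool present;
    bool inputFault; // e.g. derived from the INPUT/VIN_UV_FAULT status bits
};

class SupplyMonitor
{
  public:
    SupplyMonitor(std::string name, unsigned deglitchCount,
                  bool initiallyPresent) :
        name_(std::move(name)), present_(initiallyPresent),
        inputFault_(deglitchCount)
    {}

    // Processes one sample and returns the events that should be logged.
    std::vector<std::string> poll(const SupplySample& sample)
    {
        std::vector<std::string> events;
        if (!present_ && sample.present)
        {
            events.push_back(name_ + " inserted");
        }
        else if (present_ && !sample.present)
        {
            events.push_back(name_ + " removed");
        }
        present_ = sample.present;

        // Only a present supply can report loss of input power.
        if (inputFault_.update(sample.present && sample.inputFault))
        {
            events.push_back(name_ + " input power lost");
        }
        return events;
    }

  private:
    std::string name_;
    bool present_;
    DeglitchedFault inputFault_;
};
```

With a deglitch count of 2, one faulted read produces no event and the second consecutive one produces a single "input power lost" event; pulling and reseating the supply then produces "removed" and "inserted" events.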

The proposed power supply application would not control any fans internal to the
power supply; that function would be left to other userspace application(s).

## Alternatives Considered

Using multiple instances of a power supply monitor, as in the current
implementation (the [psu-monitor][5] from the [witherspoon-pfault-analysis][1]
repository), was considered. This design was avoided due to:

- Complexity of the various valid and invalid configuration combinations.
- Power line disturbance communication.
- Timing/serialization concerns with power supply communication.

## Impacts

The application is expected to have some impact on the PLDM API, due to the
various D-Bus properties it may be updating.

No security impacts are anticipated.

The main documentation impact should be this design document. Future
enhancements or clarifications may be required for this document.

The application is expected to have a similar or lesser performance impact than
the current approach of one application instance per power supply.

## Testing

Testing can be accomplished via automated or manual testing to verify that:

- A configuration not listed as valid results in appropriate behavior.
- The application detects and logs power supply faults, including input
  faults, output faults, shorts, current share faults, communication failures,
  etc.
- Power supply VPD is reported for present power supplies.
- Power supply removal and insertion, on a system supporting concurrent
  maintenance, does not result in power loss to a powered-on system.
- The system operates through power supply faults and power line disturbances
  as appropriate.

CI testing could be impacted if a system being used for testing is in an
unsupported or faulted configuration.

[1]: https://github.com/openbmc/witherspoon-pfault-analysis
[2]: https://en.wikipedia.org/wiki/Power_Management_Bus
[3]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Control/README.msl.md
[4]:
  https://github.com/openbmc/linux/blob/dev-5.3/drivers/hwmon/pmbus/ibm-cffps.c
[5]:
  https://github.com/openbmc/witherspoon-pfault-analysis/tree/master/power-supply
[6]: https://github.com/openbmc/phosphor-power/
158