# Power Supply Monitoring Application

Author: Brandon Wyman !bjwyman

Other contributors: Derek Howard

Created: 2019-06-17

## Problem Description

This is a proposal to provide a set of enhancements to the current OpenBMC power
supply application for enterprise class systems. Some enterprise class systems
may consist of a number of configuration variations including different power
supply types and numbers. An application capable of communicating with the
different power supplies is needed in order to initialize the power supplies,
validate configurations, report invalid configurations, detect and report
various faults, and report vital product data (VPD). Some of the function will
be configurable to be included or excluded for use on different platforms.

## Background and References

The OpenBMC project currently has a [witherspoon-pfault-analysis][1] repository
that contains a power supply monitor application and a power sequencer monitor
application. The current power supply application is lacking things desired for
an enterprise class server.

The intent of this new application design is to enhance the OpenBMC project with
a single power supply application that can communicate with one or more
[PMBus][2] power supplies and provide the enterprise features currently lacking
in the existing application that has multiple instances talking to a single
power supply.

## Requirements

Some of these requirements may be deemed as business specific logic, and thus
could be configurable options as appropriate.

1. The power supply application must detect, isolate, and report individual
   input power and power FRU faults, during boot and at runtime only.
2. The power supply application must determine power supply presence,
   configuration, and status, and report via external interfaces.
3. The power supply application must report power supply failures to IPMI and
   Redfish requests (during boot and at runtime only).
4. The power supply application must report power supply present/missing changes
   and status to IPMI and Redfish requests, and to the hypervisor. Recipes and
   code for presence state monitoring and event log creation may need to be
   moved from the `phosphor-dbus-monitor` to this application, depending on
   whether such function was already written or ported forward from a similar
   earlier system.
5. The power supply application must ensure proper power supply configuration
   and report improper configurations (during boot and at runtime only).
6. The power supply application must collect and report power supply VPD (unless
   that VPD is collected and reported via another application reading an EEPROM
   device).
7. The power supply application must allow power supply hot-plug and concurrent
   maintenance (CM).
8. The power supply application should create and update average and maximum
   power consumption metric interfaces for telemetry data.
9. The power supply application must be able to detect how many power supplies
   are present in the system, what type of power supply is present (maximum
   output power such as 900W, 1400W, 2200W, etc.), and what type of input power
   is being supplied (AC input, DC input, input voltage, etc.).
10. The application must be able to recognize if the power supplies present
    consist of a valid configuration. Certain invalid combinations may result in
    the application updating properties for a Minimum Ship Level ([MSL][3])
    check.
11. The application must create error logs for invalid configurations, or for
    power supplies experiencing some other faulted condition (no input power,
    output over voltage, output over current, etc.).
12. The application would periodically communicate with the power supplies via
    the sysfs file system files updated via a PMBus device driver (currently
    only known to be created and updated by the [ibm-cffps][4] device driver).
    Certain device driver updates may be necessary to support some power
    supplies or power supply features. Any power supply that communicates using
    the PMBus specification should be supportable, although some
    manufacturer-specific code paths may be required for commands in the "User
    Data and Configuration" (USER_DATA_00 through USER_DATA_15) and the
    "Manufacturer Specific Commands" (MFR_SPECIFIC_00 through MFR_SPECIFIC_45)
    ranges, as well as bit definitions for STATUS_MFR_SPECIFIC and any other
    "MFR" command.
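
As an illustration of the fault detection described in requirements 11 and 12,
the sketch below decodes a few bits of a PMBus STATUS_WORD value into fault
names. It is a minimal sketch, not the application's implementation: the helper
name `decodeStatusWord` and the table `statusWordBits` are hypothetical, only a
subset of the STATUS_WORD bits is listed, and how the raw value is obtained from
the device driver is left out.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Selected PMBus STATUS_WORD bits, listed from lowest to highest bit.
// (Hypothetical table; a real application would likely cover more bits.)
const std::vector<std::pair<std::uint16_t, std::string>> statusWordBits = {
    {0x0008, "VIN_UV_FAULT"},  // input under-voltage (e.g. no input power)
    {0x0010, "IOUT_OC_FAULT"}, // output over-current
    {0x0020, "VOUT_OV_FAULT"}, // output over-voltage
    {0x1000, "MFR_SPECIFIC"},  // details are in STATUS_MFR_SPECIFIC
    {0x2000, "INPUT"},         // input fault or warning
};

// Hypothetical helper: returns the names of the listed bits set in value.
std::vector<std::string> decodeStatusWord(std::uint16_t value)
{
    std::vector<std::string> faults;
    for (const auto& [mask, name] : statusWordBits)
    {
        if ((value & mask) != 0)
        {
            faults.push_back(name);
        }
    }
    return faults;
}
```

An empty result means none of the listed bits is set. The meaning of the
`MFR_SPECIFIC` bit depends on the power supply, which is where the
manufacturer-specific code paths mentioned in requirement 12 would come in.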

## Proposed Design

The proposal is to create a single new power supply application in the OpenBMC
[phosphor-power][6] repository. The application would be written in C++17.

Upon startup, the power supply application would be passed a parameter
consisting of the location of a JSON configuration file. This file would contain
information such as the D-Bus object name(s), possible power supply types,
possible system types that the various power supplies are valid to be used in,
I2C/PMBus file location data, read retries, deglitch counts, etc.
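
One possible in-memory form of that configuration is sketched below. This is an
assumption for illustration only: the document does not define a schema, so the
type names `SupplyConfig` and `MonitorConfig`, every field name, the default
values, and the `validate` helper are all hypothetical. Parsing the JSON file
itself is omitted.

```cpp
#include <string>
#include <vector>

// Hypothetical entry describing one power supply position.
struct SupplyConfig
{
    std::string inventoryPath; // D-Bus object name for this supply
    std::string sysfsPath;     // I2C/PMBus file location for this supply
};

// Hypothetical top-level configuration loaded at startup.
struct MonitorConfig
{
    std::vector<SupplyConfig> supplies;
    std::vector<std::string> supplyTypes; // possible power supply types
    std::vector<std::string> systemTypes; // system types the supplies suit
    unsigned readRetries = 3;             // assumed default
    unsigned deglitchCount = 3;           // assumed default
};

// Returns a list of problems found; an empty list means the configuration
// passed these basic checks.
std::vector<std::string> validate(const MonitorConfig& cfg)
{
    std::vector<std::string> problems;
    if (cfg.supplies.empty())
    {
        problems.push_back("no power supplies listed");
    }
    for (const auto& supply : cfg.supplies)
    {
        if (supply.inventoryPath.empty() || supply.sysfsPath.empty())
        {
            problems.push_back("power supply entry is missing a path");
        }
    }
    if (cfg.deglitchCount == 0)
    {
        problems.push_back("deglitch count must be at least 1");
    }
    return problems;
}
```

Checking the file once at startup would let the application report a bad
configuration file separately from a bad hardware configuration.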

The power supply application would then detect which system type it is running
on, which supplies are present, whether each power supply is ready for reading
VPD information, what type each supply is, etc. The application would then try
to find a matching valid configuration. If no match is found, that configuration
would be considered invalid. The application should continue to check what
faults, if any, are occurring, logging errors as appropriate.
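
The matching step could look roughly like the sketch below. It assumes, purely
for illustration, that a valid configuration is described by a system type, a
single power supply type, and a required count; the real rules may be more
involved. The names `ValidConfiguration`, `DetectedState`, and
`isValidConfiguration` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one allowed configuration.
struct ValidConfiguration
{
    std::string systemType;
    std::string supplyType;  // all present supplies must be this type
    std::size_t supplyCount; // exact number of supplies required
};

// Hypothetical snapshot of what the application detected.
struct DetectedState
{
    std::string systemType;
    std::vector<std::string> presentSupplyTypes; // one entry per supply
};

// Returns true if the detected state matches any allowed configuration.
bool isValidConfiguration(const DetectedState& state,
                          const std::vector<ValidConfiguration>& valid)
{
    return std::any_of(
        valid.begin(), valid.end(), [&state](const ValidConfiguration& v) {
            const auto& types = state.presentSupplyTypes;
            return v.systemType == state.systemType &&
                   v.supplyCount == types.size() &&
                   std::all_of(types.begin(), types.end(),
                               [&v](const std::string& type) {
                                   return type == v.supplyType;
                               });
        });
}
```

With this shape, a system holding two supplies of different types matches no
entry and is treated as invalid, at which point the application would log an
error and keep monitoring for faults.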

When the system is powered on, the power supplies should start outputting power
to the system. At that point the application will begin monitoring the supplies
and continue to do so, communicating any changes such as removal of input
voltage, removal of a power supply, or insertion of a power supply, and taking
any necessary actions upon detection of fault conditions.
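
Because the configuration file carries deglitch counts, one way to keep
transient readings from producing error logs during this monitoring is a small
counter that reports a fault only after it has been seen on several consecutive
reads. The class below is a minimal sketch under that assumption; the name
`Deglitch` and its interface are hypothetical, and the limit is assumed to be
at least 1.

```cpp
// Hypothetical deglitch counter: a fault is reported only after it has been
// observed on `limit` consecutive reads.
class Deglitch
{
  public:
    explicit Deglitch(unsigned limit) : limit(limit) {}

    // Feed one reading. Returns true exactly once, on the reading that
    // reaches the limit, so the caller can log a single error.
    bool update(bool faultSeen)
    {
        if (!faultSeen)
        {
            count = 0; // any clean reading resets the counter
            return false;
        }
        if (count < limit)
        {
            ++count;
            return count == limit;
        }
        return false; // already reported
    }

    // True while the fault is considered active.
    bool faulted() const
    {
        return count >= limit;
    }

  private:
    unsigned limit;
    unsigned count = 0;
};
```

With a limit of 3, two faulted reads followed by a clean read report nothing,
while three faulted reads in a row report the fault once.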

The proposed power supply application would not control any fans internal to the
power supply. That function would be left to other userspace application(s).

## Alternatives Considered

The current implementation of multiple instances of a power supply monitor was
considered, essentially similar to the [psu-monitor][5] from the
[witherspoon-pfault-analysis][1] repository. This design was avoided due to:

- Complexity of the various valid and invalid configuration combinations.
- Power line disturbance communication.
- Timing/serialization concerns with power supply communication.

## Impacts

The application is expected to have some impact on the PLDM API, due to the
various D-Bus properties it may be updating.

No security impacts are anticipated.

The main documentation impact should be this design document. Future
enhancements or clarifications may be required for this document.

The application is expected to have a similar or lesser performance impact than
the existing approach of one application per power supply.

## Testing

Testing can be accomplished via automated or manual testing to verify that:

- A configuration not listed as valid results in appropriate behavior.
- The application detects and logs power supply faults, including input faults,
  output faults, shorts, current share faults, communication failures, etc.
- Power supply VPD is reported for present power supplies.
- Power supply removal and insertion, on a system supporting concurrent
  maintenance, does not result in power loss to a powered-on system.
- The system operates through power supply faults and power line disturbances as
  appropriate.

CI testing could be impacted if a system being used for testing is in an
unsupported or faulted configuration.

[1]: https://github.com/openbmc/witherspoon-pfault-analysis
[2]: https://en.wikipedia.org/wiki/Power_Management_Bus
[3]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Control/README.msl.md
[4]:
  https://github.com/openbmc/linux/blob/dev-5.3/drivers/hwmon/pmbus/ibm-cffps.c
[5]:
  https://github.com/openbmc/witherspoon-pfault-analysis/tree/master/power-supply
[6]: https://github.com/openbmc/phosphor-power/