# Power Supply Monitoring Application

Author:
  Brandon Wyman !bjwyman

Primary assignee:
  Brandon Wyman

Other contributors:
  Derek Howard

Created:
  2019-06-17

## Problem Description
This is a proposal to provide a set of enhancements to the current OpenBMC
power supply application for enterprise class systems. Some enterprise class
systems may come in a number of configuration variations, including different
power supply types and numbers. An application capable of communicating with the
different power supplies is needed in order to initialize the power supplies,
validate configurations, report invalid configurations, detect and report
various faults, and report vital product data (VPD). Some of these functions
will be configurable so they can be included or excluded on different platforms.

## Background and References
The OpenBMC project currently has a [witherspoon-pfault-analysis][1] repository
that contains a power supply monitor application and a power sequencer monitor
application. The current power supply application lacks features desired for
an enterprise class server.

The intent of this new application design is to enhance the OpenBMC project
with a single power supply application that can communicate with one or more
[PMBus][2] power supplies and provide the enterprise features missing from the
existing application, which runs multiple instances that each talk to a single
power supply.

## Requirements

Some of these requirements may be deemed business specific logic, and thus
could be configurable options as appropriate.

1. The power supply application must detect, isolate, and report individual
input power and power FRU faults (during boot and at runtime only).
2. The power supply application must determine power supply presence,
configuration, and status, and report them via external interfaces.
3. The power supply application must report power supply failures to IPMI and
Redfish requests (during boot and at runtime only).
4. The power supply application must report power supply present/missing changes
and status to IPMI and Redfish requests, and to the hypervisor. Recipes and code
for presence state monitoring and event log creation may need to be moved from
`phosphor-dbus-monitor` to this application, depending on whether that function
has already been written or ported forward from a similar earlier system.
5. The power supply application must ensure proper power supply configuration
and report improper configurations (during boot and at runtime only).
6. The power supply application must collect and report power supply VPD (unless
that VPD is collected and reported via another application reading an EEPROM
device).
7. The power supply application must allow power supply hot-plug and concurrent
maintenance (CM).
8. The power supply application should create and update average and maximum
power consumption metric interfaces for telemetry data.
9. The power supply application must be able to detect how many power supplies
are present in the system, what type of power supply is present (maximum output
power such as 900W, 1400W, 2200W, etc.), and what type of input power is being
supplied (AC input, DC input, input voltage, etc.).
10. The application must be able to recognize whether the power supplies present
form a valid configuration. Certain invalid combinations may result in the
application updating properties for a Minimum Ship Level ([MSL][3]) check.
11. The application must create error logs for invalid configurations, or for
power supplies experiencing some other faulted condition (no input power, output
over voltage, output over current, etc.).
12. The application would periodically communicate with the power supplies via
the sysfs files updated by a PMBus device driver (currently only known to be
created and updated by the [ibm-cffps][4] device driver). Certain device driver
updates may be necessary to support some power supplies or power supply
features. Any power supply that communicates using the PMBus specification
should be supportable. Some manufacturer-specific code paths may be required
for commands in the "User Data and Configuration" (USER_DATA_00 through
USER_DATA_15) and "Manufacturer Specific Commands" (MFR_SPECIFIC_00 through
MFR_SPECIFIC_45) groups, as well as bit definitions for STATUS_MFR_SPECIFIC and
any other "MFR" command.
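
To illustrate requirement 12, the sketch below reads one numeric value from a
sysfs attribute file, retrying a bounded number of times when the read fails.
The function name `readSysfsValue`, the `maxRetries` parameter, and the
assumption that the file holds a single decimal integer are all hypothetical.
They are not part of this design or of any existing driver interface.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper: read a single decimal integer from a sysfs attribute
// file. The first attempt plus up to maxRetries further attempts are made.
// Returns std::nullopt if every attempt fails to open or parse the file.
std::optional<long> readSysfsValue(const std::string& path, int maxRetries)
{
    for (int attempt = 0; attempt <= maxRetries; ++attempt)
    {
        std::ifstream file(path);
        long value = 0;
        if (file >> value)
        {
            return value;
        }
    }
    return std::nullopt;
}
```

A caller would treat an empty result as a communication failure and decide
whether to log an error, rather than acting on a stale or default value.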

## Proposed Design
The proposal is to create a single new power supply application in the OpenBMC
[phosphor-power][6] repository. The application would be written in C++17.

Upon startup, the power supply application would be passed a parameter giving
the location of a configuration file in JSON format. This file would contain
information such as the D-Bus object name(s), possible power supply types,
possible system types that the various power supplies are valid to be used in,
I2C/PMBus file location data, read retries, deglitch counts, etc.
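
One way to picture the contents of that configuration file is as the in-memory
structure the application might hold after parsing it. The sketch below is only
an assumption about shape: the type names, field names, and default values are
hypothetical, and JSON parsing itself is left out.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one power supply slot.
struct SupplySlot
{
    std::string inventoryPath; // D-Bus object name for this supply
    std::string sysfsPath;     // I2C/PMBus file location for this supply
};

// Hypothetical in-memory form of the JSON configuration file.
struct MonitorConfig
{
    std::vector<std::string> supplyTypes; // possible power supply types
    std::vector<std::string> systemTypes; // system types the supplies fit
    std::vector<SupplySlot> slots;
    int readRetries = 3;   // illustrative default, not from the design
    int deglitchCount = 3; // illustrative default, not from the design
};

// Returns a list of human-readable problems; empty means the config is usable.
std::vector<std::string> findConfigProblems(const MonitorConfig& config)
{
    std::vector<std::string> problems;
    if (config.slots.empty())
    {
        problems.push_back("no power supply slots defined");
    }
    if (config.readRetries < 0)
    {
        problems.push_back("read retries must not be negative");
    }
    if (config.deglitchCount < 1)
    {
        problems.push_back("deglitch count must be at least 1");
    }
    for (const auto& slot : config.slots)
    {
        if (slot.inventoryPath.empty() || slot.sysfsPath.empty())
        {
            problems.push_back("slot is missing a D-Bus or sysfs path");
        }
    }
    return problems;
}
```

Checking the parsed configuration once at startup keeps malformed files from
surfacing later as confusing runtime faults.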

The power supply application would then detect which system type it is running
on, which supplies are present, whether each power supply is ready for its VPD
to be read, what type each supply is, etc. The application would then try to
find a matching valid configuration. If no match is found, the detected
configuration would be considered invalid. The application should continue to
check for any faults that are occurring, logging errors as appropriate.
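
The matching step can be sketched as a search over a list of valid
configurations. Everything named here is hypothetical, including the
simplifying assumption that a valid configuration is one system type, one
supply type, and an exact supply count. The real rules may be richer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical entry from the list of valid configurations.
struct ValidConfig
{
    std::string systemType;
    std::string supplyType;
    std::size_t supplyCount;
};

// Hypothetical result of detection: the system type and the type of each
// power supply found present.
struct Detected
{
    std::string systemType;
    std::vector<std::string> presentSupplyTypes;
};

// Returns true if the detected state matches any valid configuration.
// Assumption: all present supplies must be of the same type and the count
// must match exactly.
bool isValidConfiguration(const Detected& detected,
                          const std::vector<ValidConfig>& validConfigs)
{
    return std::any_of(
        validConfigs.begin(), validConfigs.end(),
        [&detected](const ValidConfig& valid) {
            const auto& types = detected.presentSupplyTypes;
            return valid.systemType == detected.systemType &&
                   valid.supplyCount == types.size() &&
                   std::all_of(types.begin(), types.end(),
                               [&valid](const std::string& type) {
                                   return type == valid.supplyType;
                               });
        });
}
```

When this returns false, the application would log the invalid configuration
and keep monitoring for faults, as described above.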

When the system is powered on, the power supplies should start outputting power
to the system. From that point on, the application will continuously monitor
the supplies, communicate any changes such as removal of input voltage, removal
of a power supply, or insertion of a power supply, and take any necessary
actions upon detection of fault conditions.
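
Because monitoring is periodic, a single bad reading should probably not be
treated as a fault. The sketch below shows one possible use of the deglitch
count from the configuration file: a fault is reported only after it has been
seen on a number of consecutive polls. The class name and its behavior are
assumptions for illustration, not the design's definition of deglitching.

```cpp
// Hypothetical deglitch filter: reports a fault only after it has been
// observed on `threshold` consecutive polls. Any clean poll resets the count.
class DeglitchedFault
{
  public:
    explicit DeglitchedFault(int thresholdCount) : threshold(thresholdCount)
    {}

    // Feed in the result of one poll. Returns true while the fault is
    // considered real (seen on at least `threshold` polls in a row).
    bool update(bool faultSeen)
    {
        if (!faultSeen)
        {
            count = 0;
            return false;
        }
        if (count < threshold)
        {
            ++count;
        }
        return count >= threshold;
    }

  private:
    int threshold;
    int count = 0;
};
```

With a threshold of 3, two faulted polls followed by a clean one never raise a
fault, while three faulted polls in a row do.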

The proposed power supply application would not control any fans internal to the
power supply; that function would be left to other userspace application(s).

## Alternatives Considered
The current implementation of multiple instances of a power supply monitor,
essentially the [psu-monitor][5] from the [witherspoon-pfault-analysis][1]
repository, was considered. This design was avoided due to:
 - Complexity of the various valid and invalid configuration combinations.
 - Power line disturbance communication.
 - Timing/serialization concerns with power supply communication.

## Impacts
The application is expected to have some impact on the PLDM API, due to the
various D-Bus properties it may be updating.

No security impacts are anticipated.

The main documentation impact should be this design document. Future
enhancements or clarifications may be required for this document.

The application is expected to have a similar or lesser performance impact than
the existing approach of one application instance per power supply.

## Testing
Testing can be accomplished via automated or manual testing to verify that:
* A configuration not listed as valid results in appropriate behavior.
* The application detects and logs power supply faults, including input
faults, output faults, shorts, current share faults, communication failures,
etc.
* Power supply VPD is reported for present power supplies.
* Power supply removal and insertion, on a system supporting concurrent
maintenance, does not result in power loss to a powered on system.
* The system operates through power supply faults and power line disturbances
as appropriate.

CI testing could be impacted if a system being used for testing is in an
unsupported or faulted configuration.

[1]: https://github.com/openbmc/witherspoon-pfault-analysis
[2]: https://en.wikipedia.org/wiki/Power_Management_Bus
[3]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Control/README.msl.md
[4]: https://github.com/openbmc/linux/blob/dev-5.3/drivers/hwmon/pmbus/ibm-cffps.c
[5]: https://github.com/openbmc/witherspoon-pfault-analysis/tree/master/power-supply
[6]: https://github.com/openbmc/phosphor-power/