# Power Supply Monitoring Application

Author:
  Brandon Wyman !bjwyman

Primary assignee:
  Brandon Wyman

Other contributors:
  Derek Howard

Created:
  2019-06-17

## Problem Description
This document proposes a set of enhancements to the current OpenBMC power
supply application for enterprise class systems. Such systems may support a
number of configuration variations, including different power supply types and
counts. An application capable of communicating with the different power
supplies is needed to initialize the power supplies, validate configurations,
report invalid configurations, detect and report various faults, and report
vital product data (VPD). Some of these functions will be configurable so that
they can be included or excluded on different platforms.

## Background and References
The OpenBMC project currently has a [witherspoon-pfault-analysis][1] repository
that contains a power supply monitor application and a power sequencer monitor
application. The current power supply application lacks features desired for
an enterprise class server.

The intent of this new application design is to enhance the OpenBMC project
with a single power supply application that can communicate with one or more
[PMBus][2] power supplies and provide the enterprise features missing from the
existing application, which runs multiple instances that each talk to a single
power supply.

## Requirements

Some of these requirements may be deemed business specific logic, and thus
could be configurable options as appropriate.

1. The power supply application must detect, isolate, and report individual
input power and power FRU faults, during boot and at runtime only.
2. The power supply application must determine power supply presence,
configuration, and status, and report them via external interfaces.
3. The power supply application must report power supply failures to IPMI and
Redfish requests (during boot and at runtime only).
4. The power supply application must report power supply present/missing changes
and status to IPMI and Redfish requests, and to the hypervisor. Recipes and code
for presence state monitoring and event log creation may need to be moved from
the `phosphor-dbus-monitor` to this application, depending on whether that
function has already been written or ported forward from a similar earlier
system.
5. The power supply application must ensure proper power supply configuration
and report improper configurations (during boot and at runtime only).
6. The power supply application must collect and report power supply VPD (unless
that VPD is collected and reported via another application reading an EEPROM
device).
7. The power supply application must allow power supply hot-plug and concurrent
maintenance (CM).
8. The power supply application should create and update average and maximum
power consumption metric interfaces for telemetry data.
9. The power supply application must be able to detect how many power supplies
are present in the system, what type of power supply is present (maximum output
power such as 900W, 1400W, 2200W, etc.), and what type of input power is being
supplied (AC input, DC input, input voltage, etc.).
10. The application must be able to recognize whether the power supplies present
form a valid configuration. Certain invalid combinations may result in the
application updating properties for a Minimum Ship Level ([MSL][3]) check.
11. The application must create error logs for invalid configurations, or for
power supplies experiencing some other faulted condition (no input power, output
over voltage, output over current, etc.).
12. The application would periodically communicate with the power supplies via
the sysfs files updated by a PMBus device driver (currently only known to be
created and updated by the [ibm-cffps][4] device driver). Certain device driver
updates may be necessary to support some power supplies or power supply
features. Any power supply that communicates using the PMBus specification
should be supportable, although some manufacturer specific code paths may be
required for commands in the "User Data and Configuration" (USER_DATA_00
through USER_DATA_15) and the "Manufacturer Specific Commands" (MFR_SPECIFIC_00
through MFR_SPECIFIC_45) groups, as well as bit definitions for
STATUS_MFR_SPECIFIC and any other "MFR" command.
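As an illustration of the fault conditions named in the requirements above (no
input power, output over voltage, output over current), the following C++17
sketch decodes a few of the standard STATUS_WORD bits defined by the PMBus
specification into readable fault names. The namespace, the `describeFaults`
helper, and the fault strings are hypothetical and not taken from any existing
OpenBMC code. How the raw status word is obtained from the device driver is
deliberately left out.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// STATUS_WORD bit masks from the PMBus specification (subset only).
namespace status_word
{
constexpr std::uint16_t vinUvFault = 0x0008;  // bit 3: input under voltage
constexpr std::uint16_t ioutOcFault = 0x0010; // bit 4: output over current
constexpr std::uint16_t voutOvFault = 0x0020; // bit 5: output over voltage
constexpr std::uint16_t mfrSpecific = 0x1000; // bit 12: manufacturer specific
} // namespace status_word

// Hypothetical helper: list the faults flagged in a raw STATUS_WORD value.
std::vector<std::string> describeFaults(std::uint16_t statusWord)
{
    std::vector<std::string> faults;
    if (statusWord & status_word::vinUvFault)
    {
        faults.emplace_back("input under voltage");
    }
    if (statusWord & status_word::ioutOcFault)
    {
        faults.emplace_back("output over current");
    }
    if (statusWord & status_word::voutOvFault)
    {
        faults.emplace_back("output over voltage");
    }
    if (statusWord & status_word::mfrSpecific)
    {
        // Meaning depends on the supply; see STATUS_MFR_SPECIFIC.
        faults.emplace_back("manufacturer specific");
    }
    return faults;
}
```

A set manufacturer specific bit only says that STATUS_MFR_SPECIFIC needs to be
examined, which is where the supply specific code paths mentioned above would
come in.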

## Proposed Design
The proposal is to create a single new power supply application in a new
OpenBMC repository, such as `phosphor-power-monitor`. The application would be
written in C++17.

On startup, the power supply application would be passed a parameter giving the
location of a configuration file in JSON format. This file would contain
information such as the D-Bus object name(s), the possible power supply types,
the system types in which each power supply type is valid, I2C/PMBus file
location data, read retries, deglitch counts, etc.
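One way the configured values could be carried around after the file is parsed
is sketched below, together with a helper that applies the read retry count.
This is only a sketch under stated assumptions: the `PsuSettings` structure,
its field names, and `readWithRetries` are all hypothetical, and the JSON
parsing itself is not shown.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Hypothetical per-supply settings taken from the JSON configuration file.
struct PsuSettings
{
    std::string inventoryPath; // D-Bus object name for this supply
    std::string devicePath;    // I2C/PMBus file location
    std::size_t readRetries;   // extra attempts allowed after a failed read
    std::size_t deglitchCount; // consecutive samples needed to accept a change
};

// Call readOnce until it yields a value or the retries are exhausted.
// readOnce stands in for whatever actually reads from the device driver.
std::optional<long> readWithRetries(
    const PsuSettings& settings,
    const std::function<std::optional<long>()>& readOnce)
{
    for (std::size_t attempt = 0; attempt <= settings.readRetries; ++attempt)
    {
        if (auto value = readOnce())
        {
            return value;
        }
    }
    return std::nullopt; // caller decides whether this is a communication fault
}
```

Keeping the retry count in data rather than in code lets different platforms
tune it without rebuilding the application.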

The power supply application would then detect which system type it is running
on, which supplies are present, whether each power supply is ready for its VPD
to be read, what type each supply is, etc. The application would then try to
find a matching valid configuration. If no match is found, the configuration
would be considered invalid. The application should continue to check for any
faults that are occurring, logging errors as appropriate.
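The matching step could look roughly like the following sketch. It assumes
that a valid configuration is described by a system type, a supply type, and a
supply count, and that all present supplies must be of the same type. Neither
assumption is stated in this design, and the `SupportedConfiguration` structure
and `isValidConfiguration` function are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical entry from the configuration file's list of valid setups.
struct SupportedConfiguration
{
    std::string systemType;
    std::string supplyType;
    std::size_t supplyCount;
};

// Returns true when the detected supplies match one supported entry.
// Assumption: mixed supply types are never valid.
bool isValidConfiguration(const std::vector<SupportedConfiguration>& supported,
                          const std::string& systemType,
                          const std::vector<std::string>& presentTypes)
{
    if (presentTypes.empty())
    {
        return false;
    }

    const std::string& first = presentTypes.front();
    const bool uniform =
        std::all_of(presentTypes.begin(), presentTypes.end(),
                    [&first](const std::string& type) { return type == first; });
    if (!uniform)
    {
        return false;
    }

    return std::any_of(supported.begin(), supported.end(),
                       [&](const SupportedConfiguration& entry) {
                           return entry.systemType == systemType &&
                                  entry.supplyType == first &&
                                  entry.supplyCount == presentTypes.size();
                       });
}
```

A `false` result here would correspond to the invalid configuration case above,
after which fault monitoring still continues.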

When the system is powered on, the power supplies should start outputting power
to the system. From that point on, the application will monitor the supplies,
communicate any changes such as removal of input voltage or removal or
insertion of a power supply, and take any necessary actions when fault
conditions are detected.
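Since the configuration file is expected to carry deglitch counts, a monitored
signal such as presence or input power good would presumably only be reported
as changed after several consecutive readings agree. A minimal sketch of such a
filter follows. The `Deglitch` class is hypothetical, and the exact deglitch
policy is an assumption rather than something this design specifies.

```cpp
#include <cstddef>

// Hypothetical deglitch filter: the confirmed state only changes after
// `threshold` consecutive raw samples disagree with it.
class Deglitch
{
  public:
    Deglitch(bool initialState, std::size_t threshold) :
        confirmed(initialState), required(threshold)
    {}

    // Feed one raw sample. Returns true if the confirmed state just changed.
    bool sample(bool raw)
    {
        if (raw == confirmed)
        {
            pending = 0; // a matching sample discards any partial run
            return false;
        }
        if (++pending < required)
        {
            return false;
        }
        confirmed = raw;
        pending = 0;
        return true;
    }

    bool state() const
    {
        return confirmed;
    }

  private:
    bool confirmed;
    std::size_t required;
    std::size_t pending = 0;
};
```

For example, with a threshold of 3, a single bad reading between good ones
never changes the reported state, while three bad readings in a row do.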

The proposed power supply application would not control any fans internal to the
power supply; that function would be left to other userspace application(s).

## Alternatives Considered
Keeping the current approach of running multiple instances of a power supply
monitor, essentially the [psu-monitor][5] from the
[witherspoon-pfault-analysis][1] repository, was considered. This design was
rejected due to:
 - Complexity of the various valid and invalid configuration combinations.
 - Power line disturbance communication.
 - Timing/serialization concerns with power supply communication.

## Impacts
The application is expected to have some impact on the PLDM API, due to the
various D-Bus properties it may be updating.

No security impacts are anticipated.

The main documentation impact should be this design document. Future
enhancements or clarifications may be required for this document.

The application is expected to have a similar or lesser performance impact than
the current design of one application instance per power supply.

## Testing
Testing can be accomplished via automated or manual testing to verify that:
* A configuration not listed as valid results in appropriate behavior.
* The application detects and logs power supply faults, including input
faults, output faults, shorts, current share faults, communication failures,
etc.
* Power supply VPD is reported for the power supplies that are present.
* Power supply removal and insertion, on a system supporting concurrent
maintenance, does not result in power loss to a powered-on system.
* The system operates through power supply faults and power line disturbances
as appropriate.

CI testing could be impacted if a system being used for testing is in an
unsupported or faulted configuration.

[1]: https://github.com/openbmc/witherspoon-pfault-analysis
[2]: https://en.wikipedia.org/wiki/Power_Management_Bus
[3]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Control/README.msl.md
[4]: https://github.com/openbmc/linux/blob/dev-5.1/drivers/hwmon/pmbus/ibm-cffps.c
[5]: https://github.com/openbmc/witherspoon-pfault-analysis/tree/master/power-supply