### NVMe-MI over SMBus

Author:
  Tony Lee <tony.lee@quantatw.com>

Created:
  3-8-2019

#### Problem Description

OpenBMC currently has no support for reporting NVMe drive information. The
NVMe-MI specification defines a command that reads NVMe drive information
directly over SMBus. Through this command a drive can report its information
and status, such as vendor ID and temperature. The aim of this proposal is to
let users monitor NVMe drives so that appropriate action can be taken.

#### Background and References

The NVMe-MI specification defines a command called
`NVM Express Basic Management Command` that reads NVMe drive information
directly over SMBus [1]. This command uses the SMBus Block Read protocol
defined by the SMBus specification [2].

Because the goal is only to retrieve NVMe drive information, this design uses
the NVM Express Basic Management Command described in the NVMe-MI
specification to communicate with the drives. The handling of the temperature
sensor, presence status, LED, and power sequence will be customized for each
platform.

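As a rough illustration of what a Block Read response looks like to software,
the sketch below decodes the first status block (command code 0) and verifies
the SMBus packet error code (PEC). The byte offsets and the temperature
encoding follow one reading of Appendix A of [1] and should be checked against
the specification. The 7-bit address `0x6A` and all function and class names
are assumptions made for this example, not part of the design.

```python
from dataclasses import dataclass
from typing import Optional


def smbus_pec(data: bytes) -> int:
    """CRC-8 used for SMBus PEC: polynomial x^8 + x^2 + x + 1, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def decode_temperature(raw: int) -> Optional[int]:
    """Return degrees Celsius, or None when no valid reading is reported."""
    if raw <= 0x7F:   # 0..126 C; 0x7F means 127 C or higher
        return raw
    if raw >= 0xC4:   # two's complement; 0xC4 means -60 C or lower
        return raw - 256
    return None       # 0x80 no data, 0x81 sensor failure, others reserved


@dataclass
class BasicStatus:
    status_flags: int
    smart_warnings: int
    temperature_c: Optional[int]
    drive_life_used_pct: int


def parse_status_block(block: bytes, addr7: int = 0x6A,
                       command: int = 0) -> BasicStatus:
    """Decode the 8-byte response to a Block Read at command code 0."""
    if len(block) != 8 or block[0] != 6:
        raise ValueError("unexpected length for status block")
    # PEC covers address+W, command code, address+R, then all bytes before PEC.
    header = bytes([addr7 << 1, command, (addr7 << 1) | 1])
    if smbus_pec(header + block[:-1]) != block[-1]:
        raise ValueError("PEC mismatch")
    return BasicStatus(block[1], block[2],
                       decode_temperature(block[3]), block[4])
```
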
[1] NVM Express Management Interface Revision 1.0a, April 8, 2017, Appendix A
(https://nvmexpress.org/wp-content/uploads/NVM_Express_Management_Interface_1_0a_2017.04.08_-_gold.pdf)
[2] System Management Bus (SMBus) Specification Version 3.0, 20 Dec 2014
(http://smbus.org/specs/SMBus_3_0_20141220.pdf)

#### Requirements

The implementation should:

- Provide a daemon to monitor NVMe drives. The monitored parameters are
  Status Flags, SMART Warnings, Temperature, Percentage Drive Life Used, Vendor
  ID, and Serial Number.
- Provide a D-Bus interface that lets other services access the data.
- Communicate with NVMe drives over the I2C hardware channel.
- Turn the fault LED of each drive on or off according to SmartWarnings, if the
  object path of the fault LED is defined in the configuration file.

#### Proposed Design

Create a D-Bus service "xyz.openbmc_project.nvme.manager" with an object path
for each NVMe sensor: "/xyz/openbmc_project/sensors/temperature/nvme0",
"/xyz/openbmc_project/sensors/temperature/nvme1", etc.
A JSON configuration file provides the drive index, bus ID, fault LED object
path, presence GPIO pin, and power-good GPIO pin for each drive.
For example:

```json
[
  {
    "NvmeDriveIndex": 0,
    "NVMeDriveBusID": 16,
    "NVMeDriveFaultLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_0_fault",
    "NVMeDrivePresentPin": 148,
    "NVMeDrivePwrGoodPin": 161
  },
  {
    "NvmeDriveIndex": 1,
    "NVMeDriveBusID": 17,
    "NVMeDriveFaultLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_0_fault",
    "NVMeDrivePresentPin": 149,
    "NVMeDrivePwrGoodPin": 162
  }
]
```

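A minimal sketch of how a daemon might load such a configuration is given here,
assuming the per-drive entries are wrapped in a top-level JSON array. The
`DriveConfig` class, the `load_drive_configs` function, and the mapping from
bus ID to a Linux `/dev/i2c-N` device node are assumptions made for
illustration only. The key names are the ones from the example, and the LED
path is treated as optional because the requirements only drive the LED when a
path is configured.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DriveConfig:
    index: int
    bus_id: int
    fault_led_group_path: str
    present_pin: int
    pwr_good_pin: int


def load_drive_configs(text: str) -> List[DriveConfig]:
    """Parse a JSON array of drive entries using the key names of the example."""
    entries = json.loads(text)
    return [
        DriveConfig(
            index=e["NvmeDriveIndex"],
            bus_id=e["NVMeDriveBusID"],
            # Empty string means "no fault LED configured" (an assumption).
            fault_led_group_path=e.get("NVMeDriveFaultLEDGroupPath", ""),
            present_pin=e["NVMeDrivePresentPin"],
            pwr_good_pin=e["NVMeDrivePwrGoodPin"],
        )
        for e in entries
    ]


SAMPLE = """
[
  {"NvmeDriveIndex": 0, "NVMeDriveBusID": 16,
   "NVMeDriveFaultLEDGroupPath": "/xyz/openbmc_project/led/groups/led_u2_0_fault",
   "NVMeDrivePresentPin": 148, "NVMeDrivePwrGoodPin": 161},
  {"NvmeDriveIndex": 1, "NVMeDriveBusID": 17,
   "NVMeDrivePresentPin": 149, "NVMeDrivePwrGoodPin": 162}
]
"""

if __name__ == "__main__":
    for cfg in load_drive_configs(SAMPLE):
        # Assumed mapping: bus ID N corresponds to /dev/i2c-N.
        print(cfg.index, f"/dev/i2c-{cfg.bus_id}", cfg.fault_led_group_path)
```
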
The resulting structure looks like this:

Under the D-Bus service name "xyz.openbmc_project.nvme.manager":

```
/xyz/openbmc_project
└─/xyz/openbmc_project/sensors
  └─/xyz/openbmc_project/sensors/temperature/nvme0
```

The object path `/xyz/openbmc_project/sensors/temperature/nvme0` implements:

- xyz.openbmc_project.Sensor.Value
- xyz.openbmc_project.Sensor.Threshold.Warning
- xyz.openbmc_project.Sensor.Threshold.Critical

Under the D-Bus service name "xyz.openbmc_project.Inventory.Manager":

```
/xyz/openbmc_project
└─/xyz/openbmc_project/inventory
  └─/xyz/openbmc_project/inventory/system
    └─/xyz/openbmc_project/inventory/system/chassis
      └─/xyz/openbmc_project/inventory/system/chassis/motherboard
        └─/xyz/openbmc_project/inventory/system/chassis/motherboard/nvme0
```

The object path
`/xyz/openbmc_project/inventory/system/chassis/motherboard/nvme0` implements:

- xyz.openbmc_project.Inventory.Item
- xyz.openbmc_project.Inventory.Decorator.Asset
- xyz.openbmc_project.Nvme.Status

The interface `xyz.openbmc_project.Sensor.Value` exposes the drive temperature
for hwmon-style monitoring and has the following properties:

| Property | Type   | Description          |
| -------- | ------ | -------------------- |
| MaxValue | int64  | Sensor maximum value |
| MinValue | int64  | Sensor minimum value |
| Scale    | int64  | Sensor value scale   |
| Unit     | string | Sensor unit          |
| Value    | int64  | Sensor value         |

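To make the relationship between `Value` and `Scale` concrete, here is a small
sketch. It assumes the convention of the int64-based sensor interface, in which
the real reading is `Value` multiplied by 10 to the power of `Scale`. The
function name is hypothetical.

```python
def scaled_reading(value: int, scale: int) -> float:
    """Convert a Value/Scale pair to a real-world reading (Value * 10^Scale)."""
    if scale >= 0:
        return float(value * 10 ** scale)
    # Divide for negative scales so that simple cases stay exact.
    return value / 10 ** (-scale)


if __name__ == "__main__":
    # A drive at 30 degrees C published in millidegrees: Value=30000, Scale=-3.
    print(scaled_reading(30000, -3))
```
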
The interface `xyz.openbmc_project.Nvme.Status` has the following properties:

| Property          | Type   | Description                                                    |
| ----------------- | ------ | -------------------------------------------------------------- |
| SmartWarnings     | string | The SMART Warnings reported by the drive                       |
| StatusFlags       | string | The Status Flags reported by the drive                         |
| DriveLifeUsed     | string | A vendor-specific estimate of the percentage of drive life used |
| TemperatureFault  | bool   | Whether a temperature warning has occurred                     |
| BackupdrivesFault | bool   | Whether a backup drive warning has occurred                    |
| CapacityFault     | bool   | Whether a capacity warning has occurred                        |
| DegradesFault     | bool   | Whether a degradation warning has occurred                     |
| MediaFault        | bool   | Whether a media warning has occurred                           |

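The five boolean properties can be derived from the SMART Warnings byte. The
sketch below shows one way to do that. The bit positions, and the rule that a
cleared bit signals the warning, follow one reading of Appendix A of [1]. The
mapping of each bit to a property name is an assumption made here and should be
verified against the specification.

```python
from typing import Dict

# Assumed bit positions in the SMART Warnings byte.
# A bit that is CLEARED (0) signals the warning; a set bit means "no warning".
FAULT_BITS = {
    "CapacityFault": 0,      # available spare below threshold
    "TemperatureFault": 1,   # temperature outside threshold
    "DegradesFault": 2,      # reliability degraded
    "MediaFault": 3,         # media placed in read-only mode
    "BackupdrivesFault": 4,  # volatile memory backup device failed
}


def decode_smart_warnings(raw: int) -> Dict[str, bool]:
    """Map the raw byte to {property name: fault active}."""
    return {name: not ((raw >> bit) & 1) for name, bit in FAULT_BITS.items()}


if __name__ == "__main__":
    # 0xFD has only bit 1 cleared, so only TemperatureFault is reported.
    print(decode_smart_warnings(0xFD))
```
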
The interface `xyz.openbmc_project.Inventory.Item` has the following
properties:

| Property   | Type   | Description                        |
| ---------- | ------ | ---------------------------------- |
| PrettyName | string | The human-readable name of the item |
| Present    | bool   | Whether or not the item is present |

The interface `xyz.openbmc_project.Inventory.Decorator.Asset` has the following
properties:

| Property     | Type   | Description                                        |
| ------------ | ------ | -------------------------------------------------- |
| PartNumber   | string | The item part number, typically a stocking number  |
| SerialNumber | string | The item serial number                             |
| Manufacturer | string | The item manufacturer                              |
| BuildDate    | string | The date of item manufacture in YYYYMMDD format    |
| Model        | string | The model of the item                              |

##### xyz.openbmc_project.nvme.manager.service

This service performs the following steps:

1. Register the D-Bus service `xyz.openbmc_project.nvme.manager` described
   above.
2. Read the drive index, bus ID, presence GPIO pin, power-good GPIO pin, and
   fault LED object path from the JSON file mentioned above.
3. In each cycle, do the following for each drive:
   1. Check whether the presence pin of the target drive is true. If it is,
      the drive exists; go to the next step. If not, the drive does not exist;
      remove its object path from D-Bus by drive index.
   2. Check whether the power-good pin of the target drive is true. If it is,
      the drive is ready; create its object path by drive index and go to the
      next step. If not, the drive power is abnormal; turn on the fault LED and
      log the condition to the journal.
   3. Send an NVMe-MI command using the SMBus Block Read protocol on the bus ID
      of the target drive to get the data. The data returned by the drive are
      "Status Flags", "SMART Warnings", "Temperature",
      "Percentage Drive Life Used", "Vendor ID", and "Serial Number".
   4. Set the returned data on the corresponding D-Bus properties.

The service runs automatically and polls the NVMe drives every second.

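The per-cycle logic can be sketched as follows. This is only an outline under
stated assumptions: GPIO access, the SMBus read, LED control, and journal
logging are injected as callables, a plain dictionary stands in for the D-Bus
objects, and all names (`Hooks`, `Monitor`, `run`) are hypothetical. The text
does not say what happens to an existing object when power is not good, so the
sketch leaves it untouched.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Hooks:
    """Platform access points; every callable is a hypothetical stand-in."""
    is_present: Callable[[int], bool]
    is_power_good: Callable[[int], bool]
    read_drive: Callable[[int], Optional[dict]]  # Basic Management Command read
    set_fault_led: Callable[[int, bool], None]
    log: Callable[[str], None]


@dataclass
class Monitor:
    hooks: Hooks
    objects: Dict[int, dict] = field(default_factory=dict)  # stands in for D-Bus

    def poll_drive(self, index: int) -> None:
        h = self.hooks
        # Step 1: absent drive, so remove its object and stop.
        if not h.is_present(index):
            self.objects.pop(index, None)
            return
        # Step 2: present but power not good, so light the LED and log.
        if not h.is_power_good(index):
            h.set_fault_led(index, True)
            h.log(f"nvme{index}: power abnormal")
            return
        self.objects.setdefault(index, {})
        # Step 3: read the drive. Step 4: publish the values.
        data = h.read_drive(index)
        if data is not None:
            self.objects[index].update(data)


def run(monitor: Monitor, indexes: Iterable[int],
        interval_s: float = 1.0, cycles: Optional[int] = None) -> None:
    """Poll every drive once per interval; cycles=None means run forever."""
    indexes = list(indexes)
    done = 0
    while cycles is None or done < cycles:
        for index in indexes:
            monitor.poll_drive(index)
        time.sleep(interval_s)
        done += 1
```
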
##### Fault LED

When the value obtained from the command corresponds to one of the warning
types, the service turns on the fault LED of the corresponding device and
issues an event.

##### Add SEL related to NVMe

The events `TemperatureFault`, `BackupdrivesFault`, `CapacityFault`,
`DegradesFault`, and `MediaFault` will be generated for NVMe errors.

- Temperature Fault log: generated when the property `TemperatureFault` is set
  to true
- Backupdrives Fault log: generated when the property `BackupdrivesFault` is
  set to true
- Capacity Fault log: generated when the property `CapacityFault` is set to
  true
- Degrades Fault log: generated when the property `DegradesFault` is set to
  true
- Media Fault log: generated when the property `MediaFault` is set to true

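One possible way to turn property changes into events is sketched here. It
assumes that a log entry is wanted only when a fault property changes from
false to true, so a persistent fault is not logged every second, and that the
fault LED stays on while any fault is active. Neither assumption is stated in
this design, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

FAULTS = (
    "TemperatureFault",
    "BackupdrivesFault",
    "CapacityFault",
    "DegradesFault",
    "MediaFault",
)


def fault_events(previous: Dict[str, bool],
                 current: Dict[str, bool]) -> Tuple[List[str], bool]:
    """Return (faults that just became true, desired fault LED state)."""
    newly_set = [
        name for name in FAULTS
        if current.get(name, False) and not previous.get(name, False)
    ]
    led_on = any(current.get(name, False) for name in FAULTS)
    return newly_set, led_on
```
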
#### Alternatives Considered

The NVMe-MI specification defines multiple commands for communicating with
NVMe drives over the MCTP protocol. NVMe-MI over MCTP has the following key
capabilities:

- Discover the drives that are present and learn the capabilities of each
  drive.
- Store data about the host environment so that a Management Controller can
  query it later.
- A standard format for VPD and defined mechanisms to read and write VPD
  contents.
- Inventorying, configuring, and monitoring.

For monitoring NVMe drives, using the NVM Express Basic Management Command
directly over SMBus is much simpler than NVMe-MI over MCTP.

#### Impacts

This application monitors NVMe drives via SMBus and sets values on D-Bus.
The impact on the system should be small.

#### Testing

This implementation uses the NVMe-MI Basic command over SMBus and sets the
response data on D-Bus.
Testing will send the SMBus command to the drives to get the information and
compare it with the D-Bus properties to make sure they match.
Testing can be performed on NVMe drives from different manufacturers, for
example Intel P4500/P4600 and Micron 9200 Max/Pro.

Unit tests will cover each function:

- Test that the length of the response data matches the design in the function
  that gets NVMe information.
- Test that the function that sets values on D-Bus behaves as designed.
- Test that the function that turns the corresponding LED on or off behaves as
  designed for different SmartWarnings values.
226