# OpenBMC platform telemetry

Author:
  Piotr Matuszczak <piotr.matuszczak@intel.com>

Primary assignee:
  Piotr Matuszczak

Other contributors:
  Pawel Rapkiewicz <pawel.rapkiewicz@intel.com> <pawelr>,
  Kamil Kowalski <kamil.kowalski@intel.com>

Created:
  2019-08-07

## Problem Description
The BMC on a server platform gathers a large amount of telemetry data, which
has to be exposed in a clean, human-readable and standardized format. This
document focuses on telemetry over Redfish, since it is the standard API
for platform manageability.

## Background and References
* OpenBMC platform telemetry shall leverage DMTF's [Redfish Telemetry Model][1]
for exposing platform telemetry over the network.
* OpenBMC platform telemetry shall leverage the
[OpenBMC sensors architecture implementation][2].
* OpenBMC platform telemetry shall implement a service, called the Monitoring
Service, to deal with metric report and trigger management. This service
is described later in this document.
* Although [hwmon][3] is used to gather readings from physical sensors, this
architecture does not depend on it, because the Monitoring Service component
relies on the [OpenBMC D-Bus sensors][2].

## Requirements
* [OpenBMC D-Bus sensors][2] support. This is also a design limitation, since
the Monitoring Service requires telemetry sources to be implemented as
D-Bus sensors.

## Proposed Design
The Redfish Telemetry Model shall implement the Telemetry Service with the
following collection resources:
* Metric Definitions - contains the metadata for metrics (unit, accuracy, etc.)
* Metric Report Definitions - defines how a metric report shall be created
(which metrics it shall contain, how often it shall be generated, etc.)
* Metric Reports - contains the actual metric reports with telemetry data
generated according to the Metric Report Definitions
* Metric Triggers - contains thresholds and actions that apply to specific
metrics

The OpenBMC telemetry architecture is shown in the diagram below.

```ascii
   +--------------+               +----------------+     +-----------------+
   |hwmon|        |               |Dbus sensors|   |     |Monitoring|      |
   +-----/        |               +------------/   |     |service   |      |
   |              +--filesystem--->                |     +----------/      |
   |              |               |                |     |                 |
   +--------------+               +--------^-------+     +--------^--------+
                                           |                      |
                                           |                      |
<------------------------------------------v-----^--DBus----------v----------->
                                                 |
                                                 |
+-------+---------------------------------------------------------------------+
|bmcweb |                                        |                            |
+-------/                                        |                            |
|                                                |                            |
| +--------+-------------------------------------v--------------------------+ |
| |Redfish |                                                                | |
| +--------/                                            +---------+-------+ | |
| |                                                     |Existing |       | | |
| | +------------------------------------------------+  |Redfish  |       | | |
| | |Telemetry Service|                              |  |resources|       | | |
| | +----------------+/                              |  +---------/       | | |
| | |  +----------+  +-----------+  +-------------+  |  |   +---------+   | | |
| | |  |  Metric  |  |  Metric   |  |Metric report|  |  |   | Redfish |   | | |
| | |  | triggers |  |definitions|  |definitions  <---------+ sensors |   | | |
| | |  |          |  |           |  |             |  |  |   |         |   | | |
| | |  +----+-----+  +-----+-----+  +------+------+  |  |   +---------+   | | |
| | |       |              |               |         |  |                 | | |
| | |       |              |               |         |  |                 | | |
| | |       |              |               |         |  |                 | | |
| | |       |        +-----v-----+         |         |  |                 | | |
| | |       |        |   Metric  |         |         |  |                 | | |
| | |       +-------->   report  <---------+         |  |                 | | |
| | |                |           |                   |  |                 | | |
| | |                +-----------+                   |  |                 | | |
| | |                                                |  |                 | | |
| | +------------------------------------------------+  +-----------------+ | |
| |                                                                         | |
| +-------------------------------------------------------------------------+ |
|                                                                             |
+-----------------------------------------------------------------------------+
```

The telemetry service component is a part of Redfish and implements the DMTF's
[Redfish Telemetry Model][1]. Metric report definitions use Redfish sensor
URIs for metric report creation. Those sensors are also used to get the
URI->D-Bus sensor mapping. The Redfish Telemetry Service acts as the
presentation layer for the telemetry, while the Monitoring Service is
responsible for gathering metrics from D-Bus sensors and exposing them as
D-Bus objects. The Monitoring Service supports different monitoring modes
(periodic, on change and on demand) along with the following aggregation
operations:
* SINGLE - current reading value
* AVERAGE - average value over a defined time period
* MAX - maximum reading value during a defined time period
* MIN - minimum reading value during a defined time period
* SUM - sum of reading values over a defined time period

The time period for calculating aggregated values is taken from the Redfish
Metric Definition resource for each sensor's metric.
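
As a rough illustration of the aggregation operations listed above, the
following Python sketch reduces the readings collected during one time period
to a single value. The `aggregate` function, the `OPERATIONS` table and the
shape of the input are hypothetical and are not part of the Monitoring Service
interface.

```python
from statistics import fmean

# Hypothetical mapping from operation name to a reducer; illustration only.
OPERATIONS = {
    "SINGLE": lambda values: values[-1],  # current (most recent) reading
    "AVERAGE": fmean,                     # average over the time period
    "MAX": max,                           # highest reading in the time period
    "MIN": min,                           # lowest reading in the time period
    "SUM": sum,                           # sum of readings in the time period
}


def aggregate(operation, readings):
    """Reduce readings (ordered oldest to newest) from one time period."""
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    if not readings:
        raise ValueError("no readings collected in the time period")
    return OPERATIONS[operation](readings)
```

For example, `aggregate("MAX", [20.0, 22.0, 21.0])` picks the highest of the
three readings, while `"SINGLE"` returns only the last one.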

The Monitoring Service supports creating and managing metric reports, each of
which may contain a single metric or multiple metrics from sensors. Such a
metric report is mapped to a Metric Report in the Redfish Telemetry Service.

The diagram below shows the flows for creation and update of a metric report.

```ascii
+----+              +------+              +----------+                +-------+
|User|              |bmcweb|              |Monitoring|                | D-Bus |
+-+--+              +--+---+              | Service  |                |Sensors|
  |                    |                  +----------+                +---+---+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Metric report definition flow|                |                          |   |
+-----------------------------+                |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
| |    POST request    |                       |                          |   |
| |    with metric     |                       |                          |   |
| |    report          |                       |                          |   |
| |    definition      |                       |                          |   |
| +-------------------->  Invoke AddReport     |  Register for D-Bus      |   |
| |                    |  method on D-Bus      |  sensors                 |   |
| |                    +----------------------->  PropertiesChanged       |   |
| |                    |                       |  signals                 |   |
| |                    |                       +-------------------------->   |
| |                    |                       |-------------------------->   |
| |                    |                       +-------------------------->   |
| |                    |                       |                          |   |
| |  HTTP response     |                       +-+Create Report           |   |
| |  code 201 with     |  Return created       | |D-Bus object            |   |
| |  Metric Report     |  Report D-Bus path    <-+                        |   |
| |  Definition's URI  <-----------------------+                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Periodic metric report update flow|           |                          |   |
+----------------------------------+           +-+Metric report           |   |
| |                    |                       | |timer triggers          |   |
| |                    |                       <-+report update           |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |   Sensor's Properties-   |   |
| +-------------------->                       |   Changed signal         |   |
| |                    +-+Map report's URI     <--------------------------+   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     | +----------------------+ |   |
| |                    | Invoke GetAll method  | |Note that sensor's    | |   |
| |                    | on report D-Bus       | |PropertiesChanged     | |   |
| |                    | object                | |signal is asynchronous| |   |
| |                    +-----------------------> |to metric report timer| |   |
| |                    |                       | |This timer is the only| |   |
| |  Return metric     | Return report data    | |thing that triggers   | |   |
| |  report in JSON    <-----------------------+ |metric report update  | |   |
| |  format            |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|On change metric report update flow|          |   Sensor's Properties-   |   |
+-----------------------------------+          |   Changed signal         |   |
| |                    |                       <--------------------------+   |
| |                    |                       |                          |   |
| |                    |                       +-+Sensor's signal         |   |
| |                    |                       | |triggers report         |   |
| |                    |                       <-+update                  |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        | +----------------------+ |   |
| |                    <-+                     | |Note that sensor's    | |   |
| |                    | Invoke GetAll method  | |PropertiesChanged     | |   |
| |                    | on report D-Bus       | |signal triggers the   | |   |
| |                    | object                | |report update. It is  | |   |
| |                    +-----------------------> |sufficient that the   | |   |
| |                    |                       | |signal from only one  | |   |
| |  Return metric     | Return report data    | |sensor triggers report| |   |
| |  report in JSON    <-----------------------+ |update.               | |   |
| |  format            |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-+--------------------+------------------------------------------------------+
|On demand metric report update flow|          |                          |   |
+-+--------------------+------------+          |                          |   |
| |                    |                       |                          |   |
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     |                          |   |
| |                    |                       |                          |   |
| |                    |  Invoke the Update    |                          |   |
| |                    |  method for report    |                          |   |
| |                    |  D-Bus object         |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       +-+Update method triggers  |   |
| |                    |                       | |report to be updated    |   |
| |                    |                       | |with the latest known   |   |
| |                    |                       | |sensor's readings.      |   |
| |                    |                       | |No additional sensor    |   |
| |                    |                       <-+readings are performed. |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |                    | Update method call    |                          |   |
| |                    | result                |                          |   |
| |                    <-----------------------+                          |   |
| |                    |                       |                          |   |
| |                    | Invoke GetAll method  |                          |   |
| |                    | on report D-Bus       |                          |   |
| |                    | object                |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       |                          |   |
| |  Return metric     | Return report data    |                          |   |
| |  report in JSON    <-----------------------+                          |   |
| |  format            |                       |                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
```

The Redfish implementation in bmcweb is stateless, thus it is not able to
store metric reports. All operations on metric reports shall be done in
the Monitoring Service. Sending a metric report as an SSE or push-style event
shall be done via the [Redfish Event Service][6]. It is marked as optional
because a metric report does not have to be configured for pushing its data
through an event.

In the case of an on demand metric report update, the Monitoring Service
performs no additional sensor readings because it already has the latest
values, since they are updated on the PropertiesChanged signal from the
D-Bus sensors.
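
The caching behaviour described above can be sketched in Python as follows.
This is a toy model under stated assumptions, not the service's implementation:
the `ReportSketch` class and its method names are hypothetical, and real signal
handling over D-Bus is replaced by a plain method call.

```python
class ReportSketch:
    """Toy model of a report that caches sensor values between updates."""

    def __init__(self, sensor_paths):
        # Latest known value per sensor, refreshed by sensor signals.
        self._latest = {path: None for path in sensor_paths}
        # Snapshot that readers of the report would see.
        self.readings = {}

    def on_properties_changed(self, sensor_path, value):
        # Stand-in for a PropertiesChanged handler: only the cache changes.
        if sensor_path in self._latest:
            self._latest[sensor_path] = value

    def update(self):
        # On demand update: publish cached values, no sensor is read again.
        self.readings = dict(self._latest)
        return self.readings
```

In this model a sensor signal alone does not change what a reader sees; the
published readings change only when `update()` is called.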

**Monitoring service on [D-Bus][4]**

The Monitoring Service exposes specific interfaces on D-Bus. One of them will
be used for reading report management. The second one will be used for trigger
management.

**Reading report management**

The reading report management D-Bus object:

```ascii
xyz.openbmc_project.MonitoringService.ReportsManagement
/xyz/openbmc_project/MonitoringService/Reports
```

The ```ReportsManagement``` object supports the following interface apart from
the standard D-Bus interfaces.

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.MonitoringService.ReportsManagement``` | interface | - | - | - |
|```.AddReport```                          | method    | ssuas | s | - |
|```.MaxReports```                         | property  | u | 50 | emits-change |
|```.PollRateResolution```                 | property  | u | 100 | emits-change |

The ```AddReport``` method is used to create a metric report. The report
may contain single or multiple sensor readings. It is stored in the BMC's
volatile memory. The method has the following arguments:

| Argument | Type | Description |
|----------|------|-------------|
| Prefix | string | Defines the prefix for the report name, which will be "<prefix\><randomString\>", e.g. for prefix "test" -> testqrapndyY |
| ReportingType | string | Reporting type: <br> "xyz.openbmc_project.MonitoringService.Metric.Periodic" - for periodic update <br> "xyz.openbmc_project.MonitoringService.Metric.OnChange" - for update when the value changes <br> "xyz.openbmc_project.MonitoringService.Metric.OnRequest" - for update when the user requests data |
| ScanPeriod | uint32_t | Scan period used when the Periodic type is set (in milliseconds) |
| MetricsParams | array of structures | Collection of metric parameters. |

Each ```MetricsParams``` array entry is a structure containing:

| Field | Type | Description |
|----------|------|-------------|
| Sensor's path | object path | D-Bus path to the sensor providing readings. |
| Operation's type | enum | {SINGLE, MAX, MIN, AVG, SUM} - information about aggregated operation. |
| Metric id | string | Contains unique metric id, that can be mapped to Redfish MetricId. |

The ```ScanPeriod``` is defined per report, thus all sensors listed in the
MetricsParams collection will be scanned with the same frequency. The
ReportingType is also defined per report. When the
*xyz.openbmc_project.MonitoringService.Metric.OnChange* ReportingType is
defined, the metric report will emit a signal when at least one
reading has changed.

The ```AddReport``` method returns:

```ascii
String for created report - e.g. '/xyz/openbmc_project/MonitoringService/Reports/testqrapndyY'
```
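
How the returned path could be composed from the prefix is sketched below in
Python. This is only an illustration: the `make_report_path` helper, the
suffix length and the suffix alphabet are assumptions, while the reporting type
strings and the object root are taken from this document.

```python
import secrets
import string

REPORTS_ROOT = "/xyz/openbmc_project/MonitoringService/Reports"
REPORTING_TYPES = {
    "xyz.openbmc_project.MonitoringService.Metric.Periodic",
    "xyz.openbmc_project.MonitoringService.Metric.OnChange",
    "xyz.openbmc_project.MonitoringService.Metric.OnRequest",
}


def make_report_path(prefix, reporting_type, suffix_length=8):
    """Build '<root>/<prefix><randomString>' for a new report (hypothetical)."""
    if reporting_type not in REPORTING_TYPES:
        raise ValueError(f"unknown reporting type: {reporting_type}")
    # Assumption: the random string is alphanumeric with a fixed length.
    alphabet = string.ascii_letters + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(suffix_length))
    return f"{REPORTS_ROOT}/{prefix}{suffix}"
```

Calling `make_report_path("test", ...)` with one of the reporting types yields
a path under the reports root whose last element starts with `test`.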

A metric report created this way implements the following interfaces, methods
and properties (apart from the standard D-Bus interfaces):

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.Object.Delete```   | interface | - | - | - |
|```.Delete```                             | method    | - | - | - |
|```xyz.openbmc_project.MonitoringService.Report``` | interface | - | - | - |
|```.Update```                             | method    | - | - | - |
|```.ReadingParameters```                  | property  | a(sos) | 1 "/" | emits-change writable |
|```.Readings```                           | property  | a(svs) | 0 | emits-change read-only |
|```.ReportingType```                      | property  | s | One of reporting type strings| emits-change writable |
|```.ScanPeriod```                         | property  | u | 100 | emits-change writable |

The ```Update``` method is defined for the on demand metric report update. It
shall trigger the ```Readings``` property to be updated and a
PropertiesChanged signal to be sent.

The ```ReadingParameters``` property contains an array of structures, each
containing a unique metric id, a D-Bus sensor path and an aggregated operation
type. This property is made writable in order to support metric report
modifications.

| Field Type  | Field Description          |
|-------------|----------------------------|
| string      | Unique metric id           |
| object path | D-Bus sensor's path        |
| string      | Aggregated operation type  |

The ```Readings``` property contains an array of structures, each containing
the unique metric id, the sensor's reading value and the reading timestamp.

| Field Type | Field Description          |
|------------|----------------------------|
| string     | Unique metric id           |
| variant    | Sensor's reading value     |
| string     | Sensor's reading timestamp |
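
One way the presentation layer might turn such readings into Redfish output is
sketched below in Python. The function name is hypothetical, and the output
keys (`MetricId`, `MetricValue`, `Timestamp`) are assumed to follow the Redfish
MetricReport schema, which represents the metric value as a string; the actual
mapping performed by bmcweb may differ.

```python
def readings_to_metric_values(readings):
    """Convert (metric id, value, timestamp) entries to MetricValues items.

    `readings` mirrors the a(svs) layout of the Readings property as a list
    of 3-tuples.
    """
    return [
        {
            "MetricId": metric_id,
            "MetricValue": str(value),  # assumed to be carried as a string
            "Timestamp": timestamp,
        }
        for metric_id, value, timestamp in readings
    ]
```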

The ```ScanPeriod``` property has a single value for the whole metric report.
The ```Delete``` method results in deleting the whole metric report.

The ```MaxReports``` property of
the ```xyz.openbmc_project.MonitoringService.ReportsManagement``` interface
contains the maximum number of metric reports supported by the Monitoring
Service. This property is added to be compliant with the Redfish Telemetry
Service schema, which contains the ```MaxReports``` property.

**Trigger management**

The trigger management D-Bus object:

```ascii
xyz.openbmc_project.MonitoringService.TriggersManagement
/xyz/openbmc_project/MonitoringService/Triggers
```

The ```TriggersManagement``` object supports the following interface apart from
the standard D-Bus interfaces.

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.MonitoringService.TriggersManagement``` | interface | - | - | - |
|```.AddTrigger```                         | method    | sssv(os) | s | - |

The ```AddTrigger``` method shall be used to create a new trigger for
a certain metric. Triggers are stored in the BMC's volatile memory. The method
has the following arguments:

| Argument | Type | Description |
|----------|------|-------------|
| Prefix | string | Defines the prefix for the trigger name, which will be "<prefix\><randomString\>", e.g. for prefix "trigger" -> trigger0dfvAgVt6 |
| ActionType | string | Action type: <br> "xyz.openbmc_project.MonitoringService.Trigger.Log" - for logging to the log service <br> "xyz.openbmc_project.MonitoringService.Trigger.Event" - for sending a Redfish event <br> "xyz.openbmc_project.MonitoringService.Trigger.Update" - for triggering a metric report update |
| MetricType | string | Metric type: <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete" - for discrete sensors <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric" - for numeric sensors |
| TriggerParams | variant | Variant containing a structure with either discrete triggers or numeric thresholds. |
| MetricParam | structure | Structure containing the D-Bus sensor's path, the unique metric Id and an optional D-Bus path to the metric report to trigger. |

The ```TriggerParams``` argument is a variant type, which shall contain
a structure depending on the ```MetricType``` value. When ```MetricType```
contains the ```xyz.openbmc_project.MonitoringService.Trigger.Discrete``` value,
the ```TriggerParams``` shall contain a structure with discrete triggers.
When ```MetricType``` contains
the ```xyz.openbmc_project.MonitoringService.Trigger.Numeric``` value,
the ```TriggerParams``` shall contain a structure with numeric thresholds.

Discrete triggers structure:

| Field | Type | Description |
|-------|------|-------------|
| TriggerCondition | string | Discrete trigger condition: <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Changed" - trigger occurs when the value of the metric has changed; <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Specified" - trigger occurs when the value of the metric becomes one of the values listed in the discrete triggers. |
| DiscreteTriggers | array of structures | Array of discrete trigger structures. |

Member of DiscreteTriggers array:

| Field | Type | Description |
|-------|------|-------------|
| TriggerId| string     | Unique trigger Id |
| Severity | string     | Severity: <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Severity.OK" - normal, <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Severity.Warning" - requires attention, <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Severity.Critical" - requires immediate attention |
| Value | variant    | Value of discrete metric, that constitutes a trigger event. |
| DwellTime | uint64     | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |
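
The two discrete trigger conditions can be illustrated with the following
Python sketch. The function name and arguments are hypothetical, dwell time is
ignored here, and it is an assumption of this sketch that the "Specified"
condition is evaluated only when the value actually changes.

```python
DISCRETE = "xyz.openbmc_project.MonitoringService.Trigger.Discrete."


def discrete_trigger_occurs(condition, previous, current, specified_values=()):
    """Return True when a change from `previous` to `current` is a trigger."""
    if current == previous:
        return False  # assumption: an unchanged value never triggers
    if condition == DISCRETE + "Changed":
        return True  # any change of the metric value is a trigger
    if condition == DISCRETE + "Specified":
        # Only a change to one of the listed values is a trigger.
        return current in specified_values
    raise ValueError(f"unknown trigger condition: {condition}")
```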

The numeric thresholds structure shall contain up to 4 thresholds: upper
warning, upper critical, lower warning and lower critical. Thus it will contain
up to 4 of the structures shown below:

| Field | Type | Description |
|-------|------|-------------|
| ThresholdType | string | Numeric trigger type: <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.UpperCritical", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.UpperWarning", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.LowerCritical", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.LowerWarning" |
| DwellTime | uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |
| Activation | string | Indicates the direction of crossing the threshold value that triggers the threshold's action: <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Increasing", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Decreasing", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Either" |
| ThresholdValue | variant | Value of reading that will trigger the threshold |
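
The role of ```DwellTime``` can be sketched in Python as a small state holder
that reports when a trigger condition has persisted long enough. The class and
method names are hypothetical, timestamps are passed in explicitly to keep the
example deterministic, and the reset-on-clear behaviour is an assumption of
this sketch.

```python
class DwellTimer:
    """Tracks how long a trigger condition has persisted (illustration)."""

    def __init__(self, dwell_ms):
        self.dwell_ms = dwell_ms
        self._active_since = None  # time at which the condition became active

    def observe(self, condition_active, now_ms):
        """Return True once the condition has lasted at least dwell_ms."""
        if not condition_active:
            self._active_since = None  # assumption: clearing resets the timer
            return False
        if self._active_since is None:
            self._active_since = now_ms
        return now_ms - self._active_since >= self.dwell_ms
```

With a dwell time of 100 ms, a condition observed as active at 0 ms and 50 ms
does not yet lead to the action; it does once the condition is still active
at 100 ms.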

The meaning of the numeric threshold trigger types:

- "xyz.openbmc_project.MonitoringService.Trigger.Numeric.UpperCritical" -
indicates the reading is above normal range and requires immediate attention
- "xyz.openbmc_project.MonitoringService.Trigger.Numeric.UpperWarning" -
indicates the reading is above normal range and may require attention
- "xyz.openbmc_project.MonitoringService.Trigger.Numeric.LowerCritical" -
indicates the reading is below normal range and requires immediate attention
- "xyz.openbmc_project.MonitoringService.Trigger.Numeric.LowerWarning" -
indicates the reading is below normal range and may require attention

The meaning of the numeric threshold activation values:

- "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Increasing" -
trigger action when reading is changing from below to above the threshold's value
- "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Decreasing" -
trigger action when reading is changing from above to below the threshold's value
- "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Either" -
trigger action when reading is crossing the threshold's value in either direction
described above
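
The activation values listed above can be illustrated with a short Python
sketch that compares two consecutive readings against a threshold. The function
name is hypothetical, and treating a previous reading exactly equal to the
threshold as not yet having crossed it is an assumption of this sketch.

```python
ACTIVATION = "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation."


def threshold_crossed(activation, previous, current, threshold):
    """Return True when moving from `previous` to `current` crosses the
    threshold in a direction selected by `activation`."""
    rising = previous <= threshold < current   # from below (or at) to above
    falling = previous >= threshold > current  # from above (or at) to below
    if activation == ACTIVATION + "Increasing":
        return rising
    if activation == ACTIVATION + "Decreasing":
        return falling
    if activation == ACTIVATION + "Either":
        return rising or falling
    raise ValueError(f"unknown activation: {activation}")
```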

The ```MetricParam``` structure contains the following data:

| Field | Type | Description |
|-------|------|-------------|
| SensorPath | object path | D-Bus path to the sensor for which the trigger is defined. |
| MetricId   | string | Contains unique metric id, that can be mapped to Redfish MetricId. |
| ReportPath | object path | D-Bus path to the Monitoring Service's reading report whose update shall be triggered when the trigger condition occurs. This is optional and shall be filled when the trigger's ActionType is set to "xyz.openbmc_project.MonitoringService.Trigger.Update". |

The ```AddTrigger``` method returns:

```ascii
String for created trigger - e.g. '/xyz/openbmc_project/MonitoringService/Triggers/trigger0dfvAgVt6'
```

A trigger created this way implements the following interfaces, methods and
properties (apart from the standard D-Bus interfaces):

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.Object.Delete```   | interface | - | - | - |
|```.Delete```                             | method    | - | - | - |
|```xyz.openbmc_project.MonitoringService.Trigger``` | interface | - | - | - |
|```.MetricType```                         | property | s | One of the MetricType strings | emits-change read-only |
|```.Triggers```                           | property | (sa(ssvt)) or a(stsv) | The structure containing triggers. The layout of the structure depends on the ```.MetricType``` property. | emits-change writable |
482|```.ActionType```                         | property | s | One of ActionType strings | emits-change writable |
483|```.Metric```                             | property | (oso) | Structure containing details of metric, for which trigger is defined. | emits-change writable |
484
The ```.MetricType``` property identifies the metric type for which the
trigger was created. It can be either discrete or numeric. This property is
read-only, so an existing trigger cannot be changed from discrete to numeric or
from numeric to discrete. It also determines what the ```.Triggers``` property
looks like on D-Bus.

If ```.MetricType``` is equal to "xyz.openbmc_project.MonitoringService.Trigger.Discrete"
then the ```.Triggers``` property contains a discrete trigger that looks like this:

| Type | Description |
|------|-------------|
| string | Discrete trigger condition: <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Changed" - trigger occurs when the value of the metric has changed; "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Specified" - trigger occurs when the value of the metric becomes one of the values listed in the discrete triggers. |
| array of structures | Array of discrete trigger structures. |

Member of DiscreteTriggers array:

| Type | Description |
|------|-------------|
| string     | Unique trigger Id |
| string     | Severity: <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Severity.OK" - normal, "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Severity.Warning" - requires attention, "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Severity.Critical" - requires immediate attention |
| variant    | Value of the discrete metric that constitutes a trigger event. |
| uint64     | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |

If ```.MetricType``` is equal to "xyz.openbmc_project.MonitoringService.Trigger.Numeric"
then the ```.Triggers``` property contains a numeric trigger, which is an array of
4 structures of the form presented below:

| Type | Description |
|------|-------------|
| string | Numeric trigger type: <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.UpperCritical", "xyz.openbmc_project.MonitoringService.Trigger.Numeric.UpperWarning", "xyz.openbmc_project.MonitoringService.Trigger.Numeric.LowerCritical", "xyz.openbmc_project.MonitoringService.Trigger.Numeric.LowerWarning"|
| uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |
| string | Direction of crossing the threshold value that triggers the threshold's action: "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Increasing", "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Decreasing", "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Either" |
| variant | Value of the reading that will trigger the threshold |

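The dwell time field above (the uint64 in milliseconds) can be pictured with
the following minimal sketch. The `DwellTimer` class is hypothetical and only
illustrates one plausible reading of "persists before the action is
performed": the action fires once per occurrence, after the condition has held
continuously for the dwell time.

```python
from typing import Optional


class DwellTimer:
    """Tracks how long a trigger condition has held continuously."""

    def __init__(self, dwell_ms: int) -> None:
        self.dwell_ms = dwell_ms
        self._since_ms: Optional[int] = None  # when the condition became true
        self._fired = False                   # action already performed?

    def update(self, condition: bool, now_ms: int) -> bool:
        """Feed one sample; return True exactly once per occurrence, when the
        condition has persisted for at least `dwell_ms` milliseconds."""
        if not condition:
            # The occurrence ended, so the timer starts over next time.
            self._since_ms = None
            self._fired = False
            return False
        if self._since_ms is None:
            self._since_ms = now_ms
        if not self._fired and now_ms - self._since_ms >= self.dwell_ms:
            self._fired = True
            return True
        return False
```

With a dwell time of 4 ms, samples at 0 ms and 3 ms do not fire the action,
while a sample at 4 ms does, provided the condition held for all three.
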
The ```.Metric``` property stores the details of the reading for which the
trigger was defined. It is a structure consisting of three fields.

| Field type | Description  |
|------------|--------------|
| object path  | D-Bus path of the sensor for which the reading trigger is defined. |
| string     | Unique metric Id |
| object path | D-Bus path of an existing reading report. This is required when the trigger's action is to update a metric report. The path shall point to an existing reading report within the Monitoring Service. |

**Trigger operations**

Triggers support three types of operation: Log, Event and Update. Each is
handled differently.

1. For action Log, the event shall be logged to the system journal. In this
case the Monitoring Service writes data to the system journal using libjournal.
The Redfish log service shall then retrieve the data by reading the system
journal. The flow is shown in the diagram below.

```ascii
+---------------------------+
|bmcweb|                    |         +----------------------+
+------/    +-----------+-+ |         |Monitoring|           |
|           |Redfish    | | |         |Service   |           |
|           |log service| | |         +----------/           |
|           +-----------/ | |         |                      |
|           |             | |         |                      |
|           |             | |         |                      |
|           +------^------+ |         +-----------+----------+
+---------------------------+                     |
                   |                              |
                   +----collect----+            event
                     journal entry |      (write to journal)
                                   |              |
       +------------------------------------+     |
       |systemd|                   |        |     |
       +-------/ +----------+  +---+------+ |     |
       |         |journal|  |  |libjournal| |     |
       |         +-------/  <-->          <-------+
       |         |          |  +----------+ |
       |         |          |               |
       |         |          |               |
       |         +----------+               |
       |                                    |
       +------------------------------------+
```
2. For action Event, the Monitoring Service shall send an event using the
[Redfish Event Service][6], either as a push-style event or over SSE.

3. For action Update, the Monitoring Service will trigger an update of the
reading report pointed to by the D-Bus path held in the ReportPath field of
the ```.Metric``` structure. The update shall cause the reading report's D-Bus
object to emit a property change signal. This will cause the Redfish Metric
Report to be streamed out if it was configured to do so.

**Telemetry Service Redfish API**

The Telemetry Service shall support the 2019.1 Redfish schemas for telemetry
resources. Metric report definitions determine which metrics are to be included
in a metric report. A metric definition is assigned to a particular metric type
and describes how the metric should be interpreted. The following resource
schemas shall be supported:

- TelemetryService 1.1.2
- MetricDefinition 1.0.3
- MetricReportDefinition 1.3.0
- MetricReport 1.2.0
- Triggers 1.1.1

The following diagram shows the relations between these resources.

```ascii
 +----------------------------------------------------------------------------+
 |                             Service root                                   |
 +----------------------------------+-------------------------------+---------+
                                    |                               |
                                    |                               |
                                    |                               |
 +----------------------------------v-----------------+  +----------v---------+
 |                                                    |  |Chassis             |
 |                Telemetry Service                   |  |                    |
 |                                                    |  |                    |
 |                                                    |  |  +---------------+ |
 +---------+--------------+------------------+--------+  |  |               | |
           |              |                  |           |  |   Chassis 1   | |
           |              |                  |           |  |               | |
           |              |                  |           |  +---------+-----+ |
           |              |                  |           |            |       |
+----------v--+  +--------v----+  +----------v-----+     +--------------------+
|Triggers     |  |Metric       |  |Metric report   |                  |
|             |  |definition   |  |                |                  |
|             |  | +---------+ |  |                | Reads            |
| +---------+ |  | |Reading  | |  | +-----------+  | ReadingVolts  +--v------+
| |         | |  | |Volts    <------+           +------------------>         |
| |Trigger 1| |  | +---------+ |  | |  Metric   |  |               |         |
| |         | |  |             |  | | report 1  |  | Reads         |  Power  |
| |         | |  | +---------+ |  | |           |  | PowerConsumed |         |
| |         | |  | |         | |  | |           |  | Watts         |         |
| +--+---+--+ |  | |Power    <------+           +------------------>         |
|    |   |    |  | |Consumed | |  | +-----^-----+  |               +----^----+
|    |   |    |  | |Watts    | |  |       |        |                    |
|    |   |    |  | +---------+ |  |       |        |                    |
|    |   |    |  |             |  |       |        |                    |
+-------------+  +-------------+  +----------------+                    |
     |   |                                |                             |
     |   | Triggers report update         |                             |
     |   | (when applicable)              |                             |
     |   +--------------------------------+                             |
     |                                                                  |
     |   Monitors PowerConsumedWatts to check                           |
     |   whether trigger value is exceeded                              |
     +------------------------------------------------------------------+
```

The diagram shows the relations between Redfish resources. A metric report is
defined to be generated periodically, on demand or on change. Each metric in
the Metric Report contains the URI of its metric definition and of the Redfish
sensor whose reading is presented. Under this presentation layer, the
Monitoring Service gathers D-Bus sensor readings and exposes them in reading
reports over D-Bus for the Telemetry Service. Each D-Bus sensor is mapped to a
Redfish sensor.

Examples of Redfish resources for the Telemetry Service are shown below.

Sample Telemetry Service resource:

```json
{
    "@odata.type": "#TelemetryService.v1_1_2.TelemetryService",
    "Id": "TelemetryService",
    "Name": "Telemetry Service",
    "Status": {
        "State": "Enabled",
        "Health": "OK"
    },
    "MinCollectionInterval": "PT10S",
    "SupportedCollectionFunctions": [],
    "MaxReports": <max_no_of_reports>,
    "MetricDefinitions": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions"
    },
    "MetricReportDefinitions": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions"
    },
    "MetricReports": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReports"
    },
    "Triggers": {
        "@odata.id": "/redfish/v1/TelemetryService/Triggers"
    },
    "LogService": {
        "@odata.id": "/redfish/v1/Managers/<manager_name>/LogServices/Journal"
    },
    "@odata.context": "/redfish/v1/$metadata#TelemetryService",
    "@odata.id": "/redfish/v1/TelemetryService"
}
```

Sample metric report definition:

```json
{
    "@odata.type": "#MetricReportDefinition.v1_3_0.MetricReportDefinition",
    "Id": "SampleMetric",
    "Name": "Sample Metric Report Definition",
    "MetricReportDefinitionType": "Periodic",
    "Schedule": {
        "RecurrenceInterval": "PT10S"
    },
    "ReportActions": [
        "LogToMetricReportsCollection"
    ],
    "ReportUpdates": "Overwrite",
    "MetricReport": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
    },
    "Status": {
        "State": "Enabled"
    },
    "Metrics": [
        {
            "MetricId": "Test",
            "MetricProperties": [
                "/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts"
            ]
        }
    ],
    "@odata.context": "/redfish/v1/$metadata#MetricReportDefinition.MetricReportDefinition",
    "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
}
```

Sample metric report:

```json
{
    "@odata.type": "#MetricReport.v1_2_0.MetricReport",
    "Id": "SampleMetric",
    "Name": "Sample Metric Report",
    "ReportSequence": "0",
    "MetricReportDefinition": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
    },
    "MetricValues": [
        {
            "MetricDefinition": {
                "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions/SampleMetricDefinition"
            },
            "MetricId": "Test",
            "MetricValue": "100",
            "Timestamp": "2016-11-08T12:25:00-05:00",
            "MetricProperty": "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
        }
    ],
    "@odata.context": "/redfish/v1/$metadata#MetricReport.MetricReport",
    "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
}
```

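As a rough illustration of how a reading report could be mapped onto a
resource like the sample above, the following sketch assembles the main fields
of a Metric Report. The `Reading` class and the `build_metric_report` helper
are hypothetical. The sketch also assumes that a metric report and its
definition share the same id, as the ```SampleMetric``` links in the samples
suggest.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

TELEMETRY = "/redfish/v1/TelemetryService"


@dataclass
class Reading:
    metric_id: str
    value: float
    timestamp: str        # ISO 8601 timestamp, already formatted
    metric_property: str  # URI of the Redfish property the value came from


def build_metric_report(report_id: str,
                        readings: List[Reading]) -> Dict[str, Any]:
    """Assemble a MetricReport body shaped like the sample above."""
    return {
        "@odata.type": "#MetricReport.v1_2_0.MetricReport",
        "@odata.id": f"{TELEMETRY}/MetricReports/{report_id}",
        "Id": report_id,
        # Assumption: the definition uses the same id as the report.
        "MetricReportDefinition": {
            "@odata.id": f"{TELEMETRY}/MetricReportDefinitions/{report_id}"
        },
        "MetricValues": [
            {
                "MetricId": r.metric_id,
                # MetricValue is carried as a string, as in the sample.
                "MetricValue": str(r.value),
                "Timestamp": r.timestamp,
                "MetricProperty": r.metric_property,
            }
            for r in readings
        ],
    }
```

Passing the result to `json.dumps` yields a body that can be compared against
the sample resource field by field.
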
Sample trigger that triggers a metric report update:

```json
{
    "@odata.type": "#Triggers.v1_1_1.Triggers",
    "Id": "SampleTrigger",
    "Name": "Sample Trigger",
    "MetricType": "Numeric",
    "Links": {
        "MetricReportDefinitions": [
            "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
        ]
    },
    "Status": {
        "State": "Enabled"
    },
    "TriggerActions": [
        "RedfishMetricReport"
    ],
    "NumericThresholds": {
        "UpperCritical": {
            "Reading": 50,
            "Activation": "Increasing",
            "DwellTime": "PT0.001S"
        },
        "UpperWarning": {
            "Reading": 48.1,
            "Activation": "Increasing",
            "DwellTime": "PT0.004S"
        }
    },
    "MetricProperties": [
        "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
    ],
    "@odata.context": "/redfish/v1/$metadata#Triggers.Triggers",
    "@odata.id": "/redfish/v1/TelemetryService/Triggers/SampleTrigger"
}
```

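The Redfish ```DwellTime``` values above are duration strings, while the D-Bus
trigger structures carry the dwell time as a uint64 in milliseconds. A minimal
sketch of that conversion follows. The helper name `duration_to_ms` is
hypothetical, and the parser only handles the day, hour, minute and second
fields, which is an assumption about what is needed here.

```python
import re

# Subset of ISO 8601 durations: P[nD][T[nH][nM][n[.n]S]]
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def duration_to_ms(text: str) -> int:
    """Convert a duration such as "PT0.004S" to whole milliseconds."""
    match = _DURATION.match(text)
    # "P" and "PT" match the pattern but carry no fields, so reject them.
    if match is None or text in ("P", "PT"):
        raise ValueError(f"not a supported duration: {text!r}")
    days = int(match["days"] or 0)
    hours = int(match["hours"] or 0)
    minutes = int(match["minutes"] or 0)
    seconds = float(match["seconds"] or 0)
    total_s = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return round(total_s * 1000)
```

With this helper the ```UpperCritical``` dwell time "PT0.001S" maps to 1 ms and
the ```UpperWarning``` dwell time "PT0.004S" maps to 4 ms.
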
**Performance tests**

Performance tests were conducted on an AST2500 system with 64 MB flash and
512 MB RAM. Flash consumption by the Monitoring Service is 197.5 kB. The
runtime statistics are shown in the table below. Each reading report is
mapped to a single Metric Report. The runtime data is collected for the
Monitoring Service component only. All reports were created with the
```xyz.openbmc_project.MonitoringService.Metric.OnChange``` property to
maximize the workload. In the configuration with 50 reports and 50 sensors
this amounts to about 200 new readings per second, generating 200 reading
reports per second. The table shows CPU usage and memory usage. The VSZ is the
amount of memory mapped into the address space of the process. It includes
pages backed by the process's executable file and shared libraries, its heap
and stack, as well as anything else it has mapped.

| Monitoring service state                         | VSZ     | %VSZ | %CPU |
|--------------------------------------------------|---------|------|------|
| Idle (0 reports, 0 sensors)                      | 5188 kB | 1%   | 0%   |
| 1 report, 1 sensor                               | 5188 kB | 1%   | 1%   |
| 2 reports, 1 sensor                              | 5188 kB | 1%   | 1%   |
| 2 reports, 2 sensors (1 sensor per report)       | 5188 kB | 1%   | 1%   |
| 1 report, 10 sensors                             | 5188 kB | 1%   | 1%   |
| 10 reports, 10 sensors (same for each report)    | 5320 kB | 1%   | 1-2% |
| 2 reports, 20 sensors (10 per report)            | 5188 kB | 1%   | 1%   |
| 30 reports, 30 sensors (10 per report)           | 5444 kB | 1%   | 5-9% |
| 50 reports, 50 sensors (10 per report)           | 5572 kB | 1%   |11-14%|

The last two configurations use 10 sensors per reading report, which gives
3 or 5 distinct configurations. Each such configuration is used to
create 10 reading reports to obtain the desired number of 30 or 50 reading
reports.

In this architecture a reading report is created every time a Redfish
Metric Report Definition is posted (creating a new Metric Report).

## Alternatives Considered
The [framework based on collectd/librrd][5] was considered as an alternative
design. Although it seems to be a versatile and scalable solution, it has some
drawbacks from our point of view:
* Collectd's footprint in the minimal working configuration is around 2.6 MB,
while the space available for OpenBMC is limited to 64 MB.
* In that design, librrd is used to store metrics on the BMC's non-volatile
storage, which may be an issue when lots of metrics are captured and stored
in OpenBMC's limited storage space. Flash wear-out may also occur when
metrics are captured frequently (for example once per second).
* The Monitoring Service is directly compatible with the Redfish Telemetry
Service API, which means that the Monitoring Service's reading reports can
be directly mapped to Redfish Metric Reports.
* The Monitoring Service unifies the way the BMC's telemetry is exposed over
Redfish and may be used with multiple front-ends, so support for telemetry
over IPMI or any other API can be added without difficulty.

Since this design assumes flexibility and modularity, there are no obstacles
to using collectd in cooperation with the Monitoring Service. One possible
configuration is shown in the diagram below.

```ascii
   +-----------------+      +-----------------+
   |                 |      |   Monitoring    |
   |  D-Bus sensors  |      |    Service      |
   |                 |      |                 |
   +--------^--------+      +--------^--------+
            |                        |
            |                        |
            |                        |
<--------^--v-----------D-Bus--------v-^---------->
         |                             |
         |                             |
         |                             |
 +-------v------------+     +----------v--------+
 |  collectd metrics  |     |                   |
 |  exposed as D-Bus  |     |     bmcweb        |
 |      sensors       |     |  (with Redfish    |
 +---------^----------+     |    Telemetry      |
           |                |     Service)      |
           |                |                   |
    +------+-------+        +-------------------+
    |              |
    |   collectd   |
    |              |
    +--------------+
```
Here collectd is used as the source of some set of metrics. It exposes them
as D-Bus sensors, which can easily be consumed by both bmcweb and the
Monitoring Service without any changes in their D-Bus interfaces. In such a
configuration the Monitoring Service provides metric report and trigger
management.

Another possible configuration is to use collectd without the Monitoring
Service, but collectd by itself does not provide metric report and trigger
support compatible with Redfish. In that case, either the Redfish Telemetry
Service is not supported, or metric report and trigger support has to be
provided by collectd.

## Impacts
This design impacts the architecture of the bmcweb component, since it adds
the Telemetry Service implementation as a component of the existing
Redfish API implementation.

## Testing
This is a very high-level description of the proposed set of tests.
Testing shall be done on three basic levels:
* Unit tests
* Functional tests
* Performance tests

**Unit tests**

The Monitoring Service's code shall be covered by unit tests. The preferred
framework is [GTest/GMock][7]. The unit tests shall be run before a code
change is committed, to make sure that nothing is broken in the existing
functionality. Also, when new code is introduced, a new set of unit tests shall
be committed with it, according to the test-driven development principle. Unit
tests shall also be carefully reviewed.

**Functional tests**

Functional tests will be divided into two steps.

The first step tests the Monitoring Service metric report management. The test
scenario shall consist of creating a metric report by POSTing a proper metric
report definition, reading the metric report (using GET on the proper URI) and
deleting the metric report. The required configuration for such a test is
D-Bus sensors (at least some of them) and bmcweb with the Redfish Telemetry
Service implemented. The tests shall be performed on real hardware. For ease
of metric testing, dummy D-Bus sensors may be used to supply specifically
prepared metrics. This configuration shall also enable testing of aggregated
operations (MIN, MAX, SUM, AVG).

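When dummy sensors supply known values, the expected results of the aggregated
operations can be computed independently and compared with what the metric
report contains. The following sketch shows one way to do that. The
`aggregate` helper is hypothetical test-side code, not part of the Monitoring
Service.

```python
from typing import Callable, Dict, Sequence


def _average(samples: Sequence[float]) -> float:
    return sum(samples) / len(samples)


# Reference implementations of the aggregated operations named above.
AGGREGATIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "MIN": min,
    "MAX": max,
    "SUM": sum,
    "AVG": _average,
}


def aggregate(operation: str, samples: Sequence[float]) -> float:
    """Compute the expected value of `operation` over the dummy samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    return AGGREGATIONS[operation](samples)
```

For a dummy sensor that reports 10, 20 and 30, the expected values are 10 for
MIN, 30 for MAX, 60 for SUM and 20 for AVG.
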
The second step tests trigger and event generation. This also requires the
Event Service to be implemented along with the Log Service. Tests shall cover
all scenarios: sending a metric report as an event, triggering a metric report
update and logging events.

**Performance tests**

Performance tests shall be done using a full OpenBMC configuration with the
complete required set of features. The tests shall create a large number of
metric reports (up to the maximum number) along with all possible triggers.
Measurements shall cover the periodic metric report jitter, delays in event
logging or sending, the BMC's CPU utilization and the performance impact on
other services.

[1]: https://www.dmtf.org/sites/default/files/standards/documents/DSP2051_0.1.0a.zip
[2]: https://github.com/openbmc/docs/blob/master/architecture/sensor-architecture.md
[3]: https://www.kernel.org/doc/Documentation/hwmon/
[4]: https://www.freedesktop.org/wiki/Software/dbus/
[5]: https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/22257
[6]: https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/24749
[7]: https://github.com/google/googletest