# OpenBMC platform telemetry

Author: Piotr Matuszczak <piotr.matuszczak@intel.com>

Other contributors:

- Pawel Rapkiewicz <pawel.rapkiewicz@intel.com> `pawelr`
- Kamil Kowalski <kamil.kowalski@intel.com>

Created: 2019-08-07

## Problem Description

The BMC on a server platform gathers a large amount of telemetry data, which
has to be exposed in a clean, human-readable and standardized format. This
document focuses on telemetry over Redfish, since it is the standard API for
platform manageability.

## Background and References

- OpenBMC platform telemetry shall leverage DMTF's [Redfish Telemetry Model][1]
  for exposing platform telemetry over the network.
- OpenBMC platform telemetry shall leverage the [OpenBMC sensors architecture
  implementation][2].
- OpenBMC platform telemetry shall implement a service called Telemetry to
  handle metric report and trigger management. This service is described later
  in this document.
- Although [hwmon][3] is used to gather readings from physical sensors, this
  architecture does not depend on it, because the Telemetry service component
  relies on the [OpenBMC D-Bus sensors][2].

## Requirements

- [OpenBMC D-Bus sensors][2] support. This is also a design limitation, since
  the Telemetry service requires telemetry sources to be implemented as D-Bus
  sensors.

## Proposed Design

The Redfish Telemetry Model shall implement the Telemetry Service with the
following collection resources:

- Metric Definitions - contains the metadata for metrics (unit, accuracy, etc.)
- Metric Report Definitions - defines how a metric report shall be created
  (which metrics it shall contain, how often it shall be generated, etc.)
- Metric Reports - contains the actual metric reports with telemetry data
  generated according to the Metric Report Definitions
- Metric Triggers - contains thresholds and actions that apply to specific
  metrics

The OpenBMC telemetry architecture is shown in the diagram below.

```ascii
   +--------------+               +----------------+     +-----------------+
   |hwmon|        |               |Dbus sensors|   |     |Telemetry|       |
   +-----/        |               +------------/   |     +---------/       |
   |              +--filesystem--->                |     |                 |
   |              |               |                |     |                 |
   +--------------+               +--------^-------+     +--------^--------+
                                           |                      |
                                           |                      |
<------------------------------------------v-----^--DBus----------v----------->
                                                 |
                                                 |
+-------+---------------------------------------------------------------------+
|bmcweb |                                        |                            |
+-------/                                        |                            |
|                                                |                            |
| +--------+-------------------------------------v--------------------------+ |
| |Redfish |                                                                | |
| +--------/                                            +---------+-------+ | |
| |                                                     |Existing |       | | |
| | +------------------------------------------------+  |Redfish  |       | | |
| | |Telemetry Service|                              |  |resources|       | | |
| | +----------------+/                              |  +---------/       | | |
| | |  +----------+  +-----------+  +-------------+  |  |   +---------+   | | |
| | |  |  Metric  |  |  Metric   |  |Metric report|  |  |   | Redfish |   | | |
| | |  | triggers |  |definitions|  |definitions  <---------+ sensors |   | | |
| | |  |          |  |           |  |             |  |  |   |         |   | | |
| | |  +----+-----+  +-----+-----+  +------+------+  |  |   +---------+   | | |
| | |       |              |               |         |  |                 | | |
| | |       |              |               |         |  |                 | | |
| | |       |              |               |         |  |                 | | |
| | |       |        +-----v-----+         |         |  |                 | | |
| | |       |        |   Metric  |         |         |  |                 | | |
| | |       +-------->   report  <---------+         |  |                 | | |
| | |                |           |                   |  |                 | | |
| | |                +-----------+                   |  |                 | | |
| | |                                                |  |                 | | |
| | +------------------------------------------------+  +-----------------+ | |
| |                                                                         | |
| +-------------------------------------------------------------------------+ |
|                                                                             |
+-----------------------------------------------------------------------------+
```

The Telemetry Service component is a part of Redfish and implements DMTF's
[Redfish Telemetry Model][1]. Metric report definitions use Redfish sensor URIs
for metric report creation. Those sensors are also used to obtain the mapping
from URI to D-Bus sensor. The Redfish Telemetry Service acts as the
presentation layer for telemetry, while the Telemetry service is responsible
for gathering metrics from D-Bus sensors and exposing them as D-Bus objects.
The Telemetry service supports different monitoring modes (periodic, on change
and on demand) along with the following aggregated operations:

- SINGLE - current reading value
- AVERAGE - average value over the defined time period
- MAX - maximum reading value during the defined time period
- MIN - minimum reading value during the defined time period
- SUM - sum of reading values over the defined time period

The time period for calculating an aggregated metric is taken from the Redfish
Metric Report Definition resource for each sensor's metric.

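The aggregated operations listed above can be illustrated with a minimal
Python sketch. It is not the Telemetry service's implementation: the
`WindowedMetric` class, its `window_s` parameter and the use of an unweighted
mean for AVERAGE are assumptions made only for this example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WindowedMetric:
    """Keeps timestamped readings and aggregates them over a time window."""

    window_s: float  # aggregation period in seconds (hypothetical parameter)
    samples: deque = field(default_factory=deque)  # (timestamp, value) pairs

    def add(self, timestamp: float, value: float) -> None:
        """Record a reading and drop samples that fell out of the window."""
        self.samples.append((timestamp, value))
        while self.samples and self.samples[0][0] <= timestamp - self.window_s:
            self.samples.popleft()

    def aggregate(self, operation: str) -> float:
        """Apply one of the aggregated operations to the retained samples."""
        values = [value for _, value in self.samples]
        if not values:
            raise ValueError("no readings collected yet")
        if operation == "SINGLE":
            return values[-1]  # most recent reading
        if operation == "AVERAGE":
            return sum(values) / len(values)  # unweighted mean (assumption)
        if operation == "MAX":
            return max(values)
        if operation == "MIN":
            return min(values)
        if operation == "SUM":
            return sum(values)
        raise ValueError(f"unknown operation: {operation}")
```

In this sketch each sensor metric would own one `WindowedMetric` whose window
length comes from the corresponding Metric Report Definition.
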
The Telemetry service supports creating and managing metric reports, each of
which may contain one or more metrics from sensors. Such a metric report is
mapped to a Metric Report in the Redfish Telemetry Service.

The diagram below shows the flows for creating and updating a metric report.

```ascii
+----+              +------+              +---------+                 +-------+
|User|              |bmcweb|              |Telemetry|                 | D-Bus |
+-+--+              +--+---+              +----+----+                 |Sensors|
  |                    |                       |                      +---+---+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Metric report definition flow|                |                          |   |
+-----------------------------+                |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
| |    POST request    |                       |                          |   |
| |    with metric     |                       |                          |   |
| |    report          |                       |                          |   |
| |    definition      |                       |                          |   |
| +-------------------->  Invoke AddReport     |  Register for D-Bus      |   |
| |                    |  method on D-Bus      |  sensors                 |   |
| |                    +----------------------->  PropertiesChanged       |   |
| |                    |                       |  signals                 |   |
| |                    |                       +-------------------------->   |
| |                    |                       |-------------------------->   |
| |                    |                       +-------------------------->   |
| |                    |                       |                          |   |
| |  HTTP response     |                       +-+Create Report           |   |
| |  code 201 with     |  Return created       | |D-Bus object            |   |
| |  Metric Report     |  Report D-Bus path    <-+                        |   |
| |  Definition's URI  <-----------------------+                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Periodic metric report update flow|           |                          |   |
+----------------------------------+           +-+Metric report           |   |
| |                    |                       | |timer triggers          |   |
| |                    |                       <-+report update           |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |   Sensor's Properties-   |   |
| +-------------------->                       |   Changed signal         |   |
| |                    +-+Map report's URI     <--------------------------+   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     | +----------------------+ |   |
| |                    | Invoke GetAll method  | |Note that sensor's    | |   |
| |                    | on report D-Bus       | |PropertiesChanged     | |   |
| |                    | object                | |signal is asynchronous| |   |
| |                    +-----------------------> |to metric report timer| |   |
| |                    |                       | |This timer is the only| |   |
| |  Return metric     | Return report data    | |thing that triggers   | |   |
| |  report in JSON    <-----------------------+ |metric report update  | |   |
| |  format            |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|On change metric report update flow|          |   Sensor's Properties-   |   |
+-----------------------------------+          |   Changed signal         |   |
| |                    |                       <--------------------------+   |
| |                    |                       |                          |   |
| |                    |                       +-+Sensor's signal         |   |
| |                    |                       | |triggers report         |   |
| |                    |                       <-+update                  |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        | +----------------------+ |   |
| |                    <-+                     | |Note that sensor's    | |   |
| |                    | Invoke GetAll method  | |PropertiesChanged     | |   |
| |                    | on report D-Bus       | |signal triggers the   | |   |
| |                    | object                | |report update. It is  | |   |
| |                    +-----------------------> |sufficient that the   | |   |
| |                    |                       | |signal from only one  | |   |
| |  Return metric     | Return report data    | |sensor triggers report| |   |
| |  report in JSON    <-----------------------+ |update.               | |   |
| |  format            |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-+--------------------+------------------------------------------------------+
|On demand metric report update flow|          |                          |   |
+-+--------------------+------------+          |                          |   |
| |                    |                       |                          |   |
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     |                          |   |
| |                    |                       |                          |   |
| |                    |  Invoke the Update    |                          |   |
| |                    |  method for report    |                          |   |
| |                    |  D-Bus object         |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       +-+Update method triggers  |   |
| |                    |                       | |report to be updated    |   |
| |                    |                       | |with the latest known   |   |
| |                    |                       | |sensor's readings.      |   |
| |                    |                       | |No additional sensor    |   |
| |                    |                       <-+readings are performed. |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |                    | Update method call    |                          |   |
| |                    | result                |                          |   |
| |                    <-----------------------+                          |   |
| |                    |                       |                          |   |
| |                    | Invoke GetAll method  |                          |   |
| |                    | on report D-Bus       |                          |   |
| |                    | object                |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       |                          |   |
| |  Return metric     | Return report data    |                          |   |
| |  report in JSON    <-----------------------+                          |   |
| |  format            |                       |                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
```

The Redfish implementation in bmcweb is stateless, so it cannot store metric
reports. All operations on metric reports shall be done in the Telemetry
service. Sending a metric report as an SSE or push-style event shall be done
via the [Redfish Event Service][6]. It is marked as optional because a metric
report does not have to be configured to push its data through events.

For an on-demand metric report update, the Telemetry service performs no
additional sensor readings because it already has the latest values, which are
updated on each PropertiesChanged signal from the D-Bus sensors.

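The three update flows can be summarized in a small Python model. This is a
hedged sketch of the behaviour described above, not the service's code: the
`Report` class, its method names and the mode strings are hypothetical, and
plain method calls stand in for D-Bus signals, timers and method invocations.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Toy model of a metric report that caches the latest sensor values."""

    mode: str  # "periodic", "on_change" or "on_demand" (illustrative names)
    latest: dict = field(default_factory=dict)    # sensor path -> last value
    readings: dict = field(default_factory=dict)  # snapshot exposed to readers
    update_count: int = 0

    def on_properties_changed(self, sensor_path: str, value: float) -> None:
        """Stand-in for handling a sensor's PropertiesChanged signal."""
        self.latest[sensor_path] = value  # always cache the newest value
        if self.mode == "on_change":
            self._refresh()  # a signal from a single sensor is sufficient

    def on_timer(self) -> None:
        """Stand-in for the metric report timer expiring."""
        if self.mode == "periodic":
            self._refresh()

    def update(self) -> None:
        """Stand-in for the Update method; no new sensor reads are made."""
        self._refresh()

    def _refresh(self) -> None:
        # Publish the cached values as the report's current content.
        self.readings = dict(self.latest)
        self.update_count += 1
```

Because every mode refreshes from the same cache, an on-demand update costs no
extra sensor access in this model, which mirrors the reasoning given above.
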
### Telemetry service on [D-Bus][4]

The Telemetry service exposes specific interfaces on D-Bus. One of them is used
for reading report management; the other is used for trigger management.

### Reading report management

The reading report management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.ReportManager
/xyz/openbmc_project/Telemetry/Reports
```

The `ReportManager` implements the D-Bus interface
[`xyz.openbmc_project.Telemetry.ReportManager`][8] for report management. The
interface is described in phosphor-dbus-interfaces. It provides the `AddReport`
method, which is used to create a metric report. The report may contain one or
more sensor readings. How the BMC stores the report is defined by one of this
method's parameters. The `ReportManager` object also implements a property that
stores the maximum number of reports supported simultaneously.

The `AddReport` method returns the path to the newly created report object. The
report object implements the [`xyz.openbmc_project.Object.Delete`][10] and
[`xyz.openbmc_project.Telemetry.Report`][9] interfaces. The [`Delete`][10]
interface adds support for removing the Report object, while the [`Report`][9]
interface provides methods and properties for Report management along with
properties containing telemetry readings. Each report object contains the
timestamp of its last update and an array of structures, each holding a reading
with its metadata and the timestamp of that metric's last update. Each report
also has a property that stores the update interval (for periodically updated
reports).

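As a rough illustration of the report management object, the sketch below
models it in memory with plain Python. Nothing here talks to D-Bus, and the
`add_report` parameters and the layout of report paths under the manager path
are assumptions for this example, not the signature defined in
phosphor-dbus-interfaces.

```python
class ReportManager:
    """In-memory stand-in for the report management object (no real D-Bus)."""

    ROOT = "/xyz/openbmc_project/Telemetry/Reports"

    def __init__(self, max_reports: int) -> None:
        self.max_reports = max_reports  # maximum number of simultaneous reports
        self.reports = {}               # object path -> report data

    def add_report(self, name: str, sensor_paths: list, interval_ms: int) -> str:
        """Create a report and return its object path (hypothetical signature)."""
        path = f"{self.ROOT}/{name}"  # path layout is an assumption
        if path in self.reports:
            raise ValueError(f"report already exists: {name}")
        if len(self.reports) >= self.max_reports:
            raise RuntimeError("maximum number of reports reached")
        self.reports[path] = {
            "interval_ms": interval_ms,  # update interval for periodic reports
            "timestamp": None,           # time of the last report update
            # One entry per metric: (sensor path, value, last update time).
            "readings": [(sensor, None, None) for sensor in sensor_paths],
        }
        return path

    def delete(self, path: str) -> None:
        """Counterpart of the Delete interface on a report object."""
        del self.reports[path]
```
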
### Trigger management

The trigger management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.TriggerManager
/xyz/openbmc_project/Telemetry/Triggers
```

The `TriggerManager` supports the `xyz.openbmc_project.Telemetry.TriggerManager`
interface, which provides the `AddTrigger` method. This method shall be used to
create a new trigger for a given metric. The method's parameters define the
type of metric for which the trigger is set (discrete or numeric). Depending on
this setting, the method accepts a different set of trigger parameters.

For the discrete metric type, the trigger parameters contain:

| Field            | Type                | Description                                                                                                                                                                                                     |
| ---------------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| TriggerCondition | enum                | Discrete trigger condition: <br> "Changed" - trigger occurs when value of metric has changed; <br> "Specified" - trigger occurs when value of metric becomes one of the values listed in the discrete triggers. |
| DiscreteTriggers | array of structures | Array of discrete trigger structures.                                                                                                                                                                           |

Each member of the DiscreteTriggers array contains:

| Field     | Type    | Description                                                                                                      |
| --------- | ------- | ---------------------------------------------------------------------------------------------------------------- |
| TriggerId | string  | Unique trigger Id                                                                                                |
| Severity  | enum    | Severity: <br> "OK" - normal<br> "Warning" - requires attention<br> "Critical" - requires immediate attention    |
| Value     | variant | Value of discrete metric, that constitutes a trigger event.                                                      |
| DwellTime | uint64  | Time in milliseconds that a trigger occurrence persists before the action defined for this trigger is performed. |

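The two discrete trigger conditions can be expressed as a short Python
function. This is only a sketch of the semantics in the tables above: the
function name and arguments are hypothetical, and `DwellTime` handling is left
out for brevity.

```python
def discrete_trigger_fires(condition, previous, current, specified_values=()):
    """Return True when a discrete metric update should raise a trigger.

    condition        -- "Changed" or "Specified" (the TriggerCondition field)
    previous/current -- metric value before and after the update
    specified_values -- Value fields of the DiscreteTriggers members
    """
    if previous == current:
        return False  # the value did not change, so neither condition occurs
    if condition == "Changed":
        return True
    if condition == "Specified":
        return current in specified_values
    raise ValueError(f"unknown trigger condition: {condition}")
```
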
343For numeric metric type, trigger parameters contain numeric thresholds. Numeric
344thresholds structure shall contain up to 4 thresholds: upper warning, upper
345critical, lower warning and lower critical. Thus it will contain up to 4
346structures shown below:
347
348| Field          | Type    | Description                                                                                                                                                                                                                                                                                                                                                                                                         |
349| -------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
350| ThresholdType  | enum    | Numeric trigger type: <br> "UpperCritical" - reading is above normal range and requires immediate attention<br>"UpperWarning" - reading is above normal range and may require attention<br>"LowerCritical" - reading is below normal range and requires immediate attention<br>"LowerWarning" - reading is below normal range and may require attention                                                             |
351| DwellTime      | uint64  | Time in milliseconds that a trigger occurrence persists before the action defined for this trigger is performed.                                                                                                                                                                                                                                                                                                    |
352| Activation     | enum    | Indicates direction of crossing the threshold value that trigger the threshold's action:<br> "Increasing" - trigger action when reading is changing from below to above the threshold's value<br> "Decreasing" - trigger action when reading is changing from above to below the threshold's value<br> "Either" - trigger action when reading is crossing the threshold's value in either direction described above |
353| ThresholdValue | variant | Value of reading that will trigger the threshold                                                                                                                                                                                                                                                                                                                                                                    |
354
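The `Activation` directions can be sketched as a comparison of two consecutive
readings against the threshold value. The function below is illustrative only:
its name and arguments are hypothetical, a reading equal to the threshold is
assumed to count as neither above nor below it, and `DwellTime` is ignored.

```python
def threshold_crossed(activation, threshold, previous, current):
    """Return True when a reading crosses `threshold` in the watched direction.

    activation       -- "Increasing", "Decreasing" or "Either"
    previous/current -- consecutive readings of the same metric
    """
    rising = previous <= threshold < current   # was not above, is now above
    falling = previous >= threshold > current  # was not below, is now below
    if activation == "Increasing":
        return rising
    if activation == "Decreasing":
        return falling
    if activation == "Either":
        return rising or falling
    raise ValueError(f"unknown activation: {activation}")
```

For example, with an upper warning threshold of 80.0, a change from 79.0 to
81.0 satisfies "Increasing" and "Either" but not "Decreasing".
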
The `AddTrigger` method also allows defining the action taken when a trigger is
activated. Three actions are possible upon trigger activation: logging an event
to the log service, sending an event via the event service, and triggering a
metric report update.

To assign a trigger to a specific metric, the metric parameter is defined. Its
structure contains the following data:

| Field      | Type        | Description                                                                                                                                                                                        |
| ---------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| SensorPath | object path | D-Bus path to sensor, for which trigger is defined.                                                                                                                                                |
| MetricId   | string      | Contains unique metric id, that can be mapped to Redfish MetricId.                                                                                                                                 |
| ReportPath | object path | D-Bus path to Telemetry's reading report whose update shall be triggered when trigger condition occurs. This is optional and shall be filled when trigger's action is set to update metric report. |

The `AddTrigger` method also allows setting the trigger's persistency (whether
the trigger shall be stored in the BMC's non-volatile memory).

The `AddTrigger` method returns:

```ascii
String for created trigger - e.g. '/xyz/openbmc_project/Telemetry/Triggers/{Domain}/{Name}'
```

A trigger created this way implements the `xyz.openbmc_project.Object.Delete`
and `xyz.openbmc_project.Telemetry.Trigger` interfaces. Each trigger object
contains read-only information about the metric type for which it was created
(discrete or numeric). This information determines which triggers are stored
within the trigger object.

If the trigger is defined for the discrete metric type, then it contains
trigger information that looks like this:

| Type                | Description                                                                                                                                                                                                   |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| enum                | Discrete trigger condition: <br> "Changed" - trigger occurs when value of metric has changed;<br>"Specified" - trigger occurs when value of metric becomes one of the values listed in the discrete triggers. |
| array of structures | Array of discrete trigger structures.                                                                                                                                                                         |

Discrete trigger structure:

| Type    | Description                                                                                                         |
| ------- | ------------------------------------------------------------------------------------------------------------------- |
| string  | Unique trigger Id                                                                                                   |
| enum    | Severity: <br> "OK" - normal<br>"Warning" - requires attention<br> "Critical" - requires immediate attention        |
| variant | Value of discrete metric, that constitutes a trigger event.                                                         |
| uint64  | Time in milliseconds that a trigger occurrence persists before the action defined in the `ActionType` is performed. |

If the trigger is defined for the numeric metric type, then it contains
information about numeric triggers in the form of an array of 4 structures as
presented below:

| Type    | Description                                                                                                                           |
| ------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| enum    | Numeric trigger type: <br> "UpperCritical"<br>"UpperWarning"<br>"LowerCritical"<br>"LowerWarning"                                     |
| uint64  | Time in milliseconds that a trigger occurrence persists before the action defined in the `ActionType` is performed.                   |
| enum    | Indicates direction of crossing the threshold value that trigger the threshold's action:<br> "Increasing"<br>"Decreasing"<br>"Either" |
| variant | Value of reading that will trigger the threshold                                                                                      |

The trigger object also contains information about the reading for which the
trigger was defined. It takes the form of a structure consisting of three
fields.

| Field type  | Description                                                                                                                                                                              |
| ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| object path | D-Bus path of sensor. This is a path to the sensor, for which reading trigger is defined.                                                                                                |
| string      | Unique metric Id                                                                                                                                                                         |
| object path | D-Bus path of existing reading report. This is required when trigger's action is to update metric report. This path shall point to existing reading report within the Telemetry service. |

### Trigger operations

Triggers support three types of operation: Log, Event and Update. Each is
handled differently.

1. For action Log, the event shall be logged to the system journal. In this case
   the Telemetry service writes data to the system journal using libjournal. The
   Redfish log service shall then retrieve the data by reading the system
   journal. The flow is shown in the diagram below.

   ```ascii
   +---------------------------+
   |bmcweb|                    |         +----------------------+
   +------/    +-----------+-+ |         |Telemetry|            |
   |           |Redfish    | | |         +---------/            |
   |           |log service| | |         |                      |
   |           +-----------/ | |         |                      |
   |           |             | |         |                      |
   |           |             | |         |                      |
   |           +------^------+ |         +-----------+----------+
   +---------------------------+                     |
                      |                              |
                      +----collect----+            event
                        journal entry |      (write to journal)
                                      |              |
          +------------------------------------+     |
          |systemd|                   |        |     |
          +-------/ +----------+  +---+------+ |     |
          |         |journal|  |  |libjournal| |     |
          |         +-------/  <-->          <-------+
          |         |          |  +----------+ |
          |         |          |               |
          |         |          |               |
          |         +----------+               |
          |                                    |
          +------------------------------------+
   ```

2. For action Event, the Telemetry service shall send an event using the
   [Redfish Event Service][6], either as a push-style event or over SSE.

3. For action Update, the Telemetry service shall trigger an update of the
   reading report identified by the D-Bus path contained in the trigger object
   properties. The update shall cause the reading report's D-Bus object to emit
   a property change signal. This causes the Redfish Metric Report to be
   streamed out if it was configured to do so.

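The three actions can be pictured as a simple dispatch on the action type. The
Python sketch below is illustrative only: the handler functions are hypothetical
stand-ins that record what would happen instead of writing to the journal,
sending a Redfish event or touching D-Bus.

```python
from typing import Callable, Dict, List


# Each hypothetical handler appends a description of the side effect it
# stands for; a real service would perform the side effect instead.
def log_to_journal(trigger_id: str, out: List[str]) -> None:
    out.append(f"journal: trigger {trigger_id} fired")


def send_redfish_event(trigger_id: str, out: List[str]) -> None:
    out.append(f"event: trigger {trigger_id} fired")


def update_reading_report(trigger_id: str, out: List[str]) -> None:
    out.append(f"update: reading report refreshed for trigger {trigger_id}")


ACTION_HANDLERS: Dict[str, Callable[[str, List[str]], None]] = {
    "Log": log_to_journal,
    "Event": send_redfish_event,
    "Update": update_reading_report,
}


def perform_action(action: str, trigger_id: str, out: List[str]) -> None:
    """Run the handler registered for the given trigger action."""
    try:
        handler = ACTION_HANDLERS[action]
    except KeyError:
        raise ValueError(f"unsupported trigger action: {action}") from None
    handler(trigger_id, out)
```

Keeping the handlers in a table means a new action type only needs a new entry,
and an unknown action fails loudly instead of being silently ignored.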
### Redfish Telemetry Service API

Redfish Telemetry Service shall support the 2019.1 Redfish schemas for telemetry
resources. Metric report definitions determine which metrics are to be included
in a metric report. A metric definition is assigned to a particular metric type
and describes how the metric should be interpreted. The following resource
schemas shall be supported:

- TelemetryService 1.1.2
- MetricDefinition 1.0.3
- MetricReportDefinition 1.3.0
- MetricReport 1.2.0
- Triggers 1.1.1

The following diagram shows the relations between these resources.

```ascii
 +----------------------------------------------------------------------------+
 |                             Service root                                   |
 +----------------------------------+-------------------------------+---------+
                                    |                               |
                                    |                               |
                                    |                               |
 +----------------------------------v-----------------+  +----------v---------+
 |                                                    |  |Chassis             |
 |                Telemetry Service                   |  |                    |
 |                                                    |  |                    |
 |                                                    |  |  +---------------+ |
 +---------+--------------+------------------+--------+  |  |               | |
           |              |                  |           |  |   Chassis 1   | |
           |              |                  |           |  |               | |
           |              |                  |           |  +---------+-----+ |
           |              |                  |           |            |       |
+----------v--+  +--------v----+  +----------v-----+     +--------------------+
|Triggers     |  |Metric       |  |Metric report   |                  |
|             |  |definition   |  |                |                  |
|             |  | +---------+ |  |                | Reads            |
| +---------+ |  | |Reading  | |  | +-----------+  | ReadingVolts  +--v------+
| |         | |  | |Volts    <------+           +------------------>         |
| |Trigger 1| |  | +---------+ |  | |  Metric   |  |               |         |
| |         | |  |             |  | | report 1  |  | Reads         |  Power  |
| |         | |  | +---------+ |  | |           |  | PowerConsumed |         |
| |         | |  | |         | |  | |           |  | Watts         |         |
| +--+---+--+ |  | |Power    <------+           +------------------>         |
|    |   |    |  | |Consumed | |  | +-----^-----+  |               +----^----+
|    |   |    |  | |Watts    | |  |       |        |                    |
|    |   |    |  | +---------+ |  |       |        |                    |
|    |   |    |  |             |  |       |        |                    |
+-------------+  +-------------+  +----------------+                    |
     |   |                                |                             |
     |   | Triggers report update         |                             |
     |   | (when applicable)              |                             |
     |   +--------------------------------+                             |
     |                                                                  |
     |   Monitors PowerConsumedWatts to check                           |
     |   whether trigger value is exceeded                              |
     +------------------------------------------------------------------+
```

The diagram shows the relations between Redfish resources. A metric report is
defined to be generated periodically, on demand or on change. Each metric in the
Metric Report contains the URI of its metric definition and of the Redfish
sensor whose reading value is presented. Nevertheless, under this presentation
layer, the Telemetry service gathers D-Bus sensor readings and exposes them as
reading reports over D-Bus for the Redfish Telemetry Service. Each D-Bus sensor
is mapped to a Redfish sensor.

Examples of Redfish resources for the Telemetry Service are shown below.

The Telemetry Service Redfish resource example:

```json
{
    "@odata.type": "#TelemetryService.v1_1_2.TelemetryService",
    "Id": "TelemetryService",
    "Name": "Telemetry Service",
    "Status": {
        "State": "Enabled",
        "Health": "OK"
    },
    "MinCollectionInterval": "PT10S",
    "SupportedCollectionFunctions": [],
    "MaxReports": <max_no_of_reports>,
    "MetricDefinitions": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions"
    },
    "MetricReportDefinitions": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions"
    },
    "MetricReports": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReports"
    },
    "Triggers": {
        "@odata.id": "/redfish/v1/TelemetryService/Triggers"
    },
    "LogService": {
        "@odata.id": "/redfish/v1/Managers/<manager_name>/LogServices/Journal"
    },
    "@odata.context": "/redfish/v1/$metadata#TelemetryService",
    "@odata.id": "/redfish/v1/TelemetryService"
}
```

Sample metric report definition:

```json
{
  "@odata.type": "#MetricReportDefinition.v1_3_0.MetricReportDefinition",
  "Id": "SampleMetric",
  "Name": "Sample Metric Report Definition",
  "MetricReportDefinitionType": "Periodic",
  "Schedule": {
    "RecurrenceInterval": "PT10S"
  },
  "ReportActions": ["LogToMetricReportsCollection"],
  "ReportUpdates": "Overwrite",
  "MetricReport": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
  },
  "Status": {
    "State": "Enabled"
  },
  "Metrics": [
    {
      "MetricId": "Test",
      "MetricProperties": [
        "/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts"
      ]
    }
  ],
  "@odata.context": "/redfish/v1/$metadata#MetricReportDefinition.MetricReportDefinition",
  "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
}
```

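A client would create such a definition by POSTing it to the Metric Report
Definitions collection. The Python sketch below only builds the request object
and does not send it. The host name `bmc.example` is a placeholder, the payload
is loosely based on the sample above, and authentication is omitted.

```python
import json
import urllib.request

BMC_HOST = "https://bmc.example"  # placeholder host, not a real BMC
COLLECTION = "/redfish/v1/TelemetryService/MetricReportDefinitions"


def build_create_request(definition: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that creates a metric report definition."""
    body = json.dumps(definition).encode("utf-8")
    return urllib.request.Request(
        BMC_HOST + COLLECTION,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


definition = {
    "Id": "SampleMetric",
    "Name": "Sample Metric Report Definition",
    "MetricReportDefinitionType": "Periodic",
    "Schedule": {"RecurrenceInterval": "PT10S"},  # ISO 8601 duration: 10 seconds
    "ReportActions": ["LogToMetricReportsCollection"],
    "ReportUpdates": "Overwrite",
    "Metrics": [
        {
            "MetricId": "Test",
            "MetricProperties": [
                "/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts"
            ],
        }
    ],
}

request = build_create_request(definition)
```
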
Sample metric report:

```json
{
  "@odata.type": "#MetricReport.v1_2_0.MetricReport",
  "Id": "SampleMetric",
  "Name": "Sample Metric Report",
  "ReportSequence": "0",
  "MetricReportDefinition": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
  },
  "MetricValues": [
    {
      "MetricDefinition": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions/SampleMetricDefinition"
      },
      "MetricId": "Test",
      "MetricValue": "100",
      "Timestamp": "2016-11-08T12:25:00-05:00",
      "MetricProperty": "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
    }
  ],
  "@odata.context": "/redfish/v1/$metadata#MetricReport.MetricReport",
  "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
}
```

Sample trigger that will trigger a metric report update:

```json
{
  "@odata.type": "#Triggers.v1_1_1.Triggers",
  "Id": "SampleTrigger",
  "Name": "Sample Trigger",
  "MetricType": "Numeric",
  "Links": {
    "MetricReportDefinitions": [
      {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
      }
    ]
  },
  "Status": {
    "State": "Enabled"
  },
  "TriggerActions": ["RedfishMetricReport"],
  "NumericThresholds": {
    "UpperCritical": {
      "Reading": 50,
      "Activation": "Increasing",
      "DwellTime": "PT0.001S"
    },
    "UpperWarning": {
      "Reading": 48.1,
      "Activation": "Increasing",
      "DwellTime": "PT0.004S"
    }
  },
  "MetricProperties": [
    "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
  ],
  "@odata.context": "/redfish/v1/$metadata#Triggers.Triggers",
  "@odata.id": "/redfish/v1/TelemetryService/Triggers/SampleTrigger"
}
```

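The D-Bus trigger structures described earlier carry the dwell time as a
`uint64` number of milliseconds, while the Redfish resource expresses
`DwellTime` as an ISO 8601 duration such as `PT0.001S`. A front end therefore
needs a conversion along the lines of the Python sketch below. The function
name is hypothetical, the regular expression covers only the
day/hour/minute/second form, and the code is not taken from bmcweb.

```python
import re
from decimal import Decimal

# Accepts durations such as "PT10S", "PT0.001S", "PT1M30S" or "P1DT2H".
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def duration_to_ms(text: str) -> int:
    """Convert a duration string to whole milliseconds (fractions truncated)."""
    match = _DURATION.match(text)
    if match is None or text in ("P", "PT"):
        raise ValueError(f"not a supported duration: {text!r}")
    parts = match.groupdict(default="0")
    seconds = (
        Decimal(parts["days"]) * 86400
        + Decimal(parts["hours"]) * 3600
        + Decimal(parts["minutes"]) * 60
        + Decimal(parts["seconds"])
    )
    return int(seconds * 1000)


upper_critical_ms = duration_to_ms("PT0.001S")  # 1
upper_warning_ms = duration_to_ms("PT0.004S")   # 4
```

`Decimal` is used rather than `float` so that fractional seconds such as `0.001`
convert to an exact millisecond count.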
### Performance tests

Performance tests were conducted on an AST2500 system with 64 MB of flash and
512 MB of RAM. Flash consumption by the Telemetry service is 197.5 kB. The
runtime statistics are shown in the table below. Each reading report is mapped
to a single Metric Report. The runtime data is collected for the Telemetry
component only. All reports were created with the
`xyz.openbmc_project.Telemetry.Metric.OnChange` property to maximize the
workload. The configuration with 50 reports and 50 sensors yields about 200 new
readings per second, generating 200 reading reports per second. The table shows
CPU usage and memory usage. VSZ is the amount of memory mapped into the address
space of the process. It includes pages backed by the process' executable file
and shared libraries, its heap and stack, as well as anything else it has
mapped.

| Telemetry service state                       | VSZ     | %VSZ | %CPU   |
| --------------------------------------------- | ------- | ---- | ------ |
| Idle (0 reports, 0 sensors)                   | 5188 kB | 1%   | 0%     |
| 1 report, 1 sensor                            | 5188 kB | 1%   | 1%     |
| 2 reports, 1 sensor                           | 5188 kB | 1%   | 1%     |
| 2 reports, 2 sensors (1 sensor per report)    | 5188 kB | 1%   | 1%     |
| 1 report, 10 sensors                          | 5188 kB | 1%   | 1%     |
| 10 reports, 10 sensors (same for each report) | 5320 kB | 1%   | 1-2%   |
| 2 reports, 20 sensors (10 per report)         | 5188 kB | 1%   | 1%     |
| 30 reports, 30 sensors (10 per report)        | 5444 kB | 1%   | 5-9%   |
| 50 reports, 50 sensors (10 per report)        | 5572 kB | 1%   | 11-14% |

The last two configurations use 10 sensors per reading report, which gives 3 or
5 distinct configurations. Each such configuration is used to create 10 reading
reports to obtain the desired number of 30 or 50 reading reports.

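The figures in the table can be cross-checked with a little arithmetic. The
Python sketch below recomputes the %VSZ column, assuming the VSZ values are in
kilobytes and that the 512 MB of RAM equals 512 × 1024 kB, and counts the
reading reports produced by the last two configurations. The helper names are
made up for this illustration.

```python
TOTAL_RAM_KB = 512 * 1024  # 512 MB of RAM expressed in kB (assumption)


def vsz_percent(vsz_kb: int) -> float:
    """Share of total RAM that a VSZ value represents, in percent."""
    return 100.0 * vsz_kb / TOTAL_RAM_KB


def reading_reports(sensors: int, sensors_per_report: int,
                    reports_per_configuration: int) -> int:
    """Number of reading reports for a given sensor grouping."""
    configurations = sensors // sensors_per_report
    return configurations * reports_per_configuration


idle_share = round(vsz_percent(5188))      # smallest VSZ in the table
busiest_share = round(vsz_percent(5572))   # largest VSZ in the table
reports_30 = reading_reports(30, 10, 10)
reports_50 = reading_reports(50, 10, 10)
```
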
In this architecture a reading report is created every time a Redfish Metric
Report Definition is posted (creating a new Metric Report).

## Alternatives Considered

The [framework based on collectd/librrd][5] was considered as an alternative
design. Although it seems to be a versatile and scalable solution, it has some
drawbacks from our point of view:

- Collectd's footprint in the minimal working configuration is around 2.6 MB,
  while available space for OpenBMC is limited to 64 MB.
- In that design, librrd is used to store metrics on the BMC's non-volatile
  storage, which may be an issue when lots of metrics are captured and stored
  in OpenBMC's limited storage space. Flash wear-out may also occur when
  metrics are captured frequently (for example once per second).
- The Telemetry service is directly compatible with the Redfish Telemetry
  Service API, which means that Telemetry's reading reports can be directly
  mapped to Redfish Metric Reports.
- The Telemetry service unifies the way the BMC's telemetry is exposed over
  Redfish and may be used with multiple front-ends, so support for telemetry
  over IPMI or any other API can be added without difficulty.

Since this design assumes flexibility and modularity, there are no obstacles to
using collectd in cooperation with Telemetry. One possible configuration is
shown in the diagram below.

```ascii
   +-----------------+      +-----------------+
   |  D-Bus sensors  |      |   Telemetry     |
   +--------^--------+      +--------^--------+
            |                        |
            |                        |
            |                        |
<--------^--v-----------D-Bus--------v-^---------->
         |                             |
         |                             |
         |                             |
 +-------v------------+     +----------v--------+
 |  collectd metrics  |     |                   |
 |  exposed as D-Bus  |     |     bmcweb        |
 |      sensors       |     |  (with Redfish    |
 +---------^----------+     |    Telemetry      |
           |                |     Service)      |
           |                |                   |
    +------+-------+        +-------------------+
    |              |
    |   collectd   |
    |              |
    +--------------+
```

Here collectd is used as the source of some set of metrics. It exposes them as
D-Bus sensors, which can easily be consumed by both bmcweb and the Telemetry
service without any changes in their D-Bus interfaces. In such a configuration
the Telemetry service provides metric report and trigger management.

Another possible configuration is to use collectd without the Telemetry
service, but in that case collectd does not provide metric report and trigger
support compatible with Redfish. The Redfish Telemetry Service then either
won't be supported, or metric report and trigger support has to be provided by
collectd.

## Impacts

This design impacts the architecture of the bmcweb component, since it adds the
Redfish Telemetry Service implementation as a component of the existing Redfish
API implementation.

## Testing

This is a very high-level description of the proposed set of tests. Testing
shall be done on three basic levels:

- Unit tests
- Functional tests
- Performance tests

### Unit tests

The Telemetry code shall be covered by unit tests. The preferred framework is
[GTest/GMock][7]. The unit tests shall be run before a code change is committed
to make sure that nothing is broken in existing functionality. Also, when new
code is introduced, a new set of unit tests shall be committed with it,
according to the test-driven development principle. Unit tests shall also be
carefully reviewed.

### Functional tests

Functional tests will be divided into two steps.

The first step tests the Telemetry metric report management. The test scenario
shall consist of creating a metric report by POSTing a proper metric report
definition, reading the metric report (using GET on the proper URI) and
deleting the metric report. The required configuration for such a test is D-Bus
sensors (at least some of them) and bmcweb with the Redfish Telemetry Service
implemented. The tests shall be performed on real hardware. For ease of metric
testing, dummy D-Bus sensors may be used to provide specifically prepared
metrics. This configuration shall also enable testing of aggregated operations
(MIN, MAX, SUM, AVG).

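For the aggregated operations, a dummy sensor that feeds known values makes the
expected result easy to compute on the test side. A minimal Python sketch of
such a reference calculation follows; the helper name and the readings are made
up for the example.

```python
from statistics import fmean
from typing import Callable, Dict, Sequence

# Reference implementations of the aggregated operations named above.
AGGREGATES: Dict[str, Callable[[Sequence[float]], float]] = {
    "MIN": min,
    "MAX": max,
    "SUM": sum,
    "AVG": fmean,
}


def expected_aggregate(operation: str, readings: Sequence[float]) -> float:
    """Value a test would expect for the given operation and readings."""
    if not readings:
        raise ValueError("at least one reading is required")
    return AGGREGATES[operation](readings)


# Readings a dummy sensor might be programmed to produce.
dummy_readings = [10.0, 20.0, 30.0, 40.0]
expected = {op: expected_aggregate(op, dummy_readings) for op in AGGREGATES}
```
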
The second step tests trigger and event generation. This also requires the
Event Service to be implemented, along with the Log Service. Tests shall cover
all scenarios: sending a metric report as an event, triggering a metric report
update and logging events.

### Performance tests

Performance tests shall be done using a full OpenBMC configuration with the
complete set of required features. The tests shall create a large number of
metric reports (up to the maximum number) along with all possible triggers.
Measurements shall cover the periodic metric report jitter, delays in event
logging or sending, the BMC's CPU utilization and the performance impact on
other services.

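Periodic report jitter can be estimated from the timestamps of consecutive
reports. The Python sketch below takes jitter to mean the deviation of each
observed interval from the configured one, which is only one of several possible
definitions. The function name and the sample timestamps are invented for the
example.

```python
from typing import List, Sequence


def interval_jitter_ms(timestamps_ms: Sequence[int], period_ms: int) -> List[int]:
    """Deviation of each observed report interval from the configured period."""
    return [
        (later - earlier) - period_ms
        for earlier, later in zip(timestamps_ms, timestamps_ms[1:])
    ]


# Invented arrival times of four reports with a 10 s recurrence interval.
timestamps = [0, 10_003, 19_998, 30_010]
jitter = interval_jitter_ms(timestamps, 10_000)
worst_case_ms = max(abs(value) for value in jitter)
```
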
[1]:
  https://www.dmtf.org/sites/default/files/standards/documents/DSP2051_0.1.0a.zip
[2]:
  https://github.com/openbmc/docs/blob/master/architecture/sensor-architecture.md
[3]: https://www.kernel.org/doc/Documentation/hwmon/
[4]: https://www.freedesktop.org/wiki/Software/dbus/
[5]: https://gerrit.openbmc.org/c/openbmc/docs/+/22257
[6]: https://gerrit.openbmc.org/c/openbmc/docs/+/24749
[7]: https://github.com/google/googletest
[8]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Telemetry/ReportManager.interface.yaml
[9]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Telemetry/Report.interface.yaml
[10]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Object/Delete.interface.yaml