# OpenBMC platform telemetry

Author:
  Piotr Matuszczak <piotr.matuszczak@intel.com>

Primary assignee:
  Piotr Matuszczak

Other contributors:
  Pawel Rapkiewicz <pawel.rapkiewicz@intel.com> <pawelr>,
  Kamil Kowalski <kamil.kowalski@intel.com>

Created:
  2019-08-07

## Problem Description
The BMC on a server platform gathers a large amount of telemetry data, which
has to be exposed in a clean, human-readable and standardized format. This
document focuses on telemetry over Redfish, since it is the standard API for
platform manageability.

## Background and References
* OpenBMC platform telemetry shall leverage DMTF's [Redfish Telemetry Model][1]
for exposing platform telemetry over the network.
* OpenBMC platform telemetry shall leverage the
[OpenBMC sensors architecture implementation][2].
* OpenBMC platform telemetry shall implement a service, called Telemetry, to
manage metric reports and triggers. This service is described later in this
document.
* Although [hwmon][3] is used to gather readings from physical sensors, this
architecture does not depend on it, because the Telemetry service component
relies on the [OpenBMC D-Bus sensors][2].


## Requirements
* [OpenBMC D-Bus sensors][2] support. This is also a design limitation, since
the Telemetry service requires telemetry sources to be implemented as D-Bus
sensors.


## Proposed Design
The Redfish Telemetry Model shall implement the Telemetry Service with the
following collection resources:
* Metric Definitions - contains the metadata for metrics (unit, accuracy, etc.)
* Metric Report Definitions - defines how a metric report shall be created
(which metrics it shall contain, how often it shall be generated, etc.)
* Metric Reports - contains the actual metric reports with telemetry data
generated according to the Metric Report Definitions
* Metric Triggers - contains thresholds and actions that apply to specific
metrics

The OpenBMC telemetry architecture is shown in the diagram below.

```ascii
   +--------------+               +----------------+     +-----------------+
   |hwmon|        |               |Dbus sensors|   |     |Telemetry|       |
   +-----/        |               +------------/   |     +---------/       |
   |              +--filesystem--->                |     |                 |
   |              |               |                |     |                 |
   +--------------+               +--------^-------+     +--------^--------+
                                           |                      |
                                           |                      |
<------------------------------------------v-----^--DBus----------v----------->
                                                 |
                                                 |
+-------+---------------------------------------------------------------------+
|bmcweb |                                        |                            |
+-------/                                        |                            |
|                                                |                            |
| +--------+-------------------------------------v--------------------------+ |
| |Redfish |                                                                | |
| +--------/                                            +---------+-------+ | |
| |                                                     |Existing |       | | |
| | +------------------------------------------------+  |Redfish  |       | | |
| | |Telemetry Service|                              |  |resources|       | | |
| | +----------------+/                              |  +---------/       | | |
| | |  +----------+  +-----------+  +-------------+  |  |   +---------+   | | |
| | |  |  Metric  |  |  Metric   |  |Metric report|  |  |   | Redfish |   | | |
| | |  | triggers |  |definitions|  |definitions  <---------+ sensors |   | | |
| | |  |          |  |           |  |             |  |  |   |         |   | | |
| | |  +----+-----+  +-----+-----+  +------+------+  |  |   +---------+   | | |
| | |       |              |               |         |  |                 | | |
| | |       |              |               |         |  |                 | | |
| | |       |              |               |         |  |                 | | |
| | |       |        +-----v-----+         |         |  |                 | | |
| | |       |        |   Metric  |         |         |  |                 | | |
| | |       +-------->   report  <---------+         |  |                 | | |
| | |                |           |                   |  |                 | | |
| | |                +-----------+                   |  |                 | | |
| | |                                                |  |                 | | |
| | +------------------------------------------------+  +-----------------+ | |
| |                                                                         | |
| +-------------------------------------------------------------------------+ |
|                                                                             |
+-----------------------------------------------------------------------------+
```

The Telemetry Service component is a part of Redfish and implements DMTF's
[Redfish Telemetry Model][1]. Metric report definitions use Redfish sensor
URIs for metric report creation. Those sensors are also used to obtain the
URI->D-Bus sensor mapping. The Redfish Telemetry Service acts as the
presentation layer for telemetry, while the Telemetry service is responsible
for gathering metrics from D-Bus sensors and exposing them as D-Bus objects.
The Telemetry service supports different monitoring modes (periodic, on change
and on demand) along with the following aggregation operations:
* SINGLE - current reading value
* AVERAGE - average value over defined time period
* MAX - max reading value during defined time period
* MIN - min reading value during defined time period
* SUM - sum of reading values over defined time period

The time period for calculating an aggregated metric is taken from the Redfish
Metric Report Definition resource for each sensor's metric.
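
The aggregation operations listed above can be sketched in a few lines of
Python. This is a minimal illustration rather than the Telemetry service's
implementation: the `aggregate` function, the `(timestamp, value)` sample
layout and the choice of the newest sample for SINGLE are assumptions made for
this example.

```python
from typing import List, Tuple

# Hypothetical sample layout: (timestamp in seconds, reading value).
Sample = Tuple[float, float]


def aggregate(op: str, samples: List[Sample], now: float, period: float) -> float:
    """Reduce sensor samples to one metric value for the given operation."""
    if not samples:
        raise ValueError("no samples available")
    if op == "SINGLE":
        # Current reading: the sample with the newest timestamp.
        return max(samples, key=lambda s: s[0])[1]

    # All other operations work on the samples inside [now - period, now].
    values = [v for t, v in samples if now - period <= t <= now]
    if not values:
        raise ValueError("no samples inside the time period")
    if op == "AVERAGE":
        return sum(values) / len(values)
    if op == "MAX":
        return max(values)
    if op == "MIN":
        return min(values)
    if op == "SUM":
        return sum(values)
    raise ValueError(f"unknown operation: {op}")
```

For example, with samples taken at 0 s, 5 s and 10 s and a 6 s period ending
at 10 s, only the last two samples contribute to AVERAGE, MAX, MIN and SUM.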

The Telemetry service supports creating and managing metric reports, each of
which may contain a single metric or multiple metrics from sensors. Such a
metric report is mapped to a Metric Report in the Redfish Telemetry Service.

The diagram below shows the flows for creating and updating a metric report.

```ascii
+----+              +------+              +---------+                 +-------+
|User|              |bmcweb|              |Telemetry|                 | D-Bus |
+-+--+              +--+---+              +----+----+                 |Sensors|
  |                    |                       |                      +---+---+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Metric report definition flow|                |                          |   |
+-----------------------------+                |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
| |    POST request    |                       |                          |   |
| |    with metric     |                       |                          |   |
| |    report          |                       |                          |   |
| |    definition      |                       |                          |   |
| +-------------------->  Invoke AddReport     |  Register for D-Bus      |   |
| |                    |  method on D-Bus      |  sensors                 |   |
| |                    +----------------------->  PropertiesChanged       |   |
| |                    |                       |  signals                 |   |
| |                    |                       +-------------------------->   |
| |                    |                       |-------------------------->   |
| |                    |                       +-------------------------->   |
| |                    |                       |                          |   |
| |  HTTP response     |                       +-+Create Report           |   |
| |  code 201 with     |  Return created       | |D-Bus object            |   |
| |  Metric Report     |  Report D-Bus path    <-+                        |   |
| |  Definition's URI  <-----------------------+                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Periodic metric report update flow|           |                          |   |
+----------------------------------+           +-+Metric report           |   |
| |                    |                       | |timer triggers          |   |
| |                    |                       <-+report update           |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |   Sensor's Properties-   |   |
| +-------------------->                       |   Changed signal         |   |
| |                    +-+Map report's URI     <--------------------------+   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     | +----------------------+ |   |
| |                    | Invoke GetAll method  | |Note that sensor's    | |   |
| |                    | on report D-Bus       | |PropertiesChanged     | |   |
| |                    | object                | |signal is asynchronous| |   |
| |                    +-----------------------> |to metric report timer| |   |
| |                    |                       | |This timer is the only| |   |
| |  Return metric     | Return report data    | |thing that triggers   | |   |
| |  report in JSON    <-----------------------+ |metric report update  | |   |
| |  format            |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|On change metric report update flow|          |   Sensor's Properties-   |   |
+-----------------------------------+          |   Changed signal         |   |
| |                    |                       <--------------------------+   |
| |                    |                       |                          |   |
| |                    |                       +-+Sensor's signal         |   |
| |                    |                       | |triggers report         |   |
| |                    |                       <-+update                  |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        | +----------------------+ |   |
| |                    <-+                     | |Note that sensor's    | |   |
| |                    | Invoke GetAll method  | |PropertiesChanged     | |   |
| |                    | on report D-Bus       | |signal triggers the   | |   |
| |                    | object                | |report update. It is  | |   |
| |                    +-----------------------> |sufficient that the   | |   |
| |                    |                       | |signal from only one  | |   |
| |  Return metric     | Return report data    | |sensor triggers report| |   |
| |  report in JSON    <-----------------------+ |update.               | |   |
| |  format            |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-+--------------------+------------------------------------------------------+
|On demand metric report update flow|          |                          |   |
+-+--------------------+------------+          |                          |   |
| |                    |                       |                          |   |
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     |                          |   |
| |                    |                       |                          |   |
| |                    |  Invoke the Update    |                          |   |
| |                    |  method for report    |                          |   |
| |                    |  D-Bus object         |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       +-+Update method triggers  |   |
| |                    |                       | |report to be updated    |   |
| |                    |                       | |with the latest known   |   |
| |                    |                       | |sensor's readings.      |   |
| |                    |                       | |No additional sensor    |   |
| |                    |                       <-+readings are performed. |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |                    | Update method call    |                          |   |
| |                    | result                |                          |   |
| |                    <-----------------------+                          |   |
| |                    |                       |                          |   |
| |                    | Invoke GetAll method  |                          |   |
| |                    | on report D-Bus       |                          |   |
| |                    | object                |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       |                          |   |
| |  Return metric     | Return report data    |                          |   |
| |  report in JSON    <-----------------------+                          |   |
| |  format            |                       |                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
```

The Redfish implementation in bmcweb is stateless, so it cannot store metric
reports. All operations on metric reports shall be done in the Telemetry
service. Sending a metric report as an SSE or push-style event shall be done
via the [Redfish Event Service][6]. This step is marked as optional because a
metric report does not have to be configured to push its data through an
event.

In the case of an on demand metric report update, the Telemetry service
performs no additional sensor readings because it already has the latest
values, which are updated on each PropertiesChanged signal from the D-Bus
sensors.
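
The on demand behaviour described above can be pictured as a cache that is
refreshed by sensor signals and only copied into the report when an update is
requested. The sketch below is a plain-Python model of that idea; the
`ReportCache` class, its method names and the sensor path are hypothetical and
no real D-Bus calls are made.

```python
import time
from typing import Dict, Iterable, Optional


class ReportCache:
    """Keeps the latest known sensor values and snapshots them on update."""

    def __init__(self, sensor_paths: Iterable[str]):
        # Latest value per sensor, filled in as signals arrive.
        self._latest: Dict[str, Optional[float]] = {p: None for p in sensor_paths}
        self.readings: Dict[str, Optional[float]] = {}  # content of the report
        self.timestamp: Optional[float] = None          # time of last update

    def on_properties_changed(self, sensor_path: str, value: float) -> None:
        """Stand-in for a handler of a sensor's PropertiesChanged signal."""
        if sensor_path in self._latest:
            self._latest[sensor_path] = value

    def update(self, now: Optional[float] = None) -> Dict[str, Optional[float]]:
        """Refresh the report from cached values; no sensor is read here."""
        self.readings = dict(self._latest)
        self.timestamp = time.time() if now is None else now
        return self.readings
```

A signal alone does not change the report in this model; the report content
changes only when `update` is called, which mirrors the on demand flow.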

**Telemetry service on [D-Bus][4]**

The Telemetry service exposes specific interfaces on D-Bus. One of them will be
used for reading report management. The second one will be used for trigger
management.

**Reading report management**

The reading report management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.ReportManager
/xyz/openbmc_project/Telemetry/Reports
```
The `ReportManager` implements the D-Bus interface
[`xyz.openbmc_project.Telemetry.ReportManager`][8] for report management. The
interface is described in phosphor-dbus-interfaces. This interface
implements the `AddReport` method, which is used to create a metric report. The
report may contain a single sensor reading or multiple sensor readings. How the
report is stored by the BMC is defined by one of this method's parameters.
The `ReportManager` object implements a property that stores the maximum number
of reports supported simultaneously.

The `AddReport` method returns the path to the newly created report object.
The report object implements the [`xyz.openbmc_project.Object.Delete`][10]
and [`xyz.openbmc_project.Telemetry.Report`][9] interfaces. The [`Delete`][10]
interface adds support for removing the Report object, while the
[`Report`][9] interface implements methods and properties for Report
management along with properties containing telemetry readings. Each report
object contains the timestamp of its last update. The report object also
contains an array of structures, each containing a reading with its metadata
and the timestamp of the last update of that metric. Each report also has a
property that stores the update interval (for periodically updated reports).

**Trigger management**

The trigger management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.TriggerManager
/xyz/openbmc_project/Telemetry/Triggers
```
The `TriggerManager` supports the
`xyz.openbmc_project.Telemetry.TriggerManager` interface, which implements
the `AddTrigger` method. This method shall be used to create a new trigger for
a given metric. The method's parameters define the type of metric for which
the trigger is set (discrete or numeric). Depending on this setting, the
method accepts a different set of trigger parameters.

For the discrete metric type, trigger parameters contain:

| Field | Type | Description |
|-------|------|-------------|
| TriggerCondition | enum | Discrete trigger condition: <br> "Changed" - trigger occurs when the value of the metric has changed; <br> "Specified" - trigger occurs when the value of the metric becomes one of the values listed in the discrete triggers. |
| DiscreteTriggers | array of structures | Array of discrete trigger structures. |

Member of the DiscreteTriggers array:

| Field | Type | Description |
|-------|------|-------------|
| TriggerId | string | Unique trigger Id |
| Severity | enum | Severity: <br> "OK" - normal<br> "Warning" - requires attention<br> "Critical" - requires immediate attention |
| Value | variant | Value of the discrete metric that constitutes a trigger event. |
| DwellTime | uint64 | Time in milliseconds that a trigger occurrence persists before the action defined for this trigger is performed. |
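
The two discrete trigger conditions can be expressed as a small decision
function. The sketch below is only an illustration under stated assumptions:
`evaluate_discrete` is a hypothetical name, triggers are modelled as plain
dictionaries keyed by the field names from the table above, and dwell time is
ignored to keep the example short.

```python
from typing import Any, Dict, List, Tuple


def evaluate_discrete(condition: str, previous: Any, current: Any,
                      discrete_triggers: List[Dict[str, Any]]
                      ) -> Tuple[bool, List[Dict[str, Any]]]:
    """Return (fired, matched triggers) for one change of a discrete metric."""
    if previous == current:
        # Neither condition can occur without a change of value.
        return False, []
    if condition == "Changed":
        # Any change of value is a trigger event.
        return True, []
    if condition == "Specified":
        # Only a change to one of the listed values is a trigger event.
        matched = [t for t in discrete_triggers if t["Value"] == current]
        return bool(matched), matched
    raise ValueError(f"unknown trigger condition: {condition}")
```

In this model the severity of the event would be read from the matched
entries, which is why the function returns them alongside the boolean.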

For the numeric metric type, trigger parameters contain numeric thresholds.
The numeric thresholds structure shall contain up to 4 thresholds: upper
warning, upper critical, lower warning and lower critical. Thus it will contain
up to 4 of the structures shown below:

| Field | Type | Description |
|-------|------|-------------|
| ThresholdType | enum | Numeric trigger type: <br> "UpperCritical" - reading is above normal range and requires immediate attention<br>"UpperWarning" - reading is above normal range and may require attention<br>"LowerCritical" - reading is below normal range and requires immediate attention<br>"LowerWarning" - reading is below normal range and may require attention |
| DwellTime | uint64 | Time in milliseconds that a trigger occurrence persists before the action defined for this trigger is performed. |
| Activation | enum | Indicates the direction of crossing the threshold value that triggers the threshold's action:<br> "Increasing" - trigger action when reading is changing from below to above the threshold's value<br> "Decreasing" - trigger action when reading is changing from above to below the threshold's value<br> "Either" - trigger action when reading is crossing the threshold's value in either direction described above |
| ThresholdValue | variant | Value of reading that will trigger the threshold |
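
The interaction of `Activation` and `DwellTime` is easier to follow in code.
The sketch below models one threshold as a small state machine. It is an
assumption-laden illustration, not the service's logic: the `NumericThreshold`
class is hypothetical, a reading equal to the threshold value is treated as
not yet crossed, and the action fires once per crossing after the reading has
stayed on the crossed side for the dwell time.

```python
class NumericThreshold:
    """Tracks one threshold and reports when its action should run."""

    def __init__(self, value: float, activation: str, dwell_time_ms: int = 0):
        if activation not in ("Increasing", "Decreasing", "Either"):
            raise ValueError(f"unknown activation: {activation}")
        self.value = value
        self.activation = activation
        self.dwell_time_ms = dwell_time_ms
        self._previous = None   # last reading seen
        self._side = None       # "above" or "below" while a crossing is pending
        self._since_ms = None   # time at which the pending crossing happened

    def _crossing(self, previous: float, current: float):
        """Return the side that was entered, or None if nothing relevant."""
        up = previous <= self.value < current
        down = previous >= self.value > current
        if up and self.activation in ("Increasing", "Either"):
            return "above"
        if down and self.activation in ("Decreasing", "Either"):
            return "below"
        return None

    def _still_on_side(self, reading: float) -> bool:
        if self._side == "above":
            return reading > self.value
        return reading < self.value

    def feed(self, now_ms: int, reading: float) -> bool:
        """Process one reading; return True when the action should run."""
        if self._previous is not None:
            side = self._crossing(self._previous, reading)
            if side is not None:
                self._side, self._since_ms = side, now_ms
            elif self._side is not None and not self._still_on_side(reading):
                self._side = self._since_ms = None  # condition did not persist
        self._previous = reading
        if self._side is not None and now_ms - self._since_ms >= self.dwell_time_ms:
            self._side = self._since_ms = None  # fire once per crossing
            return True
        return False
```

With a threshold of 80, `Increasing` activation and a 1000 ms dwell time, a
reading that rises above 80 and stays there triggers the action one second
after the crossing, while a reading that drops back earlier never does.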

The `AddTrigger` method also allows defining the action performed when the
trigger is activated. Upon trigger activation, three actions are possible:
logging an event to the log service, sending an event via the event service,
and triggering a metric report update.

To assign a trigger to a specific metric, the metric parameter is defined.
Its structure contains the following data:

| Field | Type | Description |
|-------|------|-------------|
| SensorPath | object path | D-Bus path to the sensor for which the trigger is defined. |
| MetricId | string | Unique metric id that can be mapped to the Redfish MetricId. |
| ReportPath | object path | D-Bus path to the Telemetry service's reading report whose update shall be triggered when the trigger condition occurs. This is optional and shall be filled in when the trigger's action is set to update the metric report. |
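
The rule that `ReportPath` is optional unless the trigger's action is a report
update can be checked mechanically. The sketch below shows one possible way to
do it; the `TriggerMetric` dataclass, the `validate_metric` helper and the
"Update" action name used as a string are assumptions for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TriggerMetric:
    """Mirror of the metric parameter structure from the table above."""
    sensor_path: str
    metric_id: str
    report_path: Optional[str] = None  # needed only for the Update action


def validate_metric(metric: TriggerMetric, actions: Sequence[str]) -> None:
    """Raise ValueError if the metric parameter is inconsistent."""
    if "Update" in actions and not metric.report_path:
        raise ValueError("ReportPath is required when the action is Update")
    for path in (metric.sensor_path, metric.report_path):
        # D-Bus object paths always start with a slash.
        if path is not None and not path.startswith("/"):
            raise ValueError(f"not a D-Bus object path: {path}")
```

A metric without `report_path` passes validation for a trigger that only logs,
and fails as soon as "Update" is among the requested actions.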

The `AddTrigger` method also allows setting the trigger's persistency (whether
the trigger shall be stored in the BMC's non-volatile memory).

The `AddTrigger` method returns:
```ascii
String for created trigger - i.e. '/xyz/openbmc_project/Telemetry/Triggers/{Domain}/{Name}'
```
A trigger created this way implements the `xyz.openbmc_project.Object.Delete`
and the `xyz.openbmc_project.Telemetry.Trigger` interfaces. Each trigger
object contains read-only information about the metric type for which it was
created (discrete or numeric). This information determines which triggers
are stored within the trigger object.

If the trigger is defined for the discrete metric type, then it contains
trigger information that looks like this:

| Type | Description |
|------|-------------|
| enum | Discrete trigger condition: <br> "Changed" - trigger occurs when the value of the metric has changed;<br>"Specified" - trigger occurs when the value of the metric becomes one of the values listed in the discrete triggers. |
| array of structures | Array of discrete trigger structures. |

Discrete trigger structure:

| Type | Description |
|------|-------------|
| string | Unique trigger Id |
| enum | Severity: <br> "OK" - normal<br>"Warning" - requires attention<br> "Critical" - requires immediate attention |
| variant | Value of the discrete metric that constitutes a trigger event. |
| uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the `ActionType` is performed. |

If the trigger is defined for the numeric metric type, then it contains
information about numeric triggers in the form of an array of 4 structures
presented below:

| Type | Description |
|------|-------------|
| enum | Numeric trigger type: <br> "UpperCritical"<br>"UpperWarning"<br>"LowerCritical"<br>"LowerWarning" |
| uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the `ActionType` is performed. |
| enum | Indicates the direction of crossing the threshold value that triggers the threshold's action:<br> "Increasing"<br>"Decreasing"<br>"Either" |
| variant | Value of reading that will trigger the threshold |

The trigger object also contains information about the reading for which the
trigger was defined. It takes the form of a structure consisting of three
fields.

| Field type | Description |
|------------|-------------|
| object path | D-Bus path of the sensor. This is a path to the sensor for which the reading trigger is defined. |
| string | Unique metric Id |
| object path | D-Bus path of an existing reading report. This is required when the trigger's action is to update a metric report. This path shall point to an existing reading report within the Telemetry service. |

**Trigger operations**

Triggers support three types of operation: Log, Event and Update. Each is
handled differently.

1. For action Log, the event shall
be logged to the system journal. In this case the Telemetry service writes
data to the system journal using libjournal. The Redfish log service shall then
retrieve the data by reading the system journal. This is shown in the diagram
below.

```ascii
+---------------------------+
|bmcweb|                    |         +----------------------+
+------/    +-----------+-+ |         |Telemetry|            |
|           |Redfish    | | |         +---------/            |
|           |log service| | |         |                      |
|           +-----------/ | |         |                      |
|           |             | |         |                      |
|           |             | |         |                      |
|           +------^------+ |         +-----------+----------+
+---------------------------+                     |
                   |                              |
                   +----collect----+            event
                     journal entry |      (write to journal)
                                   |              |
       +------------------------------------+     |
       |systemd|                   |        |     |
       +-------/ +----------+  +---+------+ |     |
       |         |journal|  |  |libjournal| |     |
       |         +-------/  <-->          <-------+
       |         |          |  +----------+ |
       |         |          |               |
       |         |          |               |
       |         +----------+               |
       |                                    |
       +------------------------------------+
```
2. For action Event, the Telemetry service shall send an event using the
[Redfish Event Service][6], either as a push-style event or as SSE.

3. For action Update, the Telemetry service will trigger the update of the
reading report identified by the D-Bus path contained in the trigger object's
properties. The update shall cause the reading report's D-Bus object to emit a
property change signal. This will cause the Redfish Metric Report to be
streamed out if it was configured to do so.
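
The three trigger operations amount to a dispatch on the action name. The
sketch below shows that shape only: `run_trigger_actions` and the handler
callables are hypothetical placeholders, whereas in the design above the real
handlers would write to the system journal, send an event through the Redfish
Event Service, or update the reading report.

```python
from typing import Callable, Dict, Iterable, List

KNOWN_ACTIONS = ("Log", "Event", "Update")


def run_trigger_actions(actions: Iterable[str],
                        handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the handler for each requested action and return what was run."""
    performed = []
    for action in actions:
        if action not in KNOWN_ACTIONS:
            raise ValueError(f"unknown trigger action: {action}")
        handlers[action]()  # placeholder for the real side effect
        performed.append(action)
    return performed
```

Keeping the side effects behind callables lets the dispatch be exercised
without a journal, an event subscriber or a D-Bus connection.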

**Redfish Telemetry Service API**

The Redfish Telemetry Service shall support the 2019.1 Redfish schemas for
telemetry resources. Metric report definitions determine which metrics are to
be included in a metric report. A metric definition is assigned to a particular
metric type and describes how the metric should be interpreted. The following
resource schemas shall be supported:

- TelemetryService 1.1.2
- MetricDefinition 1.0.3
- MetricReportDefinition 1.3.0
- MetricReport 1.2.0
- Triggers 1.1.1
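
To illustrate how a metric report definition selects the metrics of a report,
the sketch below builds a request body that a client might POST to create a
periodic report. The property names are taken from the DMTF
MetricReportDefinition schema and should be checked against the schema version
listed above; the `build_report_definition` helper, the report Id and the
sensor URIs are hypothetical.

```python
import json
from typing import Dict, List


def build_report_definition(report_id: str, interval: str,
                            metrics: Dict[str, List[str]]) -> dict:
    """Build a periodic MetricReportDefinition request body."""
    return {
        "Id": report_id,
        "MetricReportDefinitionType": "Periodic",
        "Schedule": {"RecurrenceInterval": interval},  # ISO 8601 duration
        "ReportActions": ["LogToMetricReportsCollection"],
        "Metrics": [
            {"MetricId": metric_id, "MetricProperties": uris}
            for metric_id, uris in metrics.items()
        ],
    }


body = build_report_definition(
    "PowerReport",  # hypothetical report id
    "PT10S",        # generate the report every 10 seconds
    {
        # Hypothetical sensor URIs, modelled on the Power resource of a chassis.
        "PowerConsumedWatts": [
            "/redfish/v1/Chassis/1/Power#/PowerControl/0/PowerConsumedWatts"
        ],
        "ReadingVolts": [
            "/redfish/v1/Chassis/1/Power#/Voltages/0/ReadingVolts"
        ],
    },
)
print(json.dumps(body, indent=2))
```

Each entry in `Metrics` points at Redfish sensor properties by URI, which is
the information the Redfish layer needs to map a metric to a D-Bus sensor.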

The following diagram shows the relations between these resources.

```ascii
 +----------------------------------------------------------------------------+
 |                             Service root                                   |
 +----------------------------------+-------------------------------+---------+
                                    |                               |
                                    |                               |
                                    |                               |
 +----------------------------------v-----------------+  +----------v---------+
 |                                                    |  |Chassis             |
 |                Telemetry Service                   |  |                    |
 |                                                    |  |                    |
 |                                                    |  |  +---------------+ |
 +---------+--------------+------------------+--------+  |  |               | |
           |              |                  |           |  |   Chassis 1   | |
           |              |                  |           |  |               | |
           |              |                  |           |  +---------+-----+ |
           |              |                  |           |            |       |
+----------v--+  +--------v----+  +----------v-----+     +--------------------+
|Triggers     |  |Metric       |  |Metric report   |                  |
|             |  |definition   |  |                |                  |
|             |  | +---------+ |  |                | Reads            |
| +---------+ |  | |Reading  | |  | +-----------+  | ReadingVolts  +--v------+
| |         | |  | |Volts    <------+           +------------------>         |
| |Trigger 1| |  | +---------+ |  | |  Metric   |  |               |         |
| |         | |  |             |  | | report 1  |  | Reads         |  Power  |
| |         | |  | +---------+ |  | |           |  | PowerConsumed |         |
| |         | |  | |         | |  | |           |  | Watts         |         |
| +--+---+--+ |  | |Power    <------+           +------------------>         |
|    |   |    |  | |Consumed | |  | +-----^-----+  |               +----^----+
|    |   |    |  | |Watts    | |  |       |        |                    |
|    |   |    |  | +---------+ |  |       |        |                    |
|    |   |    |  |             |  |       |        |                    |
+-------------+  +-------------+  +----------------+                    |
     |   |                                |                             |
     |   | Triggers report update         |                             |
     |   | (when applicable)              |                             |
     |   +--------------------------------+                             |
     |                                                                  |
     |   Monitors PowerConsumedWatts to check                           |
     |   whether trigger value is exceeded                              |
     +------------------------------------------------------------------+
```

The diagram shows the relations between Redfish resources. A metric report can
be defined to be generated periodically, on demand, or on change. Each metric in
the Metric Report contains the URI of its metric definition and of the Redfish
sensor whose reading value is presented. Under this presentation layer, the
Telemetry service gathers D-Bus sensor readings and exposes them in reading
reports over D-Bus for the Redfish Telemetry Service. Each D-Bus sensor is
mapped to a Redfish sensor.

Examples of Redfish resources for the Telemetry Service are shown below.

The Telemetry Service Redfish resource example:

```json
{
    "@odata.type": "#TelemetryService.v1_1_2.TelemetryService",
    "Id": "TelemetryService",
    "Name": "Telemetry Service",
    "Status": {
        "State": "Enabled",
        "Health": "OK"
    },
    "MinCollectionInterval": "PT10S",
    "SupportedCollectionFunctions": [],
    "MaxReports": <max_no_of_reports>,
    "MetricDefinitions": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions"
    },
    "MetricReportDefinitions": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions"
    },
    "MetricReports": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReports"
    },
    "Triggers": {
        "@odata.id": "/redfish/v1/TelemetryService/Triggers"
    },
    "LogService": {
        "@odata.id": "/redfish/v1/Managers/<manager_name>/LogServices/Journal"
    },
    "@odata.context": "/redfish/v1/$metadata#TelemetryService",
    "@odata.id": "/redfish/v1/TelemetryService"
}
```

Sample metric report definition:

```json
{
    "@odata.type": "#MetricReportDefinition.v1_3_0.MetricReportDefinition",
    "Id": "SampleMetric",
    "Name": "Sample Metric Report Definition",
    "MetricReportDefinitionType": "Periodic",
    "Schedule": {
        "RecurrenceInterval": "PT10S"
    },
    "ReportActions": [
        "LogToMetricReportsCollection"
    ],
    "ReportUpdates": "Overwrite",
    "MetricReport": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
    },
    "Status": {
        "State": "Enabled"
    },
    "Metrics": [
        {
            "MetricId": "Test",
            "MetricProperties": [
                "/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts"
            ]
        }
    ],
    "@odata.context": "/redfish/v1/$metadata#MetricReportDefinition.MetricReportDefinition",
    "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
}
```
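A definition like the one above could be assembled programmatically before it is POSTed to the `MetricReportDefinitions` collection. The following is a minimal sketch only: the helper name `build_report_definition` and its parameters are hypothetical, the field set simply mirrors the sample, and the interval is written as an ISO 8601 duration.

```python
import json


def build_report_definition(report_id, name, metric_properties, interval="PT10S"):
    """Return a periodic MetricReportDefinition request body as a dict.

    Hypothetical helper: it only mirrors the fields of the sample above.
    """
    return {
        "Id": report_id,
        "Name": name,
        "MetricReportDefinitionType": "Periodic",
        "Schedule": {"RecurrenceInterval": interval},
        "ReportActions": ["LogToMetricReportsCollection"],
        "ReportUpdates": "Overwrite",
        "Metrics": [
            {"MetricId": "Test", "MetricProperties": list(metric_properties)}
        ],
    }


body = build_report_definition(
    "SampleMetric",
    "Sample Metric Report Definition",
    ["/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts"],
)
# Serialized form that a client would send as the POST payload.
print(json.dumps(body, indent=4))
```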

Sample metric report:

```json
{
    "@odata.type": "#MetricReport.v1_2_0.MetricReport",
    "Id": "SampleMetric",
    "Name": "Sample Metric Report",
    "ReportSequence": "0",
    "MetricReportDefinition": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
    },
    "MetricValues": [
        {
            "MetricDefinition": {
                "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions/SampleMetricDefinition"
            },
            "MetricId": "Test",
            "MetricValue": "100",
            "Timestamp": "2016-11-08T12:25:00-05:00",
            "MetricProperty": "/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts"
        }
    ],
    "@odata.context": "/redfish/v1/$metadata#MetricReport.MetricReport",
    "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
}
```
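A client that fetches such a report typically needs the readings as typed values. The sketch below is one possible way to do that, not part of the design: `extract_readings` is a hypothetical helper, and the embedded report is a trimmed copy of the sample containing only the fields it uses.

```python
import json
from datetime import datetime

# Trimmed copy of the sample metric report; only the fields used below.
SAMPLE_REPORT = json.loads("""
{
    "Id": "SampleMetric",
    "MetricValues": [
        {
            "MetricId": "Test",
            "MetricValue": "100",
            "Timestamp": "2016-11-08T12:25:00-05:00"
        }
    ]
}
""")


def extract_readings(report):
    """Return (metric_id, value, timestamp) tuples from a MetricReport dict."""
    readings = []
    for entry in report.get("MetricValues", []):
        readings.append((
            entry["MetricId"],
            float(entry["MetricValue"]),  # the sample carries the value as a string
            datetime.fromisoformat(entry["Timestamp"]),
        ))
    return readings


readings = extract_readings(SAMPLE_REPORT)
print(readings)
```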

Sample trigger that triggers a metric report update:

```json
{
    "@odata.type": "#Triggers.v1_1_1.Triggers",
    "Id": "SampleTrigger",
    "Name": "Sample Trigger",
    "MetricType": "Numeric",
    "Links": {
        "MetricReportDefinitions": [
            {
                "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
            }
        ]
    },
    "Status": {
        "State": "Enabled"
    },
    "TriggerActions": [
        "RedfishMetricReport"
    ],
    "NumericThresholds": {
        "UpperCritical": {
            "Reading": 50,
            "Activation": "Increasing",
            "DwellTime": "PT0.001S"
        },
        "UpperWarning": {
            "Reading": 48.1,
            "Activation": "Increasing",
            "DwellTime": "PT0.004S"
        }
    },
    "MetricProperties": [
        "/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts"
    ],
    "@odata.context": "/redfish/v1/$metadata#Triggers.Triggers",
    "@odata.id": "/redfish/v1/TelemetryService/Triggers/SampleTrigger"
}
```
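To illustrate how `Reading`, `Activation` and `DwellTime` interact, the following is a simplified sketch of one possible interpretation of an `Increasing` threshold: the trigger fires once the value has stayed above `Reading` for at least `DwellTime`. It is not the Telemetry service's implementation; the class name, the strict `>` comparison and the sample timestamps are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncreasingThreshold:
    """Illustrative model of a numeric threshold with Activation "Increasing"."""
    reading: float      # threshold value ("Reading")
    dwell_time: float   # seconds the value must stay above it ("DwellTime")
    _above_since: Optional[float] = None
    _active: bool = False

    def update(self, value: float, now: float) -> bool:
        """Feed one sample; return True only at the moment the trigger fires."""
        if value <= self.reading:
            # Back at or below the threshold: re-arm the trigger.
            self._above_since = None
            self._active = False
            return False
        if self._above_since is None:
            self._above_since = now
        if not self._active and now - self._above_since >= self.dwell_time:
            self._active = True
            return True
        return False


# UpperCritical from the sample: Reading 50, DwellTime PT0.001S.
upper_critical = IncreasingThreshold(reading=50, dwell_time=0.001)
samples = [(0.000, 49.0), (0.010, 51.0), (0.020, 52.0), (0.030, 47.0)]
fired = [t for t, v in samples if upper_critical.update(v, t)]
print(fired)
```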

**Performance tests**

Performance tests were conducted on an AST2500 system with 64 MB of flash and
512 MB of RAM. Flash consumption by the Telemetry service is 197.5 kB. The
runtime statistics are shown in the table below. Each reading report is
mapped to a single Metric Report. The runtime data is collected for the
Telemetry component only. All reports were created with the
`xyz.openbmc_project.Telemetry.Metric.OnChange` property to
maximize the workload. The configuration with 50 reports and 50 sensors
produces about 200 new readings per second, which generates 200 reading reports
per second. The table shows CPU usage and memory usage. VSZ is the amount
of memory mapped into the address space of the process. It includes pages
backed by the process' executable file and shared libraries, its heap and
stack, as well as anything else it has mapped.

| Telemetry service state                          | VSZ (kB) | %VSZ | %CPU   |
|--------------------------------------------------|----------|------|--------|
| Idle (0 reports, 0 sensors)                      | 5188     | 1%   | 0%     |
| 1 report, 1 sensor                               | 5188     | 1%   | 1%     |
| 2 reports, 1 sensor                              | 5188     | 1%   | 1%     |
| 2 reports, 2 sensors (1 sensor per report)       | 5188     | 1%   | 1%     |
| 1 report, 10 sensors                             | 5188     | 1%   | 1%     |
| 10 reports, 10 sensors (same for each report)    | 5320     | 1%   | 1-2%   |
| 2 reports, 20 sensors (10 per report)            | 5188     | 1%   | 1%     |
| 30 reports, 30 sensors (10 per report)           | 5444     | 1%   | 5-9%   |
| 50 reports, 50 sensors (10 per report)           | 5572     | 1%   | 11-14% |

The last two configurations use 10 sensors per reading report, which gives
3 or 5 distinct configurations. Each such configuration is used to
create 10 reading reports to obtain the desired number of 30 or 50 reading
reports.

In this architecture, a reading report is created every time a Redfish
Metric Report Definition is posted (which creates a new Metric Report).
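On Linux, the virtual memory size of a process is also reported as the `VmSize` field of `/proc/<pid>/status`. The sketch below shows one way such a figure could be collected and related to total RAM. It is an assumption about how a similar measurement might be scripted, not a description of how the numbers above were obtained; the helper names, the process name and the sample status text are illustrative.

```python
import re


def parse_vm_size_kb(status_text):
    """Return the VmSize value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmSize:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def vsz_percent(vm_size_kb, mem_total_kb):
    """Express a virtual memory size as a percentage of total RAM."""
    return 100.0 * vm_size_kb / mem_total_kb


# Illustrative status excerpt; on a live system read /proc/<pid>/status instead.
sample_status = "Name:\ttelemetry\nVmPeak:\t    5320 kB\nVmSize:\t    5188 kB\n"

vm_size = parse_vm_size_kb(sample_status)
# 512 MB of RAM expressed in kB.
print(vm_size, round(vsz_percent(vm_size, 512 * 1024), 2))
```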

## Alternatives Considered
The [framework based on collectd/librrd][5] was considered as an alternative
design. Although it seems to be a versatile and scalable solution, it has some
drawbacks from our point of view:
* Collectd's footprint in the minimal working configuration is around 2.6 MB,
while the available space for OpenBMC is limited to 64 MB.
* In that design, librrd is used to store metrics on the BMC's non-volatile
storage, which may be an issue when lots of metrics are captured and stored
in OpenBMC's limited storage space. Flash wear-out may also occur when
metrics are captured frequently (for example, once per second).
* The Telemetry service is directly compatible with the Redfish Telemetry
Service API, which means that Telemetry's reading reports can be mapped directly
to Redfish Metric Reports.
* The Telemetry service unifies the way the BMC's telemetry is exposed over
Redfish and may be used with multiple front-ends, so support for telemetry over
IPMI or any other API can be added without difficulty.

Since this design assumes flexibility and modularity, there are no obstacles to
using collectd in cooperation with Telemetry. One possible configuration
is shown in the diagram below.

```ascii
   +-----------------+      +-----------------+
   |  D-Bus sensors  |      |   Telemetry     |
   +--------^--------+      +--------^--------+
            |                        |
            |                        |
            |                        |
<--------^--v-----------D-Bus--------v-^---------->
         |                             |
         |                             |
         |                             |
 +-------v------------+     +----------v--------+
 |  collectd metrics  |     |                   |
 |  exposed as D-Bus  |     |     bmcweb        |
 |      sensors       |     |  (with Redfish    |
 +---------^----------+     |    Telemetry      |
           |                |     Service)      |
           |                |                   |
    +------+-------+        +-------------------+
    |              |
    |   collectd   |
    |              |
    +--------------+
```
Here collectd is used as the source of some set of metrics. It exposes them
as D-Bus sensors, which can easily be consumed by both bmcweb and the
Telemetry service without any changes to their D-Bus interfaces. In such a
configuration, the Telemetry service provides metric report and trigger
management.

Another possible configuration is to use collectd without the Telemetry service,
but in that case collectd does not provide metric report and trigger support
compatible with Redfish. Either the Redfish Telemetry Service won't be
supported, or metric report and trigger support has to be provided by
collectd.

## Impacts
This design impacts the architecture of the bmcweb component, since it adds
the Redfish Telemetry Service implementation as a component for the existing
Redfish API implementation.

## Testing
This is a very high-level description of the proposed set of tests.
Testing shall be done on three basic levels:
* Unit tests
* Functional tests
* Performance tests

**Unit tests**

The Telemetry code shall be covered by unit tests. The preferred
framework is [GTest/GMock][7]. The unit tests shall be run before a code
change is committed to make sure that nothing is broken in existing
functionality. Also, when new code is introduced, a new set of unit tests shall
be committed with it, according to the test-driven development principle. Unit
tests shall also be carefully reviewed.

**Functional tests**

Functional tests will be divided into two steps.

The first step tests the Telemetry metric report management. The test scenario
shall include creating a metric report by POSTing a proper metric report
definition, reading the metric report (using GET on the proper URI), and
deleting the metric report. The required configuration for such a test is D-Bus
sensors (at least some of them) and bmcweb with the Redfish Telemetry Service
implemented. The tests shall be performed on real hardware. For ease of metric
testing, dummy D-Bus sensors may be used to provide specifically prepared
metrics. This configuration shall also enable testing of aggregated operations
(MIN, MAX, SUM, AVG).

The second step tests trigger and event generation. This also requires the
Event Service to be implemented along with the Log Service. Tests shall cover
all scenarios involving sending a metric report as an event, triggering a metric
report update, and logging events.

**Performance tests**

Performance tests shall be done using a full OpenBMC configuration with the
complete required set of features. The tests shall create a large number of
metric reports (up to the maximum number) along with all possible triggers.
Measurements shall cover the periodic metric report jitter, delays in event
logging or sending, the BMC's CPU utilization, and the performance impact on
other services.

[1]: https://www.dmtf.org/sites/default/files/standards/documents/DSP2051_0.1.0a.zip
[2]: https://github.com/openbmc/docs/blob/master/architecture/sensor-architecture.md
[3]: https://www.kernel.org/doc/Documentation/hwmon/
[4]: https://www.freedesktop.org/wiki/Software/dbus/
[5]: https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/22257
[6]: https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/24749
[7]: https://github.com/google/googletest
[8]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Telemetry/ReportManager.interface.yaml
[9]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Telemetry/Report.interface.yaml
[10]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Object/Delete.interface.yaml