# OpenBMC platform telemetry

Author:
  Piotr Matuszczak <piotr.matuszczak@intel.com>

Other contributors:
  Pawel Rapkiewicz <pawel.rapkiewicz@intel.com> <pawelr>,
  Kamil Kowalski <kamil.kowalski@intel.com>

Created:
  2019-08-07

## Problem Description
The BMC on a server platform gathers a large amount of telemetry data, which
has to be exposed in a clean, human-readable and standardized format. This
document focuses on telemetry over Redfish, since it is the standard API for
platform manageability.

## Background and References
* OpenBMC platform telemetry shall leverage DMTF's [Redfish Telemetry Model][1]
for exposing platform telemetry over the network.
* OpenBMC platform telemetry shall leverage the
[OpenBMC sensors architecture implementation][2].
* OpenBMC platform telemetry shall implement a service, called Telemetry, to
deal with metric report and trigger management. This service is described later
in this document.
* Although [hwmon][3] is used to gather readings from physical sensors, this
architecture does not depend on it, because the Telemetry service component
relies on the [OpenBMC D-Bus sensors][2].

## Requirements
* [OpenBMC D-Bus sensors][2] support. This is also a design limitation, since
the Telemetry service requires telemetry sources to be implemented as D-Bus
sensors.

## Proposed Design
The Redfish Telemetry Model shall implement the Telemetry Service with the
following collection resources:
* Metric Definitions - contains the metadata for metrics (unit, accuracy, etc.)
* Metric Report Definitions - defines how a metric report shall be created
(which metrics it shall contain, how often it shall be generated, etc.)
* Metric Reports - contains the actual metric reports, with telemetry data
generated according to the Metric Report Definitions
* Metric Triggers - contains thresholds and actions that apply to specific
metrics

The OpenBMC telemetry architecture is shown in the diagram below.

```ascii
   +--------------+               +----------------+     +-----------------+
   |hwmon|        |               |Dbus sensors|   |     |Telemetry|       |
   +-----/        |               +------------/   |     +---------/       |
   |              +--filesystem--->                |     |                 |
   |              |               |                |     |                 |
   +--------------+               +--------^-------+     +--------^--------+
                                           |                      |
                                           |                      |
<------------------------------------------v-----^--DBus----------v----------->
                                                 |
                                                 |
+-------+---------------------------------------------------------------------+
|bmcweb |                                        |                            |
+-------/                                        |                            |
|                                                |                            |
| +--------+-------------------------------------v--------------------------+ |
| |Redfish |                                                                | |
| +--------/                                            +---------+-------+ | |
| |                                                     |Existing |       | | |
| | +------------------------------------------------+  |Redfish  |       | | |
| | |Telemetry Service|                              |  |resources|       | | |
| | +----------------+/                              |  +---------/       | | |
| | |  +----------+  +-----------+  +-------------+  |  |   +---------+   | | |
| | |  |  Metric  |  |  Metric   |  |Metric report|  |  |   | Redfish |   | | |
| | |  | triggers |  |definitions|  |definitions  <---------+ sensors |   | | |
| | |  |          |  |           |  |             |  |  |   |         |   | | |
| | |  +----+-----+  +-----+-----+  +------+------+  |  |   +---------+   | | |
| | |       |              |               |         |  |                 | | |
| | |       |              |               |         |  |                 | | |
| | |       |              |               |         |  |                 | | |
| | |       |        +-----v-----+         |         |  |                 | | |
| | |       |        |   Metric  |         |         |  |                 | | |
| | |       +-------->   report  <---------+         |  |                 | | |
| | |                |           |                   |  |                 | | |
| | |                +-----------+                   |  |                 | | |
| | |                                                |  |                 | | |
| | +------------------------------------------------+  +-----------------+ | |
| |                                                                         | |
| +-------------------------------------------------------------------------+ |
|                                                                             |
+-----------------------------------------------------------------------------+
```

The Telemetry Service component is a part of Redfish and implements DMTF's
[Redfish Telemetry Model][1]. Metric report definitions use Redfish sensor
URIs for metric report creation. Those sensors are also used to get the
URI->D-Bus sensor mapping. The Redfish Telemetry Service acts as the
presentation layer for telemetry, while the Telemetry service is responsible
for gathering metrics from D-Bus sensors and exposing them as D-Bus objects.
The Telemetry service supports different monitoring modes (periodic, on change
and on demand) along with the following aggregation operations:
* SINGLE - current reading value
* AVERAGE - average value over a defined time period
* MAX - maximum reading value during a defined time period
* MIN - minimum reading value during a defined time period
* SUM - sum of reading values over a defined time period

The time period for calculating an aggregated metric is taken from the Redfish
Metric Report Definition resource for each sensor's metric.
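
The aggregation operations listed above can be illustrated with a minimal
Python sketch. It is not the Telemetry service's implementation: the `Sample`
type, the `aggregate` function and the millisecond timestamps are assumptions
made for illustration. Readings are assumed to be ordered by time, and AVERAGE
is computed as a plain (not time-weighted) mean.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One sensor reading (hypothetical type, for illustration only)."""
    timestamp_ms: int
    value: float


def aggregate(samples: List[Sample], operation: str,
              now_ms: int, period_ms: int) -> float:
    """Apply one aggregation operation to time-ordered samples."""
    if not samples:
        raise ValueError("no readings available")
    if operation == "SINGLE":
        # Current reading: the most recent sample, regardless of the period.
        return samples[-1].value

    # Keep only readings that fall inside the defined time period.
    window = [s.value for s in samples
              if now_ms - period_ms <= s.timestamp_ms <= now_ms]
    if not window:
        raise ValueError("no readings in the requested time period")

    if operation == "AVERAGE":
        return sum(window) / len(window)
    if operation == "MAX":
        return max(window)
    if operation == "MIN":
        return min(window)
    if operation == "SUM":
        return sum(window)
    raise ValueError(f"unknown operation: {operation}")
```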

The Telemetry service supports creating and managing metric reports, each of
which may contain a single metric or multiple metrics from sensors. Such a
metric report is mapped to a Metric Report in the Redfish Telemetry Service.

The diagram below shows the flows for creating and updating a metric report.

```ascii
+----+              +------+              +---------+                 +-------+
|User|              |bmcweb|              |Telemetry|                 | D-Bus |
+-+--+              +--+---+              +----+----+                 |Sensors|
  |                    |                       |                      +---+---+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Metric report definition flow|                |                          |   |
+-----------------------------+                |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
| |    POST request    |                       |                          |   |
| |    with metric     |                       |                          |   |
| |    report          |                       |                          |   |
| |    definition      |                       |                          |   |
| +-------------------->  Invoke AddReport     |  Register for D-Bus      |   |
| |                    |  method on D-Bus      |  sensors                 |   |
| |                    +----------------------->  PropertiesChanged       |   |
| |                    |                       |  signals                 |   |
| |                    |                       +-------------------------->   |
| |                    |                       |-------------------------->   |
| |                    |                       +-------------------------->   |
| |                    |                       |                          |   |
| |  HTTP response     |                       +-+Create Report           |   |
| |  code 201 with     |  Return created       | |D-Bus object            |   |
| |  Metric Report     |  Report D-Bus path    <-+                        |   |
| |  Definition's URI  <-----------------------+                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Periodic metric report update flow|           |                          |   |
+----------------------------------+           +-+Metric report           |   |
| |                    |                       | |timer triggers          |   |
| |                    |                       <-+report update           |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |   Sensor's Properties-   |   |
| +-------------------->                       |   Changed signal         |   |
| |                    +-+Map report's URI     <--------------------------+   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     | +----------------------+ |   |
| |                    | Invoke GetAll method  | |Note that sensor's    | |   |
| |                    | on report D-Bus       | |PropertiesChanged     | |   |
| |                    | object                | |signal is asynchronous| |   |
| |                    +-----------------------> |to metric report timer| |   |
| |                    |                       | |This timer is the only| |   |
| |  Return metric     | Return report data    | |thing that triggers   | |   |
| |  report in JSON    <-----------------------+ |metric report update  | |   |
| |  format            |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|On change metric report update flow|          |   Sensor's Properties-   |   |
+-----------------------------------+          |   Changed signal         |   |
| |                    |                       <--------------------------+   |
| |                    |                       |                          |   |
| |                    |                       +-+Sensor's signal         |   |
| |                    |                       | |triggers report         |   |
| |                    |                       <-+update                  |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        | +----------------------+ |   |
| |                    <-+                     | |Note that sensor's    | |   |
| |                    | Invoke GetAll method  | |PropertiesChanged     | |   |
| |                    | on report D-Bus       | |signal triggers the   | |   |
| |                    | object                | |report update. It is  | |   |
| |                    +-----------------------> |sufficient that the   | |   |
| |                    |                       | |signal from only one  | |   |
| |  Return metric     | Return report data    | |sensor triggers report| |   |
| |  report in JSON    <-----------------------+ |update.               | |   |
| |  format            |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-+--------------------+------------------------------------------------------+
|On demand metric report update flow|          |                          |   |
+-+--------------------+------------+          |                          |   |
| |                    |                       |                          |   |
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     |                          |   |
| |                    |                       |                          |   |
| |                    |  Invoke the Update    |                          |   |
| |                    |  method for report    |                          |   |
| |                    |  D-Bus object         |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       +-+Update method triggers  |   |
| |                    |                       | |report to be updated    |   |
| |                    |                       | |with the latest known   |   |
| |                    |                       | |sensor's readings.      |   |
| |                    |                       | |No additional sensor    |   |
| |                    |                       <-+readings are performed. |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |                    | Update method call    |                          |   |
| |                    | result                |                          |   |
| |                    <-----------------------+                          |   |
| |                    |                       |                          |   |
| |                    | Invoke GetAll method  |                          |   |
| |                    | on report D-Bus       |                          |   |
| |                    | object                |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       |                          |   |
| |  Return metric     | Return report data    |                          |   |
| |  report in JSON    <-----------------------+                          |   |
| |  format            |                       |                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
```
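
The metric report definition flow in the diagram starts with a POST request
carrying a metric report definition. The following Python sketch builds such a
request body. The property names (`MetricReportDefinitionType`, `Schedule`,
`RecurrenceInterval`, `Metrics`, `MetricProperties`) are taken from the DMTF
MetricReportDefinition schema; the helper `build_report_definition`, the report
id and the sensor URI are hypothetical, and the exact set of properties accepted
by bmcweb may differ.

```python
import json
from typing import Dict, Optional

COLLECTION_URI = "/redfish/v1/TelemetryService/MetricReportDefinitions"
REPORT_TYPES = ("Periodic", "OnChange", "OnRequest")


def build_report_definition(report_id: str, report_type: str,
                            metrics: Dict[str, str],
                            interval: Optional[str] = None) -> dict:
    """Build a MetricReportDefinition request body.

    metrics maps a metric id to the Redfish sensor property URI it reads.
    interval is an ISO 8601 duration, required here for periodic reports.
    """
    if report_type not in REPORT_TYPES:
        raise ValueError(f"unsupported report type: {report_type}")
    if report_type == "Periodic" and interval is None:
        raise ValueError("periodic reports need a recurrence interval")

    body = {
        "Id": report_id,
        "MetricReportDefinitionType": report_type,
        "Metrics": [
            {"MetricId": metric_id, "MetricProperties": [uri]}
            for metric_id, uri in metrics.items()
        ],
    }
    if interval is not None:
        body["Schedule"] = {"RecurrenceInterval": interval}
    return body


# Illustrative values only: the id and the sensor URI are made up.
body = build_report_definition(
    "PowerReport", "Periodic",
    {"PowerConsumedWatts":
     "/redfish/v1/Chassis/1/Power#/PowerControl/0/PowerConsumedWatts"},
    interval="PT10S",
)
print("POST", COLLECTION_URI)
print(json.dumps(body, indent=2))
```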

The Redfish implementation in bmcweb is stateless, so it is not able to
store metric reports. All operations on metric reports shall be done in
the Telemetry service. Sending a metric report as an SSE or push-style event
shall be done via the [Redfish Event Service][6]. This step is marked as
optional because a metric report does not have to be configured to push its
data through events.

In the case of an on demand metric report update, the Telemetry service performs
no additional sensor readings because it already has the latest values: they
are updated on each PropertiesChanged signal from the D-Bus sensors.
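
A minimal Python sketch of this caching behaviour is shown here. It only
models the idea: `CachedReport`, its method names and the sensor path are
assumptions for illustration, and no real D-Bus calls are made.

```python
from typing import Dict, Iterable, Optional


class CachedReport:
    """Keeps the latest known sensor values and snapshots them on demand."""

    def __init__(self, sensor_paths: Iterable[str]) -> None:
        self._latest: Dict[str, Optional[float]] = {
            path: None for path in sensor_paths
        }
        self.readings: Dict[str, Optional[float]] = {}

    def on_properties_changed(self, sensor_path: str, value: float) -> None:
        """Stand-in for a sensor's PropertiesChanged signal handler."""
        if sensor_path in self._latest:
            self._latest[sensor_path] = value

    def update(self) -> Dict[str, Optional[float]]:
        """Stand-in for the Update method: copy cached values, read nothing."""
        self.readings = dict(self._latest)
        return self.readings


path = "/xyz/openbmc_project/sensors/power/total_power"  # illustrative path
report = CachedReport([path])
report.on_properties_changed(path, 120.0)
report.on_properties_changed(path, 125.0)
report.update()  # the report now holds the latest cached value
```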

**Telemetry service on [D-Bus][4]**

The Telemetry service exposes specific interfaces on D-Bus. One of them is
used for reading report management. The second one is used for trigger
management.

**Reading report management**

The reading report management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.ReportManager
/xyz/openbmc_project/Telemetry/Reports
```
The `ReportManager` implements the D-Bus interface
[`xyz.openbmc_project.Telemetry.ReportManager`][8] for report management. The
interface is described in phosphor-dbus-interfaces. This interface
implements the `AddReport` method, which is used to create a metric report. The
report may contain one or more sensor readings. How the report is stored by
the BMC is defined by one of this method's parameters.
The `ReportManager` object implements a property that stores the maximum number
of reports supported simultaneously.

The `AddReport` method returns the path to the newly created report object.
The report object implements the [`xyz.openbmc_project.Object.Delete`][10]
and [`xyz.openbmc_project.Telemetry.Report`][9] interfaces. The [`Delete`][10]
interface adds support for removing the Report object, while the
[`Report`][9] interface implements methods and properties for Report
management along with properties containing telemetry readings. Each report
object contains the timestamp of its last update. The report object also
contains an array of structures, each holding a reading with its metadata and
the timestamp of the last update of that metric. Each report also has a
property that stores the update interval (for periodically updated reports).
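
The report manager's bookkeeping described above can be sketched in Python as
follows. This is a simplified model, not the D-Bus interface itself: the class
`ReportManagerSketch`, its reduced `add_report` parameters and the assumed
`<root>/<name>` path layout are hypothetical; only the root object path comes
from this document.

```python
from typing import Dict, List

REPORTS_ROOT = "/xyz/openbmc_project/Telemetry/Reports"


class ReportManagerSketch:
    """Tracks report objects and enforces the maximum number of reports."""

    def __init__(self, max_reports: int) -> None:
        self.max_reports = max_reports
        self._reports: Dict[str, List[str]] = {}

    def add_report(self, name: str, sensor_paths: List[str]) -> str:
        """Create a report and return its object path."""
        if len(self._reports) >= self.max_reports:
            raise RuntimeError("maximum number of reports reached")
        path = f"{REPORTS_ROOT}/{name}"  # assumed path layout
        if path in self._reports:
            raise ValueError(f"report already exists: {path}")
        self._reports[path] = list(sensor_paths)
        return path

    def delete(self, path: str) -> None:
        """Model of the Delete interface: remove the report object."""
        del self._reports[path]
```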

**Trigger management**

The trigger management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.TriggerManager
/xyz/openbmc_project/Telemetry/Triggers
```
The `TriggerManager` supports the
`xyz.openbmc_project.Telemetry.TriggerManager` interface, which implements
the `AddTrigger` method. This method shall be used to create a new trigger for
a given metric. The method's parameters define the type of metric for which the
trigger is set (discrete or numeric). Depending on this setting, the method
accepts a different set of trigger parameters.

For the discrete metric type, trigger parameters contain:

| Field | Type | Description |
|-------|------|-------------|
| TriggerCondition | enum | Discrete trigger condition: <br> "Changed" - trigger occurs when the value of the metric has changed; <br> "Specified" - trigger occurs when the value of the metric becomes one of the values listed in the discrete triggers. |
| DiscreteTriggers | array of structures | Array of discrete trigger structures. |

Member of the DiscreteTriggers array:

| Field | Type | Description |
|-------|------|-------------|
| TriggerId | string | Unique trigger Id |
| Severity | enum | Severity: <br> "OK" - normal<br> "Warning" - requires attention<br> "Critical" - requires immediate attention |
| Value | variant | Value of the discrete metric that constitutes a trigger event. |
| DwellTime | uint64 | Time in milliseconds that a trigger occurrence persists before the action defined for this trigger is performed. |
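
The two discrete trigger conditions can be illustrated with a short Python
sketch. The `DiscreteTrigger` dataclass mirrors the fields of the table above,
while the function `discrete_trigger_fires` and its reading of "becomes one of
the values" (the value must actually change to a listed one) are assumptions.
Dwell time is carried in the structure but not enforced here.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class DiscreteTrigger:
    trigger_id: str
    severity: str           # "OK", "Warning" or "Critical"
    value: Any              # discrete value that constitutes a trigger event
    dwell_time_ms: int = 0  # not enforced in this sketch


def discrete_trigger_fires(condition: str, triggers: List[DiscreteTrigger],
                           previous: Any, current: Any) -> bool:
    """Return True when a discrete trigger event occurs for a new reading."""
    if condition not in ("Changed", "Specified"):
        raise ValueError(f"unknown trigger condition: {condition}")
    if current == previous:
        return False  # no transition, so nothing can trigger
    if condition == "Changed":
        return True
    # "Specified": the new value must match one of the listed triggers.
    return any(trigger.value == current for trigger in triggers)
```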

For the numeric metric type, trigger parameters contain numeric thresholds.
The numeric thresholds structure shall contain up to 4 thresholds: upper
warning, upper critical, lower warning and lower critical. Thus it will contain
up to 4 of the structures shown below:

| Field | Type | Description |
|-------|------|-------------|
| ThresholdType | enum | Numeric trigger type: <br> "UpperCritical" - reading is above normal range and requires immediate attention<br> "UpperWarning" - reading is above normal range and may require attention<br> "LowerCritical" - reading is below normal range and requires immediate attention<br> "LowerWarning" - reading is below normal range and may require attention |
| DwellTime | uint64 | Time in milliseconds that a trigger occurrence persists before the action defined for this trigger is performed. |
| Activation | enum | Direction of crossing the threshold value that triggers the threshold's action:<br> "Increasing" - trigger action when the reading changes from below to above the threshold's value<br> "Decreasing" - trigger action when the reading changes from above to below the threshold's value<br> "Either" - trigger action when the reading crosses the threshold's value in either direction described above |
| ThresholdValue | variant | Value of the reading that will trigger the threshold |
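
To make the `Activation` and `DwellTime` fields more concrete, here is a small
Python sketch. Both `crosses_threshold` and `DwellTracker` are hypothetical
helpers, not part of the Telemetry service. How a reading exactly equal to the
threshold is treated is an assumption, as is the choice to keep reporting the
trigger as active for as long as the condition persists.

```python
def crosses_threshold(previous: float, current: float,
                      threshold: float, activation: str) -> bool:
    """Check whether two consecutive readings cross the threshold."""
    # Assumption: a reading equal to the threshold counts as "not yet across".
    increasing = previous <= threshold < current
    decreasing = previous >= threshold > current
    if activation == "Increasing":
        return increasing
    if activation == "Decreasing":
        return decreasing
    if activation == "Either":
        return increasing or decreasing
    raise ValueError(f"unknown activation: {activation}")


class DwellTracker:
    """Reports a condition only after it has persisted for dwell_ms."""

    def __init__(self, dwell_ms: int) -> None:
        self.dwell_ms = dwell_ms
        self._active_since = None  # time the condition first became active

    def observe(self, active: bool, now_ms: int) -> bool:
        """Feed the current condition state; return True once dwell elapsed."""
        if not active:
            self._active_since = None  # condition cleared, restart the dwell
            return False
        if self._active_since is None:
            self._active_since = now_ms
        return now_ms - self._active_since >= self.dwell_ms
```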

The `AddTrigger` method also allows defining the action taken when the trigger
is activated. Three actions are possible upon trigger activation: logging an
event to the log service, sending an event via the event service, and
triggering a metric report update.

In order to assign a trigger to a specific metric, the metric parameter is
defined. Its structure contains the following data:

| Field | Type | Description |
|-------|------|-------------|
| SensorPath | object path | D-Bus path to the sensor for which the trigger is defined. |
| MetricId   | string | Unique metric Id that can be mapped to the Redfish MetricId. |
| ReportPath | object path | D-Bus path to the Telemetry reading report whose update shall be triggered when the trigger condition occurs. This is optional and shall be filled in when the trigger's action is set to update a metric report. |

The `AddTrigger` method also allows setting the trigger's persistency (whether
the trigger shall be stored in the BMC's non-volatile memory).

The `AddTrigger` method returns:
```ascii
String for created trigger - i.e. '/xyz/openbmc_project/Telemetry/Triggers/{Domain}/{Name}'
```
A trigger created this way implements the `xyz.openbmc_project.Object.Delete`
and the `xyz.openbmc_project.Telemetry.Trigger` interfaces. Each trigger
object contains read-only information about the metric type for which it was
created (discrete or numeric). This information determines which triggers
are stored within the trigger object.

If the trigger is defined for the discrete metric type, then it contains trigger
information that looks like this:

| Type | Description |
|------|-------------|
| enum | Discrete trigger condition: <br> "Changed" - trigger occurs when the value of the metric has changed;<br> "Specified" - trigger occurs when the value of the metric becomes one of the values listed in the discrete triggers. |
| array of structures | Array of discrete trigger structures. |

Discrete trigger structure:

| Type | Description |
|------|-------------|
| string | Unique trigger Id |
| enum | Severity: <br> "OK" - normal<br> "Warning" - requires attention<br> "Critical" - requires immediate attention |
| variant | Value of the discrete metric that constitutes a trigger event. |
| uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the `ActionType` is performed. |

If the trigger is defined for the numeric metric type, then it contains
information about numeric triggers that is an array of 4 structures presented
below:

| Type | Description |
|------|-------------|
| enum | Numeric trigger type: <br> "UpperCritical"<br> "UpperWarning"<br> "LowerCritical"<br> "LowerWarning" |
| uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the `ActionType` is performed. |
| enum | Direction of crossing the threshold value that triggers the threshold's action:<br> "Increasing"<br> "Decreasing"<br> "Either" |
| variant | Value of the reading that will trigger the threshold |

The trigger object also contains information about the reading for which the
trigger was defined. It takes the form of a structure consisting of three
fields.

| Field type | Description |
|------------|-------------|
| object path | D-Bus path of the sensor for which the reading trigger is defined. |
| string | Unique metric Id |
| object path | D-Bus path of an existing reading report. This is required when the trigger's action is to update a metric report. This path shall point to an existing reading report within the Telemetry service. |

**Trigger operations**

Triggers support three types of operation: Log, Event and Update. Each is
handled differently.

1. For action Log, the event shall be logged to the system journal. In this
case the Telemetry service writes data to the system journal using libjournal.
The Redfish log service shall then retrieve the data by reading the system
journal. This is shown in the diagram below.

```ascii
+---------------------------+
|bmcweb|                    |         +----------------------+
+------/    +-----------+-+ |         |Telemetry|            |
|           |Redfish    | | |         +---------/            |
|           |log service| | |         |                      |
|           +-----------/ | |         |                      |
|           |             | |         |                      |
|           |             | |         |                      |
|           +------^------+ |         +-----------+----------+
+---------------------------+                     |
                   |                              |
                   +----collect----+            event
                     journal entry |      (write to journal)
                                   |              |
       +------------------------------------+     |
       |systemd|                   |        |     |
       +-------/ +----------+  +---+------+ |     |
       |         |journal|  |  |libjournal| |     |
       |         +-------/  <-->          <-------+
       |         |          |  +----------+ |
       |         |          |               |
       |         |          |               |
       |         +----------+               |
       |                                    |
       +------------------------------------+
```

2. For action Event, the Telemetry service shall send an event using the
[Redfish Event Service][6], either as a push-style event or as SSE.

3. For action Update, the Telemetry service will trigger the update of the
reading report identified by the D-Bus path contained in the trigger object's
properties. The update shall cause the reading report's D-Bus object to emit a
property change signal. This will cause the Redfish Metric Report to be
streamed out if it was configured to do so.
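
The three trigger operations can be summarised as a simple dispatch, sketched
in Python here. The function `run_trigger_actions` and the injected handler
callables are hypothetical stand-ins for the journal write, the Redfish Event
Service and the report update; the sketch only shows how each action selects
its target and that Update requires a report path.

```python
from typing import Callable, Dict, List, Optional


def run_trigger_actions(actions: List[str], message: str,
                        report_path: Optional[str],
                        handlers: Dict[str, Callable[[str], None]]) -> List[str]:
    """Perform each configured action and return the ones that were run."""
    performed = []
    for action in actions:
        if action not in ("Log", "Event", "Update"):
            raise ValueError(f"unknown trigger action: {action}")
        if action == "Update":
            # Update acts on a reading report, so its D-Bus path is required.
            if report_path is None:
                raise ValueError("Update needs the path of a reading report")
            handlers[action](report_path)
        else:
            # Log and Event both carry the trigger message.
            handlers[action](message)
        performed.append(action)
    return performed


# Lists stand in for the journal, the event service and report updates.
journal, events, updates = [], [], []
handlers = {"Log": journal.append, "Event": events.append,
            "Update": updates.append}
done = run_trigger_actions(
    ["Log", "Update"], "UpperCritical threshold crossed",
    "/xyz/openbmc_project/Telemetry/Reports/PowerReport",  # illustrative path
    handlers,
)
```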

**Redfish Telemetry Service API**

The Redfish Telemetry Service shall support the 2019.1 Redfish schemas for
telemetry resources. Metric report definitions determine which metrics are to be
included in a metric report. A metric definition is assigned to a particular
metric type and describes how the metric should be interpreted. The following
resource schemas shall be supported:

- TelemetryService 1.1.2
- MetricDefinition 1.0.3
- MetricReportDefinition 1.3.0
- MetricReport 1.2.0
- Triggers 1.1.1

The following diagram shows the relations between these resources.

```ascii
 +----------------------------------------------------------------------------+
 |                             Service root                                   |
 +----------------------------------+-------------------------------+---------+
                                    |                               |
                                    |                               |
                                    |                               |
 +----------------------------------v-----------------+  +----------v---------+
 |                                                    |  |Chassis             |
 |                Telemetry Service                   |  |                    |
 |                                                    |  |                    |
 |                                                    |  |  +---------------+ |
 +---------+--------------+------------------+--------+  |  |               | |
           |              |                  |           |  |   Chassis 1   | |
           |              |                  |           |  |               | |
           |              |                  |           |  +---------+-----+ |
           |              |                  |           |            |       |
+----------v--+  +--------v----+  +----------v-----+     +--------------------+
|Triggers     |  |Metric       |  |Metric report   |                  |
|             |  |definition   |  |                |                  |
|             |  | +---------+ |  |                | Reads            |
| +---------+ |  | |Reading  | |  | +-----------+  | ReadingVolts  +--v------+
| |         | |  | |Volts    <------+           +------------------>         |
| |Trigger 1| |  | +---------+ |  | |  Metric   |  |               |         |
| |         | |  |             |  | | report 1  |  | Reads         |  Power  |
| |         | |  | +---------+ |  | |           |  | PowerConsumed |         |
| |         | |  | |         | |  | |           |  | Watts         |         |
| +--+---+--+ |  | |Power    <------+           +------------------>         |
|    |   |    |  | |Consumed | |  | +-----^-----+  |               +----^----+
|    |   |    |  | |Watts    | |  |       |        |                    |
|    |   |    |  | +---------+ |  |       |        |                    |
|    |   |    |  |             |  |       |        |                    |
+-------------+  +-------------+  +----------------+                    |
     |   |                                |                             |
     |   | Triggers report update         |                             |
     |   | (when applicable)              |                             |
     |   +--------------------------------+                             |
     |                                                                  |
     |   Monitors PowerConsumedWatts to check                           |
     |   whether trigger value is exceeded                              |
     +------------------------------------------------------------------+
```

The diagram shows the relations between Redfish resources. A metric report is
defined to be generated periodically, on demand, or on change. Each metric in
the Metric Report contains the URI of its metric definition and of the Redfish
sensor whose reading value is presented. Under this presentation layer, the
Telemetry service gathers D-Bus sensor readings and exposes them in reading
reports over D-Bus for the Redfish Telemetry Service. Each D-Bus sensor is
mapped to a Redfish sensor.

Examples of Redfish resources for the Telemetry Service are shown below.

The Telemetry Service Redfish resource example:

```json
{
    "@odata.type": "#TelemetryService.v1_1_2.TelemetryService",
    "Id": "TelemetryService",
    "Name": "Telemetry Service",
    "Status": {
        "State": "Enabled",
        "Health": "OK"
    },
    "MinCollectionInterval": "PT10S",
    "SupportedCollectionFunctions": [],
    "MaxReports": <max_no_of_reports>,
    "MetricDefinitions": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions"
    },
    "MetricReportDefinitions": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions"
    },
    "MetricReports": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReports"
    },
    "Triggers": {
        "@odata.id": "/redfish/v1/TelemetryService/Triggers"
    },
    "LogService": {
        "@odata.id": "/redfish/v1/Managers/<manager_name>/LogServices/Journal"
    },
    "@odata.context": "/redfish/v1/$metadata#TelemetryService",
    "@odata.id": "/redfish/v1/TelemetryService"
}
```

Sample metric report definition:

```json
{
    "@odata.type": "#MetricReportDefinition.v1_3_0.MetricReportDefinition",
    "Id": "SampleMetric",
    "Name": "Sample Metric Report Definition",
    "MetricReportDefinitionType": "Periodic",
    "Schedule": {
        "RecurrenceInterval": "PT10S"
    },
    "ReportActions": [
        "LogToMetricReportsCollection"
    ],
    "ReportUpdates": "Overwrite",
    "MetricReport": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
    },
    "Status": {
        "State": "Enabled"
    },
    "Metrics": [
        {
            "MetricId": "Test",
            "MetricProperties": [
                "/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts"
            ]
        }
    ],
    "@odata.context": "/redfish/v1/$metadata#MetricReportDefinition.MetricReportDefinition",
    "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
}
```
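
A client would typically assemble such a definition programmatically before
POSTing it to the MetricReportDefinitions collection. The following is a minimal
Python sketch and not part of the design: the helper name
`build_report_definition` and its parameters are hypothetical, and only
properties that appear in the sample above are filled in.

```python
import json


def build_report_definition(report_id, metric_properties, interval_seconds):
    """Return a periodic MetricReportDefinition body as a dict.

    Hypothetical helper: it mirrors only the properties used in the
    sample and does not validate against the Redfish schema.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        "Id": report_id,
        "Name": f"{report_id} Metric Report Definition",
        "MetricReportDefinitionType": "Periodic",
        # Redfish durations use the ISO 8601 form, for example PT10S.
        "Schedule": {"RecurrenceInterval": f"PT{interval_seconds}S"},
        "ReportActions": ["LogToMetricReportsCollection"],
        "ReportUpdates": "Overwrite",
        "Metrics": [
            {"MetricId": "Test", "MetricProperties": list(metric_properties)}
        ],
    }


if __name__ == "__main__":
    body = build_report_definition(
        "SampleMetric",
        ["/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts"],
        10,
    )
    print(json.dumps(body, indent=4))
```

Server-assigned properties such as `@odata.id` and `MetricReport` are left out
on purpose, since in this sketch they are assumed to be filled in by the service.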

Sample metric report:

```json
{
    "@odata.type": "#MetricReport.v1_2_0.MetricReport",
    "Id": "SampleMetric",
    "Name": "Sample Metric Report",
    "ReportSequence": "0",
    "MetricReportDefinition": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
    },
    "MetricValues": [
        {
            "MetricDefinition": {
                "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions/SampleMetricDefinition"
            },
            "MetricId": "Test",
            "MetricValue": "100",
            "Timestamp": "2016-11-08T12:25:00-05:00",
            "MetricProperty": "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
        }
    ],
    "@odata.context": "/redfish/v1/$metadata#MetricReport.MetricReport",
    "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
}
```
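
To illustrate how a consumer might read such a report, here is a small Python
sketch. The function name `extract_readings` is hypothetical, and the embedded
JSON is a trimmed copy of the sample report above so that the example runs on
its own.

```python
import json
from datetime import datetime

# Trimmed copy of the sample report, kept inline so the sketch runs as is.
SAMPLE_REPORT = """
{
    "Id": "SampleMetric",
    "MetricValues": [
        {
            "MetricId": "Test",
            "MetricValue": "100",
            "Timestamp": "2016-11-08T12:25:00-05:00",
            "MetricProperty": "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
        }
    ]
}
"""


def extract_readings(report):
    """Return (metric_id, value, timestamp) tuples from a MetricReport dict."""
    readings = []
    for entry in report.get("MetricValues", []):
        try:
            # MetricValue is carried as a string in the report.
            value = float(entry["MetricValue"])
        except (KeyError, TypeError, ValueError):
            continue  # skip missing or non-numeric values
        raw_time = entry.get("Timestamp")
        timestamp = datetime.fromisoformat(raw_time) if raw_time else None
        readings.append((entry.get("MetricId"), value, timestamp))
    return readings


if __name__ == "__main__":
    for metric_id, value, timestamp in extract_readings(json.loads(SAMPLE_REPORT)):
        print(metric_id, value, timestamp)
```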

Sample trigger that will trigger a metric report update:

```json
{
    "@odata.type": "#Triggers.v1_1_1.Triggers",
    "Id": "SampleTrigger",
    "Name": "Sample Trigger",
    "MetricType": "Numeric",
    "Links": {
        "MetricReportDefinitions": [
            {
                "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
            }
        ]
    },
    "Status": {
        "State": "Enabled"
    },
    "TriggerActions": [
        "RedfishMetricReport"
    ],
    "NumericThresholds": {
        "UpperCritical": {
            "Reading": 50,
            "Activation": "Increasing",
            "DwellTime": "PT0.001S"
        },
        "UpperWarning": {
            "Reading": 48.1,
            "Activation": "Increasing",
            "DwellTime": "PT0.004S"
        }
    },
    "MetricProperties": [
        "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
    ],
    "@odata.context": "/redfish/v1/$metadata#Triggers.Triggers",
    "@odata.id": "/redfish/v1/TelemetryService/Triggers/SampleTrigger"
}
```
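
The interplay of `Reading`, `Activation` and `DwellTime` can be sketched in a
few lines of Python. This is an illustration only and not the Telemetry
service's implementation: the function names are hypothetical, only the
`PT<seconds>S` duration form used in the sample is parsed, and the
interpretation of "Increasing" (the value rises from at or below the threshold
to above it and then stays above it for the dwell time) is a simplifying
assumption.

```python
import re


def parse_duration_seconds(text):
    """Parse the 'PT<seconds>S' subset of Redfish durations, e.g. 'PT0.004S'."""
    match = re.fullmatch(r"PT(\d+(?:\.\d+)?)S", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return float(match.group(1))


def first_activation(samples, reading, dwell_time):
    """Return the time at which an 'Increasing' threshold activates, or None.

    samples: iterable of (timestamp_seconds, value) pairs in time order.
    Assumption of this sketch: the value must rise from at or below
    `reading` to above it and stay above it for at least `dwell_time`.
    """
    dwell = parse_duration_seconds(dwell_time)
    previous = None     # value of the previous sample
    above_since = None  # time of the upward crossing being tracked
    for timestamp, value in samples:
        if value > reading:
            if above_since is None and previous is not None and previous <= reading:
                above_since = timestamp
            if above_since is not None and timestamp - above_since >= dwell:
                return timestamp
        else:
            above_since = None  # dropped back, restart tracking
        previous = value
    return None


if __name__ == "__main__":
    samples = [(0.0, 47.0), (0.010, 49.0), (0.012, 49.5), (0.020, 51.0), (0.030, 52.0)]
    print("UpperWarning:", first_activation(samples, 48.1, "PT0.004S"))
    print("UpperCritical:", first_activation(samples, 50, "PT0.001S"))
```

Because the check runs only when a sample arrives, activation in this sketch is
reported at the first sample taken after the dwell time has elapsed.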

**Performance tests**

Performance tests were conducted on an AST2500 system with 64 MB flash and
512 MB RAM. Flash consumption by the Telemetry service is 197.5 kB. The
runtime statistics are shown in the table below. Each reading report is
mapped to a single Metric Report. The runtime data is collected for the
Telemetry component only. All reports were created with the
`xyz.openbmc_project.Telemetry.Metric.OnChange` property to
maximize the workload. The configuration with 50 reports and 50 sensors
produces about 200 new readings per second, generating 200 reading reports
per second. The table shows CPU usage and memory usage. VSZ is the amount
of memory mapped into the address space of the process. It includes pages
backed by the process' executable file and shared libraries, its heap and
stack, as well as anything else it has mapped.

| Telemetry service state                          | VSZ    | %VSZ | %CPU |
|--------------------------------------------------|--------|------|------|
| Idle (0 reports, 0 sensors)                      |5188 kB | 1%   | 0%   |
| 1 report, 1 sensor                               |5188 kB | 1%   | 1%   |
| 2 reports, 1 sensor                              |5188 kB | 1%   | 1%   |
| 2 reports, 2 sensors (1 sensor per report)       |5188 kB | 1%   | 1%   |
| 1 report, 10 sensors                             |5188 kB | 1%   | 1%   |
| 10 reports, 10 sensors (same for each report)    |5320 kB | 1%   | 1-2% |
| 2 reports, 20 sensors (10 per report)            |5188 kB | 1%   | 1%   |
| 30 reports, 30 sensors (10 per report)           |5444 kB | 1%   | 5-9% |
| 50 reports, 50 sensors (10 per report)           |5572 kB | 1%   |11-14%|

The last two configurations use 10 sensors per reading report, which gives
3 or 5 distinct configurations. Each such configuration is used to
create 10 reading reports to obtain the desired amount of 30 or 50 reading
reports.
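
The arithmetic behind the last two rows can be sketched as follows. The
function name `plan_reports` and the use of integer indices for sensors are
assumptions made for illustration only.

```python
def plan_reports(total_sensors, sensors_per_report, reports_per_configuration):
    """Split sensors into distinct configurations and repeat each one.

    Returns (configurations, reports), where each report is the list of
    sensor indices it contains.
    """
    if total_sensors % sensors_per_report:
        raise ValueError("sensors must divide evenly into configurations")
    configurations = [
        list(range(start, start + sensors_per_report))
        for start in range(0, total_sensors, sensors_per_report)
    ]
    reports = [
        sensors
        for sensors in configurations
        for _ in range(reports_per_configuration)
    ]
    return configurations, reports


if __name__ == "__main__":
    for total in (30, 50):
        configurations, reports = plan_reports(total, 10, 10)
        print(total, "sensors:", len(configurations), "configurations,",
              len(reports), "reports")
```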

In this architecture, a reading report is created every time a Redfish
Metric Report Definition is posted (creating a new Metric Report).

## Alternatives Considered
The [framework based on collectd/librrd][5] was considered as an alternate
design. Although it seems to be a versatile and scalable solution, it has some
drawbacks from our point of view:
* Collectd's footprint in the minimal working configuration is around 2.6 MB,
while available space for the OpenBMC is limited to 64 MB.
* In this design, librrd is used to store metrics on the BMC's non-volatile
storage, which may be an issue when lots of metrics are captured and stored
to OpenBMC's limited storage space. Flash wear-out may also occur when
metrics are captured frequently (like once per second).
* Telemetry service is directly compatible with Redfish Telemetry Service API,
which means that Telemetry's reading reports can be directly mapped to Redfish
Metric Reports.
* Telemetry service unifies the way the BMC's telemetry is exposed over
Redfish and may be used with multiple front-ends, so support for telemetry
over IPMI or any other API can be added without difficulty.

Since this design assumes flexibility and modularity, there are no obstacles to
using collectd in cooperation with Telemetry. One possible configuration
is shown in the diagram below.

```ascii
   +-----------------+      +-----------------+
   |  D-Bus sensors  |      |   Telemetry     |
   +--------^--------+      +--------^--------+
            |                        |
            |                        |
            |                        |
<--------^--v-----------D-Bus--------v-^---------->
         |                             |
         |                             |
         |                             |
 +-------v------------+     +----------v--------+
 |  collectd metrics  |     |                   |
 |  exposed as D-Bus  |     |     bmcweb        |
 |      sensors       |     |  (with Redfish    |
 +---------^----------+     |    Telemetry      |
           |                |     Service)      |
           |                |                   |
    +------+-------+        +-------------------+
    |              |
    |   collectd   |
    |              |
    +--------------+
```
741```
742Here collectd is used as the source of some set of metrics. It exposes them
743as the D-Bus sensors, which can easily be consumed either by the bmcweb and
744Telemetry service without any changes in their D-Bus interfaces. In such
745configuration Telemetry service provides metric reports and triggers
746management.
747
748Other possible configuration is to use collectd without the Telemetry service,
749but in such case, collectd does not provide metric reports and triggers support
750compatible with the Redfish. In such case, Redfish Telemetry Service won't be
751supported or metric reports and triggers support has to be provided by the
752collectd.
753
754## Impacts
755This design impacts the architecture of the bmcweb component, since it adds
756the Redfish Telemetry Service implementation as a component for the existing
757Redfish API implementation.

## Testing
This is a very high-level description of the proposed set of tests.
Testing shall be done on three basic levels:
* Unit tests
* Functional tests
* Performance tests

**Unit tests**

The Telemetry code shall be covered by unit tests. The preferred
framework is [GTest/GMock][7]. The unit tests shall be run before a code
change is committed to make sure that nothing is broken in existing
functionality. Also, when new code is introduced, a new set of unit tests shall
be committed with it, according to the test-driven development principle. Unit
tests shall also be carefully reviewed.

**Functional tests**

Functional tests will be divided into two steps.

The first step tests Telemetry metric report management. The test scenario
shall contain creating a metric report by POSTing a proper metric report
definition, reading the metric report (using GET on the proper URI) and deleting
the metric report. The required configuration for such a test is D-Bus sensors
(at least some of them) and bmcweb with Redfish Telemetry Service implemented.
The tests shall be performed on real hardware. For ease of metric testing, dummy
D-Bus sensors may be used to provide specifically prepared metrics. This
configuration shall also enable testing aggregated operations (MIN, MAX, SUM,
AVG).
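
When dummy sensors emit known values, the expected result of each aggregated
operation can be computed on the test side and compared with the reported
metric. The Python sketch below shows one way to do that; the helper name
`expected_aggregate` and the sample readings are made up for illustration.

```python
# Reference implementations of the aggregated operations named above.
AGGREGATIONS = {
    "MIN": min,
    "MAX": max,
    "SUM": sum,
    "AVG": lambda values: sum(values) / len(values),
}


def expected_aggregate(operation, readings):
    """Return the value a test would expect for the given operation."""
    if not readings:
        raise ValueError("at least one reading is required")
    try:
        function = AGGREGATIONS[operation]
    except KeyError:
        raise ValueError(f"unknown operation: {operation!r}") from None
    return function(readings)


if __name__ == "__main__":
    dummy_readings = [10.0, 20.0, 60.0]  # values a dummy sensor would emit
    for operation in AGGREGATIONS:
        print(operation, expected_aggregate(operation, dummy_readings))
```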

The second step tests trigger and event generation. This also requires the
Event Service to be implemented along with the Log Service. Tests shall cover
all scenarios with sending a metric report as an event, triggering a metric
report update and logging events.

**Performance tests**

Performance tests shall be done using a full OpenBMC configuration with the
complete required set of features. The tests shall create a lot of metric reports
(up to the maximum number) along with all possible triggers. Measurements shall
cover the periodic metric report jitter, delays in event logging or sending,
the BMC's CPU utilization and the performance impact on other services.
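
One possible way to quantify periodic report jitter is to compare the spacing
of consecutive report timestamps with the configured interval. The Python
sketch below assumes timestamps already converted to seconds; the function
names and sample values are hypothetical.

```python
def interval_jitter(timestamps, nominal_interval):
    """Return per-interval deviations (seconds) from the nominal period."""
    if len(timestamps) < 2:
        raise ValueError("at least two timestamps are required")
    return [
        (later - earlier) - nominal_interval
        for earlier, later in zip(timestamps, timestamps[1:])
    ]


def max_abs_jitter(timestamps, nominal_interval):
    """Return the largest absolute deviation from the nominal period."""
    return max(abs(d) for d in interval_jitter(timestamps, nominal_interval))


if __name__ == "__main__":
    # Made-up arrival times of a report configured for a 10 s interval.
    arrivals = [0.0, 10.5, 20.0, 30.25]
    print(interval_jitter(arrivals, 10.0))
    print(max_abs_jitter(arrivals, 10.0))
```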

[1]: https://www.dmtf.org/sites/default/files/standards/documents/DSP2051_0.1.0a.zip
[2]: https://github.com/openbmc/docs/blob/master/architecture/sensor-architecture.md
[3]: https://www.kernel.org/doc/Documentation/hwmon/
[4]: https://www.freedesktop.org/wiki/Software/dbus/
[5]: https://gerrit.openbmc.org/c/openbmc/docs/+/22257
[6]: https://gerrit.openbmc.org/c/openbmc/docs/+/24749
[7]: https://github.com/google/googletest
[8]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Telemetry/ReportManager.interface.yaml
[9]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Telemetry/Report.interface.yaml
[10]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Object/Delete.interface.yaml
811