# OpenBMC platform telemetry

Author:
  Piotr Matuszczak <piotr.matuszczak@intel.com>

Primary assignee:
  Piotr Matuszczak

Other contributors:
  Pawel Rapkiewicz <pawel.rapkiewicz@intel.com> <pawelr>,
  Kamil Kowalski <kamil.kowalski@intel.com>

Created:
  2019-08-07

## Problem Description
The BMC on a server platform gathers a lot of telemetry data, which has to
be exposed in a clean, human-readable and standardized format. This document
focuses on telemetry over Redfish, since it is the standard API
for platform manageability.

## Background and References
* OpenBMC platform telemetry shall leverage DMTF's [Redfish Telemetry Model][1]
for exposing platform telemetry over the network.
* OpenBMC platform telemetry shall leverage the
[OpenBMC sensors architecture implementation][2].
* OpenBMC platform telemetry shall implement a service called Telemetry to deal
with metric report and trigger management. This service is described later in
this document.
* Although we use [hwmon][3] to gather readings from physical sensors, this
architecture does not depend on it, because the Telemetry service component
relies on the [OpenBMC D-Bus sensors][2].


## Requirements
* [OpenBMC D-Bus sensors][2] support. This is also a design limitation, since the
Telemetry service requires telemetry sources to be implemented as D-Bus sensors.


## Proposed Design
Redfish Telemetry Model shall implement Telemetry Service with the following
collection resources:
* Metric Definitions - contains the metadata for metrics (unit, accuracy, etc.)
* Metric Report Definitions - defines how a metric report shall be created
(which metrics it shall contain, how often it shall be generated, etc.)
* Metric Reports - contains actual metric reports containing telemetry data
generated according to the Metric Report Definitions
* Metric Triggers - contains thresholds and actions that apply to specific
metrics

The OpenBMC telemetry architecture is shown in the diagram below.

```ascii
   +--------------+               +----------------+     +-----------------+
   |hwmon|        |               |Dbus sensors|   |     |Telemetry|       |
   +-----/        |               +------------/   |     +---------/       |
   |              +--filesystem--->                |     |                 |
   |              |               |                |     |                 |
   +--------------+               +--------^-------+     +--------^--------+
                                           |                      |
                                           |                      |
<------------------------------------------v-----^--DBus----------v----------->
                                                 |
                                                 |
+-------+---------------------------------------------------------------------+
|bmcweb |                                        |                            |
+-------/                                        |                            |
|                                                |                            |
| +--------+-------------------------------------v--------------------------+ |
| |Redfish |                                                                | |
| +--------/                                            +---------+-------+ | |
| |                                                     |Existing |       | | |
| | +------------------------------------------------+  |Redfish  |       | | |
| | |Telemetry Service|                              |  |resources|       | | |
| | +----------------+/                              |  +---------/       | | |
| | |  +----------+  +-----------+  +-------------+  |  |   +---------+   | | |
| | |  |  Metric  |  |  Metric   |  |Metric report|  |  |   | Redfish |   | | |
| | |  | triggers |  |definitions|  |definitions  <---------+ sensors |   | | |
| | |  |          |  |           |  |             |  |  |   |         |   | | |
| | |  +----+-----+  +-----+-----+  +------+------+  |  |   +---------+   | | |
| | |       |              |               |         |  |                 | | |
| | |       |              |               |         |  |                 | | |
| | |       |              |               |         |  |                 | | |
| | |       |        +-----v-----+         |         |  |                 | | |
| | |       |        |   Metric  |         |         |  |                 | | |
| | |       +-------->   report  <---------+         |  |                 | | |
| | |                |           |                   |  |                 | | |
| | |                +-----------+                   |  |                 | | |
| | |                                                |  |                 | | |
| | +------------------------------------------------+  +-----------------+ | |
| |                                                                         | |
| +-------------------------------------------------------------------------+ |
|                                                                             |
+-----------------------------------------------------------------------------+
```

The telemetry service component is a part of Redfish and implements the DMTF's
[Redfish Telemetry Model][1]. Metric report definitions use Redfish sensor
URIs for metric report creation. Those sensors are also used to get the
URI->D-Bus sensor mapping. The Redfish Telemetry Service acts as the presentation
layer for the telemetry, while the Telemetry service is responsible for gathering
metrics from D-Bus sensors and exposing them as D-Bus objects. The Telemetry
service supports different monitoring modes (periodic, on change and on demand)
along with aggregated operations:
* SINGLE - current reading value
* AVERAGE - average value over defined time period
* MAX - max reading value during defined time period
* MIN - min reading value during defined time period
* SUM - sum of reading values over defined time period

The time period for calculating aggregated values is taken from the Redfish Metric
Definition resource for each sensor's metric.
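
The aggregated operations listed above can be illustrated with a minimal Python sketch. It is not the Telemetry service implementation: the `aggregate` function, the `(timestamp, value)` sample layout and the choice to apply the time window to SINGLE as well are assumptions made for illustration only.

```python
from statistics import fmean

# Operation names follow the list above; everything else is hypothetical.
AGGREGATIONS = {
    "SINGLE": lambda values: values[-1],  # most recent reading in the window
    "AVERAGE": fmean,
    "MAX": max,
    "MIN": min,
    "SUM": sum,
}


def aggregate(operation, samples, now, period):
    """Aggregate (timestamp, value) samples taken within `period` seconds
    before `now`. Returns None when the window holds no samples."""
    window = [value for ts, value in samples if now - period <= ts <= now]
    if not window:
        return None
    return AGGREGATIONS[operation](window)


# Four readings taken one second apart (made-up values).
samples = [(0.0, 40.0), (1.0, 42.0), (2.0, 47.0), (3.0, 45.0)]
print(aggregate("MAX", samples, now=3.0, period=2.0))
```

In this sketch the window is passed in by the caller; in the design above it would come from the Redfish Metric Definition of the given metric.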

The Telemetry service supports creating and managing metric reports, each of which may
contain single or multiple metrics from sensors. Such a metric report is mapped
to a Metric Report in the Redfish Telemetry Service.
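
A minimal sketch of how such a mapping could look is shown below. The Redfish URI prefix is an assumption based on the usual Redfish Telemetry Service layout, the D-Bus prefix is the Reports object path that appears later in this document, and both function names are hypothetical.

```python
# Assumed Redfish collection URI; not taken from this document.
REDFISH_REPORTS = "/redfish/v1/TelemetryService/MetricReports/"
# Parent path of report objects on D-Bus, as used later in this document.
DBUS_REPORTS = "/xyz/openbmc_project/Telemetry/Reports/"


def report_uri_to_dbus_path(uri):
    """Translate a Redfish Metric Report URI into a report D-Bus path."""
    if not uri.startswith(REDFISH_REPORTS):
        raise ValueError(f"not a Metric Report URI: {uri}")
    name = uri[len(REDFISH_REPORTS):].rstrip("/")
    if not name or "/" in name:
        raise ValueError(f"malformed Metric Report URI: {uri}")
    return DBUS_REPORTS + name


def dbus_path_to_report_uri(path):
    """Translate a report D-Bus path back into a Redfish URI."""
    if not path.startswith(DBUS_REPORTS):
        raise ValueError(f"not a report D-Bus path: {path}")
    return REDFISH_REPORTS + path[len(DBUS_REPORTS):]


print(report_uri_to_dbus_path(REDFISH_REPORTS + "testqrapndyY"))
```

Because the report name is shared by both sides, the stateless Redfish layer needs no lookup table in this sketch.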

The diagram below shows the flows for creating and updating a metric report.

```ascii
+----+              +------+              +---------+                 +-------+
|User|              |bmcweb|              |Telemetry|                 | D-Bus |
+-+--+              +--+---+              +----+----+                 |Sensors|
  |                    |                       |                      +---+---+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Metric report definition flow|                |                          |   |
+-----------------------------+                |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
| |    POST request    |                       |                          |   |
| |    with metric     |                       |                          |   |
| |    report          |                       |                          |   |
| |    definition      |                       |                          |   |
| +-------------------->  Invoke AddReport     |  Register for D-Bus      |   |
| |                    |  method on D-Bus      |  sensors                 |   |
| |                    +----------------------->  PropertiesChanged       |   |
| |                    |                       |  signals                 |   |
| |                    |                       +-------------------------->   |
| |                    |                       |-------------------------->   |
| |                    |                       +-------------------------->   |
| |                    |                       |                          |   |
| |  HTTP response     |                       +-+Create Report           |   |
| |  code 201 with     |  Return created       | |D-Bus object            |   |
| |  Metric Report     |  Report D-Bus path    <-+                        |   |
| |  Definition's URI  <-----------------------+                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Periodic metric report update flow|           |                          |   |
+----------------------------------+           +-+Metric report           |   |
| |                    |                       | |timer triggers          |   |
| |                    |                       <-+report update           |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |   Sensor's Properties-   |   |
| +-------------------->                       |   Changed signal         |   |
| |                    +-+Map report's URI     <--------------------------+   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     | +----------------------+ |   |
| |                    | Invoke GetAll method  | |Note that sensor's    | |   |
| |                    | on report D-Bus       | |PropertiesChanged     | |   |
| |                    | object                | |signal is asynchronous| |   |
| |                    +-----------------------> |to metric report timer| |   |
| |                    |                       | |This timer is the only| |   |
| |  Return metric     | Return report data    | |thing that triggers   | |   |
| |  report in JSON    <-----------------------+ |metric report update  | |   |
| |  format            |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|On change metric report update flow|          |   Sensor's Properties-   |   |
+-----------------------------------+          |   Changed signal         |   |
| |                    |                       <--------------------------+   |
| |                    |                       |                          |   |
| |                    |                       +-+Sensor's signal         |   |
| |                    |                       | |triggers report         |   |
| |                    |                       <-+update                  |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        | +----------------------+ |   |
| |                    <-+                     | |Note that sensor's    | |   |
| |                    | Invoke GetAll method  | |PropertiesChanged     | |   |
| |                    | on report D-Bus       | |signal triggers the   | |   |
| |                    | object                | |report update. It is  | |   |
| |                    +-----------------------> |sufficient that the   | |   |
| |                    |                       | |signal from only one  | |   |
| |  Return metric     | Return report data    | |sensor triggers report| |   |
| |  report in JSON    <-----------------------+ |update.               | |   |
| |  format            |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-+--------------------+------------------------------------------------------+
|On demand metric report update flow|          |                          |   |
+-+--------------------+------------+          |                          |   |
| |                    |                       |                          |   |
| |  GET on Metric     |                       |                          |   |
| |  Report URI        |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     |                          |   |
| |                    |                       |                          |   |
| |                    |  Invoke the Update    |                          |   |
| |                    |  method for report    |                          |   |
| |                    |  D-Bus object         |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       +-+Update method triggers  |   |
| |                    |                       | |report to be updated    |   |
| |                    |                       | |with the latest known   |   |
| |                    |                       | |sensor's readings.      |   |
| |                    |                       | |No additional sensor    |   |
| |                    |                       <-+readings are performed. |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| |  Send report as SSE or push-style event    |                          |   |
| |  using Redfish Event Service (not shown    |                          |   |
| |  here) if configured to do so.             |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |                    | Update method call    |                          |   |
| |                    | result                |                          |   |
| |                    <-----------------------+                          |   |
| |                    |                       |                          |   |
| |                    | Invoke GetAll method  |                          |   |
| |                    | on report D-Bus       |                          |   |
| |                    | object                |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       |                          |   |
| |  Return metric     | Return report data    |                          |   |
| |  report in JSON    <-----------------------+                          |   |
| |  format            |                       |                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
```

The Redfish implementation in bmcweb is stateless, thus it is not able to
store metric reports. All operations on metric reports shall be done in
the Telemetry service. Sending a metric report as an SSE or push-style event
shall be done via the [Redfish Event Service][6]. It is marked as optional
because a metric report does not have to be configured for pushing its data
through the event.

In the case of an on demand metric report update, the Telemetry service performs no
additional sensor readings because it already has the latest values, since
they are updated on the PropertiesChanged signal from the D-Bus sensors.
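
The caching behaviour described above can be sketched as follows. This is an illustrative in-memory model, not the Telemetry service code: the class name, its methods and the sensor path used in the example are hypothetical.

```python
class ReportSketch:
    """Hypothetical in-memory model of one metric report."""

    def __init__(self, sensor_paths):
        # Latest known value per sensor, refreshed by sensor signals.
        self.latest = {path: None for path in sensor_paths}
        # Snapshot that would be exposed as the report's readings.
        self.readings = {}

    def on_properties_changed(self, sensor_path, value):
        """Handle a sensor's PropertiesChanged signal: only update the cache."""
        if sensor_path in self.latest:
            self.latest[sensor_path] = value

    def update(self):
        """On demand update: publish the cached values, no sensor access."""
        self.readings = dict(self.latest)
        return self.readings


report = ReportSketch(["/example/sensors/temperature/cpu0"])
report.on_properties_changed("/example/sensors/temperature/cpu0", 41.5)
print(report.update())
```

The signal handler never publishes anything by itself in this model, which mirrors the note that an on demand update only reuses the latest known readings.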

**Telemetry service on [D-Bus][4]**

The Telemetry service exposes specific interfaces on D-Bus. One of them will be
used for reading report management. The second one will be used for trigger
management.

**Reading report management**

The reading report management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.ReportsManagement
/xyz/openbmc_project/Telemetry/Reports
```
The ```ReportsManagement``` object supports the following interface apart from
the standard D-Bus interfaces.

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.Telemetry.ReportsManagement``` | interface | - | - | - |
|```.AddReport```                          | method    | ssuas | s | - |
|```.MaxReports```                         | property  | u | 50 | emits-change |
|```.PollRateResolution```                 | property  | u | 100 | emits-change |

The ```AddReport``` method is used to create a metric report. The report
may contain single or multiple sensor readings. It is stored in the BMC's
volatile memory. The method has the following arguments:

| Argument | Type | Description |
|----------|------|-------------|
| Prefix | string | Defines the prefix for the report name, which will be "<prefix\><randomString\>", e.g. for prefix "test" -> testqrapndyY |
| ReportingType | string | Reporting type: <br> "xyz.openbmc_project.Telemetry.Metric.Periodic" - For periodic update <br> "xyz.openbmc_project.Telemetry.Metric.OnChange" - For update when value changes <br> "xyz.openbmc_project.Telemetry.Metric.OnRequest" - For update when user requests data |
| ScanPeriod | uint32_t | Scan period used when Periodic type is set (in milliseconds) |
| MetricsParams | array of structures | Collection of metric parameters. |

Each ```MetricsParams``` array entry is a structure containing:

| Field | Type | Description |
|----------|------|-------------|
| Sensor's path | object path | D-Bus path to the sensor providing readings. |
| Operation's type | enum | {SINGLE, MAX, MIN, AVG, SUM} - information about aggregated operation. |
| Metric id | string | Contains unique metric id, that can be mapped to Redfish MetricId. |

The ```ScanPeriod``` is defined per report, thus all sensors listed in the MetricsParams
collection will be scanned with the same frequency. The ReportingType is also
defined per report. When the *xyz.openbmc_project.Telemetry.Metric.OnChange*
ReportingType is set, the metric report will emit a signal when at least one
reading has changed.

The ```AddReport``` method returns:
```ascii
String for created report - e.g. '/xyz/openbmc_project/Telemetry/Reports/testqrapndyY'
```
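
The argument handling and naming scheme of ```AddReport``` can be sketched in Python as below. This is a simplified in-process model with no D-Bus involved. The `add_report` function, the `reports` dictionary, the suffix length and alphabet (guessed from the `testqrapndyY` example) and the sensor path are all assumptions.

```python
import secrets
import string

REPORTS_PATH = "/xyz/openbmc_project/Telemetry/Reports"
REPORTING_TYPES = {
    "xyz.openbmc_project.Telemetry.Metric." + name
    for name in ("Periodic", "OnChange", "OnRequest")
}
MAX_REPORTS = 50  # value shown for the MaxReports property

reports = {}  # report D-Bus path -> report parameters (volatile)


def add_report(prefix, reporting_type, scan_period_ms, metrics_params):
    """Mimic AddReport: validate the arguments and return the report path."""
    if reporting_type not in REPORTING_TYPES:
        raise ValueError(f"unknown ReportingType: {reporting_type}")
    if len(reports) >= MAX_REPORTS:
        raise RuntimeError("MaxReports limit reached")
    while True:
        # Suffix length and alphabet are guesses, see the lead-in.
        suffix = "".join(secrets.choice(string.ascii_letters) for _ in range(8))
        path = f"{REPORTS_PATH}/{prefix}{suffix}"
        if path not in reports:
            break
    reports[path] = (reporting_type, scan_period_ms, list(metrics_params))
    return path


path = add_report(
    "test",
    "xyz.openbmc_project.Telemetry.Metric.Periodic",
    1000,
    # (sensor path, operation type, metric id); the path is hypothetical.
    [("/xyz/openbmc_project/sensors/temperature/cpu0", "SINGLE", "cpu0_temp")],
)
print(path)
```

A single `scan_period_ms` per call reflects the rule that the scan period and the reporting type apply to the whole report rather than to individual metrics.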

A metric report created this way implements the following interfaces, methods and
properties (apart from the standard D-Bus interfaces):

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.Object.Delete```   | interface | - | - | - |
|```.Delete```                             | method    | - | - | - |
|```xyz.openbmc_project.Telemetry.Report``` | interface | - | - | - |
|```.Update```                             | method    | - | - | - |
|```.ReadingParameters```                  | property  | a(sos) | 1 "/" | emits-change writable |
|```.Readings```                           | property  | a(svs) | 0 | emits-change read-only |
|```.ReportingType```                      | property  | s | One of reporting type strings| emits-change writable |
|```.ScanPeriod```                         | property  | u | 100 | emits-change writable |

The ```Update``` method is defined for the on demand metric report update. It
shall trigger the ```Readings``` property to be updated and send a
PropertiesChanged signal.

The ```ReadingParameters``` property contains an array of structures containing
unique metric id, D-Bus sensor path and aggregated operation type. This
property is made writable in order to support metric report modifications.

| Field Type  | Field Description          |
|-------------|----------------------------|
| string      | Unique metric id           |
| object path | D-Bus sensor's path        |
| string      | Aggregated operation type  |

The ```Readings``` property contains an array of structures containing the unique
metric id, the sensor's reading value and the reading timestamp.

| Field Type | Field Description          |
|------------|----------------------------|
| string     | Unique metric id           |
| variant    | Sensor's reading value     |
| string     | Sensor's reading timestamp |
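
To show how the presentation layer might consume this property, the sketch below converts ```Readings``` entries into a list shaped like the `MetricValues` array of a Redfish Metric Report. The function name and sample data are hypothetical, and the member names (`MetricId`, `MetricValue`, `Timestamp`) as well as emitting the value as a string are assumptions about the Redfish schema that should be checked against it.

```python
import json


def readings_to_metric_values(readings):
    """Convert (metric id, value, timestamp) entries into dictionaries
    shaped like Redfish MetricValues members (assumed member names)."""
    return [
        {"MetricId": metric_id, "MetricValue": str(value), "Timestamp": timestamp}
        for metric_id, value, timestamp in readings
    ]


# Hypothetical content of the Readings property, signature a(svs).
readings = [
    ("cpu0_temp", 41.5, "2019-08-07T12:00:00Z"),
    ("fan0_rpm", 7200, "2019-08-07T12:00:00Z"),
]
print(json.dumps(readings_to_metric_values(readings), indent=2))
```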

The ```ScanPeriod``` property has a single value for the whole metric report.
The ```Delete``` method results in deleting the whole metric report.

The ```MaxReports``` property of
the ```xyz.openbmc_project.Telemetry.ReportsManagement``` interface contains the
max number of metric reports supported by the Telemetry service. This property
is added to be compliant with the Redfish Telemetry Service schema, which
contains the ```MaxReports``` property.

**Trigger management**

The trigger management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.TriggersManagement
/xyz/openbmc_project/Telemetry/Triggers
```
The ```TriggersManagement``` object supports the following interface apart from
the standard D-Bus interfaces.

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.Telemetry.TriggersManagement``` | interface | - | - | - |
|```.AddTrigger```                         | method    | sssv(os) | s | - |

The ```AddTrigger``` method shall be used to create a new trigger for a
certain metric. Triggers are stored in the BMC's volatile memory. The method
has the following arguments:

| Argument | Type | Description |
|----------|------|-------------|
| Prefix | string | Defines the prefix for the trigger name, which will be "<prefix\><randomString\>", e.g. for prefix "trigger" -> trigger0dfvAgVt6 |
| ActionType | string | Action type: <br> "xyz.openbmc_project.Telemetry.Trigger.Log" - For logging to log service <br> "xyz.openbmc_project.Telemetry.Trigger.Event" - For sending Redfish event <br> "xyz.openbmc_project.Telemetry.Trigger.Update" - For triggering metric report update |
| MetricType | string | Metric type: <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete" - for discrete sensors <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric" - for numeric sensors |
| TriggerParams | variant | Variant containing structure with either discrete triggers or numeric thresholds. |
| MetricParam | structure | Structure containing D-Bus sensor's path and unique metric Id and optional D-Bus path to metric report to trigger. |

The ```TriggerParams``` is a variant type, which shall contain a structure
depending on the ```MetricType``` value. When ```MetricType``` contains
the ```xyz.openbmc_project.Telemetry.Trigger.Discrete``` value,
```TriggerParams``` shall contain a structure with discrete triggers.
When ```MetricType``` contains
the ```xyz.openbmc_project.Telemetry.Trigger.Numeric``` value,
```TriggerParams``` shall contain a structure with numeric thresholds.

Discrete triggers structure:

| Field | Type | Description |
|-------|------|-------------|
| TriggerCondition | string | Discrete trigger condition: <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Changed" - trigger occurs when value of metric has changed; <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Specified" - trigger occurs when value of metric becomes one of the values listed in the discrete triggers. |
| DiscreteTriggers | array of structures | Array of discrete trigger structures. |

Member of DiscreteTriggers array:

| Field | Type | Description |
|-------|------|-------------|
| TriggerId| string     | Unique trigger Id |
| Severity | string     | Severity: <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Severity.OK" - normal, "xyz.openbmc_project.Telemetry.Trigger.Discrete.Severity.Warning" - requires attention, "xyz.openbmc_project.Telemetry.Trigger.Discrete.Severity.Critical" - requires immediate attention |
| Value | variant    | Value of discrete metric, that constitutes a trigger event. |
| DwellTime | uint64     | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |
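
The two discrete trigger conditions can be illustrated with a short Python sketch. It only decides whether a value change constitutes a trigger occurrence. Dwell time and severity handling are left out, and the function name and example values are hypothetical.

```python
DISCRETE = "xyz.openbmc_project.Telemetry.Trigger.Discrete."


def discrete_trigger_fires(condition, trigger_values, previous, current):
    """Decide whether moving from `previous` to `current` is a trigger
    occurrence. `trigger_values` holds the Value field of each member of
    the DiscreteTriggers array."""
    if current == previous:
        return False  # both conditions require the value to change
    if condition == DISCRETE + "Changed":
        return True
    if condition == DISCRETE + "Specified":
        return current in trigger_values
    raise ValueError(f"unknown TriggerCondition: {condition}")


# Made-up discrete states of a sensor.
print(discrete_trigger_fires(DISCRETE + "Specified", ["Fault"], "On", "Fault"))
```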

The numeric thresholds structure shall contain up to 4 thresholds: upper warning, upper critical,
lower warning and lower critical. Thus it will contain up to 4 structures shown below:

| Field | Type | Description |
|-------|------|-------------|
| ThresholdType | string | Numeric trigger type: <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.UpperCritical", "xyz.openbmc_project.Telemetry.Trigger.Numeric.UpperWarning", "xyz.openbmc_project.Telemetry.Trigger.Numeric.LowerCritical", "xyz.openbmc_project.Telemetry.Trigger.Numeric.LowerWarning" |
| DwellTime | uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |
| Activation | string | Indicates the direction of crossing the threshold value that triggers the threshold's action: "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Increasing", "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Decreasing", "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Either" |
| ThresholdValue | variant | Value of reading that will trigger the threshold |

The numeric threshold trigger type meaning:

- "xyz.openbmc_project.Telemetry.Trigger.Numeric.UpperCritical" -
indicates the reading is above normal range and requires immediate attention
- "xyz.openbmc_project.Telemetry.Trigger.Numeric.UpperWarning" -
indicates the reading is above normal range and may require attention
- "xyz.openbmc_project.Telemetry.Trigger.Numeric.LowerCritical" -
indicates the reading is below normal range and requires immediate attention
- "xyz.openbmc_project.Telemetry.Trigger.Numeric.LowerWarning" -
indicates the reading is below normal range and may require attention

The numeric threshold activation meaning:

- "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Increasing" -
trigger action when reading is changing from below to above the threshold's value
- "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Decreasing" -
trigger action when reading is changing from above to below the threshold's value
- "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Either" -
trigger action when reading is crossing the threshold's value in either direction
described above
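
The activation directions listed above can be expressed as a small Python sketch. It compares two consecutive readings against one threshold value. Treating a reading equal to the threshold as "above" is an assumption, dwell time is ignored, and the function names are hypothetical.

```python
ACTIVATION = "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation."


def crossing(previous, current, threshold):
    """Return 'Increasing', 'Decreasing' or None for one reading step.
    Assumption: a reading equal to the threshold counts as above it."""
    if previous < threshold <= current:
        return "Increasing"
    if previous >= threshold > current:
        return "Decreasing"
    return None


def threshold_activated(activation, previous, current, threshold):
    """Check whether the step from `previous` to `current` activates a
    threshold configured with the given Activation string."""
    if not activation.startswith(ACTIVATION):
        raise ValueError(f"unknown Activation: {activation}")
    direction = crossing(previous, current, threshold)
    if direction is None:
        return False
    wanted = activation[len(ACTIVATION):]
    return wanted in ("Either", direction)


# Reading rises from 70 to 80 across a threshold of 75 (made-up values).
print(threshold_activated(ACTIVATION + "Increasing", 70, 80, 75))
```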

The ```MetricParam``` structure contains the following data:

| Field | Type | Description |
|-------|------|-------------|
| SensorPath | object path | D-Bus path to sensor, for which trigger is defined. |
| MetricId   | string | Contains unique metric id, that can be mapped to Redfish MetricId. |
| ReportPath | object path | D-Bus path to the Telemetry reading report whose update shall be triggered when the trigger condition occurs. This is optional and shall be filled when the trigger's ActionType is set to "xyz.openbmc_project.Telemetry.Trigger.Update". |
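
The dependency between ```ActionType``` and the optional ReportPath field can be captured by a small validation helper. This is only a sketch of the rule stated in the table: the function name, the use of `None` for an absent path and the example paths are assumptions.

```python
UPDATE_ACTION = "xyz.openbmc_project.Telemetry.Trigger.Update"


def check_metric_param(action_type, sensor_path, metric_id, report_path=None):
    """Return the MetricParam fields as a tuple, enforcing that ReportPath
    is present when the trigger action is a metric report update."""
    if action_type == UPDATE_ACTION and not report_path:
        raise ValueError("ReportPath is required for the Update action")
    return (sensor_path, metric_id, report_path)


param = check_metric_param(
    UPDATE_ACTION,
    "/xyz/openbmc_project/sensors/temperature/cpu0",  # hypothetical sensor
    "cpu0_temp",
    "/xyz/openbmc_project/Telemetry/Reports/testqrapndyY",
)
print(param)
```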

The ```AddTrigger``` method returns:
```ascii
String for created trigger - e.g. '/xyz/openbmc_project/Telemetry/Triggers/trigger0dfvAgVt6'
```
A trigger created this way implements the following interfaces, methods and
properties (apart from the standard D-Bus interfaces):

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.Object.Delete```   | interface | - | - | - |
|```.Delete```                             | method    | - | - | - |
|```xyz.openbmc_project.Telemetry.Trigger``` | interface | - | - | - |
|```.MetricType```                         | property | s | One of the MetricType strings | emits-change read-only |
|```.Triggers```                           | property | {sa{ssvu64}} or a{su64sv} | The structure containing triggers. It depends on ```.MetricType``` property how the structure is defined. | emits-change writable |
|```.ActionType```                         | property | s | One of ActionType strings | emits-change writable |
|```.Metric```                             | property | (oso) | Structure containing details of metric, for which trigger is defined. | emits-change writable |
483
484The ```.MetricType``` property contains information about metric type for which
485trigger was created. It can be either discrete or numeric. This property is
486read-only, thus created trigger cannot be changed from discrete to numeric or
487from numeric to discrete. This also determines how the ```.Triggers``` property
488looks like on D-Bus.
489
490If ```.MetricType``` is equal to "xyz.openbmc_project.Telemetry.Trigger.Discrete"
491then ```.Triggers``` property contains discrete trigger that looks like this:
492
493| Type | Description |
494|------|-------------|
495| string | Discrete trigger condition: <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Changed" - trigger ocurs when value of metric has changed; "xyz.openbmc_project.Telemetry.Trigger.Discrete.Specified" - trigger occurs when value of metric becomes one of the values listed in the discrete triggers. |
496| array of structures | Array of discrete trigger structures. |
497
498Member of DiscreteTriggers array:
499
500| Type | Description |
501|------|-------------|
502| string     | Unique trigger Id |
503| string     | Severity: <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Severity.OK" - normal, "xyz.openbmc_project.Telemetry.Trigger.Discrete.Severity.Warning" - requires attention, "xyz.openbmc_project.Telemetry.Trigger.Discrete.Severity.Critical" - requires immediate attention |
504| variant    | Value of discrete metric, that constitutes a trigger event. |
505| uint64     | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |
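The two discrete conditions described above can be illustrated with a minimal Python sketch. The function name and its arguments are hypothetical and not part of the Telemetry service. The sketch also assumes that the very first reading (no previous value) never fires a trigger, and it leaves the dwell time out.

```python
from typing import Any, Optional, Sequence

PREFIX = "xyz.openbmc_project.Telemetry.Trigger.Discrete."
CHANGED = PREFIX + "Changed"
SPECIFIED = PREFIX + "Specified"


def discrete_trigger_fires(condition: str,
                           previous: Optional[Any],
                           current: Any,
                           listed_values: Sequence[Any] = ()) -> bool:
    """Return True when a new discrete reading satisfies the condition.

    Hypothetical helper: 'previous' is the last known value of the metric,
    'listed_values' are the values taken from the discrete trigger array.
    """
    # Assumption: without a previous value, or without a change, nothing fires.
    if previous is None or current == previous:
        return False
    if condition == CHANGED:
        return True
    if condition == SPECIFIED:
        # The metric has just become one of the listed values.
        return current in list(listed_values)
    raise ValueError(f"unknown discrete condition: {condition}")


print(discrete_trigger_fires(CHANGED, "OK", "Failed"))
print(discrete_trigger_fires(SPECIFIED, "OK", "Degraded", ["Failed"]))
```

With the ```Changed``` condition any new value fires, while ```Specified``` fires only when the new value is one of those listed.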

If ```.MetricType``` is equal to "xyz.openbmc_project.Telemetry.Trigger.Numeric"
then the ```.Triggers``` property contains a numeric trigger, which is an array of
4 structures, each with the following fields:

| Type | Description |
|------|-------------|
| string | Numeric trigger type: <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.UpperCritical", "xyz.openbmc_project.Telemetry.Trigger.Numeric.UpperWarning", "xyz.openbmc_project.Telemetry.Trigger.Numeric.LowerCritical", "xyz.openbmc_project.Telemetry.Trigger.Numeric.LowerWarning"|
| uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |
| string | Indicates the direction of crossing the threshold value that triggers the threshold's action: "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Increasing", "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Decreasing", "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Either" |
| variant | Value of the reading that will trigger the threshold |
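How the activation direction could be applied to two consecutive readings is sketched below in Python. The function name is hypothetical. The handling of a reading that is exactly equal to the threshold is an assumption (here a crossing only counts once the reading ends up strictly on the other side), and the dwell time is not modelled.

```python
PREFIX = "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation."
INCREASING = PREFIX + "Increasing"
DECREASING = PREFIX + "Decreasing"
EITHER = PREFIX + "Either"


def crosses_threshold(activation: str, previous: float, current: float,
                      threshold: float) -> bool:
    """Return True when the step previous -> current crosses the threshold
    in the direction selected by the activation string (hypothetical helper).
    """
    # Assumption: a crossing ends strictly on the far side of the threshold.
    rising = previous <= threshold < current    # from below to above
    falling = previous >= threshold > current   # from above to below
    if activation == INCREASING:
        return rising
    if activation == DECREASING:
        return falling
    if activation == EITHER:
        return rising or falling
    raise ValueError(f"unknown activation: {activation}")


# A reading moving from 49.0 to 51.0 crosses an upper threshold of 50.0
# upwards, so it matches Increasing and Either but not Decreasing.
for activation in (INCREASING, DECREASING, EITHER):
    print(activation.rsplit(".", 1)[-1],
          crosses_threshold(activation, 49.0, 51.0, 50.0))
```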

The ```.Metric``` property stores the details of the reading for which the trigger
was defined. It is a structure consisting of three fields.

| Field type | Description  |
|------------|--------------|
| object path  | D-Bus path of the sensor for which the reading trigger is defined. |
| string     | Unique metric Id |
| object path | D-Bus path of an existing reading report. This is required when the trigger's action is to update a metric report. The path shall point to an existing reading report within the Telemetry service. |

**Trigger operations**

Triggers support three types of operation: Log, Event and Update. Each of them
is handled differently.

1. For action Log, the event shall
be logged to the system journal. In this case the Telemetry service writes
data to the system journal using libjournal. The Redfish log service shall then
retrieve the data by reading the system journal. This flow is shown on the
diagram below.

```ascii
+---------------------------+
|bmcweb|                    |         +----------------------+
+------/    +-----------+-+ |         |Telemetry|            |
|           |Redfish    | | |         +---------/            |
|           |log service| | |         |                      |
|           +-----------/ | |         |                      |
|           |             | |         |                      |
|           |             | |         |                      |
|           +------^------+ |         +-----------+----------+
+---------------------------+                     |
                   |                              |
                   +----collect----+            event
                     journal entry |      (write to journal)
                                   |              |
       +------------------------------------+     |
       |systemd|                   |        |     |
       +-------/ +----------+  +---+------+ |     |
       |         |journal|  |  |libjournal| |     |
       |         +-------/  <-->          <-------+
       |         |          |  +----------+ |
       |         |          |               |
       |         |          |               |
       |         +----------+               |
       |                                    |
       +------------------------------------+
```
2. For action Event, the Telemetry service shall send an event using the
[Redfish Event Service][6], either as a push-style event or over SSE.

3. For action Update, the Telemetry service will trigger the update of the
reading report pointed to by the D-Bus path contained in the ReportPath field of
the ```.Metric``` structure. The update shall cause the reading report's D-Bus
object to emit a property change signal. This will cause the Redfish Metric
Report to be streamed out if it was configured to do so.

**Redfish Telemetry Service API**

Redfish Telemetry Service shall support the 2019.1 Redfish schemas for telemetry
resources. Metric report definitions determine which metrics are to be included
in a metric report. A metric definition is assigned to a particular metric type
and describes how the metric should be interpreted. The following resource
schemas shall be supported:

- TelemetryService 1.1.2
- MetricDefinition 1.0.3
- MetricReportDefinition 1.3.0
- MetricReport 1.2.0
- Triggers 1.1.1

The following diagram shows the relations between these resources.

```ascii
 +----------------------------------------------------------------------------+
 |                             Service root                                   |
 +----------------------------------+-------------------------------+---------+
                                    |                               |
                                    |                               |
                                    |                               |
 +----------------------------------v-----------------+  +----------v---------+
 |                                                    |  |Chassis             |
 |                Telemetry Service                   |  |                    |
 |                                                    |  |                    |
 |                                                    |  |  +---------------+ |
 +---------+--------------+------------------+--------+  |  |               | |
           |              |                  |           |  |   Chassis 1   | |
           |              |                  |           |  |               | |
           |              |                  |           |  +---------+-----+ |
           |              |                  |           |            |       |
+----------v--+  +--------v----+  +----------v-----+     +--------------------+
|Triggers     |  |Metric       |  |Metric report   |                  |
|             |  |definition   |  |                |                  |
|             |  | +---------+ |  |                | Reads            |
| +---------+ |  | |Reading  | |  | +-----------+  | ReadingVolts  +--v------+
| |         | |  | |Volts    <------+           +------------------>         |
| |Trigger 1| |  | +---------+ |  | |  Metric   |  |               |         |
| |         | |  |             |  | | report 1  |  | Reads         |  Power  |
| |         | |  | +---------+ |  | |           |  | PowerConsumed |         |
| |         | |  | |         | |  | |           |  | Watts         |         |
| +--+---+--+ |  | |Power    <------+           +------------------>         |
|    |   |    |  | |Consumed | |  | +-----^-----+  |               +----^----+
|    |   |    |  | |Watts    | |  |       |        |                    |
|    |   |    |  | +---------+ |  |       |        |                    |
|    |   |    |  |             |  |       |        |                    |
+-------------+  +-------------+  +----------------+                    |
     |   |                                |                             |
     |   | Triggers report update         |                             |
     |   | (when applicable)              |                             |
     |   +--------------------------------+                             |
     |                                                                  |
     |   Monitors PowerConsumedWatts to check                           |
     |   whether trigger value is exceeded                              |
     +------------------------------------------------------------------+
```

The diagram shows the relations between Redfish resources. A metric report is
defined to be generated periodically, on demand or on change. Each metric in the
Metric Report contains the URI of its metric definition and of the Redfish
sensor whose reading value is presented. Under this presentation layer,
Telemetry gathers D-Bus sensor readings and exposes them in reading reports
over D-Bus for the Redfish Telemetry Service. Each D-Bus sensor is mapped to
a Redfish sensor.

Examples of Redfish resources for the Telemetry Service are shown below.

The Telemetry Service Redfish resource example:

```json
{
    "@odata.type": "#TelemetryService.v1_1_2.TelemetryService",
    "Id": "TelemetryService",
    "Name": "Telemetry Service",
    "Status": {
        "State": "Enabled",
        "Health": "OK"
    },
    "MinCollectionInterval": "PT10S",
    "SupportedCollectionFunctions": [],
    "MaxReports": <max_no_of_reports>,
    "MetricDefinitions": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions"
    },
    "MetricReportDefinitions": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions"
    },
    "MetricReports": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReports"
    },
    "Triggers": {
        "@odata.id": "/redfish/v1/TelemetryService/Triggers"
    },
    "LogService": {
        "@odata.id": "/redfish/v1/Managers/<manager_name>/LogServices/Journal"
    },
    "@odata.context": "/redfish/v1/$metadata#TelemetryService",
    "@odata.id": "/redfish/v1/TelemetryService"
}
```

Sample metric report definition:

```json
{
    "@odata.type": "#MetricReportDefinition.v1_3_0.MetricReportDefinition",
    "Id": "SampleMetric",
    "Name": "Sample Metric Report Definition",
    "MetricReportDefinitionType": "Periodic",
    "Schedule": {
        "RecurrenceInterval": "PT10S"
    },
    "ReportActions": [
        "LogToMetricReportsCollection"
    ],
    "ReportUpdates": "Overwrite",
    "MetricReport": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
    },
    "Status": {
        "State": "Enabled"
    },
    "Metrics": [
        {
            "MetricId": "Test",
            "MetricProperties": [
                "/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts"
            ]
        }
    ],
    "@odata.context": "/redfish/v1/$metadata#MetricReportDefinition.MetricReportDefinition",
    "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
}
```

Sample metric report:

```json
{
    "@odata.type": "#MetricReport.v1_2_0.MetricReport",
    "Id": "SampleMetric",
    "Name": "Sample Metric Report",
    "ReportSequence": "0",
    "MetricReportDefinition": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
    },
    "MetricValues": [
        {
            "MetricDefinition": {
                "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions/SampleMetricDefinition"
            },
            "MetricId": "Test",
            "MetricValue": "100",
            "Timestamp": "2016-11-08T12:25:00-05:00",
            "MetricProperty": "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
        }
    ],
    "@odata.context": "/redfish/v1/$metadata#MetricReport.MetricReport",
    "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
}
```

Sample trigger that causes a metric report update:

```json
{
    "@odata.type": "#Triggers.v1_1_1.Triggers",
    "Id": "SampleTrigger",
    "Name": "Sample Trigger",
    "MetricType": "Numeric",
    "Links": {
        "MetricReportDefinitions": [
            {
                "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
            }
        ]
    },
    "Status": {
        "State": "Enabled"
    },
    "TriggerActions": [
        "RedfishMetricReport"
    ],
    "NumericThresholds": {
        "UpperCritical": {
            "Reading": 50,
            "Activation": "Increasing",
            "DwellTime": "PT0.001S"
        },
        "UpperWarning": {
            "Reading": 48.1,
            "Activation": "Increasing",
            "DwellTime": "PT0.004S"
        }
    },
    "MetricProperties": [
        "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
    ],
    "@odata.context": "/redfish/v1/$metadata#Triggers.Triggers",
    "@odata.id": "/redfish/v1/TelemetryService/Triggers/SampleTrigger"
}
```
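The Redfish trigger expresses ```DwellTime``` as a duration string such as "PT0.001S", while the D-Bus trigger structures carry the dwell time as a uint64 number of milliseconds, so some conversion between the two is needed. The Python sketch below only illustrates that mapping. The function name is hypothetical and this is not the bmcweb implementation; it handles days, hours, minutes and (fractional) seconds only.

```python
import re
from decimal import Decimal

# Duration subset assumed here: P[nD][T[nH][nM][n[.n]S]]
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def duration_to_ms(duration: str) -> int:
    """Convert a duration such as 'PT0.004S' to whole milliseconds."""
    match = _DURATION.match(duration)
    if match is None or duration in ("P", "PT"):
        raise ValueError(f"unsupported duration: {duration!r}")
    # Decimal avoids binary floating point rounding of fractional seconds.
    parts = {name: Decimal(value) if value else Decimal(0)
             for name, value in match.groupdict().items()}
    seconds = (parts["days"] * 86400 + parts["hours"] * 3600
               + parts["minutes"] * 60 + parts["seconds"])
    return int(seconds * 1000)


# Dwell times from the sample trigger above
print(duration_to_ms("PT0.001S"), duration_to_ms("PT0.004S"))
```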

**Performance tests**

Performance tests were conducted on an AST2500 system with 64 MB flash and
512 MB RAM. Flash consumption by the Telemetry service is 197.5 kB. The
runtime statistics are shown in the table below. Each reading report is
mapped to a single Metric Report. The runtime data is collected for the
Telemetry component only. All reports were created with the
```xyz.openbmc_project.Telemetry.Metric.OnChange``` property to
maximize the workload. In the configuration with 50 reports and 50 sensors
this amounts to about 200 new readings per second, generating 200 reading
reports per second. The table shows CPU usage and memory usage. The VSZ is the
amount of memory mapped into the address space of the process. It includes pages
backed by the process' executable file and shared libraries, its heap and
stack, as well as anything else it has mapped.

| Telemetry service state                          | VSZ   | %VSZ | %CPU |
|--------------------------------------------------|-------|------|------|
| Idle (0 reports, 0 sensors)                      |5188 kB| 1%   | 0%   |
| 1 report, 1 sensor                               |5188 kB| 1%   | 1%   |
| 2 reports, 1 sensor                              |5188 kB| 1%   | 1%   |
| 2 reports, 2 sensors (1 sensor per report)       |5188 kB| 1%   | 1%   |
| 1 report, 10 sensors                             |5188 kB| 1%   | 1%   |
| 10 reports, 10 sensors (same for each report)    |5320 kB| 1%   | 1-2% |
| 2 reports, 20 sensors (10 per report)            |5188 kB| 1%   | 1%   |
| 30 reports, 30 sensors (10 per report)           |5444 kB| 1%   | 5-9% |
| 50 reports, 50 sensors (10 per report)           |5572 kB| 1%   |11-14%|

The last two configurations use 10 sensors per reading report, which gives
3 or 5 distinct configurations. Each such configuration is used to
create 10 reading reports to obtain the desired number of 30 or 50 reading
reports.
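The %VSZ column can be sanity-checked against the 512 MB of RAM of the test system. The short Python sketch below does the arithmetic under the assumption that the VSZ figures are kilobytes, which is the reading under which the reported value of roughly 1% comes out. The helper name is hypothetical.

```python
RAM_KB = 512 * 1024  # 512 MB of RAM on the tested AST2500 system


def vsz_percent(vsz_kb: int, ram_kb: int = RAM_KB) -> float:
    """Share of RAM covered by the given VSZ, in percent."""
    return 100.0 * vsz_kb / ram_kb


# Smallest and largest VSZ figures from the table above
for vsz_kb in (5188, 5572):
    print(f"{vsz_kb} kB is {vsz_percent(vsz_kb):.2f}% of RAM")
```

Both extremes round to 1%, which matches the constant %VSZ column.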

In this architecture a reading report is created every time a Redfish
Metric Report Definition is posted (creating a new Metric Report).

## Alternatives Considered
The [framework based on collectd/librrd][5] was considered as an alternative
design. Although it seems to be a versatile and scalable solution, it has some
drawbacks from our point of view:
* Collectd's footprint in the minimal working configuration is around 2.6 MB,
while the available space for OpenBMC is limited to 64 MB.
* In that design, librrd is used to store metrics on the BMC's non-volatile
storage, which may be an issue when lots of metrics are captured and stored
in OpenBMC's limited storage space. Flash wear-out may also occur when
metrics are captured frequently (e.g. once per second).
* The Telemetry service is directly compatible with the Redfish Telemetry
Service API, which means that Telemetry's reading reports can be directly
mapped to Redfish Metric Reports.
* The Telemetry service unifies the way the BMC's telemetry is exposed over
Redfish and may be used with multiple front-ends, so support for telemetry over
IPMI or any other API can be added without difficulty.

Since this design assumes flexibility and modularity, there are no obstacles to
using collectd in cooperation with Telemetry. One of the possible
configurations is shown on the diagram below.

```ascii
   +-----------------+      +-----------------+
   |  D-Bus sensors  |      |   Telemetry     |
   +--------^--------+      +--------^--------+
            |                        |
            |                        |
            |                        |
<--------^--v-----------D-Bus--------v-^---------->
         |                             |
         |                             |
         |                             |
 +-------v------------+     +----------v--------+
 |  collectd metrics  |     |                   |
 |  exposed as D-Bus  |     |     bmcweb        |
 |      sensors       |     |  (with Redfish    |
 +---------^----------+     |    Telemetry      |
           |                |     Service)      |
           |                |                   |
    +------+-------+        +-------------------+
    |              |
    |   collectd   |
    |              |
    +--------------+
```
Here collectd is used as the source of some set of metrics. It exposes them
as D-Bus sensors, which can easily be consumed by both bmcweb and the
Telemetry service without any changes in their D-Bus interfaces. In such a
configuration the Telemetry service provides metric report and trigger
management.

Another possible configuration is to use collectd without the Telemetry service,
but in that case collectd does not provide metric report and trigger support
compatible with Redfish. The Redfish Telemetry Service then either won't be
supported, or metric report and trigger support has to be provided by
collectd.

## Impacts
This design impacts the architecture of the bmcweb component, since it adds
the Redfish Telemetry Service implementation as a component of the existing
Redfish API implementation.

## Testing
This is a very high-level description of the proposed set of tests.
Testing shall be done on three basic levels:
* Unit tests
* Functional tests
* Performance tests

**Unit tests**

The Telemetry code shall be covered by unit tests. The preferred
framework is [GTest/GMock][7]. The unit tests shall be run before a code
change is committed to make sure that nothing is broken in the existing
functionality. Also, when new code is introduced, a new set of unit tests shall
be committed with it, according to the test-driven development principle. Unit
tests shall also be carefully reviewed.

**Functional tests**

Functional tests will be divided into two steps.

The first step tests the Telemetry metric report management. The test scenario
shall contain creating a metric report by POSTing a proper metric report
definition, reading the metric report (using GET on the proper URI) and deleting
the metric report. The required configuration for such a test is D-Bus sensors
(at least some of them) and bmcweb with the Redfish Telemetry Service
implemented. The tests shall be performed on real hardware. For ease of metric
testing, dummy D-Bus sensors may be used to supply specifically prepared
metrics. This configuration shall also enable testing aggregated operations
(MIN, MAX, SUM, AVG).
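The expected results for the aggregated operations can be computed up front from the readings that a dummy sensor is scripted to produce. The Python sketch below is only an illustration of such a reference calculation: the ```AGGREGATORS``` mapping, the ```aggregate``` helper and the sample readings are hypothetical and do not come from the Telemetry service.

```python
from statistics import fmean
from typing import Callable, Dict, Sequence

# Hypothetical mapping from operation name to its reference implementation.
AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "MIN": min,
    "MAX": max,
    "SUM": sum,
    "AVG": fmean,
}


def aggregate(operation: str, readings: Sequence[float]) -> float:
    """Reference result of an aggregated operation over dummy readings."""
    if not readings:
        raise ValueError("no readings to aggregate")
    return AGGREGATORS[operation](readings)


# Readings that a dummy D-Bus sensor could be scripted to produce.
dummy_readings = [10.0, 20.0, 30.0, 40.0]
for operation in AGGREGATORS:
    print(operation, aggregate(operation, dummy_readings))
```

A functional test could then compare the value found in the metric report with the reference result for the same sequence of readings.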

The second step tests trigger and event generation. This also requires the
Event Service to be implemented along with the Log Service. Tests shall cover
all scenarios: sending a metric report as an event, triggering a metric report
update and logging events.

**Performance tests**

Performance tests shall be done using a full OpenBMC configuration with the
complete required set of features. The tests shall create a large number of
metric reports (up to the maximum number) along with all possible triggers.
Measurements shall cover the periodic metric report jitter, delays in event
logging or sending, the BMC's CPU utilization and the performance impact on
other services.

[1]: https://www.dmtf.org/sites/default/files/standards/documents/DSP2051_0.1.0a.zip
[2]: https://github.com/openbmc/docs/blob/master/architecture/sensor-architecture.md
[3]: https://www.kernel.org/doc/Documentation/hwmon/
[4]: https://www.freedesktop.org/wiki/Software/dbus/
[5]: https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/22257
[6]: https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/24749
[7]: https://github.com/google/googletest