# OpenBMC platform telemetry

Author: Piotr Matuszczak <piotr.matuszczak@intel.com>

Other contributors: Pawel Rapkiewicz <pawel.rapkiewicz@intel.com> <pawelr>,
Kamil Kowalski <kamil.kowalski@intel.com>

Created: 2019-08-07

## Problem Description

The BMC on a server platform gathers a lot of telemetry data, which has to be
exposed in a clean, human-readable and standardized format. This document
focuses on telemetry over Redfish, since it is the standard API for platform
manageability.

## Background and References

- OpenBMC platform telemetry shall leverage DMTF's [Redfish Telemetry Model][1]
  for exposing platform telemetry over the network.
- OpenBMC platform telemetry shall leverage the [OpenBMC sensors architecture
  implementation][2].
- OpenBMC platform telemetry shall implement a service, called Telemetry, to
  deal with metric report and trigger management. This service is described
  later in this document.
- Although we use [hwmon][3] to gather readings from physical sensors, this
  architecture does not depend on it, because the Telemetry service component
  relies on the [OpenBMC D-Bus sensors][2].

## Requirements

- [OpenBMC D-Bus sensors][2] support. This is also a design limitation, since
  the Telemetry service requires telemetry sources to be implemented as D-Bus
  sensors.

## Proposed Design

Redfish Telemetry Model shall implement Telemetry Service with the following
collection resources:

- Metric Definitions - contains the metadata for metrics (unit, accuracy, etc.)
- Metric Report Definitions - defines how a metric report shall be created
  (which metrics it shall contain, how often it shall be generated, etc.)
- Metric Reports - contains actual metric reports containing telemetry data
  generated according to the Metric Report Definitions
- Metric Triggers - contains thresholds and actions that apply to specific
  metrics

OpenBMC telemetry architecture is shown on the diagram below.

```ascii
   +--------------+               +----------------+     +-----------------+
   |hwmon|        |               |Dbus sensors|   |     |Telemetry|       |
   +-----/        |               +------------/   |     +---------/       |
   |              +--filesystem--->                |     |                 |
   |              |               |                |     |                 |
   +--------------+               +--------^-------+     +--------^--------+
                                           |                      |
                                           |                      |
<------------------------------------------v-----^--DBus----------v----------->
                                                 |
                                                 |
+-------+---------------------------------------------------------------------+
|bmcweb |                                        |                            |
+-------/                                        |                            |
|                                                |                            |
| +--------+-------------------------------------v--------------------------+ |
| |Redfish |                                                                | |
| +--------/                                           +---------+-------+  | |
| |                                                    |Existing |       |  | |
| | +------------------------------------------------+ |Redfish  |       |  | |
| | |Telemetry Service|                              | |resources|       |  | |
| | +----------------+/                              | +---------/       |  | |
| | | +----------+  +-----------+  +-------------+   | |   +---------+   |  | |
| | | |  Metric  |  |  Metric   |  |Metric report|   | |   | Redfish |   |  | |
| | | | triggers |  |definitions|  |definitions  <---------+ sensors |   |  | |
| | | |          |  |           |  |             |   | |   |         |   |  | |
| | | +----+-----+  +-----+-----+  +------+------+   | |   +---------+   |  | |
| | |      |              |               |          | |                 |  | |
| | |      |              |               |          | |                 |  | |
| | |      |              |               |          | |                 |  | |
| | |      |        +-----v-----+         |          | |                 |  | |
| | |      |        |  Metric   |         |          | |                 |  | |
| | |      +-------->  report   <---------+          | |                 |  | |
| | |               |           |                    | |                 |  | |
| | |               +-----------+                    | |                 |  | |
| | |                                                | |                 |  | |
| | +------------------------------------------------+ +-----------------+  | |
| |                                                                         | |
| +-------------------------------------------------------------------------+ |
|                                                                             |
+-----------------------------------------------------------------------------+
```

The telemetry service component
is a part of Redfish and implements DMTF's
[Redfish Telemetry Model][1]. Metric report definitions use Redfish sensor
URIs for metric report creation. Those sensors are also used to get the
URI->D-Bus sensor mapping. Redfish Telemetry Service acts as the presentation
layer for telemetry, while the Telemetry service is responsible for gathering
metrics from D-Bus sensors and exposing them as D-Bus objects. The Telemetry
service supports different monitoring modes (periodic, on change and on demand)
along with aggregated operations:

- SINGLE - current reading value
- AVERAGE - average value over defined time period
- MAX - max reading value during defined time period
- MIN - min reading value during defined time period
- SUM - sum of reading values over defined time period

The time period for calculating an aggregated metric is taken from the Redfish
Metric Report Definition resource for each sensor's metric.

The Telemetry service supports creating and managing metric reports, which may
contain a single metric or multiple metrics from sensors. Such a metric report
is mapped to a Metric Report in the Redfish Telemetry Service.

The diagram below shows the flows for creation and update of a metric report.

```ascii
+----+              +------+              +---------+                 +-------+
|User|              |bmcweb|              |Telemetry|                 | D-Bus |
+-+--+              +--+---+              +----+----+                 |Sensors|
  |                    |                       |                      +---+---+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Metric report definition flow|                |                          |   |
+-----------------------------+                |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
| | POST request       |                       |                          |   |
| | with metric        |                       |                          |   |
| | report             |                       |                          |   |
| | definition         |                       |                          |   |
| +--------------------> Invoke AddReport      | Register for D-Bus       |   |
| |                    | method on D-Bus       | sensors                  |   |
| |                    +-----------------------> PropertiesChanged        |   |
| |                    |                       | signals                  |   |
| |                    |                       +-------------------------->   |
| |                    |                       |-------------------------->   |
| |                    |                       +-------------------------->   |
| |                    |                       |                          |   |
| | HTTP response      |                       +-+Create Report           |   |
| | code 201 with      | Return created        | |D-Bus object            |   |
| | Metric Report      | Report D-Bus path     <-+                        |   |
| | Definition's URI   <-----------------------+                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Periodic metric report update flow|           |                          |   |
+----------------------------------+           +-+Metric report           |   |
| |                    |                       | |timer triggers          |   |
| |                    |                       <-+report update           |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       | Sensor's Properties-     |   |
| +-------------------->                       | Changed signal           |   |
| |                    +-+Map report's URI     <--------------------------+   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     | +----------------------+ |   |
| |                    | Invoke GetAll method  | |Note that sensor's    | |   |
| |                    | on report D-Bus       | |PropertiesChanged     | |   |
| |                    | object                | |signal is asynchronous| |   |
| |                    +-----------------------> |to metric report timer| |   |
| |                    |                       | |This timer is the only| |   |
| | Return metric      | Return report data    | |thing that triggers   | |   |
| | report in JSON     <-----------------------+ |metric report update  | |   |
| | format             |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|On change metric report update flow|          | Sensor's Properties-     |   |
+-----------------------------------+          | Changed signal           |   |
| |                    |                       <--------------------------+   |
| |                    |                       |                          |   |
| |                    |                       +-+Sensor's signal         |   |
| |                    |                       | |triggers report         |   |
| |                    |                       <-+update                  |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        | +----------------------+ |   |
| |                    <-+                     | |Note that sensor's    | |   |
| |                    | Invoke GetAll method  | |PropertiesChanged     | |   |
| |                    | on report D-Bus       | |signal triggers the   | |   |
| |                    | object                | |report update. It is  | |   |
| |                    +-----------------------> |sufficient that the   | |   |
| |                    |                       | |signal from only one  | |   |
| | Return metric      | Return report data    | |sensor triggers report| |   |
| | report in JSON     <-----------------------+ |update.               | |   |
| | format             |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-+--------------------+------------------------------------------------------+
|On demand metric report update flow|          |                          |   |
+-+--------------------+------------+          |                          |   |
| |                    |                       |                          |   |
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     |                          |   |
| |                    |                       |                          |   |
| |                    | Invoke the Update     |                          |   |
| |                    | method for report     |                          |   |
| |                    | D-Bus object          |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       +-+Update method triggers  |   |
| |                    |                       | |report to be updated    |   |
| |                    |                       | |with the latest known   |   |
| |                    |                       | |sensor's readings.      |   |
| |                    |                       | |No additional sensor    |   |
| |                    |                       <-+readings are performed. |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |                    | Update method call    |                          |   |
| |                    | result                |                          |   |
| |                    <-----------------------+                          |   |
| |                    |                       |                          |   |
| |                    | Invoke GetAll method  |                          |   |
| |                    | on report D-Bus       |                          |   |
| |                    | object                |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       |                          |   |
| | Return metric      | Return report data    |                          |   |
| | report in JSON     <-----------------------+                          |   |
| | format             |                       |                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
```

The Redfish implementation in bmcweb is stateless, thus it is not able to store
metric reports. All operations on metric reports shall be done in the Telemetry
service. Sending a metric report as an SSE or push-style event shall be done via
the [Redfish Event Service][6]. It is marked as optional because a metric report
does not have to be configured for pushing its data through events.

In case of an on demand metric report update, the Telemetry service performs no
additional sensor readings because it already has the latest values, since they
are updated on the PropertiesChanged signal from the D-Bus sensors.

**Telemetry service on [D-Bus][4]**

Telemetry service exposes specific interfaces on D-Bus. One of them will be used
for reading report management. The second one will be used for trigger
management.

**Reading report management**

The reading report management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.ReportManager
/xyz/openbmc_project/Telemetry/Reports
```

The `ReportManager` implements the D-Bus interface
[`xyz.openbmc_project.Telemetry.ReportManager`][8] for report management. The
interface is described in phosphor-dbus-interfaces.
This interface
implements the `AddReport` method, which is used to create a metric report. The
report may contain a single sensor reading or multiple sensor readings. How the
report is stored by the BMC is defined by one of this method's parameters. The
`ReportManager` object implements a property that stores the maximum number of
reports supported simultaneously.

The `AddReport` method returns the path to the newly created report object. The
report object implements the [`xyz.openbmc_project.Object.Delete`][10] and
[`xyz.openbmc_project.Telemetry.Report`][9] interfaces. The [`Delete`][10]
interface is defined to add support for removing the Report object, while the
[`Report`][9] interface implements methods and properties for Report management
along with properties containing telemetry readings. Each report object contains
the timestamp of its last update. The report object also contains an array of
structures, each holding a reading with its metadata and the timestamp of the
last update of that metric. Each report also has a property that stores the
update interval (for periodically updated reports).

**Trigger management**

The trigger management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.TriggerManager
/xyz/openbmc_project/Telemetry/Triggers
```

The `TriggerManager` supports the `xyz.openbmc_project.Telemetry.TriggerManager`
interface, which implements the `AddTrigger` method. This method shall be used
to create a new trigger for a given metric. The method's parameters define the
type of metric for which the trigger is set (discrete or numeric). Depending on
this setting, the method accepts a different set of trigger parameters.

For discrete metric type, trigger parameters contain:

| Field            | Type                | Description |
| ---------------- | ------------------- | ----------- |
| TriggerCondition | enum                | Discrete trigger condition: <br> "Changed" - trigger occurs when value of metric has changed; <br> "Specified" - trigger occurs when value of metric becomes one of the values listed in the discrete triggers. |
| DiscreteTriggers | array of structures | Array of discrete trigger structures. |

Member of DiscreteTriggers array:

| Field     | Type    | Description |
| --------- | ------- | ----------- |
| TriggerId | string  | Unique trigger Id |
| Severity  | enum    | Severity: <br> "OK" - normal<br> "Warning" - requires attention<br> "Critical" - requires immediate attention |
| Value     | variant | Value of discrete metric that constitutes a trigger event. |
| DwellTime | uint64  | Time in milliseconds that a trigger occurrence persists before the action defined for this trigger is performed. |

For numeric metric type, trigger parameters contain numeric thresholds. The
numeric thresholds structure shall contain up to 4 thresholds: upper warning,
upper critical, lower warning and lower critical.
Thus it will contain up to 4
structures shown below:

| Field          | Type    | Description |
| -------------- | ------- | ----------- |
| ThresholdType  | enum    | Numeric trigger type: <br> "UpperCritical" - reading is above normal range and requires immediate attention<br>"UpperWarning" - reading is above normal range and may require attention<br>"LowerCritical" - reading is below normal range and requires immediate attention<br>"LowerWarning" - reading is below normal range and may require attention |
| DwellTime      | uint64  | Time in milliseconds that a trigger occurrence persists before the action defined for this trigger is performed. |
| Activation     | enum    | Indicates the direction of crossing the threshold value that triggers the threshold's action:<br> "Increasing" - trigger action when reading is changing from below to above the threshold's value<br> "Decreasing" - trigger action when reading is changing from above to below the threshold's value<br> "Either" - trigger action when reading is crossing the threshold's value in either direction described above |
| ThresholdValue | variant | Value of reading that will trigger the threshold |

The `AddTrigger` method also defines the action taken when the trigger is
activated. Three actions are possible upon trigger activation: logging an event
to the log service, sending an event via the event service and triggering a
metric report update.

In order to assign a trigger to a specific metric, the metric parameter is defined.
Its structure contains the following data:

| Field      | Type        | Description |
| ---------- | ----------- | ----------- |
| SensorPath | object path | D-Bus path to sensor, for which trigger is defined. |
| MetricId   | string      | Contains unique metric id that can be mapped to Redfish MetricId. |
| ReportPath | object path | D-Bus path to the Telemetry reading report whose update shall be triggered when the trigger condition occurs. This is optional and shall be filled when the trigger's action is set to update a metric report. |

The `AddTrigger` method also allows setting the trigger's persistency (whether
the trigger shall be stored in the BMC's non-volatile memory).

The `AddTrigger` method returns:

```ascii
String for created trigger - i.e. '/xyz/openbmc_project/Telemetry/Triggers/{Domain}/{Name}'
```

A trigger created this way implements the `xyz.openbmc_project.Object.Delete`
and the `xyz.openbmc_project.Telemetry.Trigger` interfaces. Each trigger object
contains read-only information about the metric type for which it was created
(discrete or numeric). This information determines which triggers are stored
within the trigger object.

If the trigger is defined for a discrete metric type, then it contains trigger
information that looks like this:

| Type                | Description |
| ------------------- | ----------- |
| enum                | Discrete trigger condition: <br> "Changed" - trigger occurs when value of metric has changed;<br>"Specified" - trigger occurs when value of metric becomes one of the values listed in the discrete triggers. |
| array of structures | Array of discrete trigger structures. |

Discrete trigger structure:

| Type    | Description |
| ------- | ----------- |
| string  | Unique trigger Id |
| enum    | Severity: <br> "OK" - normal<br>"Warning" - requires attention<br> "Critical" - requires immediate attention |
| variant | Value of discrete metric that constitutes a trigger event. |
| uint64  | Time in milliseconds that a trigger occurrence persists before the action defined in the `ActionType` is performed. |

If the trigger is defined for a numeric metric type, then it contains
information about numeric triggers in the form of an array of 4 structures
presented below:

| Type    | Description |
| ------- | ----------- |
| enum    | Numeric trigger type: <br> "UpperCritical"<br>"UpperWarning"<br>"LowerCritical"<br>"LowerWarning" |
| uint64  | Time in milliseconds that a trigger occurrence persists before the action defined in the `ActionType` is performed. |
| enum    | Indicates the direction of crossing the threshold value that triggers the threshold's action:<br> "Increasing"<br>"Decreasing"<br>"Either" |
| variant | Value of reading that will trigger the threshold |

The trigger object also contains information about the reading for which the
trigger was defined. It has the form of a structure consisting of three fields.

| Field type  | Description |
| ----------- | ----------- |
| object path | D-Bus path of sensor. This is a path to the sensor, for which reading trigger is defined. |
| string      | Unique metric Id |
| object path | D-Bus path of existing reading report. This is required when trigger's action is to update metric report. This path shall point to existing reading report within the Telemetry service. |

**Trigger operations**

Triggers support three types of operation: Log, Event and Update. Each of them
is handled in a different way.

1. For action Log, the event shall be logged to the system journal. In this case
   the Telemetry service writes data to the system journal using libjournal. The
   Redfish log service shall then retrieve the data by reading the system
   journal. This is shown on the diagram below.

```ascii
+---------------------------+
|bmcweb|                    |      +----------------------+
+------/ +-----------+-+    |      |Telemetry|            |
|        |Redfish    | |    |      +---------/            |
|        |log service| |    |      |                      |
|        +-----------/ |    |      |                      |
|        |             |    |      |                      |
|        |             |    |      |                      |
|        +------^------+    |      +-----------+----------+
+---------------------------+                  |
                |                              |
                +----collect----+            event
                  journal entry |     (write to journal)
                                |              |
    +------------------------------------+     |
    |systemd|                   |        |     |
    +-------/ +----------+  +---+------+ |     |
    |         |journal|  |  |libjournal| |     |
    |         +-------/  <-->          <-------+
    |         |          |  +----------+ |
    |         |          |               |
    |         |          |               |
    |         +----------+               |
    |                                    |
    +------------------------------------+
```

2. For action Event, the Telemetry service shall send an event using the
   [Redfish Event Service][6] either as a push-style event or SSE.

3. For action Update, the Telemetry service will trigger the update of the
   reading report pointed to by the D-Bus path contained in the trigger object
   properties. The update shall cause the reading report's D-Bus object to emit
   a property change signal. This will cause the Redfish Metric Report to be
   streamed out if it was configured to do so.
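
The numeric threshold fields described above (`ThresholdValue`, `Activation`,
`DwellTime`) can be illustrated with a small evaluation sketch. This is not the
Telemetry service's implementation: the names `NumericThreshold`,
`ThresholdMonitor` and `update` are hypothetical, and the sketch assumes that
the dwell time starts counting at the moment of crossing and that crossing back
in a direction that is not monitored cancels a pending action.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NumericThreshold:
    """Hypothetical mirror of one numeric threshold structure."""

    threshold_value: float
    activation: str  # "Increasing", "Decreasing" or "Either"
    dwell_time_ms: int


class ThresholdMonitor:
    """Decides when a threshold action should be performed (illustrative only)."""

    def __init__(self, threshold: NumericThreshold) -> None:
        self.threshold = threshold
        self._was_above: Optional[bool] = None  # side of the previous reading
        self._armed_at_ms: Optional[int] = None  # time of the pending crossing

    def update(self, timestamp_ms: int, value: float) -> bool:
        """Feed one reading; return True when the action should fire."""
        t = self.threshold
        above = value > t.threshold_value
        was_above, self._was_above = self._was_above, above

        if was_above is not None and above != was_above:
            direction = "Increasing" if above else "Decreasing"
            if t.activation in (direction, "Either"):
                # Monitored crossing: start (or restart) the dwell timer.
                self._armed_at_ms = timestamp_ms
            else:
                # Assumption: crossing back cancels a pending action.
                self._armed_at_ms = None

        if self._armed_at_ms is None:
            return False
        if timestamp_ms - self._armed_at_ms >= t.dwell_time_ms:
            self._armed_at_ms = None  # fire once per crossing
            return True
        return False
```

With a threshold of 50, activation `"Increasing"` and a dwell time of 1000 ms,
a reading that rises above 50 and stays there fires the action on the first
update at least 1000 ms after the crossing, while a reading that drops back
below 50 before that time does not fire at all.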

**Redfish Telemetry Service API**

Redfish Telemetry Service shall support 2019.1 Redfish schemas for telemetry
resources. Metric report definitions determine which metrics are to be included
in a metric report. A metric definition is assigned to a particular metric type
and describes how the metric should be interpreted. The following resource
schemas shall be supported:

- TelemetryService 1.1.2
- MetricDefinition 1.0.3
- MetricReportDefinition 1.3.0
- MetricReport 1.2.0
- Triggers 1.1.1

The following diagram shows relations between these resources.

```ascii
 +----------------------------------------------------------------------------+
 |                                Service root                                |
 +----------------------------------+-------------------------------+---------+
                                    |                               |
                                    |                               |
                                    |                               |
 +----------------------------------v-----------------+  +----------v---------+
 |                                                    |  |Chassis             |
 |                 Telemetry Service                  |  |                    |
 |                                                    |  |                    |
 |                                                    |  |  +---------------+ |
 +---------+--------------+------------------+--------+  |  |               | |
           |              |                  |           |  |   Chassis 1   | |
           |              |                  |           |  |               | |
           |              |                  |           |  +---------+-----+ |
           |              |                  |           |            |       |
+----------v--+  +--------v----+  +----------v-----+     +--------------------+
|Triggers     |  |Metric       |  |Metric report   |                  |
|             |  |definition   |  |                |                  |
|             |  | +---------+ |  |                | Reads            |
| +---------+ |  | |Reading  | |  | +-----------+  | ReadingVolts  +--v------+
| |         | |  | |Volts    <------+           +------------------>         |
| |Trigger 1| |  | +---------+ |  | |  Metric   |  |               |         |
| |         | |  |             |  | | report 1  |  | Reads         |  Power  |
| |         | |  | +---------+ |  | |           |  | PowerConsumed |         |
| |         | |  | |         | |  | |           |  | Watts         |         |
| +--+---+--+ |  | |Power    <------+           +------------------>         |
|    |   |    |  | |Consumed | |  | +-----^-----+  |               +----^----+
|    |   |    |  | |Watts    | |  |       |        |                    |
|    |   |    |  | +---------+ |  |       |        |                    |
|    |   |    |  |             |  |       |        |                    |
+-------------+  +-------------+  +----------------+                    |
     |   |                                |                             |
     |   |     Triggers report update     |                             |
     |   |       (when applicable)        |                             |
     |   +--------------------------------+                             |
     |                                                                  |
     |               Monitors PowerConsumedWatts to check               |
     |                whether trigger value is exceeded                 |
     +------------------------------------------------------------------+
```

The diagram shows the relations between Redfish resources. A metric report is
defined to be generated periodically, on demand or on change. Each metric in the
Metric Report contains the URI of its metric definition and of the Redfish
sensor whose reading value is presented. Nevertheless, under this presentation
layer, Telemetry is gathering D-Bus sensor readings and exposing them in reading
reports over D-Bus for the Telemetry Service. Each D-Bus sensor is mapped to a
Redfish sensor.

Examples of Redfish resources for the Telemetry Service are shown below.

The Telemetry Service Redfish resource example:

```json
{
  "@odata.type": "#TelemetryService.v1_1_2.TelemetryService",
  "Id": "TelemetryService",
  "Name": "Telemetry Service",
  "Status": {
    "State": "Enabled",
    "Health": "OK"
  },
  "MinCollectionInterval": "PT10S",
  "SupportedCollectionFunctions": [],
  "MaxReports": <max_no_of_reports>,
  "MetricDefinitions": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions"
  },
  "MetricReportDefinitions": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions"
  },
  "MetricReports": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricReports"
  },
  "Triggers": {
    "@odata.id": "/redfish/v1/TelemetryService/Triggers"
  },
  "LogService": {
    "@odata.id": "/redfish/v1/Managers/<manager_name>/LogServices/Journal"
  },
  "@odata.context": "/redfish/v1/$metadata#TelemetryService",
  "@odata.id": "/redfish/v1/TelemetryService"
}
```

Sample metric report definition:

```json
{
  "@odata.type": "#MetricReportDefinition.v1_3_0.MetricReportDefinition",
  "Id": "SampleMetric",
  "Name": "Sample Metric Report Definition",
  "MetricReportDefinitionType": "Periodic",
  "Schedule": {
    "RecurrenceInterval": "PT10S"
  },
  "ReportActions": ["LogToMetricReportsCollection"],
  "ReportUpdates": "Overwrite",
  "MetricReport": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
  },
  "Status": {
    "State": "Enabled"
  },
  "Metrics": [
    {
      "MetricId": "Test",
      "MetricProperties": [
        "/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts"
      ]
    }
  ],
  "@odata.context": "/redfish/v1/$metadata#MetricReportDefinition.MetricReportDefinition",
  "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/PlatformPowerUsage"
}
```

Sample metric report:

```json
{
  "@odata.type": "#MetricReport.v1_2_0.MetricReport",
  "Id": "SampleMetric",
  "Name": "Sample Metric Report",
  "ReportSequence": "0",
  "MetricReportDefinition": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
  },
  "MetricValues": [
    {
      "MetricDefinition": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions/SampleMetricDefinition"
      },
      "MetricId": "Test",
      "MetricValue": "100",
      "Timestamp": "2016-11-08T12:25:00-05:00",
      "MetricProperty": "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
    }
  ],
  "@odata.context": "/redfish/v1/$metadata#MetricReport.MetricReport",
  "@odata.id": "/redfish/v1/TelemetryService/MetricReports/AvgPlatformPowerUsage"
}
```

Sample trigger that will trigger a metric report update:

```json
{
  "@odata.type": "#Triggers.v1_1_1.Triggers",
  "Id": "SampleTrigger",
  "Name": "Sample Trigger",
  "MetricType": "Numeric",
  "Links": {
    "MetricReportDefinitions": [
      "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
    ]
  },
  "Status": {
    "State": "Enabled"
  },
  "TriggerActions": ["RedfishMetricReport"],
  "NumericThresholds": {
    "UpperCritical": {
      "Reading": 50,
      "Activation": "Increasing",
      "DwellTime": "PT0.001S"
    },
    "UpperWarning": {
      "Reading": 48.1,
      "Activation": "Increasing",
      "DwellTime": "PT0.004S"
    }
  },
  "MetricProperties": [
    "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
  ],
  "@odata.context": "/redfish/v1/$metadata#Triggers.Triggers",
  "@odata.id": "/redfish/v1/TelemetryService/Triggers/PlatformPowerCapTriggers"
}
```

**Performance tests**

Performance tests were conducted on an AST2500 system with 64 MB flash and 512
MB RAM. Flash consumption by the Telemetry service is 197.5 kB. The runtime
statistics are shown in the table below. Each reading report is mapped to a
single Metric Report. The runtime data is collected for the Telemetry component
only. All reports were created with the
`xyz.openbmc_project.Telemetry.Metric.OnChange` property to maximize the
workload. In the configuration with 50 reports and 50 sensors this results in
about 200 new readings per second, generating 200 reading reports per second.
The table shows CPU usage and memory usage. The VSZ is the amount of memory
mapped into the address space of the process. It includes pages backed by the
process' executable file and shared libraries, its heap and stack, as well as
anything else it has mapped.

| Telemetry service state                       | VSZ    | %VSZ | %CPU   |
| --------------------------------------------- | ------ | ---- | ------ |
| Idle (0 reports, 0 sensors)                   | 5188 B | 1%   | 0%     |
| 1 report, 1 sensor                            | 5188 B | 1%   | 1%     |
| 2 reports, 1 sensor                           | 5188 B | 1%   | 1%     |
| 2 reports, 2 sensors (1 sensor per report)    | 5188 B | 1%   | 1%     |
| 1 report, 10 sensors                          | 5188 B | 1%   | 1%     |
| 10 reports, 10 sensors (same for each report) | 5320 B | 1%   | 1-2%   |
| 2 reports, 20 sensors (10 per report)         | 5188 B | 1%   | 1%     |
| 30 reports, 30 sensors (10 per report)        | 5444 B | 1%   | 5-9%   |
| 50 reports, 50 sensors (10 per report)        | 5572 B | 1%   | 11-14% |

The last two configurations use 10 sensors per reading report, which gives 3 or
5 distinct configurations. Each such configuration is used to create 10
reading reports to obtain the desired amount of 30 or 50 reading reports.

In this architecture a reading report is created every time a Redfish Metric
Report Definition is posted (creating a new Metric Report).

## Alternatives Considered

The [framework based on collectd/librrd][5] was considered as an alternate
design. Although it seems to be a versatile and scalable solution, it has some
drawbacks from our point of view:

- Collectd's footprint in the minimal working configuration is around 2.6 MB,
  while available space for the OpenBMC is limited to 64 MB.
- In this design, librrd is used to store metrics on the BMC's non-volatile
  storage, which may be an issue when lots of metrics are captured and stored
  to OpenBMC's limited storage space. A flash wear-out issue may also occur
  when metrics are captured frequently (like once per second).
- Telemetry service is directly compatible with the Redfish Telemetry Service
  API, which means that Telemetry's reading reports can be directly mapped to
  Redfish Metric Reports.
- The Telemetry service unifies the way the BMC's telemetry is exposed over
  Redfish and may be used with multiple front-ends, so support for telemetry
  over IPMI or any other API can be added without difficulty.

Since this design assumes flexibility and modularity, there are no obstacles to
using collectd in cooperation with Telemetry. One possible configuration is
shown in the diagram below.

```ascii
   +-----------------+      +-----------------+
   |  D-Bus sensors  |      |    Telemetry    |
   +--------^--------+      +--------^--------+
            |                        |
            |                        |
            |                        |
<--------^--v-----------D-Bus--------v-^---------->
         |                             |
         |                             |
         |                             |
 +-------v------------+     +----------v--------+
 |  collectd metrics  |     |                   |
 |  exposed as D-Bus  |     |      bmcweb       |
 |  sensors           |     |   (with Redfish   |
 +---------^----------+     |     Telemetry     |
           |                |      Service)     |
           |                |                   |
    +------+-------+        +-------------------+
    |              |
    |   collectd   |
    |              |
    +--------------+
```

Here collectd is used as the source of some set of metrics. It exposes them as
D-Bus sensors, which can easily be consumed by both bmcweb and the Telemetry
service without any changes to their D-Bus interfaces. In such a configuration,
the Telemetry service provides metric report and trigger management.

Another possible configuration is to use collectd without the Telemetry
service, but in that case collectd does not provide metric report and trigger
support compatible with Redfish. The Redfish Telemetry Service then either
won't be supported, or metric report and trigger support has to be provided by
collectd.

## Impacts

This design impacts the architecture of the bmcweb component, since it adds the
Redfish Telemetry Service implementation as a component of the existing Redfish
API implementation.

## Testing

This is a very high-level description of the proposed set of tests.
Testing shall be done on three basic levels:

- Unit tests
- Functional tests
- Performance tests

**Unit tests**

The Telemetry code shall be covered by unit tests. The preferred framework is
[GTest/GMock][7]. The unit tests shall be run before a code change is committed
to make sure that nothing is broken in existing functionality. Also, when new
code is introduced, a new set of unit tests shall be committed with it,
according to the test-driven development principle. Unit tests shall also be
carefully reviewed.

**Functional tests**

Functional tests will be divided into two steps.

The first step tests the Telemetry metric report management. The test scenario
shall consist of creating a metric report by POSTing a proper metric report
definition, reading the metric report (using GET on the proper URI) and
deleting the metric report. The required configuration for such a test is D-Bus
sensors (at least some of them) and bmcweb with the Redfish Telemetry Service
implemented. The tests shall be performed on real hardware. For ease of metric
testing, dummy D-Bus sensors may be used to provide specifically prepared
metrics. This configuration shall also enable testing aggregated operations
(MIN, MAX, SUM, AVG).

The second step tests trigger and event generation. This also requires the
Event Service to be implemented along with the Log Service. Tests shall cover
all scenarios involving sending a metric report as an event, triggering a
metric report update and logging events.

**Performance tests**

Performance tests shall be done using a full OpenBMC configuration with the
complete required feature set. The tests shall create many metric reports (up
to the maximum number) along with all possible triggers. Measurements shall
cover the periodic metric report jitter, delays in event logging or sending,
the BMC's CPU utilization and the performance impact on other services.
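
As an illustration of the jitter measurement mentioned above, the following is
a minimal sketch, not part of the Telemetry service or of any existing test
suite. It assumes that a test harness has already recorded the arrival times of
a periodic metric report (in seconds) and that jitter is defined as the
absolute deviation of each observed period from the configured interval. The
function name `report_jitter` and the sample timestamps are hypothetical.

```python
from statistics import mean


def report_jitter(timestamps, interval):
    """Return (mean, max) absolute deviation of observed periods from interval.

    timestamps: report arrival times in seconds, in arrival order.
    interval:   the configured reporting interval in seconds.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    # Observed period between each pair of consecutive reports.
    periods = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # Jitter (by the assumption above) is the distance from the ideal period.
    deviations = [abs(p - interval) for p in periods]
    return mean(deviations), max(deviations)


# Hypothetical arrival times for a report configured with a 1 s interval.
observed = [0.0, 1.0, 2.5, 3.5, 4.5]
mean_jitter, max_jitter = report_jitter(observed, interval=1.0)
print(f"mean jitter: {mean_jitter:.3f} s, max jitter: {max_jitter:.3f} s")
```

In a real performance test the timestamps would come from the test client (for
example, the time each periodic Metric Report update is observed), and the
acceptable jitter bound would be chosen per platform.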

[1]:
  https://www.dmtf.org/sites/default/files/standards/documents/DSP2051_0.1.0a.zip
[2]:
  https://github.com/openbmc/docs/blob/master/architecture/sensor-architecture.md
[3]: https://www.kernel.org/doc/Documentation/hwmon/
[4]: https://www.freedesktop.org/wiki/Software/dbus/
[5]: https://gerrit.openbmc.org/c/openbmc/docs/+/22257
[6]: https://gerrit.openbmc.org/c/openbmc/docs/+/24749
[7]: https://github.com/google/googletest
[8]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Telemetry/ReportManager.interface.yaml
[9]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Telemetry/Report.interface.yaml
[10]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Object/Delete.interface.yaml