# OpenBMC platform telemetry

Author:
  Piotr Matuszczak <piotr.matuszczak@intel.com>

Primary assignee:
  Piotr Matuszczak

Other contributors:
  Pawel Rapkiewicz <pawel.rapkiewicz@intel.com> <pawelr>,
  Kamil Kowalski <kamil.kowalski@intel.com>

Created:
  2019-08-07

## Problem Description
The BMC on a server platform gathers a lot of telemetry data, which has to
be exposed in a clean, human-readable and standardized format. This document
focuses on telemetry over Redfish, since it is the standard API
for platform manageability.

## Background and References
* OpenBMC platform telemetry shall leverage DMTF's [Redfish Telemetry Model][1]
for exposing platform telemetry over the network.
* OpenBMC platform telemetry shall leverage the
[OpenBMC sensors architecture implementation][2].
* OpenBMC platform telemetry shall implement a service, called Telemetry, to deal
with metric report and trigger management. This service is described later in
this document.
* Although we use [hwmon][3] to gather readings from physical sensors, this
architecture does not depend on it, because the Telemetry service component
relies on the [OpenBMC D-Bus sensors][2].


## Requirements
* [OpenBMC D-Bus sensors][2] support. This is also a design limitation, since the
Telemetry service requires telemetry sources to be implemented as D-Bus sensors.


## Proposed Design
The Redfish Telemetry Model shall implement the Telemetry Service with the following
collection resources:
* Metric Definitions - contains the metadata for metrics (unit, accuracy, etc.)
* Metric Report Definitions - defines how a metric report shall be created
(which metrics it shall contain, how often it shall be generated, etc.)
* Metric Reports - contains actual metric reports containing telemetry data
generated according to the Metric Report Definitions
* Metric Triggers - contains thresholds and actions that apply to specific
metrics

The OpenBMC telemetry architecture is shown in the diagram below.

```ascii
   +--------------+               +----------------+     +-----------------+
   |hwmon|        |               |Dbus sensors|   |     |Telemetry|       |
   +-----/        |               +------------/   |     +---------/       |
   |              +--filesystem--->                |     |                 |
   |              |               |                |     |                 |
   +--------------+               +--------^-------+     +--------^--------+
                                           |                      |
                                           |                      |
<------------------------------------------v-----^--DBus----------v----------->
                                                 |
                                                 |
+-------+---------------------------------------------------------------------+
|bmcweb |                                        |                            |
+-------/                                        |                            |
|                                                |                            |
| +--------+-------------------------------------v--------------------------+ |
| |Redfish |                                                                | |
| +--------/                                           +---------+-------+  | |
| |                                                    |Existing |       |  | |
| | +------------------------------------------------+ |Redfish  |       |  | |
| | |Telemetry Service|                              | |resources|       |  | |
| | +----------------+/                              | +---------/       |  | |
| | | +----------+  +-----------+  +-------------+   | |   +---------+   |  | |
| | | |  Metric  |  |  Metric   |  |Metric report|   | |   | Redfish |   |  | |
| | | | triggers |  |definitions|  |definitions  <---------+ sensors |   |  | |
| | | |          |  |           |  |             |   | |   |         |   |  | |
| | | +----+-----+  +-----+-----+  +------+------+   | |   +---------+   |  | |
| | |      |              |               |          | |                 |  | |
| | |      |              |               |          | |                 |  | |
| | |      |              |               |          | |                 |  | |
| | |      |        +-----v-----+         |          | |                 |  | |
| | |      |        |  Metric   |         |          | |                 |  | |
| | |      +-------->  report   <---------+          | |                 |  | |
| | |               |           |                    | |                 |  | |
| | |               +-----------+                    | |                 |  | |
| | |                                                | |                 |  | |
| | +------------------------------------------------+ +-----------------+  | |
| |                                                                         | |
| +-------------------------------------------------------------------------+ |
|                                                                             |
+-----------------------------------------------------------------------------+
```

The telemetry service component
is a part of Redfish and implements the DMTF's
[Redfish Telemetry Model][1]. Metric report definitions use Redfish sensor
URIs for metric report creation. Those sensors are also used to get the
URI->D-Bus sensor mapping. The Redfish Telemetry Service acts as the presentation
layer for the telemetry, while the Telemetry service is responsible for gathering
metrics from D-Bus sensors and exposing them as D-Bus objects. The Telemetry
service supports different monitoring modes (periodic, on change and on demand)
along with aggregated operations:
* SINGLE - current reading value
* AVERAGE - average value over a defined time period
* MAX - maximum reading value during a defined time period
* MIN - minimum reading value during a defined time period
* SUM - sum of reading values over a defined time period

The time period for calculating an aggregated metric is taken from the Redfish
Metric Report Definition resource for each sensor's metric.

The Telemetry service supports creating and managing metric reports, each of
which may contain a single metric or multiple metrics from sensors. Such a
metric report is mapped to a Metric Report in the Redfish Telemetry Service.

The flows for creating and updating a metric report are shown below.

```ascii
+----+              +------+              +---------+                 +-------+
|User|              |bmcweb|              |Telemetry|                 | D-Bus |
+-+--+              +--+---+              +----+----+                 |Sensors|
  |                    |                       |                      +---+---+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Metric report definition flow|                |                          |   |
+-----------------------------+                |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
| | POST request       |                       |                          |   |
| | with metric        |                       |                          |   |
| | report             |                       |                          |   |
| | definition         |                       |                          |   |
| +--------------------> Invoke AddReport      | Register for D-Bus       |   |
| |                    | method on D-Bus       | sensors                  |   |
| |                    +-----------------------> PropertiesChanged        |   |
| |                    |                       | signals                  |   |
| |                    |                       +-------------------------->   |
| |                    |                       |-------------------------->   |
| |                    |                       +-------------------------->   |
| |                    |                       |                          |   |
| | HTTP response      |                       +-+Create Report           |   |
| | code 201 with      | Return created        | |D-Bus object            |   |
| | Metric Report      | Report D-Bus path     <-+                        |   |
| | Definition's URI   <-----------------------+                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Periodic metric report update flow|           |                          |   |
+----------------------------------+           +-+Metric report           |   |
| |                    |                       | |timer triggers          |   |
| |                    |                       <-+report update           |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       | Sensor's Properties-     |   |
| +-------------------->                       | Changed signal           |   |
| |                    +-+Map report's URI     <--------------------------+   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     | +----------------------+ |   |
| |                    | Invoke GetAll method  | |Note that sensor's    | |   |
| |                    | on report D-Bus       | |PropertiesChanged     | |   |
| |                    | object                | |signal is asynchronous| |   |
| |                    +-----------------------> |to metric report timer| |   |
| |                    |                       | |This timer is the only| |   |
| | Return metric      | Return report data    | |thing that triggers   | |   |
| | report in JSON     <-----------------------+ |metric report update  | |   |
| | format             |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|On change metric report update flow|          | Sensor's Properties-     |   |
+-----------------------------------+          | Changed signal           |   |
| |                    |                       <--------------------------+   |
| |                    |                       |                          |   |
| |                    |                       +-+Sensor's signal         |   |
| |                    |                       | |triggers report         |   |
| |                    |                       <-+update                  |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        | +----------------------+ |   |
| |                    <-+                     | |Note that sensor's    | |   |
| |                    | Invoke GetAll method  | |PropertiesChanged     | |   |
| |                    | on report D-Bus       | |signal triggers the   | |   |
| |                    | object                | |report update. It is  | |   |
| |                    +-----------------------> |sufficient that the   | |   |
| |                    |                       | |signal from only one  | |   |
| | Return metric      | Return report data    | |sensor triggers report| |   |
| | report in JSON     <-----------------------+ |update.               | |   |
| | format             |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-+--------------------+------------------------------------------------------+
|On demand metric report update flow|          |                          |   |
+-+--------------------+------------+          |                          |   |
| |                    |                       |                          |   |
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     |                          |   |
| |                    |                       |                          |   |
| |                    | Invoke the Update     |                          |   |
| |                    | method for report     |                          |   |
| |                    | D-Bus object          |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       +-+Update method triggers  |   |
| |                    |                       | |report to be updated    |   |
| |                    |                       | |with the latest known   |   |
| |                    |                       | |sensor's readings.      |   |
| |                    |                       | |No additional sensor    |   |
| |                    |                       <-+readings are performed. |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |                    | Update method call    |                          |   |
| |                    | result                |                          |   |
| |                    <-----------------------+                          |   |
| |                    |                       |                          |   |
| |                    | Invoke GetAll method  |                          |   |
| |                    | on report D-Bus       |                          |   |
| |                    | object                |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       |                          |   |
| | Return metric      | Return report data    |                          |   |
| | report in JSON     <-----------------------+                          |   |
| | format             |                       |                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
```

The Redfish implementation in bmcweb is stateless, so it is not able to
store metric reports. All operations on metric reports shall be done in
the Telemetry service. Sending a metric report as an SSE or push-style event
shall be done via the [Redfish Event Service][6]. It is marked as optional
because a metric report does not have to be configured to push its data
through an event.

In the case of an on demand metric report update, the Telemetry service performs
no additional sensor readings because it already has the latest values, since
they are updated on the PropertiesChanged signal from the D-Bus sensors.

**Telemetry service on [D-Bus][4]**

The Telemetry service exposes specific interfaces on D-Bus. One of them is
used for reading report management. The second one is used for trigger
management.

**Reading report management**

The reading report management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.ReportManager
/xyz/openbmc_project/Telemetry/Reports
```
The `ReportManager` implements the D-Bus interface
[`xyz.openbmc_project.Telemetry.ReportManager`][8] for report management. The
interface is described in the phosphor-dbus-interfaces.
This interface
implements the `AddReport` method, which is used to create a metric report. The
report may contain a single sensor reading or multiple sensor readings. How the
report is stored by the BMC is defined by one of this method's parameters.
The `ReportManager` object implements a property that stores the maximum number
of reports supported simultaneously.

The `AddReport` method returns the path to the newly created report object.
The report object implements the [`xyz.openbmc_project.Object.Delete`][10]
and [`xyz.openbmc_project.Telemetry.Report`][9] interfaces. The [`Delete`][10]
interface is defined to add support for removing the Report object, while the
[`Report`][9] interface implements methods and properties for Report
management along with properties containing telemetry readings. Each report
object contains the timestamp of its last update. The report object
contains an array of structures, each containing a reading with its metadata
and the timestamp of the last update of this metric. Each report also has a
property that stores the update interval (for periodically updated reports).

**Trigger management**

The trigger management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.TriggerManager
/xyz/openbmc_project/Telemetry/Triggers
```
The `TriggerManager` supports the
`xyz.openbmc_project.Telemetry.TriggerManager` interface, which implements
the `AddTrigger` method. This method shall be used to create a new trigger for
a certain metric. The method's parameters allow defining the type of metric
for which the trigger is set (discrete or numeric). Depending on this setting,
the method accepts a different set of trigger parameters.

For the discrete metric type, trigger parameters contain:

| Field | Type | Description |
|-------|------|-------------|
| TriggerCondition | enum | Discrete trigger condition: <br> "Changed" - trigger occurs when value of metric has changed; <br> "Specified" - trigger occurs when value of metric becomes one of the values listed in the discrete triggers. |
| DiscreteTriggers | array of structures | Array of discrete trigger structures. |

Member of the DiscreteTriggers array:

| Field | Type | Description |
|-------|------|-------------|
| TriggerId | string | Unique trigger Id |
| Severity | enum | Severity: <br> "OK" - normal<br> "Warning" - requires attention<br> "Critical" - requires immediate attention |
| Value | variant | Value of discrete metric that constitutes a trigger event. |
| DwellTime | uint64 | Time in milliseconds that a trigger occurrence persists before the action defined for this trigger is performed. |

For the numeric metric type, trigger parameters contain numeric thresholds.
The numeric thresholds structure shall contain up to 4 thresholds: upper
warning, upper critical, lower warning and lower critical. Thus it will contain
up to 4 structures as shown below:

| Field | Type | Description |
|-------|------|-------------|
| ThresholdType | enum | Numeric trigger type: <br> "UpperCritical" - reading is above normal range and requires immediate attention<br>"UpperWarning" - reading is above normal range and may require attention<br>"LowerCritical" - reading is below normal range and requires immediate attention<br>"LowerWarning" - reading is below normal range and may require attention |
| DwellTime | uint64 | Time in milliseconds that a trigger occurrence persists before the action defined for this trigger is performed. |
| Activation | enum | Indicates the direction of crossing the threshold value that triggers the threshold's action:<br> "Increasing" - trigger action when reading is changing from below to above the threshold's value<br> "Decreasing" - trigger action when reading is changing from above to below the threshold's value<br> "Either" - trigger action when reading is crossing the threshold's value in either direction described above |
| ThresholdValue | variant | Value of reading that will trigger the threshold |

The `AddTrigger` method also allows defining the specific action to take when
the trigger is activated. Upon trigger activation, three actions are possible:
logging an event to the log service, sending an event via the event service and
triggering a metric report update.

In order to assign a trigger to a specific metric, the metric parameter is defined.
Its structure contains the following data:

| Field | Type | Description |
|-------|------|-------------|
| SensorPath | object path | D-Bus path to the sensor for which the trigger is defined. |
| MetricId | string | Contains a unique metric id that can be mapped to the Redfish MetricId. |
| ReportPath | object path | D-Bus path to the Telemetry's reading report whose update shall be triggered when the trigger condition occurs. This is optional and shall be filled when the trigger's action is set to update a metric report. |

The `AddTrigger` method also allows setting the trigger's persistency (whether
the trigger shall be stored in the BMC's non-volatile memory).

The `AddTrigger` method returns:
```ascii
String for created trigger - i.e. '/xyz/openbmc_project/Telemetry/Triggers/{Domain}/{Name}'
```
A trigger created this way implements the `xyz.openbmc_project.Object.Delete`
and the `xyz.openbmc_project.Telemetry.Trigger` interfaces. Each trigger
object contains read-only information about the metric type for which it was
created (discrete or numeric).
This information determines which triggers
are stored within the trigger object.

If the trigger is defined for the discrete metric type, then it contains trigger
information that looks like this:

| Type | Description |
|------|-------------|
| enum | Discrete trigger condition: <br> "Changed" - trigger occurs when value of metric has changed;<br>"Specified" - trigger occurs when value of metric becomes one of the values listed in the discrete triggers. |
| array of structures | Array of discrete trigger structures. |

Discrete trigger structure:

| Type | Description |
|------|-------------|
| string | Unique trigger Id |
| enum | Severity: <br> "OK" - normal<br>"Warning" - requires attention<br> "Critical" - requires immediate attention |
| variant | Value of discrete metric that constitutes a trigger event. |
| uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |

If the trigger is defined for the numeric metric type, then it contains
information about numeric triggers in the form of an array of up to 4
structures as presented below:

| Type | Description |
|------|-------------|
| enum | Numeric trigger type: <br> "UpperCritical"<br>"UpperWarning"<br>"LowerCritical"<br>"LowerWarning" |
| uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |
| enum | Indicates the direction of crossing the threshold value that triggers the threshold's action:<br> "Increasing"<br>"Decreasing"<br>"Either" |
| variant | Value of reading that will trigger the threshold |

The trigger object also contains information about the reading for which the
trigger was defined. It takes the form of a structure consisting of three fields.

| Field type | Description |
|------------|--------------|
| object path | D-Bus path of sensor. This is a path to the sensor for which the reading trigger is defined. |
| string | Unique metric Id |
| object path | D-Bus path of existing reading report. This is required when the trigger's action is to update a metric report. This path shall point to an existing reading report within the Telemetry service. |

**Trigger operations**

Triggers support three types of operation: Log, Event and Update. Each of them
is handled in a different way.

1. For action Log, the event shall
be logged to the system journal. In this case the Telemetry service writes
data to the system journal using libjournal. The Redfish log service shall then
retrieve the data by reading the system journal. This is shown in the diagram
below.

```ascii
+---------------------------+
|bmcweb|                    |        +----------------------+
+------/   +-----------+-+  |        |Telemetry|            |
|          |Redfish    | |  |        +---------/            |
|          |log service| |  |        |                      |
|          +-----------/ |  |        |                      |
|          |             |  |        |                      |
|          |             |  |        |                      |
|          +------^------+  |        +-----------+----------+
+---------------------------+                    |
                  |                              |
                  +----collect----+            event
                    journal entry |     (write to journal)
                                  |              |
      +------------------------------------+     |
      |systemd|                   |        |     |
      +-------/ +----------+  +---+------+ |     |
      |         |journal|  |  |libjournal| |     |
      |         +-------/  <-->          <-------+
      |         |          |  +----------+ |
      |         |          |               |
      |         |          |               |
      |         +----------+               |
      |                                    |
      +------------------------------------+
```
2. For action Event, the Telemetry service shall send an event using the
[Redfish Event Service][6], either as a push-style event or as SSE.

3. For action Update, the Telemetry service will trigger the update of the
reading report pointed to by its D-Bus path contained in the trigger object's
properties. The update shall cause the reading report's D-Bus object to emit a
property change signal. This will cause the Redfish Metric Report to be streamed
out if it was configured to do so.
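
The interplay of `Activation` and `DwellTime` for a numeric threshold can be illustrated with a small model. The following Python sketch is not part of the Telemetry service; the class name `NumericThreshold`, its `update` method and the exact crossing rule (a reading strictly greater than the threshold value counts as "above") are assumptions made for illustration only. It reports that the trigger action should run once a crossing in the configured direction has persisted for the dwell time.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NumericThreshold:
    """Hypothetical model of one numeric threshold (names are illustrative)."""
    threshold_type: str   # e.g. "UpperCritical"
    dwell_time_ms: int    # DwellTime in milliseconds
    activation: str       # "Increasing", "Decreasing" or "Either"
    value: float          # ThresholdValue
    _prev: Optional[float] = field(default=None, init=False)
    _pending_since: Optional[int] = field(default=None, init=False)

    def update(self, reading: float, now_ms: int) -> bool:
        """Feed one reading; return True when the action should be performed."""
        prev, self._prev = self._prev, reading
        if prev is not None:
            # Assumption: "above" means strictly greater than the threshold.
            was_above, is_above = prev > self.value, reading > self.value
            if was_above != is_above:
                direction = "Increasing" if is_above else "Decreasing"
                if self.activation in (direction, "Either"):
                    # Crossing in a monitored direction starts the dwell timer.
                    self._pending_since = now_ms
                else:
                    # Crossing back cancels a pending occurrence.
                    self._pending_since = None
        if self._pending_since is None:
            return False
        if now_ms - self._pending_since >= self.dwell_time_ms:
            self._pending_since = None  # fire once per crossing
            return True
        return False
```

In this sketch a reading that crosses back before the dwell time elapses cancels the pending occurrence, which is one plausible reading of "a trigger occurrence persists"; the actual service may define this differently.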

**Redfish Telemetry Service API**

The Redfish Telemetry Service shall support the 2019.1 Redfish schemas for
telemetry resources. Metric report definitions determine which metrics are to be
included in a metric report. A metric definition is assigned to a particular
metric type and describes how the metric should be interpreted. The following
resource schemas shall be supported:

- TelemetryService 1.1.2
- MetricDefinition 1.0.3
- MetricReportDefinition 1.3.0
- MetricReport 1.2.0
- Triggers 1.1.1

The following diagram shows the relations between these resources.

```ascii
 +----------------------------------------------------------------------------+
 |                                Service root                                |
 +----------------------------------+-------------------------------+---------+
                                    |                               |
                                    |                               |
                                    |                               |
 +----------------------------------v-----------------+  +----------v---------+
 |                                                    |  |Chassis             |
 |                 Telemetry Service                  |  |                    |
 |                                                    |  |                    |
 |                                                    |  |  +---------------+ |
 +---------+--------------+------------------+--------+  |  |               | |
           |              |                  |           |  |   Chassis 1   | |
           |              |                  |           |  |               | |
           |              |                  |           |  +---------+-----+ |
           |              |                  |           |            |       |
+----------v--+  +--------v----+  +----------v-----+     +--------------------+
|Triggers     |  |Metric       |  |Metric report   |                  |
|             |  |definition   |  |                |                  |
|             |  | +---------+ |  |                |  Reads           |
| +---------+ |  | |Reading  | |  | +-----------+  |  ReadingVolts +--v------+
| |         | |  | |Volts    <------+           +------------------>         |
| |Trigger 1| |  | +---------+ |  | |  Metric   |  |               |         |
| |         | |  |             |  | | report 1  |  |  Reads        |  Power  |
| |         | |  | +---------+ |  | |           |  | PowerConsumed |         |
| |         | |  | |         | |  | |           |  | Watts         |         |
| +--+---+--+ |  | |Power    <------+           +------------------>         |
|    |   |    |  | |Consumed | |  | +-----^-----+  |               +----^----+
|    |   |    |  | |Watts    | |  |       |        |                    |
|    |   |    |  | +---------+ |  |       |        |                    |
|    |   |    |  |             |  |       |        |                    |
+-------------+  +-------------+  +----------------+                    |
     |   |                                |                             |
     |   |     Triggers report update     |                             |
     |   |       (when applicable)        |                             |
     |   +--------------------------------+                             |
     |                                                                  |
     |               Monitors PowerConsumedWatts to check               |
     |                whether trigger value is exceeded                 |
     +------------------------------------------------------------------+
```

The diagram shows the relations between Redfish resources. A metric report is
defined to be generated periodically, on demand or on change. Each metric in the
Metric Report contains the URI of its metric definition and of the Redfish
sensor whose reading value is presented. Nevertheless, under this presentation
layer, Telemetry is gathering D-Bus sensor readings and exposing them
in reading reports over D-Bus for the Telemetry Service. Each D-Bus sensor
is mapped to a Redfish sensor.

Examples of Redfish resources for the Telemetry Service are shown below.

The Telemetry Service Redfish resource example:

```json
{
    "@odata.type": "#TelemetryService.v1_1_2.TelemetryService",
    "Id": "TelemetryService",
    "Name": "Telemetry Service",
    "Status": {
        "State": "Enabled",
        "Health": "OK"
    },
    "MinCollectionInterval": "PT10S",
    "SupportedCollectionFunctions": [],
    "MaxReports": <max_no_of_reports>,
    "MetricDefinitions": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions"
    },
    "MetricReportDefinitions": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions"
    },
    "MetricReports": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReports"
    },
    "Triggers": {
        "@odata.id": "/redfish/v1/TelemetryService/Triggers"
    },
    "LogService": {
        "@odata.id": "/redfish/v1/Managers/<manager_name>/LogServices/Journal"
    },
    "@odata.context": "/redfish/v1/$metadata#TelemetryService",
    "@odata.id": "/redfish/v1/TelemetryService"
}
```

Sample metric report definition:

```json
{
    "@odata.type": "#MetricReportDefinition.v1_3_0.MetricReportDefinition",
    "Id": "SampleMetric",
    "Name": "Sample Metric Report Definition",
    "MetricReportDefinitionType": "Periodic",
    "Schedule": {
        "RecurrenceInterval": "PT10S"
    },
    "ReportActions": [
        "LogToMetricReportsCollection"
    ],
    "ReportUpdates": "Overwrite",
    "MetricReport": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
    },
    "Status": {
        "State": "Enabled"
    },
    "Metrics": [
        {
            "MetricId": "Test",
            "MetricProperties": [
                "/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts"
            ]
        }
    ],
    "@odata.context": "/redfish/v1/$metadata#MetricReportDefinition.MetricReportDefinition",
    "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/PlatformPowerUsage"
}
```

Sample metric report:

```json
{
    "@odata.type": "#MetricReport.v1_2_0.MetricReport",
    "Id": "SampleMetric",
    "Name": "Sample Metric Report",
    "ReportSequence": "0",
    "MetricReportDefinition": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
    },
    "MetricValues": [
        {
            "MetricDefinition": {
                "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions/SampleMetricDefinition"
            },
            "MetricId": "Test",
            "MetricValue": "100",
            "Timestamp": "2016-11-08T12:25:00-05:00",
            "MetricProperty": "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
        }
    ],
    "@odata.context": "/redfish/v1/$metadata#MetricReport.MetricReport",
    "@odata.id": "/redfish/v1/TelemetryService/MetricReports/AvgPlatformPowerUsage"
}
```

Sample trigger that will trigger a metric report update:

```json
{
    "@odata.type": "#Triggers.v1_1_1.Triggers",
    "Id": "SampleTrigger",
    "Name": "Sample Trigger",
    "MetricType": "Numeric",
    "Links": {
        "MetricReportDefinitions": [
            "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
        ]
    },
    "Status": {
        "State": "Enabled"
    },
    "TriggerActions": [
        "RedfishMetricReport"
    ],
    "NumericThresholds": {
        "UpperCritical": {
            "Reading": 50,
            "Activation": "Increasing",
            "DwellTime": "PT0.001S"
        },
        "UpperWarning": {
            "Reading": 48.1,
            "Activation": "Increasing",
            "DwellTime": "PT0.004S"
        }
    },
    "MetricProperties": [
        "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
    ],
    "@odata.context": "/redfish/v1/$metadata#Triggers.Triggers",
    "@odata.id": "/redfish/v1/TelemetryService/Triggers/PlatformPowerCapTriggers"
}
```

**Performance tests**

Performance tests were conducted on an AST2500 system with 64 MB flash and
512 MB RAM. Flash consumption by the Telemetry service is 197.5 kB. The
runtime statistics are shown in the table below. Each reading report is
mapped into a single Metric Report. The runtime data is collected for the
Telemetry component only. All reports were created with the
```xyz.openbmc_project.Telemetry.Metric.OnChange``` property to
maximize the workload. In the configuration with 50 reports and 50 sensors
this is about 200 new readings per second, generating 200 reading reports
per second. The table shows CPU usage and memory usage. The VSZ is the amount
of memory mapped into the address space of the process. It includes pages
backed by the process' executable file and shared libraries, its heap and
stack, as well as anything else it has mapped.


| Telemetry service state                          | VSZ  | %VSZ | %CPU |
|--------------------------------------------------|------|------|------|
| Idle (0 reports, 0 sensors)                      |5188 B|  1%  |  0%  |
| 1 report, 1 sensor                               |5188 B|  1%  |  1%  |
| 2 reports, 1 sensor                              |5188 B|  1%  |  1%  |
| 2 reports, 2 sensors (1 sensor per report)       |5188 B|  1%  |  1%  |
| 1 report, 10 sensors                             |5188 B|  1%  |  1%  |
| 10 reports, 10 sensors (same for each report)    |5320 B|  1%  | 1-2% |
| 2 reports, 20 sensors (10 per report)            |5188 B|  1%  |  1%  |
| 30 reports, 30 sensors (10 per report)           |5444 B|  1%  | 5-9% |
| 50 reports, 50 sensors (10 per report)           |5572 B|  1%  |11-14%|

The last two configurations use 10 sensors per reading report, which gives
3 or 5 distinct configurations. Each such configuration is used to
create 10 reading reports to obtain the desired amount of 30 or 50 reading
reports.

In this architecture a reading report is created every time a Redfish
Metric Report Definition is posted (creating a new Metric Report).

## Alternatives Considered
The [framework based on collectd/librrd][5] was considered as an alternative design.
Although it seems to be a versatile and scalable solution, it has some drawbacks
from our point of view:
* Collectd's footprint in the minimal working configuration is around 2.6 MB,
while the available space for OpenBMC is limited to 64 MB.
* In this design, librrd is used to store metrics on the BMC's non-volatile
storage, which may be an issue when lots of metrics are captured and stored
in OpenBMC's limited storage space. Flash wear-out may also occur when
metrics are captured frequently (like once per second).
* The Telemetry service is directly compatible with the Redfish Telemetry Service
API, which means that Telemetry's reading reports can be directly mapped to
Redfish Metric Reports.
* The Telemetry service unifies the way the BMC's telemetry is exposed over
Redfish and may be used with multiple front-ends, so there is no problem
adding support for telemetry over IPMI or any other API.

Since this design assumes flexibility and modularity, there are no obstacles to
using collectd in cooperation with Telemetry. One possible configuration
is shown in the diagram below.

```ascii
   +-----------------+      +-----------------+
   |  D-Bus sensors  |      |    Telemetry    |
   +--------^--------+      +--------^--------+
            |                        |
            |                        |
            |                        |
<--------^--v-----------D-Bus--------v-^---------->
         |                             |
         |                             |
         |                             |
 +-------v------------+     +----------v--------+
 | collectd metrics   |     |                   |
 | exposed as D-Bus   |     |      bmcweb       |
 | sensors            |     |  (with Redfish    |
 +---------^----------+     |   Telemetry       |
           |                |   Service)        |
           |                |                   |
    +------+-------+        +-------------------+
    |              |
    |   collectd   |
    |              |
    +--------------+
```
Here collectd is used as the source of some set of metrics. It exposes them
as D-Bus sensors, which can easily be consumed by both bmcweb and the
Telemetry service without any changes in their D-Bus interfaces. In such a
configuration the Telemetry service provides metric report and trigger
management.

Another possible configuration is to use collectd without the Telemetry service,
but in that case collectd does not provide metric report and trigger support
compatible with Redfish. The Redfish Telemetry Service then either won't be
supported, or metric report and trigger support has to be provided by
collectd.

## Impacts
This design impacts the architecture of the bmcweb component, since it adds
the Redfish Telemetry Service implementation as a component of the existing
Redfish API implementation.

## Testing
This is a very high-level description of the proposed set of tests.
Testing shall be done on three basic levels:
* Unit tests
* Functional tests
* Performance tests

**Unit tests**

The Telemetry's code shall be covered by unit tests. The preferred
framework is [GTest/GMock][7]. The unit tests shall be run before a code
change is committed to make sure that nothing is broken in existing
functionality. Also, when new code is introduced, a new set of unit tests shall
be committed with it according to the test-driven development principle. Unit
tests shall also be carefully reviewed.

**Functional tests**

Functional tests will be divided into two steps.

The first step is for testing the Telemetry metric report management. The test
scenario shall contain creating a metric report by POSTing a proper metric
report definition, reading the metric report (using GET on the proper URI) and
deleting the metric report. The required configuration for such a test is D-Bus
sensors (at least some of them) and bmcweb with the Redfish Telemetry Service
implemented. The tests shall be performed on real hardware. For ease of metric
testing, dummy D-Bus sensors may be used to supply specifically prepared
metrics. This configuration shall also enable testing aggregated operations
(MIN, MAX, SUM, AVERAGE).

The second step is to test trigger and event generation. This also requires the
Event Service to be implemented along with the Log Service. Tests shall cover all
scenarios with sending a metric report as an event, triggering a metric report
update and logging events.

**Performance tests**

Performance tests shall be done using a full OpenBMC configuration with the
complete set of required features. The tests shall create a lot of metric reports
(up to the maximum number) along with all possible triggers. Measurements shall
cover the periodic metric report jitter, delays in event logging or sending,
the BMC's CPU utilization and the performance impact on other services.
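
For the functional tests of aggregated operations with dummy D-Bus sensors, the expected values could be computed by a small reference helper. The following Python sketch is only an illustration under stated assumptions: the function name `aggregate`, the `(timestamp, value)` sample format and the choice of a plain arithmetic mean for AVERAGE (rather than a time-weighted one) are hypothetical and are not taken from the Telemetry service implementation.

```python
from typing import Sequence, Tuple

# Hypothetical sample format: (timestamp in seconds, reading value),
# ordered by increasing timestamp.
Sample = Tuple[float, float]


def aggregate(operation: str, samples: Sequence[Sample],
              now: float, period: float) -> float:
    """Compute an aggregated metric over the window [now - period, now]."""
    window = [value for ts, value in samples if now - period <= ts <= now]
    if not window:
        raise ValueError("no readings in the requested time period")
    if operation == "SINGLE":
        return window[-1]                 # current (latest) reading value
    if operation == "AVERAGE":
        return sum(window) / len(window)  # assumption: plain arithmetic mean
    if operation == "MAX":
        return max(window)
    if operation == "MIN":
        return min(window)
    if operation == "SUM":
        return sum(window)
    raise ValueError(f"unknown operation: {operation}")


# Example: prepared readings from a dummy sensor, 10 second time period.
readings = [(0.0, 10.0), (5.0, 20.0), (10.0, 30.0), (15.0, 40.0)]
expected = {op: aggregate(op, readings, now=15.0, period=10.0)
            for op in ("SINGLE", "AVERAGE", "MAX", "MIN", "SUM")}
```

A test could then compare the values in `expected` with the metric values returned by a GET on the corresponding Metric Report URI.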

[1]: https://www.dmtf.org/sites/default/files/standards/documents/DSP2051_0.1.0a.zip
[2]: https://github.com/openbmc/docs/blob/master/architecture/sensor-architecture.md
[3]: https://www.kernel.org/doc/Documentation/hwmon/
[4]: https://www.freedesktop.org/wiki/Software/dbus/
[5]: https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/22257
[6]: https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/24749
[7]: https://github.com/google/googletest
[8]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Telemetry/ReportManager.interface.yaml
[9]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Telemetry/Report.interface.yaml
[10]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Object/Delete.interface.yaml