# OpenBMC platform telemetry

Author: Piotr Matuszczak <piotr.matuszczak@intel.com>

Other contributors:

- Pawel Rapkiewicz <pawel.rapkiewicz@intel.com> `pawelr`
- Kamil Kowalski <kamil.kowalski@intel.com>

Created: 2019-08-07

## Problem Description

The BMC on a server platform gathers a lot of telemetry data, which has to be
exposed in a clean, human-readable and standardized format. This document
focuses on telemetry over Redfish, since it is the standard API for platform
manageability.

## Background and References

- OpenBMC platform telemetry shall leverage DMTF's [Redfish Telemetry Model][1]
  for exposing platform telemetry over the network.
- OpenBMC platform telemetry shall leverage the [OpenBMC sensors architecture
  implementation][2].
- OpenBMC platform telemetry shall implement a service, called Telemetry, to
  deal with metric report and trigger management. This service is described
  later in this document.
- Although we use [hwmon][3] to gather readings from physical sensors, this
  architecture does not depend on it, because the Telemetry service component
  relies on the [OpenBMC D-Bus sensors][2].

## Requirements

- [OpenBMC D-Bus sensors][2] support. This is also a design limitation, since
  the Telemetry service requires telemetry sources to be implemented as D-Bus
  sensors.

## Proposed Design

The Redfish Telemetry Model shall implement the Telemetry Service with the
following collection resources:

- Metric Definitions - contains the metadata for metrics (unit, accuracy, etc.)
- Metric Report Definitions - defines how a metric report shall be created
  (which metrics it shall contain, how often it shall be generated, etc.)
- Metric Reports - contains actual metric reports containing telemetry data
  generated according to the Metric Report Definitions
- Metric Triggers - contains thresholds and actions that apply to specific
  metrics

OpenBMC telemetry architecture is shown on the diagram below.

```ascii
   +--------------+               +----------------+     +-----------------+
   |hwmon|        |               |Dbus sensors|   |     |Telemetry|       |
   +-----/        |               +------------/   |     +---------/       |
   |              +--filesystem--->                |     |                 |
   |              |               |                |     |                 |
   +--------------+               +--------^-------+     +--------^--------+
                                           |                      |
                                           |                      |
<------------------------------------------v-----^--DBus----------v----------->
                                                 |
                                                 |
+-------+---------------------------------------------------------------------+
|bmcweb |                                        |                            |
+-------/                                        |                            |
|                                                |                            |
| +--------+-------------------------------------v--------------------------+ |
| |Redfish |                                                                | |
| +--------/                                           +---------+-------+  | |
| |                                                    |Existing |       |  | |
| | +------------------------------------------------+ |Redfish  |       |  | |
| | |Telemetry Service|                              | |resources|       |  | |
| | +----------------+/                              | +---------/       |  | |
| | | +----------+  +-----------+  +-------------+   | |   +---------+   |  | |
| | | | Metric   |  |  Metric   |  |Metric report|   | |   | Redfish |   |  | |
| | | | triggers |  |definitions|  |definitions  <---------+ sensors |   |  | |
| | | |          |  |           |  |             |   | |   |         |   |  | |
| | | +----+-----+  +-----+-----+  +------+------+   | |   +---------+   |  | |
| | |      |              |               |          | |                 |  | |
| | |      |              |               |          | |                 |  | |
| | |      |              |               |          | |                 |  | |
| | |      |        +-----v-----+         |          | |                 |  | |
| | |      |        |  Metric   |         |          | |                 |  | |
| | |      +-------->  report   <---------+          | |                 |  | |
| | |               |           |                    | |                 |  | |
| | |               +-----------+                    | |                 |  | |
| | |                                                | |                 |  | |
| | +------------------------------------------------+ +-----------------+  | |
| |                                                                         | |
| +-------------------------------------------------------------------------+ |
|                                                                             |
+-----------------------------------------------------------------------------+
```

The telemetry service component
is a part of Redfish and implements the DMTF's
[Redfish Telemetry Model][1]. Metric report definitions use Redfish sensor
URIs for metric report creation. Those sensors are also used to get the
URI->D-Bus sensor mapping. The Redfish Telemetry Service acts as the
presentation layer for the telemetry, while the Telemetry service is
responsible for gathering metrics from D-Bus sensors and exposing them as D-Bus
objects. The Telemetry service supports different monitoring modes (periodic,
on change and on demand) along with aggregated operations:

- SINGLE - current reading value
- AVERAGE - average value over defined time period
- MAX - max reading value during defined time period
- MIN - min reading value during defined time period
- SUM - sum of reading values over defined time period

The time period for calculating an aggregated metric is taken from the Redfish
Metric Report Definition resource for each sensor's metric.

The Telemetry service supports creating and managing metric reports, each of
which may contain one or more metrics from sensors. Such a metric report is
mapped to a Metric Report in the Redfish Telemetry Service.

The diagram below shows the flows for creation and update of a metric report.

```ascii
+----+              +------+              +---------+                 +-------+
|User|              |bmcweb|              |Telemetry|                 | D-Bus |
+-+--+              +--+---+              +----+----+                 |Sensors|
  |                    |                       |                      +---+---+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Metric report definition flow|                |                          |   |
+-----------------------------+                |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
| | POST request       |                       |                          |   |
| | with metric        |                       |                          |   |
| | report             |                       |                          |   |
| | definition         |                       |                          |   |
| +--------------------> Invoke AddReport      | Register for D-Bus       |   |
| |                    | method on D-Bus       | sensors                  |   |
| |                    +-----------------------> PropertiesChanged        |   |
| |                    |                       | signals                  |   |
| |                    |                       +-------------------------->   |
| |                    |                       |-------------------------->   |
| |                    |                       +-------------------------->   |
| |                    |                       |                          |   |
| | HTTP response      |                       +-+Create Report           |   |
| | code 201 with      | Return created        | |D-Bus object            |   |
| | Metric Report      | Report D-Bus path     <-+                        |   |
| | Definition's URI   <-----------------------+                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Periodic metric report update flow|           |                          |   |
+----------------------------------+           +-+Metric report           |   |
| |                    |                       | |timer triggers          |   |
| |                    |                       <-+report update           |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       | Sensor's Properties-     |   |
| +-------------------->                       | Changed signal           |   |
| |                    +-+Map report's URI     <--------------------------+   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     | +----------------------+ |   |
| |                    | Invoke GetAll method  | |Note that sensor's    | |   |
| |                    | on report D-Bus       | |PropertiesChanged     | |   |
| |                    | object                | |signal is asynchronous| |   |
| |                    +-----------------------> |to metric report timer| |   |
| |                    |                       | |This timer is the only| |   |
| | Return metric      | Return report data    | |thing that triggers   | |   |
| | report in JSON     <-----------------------+ |metric report update  | |   |
| | format             |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|On change metric report update flow|          | Sensor's Properties-     |   |
+-----------------------------------+          | Changed signal           |   |
| |                    |                       <--------------------------+   |
| |                    |                       |                          |   |
| |                    |                       +-+Sensor's signal         |   |
| |                    |                       | |triggers report         |   |
| |                    |                       <-+update                  |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        | +----------------------+ |   |
| |                    <-+                     | |Note that sensor's    | |   |
| |                    | Invoke GetAll method  | |PropertiesChanged     | |   |
| |                    | on report D-Bus       | |signal triggers the   | |   |
| |                    | object                | |report update. It is  | |   |
| |                    +-----------------------> |sufficient that the   | |   |
| |                    |                       | |signal from only one  | |   |
| | Return metric      | Return report data    | |sensor triggers report| |   |
| | report in JSON     <-----------------------+ |update.               | |   |
| | format             |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-+--------------------+------------------------------------------------------+
|On demand metric report update flow|          |                          |   |
+-+--------------------+------------+          |                          |   |
| |                    |                       |                          |   |
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     |                          |   |
| |                    |                       |                          |   |
| |                    | Invoke the Update     |                          |   |
| |                    | method for report     |                          |   |
| |                    | D-Bus object          |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       +-+Update method triggers  |   |
| |                    |                       | |report to be updated    |   |
| |                    |                       | |with the latest known   |   |
| |                    |                       | |sensor's readings.      |   |
| |                    |                       | |No additional sensor    |   |
| |                    |                       <-+readings are performed. |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |                    | Update method call    |                          |   |
| |                    | result                |                          |   |
| |                    <-----------------------+                          |   |
| |                    |                       |                          |   |
| |                    | Invoke GetAll method  |                          |   |
| |                    | on report D-Bus       |                          |   |
| |                    | object                |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       |                          |   |
| | Return metric      | Return report data    |                          |   |
| | report in JSON     <-----------------------+                          |   |
| | format             |                       |                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
```

The Redfish implementation in bmcweb is stateless, thus it is not able to store
metric reports. All operations on metric reports shall be done in the Telemetry
service. Sending metric report as SSE or push-style events shall be done via the
[Redfish Event Service][6]. It is marked as optional because metric report does
not have to be configured for pushing its data through the event.

In case of on demand metric report update, Telemetry service performs no
additional sensor readings because it already has the latest values, since they
are updated on PropertiesChanged signal from the D-Bus sensors.

### Telemetry service on [D-Bus][4]

Telemetry service exposes specific interfaces on D-Bus. One of them will be used
for reading report management. The second one will be used for triggers
management.

### Reading report management

The reading report management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.ReportManager
/xyz/openbmc_project/Telemetry/Reports
```

The `ReportManager` implements D-Bus interface
[`xyz.openbmc_project.Telemetry.ReportManager`][8] for report management. The
interface is described in the phosphor-dbus-interfaces.
This interface
implements the `AddReport` method, which is used to create a metric report. The
report may contain one or more sensor readings. How the report is stored by the
BMC is defined by one of this method's parameters. The `ReportManager` object
implements a property that stores the maximum number of reports supported
simultaneously.

The `AddReport` method returns the path to the newly created report object. The
report object implements the [`xyz.openbmc_project.Object.Delete`][10] and
[`xyz.openbmc_project.Telemetry.Report`][9] interfaces. The [`Delete`][10]
interface is defined to add support for removing the Report object, while the
[`Report`][9] interface implements methods and properties for Report management
along with properties containing telemetry readings. Each report object contains
the timestamp of its last update. The report object contains an array of
structures, each containing a reading with its metadata and the timestamp of
the last update of this metric. Each report also has a property that stores the
update interval (for periodically updated reports).

### Trigger management

The trigger management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.TriggerManager
/xyz/openbmc_project/Telemetry/Triggers
```

The `TriggerManager` supports the `xyz.openbmc_project.Telemetry.TriggerManager`
interface, which implements the `AddTrigger` method. This method shall be used
to create a new trigger for a certain metric. The method's parameters allow
defining the type of metric for which the trigger is set (discrete or numeric).
Depending on this setting, the method accepts a different set of trigger
parameters.
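
As a rough illustration of how the accepted parameter set depends on the metric
type, the sketch below checks a trigger request in Python. It is not the
Telemetry service's implementation and does not reflect the real D-Bus
signature of `AddTrigger`. The function name `validate_trigger_params`, the
dictionary layout and the `NumericThresholds` key are hypothetical; the field
names and allowed values follow the parameter tables in this section. Rejecting
duplicate threshold types is an assumption drawn from the "up to 4 thresholds"
wording.

```python
# Hypothetical sketch: names and layout are illustrative only and do not
# mirror the real AddTrigger D-Bus signature.

DISCRETE_CONDITIONS = {"Changed", "Specified"}
THRESHOLD_TYPES = {
    "UpperCritical",
    "UpperWarning",
    "LowerCritical",
    "LowerWarning",
}


def validate_trigger_params(metric_type, params):
    """Return a list of problems found in params for the given metric type."""
    problems = []
    if metric_type == "discrete":
        if params.get("TriggerCondition") not in DISCRETE_CONDITIONS:
            problems.append("TriggerCondition must be 'Changed' or 'Specified'")
        if not isinstance(params.get("DiscreteTriggers"), list):
            problems.append("DiscreteTriggers must be an array of structures")
    elif metric_type == "numeric":
        thresholds = params.get("NumericThresholds", [])
        if len(thresholds) > 4:
            problems.append("at most 4 numeric thresholds are allowed")
        seen = set()
        for threshold in thresholds:
            kind = threshold.get("ThresholdType")
            if kind not in THRESHOLD_TYPES:
                problems.append(f"unknown ThresholdType: {kind!r}")
            elif kind in seen:
                # Assumption: each threshold type may appear only once.
                problems.append(f"duplicate ThresholdType: {kind}")
            seen.add(kind)
    else:
        problems.append(f"unknown metric type: {metric_type!r}")
    return problems
```

An empty list means the request matches the parameter set expected for that
metric type; any other result lists what would need to be corrected before
calling `AddTrigger`.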

For discrete metric type, trigger parameters contain:

| Field            | Type                | Description |
| ---------------- | ------------------- | ----------- |
| TriggerCondition | enum                | Discrete trigger condition: <br> "Changed" - trigger occurs when value of metric has changed; <br> "Specified" - trigger occurs when value of metric becomes one of the values listed in the discrete triggers. |
| DiscreteTriggers | array of structures | Array of discrete trigger structures. |

Member of DiscreteTriggers array:

| Field     | Type    | Description |
| --------- | ------- | ----------- |
| TriggerId | string  | Unique trigger Id |
| Severity  | enum    | Severity: <br> "OK" - normal<br> "Warning" - requires attention<br> "Critical" - requires immediate attention |
| Value     | variant | Value of discrete metric, that constitutes a trigger event. |
| DwellTime | uint64  | Time in milliseconds that a trigger occurrence persists before the action defined for this trigger is performed. |

For numeric metric type, trigger parameters contain numeric thresholds. Numeric
thresholds structure shall contain up to 4 thresholds: upper warning, upper
critical, lower warning and lower critical.
Thus it will contain up to 4
structures shown below:

| Field          | Type    | Description |
| -------------- | ------- | ----------- |
| ThresholdType  | enum    | Numeric trigger type: <br> "UpperCritical" - reading is above normal range and requires immediate attention<br>"UpperWarning" - reading is above normal range and may require attention<br>"LowerCritical" - reading is below normal range and requires immediate attention<br>"LowerWarning" - reading is below normal range and may require attention |
| DwellTime      | uint64  | Time in milliseconds that a trigger occurrence persists before the action defined for this trigger is performed. |
| Activation     | enum    | Indicates the direction of crossing the threshold value that triggers the threshold's action:<br> "Increasing" - trigger action when reading is changing from below to above the threshold's value<br> "Decreasing" - trigger action when reading is changing from above to below the threshold's value<br> "Either" - trigger action when reading is crossing the threshold's value in either direction described above |
| ThresholdValue | variant | Value of reading that will trigger the threshold |

The `AddTrigger` method also allows defining the specific action performed when
the trigger is activated. Upon trigger activation, three actions are possible:
logging an event to the log service, sending an event via the event service,
and triggering a metric report update.

In order to assign a trigger to a specific metric, the metric parameter is
defined.
Its structure contains the following data:

| Field      | Type        | Description |
| ---------- | ----------- | ----------- |
| SensorPath | object path | D-Bus path to sensor, for which trigger is defined. |
| MetricId   | string      | Contains unique metric id, that can be mapped to Redfish MetricId. |
| ReportPath | object path | D-Bus path to the Telemetry's reading report whose update shall be triggered when the trigger condition occurs. This is optional and shall be filled when the trigger's action is set to update metric report. |

The `AddTrigger` method also allows setting the trigger's persistency (whether
the trigger shall be stored in the BMC's non-volatile memory).

The `AddTrigger` method returns:

```ascii
String for created trigger - i.e. '/xyz/openbmc_project/Telemetry/Triggers/{Domain}/{Name}'
```

The created trigger implements the `xyz.openbmc_project.Object.Delete` and the
`xyz.openbmc_project.Telemetry.Trigger` interfaces. Each trigger object contains
read-only information about the metric type for which it was created (discrete
or numeric). This information determines which triggers are stored within the
trigger object.

If the trigger is defined for discrete metric type, then it contains trigger
information that looks like this:

| Type                | Description |
| ------------------- | ----------- |
| enum                | Discrete trigger condition: <br> "Changed" - trigger occurs when value of metric has changed;<br>"Specified" - trigger occurs when value of metric becomes one of the values listed in the discrete triggers. |
| array of structures | Array of discrete trigger structures. |

Discrete trigger structure:

| Type    | Description |
| ------- | ----------- |
| string  | Unique trigger Id |
| enum    | Severity: <br> "OK" - normal<br>"Warning" - requires attention<br> "Critical" - requires immediate attention |
| variant | Value of discrete metric, that constitutes a trigger event. |
| uint64  | Time in milliseconds that a trigger occurrence persists before the action defined in the `ActionType` is performed. |

If the trigger is defined for numeric metric type, then it contains information
about numeric triggers, which is an array of up to 4 structures presented
below:

| Type    | Description |
| ------- | ----------- |
| enum    | Numeric trigger type: <br> "UpperCritical"<br>"UpperWarning"<br>"LowerCritical"<br>"LowerWarning" |
| uint64  | Time in milliseconds that a trigger occurrence persists before the action defined in the `ActionType` is performed. |
| enum    | Indicates the direction of crossing the threshold value that triggers the threshold's action:<br> "Increasing"<br>"Decreasing"<br>"Either" |
| variant | Value of reading that will trigger the threshold |

The trigger object also contains information about the reading for which the
trigger was defined. It has the form of a structure consisting of three fields.

| Field type  | Description |
| ----------- | ----------- |
| object path | D-Bus path of sensor. This is a path to the sensor, for which reading trigger is defined. |
| string      | Unique metric Id |
| object path | D-Bus path of existing reading report. This is required when trigger's action is to update metric report. This path shall point to existing reading report within the Telemetry service. |

### Trigger operations

Triggers support three types of operation: Log, Event and Update. For each,
there is a different way of proceeding.

1. For action Log, the event shall be logged to the system journal. In this case
   the Telemetry service writes data to the system journal using libjournal. The
   Redfish log service shall then retrieve the data by reading the system
   journal. This is shown in the diagram below.

   ```ascii
   +---------------------------+
   |bmcweb|                    |        +----------------------+
   +------/   +-----------+-+  |        |Telemetry|            |
   |          |Redfish    | |  |        +---------/            |
   |          |log service| |  |        |                      |
   |          +-----------/ |  |        |                      |
   |          |             |  |        |                      |
   |          |             |  |        |                      |
   |          +------^------+  |        +-----------+----------+
   +---------------------------+                    |
                     |                              |
                     +----collect----+            event
                       journal entry |     (write to journal)
                                     |              |
         +------------------------------------+     |
         |systemd|                   |        |     |
         +-------/ +----------+  +---+------+ |     |
         |         |journal|  |  |libjournal| |     |
         |         +-------/  <-->          <-------+
         |         |          |  +----------+ |
         |         |          |               |
         |         |          |               |
         |         +----------+               |
         |                                    |
         +------------------------------------+
   ```

2. For action Event, the Telemetry service shall send an event using the
   [Redfish Event Service][6] either as a push-style event or SSE.

3. For action Update, the Telemetry service will trigger the update of the
   reading report pointed to by its D-Bus path contained in the trigger object
   properties. The update shall cause the reading report's D-Bus object to emit
   a property change signal. This will cause the Redfish Metric Report to be
   streamed out if it was configured to do so.
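
To make the interplay of `Activation` and `DwellTime` for numeric thresholds
more concrete, here is a minimal Python sketch. It is not taken from the
Telemetry service: the class and function names are hypothetical, boundary
handling (a reading exactly equal to the threshold counts as "not above" and
"not below") is an assumption, and the dwell time is only evaluated when a new
reading arrives rather than by a timer.

```python
from dataclasses import dataclass


@dataclass
class NumericThreshold:
    threshold_type: str     # e.g. "UpperCritical"
    threshold_value: float
    activation: str         # "Increasing", "Decreasing" or "Either"
    dwell_time_ms: int


def crossed(threshold, previous, current):
    """True if the step previous -> current crosses the threshold value in a
    direction permitted by the threshold's Activation setting."""
    value = threshold.threshold_value
    rising = previous <= value < current    # from below to above
    falling = previous >= value > current   # from above to below
    if threshold.activation == "Increasing":
        return rising
    if threshold.activation == "Decreasing":
        return falling
    return rising or falling                # "Either"


class ThresholdMonitor:
    """Tracks one threshold and reports when its action should be performed."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.previous = None
        self.crossed_at_ms = None   # time of the pending crossing, if any
        self.rising = False         # direction of the pending crossing

    def update(self, reading, now_ms):
        """Feed one reading; return True once the crossing has persisted for
        at least dwell_time_ms."""
        value = self.threshold.threshold_value
        if self.previous is not None and crossed(
            self.threshold, self.previous, reading
        ):
            self.crossed_at_ms = now_ms
            self.rising = reading > value
        elif self.crossed_at_ms is not None:
            still_active = reading > value if self.rising else reading < value
            if not still_active:
                self.crossed_at_ms = None   # condition ended before dwell time
        self.previous = reading
        if (
            self.crossed_at_ms is not None
            and now_ms - self.crossed_at_ms >= self.threshold.dwell_time_ms
        ):
            self.crossed_at_ms = None       # perform the action only once
            return True
        return False
```

When `update` returns `True`, the caller would perform the configured action
(Log, Event or Update) as described in the list above.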

### Redfish Telemetry Service API

Redfish Telemetry Service shall support 2019.1 Redfish schemas for telemetry
resources. Metric report definitions determine which metrics are to be included
in a metric report. A metric definition is assigned to a particular metric type
and describes how the metric should be interpreted. The following resource
schemas shall be supported:

- TelemetryService 1.1.2
- MetricDefinition 1.0.3
- MetricReportDefinition 1.3.0
- MetricReport 1.2.0
- Triggers 1.1.1

The following diagram shows relations between these resources.

```ascii
 +----------------------------------------------------------------------------+
 |                                Service root                                |
 +----------------------------------+-------------------------------+---------+
                                    |                               |
                                    |                               |
                                    |                               |
 +----------------------------------v-----------------+  +----------v---------+
 |                                                    |  |Chassis             |
 |                 Telemetry Service                  |  |                    |
 |                                                    |  |                    |
 |                                                    |  |  +---------------+ |
 +---------+--------------+------------------+--------+  |  |               | |
           |              |                  |           |  |   Chassis 1   | |
           |              |                  |           |  |               | |
           |              |                  |           |  +---------+-----+ |
           |              |                  |           |            |       |
+----------v--+  +--------v----+  +----------v-----+     +--------------------+
|Triggers     |  |Metric       |  |Metric report   |                  |
|             |  |definition   |  |                |                  |
|             |  | +---------+ |  |                | Reads            |
| +---------+ |  | |Reading  | |  | +-----------+  | ReadingVolts  +--v------+
| |         | |  | |Volts    <------+           +------------------>         |
| |Trigger 1| |  | +---------+ |  | |  Metric   |  |               |         |
| |         | |  |             |  | | report 1  |  | Reads         |  Power  |
| |         | |  | +---------+ |  | |           |  | PowerConsumed |         |
| |         | |  | |         | |  | |           |  | Watts         |         |
| +--+---+--+ |  | |Power    <------+           +------------------>         |
|    |   |    |  | |Consumed | |  | +-----^-----+  |               +----^----+
|    |   |    |  | |Watts    | |  |       |        |                    |
|    |   |    |  | +---------+ |  |       |        |                    |
|    |   |    |  |             |  |       |        |                    |
+-------------+  +-------------+  +----------------+                    |
     |   |                                |                             |
     |   |     Triggers report update     |                             |
     |   |       (when applicable)        |                             |
     |   +--------------------------------+                             |
     |                                                                  |
     |               Monitors PowerConsumedWatts to check               |
     |                whether trigger value is exceeded                 |
     +------------------------------------------------------------------+
```

The diagram shows the relations between Redfish resources. A metric report is
defined to be generated periodically, on demand or on change. Each metric in the
Metric Report contains the URI of its metric definition and of the Redfish
sensor whose reading value is presented. Nevertheless, under this presentation
layer, Telemetry is gathering D-Bus sensor readings and exposing them in reading
reports over D-Bus for the Telemetry Service. Each D-Bus sensor is mapped to a
Redfish sensor.

Examples of Redfish resources for the Telemetry Service are shown below.

The Telemetry Service Redfish resource example:

```json
{
  "@odata.type": "#TelemetryService.v1_1_2.TelemetryService",
  "Id": "TelemetryService",
  "Name": "Telemetry Service",
  "Status": {
    "State": "Enabled",
    "Health": "OK"
  },
  "MinCollectionInterval": "PT10S",
  "SupportedCollectionFunctions": [],
  "MaxReports": <max_no_of_reports>,
  "MetricDefinitions": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions"
  },
  "MetricReportDefinitions": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions"
  },
  "MetricReports": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricReports"
  },
  "Triggers": {
    "@odata.id": "/redfish/v1/TelemetryService/Triggers"
  },
  "LogService": {
    "@odata.id": "/redfish/v1/Managers/<manager_name>/LogServices/Journal"
  },
  "@odata.context": "/redfish/v1/$metadata#TelemetryService",
  "@odata.id": "/redfish/v1/TelemetryService"
}
```

Sample metric report definition:

```json
{
  "@odata.type": "#MetricReportDefinition.v1_3_0.MetricReportDefinition",
  "Id": "SampleMetric",
  "Name": "Sample Metric Report Definition",
  "MetricReportDefinitionType": "Periodic",
  "Schedule": {
    "RecurrenceInterval": "PT10S"
  },
  "ReportActions": ["LogToMetricReportsCollection"],
  "ReportUpdates": "Overwrite",
  "MetricReport": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
  },
  "Status": {
    "State": "Enabled"
  },
  "Metrics": [
    {
      "MetricId": "Test",
      "MetricProperties": [
        "/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts"
      ]
    }
  ],
  "@odata.context": "/redfish/v1/$metadata#MetricReportDefinition.MetricReportDefinition",
  "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
}
```

Sample metric report:

```json
{
  "@odata.type": "#MetricReport.v1_2_0.MetricReport",
  "Id": "SampleMetric",
  "Name": "Sample Metric Report",
  "ReportSequence": "0",
  "MetricReportDefinition": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
  },
  "MetricValues": [
    {
      "MetricDefinition": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions/SampleMetricDefinition"
      },
      "MetricId": "Test",
      "MetricValue": "100",
      "Timestamp": "2016-11-08T12:25:00-05:00",
      "MetricProperty": "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
    }
  ],
  "@odata.context": "/redfish/v1/$metadata#MetricReport.MetricReport",
  "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
}
```

Sample trigger that will trigger a metric report update:

```json
{
  "@odata.type": "#Triggers.v1_1_1.Triggers",
  "Id": "SampleTrigger",
  "Name": "Sample Trigger",
  "MetricType": "Numeric",
  "Links": {
    "MetricReportDefinitions": [
      {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
      }
    ]
  },
  "Status": {
    "State": "Enabled"
  },
  "TriggerActions": ["RedfishMetricReport"],
  "NumericThresholds": {
    "UpperCritical": {
      "Reading": 50,
      "Activation": "Increasing",
      "DwellTime": "PT0.001S"
    },
    "UpperWarning": {
      "Reading": 48.1,
      "Activation": "Increasing",
      "DwellTime": "PT0.004S"
    }
  },
  "MetricProperties": [
    "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
  ],
  "@odata.context": "/redfish/v1/$metadata#Triggers.Triggers",
  "@odata.id": "/redfish/v1/TelemetryService/Triggers/SampleTrigger"
}
```

### Performance tests

Performance tests were conducted on the AST2500 system with 64 MB flash and 512
MB RAM. Flash consumption by the Telemetry service is 197.5 kB. The runtime
statistics are shown in the table below. The reading report is mapped into a
single Metric Report. The runtime data is collected for the Telemetry component
only. All reports were created with the
`xyz.openbmc_project.Telemetry.Metric.OnChange` property to maximize the
workload. In the configuration with 50 reports and 50 sensors it is about 200
new readings per second, generating 200 reading reports per second. The table
shows CPU usage and memory usage. The VSZ is the amount of memory mapped into
the address space of the process. It includes pages backed by the process'
executable file and shared libraries, its heap and stack, as well as anything
else it has mapped.

| Telemetry service state                       | VSZ    | %VSZ | %CPU   |
| --------------------------------------------- | ------ | ---- | ------ |
| Idle (0 reports, 0 sensors)                   | 5188 B | 1%   | 0%     |
| 1 report, 1 sensor                            | 5188 B | 1%   | 1%     |
| 2 reports, 1 sensor                           | 5188 B | 1%   | 1%     |
| 2 reports, 2 sensors (1 sensor per report)    | 5188 B | 1%   | 1%     |
| 1 report, 10 sensors                          | 5188 B | 1%   | 1%     |
| 10 reports, 10 sensors (same for each report) | 5320 B | 1%   | 1-2%   |
| 2 reports, 20 sensors (10 per report)         | 5188 B | 1%   | 1%     |
| 30 reports, 30 sensors (10 per report)        | 5444 B | 1%   | 5-9%   |
| 50 reports, 50 sensors (10 per report)        | 5572 B | 1%   | 11-14% |

The last two configurations use 10 sensors per reading report, which gives 3 or
5 distinct configurations. Each such configuration is used to create 10
reading reports to obtain the desired amount of 30 or 50 reading reports.

In this architecture a reading report is created every time a Redfish Metric
Report Definition is posted (creating a new Metric Report).

## Alternatives Considered

The [framework based on collectd/librrd][5] was considered as an alternate
design. Although it seems to be a versatile and scalable solution, it has some
drawbacks from our point of view:

- Collectd's footprint in the minimal working configuration is around 2.6 MB,
  while available space for the OpenBMC is limited to 64 MB.
- In this design, librrd is used to store metrics on the BMC's non-volatile
  storage, which may be an issue when lots of metrics are captured and stored
  to OpenBMC's limited storage space. Flash wear-out may also occur when
  metrics are captured frequently (like once per second).
- Telemetry service is directly compatible with the Redfish Telemetry Service
  API, which means that Telemetry's reading reports can be directly mapped to
  Redfish Metric Reports.
- The Telemetry service unifies the way the BMC's telemetry is exposed over
  Redfish and may be used with multiple front-ends, so support for telemetry
  over IPMI or any other API can be added without difficulty.

Since this design assumes flexibility and modularity, there are no obstacles to
using collectd in cooperation with Telemetry. One possible configuration is
shown in the diagram below.

```ascii
   +-----------------+      +-----------------+
   |  D-Bus sensors  |      |    Telemetry    |
   +--------^--------+      +--------^--------+
            |                        |
            |                        |
            |                        |
<--------^--v-----------D-Bus--------v-^---------->
         |                             |
         |                             |
         |                             |
 +-------v------------+     +----------v--------+
 |  collectd metrics  |     |                   |
 |  exposed as D-Bus  |     |      bmcweb       |
 |      sensors       |     |   (with Redfish   |
 +---------^----------+     |     Telemetry     |
           |                |     Service)      |
           |                |                   |
    +------+-------+        +-------------------+
    |              |
    |   collectd   |
    |              |
    +--------------+
```

Here collectd is used as the source of some set of metrics. It exposes them as
D-Bus sensors, which can easily be consumed by both bmcweb and the Telemetry
service without any changes to their D-Bus interfaces. In such a configuration,
the Telemetry service provides metric report and trigger management.

Another possible configuration is to use collectd without the Telemetry service,
but in that case collectd does not provide metric report and trigger support
compatible with Redfish. The Redfish Telemetry Service then either won't be
supported, or metric report and trigger support has to be provided by collectd.

## Impacts

This design impacts the architecture of the bmcweb component, since it adds the
Redfish Telemetry Service implementation as a component of the existing Redfish
API implementation.

## Testing

This is a very high-level description of the proposed set of tests.
Testing shall be done on three basic levels:

- Unit tests
- Functional tests
- Performance tests

### Unit tests

The Telemetry code shall be covered by unit tests. The preferred framework is
[GTest/GMock][7]. The unit tests shall be run before a code change is committed
to make sure that nothing is broken in existing functionality. Also, when new
code is introduced, a new set of unit tests shall be committed with it, in line
with the test-driven development principle. Unit tests shall also be carefully
reviewed.

### Functional tests

Functional tests will be divided into two steps.

The first step tests Telemetry metric report management. The test scenario shall
consist of creating a metric report by POSTing a proper metric report
definition, reading the metric report (using GET on the proper URI) and deleting
the metric report. The required configuration for such a test is D-Bus sensors
(at least some of them) and bmcweb with the Redfish Telemetry Service
implemented. The tests shall be performed on real hardware. For ease of metric
testing, dummy D-Bus sensors may be used to supply specifically prepared
metrics. This configuration shall also enable testing of aggregated operations
(MIN, MAX, SUM, AVG).

The second step tests trigger and event generation. This also requires the Event
Service to be implemented along with the Log Service. Tests shall cover all
scenarios that involve sending a metric report as an event, triggering a metric
report update and logging events.

### Performance tests

Performance tests shall be done using a full OpenBMC configuration with the
complete set of required features. The tests shall create a large number of
metric reports (up to the maximum number) along with all possible triggers.
Measurements shall cover the periodic metric report jitter, delays in event
logging or sending, the BMC's CPU utilization and the performance impact on
other services.
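
As an illustration of the periodic metric report jitter measurement, the sketch
below computes how far each observed report interval deviates from the
configured one. It is a minimal example and not part of the Telemetry service:
the function names `report_jitter` and `summarize_jitter` are hypothetical, and
the report timestamps are assumed to have been collected by the test harness
and converted to seconds beforehand.

```python
from statistics import mean


def report_jitter(timestamps, interval):
    """Return the deviation of each observed report interval from the
    nominal interval, in seconds.

    timestamps: report arrival times in seconds, in ascending order
                (assumed to be gathered by the test harness).
    interval:   configured reporting interval in seconds.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two report timestamps")
    # Difference between consecutive reports, compared to the nominal interval.
    deltas = [later - earlier
              for earlier, later in zip(timestamps, timestamps[1:])]
    return [delta - interval for delta in deltas]


def summarize_jitter(jitter):
    """Return the mean and maximum absolute jitter, in seconds."""
    absolute = [abs(value) for value in jitter]
    return {"mean_abs": mean(absolute), "max_abs": max(absolute)}


if __name__ == "__main__":
    # Hypothetical sample: a 1-second periodic report, with one late report.
    samples = [0.0, 1.0, 2.5, 3.5]
    print(summarize_jitter(report_jitter(samples, 1.0)))
```

Summarizing with the mean and the maximum is only one possible choice. A test
plan could just as well track percentiles or compare the values against a
pass/fail limit.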

[1]:
  https://www.dmtf.org/sites/default/files/standards/documents/DSP2051_0.1.0a.zip
[2]:
  https://github.com/openbmc/docs/blob/master/architecture/sensor-architecture.md
[3]: https://www.kernel.org/doc/Documentation/hwmon/
[4]: https://www.freedesktop.org/wiki/Software/dbus/
[5]: https://gerrit.openbmc.org/c/openbmc/docs/+/22257
[6]: https://gerrit.openbmc.org/c/openbmc/docs/+/24749
[7]: https://github.com/google/googletest
[8]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Telemetry/ReportManager.interface.yaml
[9]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Telemetry/Report.interface.yaml
[10]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Object/Delete.interface.yaml