# OpenBMC platform telemetry

Author:
  Piotr Matuszczak <piotr.matuszczak@intel.com>

Primary assignee:
  Piotr Matuszczak

Other contributors:
  Pawel Rapkiewicz <pawel.rapkiewicz@intel.com> <pawelr>,
  Kamil Kowalski <kamil.kowalski@intel.com>

Created:
  2019-08-07

## Problem Description
The BMC on a server platform gathers a lot of telemetry data, which has to
be exposed in a clean, human-readable and standardized format. This document
focuses on telemetry over Redfish, since it is the standard API
for platform manageability.

## Background and References
* OpenBMC platform telemetry shall leverage DMTF's [Redfish Telemetry Model][1]
for exposing platform telemetry over the network.
* OpenBMC platform telemetry shall leverage the
[OpenBMC sensors architecture implementation][2].
* OpenBMC platform telemetry shall implement a service, called Telemetry, to
deal with metric report and trigger management. This service is described later
in this document.
* Although we use [hwmon][3] to gather readings from physical sensors, this
architecture does not depend on it, because the Telemetry service component
relies on the [OpenBMC D-Bus sensors][2].


## Requirements
* [OpenBMC D-Bus sensors][2] support. This is also a design limitation, since
the Telemetry service requires telemetry sources to be implemented as D-Bus
sensors.


## Proposed Design
Redfish Telemetry Model shall implement Telemetry Service with the following
collection resources:
* Metric Definitions - contains the metadata for metrics (unit, accuracy, etc.)
* Metric Report Definitions - defines how a metric report shall be created
(which metrics it shall contain, how often it shall be generated, etc.)
* Metric Reports - contains actual metric reports containing telemetry data
generated according to the Metric Report Definitions
* Metric Triggers - contains thresholds and actions that apply to specific
metrics

The OpenBMC telemetry architecture is shown in the diagram below.

```ascii
   +--------------+               +----------------+     +-----------------+
   |hwmon|        |               |Dbus sensors|   |     |Telemetry|       |
   +-----/        |               +------------/   |     +---------/       |
   |              +--filesystem--->                |     |                 |
   |              |               |                |     |                 |
   +--------------+               +--------^-------+     +--------^--------+
                                           |                      |
                                           |                      |
<------------------------------------------v-----^--DBus----------v----------->
                                                 |
                                                 |
+-------+---------------------------------------------------------------------+
|bmcweb |                                        |                            |
+-------/                                        |                            |
|                                                |                            |
| +--------+-------------------------------------v--------------------------+ |
| |Redfish |                                                                | |
| +--------/                                            +---------+-------+ | |
| |                                                     |Existing |       | | |
| | +------------------------------------------------+  |Redfish  |       | | |
| | |Telemetry Service|                              |  |resources|       | | |
| | +----------------+/                              |  +---------/       | | |
| | |  +----------+  +-----------+  +-------------+  |  |   +---------+   | | |
| | |  | Metric   |  |  Metric   |  |Metric report|  |  |   | Redfish |   | | |
| | |  | triggers |  |definitions|  |definitions  <---------+ sensors |   | | |
| | |  |          |  |           |  |             |  |  |   |         |   | | |
| | |  +----+-----+  +-----+-----+  +------+------+  |  |   +---------+   | | |
| | |       |              |               |         |  |                 | | |
| | |       |              |               |         |  |                 | | |
| | |       |              |               |         |  |                 | | |
| | |       |        +-----v-----+         |         |  |                 | | |
| | |       |        |  Metric   |         |         |  |                 | | |
| | |       +-------->  report   <---------+         |  |                 | | |
| | |                |           |                   |  |                 | | |
| | |                +-----------+                   |  |                 | | |
| | |                                                |  |                 | | |
| | +------------------------------------------------+  +-----------------+ | |
| |                                                                         | |
| +-------------------------------------------------------------------------+ |
|                                                                             |
+-----------------------------------------------------------------------------+
```

The telemetry service component is a part of Redfish and implements the DMTF's
[Redfish Telemetry Model][1]. Metric report definitions use Redfish sensor
URIs for metric report creation. Those sensors are also used to get the
URI->D-Bus sensor mapping. The Redfish Telemetry Service acts as the
presentation layer for the telemetry, while the Telemetry service is
responsible for gathering metrics from D-Bus sensors and exposing them as D-Bus
objects. The Telemetry service supports different monitoring modes (periodic,
on change and on demand) along with aggregated operations:
* SINGLE - current reading value
* AVERAGE - average value over defined time period
* MAX - max reading value during defined time period
* MIN - min reading value during defined time period
* SUM - sum of reading values over defined time period

The time period for calculating aggregated values is taken from the Redfish
Metric Definition resource for each sensor's metric.

The Telemetry service supports creating and managing metric reports, each of
which may contain a single metric or multiple metrics from sensors. Such a
metric report is mapped to a Metric Report in the Redfish Telemetry Service.

The diagram below shows the flows for creation and update of a metric report.

```ascii
+----+              +------+              +---------+                 +-------+
|User|              |bmcweb|              |Telemetry|                 | D-Bus |
+-+--+              +--+---+              +----+----+                 |Sensors|
  |                    |                       |                      +---+---+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Metric report definition flow|                |                          |   |
+-----------------------------+                |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
| | POST request       |                       |                          |   |
| | with metric        |                       |                          |   |
| | report             |                       |                          |   |
| | definition         |                       |                          |   |
| +--------------------> Invoke AddReport      | Register for D-Bus       |   |
| |                    | method on D-Bus       | sensors                  |   |
| |                    +-----------------------> PropertiesChanged        |   |
| |                    |                       | signals                  |   |
| |                    |                       +-------------------------->   |
| |                    |                       +-------------------------->   |
| |                    |                       +-------------------------->   |
| |                    |                       |                          |   |
| | HTTP response      |                       +-+Create Report           |   |
| | code 201 with      | Return created        | |D-Bus object            |   |
| | Metric Report      | Report D-Bus path     <-+                        |   |
| | Definition's URI   <-----------------------+                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Periodic metric report update flow|           |                          |   |
+----------------------------------+           +-+Metric report           |   |
| |                    |                       | |timer triggers          |   |
| |                    |                       <-+report update           |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       | Sensor's Properties-     |   |
| +-------------------->                       | Changed signal           |   |
| |                    +-+Map report's URI     <--------------------------+   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     | +----------------------+ |   |
| |                    | Invoke GetAll method  | |Note that sensor's    | |   |
| |                    | on report D-Bus       | |PropertiesChanged     | |   |
| |                    | object                | |signal is asynchronous| |   |
| |                    +-----------------------> |to metric report timer| |   |
| |                    |                       | |This timer is the only| |   |
| | Return metric      | Return report data    | |thing that triggers   | |   |
| | report in JSON     <-----------------------+ |metric report update  | |   |
| | format             |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|On change metric report update flow|          | Sensor's Properties-     |   |
+-----------------------------------+          | Changed signal           |   |
| |                    |                       <--------------------------+   |
| |                    |                       |                          |   |
| |                    |                       +-+Sensor's signal         |   |
| |                    |                       | |triggers report         |   |
| |                    |                       <-+update                  |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        | +----------------------+ |   |
| |                    <-+                     | |Note that sensor's    | |   |
| |                    | Invoke GetAll method  | |PropertiesChanged     | |   |
| |                    | on report D-Bus       | |signal triggers the   | |   |
| |                    | object                | |report update. It is  | |   |
| |                    +-----------------------> |sufficient that the   | |   |
| |                    |                       | |signal from only one  | |   |
| | Return metric      | Return report data    | |sensor triggers report| |   |
| | report in JSON     <-----------------------+ |update.               | |   |
| | format             |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-+--------------------+------------------------------------------------------+
|On demand metric report update flow|          |                          |   |
+-+--------------------+------------+          |                          |   |
| |                    |                       |                          |   |
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     |                          |   |
| |                    |                       |                          |   |
| |                    | Invoke the Update     |                          |   |
| |                    | method for report     |                          |   |
| |                    | D-Bus object          |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       +-+Update method triggers  |   |
| |                    |                       | |report to be updated    |   |
| |                    |                       | |with the latest known   |   |
| |                    |                       | |sensor's readings.      |   |
| |                    |                       | |No additional sensor    |   |
| |                    |                       <-+readings are performed. |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |                    | Update method call    |                          |   |
| |                    | result                |                          |   |
| |                    <-----------------------+                          |   |
| |                    |                       |                          |   |
| |                    | Invoke GetAll method  |                          |   |
| |                    | on report D-Bus       |                          |   |
| |                    | object                |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       |                          |   |
| | Return metric      | Return report data    |                          |   |
| | report in JSON     <-----------------------+                          |   |
| | format             |                       |                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
```

The Redfish implementation in bmcweb is stateless, thus it is not able to
store metric reports. All operations on metric reports shall be done in
the Telemetry service. Sending a metric report as an SSE or push-style event
shall be done via the [Redfish Event Service][6]. This step is marked as
optional because a metric report does not have to be configured to push its
data through events.

In case of an on demand metric report update, the Telemetry service performs no
additional sensor readings because it already has the latest values, since
they are updated on the PropertiesChanged signal from the D-Bus sensors.

**Telemetry service on [D-Bus][4]**

The Telemetry service exposes specific interfaces on D-Bus. One of them will be
used for reading report management. The second one will be used for trigger
management.

**Reading report management**

The reading report management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.ReportsManagement
/xyz/openbmc_project/Telemetry/Reports
```
The ```ReportsManagement``` object supports the following interface, apart from
the standard D-Bus interfaces.

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.Telemetry.ReportsManagement``` | interface | - | - | - |
|```.AddReport``` | method | ssuas | s | - |
|```.MaxReports``` | property | u | 50 | emits-change |
|```.PollRateResolution``` | property | u | 100 | emits-change |

The ```AddReport``` method is used to create a metric report. The report
may contain single or multiple sensor readings. It is stored in the BMC's
volatile memory. The method has the following arguments:

| Argument | Type | Description |
|----------|------|-------------|
| Prefix | string | Defines prefix for report so it will be "<prefix\><randomString\>", e.g. for prefix "test" -> testqrapndyY |
| ReportingType | string | Reporting type: <br> "xyz.openbmc_project.Telemetry.Metric.Periodic" - For periodic update <br> "xyz.openbmc_project.Telemetry.Metric.OnChange" - For update when value changes <br> "xyz.openbmc_project.Telemetry.Metric.OnRequest" - For update when user requests data |
| ScanPeriod | uint32_t | Scan period used when Periodic type is set (in milliseconds) |
| MetricsParams | array of structures | Collection of metric parameters. |

The ```MetricsParams``` array entry is a structure containing:

| Field | Type | Description |
|----------|------|-------------|
| Sensor's path | object | D-Bus path, path to the sensor providing readings. |
| Operation's type | enum | {SINGLE, MAX, MIN, AVG, SUM} - information about aggregated operation. |
| Metric id | string | Contains unique metric id, that can be mapped to Redfish MetricId. |

The ```ScanPeriod``` is defined per report, thus all sensors listed in the
MetricsParams collection will be scanned with the same frequency. The
ReportingType is also defined per report.
When the *xyz.openbmc_project.Telemetry.Metric.OnChange*
ReportingType is defined, the metric report will emit a signal when at least
one reading has changed.

The ```AddReport``` method returns:
```ascii
String for created report - i.e. '/xyz/openbmc_project/Telemetry/Reports/testqrapndyY'
```

A metric report created this way implements the following interfaces, methods
and properties (apart from the standard D-Bus interfaces):

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.Object.Delete``` | interface | - | - | - |
|```.Delete``` | method | - | - | - |
|```xyz.openbmc_project.Telemetry.Report``` | interface | - | - | - |
|```.Update``` | method | - | - | - |
|```.ReadingParameters``` | property | a(sos) | 1 "/" | emits-change writable |
|```.Readings``` | property | a(svs) | 0 | emits-change read-only |
|```.ReportingType``` | property | s | One of reporting type strings| emits-change writable |
|```.ScanPeriod``` | property | u | 100 | emits-change writable |

The ```Update``` method is defined for the on demand metric report update. It
shall trigger the ```Readings``` property to be updated and send the
PropertiesChanged signal.

The ```ReadingParameters``` property contains an array of structures containing
unique metric id, D-Bus sensor path and aggregated operation type. This
property is made writable in order to support metric report modifications.

| Field Type  | Field Description          |
|-------------|----------------------------|
| string      | Unique metric id           |
| object path | D-Bus sensor's path        |
| string      | Aggregated operation type  |

The ```Readings``` property contains an array of structures containing the
unique metric id, the sensor's reading value and the reading timestamp.

| Field Type | Field Description          |
|------------|----------------------------|
| string     | Unique metric id           |
| variant    | Sensor's reading value     |
| string     | Sensor's reading timestamp |

The ```ScanPeriod``` property has a single value for the whole metric report.
The ```Delete``` method results in deleting the whole metric report.

The ```MaxReports``` property of
the ```xyz.openbmc_project.Telemetry.ReportsManagement``` interface contains the
max number of metric reports supported by the Telemetry service. This property
is added to be compliant with the Redfish Telemetry Service schema, which
contains the ```MaxReports``` property.

**Trigger management**

The trigger management D-Bus object:

```ascii
xyz.openbmc_project.Telemetry.TriggersManagement
/xyz/openbmc_project/Telemetry/Triggers
```
The ```TriggersManagement``` object supports the following interface, apart
from the standard D-Bus interfaces.

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.Telemetry.TriggersManagement``` | interface | - | - | - |
|```.AddTrigger``` | method | sssv(os) | s | - |

The ```AddTrigger``` method shall be used to create a new trigger for
a certain metric. Triggers are stored in the BMC's volatile memory.
The method has the following arguments:

| Argument | Type | Description |
|----------|------|-------------|
| Prefix | string | Defines prefix for trigger so it will be "<prefix\><randomString\>", e.g. for prefix "trigger" -> trigger0dfvAgVt6 |
| ActionType | string | Action type: <br> "xyz.openbmc_project.Telemetry.Trigger.Log" - For logging to log service <br> "xyz.openbmc_project.Telemetry.Trigger.Event" - For sending Redfish event <br> "xyz.openbmc_project.Telemetry.Trigger.Update" - For triggering metric report update |
| MetricType | string | Metric type: <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete" - for discrete sensors <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric" - for numeric sensors |
| TriggerParams | variant | Variant containing structure with either discrete triggers or numeric thresholds. |
| MetricParam | structure | Structure containing D-Bus sensor's path and unique metric Id and optional D-Bus path to metric report to trigger. |

The ```TriggerParams``` is a variant type, which shall contain a structure
depending on the ```MetricType``` value. When ```MetricType``` contains
the ```xyz.openbmc_project.Telemetry.Trigger.Discrete``` value,
 ```TriggerParams``` shall contain a structure with discrete triggers.
When ```MetricType``` contains
the ```xyz.openbmc_project.Telemetry.Trigger.Numeric``` value,
 ```TriggerParams``` shall contain a structure with numeric thresholds.

Discrete triggers structure:

| Field | Type | Description |
|-------|------|-------------|
| TriggerCondition | string | Discrete trigger condition: <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Changed" - trigger occurs when value of metric has changed; <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Specified" - trigger occurs when value of metric becomes one of the values listed in the discrete triggers. |
| DiscreteTriggers | array of structures | Array of discrete trigger structures. |

Member of DiscreteTriggers array:

| Field | Type | Description |
|-------|------|-------------|
| TriggerId| string | Unique trigger Id |
| Severity | string | Severity: <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Severity.OK" - normal, <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Severity.Warning" - requires attention, <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Severity.Critical" - requires immediate attention |
| Value | variant | Value of discrete metric, that constitutes a trigger event. |
| DwellTime | uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |

The numeric thresholds structure shall contain up to 4 thresholds: upper
warning, upper critical, lower warning and lower critical. Thus it will contain
up to 4 structures shown below:

| Field | Type | Description |
|-------|------|-------------|
| ThresholdType | string | Numeric trigger type: <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.UpperCritical", <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.UpperWarning", <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.LowerCritical", <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.LowerWarning" |
| DwellTime | uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |
| Activation | string | Indicates the direction of crossing the threshold value that triggers the threshold's action: <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Increasing", <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Decreasing", <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Either" |
| ThresholdValue | variant | Value of reading that will trigger the threshold |

The numeric threshold trigger type meaning:

- "xyz.openbmc_project.Telemetry.Trigger.Numeric.UpperCritical" -
indicates the reading is above normal range and requires immediate attention
- "xyz.openbmc_project.Telemetry.Trigger.Numeric.UpperWarning" -
indicates the reading is above normal range and may require attention
- "xyz.openbmc_project.Telemetry.Trigger.Numeric.LowerCritical" -
indicates the reading is below normal range and requires immediate attention
- "xyz.openbmc_project.Telemetry.Trigger.Numeric.LowerWarning" -
indicates the reading is below normal range and may require attention

The numeric threshold activation meaning:

- "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Increasing" -
trigger action when reading is changing from below to above the threshold's value
- "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Decreasing" -
trigger action when reading is changing from above to below the threshold's value
- "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Either" -
trigger action when reading is crossing the threshold's value in either direction
described above

The ```MetricParam``` structure contains the following data:

| Field | Type | Description |
|-------|------|-------------|
| SensorPath | object path | D-Bus path to sensor, for which trigger is defined. |
| MetricId | string | Contains unique metric id, that can be mapped to Redfish MetricId. |
| ReportPath | object path | D-Bus path to the Telemetry's reading report whose update shall be triggered when the trigger condition occurs. This is optional and shall be filled when trigger's ActionType is set to "xyz.openbmc_project.Telemetry.Trigger.Update". |

The ```AddTrigger``` method returns:
```ascii
String for created trigger - i.e. '/xyz/openbmc_project/Telemetry/Triggers/trigger0dfvAgVt6'
```
A trigger created this way implements the following interfaces, methods and
properties (apart from the standard D-Bus interfaces):

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.Object.Delete``` | interface | - | - | - |
|```.Delete``` | method | - | - | - |
|```xyz.openbmc_project.Telemetry.Trigger``` | interface | - | - | - |
|```.MetricType``` | property | s | One of the MetricType strings | emits-change read-only |
|```.Triggers``` | property | {sa{ssvu64}} or a{su64sv} | The structure containing triggers. It depends on ```.MetricType``` property how the structure is defined. | emits-change writable |
|```.ActionType``` | property | s | One of ActionType strings | emits-change writable |
|```.Metric``` | property | (oso) | Structure containing details of metric, for which trigger is defined. | emits-change writable |

The ```.MetricType``` property contains information about the metric type for
which the trigger was created. It can be either discrete or numeric. This
property is read-only, thus a created trigger cannot be changed from discrete
to numeric or from numeric to discrete. This also determines what
the ```.Triggers``` property looks like on D-Bus.
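
To make the numeric threshold activation directions described above more
concrete, the following minimal Python sketch checks whether two consecutive
readings cross a threshold. The function name `threshold_crossed` and the
treatment of a reading that lands exactly on the threshold value are
assumptions made for illustration only; they are not part of the Telemetry
service D-Bus interface, and dwell time handling is intentionally left out.

```python
# Activation strings as listed in the numeric thresholds table.
ACTIVATION_PREFIX = "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation."
INCREASING = ACTIVATION_PREFIX + "Increasing"
DECREASING = ACTIVATION_PREFIX + "Decreasing"
EITHER = ACTIVATION_PREFIX + "Either"


def threshold_crossed(previous, current, threshold, activation):
    """Return True when moving from `previous` to `current` crosses
    `threshold` in the direction selected by `activation`.

    Assumption: a reading that reaches the threshold value exactly is
    treated as having crossed it.
    """
    rising = previous < threshold <= current    # from below to above
    falling = previous > threshold >= current   # from above to below
    if activation == INCREASING:
        return rising
    if activation == DECREASING:
        return falling
    if activation == EITHER:
        return rising or falling
    raise ValueError("unknown activation: " + activation)


if __name__ == "__main__":
    # A reading rising from 47.0 to 50.5 against a threshold value of 50.
    print(threshold_crossed(47.0, 50.5, 50.0, INCREASING))
    print(threshold_crossed(47.0, 50.5, 50.0, DECREASING))
```

In a real service the result of such a check would only start the dwell time
countdown; the action defined by ```ActionType``` is performed once the
condition has persisted for ```DwellTime``` milliseconds.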

If ```.MetricType``` is equal to "xyz.openbmc_project.Telemetry.Trigger.Discrete"
then the ```.Triggers``` property contains a discrete trigger that looks like
this:

| Type | Description |
|------|-------------|
| string | Discrete trigger condition: <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Changed" - trigger occurs when value of metric has changed; <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Specified" - trigger occurs when value of metric becomes one of the values listed in the discrete triggers. |
| array of structures | Array of discrete trigger structures. |

Member of DiscreteTriggers array:

| Type | Description |
|------|-------------|
| string | Unique trigger Id |
| string | Severity: <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Severity.OK" - normal, <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Severity.Warning" - requires attention, <br> "xyz.openbmc_project.Telemetry.Trigger.Discrete.Severity.Critical" - requires immediate attention |
| variant | Value of discrete metric, that constitutes a trigger event. |
| uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |

If ```.MetricType``` is equal to "xyz.openbmc_project.Telemetry.Trigger.Numeric"
then the ```.Triggers``` property contains a numeric trigger that is an array
of 4 structures presented below:

| Type | Description |
|------|-------------|
| string | Numeric trigger type: <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.UpperCritical", <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.UpperWarning", <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.LowerCritical", <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.LowerWarning" |
| uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |
| string | Indicates the direction of crossing the threshold value that triggers the threshold's action: <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Increasing", <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Decreasing", <br> "xyz.openbmc_project.Telemetry.Trigger.Numeric.Activation.Either" |
| variant | Value of reading that will trigger the threshold |

The ```.Metric``` property stores the details about the reading for which the
trigger was defined. It has the form of a structure consisting of three fields.

| Field type | Description |
|------------|--------------|
| object path | D-Bus path of sensor. This is a path to the sensor, for which reading trigger is defined. |
| string | Unique metric Id |
| object path | D-Bus path of existing reading report. This is required when trigger's action is to update metric report. This path shall point to existing reading report within the Telemetry service. |

**Trigger operations**

Triggers support three types of operation: Log, Event and Update. Each of them
is handled in a different way.

1. For action Log, the event shall
be logged to the system journal. In this case the Telemetry service writes
data to the system journal using libjournal. The Redfish log service shall then
retrieve the data by reading the system journal. This is shown in the diagram
below.

```ascii
+---------------------------+
|bmcweb|                    |       +----------------------+
+------/  +-----------+-+   |       |Telemetry|            |
|         |Redfish    | |   |       +---------/            |
|         |log service| |   |       |                      |
|         +-----------/ |   |       |                      |
|         |             |   |       |                      |
|         |             |   |       |                      |
|         +------^------+   |       +-----------+----------+
+---------------------------+                   |
                 |                              |
                 +----collect----+            event
                   journal entry |     (write to journal)
                                 |              |
     +------------------------------------+     |
     |systemd|                   |        |     |
     +-------/ +----------+  +---+------+ |     |
     |         |journal|  |  |libjournal| |     |
     |         +-------/  <-->          <-------+
     |         |          |  +----------+ |
     |         |          |               |
     |         |          |               |
     |         +----------+               |
     |                                    |
     +------------------------------------+
```
2. For action Event, the Telemetry service shall send an event using the
[Redfish Event Service][6] either as a push-style event or SSE.

3. For action Update, the Telemetry service will trigger the update of the
reading report pointed to by its D-Bus path contained in the ReportPath field
inside the ```.Metric``` structure. The update shall cause the reading report's
D-Bus object to emit a property change signal. This will cause the Redfish
Metric Report to be streamed out if it was configured to do so.

**Redfish Telemetry Service API**

Redfish Telemetry Service shall support 2019.1 Redfish schemas for telemetry
resources. Metric report definitions determine which metrics are to be included
in a metric report. A metric definition is assigned to a particular metric type
and it describes how the metric should be interpreted. The following resource
schemas shall be supported:

- TelemetryService 1.1.2
- MetricDefinition 1.0.3
- MetricReportDefinition 1.3.0
- MetricReport 1.2.0
- Triggers 1.1.1

The following diagram shows relations between these resources.

```ascii
 +----------------------------------------------------------------------------+
 |                                Service root                                |
 +----------------------------------+-------------------------------+---------+
                                    |                               |
                                    |                               |
                                    |                               |
 +----------------------------------v-----------------+  +----------v---------+
 |                                                    |  |Chassis             |
 |                 Telemetry Service                  |  |                    |
 |                                                    |  |                    |
 |                                                    |  |  +---------------+ |
 +---------+--------------+------------------+--------+  |  |               | |
           |              |                  |           |  |   Chassis 1   | |
           |              |                  |           |  |               | |
           |              |                  |           |  +---------+-----+ |
           |              |                  |           |            |       |
+----------v--+  +--------v----+  +----------v-----+     +--------------------+
|Triggers     |  |Metric       |  |Metric report   |                  |
|             |  |definition   |  |                |                  |
|             |  | +---------+ |  |                | Reads            |
| +---------+ |  | |Reading  | |  | +-----------+  | ReadingVolts  +--v------+
| |         | |  | |Volts    <------+           +------------------>         |
| |Trigger 1| |  | +---------+ |  | |  Metric   |  |               |         |
| |         | |  |             |  | | report 1  |  | Reads         |  Power  |
| |         | |  | +---------+ |  | |           |  | PowerConsumed |         |
| |         | |  | |         | |  | |           |  | Watts         |         |
| +--+---+--+ |  | |Power    <------+           +------------------>         |
|    |   |    |  | |Consumed | |  | +-----^-----+  |               +----^----+
|    |   |    |  | |Watts    | |  |       |        |                    |
|    |   |    |  | +---------+ |  |       |        |                    |
|    |   |    |  |             |  |       |        |                    |
+-------------+  +-------------+  +----------------+                    |
     |   |                                |                             |
     |   |    Triggers report update      |                             |
     |   |    (when applicable)           |                             |
     |   +--------------------------------+                             |
     |                                                                  |
     |  Monitors PowerConsumedWatts to check                            |
     |  whether trigger value is exceeded                               |
     +------------------------------------------------------------------+
```

The diagram shows the relations between Redfish resources. A metric report is
defined to be generated periodically, on demand or on change. Each metric in the
Metric Report contains the URI to its metric definition and to the Redfish
sensor whose reading value is presented.
Nevertheless, under this presentation layer,
the Telemetry service is gathering D-Bus sensor readings and exposing them
in reading reports over D-Bus for the Redfish Telemetry Service. Each D-Bus
sensor is mapped to a Redfish sensor.

Examples of Redfish resources for the Telemetry Service are shown below.

The Telemetry Service Redfish resource example:

```json
{
    "@odata.type": "#TelemetryService.v1_1_2.TelemetryService",
    "Id": "TelemetryService",
    "Name": "Telemetry Service",
    "Status": {
        "State": "Enabled",
        "Health": "OK"
    },
    "MinCollectionInterval": "PT10S",
    "SupportedCollectionFunctions": [],
    "MaxReports": <max_no_of_reports>,
    "MetricDefinitions": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions"
    },
    "MetricReportDefinitions": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions"
    },
    "MetricReports": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReports"
    },
    "Triggers": {
        "@odata.id": "/redfish/v1/TelemetryService/Triggers"
    },
    "LogService": {
        "@odata.id": "/redfish/v1/Managers/<manager_name>/LogServices/Journal"
    },
    "@odata.context": "/redfish/v1/$metadata#TelemetryService",
    "@odata.id": "/redfish/v1/TelemetryService"
}
```

Sample metric report definition:

```json
{
    "@odata.type": "#MetricReportDefinition.v1_3_0.MetricReportDefinition",
    "Id": "SampleMetric",
    "Name": "Sample Metric Report Definition",
    "MetricReportDefinitionType": "Periodic",
    "Schedule": {
        "RecurrenceInterval": "PT10S"
    },
    "ReportActions": [
        "LogToMetricReportsCollection"
    ],
    "ReportUpdates": "Overwrite",
    "MetricReport": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
    },
    "Status": {
        "State": "Enabled"
    },
    "Metrics": [
        {
            "MetricId": "Test",
            "MetricProperties": [
                "/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts"
            ]
        }
    ],
    "@odata.context": "/redfish/v1/$metadata#MetricReportDefinition.MetricReportDefinition",
    "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
}
```

Sample metric report:

```json
{
    "@odata.type": "#MetricReport.v1_2_0.MetricReport",
    "Id": "SampleMetric",
    "Name": "Sample Metric Report",
    "ReportSequence": "0",
    "MetricReportDefinition": {
        "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
    },
    "MetricValues": [
        {
            "MetricDefinition": {
                "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions/SampleMetricDefinition"
            },
            "MetricId": "Test",
            "MetricValue": "100",
            "Timestamp": "2016-11-08T12:25:00-05:00",
            "MetricProperty": "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
        }
    ],
    "@odata.context": "/redfish/v1/$metadata#MetricReport.MetricReport",
    "@odata.id": "/redfish/v1/TelemetryService/MetricReports/SampleMetric"
}
```

Sample trigger that will trigger a metric report update:

```json
{
    "@odata.type": "#Triggers.v1_1_1.Triggers",
    "Id": "SampleTrigger",
    "Name": "Sample Trigger",
    "MetricType": "Numeric",
    "Links": {
        "MetricReportDefinitions": [
            "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric"
        ]
    },
    "Status": {
        "State": "Enabled"
    },
    "TriggerActions": [
        "RedfishMetricReport"
    ],
    "NumericThresholds": {
        "UpperCritical": {
            "Reading": 50,
            "Activation": "Increasing",
            "DwellTime": "PT0.001S"
        },
        "UpperWarning": {
            "Reading": 48.1,
            "Activation": "Increasing",
            "DwellTime": "PT0.004S"
        }
    },
    "MetricProperties": [
        "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts"
    ],
    "@odata.context": "/redfish/v1/$metadata#Triggers.Triggers",
    "@odata.id": "/redfish/v1/TelemetryService/Triggers/SampleTrigger"
}
```

**Performance tests**

Performance tests were conducted on an AST2500 system with 64 MB of flash and
512 MB of RAM. Flash consumption by the Telemetry service is 197.5 kB. The
runtime statistics are shown in the table below. Each reading report is
mapped to a single Metric Report. The runtime data is collected for the
Telemetry component only. All reports were created with
the ```xyz.openbmc_project.Telemetry.Metric.OnChange``` reporting type to
maximize the workload. In the configuration with 50 reports and 50 sensors
this amounts to about 200 new readings per second, generating 200 reading
reports per second. The table shows CPU usage and memory usage. The VSZ is the
amount of memory mapped into the address space of the process. It includes pages
backed by the process' executable file and shared libraries, its heap and
stack, as well as anything else it has mapped.


| Telemetry service state                          | VSZ  | %VSZ | %CPU |
|--------------------------------------------------|------|------|------|
| Idle (0 reports, 0 sensors)                      |5188 B| 1%   | 0%   |
| 1 report, 1 sensor                               |5188 B| 1%   | 1%   |
| 2 reports, 1 sensor                              |5188 B| 1%   | 1%   |
| 2 reports, 2 sensors (1 sensor per report)       |5188 B| 1%   | 1%   |
| 1 report, 10 sensors                             |5188 B| 1%   | 1%   |
| 10 reports, 10 sensors (same for each report)    |5320 B| 1%   | 1-2% |
| 2 reports, 20 sensors (10 per report)            |5188 B| 1%   | 1%   |
| 30 reports, 30 sensors (10 per report)           |5444 B| 1%   | 5-9% |
| 50 reports, 50 sensors (10 per report)           |5572 B| 1%   |11-14%|

The last two configurations use 10 sensors per reading report, which gives
3 or 5 distinct configurations. Each such configuration is used to
create 10 reading reports to obtain the desired amount of 30 or 50 reading
reports.

In this architecture, a reading report is created every time a Redfish
Metric Report Definition is posted (creating a new Metric Report).

## Alternatives Considered
The [framework based on collectd/librrd][5] was considered as an alternative
design. Although it seems to be a versatile and scalable solution, the
Telemetry service was preferred for the following reasons:
* Collectd's footprint in the minimal working configuration is around 2.6 MB,
while the available space for OpenBMC is limited to 64 MB.
* In the collectd-based design, librrd is used to store metrics on the BMC's
non-volatile storage, which may be an issue when lots of metrics are captured
and stored in OpenBMC's limited storage space. Flash wear-out may also occur
when metrics are captured frequently (for example, once per second).
* The Telemetry service is directly compatible with the Redfish Telemetry
Service API, which means that Telemetry's reading reports can be directly
mapped to Redfish Metric Reports.
* The Telemetry service unifies the way the BMC's telemetry is exposed over
Redfish and may be used with multiple front-ends, so support for telemetry over
IPMI or any other API can be added without difficulty.

Since this design assumes flexibility and modularity, there are no obstacles to
using collectd in cooperation with Telemetry. One possible configuration
is shown on the diagram below.

```ascii
   +-----------------+      +-----------------+
   |  D-Bus sensors  |      |    Telemetry    |
   +--------^--------+      +--------^--------+
            |                        |
            |                        |
            |                        |
<--------^--v-----------D-Bus--------v-^---------->
         |                             |
         |                             |
         |                             |
 +-------v------------+     +----------v--------+
 |  collectd metrics  |     |                   |
 |  exposed as D-Bus  |     |      bmcweb       |
 |      sensors       |     |   (with Redfish   |
 +---------^----------+     |     Telemetry     |
           |                |     Service)      |
           |                |                   |
    +------+-------+        +-------------------+
    |              |
    |   collectd   |
    |              |
    +--------------+
```

Here collectd is used as the source of some set of metrics. It exposes them
as D-Bus sensors, which can easily be consumed by both bmcweb and the
Telemetry service without any changes in their D-Bus interfaces. In such a
configuration the Telemetry service provides metric reports and triggers
management.

Another possible configuration is to use collectd without the Telemetry
service, but in that case collectd does not provide metric reports and triggers
support compatible with Redfish. Either the Redfish Telemetry Service won't be
supported, or metric reports and triggers support has to be provided by
collectd.

## Impacts
This design impacts the architecture of the bmcweb component, since it adds
the Redfish Telemetry Service implementation as a component of the existing
Redfish API implementation.

## Testing
This is a very high-level description of the proposed set of tests.
Testing shall be done on three basic levels:
* Unit tests
* Functional tests
* Performance tests

**Unit tests**

The Telemetry code shall be covered by unit tests. The preferred
framework is [GTest/GMock][7]. The unit tests shall be run before a code
change is committed, to make sure that nothing is broken in existing
functionality.
Also, when new code is introduced, a new set of unit tests shall
be committed with it, in line with the test-driven development principle. Unit
tests shall also be carefully reviewed.

**Functional tests**

Functional tests will be divided into two steps.

The first step tests the Telemetry metric reports management. The test scenario
shall contain creating a metric report by POSTing a proper metric report
definition, reading the metric report (using GET on the proper URI) and
deleting the metric report. The required configuration for such a test is D-Bus
sensors (at least some of them) and bmcweb with the Redfish Telemetry Service
implemented. The tests shall be performed on real hardware. For ease of metric
testing, dummy D-Bus sensors may be used to supply specifically prepared
metrics. This configuration shall also enable testing aggregated operations
(MIN, MAX, SUM, AVG).

The second step tests triggers and events generation. This also requires the
Event Service to be implemented along with the Log Service. Tests shall cover
all scenarios with sending a metric report as an event, triggering a metric
report update and logging events.

**Performance tests**

Performance tests shall be done using a full OpenBMC configuration with the
complete required set of features. The tests shall create a lot of metric
reports (up to the maximum number) along with all possible triggers.
Measurements shall cover the periodic metric report jitter, delays in event
logging or sending, the BMC's CPU utilization and the performance impact on
other services.
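
The first functional test step described above (create, read, delete) could be
driven by a small script. The following is a minimal sketch, not a definitive
test implementation: the BMC address and helper names are hypothetical, the
request body reuses fields from the sample metric report definition in this
document, and the choice to delete through the metric report definition URI is
an assumption. The helpers only build the requests; sending them (for example
with `urllib.request.urlopen`) and authentication are left out.

```python
import json
import urllib.request

BASE_URL = "https://bmc.example"  # hypothetical BMC address
SERVICE = "/redfish/v1/TelemetryService"


def build_create_request(report_id, metric_property, interval="T00:00:10"):
    """Build a POST that creates a periodic metric report definition."""
    body = {
        "Id": report_id,
        "Name": "Sample Metric Report Definition",
        "MetricReportDefinitionType": "Periodic",
        "Schedule": {"RecurrenceInterval": interval},
        "ReportActions": ["LogToMetricReportsCollection"],
        "ReportUpdates": "Overwrite",
        "Metrics": [
            {"MetricId": "Test", "MetricProperties": [metric_property]}
        ],
    }
    return urllib.request.Request(
        BASE_URL + SERVICE + "/MetricReportDefinitions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_read_request(report_id):
    """Build a GET that reads the metric report generated for report_id."""
    return urllib.request.Request(
        f"{BASE_URL}{SERVICE}/MetricReports/{report_id}", method="GET"
    )


def build_delete_request(report_id):
    """Build a DELETE for the definition (assumed to remove the report too)."""
    return urllib.request.Request(
        f"{BASE_URL}{SERVICE}/MetricReportDefinitions/{report_id}",
        method="DELETE",
    )
```

A test would send the three requests in order and check the HTTP status codes
and the `MetricValues` array of the returned metric report.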

[1]: https://www.dmtf.org/sites/default/files/standards/documents/DSP2051_0.1.0a.zip
[2]: https://github.com/openbmc/docs/blob/master/architecture/sensor-architecture.md
[3]: https://www.kernel.org/doc/Documentation/hwmon/
[4]: https://www.freedesktop.org/wiki/Software/dbus/
[5]: https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/22257
[6]: https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/24749
[7]: https://github.com/google/googletest