# OpenBMC platform telemetry

Author:
  Piotr Matuszczak <piotr.matuszczak@intel.com>

Primary assignee:
  Piotr Matuszczak

Other contributors:
  Pawel Rapkiewicz <pawel.rapkiewicz@intel.com> <pawelr>,
  Kamil Kowalski <kamil.kowalski@intel.com>

Created:
  2019-08-07

## Problem Description
The BMC on a server platform gathers a lot of telemetry data, which has to
be exposed in a clean, human-readable and standardized format. This document
focuses on telemetry over Redfish, since it is the standard API
for platform manageability.

## Background and References
* OpenBMC platform telemetry shall leverage DMTF's [Redfish Telemetry Model][1]
for exposing platform telemetry over the network.
* OpenBMC platform telemetry shall leverage the
[OpenBMC sensors architecture implementation][2].
* OpenBMC platform telemetry shall implement a service, called Monitoring
Service, to deal with metric reports and trigger management. This service
is described later in this document.
* Although we use [hwmon][3] to gather readings from physical sensors, this
architecture does not depend on it, because the Monitoring Service component
relies on the [OpenBMC D-Bus sensors][2].


## Requirements
* [OpenBMC D-Bus sensors][2] support. This is also a design limitation, since
the Monitoring Service requires telemetry sources to be implemented as
D-Bus sensors.


## Proposed Design
Redfish Telemetry Model shall implement Telemetry Service with the following
collection resources:
* Metric Definitions - contains the metadata for metrics (unit, accuracy, etc.)
* Metric Report Definitions - defines how metric report shall be created
(which metrics it shall contain, how often it shall be generated etc.)
* Metric Reports - contains actual metric reports containing telemetry data
generated according to the Metric Report Definitions
* Metric Triggers - contains thresholds and actions that apply to specific
metrics

OpenBMC telemetry architecture is shown on the diagram below.

```ascii
   +--------------+               +----------------+     +-----------------+
   |hwmon|        |               |Dbus sensors|   |     |Monitoring|      |
   +-----/        |               +------------/   |     |service   |      |
   |              +--filesystem--->                |     +----------/      |
   |              |               |                |     |                 |
   +--------------+               +--------^-------+     +--------^--------+
                                           |                      |
                                           |                      |
<------------------------------------------v-----^--DBus----------v----------->
                                                 |
                                                 |
+-------+---------------------------------------------------------------------+
|bmcweb |                                        |                            |
+-------/                                        |                            |
|                                                |                            |
| +--------+-------------------------------------v--------------------------+ |
| |Redfish |                                                                | |
| +--------/                                            +---------+-------+ | |
| |                                                     |Existing |       | | |
| | +------------------------------------------------+  |Redfish  |       | | |
| | |Telemetry Service|                              |  |resources|       | | |
| | +----------------+/                              |  +---------/       | | |
| | |  +----------+  +-----------+  +-------------+  |  |   +---------+   | | |
| | |  | Metric   |  |  Metric   |  |Metric report|  |  |   | Redfish |   | | |
| | |  | triggers |  |definitions|  |definitions  <---------+ sensors |   | | |
| | |  |          |  |           |  |             |  |  |   |         |   | | |
| | |  +----+-----+  +-----+-----+  +------+------+  |  |   +---------+   | | |
| | |       |              |               |         |  |                 | | |
| | |       |              |               |         |  |                 | | |
| | |       |              |               |         |  |                 | | |
| | |       |        +-----v-----+         |         |  |                 | | |
| | |       |        |  Metric   |         |         |  |                 | | |
| | |       +-------->  report   <---------+         |  |                 | | |
| | |                |           |                   |  |                 | | |
| | |                +-----------+                   |  |                 | | |
| | |                                                |  |                 | | |
| | +------------------------------------------------+  +-----------------+ | |
| |                                                                         | |
| +-------------------------------------------------------------------------+ |
|                                                                             |
+-----------------------------------------------------------------------------+
```

The telemetry service
component is a part of Redfish and implements the DMTF's
[Redfish Telemetry Model][1]. Metric report definitions use Redfish sensor
URIs for metric report creation. Those sensors are also used to get the
URI->D-Bus sensor mapping. Redfish Telemetry Service acts as the presentation
layer for the telemetry, while Monitoring Service is responsible for gathering
metrics from D-Bus sensors and exposing them as D-Bus objects. Monitoring
Service supports different monitoring modes (periodic, on change and on demand)
along with aggregated operations:
* SINGLE - current reading value
* AVERAGE - average value over defined time period
* MAX - max reading value during defined time period
* MIN - min reading value during defined time period
* SUM - sum of reading values over defined time period

The time period for calculating aggregated values is taken from the Redfish
Metric Definition resource for each sensor's metric.

Monitoring Service supports creating and managing metric reports, each of which
may contain single or multiple metrics from sensors. Such a metric report is
mapped to a Metric Report for the Redfish Telemetry Service.

The diagram below shows the flows for creation and update of metric report.
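
Before the flow diagram, here is a minimal sketch of how the aggregated operations listed above could reduce the readings collected during one time period to a single value. It only illustrates the semantics and is not the Monitoring Service implementation; the function name `aggregate` and the plain list of readings are assumptions made for this example.

```python
from statistics import mean


def aggregate(operation, readings):
    """Reduce readings from one time period to a single value.

    `readings` is a list of numeric sensor values ordered from oldest
    to newest (a hypothetical in-memory representation).
    """
    if not readings:
        raise ValueError("no readings collected in the time period")
    operations = {
        "SINGLE": lambda values: values[-1],  # current (latest) reading
        "AVERAGE": mean,
        "MAX": max,
        "MIN": min,
        "SUM": sum,
    }
    if operation not in operations:
        raise ValueError(f"unsupported operation: {operation}")
    return operations[operation](readings)
```

With this sketch, `aggregate("MAX", readings)` returns the highest reading seen in the period, while `aggregate("SINGLE", readings)` ignores the history and returns the latest value.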

```ascii
+----+              +------+              +----------+                +-------+
|User|              |bmcweb|              |Monitoring|                | D-Bus |
+-+--+              +--+---+              | Service  |                |Sensors|
  |                    |                  +----------+                +---+---+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Metric report definition flow|                |                          |   |
+-----------------------------+                |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
| | POST request       |                       |                          |   |
| | with metric        |                       |                          |   |
| | report             |                       |                          |   |
| | definition         |                       |                          |   |
| +--------------------> Invoke AddReport      | Register for D-Bus       |   |
| |                    | method on D-Bus       | sensors                  |   |
| |                    +-----------------------> PropertiesChanged        |   |
| |                    |                       | signals                  |   |
| |                    |                       +-------------------------->   |
| |                    |                       |-------------------------->   |
| |                    |                       +-------------------------->   |
| |                    |                       |                          |   |
| | HTTP response      |                       +-+Create Report           |   |
| | code 201 with      | Return created        | |D-Bus object            |   |
| | Metric Report      | Report D-Bus path     <-+                        |   |
| | Definition's URI   <-----------------------+                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|Periodic metric report update flow|           |                          |   |
+----------------------------------+           +-+Metric report           |   |
| |                    |                       | |timer triggers          |   |
| |                    |                       <-+report update           |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       |   Sensor's Properties-   |   |
| +-------------------->                       |   Changed signal         |   |
| |                    +-+Map report's URI     <--------------------------+   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     | +----------------------+ |   |
| |                    | Invoke GetAll method  | |Note that sensor's    | |   |
| |                    | on report D-Bus       | |PropertiesChanged     | |   |
| |                    | object                | |signal is asynchronous| |   |
| |                    +-----------------------> |to metric report timer| |   |
| |                    |                       | |This timer is the only| |   |
| | Return metric      | Return report data    | |thing that triggers   | |   |
| | report in JSON     <-----------------------+ |metric report update  | |   |
| | format             |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-----------------------------------------------------------------------------+
|On change metric report update flow|          |   Sensor's Properties-   |   |
+-----------------------------------+          |   Changed signal         |   |
| |                    |                       <--------------------------+   |
| |                    |                       |                          |   |
| |                    |                       +-+Sensor's signal         |   |
| |                    |                       | |triggers report         |   |
| |                    |                       <-+update                  |   |
| |                    |                       |                          |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        | +----------------------+ |   |
| |                    <-+                     | |Note that sensor's    | |   |
| |                    | Invoke GetAll method  | |PropertiesChanged     | |   |
| |                    | on report D-Bus       | |signal triggers the   | |   |
| |                    | object                | |report update. It is  | |   |
| |                    +-----------------------> |sufficient that the   | |   |
| |                    |                       | |signal from only one  | |   |
| | Return metric      | Return report data    | |sensor triggers report| |   |
| | report in JSON     <-----------------------+ |update.               | |   |
| | format             |                       | +----------------------+ |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
+-+--------------------+------------------------------------------------------+
|On demand metric report update flow|          |                          |   |
+-+--------------------+------------+          |                          |   |
| |                    |                       |                          |   |
| | GET on Metric      |                       |                          |   |
| | Report URI         |                       |                          |   |
| +-------------------->                       |                          |   |
| |                    +-+Map report's URI     |                          |   |
| |                    | |to D-Bus path        |                          |   |
| |                    <-+                     |                          |   |
| |                    |                       |                          |   |
| |                    | Invoke the Update     |                          |   |
| |                    | method for report     |                          |   |
| |                    | D-Bus object          |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       +-+Update method triggers  |   |
| |                    |                       | |report to be updated    |   |
| |                    |                       | |with the latest known   |   |
| |                    |                       | |sensor's readings.      |   |
| |                    |                       | |No additional sensor    |   |
| |                    |                       <-+readings are performed. |   |
+----------------------------------Optional-----------------------------------+
| |                    |                       |                          |   |
| | Send report as SSE or push-style event     |                          |   |
| | using Redfish Event Service (not shown     |                          |   |
| | here) if configured to do so.              |                          |   |
| <--------------------------------------------+                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
| |                    | Update method call    |                          |   |
| |                    | result                |                          |   |
| |                    <-----------------------+                          |   |
| |                    |                       |                          |   |
| |                    | Invoke GetAll method  |                          |   |
| |                    | on report D-Bus       |                          |   |
| |                    | object                |                          |   |
| |                    +----------------------->                          |   |
| |                    |                       |                          |   |
| | Return metric      | Return report data    |                          |   |
| | report in JSON     <-----------------------+                          |   |
| | format             |                       |                          |   |
| <--------------------+                       |                          |   |
| |                    |                       |                          |   |
+-----------------------------------------------------------------------------+
  |                    |                       |                          |
```

The Redfish implementation in bmcweb is stateless, thus it is not able to
store metric reports. All operations on metric reports shall be done in
the Monitoring Service. Sending metric report as SSE or push-style events
shall be done via the [Redfish Event Service][6]. It is marked as optional
because metric report does not have to be configured for pushing its data
through the event.

In case of on demand metric report update, Monitoring Service performs no
additional sensor readings because it already has the latest values, since
they are updated on PropertiesChanged signal from the D-Bus sensors.

**Monitoring service on [D-Bus][4]**

Monitoring service exposes specific interfaces on D-Bus. One of them will be
used for reading report management. The second one will be used for triggers
management.

**Reading report management**

The reading report management D-Bus object:

```ascii
xyz.openbmc_project.MonitoringService.ReportsManagement
/xyz/openbmc_project/MonitoringService/Reports
```
The ```ReportsManagement``` supports the following interface apart from
standard D-Bus interface.

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.MonitoringService.ReportsManagement``` | interface | - | - | - |
|```.AddReport``` | method | ssuas | s | - |
|```.MaxReports``` | property | u | 50 | emits-change |
|```.PollRateResolution``` | property | u | 100 | emits-change |

The ```AddReport``` method is used to create a metric report. The report
may contain single or multiple sensor readings. It is stored in the BMC's
volatile memory. The method has the following arguments:

| Argument | Type | Description |
|----------|------|-------------|
| Prefix | string | Defines prefix for the report name, so it will be "<prefix\><randomString\>", e.g. for prefix "test" -> testqrapndyY |
| ReportingType | string | Reporting type: <br> "xyz.openbmc_project.MonitoringService.Metric.Periodic" - for periodic update <br> "xyz.openbmc_project.MonitoringService.Metric.OnChange" - for update when value changes <br> "xyz.openbmc_project.MonitoringService.Metric.OnRequest" - for update when user requests data |
| ScanPeriod | uint32_t | Scan period used when Periodic type is set (in milliseconds) |
| MetricsParams | array of structures | Collection of metric parameters. |

The ```MetricsParams``` array entry is a structure containing:

| Field | Type | Description |
|----------|------|-------------|
| Sensor's path | object | D-Bus path, path to the sensor providing readings. |
| Operation's type | enum | {SINGLE, MAX, MIN, AVG, SUM} - information about aggregated operation. |
| Metric id | string | Contains unique metric id, that can be mapped to Redfish MetricId. |

The ```ScanPeriod``` is defined per report, thus all sensors listed in the
```MetricsParams``` collection will be scanned with the same frequency. Also
the ReportingType is defined per report.
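
As an illustration of the ```AddReport``` arguments described above, the following sketch assembles and validates them in plain Python before they would be passed to a D-Bus call. It does not talk to D-Bus. The helper `build_add_report_args`, the `MetricParam` class, the field order inside each metric tuple (taken from the table above) and the sensor path used in the usage note are assumptions made for this example.

```python
from dataclasses import dataclass

REPORTING_TYPES = {
    "xyz.openbmc_project.MonitoringService.Metric.Periodic",
    "xyz.openbmc_project.MonitoringService.Metric.OnChange",
    "xyz.openbmc_project.MonitoringService.Metric.OnRequest",
}
OPERATION_TYPES = {"SINGLE", "MAX", "MIN", "AVG", "SUM"}


@dataclass(frozen=True)
class MetricParam:
    """One entry of the MetricsParams collection (hypothetical helper)."""
    sensor_path: str  # D-Bus path of the sensor providing readings
    operation: str    # aggregated operation type
    metric_id: str    # unique id, mappable to the Redfish MetricId


def build_add_report_args(prefix, reporting_type, scan_period_ms, metrics):
    """Validate the inputs and return them as an AddReport argument tuple."""
    if reporting_type not in REPORTING_TYPES:
        raise ValueError(f"unknown ReportingType: {reporting_type}")
    for metric in metrics:
        if metric.operation not in OPERATION_TYPES:
            raise ValueError(f"unknown operation: {metric.operation}")
        if not metric.sensor_path.startswith("/"):
            raise ValueError(f"not a D-Bus object path: {metric.sensor_path}")
    # ScanPeriod is a single value, shared by every metric in the report.
    params = [(m.sensor_path, m.operation, m.metric_id) for m in metrics]
    return (prefix, reporting_type, scan_period_ms, params)
```

For example, a periodic report with one averaged metric could be described as `build_add_report_args("test", "xyz.openbmc_project.MonitoringService.Metric.Periodic", 1000, [MetricParam("/xyz/openbmc_project/sensors/power/example", "AVG", "Test")])`, where the sensor path is a made-up placeholder.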
When the *xyz.openbmc_project.MonitoringService.Metric.OnChange*
ReportingType is set, the metric report emits a signal when at least one
reading has changed.

The ```AddReport``` method returns:
```ascii
String for created report - e.g. '/xyz/openbmc_project/MonitoringService/Reports/testqrapndyY'
```

A metric report created this way implements the following interfaces, methods
and properties (apart from standard D-Bus interface):

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.Object.Delete``` | interface | - | - | - |
|```.Delete``` | method | - | - | - |
|```xyz.openbmc_project.MonitoringService.Report``` | interface | - | - | - |
|```.Update``` | method | - | - | - |
|```.ReadingParameters``` | property | a(sos) | 1 "/" | emits-change writable |
|```.Readings``` | property | a(svs) | 0 | emits-change read-only |
|```.ReportingType``` | property | s | One of reporting type strings | emits-change writable |
|```.ScanPeriod``` | property | u | 100 | emits-change writable |

The ```Update``` method is defined for the on demand metric report update. It
shall trigger the ```Readings``` property to be updated and send a
PropertiesChanged signal.

The ```ReadingParameters``` property contains an array of structures containing
unique metric id, D-Bus sensor path and aggregated operation type. This
property is made writable in order to support metric report modifications.

| Field Type  | Field Description          |
|-------------|----------------------------|
| string      | Unique metric id           |
| object path | D-Bus sensor's path        |
| string      | Aggregated operation type  |

The ```Readings``` property contains an array of structures containing the
unique metric id, the sensor's reading value and the reading timestamp.

| Field Type | Field Description          |
|------------|----------------------------|
| string     | Unique metric id           |
| variant    | Sensor's reading value     |
| string     | Sensor's reading timestamp |

The ```ScanPeriod``` property has a single value for the whole metric report.
The ```Delete``` method results in deleting the whole metric report.

The ```MaxReports``` property of
the ```xyz.openbmc_project.MonitoringService.ReportsManagement``` interface
contains the max number of metric reports supported by the Monitoring Service.
This property is added to be compliant with the Redfish Telemetry Service
schema, which contains the ```MaxReports``` property.

**Trigger management**

The trigger management D-Bus object:

```ascii
xyz.openbmc_project.MonitoringService.TriggersManagement
/xyz/openbmc_project/MonitoringService/Triggers
```
The ```TriggersManagement``` supports the following interface apart from
standard D-Bus interface.

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.MonitoringService.TriggersManagement``` | interface | - | - | - |
|```.AddTrigger``` | method | sssv(os) | s | - |

The ```AddTrigger``` method shall be used to create a new trigger for a
certain metric. Triggers are stored in BMC's volatile memory.
The method has the following arguments:

| Argument | Type | Description |
|----------|------|-------------|
| Prefix | string | Defines prefix for the trigger name, so it will be "<prefix\><randomString\>", e.g. for prefix "trigger" -> trigger0dfvAgVt6 |
| ActionType | string | Action type: <br> "xyz.openbmc_project.MonitoringService.Trigger.Log" - for logging to log service <br> "xyz.openbmc_project.MonitoringService.Trigger.Event" - for sending Redfish event <br> "xyz.openbmc_project.MonitoringService.Trigger.Update" - for triggering metric report update |
| MetricType | string | Metric type: <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete" - for discrete sensors <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric" - for numeric sensors |
| TriggerParams | variant | Variant containing structure with either discrete triggers or numeric thresholds. |
| MetricParam | structure | Structure containing D-Bus sensor's path and unique metric Id and optional D-Bus path to metric report to trigger. |

The ```TriggerParams``` is a variant type, which shall contain a structure
depending on the ```MetricType``` value. When ```MetricType``` contains
the ```xyz.openbmc_project.MonitoringService.Trigger.Discrete``` value,
```TriggerParams``` shall contain a structure with discrete triggers.
When ```MetricType``` contains
the ```xyz.openbmc_project.MonitoringService.Trigger.Numeric``` value,
```TriggerParams``` shall contain a structure with numeric thresholds.

Discrete triggers structure:

| Field | Type | Description |
|-------|------|-------------|
| TriggerCondition | string | Discrete trigger condition: <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Changed" - trigger occurs when value of metric has changed; <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Specified" - trigger occurs when value of metric becomes one of the values listed in the discrete triggers. |
| DiscreteTriggers | array of structures | Array of discrete trigger structures. |

Member of DiscreteTriggers array:

| Field | Type | Description |
|-------|------|-------------|
| TriggerId | string | Unique trigger Id |
| Severity | string | Severity: <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Severity.OK" - normal, <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Severity.Warning" - requires attention, <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Severity.Critical" - requires immediate attention |
| Value | variant | Value of discrete metric, that constitutes a trigger event. |
| DwellTime | uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |

Numeric thresholds structure shall contain up to 4 thresholds: upper warning,
upper critical, lower warning and lower critical. Thus it will contain up to
4 structures shown below:

| Field | Type | Description |
|-------|------|-------------|
| ThresholdType | string | Numeric trigger type: <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.UpperCritical", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.UpperWarning", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.LowerCritical", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.LowerWarning" |
| DwellTime | uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |
| Activation | string | Indicates direction of crossing the threshold value that triggers the threshold's action: <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Increasing", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Decreasing", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Either" |
| ThresholdValue | variant | Value of reading that will trigger the threshold |

The numeric threshold trigger type meaning:

- "xyz.openbmc_project.MonitoringService.Trigger.Numeric.UpperCritical" -
indicates the reading is above normal range and requires immediate attention
- "xyz.openbmc_project.MonitoringService.Trigger.Numeric.UpperWarning" -
indicates the reading is above normal range and may require attention
- "xyz.openbmc_project.MonitoringService.Trigger.Numeric.LowerCritical" -
indicates the reading is below normal range and requires immediate attention
- "xyz.openbmc_project.MonitoringService.Trigger.Numeric.LowerWarning" -
indicates the reading is below normal range and may require attention

The numeric threshold activation meaning:

- "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Increasing" -
trigger action when reading is changing from below to above the threshold's value
- "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Decreasing" -
trigger action when reading is changing from above to below the threshold's value
- "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Either" -
trigger action when reading is crossing the threshold's value in either direction
described above

The ```MetricParam``` structure contains the following data:

| Field | Type | Description |
|-------|------|-------------|
| SensorPath | object path | D-Bus path to sensor, for which trigger is defined. |
| MetricId | string | Contains unique metric id, that can be mapped to Redfish MetricId. |
| ReportPath | object path | D-Bus path to the Monitoring Service's reading report whose update shall be triggered when the trigger condition occurs. This is optional and shall be filled when trigger's ActionType is set to "xyz.openbmc_project.MonitoringService.Trigger.Update". |

The ```AddTrigger``` method returns:
```ascii
String for created trigger - e.g. '/xyz/openbmc_project/MonitoringService/Triggers/trigger0dfvAgVt6'
```
A trigger created this way implements the following interfaces, methods and
properties (apart from standard D-Bus interface):

| Name | Type | Signature | Result/Value | Flags |
|------|------|-----------|--------------|-------|
|```xyz.openbmc_project.Object.Delete``` | interface | - | - | - |
|```.Delete``` | method | - | - | - |
|```xyz.openbmc_project.MonitoringService.Trigger``` | interface | - | - | - |
|```.MetricType``` | property | s | One of the MetricType strings | emits-change read-only |
|```.Triggers``` | property | (sa(ssvt)) or a(stsv) | The structure containing triggers. It depends on ```.MetricType``` property how the structure is defined. | emits-change writable |
|```.ActionType``` | property | s | One of ActionType strings | emits-change writable |
|```.Metric``` | property | (oso) | Structure containing details of metric, for which trigger is defined. | emits-change writable |

The ```.MetricType``` property contains information about the metric type for
which the trigger was created. It can be either discrete or numeric. This
property is read-only, thus a created trigger cannot be changed from discrete
to numeric or from numeric to discrete. This also determines what
the ```.Triggers``` property looks like on D-Bus.
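
The activation semantics of numeric thresholds described earlier (Increasing, Decreasing, Either) can be sketched as a comparison of two consecutive readings against the threshold value. This is only an illustration: the function name `threshold_crossed` is hypothetical, the handling of a reading exactly equal to the threshold is an assumption (it counts as not yet crossed), and the dwell time is deliberately left out.

```python
ACTIVATION = "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation."
INCREASING = ACTIVATION + "Increasing"
DECREASING = ACTIVATION + "Decreasing"
EITHER = ACTIVATION + "Either"


def threshold_crossed(previous, current, threshold, activation):
    """Return True when the move from `previous` to `current` activates
    a numeric threshold with the given activation direction."""
    # Assumption: only a reading strictly past the threshold has crossed it.
    rising = previous <= threshold < current
    falling = previous >= threshold > current
    if activation == INCREASING:
        return rising
    if activation == DECREASING:
        return falling
    if activation == EITHER:
        return rising or falling
    raise ValueError(f"unknown activation: {activation}")
```

In a fuller sketch, the action selected by ```ActionType``` would be performed only after the crossing has persisted for the configured dwell time.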

If ```.MetricType``` is equal to "xyz.openbmc_project.MonitoringService.Trigger.Discrete"
then ```.Triggers``` property contains discrete trigger that looks like this:

| Type | Description |
|------|-------------|
| string | Discrete trigger condition: <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Changed" - trigger occurs when value of metric has changed; <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Specified" - trigger occurs when value of metric becomes one of the values listed in the discrete triggers. |
| array of structures | Array of discrete trigger structures. |

Member of DiscreteTriggers array:

| Type | Description |
|------|-------------|
| string | Unique trigger Id |
| string | Severity: <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Severity.OK" - normal, <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Severity.Warning" - requires attention, <br> "xyz.openbmc_project.MonitoringService.Trigger.Discrete.Severity.Critical" - requires immediate attention |
| variant | Value of discrete metric, that constitutes a trigger event. |
| uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |

If ```.MetricType``` is equal to "xyz.openbmc_project.MonitoringService.Trigger.Numeric"
then ```.Triggers``` property contains numeric trigger that is an array of up
to 4 structures presented below:

| Type | Description |
|------|-------------|
| string | Numeric trigger type: <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.UpperCritical", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.UpperWarning", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.LowerCritical", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.LowerWarning" |
| uint64 | Time in milliseconds that a trigger occurrence persists before the action defined in the ```ActionType``` is performed. |
| string | Indicates direction of crossing the threshold value that triggers the threshold's action: <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Increasing", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Decreasing", <br> "xyz.openbmc_project.MonitoringService.Trigger.Numeric.Activation.Either" |
| variant | Value of reading that will trigger the threshold |

The ```.Metric``` property stores the details about the reading for which the
trigger was defined. It is a structure consisting of three fields.

| Field type | Description |
|------------|--------------|
| object path | D-Bus path of sensor. This is a path to the sensor, for which reading trigger is defined. |
| string | Unique metric Id |
| object path | D-Bus path of existing reading report. This is required when trigger's action is to update metric report. This path shall point to existing reading report within the Monitoring Service. |

**Trigger operations**

Triggers support three types of operation: Log, Event and Update. For each,
there is a different way of proceeding.

1. For action Log, the event shall
be logged to the system journal.
In this case the Monitoring Service writes
data to the system journal using libjournal. The Redfish log service shall then
retrieve the data by reading the system journal. This is shown on the diagram
below.

```ascii
+---------------------------+
|bmcweb|                    |       +----------------------+
+------/  +-----------+-+   |       |Monitoring|           |
|         |Redfish    | |   |       |Service   |           |
|         |log service| |   |       +----------/           |
|         +-----------/ |   |       |                      |
|         |             |   |       |                      |
|         |             |   |       |                      |
|         +------^------+   |       +-----------+----------+
+---------------------------+                   |
                 |                              |
                 +----collect----+            event
                   journal entry |     (write to journal)
                                 |              |
     +------------------------------------+     |
     |systemd|                   |        |     |
     +-------/ +----------+  +---+------+ |     |
     |         |journal|  |  |libjournal| |     |
     |         +-------/  <-->          <-------+
     |         |          |  +----------+ |
     |         |          |               |
     |         |          |               |
     |         +----------+               |
     |                                    |
     +------------------------------------+
```
2. For action Event, the Monitoring Service shall send an event using the
[Redfish Event Service][6] either as push-style event or SSE.

3. For action Update, the Monitoring Service will trigger the update of the
reading report pointed to by the D-Bus path contained in the ReportPath field
of the ```.Metric``` structure. The update shall cause the reading report's
D-Bus object to emit a property change signal. This will cause the Redfish
Metric Report to be streamed out if it was configured to do so.

**Telemetry Service Redfish API**

Telemetry service shall support 2019.1 Redfish schemas for telemetry resources.
A metric report definition determines which metrics are to be included in a
metric report. A metric definition is assigned to a particular metric type and
it describes how the metric should be interpreted.
The following resource schemas
shall be supported:

- TelemetryService 1.1.2
- MetricDefinition 1.0.3
- MetricReportDefinition 1.3.0
- MetricReport 1.2.0
- Triggers 1.1.1

The following diagram shows relations between these resources.

```ascii
 +----------------------------------------------------------------------------+
 |                                Service root                                |
 +----------------------------------+-------------------------------+---------+
                                    |                               |
                                    |                               |
                                    |                               |
 +----------------------------------v-----------------+  +----------v---------+
 |                                                    |  |Chassis             |
 |                 Telemetry Service                  |  |                    |
 |                                                    |  |                    |
 |                                                    |  |  +---------------+ |
 +---------+--------------+------------------+--------+  |  |               | |
           |              |                  |           |  |   Chassis 1   | |
           |              |                  |           |  |               | |
           |              |                  |           |  +---------+-----+ |
           |              |                  |           |            |       |
+----------v--+  +--------v----+  +----------v-----+     +--------------------+
|Triggers     |  |Metric       |  |Metric report   |                  |
|             |  |definition   |  |                |                  |
|             |  | +---------+ |  |                |  Reads           |
| +---------+ |  | |Reading  | |  | +-----------+  |  ReadingVolts +--v------+
| |         | |  | |Volts    <------+           +------------------>         |
| |Trigger 1| |  | +---------+ |  | |  Metric   |  |               |         |
| |         | |  |             |  | | report 1  |  |  Reads        |  Power  |
| |         | |  | +---------+ |  | |           |  |  PowerConsumed|         |
| |         | |  | |         | |  | |           |  |  Watts        |         |
| +--+---+--+ |  | |Power    <------+           +------------------>         |
|    |   |    |  | |Consumed | |  | +-----^-----+  |               +----^----+
|    |   |    |  | |Watts    | |  |       |        |                    |
|    |   |    |  | +---------+ |  |       |        |                    |
|    |   |    |  |             |  |       |        |                    |
+-------------+  +-------------+  +----------------+                    |
     |   |                                |                             |
     |   |     Triggers report update     |                             |
     |   |       (when applicable)        |                             |
     |   +--------------------------------+                             |
     |                                                                  |
     |               Monitors PowerConsumedWatts to check               |
     |                whether trigger value is exceeded                 |
     +------------------------------------------------------------------+
```

The diagram shows the relations between Redfish resources.
A metric report is
defined to be generated periodically, on demand or on change. Each metric in the
Metric Report contains the URI to its metric definition and to the Redfish
sensor whose reading value is presented. Nevertheless, under this presentation
layer, Monitoring Service is gathering D-Bus sensors readings and exposing them
in reading reports over D-Bus for the Telemetry Service. Each D-Bus sensor
is mapped to a Redfish sensor.

Examples of Redfish resources for the Telemetry Service are shown below.

The telemetry service Redfish resource example:

```json
{
  "@odata.type": "#TelemetryService.v1_1_2.TelemetryService",
  "Id": "TelemetryService",
  "Name": "Telemetry Service",
  "Status": {
    "State": "Enabled",
    "Health": "OK"
  },
  "MinCollectionInterval": "PT10S",
  "SupportedCollectionFunctions": [],
  "MaxReports": <max_no_of_reports>,
  "MetricDefinitions": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions"
  },
  "MetricReportDefinitions": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions"
  },
  "MetricReports": {
    "@odata.id": "/redfish/v1/TelemetryService/MetricReports"
  },
  "Triggers": {
    "@odata.id": "/redfish/v1/TelemetryService/Triggers"
  },
  "LogService": {
    "@odata.id": "/redfish/v1/Managers/<manager_name>/LogServices/Journal"
  },
  "@odata.context": "/redfish/v1/$metadata#TelemetryService",
  "@odata.id": "/redfish/v1/TelemetryService"
}
```

Sample metric report definition:

```json
{
  "@odata.type": "#MetricReportDefinition.v1_3_0.MetricReportDefinition",
  "Id": "SampleMetric",
  "Name": "Sample Metric Report Definition",
  "MetricReportDefinitionType": "Periodic",
  "Schedule": {
    "RecurrenceInterval": "PT10S"
  },
  "ReportActions": [
    "LogToMetricReportsCollection"
  ],
  "ReportUpdates": "Overwrite",
  "MetricReport": {
    "@odata.id":
"/redfish/v1/TelemetryService/MetricReports/SampleMetric" 694 }, 695 "Status": { 696 "State": "Enabled" 697 }, 698 "Metrics": [ 699 { 700 "MetricId": "Test", 701 "MetricProperties": [ 702 "/redfish/v1/Chassis/NC_Baseboard/Power#/PowerControl/0/PowerConsumedWatts" 703 ] 704 } 705 ], 706 "@odata.context": "/redfish/v1/$metadata#MetricReportDefinition.MetricReportDefinition", 707 "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/PlatformPowerUsage" 708} 709``` 710 711Sample metric report: 712 713```json 714{ 715 "@odata.type": "#MetricReport.v1_2_0.MetricReport", 716 "Id": "SampleMetric", 717 "Name": "Sample Metric Report", 718 "ReportSequence": "0", 719 "MetricReportDefinition": { 720 "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric" 721 }, 722 "MetricValues": [ 723 { 724 "MetricDefinition": { 725 "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions/SampleMetricDefinition" 726 }, 727 "MetricId": "Test", 728 "MetricValue": "100", 729 "Timestamp": "2016-11-08T12:25:00-05:00", 730 "MetricProperty": "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts" 731 } 732 ], 733 "@odata.context": "/redfish/v1/$metadata#MetricReport.MetricReport", 734 "@odata.id": "/redfish/v1/TelemetryService/MetricReports/AvgPlatformPowerUsage" 735} 736``` 737 738Sample trigger, that will trigger metric report update: 739 740```json 741{ 742 "@odata.type": "#Triggers.v1_1_1.Triggers", 743 "Id": "SampleTrigger", 744 "Name": "Sample Trigger", 745 "MetricType": "Numeric", 746 "Links": { 747 "MetricReportDefinitions": [ 748 "/redfish/v1/TelemetryService/MetricReportDefinitions/SampleMetric" 749 ] 750 }, 751 "Status": { 752 "State": "Enabled" 753 }, 754 "TriggerActions": [ 755 "RedfishMetricReport" 756 ], 757 "NumericThresholds": { 758 "UpperCritical": { 759 "Reading": 50, 760 "Activation": "Increasing", 761 "DwellTime": "PT0.001S" 762 }, 763 "UpperWarning": { 764 "Reading": 48.1, 765 "Activation": "Increasing", 766 "DwellTime": "PT0.004S" 767 
} 768 }, 769 "MetricProperties": [ 770 "/redfish/v1/Chassis/Tray_1/Power#/0/PowerConsumedWatts" 771 ], 772 "@odata.context": "/redfish/v1/$metadata#Triggers.Triggers", 773 "@odata.id": "/redfish/v1/TelemetryService/Triggers/PlatformPowerCapTriggers" 774} 775``` 776 777**Performance tests** 778 779Performance test were conducted on the AST2500 system with 64 MB flash and 780512 MB RAM. Flash consumption by the Monitoring Service is 197.5 kB. The 781runtime statistics are shown in the table below. The reading report is 782mapped into single Metric Report. The runtime data is collected for the 783Monitoring Service component only. All reports was created with 784```xyz.openbmc_project.MonitoringService.Metric.OnChange``` property to 785maximize the workload. In the configuration with 50 reports and 50 sensors 786it is about 200 new readings per second, generating 200 reading reports 787per second. The table shows CPU usage and memory usage. The VSZ is the amount 788of memory mapped into the address space of the process. It includes pages 789backed by the process' executable file and shared libraries, its heap and 790stack, as well as anything else it has mapped. 791 792 793| Monitoring service state | VSZ | %VSZ | %CPU | 794|--------------------------------------------------|------|------|------| 795| Idle (0 reports, 0 sensors) |5188 B| 1% | 0% | 796| 1 report, 1 sensor |5188 B| 1% | 1% | 797| 2 reports, 1 sensor |5188 B| 1% | 1% | 798| 2 reports, 2 sensors (1 sensor per report) |5188 B| 1% | 1% | 799| 1 report, 10 sensors |5188 B| 1% | 1% | 800| 10 reports, 10 sensors (same for each report) |5320 B| 1% | 1-2% | 801| 2 reports, 20 sensors (10 per report) |5188 B| 1% | 1% | 802| 30 reports, 30 sensors (10 per report) |5444 B| 1% | 5-9% | 803| 50 reports, 50 sensors (10 per report) |5572 B| 1% |11-14%| 804 805The last two configurations use 10 sensors per reading report, which gives 8063 or 5 distinctive configurations. 
Each such configuration is used to
create 10 reading reports to obtain the desired amount of 30 or 50 reading
reports.

In this architecture, a reading report is created every time a Redfish
Metric Report Definition is posted (creating a new Metric Report).

## Alternatives Considered
The [framework based on collectd/librrd][5] was considered as an alternative
design. Although it seems to be a versatile and scalable solution, it has some
drawbacks from our point of view:
* Collectd's footprint in the minimal working configuration is around 2.6 MB,
while the available space for OpenBMC is limited to 64 MB.
* In this design, librrd is used to store metrics on the BMC's non-volatile
storage, which may be an issue when lots of metrics are captured and stored
in OpenBMC's limited storage space. Flash wear-out may also occur when
metrics are captured frequently (for example, once per second).
* The Monitoring Service is directly compatible with the Redfish Telemetry
Service API, which means that the Monitoring Service's reading reports can
be directly mapped to Redfish Metric Reports.
* The Monitoring Service unifies the way the BMC's telemetry is exposed over
Redfish and may be used with multiple front-ends, so there is no problem
adding support for telemetry over IPMI or any other API.

Since this design assumes flexibility and modularity, there are no obstacles
to using collectd in cooperation with the Monitoring Service. One possible
configuration is shown on the diagram below.

```ascii
 +-----------------+              +-----------------+
 |                 |              |   Monitoring    |
 |  D-Bus sensors  |              |     Service     |
 |                 |              |                 |
 +--------^--------+              +--------^--------+
          |                                |
          |                                |
          |                                |
<---------^--v-----------D-Bus-----------v-^---------->
             |                           |
             |                           |
             |                           |
     +-------v------------+   +----------v--------+
     |  collectd metrics  |   |                   |
     |  exposed as D-Bus  |   |      bmcweb       |
     |      sensors       |   |   (with Redfish   |
     +---------^----------+   |     Telemetry     |
               |              |     Service)      |
               |              |                   |
        +------+-------+      +-------------------+
        |              |
        |   collectd   |
        |              |
        +--------------+
```
Here collectd is used as the source of some set of metrics. It exposes them
as D-Bus sensors, which can easily be consumed by both bmcweb and the
Monitoring Service without any changes in their D-Bus interfaces. In such a
configuration, the Monitoring Service provides metric report and trigger
management.

Another possible configuration is to use collectd without the Monitoring
Service, but in that case collectd does not provide metric report and trigger
support compatible with Redfish. The Redfish Telemetry Service then either
won't be supported, or metric report and trigger support has to be provided by
collectd.

## Impacts
This design impacts the architecture of the bmcweb component, since it adds
the Telemetry Service implementation as a component of the existing
Redfish API implementation.

## Testing
This is a very high-level description of the proposed set of tests.
Testing shall be done on three basic levels:
* Unit tests
* Functional tests
* Performance tests

**Unit tests**

The Monitoring Service's code shall be covered by unit tests. The preferred
framework is [GTest/GMock][7]. The unit tests shall be run before a code
change is committed to make sure that nothing is broken in existing
functionality.
Also, when new code is introduced, a new set of unit tests shall
be committed with it, according to the test-driven development principle. Unit
tests shall also be carefully reviewed.

**Functional tests**

Functional tests will be divided into two steps.

The first step tests the
Monitoring Service metric report management. The test scenario shall consist of
creating a metric report by POSTing a proper metric report definition, reading
the metric report (using GET on the proper URI) and deleting the metric report.
The required configuration for such a test is D-Bus sensors (at least some of
them) and bmcweb with the Redfish Telemetry Service implemented. The tests shall
be performed on real hardware. For ease of metric testing, dummy D-Bus sensors
may be used to provide specifically prepared metrics. This configuration
shall also enable testing of aggregated operations (MIN, MAX, SUM, AVG).

The second step is to test trigger and event generation. This also requires
the Event Service to be implemented along with the Log Service. Tests shall
cover all scenarios with sending a metric report as an event, triggering a
metric report update and logging events.

**Performance tests**

Performance tests shall be done using a full OpenBMC configuration with the
complete required set of features. The tests shall create a large number of
metric reports (up to the maximum number) along with all possible triggers.
Measurements shall cover the periodic metric report jitter, delays in event
logging or sending, the BMC's CPU utilization and the performance impact on
other services.

[1]: https://www.dmtf.org/sites/default/files/standards/documents/DSP2051_0.1.0a.zip
[2]: https://github.com/openbmc/docs/blob/master/architecture/sensor-architecture.md
[3]: https://www.kernel.org/doc/Documentation/hwmon/
[4]: https://www.freedesktop.org/wiki/Software/dbus/
[5]: https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/22257
[6]: https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/24749
[7]: https://github.com/google/googletest
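
The periodic metric report jitter named among the performance test measurements
could be estimated from the `Timestamp` values of consecutive metric reports.
The Python sketch below is only an illustration of that calculation, not part
of the proposed test suite: the helper name `report_jitter`, the sample
timestamps and the 10-second interval are all assumptions made for the example.

```python
from datetime import datetime


def report_jitter(timestamps, interval_s):
    """Return, for each pair of consecutive reports, how far (in seconds)
    the observed period deviates from the nominal interval.

    `timestamps` are ISO 8601 strings with a UTC offset, in the same form
    as the "Timestamp" property of the sample metric report.
    """
    times = [datetime.fromisoformat(ts) for ts in timestamps]
    periods = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return [period - interval_s for period in periods]


# Hypothetical timestamps collected from four consecutive periodic reports
# that were configured with a nominal 10-second recurrence interval.
samples = [
    "2016-11-08T12:25:00-05:00",
    "2016-11-08T12:25:10-05:00",
    "2016-11-08T12:25:21-05:00",
    "2016-11-08T12:25:30-05:00",
]

jitter = report_jitter(samples, interval_s=10)
worst = max(abs(j) for j in jitter)
print(jitter)  # [0.0, 1.0, -1.0]
print(worst)   # 1.0
```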