1 # OpenBMC Anti-patterns 2 3 From [Wikipedia](https://en.wikipedia.org/wiki/Anti-pattern): 4 5 "An anti-pattern is a common response to a recurring problem that is usually 6 ineffective and risks being highly counterproductive." 7 8 The developers of OpenBMC do not get 100% of decisions right 100% of the time. 9 That, combined with the fact that software development is often an exercise in 10 copying and pasting, results in mistakes happening over and over again. 11 12 This page aims to document some of the anti-patterns that exist in OpenBMC to 13 ease the job of those reviewing code. If an anti-pattern is spotted, rather that 14 repeating the same explanations over and over, a link to this document can be 15 provided. 16 17 <!-- begin copy/paste on next line --> 18 19 ## Anti-pattern template [one line description] 20 21 ### Identification 22 23 (1 paragraph) Describe how to spot the anti-pattern. 24 25 ### Description 26 27 (1 paragraph) Describe the negative effects of the anti-pattern. 28 29 ### Background 30 31 (1 paragraph) Describe why the anti-pattern exists. If you don't know, try 32 running git blame and look at who wrote the code originally, and ask them on the 33 mailing list or in Discord what their original intent was, so it can be 34 documented here (and you may possibly discover it isn't as much of an 35 anti-pattern as you thought). If you are unable to determine why the 36 anti-pattern exists, put: "Unknown" here. 37 38 ### Resolution 39 40 (1 paragraph) Describe the preferred way to solve the problem solved by the 41 anti-pattern and the positive effects of solving it in the manner described. 42 43 <!-- end copy/paste on previous line --> 44 45 ## Custom ArgumentParser object 46 47 ### Identification 48 49 The ArgumentParser object is typically present to wrap calls to get options. It 50 abstracts away the parsing and provides a `[]` operator to access the 51 parameters. 52 53 ### Description 54 55 Writing a custom ArgumentParser object creates nearly duplicate code in a 56 repository. The ArgumentParser itself is the same, however, the options provided 57 differ. Writing a custom argument parser re-invents the wheel on c++ command 58 line argument parsing. 59 60 ### Background 61 62 The ArgumentParser exists because it was in place early and then copied into 63 each new repository as an easy way to handle argument parsing. 64 65 ### Resolution 66 67 The CLI11 library was designed and implemented specifically to support modern 68 argument parsing. It handles the cases seen in OpenBMC daemons and has some 69 handy built-in validators, and allows easy customizations to validation. 70 71 ## Explicit AC_MSG_ERROR on PKG_CHECK_MODULES failure 72 73 ### Identification 74 75 ``` 76 PKG_CHECK_MODULES( 77 [PHOSPHOR_LOGGING], 78 [phosphor-logging], 79 [], 80 [AC_MSG_ERROR([Could not find phosphor-logging...openbmc/phosphor-logging package required])]) 81 ``` 82 83 ### Description 84 85 The autotools PKG_CHECK_MODULES macro provides the ability to specify an "if 86 found" and "if not found" behavior. By default, the "if not found" behavior will 87 list the package not found. In many cases, this is sufficient to a developer to 88 know what package is missing. In most cases, it's another OpenBMC package. 89 90 If the library sought's name isn't related to the package providing it, then the 91 failure message should be set to something more useful to the developer. 92 93 ### Resolution 94 95 Use the default macro behavior when it is clear that the missing package is 96 another OpenBMC package. 97 98 ``` 99 PKG_CHECK_MODULES([PHOSPHOR_LOGGING], [phosphor-logging]) 100 ``` 101 102 ## Explicit listing of shared library packages in RDEPENDS in bitbake metadata 103 104 ### Identification 105 106 ``` 107 RDEPENDS_${PN} = "libsystemd" 108 ``` 109 110 ### Description 111 112 Out of the box bitbake examines built applications, automatically adds runtime 113 dependencies and thus ensures any library packages dependencies are 114 automatically added to images, sdks, etc. There is no need to list them 115 explicitly in a recipe. 116 117 Dependencies change over time, and listing them explicitly is likely prone to 118 errors - the net effect being unnecessary shared library packages being 119 installed into images. 120 121 Consult 122 https://www.yoctoproject.org/docs/latest/mega-manual/mega-manual.html#var-RDEPENDS 123 for information on when to use explicit runtime dependencies. 124 125 ### Background 126 127 The initial bitbake metadata author for OpenBMC was not aware that bitbake added 128 these dependencies automatically. Initial bitbake metadata therefore listed 129 shared library dependencies explicitly, and was subsequently copy pasted. 130 131 ### Resolution 132 133 Do not list shared library packages in RDEPENDS. This eliminates the possibility 134 of installing unnecessary shared library packages due to unmaintained library 135 dependency lists in bitbake metadata. 136 137 ## Use of /usr/bin/env in systemd service files 138 139 ### Identification 140 141 In systemd unit files: 142 143 ``` 144 [Service] 145 146 ExecStart=/usr/bin/env some-application 147 ``` 148 149 ### Description 150 151 Outside of OpenBMC, most applications that provide systemd unit files don't 152 launch applications in this way. So if nothing else, this just looks strange and 153 violates the 154 [princple of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment). 155 156 ### Background 157 158 This anti-pattern exists because a requirement exists to enable live patching of 159 applications on read-only filesystems. Launching applications in this way was 160 part of the implementation that satisfied the live patch requirement. For 161 example: 162 163 ``` 164 /usr/bin/phosphor-hwmon 165 ``` 166 167 on a read-only filesystem becomes: 168 169 ``` 170 /usr/local/bin/phosphor-hwmon` 171 ``` 172 173 on a writeable /usr/local filesystem. 174 175 ### Resolution 176 177 The /usr/bin/env method only enables live patching of applications. A method 178 that supports live patching of any file in the read-only filesystem has emerged. 179 Assuming a writeable filesystem exists _somewhere_ on the bmc, something like: 180 181 ``` 182 mkdir -p /var/persist/usr 183 mkdir -p /var/persist/work/usr 184 mount -t overlay -o lowerdir=/usr,upperdir=/var/persist/usr,workdir=/var/persist/work/usr overlay /usr 185 ``` 186 187 can enable live system patching without any additional requirements on how 188 applications are launched from systemd service files. This is the preferred 189 method for enabling live system patching as it allows OpenBMC developers to 190 write systemd service files in the same way as most other projects. 191 192 To undo existing instances of this anti-pattern remove /usr/bin/env from systemd 193 service files and replace with the fully qualified path to the application being 194 launched. For example, given an application in /usr/bin: 195 196 ``` 197 sed -i s,/usr/bin/env ,/usr/bin/, foo.service 198 ``` 199 200 ## Incorrect placement of executables in /sbin, /usr/sbin or /bin, /usr/bin 201 202 ### Identification 203 204 OpenBMC executables that are installed in `/usr/sbin`. `$sbindir` in bitbake 205 metadata, makefiles or configure scripts. systemd service files pointing to 206 `/bin` or `/usr/bin` executables. 207 208 ### Description 209 210 Installing OpenBMC applications in incorrect locations violates the 211 [principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment) 212 and more importantly violates the 213 [FHS](https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard). 214 215 ### Background 216 217 There are typically two types of executables: 218 219 1. Long-running daemons started by, for instance, systemd service files and 220 _NOT_ intended to be directly executed by users. 221 2. Utilities intended to be used by a user as a CLI. 222 223 Executables of type 1 should not be placed anywhere in `${PATH}` because it is 224 confusing and error-prone to users, but should instead be placed in a 225 `/usr/libexec/<package>` subdirectory. 226 227 Executables of type 2 should be placed in `/usr/bin` because they are intended 228 to be used by users and should be in `${PATH}` (also, `sbin` is inappropriate as 229 we transition to having non-root access). 230 231 The sbin anti-pattern exists because the FHS was misinterpreted at an early 232 point in OpenBMC project history, and has proliferated ever since. 233 234 From the hier(7) man page: 235 236 ``` 237 /usr/bin This is the primary directory for executable programs. Most programs 238 executed by normal users which are not needed for booting or for repairing the 239 system and which are not installed locally should be placed in this directory. 240 241 /usr/sbin This directory contains program binaries for system administration 242 which are not essential for the boot process, for mounting /usr, or for system 243 repair. 244 245 /usr/libexec Directory contains binaries for internal use only and they are 246 not meant to be executed directly by users shell or scripts. 247 ``` 248 249 The FHS description for `/usr/sbin` refers to "system administration" but the 250 de-facto interpretation of the system being administered refers to the OS 251 installation and not a system in the OpenBMC sense of managed system. As such 252 OpenBMC applications should be installed in `/usr/bin`. 253 254 It is becoming common practice in Linux for daemons to now be moved to `libexec` 255 and considered "internal use" from the perspective of the systemd service file 256 that controls their launch. 257 258 ### Resolution 259 260 Install OpenBMC applications in `/usr/libexec` or `/usr/bin/` as appropriate. 261 262 ## Handling unexpected error codes and exceptions 263 264 ### Identification 265 266 The anti-pattern is for an application to continue processing after it 267 encounters unexpected conditions in the form of error codes and exceptions and 268 not capturing the data needed to resolve the problem. 269 270 Example C++ code: 271 272 ``` 273 using InternalFailure = sdbusplus::xyz::openbmc_project::Common::Error::InternalFailure; 274 try 275 { 276 ... use d-Bus APIs... 277 } 278 catch (InternalFailure& e) 279 { 280 phosphor::logging::commit<InternalFailure>(); 281 } 282 ``` 283 284 ### Description 285 286 Suppressing unexpected errors can lead an application to incorrect or erratic 287 behavior which can affect the service it provides and the overall system. 288 289 Writing a log entry instead of a core dump may not give enough information to 290 debug a truly unexpected problem, so developers do not get a chance to 291 investigate problems and the system's reliability does not improve over time. 292 293 ### Background 294 295 Programmers want their application to continue processing when it encounters 296 conditions it can recover from. Sometimes they try too hard and continue when it 297 is not appropriate. 298 299 Programmers also want to log what is happening in the application, so they write 300 log entries that give debug data when something goes wrong, typically targeted 301 for their use. They don't consider how the log entry is consumed by the BMC 302 administrator or automated service tools. 303 304 The `InternalFailure` in the [Phosphor logging README][] is overused. 305 306 [phosphor logging readme]: 307 https://github.com/openbmc/phosphor-logging/blob/master/README.md 308 309 ### Resolution 310 311 Several items are needed: 312 313 1. Check all places where a return code or errno value is given. Strongly 314 consider that a default error handler should throw an exception, for example 315 `std::system_error`. 316 2. Have a good reason to handle specific error codes. See below. 317 3. Have a good reason to handle specific exceptions. Allow other exceptions to 318 propagate. 319 4. Document (in terms of impacts to other services) what happens when this 320 service fails, stops, or is restarted. Use that to inform the recovery 321 strategy. 322 323 In the error handler: 324 325 1. Consider what data (if any) should be logged. Determine who will consume the 326 log entry: BMC developers, administrator, or an analysis tool. Usually the 327 answer is more than one of these. 328 329 The following example log entries use an IPMI request to set network access 330 on, but the user input is invalid. 331 332 - BMC Developer: Reference internal applications, services, pids, etc. the 333 developer would be familiar with. 334 335 Example: 336 `ipmid service successfully processed a network setting packet, however the user input of USB0 is not a valid network interface to configure.` 337 338 - Administrator: Reference the external interfaces of the BMC such as the 339 REST API. They can respond to feedback about invalid input, or a need to 340 restart the BMC. 341 342 Example: 343 `The network interface of USB0 is not a valid option. Retry the IPMI command with a valid interface.` 344 345 - Analyzer tool: Consider breaking the log down and including several 346 properties which an analyzer can leverage. For instance, tagging the log 347 with 'Internal' is not helpful. However, breaking that down into something 348 like [UserInput][ipmi][Network] tells at a quick glance that the input 349 received for configuring the network via an IPMI command was invalid. 350 Categorization and system impact are key things to focus on when creating 351 logs for an analysis application. 352 353 Example: 354 `[UserInput][IPMI][Network][Config][Warning] Interface USB0 not valid.` 355 356 2. Determine if the application can fully recover from the condition. If not, 357 don't continue. Allow the system to determine if it writes a core dump or 358 restarts the service. If there are severe impacts when the service fails, 359 consider using a better error recovery mechanism. 360 361 ## Non-standard debug application options and logging 362 363 ### Identification 364 365 An application uses non-standard methods on startup to indicate verbose logging 366 and/or does not utilize standard systemd-journald debug levels for logging. 367 368 ### Description 369 370 When debugging issues within OpenBMC that cross multiple applications, it's very 371 difficult to enable the appropriate debug when different applications have 372 different mechanisms for enabling debug. For example, different OpenBMC 373 applications take the following as command line parameters to enable extra 374 debug: 375 376 - 0xff, --vv, -vv, -v, --verbose, <and more> 377 378 Along these same lines, some applications then have their own internal methods 379 of writing debug data. They use std::cout, printf, fprintf, ... Doing this 380 causes other issues when trying to compare debug data across different 381 applications that may be having their buffers flushed at different times (and 382 picked up by journald). 383 384 ### Background 385 386 Everyone has their own ways to debug. There was no real standard within OpenBMC 387 on how to do it so everyone came up with whatever they were familiar with. 388 389 ### Resolution 390 391 If an OpenBMC application is to support enhanced debug via a command line then 392 it will support the standard "-v,--verbose" option. 393 394 In general, OpenBMC developers should utilize the "debug" journald level for 395 debug messages. This can be enabled/disabled globally and would apply to all 396 applications. If a developer believes this would cause too much debug data in 397 certain cases then they can protect these journald debug calls around a 398 --verbose command line option. 399 400 ## DBus interface representing GPIOs 401 402 ### Identification 403 404 Desire to expose a DBus interface to drive GPIOs, for example: 405 406 - https://lore.kernel.org/openbmc/YV21cD3HOOGi7K2f@heinlein/ 407 - https://lore.kernel.org/openbmc/CAH2-KxBV9_0Dt79Quy0f4HkXXPdHfBw9FsG=4KwdWXBYNEA-ew@mail.gmail.com/ 408 - https://lore.kernel.org/openbmc/YtPrcDzaxXiM6vYJ@heinlein.stwcx.org.github.beta.tailscale.net/ 409 410 ### Description 411 412 Platform functionality selected by GPIOs might equally be selected by other 413 means with a shift in system design philosophy. As such, GPIOs are a (hardware) 414 implementation detail. Exposing the implementation on DBus forces the 415 distribution of platform design details across multiple applications, which 416 violates the software design principle of [low coupling][coupling] and impacts 417 our confidence in maintenance. 418 419 [coupling]: https://en.wikipedia.org/wiki/Coupling_%28computer_programming%29 420 421 ### Background 422 423 GPIOs are often used to select functionality that might be hard to generalise, 424 and therefore hard to abstract. If this is the case, then the functionality in 425 question is probably best wrapped up as an implementation detail of a behaviour 426 that is more generally applicable (e.g. host power-on procedures). 427 428 ### Resolution 429 430 Consider what functionality the GPIO provides and design or exploit an existing 431 interface to expose that behaviour instead. 432 433 ## Very long lambda callbacks 434 435 ### Identification 436 437 C++ code that is similar to the following: 438 439 ```cpp 440 dbus::utility::getSubTree("/", interfaces, 441 [asyncResp](boost::system::error_code& ec, 442 MapperGetSubTreeResult& res){ 443 <too many lines of code> 444 }) 445 ``` 446 447 ### Description 448 449 Inline lambdas, while useful in some contexts, are difficult to read, and have 450 inconsistent formatting with tools like `clang-format`, which causes significant 451 problems in review, and in applying patchsets that might have minor conflicts. 452 In addition, because they are declared in a function scope, they are difficult 453 to unit test, and produce log messages that are difficult to read given their 454 unnamed nature. They are also difficult to debug, lacking any symbol name to 455 which to attach breakpoints or tracepoints. 456 457 ### Background 458 459 Lambdas are a natural approach to implementing callbacks in the context of 460 asynchronous IO. Further, the Boost and sdbusplus ASIO APIs tend to encourage 461 this approach. Doing something other than lambdas requires more effort, and so 462 these options tend not to be chosen without pressure to do so. 463 464 ### Resolution 465 466 Prefer to either use `std::bind_front` and a method or static function to handle 467 the return, or a lambda that is less than 10 lines of code to handle an error 468 inline. In cases where `std::bind_front` cannot be used, such as in 469 `sdbusplus::asio::connection::async_method_call`, keep the lambda length less 470 than 10 lines, and call the appropriate function for handling non-trivial 471 transforms. 472 473 ```cpp 474 void afterGetSubTree(std::shared_ptr<bmcweb::AsyncResp>& asyncResp, 475 boost::system::error_code& ec, 476 MapperGetSubTreeResult& res){ 477 <many lines of code> 478 } 479 dbus::utility::getSubTree("/xyz/openbmc_project/inventory", interfaces, 480 std::bind_front(afterGetSubTree, asyncResp)); 481 ``` 482 483 See also the [Cpp Core Guidelines][] for generalized guidelines on when lambdas 484 are appropriate. The above recommendation is aligned with the Cpp Core 485 Guidelines. 486 487 [Cpp Core Guidelines]: 488 https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f11-use-an-unnamed-lambda-if-you-need-a-simple-function-object-in-one-place-only 489 490 ## Placing internal headers in a parallel subtree 491 492 ### Identification 493 494 Declaring types and functions that do not participate in a public API of a 495 library in header files that do not live alongside their implementation in the 496 source tree. 497 498 ### Description 499 500 There's no reason to put internal headers in a parallel subtree (for example, 501 `include/`). It's more effort to organise, puts unnecessary distance between 502 declarations and definitions, and increases effort required in the build system 503 configuration. 504 505 ### Background 506 507 In C and C++, header files expose the public API of a library to its consumers 508 and need to be installed onto the developer's system in a location known to the 509 compiler. To delineate which header files are to be installed and which are not, 510 the public header files are often placed into an `include/` subdirectory of the 511 source tree to mark their importance. 512 513 Any functions or structures that are implementation details of the library 514 should not be provided in its installed header files. Ignoring this philosphy 515 over-exposes the library's design and may lead to otherwise unnecessary API or 516 ABI breaks in the future. 517 518 Further, projects whose artifacts are only application binaries have no public 519 API or ABI in the sense of a library. Any header files in the source tree of 520 such projects have no need to be installed onto a developer's system and 521 segregation in path from the implementation serves no purpose. 522 523 ### Resolution 524 525 Place internal header files immediately alongside source files containing the 526 corresponding implementation. 527 528 ## Ill-defined data structuring in `lg2` message strings 529 530 ### Identification 531 532 Attempts at encoding information into the journal's MESSAGE string that is at 533 most only plausible to parse using a regex while also reducing human 534 readability. For example: 535 536 ``` 537 error( 538 "Error getting time, PATH={BMC_TIME_PATH} TIME INTERACE={TIME_INTF} ERROR={ERR_EXCEP}", 539 "BMC_TIME_PATH", bmcTimePath, "TIME_INTF", timeInterface, 540 "ERR_EXCEP", e); 541 ``` 542 543 ### Description 544 545 [`lg2` is OpenBMC's preferred C++ logging interface][phosphor-logging-lg2] and 546 is implemented on top of the systemd journal and its library APIs. 547 [systemd-journald provides structured logging][systemd-structured-logging], 548 which allows us to capture additional metadata alongside the provided message. 549 550 [phosphor-logging-lg2]: 551 https://github.com/openbmc/phosphor-logging/blob/master/docs/structured-logging.md 552 [systemd-structured-logging]: 553 https://0pointer.de/blog/projects/journal-submit.html 554 555 The concept of structured logging allows for convenient programmable access to 556 metadata associated with a log event. The journal captures a default set of 557 metadata with each message logged. However, the primary way the entries are 558 exposed to users is `journalctl`'s default behaviour of printing just the 559 `MESSAGE` value for each log entry. This may lead to a misunderstanding that the 560 printed message is the only way to expose related metadata for investigating 561 defects. 562 563 For human ergonomics `lg2` provides the ability to interpolate structured data 564 into the journal's `MESSAGE` string. This aids with human inspection of the logs 565 as it becomes possible to highlight important metadata for a log event. However, 566 it's not intended that this interpolation be used for ad-hoc, ill-defined 567 attempts at exposing metadata for automated analysis. 568 569 All key-value pairs provided to the `lg2` logging APIs are captured in the 570 structured log event, regardless of whether any particular key is interpolated 571 into the `MESSAGE` string. It is always possible to recover the information 572 associated with the log event even if it's not captured in the `MESSAGE` string. 573 574 `phosphor-time-manager` demonstrates a reasonable use of the `lg2` APIs. One 575 logging instance in the code [is as follows][phosphor-time-manager-lg2]: 576 577 [phosphor-time-manager-lg2]: 578 https://github.com/openbmc/phosphor-time-manager/blob/5ce9ac0e56440312997b25771507585905e8b360/manager.cpp#L98 579 580 ``` 581 info("Time mode has been changed to {MODE}", "MODE", newMode); 582 ``` 583 584 By default, this renders in the output of `journalctl` as: 585 586 ``` 587 Sep 23 06:09:57 bmc phosphor-time-manager[373]: Time mode has been changed to xyz.openbmc_project.Time.Synchronization.Method.NTP 588 ``` 589 590 However, we can use some journalctl commandline options to inspect the 591 structured data associated with the log entry: 592 593 ``` 594 # journalctl --identifier=phosphor-time-manager --boot --output=verbose | grep -v '^ _' | head -n 9 595 Sat 2023-09-23 06:09:57.645208 UTC [s=85c1cb5f8e02445aa110a5164c9c07f6;i=244;b=ffd111d3cdca41c8893bb728a1c6cb20;m=133a5a0;t=606009314d0d9;x=9a54e8714754a6cb] 596 PRIORITY=6 597 MESSAGE=Time mode has been changed to xyz.openbmc_project.Time.Synchronization.Method.NTP 598 LOG2_FMTMSG=Time mode has been changed to {MODE} 599 CODE_FILE=/usr/src/debug/phosphor-time-manager/1.0+git/manager.cpp 600 CODE_LINE=98 601 CODE_FUNC=bool phosphor::time::Manager::setCurrentTimeMode(const std::string&) 602 MODE=xyz.openbmc_project.Time.Synchronization.Method.NTP 603 SYSLOG_IDENTIFIER=phosphor-time-manager 604 ``` 605 606 Here we find that `MODE` and its value are captured as its own metadata entry in 607 the structured data, as well as being interpolated into `MESSAGE` as requested. 608 Additionally, from the log entry we can find _how_ `MODE` was interpolated into 609 `MESSAGE` using the format string captured in the `LOG2_FMTMSG` metadata entry. 610 611 `LOG2_FMTMSG` also provides a stable handle for identifying the existence of a 612 specific class of log events in the journal, further aiding automated analysis. 613 614 ### Background 615 616 A variety of patches such as [PLDM:Catching exception precisely and printing 617 it][openbmc-gerrit-67994] added a number of ad-hoc, ill-defined attempts at 618 providing all the metadata through the `MESSAGE` entry. 619 620 [openbmc-gerrit-67994]: https://gerrit.openbmc.org/c/openbmc/pldm/+/67994 621 622 ### Resolution 623 624 `lg2` messages should be formatted for consumption by humans. They should not 625 contain ad-hoc, ill-defined attempts at integrating metadata for automated 626 analysis. 627