# OpenBMC Anti-patterns

From [Wikipedia](https://en.wikipedia.org/wiki/Anti-pattern):

"An anti-pattern is a common response to a recurring problem that is usually
ineffective and risks being highly counterproductive."

The developers of OpenBMC do not get 100% of decisions right 100% of the time.
That, combined with the fact that software development is often an exercise in
copying and pasting, results in mistakes happening over and over again.

This page aims to document some of the anti-patterns that exist in OpenBMC to
ease the job of those reviewing code. If an anti-pattern is spotted, rather than
repeating the same explanations over and over, a link to this document can be
provided.

<!-- begin copy/paste on next line -->

## Anti-pattern template [one line description]

### Identification

(1 paragraph) Describe how to spot the anti-pattern.

### Description

(1 paragraph) Describe the negative effects of the anti-pattern.

### Background

(1 paragraph) Describe why the anti-pattern exists. If you don't know, try
running git blame and look at who wrote the code originally, and ask them on the
mailing list or in Discord what their original intent was, so it can be
documented here (and you may possibly discover it isn't as much of an
anti-pattern as you thought). If you are unable to determine why the
anti-pattern exists, put: "Unknown" here.

### Resolution

(1 paragraph) Describe the preferred way to solve the problem solved by the
anti-pattern and the positive effects of solving it in the manner described.

<!-- end copy/paste on previous line -->

## Custom ArgumentParser object

### Identification

The ArgumentParser object is typically present to wrap calls to get options. It
abstracts away the parsing and provides a `[]` operator to access the
parameters.

### Description

Writing a custom ArgumentParser object creates nearly duplicate code in a
repository. The ArgumentParser itself is the same; only the options provided
differ. Writing a custom argument parser re-invents the wheel on C++ command
line argument parsing.

### Background

The ArgumentParser exists because it was in place early and then copied into
each new repository as an easy way to handle argument parsing.

### Resolution

The CLI11 library was designed and implemented specifically to support modern
argument parsing. It handles the cases seen in OpenBMC daemons, has some handy
built-in validators, and allows easy customization of validation.

## Explicit AC_MSG_ERROR on PKG_CHECK_MODULES failure

### Identification

```
PKG_CHECK_MODULES(
    [PHOSPHOR_LOGGING],
    [phosphor-logging],
    [],
    [AC_MSG_ERROR([Could not find phosphor-logging...openbmc/phosphor-logging package required])])
```

### Description

The autotools PKG_CHECK_MODULES macro provides the ability to specify an "if
found" and "if not found" behavior. By default, the "if not found" behavior will
list the package not found. In many cases, this is sufficient for a developer to
know what package is missing. In most cases, it's another OpenBMC package.

If the name of the library sought isn't related to the package providing it,
then the failure message should be set to something more useful to the
developer.

### Resolution

Use the default macro behavior when it is clear that the missing package is
another OpenBMC package.

```
PKG_CHECK_MODULES([PHOSPHOR_LOGGING], [phosphor-logging])
```

## Explicit listing of shared library packages in RDEPENDS in bitbake metadata

### Identification

```
RDEPENDS_${PN} = "libsystemd"
```

### Description

Out of the box bitbake examines built applications, automatically adds runtime
dependencies and thus ensures any library package dependencies are
automatically added to images, sdks, etc. There is no need to list them
explicitly in a recipe.

Dependencies change over time, and listing them explicitly is error-prone. The
net effect is unnecessary shared library packages being installed into images.

Consult
https://www.yoctoproject.org/docs/latest/mega-manual/mega-manual.html#var-RDEPENDS
for information on when to use explicit runtime dependencies.

### Background

The initial bitbake metadata author for OpenBMC was not aware that bitbake added
these dependencies automatically. Initial bitbake metadata therefore listed
shared library dependencies explicitly, and was subsequently copy-pasted.

### Resolution

Do not list shared library packages in RDEPENDS. This eliminates the possibility
of installing unnecessary shared library packages due to unmaintained library
dependency lists in bitbake metadata.

## Use of /usr/bin/env in systemd service files

### Identification

In systemd unit files:

```
[Service]

ExecStart=/usr/bin/env some-application
```

### Description

Outside of OpenBMC, most applications that provide systemd unit files don't
launch applications in this way. So if nothing else, this just looks strange and
violates the
[principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment).

### Background

This anti-pattern exists because a requirement exists to enable live patching of
applications on read-only filesystems. Launching applications in this way was
part of the implementation that satisfied the live patch requirement. For
example:

```
/usr/bin/phosphor-hwmon
```

on a read-only filesystem becomes:

```
/usr/local/bin/phosphor-hwmon
```

on a writeable /usr/local filesystem.

### Resolution

The /usr/bin/env method only enables live patching of applications. A method
that supports live patching of any file in the read-only filesystem has emerged.
Assuming a writeable filesystem exists _somewhere_ on the BMC, something like:

```
mkdir -p /var/persist/usr
mkdir -p /var/persist/work/usr
mount -t overlay -o lowerdir=/usr,upperdir=/var/persist/usr,workdir=/var/persist/work/usr overlay /usr
```

can enable live system patching without any additional requirements on how
applications are launched from systemd service files. This is the preferred
method for enabling live system patching as it allows OpenBMC developers to
write systemd service files in the same way as most other projects.

To undo existing instances of this anti-pattern, remove /usr/bin/env from
systemd service files and replace it with the fully qualified path to the
application being launched. For example, given an application in /usr/bin:

```
sed -i 's,/usr/bin/env ,/usr/bin/,' foo.service
```

## Incorrect placement of executables in /sbin, /usr/sbin or /bin, /usr/bin

### Identification

OpenBMC executables that are installed in `/usr/sbin`. `$sbindir` in bitbake
metadata, makefiles or configure scripts. systemd service files pointing to
`/bin` or `/usr/bin` executables.

### Description

Installing OpenBMC applications in incorrect locations violates the
[principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment)
and more importantly violates the
[FHS](https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard).

### Background

There are typically two types of executables:

1. Long-running daemons started by, for instance, systemd service files and
   _NOT_ intended to be directly executed by users.
2. Utilities intended to be used by a user as a CLI.

Executables of type 1 should not be placed anywhere in `${PATH}` because it is
confusing and error-prone to users, but should instead be placed in a
`/usr/libexec/<package>` subdirectory.

Executables of type 2 should be placed in `/usr/bin` because they are intended
to be used by users and should be in `${PATH}` (also, `sbin` is inappropriate as
we transition to having non-root access).

The sbin anti-pattern exists because the FHS was misinterpreted at an early
point in OpenBMC project history, and has proliferated ever since.

From the hier(7) man page:

```
/usr/bin This is the primary directory for executable programs. Most programs
executed by normal users which are not needed for booting or for repairing the
system and which are not installed locally should be placed in this directory.

/usr/sbin This directory contains program binaries for system administration
which are not essential for the boot process, for mounting /usr, or for system
repair.

/usr/libexec Directory contains binaries for internal use only and they are
not meant to be executed directly by users shell or scripts.
```

The FHS description for `/usr/sbin` refers to "system administration" but the
de-facto interpretation of the system being administered refers to the OS
installation and not a system in the OpenBMC sense of managed system.
As such, OpenBMC applications should be installed in `/usr/bin`.

It is becoming common practice in Linux for daemons to be moved to `libexec`
and considered "internal use" from the perspective of the systemd service file
that controls their launch.

### Resolution

Install OpenBMC applications in `/usr/libexec` or `/usr/bin/` as appropriate.

## Handling unexpected error codes and exceptions

### Identification

The anti-pattern is for an application to continue processing after it
encounters unexpected conditions in the form of error codes and exceptions,
without capturing the data needed to resolve the problem.

Example C++ code:

```
using InternalFailure = sdbusplus::xyz::openbmc_project::Common::Error::InternalFailure;
try
{
  // ... use D-Bus APIs ...
}
catch (InternalFailure& e)
{
  phosphor::logging::commit<InternalFailure>();
}
```

### Description

Suppressing unexpected errors can lead an application to incorrect or erratic
behavior which can affect the service it provides and the overall system.

Writing a log entry instead of a core dump may not give enough information to
debug a truly unexpected problem, so developers do not get a chance to
investigate problems and the system's reliability does not improve over time.

### Background

Programmers want their application to continue processing when it encounters
conditions it can recover from. Sometimes they try too hard and continue when it
is not appropriate.

Programmers also want to log what is happening in the application, so they write
log entries that give debug data when something goes wrong, typically targeted
for their use. They don't consider how the log entry is consumed by the BMC
administrator or automated service tools.

The `InternalFailure` in the [Phosphor logging README][] is overused.
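
To make the distinction between a condition an application can recover from and
one it cannot more concrete, here is a minimal sketch using only POSIX and the
C++ standard library. The names `openOrThrow` and `overrideExists`, and the
choice of a missing file as the one recoverable case, are hypothetical and not
taken from any OpenBMC repository. The sketch converts an `errno` failure into
`std::system_error`, handles the single error code it has a reason to handle,
and lets everything else propagate.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical helper: open a file read-only, turning any failure into an
// exception instead of returning an error code the caller might ignore.
int openOrThrow(const std::string& path)
{
    const int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0)
    {
        throw std::system_error(errno, std::generic_category(),
                                "Failed to open " + path);
    }
    return fd;
}

// Hypothetical caller: a missing override file is an expected condition,
// so that one error code is handled. Anything else is unexpected.
bool overrideExists(const std::string& path)
{
    try
    {
        ::close(openOrThrow(path));
        return true;
    }
    catch (const std::system_error& e)
    {
        if (e.code() == std::errc::no_such_file_or_directory)
        {
            return false; // Expected: the application can fully recover.
        }
        throw; // Unexpected: do not continue, let the failure propagate.
    }
}
```

If nothing further up the call stack has a good reason to catch the rethrown
exception, the process terminates and the system decides whether to write a
core dump or restart the service.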

[phosphor logging readme]:
  https://github.com/openbmc/phosphor-logging/blob/master/README.md

### Resolution

Several items are needed:

1. Check all places where a return code or errno value is given. Strongly
   consider that a default error handler should throw an exception, for example
   `std::system_error`.
2. Have a good reason to handle specific error codes. See below.
3. Have a good reason to handle specific exceptions. Allow other exceptions to
   propagate.
4. Document (in terms of impacts to other services) what happens when this
   service fails, stops, or is restarted. Use that to inform the recovery
   strategy.

In the error handler:

1. Consider what data (if any) should be logged. Determine who will consume the
   log entry: BMC developers, administrator, or an analysis tool. Usually the
   answer is more than one of these.

   The following example log entries use an IPMI request to set network access
   on, but the user input is invalid.

   - BMC Developer: Reference internal applications, services, pids, etc. the
     developer would be familiar with.

     Example:
     `ipmid service successfully processed a network setting packet, however the user input of USB0 is not a valid network interface to configure.`

   - Administrator: Reference the external interfaces of the BMC such as the
     REST API. They can respond to feedback about invalid input, or a need to
     restart the BMC.

     Example:
     `The network interface of USB0 is not a valid option. Retry the IPMI command with a valid interface.`

   - Analyzer tool: Consider breaking the log down and including several
     properties which an analyzer can leverage. For instance, tagging the log
     with 'Internal' is not helpful.
     However, breaking that down into something
     like `[UserInput][ipmi][Network]` tells at a quick glance that the input
     received for configuring the network via an IPMI command was invalid.
     Categorization and system impact are key things to focus on when creating
     logs for an analysis application.

     Example:
     `[UserInput][IPMI][Network][Config][Warning] Interface USB0 not valid.`

2. Determine if the application can fully recover from the condition. If not,
   don't continue. Allow the system to determine if it writes a core dump or
   restarts the service. If there are severe impacts when the service fails,
   consider using a better error recovery mechanism.

## Non-standard debug application options and logging

### Identification

An application uses non-standard methods on startup to indicate verbose logging
and/or does not utilize standard systemd-journald debug levels for logging.

### Description

When debugging issues within OpenBMC that cross multiple applications, it's very
difficult to enable the appropriate debug output when different applications
have different mechanisms for enabling it. For example, different OpenBMC
applications take the following as command line parameters to enable extra
debug:

- `0xff`, `--vv`, `-vv`, `-v`, `--verbose`, and more

Along these same lines, some applications then have their own internal methods
of writing debug data. They use `std::cout`, `printf`, `fprintf`, and so on.
Doing this causes other issues when trying to compare debug data across
different applications that may be having their buffers flushed at different
times (and picked up by journald).

### Background

Everyone has their own ways to debug. There was no real standard within OpenBMC
on how to do it so everyone came up with whatever they were familiar with.

### Resolution

If an OpenBMC application is to support enhanced debug via a command line then
it will support the standard `-v`/`--verbose` option.

In general, OpenBMC developers should utilize the "debug" journald level for
debug messages. This can be enabled/disabled globally and would apply to all
applications. If a developer believes this would cause too much debug data in
certain cases then they can guard these journald debug calls behind a
`--verbose` command line option.

## DBus interface representing GPIOs

### Identification

Desire to expose a DBus interface to drive GPIOs, for example:

- https://lore.kernel.org/openbmc/YV21cD3HOOGi7K2f@heinlein/
- https://lore.kernel.org/openbmc/CAH2-KxBV9_0Dt79Quy0f4HkXXPdHfBw9FsG=4KwdWXBYNEA-ew@mail.gmail.com/
- https://lore.kernel.org/openbmc/YtPrcDzaxXiM6vYJ@heinlein.stwcx.org.github.beta.tailscale.net/

### Description

Platform functionality selected by GPIOs might equally be selected by other
means with a shift in system design philosophy. As such, GPIOs are a (hardware)
implementation detail. Exposing the implementation on DBus forces the
distribution of platform design details across multiple applications, which
violates the software design principle of [low coupling][coupling] and impacts
our confidence in maintenance.

[coupling]: https://en.wikipedia.org/wiki/Coupling_%28computer_programming%29

### Background

GPIOs are often used to select functionality that might be hard to generalise,
and therefore hard to abstract. If this is the case, then the functionality in
question is probably best wrapped up as an implementation detail of a behaviour
that is more generally applicable (e.g. host power-on procedures).

### Resolution

Consider what functionality the GPIO provides and design or exploit an existing
interface to expose that behaviour instead.
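
One way to picture this resolution: other applications see the behaviour, while
the GPIO stays private to a single implementation. The sketch below is
illustrative only and uses just the C++ standard library. `HostPower`,
`GpioHostPower`, `GpioWriter` and the line name `power-button` are all
hypothetical, and the GPIO write is stubbed with a callback rather than a real
GPIO library call.

```cpp
#include <functional>
#include <string>
#include <utility>

// Behaviour-level interface: this is the shape that would be exposed to
// other applications. Nothing in it mentions GPIOs.
class HostPower
{
  public:
    virtual ~HostPower() = default;
    virtual void powerOn() = 0;
    virtual bool isOn() const = 0;
};

// Hypothetical stand-in for a GPIO write: (line name, value).
using GpioWriter = std::function<void(const std::string&, bool)>;

// Platform-specific implementation. The GPIO line and the pulse sequence
// are private details that can change without affecting any consumer.
class GpioHostPower : public HostPower
{
  public:
    explicit GpioHostPower(GpioWriter writer) : write(std::move(writer)) {}

    void powerOn() override
    {
        write("power-button", true);  // Assert the (hypothetical) line.
        write("power-button", false); // Release it again.
        on = true;
    }

    bool isOn() const override
    {
        return on;
    }

  private:
    GpioWriter write;
    bool on = false;
};
```

A platform that selects the same functionality by some other means would
provide a different `HostPower` implementation, and consumers of the interface
would not need to change.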

## Very long lambda callbacks

### Identification

C++ code that is similar to the following:

```cpp
dbus::utility::getSubTree("/", interfaces,
                         [asyncResp](boost::system::error_code& ec,
                                     MapperGetSubTreeResult& res){
                            <too many lines of code>
                         });
```

### Description

Inline lambdas, while useful in some contexts, are difficult to read, and have
inconsistent formatting with tools like `clang-format`, which causes significant
problems in review, and in applying patchsets that might have minor conflicts.
In addition, because they are declared in a function scope, they are difficult
to unit test, and produce log messages that are difficult to read given their
unnamed nature. They are also difficult to debug, lacking any symbol name to
which to attach breakpoints or tracepoints.

### Background

Lambdas are a natural approach to implementing callbacks in the context of
asynchronous IO. Further, the Boost and sdbusplus ASIO APIs tend to encourage
this approach. Doing something other than lambdas requires more effort, and so
these options tend not to be chosen without pressure to do so.

### Resolution

Prefer to either use `std::bind_front` and a method or static function to handle
the return, or a lambda that is less than 10 lines of code to handle an error
inline. In cases where `std::bind_front` cannot be used, such as in
`sdbusplus::asio::connection::async_method_call`, keep the lambda length less
than 10 lines, and call the appropriate function for handling non-trivial
transforms.
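
The second option, a short lambda that delegates to a named function, can be
sketched with the standard library alone. Everything in this sketch is
hypothetical scaffolding: `Response`, `afterLookup`, `asyncLookup` and
`handleRequest` stand in for a real response object, handler, asynchronous API
and request entry point, and the "asynchronous" call completes immediately to
keep the example self-contained.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical response object shared between a request and its callbacks.
struct Response
{
    int status = 200;
    std::vector<std::string> members;
};

// Named handler: it has a symbol for breakpoints and can be unit tested
// directly, without going through the asynchronous API.
void afterLookup(const std::shared_ptr<Response>& resp,
                 const std::error_code& ec,
                 const std::vector<std::string>& paths)
{
    if (ec)
    {
        resp->status = 500;
        return;
    }
    for (const std::string& path : paths)
    {
        resp->members.push_back(path);
    }
}

using LookupCallback = std::function<void(const std::error_code&,
                                          const std::vector<std::string>&)>;

// Hypothetical stand-in for an asynchronous API; it completes immediately.
void asyncLookup(const LookupCallback& callback)
{
    const std::vector<std::string> paths{"/a", "/b"};
    callback(std::error_code{}, paths);
}

void handleRequest(const std::shared_ptr<Response>& resp)
{
    // The lambda only forwards; all of the logic lives in afterLookup.
    asyncLookup([resp](const std::error_code& ec,
                       const std::vector<std::string>& paths) {
        afterLookup(resp, ec, paths);
    });
}
```

Where the API accepts any callable and C++20 is available,
`std::bind_front(afterLookup, resp)` can replace the forwarding lambda. The
`getSubTree` callback from the Identification section takes that form: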

```cpp
void afterGetSubTree(std::shared_ptr<bmcweb::AsyncResp>& asyncResp,
                     boost::system::error_code& ec,
                     MapperGetSubTreeResult& res){
   <many lines of code>
}
dbus::utility::getSubTree("/xyz/openbmc_project/inventory", interfaces,
                         std::bind_front(afterGetSubTree, asyncResp));
```

See also the [Cpp Core Guidelines][] for generalized guidelines on when lambdas
are appropriate. The above recommendation is aligned with the Cpp Core
Guidelines.

[Cpp Core Guidelines]:
  https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f11-use-an-unnamed-lambda-if-you-need-a-simple-function-object-in-one-place-only

## Placing internal headers in a parallel subtree

### Identification

Declaring types and functions that do not participate in a public API of a
library in header files that do not live alongside their implementation in the
source tree.

### Description

There's no reason to put internal headers in a parallel subtree (for example,
`include/`). It's more effort to organise, puts unnecessary distance between
declarations and definitions, and increases effort required in the build system
configuration.

### Background

In C and C++, header files expose the public API of a library to its consumers
and need to be installed onto the developer's system in a location known to the
compiler. To delineate which header files are to be installed and which are not,
the public header files are often placed into an `include/` subdirectory of the
source tree to mark their importance.

Any functions or structures that are implementation details of the library
should not be provided in its installed header files. Ignoring this philosophy
over-exposes the library's design and may lead to otherwise unnecessary API or
ABI breaks in the future.

Further, projects whose artifacts are only application binaries have no public
API or ABI in the sense of a library. Any header files in the source tree of
such projects have no need to be installed onto a developer's system and
segregation in path from the implementation serves no purpose.

### Resolution

Place internal header files immediately alongside source files containing the
corresponding implementation.

## Ill-defined data structuring in `lg2` message strings

### Identification

Attempts at encoding information into the journal's `MESSAGE` string in a way
that can at best be parsed with a regex, while also reducing human readability.
For example:

```
error(
    "Error getting time, PATH={BMC_TIME_PATH} TIME INTERACE={TIME_INTF}  ERROR={ERR_EXCEP}",
    "BMC_TIME_PATH", bmcTimePath, "TIME_INTF", timeInterface,
    "ERR_EXCEP", e);
```

### Description

[`lg2` is OpenBMC's preferred C++ logging interface][phosphor-logging-lg2] and
is implemented on top of the systemd journal and its library APIs.
[systemd-journald provides structured logging][systemd-structured-logging],
which allows us to capture additional metadata alongside the provided message.

[phosphor-logging-lg2]:
  https://github.com/openbmc/phosphor-logging/blob/master/docs/structured-logging.md
[systemd-structured-logging]:
  https://0pointer.de/blog/projects/journal-submit.html

The concept of structured logging allows for convenient programmable access to
metadata associated with a log event. The journal captures a default set of
metadata with each message logged. However, the primary way the entries are
exposed to users is `journalctl`'s default behaviour of printing just the
`MESSAGE` value for each log entry. This may lead to a misunderstanding that the
printed message is the only way to expose related metadata for investigating
defects.

For human ergonomics `lg2` provides the ability to interpolate structured data
into the journal's `MESSAGE` string. This aids with human inspection of the logs
as it becomes possible to highlight important metadata for a log event. However,
it's not intended that this interpolation be used for ad-hoc, ill-defined
attempts at exposing metadata for automated analysis.

All key-value pairs provided to the `lg2` logging APIs are captured in the
structured log event, regardless of whether any particular key is interpolated
into the `MESSAGE` string. It is always possible to recover the information
associated with the log event even if it's not captured in the `MESSAGE` string.

`phosphor-time-manager` demonstrates a reasonable use of the `lg2` APIs. One
logging instance in the code [is as follows][phosphor-time-manager-lg2]:

[phosphor-time-manager-lg2]:
  https://github.com/openbmc/phosphor-time-manager/blob/5ce9ac0e56440312997b25771507585905e8b360/manager.cpp#L98

```
info("Time mode has been changed to {MODE}", "MODE", newMode);
```

By default, this renders in the output of `journalctl` as:

```
Sep 23 06:09:57 bmc phosphor-time-manager[373]: Time mode has been changed to xyz.openbmc_project.Time.Synchronization.Method.NTP
```

However, we can use some `journalctl` command-line options to inspect the
structured data associated with the log entry:

```
# journalctl --identifier=phosphor-time-manager --boot --output=verbose | grep -v '^    _' | head -n 9
Sat 2023-09-23 06:09:57.645208 UTC [s=85c1cb5f8e02445aa110a5164c9c07f6;i=244;b=ffd111d3cdca41c8893bb728a1c6cb20;m=133a5a0;t=606009314d0d9;x=9a54e8714754a6cb]
    PRIORITY=6
    MESSAGE=Time mode has been changed to xyz.openbmc_project.Time.Synchronization.Method.NTP
    LOG2_FMTMSG=Time mode has been changed to {MODE}
    CODE_FILE=/usr/src/debug/phosphor-time-manager/1.0+git/manager.cpp
    CODE_LINE=98
    CODE_FUNC=bool phosphor::time::Manager::setCurrentTimeMode(const std::string&)
    MODE=xyz.openbmc_project.Time.Synchronization.Method.NTP
    SYSLOG_IDENTIFIER=phosphor-time-manager
```

Here we find that `MODE` and its value are captured as their own metadata entry
in the structured data, as well as being interpolated into `MESSAGE` as
requested. Additionally, from the log entry we can find _how_ `MODE` was
interpolated into `MESSAGE` using the format string captured in the
`LOG2_FMTMSG` metadata entry.

`LOG2_FMTMSG` also provides a stable handle for identifying the existence of a
specific class of log events in the journal, further aiding automated analysis.

### Background

A variety of patches such as [PLDM:Catching exception precisely and printing
it][openbmc-gerrit-67994] added a number of ad-hoc, ill-defined attempts at
providing all the metadata through the `MESSAGE` entry.

[openbmc-gerrit-67994]: https://gerrit.openbmc.org/c/openbmc/pldm/+/67994

### Resolution

`lg2` messages should be formatted for consumption by humans. They should not
contain ad-hoc, ill-defined attempts at integrating metadata for automated
analysis.