1# OpenBMC Anti-patterns 2 3From [Wikipedia](https://en.wikipedia.org/wiki/Anti-pattern): 4 5 6"An anti-pattern is a common response to a recurring problem that is usually 7ineffective and risks being highly counterproductive." 8 9 10The developers of OpenBMC do not get 100% of decisions right 100% of the time. 11That, combined with the fact that software development is often an exercise in 12copying and pasting, results in mistakes happening over and over again. 13 14 15This page aims to document some of the anti-patterns that exist in OpenBMC to 16ease the job of those reviewing code. If an anti-pattern is spotted, rather 17that repeating the same explanations over and over, a link to this document can 18be provided. 19 20 21<!-- begin copy/paste on next line --> 22 23## Anti-pattern template [one line description] 24 25### Identification 26(1 paragraph) Describe how to spot the anti-pattern. 27 28### Description 29(1 paragraph) Describe the negative effects of the anti-pattern. 30 31### Background 32(1 paragraph) Describe why the anti-pattern exists. If you don't know, try 33running git blame and look at who wrote the code originally, and ask them on the 34mailing list or in Discord what their original intent was, so it can be documented 35here (and you may possibly discover it isn't as much of an anti-pattern as you 36thought). If you are unable to determine why the anti-pattern exists, put: 37"Unknown" here. 38 39### Resolution 40(1 paragraph) Describe the preferred way to solve the problem solved by the 41anti-pattern and the positive effects of solving it in the manner described. 42 43<!-- end copy/paste on previous line --> 44 45## Custom ArgumentParser object 46 47### Identification 48 49The ArgumentParser object is typically present to wrap calls to get options. 50It abstracts away the parsing and provides a `[]` operator to access the 51parameters. 52 53### Description 54 55Writing a custom ArgumentParser object creates nearly duplicate code in a 56repository. The ArgumentParser itself is the same, however, the options 57provided differ. Writing a custom argument parser re-invents the wheel on 58c++ command line argument parsing. 59 60### Background 61 62The ArgumentParser exists because it was in place early and then copied into 63each new repository as an easy way to handle argument parsing. 64 65### Resolution 66 67The CLI11 library was designed and implemented specifically to support modern 68argument parsing. It handles the cases seen in OpenBMC daemons and has some 69handy built-in validators, and allows easy customizations to validation. 70 71## Explicit AC_MSG_ERROR on PKG_CHECK_MODULES failure 72 73### Identification 74``` 75PKG_CHECK_MODULES( 76 [PHOSPHOR_LOGGING], 77 [phosphor-logging], 78 [], 79 [AC_MSG_ERROR([Could not find phosphor-logging...openbmc/phosphor-logging package required])]) 80``` 81 82### Description 83 84The autotools PKG_CHECK_MODULES macro provides the ability to specify an 85"if found" and "if not found" behavior. By default, the "if not found" 86behavior will list the package not found. In many cases, this is sufficient 87to a developer to know what package is missing. In most cases, it's another 88OpenBMC package. 89 90If the library sought's name isn't related to the package providing it, then 91the failure message should be set to something more useful to the developer. 92 93### Resolution 94 95Use the default macro behavior when it is clear that the missing package is 96another OpenBMC package. 97 98``` 99PKG_CHECK_MODULES([PHOSPHOR_LOGGING], [phosphor-logging]) 100``` 101 102## Explicit listing of shared library packages in RDEPENDS in bitbake metadata 103 104### Identification 105``` 106RDEPENDS_${PN} = "libsystemd" 107``` 108 109### Description 110Out of the box bitbake examines built applications, automatically adds runtime 111dependencies and thus ensures any library packages dependencies are 112automatically added to images, sdks, etc. There is no need to list them 113explicitly in a recipe. 114 115Dependencies change over time, and listing them explicitly is likely prone to 116errors - the net effect being unnecessary shared library packages being 117installed into images. 118 119Consult 120https://www.yoctoproject.org/docs/latest/mega-manual/mega-manual.html#var-RDEPENDS 121for information on when to use explicit runtime dependencies. 122 123### Background 124The initial bitbake metadata author for OpenBMC was not aware that bitbake 125added these dependencies automatically. Initial bitbake metadata therefore 126listed shared library dependencies explicitly, and was subsequently copy pasted. 127 128### Resolution 129Do not list shared library packages in RDEPENDS. This eliminates the 130possibility of installing unnecessary shared library packages due to 131unmaintained library dependency lists in bitbake metadata. 132 133## Use of /usr/bin/env in systemd service files 134 135### Identification 136In systemd unit files: 137``` 138[Service] 139 140ExecStart=/usr/bin/env some-application 141``` 142 143### Description 144Outside of OpenBMC, most applications that provide systemd unit files don't 145launch applications in this way. So if nothing else, this just looks strange 146and violates the [princple of least 147astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment). 148 149### Background 150This anti-pattern exists because a requirement exists to enable live patching of 151applications on read-only filesystems. Launching applications in this way was 152part of the implementation that satisfied the live patch requirement. For 153example: 154 155``` 156/usr/bin/phosphor-hwmon 157``` 158 159on a read-only filesystem becomes: 160 161``` 162/usr/local/bin/phosphor-hwmon` 163``` 164 165on a writeable /usr/local filesystem. 166 167### Resolution 168The /usr/bin/env method only enables live patching of applications. A method 169that supports live patching of any file in the read-only filesystem has emerged. 170Assuming a writeable filesystem exists _somewhere_ on the bmc, something like: 171 172``` 173mkdir -p /var/persist/usr 174mkdir -p /var/persist/work/usr 175mount -t overlay -o lowerdir=/usr,upperdir=/var/persist/usr,workdir=/var/persist/work/usr overlay /usr 176``` 177can enable live system patching without any additional requirements on how 178applications are launched from systemd service files. This is the preferred 179method for enabling live system patching as it allows OpenBMC developers to 180write systemd service files in the same way as most other projects. 181 182To undo existing instances of this anti-pattern remove /usr/bin/env from systemd 183service files and replace with the fully qualified path to the application being 184launched. For example, given an application in /usr/bin: 185 186``` 187sed -i s,/usr/bin/env ,/usr/bin/, foo.service 188``` 189 190## Incorrect placement of executables in /sbin, /usr/sbin or /bin, /usr/bin 191 192### Identification 193OpenBMC executables that are installed in `/usr/sbin`. `$sbindir` in bitbake 194metadata, makefiles or configure scripts. systemd service files pointing to 195`/bin` or `/usr/bin` executables. 196 197### Description 198Installing OpenBMC applications in incorrect locations violates the 199[principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment) and more importantly violates the 200[FHS](https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard). 201 202### Background 203 204There are typically two types of executables: 205 2061. Long-running daemons started by, for instance, systemd service files and 207 *NOT* intended to be directly executed by users. 2082. Utilities intended to be used by a user as a CLI. 209 210Executables of type 1 should not be placed anywhere in `${PATH}` because it 211is confusing and error-prone to users, but should instead be placed in a 212`/usr/libexec/<package>` subdirectory. 213 214Executables of type 2 should be placed in `/usr/bin` because they are intended 215to be used by users and should be in `${PATH}` (also, `sbin` is inappropriate 216as we transition to having non-root access). 217 218The sbin anti-pattern exists because the FHS was misinterpreted at an early 219point in OpenBMC project history, and has proliferated ever since. 220 221From the hier(7) man page: 222 223``` 224/usr/bin This is the primary directory for executable programs. Most programs 225executed by normal users which are not needed for booting or for repairing the 226system and which are not installed locally should be placed in this directory. 227 228/usr/sbin This directory contains program binaries for system administration 229which are not essential for the boot process, for mounting /usr, or for system 230repair. 231 232/usr/libexec Directory contains binaries for internal use only and they are 233not meant to be executed directly by users shell or scripts. 234``` 235 236The FHS description for `/usr/sbin` refers to "system administration" but the 237de-facto interpretation of the system being administered refers to the OS 238installation and not a system in the OpenBMC sense of managed system. As such 239OpenBMC applications should be installed in `/usr/bin`. 240 241It is becoming common practice in Linux for daemons to now be moved to `libexec` 242and considered "internal use" from the perspective of the systemd service file 243that controls their launch. 244 245### Resolution 246Install OpenBMC applications in `/usr/libexec` or `/usr/bin/` as appropriate. 247 248## Handling unexpected error codes and exceptions 249 250### Identification 251The anti-pattern is for an application to continue processing after it 252encounters unexpected conditions in the form of error codes and exceptions and 253not capturing the data needed to resolve the problem. 254 255Example C++ code: 256``` 257using InternalFailure = sdbusplus::xyz::openbmc_project::Common::Error::InternalFailure; 258try 259{ 260 ... use d-Bus APIs... 261} 262catch (InternalFailure& e) 263{ 264 phosphor::logging::commit<InternalFailure>(); 265} 266``` 267 268### Description 269Suppressing unexpected errors can lead an application to incorrect or erratic 270behavior which can affect the service it provides and the overall system. 271 272Writing a log entry instead of a core dump may not give enough information to 273debug a truly unexpected problem, so developers do not get a chance to 274investigate problems and the system's reliability does not improve over time. 275 276### Background 277Programmers want their application to continue processing when it encounters 278conditions it can recover from. Sometimes they try too hard and continue when 279it is not appropriate. 280 281Programmers also want to log what is happening in the application, so they 282write log entries that give debug data when something goes wrong, typically 283targeted for their use. They don't consider how the log entry is consumed 284by the BMC administrator or automated service tools. 285 286The `InternalFailure` in the [Phosphor logging README][] is overused. 287 288[Phosphor logging README]: https://github.com/openbmc/phosphor-logging/blob/master/README.md 289 290### Resolution 291 292Several items are needed: 2931. Check all places where a return code or errno value is given. Strongly 294 consider that a default error handler should throw an exception, for 295 example `std::system_error`. 2962. Have a good reason to handle specific error codes. See below. 2973. Have a good reason to handle specific exceptions. Allow other exceptions 298 to propagate. 2994. Document (in terms of impacts to other services) what happens when this 300 service fails, stops, or is restarted. Use that to inform the recovery 301 strategy. 302 303In the error handler: 304 3051. Consider what data (if any) should be logged. Determine who will consume 306 the log entry: BMC developers, administrator, or an analysis tool. Usually 307 the answer is more than one of these. 308 309 The following example log entries use an IPMI request to set network access 310 on, but the user input is invalid. 311 312 - BMC Developer: Reference internal applications, services, pids, etc. 313 the developer would be familiar with. 314 315 Example: `ipmid service successfully processed a network setting packet, 316 however the user input of USB0 is not a valid network interface to 317 configure.` 318 319 - Administrator: Reference the external interfaces of the BMC such as the 320 REST API. They can respond to feedback about invalid input, or a need 321 to restart the BMC. 322 323 Example: `The network interface of USB0 is not a valid option. Retry the 324 IPMI command with a valid interface.` 325 326 - Analyzer tool: Consider breaking the log down and including several 327 properties which an analyzer can leverage. For instance, tagging the 328 log with 'Internal' is not helpful. However, breaking that down into 329 something like [UserInput][IPMI][Network] tells at a quick glance that 330 the input received for configuring the network via an IPMI command was 331 invalid. Categorization and system impact are key things to focus on 332 when creating logs for an analysis application. 333 334 Example: `[UserInput][IPMI][Network][Config][Warning] Interface USB0 not 335 valid.` 336 3372. Determine if the application can fully recover from the condition. If not, 338 don't continue. Allow the system to determine if it writes a core dump or 339 restarts the service. If there are severe impacts when the service fails, 340 consider using a better error recovery mechanism. 341 342## Non-standard debug application options and logging 343 344### Identification 345An application uses non-standard methods on startup to indicate verbose 346logging and/or does not utilize standard systemd-journald debug levels for 347logging. 348 349### Description 350When debugging issues within OpenBMC that cross multiple applications, it's 351very difficult to enable the appropriate debug when different applications 352have different mechanisms for enabling debug. For example, different OpenBMC 353applications take the following as command line parameters to enable extra 354debug: 355- 0xff, --vv, -vv, -v, --verbose, <and more> 356 357Along these same lines, some applications then have their own internal methods 358of writing debug data. They use std::cout, printf, fprintf, ... 359Doing this causes other issues when trying to compare debug data across 360different applications that may be having their buffers flushed at different 361times (and picked up by journald). 362 363### Background 364Everyone has their own ways to debug. There was no real standard within 365OpenBMC on how to do it so everyone came up with whatever they were familiar 366with. 367 368### Resolution 369If an OpenBMC application is to support enhanced debug via a command line then 370it will support the standard "-v,--verbose" option. 371 372In general, OpenBMC developers should utilize the "debug" journald level for 373debug messages. This can be enabled/disabled globally and would apply to 374all applications. If a developer believes this would cause too much debug 375data in certain cases then they can protect these journald debug calls around 376a --verbose command line option. 377 378## DBus interface representing GPIOs 379 380### Identification 381Desire to expose a DBus interface to drive GPIOs, for example: 382 383* https://lore.kernel.org/openbmc/YV21cD3HOOGi7K2f@heinlein/ 384* https://lore.kernel.org/openbmc/CAH2-KxBV9_0Dt79Quy0f4HkXXPdHfBw9FsG=4KwdWXBYNEA-ew@mail.gmail.com/ 385* https://lore.kernel.org/openbmc/YtPrcDzaxXiM6vYJ@heinlein.stwcx.org.github.beta.tailscale.net/ 386 387### Description 388Platform functionality selected by GPIOs might equally be selected by other 389means with a shift in system design philosophy. As such, GPIOs are a (hardware) 390implementation detail. Exposing the implementation on DBus forces the 391distribution of platform design details across multiple applications, which 392violates the software design principle of [low coupling][coupling] and impacts 393our confidence in maintenance. 394 395[coupling]: https://en.wikipedia.org/wiki/Coupling_%28computer_programming%29 396 397### Background 398GPIOs are often used to select functionality that might be hard to generalise, 399and therefore hard to abstract. If this is the case, then the functionality 400in question is probably best wrapped up as an implementation detail of a 401behaviour that is more generally applicable (e.g. host power-on procedures). 402 403### Resolution 404Consider what functionality the GPIO provides and design or exploit an existing 405interface to expose that behaviour instead. 406