# OpenBMC Anti-patterns

From [Wikipedia](https://en.wikipedia.org/wiki/Anti-pattern):

"An anti-pattern is a common response to a recurring problem that is usually
ineffective and risks being highly counterproductive."

The developers of OpenBMC do not get 100% of decisions right 100% of the time.
That, combined with the fact that software development is often an exercise in
copying and pasting, results in mistakes happening over and over again.

This page aims to document some of the anti-patterns that exist in OpenBMC to
ease the job of those reviewing code. If an anti-pattern is spotted, rather than
repeating the same explanations over and over, a link to this document can be
provided.

<!-- begin copy/paste on next line -->

## Anti-pattern template [one line description]

### Identification

(1 paragraph) Describe how to spot the anti-pattern.

### Description

(1 paragraph) Describe the negative effects of the anti-pattern.

### Background

(1 paragraph) Describe why the anti-pattern exists. If you don't know, try
running git blame to see who wrote the code originally, and ask them on the
mailing list or in Discord what their original intent was, so it can be
documented here (and you may possibly discover it isn't as much of an
anti-pattern as you thought). If you are unable to determine why the
anti-pattern exists, put: "Unknown" here.

### Resolution

(1 paragraph) Describe the preferred way to solve the problem solved by the
anti-pattern and the positive effects of solving it in the manner described.

<!-- end copy/paste on previous line -->

## Custom ArgumentParser object

### Identification

The ArgumentParser object is typically present to wrap calls to get options. It
abstracts away the parsing and provides a `[]` operator to access the
parameters.
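
As a rough aid to spotting the pattern, the following is a hypothetical
reconstruction of what such a class tends to look like. It is not taken from any
particular repository: the use of `getopt_long`, the option names `--path` and
`--verbose`, and the member names are all assumptions made for illustration.

```cpp
#include <getopt.h>

#include <map>
#include <stdexcept>
#include <string>

// Hypothetical example of the anti-pattern. Each repository carries a
// near-identical copy of this class; only the option table differs.
class ArgumentParser
{
  public:
    ArgumentParser(int argc, char** argv)
    {
        // Illustrative option table (assumed names, not from a real daemon).
        static const option longOptions[] = {
            {"path", required_argument, nullptr, 'p'},
            {"verbose", no_argument, nullptr, 'v'},
            {nullptr, 0, nullptr, 0},
        };

        int opt = 0;
        while ((opt = getopt_long(argc, argv, "p:v", longOptions, nullptr)) !=
               -1)
        {
            switch (opt)
            {
                case 'p':
                    arguments["path"] = optarg;
                    break;
                case 'v':
                    arguments["verbose"] = "true";
                    break;
                default:
                    throw std::invalid_argument("unrecognized option");
            }
        }
    }

    // Returns the option value, or an empty string if it was not given.
    const std::string& operator[](const std::string& name) const
    {
        static const std::string empty;
        auto it = arguments.find(name);
        return it == arguments.end() ? empty : it->second;
    }

  private:
    std::map<std::string, std::string> arguments;
};
```

A caller would typically construct the object from `argc` and `argv` in `main`
and then read values such as `args["path"]`.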

### Description

Writing a custom ArgumentParser object creates nearly duplicate code in a
repository. The ArgumentParser itself is the same; however, the options provided
differ. Writing a custom argument parser re-invents the wheel on C++ command
line argument parsing.

### Background

The ArgumentParser exists because it was in place early and then copied into
each new repository as an easy way to handle argument parsing.

### Resolution

The CLI11 library was designed and implemented specifically to support modern
argument parsing. It handles the cases seen in OpenBMC daemons, has some handy
built-in validators, and allows easy customization of validation.

## Explicit AC_MSG_ERROR on PKG_CHECK_MODULES failure

### Identification

```
PKG_CHECK_MODULES(
    [PHOSPHOR_LOGGING],
    [phosphor-logging],
    [],
    [AC_MSG_ERROR([Could not find phosphor-logging...openbmc/phosphor-logging package required])])
```

### Description

The autotools PKG_CHECK_MODULES macro provides the ability to specify an "if
found" and "if not found" behavior. By default, the "if not found" behavior will
list the package not found. In many cases, this is sufficient for a developer to
know what package is missing. In most cases, it's another OpenBMC package.

If the name of the library sought isn't related to the package providing it,
then the failure message should be set to something more useful to the
developer.

### Resolution

Use the default macro behavior when it is clear that the missing package is
another OpenBMC package.

```
PKG_CHECK_MODULES([PHOSPHOR_LOGGING], [phosphor-logging])
```

## Explicit listing of shared library packages in RDEPENDS in bitbake metadata

### Identification

```
RDEPENDS_${PN} = "libsystemd"
```

### Description

Out of the box, bitbake examines built applications, automatically adds runtime
dependencies, and thus ensures any library package dependencies are
automatically added to images, SDKs, etc. There is no need to list them
explicitly in a recipe.

Dependencies change over time, and listing them explicitly is prone to errors;
the net effect is unnecessary shared library packages being installed into
images.

Consult
https://www.yoctoproject.org/docs/latest/mega-manual/mega-manual.html#var-RDEPENDS
for information on when to use explicit runtime dependencies.

### Background

The initial bitbake metadata author for OpenBMC was not aware that bitbake added
these dependencies automatically. Initial bitbake metadata therefore listed
shared library dependencies explicitly, and was subsequently copied and pasted.

### Resolution

Do not list shared library packages in RDEPENDS. This eliminates the possibility
of installing unnecessary shared library packages due to unmaintained library
dependency lists in bitbake metadata.

## Use of /usr/bin/env in systemd service files

### Identification

In systemd unit files:

```
[Service]

ExecStart=/usr/bin/env some-application
```

### Description

Outside of OpenBMC, most applications that provide systemd unit files don't
launch applications in this way. So if nothing else, this just looks strange and
violates the
[principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment).

### Background

This anti-pattern exists because a requirement exists to enable live patching of
applications on read-only filesystems. Launching applications in this way was
part of the implementation that satisfied the live patch requirement. For
example:

```
/usr/bin/phosphor-hwmon
```

on a read-only filesystem becomes:

```
/usr/local/bin/phosphor-hwmon
```

on a writeable /usr/local filesystem.

### Resolution

The /usr/bin/env method only enables live patching of applications. A method
that supports live patching of any file in the read-only filesystem has emerged.
Assuming a writeable filesystem exists _somewhere_ on the BMC, something like:

```
mkdir -p /var/persist/usr
mkdir -p /var/persist/work/usr
mount -t overlay -o lowerdir=/usr,upperdir=/var/persist/usr,workdir=/var/persist/work/usr overlay /usr
```

can enable live system patching without any additional requirements on how
applications are launched from systemd service files. This is the preferred
method for enabling live system patching as it allows OpenBMC developers to
write systemd service files in the same way as most other projects.

To undo existing instances of this anti-pattern, remove /usr/bin/env from
systemd service files and replace it with the fully qualified path to the
application being launched. For example, given an application in /usr/bin:

```
sed -i 's,/usr/bin/env ,/usr/bin/,' foo.service
```

## Incorrect placement of executables in /sbin, /usr/sbin or /bin, /usr/bin

### Identification

OpenBMC executables that are installed in `/usr/sbin`. `$sbindir` in bitbake
metadata, makefiles or configure scripts. systemd service files pointing to
`/bin` or `/usr/bin` executables.

### Description

Installing OpenBMC applications in incorrect locations violates the
[principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment)
and more importantly violates the
[FHS](https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard).

### Background

There are typically two types of executables:

1. Long-running daemons started by, for instance, systemd service files and
   _NOT_ intended to be directly executed by users.
2. Utilities intended to be used by a user as a CLI.

Executables of type 1 should not be placed anywhere in `${PATH}` because it is
confusing and error-prone for users, but should instead be placed in a
`/usr/libexec/<package>` subdirectory.

Executables of type 2 should be placed in `/usr/bin` because they are intended
to be used by users and should be in `${PATH}` (also, `sbin` is inappropriate as
we transition to having non-root access).

The sbin anti-pattern exists because the FHS was misinterpreted at an early
point in OpenBMC project history, and has proliferated ever since.

From the hier(7) man page:

```
/usr/bin This is the primary directory for executable programs. Most programs
executed by normal users which are not needed for booting or for repairing the
system and which are not installed locally should be placed in this directory.

/usr/sbin This directory contains program binaries for system administration
which are not essential for the boot process, for mounting /usr, or for system
repair.

/usr/libexec Directory contains binaries for internal use only and they are
not meant to be executed directly by users shell or scripts.
```

The FHS description for `/usr/sbin` refers to "system administration" but the
de facto interpretation of the system being administered refers to the OS
installation and not a system in the OpenBMC sense of managed system.
As such, OpenBMC applications should be installed in `/usr/bin`.

It is becoming common practice in Linux for daemons to now be moved to `libexec`
and considered "internal use" from the perspective of the systemd service file
that controls their launch.

### Resolution

Install OpenBMC applications in `/usr/libexec` or `/usr/bin/` as appropriate.

## Handling unexpected error codes and exceptions

### Identification

The anti-pattern is for an application to continue processing after it
encounters unexpected conditions in the form of error codes and exceptions,
without capturing the data needed to resolve the problem.

Example C++ code:

```
using InternalFailure = sdbusplus::xyz::openbmc_project::Common::Error::InternalFailure;
try
{
    // ... use D-Bus APIs ...
}
catch (const InternalFailure& e)
{
    // The error is logged and processing simply continues.
    phosphor::logging::commit<InternalFailure>();
}
```

### Description

Suppressing unexpected errors can lead an application to incorrect or erratic
behavior which can affect the service it provides and the overall system.

Writing a log entry instead of a core dump may not give enough information to
debug a truly unexpected problem, so developers do not get a chance to
investigate problems and the system's reliability does not improve over time.

### Background

Programmers want their application to continue processing when it encounters
conditions it can recover from. Sometimes they try too hard and continue when it
is not appropriate.

Programmers also want to log what is happening in the application, so they write
log entries that give debug data when something goes wrong, typically targeted
for their use. They don't consider how the log entry is consumed by the BMC
administrator or automated service tools.

The `InternalFailure` in the [Phosphor logging README][] is overused.

[phosphor logging readme]:
  https://github.com/openbmc/phosphor-logging/blob/master/README.md

### Resolution

Several items are needed:

1. Check all places where a return code or errno value is given. Strongly
   consider that a default error handler should throw an exception, for example
   `std::system_error`.
2. Have a good reason to handle specific error codes. See below.
3. Have a good reason to handle specific exceptions. Allow other exceptions to
   propagate.
4. Document (in terms of impacts to other services) what happens when this
   service fails, stops, or is restarted. Use that to inform the recovery
   strategy.

In the error handler:

1. Consider what data (if any) should be logged. Determine who will consume the
   log entry: BMC developers, administrator, or an analysis tool. Usually the
   answer is more than one of these.

   The following example log entries assume an IPMI request to turn network
   access on, where the user input is invalid.

   - BMC Developer: Reference internal applications, services, pids, etc. the
     developer would be familiar with.

     Example:
     `ipmid service successfully processed a network setting packet, however the user input of USB0 is not a valid network interface to configure.`

   - Administrator: Reference the external interfaces of the BMC such as the
     REST API. They can respond to feedback about invalid input, or a need to
     restart the BMC.

     Example:
     `The network interface of USB0 is not a valid option. Retry the IPMI command with a valid interface.`

   - Analyzer tool: Consider breaking the log down and including several
     properties which an analyzer can leverage. For instance, tagging the log
     with 'Internal' is not helpful.
     However, breaking that down into something
     like `[UserInput][IPMI][Network]` tells at a quick glance that the input
     received for configuring the network via an IPMI command was invalid.
     Categorization and system impact are key things to focus on when creating
     logs for an analysis application.

     Example:
     `[UserInput][IPMI][Network][Config][Warning] Interface USB0 not valid.`

2. Determine if the application can fully recover from the condition. If not,
   don't continue. Allow the system to determine if it writes a core dump or
   restarts the service. If there are severe impacts when the service fails,
   consider using a better error recovery mechanism.

## Non-standard debug application options and logging

### Identification

An application uses non-standard methods on startup to indicate verbose logging
and/or does not utilize standard systemd-journald debug levels for logging.

### Description

When debugging issues within OpenBMC that cross multiple applications, it's very
difficult to enable the appropriate debug output when different applications
have different mechanisms for enabling it. For example, different OpenBMC
applications take the following as command line parameters to enable extra
debug:

- `0xff`, `--vv`, `-vv`, `-v`, `--verbose`, and more

Along these same lines, some applications then have their own internal methods
of writing debug data. They use `std::cout`, `printf`, `fprintf`, and so on.
Doing this causes other issues when trying to compare debug data across
different applications that may be having their buffers flushed at different
times (and picked up by journald).

### Background

Everyone has their own ways to debug. There was no real standard within OpenBMC
on how to do it so everyone came up with whatever they were familiar with.

### Resolution

If an OpenBMC application is to support enhanced debug via a command line, then
it should support the standard `-v`, `--verbose` option.

In general, OpenBMC developers should utilize the "debug" journald level for
debug messages. This can be enabled/disabled globally and would apply to all
applications. If a developer believes this would cause too much debug data in
certain cases, then they can guard these journald debug calls behind a
`--verbose` command line option.

## DBus interface representing GPIOs

### Identification

Desire to expose a DBus interface to drive GPIOs, for example:

- https://lore.kernel.org/openbmc/YV21cD3HOOGi7K2f@heinlein/
- https://lore.kernel.org/openbmc/CAH2-KxBV9_0Dt79Quy0f4HkXXPdHfBw9FsG=4KwdWXBYNEA-ew@mail.gmail.com/
- https://lore.kernel.org/openbmc/YtPrcDzaxXiM6vYJ@heinlein.stwcx.org.github.beta.tailscale.net/

### Description

Platform functionality selected by GPIOs might equally be selected by other
means with a shift in system design philosophy. As such, GPIOs are a (hardware)
implementation detail. Exposing the implementation on DBus forces the
distribution of platform design details across multiple applications, which
violates the software design principle of [low coupling][coupling] and impacts
our confidence in maintenance.

[coupling]: https://en.wikipedia.org/wiki/Coupling_%28computer_programming%29

### Background

GPIOs are often used to select functionality that might be hard to generalise,
and therefore hard to abstract. If this is the case, then the functionality in
question is probably best wrapped up as an implementation detail of a behaviour
that is more generally applicable (e.g. host power-on procedures).

### Resolution

Consider what functionality the GPIO provides and design or exploit an existing
interface to expose that behaviour instead.
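
One way to picture this is the minimal sketch below, in plain C++ with the DBus
plumbing left out. It is an illustration under stated assumptions rather than an
existing OpenBMC interface: `GpioLine`, `HostPowerControl`, and `RecordingGpio`
are hypothetical names. Callers only see the behaviour (host power on/off); the
GPIO stays private to the one application that owns it.

```cpp
#include <vector>

// Hypothetical stand-in for whatever GPIO access the platform uses.
class GpioLine
{
  public:
    virtual ~GpioLine() = default;
    virtual void set(bool level) = 0;
};

// Behaviour-level interface: this is the part that would be exposed to other
// applications. The GPIO is an implementation detail and could be replaced
// by another mechanism without changing callers.
class HostPowerControl
{
  public:
    explicit HostPowerControl(GpioLine& powerEnable) :
        powerEnable(powerEnable)
    {}

    void powerOn()
    {
        powerEnable.set(true);
        on = true;
    }

    void powerOff()
    {
        powerEnable.set(false);
        on = false;
    }

    bool isOn() const
    {
        return on;
    }

  private:
    GpioLine& powerEnable;
    bool on = false;
};

// Hypothetical fake that records the levels written, useful for unit tests.
class RecordingGpio : public GpioLine
{
  public:
    void set(bool level) override
    {
        levels.push_back(level);
    }

    std::vector<bool> levels;
};
```

If the platform later selects host power through some other mechanism, only
`HostPowerControl` changes; users of the behaviour are unaffected.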