xref: /openbmc/docs/anti-patterns.md (revision 5ef3526d0836782378ba82d01edb0a19908a0559)
1 # OpenBMC Anti-patterns
2 
3 From [Wikipedia](https://en.wikipedia.org/wiki/Anti-pattern):
4 
5 "An anti-pattern is a common response to a recurring problem that is usually
6 ineffective and risks being highly counterproductive."
7 
8 The developers of OpenBMC do not get 100% of decisions right 100% of the time.
9 That, combined with the fact that software development is often an exercise in
10 copying and pasting, results in mistakes happening over and over again.
11 
12 This page aims to document some of the anti-patterns that exist in OpenBMC to
13 ease the job of those reviewing code. If an anti-pattern is spotted, rather that
14 repeating the same explanations over and over, a link to this document can be
15 provided.
16 
17 <!-- begin copy/paste on next line -->
18 
19 ## Anti-pattern template [one line description]
20 
21 ### Identification
22 
23 (1 paragraph) Describe how to spot the anti-pattern.
24 
25 ### Description
26 
27 (1 paragraph) Describe the negative effects of the anti-pattern.
28 
29 ### Background
30 
31 (1 paragraph) Describe why the anti-pattern exists. If you don't know, try
32 running git blame and look at who wrote the code originally, and ask them on the
33 mailing list or in Discord what their original intent was, so it can be
34 documented here (and you may possibly discover it isn't as much of an
35 anti-pattern as you thought). If you are unable to determine why the
36 anti-pattern exists, put: "Unknown" here.
37 
38 ### Resolution
39 
40 (1 paragraph) Describe the preferred way to solve the problem solved by the
41 anti-pattern and the positive effects of solving it in the manner described.
42 
43 <!-- end copy/paste on previous line -->
44 
45 ## Custom ArgumentParser object
46 
47 ### Identification
48 
49 The ArgumentParser object is typically present to wrap calls to get options. It
50 abstracts away the parsing and provides a `[]` operator to access the
51 parameters.
52 
53 ### Description
54 
55 Writing a custom ArgumentParser object creates nearly duplicate code in a
56 repository. The ArgumentParser itself is the same, however, the options provided
57 differ. Writing a custom argument parser re-invents the wheel on c++ command
58 line argument parsing.
59 
60 ### Background
61 
62 The ArgumentParser exists because it was in place early and then copied into
63 each new repository as an easy way to handle argument parsing.
64 
65 ### Resolution
66 
67 The CLI11 library was designed and implemented specifically to support modern
68 argument parsing. It handles the cases seen in OpenBMC daemons and has some
69 handy built-in validators, and allows easy customizations to validation.
70 
71 ## Explicit AC_MSG_ERROR on PKG_CHECK_MODULES failure
72 
73 ### Identification
74 
75 ```
76 PKG_CHECK_MODULES(
77     [PHOSPHOR_LOGGING],
78     [phosphor-logging],
79     [],
80     [AC_MSG_ERROR([Could not find phosphor-logging...openbmc/phosphor-logging package required])])
81 ```
82 
83 ### Description
84 
85 The autotools PKG_CHECK_MODULES macro provides the ability to specify an "if
86 found" and "if not found" behavior. By default, the "if not found" behavior will
87 list the package not found. In many cases, this is sufficient to a developer to
88 know what package is missing. In most cases, it's another OpenBMC package.
89 
90 If the library sought's name isn't related to the package providing it, then the
91 failure message should be set to something more useful to the developer.
92 
93 ### Resolution
94 
95 Use the default macro behavior when it is clear that the missing package is
96 another OpenBMC package.
97 
98 ```
99 PKG_CHECK_MODULES([PHOSPHOR_LOGGING], [phosphor-logging])
100 ```
101 
102 ## Explicit listing of shared library packages in RDEPENDS in bitbake metadata
103 
104 ### Identification
105 
106 ```
107 RDEPENDS_${PN} = "libsystemd"
108 ```
109 
110 ### Description
111 
112 Out of the box bitbake examines built applications, automatically adds runtime
113 dependencies and thus ensures any library packages dependencies are
114 automatically added to images, sdks, etc. There is no need to list them
115 explicitly in a recipe.
116 
117 Dependencies change over time, and listing them explicitly is likely prone to
118 errors - the net effect being unnecessary shared library packages being
119 installed into images.
120 
121 Consult
122 https://www.yoctoproject.org/docs/latest/mega-manual/mega-manual.html#var-RDEPENDS
123 for information on when to use explicit runtime dependencies.
124 
125 ### Background
126 
127 The initial bitbake metadata author for OpenBMC was not aware that bitbake added
128 these dependencies automatically. Initial bitbake metadata therefore listed
129 shared library dependencies explicitly, and was subsequently copy pasted.
130 
131 ### Resolution
132 
133 Do not list shared library packages in RDEPENDS. This eliminates the possibility
134 of installing unnecessary shared library packages due to unmaintained library
135 dependency lists in bitbake metadata.
136 
137 ## Use of /usr/bin/env in systemd service files
138 
139 ### Identification
140 
141 In systemd unit files:
142 
143 ```
144 [Service]
145 
146 ExecStart=/usr/bin/env some-application
147 ```
148 
149 ### Description
150 
151 Outside of OpenBMC, most applications that provide systemd unit files don't
152 launch applications in this way. So if nothing else, this just looks strange and
153 violates the
154 [princple of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment).
155 
156 ### Background
157 
158 This anti-pattern exists because a requirement exists to enable live patching of
159 applications on read-only filesystems. Launching applications in this way was
160 part of the implementation that satisfied the live patch requirement. For
161 example:
162 
163 ```
164 /usr/bin/phosphor-hwmon
165 ```
166 
167 on a read-only filesystem becomes:
168 
169 ```
170 /usr/local/bin/phosphor-hwmon`
171 ```
172 
173 on a writeable /usr/local filesystem.
174 
175 ### Resolution
176 
177 The /usr/bin/env method only enables live patching of applications. A method
178 that supports live patching of any file in the read-only filesystem has emerged.
179 Assuming a writeable filesystem exists _somewhere_ on the bmc, something like:
180 
181 ```
182 mkdir -p /var/persist/usr
183 mkdir -p /var/persist/work/usr
184 mount -t overlay -o lowerdir=/usr,upperdir=/var/persist/usr,workdir=/var/persist/work/usr overlay /usr
185 ```
186 
187 can enable live system patching without any additional requirements on how
188 applications are launched from systemd service files. This is the preferred
189 method for enabling live system patching as it allows OpenBMC developers to
190 write systemd service files in the same way as most other projects.
191 
192 To undo existing instances of this anti-pattern remove /usr/bin/env from systemd
193 service files and replace with the fully qualified path to the application being
194 launched. For example, given an application in /usr/bin:
195 
196 ```
197 sed -i s,/usr/bin/env ,/usr/bin/, foo.service
198 ```
199 
200 ## Incorrect placement of executables in /sbin, /usr/sbin or /bin, /usr/bin
201 
202 ### Identification
203 
204 OpenBMC executables that are installed in `/usr/sbin`. `$sbindir` in bitbake
205 metadata, makefiles or configure scripts. systemd service files pointing to
206 `/bin` or `/usr/bin` executables.
207 
208 ### Description
209 
210 Installing OpenBMC applications in incorrect locations violates the
211 [principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment)
212 and more importantly violates the
213 [FHS](https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard).
214 
215 ### Background
216 
217 There are typically two types of executables:
218 
219 1. Long-running daemons started by, for instance, systemd service files and
220    _NOT_ intended to be directly executed by users.
221 2. Utilities intended to be used by a user as a CLI.
222 
223 Executables of type 1 should not be placed anywhere in `${PATH}` because it is
224 confusing and error-prone to users, but should instead be placed in a
225 `/usr/libexec/<package>` subdirectory.
226 
227 Executables of type 2 should be placed in `/usr/bin` because they are intended
228 to be used by users and should be in `${PATH}` (also, `sbin` is inappropriate as
229 we transition to having non-root access).
230 
231 The sbin anti-pattern exists because the FHS was misinterpreted at an early
232 point in OpenBMC project history, and has proliferated ever since.
233 
234 From the hier(7) man page:
235 
236 ```
237 /usr/bin This is the primary directory for executable programs.  Most programs
238 executed by normal users which are not needed for booting or for repairing the
239 system and which are not installed locally should be placed in this directory.
240 
241 /usr/sbin This directory contains program binaries for system administration
242 which are not essential for the boot process, for mounting /usr, or for system
243 repair.
244 
245 /usr/libexec Directory  contains  binaries for internal use only and they are
246 not meant to be executed directly by users shell or scripts.
247 ```
248 
249 The FHS description for `/usr/sbin` refers to "system administration" but the
250 de-facto interpretation of the system being administered refers to the OS
251 installation and not a system in the OpenBMC sense of managed system. As such
252 OpenBMC applications should be installed in `/usr/bin`.
253 
254 It is becoming common practice in Linux for daemons to now be moved to `libexec`
255 and considered "internal use" from the perspective of the systemd service file
256 that controls their launch.
257 
258 ### Resolution
259 
260 Install OpenBMC applications in `/usr/libexec` or `/usr/bin/` as appropriate.
261 
262 ## Handling unexpected error codes and exceptions
263 
264 ### Identification
265 
266 The anti-pattern is for an application to continue processing after it
267 encounters unexpected conditions in the form of error codes and exceptions and
268 not capturing the data needed to resolve the problem.
269 
270 Example C++ code:
271 
272 ```
273 using InternalFailure = sdbusplus::xyz::openbmc_project::Common::Error::InternalFailure;
274 try
275 {
276   ... use d-Bus APIs...
277 }
278 catch (InternalFailure& e)
279 {
280     phosphor::logging::commit<InternalFailure>();
281 }
282 ```
283 
284 ### Description
285 
286 Suppressing unexpected errors can lead an application to incorrect or erratic
287 behavior which can affect the service it provides and the overall system.
288 
289 Writing a log entry instead of a core dump may not give enough information to
290 debug a truly unexpected problem, so developers do not get a chance to
291 investigate problems and the system's reliability does not improve over time.
292 
293 ### Background
294 
295 Programmers want their application to continue processing when it encounters
296 conditions it can recover from. Sometimes they try too hard and continue when it
297 is not appropriate.
298 
299 Programmers also want to log what is happening in the application, so they write
300 log entries that give debug data when something goes wrong, typically targeted
301 for their use. They don't consider how the log entry is consumed by the BMC
302 administrator or automated service tools.
303 
304 The `InternalFailure` in the [Phosphor logging README][] is overused.
305 
306 [phosphor logging readme]:
307   https://github.com/openbmc/phosphor-logging/blob/master/README.md
308 
309 ### Resolution
310 
311 Several items are needed:
312 
313 1. Check all places where a return code or errno value is given. Strongly
314    consider that a default error handler should throw an exception, for example
315    `std::system_error`.
316 2. Have a good reason to handle specific error codes. See below.
317 3. Have a good reason to handle specific exceptions. Allow other exceptions to
318    propagate.
319 4. Document (in terms of impacts to other services) what happens when this
320    service fails, stops, or is restarted. Use that to inform the recovery
321    strategy.
322 
323 In the error handler:
324 
325 1. Consider what data (if any) should be logged. Determine who will consume the
326    log entry: BMC developers, administrator, or an analysis tool. Usually the
327    answer is more than one of these.
328 
329    The following example log entries use an IPMI request to set network access
330    on, but the user input is invalid.
331 
332    - BMC Developer: Reference internal applications, services, pids, etc. the
333      developer would be familiar with.
334 
335      Example:
336      `ipmid service successfully processed a network setting packet, however the user input of USB0 is not a valid network interface to configure.`
337 
338    - Administrator: Reference the external interfaces of the BMC such as the
339      REST API. They can respond to feedback about invalid input, or a need to
340      restart the BMC.
341 
342      Example:
343      `The network interface of USB0 is not a valid option. Retry the IPMI command with a valid interface.`
344 
345    - Analyzer tool: Consider breaking the log down and including several
346      properties which an analyzer can leverage. For instance, tagging the log
347      with 'Internal' is not helpful. However, breaking that down into something
348      like [UserInput][ipmi][Network] tells at a quick glance that the input
349      received for configuring the network via an IPMI command was invalid.
350      Categorization and system impact are key things to focus on when creating
351      logs for an analysis application.
352 
353      Example:
354      `[UserInput][IPMI][Network][Config][Warning] Interface USB0 not valid.`
355 
356 2. Determine if the application can fully recover from the condition. If not,
357    don't continue. Allow the system to determine if it writes a core dump or
358    restarts the service. If there are severe impacts when the service fails,
359    consider using a better error recovery mechanism.
360 
361 ## Non-standard debug application options and logging
362 
363 ### Identification
364 
365 An application uses non-standard methods on startup to indicate verbose logging
366 and/or does not utilize standard systemd-journald debug levels for logging.
367 
368 ### Description
369 
370 When debugging issues within OpenBMC that cross multiple applications, it's very
371 difficult to enable the appropriate debug when different applications have
372 different mechanisms for enabling debug. For example, different OpenBMC
373 applications take the following as command line parameters to enable extra
374 debug:
375 
376 - 0xff, --vv, -vv, -v, --verbose, <and more>
377 
378 Along these same lines, some applications then have their own internal methods
379 of writing debug data. They use std::cout, printf, fprintf, ... Doing this
380 causes other issues when trying to compare debug data across different
381 applications that may be having their buffers flushed at different times (and
382 picked up by journald).
383 
384 ### Background
385 
386 Everyone has their own ways to debug. There was no real standard within OpenBMC
387 on how to do it so everyone came up with whatever they were familiar with.
388 
389 ### Resolution
390 
391 If an OpenBMC application is to support enhanced debug via a command line then
392 it will support the standard "-v,--verbose" option.
393 
394 In general, OpenBMC developers should utilize the "debug" journald level for
395 debug messages. This can be enabled/disabled globally and would apply to all
396 applications. If a developer believes this would cause too much debug data in
397 certain cases then they can protect these journald debug calls around a
398 --verbose command line option.
399 
400 ## DBus interface representing GPIOs
401 
402 ### Identification
403 
404 Desire to expose a DBus interface to drive GPIOs, for example:
405 
406 - https://lore.kernel.org/openbmc/YV21cD3HOOGi7K2f@heinlein/
407 - https://lore.kernel.org/openbmc/CAH2-KxBV9_0Dt79Quy0f4HkXXPdHfBw9FsG=4KwdWXBYNEA-ew@mail.gmail.com/
408 - https://lore.kernel.org/openbmc/YtPrcDzaxXiM6vYJ@heinlein.stwcx.org.github.beta.tailscale.net/
409 
410 ### Description
411 
412 Platform functionality selected by GPIOs might equally be selected by other
413 means with a shift in system design philosophy. As such, GPIOs are a (hardware)
414 implementation detail. Exposing the implementation on DBus forces the
415 distribution of platform design details across multiple applications, which
416 violates the software design principle of [low coupling][coupling] and impacts
417 our confidence in maintenance.
418 
419 [coupling]: https://en.wikipedia.org/wiki/Coupling_%28computer_programming%29
420 
421 ### Background
422 
423 GPIOs are often used to select functionality that might be hard to generalise,
424 and therefore hard to abstract. If this is the case, then the functionality in
425 question is probably best wrapped up as an implementation detail of a behaviour
426 that is more generally applicable (e.g. host power-on procedures).
427 
428 ### Resolution
429 
430 Consider what functionality the GPIO provides and design or exploit an existing
431 interface to expose that behaviour instead.
432 
433 ## Very long lambda callbacks
434 
435 ### Identification
436 
437 C++ code that is similar to the following:
438 
439 ```cpp
440 dbus::utility::getSubTree("/", interfaces,
441                          [asyncResp](boost::system::error_code& ec,
442                                      MapperGetSubTreeResult& res){
443                             <too many lines of code>
444                          })
445 ```
446 
447 ### Description
448 
449 Inline lambdas, while useful in some contexts, are difficult to read, and have
450 inconsistent formatting with tools like `clang-format`, which causes significant
451 problems in review, and in applying patchsets that might have minor conflicts.
452 In addition, because they are declared in a function scope, they are difficult
453 to unit test, and produce log messages that are difficult to read given their
454 unnamed nature. They are also difficult to debug, lacking any symbol name to
455 which to attach breakpoints or tracepoints.
456 
457 ### Background
458 
459 Lambdas are a natural approach to implementing callbacks in the context of
460 asynchronous IO. Further, the Boost and sdbusplus ASIO APIs tend to encourage
461 this approach. Doing something other than lambdas requires more effort, and so
462 these options tend not to be chosen without pressure to do so.
463 
464 ### Resolution
465 
466 Prefer to either use `std::bind_front` and a method or static function to handle
467 the return, or a lambda that is less than 10 lines of code to handle an error
468 inline. In cases where `std::bind_front` cannot be used, such as in
469 `sdbusplus::asio::connection::async_method_call`, keep the lambda length less
470 than 10 lines, and call the appropriate function for handling non-trivial
471 transforms.
472 
473 ```cpp
474 void afterGetSubTree(std::shared_ptr<bmcweb::AsyncResp>& asyncResp,
475                      boost::system::error_code& ec,
476                      MapperGetSubTreeResult& res){
477    <many lines of code>
478 }
479 dbus::utility::getSubTree("/xyz/openbmc_project/inventory", interfaces,
480                          std::bind_front(afterGetSubTree, asyncResp));
481 ```
482 
483 See also the [Cpp Core Guidelines][] for generalized guidelines on when lambdas
484 are appropriate. The above recommendation is aligned with the Cpp Core
485 Guidelines.
486 
487 [Cpp Core Guidelines]:
488   https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f11-use-an-unnamed-lambda-if-you-need-a-simple-function-object-in-one-place-only
489 
490 ## Placing internal headers in a parallel subtree
491 
492 ### Identification
493 
494 Declaring types and functions that do not participate in a public API of a
495 library in header files that do not live alongside their implementation in the
496 source tree.
497 
498 ### Description
499 
500 There's no reason to put internal headers in a parallel subtree (for example,
501 `include/`). It's more effort to organise, puts unnecessary distance between
502 declarations and definitions, and increases effort required in the build system
503 configuration.
504 
505 ### Background
506 
507 In C and C++, header files expose the public API of a library to its consumers
508 and need to be installed onto the developer's system in a location known to the
509 compiler. To delineate which header files are to be installed and which are not,
510 the public header files are often placed into an `include/` subdirectory of the
511 source tree to mark their importance.
512 
513 Any functions or structures that are implementation details of the library
514 should not be provided in its installed header files. Ignoring this philosphy
515 over-exposes the library's design and may lead to otherwise unnecessary API or
516 ABI breaks in the future.
517 
518 Further, projects whose artifacts are only application binaries have no public
519 API or ABI in the sense of a library. Any header files in the source tree of
520 such projects have no need to be installed onto a developer's system and
521 segregation in path from the implementation serves no purpose.
522 
523 ### Resolution
524 
525 Place internal header files immediately alongside source files containing the
526 corresponding implementation.
527 
528 ## Ill-defined data structuring in `lg2` message strings
529 
530 ### Identification
531 
532 Attempts at encoding information into the journal's MESSAGE string that is at
533 most only plausible to parse using a regex while also reducing human
534 readability. For example:
535 
536 ```
537 error(
538     "Error getting time, PATH={BMC_TIME_PATH} TIME INTERACE={TIME_INTF}  ERROR={ERR_EXCEP}",
539     "BMC_TIME_PATH", bmcTimePath, "TIME_INTF", timeInterface,
540     "ERR_EXCEP", e);
541 ```
542 
543 ### Description
544 
545 [`lg2` is OpenBMC's preferred C++ logging interface][phosphor-logging-lg2] and
546 is implemented on top of the systemd journal and its library APIs.
547 [systemd-journald provides structured logging][systemd-structured-logging],
548 which allows us to capture additional metadata alongside the provided message.
549 
550 [phosphor-logging-lg2]:
551   https://github.com/openbmc/phosphor-logging/blob/master/docs/structured-logging.md
552 [systemd-structured-logging]:
553   https://0pointer.de/blog/projects/journal-submit.html
554 
555 The concept of structured logging allows for convenient programmable access to
556 metadata associated with a log event. The journal captures a default set of
557 metadata with each message logged. However, the primary way the entries are
558 exposed to users is `journalctl`'s default behaviour of printing just the
559 `MESSAGE` value for each log entry. This may lead to a misunderstanding that the
560 printed message is the only way to expose related metadata for investigating
561 defects.
562 
563 For human ergonomics `lg2` provides the ability to interpolate structured data
564 into the journal's `MESSAGE` string. This aids with human inspection of the logs
565 as it becomes possible to highlight important metadata for a log event. However,
566 it's not intended that this interpolation be used for ad-hoc, ill-defined
567 attempts at exposing metadata for automated analysis.
568 
569 All key-value pairs provided to the `lg2` logging APIs are captured in the
570 structured log event, regardless of whether any particular key is interpolated
571 into the `MESSAGE` string. It is always possible to recover the information
572 associated with the log event even if it's not captured in the `MESSAGE` string.
573 
574 `phosphor-time-manager` demonstrates a reasonable use of the `lg2` APIs. One
575 logging instance in the code [is as follows][phosphor-time-manager-lg2]:
576 
577 [phosphor-time-manager-lg2]:
578   https://github.com/openbmc/phosphor-time-manager/blob/5ce9ac0e56440312997b25771507585905e8b360/manager.cpp#L98
579 
580 ```
581 info("Time mode has been changed to {MODE}", "MODE", newMode);
582 ```
583 
584 By default, this renders in the output of `journalctl` as:
585 
586 ```
587 Sep 23 06:09:57 bmc phosphor-time-manager[373]: Time mode has been changed to xyz.openbmc_project.Time.Synchronization.Method.NTP
588 ```
589 
590 However, we can use some journalctl commandline options to inspect the
591 structured data associated with the log entry:
592 
593 ```
594 # journalctl --identifier=phosphor-time-manager --boot --output=verbose | grep -v '^    _' | head -n 9
595 Sat 2023-09-23 06:09:57.645208 UTC [s=85c1cb5f8e02445aa110a5164c9c07f6;i=244;b=ffd111d3cdca41c8893bb728a1c6cb20;m=133a5a0;t=606009314d0d9;x=9a54e8714754a6cb]
596     PRIORITY=6
597     MESSAGE=Time mode has been changed to xyz.openbmc_project.Time.Synchronization.Method.NTP
598     LOG2_FMTMSG=Time mode has been changed to {MODE}
599     CODE_FILE=/usr/src/debug/phosphor-time-manager/1.0+git/manager.cpp
600     CODE_LINE=98
601     CODE_FUNC=bool phosphor::time::Manager::setCurrentTimeMode(const std::string&)
602     MODE=xyz.openbmc_project.Time.Synchronization.Method.NTP
603     SYSLOG_IDENTIFIER=phosphor-time-manager
604 ```
605 
606 Here we find that `MODE` and its value are captured as its own metadata entry in
607 the structured data, as well as being interpolated into `MESSAGE` as requested.
608 Additionally, from the log entry we can find _how_ `MODE` was interpolated into
609 `MESSAGE` using the format string captured in the `LOG2_FMTMSG` metadata entry.
610 
611 `LOG2_FMTMSG` also provides a stable handle for identifying the existence of a
612 specific class of log events in the journal, further aiding automated analysis.
613 
614 ### Background
615 
616 A variety of patches such as [PLDM:Catching exception precisely and printing
617 it][openbmc-gerrit-67994] added a number of ad-hoc, ill-defined attempts at
618 providing all the metadata through the `MESSAGE` entry.
619 
620 [openbmc-gerrit-67994]: https://gerrit.openbmc.org/c/openbmc/pldm/+/67994
621 
622 ### Resolution
623 
624 `lg2` messages should be formatted for consumption by humans. They should not
625 contain ad-hoc, ill-defined attempts at integrating metadata for automated
626 analysis.
627