xref: /openbmc/docs/anti-patterns.md (revision f4febd00)
1# OpenBMC Anti-patterns
2
3From [Wikipedia](https://en.wikipedia.org/wiki/Anti-pattern):
4
5"An anti-pattern is a common response to a recurring problem that is usually
6ineffective and risks being highly counterproductive."
7
8The developers of OpenBMC do not get 100% of decisions right 100% of the time.
9That, combined with the fact that software development is often an exercise in
10copying and pasting, results in mistakes happening over and over again.
11
12This page aims to document some of the anti-patterns that exist in OpenBMC to
13ease the job of those reviewing code. If an anti-pattern is spotted, rather that
14repeating the same explanations over and over, a link to this document can be
15provided.
16
17<!-- begin copy/paste on next line -->
18
19## Anti-pattern template [one line description]
20
21### Identification
22
23(1 paragraph) Describe how to spot the anti-pattern.
24
25### Description
26
27(1 paragraph) Describe the negative effects of the anti-pattern.
28
29### Background
30
31(1 paragraph) Describe why the anti-pattern exists. If you don't know, try
32running git blame and look at who wrote the code originally, and ask them on the
33mailing list or in Discord what their original intent was, so it can be
34documented here (and you may possibly discover it isn't as much of an
35anti-pattern as you thought). If you are unable to determine why the
36anti-pattern exists, put: "Unknown" here.
37
38### Resolution
39
40(1 paragraph) Describe the preferred way to solve the problem solved by the
41anti-pattern and the positive effects of solving it in the manner described.
42
43<!-- end copy/paste on previous line -->
44
45## Custom ArgumentParser object
46
47### Identification
48
49The ArgumentParser object is typically present to wrap calls to get options. It
50abstracts away the parsing and provides a `[]` operator to access the
51parameters.
52
53### Description
54
55Writing a custom ArgumentParser object creates nearly duplicate code in a
56repository. The ArgumentParser itself is the same, however, the options provided
57differ. Writing a custom argument parser re-invents the wheel on c++ command
58line argument parsing.
59
60### Background
61
62The ArgumentParser exists because it was in place early and then copied into
63each new repository as an easy way to handle argument parsing.
64
65### Resolution
66
67The CLI11 library was designed and implemented specifically to support modern
68argument parsing. It handles the cases seen in OpenBMC daemons and has some
69handy built-in validators, and allows easy customizations to validation.
70
71## Explicit AC_MSG_ERROR on PKG_CHECK_MODULES failure
72
73### Identification
74
75```
76PKG_CHECK_MODULES(
77    [PHOSPHOR_LOGGING],
78    [phosphor-logging],
79    [],
80    [AC_MSG_ERROR([Could not find phosphor-logging...openbmc/phosphor-logging package required])])
81```
82
83### Description
84
85The autotools PKG_CHECK_MODULES macro provides the ability to specify an "if
86found" and "if not found" behavior. By default, the "if not found" behavior will
87list the package not found. In many cases, this is sufficient to a developer to
88know what package is missing. In most cases, it's another OpenBMC package.
89
90If the library sought's name isn't related to the package providing it, then the
91failure message should be set to something more useful to the developer.
92
93### Resolution
94
95Use the default macro behavior when it is clear that the missing package is
96another OpenBMC package.
97
98```
99PKG_CHECK_MODULES([PHOSPHOR_LOGGING], [phosphor-logging])
100```
101
102## Explicit listing of shared library packages in RDEPENDS in bitbake metadata
103
104### Identification
105
106```
107RDEPENDS_${PN} = "libsystemd"
108```
109
110### Description
111
112Out of the box bitbake examines built applications, automatically adds runtime
113dependencies and thus ensures any library packages dependencies are
114automatically added to images, sdks, etc. There is no need to list them
115explicitly in a recipe.
116
117Dependencies change over time, and listing them explicitly is likely prone to
118errors - the net effect being unnecessary shared library packages being
119installed into images.
120
121Consult
122https://www.yoctoproject.org/docs/latest/mega-manual/mega-manual.html#var-RDEPENDS
123for information on when to use explicit runtime dependencies.
124
125### Background
126
127The initial bitbake metadata author for OpenBMC was not aware that bitbake added
128these dependencies automatically. Initial bitbake metadata therefore listed
129shared library dependencies explicitly, and was subsequently copy pasted.
130
131### Resolution
132
133Do not list shared library packages in RDEPENDS. This eliminates the possibility
134of installing unnecessary shared library packages due to unmaintained library
135dependency lists in bitbake metadata.
136
137## Use of /usr/bin/env in systemd service files
138
139### Identification
140
141In systemd unit files:
142
143```
144[Service]
145
146ExecStart=/usr/bin/env some-application
147```
148
149### Description
150
151Outside of OpenBMC, most applications that provide systemd unit files don't
152launch applications in this way. So if nothing else, this just looks strange and
153violates the
154[princple of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment).
155
156### Background
157
158This anti-pattern exists because a requirement exists to enable live patching of
159applications on read-only filesystems. Launching applications in this way was
160part of the implementation that satisfied the live patch requirement. For
161example:
162
163```
164/usr/bin/phosphor-hwmon
165```
166
167on a read-only filesystem becomes:
168
169```
170/usr/local/bin/phosphor-hwmon`
171```
172
173on a writeable /usr/local filesystem.
174
175### Resolution
176
177The /usr/bin/env method only enables live patching of applications. A method
178that supports live patching of any file in the read-only filesystem has emerged.
179Assuming a writeable filesystem exists _somewhere_ on the bmc, something like:
180
181```
182mkdir -p /var/persist/usr
183mkdir -p /var/persist/work/usr
184mount -t overlay -o lowerdir=/usr,upperdir=/var/persist/usr,workdir=/var/persist/work/usr overlay /usr
185```
186
187can enable live system patching without any additional requirements on how
188applications are launched from systemd service files. This is the preferred
189method for enabling live system patching as it allows OpenBMC developers to
190write systemd service files in the same way as most other projects.
191
192To undo existing instances of this anti-pattern remove /usr/bin/env from systemd
193service files and replace with the fully qualified path to the application being
194launched. For example, given an application in /usr/bin:
195
196```
197sed -i s,/usr/bin/env ,/usr/bin/, foo.service
198```
199
200## Incorrect placement of executables in /sbin, /usr/sbin or /bin, /usr/bin
201
202### Identification
203
204OpenBMC executables that are installed in `/usr/sbin`. `$sbindir` in bitbake
205metadata, makefiles or configure scripts. systemd service files pointing to
206`/bin` or `/usr/bin` executables.
207
208### Description
209
210Installing OpenBMC applications in incorrect locations violates the
211[principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment)
212and more importantly violates the
213[FHS](https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard).
214
215### Background
216
217There are typically two types of executables:
218
2191. Long-running daemons started by, for instance, systemd service files and
220   _NOT_ intended to be directly executed by users.
2212. Utilities intended to be used by a user as a CLI.
222
223Executables of type 1 should not be placed anywhere in `${PATH}` because it is
224confusing and error-prone to users, but should instead be placed in a
225`/usr/libexec/<package>` subdirectory.
226
227Executables of type 2 should be placed in `/usr/bin` because they are intended
228to be used by users and should be in `${PATH}` (also, `sbin` is inappropriate as
229we transition to having non-root access).
230
231The sbin anti-pattern exists because the FHS was misinterpreted at an early
232point in OpenBMC project history, and has proliferated ever since.
233
234From the hier(7) man page:
235
236```
237/usr/bin This is the primary directory for executable programs.  Most programs
238executed by normal users which are not needed for booting or for repairing the
239system and which are not installed locally should be placed in this directory.
240
241/usr/sbin This directory contains program binaries for system administration
242which are not essential for the boot process, for mounting /usr, or for system
243repair.
244
245/usr/libexec Directory  contains  binaries for internal use only and they are
246not meant to be executed directly by users shell or scripts.
247```
248
249The FHS description for `/usr/sbin` refers to "system administration" but the
250de-facto interpretation of the system being administered refers to the OS
251installation and not a system in the OpenBMC sense of managed system. As such
252OpenBMC applications should be installed in `/usr/bin`.
253
254It is becoming common practice in Linux for daemons to now be moved to `libexec`
255and considered "internal use" from the perspective of the systemd service file
256that controls their launch.
257
258### Resolution
259
260Install OpenBMC applications in `/usr/libexec` or `/usr/bin/` as appropriate.
261
262## Handling unexpected error codes and exceptions
263
264### Identification
265
266The anti-pattern is for an application to continue processing after it
267encounters unexpected conditions in the form of error codes and exceptions and
268not capturing the data needed to resolve the problem.
269
270Example C++ code:
271
272```
273using InternalFailure = sdbusplus::xyz::openbmc_project::Common::Error::InternalFailure;
274try
275{
276  ... use d-Bus APIs...
277}
278catch (InternalFailure& e)
279{
280    phosphor::logging::commit<InternalFailure>();
281}
282```
283
284### Description
285
286Suppressing unexpected errors can lead an application to incorrect or erratic
287behavior which can affect the service it provides and the overall system.
288
289Writing a log entry instead of a core dump may not give enough information to
290debug a truly unexpected problem, so developers do not get a chance to
291investigate problems and the system's reliability does not improve over time.
292
293### Background
294
295Programmers want their application to continue processing when it encounters
296conditions it can recover from. Sometimes they try too hard and continue when it
297is not appropriate.
298
299Programmers also want to log what is happening in the application, so they write
300log entries that give debug data when something goes wrong, typically targeted
301for their use. They don't consider how the log entry is consumed by the BMC
302administrator or automated service tools.
303
304The `InternalFailure` in the [Phosphor logging README][] is overused.
305
306[phosphor logging readme]:
307  https://github.com/openbmc/phosphor-logging/blob/master/README.md
308
309### Resolution
310
311Several items are needed:
312
3131. Check all places where a return code or errno value is given. Strongly
314   consider that a default error handler should throw an exception, for example
315   `std::system_error`.
3162. Have a good reason to handle specific error codes. See below.
3173. Have a good reason to handle specific exceptions. Allow other exceptions to
318   propagate.
3194. Document (in terms of impacts to other services) what happens when this
320   service fails, stops, or is restarted. Use that to inform the recovery
321   strategy.
322
323In the error handler:
324
3251. Consider what data (if any) should be logged. Determine who will consume the
326   log entry: BMC developers, administrator, or an analysis tool. Usually the
327   answer is more than one of these.
328
329   The following example log entries use an IPMI request to set network access
330   on, but the user input is invalid.
331
332   - BMC Developer: Reference internal applications, services, pids, etc. the
333     developer would be familiar with.
334
335     Example:
336     `ipmid service successfully processed a network setting packet, however the user input of USB0 is not a valid network interface to configure.`
337
338   - Administrator: Reference the external interfaces of the BMC such as the
339     REST API. They can respond to feedback about invalid input, or a need to
340     restart the BMC.
341
342     Example:
343     `The network interface of USB0 is not a valid option. Retry the IPMI command with a valid interface.`
344
345   - Analyzer tool: Consider breaking the log down and including several
346     properties which an analyzer can leverage. For instance, tagging the log
347     with 'Internal' is not helpful. However, breaking that down into something
348     like [UserInput][ipmi][Network] tells at a quick glance that the input
349     received for configuring the network via an IPMI command was invalid.
350     Categorization and system impact are key things to focus on when creating
351     logs for an analysis application.
352
353     Example:
354     `[UserInput][IPMI][Network][Config][Warning] Interface USB0 not valid.`
355
3562. Determine if the application can fully recover from the condition. If not,
357   don't continue. Allow the system to determine if it writes a core dump or
358   restarts the service. If there are severe impacts when the service fails,
359   consider using a better error recovery mechanism.
360
361## Non-standard debug application options and logging
362
363### Identification
364
365An application uses non-standard methods on startup to indicate verbose logging
366and/or does not utilize standard systemd-journald debug levels for logging.
367
368### Description
369
370When debugging issues within OpenBMC that cross multiple applications, it's very
371difficult to enable the appropriate debug when different applications have
372different mechanisms for enabling debug. For example, different OpenBMC
373applications take the following as command line parameters to enable extra
374debug:
375
376- 0xff, --vv, -vv, -v, --verbose, <and more>
377
378Along these same lines, some applications then have their own internal methods
379of writing debug data. They use std::cout, printf, fprintf, ... Doing this
380causes other issues when trying to compare debug data across different
381applications that may be having their buffers flushed at different times (and
382picked up by journald).
383
384### Background
385
386Everyone has their own ways to debug. There was no real standard within OpenBMC
387on how to do it so everyone came up with whatever they were familiar with.
388
389### Resolution
390
391If an OpenBMC application is to support enhanced debug via a command line then
392it will support the standard "-v,--verbose" option.
393
394In general, OpenBMC developers should utilize the "debug" journald level for
395debug messages. This can be enabled/disabled globally and would apply to all
396applications. If a developer believes this would cause too much debug data in
397certain cases then they can protect these journald debug calls around a
398--verbose command line option.
399
400## DBus interface representing GPIOs
401
402### Identification
403
404Desire to expose a DBus interface to drive GPIOs, for example:
405
406- https://lore.kernel.org/openbmc/YV21cD3HOOGi7K2f@heinlein/
407- https://lore.kernel.org/openbmc/CAH2-KxBV9_0Dt79Quy0f4HkXXPdHfBw9FsG=4KwdWXBYNEA-ew@mail.gmail.com/
408- https://lore.kernel.org/openbmc/YtPrcDzaxXiM6vYJ@heinlein.stwcx.org.github.beta.tailscale.net/
409
410### Description
411
412Platform functionality selected by GPIOs might equally be selected by other
413means with a shift in system design philosophy. As such, GPIOs are a (hardware)
414implementation detail. Exposing the implementation on DBus forces the
415distribution of platform design details across multiple applications, which
416violates the software design principle of [low coupling][coupling] and impacts
417our confidence in maintenance.
418
419[coupling]: https://en.wikipedia.org/wiki/Coupling_%28computer_programming%29
420
421### Background
422
423GPIOs are often used to select functionality that might be hard to generalise,
424and therefore hard to abstract. If this is the case, then the functionality in
425question is probably best wrapped up as an implementation detail of a behaviour
426that is more generally applicable (e.g. host power-on procedures).
427
428### Resolution
429
430Consider what functionality the GPIO provides and design or exploit an existing
431interface to expose that behaviour instead.
432