xref: /openbmc/docs/anti-patterns.md (revision e2430b15af7752582d5743bb7af655d1fd52305f)
1 # OpenBMC Anti-patterns
2 
3 From [Wikipedia](https://en.wikipedia.org/wiki/Anti-pattern):
4 
5 
6 "An anti-pattern is a common response to a recurring problem that is usually
7 ineffective and risks being highly counterproductive."
8 
9 
10 The developers of OpenBMC do not get 100% of decisions right 100% of the time.
11 That, combined with the fact that software development is often an exercise in
12 copying and pasting, results in mistakes happening over and over again.
13 
14 
15 This page aims to document some of the anti-patterns that exist in OpenBMC to
16 ease the job of those reviewing code.  If an anti-pattern is spotted, rather
17 that repeating the same explanations over and over, a link to this document can
18 be provided.
19 
20 
21 <!-- begin copy/paste on next line -->
22 
23 ## Anti-pattern template [one line description]
24 
25 ### Identification
26 (1 paragraph) Describe how to spot the anti-pattern.
27 
28 ### Description
29 (1 paragraph) Describe the negative effects of the anti-pattern.
30 
31 ### Background
32 (1 paragraph) Describe why the anti-pattern exists.  If you don't know, try
33 running git blame and look at who wrote the code originally, and ask them on the
34 mailing list or in Discord what their original intent was, so it can be documented
35 here (and you may possibly discover it isn't as much of an anti-pattern as you
36 thought).  If you are unable to determine why the anti-pattern exists, put:
37 "Unknown" here.
38 
39 ### Resolution
40 (1 paragraph) Describe the preferred way to solve the problem solved by the
41 anti-pattern and the positive effects of solving it in the manner described.
42 
43 <!-- end copy/paste on previous line -->
44 
45 ## Custom ArgumentParser object
46 
47 ### Identification
48 
49 The ArgumentParser object is typically present to wrap calls to get options.
50 It abstracts away the parsing and provides a `[]` operator to access the
51 parameters.
52 
53 ### Description
54 
55 Writing a custom ArgumentParser object creates nearly duplicate code in a
56 repository.  The ArgumentParser itself is the same, however, the options
57 provided differ.  Writing a custom argument parser re-invents the wheel on
58 c++ command line argument parsing.
59 
60 ### Background
61 
62 The ArgumentParser exists because it was in place early and then copied into
63 each new repository as an easy way to handle argument parsing.
64 
65 ### Resolution
66 
67 The CLI11 library was designed and implemented specifically to support modern
68 argument parsing.  It handles the cases seen in OpenBMC daemons and has some
69 handy built-in validators, and allows easy customizations to validation.
70 
71 ## Explicit AC_MSG_ERROR on PKG_CHECK_MODULES failure
72 
73 ### Identification
74 ```
75 PKG_CHECK_MODULES(
76     [PHOSPHOR_LOGGING],
77     [phosphor-logging],
78     [],
79     [AC_MSG_ERROR([Could not find phosphor-logging...openbmc/phosphor-logging package required])])
80 ```
81 
82 ### Description
83 
84 The autotools PKG_CHECK_MODULES macro provides the ability to specify an
85 "if found" and "if not found" behavior.  By default, the "if not found"
86 behavior will list the package not found.  In many cases, this is sufficient
87 to a developer to know what package is missing.  In most cases, it's another
88 OpenBMC package.
89 
90 If the library sought's name isn't related to the package providing it, then
91 the failure message should be set to something more useful to the developer.
92 
93 ### Resolution
94 
95 Use the default macro behavior when it is clear that the missing package is
96 another OpenBMC package.
97 
98 ```
99 PKG_CHECK_MODULES([PHOSPHOR_LOGGING], [phosphor-logging])
100 ```
101 
102 ## Explicit listing of shared library packages in RDEPENDS in bitbake metadata
103 
104 ### Identification
105 ```
106 RDEPENDS_${PN} = "libsystemd"
107 ```
108 
109 ### Description
110 Out of the box bitbake examines built applications, automatically adds runtime
111 dependencies and thus ensures any library packages dependencies are
112 automatically added to images, sdks, etc.  There is no need to list them
113 explicitly in a recipe.
114 
115 Dependencies change over time, and listing them explicitly is likely prone to
116 errors - the net effect being unnecessary shared library packages being
117 installed into images.
118 
119 Consult
120 https://www.yoctoproject.org/docs/latest/mega-manual/mega-manual.html#var-RDEPENDS
121 for information on when to use explicit runtime dependencies.
122 
123 ### Background
124 The initial bitbake metadata author for OpenBMC was not aware that bitbake
125 added these dependencies automatically.  Initial bitbake metadata therefore
126 listed shared library dependencies explicitly, and was subsequently copy pasted.
127 
128 ### Resolution
129 Do not list shared library packages in RDEPENDS.  This eliminates the
130 possibility of installing unnecessary shared library packages due to
131 unmaintained library dependency lists in bitbake metadata.
132 
133 ## Use of /usr/bin/env in systemd service files
134 
135 ### Identification
136 In systemd unit files:
137 ```
138 [Service]
139 
140 ExecStart=/usr/bin/env some-application
141 ```
142 
143 ### Description
144 Outside of OpenBMC, most applications that provide systemd unit files don't
145 launch applications in this way.  So if nothing else, this just looks strange
146 and violates the [princple of least
147 astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment).
148 
149 ### Background
150 This anti-pattern exists because a requirement exists to enable live patching of
151 applications on read-only filesystems.  Launching applications in this way was
152 part of the implementation that satisfied the live patch requirement.  For
153 example:
154 
155 ```
156 /usr/bin/phosphor-hwmon
157 ```
158 
159 on a read-only filesystem becomes:
160 
161 ```
162 /usr/local/bin/phosphor-hwmon`
163 ```
164 
165 on a writeable /usr/local filesystem.
166 
167 ### Resolution
168 The /usr/bin/env method only enables live patching of applications.  A method
169 that supports live patching of any file in the read-only filesystem has emerged.
170 Assuming a writeable filesystem exists _somewhere_ on the bmc, something like:
171 
172 ```
173 mkdir -p /var/persist/usr
174 mkdir -p /var/persist/work/usr
175 mount -t overlay -o lowerdir=/usr,upperdir=/var/persist/usr,workdir=/var/persist/work/usr overlay /usr
176 ```
177 can enable live system patching without any additional requirements on how
178 applications are launched from systemd service files.  This is the preferred
179 method for enabling live system patching as it allows OpenBMC developers to
180 write systemd service files in the same way as most other projects.
181 
182 To undo existing instances of this anti-pattern remove /usr/bin/env from systemd
183 service files and replace with the fully qualified path to the application being
184 launched.  For example, given an application in /usr/bin:
185 
186 ```
187 sed -i s,/usr/bin/env ,/usr/bin/, foo.service
188 ```
189 
190 ## Incorrect placement of executables in /sbin, /usr/sbin or /bin, /usr/bin
191 
192 ### Identification
193 OpenBMC executables that are installed in `/usr/sbin`.  `$sbindir` in bitbake
194 metadata, makefiles or configure scripts.  systemd service files pointing to
195 `/bin` or `/usr/bin` executables.
196 
197 ### Description
198 Installing OpenBMC applications in incorrect locations violates the
199 [principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment) and more importantly violates the
200 [FHS](https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard).
201 
202 ### Background
203 
204 There are typically two types of executables:
205 
206 1. Long-running daemons started by, for instance, systemd service files and
207    *NOT* intended to be directly executed by users.
208 2. Utilities intended to be used by a user as a CLI.
209 
210 Executables of type 1 should not be placed anywhere in `${PATH}` because it
211 is confusing and error-prone to users, but should instead be placed in a
212 `/usr/libexec/<package>` subdirectory.
213 
214 Executables of type 2 should be placed in `/usr/bin` because they are intended
215 to be used by users and should be in `${PATH}` (also, `sbin` is inappropriate
216 as we transition to having non-root access).
217 
218 The sbin anti-pattern exists because the FHS was misinterpreted at an early
219 point in OpenBMC project history, and has proliferated ever since.
220 
221 From the hier(7) man page:
222 
223 ```
224 /usr/bin This is the primary directory for executable programs.  Most programs
225 executed by normal users which are not needed for booting or for repairing the
226 system and which are not installed locally should be placed in this directory.
227 
228 /usr/sbin This directory contains program binaries for system administration
229 which are not essential for the boot process, for mounting /usr, or for system
230 repair.
231 
232 /usr/libexec Directory  contains  binaries for internal use only and they are
233 not meant to be executed directly by users shell or scripts.
234 ```
235 
236 The FHS description for `/usr/sbin` refers to "system administration" but the
237 de-facto interpretation of the system being administered refers to the OS
238 installation and not a system in the OpenBMC sense of managed system.  As such
239 OpenBMC applications should be installed in `/usr/bin`.
240 
241 It is becoming common practice in Linux for daemons to now be moved to `libexec`
242 and considered "internal use" from the perspective of the systemd service file
243 that controls their launch.
244 
245 ### Resolution
246 Install OpenBMC applications in `/usr/libexec` or `/usr/bin/` as appropriate.
247 
248 ## Handling unexpected error codes and exceptions
249 
250 ### Identification
251 The anti-pattern is for an application to continue processing after it
252 encounters unexpected conditions in the form of error codes and exceptions and
253 not capturing the data needed to resolve the problem.
254 
255 Example C++ code:
256 ```
257 using InternalFailure = sdbusplus::xyz::openbmc_project::Common::Error::InternalFailure;
258 try
259 {
260   ... use d-Bus APIs...
261 }
262 catch (InternalFailure& e)
263 {
264     phosphor::logging::commit<InternalFailure>();
265 }
266 ```
267 
268 ### Description
269 Suppressing unexpected errors can lead an application to incorrect or erratic
270 behavior which can affect the service it provides and the overall system.
271 
272 Writing a log entry instead of a core dump may not give enough information to
273 debug a truly unexpected problem, so developers do not get a chance to
274 investigate problems and the system's reliability does not improve over time.
275 
276 ### Background
277 Programmers want their application to continue processing when it encounters
278 conditions it can recover from.  Sometimes they try too hard and continue when
279 it is not appropriate.
280 
281 Programmers also want to log what is happening in the application, so they
282 write log entries that give debug data when something goes wrong, typically
283 targeted for their use.  They don't consider how the log entry is consumed
284 by the BMC administrator or automated service tools.
285 
286 The `InternalFailure` in the [Phosphor logging README][] is overused.
287 
288 [Phosphor logging README]: https://github.com/openbmc/phosphor-logging/blob/master/README.md
289 
290 ### Resolution
291 
292 Several items are needed:
293 1. Check all places where a return code or errno value is given.  Strongly
294    consider that a default error handler should throw an exception, for
295    example `std::system_error`.
296 2. Have a good reason to handle specific error codes.  See below.
297 3. Have a good reason to handle specific exceptions.  Allow other exceptions
298    to propagate.
299 4. Document (in terms of impacts to other services) what happens when this
300    service fails, stops, or is restarted.  Use that to inform the recovery
301    strategy.
302 
303 In the error handler:
304 
305 1. Consider what data (if any) should be logged.  Determine who will consume
306    the log entry: BMC developers, administrator, or an analysis tool.  Usually
307    the answer is more than one of these.
308 
309    The following example log entries use an IPMI request to set network access
310    on, but the user input is invalid.
311 
312     - BMC Developer: Reference internal applications, services, pids, etc.
313       the developer would be familiar with.
314 
315       Example: `ipmid service successfully processed a network setting packet,
316       however the user input of USB0 is not a valid network interface to
317       configure.`
318 
319     - Administrator: Reference the external interfaces of the BMC such as the
320       REST API.  They can respond to feedback about invalid input, or a need
321       to restart the BMC.
322 
323       Example: `The network interface of USB0 is not a valid option. Retry the
324       IPMI command with a valid interface.`
325 
326     - Analyzer tool: Consider breaking the log down and including several
327       properties which an analyzer can leverage.  For instance, tagging the
328       log with 'Internal' is not helpful.  However, breaking that down into
329       something like [UserInput][IPMI][Network] tells at a quick glance that
330       the input received for configuring the network via an IPMI command was
331       invalid.  Categorization and system impact are key things to focus on
332       when creating logs for an analysis application.
333 
334       Example: `[UserInput][IPMI][Network][Config][Warning] Interface USB0 not
335       valid.`
336 
337 2. Determine if the application can fully recover from the condition.  If not,
338    don't continue.  Allow the system to determine if it writes a core dump or
339    restarts the service.  If there are severe impacts when the service fails,
340    consider using a better error recovery mechanism.
341 
342 ## Non-standard debug application options and logging
343 
344 ### Identification
345 An application uses non-standard methods on startup to indicate verbose
346 logging and/or does not utilize standard systemd-journald debug levels for
347 logging.
348 
349 ### Description
350 When debugging issues within OpenBMC that cross multiple applications, it's
351 very difficult to enable the appropriate debug when different applications
352 have different mechanisms for enabling debug. For example, different OpenBMC
353 applications take the following as command line parameters to enable extra
354 debug:
355 - 0xff, --vv, -vv, -v, --verbose, <and more>
356 
357 Along these same lines, some applications then have their own internal methods
358 of writing debug data. They use std::cout, printf, fprintf, ...
359 Doing this causes other issues when trying to compare debug data across
360 different applications that may be having their buffers flushed at different
361 times (and picked up by journald).
362 
363 ### Background
364 Everyone has their own ways to debug. There was no real standard within
365 OpenBMC on how to do it so everyone came up with whatever they were familiar
366 with.
367 
368 ### Resolution
369 If an OpenBMC application is to support enhanced debug via a command line then
370 it will support the standard "-v,--verbose" option.
371 
372 In general, OpenBMC developers should utilize the "debug" journald level for
373 debug messages. This can be enabled/disabled globally and would apply to
374 all applications. If a developer believes this would cause too much debug
375 data in certain cases then they can protect these journald debug calls around
376 a --verbose command line option.
377 
378 ## DBus interface representing GPIOs
379 
380 ### Identification
381 Desire to expose a DBus interface to drive GPIOs, for example:
382 
383 * https://lore.kernel.org/openbmc/YV21cD3HOOGi7K2f@heinlein/
384 * https://lore.kernel.org/openbmc/CAH2-KxBV9_0Dt79Quy0f4HkXXPdHfBw9FsG=4KwdWXBYNEA-ew@mail.gmail.com/
385 * https://lore.kernel.org/openbmc/YtPrcDzaxXiM6vYJ@heinlein.stwcx.org.github.beta.tailscale.net/
386 
387 ### Description
388 Platform functionality selected by GPIOs might equally be selected by other
389 means with a shift in system design philosophy. As such, GPIOs are a (hardware)
390 implementation detail. Exposing the implementation on DBus forces the
391 distribution of platform design details across multiple applications, which
392 violates the software design principle of [low coupling][coupling] and impacts
393 our confidence in maintenance.
394 
395 [coupling]: https://en.wikipedia.org/wiki/Coupling_%28computer_programming%29
396 
397 ### Background
398 GPIOs are often used to select functionality that might be hard to generalise,
399 and therefore hard to abstract. If this is the case, then the functionality
400 in question is probably best wrapped up as an implementation detail of a
401 behaviour that is more generally applicable (e.g. host power-on procedures).
402 
403 ### Resolution
404 Consider what functionality the GPIO provides and design or exploit an existing
405 interface to expose that behaviour instead.
406