xref: /openbmc/docs/anti-patterns.md (revision 6ac6f3df)
1# OpenBMC Anti-patterns
2
3From [Wikipedia](https://en.wikipedia.org/wiki/Anti-pattern):
4
5
6"An anti-pattern is a common response to a recurring problem that is usually
7ineffective and risks being highly counterproductive."
8
9
10The developers of OpenBMC do not get 100% of decisions right 100% of the time.
11That, combined with the fact that software development is often an exercise in
12copying and pasting, results in mistakes happening over and over again.
13
14
15This page aims to document some of the anti-patterns that exist in OpenBMC to
16ease the job of those reviewing code.  If an anti-pattern is spotted, rather
17that repeating the same explanations over and over, a link to this document can
18be provided.
19
20
21<!-- begin copy/paste on next line -->
22
23## Anti-pattern template [one line description]
24
25### Identification
26(1 paragraph) Describe how to spot the anti-pattern.
27
28### Description
29(1 paragraph) Describe the negative effects of the anti-pattern.
30
31### Background
32(1 paragraph) Describe why the anti-pattern exists.  If you don't know, try
33running git blame and look at who wrote the code originally, and ask them on the
34mailing list or in Discord what their original intent was, so it can be documented
35here (and you may possibly discover it isn't as much of an anti-pattern as you
36thought).  If you are unable to determine why the anti-pattern exists, put:
37"Unknown" here.
38
39### Resolution
40(1 paragraph) Describe the preferred way to solve the problem solved by the
41anti-pattern and the positive effects of solving it in the manner described.
42
43<!-- end copy/paste on previous line -->
44
45## Custom ArgumentParser object
46
47### Identification
48
49The ArgumentParser object is typically present to wrap calls to get options.
50It abstracts away the parsing and provides a `[]` operator to access the
51parameters.
52
53### Description
54
55Writing a custom ArgumentParser object creates nearly duplicate code in a
56repository.  The ArgumentParser itself is the same, however, the options
57provided differ.  Writing a custom argument parser re-invents the wheel on
58c++ command line argument parsing.
59
60### Background
61
62The ArgumentParser exists because it was in place early and then copied into
63each new repository as an easy way to handle argument parsing.
64
65### Resolution
66
67The CLI11 library was designed and implemented specifically to support modern
68argument parsing.  It handles the cases seen in OpenBMC daemons and has some
69handy built-in validators, and allows easy customizations to validation.
70
71## Explicit AC_MSG_ERROR on PKG_CHECK_MODULES failure
72
73### Identification
74```
75PKG_CHECK_MODULES(
76    [PHOSPHOR_LOGGING],
77    [phosphor-logging],
78    [],
79    [AC_MSG_ERROR([Could not find phosphor-logging...openbmc/phosphor-logging package required])])
80```
81
82### Description
83
84The autotools PKG_CHECK_MODULES macro provides the ability to specify an
85"if found" and "if not found" behavior.  By default, the "if not found"
86behavior will list the package not found.  In many cases, this is sufficient
87to a developer to know what package is missing.  In most cases, it's another
88OpenBMC package.
89
90If the library sought's name isn't related to the package providing it, then
91the failure message should be set to something more useful to the developer.
92
93### Resolution
94
95Use the default macro behavior when it is clear that the missing package is
96another OpenBMC package.
97
98```
99PKG_CHECK_MODULES([PHOSPHOR_LOGGING], [phosphor-logging])
100```
101
102## Explicit listing of shared library packages in RDEPENDS in bitbake metadata
103
104### Identification
105```
106RDEPENDS_${PN} = "libsystemd"
107```
108
109### Description
110Out of the box bitbake examines built applications, automatically adds runtime
111dependencies and thus ensures any library packages dependencies are
112automatically added to images, sdks, etc.  There is no need to list them
113explicitly in a recipe.
114
115Dependencies change over time, and listing them explicitly is likely prone to
116errors - the net effect being unnecessary shared library packages being
117installed into images.
118
119Consult
120https://www.yoctoproject.org/docs/latest/mega-manual/mega-manual.html#var-RDEPENDS
121for information on when to use explicit runtime dependencies.
122
123### Background
124The initial bitbake metadata author for OpenBMC was not aware that bitbake
125added these dependencies automatically.  Initial bitbake metadata therefore
126listed shared library dependencies explicitly, and was subsequently copy pasted.
127
128### Resolution
129Do not list shared library packages in RDEPENDS.  This eliminates the
130possibility of installing unnecessary shared library packages due to
131unmaintained library dependency lists in bitbake metadata.
132
133## Use of /usr/bin/env in systemd service files
134
135### Identification
136In systemd unit files:
137```
138[Service]
139
140ExecStart=/usr/bin/env some-application
141```
142
143### Description
144Outside of OpenBMC, most applications that provide systemd unit files don't
145launch applications in this way.  So if nothing else, this just looks strange
146and violates the [princple of least
147astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment).
148
149### Background
150This anti-pattern exists because a requirement exists to enable live patching of
151applications on read-only filesystems.  Launching applications in this way was
152part of the implementation that satisfied the live patch requirement.  For
153example:
154
155```
156/usr/bin/phosphor-hwmon
157```
158
159on a read-only filesystem becomes:
160
161```
162/usr/local/bin/phosphor-hwmon`
163```
164
165on a writeable /usr/local filesystem.
166
167### Resolution
168The /usr/bin/env method only enables live patching of applications.  A method
169that supports live patching of any file in the read-only filesystem has emerged.
170Assuming a writeable filesystem exists _somewhere_ on the bmc, something like:
171
172```
173mkdir -p /var/persist/usr
174mkdir -p /var/persist/work/usr
175mount -t overlay -o lowerdir=/usr,upperdir=/var/persist/usr,workdir=/var/persist/work/usr overlay /usr
176```
177can enable live system patching without any additional requirements on how
178applications are launched from systemd service files.  This is the preferred
179method for enabling live system patching as it allows OpenBMC developers to
180write systemd service files in the same way as most other projects.
181
182To undo existing instances of this anti-pattern remove /usr/bin/env from systemd
183service files and replace with the fully qualified path to the application being
184launched.  For example, given an application in /usr/bin:
185
186```
187sed -i s,/usr/bin/env ,/usr/bin/, foo.service
188```
189
190## Incorrect placement of executables in /sbin, /usr/sbin or /bin, /usr/bin
191
192### Identification
193OpenBMC executables that are installed in `/usr/sbin`.  `$sbindir` in bitbake
194metadata, makefiles or configure scripts.  systemd service files pointing to
195`/bin` or `/usr/bin` executables.
196
197### Description
198Installing OpenBMC applications in incorrect locations violates the
199[principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment) and more importantly violates the
200[FHS](https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard).
201
202### Background
203
204There are typically two types of executables:
205
2061. Long-running daemons started by, for instance, systemd service files and
207   *NOT* intended to be directly executed by users.
2082. Utilities intended to be used by a user as a CLI.
209
210Executables of type 1 should not be placed anywhere in `${PATH}` because it
211is confusing and error-prone to users, but should instead be placed in a
212`/usr/libexec/<package>` subdirectory.
213
214Executables of type 2 should be placed in `/usr/bin` because they are intended
215to be used by users and should be in `${PATH}` (also, `sbin` is inappropriate
216as we transition to having non-root access).
217
218The sbin anti-pattern exists because the FHS was misinterpreted at an early
219point in OpenBMC project history, and has proliferated ever since.
220
221From the hier(7) man page:
222
223```
224/usr/bin This is the primary directory for executable programs.  Most programs
225executed by normal users which are not needed for booting or for repairing the
226system and which are not installed locally should be placed in this directory.
227
228/usr/sbin This directory contains program binaries for system administration
229which are not essential for the boot process, for mounting /usr, or for system
230repair.
231
232/usr/libexec Directory  contains  binaries for internal use only and they are
233not meant to be executed directly by users shell or scripts.
234```
235
236The FHS description for `/usr/sbin` refers to "system administration" but the
237de-facto interpretation of the system being administered refers to the OS
238installation and not a system in the OpenBMC sense of managed system.  As such
239OpenBMC applications should be installed in `/usr/bin`.
240
241It is becoming common practice in Linux for daemons to now be moved to `libexec`
242and considered "internal use" from the perspective of the systemd service file
243that controls their launch.
244
245### Resolution
246Install OpenBMC applications in `/usr/libexec` or `/usr/bin/` as appropriate.
247
248## Handling unexpected error codes and exceptions
249
250### Identification
251The anti-pattern is for an application to continue processing after it
252encounters unexpected conditions in the form of error codes and exceptions and
253not capturing the data needed to resolve the problem.
254
255Example C++ code:
256```
257using InternalFailure = sdbusplus::xyz::openbmc_project::Common::Error::InternalFailure;
258try
259{
260  ... use d-Bus APIs...
261}
262catch (InternalFailure& e)
263{
264    phosphor::logging::commit<InternalFailure>();
265}
266```
267
268### Description
269Suppressing unexpected errors can lead an application to incorrect or erratic
270behavior which can affect the service it provides and the overall system.
271
272Writing a log entry instead of a core dump may not give enough information to
273debug a truly unexpected problem, so developers do not get a chance to
274investigate problems and the system's reliability does not improve over time.
275
276### Background
277Programmers want their application to continue processing when it encounters
278conditions it can recover from.  Sometimes they try too hard and continue when
279it is not appropriate.
280
281Programmers also want to log what is happening in the application, so they
282write log entries that give debug data when something goes wrong, typically
283targeted for their use.  They don't consider how the log entry is consumed
284by the BMC administrator or automated service tools.
285
286The `InternalFailure` in the [Phosphor logging README][] is overused.
287
288[Phosphor logging README]: https://github.com/openbmc/phosphor-logging/blob/master/README.md
289
290### Resolution
291
292Several items are needed:
2931. Check all places where a return code or errno value is given.  Strongly
294   consider that a default error handler should throw an exception, for
295   example `std::system_error`.
2962. Have a good reason to handle specific error codes.  See below.
2973. Have a good reason to handle specific exceptions.  Allow other exceptions
298   to propagate.
2994. Document (in terms of impacts to other services) what happens when this
300   service fails, stops, or is restarted.  Use that to inform the recovery
301   strategy.
302
303In the error handler:
304
3051. Consider what data (if any) should be logged.  Determine who will consume
306   the log entry: BMC developers, administrator, or an analysis tool.  Usually
307   the answer is more than one of these.
308
309   The following example log entries use an IPMI request to set network access
310   on, but the user input is invalid.
311
312    - BMC Developer: Reference internal applications, services, pids, etc.
313      the developer would be familiar with.
314
315      Example: `ipmid service successfully processed a network setting packet,
316      however the user input of USB0 is not a valid network interface to
317      configure.`
318
319    - Administrator: Reference the external interfaces of the BMC such as the
320      REST API.  They can respond to feedback about invalid input, or a need
321      to restart the BMC.
322
323      Example: `The network interface of USB0 is not a valid option. Retry the
324      IPMI command with a valid interface.`
325
326    - Analyzer tool: Consider breaking the log down and including several
327      properties which an analyzer can leverage.  For instance, tagging the
328      log with 'Internal' is not helpful.  However, breaking that down into
329      something like [UserInput][IPMI][Network] tells at a quick glance that
330      the input received for configuring the network via an IPMI command was
331      invalid.  Categorization and system impact are key things to focus on
332      when creating logs for an analysis application.
333
334      Example: `[UserInput][IPMI][Network][Config][Warning] Interface USB0 not
335      valid.`
336
3372. Determine if the application can fully recover from the condition.  If not,
338   don't continue.  Allow the system to determine if it writes a core dump or
339   restarts the service.  If there are severe impacts when the service fails,
340   consider using a better error recovery mechanism.
341
342## Non-standard debug application options and logging
343
344### Identification
345An application uses non-standard methods on startup to indicate verbose
346logging and/or does not utilize standard systemd-journald debug levels for
347logging.
348
349### Description
350When debugging issues within OpenBMC that cross multiple applications, it's
351very difficult to enable the appropriate debug when different applications
352have different mechanisms for enabling debug. For example, different OpenBMC
353applications take the following as command line parameters to enable extra
354debug:
355- 0xff, --vv, -vv, -v, --verbose, <and more>
356
357Along these same lines, some applications then have their own internal methods
358of writing debug data. They use std::cout, printf, fprintf, ...
359Doing this causes other issues when trying to compare debug data across
360different applications that may be having their buffers flushed at different
361times (and picked up by journald).
362
363### Background
364Everyone has their own ways to debug. There was no real standard within
365OpenBMC on how to do it so everyone came up with whatever they were familiar
366with.
367
368### Resolution
369If an OpenBMC application is to support enhanced debug via a command line then
370it will support the standard "-v,--verbose" option.
371
372In general, OpenBMC developers should utilize the "debug" journald level for
373debug messages. This can be enabled/disabled globally and would apply to
374all applications. If a developer believes this would cause too much debug
375data in certain cases then they can protect these journald debug calls around
376a --verbose command line option.
377