1# Checklist for making changes to `libpldm`
2
3## Philosophy and influences
4
5- [Good Practices in Library Design, Implementation, and Maintenance - Ulrich
6  Drepper][goodpractice]
7
8[goodpractice]: https://www.akkadia.org/drepper/goodpractice.pdf
9
10- [How Do I Make This Hard to Misuse? - Rusty Russell][rusty-api-scale-good]
11
12[rusty-api-scale-good]: https://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html
13
14- [What If I Don't Actually Like My Users? - Rusty Russell][rusty-api-scale-bad]
15
16[rusty-api-scale-bad]: https://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
17
18- [Red flags that indicate questionable quality - Lennart
19  Poettering][poettering-library-red-flags]
20
21[poettering-library-red-flags]:
22  https://mastodon.social/@pid_eins/112517953375791453
23
24- [Not sure if this is a gcc bug or some weird corner of UB or what... - Andrew
25  Zonenberg][azonenberg-packed-struct]
26
27[azonenberg-packed-struct]: https://ioc.exchange/@azonenberg/112535511250395148
28
29- [The Good, the Bad, and the Weird - Trail of Bits
30  Blog][trail-of-bits-weird-machines]
31
32[trail-of-bits-weird-machines]:
33  https://blog.trailofbits.com/2018/10/26/the-good-the-bad-and-the-weird/
34
35- [Logic for Programmers - Hillel Wayne][logic-for-programmers]
36
37[logic-for-programmers]: https://leanpub.com/logic
38
39- [Parse, don’t validate - Alexis King][alexis-king-parse-dont-validate]
40
41[alexis-king-parse-dont-validate]:
42  https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/
43
44- [The byte order fallacy - command center][command-center-byte-order-fallacy]
45
46[command-center-byte-order-fallacy]:
47  https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html
48
49- [what’s the most annoying thing about writing C - Jessica
50  Paquette][barrelshifter-annoying-c]
51
52[barrelshifter-annoying-c]:
53  https://mastodon.social/@barrelshifter/114016810912730423
54
55- [The Power of Ten: Rules for Developing Safety Critical Code - NASA/JPL
56  Laboratory for Reliable Software][nasa-reliable-code]
57
58[nasa-reliable-code]:
59  https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.pdf
60
61- [C Isn't A Programming Language Anymore - Aria Desires][aria-c-is-a-protocol]
62
63[aria-c-is-a-protocol]: https://faultlore.com/blah/c-isnt-a-language/
64
65## References
66
67- [The C programming language - C Standards Committee][c-language]
68
69[c-language]: https://www.c-language.org/
70
71- [SEI CERT C Coding Standard][sei-cert-c-coding-standard]
72
73[sei-cert-c-coding-standard]:
74  https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard
75
76- [Common Weakness Enumeration (CWE) - Software
77  Development][common-weakness-enumeration-sw]
78
79[common-weakness-enumeration-sw]:
80  https://cwe.mitre.org/data/definitions/699.html
81
82## Definitions
83
84- **Error condition**: An invalid state reached at runtime, caused either by
85  resource exhaustion, or incorrect use of the library's public APIs and data
86  types.
87
88- **Invariant**: A condition in the library's implementation that must never
89  evaluate false.
90
91- **Public API**: Any definitions and declarations under `include/libpldm`.
92
93- **Wire format**: Any message structure defined in the DMTF PLDM protocol
94  specifications.
95
96## Elaborations
97
98- Resource exhaustion is always an error condition and never an invariant
99  violation.
100
101- An invariant violation is always a programming failure of the library's
102  implementation, and never the result of incorrect use of the library's public
103  APIs (see error condition).
104
105- Corollaries of the above two points:
106  - Incorrect use of public API functions is always an error condition, and is
107    dealt with by returning an error code.
108
109  - Incorrect use of static functions in the library's implementation is an
110    invariant violation which may be established using `assert()`.
111
112- `assert()` is the recommended way to demonstrate invariants are upheld.
113
114## Library background
115
116### The ABI lifecycle
117
118```mermaid
119---
120title: libpldm symbol lifecycle
121---
122stateDiagram-v2
123    direction LR
124    [*] --> Testing: Add
125    Testing --> Testing: Change
126    Testing --> [*]: Remove
127    Testing --> Stable: Stabilise
128    Stable --> Deprecated: Deprecate
129    Deprecated --> [*]: Remove
130```
131
132The ABI of the library produced by the build is controlled using the `abi` meson
133option. The following use cases determine how the `abi` option should be
134specified:
135
136| Use Case    | Meson Configuration               |
137| ----------- | --------------------------------- |
138| Production  | `-Dabi=deprecated,stable`         |
139| Maintenance | `-Dabi=stable`                    |
140| Development | `-Dabi=deprecated,stable,testing` |
141
142### Maintenance
143
144Applications and libraries that depend on `libpldm` can identify how to migrate
145off of deprecated APIs by constraining the library ABI to the stable category.
146This will force the compiler identify any call-sites that try to link against
147deprecated symbols.
148
149### Development
150
151Applications and libraries often require functionality that doesn't yet exist in
152`libpldm`. The work is thus in two parts:
153
1541. Add the required APIs to `libpldm`
1552. Use the new APIs from `libpldm` in the dependent application or library
156
157Adding APIs to a library is a difficult task. Generally, once an API is exposed
158in the library's ABI, any changes to the API risk breaking applications already
159making use of it. To make sure we have more than one shot at getting an API
160right, all new APIs must first be exposed in the testing category. Concretely:
161
162Patches adding new APIs MUST mark them as testing and MUST NOT mark them as
163stable.
164
165### Marking functions as testing, stable or deprecated
166
167Three macros are provided through `config.h` (automatically included for all
168translation units) to mark functions as testing, stable or deprecated:
169
1701. `LIBPLDM_ABI_TESTING`
1712. `LIBPLDM_ABI_STABLE`
1723. `LIBPLDM_ABI_DEPRECATED`
173
174These annotations go immediately before your function signature:
175
176```c
177LIBPLDM_ABI_TESTING
178pldm_requester_rc_t pldm_transport_send_msg(struct pldm_transport *transport,
179                                            pldm_tid_t tid,
180                                            const void *pldm_req_msg,
181                                            size_t req_msg_len)
182{
183    ...
184}
185```
186
187### What does it mean to mark a function as stable?
188
189Marking a function as stable makes the following promise to users of the
190library:
191
192> We will not remove or change the symbol name, argument count, argument types,
193> return type, or interpretation of relevant values for the function before
194> first marking it as `LIBPLDM_ABI_DEPRECATED` and then subsequently creating a
195> tagged release
196
197Marking a function as stable does _not_ promise that it is free of
198implementation bugs. It is just a promise that the prototype won't change
199without notice.
200
201Given this, it is always okay to implement functions marked stable in terms of
202functions marked testing inside of libpldm. If we remove or change the prototype
203of a function marked testing the only impact is that we need to fix up any call
204sites of that function in the same patch.
205
206### Requirements for stabilising a function
207
208To move a function from the testing category to the stable category, it's
209required that patches demonstrating use of the function in a dependent
210application or library be linked in the commit message of the stabilisation
211change. We require this to demonstrate that the implementer has considered its
212use in context _before_ preventing us from making changes to the API.
213
214### Building a dependent application or library against a testing ABI
215
216Meson is broadly used in the OpenBMC ecosystem, the historical home of
217`libpldm`. Meson's subprojects are a relatively painless way of managing
218dependencies for the purpose of developing complex applications and libraries.
219Use of `libpldm` as a subproject is both supported and encouraged.
220
221`libpldm`'s ABI can be controlled from a parent project through meson's
222subproject configuration syntax:
223
224```shell
225meson setup ... -Dlibpldm:abi=deprecated,stable,testing ...
226```
227
228## Adding a new API
229
230### Naming macros, functions and types
231
232- [ ] All publicly exposed macros, types and functions relating to the PLDM
233      specifications must be prefixed with either `pldm_` or `PLDM_` as
234      appropriate
235  - The only (temporary) exception are the `encode_*()` and `decode_*()`
236    function symbols
237
238- [ ] All publicly exposed macros, types and functions relating to the library
239      implementation must be prefixed with `libpldm_` or `LIBPLDM_`
240
241- [ ] All `pldm_`-prefixed symbols must also name the related specification. For
242      example, for DSP0248 Platform Monitoring and Control, the symbol prefix
243      should be `pldm_platform_`.
244
245- [ ] All enum members must be prefixed with the type name
246
247### API design
248
249- [ ] If I've added support for a new PLDM message type, then I've defined both
250      the encoder and decoder for that message.
251  - This applies for both request _and_ response message types.
252
253- [ ] I've designed my APIs so their implementation does not require heap
254      allocation.
255  - Prefer [defining iterators][libpldm-iterator] over the message buffer to
256    extract sub-structures from variable-length messages. Iterators avoid both
257    requiring heap allocation in the implementation or splitting the API to
258    allow the caller to allocate appropriate space. Instead, the caller is
259    provided with an on-stack struct containing the extracted sub-structure.
260
261[libpldm-iterator]:
262  https://github.com/openbmc/libpldm/commit/3a2c6589c5660d2066b612bae28ca393a8aa1c2b
263
264- [ ] My new public message codec functions take a `struct` representing the
265      message as a parameter
266  - Function prototypes must _not_ decompose the message to individual
267    parameters. This approach is not ergonomic and is difficult to make
268    type-safe. This is especially true for message decoding functions which must
269    use pointers for out-parameters, where it has often become ambiguous whether
270    the underlying memory represents a single object or an array.
271
272- [ ] Each new `struct` I've defined is used in at least one new function I've
273      added to the public API.
274
275- [ ] My new public `struct` definitions are _not_ marked
276      `__attribute__((packed))`
277
278- [ ] My new public `struct` definitions do _not_ define a flexible array
279      member, unless:
280  - [ ] It's contained in an `#ifndef __cplusplus` macro guard, as flexible
281        arrays are not specified by C++, and
282
283  - [ ] I've implemented an accessor function so the array base pointer can be
284        accessed from C++, and
285
286  - [ ] It is defined as per the C17 specification by omitting the length[^1]
287    - Note: Any array defined with length 1 is _not_ a flexible array, and any
288      access beyond the first element invokes undefined behaviour in both C and
289      C++.
290
291  - [ ] I've annotated the flexible array member with `LIBPLDM_CC_COUNTED_BY()`
292
293[^1]:
294    [C17 draft specification][c17-draft-standard], 6.7.2.1 Structure and union
295    specifiers, paragraph 18.
296
297[c17-draft-standard]:
298  https://web.archive.org/web/20181230041359/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf
299
300### ABI control
301
302- [ ] My new function symbols are marked with `LIBPLDM_ABI_TESTING` in the
303      implementation
304
305- [ ] I've guarded the test cases of functions marked `LIBPLDM_ABI_TESTING` so
306      that they are not compiled when the corresponding function symbols aren't
307      visible
308
309### Error handling and invariants
310
311- [ ] All my error conditions are handled by returning an error code to the
312      caller.
313
314- [ ] All my invariants are tested using `assert()`.
315
316- [ ] I have not used `assert()` to evaluate any error conditions without also
317      handling the error condition by returning an error code the the caller.
318  - Release builds of the library are configured with `assert()` disabled
319    (`-Db_ndebug=if-release`, which provides `-DNDEBUG` in `CFLAGS`).
320
321- [ ] My new APIs return negative `errno` values on error and not PLDM
322      completion codes.
323  - [ ] The specific error values my function returns and their meaning in the
324        context of the function call are listed in the API documentation.
325
326### Implementation
327
328- [ ] If my work interacts with the PLDM wire format, then I have done so using
329      the `msgbuf` APIs found in `src/msgbuf.h` (and under `src/msgbuf/`) to
330      minimise concerns around spatial memory safety and endian-correctness.
331
332- [ ] I've used `goto` to clean up resources acquired prior to encountering an
333      error condition
334  - Replication of resource cleanup across multiple error paths is error-prone,
335    especially when multiple, dependent resources have been acquired.
336
337- [ ] I've released acquired resources in stack-order
338  - This should be the case regardless of whether we're in the happy path at the
339    end of object lifetime or an error path during construction.
340
341- [ ] I've declared variables in [reverse-christmas-tree (inverted pyramid)
342      order][hisham-make-pyramids] in any block scopes I've added or changed.
343
344[hisham-make-pyramids]:
345  https://web.archive.org/web/20220404224603/https://hisham.hm/2018/06/16/when-listing-repeated-things-make-pyramids/
346
347### Testing
348
349- [ ] I've implemented test cases with reasonable branch coverage of each new
350      function I've added
351
352### Maintenance
353
354- [ ] If I've added support for a new message type, then my commit message
355      specifies all of:
356  - [ ] The relevant DMTF specification by its DSP number and title
357  - [ ] The relevant version of the specification
358  - [ ] The section of the specification that defines the message type
359
360- [ ] If my work impacts the public API of the library, then I've added an entry
361      to `CHANGELOG.md` describing my work
362
363### OEM/vendor-specific APIs
364
365- [ ] I've documented the wire format for all OEM messages under
366      `docs/oem/${OEM_NAME}/`
367
368- [ ] I've added public OEM API declarations and definitions under
369      `include/libpldm/oem/${OEM_NAME}/`, and installed them to the same
370      relative location.
371
372- [ ] I've implemented the public OEM APIs under `src/oem/${OEM_NAME}/`
373
374- [ ] I've implemented the OEM API tests under `tests/oem/${OEM_NAME}/`
375
376The `${OEM_NAME}` folder must be created with the name of the OEM/vendor in
377lower case.
378
379Finally, the OEM name must be added to the list of choices for the `oem` meson
380option, and the `meson.build` files updated throughout the tree to guard
381integration of the OEM extensions.
382
383## Stabilising an existing API
384
385- [ ] The API of interest is currently marked `LIBPLDM_ABI_TESTING`
386
387- [ ] My commit message links to a publicly visible patch that makes use of the
388      API
389
390- [ ] My commit updates the annotation from `LIBPLDM_ABI_TESTING` to
391      `LIBPLDM_ABI_STABLE` only for the function symbols demonstrated by the
392      patch linked in the commit message.
393
394- [ ] I've removed guards from the function's tests so they are always compiled
395
396- [ ] If I've updated the ABI dump, then I've used the OpenBMC CI container to
397      do so.
398
399## Updating an ABI dump
400
401To update the ABI dump you'll need to build an appropriate OpenBMC CI container
402image of your own. Some hints on how to do this locally can be found [in the
403openbmc/docs repository][openbmc-docs-local-ci]. You can list your locally built
404images with `docker images`.
405
406[openbmc-docs-local-ci]:
407  https://github.com/openbmc/docs/blob/master/testing/local-ci-build.md
408
409Assuming:
410
411```shell
412export OPENBMC_CI_IMAGE=openbmc/ubuntu-unit-test:2024-W21-ce361f95ff4fa669
413```
414
415the ABI dump can be updated with:
416
417```shell
418docker run \
419  --cap-add=sys_admin \
420  --rm=true \
421  --privileged=true \
422  -u $USER \
423  -w $(pwd) \
424  -v $(pwd):$(pwd) \
425  -e MAKEFLAGS= \
426  -t $OPENBMC_CI_IMAGE \
427  ./scripts/abi-dump-updater
428```
429
430## Removing an API
431
432- [ ] If the function is marked `LIBPLDM_ABI_TESTING`, then I have removed it
433
434- [ ] If the function is marked `LIBPLDM_ABI_STABLE`, then I have changed the
435      annotation to `LIBPLDM_ABI_DEPRECATED` and left it in-place.
436  - [ ] I have updated the ABI dump, or will mark the change as WIP until it has
437        been.
438
439- [ ] If the function is marked `LIBPLDM_ABI_DEPRECATED`, then I have removed it
440      only after satisfying myself that each of the following is true:
441  - [ ] There are no known users of the function left in the community
442  - [ ] There has been at least one tagged release of `libpldm` subsequent to
443        the API being marked deprecated
444
445## Renaming an API
446
447A change to an API is a pure rename only if there are no additional behavioural
448changes. Renaming an API with no other behavioural changes is really two
449actions:
450
4511. Introducing the new API name
4522. Deprecating the old API name
453
454- [ ] Only the name of the function has changed. None of its behaviour has
455      changed.
456
457- [ ] Both the new and the old functions are declared in the public headers
458
459- [ ] I've aliased the old function name to the new function name via the
460      `libpldm_deprecated_aliases` list in `meson.build`
461
462- [ ] I've added a [semantic patch][coccinelle] to migrate users from the old
463      name to the new name
464
465[coccinelle]: https://coccinelle.gitlabpages.inria.fr/website/
466
467- [ ] I've updated the ABI dump to capture the rename, or will mark the change
468      as WIP until it has been.
469
470## Fixing Implementation Defects
471
472- [ ] My change fixing the bug includes a [Fixes tag][linux-kernel-fixes-tag]
473      identifying the change introducing the defect.
474
475[linux-kernel-fixes-tag]:
476  https://docs.kernel.org/process/submitting-patches.html#describe-your-changes
477
478- [ ] My change fixing the bug includes test cases demonstrating that the bug is
479      fixed.
480