# uart-mux-support design

Author: Alexander Hansen <alexander.hansen@9elements.com>

Other contributors: Andrew Jeffery <andrew@codeconstruct.com.au> @arj, Jeremy
Kerr <jk@ozlabs.org>, Patrick Williams <patrick@stwcx.xyz>

Created: June 17, 2024

## Problem Description

Some hardware configurations feature a UART mux which can be switched via GPIOs.
To support this configuration, obmc-console needs to provide a method for
console selection to avoid manually setting GPIOs.

## Background and References

There are already [open changes for obmc-console][obmc-console-uart-mux-series]
but it has been determined that this feature needs a design document.

[obmc-console-uart-mux-series]:
  https://gerrit.openbmc.org/c/openbmc/obmc-console/+/71864

The background here is that there are some design choices which may affect other
subprojects - not in the way of causing regression, but later when the mentioned
hardware configuration needs to be supported in those projects.

## Requirements

- The user can select a console to be muxed

- Platform policy (whichever service implements it) can select the appropriate
  console depending on the host state and other information.

- It is clear to whoever is reading the logs of that console when a console was
  connected or disconnected via mux control. There should be no inexplicable
  gaps in log files.

- The mux configuration can be specified in a single file

- Console selection (implies mux control) must be possible from an external
  application.

The scope of this change is obmc-console and other projects which rely on the
APIs exposed by it.

The change will not affect users who do not have this hardware configuration.

## Design Considerations

There are a number of choices available for adding mux support into
obmc-console:

1. What the "connection endpoint" (Unix domain socket, D-Bus object) represents.
   This could be either:

   1. The TTY device exposed by Linux
   2. The desired downstream mux port

2. How the mux state is controlled. We might control it by any of:

   1. An out-of-band command (e.g. via a D-Bus method that's somehow associated
      with the connection endpoint)
   2. An in-band command (e.g. introducing an SSH-style escape-sequence)
   3. Selecting the mux port based on the endpoint to which the user has
      connected

3. The circumstances under which we allow the mux state to be changed

   1. Active connections prevent the mux state from being changed
   2. The mux state can always change but will terminate any existing
      conflicting connections
   3. The mux state can always change and has no impact on existing conflicting
      connections

4. Whether we want the data stream on a given connection to represent:
   1. The console IO regardless of the mux state
   2. The console IO isolated to a specific mux port

There are constraints on some combinations of these. For instance:

- If the connection endpoint represents the TTY device exposed by Linux (1.1)
  then we can't select the mux port based on the endpoint to which the user has
  connected (2.3) as we simply don't have the information required

- If the connection endpoint represents the desired downstream mux port (1.2)
  then it doesn't make sense to implement support for an in-band command to
  change the mux state (2.2) as it's a violation of the abstraction

- If the connection endpoint represents the desired downstream mux port (1.2)
  then it can't provide the console IO of another mux port (4.1) as that's
  contrary to the definition.

With these in mind we end up with the following table of design options:

| ID  | Connection Endpoint (1) | Mux Control Defined By (2) | Mux Control Policy (3)                           | Stream Data (4)   |
| --- | ----------------------- | -------------------------- | ------------------------------------------------ | ----------------- |
| A   | TTY (1.1)               | Out-of-band command (2.1)  | Active connections prevent mux change (3.1)      | Isolated (4.2)    |
| B   | TTY                     | Out-of-band command        | Mux change with disconnections (3.2)             | Isolated          |
| C   | TTY                     | Out-of-band command        | Mux change without disconnections (3.3)          | Multiplexed (4.1) |
| D   | TTY                     | In-band command (2.2)      | Mux change without disconnections                | Multiplexed       |
| E   | Mux port (1.2)          | Connection-based (2.3)     | Conflicting connections prevent mux change (3.1) | Isolated          |
| F   | Mux port                | Connection-based           | Mux change with disconnections                   | Isolated          |
| G   | Mux port                | Connection-based           | Mux change without disconnections                | Isolated          |
| H   | Mux port                | Out-of-band command        | Conflicting connections prevent mux change       | Isolated          |
| I   | Mux port                | Out-of-band command        | Mux change with disconnections                   | Isolated          |
| J   | Mux port                | Out-of-band command        | Mux change without disconnections                | Isolated          |

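As a cross-check, the table can be encoded and tested against the three
constraints listed above. The following Python sketch uses the numbering from
the list of choices; the tuple layout and the `violates` helper are illustrative
names for this document only, not part of obmc-console.

```python
# Each option: (endpoint, control, policy, stream), using the numbering above.
OPTIONS = {
    "A": ("1.1", "2.1", "3.1", "4.2"),
    "B": ("1.1", "2.1", "3.2", "4.2"),
    "C": ("1.1", "2.1", "3.3", "4.1"),
    "D": ("1.1", "2.2", "3.3", "4.1"),
    "E": ("1.2", "2.3", "3.1", "4.2"),
    "F": ("1.2", "2.3", "3.2", "4.2"),
    "G": ("1.2", "2.3", "3.3", "4.2"),
    "H": ("1.2", "2.1", "3.1", "4.2"),
    "I": ("1.2", "2.1", "3.2", "4.2"),
    "J": ("1.2", "2.1", "3.3", "4.2"),
}


def violates(endpoint, control, policy, stream):
    """Return True if a combination breaks one of the stated constraints."""
    # A TTY endpoint carries no information about which port the user wants.
    if endpoint == "1.1" and control == "2.3":
        return True
    # An in-band mux command violates the mux-port abstraction.
    if endpoint == "1.2" and control == "2.2":
        return True
    # A mux-port endpoint cannot carry another port's console IO.
    if endpoint == "1.2" and stream == "4.1":
        return True
    return False


# IDs of any rows that contradict a constraint (expected to be empty).
invalid = [name for name, row in OPTIONS.items() if violates(*row)]
```

The check only confirms that no row contradicts a stated constraint. It does not
attempt to derive the table from the constraints.
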
### Scenarios and Use Cases

1. A UART mux selecting between a satellite BMC on a blade and the blade host

   A software update is in progress on the satellite BMC and the mux has been
   switched to capture the output of whatever the satellite is printing. It is
   important to log the output of the update process to understand any failures
   that might result.

   While the satellite BMC update is in progress, a user chooses to connect to
   the host console.

2. A blade's satellite BMC, CPLD and host are all on separate ports of a UART
   mux, and relevant output from the blade's boot process must be captured

   The boot process for a blade requires a sequence of actions across its
   satellite BMC, CPLD and host. Each component contributes critical information
   about the boot process, which is output on the respective consoles at various
   points in time.

   For ease of correlation, their output should be logged together.

### Discussion

Scenario 1 is problematic. It highlights the fundamental concern of ownership of
the mux state. In the scenario the system is in a sensitive state where a
specific mux configuration is required (to output update progress from the
satellite BMC), but a user has shown intent for the selection of another (to
interact with the host console).

What should occur? And does this choice impact how we choose to control the mux?

Taking a connection-based approach to setting the mux state (2.3) will cause the
user connecting to the host console endpoint to immediately disrupt the update
progress output from the satellite BMC.

By contrast, by setting the mux state with an out-of-band command (2.1) and not
on the initiation of a connection (2.3), the user connecting to the host console
will not immediately disrupt the update progress output from the satellite BMC.

However, we can presume the user is connecting to the host console endpoint for
a reason. With extra actions, using the out-of-band command interface, they may
equally choose to switch the mux without regard for the system state, disrupting
the update progress output from the satellite BMC.

This highlights that the fundamental problem is access to the system by multiple
users who are coordinating neither with each other nor with the system state.
The question that follows is:

Should it be the responsibility of obmc-console to coordinate otherwise
un-coordinated users?

This is a question of policy: How those users should be coordinated will likely
look very different based on concerns such as the role of the platform in a
larger system, the roles and needs of the users interacting with it, and the
concrete design of the platform itself.

obmc-console should implement a mechanism to control the mux state, but likely
shouldn't apply any policy governing access to the muxed consoles.

A further concern for the out-of-band command approach is its interactions with
other components exposing consoles:

1. The dropbear/obmc-console-client integration exposing consoles via SSH
2. [bmcweb](https://github.com/openbmc/bmcweb/blob/master/include/obmc_console.hpp)
3. [phosphor-net-ipmid](https://github.com/openbmc/phosphor-net-ipmid/blob/master/sol/sol_manager.hpp)

With the out-of-band command approach these components have to choose between:

- Not providing any capability to change the mux state; rather, they defer to
  making the user log in via SSH to effect the change themselves

- Exposing some mechanism for setting the mux state in terms of their own
  external interfaces

- Assuming that a user connecting to the exposed console endpoint wants to
  select that console if it's behind a mux

The first assumes that SSH is exposed at all and accessible by users who need
access to the muxed consoles. It's not yet clear whether this is a reasonable
expectation.

The second assumes that these external interfaces have the capability to model
the problem. It's not yet clear that this is the case for either IPMI or
Redfish, and it's not the case for serial over SSH.

The third implies that we must add capability to all three components to drive
the out-of-band command interface when they receive a connection for a given
console. The net result is no behavioural difference from obmc-console
implementing this itself (2.3), but increased complexity across the system.

## Implementation Considerations

### How are muxed consoles represented on D-Bus?

Every console will have its own D-Bus name, as this is backwards-compatible with
the current implementation.

Multiple consoles can be represented as a split- or unified- object tree.

### Tradeoffs of unified vs split object tree on D-Bus

In split-tree, it is not clear which consoles all belong to one UART mux, but in
unified-tree, this is clear.

In unified-tree, one console is reachable via the D-Bus name of another,
effectively creating multiple ways of doing something.

Example:

```
busctl call xyz.openbmc_project.Console.host1 \
/xyz/openbmc_project/console/host2 \
xyz.openbmc_project.Console.Access Connect
```

So a choice has to be made about how to represent multiple consoles on D-Bus,
and what information needs to be exposed to other subprojects.

Unified Tree:

```
busctl tree --user xyz.openbmc_project.Console.host1
└─/xyz
  └─/xyz/openbmc_project
    └─/xyz/openbmc_project/console
      ├─/xyz/openbmc_project/console/host1
      └─/xyz/openbmc_project/console/host2
```

Split Tree:

```
busctl tree --user xyz.openbmc_project.Console.host1
└─/xyz
  └─/xyz/openbmc_project
    └─/xyz/openbmc_project/console
      └─/xyz/openbmc_project/console/host1

busctl tree --user xyz.openbmc_project.Console.host2
└─/xyz
  └─/xyz/openbmc_project
    └─/xyz/openbmc_project/console
      └─/xyz/openbmc_project/console/host2
```

The choice of representation impacts how the mux can be described on D-Bus,
which is necessary if the out-of-band command strategy (2.1) is chosen. Two
possibilities for exposing an out-of-band mux control on D-Bus are:

1. Implement an interface on each console object that defines a boolean `Active`
   property, and an `Activate()` method. The `Activate()` method, by nature of
   being implemented on the console object, has all the context it needs to
   switch the mux without requiring caller-supplied parameters. The `Active`
   property is `true` when the mux is configured for the console of interest,
   and `false` otherwise. A `PropertiesChanged` D-Bus signal for the `Active`
   property may alert local users to changes of mux state.

2. Implement a `Mux` interface on an object common to all consoles exposed by
   the mux. The `Mux` interface might have a writable string `Selected` property
   that represents the state of the mux and provides a mechanism to switch it to
   a given console.

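The difference in shape between the two possibilities can be illustrated with a
small Python model. This is only a sketch of the semantics, not D-Bus code: the
class names `MuxState`, `ConsoleObject` and `MuxObject` are hypothetical, and
the GPIO handling a real implementation needs is omitted. Note that under the
first approach a caller needs one object per console, while under the second a
single shared object is enough.

```python
class MuxState:
    """Stand-in for the GPIO-controlled mux; tracks the selected port only."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.selected = self.ports[0]


class ConsoleObject:
    """Approach 1: each console exposes `Active` and `Activate()`."""

    def __init__(self, name, mux):
        self.name = name
        self._mux = mux

    @property
    def active(self):
        # True only while the mux is configured for this console.
        return self._mux.selected == self.name

    def activate(self):
        # No parameters: the object already knows which port it represents.
        self._mux.selected = self.name


class MuxObject:
    """Approach 2: one shared object exposes a writable `Selected`."""

    def __init__(self, mux):
        self._mux = mux

    @property
    def selected(self):
        return self._mux.selected

    @selected.setter
    def selected(self, name):
        if name not in self._mux.ports:
            raise ValueError(f"unknown console: {name}")
        self._mux.selected = name
```
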
These have both been [discussed on an existing patch to
phosphor-dbus-interfaces][pdi-uart-mux-control-interface].

[pdi-uart-mux-control-interface]:
  https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/71878/comment/dd34b099_66dbc49e/

The second approach is quite explicit - directly representing the mux state
makes it easy to discover the state of the system. However, it motivates the
choice of a unified object tree to provide a common object path to host the
`Mux` interface (e.g. at `/xyz/openbmc_project/console`). This is desired to
avoid an alternative instance of the "multiple representations of one thing"
problem highlighted in the discussion of claiming multiple bus names for the
unified object tree: If the tree isn't unified, this `Mux` interface would have
to be represented and synchronised on objects across multiple D-Bus connections.

The first approach doesn't have this limitation. However, it does have the
trade-off previously mentioned, that it's unclear how any of the consoles in the
system are related, and what the impact might be of activating any one of them.

Choosing a strategy for D-Bus representation is required if we add to the D-Bus
API, i.e. with the out-of-band command design point (2.1). However, the choice
becomes more of an implementation detail if either of design options 2.2 or 2.3
is selected. The choice in those cases is instead motivated by the level of
clarity we desire in describing the relationships between consoles.

## Pruning the Design Decision Tree

To help shape the choices here, we have the existing behaviours of obmc-console
[discussed on the PDI patch][pdi-uart-mux-control-interface]:

1. We already have support for concurrent console server instances

2. Concurrent console support is implemented as one obmc-console-server process
   per Linux TTY device

3. As each Linux TTY device is paired with its obmc-console-server process, each
   obmc-console-server D-Bus connection needs a unique name

4. We use the unique console-ids to name global resources, including both the
   D-Bus connection and the instance's Unix domain socket.

As in the linked discussion, given the `console-id` value really represents
what's at the remote end of the BMC's TTY device for regular unmuxed consoles,
it stands to reason that we should continue this strategy for muxed consoles.
Taking this approach avoids adding a new endpoint ABI to obmc-console and
eliminates design options A-D inclusive.

Further, on the basis of frustrating behaviour in the face of lingering network
connections, preventing mux changes on the grounds of an existing connection
(3.1) seems like a bad path forward. This eliminates design options `E` and `H`.

This leaves us with design options `F`, `G`, `I`, and `J`, which are
differentiated by how the mux is switched, and its effect on already-connected
clients.

Concentrating on how the mux is switched, based on the discussion about the
D-Bus representation above, the discussion on the PDI patch, and the impact on
related applications, it's reasonable to say there are some complications with
the out-of-band command method (2.1).

By contrast we can consider the alternative: We make the mux state reflect the
endpoint of the most recent connection. This has the benefit of functioning for
both the Unix domain socket and D-Bus access with no further effort. Neither
bmcweb nor phosphor-net-ipmid need be patched. The choice also eliminates the
D-Bus complications mentioned above as there's no need for the additional D-Bus
interface.

This reasoning leaves us the choice of design options `F` and `G`.

`F` and `G` are differentiated by whether or not we drop connections on
endpoints that are not the endpoint selected by the mux. There's been some back
and forth on that subject elsewhere[[1][drop-connections-discussion-1]]
[[2][drop-connections-discussion-2]], but it seems that not disconnecting
clients is effectively a worse implementation of design option `C`, which we've
already eliminated. It's worse than `C` because instead of 1 connection we could
have `N` connections for `N` mux ports, `(N - 1)` of which are idle. Not only
that, but the `(N - 1)` connections are effectively zombies, as they have no way
to switch the mux back to their associated port without establishing yet another
connection. It follows that if we're establishing a subsequent connection in
order to switch the mux we may as well disconnect the existing session, in which
case it may as well have been disconnected when the mux switched away to begin
with[^1].

[drop-connections-discussion-1]:
  https://gerrit.openbmc.org/c/openbmc/obmc-console/+/71228/comment/62a5fce9_60c3ad3e/
[drop-connections-discussion-2]:
  https://gerrit.openbmc.org/c/openbmc/obmc-console/+/71867/comment/756f0abe_5ebe8d66/

[^1]: which also saves resources

These arguments combined eliminate all but option `F`. It seems to sit at a neat
nexus in terms of existing ABI, desired behaviour, and implementation
complexity.

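The sequence of eliminations in this section can be replayed as a series of
filters over the design option table. The Python sketch below is a summary aid
only; the shorthand labels (`tty`, `port`, `block`, `disconnect`, `keep`) and
the `prune` helper are names invented for this illustration. After the four
filters are applied in order, only option `F` is expected to remain.

```python
# (endpoint, control, policy) per option; labels are shorthand for this sketch.
OPTIONS = {
    "A": ("tty", "out-of-band", "block"),
    "B": ("tty", "out-of-band", "disconnect"),
    "C": ("tty", "out-of-band", "keep"),
    "D": ("tty", "in-band", "keep"),
    "E": ("port", "connection", "block"),
    "F": ("port", "connection", "disconnect"),
    "G": ("port", "connection", "keep"),
    "H": ("port", "out-of-band", "block"),
    "I": ("port", "out-of-band", "disconnect"),
    "J": ("port", "out-of-band", "keep"),
}

# Each filter keeps the options that survive one argument from this section.
FILTERS = [
    ("endpoint is the mux port", lambda e, c, p: e == "port"),
    ("connections never block a mux change", lambda e, c, p: p != "block"),
    ("no out-of-band command required", lambda e, c, p: c == "connection"),
    ("conflicting clients are disconnected", lambda e, c, p: p == "disconnect"),
]


def prune(options):
    """Return the surviving option IDs after each filter is applied in order."""
    remaining = dict(options)
    history = []
    for _reason, keep in FILTERS:
        remaining = {k: v for k, v in remaining.items() if keep(*v)}
        history.append(sorted(remaining))
    return history


history = prune(OPTIONS)
```
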
Addendum: Discussions so far have been around a _minimal_ design that
achieves the desired console behaviour. It's worth noting that design option `F`
(connection-based mux control which disconnects conflicting clients) allows us
to _optionally_ implement an out-of-band command interface in addition, because
the observable behaviour is no different to a new connection being accepted:
conflicting clients are disconnected and the mux is switched. This may be
helpful to implement platform policy around logging.

## Proposed Design

It's proposed that we use one obmc-console-server process to expose the `N`
consoles connected to a UART mux, where each console represents one mux port.
The mux is switched based on the endpoint of the most recent client connection,
and any conflicting clients are disconnected. This is design option `F` in the
table above.

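The behaviour of design option `F` can be modelled in a few lines of Python.
This is a minimal sketch under stated assumptions, not the obmc-console
implementation (which is written in C): the `MuxedConsoleServer` class, its
attributes and the string client handles are hypothetical, and driving the mux
GPIOs is reduced to updating `selected`.

```python
class MuxedConsoleServer:
    """Toy model: one server process, N console endpoints behind one mux."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.selected = self.ports[0]
        # Connected clients, tracked per console endpoint.
        self.clients = {port: [] for port in self.ports}
        # Clients dropped by mux switches, kept for inspection.
        self.disconnected = []

    def connect(self, port, client):
        """Accept a client; the most recent connection decides the mux state."""
        if port not in self.clients:
            raise KeyError(port)
        if port != self.selected:
            # Drop clients of every other (now conflicting) endpoint.
            for other in self.ports:
                if other != port:
                    self.disconnected.extend(self.clients[other])
                    self.clients[other] = []
            # A real server would set the mux GPIOs at this point.
            self.selected = port
        self.clients[port].append(client)
```
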
The internal data structures of obmc-console will change to accommodate the
design.

We will use one config file for the `N` muxed consoles. The configuration will
provide a similar approach for specifying the mux GPIOs to that used by [the
i2c-mux-gpio devicetree binding][linux-i2c-mux-gpio].

[linux-i2c-mux-gpio]:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/i2c/i2c-mux-gpio.yaml?h=v6.9#n12

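One way such a configuration could translate a mux port index into GPIO levels
is a plain binary encoding. The sketch below assumes the first listed GPIO
carries the least significant bit of the index; both that bit order and the
`mux_gpio_values` helper are assumptions made for illustration, not a statement
of what obmc-console or the binding guarantees.

```python
def mux_gpio_values(index, num_gpios):
    """Return the level (0 or 1) for each mux GPIO, first GPIO first.

    Assumes the first GPIO is the least significant bit of the port index.
    """
    if not 0 <= index < (1 << num_gpios):
        raise ValueError(f"index {index} does not fit in {num_gpios} GPIO(s)")
    return [(index >> bit) & 1 for bit in range(num_gpios)]
```
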
Below is a block diagram of the relationships between the software and hardware
components:

```
                                          +--------------------+
                                          | server.conf        |
                                          +--------------------+
                                               |
                                               |
                                               |
                                               |
                                          +----+----+                                 +-----+     +-------+
                                          |         |                                 |     |     |       |
                                          |         |     +-------+     +-------+     |     +-----+ UART1 |
+-----------------------------------+     |         |     |       |     |       |     |     |     |       |
| xyz.openbmc_project.Console.host1 +-----+         +-----+ ttyS0 +-----+ UART0 +-----+     |     +-------+
+-----------------------------------+     |         |     |       |     |       |     |     |
                                          |  obmc   |     +-------+     +-------+     |     |
                                          | console |                                 | MUX |
                                          | server  |                   +-------+     |     |
+-----------------------------------+     |         |                   |       |     |     |
| xyz.openbmc_project.Console.host2 +-----+         +-------------------+ GPIO  +-----+     |     +-------+
+-----------------------------------+     |         |                   |       |     |     |     |       |
                                          |         |                   +-------+     |     +-----+ UART2 |
                                          |         |                                 |     |     |       |
                                          +----+----+                                 +-----+     +-------+

```

To inform people who may be reading log files for a console, connection and
disconnection events of a console via mux control will produce messages for
clients and in log files.

Requirements are:

- Making it clear this message is from obmc-console
- Timestamp
- Indication of connected/disconnected

These messages are not meant as an API or reliable means to get information
about mux state. Any application on the other side of the UART could also
produce the exact same messages, even if unlikely.

The initial format of these messages will be something like:

```
[obmc-console] %Y-%m-%d %H:%M:%S UTC CONNECTED
[obmc-console] %Y-%m-%d %H:%M:%S UTC DISCONNECTED
```

for the connect and disconnect cases respectively.

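For illustration, a message in this format could be produced as in the Python
sketch below. The `mux_event_message` helper is a hypothetical name, and since
the format above is described as initial, the exact output should not be relied
upon.

```python
from datetime import datetime, timezone


def mux_event_message(connected, now=None):
    """Format a mux connect/disconnect notice with a UTC timestamp."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Normalise to UTC so the literal "UTC" suffix is always accurate.
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    state = "CONNECTED" if connected else "DISCONNECTED"
    return f"[obmc-console] {stamp} UTC {state}"
```
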
For the D-Bus representation we choose the unified tree.

## Other Alternatives Considered

### Kernel implementation

A kernel implementation was not pursued since the support can be implemented in
userspace. It also may not be merged, since the hardware configuration it
supports may not be widely available. It may be better to have a userspace
implementation to refer back to in case someone wants to do a kernel
implementation later.

### Multiple obmc-console-server processes for the multiple consoles

This was considered and implemented as a PoC, but discarded later as it would be
easier to synchronize everything in a single process.

### Multiple configuration files for multiple consoles

This was considered but it would duplicate configuration, like the definition of
the mux GPIOs. Inconsistencies across the files would also need to be managed.

## Impacts

### API Impact

### Performance Impact

Minimal to none.

### Developer Impact

Minimal. Existing users do not need to change anything about their
configuration.

### Organizational

- Does this proposal require a new repository? No
- Who will be the initial maintainer(s) of this repository?
- Which repositories are expected to be modified to execute this design?
  obmc-console, docs
- Make a list, and add listed repository maintainers to the gerrit review.

## Testing

There are already integration tests for this feature available on gerrit.

489