# uart-mux-support design

Author: Alexander Hansen <alexander.hansen@9elements.com>

Other contributors: Andrew Jeffery <andrew@codeconstruct.com.au> @arj, Jeremy
Kerr <jk@ozlabs.org>, Patrick Williams <patrick@stwcx.xyz>

Created: June 17, 2024

## Problem Description

Some hardware configurations feature a UART mux which can be switched via GPIOs.
To support this configuration, obmc-console needs to provide a method for
console selection to avoid manually setting GPIOs.

## Background and References

There are already [open changes for obmc-console][obmc-console-uart-mux-series]
but it has been determined that this feature needs a design document.

[obmc-console-uart-mux-series]:
  https://gerrit.openbmc.org/c/openbmc/obmc-console/+/71864

The background here is that there are some design choices which may affect other
subprojects - not in the way of causing regression, but later when the mentioned
hardware configuration needs to be supported in those projects.

## Requirements

- The user can select a console to be muxed

- Platform policy (whichever service implements it) can select the appropriate
  console depending on the host state and other information.

- It is clear to whoever is reading the logs of that console when a console was
  connected or disconnected via mux control. There should be no inexplicable
  gaps in log files.

- The mux configuration can be specified in a single file

- Console selection (implies mux control) must be possible from an external
  application.

The scope of this change is obmc-console and other projects which rely on the
APIs exposed by it.

The change will not affect users who do not have this hardware configuration.

## Design Considerations

There are a number of choices available for adding mux support into
obmc-console:

1. What the "connection endpoint" (Unix domain socket, D-Bus object) represents.
   This could be either:

   1. The TTY device exposed by Linux
   2. The desired downstream mux port

2. How the mux state is controlled. We might control it by any of:

   1. An out-of-band command (e.g. via a D-Bus method that's somehow associated
      with the connection endpoint)
   2. An in-band command (e.g. introducing an SSH-style escape-sequence)
   3. Selecting the mux port based on the endpoint to which the user has
      connected

3. The circumstances under which we allow the mux state to be changed

   1. Active connections prevent the mux state from being changed
   2. The mux state can always change but will terminate any existing
      conflicting connections
   3. The mux state can always change and has no impact on existing conflicting
      connections

4. Whether we want the data stream on a given connection to represent:
   1. The console IO regardless of the mux state
   2. The console IO isolated to a specific mux port

There are constraints on some combinations of these. For instance:

- If the connection endpoint represents the TTY device exposed by Linux (1.1)
  then we can't select the mux port based on the endpoint to which the user has
  connected (2.3) as we simply don't have the information required

- If the connection endpoint represents the desired downstream mux port (1.2)
  then it doesn't make sense to implement support for an in-band command to
  change the mux state (2.2) as it's a violation of the abstraction

- If the connection endpoint represents the desired downstream mux port (1.2)
  then it can't provide the console IO of another mux port (4.1) as that's
  contrary to the definition.

With these in mind we end up with the following table of design options:

| ID  | Connection Endpoint (1) | Mux Control Defined By (2) | Mux Control Policy (3)                           | Stream Data (4)   |
| --- | ----------------------- | -------------------------- | ------------------------------------------------ | ----------------- |
| A   | TTY (1.1)               | Out-of-band command (2.1)  | Active connections prevent mux change (3.1)      | Isolated (4.2)    |
| B   | TTY                     | Out-of-band command        | Mux change with disconnections (3.2)             | Isolated          |
| C   | TTY                     | Out-of-band command        | Mux change without disconnections (3.3)          | Multiplexed (4.1) |
| D   | TTY                     | In-band command (2.2)      | Mux change without disconnections                | Multiplexed       |
| E   | Mux port (1.2)          | Connection-based (2.3)     | Conflicting connections prevent mux change (3.1) | Isolated          |
| F   | Mux port                | Connection-based           | Mux change with disconnections                   | Isolated          |
| G   | Mux port                | Connection-based           | Mux change without disconnections                | Isolated          |
| H   | Mux port                | Out-of-band command        | Conflicting connections prevent mux change       | Isolated          |
| I   | Mux port                | Out-of-band command        | Mux change with disconnections                   | Isolated          |
| J   | Mux port                | Out-of-band command        | Mux change without disconnections                | Isolated          |

### Scenarios and Use Cases

1. A UART mux selecting between a satellite BMC on a blade and the blade host

   A software update is in progress on the satellite BMC and the mux has been
   switched to capture the output of whatever the satellite is printing. It is
   important to log the output of the update process to understand any failures
   that might result.

   While the satellite BMC update is in progress, a user chooses to connect to
   the host console.

2. A blade's satellite BMC, CPLD and host are all on separate ports of a UART
   mux, and relevant output from the blade's boot process must be captured

   The boot process for a blade requires a sequence of actions across its
   satellite BMC, CPLD and host. Each component contributes critical information
   about the boot process, which is output on the respective consoles at various
   points in time.

   For ease of correlation, their output should be logged together.

### Discussion

Scenario 1 is problematic. It highlights the fundamental concern of ownership of
the mux state. In the scenario the system is in a sensitive state where a
specific mux configuration is required (to output update progress from the
satellite BMC), but a user has shown intent for the selection of another (to
interact with the host console).

What should occur? And does this choice impact how we choose to control the mux?

Taking a connection-based approach to setting the mux state (2.3) will cause the
user connecting to the host console endpoint to immediately disrupt the update
progress output from the satellite BMC.

By contrast, by setting the mux state with an out-of-band command (2.1) and not
on the initiation of a connection (2.3), the user connecting to the host console
will not immediately disrupt the update progress output from the satellite BMC.

However, we can presume the user is connecting to the host console endpoint for
a reason. With extra actions, using the out-of-band command interface, they may
equally choose to switch the mux without regard for the system state, disrupting
the update progress output from the satellite BMC.

This highlights that the fundamental problem is access to the system by multiple
users who are neither coordinating with each other nor the system state. The
question that follows is:

Should it be the responsibility of obmc-console to coordinate otherwise
un-coordinated users?

This is a question of policy: How those users should be coordinated will likely
look very different based on concerns such as the role of the platform in a
larger system, the roles and needs of the users interacting with it, and the
concrete design of the platform itself.

obmc-console should implement a mechanism to control the mux state, but likely
shouldn't apply any policy governing access to the muxed consoles.

A further concern for the out-of-band command approach is its interactions with
other components exposing consoles:

1. The dropbear/obmc-console-client integration exposing consoles via SSH
2. [bmcweb](https://github.com/openbmc/bmcweb/blob/master/include/obmc_console.hpp)
3. [phosphor-net-ipmid](https://github.com/openbmc/phosphor-net-ipmid/blob/master/sol/sol_manager.hpp)

With the out-of-band command approach these components have to choose between:

- Not providing any capability to change the mux state; rather, they defer to
  making the user log in via SSH to affect the change themselves

- Expose some mechanism for setting the mux state in terms of their own external
  interfaces

- Assume that a user connecting to the exposed console endpoint wants to select
  that console if it's behind a mux

The first assumes that SSH is exposed at all and accessible by users who need
access to the muxed consoles. It's not yet clear whether this is a reasonable
expectation.

The second assumes that these external interfaces have the capability to model
the problem. It's not yet clear that this is the case for either of IPMI or
Redfish, and it's not the case for serial over SSH.

The third implies that we must add capability to all three components to drive
the out-of-band command interface when they receive a connection for a given
console. The net result is no behavioural difference from obmc-console
implementing this itself (2.3), but increased complexity across the system.

## Implementation Considerations

### How are muxed consoles represented on D-Bus?

Every console will have its own D-Bus name, as this is backwards-compatible with
the current implementation.

Multiple consoles can be represented as a split- or unified- object tree.

### Tradeoffs of unified vs split object tree on D-Bus

In split-tree, it is not clear which consoles all belong to one UART mux, but in
unified-tree, this is clear.

In unified-tree, one console is reachable via the D-Bus name of another,
effectively creating multiple ways of doing something.

Example:

```
busctl set-property xyz.openbmc_project.Console.host1 \
/xyz/openbmc_project/console/host2 \
xyz.openbmc_project.Console.Access Connect ""
```

So a choice has to be made how to represent multiple consoles on dbus, and what
information needs to be exposed to other subprojects.

Unified Tree:

```
busctl tree --user xyz.openbmc_project.Console.host1
└─/xyz
  └─/xyz/openbmc_project
    └─/xyz/openbmc_project/console
      ├─/xyz/openbmc_project/console/host1
      └─/xyz/openbmc_project/console/host2
```

Split Tree:

```
busctl tree --user xyz.openbmc_project.Console.host1
└─/xyz
  └─/xyz/openbmc_project
    └─/xyz/openbmc_project/console
      └─/xyz/openbmc_project/console/host1

busctl tree --user xyz.openbmc_project.Console.host2
└─/xyz
  └─/xyz/openbmc_project
    └─/xyz/openbmc_project/console
      └─/xyz/openbmc_project/console/host2
```

The choice of representation impacts how the mux can be described on D-Bus,
which is necessary if the out-of-band command strategy (2.1) is chosen. Two
possibilities for exposing an out-of-band mux control on D-Bus are:

1. Implement an interface on each console object that defines a boolean `Active`
   property, and an `Activate()` method. The `Activate()` method, by nature of
   being implemented on the console object, has all the context it needs to
   switch the mux without requiring caller-supplied parameters. The `Activate`
   property is `true` when the mux is configured for the console of interest,
   and `false` otherwise. A `PropertiesChanged` D-Bus signal for the `Active`
   variable may alert local users to changes of mux state.

2. Implement a `Mux` interface on an object common to all consoles exposed by
   the mux. The `Mux` interface might have a writable string `Selected` property
   that represents the state of the mux and provides a mechanism to switch it to
   a given console.

These have both been [discussed on an existing patch to
phosphor-dbus-interfaces][pdi-uart-mux-control-interface].

[pdi-uart-mux-control-interface]:
  https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/71878/comment/dd34b099_66dbc49e/

The second approach is quite explicit - directly representing the mux state
makes it easy to discover the state of the system. However, it motivates the
choice of a unified object tree to provide a common object path to host the
`Mux` interface (e.g. at `/xyz/openbmc_project/console`). This is desired to
avoid an alternative instance of the "multiple representations of one thing"
problem highlighted in the discussion of claiming multiple bus names for the
unified object tree: If the tree isn't unified, this `Mux` interface would have
to be represented and synchronised on objects across multiple D-Bus connections.

The first approach doesn't have this limitation. However, it does have the
trade-off previously mentioned, that it's unclear how any of the consoles in the
system are related, and what the impact might be of activating any one of them.

Choosing a strategy for D-Bus representation is required if we add to the D-Bus
API, i.e. with the out-of-band command design point (2.1). However, the choice
becomes more of an implementation detail if either of design options 2.2 or 2.3
are selected. The choice in those cases is instead motivated by the level of
clarity we desire in describing the relationships between consoles.

## Pruning the Design Decision Tree

To help shape the choices here, we have the existing behaviours of obmc-console
[discussed on the PDI patch][pdi-uart-mux-control-interface]:

1. We already have support for concurrent console server instances

2. Concurrent console support is implemented as one obmc-console-server process
   per Linux TTY device

3. As each Linux TTY device is paired with its obmc-console-server process, each
   obmc-console-server D-Bus connection needs a unique name

4. We use the unique console-ids to name global resources, including both the
   D-Bus connection and the instance's unix domain socket.

As in the linked discussion, given the `console-id` value really represents
what's at the remote end of the BMC's TTY device for regular unmuxed consoles,
it stands to reason that we should continue this strategy for muxed consoles.
Taking this approach avoids adding a new endpoint ABI to obmc-console and
eliminates design options A-D inclusive.

Further, on the basis of frustrating behaviour in the face of lingering network
connections, preventing mux changes on the grounds of an existing connection
seems like a bad path forward.

This leaves us with design options `F`, `G`, `I`, and `J`, which are
differentiated by how the mux is switched, and its effect on already-connected
clients.

Concentrating on how the mux is switched, based on the discussion about the
D-Bus representation above, the discussion on the PDI patch, and the impact on
related applications, it's reasonable to say there are some complications with
the out-of-band command method (2.1).

By contrast we can consider the alternative: We make the mux state reflect the
endpoint of the most recent connection. This has the benefit of functioning for
both the Unix domain socket and D-Bus access with no further effort. Neither
bmcweb nor phosphor-net-ipmid need be patched. The choice also eliminates the
D-Bus complications mentioned above as there's no need for the additional D-Bus
interface.

This reasoning leaves us the choice of design options `F` and `G`.

`F` and `G` are differentiated by whether or not we drop connections on
endpoints that are not the endpoint selected by the mux. There's been some back
and forth on that subject elsewhere[[1][drop-connections-discussion-1]]
[[2][drop-connections-discussion-2]], but it seems that not disconnecting
clients is effectively a worse implementation of design option `C`, which we've
already eliminated. It's worse than `C` because instead of 1 connection we could
have `N` connections for `N` mux ports, `(N - 1)` of which are idle. Not only
that, but the `(N - 1)` connections are effectively zombies, as they have no way
to switch the mux back to their associated port without establishing yet another
connection. It follows that if we're establishing a subsequent connection in
order to switch the mux we may as well disconnect the existing session, in which
case it may as well have been disconnected when the mux switched away to begin
with[^1].

[drop-connections-discussion-1]:
  https://gerrit.openbmc.org/c/openbmc/obmc-console/+/71228/comment/62a5fce9_60c3ad3e/
[drop-connections-discussion-2]:
  https://gerrit.openbmc.org/c/openbmc/obmc-console/+/71867/comment/756f0abe_5ebe8d66/

[^1]: which also saves resources

These arguments combined eliminate all but option `F`. It seems to sit at a neat
nexus in terms of both existing ABI, desired behaviour, and implementation
complexity.

Addendum: Discussions so far have been are around a _minimal_ design that
achieves the desired console behaviour. It's worth noting that design option `F`
(connection-based mux control which disconnects conflicting clients) allows us
to _optionally_ implement an out-of-band command interface in addition, because
the observable behaviour is no different to a new connection being accepted:
conflicting clients are disconnected and the mux is switched. This may be
helpful to implement platform policy around logging.

## Proposed Design

It's proposed that we use one obmc-console-server process to expose the `N`
consoles connected to a UART mux, where each console represents one mux port.
The mux is switched based on the endpoint of the most recent client connection,
and any conflicting clients are disconnected. This is design option `F` in the
table above.

The internal datastructures of obmc-console will change to accomodate the
design.

We will use one config file for the `N` muxed consoles. The configuration will
provide a similar approach for specifying the mux GPIOs to that used by [the
i2c-mux-gpio devicetree binding][linux-i2c-mux-gpio].

[linux-i2c-mux-gpio]:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/i2c/i2c-mux-gpio.yaml?h=v6.9#n12

Below is a block diagram of the relationships between the software and hardware
components:

```
                                          +--------------------+
                                          | server.conf        |
                                          +--------------------+
                                               |
                                               |
                                               |
                                               |
                                          +----+----+                                 +-----+     +-------+
                                          |         |                                 |     |     |       |
                                          |         |     +-------+     +-------+     |     +-----+ UART1 |
+-----------------------------------+     |         |     |       |     |       |     |     |     |       |
| xyz.openbmc_project.Console.host1 +-----+         +-----+ ttyS0 +-----+ UART0 +-----+     |     +-------+
+-----------------------------------+     |         |     |       |     |       |     |     |
                                          |  obmc   |     +-------+     +-------+     |     |
                                          | console |                                 | MUX |
                                          | server  |                   +-------+     |     |
+-----------------------------------+     |         |                   |       |     |     |
| xyz.openbmc_project.Console.host2 +-----+         +-------------------+ GPIO  +-----+     |     +-------+
+-----------------------------------+     |         |                   |       |     |     |     |       |
                                          |         |                   +-------+     |     +-----+ UART2 |
                                          |         |                                 |     |     |       |
                                          +----+----+                                 +-----+     +-------+

```

To inform people who may be reading log files for a console, connection and
disconnection events of a console via mux control will produce messages for
clients and in log files.

Requirements are:

- Making it clear this message is from obmc-console
- Timestamp
- Indication of connected/disconnected

These messages are not meant as an API or reliable means to get information
about mux state. Any application on the other side of the uart could also
produce the exact same messages, even if unlikely.

The initial format of these messages will be something like:

```
[obmc-console] %Y-%m-%d %H:%M:%S UTC CONNECTED
[obmc-console] %Y-%m-%d %H:%M:%S UTC DISCONNECTED
```

for the connect and disconnect case.

For the D-Bus representation we choose the unified tree.

## Other Alternatives Considered

### Kernel implementation

Did not do that since the support can be implemented in userspace. Also it may
not be merged since the hardware configuration it supports may not be widely
available. It may be better to have a userspace implementation to refer back to
in case someone wants to do a kernel implementation later.

### Multiple obmc-console-server processes for the multiple consoles

This was considered and implemented is a PoC, but discarded later as it would be
easier to synchronize everything in a single process.

### Multiple configuration files for multiple consoles

This was considered but it would duplicate configuration, like the definition of
the mux GPIOs. Inconsistencies across the files would also need to be managed.

## Impacts

### API Impact

### Performance Impact

Minimal to none.

### Developer Impact

Minimal. Existing users do not need to change anything about their
configuration.

### Organizational

- Does this repository require a new repository? No
- Who will be the initial maintainer(s) of this repository?
- Which repositories are expected to be modified to execute this design?
  obmc-console, docs
- Make a list, and add listed repository maintainers to the gerrit review.

## Testing

There are already integration tests for this feature available on gerrit.