# uart-mux-support design

Author: Alexander Hansen <alexander.hansen@9elements.com>

Other contributors: Andrew Jeffery <andrew@codeconstruct.com.au> @arj, Jeremy
Kerr <jk@ozlabs.org>, Patrick Williams <patrick@stwcx.xyz>

Created: June 17, 2024

## Problem Description

Some hardware configurations feature a UART mux which can be switched via GPIOs.
To support this configuration, obmc-console needs to provide a method for
console selection to avoid manually setting GPIOs.

## Background and References

There are already [open changes for obmc-console][obmc-console-uart-mux-series]
but it has been determined that this feature needs a design document.

[obmc-console-uart-mux-series]:
  https://gerrit.openbmc.org/c/openbmc/obmc-console/+/71864

The background here is that there are some design choices which may affect other
subprojects - not in the way of causing regression, but later when the mentioned
hardware configuration needs to be supported in those projects.

## Requirements

- The user can select a console to be muxed

- Platform policy (whichever service implements it) can select the appropriate
  console depending on the host state and other information.

- It is clear to whoever is reading the logs of that console when a console was
  connected or disconnected via mux control. There should be no inexplicable
  gaps in log files.

- The mux configuration can be specified in a single file

- Console selection (implies mux control) must be possible from an external
  application.

The scope of this change is obmc-console and other projects which rely on the
APIs exposed by it.

The change will not affect users who do not have this hardware configuration.

## Design Considerations

There are a number of choices available for adding mux support into
obmc-console:

1. What the "connection endpoint" (Unix domain socket, D-Bus object) represents.
   This could be either:

   1. The TTY device exposed by Linux
   2. The desired downstream mux port

2. How the mux state is controlled. We might control it by any of:

   1. An out-of-band command (e.g. via a D-Bus method that's somehow associated
      with the connection endpoint)
   2. An in-band command (e.g. introducing an SSH-style escape-sequence)
   3. Selecting the mux port based on the endpoint to which the user has
      connected

3. The circumstances under which we allow the mux state to be changed

   1. Active connections prevent the mux state from being changed
   2. The mux state can always change but will terminate any existing
      conflicting connections
   3. The mux state can always change and has no impact on existing conflicting
      connections

4. Whether we want the data stream on a given connection to represent:
   1. The console IO regardless of the mux state
   2. The console IO isolated to a specific mux port

There are constraints on some combinations of these. For instance:

- If the connection endpoint represents the TTY device exposed by Linux (1.1)
  then we can't select the mux port based on the endpoint to which the user has
  connected (2.3) as we simply don't have the information required

- If the connection endpoint represents the desired downstream mux port (1.2)
  then it doesn't make sense to implement support for an in-band command to
  change the mux state (2.2) as it's a violation of the abstraction

- If the connection endpoint represents the desired downstream mux port (1.2)
  then it can't provide the console IO of another mux port (4.1) as that's
  contrary to the definition.
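
The effect of these constraints on the design space can be checked mechanically.
The following is a minimal sketch, not part of obmc-console: the identifiers
simply reuse the choice numbers from the list above, and only the three example
constraints are encoded, so it keeps more TTY-endpoint combinations than a fully
pruned design table would. For mux-port endpoints (1.2) it leaves exactly six
combinations, which lines up with the six mux-port rows (E to J) in the table
that follows.

```python
from itertools import product

# Choice numbers taken from the list above.
ENDPOINTS = ("1.1", "1.2")        # TTY device / downstream mux port
CONTROLS = ("2.1", "2.2", "2.3")  # out-of-band / in-band / connection-based
POLICIES = ("3.1", "3.2", "3.3")  # block change / disconnect / no impact
STREAMS = ("4.1", "4.2")          # multiplexed / isolated


def is_allowed(endpoint, control, policy, stream):
    """Apply only the three example constraints stated in the text."""
    if endpoint == "1.1" and control == "2.3":
        return False  # a TTY endpoint carries no mux port information
    if endpoint == "1.2" and control == "2.2":
        return False  # in-band switching violates the mux-port abstraction
    if endpoint == "1.2" and stream == "4.1":
        return False  # a mux-port endpoint cannot carry another port's IO
    return True


def surviving_combinations():
    """Return every (endpoint, control, policy, stream) tuple still allowed."""
    space = product(ENDPOINTS, CONTROLS, POLICIES, STREAMS)
    return [combo for combo in space if is_allowed(*combo)]


if __name__ == "__main__":
    for combo in surviving_combinations():
        print(" / ".join(combo))
```

Running the script prints one line per remaining combination, which can be
compared against the design options by hand.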

With these in mind we end up with the following table of design options:

| ID  | Connection Endpoint (1) | Mux Control Defined By (2) | Mux Control Policy (3)                           | Stream Data (4)   |
| --- | ----------------------- | -------------------------- | ------------------------------------------------ | ----------------- |
| A   | TTY (1.1)               | Out-of-band command (2.1)  | Active connections prevent mux change (3.1)      | Isolated (4.2)    |
| B   | TTY                     | Out-of-band command        | Mux change with disconnections (3.2)             | Isolated          |
| C   | TTY                     | Out-of-band command        | Mux change without disconnections (3.3)          | Multiplexed (4.1) |
| D   | TTY                     | In-band command (2.2)      | Mux change without disconnections                | Multiplexed       |
| E   | Mux port (1.2)          | Connection-based (2.3)     | Conflicting connections prevent mux change (3.1) | Isolated          |
| F   | Mux port                | Connection-based           | Mux change with disconnections                   | Isolated          |
| G   | Mux port                | Connection-based           | Mux change without disconnections                | Isolated          |
| H   | Mux port                | Out-of-band command        | Conflicting connections prevent mux change       | Isolated          |
| I   | Mux port                | Out-of-band command        | Mux change with disconnections                   | Isolated          |
| J   | Mux port                | Out-of-band command        | Mux change without disconnections                | Isolated          |

### Scenarios and Use Cases

1. A UART mux selecting between a satellite BMC on a blade and the blade host

   A software update is in progress on the satellite BMC and the mux has been
   switched to capture the output of whatever the satellite is printing. It is
   important to log the output of the update process to understand any failures
   that might result.

   While the satellite BMC update is in progress, a user chooses to connect to
   the host console.

2. A blade's satellite BMC, CPLD and host are all on separate ports of a UART
   mux, and relevant output from the blade's boot process must be captured

   The boot process for a blade requires a sequence of actions across its
   satellite BMC, CPLD and host. Each component contributes critical information
   about the boot process, which is output on the respective consoles at various
   points in time.

   For ease of correlation, their output should be logged together.

### Discussion

Scenario 1 is problematic. It highlights the fundamental concern of ownership of
the mux state. In the scenario the system is in a sensitive state where a
specific mux configuration is required (to output update progress from the
satellite BMC), but a user has shown intent for the selection of another (to
interact with the host console).

What should occur? And does this choice impact how we choose to control the mux?

Taking a connection-based approach to setting the mux state (2.3) will cause the
user connecting to the host console endpoint to immediately disrupt the update
progress output from the satellite BMC.

By contrast, by setting the mux state with an out-of-band command (2.1) and not
on the initiation of a connection (2.3), the user connecting to the host console
will not immediately disrupt the update progress output from the satellite BMC.

However, we can presume the user is connecting to the host console endpoint for
a reason. With extra actions, using the out-of-band command interface, they may
equally choose to switch the mux without regard for the system state, disrupting
the update progress output from the satellite BMC.

This highlights that the fundamental problem is access to the system by multiple
users who are neither coordinating with each other nor the system state.
The question that follows is:

Should it be the responsibility of obmc-console to coordinate otherwise
un-coordinated users?

This is a question of policy: How those users should be coordinated will likely
look very different based on concerns such as the role of the platform in a
larger system, the roles and needs of the users interacting with it, and the
concrete design of the platform itself.

obmc-console should implement a mechanism to control the mux state, but likely
shouldn't apply any policy governing access to the muxed consoles.

A further concern for the out-of-band command approach is its interactions with
other components exposing consoles:

1. The dropbear/obmc-console-client integration exposing consoles via SSH
2. [bmcweb](https://github.com/openbmc/bmcweb/blob/master/include/obmc_console.hpp)
3. [phosphor-net-ipmid](https://github.com/openbmc/phosphor-net-ipmid/blob/master/sol/sol_manager.hpp)

With the out-of-band command approach these components have to choose between:

- Not providing any capability to change the mux state; rather, they defer to
  making the user log in via SSH to effect the change themselves

- Exposing some mechanism for setting the mux state in terms of their own
  external interfaces

- Assuming that a user connecting to the exposed console endpoint wants to
  select that console if it's behind a mux

The first assumes that SSH is exposed at all and accessible by users who need
access to the muxed consoles. It's not yet clear whether this is a reasonable
expectation.

The second assumes that these external interfaces have the capability to model
the problem. It's not yet clear that this is the case for either IPMI or
Redfish, and it's not the case for serial over SSH.

The third implies that we must add capability to all three components to drive
the out-of-band command interface when they receive a connection for a given
console. The net result is no behavioural difference from obmc-console
implementing this itself (2.3), but increased complexity across the system.

## Implementation Considerations

### How are muxed consoles represented on D-Bus?

Every console will have its own D-Bus name, as this is backwards-compatible with
the current implementation.

Multiple consoles can be represented as a split or unified object tree.

### Tradeoffs of unified vs split object tree on D-Bus

In split-tree, it is not clear which consoles all belong to one UART mux, but in
unified-tree, this is clear.

In unified-tree, one console is reachable via the D-Bus name of another,
effectively creating multiple ways of doing something.

Example:

```
busctl call xyz.openbmc_project.Console.host1 \
/xyz/openbmc_project/console/host2 \
xyz.openbmc_project.Console.Access Connect
```

So a choice has to be made about how to represent multiple consoles on D-Bus,
and what information needs to be exposed to other subprojects.

Unified Tree:

```
busctl tree --user xyz.openbmc_project.Console.host1
└─/xyz
  └─/xyz/openbmc_project
    └─/xyz/openbmc_project/console
      ├─/xyz/openbmc_project/console/host1
      └─/xyz/openbmc_project/console/host2
```

Split Tree:

```
busctl tree --user xyz.openbmc_project.Console.host1
└─/xyz
  └─/xyz/openbmc_project
    └─/xyz/openbmc_project/console
      └─/xyz/openbmc_project/console/host1

busctl tree --user xyz.openbmc_project.Console.host2
└─/xyz
  └─/xyz/openbmc_project
    └─/xyz/openbmc_project/console
      └─/xyz/openbmc_project/console/host2
```

The choice of representation impacts how the mux can be described on D-Bus,
which is necessary if the out-of-band command strategy (2.1) is chosen. Two
possibilities for exposing an out-of-band mux control on D-Bus are:

1. Implement an interface on each console object that defines a boolean `Active`
   property, and an `Activate()` method. The `Activate()` method, by nature of
   being implemented on the console object, has all the context it needs to
   switch the mux without requiring caller-supplied parameters. The `Active`
   property is `true` when the mux is configured for the console of interest,
   and `false` otherwise. A `PropertiesChanged` D-Bus signal for the `Active`
   property may alert local users to changes of mux state.

2. Implement a `Mux` interface on an object common to all consoles exposed by
   the mux. The `Mux` interface might have a writable string `Selected` property
   that represents the state of the mux and provides a mechanism to switch it to
   a given console.

These have both been [discussed on an existing patch to
phosphor-dbus-interfaces][pdi-uart-mux-control-interface].
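
To make the difference between the two shapes concrete, here is a minimal
sketch in plain Python. It does not use any D-Bus library and is not the
interface proposed on the patch: the class names `MuxState`, `ConsoleObject`
and `MuxObject` are hypothetical, and only `Active`, `Activate()` and `Selected`
come from the descriptions above.

```python
class MuxState:
    """Hypothetical stand-in for the hardware mux: remembers the selected console."""

    def __init__(self, consoles):
        self.consoles = tuple(consoles)
        self.selected = self.consoles[0]

    def select(self, console_id):
        if console_id not in self.consoles:
            raise ValueError(f"unknown console: {console_id}")
        self.selected = console_id


class ConsoleObject:
    """Approach 1: each console object carries `Active` and `Activate()`."""

    def __init__(self, console_id, mux):
        self.console_id = console_id
        self._mux = mux

    @property
    def active(self):
        # True only while the mux is configured for this console.
        return self._mux.selected == self.console_id

    def activate(self):
        # No parameters needed: the object already knows which port it is.
        self._mux.select(self.console_id)


class MuxObject:
    """Approach 2: one shared object with a writable `Selected` property."""

    def __init__(self, mux):
        self._mux = mux

    @property
    def selected(self):
        return self._mux.selected

    @selected.setter
    def selected(self, console_id):
        self._mux.select(console_id)
```

In the first shape a caller has to inspect every console object to learn the
overall mux state, while in the second a single read of `selected` answers the
question but requires an object that all consoles share.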

[pdi-uart-mux-control-interface]:
  https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/71878/comment/dd34b099_66dbc49e/

The second approach is quite explicit - directly representing the mux state
makes it easy to discover the state of the system. However, it motivates the
choice of a unified object tree to provide a common object path to host the
`Mux` interface (e.g. at `/xyz/openbmc_project/console`). This is desired to
avoid an alternative instance of the "multiple representations of one thing"
problem highlighted in the discussion of claiming multiple bus names for the
unified object tree: If the tree isn't unified, this `Mux` interface would have
to be represented and synchronised on objects across multiple D-Bus connections.

The first approach doesn't have this limitation. However, it does have the
trade-off previously mentioned, that it's unclear how any of the consoles in the
system are related, and what the impact might be of activating any one of them.

Choosing a strategy for D-Bus representation is required if we add to the D-Bus
API, i.e. with the out-of-band command design point (2.1). However, the choice
becomes more of an implementation detail if either of design options 2.2 or 2.3
are selected. The choice in those cases is instead motivated by the level of
clarity we desire in describing the relationships between consoles.

## Pruning the Design Decision Tree

To help shape the choices here, we have the existing behaviours of obmc-console
[discussed on the PDI patch][pdi-uart-mux-control-interface]:

1. We already have support for concurrent console server instances

2. Concurrent console support is implemented as one obmc-console-server process
   per Linux TTY device

3. As each Linux TTY device is paired with its obmc-console-server process, each
   obmc-console-server D-Bus connection needs a unique name

4. We use the unique console-ids to name global resources, including both the
   D-Bus connection and the instance's unix domain socket.

As in the linked discussion, given the `console-id` value really represents
what's at the remote end of the BMC's TTY device for regular unmuxed consoles,
it stands to reason that we should continue this strategy for muxed consoles.
Taking this approach avoids adding a new endpoint ABI to obmc-console and
eliminates design options A-D inclusive.

Further, on the basis of frustrating behaviour in the face of lingering network
connections, preventing mux changes on the grounds of an existing connection
seems like a bad path forward.

This leaves us with design options `F`, `G`, `I`, and `J`, which are
differentiated by how the mux is switched, and its effect on already-connected
clients.

Concentrating on how the mux is switched, based on the discussion about the
D-Bus representation above, the discussion on the PDI patch, and the impact on
related applications, it's reasonable to say there are some complications with
the out-of-band command method (2.1).

By contrast we can consider the alternative: We make the mux state reflect the
endpoint of the most recent connection. This has the benefit of functioning for
both the Unix domain socket and D-Bus access with no further effort. Neither
bmcweb nor phosphor-net-ipmid need be patched. The choice also eliminates the
D-Bus complications mentioned above as there's no need for the additional D-Bus
interface.

This reasoning leaves us the choice of design options `F` and `G`.

`F` and `G` are differentiated by whether or not we drop connections on
endpoints that are not the endpoint selected by the mux.
There's been some back
and forth on that subject elsewhere[[1][drop-connections-discussion-1]]
[[2][drop-connections-discussion-2]], but it seems that not disconnecting
clients is effectively a worse implementation of design option `C`, which we've
already eliminated. It's worse than `C` because instead of 1 connection we could
have `N` connections for `N` mux ports, `(N - 1)` of which are idle. Not only
that, but the `(N - 1)` connections are effectively zombies, as they have no way
to switch the mux back to their associated port without establishing yet another
connection. It follows that if we're establishing a subsequent connection in
order to switch the mux we may as well disconnect the existing session, in which
case it may as well have been disconnected when the mux switched away to begin
with[^1].

[drop-connections-discussion-1]:
  https://gerrit.openbmc.org/c/openbmc/obmc-console/+/71228/comment/62a5fce9_60c3ad3e/
[drop-connections-discussion-2]:
  https://gerrit.openbmc.org/c/openbmc/obmc-console/+/71867/comment/756f0abe_5ebe8d66/

[^1]: which also saves resources

These arguments combined eliminate all but option `F`. It seems to sit at a neat
nexus in terms of existing ABI, desired behaviour, and implementation
complexity.

Addendum: Discussions so far have been around a _minimal_ design that
achieves the desired console behaviour. It's worth noting that design option `F`
(connection-based mux control which disconnects conflicting clients) allows us
to _optionally_ implement an out-of-band command interface in addition, because
the observable behaviour is no different to a new connection being accepted:
conflicting clients are disconnected and the mux is switched. This may be
helpful to implement platform policy around logging.
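
The behaviour of option `F`, including the optional out-of-band path from the
addendum, can be sketched as follows. This is a simplified model rather than
the obmc-console implementation: `MuxedConsoleServer`, its methods, and the
`set_mux` callback (standing in for whatever drives the mux GPIOs) are all
hypothetical names introduced for illustration.

```python
class MuxedConsoleServer:
    """Toy model of option F: the latest connection selects the mux port."""

    def __init__(self, ports, set_mux):
        self._set_mux = set_mux  # hypothetical callback that switches the mux
        self._clients = {port: [] for port in ports}
        self.selected = None

    def select(self, port):
        """Switch the mux to `port` and return the clients that were dropped.

        This doubles as the optional out-of-band command: its observable
        effect is the same as accepting a new connection on `port`.
        """
        if port not in self._clients:
            raise KeyError(port)
        dropped = []
        if port != self.selected:
            # Disconnect every client attached to a conflicting port.
            for other, clients in self._clients.items():
                if other != port:
                    dropped.extend(clients)
                    clients.clear()
            self._set_mux(port)
            self.selected = port
        return dropped

    def connect(self, port, client):
        """Accept `client` on `port`, switching the mux if necessary."""
        dropped = self.select(port)
        self._clients[port].append(client)
        return dropped

    def clients(self, port):
        return list(self._clients[port])
```

A caller could use it like this:

```python
server = MuxedConsoleServer(["host1", "host2"], set_mux=print)
server.connect("host1", "client-a")
server.connect("host2", "client-b")  # drops client-a and switches the mux
```

Because `connect()` is built on `select()`, adding an out-of-band command later
would not introduce a second code path with different semantics.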

## Proposed Design

It's proposed that we use one obmc-console-server process to expose the `N`
consoles connected to a UART mux, where each console represents one mux port.
The mux is switched based on the endpoint of the most recent client connection,
and any conflicting clients are disconnected. This is design option `F` in the
table above.

The internal data structures of obmc-console will change to accommodate the
design.

We will use one config file for the `N` muxed consoles. The configuration will
provide a similar approach for specifying the mux GPIOs to that used by [the
i2c-mux-gpio devicetree binding][linux-i2c-mux-gpio].

[linux-i2c-mux-gpio]:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/i2c/i2c-mux-gpio.yaml?h=v6.9#n12

Below is a block diagram of the relationships between the software and hardware
components:

```
                                     +--------------------+
                                     | server.conf        |
                                     +--------------------+
                                               |
                                               |
                                               |
                                               |
                                          +----+----+                                 +-----+     +-------+
                                          |         |                                 |     |     |       |
                                          |         |     +-------+     +-------+     |     +-----+ UART1 |
+-----------------------------------+     |         |     |       |     |       |     |     |     |       |
| xyz.openbmc_project.Console.host1 +-----+         +-----+ ttyS0 +-----+ UART0 +-----+     |     +-------+
+-----------------------------------+     |         |     |       |     |       |     |     |
                                          |  obmc   |     +-------+     +-------+     |     |
                                          | console |                                 | MUX |
                                          | server  |                   +-------+     |     |
+-----------------------------------+     |         |                   |       |     |     |
| xyz.openbmc_project.Console.host2 +-----+         +-------------------+ GPIO  +-----+     |     +-------+
+-----------------------------------+     |         |                   |       |     |     |     |       |
                                          |         |                   +-------+     |     +-----+ UART2 |
                                          |         |                                 |     |     |       |
                                          +----+----+                                 +-----+     +-------+
```

To inform people who may be reading log files for a console, connection and
disconnection events of a console via mux control will produce messages for
clients and in log files.

Requirements are:

- Making it clear this message is from obmc-console
- Timestamp
- Indication of connected/disconnected

These messages are not meant as an API or reliable means to get information
about mux state. Any application on the other side of the UART could also
produce the exact same messages, even if unlikely.

The initial format of these messages will be something like:

```
[obmc-console] %Y-%m-%d %H:%M:%S UTC CONNECTED
[obmc-console] %Y-%m-%d %H:%M:%S UTC DISCONNECTED
```

for the connect and disconnect cases respectively.

For the D-Bus representation we choose the unified tree.

## Other Alternatives Considered

### Kernel implementation

This was not pursued, since the support can be implemented in userspace. A
kernel implementation also may not be merged, since the hardware configuration
it supports may not be widely available. It may be better to have a userspace
implementation to refer back to in case someone wants to do a kernel
implementation later.

### Multiple obmc-console-server processes for the multiple consoles

This was considered and implemented as a PoC, but discarded later as it would be
easier to synchronize everything in a single process.

### Multiple configuration files for multiple consoles

This was considered but it would duplicate configuration, like the definition of
the mux GPIOs. Inconsistencies across the files would also need to be managed.

## Impacts

### API Impact

### Performance Impact

Minimal to none.

### Developer Impact

Minimal. Existing users do not need to change anything about their
configuration.

### Organizational

- Does this proposal require a new repository? No
- Who will be the initial maintainer(s) of this repository?
- Which repositories are expected to be modified to execute this design?
  obmc-console, docs
- Make a list, and add listed repository maintainers to the gerrit review.

## Testing

There are already integration tests for this feature available on gerrit.
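
The following is not taken from those tests. It is a minimal sketch of how the
mux event message format given under Proposed Design could be produced and
recognised when reading a log, assuming the format stays as written there. The
function names are hypothetical, and since the design states that these messages
are not an API, a check like this is only suitable for illustration or for
human-facing log inspection.

```python
import re
from datetime import datetime, timezone

# Mirrors "[obmc-console] %Y-%m-%d %H:%M:%S UTC CONNECTED|DISCONNECTED".
MESSAGE_RE = re.compile(
    r"^\[obmc-console\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC "
    r"(CONNECTED|DISCONNECTED)$"
)


def format_mux_message(connected, when=None):
    """Build the message for a mux connect/disconnect event.

    `when` must be a timezone-aware datetime; it defaults to the current time.
    """
    if when is None:
        when = datetime.now(timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    state = "CONNECTED" if connected else "DISCONNECTED"
    return f"[obmc-console] {stamp} UTC {state}"


def parse_mux_message(line):
    """Return (timestamp, connected) for a matching line, otherwise None."""
    match = MESSAGE_RE.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return when.replace(tzinfo=timezone.utc), match.group(2) == "CONNECTED"
```

Any line for which `parse_mux_message()` returns `None` is ordinary console
output, so a log reader could use the two event types to explain gaps in a
console's log.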