# OpenBMC in-kernel MCTP

Author: Jeremy Kerr `<jk@codeconstruct.com.au>`

Please refer to the [MCTP Overview](mctp.md) document for general MCTP design
description, background and requirements.

This document describes a kernel-based implementation of MCTP infrastructure,
providing a sockets-based API for MCTP communication within an OpenBMC-based
platform.

# Requirements for a kernel implementation

- The MCTP messaging API should be an obvious application of the existing POSIX
  socket interface

- Configuration should be simple for a straightforward MCTP endpoint: a single
  network with a single local endpoint id (EID).

- Infrastructure should be flexible enough to allow for more complex MCTP
  networks, allowing:

  - each MCTP network (as defined by section 3.2.31 of DSP0236) may consist of
    multiple local physical interfaces, and/or multiple EIDs;

  - multiple distinct (ie., non-bridged) networks, possibly containing
    duplicated EIDs between networks;

  - multiple local EIDs on a single interface, and

  - customisable routing/bridging configurations within a network.

# Proposed Design

The design contains several components:

- An interface for userspace applications to send and receive MCTP messages: A
  mapping of the sockets API to MCTP usage

- Infrastructure for control and configuration of the MCTP network(s),
  consisting of a configuration utility, and a kernel messaging facility for
  this utility to use.

- Kernel drivers for physical interface bindings.

In general, the kernel components cover the transport functionality of MCTP,
such as message assembly/disassembly, packet forwarding, and physical interface
implementations.

Higher-level protocols (such as PLDM) are implemented in userspace, through the
introduced socket API. This also includes the majority of the MCTP Control
Protocol implementation (DSP0236, section 11) - MCTP endpoints will typically
have a specific process to request and respond to control protocol messages.
However, the kernel will include a small subset of control protocol code to
allow very simple endpoints, with static EID allocations, to run without this
process. MCTP endpoints that require more than just single-endpoint
functionality (bus owners, bridges, etc), and/or dynamic EID allocation, would
include the control message protocol process.

A new driver is introduced to handle each physical interface binding.
These drivers expose the appropriate `struct net_device` to handle transmission
and reception of MCTP packets on their associated hardware channels. Under
Linux, the namespace for these interfaces is separate from other network
interfaces - such as those for ethernet.

## Structure: interfaces & networks

The kernel models the local MCTP topology through two items: interfaces and
networks.

An interface (or "link") is an instance of an MCTP physical transport binding
(as defined by DSP0236, section 3.2.47), likely connected to a specific hardware
device. This is represented as a `struct net_device`, and has a user-visible
name and index (`ifindex`). Non-hardware-attached interfaces are permitted, to
allow local loopback and/or virtual interfaces.

A network defines a unique address space for MCTP endpoints by endpoint-ID
(described by DSP0236, section 3.2.31). A network has a user-visible identifier
to allow references from userspace. Route definitions are specific to one
network.

Interfaces are associated with one network. A network may be associated with one
or more interfaces.

If multiple networks are present, each may contain EIDs that are also present on
other networks.
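
Because an EID is only unique within its network, any lookup of a remote
endpoint needs both values. The following is a minimal sketch of that idea; the
`struct mctp_peer` type and the `mctp_peer_find()` helper are hypothetical names
used only for illustration, and are not part of the proposed API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration type: an endpoint is identified by the
 * (network, EID) pair, never by the EID alone. */
struct mctp_peer {
    int net;        /* user-visible network identifier */
    uint8_t eid;    /* endpoint ID, unique only within 'net' */
    int ifindex;    /* interface through which the peer is reached */
};

/* Return the first entry matching both network and EID, or NULL. */
const struct mctp_peer *
mctp_peer_find(const struct mctp_peer *peers, size_t n, int net, uint8_t eid)
{
    for (size_t i = 0; i < n; i++) {
        if (peers[i].net == net && peers[i].eid == eid)
            return &peers[i];
    }
    return NULL;
}
```

With this keying, EID 8 on network 1 and EID 8 on network 2 are two distinct
endpoints, which is the duplicated-EID case the requirements call out.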

## Sockets API

### Protocol definitions

We define a new address family (and corresponding protocol family) for MCTP:

```c
    #define AF_MCTP /* TBD */
    #define PF_MCTP AF_MCTP
```

MCTP sockets are created with the `socket()` syscall, specifying `AF_MCTP` as
the domain. Currently, only a `SOCK_DGRAM` socket type is defined.

```c
    int sd = socket(AF_MCTP, SOCK_DGRAM, 0);
```

The only (current) value for the `protocol` argument is 0. Further protocol
values may be added later.

MCTP sockets opened with a protocol value of 0 will communicate directly at the
transport layer; message buffers received by the application will consist of
message data from reassembled MCTP packets, and will include the full message
including message type byte and optional message integrity check (IC).
Individual packet headers are not included; they may be accessible through a
future `SOCK_RAW` socket type.
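
Since each buffer starts with the message type byte, an application can split
that byte into the 7-bit type and the IC flag. A minimal sketch follows; the
`MSG_IC_BIT` name and both helper functions are assumptions made for
illustration, while the 7-bit mask mirrors the `MCTP_TYPE_MASK` value that this
document defines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MCTP_TYPE_MASK 0x7f  /* as defined in this document */
#define MSG_IC_BIT     0x80  /* assumed name: MSB of the type byte */

/* 7-bit message type from the first byte of a message buffer. */
uint8_t msg_type(const uint8_t *buf)
{
    return buf[0] & MCTP_TYPE_MASK;
}

/* True if the message carries a trailing integrity check. */
bool msg_has_ic(const uint8_t *buf)
{
    return (buf[0] & MSG_IC_BIT) != 0;
}
```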

As with all socket address families, source and destination addresses are
specified with a new `sockaddr` type:

```c
    /* defined first, so that it is a complete type when used below */
    struct mctp_addr {
        uint8_t             s_addr;
    };

    struct sockaddr_mctp {
        sa_family_t         smctp_family; /* = AF_MCTP */
        int                 smctp_network;
        struct mctp_addr    smctp_addr;
        uint8_t             smctp_type;
        uint8_t             smctp_tag;
    };

    /* MCTP network values */
    #define MCTP_NET_ANY        0

    /* MCTP EID values */
    #define MCTP_ADDR_ANY       0xff
    #define MCTP_ADDR_BCAST     0xff

    /* MCTP type values. Only the least-significant 7 bits of
     * smctp_type are used for type matches; the specification defines
     * the type to be 7 bits.
     */
    #define MCTP_TYPE_MASK      0x7f

    /* MCTP tag definitions; used for smctp_tag field of sockaddr_mctp */
    /* MCTP-spec-defined fields */
    #define MCTP_TAG_MASK       0x07
    #define MCTP_TAG_OWNER      0x08
    /* Others: reserved */

    /* Helpers */
    #define MCTP_TAG_RSP(x)     ((x) & MCTP_TAG_MASK) /* response to a request: clear TO, keep value */
```

### Syscall behaviour

The following sections describe the MCTP-specific behaviours of the standard
socket system calls. These behaviours have been chosen to map closely to the
existing sockets APIs.

#### `bind()`: set local socket address

Sockets that receive incoming request packets will bind to a local address,
using the `bind()` syscall.

```c
    struct sockaddr_mctp addr;

    addr.smctp_family = AF_MCTP;
    addr.smctp_network = MCTP_NET_ANY;
    addr.smctp_addr.s_addr = MCTP_ADDR_ANY;
    addr.smctp_type = MCTP_TYPE_PLDM;
    addr.smctp_tag = MCTP_TAG_OWNER;

    int rc = bind(sd, (struct sockaddr *)&addr, sizeof(addr));
```

This establishes the local address of the socket. Incoming MCTP messages that
match the network, address, and message type will be received by this socket.
The reference to 'incoming' is important here; a bound socket will only receive
messages with the TO bit set, to indicate an incoming request message, rather
than a response.

The `smctp_tag` value will configure the tags accepted from the remote side of
this socket. Given the above, the only valid value is `MCTP_TAG_OWNER`, which
will result in remotely "owned" tags being routed to this socket. Since
`MCTP_TAG_OWNER` is set, the 3 least-significant bits of `smctp_tag` are not
used; callers must set them to zero. See the
[Tag behaviour for transmitted messages](#tag-behaviour-for-transmitted-messages)
section for more details. If the `MCTP_TAG_OWNER` bit is not set, `bind()` will
fail with an errno of `EINVAL`.

A `smctp_network` value of `MCTP_NET_ANY` will configure the socket to receive
incoming packets from any locally-connected network. A specific network value
will cause the socket to only receive incoming messages from that network.

The `smctp_addr` field specifies a local address to bind to. A value of
`MCTP_ADDR_ANY` configures the socket to receive messages addressed to any local
destination EID.

The `smctp_type` field specifies which message types to receive. Only the lower
7 bits of the type are matched on incoming messages (ie., the most-significant
IC bit is not part of the match).
This results in the socket receiving packets with
and without a message integrity check footer.

#### `connect()`: set remote socket address

An application may specify a socket's remote address with the `connect()`
syscall:

```c
    struct sockaddr_mctp addr;
    int rc;

    addr.smctp_family = AF_MCTP;
    addr.smctp_network = MCTP_NET_ANY;
    addr.smctp_addr.s_addr = 8;
    addr.smctp_tag = MCTP_TAG_OWNER;
    addr.smctp_type = MCTP_TYPE_PLDM;

    rc = connect(sd, (struct sockaddr *)&addr, sizeof(addr));
```

This establishes the remote address of a socket, used for future message
transmission. Like other `SOCK_DGRAM` behaviour, this does not generate any MCTP
traffic directly, but just sets the default destination for messages sent from
this socket.

The `smctp_network` field may specify a locally-attached network, or the value
`MCTP_NET_ANY`, in which case the kernel will select a suitable MCTP network.
This is guaranteed to work for single-network configurations, but may require
additional routing definitions for endpoints attached to multiple distinct
networks. See the [Addressing](#addressing) section for details.

The `smctp_addr` field specifies a remote EID. This may be `MCTP_ADDR_BCAST`,
the MCTP broadcast EID (0xff).

The `smctp_type` field specifies the type field of messages transferred over
this socket.

The `smctp_tag` value will configure the tag used for the local side of this
socket. The only valid value is `MCTP_TAG_OWNER`, which will result in an
"owned" tag being allocated for this socket. That tag remains allocated for all
future outgoing messages, until either the socket is closed, or `connect()` is
called again. If a tag cannot be allocated, `connect()` will report an error,
with an errno value of `EAGAIN`. See the
[Tag behaviour for transmitted messages](#tag-behaviour-for-transmitted-messages)
section for more details. If the `MCTP_TAG_OWNER` bit is not set, `connect()`
will fail with an errno of `EINVAL`.

Requesters which connect to a single responder will typically use `connect()` to
specify the peer address and tag for future outgoing messages.

#### `sendto()`, `sendmsg()`, `send()` & `write()`: transmit an MCTP message

An MCTP message is transmitted using one of the `sendto()`, `sendmsg()`,
`send()` or `write()` syscalls.
Using `sendto()` as the primary example:

```c
    struct sockaddr_mctp addr;
    char buf[14];
    ssize_t len;

    /* set message destination */
    addr.smctp_family = AF_MCTP;
    addr.smctp_network = 0;
    addr.smctp_addr.s_addr = 8;
    addr.smctp_tag = MCTP_TAG_OWNER;
    addr.smctp_type = MCTP_TYPE_ECHO;

    /* arbitrary message to send, with message-type header */
    buf[0] = MCTP_TYPE_ECHO;
    memcpy(buf + 1, "hello, world!", sizeof(buf) - 1);

    /* sendto() takes a generic struct sockaddr pointer */
    len = sendto(sd, buf, sizeof(buf), 0,
                    (struct sockaddr *)&addr, sizeof(addr));
```

The address argument is treated the same way as for `connect()`: The network and
address fields define the remote address to send to. If `smctp_tag` has the
`MCTP_TAG_OWNER` bit set, the kernel will ignore any bits set in the tag value
(the `MCTP_TAG_MASK` bits), and generate a tag value suitable for the
destination EID. If `MCTP_TAG_OWNER` is not set, the message will be sent with
the tag value as specified. If a tag value cannot be allocated, the system call
will report an errno of `EAGAIN`.

The application must provide the message type byte as the first byte of the
message buffer passed to `sendto()`.
If a message integrity check is to be
included in the transmitted message, it must also be provided in the message
buffer, and the most-significant bit of the message type byte must be 1.

If the first byte of the message does not match the message type value, then the
system call will return an error of `EPROTO`.

The `send()` and `write()` system calls behave in a similar way, but do not
specify a remote address. Therefore, `connect()` must be called beforehand; if
not, these calls will return an error of `EDESTADDRREQ` (Destination address
required).

Using `sendto()` or `sendmsg()` on a connected socket may override the remote
socket address specified in `connect()`. The `connect()` address and tag will
remain associated with the socket, for future unaddressed sends. The tag
allocated through a call to `sendto()` or `sendmsg()` on a connected socket is
subject to the same invalidation logic as on an unconnected socket: It is
expired either by timeout or by a subsequent `sendto()`.

The `sendmsg()` system call allows a more compact argument interface, and the
message buffer to be specified as a scatter-gather list. At present no ancillary
message types (used for the `msg_control` data passed to `sendmsg()`) are
defined.
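
As a sketch of the scatter-gather form, the type byte and the payload can live
in separate buffers and be described by a two-element `iovec`. The helper name
`mctp_build_msghdr()` is hypothetical; `struct msghdr` and `struct iovec` are
the standard POSIX types. `msg_name` is left unset here, which assumes a socket
that has already been connected with `connect()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: describe "type byte + payload" as two segments.
 * No destination is set, so this assumes a connected socket. */
void mctp_build_msghdr(struct msghdr *msg, struct iovec iov[2],
                       uint8_t *type, void *payload, size_t payload_len)
{
    assert(msg && iov && type);

    iov[0].iov_base = type;         /* message type byte goes first */
    iov[0].iov_len = 1;
    iov[1].iov_base = payload;
    iov[1].iov_len = payload_len;

    memset(msg, 0, sizeof(*msg));   /* no msg_name, no msg_control */
    msg->msg_iov = iov;
    msg->msg_iovlen = 2;
}
```

The resulting header would then be passed to `sendmsg(sd, &msg, 0)`; the kernel
sees the two segments as one contiguous message.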

Transmitting a message on an unconnected socket with `MCTP_TAG_OWNER` specified
will cause an allocation of a tag, if no valid tag is already allocated for that
destination. The (destination-eid,tag) tuple acts as an implicit local socket
address, to allow the socket to receive responses to this outgoing message. If
any previous allocation has been performed (for a different remote EID), that
allocation is lost. This tag behaviour can be controlled through the
`MCTP_TAG_CONTROL` socket option.

Sockets will only receive responses to requests they have sent (with TO=1) and
may only respond (with TO=0) to requests they have received.

#### `recvfrom()`, `recvmsg()`, `recv()` & `read()`: receive an MCTP message

An MCTP message can be received by an application using one of the `recvfrom()`,
`recvmsg()`, `recv()` or `read()` system calls.
Using `recvfrom()` as the
primary example:

```c
    struct sockaddr_mctp addr;
    socklen_t addrlen;
    char buf[14];
    ssize_t len;

    addrlen = sizeof(addr);

    /* recvfrom() takes a generic struct sockaddr pointer */
    len = recvfrom(sd, buf, sizeof(buf), 0,
                    (struct sockaddr *)&addr, &addrlen);

    /* We can expect addr to describe an MCTP address */
    assert(addrlen >= sizeof(addr));
    assert(addr.smctp_family == AF_MCTP);

    printf("received %zd bytes from remote EID %d\n", len,
            addr.smctp_addr.s_addr);
```

The address argument to `recvfrom` and `recvmsg` is populated with the remote
address of the incoming message, including tag value (this will be needed in
order to reply to the message).

The first byte of the message buffer will contain the message type byte. If an
integrity check follows the message, it will be included in the received buffer.

The `recv()` and `read()` system calls behave in a similar way, but do not
provide a remote address to the application. Therefore, these are only useful if
the remote address is already known, or the message does not require a reply.

Like the send calls, sockets will only receive responses to requests they have
sent (TO=1) and may only respond (TO=0) to requests they have received.
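
The receive rules given in the `bind()` section can be summarised as a single
predicate. This is a sketch of the matching logic as this document describes
it, not kernel code; the `struct bound_addr` and `struct rx_msg` types and the
`bound_socket_matches()` function are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCTP_NET_ANY   0
#define MCTP_ADDR_ANY  0xff
#define MCTP_TYPE_MASK 0x7f
#define MCTP_TAG_OWNER 0x08

/* Illustration-only view of a bound socket's local address. */
struct bound_addr {
    int net;
    uint8_t eid;
    uint8_t type;
};

/* Illustration-only view of a reassembled incoming message. */
struct rx_msg {
    int net;          /* network the message arrived on */
    uint8_t dest_eid; /* local destination EID */
    uint8_t type;     /* first byte of the message, IC bit included */
    uint8_t tag;      /* tag value, plus MCTP_TAG_OWNER if TO is set */
};

bool bound_socket_matches(const struct bound_addr *b, const struct rx_msg *m)
{
    if (!(m->tag & MCTP_TAG_OWNER))
        return false;   /* responses never match a bound socket */
    if (b->net != MCTP_NET_ANY && b->net != m->net)
        return false;
    if (b->eid != MCTP_ADDR_ANY && b->eid != m->dest_eid)
        return false;
    /* the IC bit is excluded from the type comparison */
    return (b->type & MCTP_TYPE_MASK) == (m->type & MCTP_TYPE_MASK);
}
```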

#### `getsockname()` & `getpeername()`: query local/remote socket address

The `getsockname()` system call returns the `struct sockaddr_mctp` value for the
local side of this socket; `getpeername()` returns the value for the remote
side (ie., that used in a `connect()`). Since the tag value is a property of the
remote address, `getpeername()` may be used to retrieve a kernel-allocated tag
value.

Calling `getpeername()` on an unconnected socket will result in an error of
`ENOTCONN`.

#### Socket options

The following socket options are defined for MCTP sockets:

##### `MCTP_ADDR_EXT`: Use extended addressing information in sendmsg/recvmsg

Enabling this socket option allows an application to specify extended addressing
information on transmitted packets, and access the same on received packets.

When the `MCTP_ADDR_EXT` socket option is enabled, the application may specify
an expanded `struct sockaddr` to the `recvfrom()` and `sendto()` system calls.
This is defined as:

```c
    struct sockaddr_mctp_ext {
        /* fields exactly match struct sockaddr_mctp */
        sa_family_t         smctp_family; /* = AF_MCTP */
        int                 smctp_network;
        struct mctp_addr    smctp_addr;
        uint8_t             smctp_type;
        uint8_t             smctp_tag;
        /* extended addressing */
        int                 smctp_ifindex;
        uint8_t             smctp_halen;
        unsigned char       smctp_haddr[/* TBD */];
    };
```

If the `addrlen` specified to `sendto()` or `recvfrom()` is sufficient to
contain this larger structure, then the extended addressing fields are consumed
/ populated respectively.

##### `MCTP_TAG_CONTROL`: manage outgoing tag allocation behaviour

The set/getsockopt argument is a `mctp_tagctl` structure:

    struct mctp_tagctl {
        bool            retain;
        struct timespec timeout;
    };

This allows an application to control the behaviour of allocated tags for
non-connected sockets when transferring messages to multiple different
destinations (ie., where a `struct sockaddr_mctp` is provided for individual
messages, and the `smctp_addr` destination for those sockets may vary across
calls).

The `retain` flag indicates to the kernel that the socket should not release tag
allocations when a message is sent to a new destination EID. This causes the
socket to continue to receive incoming messages to the old (dest,tag) tuple, in
addition to the new tuple.

The `timeout` value specifies a maximum amount of time to retain tag values.
This should be based on the reply timeout for any upper-level protocol.

The kernel may reject a request to set values that would cause excessive tag
allocation by this socket. The kernel may also reject subsequent tag-allocation
requests (through send or connect syscalls) which would cause excessive tags to
be consumed by the socket, even though the tag control settings were accepted in
the setsockopt operation.

Changing the default tag control behaviour should only be required when:

- the socket is sending messages with TO=1 (ie, is a requester); and
- messages are sent to multiple different destination EIDs from the one socket.
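
The effect of `retain` can be modelled with a small table of (destination EID,
tag) tuples. This is only a sketch of the behaviour described above: the names
`struct tag_table` and `tag_table_send()` are invented, the table size is
arbitrary, the `timeout` handling is left out, and none of it reflects how the
kernel actually stores tags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TAGS 8  /* arbitrary limit for this illustration */

struct tag_entry {
    uint8_t dest;
    uint8_t tag;
};

struct tag_table {
    bool retain;                    /* models mctp_tagctl.retain */
    size_t count;
    struct tag_entry entries[MAX_TAGS];
};

/* Record a send to 'dest' using 'tag'. Without retain, any allocation
 * for a different destination is dropped first. Returns false when the
 * table is full, standing in for the kernel rejecting the allocation. */
bool tag_table_send(struct tag_table *t, uint8_t dest, uint8_t tag)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].dest == dest)
            return true;            /* existing allocation is reused */
    }
    if (!t->retain)
        t->count = 0;               /* old (dest,tag) tuple is lost */
    if (t->count == MAX_TAGS)
        return false;
    t->entries[t->count].dest = dest;
    t->entries[t->count].tag = tag;
    t->count++;
    return true;
}
```

In this model, a socket without `retain` that sends to EID 8 and then to EID 9
ends up holding only the tuple for EID 9, while a socket with `retain` holds
both and can still receive a late response from EID 8.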

#### Syscalls not implemented

The following system calls are not implemented for MCTP, primarily as they are
not used in `SOCK_DGRAM`-type sockets:

- `listen()`
- `accept()`
- `ioctl()`
- `shutdown()`
- `mmap()`

### Userspace examples

These examples cover three general use-cases:

- **requester**: sends requests to a particular (EID, type) target, and receives
  responses to those packets

  This is similar to a typical UDP client

- **responder**: receives all locally-addressed messages of a specific
  message-type, and responds to the requester immediately.

  This is similar to a typical UDP server

- **controller**: a specific service for a bus owner; may send broadcast
  messages, manage EID allocations, update local MCTP stack state. Will need
  low-level packet data.

  This is similar to a DHCP server.

#### Requester

"Client"-side implementation to send requests to a responder, and receive a
response. This uses a (fictitious) message type of `MCTP_TYPE_ECHO`.

```c
    #include <assert.h>
    #include <err.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>

    int main(void) {
        struct sockaddr_mctp addr;
        socklen_t addrlen;
        struct {
            uint8_t type;
            uint8_t data[14];
        } msg;
        int sd, rc;

        sd = socket(AF_MCTP, SOCK_DGRAM, 0);
        if (sd < 0)
            err(EXIT_FAILURE, "socket");

        addr.smctp_family = AF_MCTP;
        addr.smctp_network = MCTP_NET_ANY; /* any network */
        addr.smctp_addr.s_addr = 9;    /* remote eid 9 */
        addr.smctp_tag = MCTP_TAG_OWNER; /* kernel will allocate an owned tag */
        addr.smctp_type = MCTP_TYPE_ECHO; /* fictitious message type */
        addrlen = sizeof(addr);

        /* set message type and payload */
        msg.type = MCTP_TYPE_ECHO;
        strncpy((char *)msg.data, "hello, world!", sizeof(msg.data));

        /* send message */
        rc = sendto(sd, &msg, sizeof(msg), 0,
                        (struct sockaddr *)&addr, addrlen);

        if (rc < 0)
            err(EXIT_FAILURE, "sendto");

        /* Receive reply. This will block until a reply arrives,
         * which may never happen. Actual code would need a timeout
         * here. */
        rc = recvfrom(sd, &msg, sizeof(msg), 0,
                    (struct sockaddr *)&addr, &addrlen);
        if (rc < 0)
            err(EXIT_FAILURE, "recvfrom");

        assert(msg.type == MCTP_TYPE_ECHO);
        /* ensure we're nul-terminated */
        msg.data[sizeof(msg.data)-1] = '\0';

        printf("reply: %s\n", (char *)msg.data);

        return EXIT_SUCCESS;
    }
```

#### Responder

"Server"-side implementation to receive requests and respond. Like the client,
this uses a (fictitious) message type of `MCTP_TYPE_ECHO` in the
`struct sockaddr_mctp`; only messages matching this type will be received.

```c
    int main() {
        struct sockaddr_mctp addr;
        socklen_t addrlen;
        int sd, rc;

        sd = socket(AF_MCTP, SOCK_DGRAM, 0);

        addr.smctp_family = AF_MCTP;
        addr.smctp_network = MCTP_NET_ANY; /* any network */
        addr.smctp_addr.s_addr = MCTP_EID_ANY;
        addr.smctp_type = MCTP_TYPE_ECHO;
        addr.smctp_tag = MCTP_TAG_OWNER;
        addrlen = sizeof(addr);

        rc = bind(sd, (struct sockaddr *)&addr, addrlen);
        if (rc)
            err(EXIT_FAILURE, "bind");

        for (;;) {
            struct {
                uint8_t type;
                uint8_t data[14];
            } msg;

            /* recvfrom() updates addrlen, so reset it for each message */
            addrlen = sizeof(addr);

            rc = recvfrom(sd, &msg, sizeof(msg), 0,
                        (struct sockaddr *)&addr, &addrlen);
            if (rc < 0)
                err(EXIT_FAILURE, "recvfrom");
            if (rc < 1) {
                warnx("not enough data for a message type");
                continue;
            }

            assert(addrlen == sizeof(addr));
            assert(msg.type == MCTP_TYPE_ECHO);

            printf("%d bytes from EID %d\n", rc, addr.smctp_addr.s_addr);

            /* Reply to requester; this macro just clears the TO-bit.
             * Other addr fields will describe the remote endpoint,
             * so use those as-is.
             */
            addr.smctp_tag = MCTP_TAG_RSP(addr.smctp_tag);

            rc = sendto(sd, &msg, rc, 0,
                        (struct sockaddr *)&addr, addrlen);
            if (rc < 0)
                err(EXIT_FAILURE, "sendto");
        }

        return EXIT_SUCCESS;
    }
```

#### Broadcast request

Sends a request to a broadcast EID, and receives (unicast) replies. Typical
control protocol pattern.

```c
    int main() {
        struct sockaddr_mctp txaddr, rxaddr;
        struct timespec start, cur;
        struct pollfd pollfds[1];
        socklen_t addrlen;
        uint8_t buf[2];
        int sd, rc, timeout;

        sd = socket(AF_MCTP, SOCK_DGRAM, 0);

        /* destination address setup */
        txaddr.smctp_family = AF_MCTP;
        txaddr.smctp_network = 1; /* specific network required for broadcast */
        txaddr.smctp_addr.s_addr = MCTP_TAG_BCAST; /* broadcast dest */
        txaddr.smctp_type = MCTP_TYPE_CONTROL;
        txaddr.smctp_tag = MCTP_TAG_OWNER;

        buf[0] = MCTP_TYPE_CONTROL;
        buf[1] = 'a';

        /* We're doing a sendto() to a broadcast address here. If we were
         * sending more than one broadcast message, we'd be better off
         * doing connect(); sendto();, in order to retain the tag
         * reservation across all transmitted messages. However, since this
         * is a single transmit, that makes no difference in this
         * particular case.
         */
        rc = sendto(sd, buf, 2, 0, (struct sockaddr *)&txaddr,
                    sizeof(txaddr));
        if (rc < 0)
            err(EXIT_FAILURE, "sendto");

        /* Set up poll behaviour, and record our starting time for
         * reply timeouts */
        pollfds[0].fd = sd;
        pollfds[0].events = POLLIN;
        clock_gettime(CLOCK_MONOTONIC, &start);

        for (;;) {
            /* Calculate the amount of time left for replies */
            clock_gettime(CLOCK_MONOTONIC, &cur);
            timeout = calculate_timeout(&start, &cur, 1000);

            rc = poll(pollfds, 1, timeout);
            if (rc < 0)
                err(EXIT_FAILURE, "poll");

            /* timeout receiving a reply? */
            if (rc == 0)
                break;

            /* sanity check that we have a message to receive */
            if (!(pollfds[0].revents & POLLIN))
                break;

            addrlen = sizeof(rxaddr);

            rc = recvfrom(sd, buf, 2, 0, (struct sockaddr *)&rxaddr,
                        &addrlen);
            if (rc < 0)
                err(EXIT_FAILURE, "recvfrom");

            assert(addrlen >= sizeof(rxaddr));
            assert(rxaddr.smctp_family == AF_MCTP);

            printf("response from EID %d\n", rxaddr.smctp_addr.s_addr);
        }

        return EXIT_SUCCESS;
    }
```

### Implementation notes

#### Addressing

Transmitted messages (through `sendto()` and related system calls) specify their
destination via the `smctp_network` and `smctp_addr` fields of a
`struct sockaddr_mctp`.

The `smctp_addr` field maps directly to the destination endpoint's EID.

The `smctp_network` field specifies a locally defined network identifier. To
simplify situations where there is only one network defined, the special value
`MCTP_NET_ANY` is allowed. This will allow the kernel to select a specific
network for transmission.

This selection is entirely user-configured; one specific network may be defined
as the system default, in which case it will be used for all message
transmission where `MCTP_NET_ANY` is used as the destination network.

In particular, the destination EID is never used to select a destination
network.

MCTP responders should use the EID and network values of an incoming request to
specify the destination for any responses.

#### Bridging/routing

The network and interface structure allows multiple interfaces to share a common
network. By default, packets are not forwarded between interfaces.

A network can be configured for "forwarding" mode. In this mode, packets may be
forwarded if their destination EID is non-local, and matches a route for another
interface on the same network.

As per DSP0236, packet reassembly does not occur during the forwarding process.
If the packet is larger than the MTU for the destination interface/route, then
the packet is dropped.

#### Tag behaviour for transmitted messages

On every message sent with the tag-owner bit set ("TO" in DSP0236), the kernel
must allocate a tag that will uniquely identify responses over a (destination
EID, source EID, tag-owner, tag) tuple. The tag value is 3 bits in size.

To allow this, a `sendto()` with the `MCTP_TAG_OWNER` bit set in the `smctp_tag`
field will cause the kernel to allocate a unique tag for subsequent replies from
that specific remote EID.

This allocation will expire when any of the following occur:

- the socket is closed
- a new message is sent to a new destination EID
- an implementation-defined timeout expires

Because the "tag space" is limited, it may not be possible for the kernel to
allocate a unique tag for the outgoing message. In this case, the `sendto()`
call will fail with errno `EAGAIN`. This is analogous to the UDP behaviour when
a local port cannot be allocated for an outgoing message.

The implementation-defined timeout value shall be chosen to reasonably cover
standard reply timeouts. If necessary, this timeout may be modified through the
`MCTP_TAG_CONTROL` socket option.

For applications that expect to perform an ongoing message exchange with a
particular destination address, they may use the `connect()` call to set a
persistent remote address. In this case, the tag will be allocated during
`connect()`, and remain reserved for this socket until any of the following
occur:

- the socket is closed
- the remote address is changed through another call to `connect()`.

In particular, calling `sendto()` with a different address does not release the
tag reservation.

Broadcast messages are particularly onerous for tag reservations. When a message
is transmitted with TO=1 and dest=0xff (the broadcast EID), the kernel must
reserve the tag across the entire range of possible EIDs. Therefore, a
particular tag value must be currently-unused across all EIDs to allow a
`sendto()` to a broadcast address. Additionally, this reservation is not cleared
when a reply is received, as there may be multiple replies to a broadcast.

For this reason, applications wanting to send to the broadcast address should
use the `connect()` system call to reserve a tag, and guarantee its availability
for future message transmission. Note that this will remove the tag value for
use with _any other EID_. Sending to the broadcast address should be avoided; we
expect few applications will need this functionality.

#### MCTP Control Protocol implementation

Aside from the "Resolve Endpoint ID" message, the MCTP control protocol
implementation would exist as a userspace process, `mctpd`. This process is
responsible for responding to incoming control protocol messages, any dynamic
EID allocations (for bus owner devices) and maintaining the MCTP route table
(for bridging devices).

This process would create a socket bound to the type `MCTP_TYPE_CONTROL`, with
the `MCTP_ADDR_EXT` socket option enabled in order to access physical addressing
data on incoming control protocol requests. It would interact with the kernel's
route table via a netlink interface - the same as that implemented for the
[Utility and configuration interfaces](#utility-and-configuration-interfaces).

### Neighbour and routing implementation

The packet-transmission behaviour of the MCTP infrastructure relies on a single
routing table to look up both route and neighbour information. Entries in this
table are of the format:

| EID range | interface | physical address | metric | MTU | flags | expiry |
| --------- | --------- | ---------------- | ------ | --- | ----- | ------ |

This table can be updated from two sources:

- From userspace, via a netlink interface (see the
  [Utility and configuration interfaces](#utility-and-configuration-interfaces)
  section).

- Directly within the kernel, when basic neighbour information is discovered.
  Kernel-originated routes are marked as such in the flags field, and have a
  maximum validity age, indicated by the expiry field.
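
As a rough illustration of the table format above, the following sketch models
one entry and a lookup over the table. Every name here (`struct route_entry`,
`ROUTE_F_KERNEL`, `route_lookup()`) is hypothetical and not part of the proposed
kernel interface. The sketch also assumes that a lower metric is preferred and
that only kernel-originated routes expire; neither policy is fixed by this
document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: route was discovered by the kernel, not userspace */
#define ROUTE_F_KERNEL 0x01

/* One row of the table; field names are illustrative only */
struct route_entry {
    uint8_t  eid_min, eid_max; /* EID range, inclusive */
    int      ifindex;          /* interface */
    uint8_t  hwaddr[8];        /* physical address */
    uint8_t  hwaddr_len;
    unsigned metric;           /* assumed: lower is preferred */
    unsigned mtu;
    unsigned flags;
    uint64_t expiry;           /* only checked for ROUTE_F_KERNEL */
};

/* Return the preferred live route covering eid, or NULL if none */
const struct route_entry *route_lookup(const struct route_entry *table,
                                       size_t n, uint8_t eid, uint64_t now)
{
    const struct route_entry *best = NULL;
    size_t i;

    assert(table != NULL || n == 0);

    for (i = 0; i < n; i++) {
        const struct route_entry *rt = &table[i];

        if (eid < rt->eid_min || eid > rt->eid_max)
            continue;
        /* kernel-originated routes have a maximum validity age */
        if ((rt->flags & ROUTE_F_KERNEL) && now >= rt->expiry)
            continue;
        if (!best || rt->metric < best->metric)
            best = rt;
    }

    return best;
}
```

A caller would pass the destination EID and the current time; a `NULL` result
corresponds to the case where no entry exists for the EID.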

Kernel-discovered routing information can originate from two sources:

- physical-to-EID mappings discovered through received packets

- explicit endpoint physical-address resolution requests

When a packet is to be transmitted to an EID that does not have an entry in the
routing table, the kernel may attempt to resolve the physical address of that
endpoint using the Resolve Endpoint ID command of the MCTP Control Protocol
(section 12.9 of DSP0236). The response message will be used to add a
kernel-originated route into the routing table.

This is the only kernel-internal usage of MCTP Control Protocol messages.

## Utility and configuration interfaces

A small utility will be developed to control the state of the kernel MCTP stack.
This will be similar in design to the 'iproute2' tools, which perform a similar
function for the IPv4 and IPv6 protocols.

The utility will be invoked as `mctp`, and provide subcommands for managing
different aspects of the kernel stack.

### `mctp link`: manage interfaces

```sh
    mctp link set <link> <up|down>
    mctp link set <link> network <network-id>
    mctp link set <link> mtu <mtu>
    mctp link set <link> bus-owner <hwaddr>
```

### `mctp network`: manage networks

```sh
    mctp network create <network-id>
    mctp network set <network-id> forwarding <on|off>
    mctp network set <network-id> default [<true|false>]
    mctp network delete <network-id>
```

### `mctp address`: manage local EID assignments

```sh
    mctp address add <eid> dev <link>
    mctp address del <eid> dev <link>
```

### `mctp route`: manage routing tables

```sh
    mctp route add net <network-id> eid <eid|eid-range> via <link> [hwaddr <addr>] [mtu <mtu>] [metric <metric>]
    mctp route del net <network-id> eid <eid|eid-range> via <link> [hwaddr <addr>] [mtu <mtu>] [metric <metric>]
    mctp route show [net <network-id>]
```

### `mctp stat`: query socket status

```sh
    mctp stat
```

A set of netlink message formats will be defined to support these control
functions.

# Design points & alternatives considered

## Including message-type byte in send/receive buffers

This design specifies that message buffers passed to the kernel in send syscalls
and from the kernel in receive syscalls will have the message type byte as the
first byte of the buffer. This corresponds to the definition of an MCTP message
payload in DSP0236.

This somewhat duplicates the type data provided in `struct sockaddr_mctp`; it's
superficially possible for the kernel to prepend this byte on send, and remove
it on receive.

However, the exact format of the MCTP message payload is not precisely defined
by the specification. Particularly, any message integrity check data (which
would also need to be appended / stripped in conjunction with the type byte) is
defined by the type specification, not DSP0236. The kernel would need knowledge
of all protocols in order to correctly deconstruct the payload data.

Therefore, we transfer the message payload as-is to userspace, without any
modification by the kernel.

## MCTP message-type specification: using `sockaddr_mctp.smctp_type` rather than protocol

This design specifies message-types to be passed in the `smctp_type` field of
`struct sockaddr_mctp`.
An alternative would be to pass it in the `protocol`
argument of the `socket()` system call:

```c
    int socket(int domain /* = AF_MCTP */, int type /* = SOCK_DGRAM */, int protocol);
```

The `smctp_type` implementation was chosen as it better matches the "addressing"
model of the message type; sockets are bound to an incoming message type,
similar to the IP protocol's model of binding UDP sockets to a local port
number.

There is no kernel behaviour that depends on the specific type (particularly
given the design choice above), so the protocol argument is not a good fit
here.

Future additions that perform protocol-specific message handling, and so alter
the send/receive buffer format, may use a new protocol argument.

## Networks referenced by index rather than UUID

This design proposes referencing networks by an integer index. The MCTP standard
does optionally associate an RFC 4122 UUID with a network; it would be possible
to use this UUID where we pass a network identifier.

This approach does not incorporate knowledge of network UUIDs in the kernel.
Given that the Get Network ID message in the MCTP Control Protocol is
implemented entirely via userspace, the kernel does not need to be aware of
network UUIDs, and requiring UUIDs in network references (for example, making
the `smctp_network` field of `struct sockaddr_mctp` a `uuid_t`) would complicate
assignment.

Instead, an integer index is used, in a similar fashion to the integer index
used to reference `struct net_device`s elsewhere in the network stack.

## Tag behaviour alternatives

We considered _several_ different designs for the tag handling behaviour. A
brief overview of the more-feasible of those, and why they were rejected:

### Each socket is allocated a unique tag value on creation

We could allocate a tag for each socket on creation, and use that value when a
tag is required. This, however:

- needlessly consumes a tag on non-tag-owning sockets (i.e., those which send
  with TO=0 - responders); and

- limits us to 8 sockets per network.

### Tags only used for message packetisation / reassembly

An alternative would be to completely dissociate tag allocation from sockets;
and only allocate a tag for the (short-lived) task of packetising a message, and
sending those packets. Tags would be released when the last packet has been
sent.

However, this removes any facility to correlate responses with the correct
socket, which is the purpose of the TO bit in DSP0236. In order for the sending
application to receive the response, we would either need to:

- limit the system to one socket of each message type (which, for example,
  precludes running a requester and a responder of the same type); or

- forward all incoming messages of a specific message-type to all sockets
  listening on that type, making it trivial to eavesdrop on MCTP data of other
  applications.

### Allocate a tag for one request/response pair

Another alternative would be to allocate a tag on each outgoing TO=1 message,
and then release that allocation after the incoming response to that tag (TO=0)
is observed.

However, MCTP protocols exist that do not have a 1:1 mapping of responses to
requests - more than one response may be valid for a given request message. For
example, in response to a request, an NVMe-MI implementation may send an
in-progress reply before the final reply. In this case, we would release the tag
after the first response is received, and then have no way to correlate the
second message with the socket.

Broadcast MCTP request messages may have multiple replies from multiple
endpoints, meaning we cannot release the tag allocation on the first reply.
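
To contrast the rejected alternatives with the behaviour this design settles on,
the sketch below models tag reservation as a per-EID bitmask over the 3-bit tag
space. It is only an illustration under stated assumptions: the names
`struct tag_table`, `tag_alloc()` and `tag_release()` are hypothetical, a single
network and a single local EID are assumed, and timeouts are left to the caller.
Allocation fails with `EAGAIN` when no tag is free, a broadcast reservation
needs a tag that is unused for every EID, and nothing is released when a reply
arrives.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define EID_BCAST 0xff /* broadcast EID */
#define NUM_TAGS  8    /* the tag value is 3 bits in size */

/* Hypothetical state: bitmask of reserved tags for each remote EID */
struct tag_table {
    uint8_t in_use[256];
};

void tag_table_init(struct tag_table *t)
{
    memset(t, 0, sizeof(*t));
}

/* Reserve a tag for messages to eid; returns 0..7, or -EAGAIN if none free */
int tag_alloc(struct tag_table *t, uint8_t eid)
{
    uint8_t busy = 0;
    int tag, i;

    if (eid == EID_BCAST) {
        /* a broadcast tag must be unused across all EIDs */
        for (i = 0; i < 256; i++)
            busy |= t->in_use[i];
    } else {
        /* tags held by a broadcast reservation are unavailable too */
        busy = t->in_use[eid] | t->in_use[EID_BCAST];
    }

    for (tag = 0; tag < NUM_TAGS; tag++) {
        if (!(busy & (1u << tag))) {
            t->in_use[eid] |= (uint8_t)(1u << tag);
            return tag;
        }
    }

    return -EAGAIN;
}

/* Called on socket close, destination change or timeout; never on reply */
void tag_release(struct tag_table *t, uint8_t eid, int tag)
{
    assert(tag >= 0 && tag < NUM_TAGS);
    t->in_use[eid] &= (uint8_t)~(1u << tag);
}
```

Because a reply does not call `tag_release()`, an in-progress reply followed by
a final reply, or several replies to one broadcast, can all be matched to the
same reservation.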