# OpenBMC in-kernel MCTP

Author: Jeremy Kerr `<jk@codeconstruct.com.au>`

Please refer to the [MCTP Overview](mctp.md) document for general MCTP design
description, background and requirements.

This document describes a kernel-based implementation of MCTP infrastructure,
providing a sockets-based API for MCTP communication within an OpenBMC-based
platform.

# Requirements for a kernel implementation

- The MCTP messaging API should be an obvious application of the existing POSIX
  socket interface.

- Configuration should be simple for a straightforward MCTP endpoint: a single
  network with a single local endpoint id (EID).

- Infrastructure should be flexible enough to allow for more complex MCTP
  networks, allowing:

  - each MCTP network (as defined by section 3.2.31 of DSP0236) to consist of
    multiple local physical interfaces, and/or multiple EIDs;

  - multiple distinct (i.e., non-bridged) networks, possibly containing
    duplicated EIDs between networks;

  - multiple local EIDs on a single interface, and

  - customisable routing/bridging configurations within a network.

# Proposed Design

The design contains several components:

- An interface for userspace applications to send and receive MCTP messages: A
  mapping of the sockets API to MCTP usage

- Infrastructure for control and configuration of the MCTP network(s),
  consisting of a configuration utility, and a kernel messaging facility for
  this utility to use.

- Kernel drivers for physical interface bindings.

In general, the kernel components cover the transport functionality of MCTP,
such as message assembly/disassembly, packet forwarding, and physical interface
implementations.

Higher-level protocols (such as PLDM) are implemented in userspace, through the
introduced socket API. This also includes the majority of the MCTP Control
Protocol implementation (DSP0236, section 11) - MCTP endpoints will typically
have a specific process to request and respond to control protocol messages.
However, the kernel will include a small subset of control protocol code to
allow very simple endpoints, with static EID allocations, to run without this
process. MCTP endpoints that require more than just single-endpoint
functionality (bus owners, bridges, etc), and/or dynamic EID allocation, would
include the control message protocol process.

A new driver is introduced to handle each physical interface binding. These
drivers expose the appropriate `struct net_device` to handle transmission and
reception of MCTP packets on their associated hardware channels. Under Linux,
the namespace for these interfaces is separate from other network interfaces -
such as those for ethernet.

## Structure: interfaces & networks

The kernel models the local MCTP topology through two items: interfaces and
networks.

An interface (or "link") is an instance of an MCTP physical transport binding
(as defined by DSP0236, section 3.2.47), likely connected to a specific hardware
device. This is represented as a `struct net_device`, and has a user-visible
name and index (`ifindex`). Non-hardware-attached interfaces are permitted, to
allow local loopback and/or virtual interfaces.

A network defines a unique address space for MCTP endpoints by endpoint-ID
(described by DSP0236, section 3.2.31). A network has a user-visible identifier
to allow references from userspace. Route definitions are specific to one
network.

Each interface is associated with exactly one network. A network may be
associated with one or more interfaces.

If multiple networks are present, each may contain EIDs that are also present on
other networks.

## Sockets API

### Protocol definitions

We define a new address family (and corresponding protocol family) for MCTP:

```c
    #define AF_MCTP /* TBD */
    #define PF_MCTP AF_MCTP
```

MCTP sockets are created with the `socket()` syscall, specifying `AF_MCTP` as
the domain. Currently, only a `SOCK_DGRAM` socket type is defined.

```c
    int sd = socket(AF_MCTP, SOCK_DGRAM, 0);
```

The only (current) value for the `protocol` argument is 0. Further protocol
values may be added in future.

MCTP sockets opened with a protocol value of 0 will communicate directly at the
transport layer; message buffers received by the application will consist of
message data from reassembled MCTP packets, and will include the full message
including message type byte and optional message integrity check (IC).
Individual packet headers are not included; they may be accessible through a
future `SOCK_RAW` socket type.

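As a minimal sketch of the buffer layout described above (not part of the
proposed API): the first byte of a message buffer carries the 7-bit message type
in its low bits and the IC flag in its most-significant bit. The helper names
`example_msg_type()` and `example_msg_has_ic()` and the `EXAMPLE_MCTP_IC_BIT`
constant are hypothetical; `MCTP_TYPE_MASK` is repeated from the definitions in
this section so that the fragment compiles on its own.

```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Same value as the MCTP_TYPE_MASK definition in this document */
    #define MCTP_TYPE_MASK      0x7f
    /* Hypothetical name for the IC flag: the top bit of the type byte */
    #define EXAMPLE_MCTP_IC_BIT 0x80

    /* Return the 7-bit message type from a received message buffer */
    static inline uint8_t example_msg_type(const uint8_t *buf, size_t len)
    {
            assert(buf != NULL && len >= 1);
            return buf[0] & MCTP_TYPE_MASK;
    }

    /* Return true if the message carries a trailing integrity check */
    static inline bool example_msg_has_ic(const uint8_t *buf, size_t len)
    {
            assert(buf != NULL && len >= 1);
            return (buf[0] & EXAMPLE_MCTP_IC_BIT) != 0;
    }
```

An application would typically call these on the buffer returned by one of the
receive syscalls, before interpreting the remainder of the message.
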
As with all socket address families, source and destination addresses are
specified with a new `sockaddr` type:

```c
    struct sockaddr_mctp {
            sa_family_t         smctp_family; /* = AF_MCTP */
            int                 smctp_network;
            struct mctp_addr    smctp_addr;
            uint8_t             smctp_type;
            uint8_t             smctp_tag;
    };

    struct mctp_addr {
            uint8_t             s_addr;
    };

    /* MCTP network values */
    #define MCTP_NET_ANY        0

    /* MCTP EID values */
    #define MCTP_ADDR_ANY       0xff
    #define MCTP_ADDR_BCAST     0xff

    /* MCTP type values. Only the least-significant 7 bits of
     * smctp_type are used for tag matches; the specification defines
     * the type to be 7 bits.
     */
    #define MCTP_TYPE_MASK      0x7f

    /* MCTP tag definitions; used for smctp_tag field of sockaddr_mctp */
    /* MCTP-spec-defined fields */
    #define MCTP_TAG_MASK    0x07
    #define MCTP_TAG_OWNER   0x08
    /* Others: reserved */

    /* Helpers */
    /* response to a request: clear TO, keep value */
    #define MCTP_TAG_RSP(x) ((x) & MCTP_TAG_MASK)
```

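To illustrate how the tag definitions above combine, the following sketch splits
a `smctp_tag` value into its owner (TO) bit and 3-bit tag value. The macros are
repeated from the definitions above so that the fragment is self-contained;
`example_tag_is_owner()` and `example_tag_value()` are hypothetical helper
names, not part of the proposed API.

```c
    #include <stdbool.h>
    #include <stdint.h>

    #define MCTP_TAG_MASK    0x07
    #define MCTP_TAG_OWNER   0x08
    #define MCTP_TAG_RSP(x)  ((x) & MCTP_TAG_MASK)

    /* True if the TO bit is set, ie. the sender of the message owns the tag */
    static inline bool example_tag_is_owner(uint8_t tag)
    {
            return (tag & MCTP_TAG_OWNER) != 0;
    }

    /* The 3-bit tag value, without the TO bit */
    static inline uint8_t example_tag_value(uint8_t tag)
    {
            return tag & MCTP_TAG_MASK;
    }
```

For a request received with tag `MCTP_TAG_OWNER | 3`, `MCTP_TAG_RSP()` yields 3:
the same tag value with the TO bit cleared.
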
### Syscall behaviour

The following sections describe the MCTP-specific behaviours of the standard
socket system calls. These behaviours have been chosen to map closely to the
existing sockets APIs.

#### `bind()`: set local socket address

Sockets that receive incoming request packets will bind to a local address,
using the `bind()` syscall.

```c
    struct sockaddr_mctp addr;

    addr.smctp_family = AF_MCTP;
    addr.smctp_network = MCTP_NET_ANY;
    addr.smctp_addr.s_addr = MCTP_ADDR_ANY;
    addr.smctp_type = MCTP_TYPE_PLDM;
    addr.smctp_tag = MCTP_TAG_OWNER;

    int rc = bind(sd, (struct sockaddr *)&addr, sizeof(addr));
```

This establishes the local address of the socket. Incoming MCTP messages that
match the network, address, and message type will be received by this socket.
The reference to 'incoming' is important here; a bound socket will only receive
messages with the TO bit set, to indicate an incoming request message, rather
than a response.

The `smctp_tag` value will configure the tags accepted from the remote side of
this socket. Given the above, the only valid value is `MCTP_TAG_OWNER`, which
will result in remotely "owned" tags being routed to this socket. Since
`MCTP_TAG_OWNER` is set, the 3 least-significant bits of `smctp_tag` are not
used; callers must set them to zero. See the
[Tag behaviour for transmitted messages](#tag-behaviour-for-transmitted-messages)
section for more details. If the `MCTP_TAG_OWNER` bit is not set, `bind()` will
fail with an errno of `EINVAL`.

A `smctp_network` value of `MCTP_NET_ANY` will configure the socket to receive
incoming packets from any locally-connected network. A specific network value
will cause the socket to only receive incoming messages from that network.

The `smctp_addr` field specifies a local address to bind to. A value of
`MCTP_ADDR_ANY` configures the socket to receive messages addressed to any local
destination EID.

The `smctp_type` field specifies which message types to receive. Only the lower
7 bits of the type are matched on incoming messages (i.e., the most-significant
IC bit is not part of the match). This results in the socket receiving packets
with and without a message integrity check footer.

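The matching rules above can be summarised as a small predicate. This is a
userspace model for illustration only, not the kernel's implementation:
`struct example_bind`, `struct example_msg` and `example_bind_matches()` are
hypothetical names, and the constants are repeated from the protocol
definitions.

```c
    #include <stdbool.h>
    #include <stdint.h>

    #define MCTP_NET_ANY     0
    #define MCTP_ADDR_ANY    0xff
    #define MCTP_TYPE_MASK   0x7f
    #define MCTP_TAG_OWNER   0x08

    /* Hypothetical: the address a socket passed to bind() */
    struct example_bind {
            int     network;
            uint8_t eid;
            uint8_t type;
    };

    /* Hypothetical: the relevant fields of an incoming, reassembled message */
    struct example_msg {
            int     network;   /* network the message arrived on */
            uint8_t dest_eid;  /* local destination EID */
            uint8_t type_byte; /* first byte of the message: IC bit + type */
            uint8_t tag;       /* TO bit + tag value */
    };

    static bool example_bind_matches(const struct example_bind *b,
                                     const struct example_msg *m)
    {
            /* bound sockets only see requests, ie. messages with TO set */
            if (!(m->tag & MCTP_TAG_OWNER))
                    return false;
            if (b->network != MCTP_NET_ANY && b->network != m->network)
                    return false;
            if (b->eid != MCTP_ADDR_ANY && b->eid != m->dest_eid)
                    return false;
            /* the IC bit is not part of the type match */
            return (b->type & MCTP_TYPE_MASK) ==
                   (m->type_byte & MCTP_TYPE_MASK);
    }
```
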
#### `connect()`: set remote socket address

Applications may specify a socket's remote address with the `connect()`
syscall:

```c
    struct sockaddr_mctp addr;
    int rc;

    addr.smctp_family = AF_MCTP;
    addr.smctp_network = MCTP_NET_ANY;
    addr.smctp_addr.s_addr = 8;
    addr.smctp_tag = MCTP_TAG_OWNER;
    addr.smctp_type = MCTP_TYPE_PLDM;

    rc = connect(sd, (struct sockaddr *)&addr, sizeof(addr));
```

This establishes the remote address of a socket, used for future message
transmission. Like other `SOCK_DGRAM` behaviour, this does not generate any MCTP
traffic directly, but just sets the default destination for messages sent from
this socket.

The `smctp_network` field may specify a locally-attached network, or the value
`MCTP_NET_ANY`, in which case the kernel will select a suitable MCTP network.
This is guaranteed to work for single-network configurations, but may require
additional routing definitions for endpoints attached to multiple distinct
networks. See the [Addressing](#addressing) section for details.

The `smctp_addr` field specifies a remote EID. This may be `MCTP_ADDR_BCAST`,
the MCTP broadcast EID (0xff).

The `smctp_type` field specifies the type field of messages transferred over
this socket.

The `smctp_tag` value will configure the tag used for the local side of this
socket. The only valid value is `MCTP_TAG_OWNER`, which will result in an
"owned" tag being allocated for this socket. The tag will remain allocated for
all future outgoing messages, until either the socket is closed, or `connect()`
is called again. If a tag cannot be allocated, `connect()` will report an error,
with an errno value of `EAGAIN`. See the
[Tag behaviour for transmitted messages](#tag-behaviour-for-transmitted-messages)
section for more details. If the `MCTP_TAG_OWNER` bit is not set, `connect()`
will fail with an errno of `EINVAL`.

Requesters which connect to a single responder will typically use `connect()` to
specify the peer address and tag for future outgoing messages.

#### `sendto()`, `sendmsg()`, `send()` & `write()`: transmit an MCTP message

An MCTP message is transmitted using one of the `sendto()`, `sendmsg()`,
`send()` or `write()` syscalls. Using `sendto()` as the primary example:

```c
    struct sockaddr_mctp addr;
    char buf[14];
    ssize_t len;

    /* set message destination */
    addr.smctp_family = AF_MCTP;
    addr.smctp_network = MCTP_NET_ANY;
    addr.smctp_addr.s_addr = 8;
    addr.smctp_tag = MCTP_TAG_OWNER;
    addr.smctp_type = MCTP_TYPE_ECHO;

    /* arbitrary message to send, with message-type header */
    buf[0] = MCTP_TYPE_ECHO;
    memcpy(buf + 1, "hello, world!", sizeof(buf) - 1);

    len = sendto(sd, buf, sizeof(buf), 0,
                    (struct sockaddr *)&addr, sizeof(addr));
```

The address argument is treated the same way as for `connect()`: The network and
address fields define the remote address to send to. If `smctp_tag` has the
`MCTP_TAG_OWNER` bit set, the kernel will ignore any bits set in
`MCTP_TAG_MASK`, and generate a tag value suitable for the destination EID. If
`MCTP_TAG_OWNER` is not set, the message will be sent with the tag value as
specified. If a tag value cannot be allocated, the system call will report an
errno of `EAGAIN`.

The application must provide the message type byte as the first byte of the
message buffer passed to `sendto()`. If a message integrity check is to be
included in the transmitted message, it must also be provided in the message
buffer, and the most-significant bit of the message type byte must be 1.

If the first byte of the message does not match the message type value, then the
system call will return an error of `EPROTO`.

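A sketch of how an application might assemble a message buffer that satisfies
these rules. `example_build_msg()` and `example_check_type()` are hypothetical
helpers, not part of the proposed API, and no integrity check is computed or
appended. The second helper mirrors the check that results in `EPROTO`; this
document does not state whether the IC bit takes part in that comparison, so
comparing only the low 7 bits is an assumption.

```c
    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define MCTP_TYPE_MASK 0x7f

    /*
     * Write the message type byte followed by `len` payload bytes into
     * `out`. Returns the total message length, or -ENOSPC if `out` is too
     * small.
     */
    static int example_build_msg(uint8_t type, const void *payload,
                                 size_t len, uint8_t *out, size_t out_len)
    {
            if (out_len == 0 || len > out_len - 1)
                    return -ENOSPC;
            out[0] = type & MCTP_TYPE_MASK; /* IC bit clear: no IC appended */
            memcpy(out + 1, payload, len);
            return (int)(len + 1);
    }

    /*
     * Assumed form of the send-side check: the first byte of the buffer
     * must agree with the smctp_type of the destination address.
     */
    static int example_check_type(uint8_t smctp_type, const uint8_t *msg,
                                  size_t len)
    {
            if (len < 1)
                    return -EPROTO;
            if ((msg[0] & MCTP_TYPE_MASK) != (smctp_type & MCTP_TYPE_MASK))
                    return -EPROTO;
            return 0;
    }
```
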
The `send()` and `write()` system calls behave in a similar way, but do not
specify a remote address. Therefore, `connect()` must be called beforehand; if
not, these calls will return an error of `EDESTADDRREQ` (Destination address
required).

Using `sendto()` or `sendmsg()` on a connected socket may override the remote
socket address specified in `connect()`. The `connect()` address and tag will
remain associated with the socket, for future unaddressed sends. The tag
allocated through a call to `sendto()` or `sendmsg()` on a connected socket is
subject to the same invalidation logic as on an unconnected socket: It is
expired either by timeout or by a subsequent `sendto()`.

The `sendmsg()` system call allows a more compact argument interface, and the
message buffer to be specified as a scatter-gather list. At present no ancillary
message types (used for the `msg_control` data passed to `sendmsg()`) are
defined.

Transmitting a message on an unconnected socket with `MCTP_TAG_OWNER` specified
will cause an allocation of a tag, if no valid tag is already allocated for that
destination. The (destination-eid,tag) tuple acts as an implicit local socket
address, to allow the socket to receive responses to this outgoing message. If
any previous allocation has been performed (for a different remote EID), that
allocation is lost. This tag behaviour can be controlled through the
`MCTP_TAG_CONTROL` socket option.

Sockets will only receive responses to requests they have sent (with TO=1) and
may only respond (with TO=0) to requests they have received.

#### `recvfrom()`, `recvmsg()`, `recv()` & `read()`: receive an MCTP message

An MCTP message can be received by an application using one of the `recvfrom()`,
`recvmsg()`, `recv()` or `read()` system calls. Using `recvfrom()` as the
primary example:

```c
    struct sockaddr_mctp addr;
    socklen_t addrlen;
    char buf[14];
    ssize_t len;

    addrlen = sizeof(addr);

    len = recvfrom(sd, buf, sizeof(buf), 0,
                    (struct sockaddr *)&addr, &addrlen);

    /* We can expect addr to describe an MCTP address */
    assert(addrlen >= sizeof(addr));
    assert(addr.smctp_family == AF_MCTP);

    printf("received %zd bytes from remote EID %d\n", len,
            addr.smctp_addr.s_addr);
```

The address argument to `recvfrom()` and `recvmsg()` is populated with the
remote address of the incoming message, including tag value (this will be needed
in order to reply to the message).

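For example, a responder could derive the address for its reply from the address
filled in by `recvfrom()`. This is a sketch under the definitions proposed in
this document: the structures and macros are repeated locally so that the
fragment compiles without MCTP kernel headers, and `example_reply_addr()` is a
hypothetical helper name.

```c
    #include <stdint.h>
    #include <sys/socket.h>

    /* Local copies of the definitions proposed in this document */
    struct mctp_addr {
            uint8_t             s_addr;
    };

    struct sockaddr_mctp {
            sa_family_t         smctp_family;
            int                 smctp_network;
            struct mctp_addr    smctp_addr;
            uint8_t             smctp_type;
            uint8_t             smctp_tag;
    };

    #define MCTP_TAG_MASK    0x07
    #define MCTP_TAG_RSP(x)  ((x) & MCTP_TAG_MASK)

    /*
     * Build the destination address for a reply: same network, EID and
     * type as the request's source address, with the TO bit cleared and
     * the tag value preserved.
     */
    static struct sockaddr_mctp
    example_reply_addr(const struct sockaddr_mctp *from)
    {
            struct sockaddr_mctp to = *from;

            to.smctp_tag = MCTP_TAG_RSP(from->smctp_tag);
            return to;
    }
```

The result would then be passed as the address argument to `sendto()`.
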
The first byte of the message buffer will contain the message type byte. If an
integrity check follows the message, it will be included in the received buffer.

The `recv()` and `read()` system calls behave in a similar way, but do not
provide a remote address to the application. Therefore, these are only useful if
the remote address is already known, or the message does not require a reply.

Like the send calls, sockets will only receive responses to requests they have
sent (TO=1) and may only respond (TO=0) to requests they have received.

#### `getsockname()` & `getpeername()`: query local/remote socket address

The `getsockname()` system call returns the `struct sockaddr_mctp` value for the
local side of this socket, `getpeername()` for the remote (i.e., that used in a
`connect()`). Since the tag value is a property of the remote address,
`getpeername()` may be used to retrieve a kernel-allocated tag value.

Calling `getpeername()` on an unconnected socket will result in an error of
`ENOTCONN`.

#### Socket options

The following socket options are defined for MCTP sockets:

##### `MCTP_ADDR_EXT`: Use extended addressing information in sendmsg/recvmsg

Enabling this socket option allows an application to specify extended addressing
information on transmitted packets, and access the same on received packets.

When the `MCTP_ADDR_EXT` socket option is enabled, the application may specify
an expanded `struct sockaddr` to the `recvfrom()` and `sendto()` system calls.
This is defined as:

```c
    struct sockaddr_mctp_ext {
            /* fields exactly match struct sockaddr_mctp */
            sa_family_t         smctp_family; /* = AF_MCTP */
            int                 smctp_network;
            struct mctp_addr    smctp_addr;
            uint8_t             smctp_type;
            uint8_t             smctp_tag;
            /* extended addressing */
            int                 smctp_ifindex;
            uint8_t             smctp_halen;
            unsigned char       smctp_haddr[/* TBD */];
    };
```

If the `addrlen` specified to `sendto()` or `recvfrom()` is sufficient to
contain this larger structure, then the extended addressing fields are consumed
/ populated respectively.

##### `MCTP_TAG_CONTROL`: manage outgoing tag allocation behaviour

The set/getsockopt argument is a `mctp_tagctl` structure:

    struct mctp_tagctl {
        bool            retain;
        struct timespec timeout;
    };

This allows an application to control the behaviour of allocated tags for
non-connected sockets when transferring messages to multiple different
destinations (i.e., where a `struct sockaddr_mctp` is provided for individual
messages, and the `smctp_addr` destination for those sockets may vary across
calls).

The `retain` flag indicates to the kernel that the socket should not release tag
allocations when a message is sent to a new destination EID. This causes the
socket to continue to receive incoming messages to the old (dest,tag) tuple, in
addition to the new tuple.

The `timeout` value specifies a maximum amount of time to retain tag values.
This should be based on the reply timeout for any upper-level protocol.

The kernel may reject a request to set values that would cause excessive tag
allocation by this socket. The kernel may also reject subsequent tag-allocation
requests (through send or connect syscalls) which would cause excessive tags to
be consumed by the socket, even though the tag control settings were accepted in
the setsockopt operation.

Changing the default tag control behaviour should only be required when:

- the socket is sending messages with TO=1 (i.e., is a requester); and
- messages are sent to multiple different destination EIDs from the one socket.

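The effect of the `retain` flag can be illustrated with a small userspace model.
This is a sketch only, not the kernel's implementation: every name in it
(`example_sock`, `example_send()`, `example_accepts()`, `EXAMPLE_MAX_TAGS`) is
hypothetical, the table limit is arbitrary, and the `timeout` behaviour is not
modelled.

```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define EXAMPLE_MAX_TAGS 4 /* arbitrary limit for this model */

    struct example_tag_alloc {
            uint8_t eid;
            uint8_t tag;
    };

    struct example_sock {
            bool    retain; /* models mctp_tagctl.retain */
            size_t  n;      /* number of live (eid, tag) allocations */
            struct example_tag_alloc allocs[EXAMPLE_MAX_TAGS];
    };

    /*
     * Model a TO=1 send to `eid`, where `new_tag` is the tag that would be
     * allocated if one is needed. Returns false if the table is full.
     */
    static bool example_send(struct example_sock *s, uint8_t eid,
                             uint8_t new_tag)
    {
            for (size_t i = 0; i < s->n; i++)
                    if (s->allocs[i].eid == eid)
                            return true; /* existing allocation is reused */

            if (!s->retain)
                    s->n = 0; /* default: earlier allocations are lost */

            if (s->n == EXAMPLE_MAX_TAGS)
                    return false;

            s->allocs[s->n].eid = eid;
            s->allocs[s->n].tag = new_tag;
            s->n++;
            return true;
    }

    /* Would a response from (eid, tag) be delivered to this socket? */
    static bool example_accepts(const struct example_sock *s, uint8_t eid,
                                uint8_t tag)
    {
            for (size_t i = 0; i < s->n; i++)
                    if (s->allocs[i].eid == eid && s->allocs[i].tag == tag)
                            return true;
            return false;
    }
```

In this model, a socket without `retain` that sends to EID 8 and then to EID 9
no longer accepts responses from EID 8; with `retain` set, it accepts responses
from both.
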
#### Syscalls not implemented

The following system calls are not implemented for MCTP, primarily as they are
not used in `SOCK_DGRAM`-type sockets:

- `listen()`
- `accept()`
- `ioctl()`
- `shutdown()`
- `mmap()`

### Userspace examples

These examples cover three general use-cases:

- **requester**: sends requests to a particular (EID, type) target, and receives
  responses to those packets.

  This is similar to a typical UDP client.

- **responder**: receives all locally-addressed messages of a specific
  message-type, and responds to the requester immediately.

  This is similar to a typical UDP server.

- **controller**: a specific service for a bus owner; may send broadcast
  messages, manage EID allocations, update local MCTP stack state. Will need
  low-level packet data.

  This is similar to a DHCP server.

#### Requester

"Client"-side implementation to send requests to a responder, and receive a
response. This uses a (fictitious) message type of `MCTP_TYPE_ECHO`.

```c
    #include <assert.h>
    #include <err.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    /* plus the header providing the AF_MCTP definitions described above */

    int main(void) {
            struct sockaddr_mctp addr;
            socklen_t addrlen;
            struct {
                uint8_t type;
                uint8_t data[14];
            } msg;
            int sd, rc;

            sd = socket(AF_MCTP, SOCK_DGRAM, 0);
            if (sd < 0)
                    err(EXIT_FAILURE, "socket");

            addr.smctp_family = AF_MCTP;
            addr.smctp_network = MCTP_NET_ANY; /* any network */
            addr.smctp_addr.s_addr = 9;    /* remote eid 9 */
            addr.smctp_tag = MCTP_TAG_OWNER; /* kernel will allocate an owned tag */
            addr.smctp_type = MCTP_TYPE_ECHO; /* fictitious message type */
            addrlen = sizeof(addr);

            /* set message type and payload */
            msg.type = MCTP_TYPE_ECHO;
            strncpy((char *)msg.data, "hello, world!", sizeof(msg.data));

            /* send message */
            rc = sendto(sd, &msg, sizeof(msg), 0,
                            (struct sockaddr *)&addr, addrlen);

            if (rc < 0)
                    err(EXIT_FAILURE, "sendto");

            /* Receive reply. This will block until a reply arrives,
             * which may never happen. Actual code would need a timeout
             * here. */
            rc = recvfrom(sd, &msg, sizeof(msg), 0,
                        (struct sockaddr *)&addr, &addrlen);
            if (rc < 0)
                    err(EXIT_FAILURE, "recvfrom");

            assert(msg.type == MCTP_TYPE_ECHO);
            /* ensure we're nul-terminated */
            msg.data[sizeof(msg.data)-1] = '\0';

            printf("reply: %s\n", (char *)msg.data);

            return EXIT_SUCCESS;
    }
```

#### Responder

"Server"-side implementation to receive requests and respond. Like the client,
this uses a (fictitious) message type of `MCTP_TYPE_ECHO` in the
`struct sockaddr_mctp`; only messages matching this type will be received.

```c
    int main() {
            struct sockaddr_mctp addr;
            socklen_t addrlen;
            int sd, rc;

            sd = socket(AF_MCTP, SOCK_DGRAM, 0);
            if (sd < 0)
                    err(EXIT_FAILURE, "socket");

            addr.smctp_family = AF_MCTP;
            addr.smctp_network = MCTP_NET_ANY; /* any network */
            addr.smctp_addr.s_addr = MCTP_EID_ANY;
            addr.smctp_type = MCTP_TYPE_ECHO;
            addr.smctp_tag = MCTP_TAG_OWNER;
            addrlen = sizeof(addr);

            rc = bind(sd, (struct sockaddr *)&addr, addrlen);
            if (rc)
                    err(EXIT_FAILURE, "bind");

            for (;;) {
                    struct {
                        uint8_t type;
                        uint8_t data[14];
                    } msg;

                    /* addrlen is a value-result argument; reset it
                     * before every receive */
                    addrlen = sizeof(addr);

                    rc = recvfrom(sd, &msg, sizeof(msg), 0,
                                    (struct sockaddr *)&addr, &addrlen);
                    if (rc < 0)
                            err(EXIT_FAILURE, "recvfrom");
                    if (rc < 1) {
                            warnx("not enough data for a message type");
                            continue;
                    }

                    assert(addrlen == sizeof(addr));
                    assert(msg.type == MCTP_TYPE_ECHO);

                    printf("%d bytes from EID %d\n", rc,
                                    addr.smctp_addr.s_addr);

                    /* Reply to requester; this macro just clears the TO-bit.
                     * Other addr fields will describe the remote endpoint,
                     * so use those as-is.
                     */
                    addr.smctp_tag = MCTP_TAG_RSP(addr.smctp_tag);

                    rc = sendto(sd, &msg, rc, 0,
                                (struct sockaddr *)&addr, addrlen);
                    if (rc < 0)
                            err(EXIT_FAILURE, "sendto");
            }

            return EXIT_SUCCESS;
    }
```

#### Broadcast request

Sends a request to a broadcast EID, and receives (unicast) replies. This is a
typical control protocol pattern.

```c
    int main() {
            struct sockaddr_mctp txaddr, rxaddr;
            struct timespec start, cur;
            struct pollfd pollfds[1];
            socklen_t addrlen;
            uint8_t buf[2];
            int sd, rc, timeout;

            sd = socket(AF_MCTP, SOCK_DGRAM, 0);
            if (sd < 0)
                    err(EXIT_FAILURE, "socket");

            /* destination address setup */
            txaddr.smctp_family = AF_MCTP;
            txaddr.smctp_network = 1; /* specific network required for broadcast */
            txaddr.smctp_addr.s_addr = 0xff; /* broadcast dest EID */
            txaddr.smctp_type = MCTP_TYPE_CONTROL;
            txaddr.smctp_tag = MCTP_TAG_OWNER;

            buf[0] = MCTP_TYPE_CONTROL;
            buf[1] = 'a';

            /* We're doing a sendto() to a broadcast address here. If we were
             * sending more than one broadcast message, we'd be better off
             * doing connect(); sendto();, in order to retain the tag
             * reservation across all transmitted messages. However, since this
             * is a single transmit, that makes no difference in this
             * particular case.
             */
            rc = sendto(sd, buf, 2, 0, (struct sockaddr *)&txaddr,
                            sizeof(txaddr));
            if (rc < 0)
                    err(EXIT_FAILURE, "sendto");

            /* Set up poll behaviour, and record our starting time for
             * reply timeouts */
            pollfds[0].fd = sd;
            pollfds[0].events = POLLIN;
            clock_gettime(CLOCK_MONOTONIC, &start);

            for (;;) {
                    /* Calculate the amount of time left for replies */
                    clock_gettime(CLOCK_MONOTONIC, &cur);
                    timeout = calculate_timeout(&start, &cur, 1000);

                    rc = poll(pollfds, 1, timeout);
                    if (rc < 0)
                        err(EXIT_FAILURE, "poll");

                    /* timeout receiving a reply? */
                    if (rc == 0)
                        break;

                    /* sanity check that we have a message to receive */
                    if (!(pollfds[0].revents & POLLIN))
                        break;

                    addrlen = sizeof(rxaddr);

                    rc = recvfrom(sd, buf, 2, 0, (struct sockaddr *)&rxaddr,
                            &addrlen);
                    if (rc < 0)
                            err(EXIT_FAILURE, "recvfrom");

                    assert(addrlen >= sizeof(rxaddr));
                    assert(rxaddr.smctp_family == AF_MCTP);

                    printf("response from EID %d\n", rxaddr.smctp_addr.s_addr);
            }

            return EXIT_SUCCESS;
    }
```

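The `calculate_timeout()` helper called in the broadcast example is not defined
in this document. A minimal sketch follows, assuming it returns the number of
milliseconds remaining from a total reply budget (the parameter name `limit_ms`
is chosen here for illustration), clamped to zero so that `poll()` returns
immediately once the budget is spent:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: milliseconds left of a limit_ms budget that
 * started at *start, as of *cur. Never negative, since a negative
 * timeout would make poll() block indefinitely. */
static int calculate_timeout(const struct timespec *start,
                             const struct timespec *cur, int limit_ms)
{
        long long elapsed_ms;

        assert(start && cur && limit_ms >= 0);

        /* the nanosecond difference may be negative; the seconds term
         * compensates for that */
        elapsed_ms = (long long)(cur->tv_sec - start->tv_sec) * 1000
                + (cur->tv_nsec - start->tv_nsec) / 1000000;

        if (elapsed_ms >= limit_ms)
                return 0;

        return (int)(limit_ms - elapsed_ms);
}
```

With this definition, the receive loop ends through the `rc == 0` branch once
the full budget has elapsed, however many replies arrived in the meantime.
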
### Implementation notes

#### Addressing

Transmitted messages (through `sendto()` and related system calls) specify their
destination via the `smctp_network` and `smctp_addr` fields of a
`struct sockaddr_mctp`.

The `smctp_addr` field maps directly to the destination endpoint's EID.

The `smctp_network` field specifies a locally defined network identifier. To
simplify situations where there is only one network defined, the special value
`MCTP_NET_ANY` is allowed. This will allow the kernel to select a specific
network for transmission.

This selection is entirely user-configured; one specific network may be defined
as the system default, in which case it will be used for all message
transmission where `MCTP_NET_ANY` is used as the destination network.

In particular, the destination EID is never used to select a destination
network.

MCTP responders should use the EID and network values of an incoming request to
specify the destination for any responses.

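The network selection rule can be sketched as a small function. This is an
illustration only: the `EX_` constants are stand-in values, the function name is
hypothetical, and what happens when `MCTP_NET_ANY` is used without a configured
default is not specified by this design (the sketch simply reports "no
network").

```c
#include <assert.h>

/* Stand-in values for illustration only; the real constants would come
 * from the MCTP socket header. */
#define EX_MCTP_NET_ANY 0
#define EX_NO_NETWORK   (-1)

/* Pick the network for an outgoing message.
 * requested:   smctp_network value supplied by the application
 * default_net: user-configured system default, or EX_NO_NETWORK
 * dest_eid:    accepted, but deliberately ignored */
static int select_tx_network(int requested, int default_net,
                             unsigned char dest_eid)
{
        assert(requested >= 0);

        (void)dest_eid; /* the destination EID never selects a network */

        if (requested != EX_MCTP_NET_ANY)
                return requested;

        return default_net;
}
```
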
#### Bridging/routing

The network and interface structure allows multiple interfaces to share a common
network. By default, packets are not forwarded between interfaces.

A network can be configured for "forwarding" mode. In this mode, packets may be
forwarded if their destination EID is non-local, and matches a route for another
interface on the same network.

As per DSP0236, packet reassembly does not occur during the forwarding process.
If the packet is larger than the MTU for the destination interface/route, then
the packet is dropped.

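The forwarding rules above reduce to a short decision function. The following is
a sketch under stated assumptions, not kernel code: the `fwd_input` structure
and all names in it are hypothetical, and simply summarise what the stack would
know about one received packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fwd_action { FWD_DELIVER_LOCAL, FWD_FORWARD, FWD_DROP };

/* Hypothetical summary of the state relevant to one received packet */
struct fwd_input {
        bool   dest_is_local;   /* destination EID is one of our own */
        bool   net_forwarding;  /* "forwarding" mode enabled on the network */
        bool   have_route;      /* route via another interface, same network */
        size_t pkt_len;         /* length of the received packet */
        size_t route_mtu;       /* MTU of the matching route/interface */
};

static enum fwd_action forward_decision(const struct fwd_input *in)
{
        assert(in != NULL);

        if (in->dest_is_local)
                return FWD_DELIVER_LOCAL;

        /* default behaviour: no forwarding between interfaces */
        if (!in->net_forwarding || !in->have_route)
                return FWD_DROP;

        /* no reassembly while forwarding, so an oversized packet
         * cannot be made to fit and is dropped */
        if (in->pkt_len > in->route_mtu)
                return FWD_DROP;

        return FWD_FORWARD;
}
```
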
#### Tag behaviour for transmitted messages

On every message sent with the tag-owner bit set ("TO" in DSP0236), the kernel
must allocate a tag that will uniquely identify responses over a (destination
EID, source EID, tag-owner, tag) tuple. The tag value is 3 bits in size.

To allow this, a `sendto()` with the `MCTP_TAG_OWNER` bit set in the `smctp_tag`
field will cause the kernel to allocate a unique tag for subsequent replies from
that specific remote EID.

This allocation will expire when any of the following occur:

- the socket is closed
- a new message is sent to a new destination EID
- an implementation-defined timeout expires

Because the "tag space" is limited, it may not be possible for the kernel to
allocate a unique tag for the outgoing message. In this case, the `sendto()`
call will fail with errno `EAGAIN`. This is analogous to the UDP behaviour when
a local port cannot be allocated for an outgoing message.

The implementation-defined timeout value shall be chosen to reasonably cover
standard reply timeouts. If necessary, this timeout may be modified through the
`MCTP_TAG_CONTROL` socket option.

Applications that expect to perform an ongoing message exchange with a
particular destination address may use the `connect()` call to set a persistent
remote address. In this case, the tag will be allocated during `connect()`, and
remain reserved for this socket until any of the following occur:

- the socket is closed
- the remote address is changed through another call to `connect()`.

In particular, calling `sendto()` with a different address does not release the
tag reservation.

Broadcast messages are particularly onerous for tag reservations. When a message
is transmitted with TO=1 and dest=0xff (the broadcast EID), the kernel must
reserve the tag across the entire range of possible EIDs. Therefore, a
particular tag value must be currently-unused across all EIDs to allow a
`sendto()` to a broadcast address. Additionally, this reservation is not cleared
when a reply is received, as there may be multiple replies to a broadcast.

For this reason, applications wanting to send to the broadcast address should
use the `connect()` system call to reserve a tag, and guarantee its availability
for future message transmission. Note that this will remove the tag value for
use with _any other EID_. Sending to the broadcast address should be avoided; we
expect few applications will need this functionality.

#### MCTP Control Protocol implementation

Aside from the "Resolve Endpoint ID" message, the MCTP control protocol
implementation would exist as a userspace process, `mctpd`. This process is
responsible for responding to incoming control protocol messages, any dynamic
EID allocations (for bus owner devices) and maintaining the MCTP route table
(for bridging devices).

This process would create a socket bound to the type `MCTP_TYPE_CONTROL`, with
the `MCTP_ADDR_EXT` socket option enabled in order to access physical addressing
data on incoming control protocol requests. It would interact with the kernel's
route table via a netlink interface - the same as that implemented for the
[Utility and configuration interfaces](#utility-and-configuration-interfaces).

### Neighbour and routing implementation

The packet-transmission behaviour of the MCTP infrastructure relies on a single
routing table to look up both route and neighbour information. Entries in this
table are of the format:

| EID range | interface | physical address | metric | MTU | flags | expiry |
| --------- | --------- | ---------------- | ------ | --- | ----- | ------ |

This table can be updated from two sources:

- From userspace, via a netlink interface (see the
  [Utility and configuration interfaces](#utility-and-configuration-interfaces)
  section).

- Directly within the kernel, when basic neighbour information is discovered.
  Kernel-originated routes are marked as such in the flags field, and have a
  maximum validity age, indicated by the expiry field.

Kernel-discovered routing information can originate from two sources:

- physical-to-EID mappings discovered through received packets

- explicit endpoint physical-address resolution requests

When a packet is to be transmitted to an EID that does not have an entry in the
routing table, the kernel may attempt to resolve the physical address of that
endpoint using the Resolve Endpoint ID command of the MCTP Control Protocol
(section 12.9 of DSP0236). The response message will be used to add a
kernel-originated route into the routing table.

This is the only kernel-internal usage of MCTP Control Protocol messages.

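A possible in-memory shape for one table entry, and a lookup over it, is
sketched below. Field names, field sizes and the `EX_RT_KERNEL` flag are
assumptions made for illustration. The design does not state how the metric is
interpreted; this sketch assumes that a lower metric is preferred, as in other
routing tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define EX_RT_KERNEL 0x1  /* hypothetical flag: kernel-originated route */

/* One routing-table entry, mirroring the columns of the table above */
struct ex_route {
        uint8_t  eid_min, eid_max;  /* EID range (inclusive) */
        int      ifindex;           /* interface */
        uint8_t  hwaddr[8];         /* physical address */
        size_t   hwaddr_len;
        unsigned metric;
        unsigned mtu;
        unsigned flags;
        time_t   expiry;            /* only meaningful with EX_RT_KERNEL */
};

static bool route_usable(const struct ex_route *rt, uint8_t eid, time_t now)
{
        if (eid < rt->eid_min || eid > rt->eid_max)
                return false;
        /* kernel-originated routes have a maximum validity age */
        if ((rt->flags & EX_RT_KERNEL) && now >= rt->expiry)
                return false;
        return true;
}

/* Return the usable route with the lowest metric, or NULL if none
 * matches (the point at which address resolution could be attempted). */
static const struct ex_route *route_lookup(const struct ex_route *table,
                                           size_t n, uint8_t eid, time_t now)
{
        const struct ex_route *best = NULL;
        size_t i;

        assert(table != NULL || n == 0);

        for (i = 0; i < n; i++) {
                if (!route_usable(&table[i], eid, now))
                        continue;
                if (!best || table[i].metric < best->metric)
                        best = &table[i];
        }

        return best;
}
```
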
## Utility and configuration interfaces

A small utility will be developed to control the state of the kernel MCTP stack.
This will be similar in design to the 'iproute2' tools, which perform a similar
function for the IPv4 and IPv6 protocols.

The utility will be invoked as `mctp`, and provide subcommands for managing
different aspects of the kernel stack.

### `mctp link`: manage interfaces

```sh
    mctp link set <link> <up|down>
    mctp link set <link> network <network-id>
    mctp link set <link> mtu <mtu>
    mctp link set <link> bus-owner <hwaddr>
```

### `mctp network`: manage networks

```sh
    mctp network create <network-id>
    mctp network set <network-id> forwarding <on|off>
    mctp network set <network-id> default [<true|false>]
    mctp network delete <network-id>
```

### `mctp address`: manage local EID assignments

```sh
    mctp address add <eid> dev <link>
    mctp address del <eid> dev <link>
```

### `mctp route`: manage routing tables

```sh
    mctp route add net <network-id> eid <eid|eid-range> via <link> [hwaddr <addr>] [mtu <mtu>] [metric <metric>]
    mctp route del net <network-id> eid <eid|eid-range> via <link> [hwaddr <addr>] [mtu <mtu>] [metric <metric>]
    mctp route show [net <network-id>]
```

### `mctp stat`: query socket status

```sh
    mctp stat
```

A set of netlink message formats will be defined to support these control
functions.

# Design points & alternatives considered

## Including message-type byte in send/receive buffers

This design specifies that message buffers passed to the kernel in send syscalls
and from the kernel in receive syscalls will have the message type byte as the
first byte of the buffer. This corresponds to the definition of an MCTP message
payload in DSP0236.

This somewhat duplicates the type data provided in `struct sockaddr_mctp`; it's
superficially possible for the kernel to prepend this byte on send, and remove
it on receive.

However, the exact format of the MCTP message payload is not precisely defined
by the specification. Particularly, any message integrity check data (which
would also need to be appended / stripped in conjunction with the type byte) is
defined by the type specification, not DSP0236. The kernel would need knowledge
of all protocols in order to correctly deconstruct the payload data.

Therefore, we transfer the message payload as-is to userspace, without any
modification by the kernel.

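In practice this means an application assembles the whole message itself, type
byte included. A small sketch of such a helper follows; the name `ex_msg_build`
is hypothetical, and any integrity check required by a particular message type
would be appended to the body by the caller before transmission.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build an MCTP message buffer as this design describes it: the
 * message-type byte first, then the type-specific body. Returns the
 * total message length, or 0 if the message does not fit in buf. */
static size_t ex_msg_build(uint8_t *buf, size_t buflen, uint8_t type,
                           const void *body, size_t body_len)
{
        assert(buf != NULL && (body != NULL || body_len == 0));

        if (buflen < 1 || body_len > buflen - 1)
                return 0;

        buf[0] = type;
        if (body_len)
                memcpy(buf + 1, body, body_len);

        return body_len + 1;
}
```

The returned length is what would be passed to `sendto()`, with the same type
value placed in the `smctp_type` field of the destination address.
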
## MCTP message-type specification: using `sockaddr_mctp.smctp_type` rather than protocol

This design specifies message-types to be passed in the `smctp_type` field of
`struct sockaddr_mctp`. An alternative would be to pass it in the `protocol`
argument of the `socket()` system call:

```c
    int socket(int domain /* = AF_MCTP */, int type /* = SOCK_DGRAM */, int protocol);
```

The `smctp_type` implementation was chosen as it better matches the "addressing"
model of the message type; sockets are bound to an incoming message type,
similar to the IP protocol's model of binding UDP sockets to a local port
number.

There is no kernel behaviour that depends on the specific type (particularly
given the design choice above), so the protocol argument is not a good fit
here.

Future additions that perform protocol-specific message handling, and so alter
the send/receive buffer format, may use a new protocol argument.

## Networks referenced by index rather than UUID

This design proposes referencing networks by an integer index. The MCTP standard
does optionally associate an RFC4122 UUID with a network; it would be possible
to use this UUID where we pass a network identifier.

This approach does not incorporate knowledge of network UUIDs in the kernel.
Given that the Get Network ID message in the MCTP Control Protocol is
implemented entirely in userspace, the kernel does not need to be aware of
network UUIDs, and requiring a `uuid_t` for network references (for example, in
the `smctp_network` field of `struct sockaddr_mctp`) would complicate
assignment.

Instead, the integer index is used, in a similar fashion to the integer index
used to reference `struct net_device`s elsewhere in the network stack.

## Tag behaviour alternatives

We considered _several_ different designs for the tag handling behaviour. A
brief overview of the more-feasible of those, and why they were rejected:

### Each socket is allocated a unique tag value on creation

We could allocate a tag for each socket on creation, and use that value when a
tag is required. This, however:

- needlessly consumes a tag on non-tag-owning sockets (i.e., those which send
  with TO=0 - responders); and

- limits us to 8 sockets per network.

### Tags only used for message packetisation / reassembly

An alternative would be to completely dissociate tag allocation from sockets;
and only allocate a tag for the (short-lived) task of packetising a message, and
sending those packets. Tags would be released when the last packet has been
sent.

However, this removes any facility to correlate responses with the correct
socket, which is the purpose of the TO bit in DSP0236. In order for the sending
application to receive the response, we would either need to:

- limit the system to one socket of each message type (which, for example,
  precludes running a requester and a responder of the same type); or

- forward all incoming messages of a specific message-type to all sockets
  listening on that type, making it trivial to eavesdrop on MCTP data of other
  applications

### Allocate a tag for one request/response pair

Another alternative would be to allocate a tag on each outgoing TO=1 message,
and then release that allocation after the incoming response to that tag (TO=0)
is observed.

However, MCTP protocols exist that do not have a 1:1 mapping of responses to
requests - more than one response may be valid for a given request message. For
example, in response to a request, an NVMe-MI implementation may send an
in-progress reply before the final reply. In this case, we would release the tag
after the first response is received, and then have no way to correlate the
second message with the socket.

Broadcast MCTP request messages may have multiple replies from multiple
endpoints, meaning we cannot release the tag allocation on the first reply.
947