.. SPDX-License-Identifier: BSD-3-Clause

=======================
Introduction to Netlink
=======================

Netlink is often described as an ioctl() replacement.
It aims to replace fixed-format C structures as supplied
to ioctl() with a format that makes it easy to add
or extend the arguments.

To achieve this, Netlink uses a minimal fixed-format metadata header
followed by multiple attributes in the TLV (type, length, value) format.

Unfortunately the protocol has evolved over the years, in an organic
and undocumented fashion, making it hard to coherently explain.
To make the most practical sense, this document starts by describing
Netlink as it is used today and dives into more "historical" uses
in later sections.

Opening a socket
================

Netlink communication happens over sockets, so a socket needs to be
opened first:

.. code-block:: c

  fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);

The use of sockets allows for a natural way of exchanging information
in both directions (to and from the kernel).
The operations are still performed synchronously when applications
send() the request, but a separate recv() system call is needed to read
the reply.

A very simplified flow of a Netlink "call" will therefore look
something like:

.. code-block:: c

  fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);

  /* format the request */
  send(fd, &request, sizeof(request), 0);
  n = recv(fd, &response, RSP_BUFFER_SIZE, 0);
  /* interpret the response */

Netlink also provides natural support for "dumping", i.e. communicating
to user space all objects of a certain type (e.g. dumping all network
interfaces).

.. code-block:: c

  fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);

  /* format the dump request */
  send(fd, &request, sizeof(request), 0);
  while (1) {
      n = recv(fd, buffer, RSP_BUFFER_SIZE, 0);
      /* one recv() call can read multiple messages, hence the loop below */
      for (nl_msg = (struct nlmsghdr *)buffer; NLMSG_OK(nl_msg, n);
           nl_msg = NLMSG_NEXT(nl_msg, n)) {
          if (nl_msg->nlmsg_type == NLMSG_DONE)
              goto dump_finished;
          /* process the object */
      }
  }
  dump_finished:

The first two arguments of the socket() call require little explanation:
they open a Netlink socket, with all headers provided by the user
(hence NETLINK, RAW). The last argument is the protocol within Netlink.
This field used to identify the subsystem with which the socket will
communicate.

Classic vs Generic Netlink
--------------------------

The initial implementation of Netlink depended on a static allocation
of IDs to subsystems and provided little supporting infrastructure.
Let us refer to those protocols collectively as **Classic Netlink**.
They are listed at the top of the ``include/uapi/linux/netlink.h``
file and include, among others, general networking (NETLINK_ROUTE),
iSCSI (NETLINK_ISCSI), and audit (NETLINK_AUDIT).

**Generic Netlink** (introduced in 2005) allows for dynamic registration of
subsystems (and subsystem ID allocation) and introspection, and simplifies
implementing the kernel side of the interface.

The following section describes how to use Generic Netlink, as the
subsystems using Generic Netlink outnumber the older protocols by an
order of magnitude. There are also no plans for adding more Classic
Netlink protocols to the kernel. Basic information on how communicating
with the core networking parts of the Linux kernel (or another of the
20 subsystems using Classic Netlink) differs from Generic Netlink is
provided later in this document.

Generic Netlink
===============

In addition to the Netlink fixed metadata header, each Netlink protocol
defines its own fixed metadata header. (Similarly to how network
headers stack, Ethernet > IP > TCP, we have Netlink > Generic N. > Family.)

A Netlink message always starts with struct nlmsghdr, which is followed
by a protocol-specific header. In the case of Generic Netlink the protocol
header is struct genlmsghdr.

The practical meaning of the fields in the case of Generic Netlink is as follows:

.. code-block:: c

  struct nlmsghdr {
      __u32 nlmsg_len;    /* Length of message including headers */
      __u16 nlmsg_type;   /* Generic Netlink Family (subsystem) ID */
      __u16 nlmsg_flags;  /* Flags - request or dump */
      __u32 nlmsg_seq;    /* Sequence number */
      __u32 nlmsg_pid;    /* Port ID, set to 0 */
  };
  struct genlmsghdr {
      __u8  cmd;          /* Command, as defined by the Family */
      __u8  version;      /* Irrelevant, set to 1 */
      __u16 reserved;     /* Reserved, set to 0 */
  };
  /* TLV attributes follow... */

In Classic Netlink :c:member:`nlmsghdr.nlmsg_type` used to identify
which operation within the subsystem the message was referring to
(e.g. get information about a netdev). Generic Netlink needs to mux
multiple subsystems in a single protocol, so it uses this field to
identify the subsystem, and :c:member:`genlmsghdr.cmd` identifies
the operation instead. (See :ref:`res_fam` for
information on how to find the Family ID of the subsystem of interest.)
Note that the first 16 values (0 - 15) of this field are reserved for
control messages both in Classic Netlink and Generic Netlink.
See :ref:`nl_msg_type` for more details.
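
As a rough illustration of the layout above, the following sketch writes
both headers into a caller-provided buffer and returns the offset at which
TLV attributes would start. The function name ``genl_put_headers()`` is
hypothetical; only the structures and the ``NLMSG_LENGTH()``,
``NLMSG_DATA()`` and ``GENL_HDRLEN`` macros come from the uAPI headers
``<linux/netlink.h>`` and ``<linux/genetlink.h>``.

.. code-block:: c

  #include <stddef.h>
  #include <string.h>
  #include <linux/netlink.h>
  #include <linux/genetlink.h>

  /* Hypothetical helper: writes struct nlmsghdr followed by struct
   * genlmsghdr at the start of buf (4-byte aligned, at least
   * NLMSG_LENGTH(GENL_HDRLEN) bytes) and returns the offset at which
   * attributes should be appended. nlmsg_len covers only the headers;
   * the caller must grow it as attributes are added. */
  size_t genl_put_headers(void *buf, __u16 family_id, __u8 cmd,
                          __u16 flags, __u32 seq)
  {
      struct nlmsghdr *nlh = buf;
      struct genlmsghdr *genl;

      memset(buf, 0, NLMSG_LENGTH(GENL_HDRLEN));
      nlh->nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
      nlh->nlmsg_type = family_id;   /* selects the subsystem (Family ID) */
      nlh->nlmsg_flags = flags;
      nlh->nlmsg_seq = seq;
      nlh->nlmsg_pid = 0;            /* 0 when talking to the kernel */

      genl = NLMSG_DATA(nlh);
      genl->cmd = cmd;               /* selects the operation in the family */
      genl->version = 1;
      /* genl->reserved stays 0 thanks to the memset() above */

      return nlh->nlmsg_len;
  }

For example, ``genl_put_headers(buf, GENL_ID_CTRL, CTRL_CMD_GETFAMILY,
NLM_F_REQUEST | NLM_F_ACK, 1)`` returns 20: 16 bytes of struct nlmsghdr
plus 4 bytes of struct genlmsghdr.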

There are 3 usual types of message exchanges on a Netlink socket:

 - performing a single action (``do``);
 - dumping information (``dump``);
 - getting asynchronous notifications (``multicast``).

Classic Netlink is very flexible and presumably allows other types
of exchanges to happen, but in practice those are the three that get
used.

Asynchronous notifications are sent by the kernel and received by
the user sockets which subscribed to them. ``do`` and ``dump`` requests
are initiated by the user. :c:member:`nlmsghdr.nlmsg_flags` should
be set as follows:

 - for ``do``: ``NLM_F_REQUEST | NLM_F_ACK``
 - for ``dump``: ``NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP``

:c:member:`nlmsghdr.nlmsg_seq` should be set to a monotonically
increasing value. The value gets echoed back in responses and doesn't
matter in practice, but setting it to an increasing value for each
message sent is considered good hygiene. The purpose of the field is
matching responses to requests. Asynchronous notifications will have
:c:member:`nlmsghdr.nlmsg_seq` of ``0``.

:c:member:`nlmsghdr.nlmsg_pid` is the Netlink equivalent of an address.
This field can be set to ``0`` when talking to the kernel.
See :ref:`nlmsg_pid` for the (uncommon) uses of the field.
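
The flag and sequence number conventions described above can be captured
in a few small helpers. This is a sketch with hypothetical function names;
skipping ``0`` when the counter wraps around is an assumption made here so
that a request never shares a sequence number with asynchronous
notifications.

.. code-block:: c

  #include <stdbool.h>
  #include <linux/netlink.h>

  /* Hypothetical helper: nlmsg_flags for a user-initiated request. */
  __u16 nl_request_flags(bool dump)
  {
      __u16 flags = NLM_F_REQUEST | NLM_F_ACK;

      if (dump)
          flags |= NLM_F_DUMP;
      return flags;
  }

  /* Hypothetical helper: hands out increasing sequence numbers from a
   * per-socket counter, skipping 0 when the counter wraps around. */
  __u32 nl_next_seq(__u32 *last)
  {
      if (++*last == 0)
          ++*last;
      return *last;
  }

  /* True if nlh answers the request sent with sequence number seq. */
  bool nl_is_reply_to(const struct nlmsghdr *nlh, __u32 seq)
  {
      return nlh->nlmsg_seq == seq;
  }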

The expected use for :c:member:`genlmsghdr.version` was to allow
versioning of the APIs provided by the subsystems. No subsystem has
made significant use of this field to date, so setting it to ``1`` seems
like a safe bet.

.. _nl_msg_type:

Netlink message types
---------------------

As previously mentioned, :c:member:`nlmsghdr.nlmsg_type` carries
protocol-specific values, but the first 16 identifiers are reserved
(the first subsystem-specific message type should be equal to
``NLMSG_MIN_TYPE``, which is ``0x10``).

There are only 4 Netlink control messages defined:

 - ``NLMSG_NOOP`` - ignore the message, not used in practice;
 - ``NLMSG_ERROR`` - carries the return code of an operation;
 - ``NLMSG_DONE`` - marks the end of a dump;
 - ``NLMSG_OVERRUN`` - socket buffer has overflowed, not used to date.

``NLMSG_ERROR`` and ``NLMSG_DONE`` are of practical importance.
They carry return codes for operations. Note that unless
the ``NLM_F_ACK`` flag is set on the request, Netlink will not respond
with ``NLMSG_ERROR`` if there is no error. To avoid having to special-case
this quirk it is recommended to always set ``NLM_F_ACK``.
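
Since the reserved control message types all sit below
``NLMSG_MIN_TYPE``, a receive loop can tell them apart from family
messages with a single comparison. The sketch below (hypothetical helper
names) also encodes when a request sent with ``NLM_F_ACK`` is complete:
a ``do`` ends with ``NLMSG_ERROR`` (carrying ``0`` on success), while a
``dump`` ends with ``NLMSG_DONE``, or with ``NLMSG_ERROR`` if the kernel
rejects it up front.

.. code-block:: c

  #include <stdbool.h>
  #include <linux/netlink.h>

  /* True for the reserved control messages (NLMSG_NOOP, NLMSG_ERROR,
   * NLMSG_DONE, NLMSG_OVERRUN), false for family-specific messages. */
  bool nl_is_control(const struct nlmsghdr *nlh)
  {
      return nlh->nlmsg_type < NLMSG_MIN_TYPE;
  }

  /* Hypothetical helper: true once nlh completes an exchange whose
   * request had NLM_F_ACK set. A "do" ends with NLMSG_ERROR, a "dump"
   * with NLMSG_DONE (or NLMSG_ERROR if the dump was rejected). */
  bool nl_exchange_done(const struct nlmsghdr *nlh, bool dump)
  {
      if (nlh->nlmsg_type == NLMSG_ERROR)
          return true;
      return dump && nlh->nlmsg_type == NLMSG_DONE;
  }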

The format of ``NLMSG_ERROR`` is described by struct nlmsgerr::

  ----------------------------------------------
  | struct nlmsghdr - response header          |
  ----------------------------------------------
  | int error                                  |
  ----------------------------------------------
  | struct nlmsghdr - original request header  |
  ----------------------------------------------
  | ** optionally (1) payload of the request   |
  ----------------------------------------------
  | ** optionally (2) extended ACK             |
  ----------------------------------------------

There are two instances of struct nlmsghdr here, first of the response
and second of the request. ``NLMSG_ERROR`` carries the information about
the request which led to the error. This could be useful when trying
to match requests to responses or re-parse the request to dump it into
logs.

The payload of the request is not echoed in messages reporting success
(``error == 0``) or if the ``NETLINK_CAP_ACK`` socket option was set with
setsockopt(). The latter is common, and perhaps recommended, as having to
read a copy of every request back from the kernel is rather wasteful.
The absence of request payload is indicated by ``NLM_F_CAPPED`` in
:c:member:`nlmsghdr.nlmsg_flags`.
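
A minimal sketch of reading an ``NLMSG_ERROR`` message, assuming the whole
message is already in memory and 4-byte aligned. The helper and struct
names are hypothetical. It returns the error code and computes where the
extended ACK TLVs (if any) begin, which depends on whether the request
payload was echoed, i.e. on ``NLM_F_CAPPED``. Rounding the echoed payload
up with ``NLMSG_ALIGN()`` is an assumption based on Netlink's usual
4-byte alignment.

.. code-block:: c

  #include <stddef.h>
  #include <linux/netlink.h>

  struct nl_err_info {
      int error;        /* 0 on success, negative errno value on failure */
      size_t tlv_off;   /* offset of extended ACK TLVs, 0 if none */
  };

  /* Hypothetical helper: parses an NLMSG_ERROR message. Returns 0 on
   * success, -1 if nlh is not a well-formed NLMSG_ERROR message. */
  int nl_parse_error(const struct nlmsghdr *nlh, struct nl_err_info *info)
  {
      const struct nlmsgerr *err = NLMSG_DATA(nlh);
      size_t off = NLMSG_HDRLEN + sizeof(*err);

      if (nlh->nlmsg_type != NLMSG_ERROR || nlh->nlmsg_len < off)
          return -1;

      info->error = err->error;
      info->tlv_off = 0;
      if (!(nlh->nlmsg_flags & NLM_F_ACK_TLVS))
          return 0;                 /* no extended ACK attributes */

      /* Without NLM_F_CAPPED the request payload follows the copy of
       * the request header, so the TLVs start after it. */
      if (!(nlh->nlmsg_flags & NLM_F_CAPPED)) {
          if (err->msg.nlmsg_len < NLMSG_HDRLEN)
              return -1;
          off += NLMSG_ALIGN(err->msg.nlmsg_len - NLMSG_HDRLEN);
      }
      if (off > nlh->nlmsg_len)
          return -1;
      info->tlv_off = off;
      return 0;
  }

With ``NLM_F_CAPPED`` set, the TLVs start 36 bytes into the message:
16 bytes of the response header plus the 20-byte struct nlmsgerr.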

The second optional element of ``NLMSG_ERROR`` is the set of extended ACK
attributes. See :ref:`ext_ack` for more details. The presence
of extended ACK is indicated by ``NLM_F_ACK_TLVS`` in
:c:member:`nlmsghdr.nlmsg_flags`.

``NLMSG_DONE`` is simpler: the request is never echoed, but the extended
ACK attributes may be present::

  ----------------------------------------------
  | struct nlmsghdr - response header          |
  ----------------------------------------------
  | int error                                  |
  ----------------------------------------------
  | ** optionally extended ACK                 |
  ----------------------------------------------

.. _res_fam:

Resolving the Family ID
-----------------------

This section explains how to find the Family ID of a subsystem.
It also serves as an example of Generic Netlink communication.

Generic Netlink is itself a subsystem exposed via the Generic Netlink API.
To avoid a circular dependency, Generic Netlink has a statically allocated
Family ID (``GENL_ID_CTRL``, which is equal to ``NLMSG_MIN_TYPE``).
The Generic Netlink family implements a command used to find out information
about other families (``CTRL_CMD_GETFAMILY``).

To get information about the Generic Netlink family named, for example,
``"test1"``, we need to send a message on the previously opened Generic
Netlink socket. The message should target the Generic Netlink Family (1)
and be a ``do`` (2) call to ``CTRL_CMD_GETFAMILY`` (3). A ``dump`` version
of this call would make the kernel respond with information about *all*
the families it knows about. Last but not least, the name of the family
in question has to be specified (4) as an attribute with the appropriate
type::

  struct nlmsghdr:
    __u32 nlmsg_len:    32
    __u16 nlmsg_type:   GENL_ID_CTRL               // (1)
    __u16 nlmsg_flags:  NLM_F_REQUEST | NLM_F_ACK  // (2)
    __u32 nlmsg_seq:    1
    __u32 nlmsg_pid:    0

  struct genlmsghdr:
    __u8 cmd:           CTRL_CMD_GETFAMILY         // (3)
    __u8 version:       2 /* or 1, doesn't matter */
    __u16 reserved:     0

  struct nlattr:                                   // (4)
    __u16 nla_len:      10
    __u16 nla_type:     CTRL_ATTR_FAMILY_NAME
    char data:          test1\0

  (padding:)
    char data:          \0\0

The length fields in Netlink (:c:member:`nlmsghdr.nlmsg_len`
and :c:member:`nlattr.nla_len`) always *include* the header.
Attribute headers in Netlink must be aligned to 4 bytes from the start
of the message, hence the extra ``\0\0`` after ``CTRL_ATTR_FAMILY_NAME``.
The attribute lengths *exclude* the padding.

If the family is found, the kernel will reply with two messages. The first
is the response with all the information about the family::

  /* Message #1 - reply */
  struct nlmsghdr:
    __u32 nlmsg_len:    136
    __u16 nlmsg_type:   GENL_ID_CTRL
    __u16 nlmsg_flags:  0
    __u32 nlmsg_seq:    1    /* echoed from our request */
    __u32 nlmsg_pid:    5831 /* The PID of our user space process */

  struct genlmsghdr:
    __u8 cmd:           CTRL_CMD_GETFAMILY
    __u8 version:       2
    __u16 reserved:     0

  struct nlattr:
    __u16 nla_len:      10
    __u16 nla_type:     CTRL_ATTR_FAMILY_NAME
    char data:          test1\0

  (padding:)
    char data:          \0\0

  struct nlattr:
    __u16 nla_len:      6
    __u16 nla_type:     CTRL_ATTR_FAMILY_ID
    __u16:              123  /* The Family ID we are after */

  (padding:)
    char data:          \0\0

  struct nlattr:
    __u16 nla_len:      8
    __u16 nla_type:     CTRL_ATTR_VERSION
    __u32:              1

  /* ... etc, more attributes will follow. */

And the error code (success) since ``NLM_F_ACK`` had been set on the request::

  /* Message #2 - the ACK */
  struct nlmsghdr:
    __u32 nlmsg_len:    36
    __u16 nlmsg_type:   NLMSG_ERROR
    __u16 nlmsg_flags:  NLM_F_CAPPED /* There won't be a payload */
    __u32 nlmsg_seq:    1    /* echoed from our request */
    __u32 nlmsg_pid:    5831 /* The PID of our user space process */

  int error:            0

  struct nlmsghdr: /* Copy of the request header as we sent it */
    __u32 nlmsg_len:    32
    __u16 nlmsg_type:   GENL_ID_CTRL
    __u16 nlmsg_flags:  NLM_F_REQUEST | NLM_F_ACK
    __u32 nlmsg_seq:    1
    __u32 nlmsg_pid:    0

The order of attributes (struct nlattr) is not guaranteed, so the user
has to walk the attributes and parse them.

Note that Generic Netlink sockets are not associated with, or bound to,
a single family. A socket can be used to exchange messages with many
different families, selecting the recipient family on a message-by-message
basis using the :c:member:`nlmsghdr.nlmsg_type` field.
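
The request above can also be produced programmatically. The sketch below
(hypothetical helper names, assuming a 4-byte aligned buffer) builds the
``CTRL_CMD_GETFAMILY`` request and, for the reply, walks the attributes in
whatever order they arrive to pick out ``CTRL_ATTR_FAMILY_ID``. It relies
only on the uAPI macros ``NLMSG_LENGTH()``, ``NLMSG_DATA()``,
``NLA_ALIGN()``, ``NLA_HDRLEN``, ``NLA_TYPE_MASK`` and ``GENL_HDRLEN``;
sending and receiving work as in the socket examples at the start of this
document.

.. code-block:: c

  #include <stddef.h>
  #include <string.h>
  #include <linux/netlink.h>
  #include <linux/genetlink.h>

  /* Hypothetical helper: appends one attribute at offset off, zero-padding
   * it to 4 bytes. Returns the new offset, or 0 if buf is too small. */
  size_t nl_put_attr(char *buf, size_t size, size_t off, __u16 type,
                     const void *data, size_t len)
  {
      struct nlattr *nla = (struct nlattr *)(buf + off);
      size_t total = NLA_ALIGN(NLA_HDRLEN + len);

      if (off + total > size)
          return 0;
      nla->nla_len = NLA_HDRLEN + len;   /* includes header, not padding */
      nla->nla_type = type;
      memcpy(buf + off + NLA_HDRLEN, data, len);
      memset(buf + off + NLA_HDRLEN + len, 0, total - NLA_HDRLEN - len);
      return off + total;
  }

  /* Hypothetical helper: builds a CTRL_CMD_GETFAMILY "do" request for the
   * family called name. Returns the message length, or 0 on error. */
  size_t build_getfamily(char *buf, size_t size, const char *name, __u32 seq)
  {
      struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
      struct genlmsghdr *genl;
      size_t off = NLMSG_LENGTH(GENL_HDRLEN);

      if (off > size)
          return 0;
      memset(buf, 0, off);
      nlh->nlmsg_type = GENL_ID_CTRL;                 /* (1) */
      nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;   /* (2) */
      nlh->nlmsg_seq = seq;
      genl = NLMSG_DATA(nlh);
      genl->cmd = CTRL_CMD_GETFAMILY;                 /* (3) */
      genl->version = 1;

      /* (4) the family name, including its terminating NUL */
      off = nl_put_attr(buf, size, off, CTRL_ATTR_FAMILY_NAME,
                        name, strlen(name) + 1);
      nlh->nlmsg_len = off;
      return off;
  }

  /* Hypothetical helper: returns CTRL_ATTR_FAMILY_ID from a reply, or 0 if
   * absent. Attribute order is not guaranteed, so all are examined. */
  __u16 find_family_id(const struct nlmsghdr *nlh)
  {
      size_t off = NLMSG_LENGTH(GENL_HDRLEN);

      while (off + NLA_HDRLEN <= nlh->nlmsg_len) {
          const struct nlattr *nla =
              (const struct nlattr *)((const char *)nlh + off);
          __u16 id;

          if (nla->nla_len < NLA_HDRLEN || off + nla->nla_len > nlh->nlmsg_len)
              break;                                  /* malformed */
          if ((nla->nla_type & NLA_TYPE_MASK) == CTRL_ATTR_FAMILY_ID &&
              nla->nla_len >= NLA_HDRLEN + sizeof(id)) {
              memcpy(&id, (const char *)nla + NLA_HDRLEN, sizeof(id));
              return id;
          }
          off += NLA_ALIGN(nla->nla_len);
      }
      return 0;
  }

For the ``"test1"`` family, ``build_getfamily()`` produces the 32-byte
request shown earlier: 16 bytes of struct nlmsghdr, 4 bytes of struct
genlmsghdr and a 10-byte attribute padded to 12.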

.. _ext_ack:

Extended ACK
------------

Extended ACK controls reporting of additional error/warning TLVs
in ``NLMSG_ERROR`` and ``NLMSG_DONE`` messages. To maintain backward
compatibility this feature has to be explicitly enabled by setting
the ``NETLINK_EXT_ACK`` socket option to ``1`` with setsockopt().

Types of extended ACK attributes are defined in enum nlmsgerr_attrs.
The most commonly used attributes are ``NLMSGERR_ATTR_MSG``,
``NLMSGERR_ATTR_OFFS`` and ``NLMSGERR_ATTR_MISS_*``.

``NLMSGERR_ATTR_MSG`` carries a message in English describing
the encountered problem. These messages are far more detailed
than what can be expressed through standard UNIX error codes.

``NLMSGERR_ATTR_OFFS`` points to the attribute which caused the problem.

``NLMSGERR_ATTR_MISS_TYPE`` and ``NLMSGERR_ATTR_MISS_NEST``
report a missing attribute.

Extended ACKs can be reported on errors as well as in case of success.
The latter should be treated as a warning.

Extended ACKs greatly improve the usability of Netlink and should
always be enabled, appropriately parsed and reported to the user.
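
A sketch of the two pieces involved, with hypothetical function names:
opting a socket into extended ACKs (together with ``NETLINK_CAP_ACK``,
discussed earlier), and pulling the ``NLMSGERR_ATTR_MSG`` string out of the
TLVs of an ``NLMSG_ERROR`` or ``NLMSG_DONE`` message. The caller is assumed
to already know the offset at which the TLVs start, and the string is only
returned if it is NUL-terminated within its attribute.

.. code-block:: c

  #include <stddef.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <linux/netlink.h>

  /* Hypothetical helper: enables extended ACKs and capped ACKs on fd.
   * Returns 0 on success, -1 (with errno set) on failure. */
  int nl_enable_ext_ack(int fd)
  {
      int one = 1;

      if (setsockopt(fd, SOL_NETLINK, NETLINK_EXT_ACK, &one, sizeof(one)) < 0)
          return -1;
      return setsockopt(fd, SOL_NETLINK, NETLINK_CAP_ACK, &one, sizeof(one));
  }

  /* Hypothetical helper: scans the extended ACK TLVs starting tlv_off
   * bytes into nlh and returns the NLMSGERR_ATTR_MSG string, or NULL. */
  const char *nl_ext_ack_msg(const struct nlmsghdr *nlh, size_t tlv_off)
  {
      size_t off = tlv_off;

      while (off + NLA_HDRLEN <= nlh->nlmsg_len) {
          const struct nlattr *nla =
              (const struct nlattr *)((const char *)nlh + off);
          const char *str = (const char *)nla + NLA_HDRLEN;
          size_t len = nla->nla_len;

          if (len < NLA_HDRLEN || off + len > nlh->nlmsg_len)
              return NULL;                  /* malformed attribute */
          if (nla->nla_type == NLMSGERR_ATTR_MSG &&
              memchr(str, '\0', len - NLA_HDRLEN) != NULL)
              return str;
          off += NLA_ALIGN(len);
      }
      return NULL;
  }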

Advanced topics
===============

Dump consistency
----------------

Some of the data structures the kernel uses for storing objects make
it hard to provide an atomic snapshot of all the objects in a dump
(without impacting the fast-paths updating them).

The kernel may set the ``NLM_F_DUMP_INTR`` flag on any message in a dump
(including the ``NLMSG_DONE`` message) if the dump was interrupted and
may be inconsistent (e.g. missing objects). User space should retry
the dump if it sees the flag set.

Introspection
-------------

The basic introspection abilities are enabled by access to the Family
object as reported in :ref:`res_fam`. The user can query information about
the Generic Netlink family, including which operations are supported
by the kernel and what attributes the kernel understands.
Family information includes the highest ID of an attribute the kernel can
parse; a separate command (``CTRL_CMD_GETPOLICY``) provides detailed
information about supported attributes, including ranges of values the
kernel accepts.

Querying family information is useful in cases when user space needs
to make sure that the kernel has support for a feature before issuing
a request.
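
For example, a rough check of whether the kernel knows about a given
attribute could compare its type against ``CTRL_ATTR_MAXATTR`` from the
``CTRL_CMD_GETFAMILY`` reply. The sketch below (hypothetical helper name,
reply assumed to be fully received and 4-byte aligned) does just that. It
does not tell whether the attribute is accepted by a particular command;
``CTRL_CMD_GETPOLICY`` is the tool for that.

.. code-block:: c

  #include <stddef.h>
  #include <string.h>
  #include <linux/netlink.h>
  #include <linux/genetlink.h>

  /* Hypothetical helper: returns 1 if attr_type is within the
   * CTRL_ATTR_MAXATTR reported in the CTRL_CMD_GETFAMILY reply nlh,
   * 0 if it is above it, and -1 if the reply lacks the attribute or
   * is malformed. */
  int family_knows_attr(const struct nlmsghdr *nlh, __u32 attr_type)
  {
      size_t off = NLMSG_LENGTH(GENL_HDRLEN);

      while (off + NLA_HDRLEN <= nlh->nlmsg_len) {
          const struct nlattr *nla =
              (const struct nlattr *)((const char *)nlh + off);
          __u32 maxattr;

          if (nla->nla_len < NLA_HDRLEN || off + nla->nla_len > nlh->nlmsg_len)
              return -1;
          if (nla->nla_type == CTRL_ATTR_MAXATTR &&
              nla->nla_len >= NLA_HDRLEN + sizeof(maxattr)) {
              memcpy(&maxattr, (const char *)nla + NLA_HDRLEN, sizeof(maxattr));
              return attr_type <= maxattr;
          }
          off += NLA_ALIGN(nla->nla_len);
      }
      return -1;
  }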

.. _nlmsg_pid:

nlmsg_pid
---------

:c:member:`nlmsghdr.nlmsg_pid` is the Netlink equivalent of an address.
It is referred to as Port ID, or sometimes Process ID, because for
historical reasons, if the application does not select (bind() to) an
explicit Port ID, the kernel will automatically assign it an ID equal to
its Process ID (as reported by the getpid() system call).

Similarly to the bind() semantics of the TCP/IP network protocols, the value
of zero means "assign automatically", hence it is common for applications
to leave the :c:member:`nlmsghdr.nlmsg_pid` field initialized to ``0``.

The field is still used today in rare cases when the kernel needs to send
a unicast notification. A user space application can use bind() to
associate its socket with a specific PID and then communicate that PID to
the kernel. This way the kernel can reach the specific user space process.

This sort of communication is utilized in UMH (User Mode Helper)-like
scenarios when the kernel needs to trigger user space processing or ask
user space for a policy decision.

Multicast notifications
-----------------------

One of the strengths of Netlink is the ability to send event notifications
to user space.
This is a unidirectional form of communication (kernel -> user) and
does not involve any control messages like ``NLMSG_ERROR`` or
``NLMSG_DONE``.

For example, the Generic Netlink family itself defines a set of multicast
notifications about registered families. When a new family is added, the
sockets subscribed to the notifications will get the following message::

  struct nlmsghdr:
    __u32 nlmsg_len:    136
    __u16 nlmsg_type:   GENL_ID_CTRL
    __u16 nlmsg_flags:  0
    __u32 nlmsg_seq:    0
    __u32 nlmsg_pid:    0

  struct genlmsghdr:
    __u8 cmd:           CTRL_CMD_NEWFAMILY
    __u8 version:       2
    __u16 reserved:     0

  struct nlattr:
    __u16 nla_len:      10
    __u16 nla_type:     CTRL_ATTR_FAMILY_NAME
    char data:          test1\0

  (padding:)
    char data:          \0\0

  struct nlattr:
    __u16 nla_len:      6
    __u16 nla_type:     CTRL_ATTR_FAMILY_ID
    __u16:              123  /* The Family ID of the new family */

  (padding:)
    char data:          \0\0

  struct nlattr:
    __u16 nla_len:      8
    __u16 nla_type:     CTRL_ATTR_VERSION
    __u32:              1

  /* ... etc, more attributes will follow. */

The notification contains the same information as the response
to the ``CTRL_CMD_GETFAMILY`` request.

The Netlink headers of the notification are mostly 0 and irrelevant.
The :c:member:`nlmsghdr.nlmsg_seq` may be either zero or a monotonically
increasing notification sequence number maintained by the family.

To receive notifications the user socket must subscribe to the relevant
notification group. Much like the Family ID, the Group ID for a given
multicast group is dynamic and can be found inside the Family information.
The ``CTRL_ATTR_MCAST_GROUPS`` attribute contains nests with names
(``CTRL_ATTR_MCAST_GRP_NAME``) and IDs (``CTRL_ATTR_MCAST_GRP_ID``) of
the family's groups.

Once the Group ID is known, a setsockopt() call adds the socket to the group:

.. code-block:: c

  unsigned int group_id;

  /* .. find the group ID... */

  setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
             &group_id, sizeof(group_id));

The socket will now receive notifications.

It is recommended to use separate sockets for receiving notifications
and sending requests to the kernel.
The asynchronous nature of notifications means that they may get mixed
in with the responses, making the message handling much harder.

Buffer sizing
-------------

Netlink sockets are datagram sockets rather than stream sockets,
meaning that each message must be received in its entirety by a single
recv()/recvmsg() system call. If the buffer provided by the user is too
short, the message will be truncated and the ``MSG_TRUNC`` flag set
in struct msghdr (struct msghdr is the second argument
of the recvmsg() system call, *not* a Netlink header).

Upon truncation the remaining part of the message is discarded.

Netlink expects that the user buffer will be at least 8kB or the page
size of the CPU architecture, whichever is bigger. Particular Netlink
families may, however, require a larger buffer. A 32kB buffer is
recommended for the most efficient handling of dumps (a larger buffer fits
more dumped objects, so fewer recvmsg() calls are needed).

.. _classic_netlink:

Classic Netlink
===============

The main differences between Classic and Generic Netlink are the dynamic
allocation of subsystem identifiers and the availability of introspection.
In theory the protocol does not differ significantly; in practice, however,
Classic Netlink experimented with concepts which were abandoned in Generic
Netlink (really, they usually only found use in a small corner of a single
subsystem). This section is meant as an explainer of a few such concepts,
with the explicit goal of giving Generic Netlink
users the confidence to ignore them when reading the uAPI headers.

Most of the concepts and examples here refer to the ``NETLINK_ROUTE`` family,
which covers much of the configuration of the Linux networking stack.
Real documentation of that family deserves a chapter (or a book) of its own.

Families
--------

Netlink refers to subsystems as families. This is a remnant of using
sockets and the concept of protocol families, which are part of message
demultiplexing in ``NETLINK_ROUTE``.

Sadly, every layer of encapsulation likes to refer to whatever it's carrying
as "families", making the term very confusing:

 1. AF_NETLINK is a bona fide socket protocol family
 2. AF_NETLINK's documentation refers to what comes after its own
    header (struct nlmsghdr) in a message as a "Family Header"
 3. Generic Netlink is a family for AF_NETLINK (struct genlmsghdr follows
    struct nlmsghdr), yet it also calls its users "Families".
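
To make the layering concrete, here is a minimal sketch that assembles a
``CTRL_CMD_GETFAMILY`` request in a caller-provided buffer. struct nlmsghdr
is AF_NETLINK's own header, struct genlmsghdr is what AF_NETLINK would call
the "Family Header", and the attribute that follows belongs to the Generic
Netlink family itself (here the controller, ``GENL_ID_CTRL``). Only uAPI
macros from ``<linux/netlink.h>`` and ``<linux/genetlink.h>`` are used; the
helper name build_getfamily() is hypothetical, and the ``version`` value of 1
is an assumption of this sketch.

.. code-block:: c

  #include <linux/genetlink.h>
  #include <linux/netlink.h>
  #include <stdint.h>
  #include <string.h>

  /* Hypothetical helper: build a CTRL_CMD_GETFAMILY request for the family
   * called `name` into `buf`. Returns the message length, or 0 if `buf`
   * is too small. */
  static size_t build_getfamily(void *buf, size_t buflen, const char *name,
                                uint32_t seq)
  {
      size_t name_len = strlen(name) + 1;          /* include the NUL */
      size_t attr_len = NLA_HDRLEN + name_len;     /* unpadded attribute */
      size_t msg_len = NLMSG_LENGTH(GENL_HDRLEN) + NLA_ALIGN(attr_len);

      if (msg_len > buflen)
          return 0;
      memset(buf, 0, msg_len);                     /* zero all padding */

      /* 1. AF_NETLINK's own header */
      struct nlmsghdr *nlh = buf;
      nlh->nlmsg_len = msg_len;
      nlh->nlmsg_type = GENL_ID_CTRL;              /* controller's fixed ID */
      nlh->nlmsg_flags = NLM_F_REQUEST;
      nlh->nlmsg_seq = seq;

      /* 2. The "Family Header" from AF_NETLINK's point of view */
      struct genlmsghdr *genl = NLMSG_DATA(nlh);
      genl->cmd = CTRL_CMD_GETFAMILY;
      genl->version = 1;                           /* value assumed by this sketch */

      /* 3. An attribute of the Generic Netlink family itself */
      struct nlattr *nla = (struct nlattr *)((char *)genl + GENL_HDRLEN);
      nla->nla_type = CTRL_ATTR_FAMILY_NAME;
      nla->nla_len = attr_len;
      memcpy((char *)nla + NLA_HDRLEN, name, name_len);

      return msg_len;
  }

For the name "nlctrl" this produces a 32 byte message: 16 bytes of
struct nlmsghdr, 4 bytes of struct genlmsghdr and an 11 byte attribute padded
to 12. The result would then be passed to send() on a ``NETLINK_GENERIC``
socket.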

Note that the Generic Netlink Family IDs are in a different "ID space"
and overlap with Classic Netlink protocol numbers (e.g. ``NETLINK_CRYPTO``
has the Classic Netlink protocol ID of 21, which Generic Netlink will
happily allocate to one of its families as well).

Strict checking
---------------

The ``NETLINK_GET_STRICT_CHK`` socket option enables strict input checking
in ``NETLINK_ROUTE``. It was needed because historically the kernel did not
validate the fields of structures it didn't process. This made it impossible
to start using those fields later without risking regressions in applications
which initialized them incorrectly or not at all.

``NETLINK_GET_STRICT_CHK`` declares that the application is initializing
all fields correctly. It also opts into validating that the message does not
contain trailing data and requests that the kernel reject attributes with
a type higher than the largest attribute type known to the kernel.

``NETLINK_GET_STRICT_CHK`` is not used outside of ``NETLINK_ROUTE``.

Unknown attributes
------------------

Historically Netlink ignored all unknown attributes. The thinking was that
it would free the application from having to probe what the kernel supports.
591510156a7SJakub KicinskiThe application could make a request to change the state and check which 592510156a7SJakub Kicinskiparts of the request "stuck". 593510156a7SJakub Kicinski 594510156a7SJakub KicinskiThis is no longer the case for new Generic Netlink families and those opting 595510156a7SJakub Kicinskiin to strict checking. See enum netlink_validation for validation types 596510156a7SJakub Kicinskiperformed. 597510156a7SJakub Kicinski 598510156a7SJakub KicinskiFixed metadata and structures 599510156a7SJakub Kicinski----------------------------- 600510156a7SJakub Kicinski 601510156a7SJakub KicinskiClassic Netlink made liberal use of fixed-format structures within 602510156a7SJakub Kicinskithe messages. Messages would commonly have a structure with 603510156a7SJakub Kicinskia considerable number of fields after struct nlmsghdr. It was also 604510156a7SJakub Kicinskicommon to put structures with multiple members inside attributes, 605510156a7SJakub Kicinskiwithout breaking each member into an attribute of its own. 606510156a7SJakub Kicinski 607510156a7SJakub KicinskiThis has caused problems with validation and extensibility and 608510156a7SJakub Kicinskitherefore using binary structures is actively discouraged for new 609510156a7SJakub Kicinskiattributes. 610510156a7SJakub Kicinski 611510156a7SJakub KicinskiRequest types 612510156a7SJakub Kicinski------------- 613510156a7SJakub Kicinski 614510156a7SJakub Kicinski``NETLINK_ROUTE`` categorized requests into 4 types ``NEW``, ``DEL``, ``GET``, 615510156a7SJakub Kicinskiand ``SET``. Each object can handle all or some of those requests 616510156a7SJakub Kicinski(objects being netdevs, routes, addresses, qdiscs etc.) Request type 617510156a7SJakub Kicinskiis defined by the 2 lowest bits of the message type, so commands for 618510156a7SJakub Kicinskinew objects would always be allocated with a stride of 4. 

Each object would also have its own fixed metadata shared by all request
types (e.g. struct ifinfomsg for netdev requests, struct ifaddrmsg for address
requests, struct tcmsg for qdisc requests).

Even though other protocols and Generic Netlink commands often use
the same verbs in their message names (``GET``, ``SET``), the concept
of request types did not find wider adoption.

Notification echo
-----------------

``NLM_F_ECHO`` requests that notifications resulting from the request
be queued onto the requesting socket. This is useful to discover
the impact of the request.

Note that this feature is not universally implemented.

Other request-type-specific flags
---------------------------------

Classic Netlink defined various flags for its ``GET``, ``NEW``
and ``DEL`` requests in the upper byte of nlmsg_flags in struct nlmsghdr.
Since request types have not been generalized, the request-type-specific
flags are rarely used (and considered deprecated for new families).

For ``GET`` - ``NLM_F_ROOT`` and ``NLM_F_MATCH`` are combined into
``NLM_F_DUMP``, and not used separately. ``NLM_F_ATOMIC`` is never used.
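
The sketch below assembles an ``RTM_GETLINK`` dump request, combining several
points from this section: the message is struct nlmsghdr followed by the
netdev object's fixed metadata (struct ifinfomsg), and ``NLM_F_DUMP`` sits in
the upper byte of nlmsg_flags as the ``NLM_F_ROOT | NLM_F_MATCH`` pair. The
struct and function names are hypothetical, and setting ``ifi_family`` to
``AF_UNSPEC`` to ask for links of any family is an assumption of this sketch.

.. code-block:: c

  #include <linux/netlink.h>
  #include <linux/rtnetlink.h>
  #include <stdint.h>
  #include <string.h>
  #include <sys/socket.h>    /* AF_UNSPEC */

  /* Hypothetical request layout: the Netlink header followed by the
   * fixed metadata shared by all netdev (link) requests. */
  struct getlink_req {
      struct nlmsghdr nlh;
      struct ifinfomsg ifm;
  };

  static void build_link_dump(struct getlink_req *req, uint32_t seq)
  {
      /* zero every field, so nothing is left uninitialized */
      memset(req, 0, sizeof(*req));
      req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifm));
      req->nlh.nlmsg_type = RTM_GETLINK;
      /* NLM_F_REQUEST is a generic flag in the lower byte; NLM_F_DUMP
       * (NLM_F_ROOT | NLM_F_MATCH) is GET-specific, in the upper byte. */
      req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
      req->nlh.nlmsg_seq = seq;
      req->ifm.ifi_family = AF_UNSPEC;    /* assumption: links of any family */
  }

Zeroing the whole request up front also matters under strict checking, where
the kernel may validate fixed-metadata fields it would otherwise ignore.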

For ``DEL`` - ``NLM_F_NONREC`` is only used by nftables and ``NLM_F_BULK``
only by some FDB operations.

The flags for ``NEW`` are used most commonly in classic Netlink. Unfortunately,
the meaning is not crystal clear. The following description is based on the
best guess of the intention of the authors, and in practice all families
stray from it in one way or another. ``NLM_F_REPLACE`` asks to replace
an existing object; if no matching object exists, the operation should fail.
``NLM_F_EXCL`` has the opposite semantics and only succeeds if the object did
not already exist.
``NLM_F_CREATE`` asks for the object to be created if it does not
exist; it can be combined with ``NLM_F_REPLACE`` and ``NLM_F_EXCL``.

A comment in the main Netlink uAPI header states::

  4.4BSD ADD      NLM_F_CREATE|NLM_F_EXCL
  4.4BSD CHANGE   NLM_F_REPLACE

  True CHANGE     NLM_F_CREATE|NLM_F_REPLACE
  Append          NLM_F_CREATE
  Check           NLM_F_EXCL

which seems to indicate that those flags predate request types.
``NLM_F_REPLACE`` without ``NLM_F_CREATE`` was initially used instead
of ``SET`` commands.
``NLM_F_EXCL`` without ``NLM_F_CREATE`` was used to check if an object exists
without creating it, presumably predating ``GET`` commands.
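
The intended ``NEW`` semantics can be read as a small decision table. The
sketch below is one such reading, following the uAPI header's own comments on
the flags (``NLM_F_REPLACE``: "Override existing", ``NLM_F_EXCL``: "Do not
touch, if it exists", ``NLM_F_CREATE``: "Create, if it does not exist"); real
families stray from it. The enum and function names are hypothetical, and
refusing a request for an existing object that carries neither
``NLM_F_EXCL`` nor ``NLM_F_REPLACE`` is an assumption of this sketch (a
family supporting multiple objects per key might treat ``NLM_F_CREATE`` alone
as "Append" instead).

.. code-block:: c

  #include <linux/netlink.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical outcomes of a NEW request. */
  enum new_outcome {
      NEW_CREATE,       /* object created */
      NEW_REPLACE,      /* existing object replaced */
      NEW_ERR_EXISTS,   /* refused: object already exists (think -EEXIST) */
      NEW_ERR_NOENT,    /* refused: no such object (think -ENOENT) */
  };

  static enum new_outcome classify_new(bool exists, uint16_t flags)
  {
      if (exists) {
          if (flags & NLM_F_EXCL)         /* "4.4BSD ADD" / "Check" */
              return NEW_ERR_EXISTS;
          if (flags & NLM_F_REPLACE)      /* "4.4BSD CHANGE" / "True CHANGE" */
              return NEW_REPLACE;
          return NEW_ERR_EXISTS;          /* assumption of this sketch */
      }
      if (flags & NLM_F_CREATE)           /* ADD, True CHANGE, Append */
          return NEW_CREATE;
      return NEW_ERR_NOENT;               /* REPLACE or EXCL without CREATE */
  }

With this reading, ``NLM_F_EXCL`` alone acts as an existence probe: it fails
one way for an existing object and another way for a missing one, without
modifying anything.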

``NLM_F_APPEND`` indicates that if one key can have multiple objects associated
with it (e.g. multiple next-hop objects for a route), the new object should be
added to the list rather than replacing the entire list.

uAPI reference
==============

.. kernel-doc:: include/uapi/linux/netlink.h