.. SPDX-License-Identifier: GPL-2.0

=====================================================
Netdev features mess and how to get out from it alive
=====================================================

Author:
	Michał Mirosław <mirq-linux@rere.qmqm.pl>



Part I: Feature sets
====================

Long gone are the days when a network card would just take and give packets
verbatim. Today's devices add multiple features and bugs (read: offloads)
that relieve an OS of various tasks such as generating and checking
checksums, splitting packets, and classifying them. Those capabilities and
their state are commonly referred to as netdev features in the Linux kernel
world.

There are currently three sets of features relevant to the driver, and
one used internally by the network core:

 1. netdev->hw_features set contains features whose state may possibly
    be changed (enabled or disabled) for a particular device at the user's
    request. This set should be initialized in the ndo_init callback and
    not changed later.

 2. netdev->features set contains features which are currently enabled
    for a device. This should be changed only by the network core or in
    error paths of the ndo_set_features callback.

 3. netdev->vlan_features set contains features whose state is inherited
    by child VLAN devices (it limits their netdev->features set). This is
    currently used for all VLAN devices, whether tags are stripped or
    inserted in hardware or software.

 4. netdev->wanted_features set contains the feature set requested by the
    user. This set is filtered by the ndo_fix_features callback whenever it
    or some device-specific conditions change. This set is internal to the
    networking core and should not be referenced in drivers.



Part II: Controlling enabled features
=====================================

When the current feature set (netdev->features) is to be changed, a new set
is calculated and filtered by calling the ndo_fix_features callback
and netdev_fix_features(). If the resulting set differs from the current
set, it is passed to the ndo_set_features callback and (if the callback
returns success) replaces the value stored in netdev->features.
A NETDEV_FEAT_CHANGE notification is issued after that whenever the current
set might have changed.
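
The recalculation described above can be modelled in a few lines of
userspace C. This is a sketch under stated assumptions, not kernel code:
struct model_dev, the FEAT_* bit values, core_fix_features() and
model_update_features() are hypothetical stand-ins for struct net_device,
the NETIF_F_* bits, netdev_fix_features() and the core's update path, and
the way the candidate set is built from netdev->features,
netdev->hw_features and netdev->wanted_features is a simplification.
Locking is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for NETIF_F_* bits (illustrative values only). */
typedef uint64_t features_t;
#define FEAT_SG     ((features_t)1 << 0)
#define FEAT_TSO    ((features_t)1 << 1)
#define FEAT_RXCSUM ((features_t)1 << 2)

/* Simplified, hypothetical model of the feature fields of a net_device. */
struct model_dev {
	features_t hw_features;     /* bits the user may change */
	features_t features;        /* bits currently enabled */
	features_t wanted_features; /* bits the user asked for */
	/* Driver callbacks; NULL models a missing callback. */
	features_t (*fix_features)(struct model_dev *dev, features_t f);
	int (*set_features)(struct model_dev *dev, features_t f);
	unsigned int notifications; /* counts NETDEV_FEAT_CHANGE-like events */
};

/* Stand-in for the core's own limits; the real netdev_fix_features()
 * applies many more rules. Here: no TSO without SG. */
static features_t core_fix_features(features_t f)
{
	if ((f & FEAT_TSO) && !(f & FEAT_SG))
		f &= ~FEAT_TSO;
	return f;
}

/* One recalculation pass. Returns true if a change notification was
 * issued. */
static bool model_update_features(struct model_dev *dev)
{
	/* Simplification: non-changeable bits keep their current state,
	 * changeable bits follow the user's request. */
	features_t f = (dev->features & ~dev->hw_features) |
		       (dev->wanted_features & dev->hw_features);
	int err = 0;

	if (dev->fix_features)		/* driver filtering first... */
		f = dev->fix_features(dev, f);
	f = core_fix_features(f);	/* ...then the core's limits */

	if (f == dev->features)
		return false;		/* same set: nothing to do */

	if (dev->set_features)		/* missing callback == success */
		err = dev->set_features(dev, f);
	if (err == 0)
		dev->features = f;	/* commit only on success */

	/* On error the driver may have adjusted dev->features itself, so
	 * the set "might have changed" either way: notify. */
	dev->notifications++;
	return true;
}
```

For example, if hw_features is SG | TSO and the user asks for TSO alone,
this model's core filter drops TSO as well, so the device ends up with
neither; calling model_update_features() again then finds nothing to change
and issues no notification.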

The following events trigger recalculation:

 1. device registration, after ndo_init has returned success
 2. user-requested changes in feature state
 3. netdev_update_features() is called

ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks
are treated as always returning success.

A driver that wants to trigger recalculation must do so by calling
netdev_update_features() while holding rtnl_lock. This should not be done
from ndo_*_features callbacks. netdev->features should not be modified by
the driver except by means of the ndo_fix_features callback.



Part III: Implementation hints
==============================

 * ndo_fix_features:

All dependencies between features should be resolved here. The resulting
set can be reduced further by limitations imposed by the networking core
(as coded in netdev_fix_features()). For this reason, it is safer to disable
a feature when its dependencies are not met than to force the dependency on.

This callback should not modify hardware or driver state (it should be
stateless).
It can be called multiple times between successive ndo_set_features calls.

The callback must not alter features contained in the NETIF_F_SOFT_FEATURES
or NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED, but
care must be taken, as the change won't affect already configured VLANs.

 * ndo_set_features:

Hardware should be reconfigured to match the passed feature set. The set
should not be altered unless some error condition happens that can't
be reliably detected in ndo_fix_features. In this case, the callback
should update netdev->features to match the resulting hardware state.
Errors returned are not (and cannot be) propagated anywhere except dmesg.
(Note: a successful return is zero; a positive value means a silent error.)



Part IV: Features
=================

For the current list of features, see include/linux/netdev_features.h.
This section describes the semantics of some of them.

 * Transmit checksumming

For a complete description, see the comments near the top of
include/linux/skbuff.h.

Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
It means that the device can fill in a TCP/UDP-like checksum anywhere in the
packet, whatever headers there might be.

 * Transmit TCP segmentation offload

NETIF_F_TSO_ECN means that the hardware can properly split packets with the
CWR bit set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6
(NETIF_F_TSO6).

 * Transmit UDP segmentation offload

NETIF_F_GSO_UDP_L4 accepts a single UDP header with a payload that exceeds
gso_size. On segmentation, it segments the payload on gso_size boundaries and
replicates the network and UDP headers (fixing up the last one if its
payload is shorter than gso_size).

 * Transmit DMA from high memory

On platforms where this is relevant, NETIF_F_HIGHDMA signals that
ndo_start_xmit can handle skbs with frags in high memory.

 * Transmit scatter-gather

These features say that ndo_start_xmit can handle fragmented skbs:
NETIF_F_SG for paged skbs (skb_shinfo()->frags) and NETIF_F_FRAGLIST for
chained skbs (skb->next/prev list).
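
The scatter-gather features tell the core which skb layouts a driver's
ndo_start_xmit accepts; an skb whose layout the device cannot take as-is is
expected to be linearized (copied into a single buffer) before it reaches
the driver. The userspace sketch below shows that decision only.
struct model_skb, FEAT_SG, FEAT_FRAGLIST and model_needs_linearize() are
hypothetical names with made-up bit values, not kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits (illustrative values, not NETIF_F_*). */
typedef uint64_t features_t;
#define FEAT_SG       ((features_t)1 << 0)
#define FEAT_FRAGLIST ((features_t)1 << 1)

/* Hypothetical, much-simplified description of an skb's layout. */
struct model_skb {
	unsigned int nr_frags;	/* paged fragments (skb_shinfo()->frags) */
	bool has_chain;		/* chained skbs attached */
};

/* Returns true if the skb must be copied into one linear buffer before a
 * device advertising @features can transmit it. */
static bool model_needs_linearize(const struct model_skb *skb,
				  features_t features)
{
	if (skb->has_chain && !(features & FEAT_FRAGLIST))
		return true;	/* chained skbs need FRAGLIST */
	if (skb->nr_frags > 0 && !(features & FEAT_SG))
		return true;	/* paged data needs SG */
	return false;		/* layout is supported as-is */
}
```

A fully linear skb never needs copying, whatever the feature set; a device
with SG but without FRAGLIST still forces chained skbs to be linearized.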

 * Software features

Features contained in NETIF_F_SOFT_FEATURES are features of the networking
stack. Drivers should not change behaviour based on them.

 * LLTX driver (deprecated for hardware drivers)

NETIF_F_LLTX is meant to be used by drivers that don't need locking at all,
e.g. software tunnels.

This is also used in a few legacy drivers that implement their
own locking; don't use it for new (hardware) drivers.

 * netns-local device

NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between
network namespaces (e.g. loopback).

Don't use it in drivers.

 * VLAN challenged

NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
headers. Some drivers set this because the cards can't handle the bigger MTU.
[FIXME: Those cases could be fixed in the VLAN code by allowing only
reduced-MTU VLANs. This may not be useful, though.]
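
The Part III rule that ndo_fix_features must not touch bits in
NETIF_F_SOFT_FEATURES or NETIF_F_NEVER_CHANGE, with NETIF_F_VLAN_CHALLENGED
as the exception, reduces to a bitmask comparison. The sketch below is a
hypothetical debugging check, not a kernel helper. The bit values are made
up, and which bits belong to each mask is an assumption here (GSO and GRO
as software features; VLAN_CHALLENGED, LLTX and NETNS_LOCAL as never-change
flags); consult include/linux/netdev_features.h for the real definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; mask membership below is an assumption. */
typedef uint64_t features_t;
#define FEAT_SG              ((features_t)1 << 0)
#define FEAT_GSO             ((features_t)1 << 1)
#define FEAT_GRO             ((features_t)1 << 2)
#define FEAT_LLTX            ((features_t)1 << 3)
#define FEAT_NETNS_LOCAL     ((features_t)1 << 4)
#define FEAT_VLAN_CHALLENGED ((features_t)1 << 5)

#define SOFT_FEATURES (FEAT_GSO | FEAT_GRO)
#define NEVER_CHANGE  (FEAT_VLAN_CHALLENGED | FEAT_LLTX | FEAT_NETNS_LOCAL)

/* Returns true if a fix_features result @out differs from its input @in
 * only in bits the callback is allowed to touch. */
static bool model_fix_result_ok(features_t in, features_t out)
{
	/* VLAN_CHALLENGED is the documented exception. */
	features_t protected_bits =
		(SOFT_FEATURES | NEVER_CHANGE) & ~FEAT_VLAN_CHALLENGED;

	/* XOR yields exactly the bits that changed. */
	return ((in ^ out) & protected_bits) == 0;
}
```

Clearing a hardware bit such as SG passes the check; clearing GRO or
setting LLTX fails it, while setting VLAN_CHALLENGED is allowed.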

* rx-fcs

This requests that the NIC append the Ethernet Frame Check Sequence (FCS)
to the end of the skb data. This allows sniffers and other tools to
read the CRC recorded by the NIC on receipt of the packet.

* rx-all

This requests that the NIC receive all possible frames, including errored
frames (such as those with a bad FCS). This can be helpful when sniffing a
link with bad packets on it. Some NICs may receive more packets if also put
into normal PROMISC mode.

* rx-gro-hw

This requests that the NIC enable Hardware GRO (generic receive offload).
Hardware GRO is basically the exact reverse of TSO, and is generally
stricter than Hardware LRO. A packet stream merged by Hardware GRO must
be re-segmentable by GSO or TSO back to the exact original packet stream.
Hardware GRO is dependent on RXCSUM, since every packet successfully merged
by hardware must also have its checksum verified by hardware.

* hsr-tag-ins-offload

This should be set for devices which insert an HSR (High-availability Seamless
Redundancy) or PRP (Parallel Redundancy Protocol) tag automatically.
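
The RXCSUM dependency of rx-gro-hw is the kind of rule Part III says belongs
in ndo_fix_features, resolved by disabling the dependent feature rather than
forcing RXCSUM on. A minimal sketch, assuming hypothetical FEAT_* bit values
and a hypothetical model_fix_features() standing in for a driver's callback:

```c
#include <stdint.h>

/* Hypothetical feature bits (illustrative values, not NETIF_F_*). */
typedef uint64_t features_t;
#define FEAT_RXCSUM ((features_t)1 << 0)
#define FEAT_GRO_HW ((features_t)1 << 1)
#define FEAT_SG     ((features_t)1 << 2)

/* Hypothetical stand-in for a driver's ndo_fix_features: hardware GRO
 * needs hardware-verified checksums, so drop GRO_HW when RXCSUM is off
 * instead of turning RXCSUM on. Touches no device or driver state. */
static features_t model_fix_features(features_t f)
{
	if ((f & FEAT_GRO_HW) && !(f & FEAT_RXCSUM))
		f &= ~FEAT_GRO_HW;
	return f;
}
```

Because the function depends only on its argument, it satisfies the
statelessness requirement and gives the same answer however many times the
core calls it between ndo_set_features calls.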

* hsr-tag-rm-offload

This should be set for devices which remove HSR (High-availability Seamless
Redundancy) or PRP (Parallel Redundancy Protocol) tags automatically.

* hsr-fwd-offload

This should be set for devices which forward HSR (High-availability Seamless
Redundancy) frames from one port to another in hardware.

* hsr-dup-offload

This should be set for devices which automatically duplicate outgoing HSR
(High-availability Seamless Redundancy) or PRP (Parallel Redundancy Protocol)
tagged frames in hardware.