.. SPDX-License-Identifier: GPL-2.0

=====================================================
Netdev features mess and how to get out from it alive
=====================================================

Author:
	Michał Mirosław <mirq-linux@rere.qmqm.pl>



Part I: Feature sets
====================

Long gone are the days when a network card would just take and give packets
verbatim.  Today's devices add multiple features and bugs (read: offloads)
that relieve an OS of various tasks such as generating and checking checksums,
splitting packets and classifying them.  Those capabilities and their state
are commonly referred to as netdev features in the Linux kernel world.

There are currently three sets of features relevant to the driver, and
one used internally by the network core:

 1. netdev->hw_features set contains features whose state may possibly
    be changed (enabled or disabled) for a particular device by the user's
    request.  This set should be initialized in the ndo_init callback and
    not changed later.

 2. netdev->features set contains features which are currently enabled
    for a device.  This should be changed only by the network core or in
    error paths of the ndo_set_features callback.

 3. netdev->vlan_features set contains features whose state is inherited
    by child VLAN devices (it limits the children's netdev->features set).
    This is currently used for all VLAN devices, whether tags are stripped
    or inserted in hardware or software.

 4. netdev->wanted_features set contains the feature set requested by the
    user.  This set is filtered by the ndo_fix_features callback whenever it
    or some device-specific conditions change.  This set is internal to the
    networking core and should not be referenced in drivers.
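
To make the relationship between these sets concrete, here is a small
userspace C model.  It is a sketch under stated assumptions, not kernel code:
struct model_netdev, the MF_* bit values and both helpers are hypothetical,
the request path is a simplification of what happens when the user toggles
features, and only the vlan_features limit on child VLAN devices is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for this model only; the real NETIF_F_* bits
 * are defined in include/linux/netdev_features.h. */
#define MF_SG     ((uint64_t)1 << 0)
#define MF_RXCSUM ((uint64_t)1 << 1)
#define MF_TSO    ((uint64_t)1 << 2)
#define MF_LLTX   ((uint64_t)1 << 3)  /* a bit the user cannot toggle here */

/* Simplified stand-in for the feature fields of struct net_device. */
struct model_netdev {
	uint64_t hw_features;     /* may be toggled by the user */
	uint64_t features;        /* currently enabled */
	uint64_t vlan_features;   /* upper bound for child VLAN devices */
	uint64_t wanted_features; /* user request, internal to the core */
};

/* A user request can only flip bits that are in hw_features;
 * all other bits of wanted_features are left as they were. */
static void model_user_request(struct model_netdev *dev, uint64_t on,
			       uint64_t off)
{
	on &= dev->hw_features;
	off &= dev->hw_features;
	dev->wanted_features = (dev->wanted_features | on) & ~off;
}

/* A child VLAN device never gets more than the parent's vlan_features. */
static uint64_t model_vlan_child_features(const struct model_netdev *parent,
					  uint64_t child_wanted)
{
	return child_wanted & parent->vlan_features;
}
```

In this model, requesting a bit outside hw_features is silently ignored; how
the real core reports such a request back to the user is not modeled.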



Part II: Controlling enabled features
=====================================

When the current feature set (netdev->features) is to be changed, a new set
is calculated and filtered by calling the ndo_fix_features callback
and netdev_fix_features().  If the resulting set differs from the current
set, it is passed to the ndo_set_features callback and (if the callback
returns success) replaces the value stored in netdev->features.
A NETDEV_FEAT_CHANGE notification is issued after that whenever the current
set might have changed.

The following events trigger recalculation:
 1. device registration, after ndo_init returns success
 2. user-requested changes in feature state
 3. netdev_update_features() is called

ndo_*_features callbacks are called with rtnl_lock held.  Missing callbacks
are treated as always returning success.
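
The recalculation steps above can be sketched as a userspace C model.  All
names here (struct model_dev, struct model_ops, model_core_fix(),
model_update_features() and the demo driver callback) are hypothetical: the
two callbacks only mirror the roles of ndo_fix_features and ndo_set_features,
model_core_fix() is an identity placeholder for netdev_fix_features(), and
rtnl_lock is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t mfeat_t;

struct model_dev;

/* Hypothetical stand-ins for ndo_fix_features / ndo_set_features;
 * either pointer may be NULL. */
struct model_ops {
	mfeat_t (*fix_features)(struct model_dev *dev, mfeat_t features);
	int (*set_features)(struct model_dev *dev, mfeat_t features);
};

struct model_dev {
	const struct model_ops *ops;
	mfeat_t features;         /* currently enabled */
	mfeat_t wanted_features;  /* requested by the user */
	int feat_change_events;   /* counts NETDEV_FEAT_CHANGE-like notifications */
};

/* Placeholder for the core's own filtering; the real core applies rules here. */
static mfeat_t model_core_fix(mfeat_t features)
{
	return features;
}

/* One recalculation pass: fix, compare, set, commit, notify. */
static int model_update_features(struct model_dev *dev)
{
	mfeat_t f = dev->wanted_features;
	int err = 0;

	if (dev->ops->fix_features)      /* a missing callback changes nothing */
		f = dev->ops->fix_features(dev, f);
	f = model_core_fix(f);

	if (f == dev->features)
		return 0;                /* no change, no notification */

	if (dev->ops->set_features)      /* a missing callback counts as success */
		err = dev->ops->set_features(dev, f);
	if (err == 0)
		dev->features = f;       /* commit only on success */
	dev->feat_change_events++;       /* the set might have changed */
	return err;
}

/* Hypothetical demo driver: feature B only works together with feature A. */
#define MF_A ((mfeat_t)1 << 0)
#define MF_B ((mfeat_t)1 << 1)

static mfeat_t demo_fix_features(struct model_dev *dev, mfeat_t f)
{
	(void)dev;
	if (!(f & MF_A))
		f &= ~MF_B;
	return f;
}

static const struct model_ops demo_ops = { demo_fix_features, NULL };
```

The demo driver drops MF_B whenever MF_A is off, so a request for MF_B alone
produces no change and no notification, while a request for both is committed
and counted as one feature-change event.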

A driver that wants to trigger recalculation must do so by calling
netdev_update_features() while holding rtnl_lock.  This should not be done
from ndo_*_features callbacks.  netdev->features should not be modified by
the driver except by means of the ndo_fix_features callback.



Part III: Implementation hints
==============================

 * ndo_fix_features:

All dependencies between features should be resolved here.  The resulting
set can be reduced further by limitations imposed by the networking core (as
coded in netdev_fix_features()).  For this reason, it is safer to disable a
feature when its dependencies are not met than to force the dependency on.

This callback should modify neither hardware nor driver state (it should be
stateless).  It can be called multiple times between successive
ndo_set_features calls.
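
A minimal sketch of a fix_features-style function in userspace C follows.  It
is a pure function of its inputs: it reads (hypothetical) device state but
writes nothing, and it only ever clears bits.  The GRO_HW-needs-RXCSUM rule
comes from the rx-gro-hw description in Part IV; the TSO-needs-SG rule and the
MTU limit are illustrative assumptions, and all MF_* bits and struct
model_priv are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bits for this model only. */
#define MF_SG     ((uint64_t)1 << 0)
#define MF_TSO    ((uint64_t)1 << 1)
#define MF_RXCSUM ((uint64_t)1 << 2)
#define MF_GRO_HW ((uint64_t)1 << 3)

/* Hypothetical device state the callback may read but never write. */
struct model_priv {
	unsigned int mtu;
	unsigned int max_tso_mtu;   /* assumed hardware limit for TSO */
};

/* Stateless fix_features model: when a dependency is missing, the dependent
 * feature is cleared; a dependency is never switched on. */
static uint64_t model_fix_features(const struct model_priv *priv, uint64_t f)
{
	if (!(f & MF_SG))
		f &= ~MF_TSO;           /* assumed: TSO needs scatter-gather */
	if (!(f & MF_RXCSUM))
		f &= ~MF_GRO_HW;        /* merged packets need a verified checksum */
	if (priv->mtu > priv->max_tso_mtu)
		f &= ~MF_TSO;           /* assumed device limitation */
	return f;
}
```

Because the function is stateless, applying it twice gives the same result as
applying it once, which is what makes repeated calls between ndo_set_features
invocations harmless.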

The callback must not alter features contained in the NETIF_F_SOFT_FEATURES
or NETIF_F_NEVER_CHANGE sets.  The exception is NETIF_F_VLAN_CHALLENGED, but
care must be taken, as the change won't affect already configured VLANs.
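
One way to catch violations of this rule while developing a driver is a debug
check comparing the callback's input and output.  The sketch below is
hypothetical: MF_SOFT_FEATURES and MF_NEVER_CHANGE stand in for the kernel's
masks with made-up bit values, and it assumes, as the exception above implies,
that VLAN_CHALLENGED is itself a member of the never-change set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for NETIF_F_SOFT_FEATURES, NETIF_F_NEVER_CHANGE
 * and NETIF_F_VLAN_CHALLENGED (assumed to be inside the never-change set). */
#define MF_SOFT_FEATURES   (((uint64_t)1 << 0) | ((uint64_t)1 << 1))
#define MF_VLAN_CHALLENGED ((uint64_t)1 << 3)
#define MF_NEVER_CHANGE    (((uint64_t)1 << 2) | MF_VLAN_CHALLENGED)

/* True if a fix_features result changed only bits it is allowed to change. */
static bool model_fix_respects_protected(uint64_t before, uint64_t after)
{
	uint64_t protected_bits =
		(MF_SOFT_FEATURES | MF_NEVER_CHANGE) & ~MF_VLAN_CHALLENGED;

	return ((before ^ after) & protected_bits) == 0;
}
```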

 * ndo_set_features:

Hardware should be reconfigured to match the passed feature set.  The set
should not be altered unless some error condition happens that can't
be reliably detected in ndo_fix_features.  In this case, the callback
should update netdev->features to match the resulting hardware state.
Errors returned are not (and cannot be) propagated anywhere except dmesg.
(Note: successful return is zero, >0 means silent error.)
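
The following userspace C model illustrates the error path.  It is a sketch,
not driver code: struct model_nic, model_hw_toggle() and model_set_features()
are hypothetical, a "refuse mask" simulates hardware that fails to switch some
features at runtime, and the model returns a negative errno rather than the
silent (>0) variant.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical NIC: hw_state is what the hardware really does, refuse_mask
 * simulates bits the hardware fails to switch at runtime. */
struct model_nic {
	uint64_t features;    /* stand-in for netdev->features */
	uint64_t hw_state;
	uint64_t refuse_mask;
};

/* Simulated register write for one feature bit; returns 0 or -EIO. */
static int model_hw_toggle(struct model_nic *nic, uint64_t bit, int enable)
{
	if (nic->refuse_mask & bit)
		return -EIO;
	if (enable)
		nic->hw_state |= bit;
	else
		nic->hw_state &= ~bit;
	return 0;
}

/* set_features model: reprogram only the changed bits; on failure, make
 * features reflect what the hardware actually ended up doing. */
static int model_set_features(struct model_nic *nic, uint64_t wanted)
{
	uint64_t changed = nic->features ^ wanted;
	int err = 0;

	for (int i = 0; i < 64; i++) {
		uint64_t bit = (uint64_t)1 << i;

		if (!(changed & bit))
			continue;
		int rc = model_hw_toggle(nic, bit, (wanted & bit) != 0);
		if (rc && !err)
			err = rc;    /* remember the first error, keep going */
	}
	if (err)
		nic->features = nic->hw_state;   /* report the real state */
	return err;
}
```

On success the function leaves features untouched, since the core stores the
new set itself after a zero return; only on failure does it overwrite features
with the state the hardware actually reached.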



Part IV: Features
=================

For the current list of features, see include/linux/netdev_features.h.
This section describes the semantics of some of them.

 * Transmit checksumming

For a complete description, see the comments near the top of
include/linux/skbuff.h.

Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
It means that the device can fill in a TCP/UDP-like checksum anywhere in the
packet, whatever headers there might be.
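
As a rough illustration of "anywhere in the packet", the sketch below emulates
a HW_CSUM-style engine in userspace C.  It assumes the csum_start/csum_offset
convention described in include/linux/skbuff.h: the engine sums everything
from csum_start to the end of the frame with the Internet (ones' complement)
checksum of RFC 1071 and stores the complemented result at
csum_start + csum_offset, while software is assumed to have pre-seeded that
field (for TCP/UDP, with the pseudo-header sum).  The helper names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds len bytes to a ones'-complement accumulator as big-endian 16-bit
 * words (RFC 1071); an odd trailing byte is padded with a zero byte. */
static uint32_t model_csum_add(const uint8_t *data, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	return sum;
}

/* Folds carries back into the low 16 bits (end-around carry). */
static uint16_t model_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Emulates a HW_CSUM-style engine: sum from csum_start to the end of the
 * frame (including the pre-seeded checksum field) and store the complement
 * at csum_start + csum_offset.  No header is parsed.  Assumes csum_offset
 * is even, as it is for the TCP and UDP checksum fields. */
static void model_hw_csum(uint8_t *frame, size_t len,
			  size_t csum_start, size_t csum_offset)
{
	size_t pos = csum_start + csum_offset;
	uint16_t csum;

	assert(pos + 2 <= len);
	csum = (uint16_t)~model_csum_fold(model_csum_add(frame + csum_start,
							 len - csum_start, 0));
	frame[pos] = (uint8_t)(csum >> 8);
	frame[pos + 1] = (uint8_t)(csum & 0xff);
}
```

Because the engine never parses headers, the same code serves any protocol
that uses this checksum style, which is what separates HW_CSUM from the
protocol-specific IP_CSUM and IPV6_CSUM bits.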

 * Transmit TCP segmentation offload

NETIF_F_TSO_ECN means that hardware can properly split packets with the CWR
bit set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).
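
The sketch below models how TCP flags are distributed when one large segment
is split.  It follows our reading of RFC 3168, where CWR is sent once per
window reduction so only the first resulting segment keeps it, together with
the usual TSO rule that FIN and PSH belong on the last segment.  The
MODEL_TH_* names and the helper are hypothetical, and only the flags byte is
modeled (sequence numbers, lengths and checksums are not).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions in the TCP flags byte (RFC 793, RFC 3168). */
#define MODEL_TH_FIN 0x01u
#define MODEL_TH_PSH 0x08u
#define MODEL_TH_ACK 0x10u
#define MODEL_TH_CWR 0x80u

/* Flags carried by segment idx (0-based) of nsegs segments cut from one
 * large segment whose header had `flags`: CWR stays only on the first
 * segment, FIN and PSH only on the last, other flags are copied. */
static uint8_t model_tso_seg_flags(uint8_t flags, size_t idx, size_t nsegs)
{
	assert(nsegs > 0 && idx < nsegs);
	if (idx != 0)
		flags &= (uint8_t)~MODEL_TH_CWR;
	if (idx != nsegs - 1)
		flags &= (uint8_t)~(MODEL_TH_FIN | MODEL_TH_PSH);
	return flags;
}
```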

 * Transmit UDP segmentation offload

NETIF_F_GSO_UDP_L4 accepts a single UDP header with a payload that exceeds
gso_size.  On segmentation, it segments the payload on gso_size boundaries and
replicates the network and UDP headers (fixing up the last one if its payload
is shorter than gso_size).
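
The segment arithmetic can be sketched in a few lines of C.  The helper below
is hypothetical and models only the UDP length field of each resulting segment
(the 8-byte UDP header plus that segment's payload); fixing up IP lengths and
checksums is out of scope.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_UDP_HLEN 8u  /* UDP header length in bytes */

/* Splits payload_len bytes on gso_size boundaries and stores the UDP
 * length field (header + payload) of each segment in udp_len[].
 * Returns the number of segments, or 0 if max_segs is too small. */
static size_t model_udp_gso_split(uint32_t payload_len, uint32_t gso_size,
				  uint32_t *udp_len, size_t max_segs)
{
	size_t nsegs, i;

	assert(gso_size > 0);
	nsegs = (size_t)(((uint64_t)payload_len + gso_size - 1) / gso_size);
	if (nsegs > max_segs)
		return 0;
	for (i = 0; i < nsegs; i++) {
		uint32_t left = payload_len - (uint32_t)i * gso_size;
		uint32_t seg = left < gso_size ? left : gso_size;  /* last may be short */

		udp_len[i] = MODEL_UDP_HLEN + seg;
	}
	return nsegs;
}
```

For example, a 3000-byte payload with a gso_size of 1400 yields segments
carrying 1400, 1400 and 200 bytes, so the UDP length fields read 1408, 1408
and 208.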

 * Transmit DMA from high memory

On platforms where this is relevant, NETIF_F_HIGHDMA signals that
ndo_start_xmit can handle skbs with frags in high memory.

 * Transmit scatter-gather

Those features say that ndo_start_xmit can handle fragmented skbs:
NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST ---
chained skbs (skb->next/prev list).
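
When a device lacks one of these features, the core has to linearize an skb
before handing it over.  The predicate below is a userspace sketch of that
decision, modeled on skb_needs_linearize() in net/core/dev.c as we understand
it; struct model_skb and the MF_* bits are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MF_SG       ((uint64_t)1 << 0)
#define MF_FRAGLIST ((uint64_t)1 << 1)

/* Minimal stand-in for the parts of an skb that matter here. */
struct model_skb {
	unsigned int nr_frags;   /* like skb_shinfo(skb)->nr_frags */
	bool has_frag_list;      /* like skb_has_frag_list(skb) */
};

/* True if the skb must be copied into one linear buffer before it is
 * handed to a driver advertising `features`. */
static bool model_needs_linearize(const struct model_skb *skb,
				  uint64_t features)
{
	return (skb->has_frag_list && !(features & MF_FRAGLIST)) ||
	       (skb->nr_frags && !(features & MF_SG));
}
```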

 * Software features

Features contained in NETIF_F_SOFT_FEATURES are features of the networking
stack.  Drivers should not change behaviour based on them.

 * LLTX driver (deprecated for hardware drivers)

NETIF_F_LLTX is meant to be used by drivers that don't need locking at all,
e.g. software tunnels.

This is also used in a few legacy drivers that implement their
own locking; don't use it for new (hardware) drivers.

 * netns-local device

NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between
network namespaces (e.g. loopback).

Don't use it in drivers.

 * VLAN challenged

NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
headers.  Some drivers set this because the cards can't handle the bigger MTU.
[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
VLANs. This may not be useful, though.]

 * rx-fcs

This requests that the NIC append the Ethernet Frame Checksum (FCS)
to the end of the skb data.  This allows sniffers and other tools to
read the CRC recorded by the NIC on receipt of the packet.

 * rx-all

This requests that the NIC receive all possible frames, including errored
frames (e.g. frames with a bad FCS).  This can be helpful when sniffing a
link with bad packets on it.  Some NICs may receive more packets if also put
into normal PROMISC mode.

 * rx-gro-hw

This requests that the NIC enable Hardware GRO (generic receive offload).
Hardware GRO is basically the exact reverse of TSO, and is generally
stricter than Hardware LRO.  A packet stream merged by Hardware GRO must
be re-segmentable by GSO or TSO back to the exact original packet stream.
Hardware GRO is dependent on RXCSUM since every packet successfully merged
by hardware must also have the checksum verified by hardware.
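
The constraints above can be condensed into a merge rule.  The sketch below is
a hypothetical, simplified model (not the kernel's GRO code or any NIC's
firmware): it accepts a run of in-order segments only while every one has a
hardware-verified checksum and all but the last are exactly gso_size long, so
that GSO or TSO at that size would recreate the original stream.
Sequence-number wrap-around and header comparison are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One received TCP segment, reduced to what the merge decision needs. */
struct model_seg {
	uint32_t seq;      /* sequence number of the first payload byte */
	uint32_t len;      /* payload length */
	bool csum_ok;      /* checksum verified by hardware (RXCSUM) */
};

/* Returns how many leading segments may be merged into one packet that
 * GSO/TSO with gso_size = segs[0].len would split back into exactly the
 * same segments. */
static size_t model_gro_hw_mergeable(const struct model_seg *segs, size_t n)
{
	if (n == 0 || !segs[0].csum_ok || segs[0].len == 0)
		return 0;

	uint32_t mss = segs[0].len;
	uint32_t next = segs[0].seq + segs[0].len;
	size_t i;

	for (i = 1; i < n; i++) {
		if (segs[i - 1].len < mss)
			break;         /* a short segment must be the last one */
		if (!segs[i].csum_ok || segs[i].seq != next ||
		    segs[i].len == 0 || segs[i].len > mss)
			break;
		next += segs[i].len;
	}
	return i;
}
```

A short segment ends the run because any segment accepted after it would be
cut at different boundaries on re-segmentation.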

 * hsr-tag-ins-offload

This should be set for devices which insert an HSR (High-availability Seamless
Redundancy) or PRP (Parallel Redundancy Protocol) tag automatically.

 * hsr-tag-rm-offload

This should be set for devices which remove HSR (High-availability Seamless
Redundancy) or PRP (Parallel Redundancy Protocol) tags automatically.

 * hsr-fwd-offload

This should be set for devices which forward HSR (High-availability Seamless
Redundancy) frames from one port to another in hardware.

 * hsr-dup-offload

This should be set for devices which duplicate outgoing HSR (High-availability
Seamless Redundancy) or PRP (Parallel Redundancy Protocol) frames
automatically in hardware.