#
957e2235 |
| 03-Aug-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge
With the introduction of explicit offloading API in switchdev in commit 2f5dc00f7a3e ("net: bridge: switchdev: let driver
net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge
With the introduction of explicit offloading API in switchdev in commit 2f5dc00f7a3e ("net: bridge: switchdev: let drivers inform which bridge ports are offloaded"), we started having Ethernet switch drivers calling directly into a function exported by net/bridge/br_switchdev.c, which is a function exported by the bridge driver.
This means that drivers that did not have an explicit dependency on the bridge before, like cpsw and am65-cpsw, now do - otherwise it is not possible to call a symbol exported by a driver that can be built as module unless you are a module too.
There was an attempt to solve the dependency issue in the form of commit b0e81817629a ("net: build all switchdev drivers as modules when the bridge is a module"). Grygorii Strashko, however, says about it:
| In my opinion, the problem is a bit bigger here than just fixing the | build :( | | In case, of ^cpsw the switchdev mode is kinda optional and in many | cases (especially for testing purposes, NFS) the multi-mac mode is | still preferable mode. | | There were no such tight dependency between switchdev drivers and | bridge core before and switchdev serviced as independent, notification | based layer between them, so ^cpsw still can be "Y" and bridge can be | "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE | will need to be set as "Y", or we will have to update drivers to | support build with BRIDGE=n and maintain separate builds for | networking vs non-networking testing. But is this enough? Wouldn't | it cause 'chain reaction' required to add more and more "Y" options | (like CONFIG_VLAN_8021Q)? | | PS. Just to be sure we on the same page - ARM builds will be forced | (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our | automation testing will just fail with omap2plus_defconfig.
In the light of this, it would be desirable for some configurations to avoid dependencies between switchdev drivers and the bridge, and have the switchdev mode as completely optional within the driver.
Arnd Bergmann also tried to write a patch which better expressed the build time dependency for Ethernet switch drivers where the switchdev support is optional, like cpsw/am65-cpsw, and this made the drivers follow the bridge (compile as module if the bridge is a module) only if the optional switchdev support in the driver was enabled in the first place: https://patchwork.kernel.org/project/netdevbpf/patch/20210802144813.1152762-1-arnd@kernel.org/
but this still did not solve the fact that cpsw and am65-cpsw now must be built as modules when the bridge is a module - it just expressed correctly that optional dependency. But the new behavior is an apparent regression from Grygorii's perspective.
So to support the use case where the Ethernet driver is built-in, NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we need a framework that can handle the possible absence of the bridge from the running system, i.e. runtime bloatware as opposed to build-time bloatware.
Luckily we already have this framework, since switchdev has been using it extensively. Events from the bridge side are transmitted to the driver side using notifier chains - this was originally done so that unrelated drivers could snoop for events emitted by the bridge towards ports that are implemented by other drivers (think of a switch driver with LAG offload that listens for switchdev events on a bonding/team interface that it offloads).
There are also events which are transmitted from the driver side to the bridge side, which again are modeled using notifiers. SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with notifying the bridge that a MAC address has been dynamically learned. So there is a precedent we can use for modeling the new framework.
The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work that the bridge needs to do when a port becomes offloaded is blocking in its nature: replay VLANs, MDBs etc. The calling context is indeed blocking (we are under rtnl_mutex), but the existing switchdev notification chain that the bridge is subscribed to is only the atomic one. So we need to subscribe the bridge to the blocking switchdev notification chain too.
This patch: - keeps the driver-side perception of the switchdev_bridge_port_{,un}offload unchanged - moves the implementation of switchdev_bridge_port_{,un}offload from the bridge module into the switchdev module. - makes everybody that is subscribed to the switchdev blocking notifier chain "hear" offload & unoffload events - makes the bridge driver subscribe and handle those events - moves the bridge driver's handling of those events into 2 new functions called br_switchdev_port_{,un}offload. These functions contain in fact the core of the logic that was previously in switchdev_bridge_port_{,un}offload, just that now we go through an extra indirection layer to reach them.
Unlike all the other switchdev notification structures, the structure used to carry the bridge port information, struct switchdev_notifier_brport_info, does not contain a "bool handled". This is because in the current usage pattern, we always know that a switchdev bridge port offloading event will be handled by the bridge, because the switchdev_bridge_port_offload() call was initiated by a NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event couldn't have happened.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
3bdc7066 |
| 28-Jul-2021 |
David S. Miller <davem@davemloft.net> |
Merge branch 'devlink-register'
Leon Romanovsky says:
==================== Remove duplicated devlink registration check
Changelog: v1: * Added two new patches that remove registration field from
Merge branch 'devlink-register'
Leon Romanovsky says:
==================== Remove duplicated devlink registration check
Changelog: v1: * Added two new patches that remove registration field from mlx5 and ti drivers. v0: https://lore.kernel.org/lkml/ed7bbb1e4c51dd58e6035a058e93d16f883b09ce.1627215829.git.leonro@nvidia.com
--------------------------------------------------------------------
Both registered flag and devlink pointer are set at the same time and indicate the same thing - devlink/devlink_port are ready. Instead of checking ->registered use devlink pointer as an indication. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
acf34954 |
| 28-Jul-2021 |
Leon Romanovsky <leonro@nvidia.com> |
net: ti: am65-cpsw-nuss: fix wrong devlink release order
The commit that introduced devlink support released devlink resources in wrong order, that made an unwind flow to be asymmetrical. In additio
net: ti: am65-cpsw-nuss: fix wrong devlink release order
The commit that introduced devlink support released devlink resources in wrong order, that made an unwind flow to be asymmetrical. In addition, the am65-cpsw-nuss used internal to devlink core field - registered.
In order to fix the unwind flow and remove such access to the registered field, rewrite the code to call devlink_port_unregister only on registered ports.
Fixes: 58356eb31d60 ("net: ti: am65-cpsw-nuss: Add devlink support") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
7c57706b |
| 27-Jul-2021 |
David S. Miller <davem@davemloft.net> |
Merge branch 'ndo_ioctl-rework'
Arnd Bergmann says:
==================== ndo_ioctl rework
This series is a follow-up to the series for removing compat_alloc_user_space() and copy_in_user() that ha
Merge branch 'ndo_ioctl-rework'
Arnd Bergmann says:
==================== ndo_ioctl rework
This series is a follow-up to the series for removing compat_alloc_user_space() and copy_in_user() that has now been merged.
I wanted to be sure I address all the ways that 'struct ifreq' is used in device drivers through .ndo_do_ioctl, originally to prove that my approach of changing the struct definition was correct, but then I discarded that approach and went on anyway.
Roughly, the contents here are:
- split out all the users of SIOCDEVPRIVATE ioctls into a separate ndo_siocdevprivate callback, to better see what gets used where
- fix compat handling for those drivers that pass data directly inside of 'ifreq' rather than using an indirect ifr_data pointer
- remove unreachable code in ndo_ioctl handlers that relies on command codes we never pass into that, in particular for wireless drivers
- split out the ethernet specific ioctls into yet another ndo_eth_ioctl callback, as these are by far the most common use of ndo_do_ioctl today. I considered splitting them further into MII and timestamp controls, but went with the simpler change for now.
- split out bonding and wandev ioctls into separate helpers
- rework the bridge handling with a separate callback
At this point, only a few oddball things remain in ndo_do_ioctl: appletalk and ieee802154 pass down SIOCSIFADDR/SIOCGIFADDR and some wireless drivers have completely dead code.
I have thoroughly compile tested this on randconfig builds, but not done any notable runtime testing, so please review. All of it is also available as part of a larger branch at
https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git \ compat-alloc-user-space-12
Changes since v2: - rebase to net-next - fix qeth regression - Cc driver maintainers for each patch and in cover letter
Changes since v1:
- rebase to linux-5.14-rc2 - add conversion for ndo_siowandev, bridge and bonding drivers - leave broken wifi drivers untouched for now
Link: https://lore.kernel.org/netdev/20201106221743.3271965-14-arnd@kernel.org/ ====================
show more ...
|
#
a7605370 |
| 27-Jul-2021 |
Arnd Bergmann <arnd@arndb.de> |
dev_ioctl: split out ndo_eth_ioctl
Most users of ndo_do_ioctl are ethernet drivers that implement the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware timestamping with SIOCSHWTSTAMP/SI
dev_ioctl: split out ndo_eth_ioctl
Most users of ndo_do_ioctl are ethernet drivers that implement the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.
Separate these from the few drivers that use ndo_do_ioctl to implement SIOCBOND, SIOCBR and SIOCWANDEV commands.
This is a purely cosmetic change intended to help readers find their way through the implementation.
Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
ca31fef1 |
| 27-Jul-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
Backmerge remote-tracking branch 'drm/drm-next' into drm-misc-next
Required bump from v5.13-rc3 to v5.14-rc3, and to pick up sysfb compilation fixes.
Signed-off-by: Maarten Lankhorst <maarten.lankh
Backmerge remote-tracking branch 'drm/drm-next' into drm-misc-next
Required bump from v5.13-rc3 to v5.14-rc3, and to pick up sysfb compilation fixes.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
show more ...
|
#
353b7a55 |
| 27-Jul-2021 |
Tony Lindgren <tony@atomide.com> |
Merge branch 'fixes-v5.14' into fixes
|
Revision tags: v5.10.53 |
|
#
356ae88f |
| 23-Jul-2021 |
David S. Miller <davem@davemloft.net> |
Merge branch 'bridge-tx-fwd'
Vladimir Oltean says:
==================== Allow TX forwarding for the software bridge data path to be offloaded to capable devices
On RX, switchdev drivers have the a
Merge branch 'bridge-tx-fwd'
Vladimir Oltean says:
==================== Allow TX forwarding for the software bridge data path to be offloaded to capable devices
On RX, switchdev drivers have the ability to mark packets for the software bridge as "already forwarded in hardware" via skb->offload_fwd_mark. This instructs the nbp_switchdev_allowed_egress() function to perform software forwarding of that packet only to the bridge ports that are not in the same hardware domain as the source packet.
This series expands the concept for TX, in the sense that we can trust the accelerator to: (a) look up its FDB (which is more or less in sync with the software bridge FDB) for selecting the destination ports for a packet (b) replicate the frame in hardware in case it's a multicast/broadcast, instead of the software bridge having to clone it and send the clones to each net device one at a time. This reduces the bandwidth needed between the CPU and the accelerator, as well as the CPU time spent.
This is done by augmenting nbp_switchdev_allowed_egress() to also exclude the bridge ports which have the tx_fwd_offload capability if the skb has already been transmitted to one port from their hardware domain.
Even though in reality, the software bridge still technically looks up the FDB/MDB for every frame, but all skb clones are suppressed, this offload specifically requires that the switchdev accelerator looks up its FDB/MDB again. It is intended to be used to inject "data plane packets" into the hardware as opposed to "control plane packets" which target a precise destination port.
Towards that goal, the bridge always provides the TX packets with skb->offload_fwd_mark = true with the VLAN tag always present, so that the accelerator can forward according to that VLAN broadcast domain.
This work is not intended to cater to switches which can inject control plane packets to a bit mask of destination ports. I see that as a more difficult task to accomplish with potentially less benefits (it provides only replication offload). The reason it is more difficult is that struct skb_buff would probably need to be extended to contain a list of struct net_devices that the packet must be replicated to. Sending data plane packets avoids that issue by keeping the hardware and software FDB more or less in sync and looking it up twice.
Additionally, the ability for the software bridge to request data plane packets to be sent brings the opportunity for "dumb switches" to support traffic termination to/from the bridge. Such switches (DSA or otherwise) typically only use control packets for link-local traps, and sending or receiving a control packet is an expensive operation.
For this class of switches, this patch series makes the difference between supporting and not supporting local IP termination through a VLAN-aware bridge, bridging with a foreign interface, bridging with software upper interfaces like LAG, etc. So instead of telling them "oh, what a dumb switch you are!", we can now tell them "oh, what a stark contrast you have between the control and data plane!".
Patches 1-3 tested on Turris MOX (3 mv88e6xxx switches in a daisy chain topology) and a second DSA driver to be added soon. Patches 4-5 tested only on Turris MOX.
===========================================================
Changes in v5: - make sure the static key is decremented on bridge port unoffload - rename functions and variables so that the "tx_fwd_offload" string is easy to grep across the git tree - simplify DSA core bookkeeping of the bridge_num
===========================================================
Changes in v4:
The biggest change compared to the previous series is not present in the patches, but is rather a lack of them. Previously we were replaying switchdev objects on the public notifier chain, but that was a mistake in my reasoning and it was reverted for v4. Therefore, we are now passing the notifier blocks as arguments to switchdev_bridge_port_offload() for all drivers. This alone gets rid of 7 patches compared to v3.
Other changes are: - Take more care for the case where mlxsw leaves a VLAN or LAG upper that is a bridge port, make sure that switchdev_bridge_port_unoffload() gets called for that case - A couple of DSA bug fixes - Add change logs for all patches - Copy all switchdev driver maintainers on the changes relevant to them
===========================================================
Message for v3: https://patchwork.kernel.org/project/netdevbpf/cover/20210712152142.800651-1-vladimir.oltean@nxp.com/
In this submission I have introduced a "native switchdev" driver API to signal whether the TX forwarding offload is supported or not. This comes after a third person has said that the macvlan offload framework used for v2 and v1 was simply too convoluted.
This large patch set is submitted for discussion purposes (it is provided in its entirety so it can be applied & tested on net-next). It is only minimally tested, and yet I will not copy all switchdev driver maintainers until we agree on the viability of this approach.
The major changes compared to v2: - The introduction of switchdev_bridge_port_offload() and switchdev_bridge_port_unoffload() as two major API changes from the perspective of a switchdev driver. All drivers were converted to call these. - Augment switchdev_bridge_port_{,un}offload to also handle the switchdev object replays on port join/leave. - Augment switchdev_bridge_port_offload to also signal whether the TX forwarding offload is supported.
===========================================================
Message for v2: https://patchwork.kernel.org/project/netdevbpf/cover/20210703115705.1034112-1-vladimir.oltean@nxp.com/
For this series I have taken Tobias' work from here: https://patchwork.kernel.org/project/netdevbpf/cover/20210426170411.1789186-1-tobias@waldekranz.com/ and made the following changes: - I collected and integrated (hopefully all of) Nikolay's, Ido's and my feedback on the bridge driver changes. Otherwise, the structure of the bridge changes is pretty much the same as Tobias left it. - I basically rewrote the DSA infrastructure for the data plane forwarding offload, based on the commonalities with another switch driver for which I implemented this feature (not submitted here) - I adapted mv88e6xxx to use the new infrastructure, hopefully it still works but I didn't test that ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
47211192 |
| 22-Jul-2021 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: bridge: switchdev: allow the TX data plane forwarding to be offloaded
Allow switchdevs to forward frames from the CPU in accordance with the bridge configuration in the same way as is done betw
net: bridge: switchdev: allow the TX data plane forwarding to be offloaded
Allow switchdevs to forward frames from the CPU in accordance with the bridge configuration in the same way as is done between bridge ports. This means that the bridge will only send a single skb towards one of the ports under the switchdev's control, and expects the driver to deliver the packet to all eligible ports in its domain.
Primarily this improves the performance of multicast flows with multiple subscribers, as it allows the hardware to perform the frame replication.
The basic flow between the driver and the bridge is as follows:
- When joining a bridge port, the switchdev driver calls switchdev_bridge_port_offload() with tx_fwd_offload = true.
- The bridge sends offloadable skbs to one of the ports under the switchdev's control using skb->offload_fwd_mark = true.
- The switchdev driver checks the skb->offload_fwd_mark field and lets its FDB lookup select the destination port mask for this packet.
v1->v2: - convert br_input_skb_cb::fwd_hwdoms to a plain unsigned long - introduce a static key "br_switchdev_fwd_offload_used" to minimize the impact of the newly introduced feature on all the setups which don't have hardware that can make use of it - introduce a check for nbp->flags & BR_FWD_OFFLOAD to optimize cache line access - reorder nbp_switchdev_frame_mark_accel() and br_handle_vlan() in __br_forward() - do not strip VLAN on egress if forwarding offload on VLAN-aware bridge is being used - propagate errors from .ndo_dfwd_add_station() if not EOPNOTSUPP
v2->v3: - replace the solution based on .ndo_dfwd_add_station with a solution based on switchdev_bridge_port_offload - rename BR_FWD_OFFLOAD to BR_TX_FWD_OFFLOAD v3->v4: rebase v4->v5: - make sure the static key is decremented on bridge port unoffload - more function and variable renaming and comments for them: br_switchdev_fwd_offload_used to br_switchdev_tx_fwd_offload br_switchdev_accels_skb to br_switchdev_frame_uses_tx_fwd_offload nbp_switchdev_frame_mark_tx_fwd to nbp_switchdev_frame_mark_tx_fwd_to_hwdom nbp_switchdev_frame_mark_accel to nbp_switchdev_frame_mark_tx_fwd_offload fwd_accel to tx_fwd_offload
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
f796fcd6 |
| 22-Jul-2021 |
David S. Miller <davem@davemloft.net> |
Merge branch 'bridge-port-offload'
Vladimir Oltean says:
==================== Let switchdev drivers offload and unoffload bridge ports at their own convenience
This series introduces an explicit A
Merge branch 'bridge-port-offload'
Vladimir Oltean says:
==================== Let switchdev drivers offload and unoffload bridge ports at their own convenience
This series introduces an explicit API through which switchdev drivers mark a bridge port as offloaded or not: - switchdev_bridge_port_offload() - switchdev_bridge_port_unoffload()
Currently, the bridge assumes that a port is offloaded if dev_get_port_parent_id(dev, &ppid, recurse=true) returns something, but that is just an assumption that breaks some use cases (like a non-offloaded LAG interface on top of a switchdev port, bridged with other switchdev ports).
Along with some consolidation of the bridge logic to assign a "switchdev offloading mark" to a port (now better called a "hardware domain"), this series allows the bridge driver side to no longer impose restrictions on that configuration.
Right now, all switchdev drivers must be modified to use the explicit API, but more and more logic can then be placed centrally in the bridge and therefore ease the job of a switchdev driver writer in the future.
For example, the first thing we can hook into the explicit switchdev offloading API calls are the switchdev object and FDB replay helpers. So far, these have only been used by DSA in "pull" mode (where the driver must ask for them). Adding the replay helpers to other drivers involves a lot of repetition. But by moving the helpers inside the bridge port offload/unoffload hook points, we can move the entire replay process to "push" mode (where the bridge provides them automatically).
The explicit switchdev offloading API will see further extensions in the future.
The patches were split from a larger series for easier review: https://patchwork.kernel.org/project/netdevbpf/cover/20210718214434.3938850-1-vladimir.oltean@nxp.com/
Changes in v6: - Make the switchdev replay helpers opt-in - Opt out of the replay helpers for mlxsw, rocker, prestera, sparx5, cpsw, am65-cpsw ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
4e51bf44 |
| 21-Jul-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: move the switchdev object replay helpers to "push" mode
Starting with commit 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries"), DSA has introduced some
net: bridge: move the switchdev object replay helpers to "push" mode
Starting with commit 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries"), DSA has introduced some bridge helpers that replay switchdev events (FDB/MDB/VLAN additions and deletions) that can be lost by the switchdev drivers in a variety of circumstances:
- an IP multicast group was host-joined on the bridge itself before any switchdev port joined the bridge, leading to the host MDB entries missing in the hardware database. - during the bridge creation process, the MAC address of the bridge was added to the FDB as an entry pointing towards the bridge device itself, but with no switchdev ports being part of the bridge yet, this local FDB entry would remain unknown to the switchdev hardware database. - a VLAN/FDB/MDB was added to a bridge port that is a LAG interface, before any switchdev port joined that LAG, leading to the hardware database missing those entries. - a switchdev port left a LAG that is a bridge port, while the LAG remained part of the bridge, and all FDB/MDB/VLAN entries remained installed in the hardware database of the switchdev port.
Also, since commit 0d2cfbd41c4a ("net: bridge: ignore switchdev events for LAG ports which didn't request replay"), DSA introduced a method, based on a const void *ctx, to ensure that two switchdev ports under the same LAG that is a bridge port do not see the same MDB/VLAN entry being replayed twice by the bridge, once for every bridge port that joins the LAG.
With so many ordering corner cases being possible, it seems unreasonable to expect a switchdev driver writer to get it right from the first try. Therefore, now that DSA has experimented with the bridge replay helpers for a little bit, we can move the code to the bridge driver where it is more readily available to all switchdev drivers.
To convert the switchdev object replay helpers from "pull mode" (where the driver asks for them) to a "push mode" (where the bridge offers them automatically), the biggest problem is that the bridge needs to be aware when a switchdev port joins and leaves, even when the switchdev is only indirectly a bridge port (for example when the bridge port is a LAG upper of the switchdev).
Luckily, we already have a hook for that, in the form of the newly introduced switchdev_bridge_port_offload() and switchdev_bridge_port_unoffload() calls. These offer a natural place for hooking the object addition and deletion replays.
Extend the above 2 functions with: - pointers to the switchdev atomic notifier (for FDB replays) and the blocking notifier (for MDB and VLAN replays). - the "const void *ctx" argument required for drivers to be able to disambiguate between which port is targeted, when multiple ports are lowers of the same LAG that is a bridge port. Most of the drivers pass NULL to this argument, except the ones that support LAG offload and have the proper context check already in place in the switchdev blocking notifier handler.
Also unexport the replay helpers, since nobody except the bridge calls them directly now.
Note that: (a) we abuse the terminology slightly, because FDB entries are not "switchdev objects", but we count them as objects nonetheless. With no direct way to prove it, I think they are not modeled as switchdev objects because those can only be installed by the bridge to the hardware (as opposed to FDB entries which can be propagated in the other direction too). This is merely an abuse of terms, FDB entries are replayed too, despite not being objects. (b) the bridge does not attempt to sync port attributes to newly joined ports, just the countable stuff (the objects). The reason for this is simple: no universal and symmetric way to sync and unsync them is known. For example, VLAN filtering: what to do on unsync, disable or leave it enabled? Similarly, STP state, ageing timer, etc etc. What a switchdev port does when it becomes standalone again is not really up to the bridge's competence, and the driver should deal with it. On the other hand, replaying deletions of switchdev objects can be seen a matter of cleanup and therefore be treated by the bridge, hence this patch.
We make the replay helpers opt-in for drivers, because they might not bring immediate benefits for them:
- nbp_vlan_init() is called _after_ netdev_master_upper_dev_link(), so br_vlan_replay() should not do anything for the new drivers on which we call it. The existing drivers where there was even a slight possibility for there to exist a VLAN on a bridge port before they join it are already guarded against this: mlxsw and prestera deny joining LAG interfaces that are members of a bridge.
- br_fdb_replay() should now notify of local FDB entries, but I patched all drivers except DSA to ignore these new entries in commit 2c4eca3ef716 ("net: bridge: switchdev: include local flag in FDB notifications"). Driver authors can lift this restriction as they wish, and when they do, they can also opt into the FDB replay functionality.
- br_mdb_replay() should fix a real issue which is described in commit 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries"). However most drivers do not offload the SWITCHDEV_OBJ_ID_HOST_MDB to see this issue: only cpsw and am65_cpsw offload this switchdev object, and I don't completely understand the way in which they offload this switchdev object anyway. So I'll leave it up to these drivers' respective maintainers to opt into br_mdb_replay().
So most of the drivers pass NULL notifier blocks for the replay helpers, except: - dpaa2-switch which was already acked/regression-tested with the helpers enabled (and there isn't much of a downside in having them) - ocelot which already had replay logic in "pull" mode - DSA which already had replay logic in "pull" mode
An important observation is that the drivers which don't currently request bridge event replays don't even have the switchdev_bridge_port_{offload,unoffload} calls placed in proper places right now. This was done to avoid unnecessary rework for drivers which might never even add support for this. For driver writers who wish to add replay support, this can be used as a tentative placement guide: https://patchwork.kernel.org/project/netdevbpf/patch/20210720134655.892334-11-vladimir.oltean@nxp.com/
Cc: Vadym Kochan <vkochan@marvell.com> Cc: Taras Chornyi <tchornyi@marvell.com> Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Cc: Lars Povlsen <lars.povlsen@microchip.com> Cc: Steen Hegelund <Steen.Hegelund@microchip.com> Cc: UNGLinuxDriver@microchip.com Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
2f5dc00f |
| 21-Jul-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: switchdev: let drivers inform which bridge ports are offloaded
On reception of an skb, the bridge checks if it was marked as 'already forwarded in hardware' (checks if skb->offload_fwd_
net: bridge: switchdev: let drivers inform which bridge ports are offloaded
On reception of an skb, the bridge checks if it was marked as 'already forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it is, it assigns the source hardware domain of that skb based on the hardware domain of the ingress port. Then during forwarding, it enforces that the egress port must have a different hardware domain than the ingress one (this is done in nbp_switchdev_allowed_egress).
Non-switchdev drivers don't report any physical switch id (neither through devlink nor .ndo_get_port_parent_id), therefore the bridge assigns them a hardware domain of 0, and packets coming from them will always have skb->offload_fwd_mark = 0. So there aren't any restrictions.
Problems appear due to the fact that DSA would like to perform software fallback for bonding and team interfaces that the physical switch cannot offload.
+-- br0 ---+ / / | \ / / | \ / | | bond0 / | | / \ swp0 swp1 swp2 swp3 swp4
There, it is desirable that the presence of swp3 and swp4 under a non-offloaded LAG does not preclude us from doing hardware bridging beteen swp0, swp1 and swp2. The bandwidth of the CPU is often times high enough that software bridging between {swp0,swp1,swp2} and bond0 is not impractical.
But this creates an impossible paradox given the current way in which port hardware domains are assigned. When the driver receives a packet from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to something.
- If we set it to 0, then the bridge will forward it towards swp1, swp2 and bond0. But the switch has already forwarded it towards swp1 and swp2 (not to bond0, remember, that isn't offloaded, so as far as the switch is concerned, ports swp3 and swp4 are not looking up the FDB, and the entire bond0 is a destination that is strictly behind the CPU). But we don't want duplicated traffic towards swp1 and swp2, so it's not ok to set skb->offload_fwd_mark = 0.
- If we set it to 1, then the bridge will not forward the skb towards the ports with the same switchdev mark, i.e. not to swp1, swp2 and bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should have forwarded the skb there.
So the real issue is that bond0 will be assigned the same hardware domain as {swp0,swp1,swp2}, because the function that assigns hardware domains to bridge ports, nbp_switchdev_add(), recurses through bond0's lower interfaces until it finds something that implements devlink (calls dev_get_port_parent_id with bool recurse = true). This is a problem because the fact that bond0 can be offloaded by swp3 and swp4 in our example is merely an assumption.
A solution is to give the bridge explicit hints as to what hardware domain it should use for each port.
Currently, the bridging offload is very 'silent': a driver registers a netdevice notifier, which is put on the netns's notifier chain, and which sniffs around for NETDEV_CHANGEUPPER events where the upper is a bridge, and the lower is an interface it knows about (one registered by this driver, normally). Then, from within that notifier, it does a bunch of stuff behind the bridge's back, without the bridge necessarily knowing that there's somebody offloading that port. It looks like this:
ip link set swp0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v call_netdevice_notifiers | v dsa_slave_netdevice_event | v oh, hey! it's for me! | v .port_bridge_join
What we do to solve the conundrum is to be less silent, and change the switchdev drivers to present themselves to the bridge. Something like this:
ip link set swp0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v bridge: Aye! I'll use this call_netdevice_notifiers ^ ppid as the | | hardware domain for v | this port, and zero dsa_slave_netdevice_event | if I got nothing. | | v | oh, hey! it's for me! | | | v | .port_bridge_join | | | +------------------------+ switchdev_bridge_port_offload(swp0, swp0)
Then stacked interfaces (like bond0 on top of swp3/swp4) would be treated differently in DSA, depending on whether we can or cannot offload them.
The offload case:
ip link set bond0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v bridge: Aye! I'll use this call_netdevice_notifiers ^ ppid as the | | switchdev mark for v | bond0. dsa_slave_netdevice_event | Coincidentally (or not), | | bond0 and swp0, swp1, swp2 v | all have the same switchdev hmm, it's not quite for me, | mark now, since the ASIC but my driver has already | is able to forward towards called .port_lag_join | all these ports in hw. for it, because I have | a port with dp->lag_dev == bond0. | | | v | .port_bridge_join | for swp3 and swp4 | | | +------------------------+ switchdev_bridge_port_offload(bond0, swp3) switchdev_bridge_port_offload(bond0, swp4)
And the non-offload case:
ip link set bond0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v bridge waiting: call_netdevice_notifiers ^ huh, switchdev_bridge_port_offload | | wasn't called, okay, I'll use a v | hwdom of zero for this one. dsa_slave_netdevice_event : Then packets received on swp0 will | : not be software-forwarded towards v : swp1, but they will towards bond0. it's not for me, but bond0 is an upper of swp3 and swp4, but their dp->lag_dev is NULL because they couldn't offload it.
Basically we can draw the conclusion that the lowers of a bridge port can come and go, so depending on the configuration of lowers for a bridge port, it can dynamically toggle between offloaded and unoffloaded. Therefore, we need an equivalent switchdev_bridge_port_unoffload too.
This patch changes the way any switchdev driver interacts with the bridge. From now on, everybody needs to call switchdev_bridge_port_offload and switchdev_bridge_port_unoffload, otherwise the bridge will treat the port as non-offloaded and allow software flooding to other ports from the same ASIC.
Note that these functions lay the ground for a more complex handshake between switchdev drivers and the bridge in the future.
For drivers that will request a replay of the switchdev objects when they offload and unoffload a bridge port (DSA, dpaa2-switch, ocelot), we place the call to switchdev_bridge_port_unoffload() strategically inside the NETDEV_PRECHANGEUPPER notifier's code path, and not inside NETDEV_CHANGEUPPER. This is because the switchdev object replay helpers need the netdev adjacency lists to be valid, and that is only true in NETDEV_PRECHANGEUPPER.
Cc: Vadym Kochan <vkochan@marvell.com> Cc: Taras Chornyi <tchornyi@marvell.com> Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Cc: Lars Povlsen <lars.povlsen@microchip.com> Cc: Steen Hegelund <Steen.Hegelund@microchip.com> Cc: UNGLinuxDriver@microchip.com Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch: regression Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> # ocelot-switch Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.10.52, v5.10.51 |
|
#
320424c7 |
| 18-Jul-2021 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v5.13' into next
Sync up with the mainline to get the latest parport API.
|
Revision tags: v5.10.50 |
|
#
611ac726 |
| 13-Jul-2021 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-gt-next
Catching up with 5.14-rc1 and also preparing for a needed common topic branch for the "Minor revid/stepping and workaround cleanup"
Reference: https://patc
Merge drm/drm-next into drm-intel-gt-next
Catching up with 5.14-rc1 and also preparing for a needed common topic branch for the "Minor revid/stepping and workaround cleanup"
Reference: https://patchwork.freedesktop.org/series/92299/ Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
show more ...
|
#
d5bfbad2 |
| 13-Jul-2021 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-next
Catching up with 5.14-rc1
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
Revision tags: v5.10.49 |
|
#
dbe69e43 |
| 30-Jun-2021 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski: "Core:
- BPF: - add syscall program type and libbpf
Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski: "Core:
- BPF: - add syscall program type and libbpf support for generating instructions and bindings for in-kernel BPF loaders (BPF loaders for BPF), this is a stepping stone for signed BPF programs - infrastructure to migrate TCP child sockets from one listener to another in the same reuseport group/map to improve flexibility of service hand-off/restart - add broadcast support to XDP redirect
- allow bypass of the lockless qdisc to improving performance (for pktgen: +23% with one thread, +44% with 2 threads)
- add a simpler version of "DO_ONCE()" which does not require jump labels, intended for slow-path usage
- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast address allowing reclaiming of precious IPv4 addresses
- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing across multi-path routes (w/ offload to mlxsw)
- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp: - DSS checksum support (RFC 8684) to detect middlebox meddling - support Connection-time 'C' flag - time stamping support
- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi: - hidden AP discovery on 6 GHz and other HE 6 GHz improvements - aggregation handling improvements for some drivers - minstrel improvements for no-ack frames - deferred rate control for TXQs to improve reaction times - switch from round robin to virtual time-based airtime scheduler
- add trace points: - tcp checksum errors - openvswitch - action execution, upcalls - socket errors via sk_error_report
Device APIs:
- devlink: add rate API for hierarchical control of max egress rate of virtual devices (VFs, SFs etc.)
- don't require RCU read lock to be held around BPF hooks in NAPI context
- page_pool: generic buffer recycling
New hardware/drivers:
- mobile: - iosm: PCIe Driver for Intel M.2 Modem - support for Qualcomm MSM8998 (ipa)
- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP (our first foray into MAC/PHY description via ACPI)
- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5) - NIC VF offload of L2 bridging - support IRQ distribution to Sub-functions
- Marvell (prestera): - add flower and match all - devlink trap - link aggregation
- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76) - mt7915 MSI support - mt7915 Tx status reporting - mt7915 thermal sensors support - mt7921 decapsulation offload - mt7921 enable runtime pm and deep sleep
- Realtek WiFi (rtw88) - beacon filter support - Tx antenna path diversity support - firmware crash information via devcoredump
- Qualcomm WiFi (wcn36xx) - Wake-on-WLAN support with magic packets and GTK rekeying
- Micrel PHY (ksz886x/ksz8081): add cable test support"
* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits) tcp: change ICSK_CA_PRIV_SIZE definition tcp_yeah: check struct yeah size at compile time gve: DQO: Fix off by one in gve_rx_dqo() stmmac: intel: set PCI_D3hot in suspend stmmac: intel: Enable PHY WOL option in EHL net: stmmac: option to enable PHY WOL with PMT enabled net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del} net: use netdev_info in ndo_dflt_fdb_{add,del} ptp: Set lookup cookie when creating a PTP PPS source. net: sock: add trace for socket errors net: sock: introduce sk_error_report net: dsa: replay the local bridge FDB entries pointing to the bridge dev too net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev net: dsa: include fdb entries pointing to bridge in the host fdb list net: dsa: include bridge addresses which are local in the host fdb list net: dsa: sync static FDB entries on foreign interfaces to hardware net: dsa: install the host MDB and FDB entries in the master's RX filter net: dsa: reference count the FDB addresses at the cross-chip notifier level net: dsa: introduce a separate cross-chip notifier type for host FDBs net: dsa: reference count the MDB entries at the cross-chip notifier level ...
show more ...
|
#
b6df0078 |
| 29-Jun-2021 |
Jakub Kicinski <kuba@kernel.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflict in net/netfilter/nf_tables_api.c.
Duplicate fix in tools/testing/selftests/net/devlink_port_split.py - take the net-n
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflict in net/netfilter/nf_tables_api.c.
Duplicate fix in tools/testing/selftests/net/devlink_port_split.py - take the net-next version.
skmsg, and L4 bpf - keep the bpf code but remove the flags and err params.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v5.13, v5.10.46 |
|
#
ce8eb4c7 |
| 22-Jun-2021 |
Vignesh Raghavendra <vigneshr@ti.com> |
net: ti: am65-cpsw-nuss: Fix crash when changing number of TX queues
When changing number of TX queues using ethtool:
# ethtool -L eth0 tx 1 [ 135.301047] Unable to handle kernel paging request
net: ti: am65-cpsw-nuss: Fix crash when changing number of TX queues
When changing number of TX queues using ethtool:
# ethtool -L eth0 tx 1 [ 135.301047] Unable to handle kernel paging request at virtual address 00000000af5d0000 [...] [ 135.525128] Call trace: [ 135.525142] dma_release_from_dev_coherent+0x2c/0xb0 [ 135.525148] dma_free_attrs+0x54/0xe0 [ 135.525156] k3_cppi_desc_pool_destroy+0x50/0xa0 [ 135.525164] am65_cpsw_nuss_remove_tx_chns+0x88/0xdc [ 135.525171] am65_cpsw_set_channels+0x3c/0x70 [...]
This is because k3_cppi_desc_pool_destroy() which is called after k3_udma_glue_release_tx_chn() in am65_cpsw_nuss_remove_tx_chns() references struct device that is unregistered at the end of k3_udma_glue_release_tx_chn()
Therefore the right order is to call k3_cppi_desc_pool_destroy() and destroy desc pool before calling k3_udma_glue_release_tx_chn(). Fix this throughout the driver.
Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.10.43 |
|
#
c441bfb5 |
| 09-Jun-2021 |
Mark Brown <broonie@kernel.org> |
Merge tag 'v5.13-rc3' into asoc-5.13
Linux 5.13-rc3
|
Revision tags: v5.10.42 |
|
#
942baad2 |
| 02-Jun-2021 |
Joonas Lahtinen <joonas.lahtinen@linux.intel.com> |
Merge drm/drm-next into drm-intel-gt-next
Pulling in -rc2 fixes and TTM changes that next upcoming patches depend on.
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
Revision tags: v5.10.41, v5.10.40, v5.10.39 |
|
#
c37fe6af |
| 18-May-2021 |
Mark Brown <broonie@kernel.org> |
Merge tag 'v5.13-rc2' into spi-5.13
Linux 5.13-rc2
|
#
85ebe5ae |
| 18-May-2021 |
Tony Lindgren <tony@atomide.com> |
Merge branch 'fixes-rc1' into fixes
|
#
d22fe808 |
| 17-May-2021 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-next
Time to get back in sync...
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
Revision tags: v5.4.119 |
|
#
fd531024 |
| 11-May-2021 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Backmerging to get v5.12 fixes. Requested for vmwgfx.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
Revision tags: v5.10.36 |
|
#
c55b44c9 |
| 11-May-2021 |
Maxime Ripard <maxime@cerno.tech> |
Merge drm/drm-fixes into drm-misc-fixes
Start this new release drm-misc-fixes branch
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
|