============
Architecture
============

This document describes the **Distributed Switch Architecture (DSA)** subsystem
design principles, limitations, interactions with other subsystems, and how to
develop drivers for this subsystem as well as a TODO for developers interested
in joining the effort.

Design principles
=================

The Distributed Switch Architecture subsystem was primarily designed to
support Marvell Ethernet switches (MV88E6xxx, a.k.a. Link Street product
line) using Linux, but has since evolved to support other vendors as well.

The original philosophy behind this design was to allow unmodified Linux
tools such as bridge, iproute2 and ifconfig to work transparently, whether
they configure or query a switch port network device or a regular network
device.

An Ethernet switch typically comprises multiple front-panel ports and one
or more CPU or management ports. The DSA subsystem currently relies on the
presence of a management port connected to an Ethernet controller capable of
receiving Ethernet frames from the switch. This is a very common setup for all
kinds of Ethernet switches found in small office/home office (SOHO) products:
routers, gateways, or even top-of-rack switches. This host Ethernet controller
is later referred to as "master" and "cpu" in DSA terminology and code.

The D in DSA stands for Distributed, because the subsystem has been designed
with the ability to configure and manage cascaded switches on top of each other
using upstream and downstream Ethernet links between switches. These specific
ports are referred to as "dsa" ports in DSA terminology and code. A collection
of multiple switches connected to each other is called a "switch tree".

For each front-panel port, DSA creates specialized network devices which are
used as controlling and data-flowing endpoints for use by the Linux networking
stack. These specialized network interfaces are referred to as "slave" network
interfaces in DSA terminology and code.

The ideal case for using DSA is when an Ethernet switch supports a "switch tag",
a hardware feature which makes the switch insert a specific tag into each
Ethernet frame it passes to or from specific ports, to help the management
interface figure out:

- what port is this frame coming from
- what was the reason why this frame got forwarded
- how to send CPU originated traffic to specific ports

The subsystem does support switches not capable of inserting/stripping tags, but
the features might be slightly limited in that case (traffic separation relies
on Port-based VLAN IDs).

Note that DSA does not currently create network interfaces for the "cpu" and
"dsa" ports because:

- the "cpu" port is the Ethernet switch facing side of the management
  controller, and as such, would create a duplication of feature, since you
  would get two interfaces for the same conduit: master netdev, and "cpu" netdev

- the "dsa" port(s) are just conduits between two or more switches, and as such
  cannot really be used as proper network interfaces either, only the
  downstream, or the top-most upstream interface makes sense with that model

Switch tagging protocols
------------------------

DSA supports many vendor-specific tagging protocols, one software-defined
tagging protocol, and a tag-less mode as well (``DSA_TAG_PROTO_NONE``).

The exact format of the tag protocol is vendor specific, but in general, they
all contain something which:

- identifies which port the Ethernet frame came from/should be sent to
- provides a reason why this frame was forwarded to the management interface

All tagging protocols are in ``net/dsa/tag_*.c`` files and implement the
methods of the ``struct dsa_device_ops`` structure, which are detailed below.

Tagging protocols generally fall in one of three categories:

1. The switch-specific frame header is located before the Ethernet header,
   shifting to the right (from the perspective of the DSA master's frame
   parser) the MAC DA, MAC SA, EtherType and the entire L2 payload.
2. The switch-specific frame header is located before the EtherType, keeping
   the MAC DA and MAC SA in place from the DSA master's perspective, but
   shifting the 'real' EtherType and L2 payload to the right.
3. The switch-specific frame header is located at the tail of the packet,
   keeping all frame headers in place and not altering the view of the packet
   that the DSA master's frame parser has.
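
As a rough illustration of the three categories above, the following userspace
sketch computes where a tag of ``tag_len`` octets sits inside a tagged frame.
The enum and the ``tag_offset()`` helper are hypothetical names chosen for this
example and are not part of the DSA API::

    #include <assert.h>
    #include <stddef.h>

    #define ETH_ALEN 6 /* octets in a MAC address */

    /* Hypothetical labels for the three tag placements described above. */
    enum tag_category {
        TAG_BEFORE_ETH_HEADER = 1,
        TAG_BEFORE_ETHERTYPE = 2,
        TAG_AT_TAIL = 3,
    };

    /*
     * Return the offset of the switch tag from the start of a tagged frame.
     * frame_len is the frame length in octets, tag included, FCS excluded.
     */
    static size_t tag_offset(enum tag_category cat, size_t frame_len,
                             size_t tag_len)
    {
        /* A tagged frame holds at least both MAC addresses and the tag. */
        assert(frame_len >= 2 * ETH_ALEN + tag_len);

        switch (cat) {
        case TAG_BEFORE_ETH_HEADER:
            return 0;
        case TAG_BEFORE_ETHERTYPE:
            return 2 * ETH_ALEN; /* right after MAC DA and MAC SA */
        case TAG_AT_TAIL:
            return frame_len - tag_len;
        }
        return 0; /* not reached for valid categories */
    }

With a 4-octet tag and a 68-octet tagged frame, the tag would start at offset
0, 12 and 64 respectively.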

A tagging protocol may tag all packets with switch tags of the same length, or
the tag length might vary (for example packets with PTP timestamps might
require an extended switch tag, or there might be one tag length on TX and a
different one on RX). Either way, the tagging protocol driver must populate the
``struct dsa_device_ops::needed_headroom`` and/or ``struct dsa_device_ops::needed_tailroom``
with the length in octets of the longest switch frame header/trailer. The DSA
framework will automatically adjust the MTU of the master interface to
accommodate this extra size in order for DSA user ports to support the
standard MTU (L2 payload length) of 1500 octets. The ``needed_headroom`` and
``needed_tailroom`` properties are also used to request from the network stack,
on a best-effort basis, the allocation of packets with enough extra space such
that the act of pushing the switch tag on transmission of a packet does not
cause it to reallocate due to insufficient headroom or tailroom.
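
The MTU adjustment amounts to simple addition. The sketch below is a userspace
model with hypothetical names (``struct tag_room``, ``master_mtu()``), not a
kernel function; it assumes the master has to carry the user port MTU plus the
longest header and trailer declared by the tagger::

    #include <assert.h>
    #include <limits.h>

    /* Hypothetical container for the two lengths a tagger declares. */
    struct tag_room {
        unsigned int needed_headroom; /* longest header tag, in octets */
        unsigned int needed_tailroom; /* longest trailer tag, in octets */
    };

    /* MTU the master needs so that user ports can keep user_mtu. */
    static unsigned int master_mtu(unsigned int user_mtu,
                                   const struct tag_room *room)
    {
        unsigned int extra = room->needed_headroom + room->needed_tailroom;

        assert(user_mtu <= UINT_MAX - extra); /* guard against wrap-around */
        return user_mtu + extra;
    }

For example, a tagger declaring an 8-octet header and no trailer would need a
master MTU of 1508 for user ports to keep the standard MTU of 1500.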

Even though applications are not expected to parse DSA-specific frame headers,
the format on the wire of the tagging protocol represents an Application Binary
Interface exposed by the kernel towards user space, for decoders such as
``libpcap``. The tagging protocol driver must populate the ``proto`` member of
``struct dsa_device_ops`` with a value that uniquely describes the
characteristics of the interaction required between the switch hardware and the
data path driver: the offset of each bit field within the frame header and any
stateful processing required to deal with the frames (as may be required for
PTP timestamping).

From the perspective of the network stack, all switches within the same DSA
switch tree use the same tagging protocol. In case of a packet transiting a
fabric with more than one switch, the switch-specific frame header is inserted
by the first switch in the fabric that the packet was received on. This header
typically contains information regarding its type (whether it is a control
frame that must be trapped to the CPU, or a data frame to be forwarded).
Control frames should be decapsulated only by the software data path, whereas
data frames might also be autonomously forwarded towards other user ports of
other switches from the same fabric, and in this case, the outermost switch
ports must decapsulate the packet.

Note that in certain cases, the tagging format used by a leaf switch (not
connected directly to the CPU) might not be the same as what the network stack
sees. This can be seen with Marvell switch trees, where the
CPU port can be configured to use either the DSA or the Ethertype DSA (EDSA)
format, but the DSA links are configured to use the shorter (without Ethertype)
DSA frame header, in order to reduce the autonomous packet forwarding overhead.
It still remains the case that, if the DSA switch tree is configured for the
EDSA tagging protocol, the operating system sees EDSA-tagged packets from the
leaf switches that tagged them with the shorter DSA header. This can be done
because the Marvell switch connected directly to the CPU is configured to
perform tag translation between DSA and EDSA (which is simply the operation of
adding or removing the ``ETH_P_EDSA`` EtherType and some padding octets).

It is possible to construct cascaded setups of DSA switches even if their
tagging protocols are not compatible with one another. In this case, there are
no DSA links in this fabric, and each switch constitutes a disjoint DSA switch
tree. The links between switches are viewed as simply a pair of a DSA master
(the out-facing port of the upstream DSA switch) and a CPU port (the in-facing
port of the downstream DSA switch).

The tagging protocol of the attached DSA switch tree can be viewed through the
``dsa/tagging`` sysfs attribute of the DSA master::

    cat /sys/class/net/eth0/dsa/tagging

If the hardware and driver are capable, the tagging protocol of the DSA switch
tree can be changed at runtime. This is done by writing the new tagging
protocol name to the same sysfs device attribute as above (the DSA master and
all attached switch ports must be down while doing this).

It is desirable that all tagging protocols are testable with the ``dsa_loop``
mockup driver, which can be attached to any network interface. The goal is that
any network interface should be capable of transmitting the same packet in the
same way, and the tagger should decode the same received packet in the same way
regardless of the driver used for the switch control path, and the driver used
for the DSA master.

The transmission of a packet goes through the tagger's ``xmit`` function.
The passed ``struct sk_buff *skb`` has ``skb->data`` pointing at
``skb_mac_header(skb)``, i.e. at the destination MAC address, and the passed
``struct net_device *dev`` represents the virtual DSA user network interface
whose hardware counterpart the packet must be steered to (e.g. ``swp0``).
The job of this method is to prepare the skb in a way that the switch will
understand what egress port the packet is for (and not deliver it towards other
ports). Typically this is fulfilled by pushing a frame header. Checking for
insufficient size in the skb headroom or tailroom is unnecessary provided that
the ``needed_headroom`` and ``needed_tailroom`` properties were filled out
properly, because DSA ensures there is enough space before calling this method.
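
The pointer manipulation involved can be sketched in userspace as below. This
is not a real tagger: ``struct toy_skb``, ``toy_tag_xmit()`` and the 4-octet
tag layout (a marker octet followed by the egress port) are made up for the
example, which assumes a category 2 placement::

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    #define ETH_ALEN    6 /* octets in a MAC address */
    #define TOY_TAG_LEN 4 /* hypothetical tag length, not a real protocol */

    /*
     * Toy stand-in for an skb: 'data' points at the MAC DA inside 'buf',
     * and the octets between 'buf' and 'data' are the headroom.
     */
    struct toy_skb {
        uint8_t buf[128];
        uint8_t *data;
        size_t len;
    };

    /*
     * Category 2 style transmit: open a gap after MAC DA and MAC SA and
     * fill it with a made-up tag that carries the egress port number.
     */
    static void toy_tag_xmit(struct toy_skb *skb, uint8_t egress_port)
    {
        uint8_t *tag;

        /* Stands in for the space guaranteed through needed_headroom. */
        assert((size_t)(skb->data - skb->buf) >= TOY_TAG_LEN);

        /* Push the header: grow the frame into the headroom. */
        skb->data -= TOY_TAG_LEN;
        skb->len += TOY_TAG_LEN;

        /* Move MAC DA and MAC SA to the new start of the frame. */
        memmove(skb->data, skb->data + TOY_TAG_LEN, 2 * ETH_ALEN);

        /* Write the tag into the gap, just before the real EtherType. */
        tag = skb->data + 2 * ETH_ALEN;
        tag[0] = 0xAB;        /* made-up marker octet */
        tag[1] = egress_port; /* port the switch should send to */
        tag[2] = 0;
        tag[3] = 0;
    }

After the call, the frame is 4 octets longer, both MAC addresses are still
first on the wire, and the original EtherType and payload follow the tag.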

The reception of a packet goes through the tagger's ``rcv`` function. The
passed ``struct sk_buff *skb`` has ``skb->data`` pointing at
``skb_mac_header(skb) + ETH_HLEN`` octets, i.e. to where the first octet after
the EtherType would have been, were this frame not tagged. The role of this
method is to consume the frame header, adjust ``skb->data`` to really point at
the first octet after the EtherType, and to change ``skb->dev`` to point to the
virtual DSA user network interface corresponding to the physical front-facing
switch port that the packet was received on.
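
A userspace sketch of the receive side, under the same made-up assumptions (a
4-octet category 2 tag made of a marker octet followed by the source port),
could look like this. ``struct toy_skb`` and ``toy_tag_rcv()`` are hypothetical
names, and the returned port number stands in for the lookup of the user
network interface::

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    #define ETH_ALEN    6  /* octets in a MAC address */
    #define ETH_HLEN    14 /* MAC DA + MAC SA + EtherType */
    #define TOY_TAG_LEN 4  /* hypothetical tag length, not a real protocol */

    /* Toy stand-in for an skb; the frame starts at 'buf'. */
    struct toy_skb {
        uint8_t buf[128];
        uint8_t *data;
        size_t len;
    };

    /* Return the source port, or -1 if the frame does not carry the tag. */
    static int toy_tag_rcv(struct toy_skb *skb)
    {
        uint8_t *tag;
        int port;

        /* The master has already pulled ETH_HLEN octets. */
        assert((size_t)(skb->data - skb->buf) >= ETH_HLEN);

        /* The tag begins where the master saw an EtherType. */
        tag = skb->data - 2;
        if (skb->len < TOY_TAG_LEN || tag[0] != 0xAB)
            return -1;
        port = tag[1];

        /* Consume the tag: data now follows the real EtherType. */
        skb->data += TOY_TAG_LEN;
        skb->len -= TOY_TAG_LEN;

        /* Close the gap so the MAC addresses precede the real EtherType. */
        memmove(skb->data - ETH_HLEN,
                skb->data - ETH_HLEN - TOY_TAG_LEN, 2 * ETH_ALEN);

        return port;
    }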

Since tagging protocols in category 1 and 2 break software (and most often also
hardware) packet dissection on the DSA master, features such as RPS (Receive
Packet Steering) on the DSA master would be broken. The DSA framework deals
with this by hooking into the flow dissector and shifting the offset at which
the IP header is to be found in the tagged frame as seen by the DSA master.
This behavior is automatic based on the ``needed_headroom`` value of the tagging
protocol. If not all packets are of equal size, the tagger can implement the
``flow_dissect`` method of the ``struct dsa_device_ops`` and override this
default behavior by specifying the correct offset incurred by each individual
RX packet. Tail taggers do not cause issues to the flow dissector.
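
For a header tag of fixed length, the shifted offset can be modelled as below.
This is a userspace sketch with a hypothetical function name, not the kernel
implementation. It assumes ``data`` points ``ETH_HLEN`` octets into the tagged
frame, which is where the master's ``eth_type_trans()`` call leaves it::

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define ETH_HLEN 14 /* MAC DA + MAC SA + EtherType */

    /*
     * For both category 1 and category 2 taggers, the real EtherType ends
     * tag_len octets past 'data', and the L3 header starts right after it.
     * The protocol is returned in host byte order.
     */
    static void toy_flow_dissect(const uint8_t *data, size_t tag_len,
                                 uint16_t *proto, size_t *offset)
    {
        assert(tag_len >= 2); /* shorter tags cannot hide the EtherType */

        *offset = tag_len;
        *proto = (uint16_t)((data[tag_len - 2] << 8) | data[tag_len - 1]);
    }

The same arithmetic holds whether the tag sits before the Ethernet header or
before the EtherType, because in both cases the tag shifts the real EtherType
by ``tag_len`` octets relative to where the master expects it.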

Checksum offload should work with category 1 and 2 taggers when the DSA master
driver declares NETIF_F_HW_CSUM in vlan_features and looks at csum_start and
csum_offset. For those cases, DSA will shift the checksum start and offset by
the tag size. If the DSA master driver still uses the legacy NETIF_F_IP_CSUM
or NETIF_F_IPV6_CSUM in vlan_features, the offload might only work if the
offload hardware already expects that specific tag (perhaps due to matching
vendors). DSA slaves inherit those flags from the master port, and it is up to
the driver to correctly fall back to software checksum when the IP header is not
where the hardware expects. If that check is ineffective, the packets might go
to the network without a proper checksum (the checksum field will have the
pseudo IP header sum). For category 3, when the offload hardware does not
already expect the switch tag in use, the checksum must be calculated before any
tag is inserted (i.e. inside the tagger). Otherwise, the DSA master would
include the tail tag in the (software or hardware) checksum calculation. Then,
when the tag gets stripped by the switch during transmission, it will leave an
incorrect IP checksum in place.

Due to various reasons (most common being category 1 taggers being associated
with DSA-unaware masters, mangling what the master perceives as MAC DA), the
tagging protocol may require the DSA master to operate in promiscuous mode, to
receive all frames regardless of the value of the MAC DA. This can be done by
setting the ``promisc_on_master`` property of the ``struct dsa_device_ops``.
Note that this assumes a DSA-unaware master driver, which is the norm.

Master network devices
----------------------

Master network devices are regular, unmodified Linux network device drivers for
the CPU/management Ethernet interface. Such a driver might occasionally need to
know whether DSA is enabled (e.g.: to enable/disable specific offload features),
but the DSA subsystem has been proven to work with industry standard drivers:
``e1000e``, ``mv643xx_eth``, etc. without having to introduce modifications to
these drivers. Such network devices are also often referred to as conduit
network devices since they act as a pipe between the host processor and the
hardware Ethernet switch.

Networking stack hooks
----------------------

When a master netdev is used with DSA, a small hook is placed in the
networking stack in order to have the DSA subsystem process the Ethernet
switch specific tagging protocol. DSA accomplishes this by registering a
specific (and fake) Ethernet type (later becoming ``skb->protocol``) with the
networking stack; this is also known as a ``ptype`` or ``packet_type``. A typical
Ethernet frame receive sequence looks like this:

Master network device (e.g.: e1000e):

1. Receive interrupt fires:

        - receive function is invoked
        - basic packet processing is done: getting length, status etc.
        - packet is prepared to be processed by the Ethernet layer by calling
          ``eth_type_trans``

2. net/ethernet/eth.c::

          eth_type_trans(skb, dev)
                  if (dev->dsa_ptr != NULL)
                          -> skb->protocol = ETH_P_XDSA

3. drivers/net/ethernet/\*::

          netif_receive_skb(skb)
                  -> iterate over registered packet_type
                          -> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv()

4. net/dsa/dsa.c::

          -> dsa_switch_rcv()
                  -> invoke switch tag specific protocol handler in 'net/dsa/tag_*.c'

5. net/dsa/tag_*.c:

        - inspect and strip switch tag protocol to determine originating port
        - locate per-port network device
        - invoke ``eth_type_trans()`` with the DSA slave network device
        - invoke ``netif_receive_skb()``

Past this point, the DSA slave network devices get delivered regular Ethernet
frames that can be processed by the networking stack.
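
Steps 2 and 3 of the sequence above can be modelled in userspace as follows.
All ``toy_`` names are hypothetical, and ``ETH_P_XDSA`` is assumed to have the
value found in ``include/uapi/linux/if_ether.h``::

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define ETH_P_IP   0x0800
    #define ETH_P_XDSA 0x00F8 /* assumed value, see if_ether.h */

    /* Toy stand-in for a net_device; dsa_ptr is non-NULL on a DSA master. */
    struct toy_netdev {
        const void *dsa_ptr;
    };

    enum toy_handler {
        TOY_HANDLER_NONE,
        TOY_HANDLER_IP,
        TOY_HANDLER_DSA, /* stands in for dsa_switch_rcv() */
    };

    /* Step 2: a DSA master reports every frame as ETH_P_XDSA. */
    static uint16_t toy_eth_type_trans(const struct toy_netdev *dev,
                                       uint16_t wire_ethertype)
    {
        assert(dev != NULL);

        if (dev->dsa_ptr != NULL)
            return ETH_P_XDSA;
        return wire_ethertype;
    }

    /* Step 3: pick the handler registered for the reported protocol. */
    static enum toy_handler toy_deliver(uint16_t protocol)
    {
        switch (protocol) {
        case ETH_P_XDSA:
            return TOY_HANDLER_DSA;
        case ETH_P_IP:
            return TOY_HANDLER_IP;
        default:
            return TOY_HANDLER_NONE;
        }
    }

In this model, an IPv4 frame received on a DSA master is handed to the DSA
handler first, and only reaches the IPv4 handler once the tagger has stripped
the tag and re-run the classification on the slave network device.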

Slave network devices
---------------------

Slave network devices created by DSA are stacked on top of their master network
device. Each of these network interfaces is responsible for being a
controlling and data-flowing end-point for one front-panel port of the switch.
These interfaces are specialized in order to:

- insert/remove the switch tag protocol (if it exists) when sending traffic
  to/from specific switch ports
- query the switch for ethtool operations: statistics, link state,
  Wake-on-LAN, register dumps...
- manage external/internal PHY: link, auto-negotiation, etc.

These slave network devices have custom net_device_ops and ethtool_ops function
pointers which allow DSA to introduce a level of layering between the networking
stack/ethtool and the switch driver implementation.

Upon frame transmission from these slave network devices, DSA will look up which
switch tagging protocol is currently registered with these network devices and
invoke a specific transmit routine which takes care of adding the relevant
switch tag in the Ethernet frames.

These frames are then queued for transmission using the master network device
``ndo_start_xmit()`` function. Since they contain the appropriate switch tag, the
Ethernet switch will be able to process these incoming frames from the
management interface and deliver them to the physical switch port.

When using multiple CPU ports, it is possible to stack a LAG (bonding/team)
device between the DSA slave devices and the physical DSA masters. The LAG
device is thus also a DSA master, but the LAG slave devices continue to be DSA
masters as well (just with no user port assigned to them; this is needed for
recovery in case the LAG DSA master disappears). Thus, the data path of the LAG
DSA master is used asymmetrically. On RX, the ``ETH_P_XDSA`` handler, which
calls ``dsa_switch_rcv()``, is invoked early (on the physical DSA master;
LAG slave). Therefore, the RX data path of the LAG DSA master is not used.
On the other hand, TX takes place linearly: ``dsa_slave_xmit`` calls
``dsa_enqueue_skb``, which calls ``dev_queue_xmit`` towards the LAG DSA master.
The latter calls ``dev_queue_xmit`` towards one physical DSA master or the
other, and in both cases, the packet exits the system through a hardware path
towards the switch.

Graphical representation
------------------------

Summarized, this is basically what DSA looks like from a network device
perspective::

                Unaware application
              opens and binds socket
                       |  ^
                       |  |
           +-----------v--|--------------------+
           |+------+ +------+ +------+ +------+|
           || swp0 | | swp1 | | swp2 | | swp3 ||
           |+------+-+------+-+------+-+------+|
           |          DSA switch driver        |
           +-----------------------------------+
                         |        ^
            Tag added by |        | Tag consumed by
           switch driver |        | switch driver
                         v        |
           +-----------------------------------+
           | Unmodified host interface driver  | Software
   --------+-----------------------------------+------------
           |       Host interface (eth0)       | Hardware
           +-----------------------------------+
                         |        ^
         Tag consumed by |        | Tag added by
         switch hardware |        | switch hardware
                         v        |
           +-----------------------------------+
           |               Switch              |
           |+------+ +------+ +------+ +------+|
           || swp0 | | swp1 | | swp2 | | swp3 ||
           ++------+-+------+-+------+-+------++

Slave MDIO bus
--------------

In order to be able to read from and write to the PHYs built into a switch, DSA
creates a slave MDIO bus which allows a specific switch driver to divert and
intercept MDIO reads/writes towards specific PHY addresses. In most
MDIO-connected switches, these functions would utilize direct or indirect PHY
addressing mode to return standard MII registers from the switch builtin PHYs,
allowing the PHY library to return link status, link partner pages,
auto-negotiation results, etc.

For Ethernet switches which have both external and internal MDIO buses, the
slave MII bus can be utilized to mux/demux MDIO reads and writes towards either
internal or external MDIO devices this switch might be connected to: internal
PHYs, external PHYs, or even external switches.

Data structures
---------------

DSA data structures are defined in ``include/net/dsa.h`` as well as
``net/dsa/dsa_priv.h``:

- ``dsa_chip_data``: platform data configuration for a given switch device.
  This structure describes a switch device's parent device, its address, as
  well as various properties of its ports: names/labels, and finally a routing
  table indication (when cascading switches)

- ``dsa_platform_data``: platform device configuration data which can reference
  a collection of dsa_chip_data structures if multiple switches are cascaded.
  The master network device this switch tree is attached to needs to be
  referenced

- ``dsa_switch_tree``: structure assigned to the master network device under
  ``dsa_ptr``. This structure references a dsa_platform_data structure as well as
  the tagging protocol supported by the switch tree, and which receive/transmit
  function hooks should be invoked. Information about the directly attached
  switch is also provided: CPU port. Finally, a collection of dsa_switch are
  referenced to address individual switches in the tree.

- ``dsa_switch``: structure describing a switch device in the tree, referencing
  a ``dsa_switch_tree`` as a backpointer, slave network devices, master network
  device, and a reference to the backing ``dsa_switch_ops``

- ``dsa_switch_ops``: structure referencing function pointers, see below for a
  full description.

Design limitations
==================

Lack of CPU/DSA network devices
-------------------------------

DSA does not currently create slave network devices for the CPU or DSA ports, as
described before. This might be an issue in the following cases:

- inability to fetch switch CPU port statistics counters using ethtool, which
  can make it harder to debug MDIO switches connected using xMII interfaces

- inability to configure the CPU port link parameters based on the Ethernet
  controller capabilities attached to it: http://patchwork.ozlabs.org/patch/509806/

- inability to configure specific VLAN IDs / trunking VLANs between switches
  when using a cascaded setup
4183c91d114SIoana Ciornei
4193c91d114SIoana CiorneiCommon pitfalls using DSA setups
4203c91d114SIoana Ciornei--------------------------------
4213c91d114SIoana Ciornei
Once a master network device is configured to use DSA (``dev->dsa_ptr`` becomes
non-NULL), and the switch behind it expects a tagging protocol, this network
interface can only be used as a conduit interface. Sending packets directly
through this interface (e.g.: opening a socket using this interface) will not
go through the switch tagging protocol transmit function, so the Ethernet
switch on the other end, expecting a tag, will typically drop these frames.
4293c91d114SIoana Ciornei
4303c91d114SIoana CiorneiInteractions with other subsystems
4313c91d114SIoana Ciornei==================================
4323c91d114SIoana Ciornei
4333c91d114SIoana CiorneiDSA currently leverages the following subsystems:
4343c91d114SIoana Ciornei
4353c91d114SIoana Ciornei- MDIO/PHY library: ``drivers/net/phy/phy.c``, ``mdio_bus.c``
- Switchdev: ``net/switchdev/*``
4373c91d114SIoana Ciornei- Device Tree for various of_* functions
4388411abbcSVladimir Oltean- Devlink: ``net/core/devlink.c``
4393c91d114SIoana Ciornei
4403c91d114SIoana CiorneiMDIO/PHY library
4413c91d114SIoana Ciornei----------------
4423c91d114SIoana Ciornei
Slave network devices exposed by DSA may or may not be interfacing with PHY
devices (``struct phy_device`` as defined in ``include/linux/phy.h``), but the DSA
subsystem deals with all possible combinations:
4463c91d114SIoana Ciornei
4473c91d114SIoana Ciornei- internal PHY devices, built into the Ethernet switch hardware
4483c91d114SIoana Ciornei- external PHY devices, connected via an internal or external MDIO bus
4493c91d114SIoana Ciornei- internal PHY devices, connected via an internal MDIO bus
- special, non-autonegotiated or non MDIO-managed PHY devices: SFPs, MoCA; a.k.a.
  fixed PHYs
4523c91d114SIoana Ciornei
4533c91d114SIoana CiorneiThe PHY configuration is done by the ``dsa_slave_phy_setup()`` function and the
4543c91d114SIoana Ciorneilogic basically looks like this:
4553c91d114SIoana Ciornei
- if Device Tree is used, the PHY device is looked up using the standard
  "phy-handle" property. If found, this PHY device is created and registered
  using ``of_phy_connect()``
4593c91d114SIoana Ciornei
4605a48b743SBjorn Helgaas- if Device Tree is used and the PHY device is "fixed", that is, conforms to
4613c91d114SIoana Ciornei  the definition of a non-MDIO managed PHY as defined in
4623c91d114SIoana Ciornei  ``Documentation/devicetree/bindings/net/fixed-link.txt``, the PHY is registered
4633c91d114SIoana Ciornei  and connected transparently using the special fixed MDIO bus driver
4643c91d114SIoana Ciornei
4653c91d114SIoana Ciornei- finally, if the PHY is built into the switch, as is very common with
4663c91d114SIoana Ciornei  standalone switch packages, the PHY is probed using the slave MII bus created
4673c91d114SIoana Ciornei  by DSA
4683c91d114SIoana Ciornei
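The lookup order described above can be sketched as a small userspace model.
This is only an illustration: ``struct port_desc``, ``enum phy_source`` and
``pick_phy_source()`` are hypothetical names that do not exist in the kernel,
and the precedence simply follows the order in which the cases are listed
here, which is an assumption.

```c
#include <stdbool.h>

/* Hypothetical, simplified description of one switch port. */
struct port_desc {
    bool has_of_node;    /* the port is described in Device Tree */
    bool has_phy_handle; /* a "phy-handle" property is present */
    bool has_fixed_link; /* a fixed-link description is present */
};

/* Where the PHY for a slave network device would come from. */
enum phy_source {
    PHY_FROM_PHY_HANDLE,    /* connected with of_phy_connect() */
    PHY_FROM_FIXED_LINK,    /* registered on the fixed MDIO bus */
    PHY_FROM_SLAVE_MII_BUS, /* built-in PHY probed on the DSA slave MII bus */
};

/* Walk the cases in the order the documentation lists them. */
enum phy_source pick_phy_source(const struct port_desc *p)
{
    if (p->has_of_node && p->has_phy_handle)
        return PHY_FROM_PHY_HANDLE;
    if (p->has_of_node && p->has_fixed_link)
        return PHY_FROM_FIXED_LINK;
    return PHY_FROM_SLAVE_MII_BUS;
}
```

The real decision is made inside the DSA core, so a switch driver normally only
has to describe its ports correctly.
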
4693c91d114SIoana Ciornei
4703c91d114SIoana CiorneiSWITCHDEV
4713c91d114SIoana Ciornei---------
4723c91d114SIoana Ciornei
4733c91d114SIoana CiorneiDSA directly utilizes SWITCHDEV when interfacing with the bridge layer, and
4743c91d114SIoana Ciorneimore specifically with its VLAN filtering portion when configuring VLANs on top
475f8843991SVladimir Olteanof per-port slave network devices. As of today, the only SWITCHDEV objects
476f8843991SVladimir Olteansupported by DSA are the FDB and VLAN objects.
4773c91d114SIoana Ciornei
4788411abbcSVladimir OlteanDevlink
4798411abbcSVladimir Oltean-------
4808411abbcSVladimir Oltean
4818411abbcSVladimir OlteanDSA registers one devlink device per physical switch in the fabric.
4828411abbcSVladimir OlteanFor each devlink device, every physical port (i.e. user ports, CPU ports, DSA
4838411abbcSVladimir Olteanlinks or unused ports) is exposed as a devlink port.
4848411abbcSVladimir Oltean
4858411abbcSVladimir OlteanDSA drivers can make use of the following devlink features:
4868794be45SVladimir Oltean
4878411abbcSVladimir Oltean- Regions: debugging feature which allows user space to dump driver-defined
4888411abbcSVladimir Oltean  areas of hardware information in a low-level, binary format. Both global
4898411abbcSVladimir Oltean  regions as well as per-port regions are supported. It is possible to export
4908411abbcSVladimir Oltean  devlink regions even for pieces of data that are already exposed in some way
4918411abbcSVladimir Oltean  to the standard iproute2 user space programs (ip-link, bridge), like address
4928411abbcSVladimir Oltean  tables and VLAN tables. For example, this might be useful if the tables
4938411abbcSVladimir Oltean  contain additional hardware-specific details which are not visible through
4948411abbcSVladimir Oltean  the iproute2 abstraction, or it might be useful to inspect these tables on
4958411abbcSVladimir Oltean  the non-user ports too, which are invisible to iproute2 because no network
4968411abbcSVladimir Oltean  interface is registered for them.
- Params: a feature which enables users to configure certain low-level tunable
  knobs pertaining to the device. Drivers may implement applicable generic
  devlink params, or may add new device-specific devlink params.
5008411abbcSVladimir Oltean- Resources: a monitoring feature which enables users to see the degree of
5018411abbcSVladimir Oltean  utilization of certain hardware tables in the device, such as FDB, VLAN, etc.
5028411abbcSVladimir Oltean- Shared buffers: a QoS feature for adjusting and partitioning memory and frame
5038411abbcSVladimir Oltean  reservations per port and per traffic class, in the ingress and egress
5048411abbcSVladimir Oltean  directions, such that low-priority bulk traffic does not impede the
5058411abbcSVladimir Oltean  processing of high-priority critical traffic.
5068411abbcSVladimir Oltean
5078411abbcSVladimir OlteanFor more details, consult ``Documentation/networking/devlink/``.
5088411abbcSVladimir Oltean
5093c91d114SIoana CiorneiDevice Tree
5103c91d114SIoana Ciornei-----------
5113c91d114SIoana Ciornei
5123c91d114SIoana CiorneiDSA features a standardized binding which is documented in
5133c91d114SIoana Ciornei``Documentation/devicetree/bindings/net/dsa/dsa.txt``. PHY/MDIO library helper
5143c91d114SIoana Ciorneifunctions such as ``of_get_phy_mode()``, ``of_phy_connect()`` are also used to query
5155a48b743SBjorn Helgaasper-port PHY specific details: interface connection, MDIO bus location, etc.
5163c91d114SIoana Ciornei
5173c91d114SIoana CiorneiDriver development
5183c91d114SIoana Ciornei==================
5193c91d114SIoana Ciornei
52019b3b13cSVladimir OlteanDSA switch drivers need to implement a ``dsa_switch_ops`` structure which will
5213c91d114SIoana Ciorneicontain the various members described below.
5223c91d114SIoana Ciornei
52319b3b13cSVladimir OlteanProbing, registration and device lifetime
52419b3b13cSVladimir Oltean-----------------------------------------
5253c91d114SIoana Ciornei
52619b3b13cSVladimir OlteanDSA switches are regular ``device`` structures on buses (be they platform, SPI,
52719b3b13cSVladimir OlteanI2C, MDIO or otherwise). The DSA framework is not involved in their probing
52819b3b13cSVladimir Olteanwith the device core.
52919b3b13cSVladimir Oltean
53019b3b13cSVladimir OlteanSwitch registration from the perspective of a driver means passing a valid
53119b3b13cSVladimir Oltean``struct dsa_switch`` pointer to ``dsa_register_switch()``, usually from the
53219b3b13cSVladimir Olteanswitch driver's probing function. The following members must be valid in the
53319b3b13cSVladimir Olteanprovided structure:
53419b3b13cSVladimir Oltean
53519b3b13cSVladimir Oltean- ``ds->dev``: will be used to parse the switch's OF node or platform data.
53619b3b13cSVladimir Oltean
53719b3b13cSVladimir Oltean- ``ds->num_ports``: will be used to create the port list for this switch, and
53819b3b13cSVladimir Oltean  to validate the port indices provided in the OF node.
53919b3b13cSVladimir Oltean
54019b3b13cSVladimir Oltean- ``ds->ops``: a pointer to the ``dsa_switch_ops`` structure holding the DSA
54119b3b13cSVladimir Oltean  method implementations.
54219b3b13cSVladimir Oltean
54319b3b13cSVladimir Oltean- ``ds->priv``: backpointer to a driver-private data structure which can be
54419b3b13cSVladimir Oltean  retrieved in all further DSA method callbacks.
54519b3b13cSVladimir Oltean
54619b3b13cSVladimir OlteanIn addition, the following flags in the ``dsa_switch`` structure may optionally
54719b3b13cSVladimir Olteanbe configured to obtain driver-specific behavior from the DSA core. Their
54819b3b13cSVladimir Olteanbehavior when set is documented through comments in ``include/net/dsa.h``.
54919b3b13cSVladimir Oltean
55019b3b13cSVladimir Oltean- ``ds->vlan_filtering_is_global``
55119b3b13cSVladimir Oltean
55219b3b13cSVladimir Oltean- ``ds->needs_standalone_vlan_filtering``
55319b3b13cSVladimir Oltean
55419b3b13cSVladimir Oltean- ``ds->configure_vlan_while_not_filtering``
55519b3b13cSVladimir Oltean
55619b3b13cSVladimir Oltean- ``ds->untag_bridge_pvid``
55719b3b13cSVladimir Oltean
55819b3b13cSVladimir Oltean- ``ds->assisted_learning_on_cpu_port``
55919b3b13cSVladimir Oltean
56019b3b13cSVladimir Oltean- ``ds->mtu_enforcement_ingress``
56119b3b13cSVladimir Oltean
56219b3b13cSVladimir Oltean- ``ds->fdb_isolation``
56319b3b13cSVladimir Oltean
56419b3b13cSVladimir OlteanInternally, DSA keeps an array of switch trees (group of switches) global to
56519b3b13cSVladimir Olteanthe kernel, and attaches a ``dsa_switch`` structure to a tree on registration.
56619b3b13cSVladimir OlteanThe tree ID to which the switch is attached is determined by the first u32
56719b3b13cSVladimir Olteannumber of the ``dsa,member`` property of the switch's OF node (0 if missing).
56819b3b13cSVladimir OlteanThe switch ID within the tree is determined by the second u32 number of the
same OF property (0 if missing). Registering multiple switches with the same
switch ID and tree ID is illegal and will cause an error. When using platform
data, only a single switch and a single switch tree are permitted.
57219b3b13cSVladimir Oltean
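As a rough illustration of the ID rules above, the following userspace model
keeps a list of registered (tree ID, switch ID) pairs and rejects duplicates.
The names ``struct member``, ``register_member()`` and ``MAX_SWITCHES`` are
hypothetical, and the choice of ``-EEXIST`` as the error value is an
assumption of this sketch, not a statement about the DSA core.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SWITCHES 16 /* arbitrary limit for this sketch */

struct member {
    uint32_t tree;  /* first u32 of "dsa,member" */
    uint32_t index; /* second u32 of "dsa,member" */
};

static struct member registered[MAX_SWITCHES];
static size_t num_registered;

/*
 * prop points at the two u32 cells of a "dsa,member" property, or is NULL
 * when the property is missing, in which case both IDs default to 0.
 */
int register_member(const uint32_t prop[2])
{
    struct member m = { 0, 0 };
    size_t i;

    if (prop) {
        m.tree = prop[0];
        m.index = prop[1];
    }

    /* The same (tree ID, switch ID) pair must not be registered twice. */
    for (i = 0; i < num_registered; i++)
        if (registered[i].tree == m.tree && registered[i].index == m.index)
            return -EEXIST;

    if (num_registered == MAX_SWITCHES)
        return -ENOSPC;

    registered[num_registered++] = m;
    return 0;
}
```
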
57319b3b13cSVladimir OlteanIn case of a tree with multiple switches, probing takes place asymmetrically.
57419b3b13cSVladimir OlteanThe first N-1 callers of ``dsa_register_switch()`` only add their ports to the
57519b3b13cSVladimir Olteanport list of the tree (``dst->ports``), each port having a backpointer to its
57619b3b13cSVladimir Olteanassociated switch (``dp->ds``). Then, these switches exit their
57719b3b13cSVladimir Oltean``dsa_register_switch()`` call early, because ``dsa_tree_setup_routing_table()``
57819b3b13cSVladimir Olteanhas determined that the tree is not yet complete (not all ports referenced by
57919b3b13cSVladimir OlteanDSA links are present in the tree's port list). The tree becomes complete when
58019b3b13cSVladimir Olteanthe last switch calls ``dsa_register_switch()``, and this triggers the effective
58119b3b13cSVladimir Olteancontinuation of initialization (including the call to ``ds->ops->setup()``) for
58219b3b13cSVladimir Olteanall switches within that tree, all as part of the calling context of the last
58319b3b13cSVladimir Olteanswitch's probe function.
58419b3b13cSVladimir Oltean
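The asymmetric probing can be pictured with a simplified userspace model. It
tracks completeness at the granularity of whole switches rather than of
individual ports, which is a simplification, and ``struct tree_model`` and
``model_register_switch()`` are hypothetical names used only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

struct tree_model {
    uint32_t present;    /* bitmask of switches that have registered */
    uint32_t referenced; /* bitmask of switches referenced by DSA links */
    bool setup_done;     /* set once the whole tree has been set up */
};

static int count_bits(uint32_t v)
{
    int n = 0;

    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/*
 * Register switch `index`, whose DSA links point at the switches in
 * `link_peers`. Returns how many switches were set up by this call:
 * 0 while the tree is incomplete, all of them for the last caller.
 */
int model_register_switch(struct tree_model *t, unsigned int index,
                          uint32_t link_peers)
{
    t->present |= 1u << index;
    t->referenced |= link_peers;

    /* Some switch referenced by a DSA link has not probed yet. */
    if (t->referenced & ~t->present)
        return 0;

    if (t->setup_done)
        return 0;

    /* Tree complete: setup runs for every switch in this context. */
    t->setup_done = true;
    return count_bits(t->present);
}
```

For a chain of three switches, the first two calls return 0 and the third call
returns 3, mirroring how the last switch's probe function carries the setup of
the whole tree.
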
58519b3b13cSVladimir OlteanThe opposite of registration takes place when calling ``dsa_unregister_switch()``,
58619b3b13cSVladimir Olteanwhich removes a switch's ports from the port list of the tree. The entire tree
58719b3b13cSVladimir Olteanis torn down when the first switch unregisters.
5883c91d114SIoana Ciornei
58954367831SVladimir OlteanIt is mandatory for DSA switch drivers to implement the ``shutdown()`` callback
59054367831SVladimir Olteanof their respective bus, and call ``dsa_switch_shutdown()`` from it (a minimal
59154367831SVladimir Olteanversion of the full teardown performed by ``dsa_unregister_switch()``).
59254367831SVladimir OlteanThe reason is that DSA keeps a reference on the master net device, and if the
59354367831SVladimir Olteandriver for the master device decides to unbind on shutdown, DSA's reference
59454367831SVladimir Olteanwill block that operation from finalizing.
59554367831SVladimir Oltean
59654367831SVladimir OlteanEither ``dsa_switch_shutdown()`` or ``dsa_unregister_switch()`` must be called,
59754367831SVladimir Olteanbut not both, and the device driver model permits the bus' ``remove()`` method
59854367831SVladimir Olteanto be called even if ``shutdown()`` was already called. Therefore, drivers are
59954367831SVladimir Olteanexpected to implement a mutual exclusion method between ``remove()`` and
60054367831SVladimir Oltean``shutdown()`` by setting their drvdata to NULL after any of these has run, and
60154367831SVladimir Olteanchecking whether the drvdata is NULL before proceeding to take any action.
60254367831SVladimir Oltean
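The mutual exclusion described above can be sketched in plain C. This is a
userspace model, not driver code: ``struct device_model`` stands in for the bus
device and its drvdata pointer, and the two counters stand in for calls to
``dsa_unregister_switch()`` and ``dsa_switch_shutdown()``. All names in the
block are hypothetical.

```c
#include <stddef.h>

/* Stand-in for the bus device and its driver data pointer. */
struct device_model {
    void *drvdata;
};

/* Stand-in for the driver-private structure. */
struct priv_model {
    int unregister_calls; /* models dsa_unregister_switch() */
    int shutdown_calls;   /* models dsa_switch_shutdown() */
};

void model_remove(struct device_model *dev)
{
    struct priv_model *priv = dev->drvdata;

    if (!priv)
        return; /* shutdown() already ran, nothing left to do */

    priv->unregister_calls++;
    dev->drvdata = NULL;
}

void model_shutdown(struct device_model *dev)
{
    struct priv_model *priv = dev->drvdata;

    if (!priv)
        return; /* remove() already ran, nothing left to do */

    priv->shutdown_calls++;
    dev->drvdata = NULL;
}
```

Whichever of the two methods runs first clears the drvdata, so the second one
returns without touching the switch.
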
60354367831SVladimir OlteanAfter ``dsa_switch_shutdown()`` or ``dsa_unregister_switch()`` was called, no
60454367831SVladimir Olteanfurther callbacks via the provided ``dsa_switch_ops`` may take place, and the
60554367831SVladimir Olteandriver may free the data structures associated with the ``dsa_switch``.
60654367831SVladimir Oltean
6073c91d114SIoana CiorneiSwitch configuration
6083c91d114SIoana Ciornei--------------------
6093c91d114SIoana Ciornei
610c3f0e84dSVladimir Oltean- ``get_tag_protocol``: this is to indicate what kind of tagging protocol is
611c56313a4SVladimir Oltean  supported, should be a valid value from the ``dsa_tag_protocol`` enum.
612c56313a4SVladimir Oltean  The returned information does not have to be static; the driver is passed the
613c56313a4SVladimir Oltean  CPU port number, as well as the tagging protocol of a possibly stacked
614c56313a4SVladimir Oltean  upstream switch, in case there are hardware limitations in terms of supported
615c56313a4SVladimir Oltean  tag formats.
6163c91d114SIoana Ciornei
617d6a0336aSVladimir Oltean- ``change_tag_protocol``: when the default tagging protocol has compatibility
618d6a0336aSVladimir Oltean  problems with the master or other issues, the driver may support changing it
619d6a0336aSVladimir Oltean  at runtime, either through a device tree property or through sysfs. In that
620d6a0336aSVladimir Oltean  case, further calls to ``get_tag_protocol`` should report the protocol in
621d6a0336aSVladimir Oltean  current use.
622d6a0336aSVladimir Oltean
- ``setup``: setup function for the switch. This function is responsible for
  setting up the ``dsa_switch_ops`` private structure with all it needs: register
  maps, interrupts, mutexes, locks, etc. This function is also expected to
  properly configure the switch to separate all network interfaces from each
  other, that is, they should be isolated by the switch hardware itself,
  typically by creating a Port-based VLAN ID for each port and allowing only the
  CPU port and the specific port to be in the forwarding vector. Ports that are
  unused by the platform should be disabled. Past this function, the switch is
  expected to be fully configured and ready to serve any kind of request. It is
  recommended to issue a software reset of the switch during this setup function
  in order to avoid relying on what a previous software agent such as a
  bootloader/firmware may have previously configured. The method responsible for
  undoing any applicable allocations or operations done here is ``teardown``.
6363c91d114SIoana Ciornei
6373c87237eSVladimir Oltean- ``port_setup`` and ``port_teardown``: methods for initialization and
6383c87237eSVladimir Oltean  destruction of per-port data structures. It is mandatory for some operations
6393c87237eSVladimir Oltean  such as registering and unregistering devlink port regions to be done from
6403c87237eSVladimir Oltean  these methods, otherwise they are optional. A port will be torn down only if
6413c87237eSVladimir Oltean  it has been previously set up. It is possible for a port to be set up during
6423c87237eSVladimir Oltean  probing only to be torn down immediately afterwards, for example in case its
6433c87237eSVladimir Oltean  PHY cannot be found. In this case, probing of the DSA switch continues
6443c87237eSVladimir Oltean  without that particular port.
6453c87237eSVladimir Oltean
646*0773e3a8SVladimir Oltean- ``port_change_master``: method through which the affinity (association used
647*0773e3a8SVladimir Oltean  for traffic termination purposes) between a user port and a CPU port can be
  changed. By default all user ports from a tree are assigned to the first
  available CPU port that makes sense for them (most of the time this means
  the user ports of a tree are all assigned to the same CPU port, except for H
  topologies as described in commit 2c0b03258b8b). The ``port`` argument
652*0773e3a8SVladimir Oltean  represents the index of the user port, and the ``master`` argument represents
653*0773e3a8SVladimir Oltean  the new DSA master ``net_device``. The CPU port associated with the new
654*0773e3a8SVladimir Oltean  master can be retrieved by looking at ``struct dsa_port *cpu_dp =
655*0773e3a8SVladimir Oltean  master->dsa_ptr``. Additionally, the master can also be a LAG device where
656*0773e3a8SVladimir Oltean  all the slave devices are physical DSA masters. LAG DSA masters also have a
657*0773e3a8SVladimir Oltean  valid ``master->dsa_ptr`` pointer, however this is not unique, but rather a
658*0773e3a8SVladimir Oltean  duplicate of the first physical DSA master's (LAG slave) ``dsa_ptr``. In case
659*0773e3a8SVladimir Oltean  of a LAG DSA master, a further call to ``port_lag_join`` will be emitted
660*0773e3a8SVladimir Oltean  separately for the physical CPU ports associated with the physical DSA
661*0773e3a8SVladimir Oltean  masters, requesting them to create a hardware LAG associated with the LAG
662*0773e3a8SVladimir Oltean  interface.
663*0773e3a8SVladimir Oltean
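The port isolation that ``setup`` is expected to establish can be illustrated
with a bitmask model of the port-based VLAN membership. This is a minimal
sketch under assumptions: ``port_vlan_members()`` is a hypothetical helper, and
real hardware expresses the forwarding vector in its own register layout.

```c
#include <stdint.h>

#define BIT(n) (1u << (n))

/*
 * Members of the port-based VLAN created for `port`: the port itself and
 * the CPU port. Ports that are not in `user_ports` are treated as unused
 * and get an empty forwarding vector, i.e. they are disabled.
 */
uint32_t port_vlan_members(unsigned int port, unsigned int cpu_port,
                           uint32_t user_ports)
{
    if (!(user_ports & BIT(port)))
        return 0;

    return BIT(port) | BIT(cpu_port);
}
```

With user ports 0 to 2 and CPU port 5, the vector for port 1 contains only
ports 1 and 5, so traffic from port 1 cannot reach port 0 or port 2 without
going through the CPU.
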
6643c91d114SIoana CiorneiPHY devices and link management
6653c91d114SIoana Ciornei-------------------------------
6663c91d114SIoana Ciornei
- ``get_phy_flags``: Some switches are interfaced to various kinds of Ethernet
  PHYs. If the PHY library PHY driver needs to know about information it cannot
  obtain on its own (e.g.: coming from switch memory mapped registers), this
  function should return a 32-bit bitmask of "flags" that is private between the
  switch driver and the Ethernet PHY driver in ``drivers/net/phy/*``.
6723c91d114SIoana Ciornei
6733c91d114SIoana Ciornei- ``phy_read``: Function invoked by the DSA slave MDIO bus when attempting to read
6743c91d114SIoana Ciornei  the switch port MDIO registers. If unavailable, return 0xffff for each read.
6753c91d114SIoana Ciornei  For builtin switch Ethernet PHYs, this function should allow reading the link
6765a48b743SBjorn Helgaas  status, auto-negotiation results, link partner pages, etc.
6773c91d114SIoana Ciornei
6783c91d114SIoana Ciornei- ``phy_write``: Function invoked by the DSA slave MDIO bus when attempting to write
6793c91d114SIoana Ciornei  to the switch port MDIO registers. If unavailable return a negative error
6803c91d114SIoana Ciornei  code.
6813c91d114SIoana Ciornei
6823c91d114SIoana Ciornei- ``adjust_link``: Function invoked by the PHY library when a slave network device
6833c91d114SIoana Ciornei  is attached to a PHY device. This function is responsible for appropriately
6843c91d114SIoana Ciornei  configuring the switch port link parameters: speed, duplex, pause based on
6853c91d114SIoana Ciornei  what the ``phy_device`` is providing.
6863c91d114SIoana Ciornei
6873c91d114SIoana Ciornei- ``fixed_link_update``: Function invoked by the PHY library, and specifically by
6883c91d114SIoana Ciornei  the fixed PHY driver asking the switch driver for link parameters that could
6893c91d114SIoana Ciornei  not be auto-negotiated, or obtained by reading the PHY registers through MDIO.
6903c91d114SIoana Ciornei  This is particularly useful for specific kinds of hardware such as QSGMII,
6913c91d114SIoana Ciornei  MoCA or other kinds of non-MDIO managed PHYs where out of band link
6923c91d114SIoana Ciornei  information is obtained
6933c91d114SIoana Ciornei
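The return conventions of ``phy_read`` and ``phy_write`` can be shown with a
small userspace model. Everything here is hypothetical (``struct sw_model``,
``model_phy_read()``, ``model_phy_write()``, the register array), and
``-ENODEV`` is just one possible negative error code picked for the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 4
#define NUM_REGS  32

struct sw_model {
    uint32_t phy_present;               /* bitmask of ports with a PHY */
    uint16_t regs[NUM_PORTS][NUM_REGS]; /* fake per-port MDIO registers */
};

static bool phy_available(const struct sw_model *sw, int port, int regnum)
{
    if (port < 0 || port >= NUM_PORTS)
        return false;
    if (regnum < 0 || regnum >= NUM_REGS)
        return false;
    return sw->phy_present & (1u << port);
}

/* Reads of an unavailable PHY return 0xffff. */
int model_phy_read(const struct sw_model *sw, int port, int regnum)
{
    if (!phy_available(sw, port, regnum))
        return 0xffff;

    return sw->regs[port][regnum];
}

/* Writes to an unavailable PHY return a negative error code. */
int model_phy_write(struct sw_model *sw, int port, int regnum, uint16_t val)
{
    if (!phy_available(sw, port, regnum))
        return -ENODEV;

    sw->regs[port][regnum] = val;
    return 0;
}
```
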
6943c91d114SIoana CiorneiEthtool operations
6953c91d114SIoana Ciornei------------------
6963c91d114SIoana Ciornei
6973c91d114SIoana Ciornei- ``get_strings``: ethtool function used to query the driver's strings, will
6985a48b743SBjorn Helgaas  typically return statistics strings, private flags strings, etc.
6993c91d114SIoana Ciornei
7003c91d114SIoana Ciornei- ``get_ethtool_stats``: ethtool function used to query per-port statistics and
7013c91d114SIoana Ciornei  return their values. DSA overlays slave network devices general statistics:
7023c91d114SIoana Ciornei  RX/TX counters from the network device, with switch driver specific statistics
7033c91d114SIoana Ciornei  per port
7043c91d114SIoana Ciornei
7053c91d114SIoana Ciornei- ``get_sset_count``: ethtool function used to query the number of statistics items
7063c91d114SIoana Ciornei
7073c91d114SIoana Ciornei- ``get_wol``: ethtool function used to obtain Wake-on-LAN settings per-port, this
7085a48b743SBjorn Helgaas  function may for certain implementations also query the master network device
7093c91d114SIoana Ciornei  Wake-on-LAN settings if this interface needs to participate in Wake-on-LAN
7103c91d114SIoana Ciornei
- ``set_wol``: ethtool function used to configure Wake-on-LAN settings per-port,
  direct counterpart to ``get_wol`` with similar restrictions
7133c91d114SIoana Ciornei
7143c91d114SIoana Ciornei- ``set_eee``: ethtool function which is used to configure a switch port EEE (Green
7153c91d114SIoana Ciornei  Ethernet) settings, can optionally invoke the PHY library to enable EEE at the
7163c91d114SIoana Ciornei  PHY level if relevant. This function should enable EEE at the switch port MAC
7173c91d114SIoana Ciornei  controller and data-processing logic
7183c91d114SIoana Ciornei
7193c91d114SIoana Ciornei- ``get_eee``: ethtool function which is used to query a switch port EEE settings,
7203c91d114SIoana Ciornei  this function should return the EEE state of the switch port MAC controller
7213c91d114SIoana Ciornei  and data-processing logic as well as query the PHY for its currently configured
7223c91d114SIoana Ciornei  EEE settings
7233c91d114SIoana Ciornei
7243c91d114SIoana Ciornei- ``get_eeprom_len``: ethtool function returning for a given switch the EEPROM
7253c91d114SIoana Ciornei  length/size in bytes
7263c91d114SIoana Ciornei
7273c91d114SIoana Ciornei- ``get_eeprom``: ethtool function returning for a given switch the EEPROM contents
7283c91d114SIoana Ciornei
7293c91d114SIoana Ciornei- ``set_eeprom``: ethtool function writing specified data to a given switch EEPROM
7303c91d114SIoana Ciornei
7313c91d114SIoana Ciornei- ``get_regs_len``: ethtool function returning the register length for a given
7323c91d114SIoana Ciornei  switch
7333c91d114SIoana Ciornei
7343c91d114SIoana Ciornei- ``get_regs``: ethtool function returning the Ethernet switch internal register
7353c91d114SIoana Ciornei  contents. This function might require user-land code in ethtool to
7363c91d114SIoana Ciornei  pretty-print register values and registers
7373c91d114SIoana Ciornei
7383c91d114SIoana CiorneiPower management
7393c91d114SIoana Ciornei----------------
7403c91d114SIoana Ciornei
7413c91d114SIoana Ciornei- ``suspend``: function invoked by the DSA platform device when the system goes to
7423c91d114SIoana Ciornei  suspend, should quiesce all Ethernet switch activities, but keep ports
7433c91d114SIoana Ciornei  participating in Wake-on-LAN active as well as additional wake-up logic if
7443c91d114SIoana Ciornei  supported
7453c91d114SIoana Ciornei
7463c91d114SIoana Ciornei- ``resume``: function invoked by the DSA platform device when the system resumes,
7473c91d114SIoana Ciornei  should resume all Ethernet switch activities and re-configure the switch to be
7483c91d114SIoana Ciornei  in a fully active state
7493c91d114SIoana Ciornei
7503c91d114SIoana Ciornei- ``port_enable``: function invoked by the DSA slave network device ndo_open
7515a48b743SBjorn Helgaas  function when a port is administratively brought up, this function should
7525a48b743SBjorn Helgaas  fully enable a given switch port. DSA takes care of marking the port with
7533c91d114SIoana Ciornei  ``BR_STATE_BLOCKING`` if the port is a bridge member, or ``BR_STATE_FORWARDING`` if it
7543c91d114SIoana Ciornei  was not, and propagating these changes down to the hardware
7553c91d114SIoana Ciornei
7563c91d114SIoana Ciornei- ``port_disable``: function invoked by the DSA slave network device ndo_close
7575a48b743SBjorn Helgaas  function when a port is administratively brought down, this function should
7585a48b743SBjorn Helgaas  fully disable a given switch port. DSA takes care of marking the port with
7593c91d114SIoana Ciornei  ``BR_STATE_DISABLED`` and propagating changes to the hardware if this port is
7603c91d114SIoana Ciornei  disabled while being a bridge member
7613c91d114SIoana Ciornei
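The bridge state that DSA applies around ``port_enable`` and ``port_disable``
can be summarized as a tiny decision function. The enum below is a local
stand-in for the ``BR_STATE_*`` values and the function names are hypothetical;
the sketch only restates the rule given in the text.

```c
#include <stdbool.h>

/* Local stand-ins for the bridge port states named in the text. */
enum model_br_state {
    MODEL_BR_STATE_DISABLED,
    MODEL_BR_STATE_BLOCKING,
    MODEL_BR_STATE_FORWARDING,
};

/* State applied when a port is administratively brought up. */
enum model_br_state state_on_port_enable(bool is_bridge_member)
{
    return is_bridge_member ? MODEL_BR_STATE_BLOCKING
                            : MODEL_BR_STATE_FORWARDING;
}

/* State applied when a port is administratively brought down. */
enum model_br_state state_on_port_disable(void)
{
    return MODEL_BR_STATE_DISABLED;
}
```
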
7624e9d9bb6SVladimir OlteanAddress databases
7634e9d9bb6SVladimir Oltean-----------------
7644e9d9bb6SVladimir Oltean
7654e9d9bb6SVladimir OlteanSwitching hardware is expected to have a table for FDB entries, however not all
7664e9d9bb6SVladimir Olteanof them are active at the same time. An address database is the subset (partition)
7674e9d9bb6SVladimir Olteanof FDB entries that is active (can be matched by address learning on RX, or FDB
7684e9d9bb6SVladimir Olteanlookup on TX) depending on the state of the port. An address database may
7694e9d9bb6SVladimir Olteanoccasionally be called "FID" (Filtering ID) in this document, although the
7704e9d9bb6SVladimir Olteanunderlying implementation may choose whatever is available to the hardware.
7714e9d9bb6SVladimir Oltean
7724e9d9bb6SVladimir OlteanFor example, all ports that belong to a VLAN-unaware bridge (which is
7734e9d9bb6SVladimir Oltean*currently* VLAN-unaware) are expected to learn source addresses in the
7744e9d9bb6SVladimir Olteandatabase associated by the driver with that bridge (and not with other
7754e9d9bb6SVladimir OlteanVLAN-unaware bridges). During forwarding and FDB lookup, a packet received on a
7764e9d9bb6SVladimir OlteanVLAN-unaware bridge port should be able to find a VLAN-unaware FDB entry having
7774e9d9bb6SVladimir Olteanthe same MAC DA as the packet, which is present on another port member of the
7784e9d9bb6SVladimir Olteansame bridge. At the same time, the FDB lookup process must be able to not find
7794e9d9bb6SVladimir Olteanan FDB entry having the same MAC DA as the packet, if that entry points towards
7804e9d9bb6SVladimir Olteana port which is a member of a different VLAN-unaware bridge (and is therefore
7814e9d9bb6SVladimir Olteanassociated with a different address database).
7824e9d9bb6SVladimir Oltean
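One way to picture the isolation between address databases is an FDB whose
lookup key includes a database identifier. The following userspace sketch
assumes a flat table and a hypothetical ``db`` field (playing the role of a
FID); actual hardware may partition its FDB in a different way.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct fdb_entry {
    unsigned int db; /* hypothetical address database (FID) identifier */
    uint8_t mac[6];  /* MAC address the entry matches */
    int port;        /* destination port */
};

/*
 * Look up the destination port for `mac` within address database `db`.
 * Entries that belong to other databases are never matched, even when the
 * MAC address is identical. Returns -1 when there is no match.
 */
int fdb_lookup(const struct fdb_entry *tbl, size_t n, unsigned int db,
               const uint8_t mac[6])
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tbl[i].db == db && memcmp(tbl[i].mac, mac, 6) == 0)
            return tbl[i].port;

    return -1;
}
```

Two VLAN-unaware bridges that each learned the same MAC address keep their own
entry, and a packet received on a port of one bridge never finds the entry of
the other.
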
7834e9d9bb6SVladimir OlteanSimilarly, each VLAN of each offloaded VLAN-aware bridge should have an
7844e9d9bb6SVladimir Olteanassociated address database, which is shared by all ports which are members of
7854e9d9bb6SVladimir Olteanthat VLAN, but not shared by ports belonging to different bridges that are
7864e9d9bb6SVladimir Olteanmembers of the same VID.
7874e9d9bb6SVladimir Oltean
7884e9d9bb6SVladimir OlteanIn this context, a VLAN-unaware database means that all packets are expected to
7894e9d9bb6SVladimir Olteanmatch on it irrespective of VLAN ID (only MAC address lookup), whereas a
7904e9d9bb6SVladimir OlteanVLAN-aware database means that packets are supposed to match based on the VLAN
7914e9d9bb6SVladimir OlteanID from the classified 802.1Q header (or the pvid if untagged).
7924e9d9bb6SVladimir Oltean
7934e9d9bb6SVladimir OlteanAt the bridge layer, VLAN-unaware FDB entries have the special VID value of 0,
7944e9d9bb6SVladimir Olteanwhereas VLAN-aware FDB entries have non-zero VID values. Note that a
7954e9d9bb6SVladimir OlteanVLAN-unaware bridge may have VLAN-aware (non-zero VID) FDB entries, and a
7964e9d9bb6SVladimir OlteanVLAN-aware bridge may have VLAN-unaware FDB entries. As in hardware, the
7974e9d9bb6SVladimir Olteansoftware bridge keeps separate address databases, and offloads to hardware the
7984e9d9bb6SVladimir OlteanFDB entries belonging to these databases, through switchdev, asynchronously
7994e9d9bb6SVladimir Olteanrelative to the moment when the databases become active or inactive.
8004e9d9bb6SVladimir Oltean
8014e9d9bb6SVladimir OlteanWhen a user port operates in standalone mode, its driver should configure it to
8024e9d9bb6SVladimir Olteanuse a separate database called a port private database. This is different from
the databases described above, and should impede operation as a standalone port
8044e9d9bb6SVladimir Oltean(packet in, packet out to the CPU port) as little as possible. For example,
8054e9d9bb6SVladimir Olteanon ingress, it should not attempt to learn the MAC SA of ingress traffic, since
8064e9d9bb6SVladimir Olteanlearning is a bridging layer service and this is a standalone port, therefore
8074e9d9bb6SVladimir Olteanit would consume useless space. With no address learning, the port private
8084e9d9bb6SVladimir Olteandatabase should be empty in a naive implementation, and in this case, all
8094e9d9bb6SVladimir Olteanreceived packets should be trivially flooded to the CPU port.
8104e9d9bb6SVladimir Oltean
8114e9d9bb6SVladimir OlteanDSA (cascade) and CPU ports are also called "shared" ports because they service
multiple address databases, and the database that a packet should be associated
with is usually embedded in the DSA tag. This means that the CPU port may
8144e9d9bb6SVladimir Olteansimultaneously transport packets coming from a standalone port (which were
8154e9d9bb6SVladimir Olteanclassified by hardware in one address database), and from a bridge port (which
8164e9d9bb6SVladimir Olteanwere classified to a different address database).

Switch drivers which satisfy certain criteria are able to optimize the naive
configuration by removing the CPU port from the flooding domain of the switch,
and programming the hardware only with FDB entries pointing towards the CPU
port for the MAC addresses that software is known to be interested in.
Packets which do not match a known FDB entry will not be delivered to the CPU,
which saves the CPU cycles otherwise spent creating an skb just to drop it.
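
The effect of this optimization can be modeled in plain userspace C. The sketch
below is not kernel code: ``struct host_fdb``, ``host_fdb_add()`` and
``cpu_wants_frame()`` are hypothetical names used for illustration only, and the
table size is arbitrary. It shows that, once the CPU port is removed from the
flooding domain, only frames whose MAC DA matches an installed entry reach the
CPU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define HOST_FDB_SIZE 8   /* arbitrary capacity for this illustration */
#define MAC_LEN 6

/* Hypothetical model of the FDB entries that point towards the CPU port. */
struct host_fdb {
	unsigned char addr[HOST_FDB_SIZE][MAC_LEN];
	size_t count;
};

/* Install a host address; returns false when the table is full. */
bool host_fdb_add(struct host_fdb *fdb, const unsigned char *addr)
{
	if (fdb->count == HOST_FDB_SIZE)
		return false;
	memcpy(fdb->addr[fdb->count++], addr, MAC_LEN);
	return true;
}

/*
 * Decide whether a frame with destination address 'da' is delivered to the
 * CPU. With 'cpu_in_flood_domain' set (the naive configuration), unknown
 * destinations are flooded to the CPU; otherwise only known ones are.
 */
bool cpu_wants_frame(const struct host_fdb *fdb, const unsigned char *da,
		     bool cpu_in_flood_domain)
{
	for (size_t i = 0; i < fdb->count; i++)
		if (memcmp(fdb->addr[i], da, MAC_LEN) == 0)
			return true;
	return cpu_in_flood_domain;
}
```

In this model, a frame for an address that was never installed is delivered
only when ``cpu_in_flood_domain`` is true, which corresponds to the naive
configuration.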

DSA is able to perform host address filtering for the following kinds of
addresses:

- Primary unicast MAC addresses of ports (``dev->dev_addr``). These are
  associated with the port private database of the respective user port,
  and the driver is notified to install them through ``port_fdb_add`` towards
  the CPU port.

- Secondary unicast and multicast MAC addresses of ports (addresses added
  through ``dev_uc_add()`` and ``dev_mc_add()``). These are also associated
  with the port private database of the respective user port.

- Local/permanent bridge FDB entries (``BR_FDB_LOCAL``). These are the MAC
  addresses of the bridge ports, for which packets must be terminated locally
  and not forwarded. They are associated with the address database for that
  bridge.

- Static bridge FDB entries installed towards foreign (non-DSA) interfaces
  present in the same bridge as some DSA switch ports. These are also
  associated with the address database for that bridge.

- Dynamically learned FDB entries on foreign interfaces present in the same
  bridge as some DSA switch ports, only if ``ds->assisted_learning_on_cpu_port``
  is set to true by the driver. These are associated with the address database
  for that bridge.

For various operations detailed below, DSA provides a ``dsa_db`` structure
which can be of the following types:

- ``DSA_DB_PORT``: the FDB (or MDB) entry to be installed or deleted belongs to
  the port private database of user port ``db->dp``.
- ``DSA_DB_BRIDGE``: the entry belongs to one of the address databases of bridge
  ``db->bridge``. Separation between the VLAN-unaware database and the per-VID
  databases of this bridge is expected to be done by the driver.
- ``DSA_DB_LAG``: the entry belongs to the address database of LAG ``db->lag``.
  Note: ``DSA_DB_LAG`` is currently unused and may be removed in the future.
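
A driver typically has to translate the database it is handed into whatever
identifier its hardware uses to separate address tables. The following is a
minimal userspace sketch of such a translation, not the kernel's definition:
every ``model_`` name is hypothetical, the bridge number is assumed to be
one-based with 0 meaning that no isolation information is available, and the
FID layout (port private databases first, bridges after them) is an assumption
that real drivers are free to replace with their own.

```c
/* Hypothetical mirror of the three database types listed above. */
enum model_db_type {
	MODEL_DB_PORT,
	MODEL_DB_LAG,
	MODEL_DB_BRIDGE,
};

struct model_db {
	enum model_db_type type;
	union {
		int port_index;          /* valid for MODEL_DB_PORT */
		unsigned int lag_id;     /* valid for MODEL_DB_LAG */
		unsigned int bridge_num; /* valid for MODEL_DB_BRIDGE, one-based */
	} u;
};

/*
 * Map a database to a hardware FID, or return -1 if it cannot be offloaded.
 * Assumed layout: FIDs [0, num_ports) are the port private databases and
 * FIDs from num_ports upwards belong to bridges.
 */
int model_db_to_fid(const struct model_db *db, int num_ports)
{
	switch (db->type) {
	case MODEL_DB_PORT:
		if (db->u.port_index < 0 || db->u.port_index >= num_ports)
			return -1;
		return db->u.port_index;
	case MODEL_DB_BRIDGE:
		if (db->u.bridge_num == 0)   /* 0: no isolation information */
			return -1;
		return num_ports + (int)db->u.bridge_num - 1;
	case MODEL_DB_LAG:
	default:
		return -1;                   /* not handled in this sketch */
	}
}
```

With four ports in this model, the private database of port 2 maps to FID 2 and
the first bridge maps to FID 4. Splitting a bridge further into VLAN-unaware
and per-VID tables is left out of the sketch.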

The drivers which act upon the ``dsa_db`` argument in ``port_fdb_add``,
``port_mdb_add`` etc. should declare ``ds->fdb_isolation`` as true.

DSA associates each offloaded bridge and each offloaded LAG with a one-based ID
(``struct dsa_bridge :: num``, ``struct dsa_lag :: id``) for the purposes of
refcounting addresses on shared ports. Drivers may piggyback on DSA's numbering
scheme (the ID is readable through ``db->bridge.num`` and ``db->lag.id``) or may
implement their own.

Only the drivers which declare support for FDB isolation are notified of FDB
entries on the CPU port belonging to ``DSA_DB_PORT`` databases.
For compatibility/legacy reasons, ``DSA_DB_BRIDGE`` addresses are notified to
drivers even if they do not support FDB isolation. However, ``db->bridge.num``
and ``db->lag.id`` are always set to 0 in that case (to denote the lack of
isolation, for refcounting purposes).

Note that it is not mandatory for a switch driver to implement physically
separate address databases for each standalone user port. Since FDB entries in
the port private databases will always point to the CPU port, there is no risk
of incorrect forwarding decisions. In this case, all standalone ports may
share the same database, but the reference counting of host-filtered addresses
(not deleting the FDB entry for a port's MAC address if it's still in use by
another port) becomes the responsibility of the driver, because DSA is unaware
that the port databases are in fact shared. This can be achieved by calling
``dsa_fdb_present_in_other_db()`` and ``dsa_mdb_present_in_other_db()``.
The downside is that the RX filtering lists of each user port are in fact
shared, which means that user port A may accept a packet with a MAC DA it
shouldn't have, only because that MAC address was in the RX filtering list of
user port B. These packets will still be dropped in software, however.
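
The reference counting burden described above can be illustrated with a small
userspace model. It does not reproduce the signatures of the DSA helpers named
above; ``struct shared_fdb``, ``shared_fdb_add()``, ``shared_fdb_del()`` and
``addr_present_in_other_db()`` are hypothetical stand-ins. The point of the
sketch is only that the hardware entry is removed when the last port private
database referencing the address lets go of it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6
#define MAX_REFS 16   /* arbitrary capacity for this illustration */

/* One host-filtered address as requested by one port private database. */
struct addr_ref {
	unsigned char addr[MAC_LEN];
	int db_port;   /* index of the user port owning the database */
	bool used;
};

struct shared_fdb {
	struct addr_ref refs[MAX_REFS];
};

/* Does any database other than that of 'db_port' still reference 'addr'? */
bool addr_present_in_other_db(const struct shared_fdb *fdb,
			      const unsigned char *addr, int db_port)
{
	for (size_t i = 0; i < MAX_REFS; i++) {
		const struct addr_ref *r = &fdb->refs[i];

		if (r->used && r->db_port != db_port &&
		    memcmp(r->addr, addr, MAC_LEN) == 0)
			return true;
	}
	return false;
}

/* Record a reference; returns false when the table is full. */
bool shared_fdb_add(struct shared_fdb *fdb, const unsigned char *addr,
		    int db_port)
{
	for (size_t i = 0; i < MAX_REFS; i++) {
		struct addr_ref *r = &fdb->refs[i];

		if (!r->used) {
			memcpy(r->addr, addr, MAC_LEN);
			r->db_port = db_port;
			r->used = true;
			return true;
		}
	}
	return false;
}

/*
 * Drop the reference held by 'db_port'. Returns true when the hardware
 * entry may be removed, i.e. no other port database still needs it.
 */
bool shared_fdb_del(struct shared_fdb *fdb, const unsigned char *addr,
		    int db_port)
{
	for (size_t i = 0; i < MAX_REFS; i++) {
		struct addr_ref *r = &fdb->refs[i];

		if (r->used && r->db_port == db_port &&
		    memcmp(r->addr, addr, MAC_LEN) == 0) {
			r->used = false;
			break;
		}
	}
	return !addr_present_in_other_db(fdb, addr, db_port);
}
```

If ports 0 and 1 both install the same address in this model, deleting it on
behalf of port 0 reports that the hardware entry must stay, and only the later
deletion on behalf of port 1 allows its removal.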

Bridge layer
------------

Offloading the bridge forwarding plane is optional and handled by the methods
below. They may be absent, return -EOPNOTSUPP, or ``ds->max_num_bridges`` may
be non-zero and exceeded, and in this case, joining a bridge port is still
possible, but the packet forwarding will take place in software, and the ports
under a software bridge must remain configured in the same way as for
standalone operation, i.e. have all bridging service functions (address
learning etc) disabled, and send all received packets to the CPU port only.

Concretely, a port starts offloading the forwarding plane of a bridge once its
driver returns success from the ``port_bridge_join`` method, and stops doing so
after ``port_bridge_leave`` has been called. Offloading the bridge means
autonomously learning FDB entries in accordance with the software bridge port's
state, and autonomously forwarding (or flooding) received packets without CPU
intervention. This is optional even when offloading a bridge port. Tagging
protocol drivers are expected to call ``dsa_default_offload_fwd_mark(skb)`` for
packets which have already been autonomously forwarded in the forwarding domain
of the ingress switch port. DSA, through ``dsa_port_devlink_setup()``,
considers all switch ports part of the same tree ID to be part of the same
bridge forwarding domain (capable of autonomous forwarding to each other).

Offloading the TX forwarding process of a bridge is a distinct concept from
simply offloading its forwarding plane, and refers to the ability of certain
driver and tag protocol combinations to transmit a single skb coming from the
bridge device's transmit function to potentially multiple egress ports (and
thereby avoid its cloning in software).

Packets for which the bridge requests this behavior are called data plane
packets and have ``skb->offload_fwd_mark`` set to true in the tag protocol
driver's ``xmit`` function. Data plane packets are subject to FDB lookup,
hardware learning on the CPU port, and do not override the port STP state.
Additionally, replication of data plane packets (multicast, flooding) is
handled in hardware and the bridge driver will transmit a single skb for each
packet that may or may not need replication.

When the TX forwarding offload is enabled, the tag protocol driver is
responsible for injecting packets into the data plane of the hardware towards
the correct bridging domain (FID) that the port is a part of. The port may be
VLAN-unaware, and in this case the FID must be equal to the FID used by the
driver for its VLAN-unaware address database associated with that bridge.
Alternatively, the bridge may be VLAN-aware, and in that case, it is guaranteed
that the packet is also VLAN-tagged with the VLAN ID that the bridge processed
this packet in. It is the responsibility of the hardware to untag the VID on
the egress-untagged ports, or keep the tag on the egress-tagged ones.

- ``port_bridge_join``: bridge layer function invoked when a given switch port is
  added to a bridge. This function should do what's necessary at the switch
  level to permit the joining port to be added to the relevant logical
  domain for it to ingress/egress traffic with other members of the bridge.
  By setting the ``tx_fwd_offload`` argument to true, the TX forwarding process
  of this bridge is also offloaded.

- ``port_bridge_leave``: bridge layer function invoked when a given switch port is
  removed from a bridge. This function should do what's necessary at the
  switch level to prevent the leaving port from exchanging ingress/egress
  traffic with the remaining bridge members.

- ``port_stp_state_set``: bridge layer function invoked when a given switch port STP
  state is computed by the bridge layer and should be propagated to switch
  hardware to forward/block/learn traffic.

- ``port_bridge_flags``: bridge layer function invoked when a port must
  configure its settings for e.g. flooding of unknown traffic or source address
  learning. The switch driver is responsible for initial setup of the
  standalone ports with address learning disabled and egress flooding of all
  types of traffic, then the DSA core notifies of any change to the bridge port
  flags when the port joins and leaves a bridge. DSA does not currently manage
  the bridge port flags for the CPU port. The assumption is that address
  learning should be statically enabled (if supported by the hardware) on the
  CPU port, and flooding towards the CPU port should also be enabled, due to a
  lack of an explicit address filtering mechanism in the DSA core.

- ``port_fast_age``: bridge layer function invoked when flushing the
  dynamically learned FDB entries on the port is necessary. This is called when
  transitioning from an STP state where learning should take place to an STP
  state where it shouldn't, or when leaving a bridge, or when address learning
  is turned off via ``port_bridge_flags``.
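
The STP-driven trigger for ``port_fast_age`` can be written down as a small
predicate. The sketch below is a userspace model with hypothetical ``model_``
names, not the DSA core's code; it assumes that learning takes place in the
Learning and Forwarding states only, and it covers just the STP transition,
not the other two triggers (leaving a bridge, learning being turned off).

```c
#include <stdbool.h>

/* Hypothetical mirror of the bridge STP port states. */
enum model_stp_state {
	MODEL_STP_DISABLED,
	MODEL_STP_LISTENING,
	MODEL_STP_LEARNING,
	MODEL_STP_FORWARDING,
	MODEL_STP_BLOCKING,
};

/* Learning takes place in the Learning and Forwarding states only. */
bool model_stp_state_learns(enum model_stp_state s)
{
	return s == MODEL_STP_LEARNING || s == MODEL_STP_FORWARDING;
}

/* Flush dynamic entries when moving from a learning to a non-learning state. */
bool model_needs_fast_age(enum model_stp_state old_state,
			  enum model_stp_state new_state)
{
	return model_stp_state_learns(old_state) &&
	       !model_stp_state_learns(new_state);
}
```

In this model, a transition from Forwarding to Blocking requires a flush, while
a transition from Learning to Forwarding does not.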

Bridge VLAN filtering
---------------------

- ``port_vlan_filtering``: bridge layer function invoked when the bridge gets
  configured for turning on or off VLAN filtering. If nothing specific needs to
  be done at the hardware level, this callback does not need to be implemented.
  When VLAN filtering is turned on, the hardware must be programmed to reject
  802.1Q frames which have VLAN IDs outside of the programmed allowed VLAN ID
  map/rules. If there is no PVID programmed into the switch port, untagged
  frames must be rejected as well. When turned off, the switch must accept any
  802.1Q frames irrespective of their VLAN ID, and untagged frames are
  allowed.

- ``port_vlan_add``: bridge layer function invoked when a VLAN is configured
  (tagged or untagged) for the given switch port. The CPU port becomes a member
  of a VLAN only if a foreign bridge port is also a member of it (and
  forwarding needs to take place in software), or the VLAN is installed to the
  VLAN group of the bridge device itself, for termination purposes
  (``bridge vlan add dev br0 vid 100 self``). VLANs on shared ports are
  reference counted and removed when there is no user left. Drivers do not need
  to manually install a VLAN on the CPU port.

- ``port_vlan_del``: bridge layer function invoked when a VLAN is removed from the
  given switch port.

- ``port_fdb_add``: bridge layer function invoked when the bridge wants to install a
  Forwarding Database entry. The switch hardware should be programmed with the
  specified address in the specified VLAN ID in the forwarding database
  associated with this VLAN ID.

- ``port_fdb_del``: bridge layer function invoked when the bridge wants to remove a
  Forwarding Database entry. The switch hardware should be programmed to delete
  the specified MAC address from the specified VLAN ID if it was mapped into
  this port's forwarding database.

- ``port_fdb_dump``: bridge bypass function invoked by ``ndo_fdb_dump`` on the
  physical DSA port interfaces. Since DSA does not attempt to keep in sync its
  hardware FDB entries with the software bridge, this method is implemented as
  a means to view the entries visible on user ports in the hardware database.
  The entries reported by this function have the ``self`` flag in the output of
  the ``bridge fdb show`` command.

- ``port_mdb_add``: bridge layer function invoked when the bridge wants to install
  a multicast database entry. The switch hardware should be programmed with the
  specified address in the specified VLAN ID in the forwarding database
  associated with this VLAN ID.

- ``port_mdb_del``: bridge layer function invoked when the bridge wants to remove a
  multicast database entry. The switch hardware should be programmed to delete
  the specified MAC address from the specified VLAN ID if it was mapped into
  this port's forwarding database.
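
The ingress behavior required by ``port_vlan_filtering`` above can be sketched
as a decision function. This is a userspace model under stated assumptions, not
driver code: ``struct model_port_vlan``, ``model_vlan_allow()`` and
``model_vlan_accept()`` are hypothetical, and the allowed VLAN ID map is
represented as a plain bitmap.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_N_VID 4096   /* size of the 12-bit VLAN ID space */

/* Hypothetical per-port VLAN state used only for this illustration. */
struct model_port_vlan {
	bool filtering;                    /* VLAN filtering enabled? */
	bool has_pvid;                     /* is a PVID programmed? */
	uint8_t member[MODEL_N_VID / 8];   /* bitmap of allowed VLAN IDs */
};

/* Add 'vid' to the allowed VLAN ID map of the port. */
void model_vlan_allow(struct model_port_vlan *p, uint16_t vid)
{
	if (vid >= MODEL_N_VID)
		return;
	p->member[vid / 8] |= (uint8_t)(1u << (vid % 8));
}

/* Ingress decision for a frame that is untagged or 802.1Q-tagged with 'vid'. */
bool model_vlan_accept(const struct model_port_vlan *p, bool tagged,
		       uint16_t vid)
{
	if (!p->filtering)
		return true;            /* any VID and untagged frames pass */
	if (!tagged)
		return p->has_pvid;     /* untagged frames need a PVID */
	if (vid >= MODEL_N_VID)
		return false;
	return (p->member[vid / 8] & (1u << (vid % 8))) != 0;
}
```

With filtering on, no PVID and only VLAN 100 allowed, this model accepts frames
tagged with VID 100 and rejects both frames tagged with VID 200 and untagged
frames; with filtering off, all of them are accepted.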

Link aggregation
----------------

Link aggregation is implemented in the Linux networking stack by the bonding
and team drivers, which are modeled as virtual, stackable network interfaces.
DSA is capable of offloading a link aggregation group (LAG) to hardware that
supports the feature, and supports bridging between physical ports and LAGs,
as well as between LAGs. A bonding/team interface which holds multiple physical
ports constitutes a logical port, although DSA has no explicit concept of a
logical port at the moment. Due to this, events where a LAG joins/leaves a
bridge are treated as if all individual physical ports that are members of that
LAG join/leave the bridge. Switchdev port attributes (VLAN filtering, STP
state, etc) and objects (VLANs, MDB entries) offloaded to a LAG as bridge port
are treated similarly: DSA offloads the same switchdev object / port attribute
on all members of the LAG. Static bridge FDB entries on a LAG are not yet
supported, since the DSA driver API does not have the concept of a logical port
ID.

- ``port_lag_join``: function invoked when a given switch port is added to a
  LAG. The driver may return ``-EOPNOTSUPP``, and in this case, DSA will fall
  back to a software implementation where all traffic from this port is sent to
  the CPU.
- ``port_lag_leave``: function invoked when a given switch port leaves a LAG
  and returns to operation as a standalone port.
- ``port_lag_change``: function invoked when the link state of any member of
  the LAG changes, and the hashing function needs rebalancing to only make use
  of the subset of physical LAG member ports that are up.
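
The rebalancing that ``port_lag_change`` asks for can be pictured as
redistributing hash buckets over the members whose link is up. The sketch below
is a userspace model only: ``model_lag_rebalance()`` and ``MODEL_LAG_BUCKETS``
are hypothetical, and the round-robin bucket assignment is an assumption, since
how hardware spreads flows over LAG members is device specific.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LAG_BUCKETS 8   /* arbitrary number of hash buckets */

/*
 * Assign every hash bucket to a member port that is up, round-robin.
 * 'ports' and 'up' have 'n' entries. Returns the number of active members;
 * when it is 0, every bucket is set to -1 (no egress port available).
 */
size_t model_lag_rebalance(const int *ports, const bool *up, size_t n,
			   int buckets[MODEL_LAG_BUCKETS])
{
	int active[MODEL_LAG_BUCKETS] = { 0 };
	size_t num_active = 0;

	/* Collect the members whose link is up. */
	for (size_t i = 0; i < n && num_active < MODEL_LAG_BUCKETS; i++)
		if (up[i])
			active[num_active++] = ports[i];

	/* Spread the buckets over the active members only. */
	for (size_t b = 0; b < MODEL_LAG_BUCKETS; b++)
		buckets[b] = num_active ? active[b % num_active] : -1;

	return num_active;
}
```

For a LAG made of ports 1, 2 and 3 where port 2 has lost its link, this model
leaves two active members and no bucket refers to port 2 afterwards.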

Drivers that benefit from having an ID associated with each offloaded LAG
can optionally populate ``ds->num_lag_ids`` from the ``dsa_switch_ops::setup``
method. The LAG ID associated with a bonding/team interface can then be
retrieved by a DSA switch driver using the ``dsa_lag_id`` function.

IEC 62439-2 (MRP)
-----------------

The Media Redundancy Protocol is a topology management protocol optimized for
fast fault recovery time for ring networks, which has some components
implemented as a function of the bridge driver. MRP uses management PDUs
(Test, Topology, LinkDown/Up, Option) sent at a multicast destination MAC
address range of 01:15:4e:00:00:0x and with an EtherType of 0x88e3.
Depending on the node's role in the ring (MRM: Media Redundancy Manager,
MRC: Media Redundancy Client, MRA: Media Redundancy Automanager), certain MRP
PDUs might need to be terminated locally and others might need to be forwarded.
An MRM might also benefit from offloading to hardware the creation and
transmission of certain MRP PDUs (Test).
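
Recognizing an MRP PDU from the two properties quoted above (destination MAC
range and EtherType) can be sketched as follows. This is a userspace
illustration with hypothetical ``model_`` names; it assumes an untagged
Ethernet frame, so a VLAN-tagged frame would first need its 802.1Q header
skipped, which the sketch does not do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETH_HLEN   14       /* DA (6) + SA (6) + EtherType (2) */
#define MODEL_ETH_P_MRP  0x88e3   /* EtherType quoted in the text above */

/* Does an untagged Ethernet frame look like an MRP PDU? */
bool model_is_mrp_pdu(const uint8_t *frame, size_t len)
{
	static const uint8_t prefix[5] = { 0x01, 0x15, 0x4e, 0x00, 0x00 };
	uint16_t ethertype;

	if (len < MODEL_ETH_HLEN)
		return false;

	/* 01:15:4e:00:00:0x: five fixed bytes, high nibble of the last is 0 */
	if (memcmp(frame, prefix, sizeof(prefix)) != 0 || (frame[5] & 0xf0))
		return false;

	/* The EtherType is carried in network byte order after DA and SA. */
	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	return ethertype == MODEL_ETH_P_MRP;
}
```

A classifier of this kind only identifies candidate frames; whether a given PDU
is then trapped to software or forwarded depends on the ring role.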

Normally an MRP instance can be created on top of any network interface;
however, in the case of a device with an offloaded data path such as DSA, it is
necessary for the hardware, even if it is not MRP-aware, to be able to extract
the MRP PDUs from the fabric before the driver can proceed with the software
implementation. DSA today has no driver which is MRP-aware, therefore it only
listens for the bare minimum switchdev objects required for the software assist
to work properly. The operations are detailed below.

- ``port_mrp_add`` and ``port_mrp_del``: notify the driver when an MRP instance
  with a certain ring ID, priority, primary port and secondary port is
  created/deleted.
- ``port_mrp_add_ring_role`` and ``port_mrp_del_ring_role``: function invoked
  when an MRP instance changes ring roles between MRM and MRC. This affects
  which MRP PDUs should be trapped to software and which should be autonomously
  forwarded.

IEC 62439-3 (HSR/PRP)
---------------------

The Parallel Redundancy Protocol (PRP) is a network redundancy protocol which
works by duplicating and sequence numbering packets through two independent L2
networks (which are unaware of the PRP tail tags carried in the packets), and
eliminating the duplicates at the receiver. The High-availability Seamless
Redundancy (HSR) protocol is similar in concept, except all nodes that carry
the redundant traffic are aware of the fact that it is HSR-tagged (because HSR
uses a header with an EtherType of 0x892f) and are physically connected in a
ring topology. Both HSR and PRP use supervision frames for monitoring the
health of the network and for discovery of other nodes.

In Linux, both HSR and PRP are implemented in the hsr driver, which
instantiates a virtual, stackable network interface with two member ports.
The driver only implements the basic roles of DANH (Doubly Attached Node
implementing HSR) and DANP (Doubly Attached Node implementing PRP); the roles
of RedBox and QuadBox are not implemented (therefore, bridging an hsr network
interface with a physical switch port does not produce the expected result).

A driver which is capable of offloading certain functions of a DANP or DANH
should declare the corresponding netdev features as indicated by the
documentation at ``Documentation/networking/netdev-features.rst``.
Additionally, the following methods must be implemented:

- ``port_hsr_join``: function invoked when a given switch port is added to a
  DANP/DANH. The driver may return ``-EOPNOTSUPP`` and in this case, DSA will
  fall back to a software implementation where all traffic from this port is
  sent to the CPU.
- ``port_hsr_leave``: function invoked when a given switch port leaves a
  DANP/DANH and returns to normal operation as a standalone port.

TODO
====

Making SWITCHDEV and DSA converge towards a unified codebase
------------------------------------------------------------

SWITCHDEV properly takes care of abstracting the networking stack with offload
capable hardware, but does not enforce a strict switch device driver model. On
the other hand, DSA enforces a fairly strict device driver model, and deals
with most of the switch specifics. At some point we should envision a merger
between these two subsystems and get the best of both worlds.