============
Architecture
============

This document describes the **Distributed Switch Architecture (DSA)** subsystem
design principles, limitations, interactions with other subsystems, and how to
develop drivers for this subsystem as well as a TODO for developers interested
in joining the effort.

Design principles
=================

The Distributed Switch Architecture subsystem was primarily designed to
support Marvell Ethernet switches (MV88E6xxx, a.k.a. Link Street product
line) using Linux, but has since evolved to support other vendors as well.

The original philosophy behind this design was to allow unmodified Linux tools
such as bridge, iproute2 and ifconfig to work transparently, whether they
configure or query a switch port network device or a regular network device.

An Ethernet switch typically comprises multiple front-panel ports and one
or more CPU or management ports. The DSA subsystem currently relies on the
presence of a management port connected to an Ethernet controller capable of
receiving Ethernet frames from the switch. This is a very common setup for all
kinds of Ethernet switches found in Small Home and Office products: routers,
gateways, or even top-of-rack switches. This host Ethernet controller will
be later referred to as "master" and "cpu" in DSA terminology and code.

The D in DSA stands for Distributed, because the subsystem has been designed
with the ability to configure and manage cascaded switches on top of each other
using upstream and downstream Ethernet links between switches. These specific
ports are referred to as "dsa" ports in DSA terminology and code. A collection
of multiple switches connected to each other is called a "switch tree".

For each front-panel port, DSA creates specialized network devices which are
used as controlling and data-flowing endpoints for use by the Linux networking
stack. These specialized network interfaces are referred to as "slave" network
interfaces in DSA terminology and code.

The ideal case for using DSA is when an Ethernet switch supports a "switch tag"
which is a hardware feature making the switch insert a specific tag for each
Ethernet frame it receives to/from specific ports to help the management
interface figure out:

- what port is this frame coming from
- what was the reason why this frame got forwarded
- how to send CPU originated traffic to specific ports

The subsystem does support switches not capable of inserting/stripping tags, but
the features might be slightly limited in that case (traffic separation relies
on Port-based VLAN IDs).

Note that DSA does not currently create network interfaces for the "cpu" and
"dsa" ports because:

- the "cpu" port is the Ethernet switch facing side of the management
  controller, and as such, would create a duplication of features, since you
  would get two interfaces for the same conduit: master netdev, and "cpu" netdev

- the "dsa" port(s) are just conduits between two or more switches, and as such
  cannot really be used as proper network interfaces either, only the
  downstream, or the top-most upstream interface makes sense with that model

Switch tagging protocols
------------------------

DSA supports many vendor-specific tagging protocols, one software-defined
tagging protocol, and a tag-less mode as well (``DSA_TAG_PROTO_NONE``).

The exact format of the tag protocol is vendor specific, but in general, they
all contain something which:

- identifies which port the Ethernet frame came from/should be sent to
- provides a reason why this frame was forwarded to the management interface

All tagging protocols are in ``net/dsa/tag_*.c`` files and implement the
methods of the ``struct dsa_device_ops`` structure, which are detailed below.

Tagging protocols generally fall in one of three categories:

1. The switch-specific frame header is located before the Ethernet header,
   shifting to the right (from the perspective of the DSA master's frame
   parser) the MAC DA, MAC SA, EtherType and the entire L2 payload.
2. The switch-specific frame header is located before the EtherType, keeping
   the MAC DA and MAC SA in place from the DSA master's perspective, but
   shifting the 'real' EtherType and L2 payload to the right.
3. The switch-specific frame header is located at the tail of the packet,
   keeping all frame headers in place and not altering the view of the packet
   that the DSA master's frame parser has.

A tagging protocol may tag all packets with switch tags of the same length, or
the tag length might vary (for example packets with PTP timestamps might
require an extended switch tag, or there might be one tag length on TX and a
different one on RX). Either way, the tagging protocol driver must populate the
``struct dsa_device_ops::needed_headroom`` and/or ``struct dsa_device_ops::needed_tailroom``
with the length in octets of the longest switch frame header/trailer. The DSA
framework will automatically adjust the MTU of the master interface to
accommodate this extra size in order for DSA user ports to support the
standard MTU (L2 payload length) of 1500 octets. The ``needed_headroom`` and
``needed_tailroom`` properties are also used to request from the network stack,
on a best-effort basis, the allocation of packets with enough extra space such
that the act of pushing the switch tag on transmission of a packet does not
cause it to reallocate due to lack of memory.

Even though applications are not expected to parse DSA-specific frame headers,
the format on the wire of the tagging protocol represents an Application Binary
Interface exposed by the kernel towards user space, for decoders such as
``libpcap``. The tagging protocol driver must populate the ``proto`` member of
``struct dsa_device_ops`` with a value that uniquely describes the
characteristics of the interaction required between the switch hardware and the
data path driver: the offset of each bit field within the frame header and any
stateful processing required to deal with the frames (as may be required for
PTP timestamping).

From the perspective of the network stack, all switches within the same DSA
switch tree use the same tagging protocol. In case of a packet transiting a
fabric with more than one switch, the switch-specific frame header is inserted
by the first switch in the fabric that the packet was received on. This header
typically contains information regarding its type (whether it is a control
frame that must be trapped to the CPU, or a data frame to be forwarded).
Control frames should be decapsulated only by the software data path, whereas
data frames might also be autonomously forwarded towards other user ports of
other switches from the same fabric, and in this case, the outermost switch
ports must decapsulate the packet.

Note that in certain cases, the tagging format used by a leaf switch (not
connected directly to the CPU) is not the same as what the network stack sees.
This can be seen with Marvell switch trees, where the
CPU port can be configured to use either the DSA or the Ethertype DSA (EDSA)
format, but the DSA links are configured to use the shorter (without Ethertype)
DSA frame header, in order to reduce the autonomous packet forwarding overhead.
It still remains the case that, if the DSA switch tree is configured for the
EDSA tagging protocol, the operating system sees EDSA-tagged packets from the
leaf switches that tagged them with the shorter DSA header. This can be done
because the Marvell switch connected directly to the CPU is configured to
perform tag translation between DSA and EDSA (which is simply the operation of
adding or removing the ``ETH_P_EDSA`` EtherType and some padding octets).

It is possible to construct cascaded setups of DSA switches even if their
tagging protocols are not compatible with one another. In this case, there are
no DSA links in this fabric, and each switch constitutes a disjoint DSA switch
tree. The DSA links are viewed as simply a pair of a DSA master (the out-facing
port of the upstream DSA switch) and a CPU port (the in-facing port of the
downstream DSA switch).

The tagging protocol of the attached DSA switch tree can be viewed through the
``dsa/tagging`` sysfs attribute of the DSA master::

    cat /sys/class/net/eth0/dsa/tagging

If the hardware and driver are capable, the tagging protocol of the DSA switch
tree can be changed at runtime. This is done by writing the new tagging
protocol name to the same sysfs device attribute as above (the DSA master and
all attached switch ports must be down while doing this).

It is desirable that all tagging protocols are testable with the ``dsa_loop``
mockup driver, which can be attached to any network interface. The goal is that
any network interface should be capable of transmitting the same packet in the
same way, and the tagger should decode the same received packet in the same way
regardless of the driver used for the switch control path, and the driver used
for the DSA master.

The transmission of a packet goes through the tagger's ``xmit`` function.
The passed ``struct sk_buff *skb`` has ``skb->data`` pointing at
``skb_mac_header(skb)``, i.e. at the destination MAC address, and the passed
``struct net_device *dev`` represents the virtual DSA user network interface
whose hardware counterpart the packet must be steered to (e.g. ``swp0``).
The job of this method is to prepare the skb in a way that the switch will
understand what egress port the packet is for (and not deliver it towards other
ports). Typically this is fulfilled by pushing a frame header. Checking for
insufficient size in the skb headroom or tailroom is unnecessary provided that
the ``needed_headroom`` and ``needed_tailroom`` properties were filled out
properly, because DSA ensures there is enough space before calling this method.

The reception of a packet goes through the tagger's ``rcv`` function. The
passed ``struct sk_buff *skb`` has ``skb->data`` pointing at
``skb_mac_header(skb) + ETH_HLEN`` octets, i.e. to where the first octet after
the EtherType would have been, were this frame not tagged. The role of this
method is to consume the frame header, adjust ``skb->data`` to really point at
the first octet after the EtherType, and to change ``skb->dev`` to point to the
virtual DSA user network interface corresponding to the physical front-facing
switch port that the packet was received on.
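
The push and strip operations described above can be sketched in user space for
a category 2 tagger. This is only an illustration under stated assumptions: the
4-octet tag layout, the ``demo_skb`` structure and all ``demo_*`` names are
hypothetical and do not correspond to any vendor format or kernel API. Unlike
the real ``rcv`` method, the sketch keeps ``data`` pointing at the MAC DA on
reception to stay short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN       6    /* octets in one MAC address */
#define DEMO_TAG_LEN   4    /* hypothetical tag size, not a vendor format */
#define DEMO_HEADROOM  16   /* spare octets reserved in front of the frame */
#define DEMO_MAX_FRAME 128

/* Minimal stand-in for struct sk_buff: "data" is an offset into "buf". */
struct demo_skb {
    uint8_t buf[DEMO_HEADROOM + DEMO_MAX_FRAME];
    size_t data;    /* offset of the first frame octet (the MAC DA) */
    size_t len;     /* frame length in octets */
};

void demo_skb_init(struct demo_skb *skb, const uint8_t *frame, size_t len)
{
    assert(len >= 2 * ETH_ALEN + 2 && len <= DEMO_MAX_FRAME);
    memset(skb->buf, 0, sizeof(skb->buf));
    skb->data = DEMO_HEADROOM;
    skb->len = len;
    memcpy(skb->buf + skb->data, frame, len);
}

/* TX: open a gap after MAC DA/SA and write the egress port into it. */
void demo_tag_xmit(struct demo_skb *skb, uint8_t port)
{
    uint8_t *tag;

    assert(skb->data >= DEMO_TAG_LEN);  /* headroom was reserved up front */
    skb->data -= DEMO_TAG_LEN;
    skb->len += DEMO_TAG_LEN;
    /* Move both MAC addresses to the new start of the frame. */
    memmove(skb->buf + skb->data,
            skb->buf + skb->data + DEMO_TAG_LEN, 2 * ETH_ALEN);

    tag = skb->buf + skb->data + 2 * ETH_ALEN;
    tag[0] = 0x01;  /* hypothetical "from CPU" marker */
    tag[1] = port;  /* egress port the switch should use */
    tag[2] = 0;
    tag[3] = 0;
}

/* RX: read the source port, then close the gap. Returns the port or -1. */
int demo_tag_rcv(struct demo_skb *skb)
{
    const uint8_t *tag;
    int port;

    if (skb->len < 2 * ETH_ALEN + DEMO_TAG_LEN + 2)
        return -1;  /* too short to carry a tag and an EtherType */

    tag = skb->buf + skb->data + 2 * ETH_ALEN;
    port = tag[1];
    /* Move both MAC addresses right, overwriting the tag. */
    memmove(skb->buf + skb->data + DEMO_TAG_LEN,
            skb->buf + skb->data, 2 * ETH_ALEN);
    skb->data += DEMO_TAG_LEN;
    skb->len -= DEMO_TAG_LEN;
    return port;
}
```

With a 16-octet frame, ``demo_tag_xmit(&skb, 3)`` yields a 20-octet frame whose
original EtherType now sits at offset 16, and ``demo_tag_rcv(&skb)`` returns 3
and restores the original 16 octets.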

Since tagging protocols in category 1 and 2 break software (and most often also
hardware) packet dissection on the DSA master, features such as RPS (Receive
Packet Steering) on the DSA master would be broken. The DSA framework deals
with this by hooking into the flow dissector and shifting the offset at which
the IP header is to be found in the tagged frame as seen by the DSA master.
This behavior is automatic based on the ``needed_headroom`` value of the tagging
protocol. If not all packets are of equal size, the tagger can implement the
``flow_dissect`` method of the ``struct dsa_device_ops`` and override this
default behavior by specifying the correct offset incurred by each individual
RX packet. Tail taggers do not cause issues to the flow dissector.

Due to various reasons (most common being category 1 taggers being associated
with DSA-unaware masters, mangling what the master perceives as MAC DA), the
tagging protocol may require the DSA master to operate in promiscuous mode, to
receive all frames regardless of the value of the MAC DA. This can be done by
setting the ``promisc_on_master`` property of the ``struct dsa_device_ops``.
Note that this assumes a DSA-unaware master driver, which is the norm.
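
The offset shift applied for header taggers can be illustrated with a small
user-space sketch. It assumes a fixed-length tag placed in front of the real
EtherType; the function name and signature are hypothetical and are not the
kernel's flow dissector interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6   /* octets in one MAC address */
#define ETH_HLEN 14  /* octets in an untagged Ethernet header */

/*
 * Locate the real EtherType and the L3 header in a frame as seen by the
 * master. "headroom" is the number of tag octets placed in front of the
 * real EtherType (0 for tail taggers, which need no shift).
 * Returns 0 on success, -1 if the frame is too short.
 */
int demo_flow_dissect(const uint8_t *frame, size_t len, size_t headroom,
                      uint16_t *proto, size_t *l3_offset)
{
    size_t etype_off = 2 * ETH_ALEN + headroom;

    assert(frame != NULL && proto != NULL && l3_offset != NULL);
    if (len < etype_off + 2)
        return -1;

    /* The EtherType is carried in network byte order. */
    *proto = (uint16_t)((frame[etype_off] << 8) | frame[etype_off + 1]);
    *l3_offset = ETH_HLEN + headroom;
    return 0;
}
```

For a frame carrying a 4-octet tag before an IPv4 EtherType, the sketch reports
``0x0800`` and an L3 offset of 18 instead of the usual 14.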

Master network devices
----------------------

Master network devices are regular, unmodified Linux network device drivers for
the CPU/management Ethernet interface. Such a driver might occasionally need to
know whether DSA is enabled (e.g.: to enable/disable specific offload features),
but the DSA subsystem has been proven to work with industry standard drivers:
``e1000e``, ``mv643xx_eth`` etc. without having to introduce modifications to these
drivers. Such network devices are also often referred to as conduit network
devices since they act as a pipe between the host processor and the hardware
Ethernet switch.

Networking stack hooks
----------------------

When a master netdev is used with DSA, a small hook is placed in the
networking stack in order to have the DSA subsystem process the Ethernet
switch specific tagging protocol. DSA accomplishes this by registering a
specific (and fake) Ethernet type (later becoming ``skb->protocol``) with the
networking stack. This is also known as a ``ptype`` or ``packet_type``. A typical
Ethernet Frame receive sequence looks like this:

Master network device (e.g.: e1000e):

1. Receive interrupt fires:

   - receive function is invoked
   - basic packet processing is done: getting length, status etc.
   - packet is prepared to be processed by the Ethernet layer by calling
     ``eth_type_trans``

2. net/ethernet/eth.c::

       eth_type_trans(skb, dev)
               if (dev->dsa_ptr != NULL)
                       -> skb->protocol = ETH_P_XDSA

3. drivers/net/ethernet/\*::

       netif_receive_skb(skb)
               -> iterate over registered packet_type
                       -> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv()

4. net/dsa/dsa.c::

       -> dsa_switch_rcv()
               -> invoke switch tag specific protocol handler in 'net/dsa/tag_*.c'

5. net/dsa/tag_*.c:

   - inspect and strip switch tag protocol to determine originating port
   - locate per-port network device
   - invoke ``eth_type_trans()`` with the DSA slave network device
   - invoke ``netif_receive_skb()``

Past this point, the DSA slave network devices get delivered regular Ethernet
frames that can be processed by the networking stack.
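
Steps 2 and 3 of the sequence above can be modelled in user space as a protocol
override followed by a handler table lookup. This is a simplified sketch: every
``demo_*`` name is hypothetical, the handlers only report where the frame would
go, and the value ``0x00F8`` is assumed to mirror ``ETH_P_XDSA`` from
``include/uapi/linux/if_ether.h``.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_P_IP   0x0800
#define DEMO_ETH_P_XDSA 0x00F8  /* assumed to mirror the kernel's ETH_P_XDSA */

struct demo_netdev {
    const char *name;
    const void *dsa_ptr;    /* non-NULL once the device is a DSA master */
};

enum demo_verdict { DEMO_DROPPED, DEMO_TO_IP_STACK, DEMO_TO_DSA_TAGGER };

typedef enum demo_verdict (*demo_handler_t)(const uint8_t *frame, size_t len);

struct demo_packet_type {
    uint16_t type;
    demo_handler_t func;
};

static enum demo_verdict demo_ip_rcv(const uint8_t *frame, size_t len)
{
    (void)frame; (void)len;
    return DEMO_TO_IP_STACK;
}

static enum demo_verdict demo_dsa_switch_rcv(const uint8_t *frame, size_t len)
{
    (void)frame; (void)len;
    return DEMO_TO_DSA_TAGGER;  /* the tagger's rcv would run from here */
}

/* Stand-in for the list of registered packet_type handlers. */
static const struct demo_packet_type demo_ptypes[] = {
    { DEMO_ETH_P_IP,   demo_ip_rcv },
    { DEMO_ETH_P_XDSA, demo_dsa_switch_rcv },
};

/* Step 2: classify the frame; a DSA master overrides the EtherType. */
uint16_t demo_eth_type_trans(const uint8_t *frame, size_t len,
                             const struct demo_netdev *dev)
{
    assert(len >= 14);
    if (dev->dsa_ptr != NULL)
        return DEMO_ETH_P_XDSA;
    return (uint16_t)((frame[12] << 8) | frame[13]);
}

/* Step 3: hand the frame to the handler registered for its protocol. */
enum demo_verdict demo_netif_receive(const uint8_t *frame, size_t len,
                                     const struct demo_netdev *dev)
{
    uint16_t protocol = demo_eth_type_trans(frame, len, dev);
    size_t i;

    for (i = 0; i < sizeof(demo_ptypes) / sizeof(demo_ptypes[0]); i++)
        if (demo_ptypes[i].type == protocol)
            return demo_ptypes[i].func(frame, len);
    return DEMO_DROPPED;
}
```

The same IPv4 frame is routed to ``demo_ip_rcv()`` on a plain interface and to
``demo_dsa_switch_rcv()`` on an interface whose ``dsa_ptr`` is set, which is the
point of registering the fake Ethernet type.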

Slave network devices
---------------------

Slave network devices created by DSA are stacked on top of their master network
device. Each of these network interfaces is responsible for being a
controlling and data-flowing end-point for each front-panel port of the switch.
These interfaces are specialized in order to:

- insert/remove the switch tag protocol (if it exists) when sending traffic
  to/from specific switch ports
- query the switch for ethtool operations: statistics, link state,
  Wake-on-LAN, register dumps...
- manage external/internal PHY: link, auto-negotiation, etc.

These slave network devices have custom net_device_ops and ethtool_ops function
pointers which allow DSA to introduce a level of layering between the networking
stack/ethtool and the switch driver implementation.

Upon frame transmission from these slave network devices, DSA will look up which
switch tagging protocol is currently registered with these network devices and
invoke a specific transmit routine which takes care of adding the relevant
switch tag in the Ethernet frames.

These frames are then queued for transmission using the master network device
``ndo_start_xmit()`` function. Since they contain the appropriate switch tag, the
Ethernet switch will be able to process these incoming frames from the
management interface and deliver them to the physical switch port.

Graphical representation
------------------------

Summarized, this is basically what DSA looks like from a network device
perspective::

                Unaware application
              opens and binds socket
                       |  ^
                       |  |
           +-----------v--|--------------------+
           |+------+ +------+ +------+ +------+|
           || swp0 | | swp1 | | swp2 | | swp3 ||
           |+------+-+------+-+------+-+------+|
           |          DSA switch driver        |
           +-----------------------------------+
                         |        ^
            Tag added by |        | Tag consumed by
           switch driver |        | switch driver
                         v        |
           +-----------------------------------+
           | Unmodified host interface driver  | Software
   --------+-----------------------------------+------------
           |       Host interface (eth0)       | Hardware
           +-----------------------------------+
                         |        ^
         Tag consumed by |        | Tag added by
         switch hardware |        | switch hardware
                         v        |
           +-----------------------------------+
           |               Switch              |
           |+------+ +------+ +------+ +------+|
           || swp0 | | swp1 | | swp2 | | swp3 ||
           ++------+-+------+-+------+-+------++

Slave MDIO bus
--------------

In order to be able to read from/write to a PHY built into the switch, DSA
creates a slave MDIO bus which allows a specific switch driver to divert and
intercept MDIO reads/writes towards specific PHY addresses. In most MDIO-connected
switches, these functions would utilize direct or indirect PHY addressing mode
to return standard MII registers from the switch builtin PHYs, allowing the PHY
library to return link status, link partner pages, auto-negotiation
results, etc.

For Ethernet switches which have both external and internal MDIO buses, the
slave MII bus can be utilized to mux/demux MDIO reads and writes towards either
internal or external MDIO devices this switch might be connected to: internal
PHYs, external PHYs, or even external switches.
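
The diversion of MDIO reads towards specific PHY addresses can be sketched as a
bitmask check in front of a driver callback. This is a user-space illustration
only: the ``demo_switch`` structure, its field names and the link-up behaviour
of the callback are assumptions made for the example, not the DSA API.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MDIO_NO_DEVICE 0xffff  /* what an unanswered MDIO read returns */

struct demo_switch {
    uint32_t phys_mii_mask; /* bit N set: PHY address N is inside the switch */
    uint16_t (*phy_read)(const struct demo_switch *ds, int addr, int regnum);
};

/* Slave bus read: divert to the switch driver only for addresses it owns. */
uint16_t demo_slave_mii_read(const struct demo_switch *ds, int addr, int regnum)
{
    /* Clause 22 MDIO has 32 PHY addresses with 32 registers each. */
    assert(addr >= 0 && addr < 32 && regnum >= 0 && regnum < 32);

    if (ds->phys_mii_mask & (UINT32_C(1) << addr))
        return ds->phy_read(ds, addr, regnum);
    return DEMO_MDIO_NO_DEVICE;
}

/* Hypothetical driver hook: every internal PHY reports link up. */
uint16_t demo_phy_read(const struct demo_switch *ds, int addr, int regnum)
{
    (void)ds;
    (void)addr;
    /* Register 1 is the basic status register; bit 2 is link status. */
    return regnum == 1 ? 0x0004 : 0x0000;
}
```

With a mask of ``0x0000000f``, reads for PHY addresses 0 to 3 reach the driver
callback, while a read for address 5 comes back as ``0xffff``, as if no device
had answered.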

Data structures
---------------

DSA data structures are defined in ``include/net/dsa.h`` as well as
``net/dsa/dsa_priv.h``:

- ``dsa_chip_data``: platform data configuration for a given switch device,
  this structure describes a switch device's parent device, its address, as
  well as various properties of its ports: names/labels, and finally a routing
  table indication (when cascading switches)

- ``dsa_platform_data``: platform device configuration data which can reference
  a collection of ``dsa_chip_data`` structures if multiple switches are cascaded.
  The master network device this switch tree is attached to needs to be
  referenced

- ``dsa_switch_tree``: structure assigned to the master network device under
  ``dsa_ptr``. This structure references a ``dsa_platform_data`` structure as
  well as the tagging protocol supported by the switch tree, and which
  receive/transmit function hooks should be invoked. Information about the
  directly attached switch is also provided: CPU port. Finally, a collection of
  ``dsa_switch`` structures is referenced to address individual switches in the
  tree.

- ``dsa_switch``: structure describing a switch device in the tree, referencing
  a ``dsa_switch_tree`` as a backpointer, slave network devices, master network
  device, and a reference to the backing ``dsa_switch_ops``

- ``dsa_switch_ops``: structure referencing function pointers, see below for a
  full description.

Design limitations
==================

Lack of CPU/DSA network devices
-------------------------------

DSA does not currently create slave network devices for the CPU or DSA ports, as
described before. This might be an issue in the following cases:

- inability to fetch switch CPU port statistics counters using ethtool, which
  can make it harder to debug MDIO switches connected using xMII interfaces

- inability to configure the CPU port link parameters based on the Ethernet
  controller capabilities attached to it: http://patchwork.ozlabs.org/patch/509806/

- inability to configure specific VLAN IDs / trunking VLANs between switches
  when using a cascaded setup

Common pitfalls using DSA setups
--------------------------------

Once a master network device is configured to use DSA (dev->dsa_ptr becomes
non-NULL), and the switch behind it expects a tagging protocol, this network
interface can only be used as a conduit interface. Sending packets
directly through this interface (e.g.: opening a socket using this interface)
will not make us go through the switch tagging protocol transmit function, so
the Ethernet switch on the other end, expecting a tag, will typically drop this
frame.

Interactions with other subsystems
==================================

DSA currently leverages the following subsystems:

- MDIO/PHY library: ``drivers/net/phy/phy.c``, ``mdio_bus.c``
- Switchdev: ``net/switchdev/*``
- Device Tree for various of_* functions
- Devlink: ``net/core/devlink.c``

MDIO/PHY library
----------------

Slave network devices exposed by DSA may or may not be interfacing with PHY
devices (``struct phy_device`` as defined in ``include/linux/phy.h``), but the DSA
subsystem deals with all possible combinations:

- internal PHY devices, built into the Ethernet switch hardware
- external PHY devices, connected via an internal or external MDIO bus
- internal PHY devices, connected via an internal MDIO bus
- special, non-autonegotiated or non MDIO-managed PHY devices: SFPs, MoCA; a.k.a.
  fixed PHYs

The PHY configuration is done by the ``dsa_slave_phy_setup()`` function and the
logic basically looks like this:

- if Device Tree is used, the PHY device is looked up using the standard
  "phy-handle" property, if found, this PHY device is created and registered
  using ``of_phy_connect()``

- if Device Tree is used and the PHY device is "fixed", that is, conforms to
  the definition of a non-MDIO managed PHY as defined in
  ``Documentation/devicetree/bindings/net/fixed-link.txt``, the PHY is registered
  and connected transparently using the special fixed MDIO bus driver

- finally, if the PHY is built into the switch, as is very common with
  standalone switch packages, the PHY is probed using the slave MII bus created
  by DSA

SWITCHDEV
---------

DSA directly utilizes SWITCHDEV when interfacing with the bridge layer, and
more specifically with its VLAN filtering portion when configuring VLANs on top
of per-port slave network devices. As of today, the only SWITCHDEV objects
supported by DSA are the FDB and VLAN objects.

Devlink
-------

DSA registers one devlink device per physical switch in the fabric.
For each devlink device, every physical port (i.e. user ports, CPU ports, DSA
links or unused ports) is exposed as a devlink port.

DSA drivers can make use of the following devlink features:

- Regions: debugging feature which allows user space to dump driver-defined
  areas of hardware information in a low-level, binary format. Both global
  regions as well as per-port regions are supported. It is possible to export
  devlink regions even for pieces of data that are already exposed in some way
  to the standard iproute2 user space programs (ip-link, bridge), like address
  tables and VLAN tables. For example, this might be useful if the tables
  contain additional hardware-specific details which are not visible through
  the iproute2 abstraction, or it might be useful to inspect these tables on
  the non-user ports too, which are invisible to iproute2 because no network
  interface is registered for them.
- Params: a feature which enables users to configure certain low-level tunable
  knobs pertaining to the device. Drivers may implement applicable generic
  devlink params, or may add new device-specific devlink params.
- Resources: a monitoring feature which enables users to see the degree of
  utilization of certain hardware tables in the device, such as FDB, VLAN, etc.
- Shared buffers: a QoS feature for adjusting and partitioning memory and frame
  reservations per port and per traffic class, in the ingress and egress
  directions, such that low-priority bulk traffic does not impede the
  processing of high-priority critical traffic.

For more details, consult ``Documentation/networking/devlink/``.

Device Tree
-----------

DSA features a standardized binding which is documented in
``Documentation/devicetree/bindings/net/dsa/dsa.txt``. PHY/MDIO library helper
functions such as ``of_get_phy_mode()``, ``of_phy_connect()`` are also used to query
per-port PHY specific details: interface connection, MDIO bus location, etc.

Driver development
==================

DSA switch drivers need to implement a ``dsa_switch_ops`` structure which will
contain the various members described below.

``register_switch_driver()`` registers this ``dsa_switch_ops`` in its internal list
of drivers to probe for. ``unregister_switch_driver()`` does the exact opposite.

Unless requested differently by setting the ``priv_size`` member accordingly, DSA
does not allocate any driver private context space.

Switch configuration
--------------------

- ``tag_protocol``: this is to indicate what kind of tagging protocol is supported,
  should be a valid value from the ``dsa_tag_protocol`` enum

- ``probe``: probe routine which will be invoked by the DSA platform device upon
  registration to test for the presence/absence of a switch device. For MDIO
  devices, it is recommended to issue a read towards internal registers using
  the switch pseudo-PHY and return whether this is a supported device. For other
  buses, return a non-NULL string

- ``setup``: setup function for the switch, this function is responsible for setting
  up the ``dsa_switch_ops`` private structure with all it needs: register maps,
  interrupts, mutexes, locks, etc. This function is also expected to properly
  configure the switch to separate all network interfaces from each other, that
  is, they should be isolated by the switch hardware itself, typically by creating
  a Port-based VLAN ID for each port and allowing only the CPU port and the
  specific port to be in the forwarding vector. Ports that are unused by the
  platform should be disabled. Past this function, the switch is expected to be
  fully configured and ready to serve any kind of request.
  It is recommended
  to issue a software reset of the switch during this setup function in order to
  avoid relying on what a previous software agent such as a bootloader/firmware
  may have previously configured.

PHY devices and link management
-------------------------------

- ``get_phy_flags``: Some switches are interfaced to various kinds of Ethernet PHYs.
  If the PHY library PHY driver needs to know about information it cannot obtain
  on its own (e.g.: coming from switch memory mapped registers), this function
  should return a 32-bit bitmask of "flags" that is private between the switch
  driver and the Ethernet PHY driver in ``drivers/net/phy/*``.

- ``phy_read``: Function invoked by the DSA slave MDIO bus when attempting to read
  the switch port MDIO registers. If unavailable, return 0xffff for each read.
  For builtin switch Ethernet PHYs, this function should allow reading the link
  status, auto-negotiation results, link partner pages, etc.

- ``phy_write``: Function invoked by the DSA slave MDIO bus when attempting to write
  to the switch port MDIO registers. If unavailable, return a negative error
  code.

- ``adjust_link``: Function invoked by the PHY library when a slave network device
  is attached to a PHY device.
  This function is responsible for appropriately
  configuring the switch port link parameters: speed, duplex, pause based on
  what the ``phy_device`` is providing.

- ``fixed_link_update``: Function invoked by the PHY library, and specifically by
  the fixed PHY driver asking the switch driver for link parameters that could
  not be auto-negotiated, or obtained by reading the PHY registers through MDIO.
  This is particularly useful for specific kinds of hardware such as QSGMII,
  MoCA or other kinds of non-MDIO managed PHYs where out of band link
  information is obtained

Ethtool operations
------------------

- ``get_strings``: ethtool function used to query the driver's strings, will
  typically return statistics strings, private flags strings, etc.

- ``get_ethtool_stats``: ethtool function used to query per-port statistics and
  return their values.
  DSA overlays slave network devices general statistics:
  RX/TX counters from the network device, with switch driver specific statistics
  per port

- ``get_sset_count``: ethtool function used to query the number of statistics items

- ``get_wol``: ethtool function used to obtain Wake-on-LAN settings per-port, this
  function may for certain implementations also query the master network device
  Wake-on-LAN settings if this interface needs to participate in Wake-on-LAN

- ``set_wol``: ethtool function used to configure Wake-on-LAN settings per-port,
  direct counterpart to ``get_wol`` with similar restrictions

- ``set_eee``: ethtool function which is used to configure a switch port EEE (Green
  Ethernet) settings, can optionally invoke the PHY library to enable EEE at the
  PHY level if relevant.
  This function should enable EEE at the switch port MAC
  controller and data-processing logic

- ``get_eee``: ethtool function which is used to query a switch port EEE settings,
  this function should return the EEE state of the switch port MAC controller
  and data-processing logic as well as query the PHY for its currently configured
  EEE settings

- ``get_eeprom_len``: ethtool function returning for a given switch the EEPROM
  length/size in bytes

- ``get_eeprom``: ethtool function returning for a given switch the EEPROM contents

- ``set_eeprom``: ethtool function writing specified data to a given switch EEPROM

- ``get_regs_len``: ethtool function returning the register length for a given
  switch

- ``get_regs``: ethtool function returning the Ethernet switch internal register
  contents.
  This function might require user-land code in ethtool to
  pretty-print register values and registers

Power management
----------------

- ``suspend``: function invoked by the DSA platform device when the system goes to
  suspend, should quiesce all Ethernet switch activities, but keep ports
  participating in Wake-on-LAN active as well as additional wake-up logic if
  supported

- ``resume``: function invoked by the DSA platform device when the system resumes,
  should resume all Ethernet switch activities and re-configure the switch to be
  in a fully active state

- ``port_enable``: function invoked by the DSA slave network device ``ndo_open``
  function when a port is administratively brought up, this function should
  fully enable a given switch port. DSA takes care of marking the port with
  ``BR_STATE_BLOCKING`` if the port is a bridge member, or ``BR_STATE_FORWARDING`` if it
  was not, and propagating these changes down to the hardware

- ``port_disable``: function invoked by the DSA slave network device ``ndo_stop``
  function when a port is administratively brought down, this function should
  fully disable a given switch port.
  DSA takes care of marking the port with
  ``BR_STATE_DISABLED`` and propagating changes to the hardware if this port is
  disabled while being a bridge member

Bridge layer
------------

- ``port_bridge_join``: bridge layer function invoked when a given switch port is
  added to a bridge, this function should do what's necessary at the switch
  level to permit the joining port to be added to the relevant logical
  domain for it to ingress/egress traffic with other members of the bridge.

- ``port_bridge_leave``: bridge layer function invoked when a given switch port is
  removed from a bridge, this function should do what's necessary at the
  switch level to deny the leaving port from ingress/egress traffic from the
  remaining bridge members. When the port leaves the bridge, it should be aged
  out at the switch hardware for the switch to (re) learn MAC addresses behind
  this port.

- ``port_stp_state_set``: bridge layer function invoked when a given switch port STP
  state is computed by the bridge layer and should be propagated to switch
  hardware to forward/block/learn traffic.
  The switch driver is responsible for
  computing an STP state change based on current and asked parameters and performing
  the relevant ageing based on the intersection results

- ``port_bridge_flags``: bridge layer function invoked when a port must
  configure its settings for e.g. flooding of unknown traffic or source address
  learning. The switch driver is responsible for initial setup of the
  standalone ports with address learning disabled and egress flooding of all
  types of traffic, then the DSA core notifies of any change to the bridge port
  flags when the port joins and leaves a bridge. DSA does not currently manage
  the bridge port flags for the CPU port. The assumption is that address
  learning should be statically enabled (if supported by the hardware) on the
  CPU port, and flooding towards the CPU port should also be enabled, due to a
  lack of an explicit address filtering mechanism in the DSA core.

- ``port_bridge_tx_fwd_offload``: bridge layer function invoked after
  ``port_bridge_join`` when a driver sets ``ds->num_fwd_offloading_bridges`` to
  a non-zero value. Returning success in this function activates the TX
  forwarding offload bridge feature for this port, which enables the tagging
  protocol driver to inject data plane packets towards the bridging domain that
  the port is a part of.
  Data plane packets are subject to FDB lookup, hardware
  learning on the CPU port, and do not override the port STP state.
  Additionally, replication of data plane packets (multicast, flooding) is
  handled in hardware and the bridge driver will transmit a single skb for each
  packet that needs replication. The method is provided as a configuration
  point for drivers that need to configure the hardware for enabling this
  feature.

- ``port_bridge_tx_fwd_unoffload``: bridge layer function invoked when a driver
  leaves a bridge port which had the TX forwarding offload feature enabled.

Bridge VLAN filtering
---------------------

- ``port_vlan_filtering``: bridge layer function invoked when the bridge gets
  configured for turning on or off VLAN filtering. If nothing specific needs to
  be done at the hardware level, this callback does not need to be implemented.
  When VLAN filtering is turned on, the hardware must be programmed with
  rejecting 802.1Q frames which have VLAN IDs outside of the programmed allowed
  VLAN ID map/rules. If there is no PVID programmed into the switch port,
  untagged frames must be rejected as well. When turned off the switch must
  accept any 802.1Q frames irrespective of their VLAN ID, and untagged frames are
  allowed.

- ``port_vlan_add``: bridge layer function invoked when a VLAN is configured
  (tagged or untagged) for the given switch port. If the operation is not
  supported by the hardware, this function should return ``-EOPNOTSUPP`` to
  inform the bridge code to fall back to a software implementation.

- ``port_vlan_del``: bridge layer function invoked when a VLAN is removed from the
  given switch port

- ``port_vlan_dump``: bridge layer function invoked with a switchdev callback
  function that the driver has to call for each VLAN the given port is a member
  of. A switchdev object is used to carry the VID and bridge flags.

- ``port_fdb_add``: bridge layer function invoked when the bridge wants to install a
  Forwarding Database entry, the switch hardware should be programmed with the
  specified address in the specified VLAN ID in the forwarding database
  associated with this VLAN ID. If the operation is not supported, this
  function should return ``-EOPNOTSUPP`` to inform the bridge code to fall back to
  a software implementation.

.. note:: VLAN ID 0 corresponds to the port private database, which, in the context
        of DSA, would be its port-based VLAN, used by the associated bridge device.

- ``port_fdb_del``: bridge layer function invoked when the bridge wants to remove a
  Forwarding Database entry, the switch hardware should be programmed to delete
  the specified MAC address from the specified VLAN ID if it was mapped into
  this port forwarding database

- ``port_fdb_dump``: bridge layer function invoked with a switchdev callback
  function that the driver has to call for each MAC address known to be behind
  the given port. A switchdev object is used to carry the VID and FDB info.

- ``port_mdb_add``: bridge layer function invoked when the bridge wants to install
  a multicast database entry. If the operation is not supported, this function
  should return ``-EOPNOTSUPP`` to inform the bridge code to fall back to a
  software implementation. The switch hardware should be programmed with the
  specified address in the specified VLAN ID in the forwarding database
  associated with this VLAN ID.

.. note:: VLAN ID 0 corresponds to the port private database, which, in the context
        of DSA, would be its port-based VLAN, used by the associated bridge device.

- ``port_mdb_del``: bridge layer function invoked when the bridge wants to remove a
  multicast database entry, the switch hardware should be programmed to delete
  the specified MAC address from the specified VLAN ID if it was mapped into
  this port forwarding database.

- ``port_mdb_dump``: bridge layer function invoked with a switchdev callback
  function that the driver has to call for each MAC address known to be behind
  the given port. A switchdev object is used to carry the VID and MDB info.

Link aggregation
----------------

Link aggregation is implemented in the Linux networking stack by the bonding
and team drivers, which are modeled as virtual, stackable network interfaces.
DSA is capable of offloading a link aggregation group (LAG) to hardware that
supports the feature, and supports bridging between physical ports and LAGs,
as well as between LAGs. A bonding/team interface which holds multiple physical
ports constitutes a logical port, although DSA has no explicit concept of a
logical port at the moment. Due to this, events where a LAG joins/leaves a
bridge are treated as if all individual physical ports that are members of that
LAG join/leave the bridge. Switchdev port attributes (VLAN filtering, STP
state, etc.) and objects (VLANs, MDB entries) offloaded to a LAG as bridge port
are treated similarly: DSA offloads the same switchdev object / port attribute
on all members of the LAG. Static bridge FDB entries on a LAG are not yet
supported, since the DSA driver API does not have the concept of a logical port
ID.
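
The "apply to every member" rule described above can be pictured as a loop over
a bitmap of LAG member ports. The following is only an illustrative, standalone
sketch: ``sketch_port_op``, ``sketch_lag_for_each_member`` and
``sketch_mark_joined`` are hypothetical names and are not part of the DSA
driver API, and the stop-at-first-error policy is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port operation: 0 on success, negative value on error. */
typedef int (*sketch_port_op)(int port, void *ctx);

/*
 * Apply op to every port whose bit is set in lag_members.
 * Stops at the first failure and returns that error, otherwise returns 0.
 */
int sketch_lag_for_each_member(uint32_t lag_members, sketch_port_op op,
			       void *ctx)
{
	int port;

	assert(op != NULL);
	for (port = 0; port < 32; port++) {
		int err;

		if (!(lag_members & (UINT32_C(1) << port)))
			continue;
		err = op(port, ctx);
		if (err)
			return err;
	}
	return 0;
}

/* Example operation: record in a bitmap which ports "joined the bridge". */
int sketch_mark_joined(int port, void *ctx)
{
	uint32_t *joined = ctx;

	*joined |= UINT32_C(1) << port;
	return 0;
}
```

Calling ``sketch_lag_for_each_member()`` with a member bitmap of ports 1 and 3
and ``sketch_mark_joined`` as the operation marks exactly those two ports, which
mirrors how a single LAG-level event fans out to each physical member.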

- ``port_lag_join``: function invoked when a given switch port is added to a
  LAG. The driver may return ``-EOPNOTSUPP``, and in this case, DSA will fall
  back to a software implementation where all traffic from this port is sent to
  the CPU.
- ``port_lag_leave``: function invoked when a given switch port leaves a LAG
  and returns to operation as a standalone port.
- ``port_lag_change``: function invoked when the link state of any member of
  the LAG changes, and the hashing function needs rebalancing to only make use
  of the subset of physical LAG member ports that are up.

Drivers that benefit from having an ID associated with each offloaded LAG
can optionally populate ``ds->num_lag_ids`` from the ``dsa_switch_ops::setup``
method. The LAG ID associated with a bonding/team interface can then be
retrieved by a DSA switch driver using the ``dsa_lag_id`` function.

IEC 62439-2 (MRP)
-----------------

The Media Redundancy Protocol is a topology management protocol optimized for
fast fault recovery time for ring networks, which has some components
implemented as a function of the bridge driver. MRP uses management PDUs
(Test, Topology, LinkDown/Up, Option) sent at a multicast destination MAC
address range of 01:15:4e:00:00:0x and with an EtherType of 0x88e3.
Depending on the node's role in the ring (MRM: Media Redundancy Manager,
MRC: Media Redundancy Client, MRA: Media Redundancy Automanager), certain MRP
PDUs might need to be terminated locally and others might need to be forwarded.
An MRM might also benefit from offloading to hardware the creation and
transmission of certain MRP PDUs (Test).

Normally an MRP instance can be created on top of any network interface,
however in the case of a device with an offloaded data path such as DSA, it is
necessary for the hardware, even if it is not MRP-aware, to be able to extract
the MRP PDUs from the fabric before the driver can proceed with the software
implementation. DSA today has no driver which is MRP-aware, therefore it only
listens for the bare minimum switchdev objects required for the software assist
to work properly. The operations are detailed below.

- ``port_mrp_add`` and ``port_mrp_del``: notifies driver when an MRP instance
  with a certain ring ID, priority, primary port and secondary port is
  created/deleted.
- ``port_mrp_add_ring_role`` and ``port_mrp_del_ring_role``: function invoked
  when an MRP instance changes ring roles between MRM or MRC. This affects
  which MRP PDUs should be trapped to software and which should be autonomously
  forwarded.
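
To make the "extract the MRP PDUs from the fabric" requirement more concrete,
the sketch below classifies a raw Ethernet frame using the destination MAC
range and EtherType quoted in this section. It is a standalone illustration,
not kernel code: ``sketch_is_mrp_pdu`` is a hypothetical helper, the frame is
assumed to carry no VLAN tag, and reading ``0x`` as "last byte between 0x00
and 0x0f" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN  14     /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define SKETCH_ETH_P_MRP 0x88e3 /* MRP EtherType quoted in the text */

/* Hypothetical helper: true if an untagged frame looks like an MRP PDU. */
bool sketch_is_mrp_pdu(const uint8_t *frame, size_t len)
{
	static const uint8_t mrp_prefix[5] = { 0x01, 0x15, 0x4e, 0x00, 0x00 };
	uint16_t ethertype;
	size_t i;

	assert(frame != NULL);
	if (len < SKETCH_ETH_HLEN)
		return false;

	/* First five bytes of the destination MAC must match the MRP range. */
	for (i = 0; i < sizeof(mrp_prefix); i++)
		if (frame[i] != mrp_prefix[i])
			return false;

	/* Assumption: "0x" means the high nibble of the last byte is zero. */
	if (frame[5] & 0xf0)
		return false;

	/* EtherType is stored big-endian right after the two MAC addresses. */
	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	return ethertype == SKETCH_ETH_P_MRP;
}
```

A switch that is not MRP-aware would need an equivalent match rule in hardware
so that such frames are trapped to the CPU instead of being forwarded.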

IEC 62439-3 (HSR/PRP)
---------------------

The Parallel Redundancy Protocol (PRP) is a network redundancy protocol which
works by duplicating and sequence numbering packets through two independent L2
networks (which are unaware of the PRP tail tags carried in the packets), and
eliminating the duplicates at the receiver. The High-availability Seamless
Redundancy (HSR) protocol is similar in concept, except all nodes that carry
the redundant traffic are aware of the fact that it is HSR-tagged (because HSR
uses a header with an EtherType of 0x892f) and are physically connected in a
ring topology. Both HSR and PRP use supervision frames for monitoring the
health of the network and for discovery of other nodes.

In Linux, both HSR and PRP are implemented in the hsr driver, which
instantiates a virtual, stackable network interface with two member ports.
The driver only implements the basic roles of DANH (Doubly Attached Node
implementing HSR) and DANP (Doubly Attached Node implementing PRP); the roles
of RedBox and QuadBox are not implemented (therefore, bridging a hsr network
interface with a physical switch port does not produce the expected result).
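
The duplicate elimination mentioned at the start of this section can be
illustrated with a small sequence-number filter. This is a deliberately
simplified, standalone sketch and not the algorithm used by the hsr driver: it
tracks a single sender, remembers only the last ``SKETCH_WINDOW`` sequence
numbers, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_WINDOW 16 /* how many recent sequence numbers are remembered */

struct sketch_dup_filter {
	uint16_t seq[SKETCH_WINDOW];
	size_t count; /* number of valid entries, at most SKETCH_WINDOW */
	size_t next;  /* slot overwritten by the next new sequence number */
};

/* Returns true if seq is new (deliver the frame), false if it is a duplicate. */
bool sketch_dup_filter_accept(struct sketch_dup_filter *f, uint16_t seq)
{
	size_t i;

	assert(f != NULL);
	for (i = 0; i < f->count; i++)
		if (f->seq[i] == seq)
			return false;

	/* Remember the new sequence number, evicting the oldest when full. */
	f->seq[f->next] = seq;
	f->next = (f->next + 1) % SKETCH_WINDOW;
	if (f->count < SKETCH_WINDOW)
		f->count++;
	return true;
}
```

The first copy of a frame, arriving over either network, is accepted; the
second copy carrying the same sequence number is dropped. A real receiver would
also key the state by sender and age entries out over time.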

A driver which is capable of offloading certain functions of a DANP or DANH should
declare the corresponding netdev features as indicated by the documentation at
``Documentation/networking/netdev-features.rst``. Additionally, the following
methods must be implemented:

- ``port_hsr_join``: function invoked when a given switch port is added to a
  DANP/DANH. The driver may return ``-EOPNOTSUPP`` and in this case, DSA will
  fall back to a software implementation where all traffic from this port is
  sent to the CPU.
- ``port_hsr_leave``: function invoked when a given switch port leaves a
  DANP/DANH and returns to normal operation as a standalone port.

TODO
====

Making SWITCHDEV and DSA converge towards a unified codebase
------------------------------------------------------------

SWITCHDEV properly takes care of abstracting the networking stack with offload
capable hardware, but does not enforce a strict switch device driver model. On
the other hand, DSA enforces a fairly strict device driver model, and deals with
most of the switch specifics. At some point we should envision a merger between
these two subsystems and get the best of both worlds.

Other hanging fruits
--------------------

- allowing more than one CPU/management interface:
  http://comments.gmane.org/gmane.linux.network/365657