============
Architecture
============

This document describes the **Distributed Switch Architecture (DSA)** subsystem
design principles, limitations, interactions with other subsystems, and how to
develop drivers for this subsystem as well as a TODO for developers interested
in joining the effort.

Design principles
=================

The Distributed Switch Architecture is a subsystem which was primarily designed
to support Marvell Ethernet switches (MV88E6xxx, a.k.a. Linkstreet product line)
using Linux, but has since evolved to support other vendors as well.

The original philosophy behind this design was to be able to use unmodified
Linux tools such as bridge, iproute2 and ifconfig transparently, whether they
configure or query a switch port network device or a regular network device.

An Ethernet switch typically comprises multiple front-panel ports and one or
more CPU or management ports. The DSA subsystem currently relies on the
presence of a management port connected to an Ethernet controller capable of
receiving Ethernet frames from the switch. This is a very common setup for all
kinds of Ethernet switches found in Small Home and Office products: routers,
gateways, or even top-of-the-rack switches. This host Ethernet controller will
be later referred to as "master" and "cpu" in DSA terminology and code.

The D in DSA stands for Distributed, because the subsystem has been designed
with the ability to configure and manage cascaded switches on top of each other
using upstream and downstream Ethernet links between switches. These specific
ports are referred to as "dsa" ports in DSA terminology and code. A collection
of multiple switches connected to each other is called a "switch tree".

For each front-panel port, DSA will create specialized network devices which are
used as controlling and data-flowing endpoints for use by the Linux networking
stack. These specialized network interfaces are referred to as "slave" network
interfaces in DSA terminology and code.

The ideal case for using DSA is when an Ethernet switch supports a "switch tag",
which is a hardware feature making the switch insert a specific tag into each
Ethernet frame it receives from or sends to specific ports, to help the
management interface figure out:

- what port is this frame coming from
- what was the reason why this frame got forwarded
- how to send CPU originated traffic to specific ports

The subsystem does support switches not capable of inserting/stripping tags, but
the features might be slightly limited in that case (traffic separation relies
on Port-based VLAN IDs).

Note that DSA does not currently create network interfaces for the "cpu" and
"dsa" ports because:

- the "cpu" port is the Ethernet switch facing side of the management
  controller, and as such, would create a duplication of feature, since you
  would get two interfaces for the same conduit: master netdev, and "cpu" netdev

- the "dsa" port(s) are just conduits between two or more switches, and as such
  cannot really be used as proper network interfaces either, only the
  downstream, or the top-most upstream interface makes sense with that model

Switch tagging protocols
------------------------

DSA supports many vendor-specific tagging protocols, one software-defined
tagging protocol, and a tag-less mode as well (``DSA_TAG_PROTO_NONE``).

The exact format of the tag protocol is vendor specific, but in general, they
all contain something which:

- identifies which port the Ethernet frame came from/should be sent to
- provides a reason why this frame was forwarded to the management interface

All tagging protocols are in ``net/dsa/tag_*.c`` files and implement the
methods of the ``struct dsa_device_ops`` structure, which are detailed below.

Tagging protocols generally fall in one of three categories:

1. The switch-specific frame header is located before the Ethernet header,
   shifting to the right (from the perspective of the DSA master's frame
   parser) the MAC DA, MAC SA, EtherType and the entire L2 payload.
2. The switch-specific frame header is located before the EtherType, keeping
   the MAC DA and MAC SA in place from the DSA master's perspective, but
   shifting the 'real' EtherType and L2 payload to the right.
3. The switch-specific frame header is located at the tail of the packet,
   keeping all frame headers in place and not altering the view of the packet
   that the DSA master's frame parser has.

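As a rough illustration of the three categories, the user space sketch below
computes where the switch tag starts and where the real EtherType ends up in
the tagged frame. The ``enum tag_placement`` names and both helper functions
are hypothetical and exist only for this example; they are not part of the DSA
API::

    #include <stddef.h>

    #define MAC_ADDR_LEN 6 /* length of one Ethernet MAC address, in octets */

    /* Hypothetical names for the three categories listed above. */
    enum tag_placement {
        TAG_BEFORE_MAC_DA = 1,    /* category 1 */
        TAG_BEFORE_ETHERTYPE = 2, /* category 2 */
        TAG_AT_TAIL = 3,          /* category 3 */
    };

    /*
     * Offset, from the first octet of the frame as seen by the DSA master,
     * at which the switch tag starts. untagged_len is the length of the
     * frame before the tag was added.
     */
    size_t tag_offset(enum tag_placement where, size_t untagged_len)
    {
        switch (where) {
        case TAG_BEFORE_MAC_DA:
            return 0;
        case TAG_BEFORE_ETHERTYPE:
            return 2 * MAC_ADDR_LEN; /* right after MAC DA and MAC SA */
        case TAG_AT_TAIL:
            return untagged_len;     /* after the original last octet */
        }
        return 0;
    }

    /*
     * Offset at which the real EtherType is found in the tagged frame.
     * Only tail tags leave it at the standard offset of 12.
     */
    size_t real_ethertype_offset(enum tag_placement where, size_t tag_len)
    {
        size_t standard = 2 * MAC_ADDR_LEN;

        return where == TAG_AT_TAIL ? standard : standard + tag_len;
    }

With a 4-octet tag, categories 1 and 2 both move the real EtherType from
offset 12 to offset 16, while a tail tag leaves it where the master's frame
parser expects it.
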
A tagging protocol may tag all packets with switch tags of the same length, or
the tag length might vary (for example packets with PTP timestamps might
require an extended switch tag, or there might be one tag length on TX and a
different one on RX). Either way, the tagging protocol driver must populate the
``struct dsa_device_ops::overhead`` with the length in octets of the longest
switch frame header. The DSA framework will automatically adjust the MTU of the
master interface to accommodate this extra size in order for DSA user ports
to support the standard MTU (L2 payload length) of 1500 octets. The ``overhead``
is also used to request from the network stack, on a best-effort basis, the
allocation of packets with a ``needed_headroom`` or ``needed_tailroom``
sufficient such that the act of pushing the switch tag on transmission of a
packet does not cause it to reallocate due to lack of space.

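The MTU adjustment is plain arithmetic, sketched below for illustration.
``master_mtu_for()`` is a hypothetical helper written for this example, not a
kernel function, and the 8-octet ``overhead`` used in the note that follows is
an assumed value::

    #define STANDARD_MTU 1500 /* L2 payload length expected on DSA user ports */

    /* MTU the master needs so that user ports can carry user_mtu octets. */
    unsigned int master_mtu_for(unsigned int user_mtu, unsigned int overhead)
    {
        return user_mtu + overhead;
    }

For a tagger with an ``overhead`` of 8, ``master_mtu_for(STANDARD_MTU, 8)``
yields 1508, which is the MTU the master would need for user ports to keep the
standard 1500 octets.
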
Even though applications are not expected to parse DSA-specific frame headers,
the format on the wire of the tagging protocol represents an Application Binary
Interface exposed by the kernel towards user space, for decoders such as
``libpcap``. The tagging protocol driver must populate the ``proto`` member of
``struct dsa_device_ops`` with a value that uniquely describes the
characteristics of the interaction required between the switch hardware and the
data path driver: the offset of each bit field within the frame header and any
stateful processing required to deal with the frames (as may be required for
PTP timestamping).

From the perspective of the network stack, all switches within the same DSA
switch tree use the same tagging protocol. In case of a packet transiting a
fabric with more than one switch, the switch-specific frame header is inserted
by the first switch in the fabric that the packet was received on. This header
typically contains information regarding its type (whether it is a control
frame that must be trapped to the CPU, or a data frame to be forwarded).
Control frames should be decapsulated only by the software data path, whereas
data frames might also be autonomously forwarded towards other user ports of
other switches from the same fabric, and in this case, the outermost switch
ports must decapsulate the packet.

Note that in certain cases, the tagging format used by a leaf switch (not
connected directly to the CPU) might not be the same as what the network stack
sees. This can be seen with Marvell switch trees, where the
CPU port can be configured to use either the DSA or the Ethertype DSA (EDSA)
format, but the DSA links are configured to use the shorter (without Ethertype)
DSA frame header, in order to reduce the autonomous packet forwarding overhead.
It still remains the case that, if the DSA switch tree is configured for the
EDSA tagging protocol, the operating system sees EDSA-tagged packets from the
leaf switches that tagged them with the shorter DSA header. This can be done
because the Marvell switch connected directly to the CPU is configured to
perform tag translation between DSA and EDSA (which is simply the operation of
adding or removing the ``ETH_P_EDSA`` EtherType and some padding octets).

It is possible to construct cascaded setups of DSA switches even if their
tagging protocols are not compatible with one another. In this case, there are
no DSA links in this fabric, and each switch constitutes a disjoint DSA switch
tree. The DSA links are viewed as simply a pair of a DSA master (the out-facing
port of the upstream DSA switch) and a CPU port (the in-facing port of the
downstream DSA switch).

The tagging protocol of the attached DSA switch tree can be viewed through the
``dsa/tagging`` sysfs attribute of the DSA master::

    cat /sys/class/net/eth0/dsa/tagging

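A user space program can read the same attribute with ordinary file I/O. The
following is a minimal sketch; ``read_dsa_tagging()`` is a hypothetical helper
written for this illustration, and the caller is assumed to pass a path such
as the one shown above::

    #include <stdio.h>
    #include <string.h>

    /*
     * Read the tagging protocol name from the given sysfs attribute path,
     * e.g. "/sys/class/net/eth0/dsa/tagging". Returns 0 on success and
     * stores the name (without the trailing newline) in buf, or -1 if the
     * attribute cannot be read (for instance, if the interface is not a
     * DSA master).
     */
    int read_dsa_tagging(const char *path, char *buf, size_t len)
    {
        FILE *f;

        if (len == 0)
            return -1;

        f = fopen(path, "r");
        if (!f)
            return -1;

        if (!fgets(buf, (int)len, f)) {
            fclose(f);
            return -1;
        }
        fclose(f);

        buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
        return 0;
    }

Returning -1 when the file is missing lets the caller tell a DSA master apart
from a regular network interface without any DSA-specific API.
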
If the hardware and driver are capable, the tagging protocol of the DSA switch
tree can be changed at runtime. This is done by writing the new tagging
protocol name to the same sysfs device attribute as above (the DSA master and
all attached switch ports must be down while doing this).

It is desirable that all tagging protocols are testable with the ``dsa_loop``
mockup driver, which can be attached to any network interface. The goal is that
any network interface should be capable of transmitting the same packet in the
same way, and the tagger should decode the same received packet in the same way
regardless of the driver used for the switch control path, and the driver used
for the DSA master.

The transmission of a packet goes through the tagger's ``xmit`` function.
The passed ``struct sk_buff *skb`` has ``skb->data`` pointing at
``skb_mac_header(skb)``, i.e. at the destination MAC address, and the passed
``struct net_device *dev`` represents the virtual DSA user network interface
whose hardware counterpart the packet must be steered to (e.g. ``swp0``).
The job of this method is to prepare the skb in a way that the switch will
understand what egress port the packet is for (and not deliver it towards other
ports). Typically this is fulfilled by pushing a frame header. Checking for
insufficient size in the skb headroom or tailroom is unnecessary provided that
the ``overhead`` and ``tail_tag`` properties were filled out properly, because
DSA ensures there is enough space before calling this method.

The reception of a packet goes through the tagger's ``rcv`` function. The
passed ``struct sk_buff *skb`` has ``skb->data`` pointing at
``skb_mac_header(skb) + ETH_HLEN`` octets, i.e. to where the first octet after
the EtherType would have been, were this frame not tagged. The role of this
method is to consume the frame header, adjust ``skb->data`` to really point at
the first octet after the EtherType, and to change ``skb->dev`` to point to the
virtual DSA user network interface corresponding to the physical front-facing
switch port that the packet was received on.

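The pointer manipulation done by ``xmit`` and ``rcv`` can be modeled in user
space on a flat byte buffer. The sketch below assumes a made-up category 2
tagger whose 4-octet tag carries the port number in its first octet; the tag
format and the ``toy_xmit()`` and ``toy_rcv()`` names are hypothetical, and a
real tagger operates on a ``struct sk_buff`` instead::

    #include <stdint.h>
    #include <string.h>

    #define MAC_ADDRS_LEN 12 /* MAC DA + MAC SA */
    #define ETHERTYPE_LEN 2
    #define TOY_TAG_LEN   4  /* hypothetical tag layout: [port, 0, 0, 0] */

    /*
     * Transmit side: frame points at the MAC DA and has at least
     * TOY_TAG_LEN free octets in front of it. Returns the new start of
     * the frame, with the tag inserted before the EtherType.
     */
    uint8_t *toy_xmit(uint8_t *frame, uint8_t port)
    {
        uint8_t *tagged = frame - TOY_TAG_LEN;  /* grow into the headroom */

        memmove(tagged, frame, MAC_ADDRS_LEN);  /* move MAC DA and MAC SA */
        memset(tagged + MAC_ADDRS_LEN, 0, TOY_TAG_LEN);
        tagged[MAC_ADDRS_LEN] = port;           /* egress port */
        return tagged;
    }

    /*
     * Receive side: data points 14 octets past the start of the tagged
     * frame, i.e. just past what the master's parser took for the
     * Ethernet header. Stores the source port and returns the adjusted
     * data pointer, now at the first octet after the real EtherType.
     */
    uint8_t *toy_rcv(uint8_t *data, uint8_t *port)
    {
        uint8_t *tag = data - ETHERTYPE_LEN;    /* tag starts at offset 12 */

        *port = tag[0];
        data += TOY_TAG_LEN;                    /* consume the tag */
        /* Move MAC DA and MAC SA back next to the real EtherType. */
        memmove(data - ETHERTYPE_LEN - MAC_ADDRS_LEN,
                data - ETHERTYPE_LEN - MAC_ADDRS_LEN - TOY_TAG_LEN,
                MAC_ADDRS_LEN);
        return data;
    }

Feeding the output of ``toy_xmit()`` back into ``toy_rcv()`` restores the
original untagged frame, which is the kind of symmetry the ``dsa_loop`` mockup
driver is meant to help verify for real taggers.
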
Since tagging protocols in categories 1 and 2 break software (and most often
also hardware) packet dissection on the DSA master, features such as RPS
(Receive Packet Steering) on the DSA master would be broken. The DSA framework
deals with this by hooking into the flow dissector and shifting the offset at
which the IP header is to be found in the tagged frame as seen by the DSA
master. This behavior is automatic based on the ``overhead`` value of the
tagging protocol. If the tag length is not the same for all packets, the tagger
can implement the ``flow_dissect`` method of the ``struct dsa_device_ops`` and
override this default behavior by specifying the correct offset incurred by
each individual RX packet. Tail taggers do not cause issues to the flow
dissector.

For various reasons (the most common being category 1 taggers being associated
with DSA-unaware masters, mangling what the master perceives as MAC DA), the
tagging protocol may require the DSA master to operate in promiscuous mode, to
receive all frames regardless of the value of the MAC DA. This can be done by
setting the ``promisc_on_master`` property of the ``struct dsa_device_ops``.
Note that this assumes a DSA-unaware master driver, which is the norm.

Although hardware manufacturers are strongly discouraged from doing so, some
tagging protocols might provide source port information on RX only for some
packets, e.g. only for control traffic (link-local PDUs). In this case, by
implementing the ``filter`` method of ``struct dsa_device_ops``, the tagger
might select which packets are to be redirected on RX towards the virtual DSA
user network interfaces, and which are to be left in the DSA master's RX data
path.

It might also happen (although silicon vendors are strongly discouraged from
producing hardware like this) that a tagging protocol splits the
switch-specific information into a header portion and a tail portion, therefore
not falling cleanly into any of the above 3 categories. DSA does not support
this configuration.

Master network devices
----------------------

Master network devices are regular, unmodified Linux network device drivers for
the CPU/management Ethernet interface. Such a driver might occasionally need to
know whether DSA is enabled (e.g.: to enable/disable specific offload features),
but the DSA subsystem has been proven to work with industry standard drivers:
``e1000e``, ``mv643xx_eth`` etc. without having to introduce modifications to these
drivers. Such network devices are also often referred to as conduit network
devices since they act as a pipe between the host processor and the hardware
Ethernet switch.

Networking stack hooks
----------------------

When a master netdev is used with DSA, a small hook is placed in the
networking stack in order to have the DSA subsystem process the Ethernet
switch specific tagging protocol. DSA accomplishes this by registering a
specific (and fake) Ethernet type (later becoming ``skb->protocol``) with the
networking stack; this is also known as a ``ptype`` or ``packet_type``. A
typical Ethernet frame receive sequence looks like this:

Master network device (e.g.: e1000e):

1. Receive interrupt fires:

        - receive function is invoked
        - basic packet processing is done: getting length, status etc.
        - packet is prepared to be processed by the Ethernet layer by calling
          ``eth_type_trans``

2. net/ethernet/eth.c::

          eth_type_trans(skb, dev)
                  if (dev->dsa_ptr != NULL)
                          -> skb->protocol = ETH_P_XDSA

3. drivers/net/ethernet/\*::

          netif_receive_skb(skb)
                  -> iterate over registered packet_type
                          -> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv()

4. net/dsa/dsa.c::

          -> dsa_switch_rcv()
                  -> invoke switch tag specific protocol handler in 'net/dsa/tag_*.c'

5. net/dsa/tag_*.c:

        - inspect and strip switch tag protocol to determine originating port
        - locate per-port network device
        - invoke ``eth_type_trans()`` with the DSA slave network device
        - invoke ``netif_receive_skb()``

Past this point, the DSA slave network devices get delivered regular Ethernet
frames that can be processed by the networking stack.

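The decision taken in step 2 of the sequence above can be modeled in user
space as follows. This is a simplified sketch: ``struct toy_netdev`` and
``toy_rx_protocol()`` are hypothetical stand-ins for ``struct net_device`` and
``eth_type_trans()``, and the constant is assumed to mirror ``ETH_P_XDSA`` from
``linux/if_ether.h``::

    #include <stddef.h>
    #include <stdint.h>

    #define TOY_ETH_P_XDSA 0x00F8 /* assumed to mirror ETH_P_XDSA */

    /* Hypothetical, stripped-down stand-in for struct net_device. */
    struct toy_netdev {
        void *dsa_ptr; /* non-NULL once the device is used as a DSA master */
    };

    /*
     * Protocol the stack dispatches on. Frames arriving on a DSA master
     * are steered to the DSA handler regardless of the EtherType octets
     * found in the frame.
     */
    uint16_t toy_rx_protocol(const struct toy_netdev *dev,
                             uint16_t frame_ethertype)
    {
        if (dev->dsa_ptr != NULL)
            return TOY_ETH_P_XDSA;
        return frame_ethertype;
    }

Because the override depends only on ``dsa_ptr``, the master driver itself
needs no knowledge of the tagging protocol in use.
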
Slave network devices
---------------------

Slave network devices created by DSA are stacked on top of their master network
device. Each of these network interfaces is responsible for being a
controlling and data-flowing end-point for a front-panel port of the switch.
These interfaces are specialized in order to:

- insert/remove the switch tag protocol (if it exists) when sending traffic
  to/from specific switch ports
- query the switch for ethtool operations: statistics, link state,
  Wake-on-LAN, register dumps...
- manage external/internal PHYs: link, auto-negotiation etc.

These slave network devices have custom net_device_ops and ethtool_ops function
pointers which allow DSA to introduce a level of layering between the networking
stack/ethtool, and the switch driver implementation.

Upon frame transmission from these slave network devices, DSA will look up which
switch tagging protocol is currently registered with these network devices, and
invoke a specific transmit routine which takes care of adding the relevant
switch tag in the Ethernet frames.

These frames are then queued for transmission using the master network device
``ndo_start_xmit()`` function. Since they contain the appropriate switch tag, the
Ethernet switch will be able to process these incoming frames from the
management interface and deliver them to the physical switch port.

Graphical representation
------------------------

In summary, this is what DSA looks like from a network device
perspective::

                Unaware application
              opens and binds socket
                       |  ^
                       |  |
           +-----------v--|--------------------+
           |+------+ +------+ +------+ +------+|
           || swp0 | | swp1 | | swp2 | | swp3 ||
           |+------+-+------+-+------+-+------+|
           |          DSA switch driver        |
           +-----------------------------------+
                         |        ^
            Tag added by |        | Tag consumed by
           switch driver |        | switch driver
                         v        |
           +-----------------------------------+
           | Unmodified host interface driver  | Software
   --------+-----------------------------------+------------
           |       Host interface (eth0)       | Hardware
           +-----------------------------------+
                         |        ^
         Tag consumed by |        | Tag added by
         switch hardware |        | switch hardware
                         v        |
           +-----------------------------------+
           |               Switch              |
           |+------+ +------+ +------+ +------+|
           || swp0 | | swp1 | | swp2 | | swp3 ||
           ++------+-+------+-+------+-+------++

Slave MDIO bus
--------------

In order to be able to read from/write to a PHY built into a switch, DSA creates
a slave MDIO bus which allows a specific switch driver to divert and intercept
MDIO reads/writes towards specific PHY addresses. In most MDIO-connected
switches, these functions would utilize direct or indirect PHY addressing mode
to return standard MII registers from the switch builtin PHYs, allowing the PHY
library to report link status, link partner pages, auto-negotiation
results etc.

For Ethernet switches which have both external and internal MDIO buses, the
slave MII bus can be utilized to mux/demux MDIO reads and writes towards either
internal or external MDIO devices this switch might be connected to: internal
PHYs, external PHYs, or even external switches.

Data structures
---------------

DSA data structures are defined in ``include/net/dsa.h`` as well as
``net/dsa/dsa_priv.h``:

- ``dsa_chip_data``: platform data configuration for a given switch device.
  This structure describes a switch device's parent device, its address, as
  well as various properties of its ports: names/labels, and finally a routing
  table indication (when cascading switches)

- ``dsa_platform_data``: platform device configuration data which can reference
  a collection of ``dsa_chip_data`` structures if multiple switches are
  cascaded. The master network device this switch tree is attached to needs to
  be referenced

- ``dsa_switch_tree``: structure assigned to the master network device under
  ``dsa_ptr``. This structure references a ``dsa_platform_data`` structure as
  well as the tagging protocol supported by the switch tree, and which
  receive/transmit function hooks should be invoked. Information about the
  directly attached switch is also provided: CPU port. Finally, a collection of
  ``dsa_switch`` structures is referenced to address individual switches in the
  tree.

- ``dsa_switch``: structure describing a switch device in the tree, referencing
  a ``dsa_switch_tree`` as a backpointer, slave network devices, master network
  device, and a reference to the backing ``dsa_switch_ops``

- ``dsa_switch_ops``: structure referencing function pointers, see below for a
  full description.

Design limitations
==================

Lack of CPU/DSA network devices
-------------------------------

DSA does not currently create slave network devices for the CPU or DSA ports, as
described before. This might be an issue in the following cases:

- inability to fetch switch CPU port statistics counters using ethtool, which
  can make it harder to debug MDIO switches connected using xMII interfaces

- inability to configure the CPU port link parameters based on the Ethernet
  controller capabilities attached to it: http://patchwork.ozlabs.org/patch/509806/

- inability to configure specific VLAN IDs / trunking VLANs between switches
  when using a cascaded setup

Common pitfalls using DSA setups
--------------------------------

Once a master network device is configured to use DSA (``dev->dsa_ptr`` becomes
non-NULL), and the switch behind it expects a tagging protocol, this network
interface can only be used as a conduit interface. Sending packets
directly through this interface (e.g.: opening a socket using this interface)
will not make us go through the switch tagging protocol transmit function, so
the Ethernet switch on the other end, expecting a tag, will typically drop this
frame.

Interactions with other subsystems
==================================

DSA currently leverages the following subsystems:

- MDIO/PHY library: ``drivers/net/phy/phy.c``, ``mdio_bus.c``
- Switchdev: ``net/switchdev/*``
- Device Tree for various of_* functions
- Devlink: ``net/core/devlink.c``

4213c91d114SIoana CiorneiMDIO/PHY library
4223c91d114SIoana Ciornei----------------
4233c91d114SIoana Ciornei
4243c91d114SIoana CiorneiSlave network devices exposed by DSA may or may not be interfacing with PHY
4253c91d114SIoana Ciorneidevices (``struct phy_device`` as defined in ``include/linux/phy.h)``, but the DSA
4263c91d114SIoana Ciorneisubsystem deals with all possible combinations:
4273c91d114SIoana Ciornei
4283c91d114SIoana Ciornei- internal PHY devices, built into the Ethernet switch hardware
4293c91d114SIoana Ciornei- external PHY devices, connected via an internal or external MDIO bus
4303c91d114SIoana Ciornei- internal PHY devices, connected via an internal MDIO bus
4313c91d114SIoana Ciornei- special, non-autonegotiated or non MDIO-managed PHY devices: SFPs, MoCA; a.k.a
4323c91d114SIoana Ciornei  fixed PHYs
4333c91d114SIoana Ciornei
4343c91d114SIoana CiorneiThe PHY configuration is done by the ``dsa_slave_phy_setup()`` function and the
4353c91d114SIoana Ciorneilogic basically looks like this:
4363c91d114SIoana Ciornei
4373c91d114SIoana Ciornei- if Device Tree is used, the PHY device is looked up using the standard
4383c91d114SIoana Ciornei  "phy-handle" property, if found, this PHY device is created and registered
4393c91d114SIoana Ciornei  using ``of_phy_connect()``
4403c91d114SIoana Ciornei
4413c91d114SIoana Ciornei- if Device Tree is used, and the PHY device is "fixed", that is, conforms to
4423c91d114SIoana Ciornei  the definition of a non-MDIO managed PHY as defined in
4433c91d114SIoana Ciornei  ``Documentation/devicetree/bindings/net/fixed-link.txt``, the PHY is registered
4443c91d114SIoana Ciornei  and connected transparently using the special fixed MDIO bus driver
4453c91d114SIoana Ciornei
4463c91d114SIoana Ciornei- finally, if the PHY is built into the switch, as is very common with
4473c91d114SIoana Ciornei  standalone switch packages, the PHY is probed using the slave MII bus created
4483c91d114SIoana Ciornei  by DSA
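
The lookup order above can be modelled as a small decision function. This is a
user-space sketch, not the body of ``dsa_slave_phy_setup()``: the
``port_desc`` structure, the enum and ``pick_phy_source`` are hypothetical
names, and the precedence between the first two cases is assumed from the order
in which they are listed.

```c
#include <stdbool.h>

/* Simplified, hypothetical view of what a port's Device Tree node provides. */
struct port_desc {
	bool has_phy_handle;   /* "phy-handle" property present */
	bool has_fixed_link;   /* fixed-link description present */
};

enum phy_source {
	PHY_FROM_PHY_HANDLE,   /* connect via of_phy_connect() */
	PHY_FROM_FIXED_LINK,   /* fixed PHY on the fixed MDIO bus */
	PHY_FROM_SLAVE_MII,    /* built-in PHY probed on the DSA slave MII bus */
};

/* Return where the PHY for this port would come from. */
enum phy_source pick_phy_source(const struct port_desc *p)
{
	if (p->has_phy_handle)
		return PHY_FROM_PHY_HANDLE;
	if (p->has_fixed_link)
		return PHY_FROM_FIXED_LINK;
	/* Neither property: fall back to the switch's built-in PHY. */
	return PHY_FROM_SLAVE_MII;
}
```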


SWITCHDEV
---------

DSA directly utilizes SWITCHDEV when interfacing with the bridge layer, and
more specifically with its VLAN filtering portion when configuring VLANs on top
of per-port slave network devices. As of today, the only SWITCHDEV objects
supported by DSA are the FDB and VLAN objects.

Devlink
-------

DSA registers one devlink device per physical switch in the fabric.
For each devlink device, every physical port (i.e. user ports, CPU ports, DSA
links or unused ports) is exposed as a devlink port.

DSA drivers can make use of the following devlink features:

- Regions: debugging feature which allows user space to dump driver-defined
  areas of hardware information in a low-level, binary format. Both global
  regions as well as per-port regions are supported. It is possible to export
  devlink regions even for pieces of data that are already exposed in some way
  to the standard iproute2 user space programs (ip-link, bridge), like address
  tables and VLAN tables. For example, this might be useful if the tables
  contain additional hardware-specific details which are not visible through
  the iproute2 abstraction, or it might be useful to inspect these tables on
  the non-user ports too, which are invisible to iproute2 because no network
  interface is registered for them.
- Params: a feature which enables users to configure certain low-level tunable
  knobs pertaining to the device. Drivers may implement applicable generic
  devlink params, or may add new device-specific devlink params.
- Resources: a monitoring feature which enables users to see the degree of
  utilization of certain hardware tables in the device, such as FDB, VLAN, etc.
- Shared buffers: a QoS feature for adjusting and partitioning memory and frame
  reservations per port and per traffic class, in the ingress and egress
  directions, such that low-priority bulk traffic does not impede the
  processing of high-priority critical traffic.

For more details, consult ``Documentation/networking/devlink/``.

Device Tree
-----------

DSA features a standardized binding which is documented in
``Documentation/devicetree/bindings/net/dsa/dsa.txt``. PHY/MDIO library helper
functions such as ``of_get_phy_mode()`` and ``of_phy_connect()`` are also used to
query per-port PHY specific details: interface connection, MDIO bus location, etc.

Driver development
==================

DSA switch drivers need to implement a ``dsa_switch_ops`` structure which will
contain the various members described below.

``register_switch_driver()`` registers this ``dsa_switch_ops`` in DSA's internal
list of drivers to probe for. ``unregister_switch_driver()`` does the exact
opposite.

Unless requested differently by setting the ``priv_size`` member accordingly, DSA
does not allocate any driver private context space.

Switch configuration
--------------------

- ``tag_protocol``: indicates which tagging protocol is supported; should be a
  valid value from the ``dsa_tag_protocol`` enum

- ``probe``: probe routine which will be invoked by the DSA platform device upon
  registration to test for the presence/absence of a switch device. For MDIO
  devices, it is recommended to issue a read towards internal registers using
  the switch pseudo-PHY and return whether this is a supported device. For other
  buses, return a non-NULL string.

- ``setup``: setup function for the switch. This function is responsible for
  setting up the ``dsa_switch_ops`` private structure with all it needs: register
  maps, interrupts, mutexes, locks, etc. This function is also expected to
  properly configure the switch to separate all network interfaces from each
  other, that is, they should be isolated by the switch hardware itself,
  typically by creating a Port-based VLAN ID for each port and allowing only the
  CPU port and the specific port to be in the forwarding vector. Ports that are
  unused by the platform should be disabled. Past this function, the switch is
  expected to be fully configured and ready to serve any kind of request. It is
  recommended to issue a software reset of the switch during this setup function
  in order to avoid relying on what a previous software agent such as a
  bootloader/firmware may have previously configured.
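
The port isolation expected from ``setup`` can be sketched as a computation of
per-port forwarding vectors. This is a user-space model under stated
assumptions: a switch with at most 16 ports, a forwarding vector expressed as a
plain bitmask, and the hypothetical helper name ``isolate_ports``. Real
hardware programs an equivalent table through its own registers.

```c
#include <stdint.h>

#define MAX_PORTS 16

/*
 * Fill fwd[] so that each enabled user port has only itself and the CPU port
 * in its forwarding vector, the CPU port can reach every enabled port, and
 * ports unused by the platform get an empty vector (disabled).
 */
void isolate_ports(uint16_t enabled_mask, unsigned int cpu_port,
                   uint16_t fwd[MAX_PORTS])
{
	uint16_t cpu_bit = (uint16_t)(1u << cpu_port);

	for (unsigned int p = 0; p < MAX_PORTS; p++) {
		uint16_t bit = (uint16_t)(1u << p);

		if (p == cpu_port)
			fwd[p] = (uint16_t)(enabled_mask | cpu_bit);
		else if (enabled_mask & bit)
			fwd[p] = (uint16_t)(bit | cpu_bit);
		else
			fwd[p] = 0;   /* unused port */
	}
}
```

With user ports 0 to 2 enabled and the CPU port at index 5, port 0 ends up with
the vector ``0x21``: itself plus the CPU port, and nothing else.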

PHY devices and link management
-------------------------------

- ``get_phy_flags``: Some switches are interfaced to various kinds of Ethernet
  PHYs. If the PHY library PHY driver needs to know about information it cannot
  obtain on its own (e.g. coming from switch memory mapped registers), this
  function should return a 32-bit bitmask of "flags" that is private between the
  switch driver and the Ethernet PHY driver in ``drivers/net/phy/\*``.

- ``phy_read``: Function invoked by the DSA slave MDIO bus when attempting to read
  the switch port MDIO registers. If unavailable, return 0xffff for each read.
  For builtin switch Ethernet PHYs, this function should allow reading the link
  status, auto-negotiation results, link partner pages, etc.

- ``phy_write``: Function invoked by the DSA slave MDIO bus when attempting to write
  to the switch port MDIO registers. If unavailable, return a negative error
  code.

- ``adjust_link``: Function invoked by the PHY library when a slave network device
  is attached to a PHY device. This function is responsible for appropriately
  configuring the switch port link parameters (speed, duplex, pause) based on
  what the ``phy_device`` is providing.

- ``fixed_link_update``: Function invoked by the PHY library, and specifically by
  the fixed PHY driver, asking the switch driver for link parameters that could
  not be auto-negotiated, or obtained by reading the PHY registers through MDIO.
  This is particularly useful for specific kinds of hardware such as QSGMII,
  MoCA or other kinds of non-MDIO managed PHYs where out of band link
  information is obtained.

Ethtool operations
------------------

- ``get_strings``: ethtool function used to query the driver's strings; will
  typically return statistics strings, private flags strings, etc.

- ``get_ethtool_stats``: ethtool function used to query per-port statistics and
  return their values. DSA overlays slave network devices general statistics
  (RX/TX counters from the network device) with switch driver specific
  statistics per port.

- ``get_sset_count``: ethtool function used to query the number of statistics items

- ``get_wol``: ethtool function used to obtain Wake-on-LAN settings per-port. This
  function may, for certain implementations, also query the master network device
  Wake-on-LAN settings if this interface needs to participate in Wake-on-LAN.

- ``set_wol``: ethtool function used to configure Wake-on-LAN settings per-port;
  direct counterpart to ``get_wol`` with similar restrictions

- ``set_eee``: ethtool function which is used to configure a switch port's EEE
  (Green Ethernet) settings; can optionally invoke the PHY library to enable EEE
  at the PHY level if relevant. This function should enable EEE at the switch
  port MAC controller and data-processing logic.

- ``get_eee``: ethtool function which is used to query a switch port's EEE
  settings. This function should return the EEE state of the switch port MAC
  controller and data-processing logic as well as query the PHY for its
  currently configured EEE settings.

- ``get_eeprom_len``: ethtool function returning for a given switch the EEPROM
  length/size in bytes

- ``get_eeprom``: ethtool function returning for a given switch the EEPROM contents

- ``set_eeprom``: ethtool function writing specified data to a given switch EEPROM

- ``get_regs_len``: ethtool function returning the register length for a given
  switch

- ``get_regs``: ethtool function returning the Ethernet switch internal register
  contents. This function might require user-land code in ethtool to
  pretty-print register values and registers.
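
The three statistics callbacks have to agree with each other: the count
reported by ``get_sset_count`` is the number of names written by
``get_strings`` and the number of values written by ``get_ethtool_stats``, in
the same order. The following user-space sketch illustrates that contract only.
The counter names, the ``hyp_`` functions and their simplified signatures are
assumptions and do not match the real ``dsa_switch_ops`` prototypes;
``GSTRING_LEN`` mirrors the kernel's ``ETH_GSTRING_LEN`` of 32 bytes per name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GSTRING_LEN 32   /* mirrors the kernel's ETH_GSTRING_LEN */

/* Hypothetical counters for one port; real drivers read hardware registers. */
struct hyp_port_counters {
	uint64_t rx_good;
	uint64_t tx_good;
	uint64_t rx_crc_err;
};

/* One fixed-size, zero-padded slot per statistic name. */
static const char hyp_stat_names[][GSTRING_LEN] = {
	"rx_good", "tx_good", "rx_crc_err",
};

/* Number of statistics items. */
int hyp_get_sset_count(void)
{
	return (int)(sizeof(hyp_stat_names) / sizeof(hyp_stat_names[0]));
}

/* Copy all names into data, which must hold count * GSTRING_LEN bytes. */
void hyp_get_strings(uint8_t *data)
{
	memcpy(data, hyp_stat_names, sizeof(hyp_stat_names));
}

/* Write the values in the same order as the names above. */
void hyp_get_ethtool_stats(const struct hyp_port_counters *c, uint64_t *data)
{
	data[0] = c->rx_good;
	data[1] = c->tx_good;
	data[2] = c->rx_crc_err;
}
```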

Power management
----------------

- ``suspend``: function invoked by the DSA platform device when the system goes to
  suspend; should quiesce all Ethernet switch activities, but keep ports
  participating in Wake-on-LAN active as well as additional wake-up logic if
  supported

- ``resume``: function invoked by the DSA platform device when the system resumes;
  should resume all Ethernet switch activities and re-configure the switch to be
  in a fully active state

- ``port_enable``: function invoked by the DSA slave network device ``ndo_open``
  function when a port is administratively brought up. This function should
  fully enable a given switch port. DSA takes care of marking the port with
  ``BR_STATE_BLOCKING`` if the port is a bridge member, or ``BR_STATE_FORWARDING``
  if it is not, and propagating these changes down to the hardware.

- ``port_disable``: function invoked by the DSA slave network device ``ndo_close``
  function when a port is administratively brought down. This function should
  fully disable a given switch port. DSA takes care of marking the port with
  ``BR_STATE_DISABLED`` and propagating changes to the hardware if this port is
  disabled while being a bridge member.

Bridge layer
------------

- ``port_bridge_join``: bridge layer function invoked when a given switch port is
  added to a bridge. This function should do what is necessary at the switch
  level to allow the joining port to be added to the relevant logical domain so
  that it can ingress/egress traffic with other members of the bridge.

- ``port_bridge_leave``: bridge layer function invoked when a given switch port is
  removed from a bridge. This function should do what is necessary at the
  switch level to prevent the leaving port from exchanging ingress/egress
  traffic with the remaining bridge members. When the port leaves the bridge,
  the addresses learned on it should be aged out at the switch hardware so that
  the switch can (re)learn MAC addresses behind this port.

- ``port_stp_state_set``: bridge layer function invoked when a given switch port STP
  state is computed by the bridge layer and should be propagated to switch
  hardware to forward/block/learn traffic. The switch driver is responsible for
  computing an STP state change based on the current and requested parameters
  and for performing the relevant ageing based on the intersection results.

- ``port_bridge_flags``: bridge layer function invoked when a port must
  configure its settings for e.g. flooding of unknown traffic or source address
  learning. The switch driver is responsible for initial setup of the
  standalone ports with address learning disabled and egress flooding of all
  types of traffic, then the DSA core notifies of any change to the bridge port
  flags when the port joins and leaves a bridge. DSA does not currently manage
  the bridge port flags for the CPU port. The assumption is that address
  learning should be statically enabled (if supported by the hardware) on the
  CPU port, and flooding towards the CPU port should also be enabled, due to a
  lack of an explicit address filtering mechanism in the DSA core.
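
One way to picture what ``port_bridge_join`` and ``port_bridge_leave`` change
in hardware is as an update of per-port forwarding masks. The sketch below is a
user-space model under several assumptions: a single bridge, at most 16 ports,
forwarding expressed as bitmasks, and hypothetical ``hyp_`` names. It leaves
the CPU port's own mask untouched and does not model FDB flushing.

```c
#include <stdint.h>

#define MAX_PORTS 16

struct hyp_switch {
	unsigned int cpu_port;
	uint16_t bridge_members;   /* user ports currently in the bridge */
	uint16_t fwd[MAX_PORTS];   /* per-port forwarding mask */
};

/* Recompute the masks of all user ports from the bridge membership. */
static void hyp_recompute(struct hyp_switch *sw)
{
	uint16_t cpu_bit = (uint16_t)(1u << sw->cpu_port);

	for (unsigned int p = 0; p < MAX_PORTS; p++) {
		uint16_t bit = (uint16_t)(1u << p);

		if (p == sw->cpu_port)
			continue;
		if (sw->bridge_members & bit)
			/* bridged: all bridge members plus the CPU port */
			sw->fwd[p] = (uint16_t)(sw->bridge_members | cpu_bit);
		else
			/* standalone: only itself plus the CPU port */
			sw->fwd[p] = (uint16_t)(bit | cpu_bit);
	}
}

void hyp_bridge_join(struct hyp_switch *sw, unsigned int port)
{
	sw->bridge_members |= (uint16_t)(1u << port);
	hyp_recompute(sw);
}

void hyp_bridge_leave(struct hyp_switch *sw, unsigned int port)
{
	sw->bridge_members &= (uint16_t)~(1u << port);
	hyp_recompute(sw);
}
```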

Bridge VLAN filtering
---------------------

- ``port_vlan_filtering``: bridge layer function invoked when the bridge gets
  configured for turning on or off VLAN filtering. If nothing specific needs to
  be done at the hardware level, this callback does not need to be implemented.
  When VLAN filtering is turned on, the hardware must be programmed to reject
  802.1Q frames which have VLAN IDs outside of the programmed allowed VLAN ID
  map/rules. If there is no PVID programmed into the switch port, untagged
  frames must be rejected as well. When turned off, the switch must accept any
  802.1Q frames irrespective of their VLAN ID, and untagged frames are allowed.

- ``port_vlan_add``: bridge layer function invoked when a VLAN is configured
  (tagged or untagged) for the given switch port. If the operation is not
  supported by the hardware, this function should return ``-EOPNOTSUPP`` to
  inform the bridge code to fall back to a software implementation.

- ``port_vlan_del``: bridge layer function invoked when a VLAN is removed from the
  given switch port

- ``port_vlan_dump``: bridge layer function invoked with a switchdev callback
  function that the driver has to call for each VLAN the given port is a member
  of. A switchdev object is used to carry the VID and bridge flags.

- ``port_fdb_add``: bridge layer function invoked when the bridge wants to install a
  Forwarding Database entry. The switch hardware should be programmed with the
  specified address in the specified VLAN ID in the forwarding database
  associated with this VLAN ID. If the operation is not supported, this
  function should return ``-EOPNOTSUPP`` to inform the bridge code to fall back
  to a software implementation.

.. note:: VLAN ID 0 corresponds to the port private database, which, in the context
        of DSA, would be its port-based VLAN, used by the associated bridge device.

- ``port_fdb_del``: bridge layer function invoked when the bridge wants to remove a
  Forwarding Database entry. The switch hardware should be programmed to delete
  the specified MAC address from the specified VLAN ID if it was mapped into
  this port forwarding database.

- ``port_fdb_dump``: bridge layer function invoked with a switchdev callback
  function that the driver has to call for each MAC address known to be behind
  the given port. A switchdev object is used to carry the VID and FDB info.

- ``port_mdb_add``: bridge layer function invoked when the bridge wants to install
  a multicast database entry. If the operation is not supported, this function
  should return ``-EOPNOTSUPP`` to inform the bridge code to fall back to a
  software implementation. The switch hardware should be programmed with the
  specified address in the specified VLAN ID in the forwarding database
  associated with this VLAN ID.

.. note:: VLAN ID 0 corresponds to the port private database, which, in the context
        of DSA, would be its port-based VLAN, used by the associated bridge device.

- ``port_mdb_del``: bridge layer function invoked when the bridge wants to remove a
  multicast database entry. The switch hardware should be programmed to delete
  the specified MAC address from the specified VLAN ID if it was mapped into
  this port forwarding database.

- ``port_mdb_dump``: bridge layer function invoked with a switchdev callback
  function that the driver has to call for each MAC address known to be behind
  the given port. A switchdev object is used to carry the VID and MDB info.
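
The acceptance rules that ``port_vlan_filtering`` asks the hardware to enforce
can be written down as a small predicate. This is a user-space sketch of the
rules as stated above, not driver code: the ``hyp_port_vlan`` structure, the
bitmap representation and the function names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define VID_MAX 4096

struct hyp_port_vlan {
	bool filtering;                 /* VLAN filtering turned on */
	bool has_pvid;                  /* a PVID is programmed on this port */
	uint8_t allowed[VID_MAX / 8];   /* bitmap of allowed VLAN IDs */
};

static bool hyp_vid_allowed(const struct hyp_port_vlan *pv, uint16_t vid)
{
	return vid < VID_MAX && (pv->allowed[vid / 8] & (1u << (vid % 8))) != 0;
}

/*
 * Decide whether a frame received on the port is accepted.
 * tagged == false means the frame has no 802.1Q header; vid is then ignored.
 */
bool hyp_ingress_accept(const struct hyp_port_vlan *pv, bool tagged,
                        uint16_t vid)
{
	if (!pv->filtering)
		return true;            /* any VID and untagged frames pass */
	if (!tagged)
		return pv->has_pvid;    /* untagged frames need a PVID */
	return hyp_vid_allowed(pv, vid);
}
```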

Link aggregation
----------------

Link aggregation is implemented in the Linux networking stack by the bonding
and team drivers, which are modeled as virtual, stackable network interfaces.
DSA is capable of offloading a link aggregation group (LAG) to hardware that
supports the feature, and supports bridging between physical ports and LAGs,
as well as between LAGs. A bonding/team interface which holds multiple physical
ports constitutes a logical port, although DSA has no explicit concept of a
logical port at the moment. Due to this, events where a LAG joins/leaves a
bridge are treated as if all individual physical ports that are members of that
LAG join/leave the bridge. Switchdev port attributes (VLAN filtering, STP
state, etc) and objects (VLANs, MDB entries) offloaded to a LAG as bridge port
are treated similarly: DSA offloads the same switchdev object / port attribute
on all members of the LAG. Static bridge FDB entries on a LAG are not yet
supported, since the DSA driver API does not have the concept of a logical port
ID.

- ``port_lag_join``: function invoked when a given switch port is added to a
  LAG. The driver may return ``-EOPNOTSUPP``, and in this case, DSA will fall
  back to a software implementation where all traffic from this port is sent to
  the CPU.
- ``port_lag_leave``: function invoked when a given switch port leaves a LAG
  and returns to operation as a standalone port.
- ``port_lag_change``: function invoked when the link state of any member of
  the LAG changes, and the hashing function needs rebalancing to only make use
  of the subset of physical LAG member ports that are up.

Drivers that benefit from having an ID associated with each offloaded LAG
can optionally populate ``ds->num_lag_ids`` from the ``dsa_switch_ops::setup``
method. The LAG ID associated with a bonding/team interface can then be
retrieved by a DSA switch driver using the ``dsa_lag_id`` function.
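
The rebalancing that ``port_lag_change`` is meant to trigger can be illustrated
with a hash-bucket table. The sketch below assumes hardware that maps a frame
hash to one of eight buckets and each bucket to one egress port; that model,
the bucket count and the ``hyp_lag_rebalance`` name are assumptions, and actual
switches may distribute traffic differently.

```c
#include <stdint.h>

#define HYP_LAG_BUCKETS 8
#define HYP_NO_PORT     (-1)

/*
 * Assign each hash bucket to one LAG member whose link is up, round-robin.
 * members and up are port bitmasks (bit N set means port N). If no member
 * is up, every bucket is set to HYP_NO_PORT.
 */
void hyp_lag_rebalance(uint16_t members, uint16_t up,
                       int bucket_to_port[HYP_LAG_BUCKETS])
{
	uint16_t active = members & up;
	int ports[16];
	int n = 0;

	/* Collect the member ports that can currently carry traffic. */
	for (int p = 0; p < 16; p++)
		if (active & (1u << p))
			ports[n++] = p;

	for (int b = 0; b < HYP_LAG_BUCKETS; b++)
		bucket_to_port[b] = n ? ports[b % n] : HYP_NO_PORT;
}
```

Calling this again whenever a member link goes up or down keeps all buckets
pointed at ports that are up, which is the property the callback description
asks for.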

TODO
====

Making SWITCHDEV and DSA converge towards a unified codebase
------------------------------------------------------------

SWITCHDEV properly takes care of abstracting the networking stack with offload
capable hardware, but does not enforce a strict switch device driver model. On
the other hand, DSA enforces a fairly strict device driver model, and deals with
most of the switch specifics. At some point we should envision a merger between
these two subsystems and get the best of both worlds.

Other hanging fruits
--------------------

- allowing more than one CPU/management interface:
  http://comments.gmane.org/gmane.linux.network/365657